Word Frequency Counting in Java: Unit Testing

Reposted article, dated 2016-09-26 22:41

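Preface: during this round of testing I ran into several unknown characters, so here I convert them to hexadecimal codes to tell them apart.

1) In the Result file that stores the counting results, the character appears as shown in the figure:

2) When it is copied into the StringTokenizer tokenizing call in Eclipse, nothing is displayed;

Before copying:

After copying:

The code looks identical before and after;

3) The counting results after the change:

So, to detect this character, I wrote a small program that converts it to hexadecimal codes: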

        String t = "\0";
        String s = "\0";
        byte[] bbb = t.getBytes();
        int[] n = new int[bbb.length];
        for (int i = 0; i < n.length; i++) {
            n[i] = bbb[i] & 0xff; // store each byte as an unsigned value (0-255)
        }
        for (int j = 0; j < n.length; j++) {
            System.out.println(Integer.toString(n[j], 0x10)); // print the value in hexadecimal
        }
        System.out.println("-----------------");
        byte[] b = s.getBytes();
        int[] in = new int[b.length];
        for (int i = 0; i < in.length; i++) {
            in[i] = b[i] & 0xff;
        }
        for (int j = 0; j < in.length; j++) {
            System.out.println(Integer.toString(in[j], 0x10));
        }

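The output is as follows:

The result shows that this unknown character is made up of three bytes (the program prints one hexadecimal value per byte), and there are many more characters like it that are hard to recognize.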
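To illustrate how one invisible character can show up as three hexadecimal values, here is a minimal sketch. The source does not say which character was found, so U+FEFF (the byte order mark) is used purely as an assumed stand-in; the class name `HexBytes` and the explicit UTF-8 charset are also my own choices, not part of the original program.

```java
import java.nio.charset.StandardCharsets;

class HexBytes {
    /** Returns the UTF-8 bytes of s as space-separated two-digit hex values. */
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff)); // mask to get an unsigned value
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: U+FEFF stands in for the unknown character from the Result file.
        String invisible = "\uFEFF";
        System.out.println("chars: " + invisible.length());   // one Java char
        System.out.println("bytes: " + toHex(invisible));      // three UTF-8 bytes: ef bb bf
        System.out.println("code point: U+"
                + String.format("%04X", invisible.codePointAt(0)));
    }
}
```

Printing the code point alongside the bytes makes it easier to look the character up afterwards.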

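In addition, before the unit tests I ran one extra test: how the number of characters read per call to read() affects efficiency when reading a file:

Four ranges were chosen, 1-128, 129-256, 257-384, and 385-512, and each was tested separately.

There were 2560 test runs in total, taking about 10 minutes. The results:

The program runs fastest when the value is around 200, with an average of about 210 ms.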
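A measurement of this kind could be sketched as follows. This is not the harness used for the figures above: the class name `ReadSizeBenchmark`, the generated sample file, and the list of buffer sizes are all assumptions for illustration, and single-run timings like these vary a lot between machines and runs.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadSizeBenchmark {
    /** Reads the whole file with a char[] of the given size; returns the chars read. */
    static long readAll(Path file, int bufferSize) throws IOException {
        char[] buffer = new char[bufferSize];
        long total = 0;
        try (Reader reader = new InputStreamReader(
                Files.newInputStream(file), StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                total += n; // only n chars of the buffer are valid after each call
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data: a temp file filled with a repeated sentence.
        Path file = Files.createTempFile("read-size", ".txt");
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 50_000; i++) {
            text.append("My English is very very poor.\n");
        }
        Files.write(file, text.toString().getBytes(StandardCharsets.UTF_8));

        int[] sizes = {1, 64, 128, 200, 256, 512};
        for (int size : sizes) {
            long start = System.nanoTime();
            long chars = readAll(file, size);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("buffer=" + size + " chars=" + chars
                    + " time=" + elapsedMs + "ms");
        }
        Files.delete(file);
    }
}
```

For numbers that can be compared, each size would need to be run many times and averaged, as was done in the 2560 runs described above.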

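Unit testing

The Java project is unit tested with JUnit 4, which is integrated into Eclipse.

1. Test the FileProccessing class, checking that what it writes to the console and to a file matches the expected results. The JUnit code is as follows:

Output to the console: output redirection is used here. The string that would be printed to the console is written to a buffer instead and compared with the expected result.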

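    // Imports needed: java.io.ByteArrayOutputStream, java.io.PrintStream,
    // org.junit.Test, static org.junit.Assert.assertEquals
    @Test
    public void testFP() throws Exception {
        final PrintStream originalOut = System.out;
        final ByteArrayOutputStream outContent = new ByteArrayOutputStream();
        System.setOut(new PrintStream(outContent)); // redirect output so it can be compared below
        try {
            new FileProccessing("content.txt", 200);
            assertEquals(
                    "~~~~~~~~~~~~~~~~~~~~\r\ncontent\r\ntotals of the words:6\r\nquantity of vocabulary:5\r\nvery——2\r\nenglish——1\r\nis——1\r\nmy——1\r\npoor——1\r\n~~~~~~~~~~~~~~~~~~~~\r\n",
                    outContent.toString());
        } finally {
            System.setOut(originalOut); // restore the console for the tests that follow
        }
    }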
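The redirection technique used in this test can be tried on its own, without JUnit or the project classes. The sketch below is a minimal, self-contained version; the helper name `capture` and the class `StdoutCapture` are hypothetical and do not appear in the project.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class StdoutCapture {
    /** Runs the action with System.out redirected and returns what it printed. */
    static String capture(Runnable action) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setOut(new PrintStream(buffer, true));
        try {
            action.run();
        } finally {
            System.out.flush();
            System.setOut(original); // always restore the real console
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String printed = capture(() -> System.out.print("totals of the words:6"));
        System.out.println("captured [" + printed + "]");
    }
}
```

Restoring `System.out` in a `finally` block matters because the stream is global: if a test fails while output is still redirected, later tests would write into the old buffer.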

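Output to a file: after the instance is created, two file streams are opened, one reading the actual result file and one reading the expected result file, and the two are compared line by line in a loop.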

    // Imports needed: java.io.BufferedReader, java.io.FileReader, org.junit.Test,
    // static org.junit.Assert.assertEquals, static org.junit.Assert.assertNull
    @Test
    public void testFPtoFile() throws Exception {
        new FileProccessing("E:\\Test3\\Test1.txt"); // input file under test
        try (BufferedReader ep = new BufferedReader(new FileReader("E:\\Test3\\Expect.txt")); // expected result
             BufferedReader at = new BufferedReader(new FileReader("Result.txt"))) { // actual result file
            String temp;
            while ((temp = at.readLine()) != null) {
                assertEquals(ep.readLine(), temp); // compare the file contents line by line
            }
            assertNull(ep.readLine()); // the expected file must not contain extra lines
        }
    }
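The line-by-line comparison can also be factored into a small helper that reports where two files first differ, including the case where one file is shorter than the other. This is only a sketch under my own assumptions: the class `LineCompare`, the method `firstDifference`, and the temp files in `main` are illustrative and are not part of the project.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class LineCompare {
    /** Returns the 1-based number of the first differing line, or 0 if the files match. */
    static int firstDifference(Path expected, Path actual) throws IOException {
        try (BufferedReader ep = Files.newBufferedReader(expected, StandardCharsets.UTF_8);
             BufferedReader at = Files.newBufferedReader(actual, StandardCharsets.UTF_8)) {
            int lineNo = 0;
            while (true) {
                lineNo++;
                String e = ep.readLine();
                String a = at.readLine();
                if (e == null && a == null) {
                    return 0; // both files ended together
                }
                if (e == null || !e.equals(a)) {
                    return lineNo; // content differs or one file is shorter
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample files standing in for Expect.txt and Result.txt.
        Path expected = Files.createTempFile("expect", ".txt");
        Path actual = Files.createTempFile("result", ".txt");
        Files.write(expected, Arrays.asList("very: 2", "is: 1"), StandardCharsets.UTF_8);
        Files.write(actual, Arrays.asList("very: 2"), StandardCharsets.UTF_8);
        System.out.println("first difference at line " + firstDifference(expected, actual));
        Files.delete(expected);
        Files.delete(actual);
    }
}
```

In a JUnit test, the result could then be checked with a single `assertEquals(0, ...)`.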

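Screenshot of the test case:

Unit test results:

Code coverage: 72%. The uncovered part is the code in the main function that accepts user input.

2. Improve on the test cases above by testing the input and output of main() under different conditions. The JUnit code follows, in roughly four cases:

1> Arguments passed on the command line: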

    @Test
    public void testMain1() throws Exception {
        String[] test = { "E:\\Test3\\Test1.txt" };
        WordFrequencyCount.main(test); // run the program with the file path as its argument

        try (BufferedReader ep = new BufferedReader(new FileReader("E:\\Test\\Expect.txt"));
             BufferedReader at = new BufferedReader(new FileReader("Result.txt"))) {
            String temp;
            while ((temp = at.readLine()) != null) {
                assertEquals(ep.readLine(), temp);
            }
            assertNull(ep.readLine()); // the expected file must not contain extra lines
        }
    }

2> When the argument is a folder:

    @Test
    public void testMain2() throws Exception {
        String[] test = { "E:\\Test3" }; // the folder contains Test1.txt
        WordFrequencyCount.main(test);

        try (BufferedReader ep = new BufferedReader(new FileReader("E:\\Test\\Expect.txt"));
             BufferedReader at = new BufferedReader(new FileReader("Result.txt"))) {
            String temp;
            while ((temp = at.readLine()) != null) {
                assertEquals(ep.readLine(), temp);
            }
            assertNull(ep.readLine()); // the expected file must not contain extra lines
        }
    }

3> Input redirected at the console: input redirection is used here, and a String is converted into an input stream.

    @Test
    public void testMain3() throws Exception {
        String[] test = {};
        String str = "< E:\\Test3\\Test1.txt\n";
        final InputStream originalIn = System.in;
        ByteArrayInputStream instr = new ByteArrayInputStream(str.getBytes()); // turn the String into an input stream

        System.setIn(instr); // redirect input
        try {
            WordFrequencyCount.main(test);
        } finally {
            System.setIn(originalIn); // restore standard input for the tests that follow
        }

        try (BufferedReader ep = new BufferedReader(new FileReader("E:\\Test\\Expect.txt"));
             BufferedReader at = new BufferedReader(new FileReader("Result.txt"))) {
            String temp;
            while ((temp = at.readLine()) != null) {
                assertEquals(ep.readLine(), temp);
            }
            assertNull(ep.readLine()); // the expected file must not contain extra lines
        }
    }
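Input redirection works the same way as output redirection: `System.setIn` swaps the global input stream for one backed by a byte array. The sketch below shows the idea in isolation. `readFirstLine` is a hypothetical stand-in for a program that reads from the console; it is not the project's `main()`.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class StdinRedirect {
    /** Hypothetical stand-in for code that reads one line from the console. */
    static String readFirstLine() {
        Scanner scanner = new Scanner(System.in, "UTF-8");
        return scanner.hasNextLine() ? scanner.nextLine() : "";
    }

    /** Feeds text to System.in, calls readFirstLine, then restores System.in. */
    static String withInput(String text) {
        InputStream original = System.in;
        System.setIn(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        try {
            return readFirstLine();
        } finally {
            System.setIn(original); // always restore the real standard input
        }
    }

    public static void main(String[] args) {
        String line = withInput("content\nMy English is very very poor.\n");
        System.out.println("first line read: " + line);
    }
}
```

Because the byte array simply ends, the code under test sees end of input, which plays the role of the user pressing Ctrl+Z on a Windows console.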

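4> File name and content entered at the console: this part uses both input and output redirection. Because main() prints prompts to guide the user, that output has to be included in the comparison as well.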

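    @Test
    public void testMain4() throws Exception {
        String[] test = {};
        String str = "content\nMy English is very very poor.\n";
        final InputStream originalIn = System.in;
        final PrintStream originalOut = System.out;
        final ByteArrayOutputStream outContent = new ByteArrayOutputStream();
        System.setIn(new ByteArrayInputStream(str.getBytes())); // redirect input
        System.setOut(new PrintStream(outContent)); // redirect output
        try {
            WordFrequencyCount.main(test);
            // The expected text includes the prompts main() prints for the user
            // ("Please enter the file name:" and "Please enter the content; finish with Enter then ctrl+z:").
            // The elapsed time varies between runs, so it is normalized to 1ms before comparing.
            assertEquals(
                    "請輸入文件名：\r\n請輸入內容，結尾以回車后ctrl+z結束：\r\n~~~~~~~~~~~~~~~~~~~~\r\ncontent\r\ntotals of the words:6\r\nquantity of vocabulary:5\r\nvery——2\r\nenglish——1\r\nis——1\r\nmy——1\r\npoor——1\r\n~~~~~~~~~~~~~~~~~~~~\r\ntime:1ms\r\n",
                    outContent.toString().replaceAll("time:\\d+ms", "time:1ms"));
        } finally {
            System.setIn(originalIn);
            System.setOut(originalOut);
        }
    }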
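The expected strings in these tests hard-code `\r\n`, which matches what `println` writes on Windows. On other platforms `println` appends a different line separator, so the same assertions would fail there. One possible way around this, sketched below under my own assumptions (the class `LineEndings` and both method names are hypothetical), is to normalize the captured text before comparing it.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class LineEndings {
    /** Returns what println appends after the given ASCII text on this platform. */
    static String printlnSuffix(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true);
        out.println(text);
        return buffer.toString().substring(text.length());
    }

    /** Converts \r\n to \n so expected strings can be written with \n only. */
    static String normalize(String captured) {
        return captured.replace("\r\n", "\n");
    }

    public static void main(String[] args) {
        String suffix = printlnSuffix("content");
        System.out.println("println appended " + suffix.length() + " char(s)");
        System.out.println("same as System.lineSeparator(): "
                + suffix.equals(System.lineSeparator()));
    }
}
```

With `normalize` applied to the captured output, the expected string could be written with plain `\n` and would not depend on the operating system.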

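(To keep testing simple, the test cases for this part are the same as before and are not shown again.)

Unit test results:

Code coverage: 94%, now that tests for the user-facing input and output are included. Some code is still uncovered, mainly the parts that throw, handle, and detect exceptions.

This round of unit testing taught me that System.out.println() differs from System.out.print() not by a trailing \n but by the platform line separator, which is \r\n on Windows. The unit tests also exposed a bug in the original program: when a file is read with read(), the char[] that holds the result is not cleared automatically. Each call overwrites the buffer from the start, so characters left over from the previous read remain in it and make the counts wrong.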
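The read() bug described here can be reproduced in a few lines. The project's actual reading code is not shown in this post, so the sketch below is my reading of the problem rather than the original code; the class `ReadCountDemo` and both method names are hypothetical. The usual fix is to use only as many characters as read() reports.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadCountDemo {
    /** Buggy: treats the whole buffer as valid after every read. */
    static String readIgnoringCount(Reader reader, int size) throws IOException {
        char[] buffer = new char[size];
        StringBuilder sb = new StringBuilder();
        while (reader.read(buffer) != -1) {
            sb.append(buffer); // also appends stale chars left from the previous read
        }
        return sb.toString();
    }

    /** Fixed: uses only the number of chars that read() reported. */
    static String readUsingCount(Reader reader, int size) throws IOException {
        char[] buffer = new char[size];
        StringBuilder sb = new StringBuilder();
        int n;
        while ((n = reader.read(buffer)) != -1) {
            sb.append(buffer, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 7 chars read 4 at a time: the second read fills only 3 slots.
        System.out.println(readIgnoringCount(new StringReader("abcdefg"), 4)); // abcdefgd
        System.out.println(readUsingCount(new StringReader("abcdefg"), 4));    // abcdefg
    }
}
```

The extra `d` in the first result is the leftover character from the first read, which is exactly the kind of residue that would inflate a word count.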

Source code:

　　HTTPS https://coding.net/u/regretless/p/WordFrequencyCount/git

　　SSH git@git.coding.net:regretless/WordFrequencyCount.git

　　GIT git://git.coding.net/regretless/WordFrequencyCount.git
