Java: Counting Chinese Characters, English Letters, Digits, Spaces, and Special Characters in a String


Introduction

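      Each kind of character can be identified by the range it occupies in the Unicode code table: digits lie between '0' and '9', and English letters between 'a' and 'z' or 'A' and 'Z'. Chinese text can be detected the same way, because the common Chinese ideographs occupy the range 0x4e00 to 0x9fbb. A plain range check is not fully accurate, however: Chinese punctuation marks lie outside that range, so the check gives the wrong answer for them. The program therefore classifies characters with Character.UnicodeBlock instead. The code is as follows: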
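Before the full listing, here is a minimal sketch of why the range check falls short. The class name RangeVsBlockDemo and its two helper methods are made up for this illustration and are not part of the original program; the block check is deliberately limited to two blocks.

```java
class RangeVsBlockDemo {
    // Naive check: the numeric range quoted in the text above.
    static boolean inCjkRange(char ch) {
        return ch >= 0x4E00 && ch <= 0x9FBB;
    }

    // Block-based check, restricted to two blocks to keep the sketch short.
    static boolean inCjkBlocks(char ch) {
        Character.UnicodeBlock ub = Character.UnicodeBlock.of(ch);
        return ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
            || ub == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION;
    }

    public static void main(String[] args) {
        char ideograph = '\u4E2D'; // the ideograph 中
        char fullStop = '\u3002';  // the ideographic full stop 。
        // The ideograph passes both checks.
        System.out.println(inCjkRange(ideograph) + " " + inCjkBlocks(ideograph)); // true true
        // The full stop is outside the range but inside a CJK block.
        System.out.println(inCjkRange(fullStop) + " " + inCjkBlocks(fullStop));   // false true
    }
}
```

The complete program from the original post follows; it checks more blocks than this sketch does.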

 

package cn.csrc.base.count;

public class CountCharacter {

  public static void main(String[] args) {

    String str ="我愛你abcd123中國 #!";
    CountCharacter countCharacter = new CountCharacter();
    countCharacter.count(str);
  }

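  /** Number of Chinese characters */
  private int chCharacter = 0;

  /** Number of English letters */
  private int enCharacter = 0;

  /** Number of spaces */
  private int spaceCharacter = 0;

  /** Number of digits */
  private int numberCharacter = 0;

  /** Number of other characters */
  private int otherCharacter = 0;

  // Collects the Chinese characters found
  private StringBuilder sb1 = new StringBuilder();

  // Collects the English letters found
  private StringBuilder sb2 = new StringBuilder();

  // Collects the digits found
  private StringBuilder sb3 = new StringBuilder();

  // Collects the special characters found
  private StringBuilder sb4 = new StringBuilder();

  /**
   * Counts the Chinese characters, English letters, digits, spaces,
   * and other characters in a string.
   * @param str the string to analyze
   */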
  public void count(String str) {
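    // Check for null first; calling equals on a null reference would throw NullPointerException
    if (str == null || str.isEmpty()) {
      System.out.println("The string is empty");
      return;
    }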
    for (int i = 0; i < str.length(); i++) {
      char tmp = str.charAt(i);
      if ((tmp >= 'A' && tmp <= 'Z') || (tmp >= 'a' && tmp <= 'z')) {
        enCharacter ++;
        sb2.append(tmp+" ");
      } else if ((tmp >= '0') && (tmp <= '9')) {
        numberCharacter ++;
        sb3.append(tmp +" ");
      } else if (tmp ==' ') {
        spaceCharacter ++;
      } else if (isChinese(tmp)) {
        chCharacter ++;
        sb1.append(tmp+" ");
      } else {
        otherCharacter ++;
        sb4.append(tmp +" ");
      }
    }
    System.out.println("String: " + str + System.lineSeparator());
    System.out.println("Chinese characters: " + chCharacter + " (" + sb1.toString() + ")");
    System.out.println("English letters: " + enCharacter + " (" + sb2.toString() + ")");
    System.out.println("Digits: " + numberCharacter + " (" + sb3.toString() + ")");
    System.out.println("Spaces: " + spaceCharacter);
    System.out.println("Other characters: " + otherCharacter + " (" + sb4.toString() + ")");
  }

  /**
   * Checks whether a character is Chinese (ideographs plus CJK and full-width punctuation).
   * @param ch the character to check
   * @return true if the character is Chinese, false otherwise
   */
  private boolean isChinese(char ch) {
    // Look up the Unicode block this character belongs to
    Character.UnicodeBlock ub = Character.UnicodeBlock.of(ch);
    // GENERAL_PUNCTUATION matches the curly quote “ (which also appears in non-Chinese text)
    // CJK_SYMBOLS_AND_PUNCTUATION matches the ideographic full stop 。
    // CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B lies outside the BMP, so a single char never matches it
    if (ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
        || ub == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS
        || ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
        || ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B
        || ub == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION
        || ub == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS
        || ub == Character.UnicodeBlock.GENERAL_PUNCTUATION) {
      System.out.println(ch + " is Chinese");
      return true;
    }
    return false;
  }
}

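  The result is as follows: for the sample string, the program reports 5 Chinese characters (我 愛 你 中 國), 4 English letters (a b c d), 3 digits (1 2 3), 1 space, and 2 other characters (# !).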
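One limitation of the listing is that isChinese takes a char, so it only sees UTF-16 code units; ideographs outside the Basic Multilingual Plane (such as those in CJK Unified Ideographs Extension B) arrive as two surrogates and are never recognized. The following is a hedged alternative sketch, not the original author's method: it walks code points and tests Character.UnicodeScript.HAN. The class name CodePointCountDemo and the method countHan are made up for this illustration. Note that this test counts ideographs only; CJK punctuation such as 。 is generally not classified as HAN script, so the totals can differ from the block-based approach.

```java
class CodePointCountDemo {
    // Counts code points whose Unicode script is Han,
    // including supplementary-plane ideographs.
    static long countHan(String s) {
        return s.codePoints()
                .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)
                .count();
    }

    public static void main(String[] args) {
        // The sample string from the program above, written with escapes.
        String sample = "\u6211\u611B\u4F60abcd123\u4E2D\u570B #!";
        // U+20000 (Extension B) encoded as a surrogate pair, followed by 'a'.
        String withExtB = "\uD840\uDC00a";

        System.out.println(countHan(sample));   // 5
        System.out.println(countHan(withExtB)); // 1
        System.out.println(withExtB.length());  // 3 code units for 2 code points
    }
}
```

String.codePoints() requires Java 8 or later, and Character.UnicodeScript requires Java 7 or later.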
