Automatically detecting the encoding of user-uploaded text files in Java

Reposted on 2017-06-19 13:05.

Original: http://www.open-open.com/code/view/1420514359234

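I often run into garbled text in some of the data files that users upload, and I can't restrict the encoding of the files they upload (that would be asking rather a lot of customers), so I had to find a solution myself. I looked through a number of Java approaches for getting a file's encoding.

Either they detected the encoding incorrectly, or they were just a short snippet that didn't say which libraries it depended on... so I'll share mine here. The utility class has a single method. I won't bother writing a main test method.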

It looks like attachments can't be uploaded here yet... so I'll put the files in my resources instead.

The code depends on these two jars:

chardet.jar

cpdetector_1.0.10.jar

    package com.dxx.buscredit.common.util;

    import info.monitorenter.cpdetector.io.ASCIIDetector;
    import info.monitorenter.cpdetector.io.CodepageDetectorProxy;
    import info.monitorenter.cpdetector.io.JChardetFacade;
    import info.monitorenter.cpdetector.io.ParsingDetector;
    import info.monitorenter.cpdetector.io.UnicodeDetector;

    import java.io.File;
    import java.nio.charset.Charset;

    public class FileCharsetDetector {

        /**
         * Detects a file's encoding with the third-party open-source package cpdetector.
         *
         * @param file the file to inspect
         * @return the detected charset name, or "GBK" if nothing was detected
         */
        public static String getFileEncode(File file) {
            /*
             * 1. cpdetector ships several common detector implementations; instances of
             *    them can be registered with add(), e.g. ParsingDetector, JChardetFacade,
             *    ASCIIDetector and UnicodeDetector.
             * 2. The proxy follows a "first detector to return a non-null result wins" rule.
             * 3. cpdetector is based on statistics, so the result is not guaranteed to be correct.
             */
            CodepageDetectorProxy detector = CodepageDetectorProxy.getInstance();

            detector.add(new ParsingDetector(false));
            detector.add(UnicodeDetector.getInstance());
            detector.add(JChardetFacade.getInstance()); // internally uses classes from chardet.jar
            detector.add(ASCIIDetector.getInstance());

            Charset charset = null;
            try {
                charset = detector.detectCodepage(file.toURI().toURL());
            } catch (Exception e) {
                // Detection failed; fall through to the default below
                e.printStackTrace();
            }

            // Default to GBK
            String charsetName = "GBK";
            if (charset != null) {
                if (charset.name().equals("US-ASCII")) {
                    // Report pure-ASCII content as ISO_8859_1
                    charsetName = "ISO_8859_1";
                } else {
                    charsetName = charset.name();
                }
            }
            return charsetName;
        }
    }
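
The "first detector to return a non-null result wins" rule, together with the GBK fallback, can be illustrated without the two jars. The following is only a rough sketch using the JDK alone, not how cpdetector works internally: the class `SimpleCharsetChain` and its methods `detectBom`, `detectStrictUtf8` and `detect` are hypothetical names made up for this example. It checks for a byte-order mark, then tries a strict UTF-8 decode, and otherwise returns the caller's fallback. It assumes the running JDK provides the GBK charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical JDK-only sketch of a detector chain; not part of cpdetector. */
public class SimpleCharsetChain {

    /** Returns the charset implied by a byte-order mark, or null if there is none. */
    static Charset detectBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return null;
    }

    /** Returns UTF-8 if the bytes decode as UTF-8 without errors, otherwise null. */
    static Charset detectStrictUtf8(byte[] b) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(b));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    /** Runs the detectors in order; the first non-null result wins, else the fallback. */
    static Charset detect(byte[] bytes, Charset fallback) {
        List<Function<byte[], Charset>> detectors = new ArrayList<>();
        detectors.add(SimpleCharsetChain::detectBom);
        detectors.add(SimpleCharsetChain::detectStrictUtf8);
        for (Function<byte[], Charset> detector : detectors) {
            Charset result = detector.apply(bytes);
            if (result != null) {
                return result;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumes the JDK ships GBK
        String sample = "\u7f16\u7801";        // the two Chinese characters for "encoding"
        byte[] utf8Bytes = sample.getBytes(StandardCharsets.UTF_8);
        byte[] gbkBytes = sample.getBytes(gbk);

        Charset detected = detect(gbkBytes, gbk);
        System.out.println(detect(utf8Bytes, gbk));          // prints "UTF-8"
        System.out.println(detected);                        // prints "GBK"
        System.out.println(new String(gbkBytes, detected).equals(sample)); // prints "true"
    }
}
```

Once a charset is known, decoding the uploaded bytes is just `new String(bytes, charset)`, as the last line of `main` shows. This sketch is much cruder than cpdetector: GBK bytes can occasionally happen to form valid UTF-8, and nothing here distinguishes GBK from other legacy encodings, so treat it as an illustration of the chaining idea only.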
