Fixing garbled text in txt files uploaded to an OSS server

Reposted article. 2020-12-03 18:48, tag: java

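Today a txt file I uploaded came back garbled when downloaded. It took me a whole afternoon to sort out and turned out to be more involved than expected, so here are my notes.

1. The server only accepts UTF-8 files, so the first thing I suspected was a transcoding problem.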

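Code that guesses a file's encoding from its first few bytes is easy to find online (it is the getCode method in the full listing at the end of this post). Once the encoding is known, a UTF-8 file is uploaded as-is, and anything else is converted to UTF-8 first and then uploaded.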
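The conversion step described above can be sketched roughly as follows. This is a minimal illustration rather than the code from the original post: the class name TranscodeSketch and the method toUtf8 are hypothetical, and it assumes the source charset is already known and that the JDK ships the GBK charset (standard JDK builds do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: re-encodes bytes from a known source charset to UTF-8.
final class TranscodeSketch {

    /** Decodes the input with the given charset and re-encodes it as UTF-8. No BOM is added. */
    static byte[] toUtf8(byte[] input, Charset source) {
        String text = new String(input, source);
        // Always name the target charset; getBytes() without an argument
        // uses the platform default, which may not be UTF-8.
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        byte[] gbkBytes = "\u4e2d\u6587".getBytes(gbk); // two Chinese characters
        byte[] utf8Bytes = toUtf8(gbkBytes, gbk);
        System.out.println(gbkBytes.length + " GBK bytes, " + utf8Bytes.length + " UTF-8 bytes");
    }
}
```

Note that the result carries no byte order mark, which matters for the next step of the story.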

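Just when I thought the problem was solved, the uploaded file was still garbled. So I created two files with identical content but different encodings and compared them carefully. My converted byte array was missing the UTF-8 marker that the good file had at the start.

So the byte array has to be prefixed with the marker bytes -17 -69 -65 (0xEF 0xBB 0xBF, the UTF-8 byte order mark, written as signed Java bytes).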
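As a small sketch of that prefixing step (the class Utf8BomSketch and its method names are made up for illustration, not taken from the original post), the marker can be prepended only when it is not already present:

```java
// Hypothetical helper: adds the UTF-8 byte order mark to a byte array if it is missing.
final class Utf8BomSketch {

    // 0xEF 0xBB 0xBF; as signed Java bytes these are -17, -69, -65.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns true if the data starts with the three BOM bytes. */
    static boolean hasBom(byte[] data) {
        return data.length >= 3
                && data[0] == UTF8_BOM[0]
                && data[1] == UTF8_BOM[1]
                && data[2] == UTF8_BOM[2];
    }

    /** Returns the data with a BOM in front; data that already has one is returned unchanged. */
    static byte[] withBom(byte[] utf8) {
        if (hasBom(utf8)) {
            return utf8;
        }
        byte[] out = new byte[utf8.length + UTF8_BOM.length];
        System.arraycopy(UTF8_BOM, 0, out, 0, UTF8_BOM.length);
        System.arraycopy(utf8, 0, out, UTF8_BOM.length, utf8.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] body = {104, 105}; // "hi" in ASCII/UTF-8
        System.out.println(java.util.Arrays.toString(withBom(body)));
    }
}
```

Checking for an existing marker first keeps the helper safe to call twice on the same data.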

The code for this is part of the create method in the full listing at the end of this post.

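Just when I thought everything was finally fine, the tester told me it still did not work. So I took his file and tested with it.

It turned out his file was UTF-8 but had no marker at all! My getCode method could not tell that it was UTF-8 and treated it as GBK, so the result was garbled again.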
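To illustrate why a prefix check alone cannot recognise such a file, here is a sketch of a different, content-based test using only the JDK: it tries to decode the bytes strictly as UTF-8 and reports whether that succeeds. This is not the approach the original post ends up using (that is the juniversalchardet library below), and the class name Utf8CheckSketch is hypothetical. It is also only a heuristic: pure ASCII passes, and some non-UTF-8 byte sequences can happen to be valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: checks whether bytes form well-formed UTF-8, with or without a BOM.
final class Utf8CheckSketch {

    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder throw instead of silently substituting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8NoBom = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        byte[] gbkBytes = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4}; // the same text in GBK
        System.out.println("UTF-8 without BOM valid: " + isValidUtf8(utf8NoBom));
        System.out.println("GBK bytes valid as UTF-8: " + isValidUtf8(gbkBytes));
    }
}
```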

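After some searching on Baidu I found a jar that can recognise his file as UTF-8. Uploading it directly still did not work, though: without the prefix, the OSS server does not accept it either.

So I ended up combining both checks: the jar-based detection decides which charset to decode from when transcoding, and the prefix-based check decides whether the byte array needs the prefix added.

Below are the complete code and the related pom dependency.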

<dependency>
 <groupId>com.googlecode.juniversalchardet</groupId>
 <artifactId>juniversalchardet</artifactId>
 <version>1.0.3</version>
</dependency>

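import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils; // assuming IOUtils is the Apache Commons IO class
import org.mozilla.universalchardet.UniversalDetector;
import org.springframework.web.multipart.MultipartFile;

    private static InputStream create(MultipartFile file) throws IOException {
        String name = file.getOriginalFilename();
        // Only txt files are transcoded; everything else is uploaded untouched.
        if (name == null || !name.endsWith("txt")) {
            return file.getInputStream();
        }

        // Prefix-based check: a file that already looks like UTF-8 is uploaded as-is.
        String code = getCode(file.getInputStream());
        if (code.equals("UTF-8")) {
            return file.getInputStream();
        }

        // Content-based detection picks the charset to decode from; fall back to GBK.
        String code2 = getCode2(file.getInputStream());
        if (code2 == null) {
            code2 = "GBK";
        }
        String str = IOUtils.toString(file.getInputStream(), code2);
        // Drop a decoded BOM so the output does not end up with two of them.
        if (str.startsWith("\uFEFF")) {
            str = str.substring(1);
        }
        // Encode explicitly as UTF-8; getBytes() without a charset uses the platform default.
        byte[] body = str.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream(body.length + 3);
        // UTF-8 BOM: EF BB BF (-17, -69, -65 as signed bytes).
        out.write(0xEF);
        out.write(0xBB);
        out.write(0xBF);
        out.write(body, 0, body.length);
        return new ByteArrayInputStream(out.toByteArray());
    }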

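    // Content-based detection with juniversalchardet; returns null if no charset is recognised.
    private static String getCode2(InputStream inputStream) throws IOException {
        UniversalDetector detector = new UniversalDetector(null);
        byte[] buf = new byte[4096];
        try (InputStream in = inputStream) {
            int nread;
            // Feed data to the detector until it is confident or the stream ends.
            while (!detector.isDone() && (nread = in.read(buf)) > 0) {
                detector.handleData(buf, 0, nread);
            }
        }
        // Tell the detector that the input has ended so it can make its final decision.
        detector.dataEnd();

        return detector.getDetectedCharset();
    }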

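    // Prefix-based check: looks only at the first three bytes and defaults to "gbk".
    private static String getCode(InputStream inputStream) {
        String charsetName = "gbk";
        byte[] head = new byte[3];
        try (InputStream in = inputStream) {
            // A single read() may return fewer than 3 bytes, so loop until full or end of stream.
            int off = 0;
            int n;
            while (off < head.length && (n = in.read(head, off, head.length - off)) > 0) {
                off += n;
            }
            if (off >= 2 && head[0] == -1 && head[1] == -2)        // FF FE: UTF-16 little-endian BOM
                charsetName = "UTF-16LE";
            else if (off >= 2 && head[0] == -2 && head[1] == -1)   // FE FF: UTF-16 big-endian BOM
                charsetName = "UTF-16BE";
            else if (off == 3 && head[0] == -27 && head[1] == -101 && head[2] == -98)
                charsetName = "UTF-8"; // E5 9B 9E: UTF-8 without BOM, but only if the text starts with U+56DE
            else if (off == 3 && head[0] == -17 && head[1] == -69 && head[2] == -65)
                charsetName = "UTF-8"; // EF BB BF: UTF-8 with BOM
        } catch (IOException e) {
            // Unreadable stream: fall through and return the default.
        }
        return charsetName;
    }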
