Pitfalls and lessons learned when unzipping files in Java

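Just this week, a tester came to me saying that the requirements documents (in zip format) in production could no longer be previewed, and asked me to take a look.
I hadn't built this feature myself, so I started by checking the production logs for errors, and sure enough, the backend was throwing an exception.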

java.lang.IllegalArgumentException: MALFORMED
   at java.util.zip.ZipCoder.toString(ZipCoder.java:58)
   ...

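That is roughly the exception: the front end could not preview the requirements document because extracting the zip file had failed.
I first searched online for the cause. Everyone said it was an encoding problem and that changing UTF-8 to GBK would fix it.
Then I located the code and found a method, unzip():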

public static void unzip(File zipFile, String descDir) {
    try {
        File pathFile = new File(descDir);
        if (!pathFile.exists()) {
            pathFile.mkdirs();
        }
        ZipFile zip = getZipFile(zipFile);
        for (Enumeration entries = zip.entries(); entries.hasMoreElements(); ) {
            ZipEntry entry = (ZipEntry) entries.nextElement();
            String zipEntryName = entry.getName();
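            // `pre` is not defined in this excerpt; presumably it is an optional leading
            // path prefix to strip from entry names (an assumption).
            if (StringUtils.isNotBlank(pre)) {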
                zipEntryName = zipEntryName.substring(pre.length());
            }
            InputStream in = zip.getInputStream(entry);
            // Normalize Windows-style backslashes to forward slashes
            String outPath = (descDir + "/" + zipEntryName).replace('\\', '/');
            // Create the parent directory if it does not exist yet
            File file = new File(outPath.substring(0, outPath.lastIndexOf('/')));
            if (!file.exists()) {
                file.mkdirs();
            }
            // If the full path is a directory it was already created above, so there is nothing to extract
            if (new File(outPath).isDirectory()) {
                continue;
            }
            // Log the path of the file being extracted
            LOG.info("Extracting file to: {}", outPath);
            OutputStream out = new FileOutputStream(outPath);
            IOUtils.copy(in, out);
            in.close();
            out.close();
        }
        zip.close();
        LOG.info("****************** Unzip finished ********************");

    } catch (Exception e) {
        LOG.error("[unzip] Failed to extract zip file", e);
    }
}

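private static ZipFile getZipFile(File zipFile) throws Exception {
    // Open as UTF-8 first and probe an entry name
    ZipFile zip = new ZipFile(zipFile, Charset.forName("UTF-8"));
    Enumeration entries = zip.entries();
    while (entries.hasMoreElements()) {
        try {
            // Only the first entry is decoded here: the method returns on the first iteration
            entries.nextElement();
            zip.close();
            zip = new ZipFile(zipFile, Charset.forName("UTF-8"));
            return zip;
        } catch (Exception e) {
            // The first name could not be decoded as UTF-8: close the handle and fall back to GBK
            zip.close();
            zip = new ZipFile(zipFile, Charset.forName("GBK"));
            return zip;
        }
    }
    return zip;
}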

So I downloaded the zip file from production and debugged it locally. The exception was thrown on line 9 of unzip(), at this statement:

ZipEntry entry = (ZipEntry) entries.nextElement();
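
The exception surfacing here, rather than inside getZipFile(), is consistent with getZipFile() decoding only the first entry before committing to UTF-8: an archive whose first name is plain ASCII but whose later names are GBK would pass the probe and then fail during the real enumeration. A minimal sketch of a probe that walks every entry could look like the following. The class and method names are hypothetical and not from the original project. As far as I know, newer JDKs may report an undecodable name as a ZipException when the file is opened instead of an IllegalArgumentException during enumeration, so the sketch treats both the same way.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

// Hypothetical helper, for illustration only.
class ZipCharsetProbe {

    /** Returns true if every entry name in the archive can be read with the given charset. */
    public static boolean namesDecodeWith(File zipFile, Charset charset) throws IOException {
        try (ZipFile zip = new ZipFile(zipFile, charset)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                // Walk ALL entries, not just the first one
                entries.nextElement().getName();
            }
            return true;
        } catch (IllegalArgumentException | ZipException e) {
            // Undecodable name (or an otherwise unreadable archive): report "no"
            return false;
        }
    }

    /** Assumption: GBK is the only fallback worth trying, as in the original code. */
    public static Charset detectCharset(File zipFile) throws IOException {
        return namesDecodeWith(zipFile, StandardCharsets.UTF_8)
                ? StandardCharsets.UTF_8
                : Charset.forName("GBK");
    }
}
```

With such a helper the caller would open the archive as `new ZipFile(zipFile, ZipCharsetProbe.detectCharset(zipFile))`. This is still only a guess between two charsets, so it narrows the problem rather than solving it.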

Then, following the original stack trace, I went to line 58 of ZipCoder:

String toString(byte[] ba, int length) {
    CharsetDecoder cd = decoder().reset();
    int len = (int)(length * cd.maxCharsPerByte());
    char[] ca = new char[len];
    if (len == 0)
        return new String(ca);
    // UTF-8 only for now. Other ArrayDeocder only handles
    // CodingErrorAction.REPLACE mode. ZipCoder uses
    // REPORT mode.
    if (isUTF8 && cd instanceof ArrayDecoder) {
        int clen = ((ArrayDecoder)cd).decode(ba, 0, length, ca);
        if (clen == -1)    // malformed
            throw new IllegalArgumentException("MALFORMED");
        return new String(ca, 0, clen);
    }
    ByteBuffer bb = ByteBuffer.wrap(ba, 0, length);
    CharBuffer cb = CharBuffer.wrap(ca);
    CoderResult cr = cd.decode(bb, cb, true);
    if (!cr.isUnderflow())
        throw new IllegalArgumentException(cr.toString());
    cr = cd.flush(cb);
    if (!cr.isUnderflow())
        throw new IllegalArgumentException(cr.toString());
    return new String(ca, 0, cb.position());
}

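So only the UTF-8 path enters this if branch and can throw MALFORMED? That matches what people said online: switching the charset to GBK makes the error go away.
But ZipCoder is a JDK class (its source ships in src.zip), and this check is presumably there for a reason, so simply switching to GBK to get rid of the bug is clearly not a sound fix.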
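
To see what the check is guarding against, the same strict behaviour can be reproduced outside of ZipCoder with a CharsetDecoder in REPORT mode. The sketch below is only an illustration (the class name and sample text are made up here, and it assumes the JRE provides the GBK charset): a Chinese file name encoded as GBK is rejected by a strict UTF-8 decoder, and decodes cleanly with GBK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical demo class, for illustration only.
class StrictDecodeDemo {

    /** Returns the decoded text, or null if the bytes are not valid in the given charset. */
    public static String decodeStrict(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    // REPORT makes the decoder fail instead of silently substituting characters
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }
}
```

For example, `decodeStrict("需求文档".getBytes(Charset.forName("GBK")), StandardCharsets.UTF_8)` returns null, while passing `Charset.forName("GBK")` as the second argument returns the original text. Hard-coding GBK therefore silences the error for this archive, but it would misread any archive whose names really are UTF-8.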

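So I needed a different approach. Some zip files in production could still be previewed. I extracted the failing zip, re-zipped it on my own machine (using HaoZip), ran the code above again, and this time extraction succeeded?! Why? I searched online again and found the answer: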

When compressing on Windows, the system encoding (GB2312) is used, whereas the default encoding on Mac is UTF-8, which is why the names come out garbled.

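Finally I asked the colleague who had uploaded the file: he had created it on Windows with WinRAR (evidently different archiving tools behave differently).
So the problem was more or less pinned down. The next question was how to fix it.
After more searching, I finally found this: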

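Extracting zip files with Apache commons-compress is a pleasure. It solves the cross-platform garbling of Chinese file names inside zip archives: whether the file was compressed on Windows, Mac, or Linux, the names no longer come out garbled after extraction.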

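At that point the problem was essentially solved, so I switched to Apache commons-compress. Here is the code, adapted from the code above.
First add the dependency to the pom file: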

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.8.1</version>
</dependency>

public static void main(String[] args) throws Exception{
    String path = "C:\\Users\\Isuzu\\Desktop\\test.zip";
    unzip(new File(path), "D:\\data");
}

public static void unzip(File zipFile, String descDir) {
    try (ZipArchiveInputStream inputStream = getZipFile(zipFile)) {
        File pathFile = new File(descDir);
        if (!pathFile.exists()) {
            pathFile.mkdirs();
        }
        ZipArchiveEntry entry = null;
        while ((entry = inputStream.getNextZipEntry()) != null) {
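            if (entry.isDirectory()) {
                File directory = new File(descDir, entry.getName());
                directory.mkdirs();
            } else {
                File target = new File(descDir, entry.getName());
                // Create the parent directory in case the archive has no explicit directory entry for it
                File parent = target.getParentFile();
                if (parent != null) {
                    parent.mkdirs();
                }
                // Log the path of the file being extracted
                LOG.info("Extracting file to: {}", target.getPath());
                // try-with-resources closes the output stream even if the copy fails
                try (OutputStream os = new BufferedOutputStream(new FileOutputStream(target))) {
                    IOUtils.copy(inputStream, os);
                }
            }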
        }
        final File[] files = pathFile.listFiles();
        if (files != null && files.length == 1 && files[0].isDirectory()) {
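            // The archive contained a single top-level folder: move its contents up one level
            FileUtils.copyDirectory(files[0], pathFile);
            // To avoid deleting the wrong thing, only delete when the path is under the expected data directory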
            boolean isValid = files[0].getPath().contains("/data/www/");
            if (isValid) {
                FileUtils.forceDelete(files[0]);
            }
        }
        LOG.info("****************** Unzip finished ********************");

    } catch (Exception e) {
        LOG.error("[unzip] Failed to extract zip file", e);
    }
}

private static ZipArchiveInputStream getZipFile(File zipFile) throws Exception {
    return new ZipArchiveInputStream(new BufferedInputStream(new FileInputStream(zipFile)));
}
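
One more pitfall sits in the same code path: unzip() builds each target with new File(descDir, entry.getName()), so an entry name containing ".." could end up outside descDir. The following is a hedged sketch of a guard that could be called before writing each entry. The class and method names are hypothetical, and it uses only java.nio, not commons-compress.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;

// Hypothetical helper, for illustration only.
class SafeEntryPath {

    /** Resolves an entry name against the target directory, rejecting names that escape it. */
    public static File resolve(File destDir, String entryName) throws IOException {
        Path base = destDir.toPath().toAbsolutePath().normalize();
        // normalize() collapses "." and ".." segments before the containment check
        Path target = base.resolve(entryName).normalize();
        if (!target.startsWith(base)) {
            throw new IOException("Entry escapes the target directory: " + entryName);
        }
        return target.toFile();
    }
}
```

In unzip() this would replace the direct `new File(descDir, entry.getName())` calls, for example `File target = SafeEntryPath.resolve(pathFile, entry.getName());`.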

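And with that, the job was done. When I first hit this problem I searched Baidu, and nearly every suggested fix was to change the charset to GBK, but that only treats the symptom, not the cause. That's all for unzip pitfalls for now. I'll write up any new ones I run into.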
