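The correct approach first:
The right way is to create a ZipFile and then iterate over its entries. Each entry is simply a file or a directory: when you hit a directory, create the directory; otherwise create a file. Calling zipFile.getInputStream(entry) gives you the input stream of the current file (note: the stream of that file's content, not of the archive itself). Then write it out through a writer and you are done. Well, it really is that simple. Below is an example that reads a GBK-encoded archive whose files are also GBK-encoded (that is, files written and zipped on Windows) and extracts them as UTF-8 (for cross-platform use).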

    import java.io.{File, FileOutputStream, InputStreamReader, OutputStreamWriter}
    import org.apache.commons.compress.archivers.zip.ZipFile
    import scala.collection.JavaConverters._ // on Scala 2.13+: scala.jdk.CollectionConverters._

    def decompressZip(source: File, dest: String,
                      sourceCharacters: String = "GBK",
                      destCharacters: String = "UTF-8"): Unit = {
      if (source.exists) {
        // here sourceCharacters decodes the entry names stored in the archive
        val zipFile = new ZipFile(source, sourceCharacters)
        try {
          zipFile.getEntries.asScala.foreach { entry =>
            val target = new File(dest, entry.getName)
            if (entry.isDirectory) target.mkdirs()
            else {
              var reader: InputStreamReader = null
              var writer: OutputStreamWriter = null
              try {
                checkFileParent(target) // helper not shown in this post; presumably creates missing parent directories
                // stream of the entry's decompressed content, not of the archive
                reader = new InputStreamReader(zipFile.getInputStream(entry), sourceCharacters)
                writer = new OutputStreamWriter(new FileOutputStream(target), destCharacters)
                // copy in chunks: getSize is a byte count (and may be -1), so it is not a safe char-buffer size
                val buffer = new Array[Char](8192)
                var n = reader.read(buffer)
                while (n != -1) {
                  writer.write(buffer, 0, n)
                  n = reader.read(buffer)
                }
              } catch {
                case e: Exception => e.printStackTrace()
              } finally {
                if (writer != null) writer.close() // flushes, then closes the underlying FileOutputStream
                if (reader != null) reader.close()
              }
            }
          }
        } finally zipFile.close()
      }
    }

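The example calls checkFileParent, which is never shown in this post. A minimal sketch of what such a helper might look like follows, assuming its only job is to create the parent directories of the file about to be written. The object name ExtractHelpers and the exact behaviour are my own guesses, not the original implementation.

```scala
import java.io.{File, IOException}

object ExtractHelpers {
  /** Hypothetical stand-in for the post's checkFileParent helper:
    * creates the parent directories of `file` when they are missing. */
  def checkFileParent(file: File): Unit = {
    val parent = file.getParentFile
    // parent is null for a bare relative name such as "a.txt"
    if (parent != null && !parent.isDirectory && !parent.mkdirs())
      throw new IOException(s"Cannot create directory $parent")
  }
}
```

With `import ExtractHelpers._` in scope, the call inside decompressZip resolves to this version. It only touches directories, so the target file itself is still created by the FileOutputStream.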
The wrong way:
For some reason, many tutorials online use ZipArchiveInputStream for extraction. However, the documentation says:
The ZipFile class is preferred when reading from files as ZipArchiveInputStream is limited by not being able to read the central directory header before returning entries. In particular ZipArchiveInputStream
- may return entries that are not part of the central directory at all and shouldn't be considered part of the archive.
- may return several entries with the same name.
- will not return internal or external attributes.
- may return incomplete extra field data.
- may return unknown sizes and CRC values for entries until the next entry has been reached if the archive uses the data descriptor feature.
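commons-compress has recommended ZipFile ever since version 1.3.
Speaking for myself, after trying ZipArchiveInputStream I ran into a problem: it is awkward to construct, because you have to hand it an InputStream, and this is how the API documentation describes the constructors: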
| Constructor | Description |
|---|---|
| ZipArchiveInputStream(InputStream inputStream) | Create an instance using UTF-8 encoding |
| ZipArchiveInputStream(InputStream inputStream, String encoding) | Create an instance using the specified encoding |
| ZipArchiveInputStream(InputStream inputStream, String encoding, boolean useUnicodeExtraFields) | Create an instance using the specified encoding |
| ZipArchiveInputStream(InputStream inputStream, String encoding, boolean useUnicodeExtraFields, boolean allowStoredEntriesWithDataDescriptor) | Create an instance using the specified encoding |

Parameters: inputStream - the stream to wrap
The constructor documentation does not say what this inputStream argument is supposed to be, so I tried what the online tutorials do:

    val zipFile = new ZipFile(source, sourceCharacters)
    var entries = zipFile.getEntries
    entries.foreach(entry =>
      if (entry != null) {
        try {
          val name = entry.getName
          val path = dest + name
          var content = new Array[Char](entry.getSize.toInt)
          // the mistake: this wraps the entry's already-decompressed stream
          zais = new ZipArchiveInputStream(zipFile.getInputStream(entry))
          val entryFile = new File(path)
          checkFileParent(entryFile)
          os = new FileOutputStream(entryFile)
          IOUtils.copy(zais, os)
          ………………

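The data read this way is empty. Reading an Array[Byte] with zais.read and converting it to a string gives a string of blank characters, and printing the Array[Byte] directly shows nothing but zeros. After reading the documentation I think I understand why: ZipArchiveInputStream is meant to read a Zip archive, whereas zipFile.getInputStream returns the input stream of the already-decompressed file, which is why this fails. I tried both commons-compress 1.4 from 2012 (the version Spark depends on) and the latest release, 1.14, and the approach is wrong in both. So I suspect the blog posts reposted since 2012 were passed along without anyone actually running or testing the code. Mixing ZipFile and ZipArchiveInputStream like this just feels off anyway...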
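For comparison, here is a rough sketch of how ZipArchiveInputStream seems intended to be used: it wraps the stream of the archive file itself and walks the entries with getNextEntry. This is my reading of the Javadoc, not code from any of the tutorials, and the names ZipStreamSketch, extractWithStream and nameEncoding are made up for illustration. It assumes commons-compress is on the classpath, copies raw bytes without transcoding, and is still subject to the limitations quoted above, so ZipFile remains the better choice when reading from a file.

```scala
import java.io.{BufferedInputStream, File, FileInputStream}
import java.nio.file.{Files, StandardCopyOption}
import org.apache.commons.compress.archivers.zip.ZipArchiveInputStream

object ZipStreamSketch {
  /** Extracts `source` into `destDir` by streaming the archive itself. */
  def extractWithStream(source: File, destDir: File, nameEncoding: String = "GBK"): Unit = {
    // The wrapped stream is the raw .zip file, not zipFile.getInputStream(entry)
    val zais = new ZipArchiveInputStream(
      new BufferedInputStream(new FileInputStream(source)), nameEncoding)
    try {
      var entry = zais.getNextEntry
      while (entry != null) {
        val target = new File(destDir, entry.getName)
        if (entry.isDirectory) target.mkdirs()
        else {
          Option(target.getParentFile).foreach(_.mkdirs())
          // read() on zais returns -1 at the end of the current entry,
          // so this copies one entry's decompressed bytes and leaves zais open
          Files.copy(zais, target.toPath, StandardCopyOption.REPLACE_EXISTING)
        }
        entry = zais.getNextEntry
      }
    } finally zais.close()
  }
}
```

A call would look like `ZipStreamSketch.extractWithStream(new File("in.zip"), new File("out"))`, where both paths are placeholders.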