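The previous post briefly covered reading large xls files with POI. This one covers how to read files in the xlsx format.
Simulating the environment
First prepare a large Excel file (a 5 MB xlsx), then shrink the JVM heap to 100 MB (JVM option -Xmx100m) to provoke an OOM.
Also have the JVM dump the heap when the OOM occurs, using the options -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=d://dump.hprof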
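Before running the experiment it can be worth confirming that the heap limit really took effect. The following is a minimal sketch using only the standard Runtime API; the class name HeapCheck and the helper toMegabytes are made up for illustration and are not part of POI.

```java
// Hypothetical helper class (not part of POI): prints the heap limits the JVM is running with.
class HeapCheck {
    // Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes).
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects -Xmx; the reported value may be somewhat below the flag,
        // depending on the garbage collector in use.
        System.out.println("max heap (MB):   " + toMegabytes(rt.maxMemory()));
        System.out.println("total heap (MB): " + toMegabytes(rt.totalMemory()));
        System.out.println("free heap (MB):  " + toMegabytes(rt.freeMemory()));
    }
}
```

Run it with the same options as the test program, for example java -Xmx100m HeapCheck, and the first line should report a value close to 100.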
Reading with XSSF
Add the jars needed to parse xlsx to the Gradle build
compile 'org.apache.poi:poi:3.15'
compile 'org.apache.poi:poi-ooxml:3.15'
compile 'xerces:xercesImpl:2.11.0'
Then read the xlsx file
public static void main(String[] args) throws IOException {
    try (InputStream is = new FileInputStream("d://large.xlsx")) {
        // The constructor loads the whole workbook into memory
        Workbook wb = new XSSFWorkbook(is);
    }
}
Running it produces
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource$FakeZipEntry.<init>(ZipInputStreamZipEntrySource.java:136)
at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:56)
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:99)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:342)
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:37)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:285)
at blog.excel.Xlsx.main(Xlsx.java:17)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
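This also ends in an OOM. The cause is the same as with xls: when an xlsx is processed this way, its data is read into memory in full, which exhausts the heap.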
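An xlsx file is a ZIP archive of XML parts, and the stack trace above shows the entries being copied into byte arrays while the package is opened, so what matters is the uncompressed size rather than the 5 MB on disk. The sketch below uses only java.util.zip to list each part and add up the uncompressed sizes. The class name XlsxSizeReport and the method uncompressedBytes are illustrative, and the total is only a rough lower bound on what the object model will need.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper (not part of POI): reports how large the parts of an xlsx are once unzipped.
class XlsxSizeReport {
    // Sums the uncompressed size of all entries; entries of unknown size (-1) are skipped.
    static long uncompressedBytes(String path) throws IOException {
        long total = 0;
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                long size = entry.getSize();
                if (size > 0) {
                    total += size;
                }
                System.out.println(entry.getName() + ": "
                        + entry.getCompressedSize() + " -> " + size + " bytes");
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // The default path mirrors the one used in this post; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "d://large.xlsx";
        System.out.println("total uncompressed bytes: " + uncompressedBytes(path));
    }
}
```

The worksheet parts under xl/worksheets and xl/sharedStrings.xml are usually the largest entries, and they are the ones the streaming approach avoids holding in memory as a whole.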
Streaming reads with the Event API
POI also provides a streaming way to read xlsx files, which reduces memory usage
import java.io.InputStream;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

public class ExampleEventUserModel {
    public void processOneSheet(String filename) throws Exception {
        // Open read-only so that closing the package never writes back to the file
        try (OPCPackage pkg = OPCPackage.open(filename, PackageAccess.READ)) {
            XSSFReader r = new XSSFReader(pkg);
            SharedStringsTable sst = r.getSharedStringsTable();
            XMLReader parser = fetchSheetParser(sst);
            // Fetch a sheet by relationship id; "rId1" is usually the first sheet
            try (InputStream sheet = r.getSheet("rId1")) {
                parser.parse(new InputSource(sheet));
            }
        }
    }

    public XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException {
        XMLReader parser =
                XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
        ContentHandler handler = new SheetHandler(sst);
        parser.setContentHandler(handler);
        return parser;
    }

    /**
     * SAX handler that prints each cell reference and its value.
     */
    private static class SheetHandler extends DefaultHandler {
        private final SharedStringsTable sst;
        private String lastContents;
        private boolean nextIsString;

        private SheetHandler(SharedStringsTable sst) {
            this.sst = sst;
        }

        // Called when an element starts
        @Override
        public void startElement(String uri, String localName, String name,
                                 Attributes attributes) throws SAXException {
            // c => cell
            if (name.equals("c")) {
                // Print the cell reference, e.g. A1
                System.out.print(attributes.getValue("r") + " - ");
                // Cell type: "s" means the value is an index into the shared strings table
                String cellType = attributes.getValue("t");
                nextIsString = "s".equals(cellType);
            }
            // Reset the text buffer for the new element
            lastContents = "";
        }

        // Called when an element ends
        @Override
        public void endElement(String uri, String localName, String name)
                throws SAXException {
            // Resolve the index here, because characters() may be called more than once
            if (nextIsString) {
                int idx = Integer.parseInt(lastContents);
                lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
                nextIsString = false;
            }
            // v => cell value
            if (name.equals("v")) {
                System.out.println(lastContents);
            }
        }

        // Called with the text between element tags
        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            lastContents += new String(ch, start, length);
        }
    }

    public static void main(String[] args) throws Exception {
        ExampleEventUserModel example = new ExampleEventUserModel();
        example.processOneSheet("d://large.xlsx");
    }
}
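To see the event flow without a real workbook, the handler logic can be tried on a tiny worksheet fragment. The following is a simplified sketch that uses only the JDK's SAX parser: a plain List<String> stands in for POI's SharedStringsTable, and the class name MiniSheetDemo, the method readCells and the sample XML are all made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo (not part of POI): same c/v handling as above, on an in-memory XML string.
class MiniSheetDemo {
    // Parses a worksheet fragment and returns "ref=value" for every cell that has a <v> element.
    static List<String> readCells(String sheetXml, List<String> sharedStrings) throws Exception {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private String ref;
            private boolean isSharedString;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("c")) {
                    ref = atts.getValue("r");
                    // t="s" marks a value that is an index into the shared strings list
                    isSharedString = "s".equals(atts.getValue("t"));
                }
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one element, so append instead of assigning
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("v")) {
                    String value = text.toString();
                    if (isSharedString) {
                        value = sharedStrings.get(Integer.parseInt(value));
                    }
                    out.add(ref + "=" + value);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(sheetXml)), handler);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<worksheet><sheetData><row r=\"1\">"
                + "<c r=\"A1\" t=\"s\"><v>0</v></c>"
                + "<c r=\"B1\"><v>42</v></c>"
                + "</row></sheetData></worksheet>";
        System.out.println(readCells(xml, Arrays.asList("name"))); // prints [A1=name, B1=42]
    }
}
```

Only the current element's text is held at any time, which is why memory stays flat regardless of how many rows the sheet contains. Collecting the results into a list, as this demo does, would reintroduce the memory cost on a large file, so a real handler would process each cell as it arrives.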
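Limitations
This approach likewise makes it possible to read large xlsx files as a stream, but it is limited to reading the data inside them and cannot modify anything. A later post will cover how to write large files.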