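1. Reproducing the problem
Constructing the test data
Use a Cartesian product to generate an Excel file containing a large amount of data. Sample code: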
package com.test.demo;

import cn.hutool.core.collection.CollUtil;
import cn.hutool.poi.excel.BigExcelWriter;
import cn.hutool.poi.excel.ExcelUtil;

import java.util.ArrayList;
import java.util.List;

/**
 * @description Constructs the test data
 * @author rongrong
 * @version 1.0
 * @date 2020/11/7 19:17
 */
public class ConstructData {
    public static void main(String[] args) {
        List<String> row1 = CollUtil.newArrayList("aa", "bb", "cc", "dd","aa", "bb", "cc", "dd","aa", "bb", "cc", "dd","aa", "bb", "cc", "dd");
        List<String> row2 = CollUtil.newArrayList("aa1", "bb1", "cc1", "dd1", "bb1", "cc1", "dd1", "bb1", "cc1", "dd1", "bb1", "cc1", "dd1", "bb1", "cc1", "dd1", "bb1", "cc1", "dd1", "bb1", "cc1", "dd1");
        List<String> row3 = CollUtil.newArrayList("aa2", "bb2", "cc2", "dd2", "bb1", "cc1", "dd1", "cc3", "dd3", "bb1", "cc3", "dd3", "bb1");
        List<String> row4 = CollUtil.newArrayList("cc3", "dd3", "bb1","aa3", "bb3", "cc3", "dd3", "bb1", "cc3", "dd3", "bb1", "cc1", "dd1");
        List<String> row5 = CollUtil.newArrayList("aa4", "bb4", "cc4", "dd4", "bb1", "cc1" , "cc3", "dd3", "bb1", "cc3", "dd3", "bb1", "cc3", "dd3", "bb1","dd1");
        List<List<String>> list = CollUtil.newArrayList(row1, row2, row3, row4, row5);
        List<List<String>> result = new ArrayList<List<String>>();
        descartes(list, result, 0, new ArrayList<String>());
        BigExcelWriter writer = ExcelUtil.getBigWriter("e:/測試數據.xlsx");
        // Write all rows at once, using the default style
        writer.write(result);
        // Close the writer to release memory
        writer.close();
        System.out.println("Data written successfully!");
    }

    /**
     * Cartesian product algorithm
     * @param dimvalue the source lists, one per output column
     * @param result   collects the generated rows
     * @param layer    index of the list currently being expanded
     * @param curList  the partial row built so far
     */
    private static void descartes(List<List<String>> dimvalue, List<List<String>> result, int layer, List<String> curList) {
        if (layer < dimvalue.size() - 1) {
            if (dimvalue.get(layer).size() == 0) {
                // Empty list: skip this layer
                descartes(dimvalue, result, layer + 1, curList);
            } else {
                for (int i = 0; i < dimvalue.get(layer).size(); i++) {
                    List<String> list = new ArrayList<String>(curList);
                    list.add(dimvalue.get(layer).get(i));
                    descartes(dimvalue, result, layer + 1, list);
                }
            }
        } else if (layer == dimvalue.size() - 1) {
            // Last layer: each value completes one row
            if (dimvalue.get(layer).size() == 0) {
                result.add(curList);
            } else {
                for (int i = 0; i < dimvalue.get(layer).size(); i++) {
                    List<String> list = new ArrayList<String>(curList);
                    list.add(dimvalue.get(layer).get(i));
                    result.add(list);
                }
            }
        }
    }
}
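To get a feel for how much data the generator produces, the row count of a Cartesian product is simply the product of the list sizes. Counting the literals in the code above gives lists of 16, 22, 13, 13, and 16 elements. The sketch below is illustrative only: the class name `CartesianSize` and both methods are made up for this example. It computes that count and shows an iterative variant of the product on a tiny input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CartesianSize {

    /** Number of rows in a Cartesian product: the product of the list sizes. */
    public static long rowCount(int... sizes) {
        long count = 1;
        for (int size : sizes) {
            count *= size;
        }
        return count;
    }

    /** Iterative Cartesian product (hypothetical alternative to the recursive version). */
    public static List<List<String>> product(List<List<String>> dims) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>()); // start with one empty partial row
        for (List<String> dim : dims) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : result) {
                for (String value : dim) {
                    List<String> row = new ArrayList<>(prefix);
                    row.add(value);
                    next.add(row);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> small = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("x", "y", "z"));
        // prints [[a, x], [a, y], [a, z], [b, x], [b, y], [b, z]]
        System.out.println(product(small));
        // prints 951808
        System.out.println(rowCount(16, 22, 13, 13, 16));
    }
}
```

So the generated sheet should hold 951,808 rows of 5 columns each. Note one difference from the recursive version: this iterative sketch returns an empty result if any list is empty, whereas `descartes` skips empty lists.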
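The size of the generated Excel file is shown in the figure:
That is still not big enough, so I processed the file further. This time the data volume is definitely large enough.
Next, we read the Excel file with POI. Sample code: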
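import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.FileInputStream;
import java.io.IOException;

// The class name is illustrative
public class PoiReadDemo {
    public static void main(String[] args) throws IOException {
        // Open the file and load the whole workbook into memory
        try (FileInputStream fis = new FileInputStream("e:/測試數據.xlsx");
             XSSFWorkbook workbook = new XSSFWorkbook(fis)) {
            // Get the first sheet
            XSSFSheet sheet = workbook.getSheetAt(0);
            // Iterate over all rows
            for (Row row : sheet) {
                System.out.println("Reading row " + row.getRowNum() + ":");
                // Iterate over all cells in the row
                for (Cell cell : row) {
                    System.out.print(cell.getStringCellValue() + " ");
                }
                System.out.println(" ");
            }
        }
    }
}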
Result
As expected, the program finally ran out of memory, as shown below:
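2. Solution
Use Excel Streaming Reader. This third-party tool caches a configurable number of rows in memory and loads further rows as you iterate, instead of loading every record at once. This lets you read through the whole Excel file while keeping memory usage bounded.
The tool does have some limitations. It can only read Excel content; write operations are not available. You can call getSheetAt() to get the corresponding Sheet, but because only a limited number of rows are in memory at any time, rows cannot be accessed randomly, so getRow(int rowNum) cannot be used. The data of a row that has been loaded is already in memory, so its cells can be accessed randomly with getCell(int cellnum). Iterating with an iterator is the recommended way to use this tool. See https://github.com/monitorjbl/excel-streaming-reader for details.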
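The row-caching idea can be illustrated without the library. The following is a conceptual sketch only, not how Excel Streaming Reader is actually implemented. The class `RowCacheSketch` and its `cacheSize` parameter are hypothetical stand-ins for the reader and its row cache setting. It wraps a source iterator and holds at most `cacheSize` rows at a time, which is also why forward iteration works but random row access does not.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Conceptual sketch: keeps at most cacheSize rows in memory at a time. */
public class RowCacheSketch<T> implements Iterator<T> {
    private final Iterator<T> source;  // stands in for the underlying row stream
    private final int cacheSize;       // plays the role of the row cache size
    private final Deque<T> cache = new ArrayDeque<>();
    private int maxCached = 0;         // largest number of rows held at once

    public RowCacheSketch(Iterator<T> source, int cacheSize) {
        if (cacheSize < 1) {
            throw new IllegalArgumentException("cacheSize must be >= 1");
        }
        this.source = source;
        this.cacheSize = cacheSize;
    }

    /** Load the next batch only after the previous one is fully consumed. */
    private void refillIfEmpty() {
        if (!cache.isEmpty()) {
            return;
        }
        while (cache.size() < cacheSize && source.hasNext()) {
            cache.addLast(source.next());
        }
        maxCached = Math.max(maxCached, cache.size());
    }

    @Override
    public boolean hasNext() {
        refillIfEmpty();
        return !cache.isEmpty();
    }

    @Override
    public T next() {
        refillIfEmpty();
        if (cache.isEmpty()) {
            throw new NoSuchElementException();
        }
        return cache.removeFirst(); // consumed rows are dropped, so no going back
    }

    public int maxCached() {
        return maxCached;
    }

    public static void main(String[] args) {
        // The in-memory list is only a stand-in for a row source
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            rows.add(i);
        }
        RowCacheSketch<Integer> it = new RowCacheSketch<>(rows.iterator(), 10);
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        // prints "25 rows read, at most 10 cached"
        System.out.println(seen + " rows read, at most " + it.maxCached() + " cached");
    }
}
```

Because consumed rows are discarded, the memory held by the cache depends on the cache size rather than on the total number of rows in the file.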
Add the dependency to pom.xml:
<dependency>
    <groupId>com.monitorjbl</groupId>
    <artifactId>xlsx-streamer</artifactId>
    <version>2.0.0</version>
</dependency>
The sample code is as follows:
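import com.monitorjbl.xlsx.StreamingReader;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

import java.io.FileInputStream;
import java.io.IOException;

// The class name is illustrative
public class StreamingReadDemo {
    public static void main(String[] args) throws IOException {
        try (FileInputStream in = new FileInputStream("e:/測試數據.xlsx");
             Workbook wk = StreamingReader.builder()
                     .rowCacheSize(100) // number of rows cached in memory; default is 10
                     .bufferSize(8192)  // buffer size in bytes used when reading the resource; default is 1024
                     .open(in)) {       // required; accepts an InputStream or a File; XLSX files only
            Sheet sheet = wk.getSheetAt(0);
            // Iterate over all rows
            for (Row row : sheet) {
                System.out.println("Reading row " + row.getRowNum() + ":");
                // Iterate over all cells in the row
                for (Cell cell : row) {
                    System.out.print(cell.getStringCellValue() + " ");
                }
                System.out.println(" ");
            }
        }
    }
}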
Result
This time the program ran stably with no errors, and it was fast as well, as shown below:
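To compare the two readers with numbers rather than screenshots, one option is to log JVM heap usage around each run. This is a rough sketch using only the standard `Runtime` API. The `HeapUsage` class and its methods are hypothetical helpers, and the byte array in `main` is a placeholder workload to be swapped for one of the readers. The readings are approximate, since garbage collection can run at any time.

```java
public class HeapUsage {

    /** Approximate number of heap bytes currently in use by the JVM. */
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs a task and returns the change in used heap in bytes (may be negative if GC runs). */
    public static long measure(Runnable task) {
        long before = usedHeapBytes();
        task.run();
        return usedHeapBytes() - before;
    }

    public static void main(String[] args) {
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap (MB): " + maxMb);

        long delta = measure(() -> {
            // Placeholder workload; substitute one of the two readers here
            byte[] buffer = new byte[16 * 1024 * 1024];
            buffer[0] = 1;
        });
        System.out.println("Heap delta (MB): " + delta / (1024 * 1024));
    }
}
```

Running both readers under the same `-Xmx` setting and comparing the logged values gives a simple, if coarse, view of the difference in memory use.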