[Notes] Performance Comparison of Excel Parsing Tools on Large Data Sets


1. Excel storage format and parsing flow

1.1 Storage format

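An Excel .xlsx file is essentially a set of XML files that conform to the Office Open XML specification, packaged as a ZIP archive. Change the file extension to .zip and unpack it to see the basic structure: [Content_Types].xml, _rels/, docProps/, and xl/ (which holds workbook.xml, styles.xml, sharedStrings.xml, and the worksheets/ directory).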

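The main data lives in xl/sharedStrings.xml and in the XML files under xl/worksheets/. Parsing Excel therefore comes down to parsing XML, but the XML parts reference one another (for example, a string cell in a worksheet stores an index into sharedStrings.xml rather than the text itself), which makes parsing more involved.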
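
To make the relationship between the XML parts concrete, here is a minimal sketch that uses only the JDK's DOM parser to resolve a worksheet cell marked t="s" against the shared-strings table. The two XML fragments are simplified, hand-written stand-ins for xl/sharedStrings.xml and xl/worksheets/sheet1.xml (real files carry namespaces and many more attributes), and the class name SharedStringDemo is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class SharedStringDemo {
    // Simplified stand-in for xl/sharedStrings.xml: one <si> entry per distinct string.
    static final String SHARED_STRINGS =
        "<sst><si><t>ID</t></si><si><t>Name</t></si><si><t>Alice</t></si></sst>";

    // Simplified stand-in for xl/worksheets/sheet1.xml.
    // t="s" means <v> holds an index into the shared-strings table, not the text itself.
    static final String SHEET =
        "<worksheet><sheetData>"
        + "<row r=\"1\"><c r=\"A1\" t=\"s\"><v>0</v></c><c r=\"B1\" t=\"s\"><v>1</v></c></row>"
        + "<row r=\"2\"><c r=\"A2\"><v>1</v></c><c r=\"B2\" t=\"s\"><v>2</v></c></row>"
        + "</sheetData></worksheet>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("failed to parse XML", e);
        }
    }

    static List<List<String>> readRows(String sharedStringsXml, String sheetXml) {
        // Step 1: load the shared-strings table into a list indexed by position.
        List<String> shared = new ArrayList<>();
        NodeList items = parse(sharedStringsXml).getElementsByTagName("si");
        for (int i = 0; i < items.getLength(); i++) {
            shared.add(items.item(i).getTextContent());
        }
        // Step 2: walk the rows and resolve each cell value.
        List<List<String>> rows = new ArrayList<>();
        NodeList rowNodes = parse(sheetXml).getElementsByTagName("row");
        for (int i = 0; i < rowNodes.getLength(); i++) {
            List<String> cells = new ArrayList<>();
            NodeList cellNodes = ((Element) rowNodes.item(i)).getElementsByTagName("c");
            for (int j = 0; j < cellNodes.getLength(); j++) {
                Element cell = (Element) cellNodes.item(j);
                String raw = cell.getElementsByTagName("v").item(0).getTextContent();
                boolean isShared = "s".equals(cell.getAttribute("t"));
                cells.add(isShared ? shared.get(Integer.parseInt(raw)) : raw);
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(readRows(SHARED_STRINGS, SHEET)); // prints [[ID, Name], [1, Alice]]
    }
}
```

Cell A2 has no t attribute, so its value is used as-is, while the other three cells are looked up by index. This cross-file lookup is the kind of dependency that makes Excel parsing more than a single XML pass.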

1.2 Parsing flow

[Figure: parsing flow diagram]

2. Write performance comparison

2.1 Test code

See section 5 at the end of this article.

2.2 Results

Mode         100,000 rows   1,000,000 rows
POI (XSSF)   10833ms        GC overhead limit exceeded
POI (SXSSF)  1378ms         9274ms
EasyExcel    1339ms         9077ms

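Conclusions:

At 100,000 rows, POI's SXSSF mode and EasyExcel are about equally fast, and both are roughly eight times faster than XSSF.

At 1,000,000 rows, POI's XSSF mode cannot produce the file at all (it fails with "GC overhead limit exceeded"), while SXSSF and EasyExcel remain about equally fast.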

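2.3 Analysis

In its traditional XSSF mode, POI builds the entire workbook in memory, which drives memory usage up, and only writes it to disk in one pass at the end. SXSSF instead keeps a sliding window of rows and flushes older rows to disk as it goes, which avoids both the heavy memory footprint and the long GC pauses.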

SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.

https://poi.apache.org/components/spreadsheet/how-to.html#sxssf

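This also shows that keeping all data in memory is not always the best choice: in some scenarios it performs worse than doing multiple IO operations, because it can trigger Full GCs that take up a lot of time.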
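
The sliding-window idea can be sketched without POI. The class below is a hypothetical illustration (WindowedRowWriter is not a POI class and this is not how SXSSF is implemented internally): it keeps at most windowSize rows in memory and writes the oldest row out as soon as a new one would exceed the window, so memory use stays bounded no matter how many rows are produced.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window row buffer; an illustration only, not POI's implementation. */
final class WindowedRowWriter {
    private final int windowSize;
    private final Writer out;
    private final Deque<String> window = new ArrayDeque<>();
    private int flushedRows = 0;

    WindowedRowWriter(int windowSize, Writer out) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be >= 1");
        }
        this.windowSize = windowSize;
        this.out = out;
    }

    /** Adds a row; if the window is full, the oldest row is written out first and dropped from memory. */
    void addRow(String row) {
        if (window.size() == windowSize) {
            flushOldest();
        }
        window.addLast(row);
    }

    /** Number of rows still held in memory (and therefore still accessible). */
    int rowsInMemory() {
        return window.size();
    }

    /** Number of rows already written out and no longer accessible. */
    int flushedRows() {
        return flushedRows;
    }

    /** Writes out whatever is still in the window. */
    void finish() {
        while (!window.isEmpty()) {
            flushOldest();
        }
    }

    private void flushOldest() {
        try {
            out.write(window.removeFirst());
            out.write('\n');
            flushedRows++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter(); // stands in for a temp file on disk
        WindowedRowWriter writer = new WindowedRowWriter(100, sink);
        for (int i = 0; i < 1000; i++) {
            writer.addRow("row-" + i);
        }
        // prints "100 rows in memory, 900 flushed"
        System.out.println(writer.rowsInMemory() + " rows in memory, " + writer.flushedRows() + " flushed");
        writer.finish();
    }
}
```

In POI itself the window size is passed to the workbook constructor, for example new SXSSFWorkbook(100). The trade-off is the one the quoted documentation describes: rows that have left the window can no longer be read or modified.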

3. Read performance comparison

3.1 Test code

See section 5 at the end of this article.

3.2 Results

Rows      POI       EasyExcel
100,000   4223ms    2813ms

At 100,000 rows, EasyExcel is about 1.4 seconds (roughly a third) faster than POI, and the gap widens as the data volume grows.

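3.3 Analysis

Most of the read time is spent loading the spreadsheet data into memory (including parsing); once loaded, reading the values takes about the same time in both tools. POI loads everything into memory at once, whereas EasyExcel loads the data piece by piece. POI also loads the Excel style information along with the data, and that accounts for a large share of what is loaded; EasyExcel discards the styles and loads only the data.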
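
The difference between loading everything and loading piece by piece can be illustrated with the JDK's SAX parser, which reports elements as events instead of building a full tree. This is a minimal sketch under simplifying assumptions, not EasyExcel's actual code: the class name StreamingRowReader is hypothetical, the input is a stripped-down sheet XML, and styles and shared strings are ignored. Only the row currently being parsed is held in memory; each finished row is handed to a callback and then dropped.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class StreamingRowReader {
    /** Parses simplified sheet XML and passes each completed row to onRow, one row at a time. */
    static void read(String sheetXml, Consumer<List<String>> onRow) {
        DefaultHandler handler = new DefaultHandler() {
            private List<String> currentRow; // the only row kept in memory
            private StringBuilder text;      // non-null only while inside a <v> element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("row".equals(qName)) {
                    currentRow = new ArrayList<>();
                } else if ("v".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("v".equals(qName)) {
                    currentRow.add(text.toString());
                    text = null;
                } else if ("row".equals(qName)) {
                    onRow.accept(currentRow); // hand the row over, then forget it
                    currentRow = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("failed to parse sheet XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<sheetData><row><c><v>1</v></c><c><v>a</v></c></row>"
            + "<row><c><v>2</v></c></row></sheetData>";
        read(xml, row -> System.out.println(row));
    }
}
```

The callback plays the same role as the listener's invoke method in the EasyExcel test code: the caller decides what to keep, so memory use depends on what the callback retains rather than on the size of the file.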

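4. Multi-threaded spreadsheet parsing

When reading spreadsheet data, besides switching to EasyExcel, you can also combine multithreading with POI to speed things up: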

Threads   Elapsed
1         1807ms
2         1425ms
4         1311ms
8         2335ms

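For a plain read, single-threaded and multi-threaded runs take about the same time, and for large spreadsheets most of the time goes into reading the file itself. When each row needs additional processing during the read, however, multithreading pays off clearly. The results below were measured after adding a time-consuming operation to each row (for ease of testing, the data size was reduced to 10,000 rows):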

Threads   Elapsed
1         36356ms
2         18064ms
4         9082ms
8         4532ms
16        2662ms
32        1287ms
64        688ms
128       380ms
256       335ms
512       1253ms
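
In the table above, the elapsed time roughly halves each time the thread count doubles, up to 128 threads. It flattens at 256 and gets worse at 512, where thread overhead outweighs the gain. The article does not include the code for this test, so the following is only a sketch of one way to set it up with the JDK's ExecutorService. ParallelRowProcessor, slowJoin, and the 2 ms sleep are all hypothetical stand-ins, and the rows are assumed to have been read into memory already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelRowProcessor {
    /** Applies perRow to every row on a fixed-size pool; results come back in input order. */
    static <R> List<R> process(List<List<String>> rows, int threads, Function<List<String>, R> perRow) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>(rows.size());
            for (List<String> row : rows) {
                tasks.add(() -> perRow.apply(row));
            }
            List<R> results = new ArrayList<>(rows.size());
            for (Future<R> future : pool.invokeAll(tasks)) { // blocks until every task is done
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while processing rows", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("row processing failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Stand-in for an expensive per-row operation; the 2 ms sleep is an arbitrary assumption. */
    static String slowJoin(List<String> row) {
        try {
            Thread.sleep(2);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return String.join("|", row);
    }

    public static void main(String[] args) {
        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(Arrays.asList(String.valueOf(i), "name-" + i));
        }
        for (int threads : new int[] {1, 2, 4, 8}) {
            long start = System.nanoTime();
            process(rows, threads, ParallelRowProcessor::slowJoin);
            System.out.println(threads + " thread(s): " + (System.nanoTime() - start) / 1_000_000 + "ms");
        }
    }
}
```

A sleep-based stand-in scales almost linearly with the thread count because the threads spend their time waiting rather than computing. A CPU-bound per-row operation would stop improving once the thread count passes the number of available cores.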

5. Test code

import cn.hutool.core.date.StopWatch;
import cn.hutool.core.util.RandomUtil;
import com.alibaba.excel.EasyExcel;
import com.alibaba.excel.ExcelReader;
import com.alibaba.excel.ExcelWriter;
import com.alibaba.excel.context.AnalysisContext;
import com.alibaba.excel.event.AnalysisEventListener;
import com.alibaba.excel.read.metadata.ReadSheet;
import com.alibaba.excel.write.metadata.WriteSheet;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.junit.Test;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.*;

public class ExcelUtilTest {
    private static final String[] USERNAME = {"趙", "錢", "孫", "李", "周", "吳", "鄭", "王", "馮", "陳", "褚", "衛", "蔣", "沈", "韓", "楊", "朱", "秦", "尤", "許",
            "何", "呂", "施", "張", "孔", "曹", "嚴", "華", "金", "魏", "陶", "姜", "戚", "謝", "鄒", "喻", "柏", "水", "竇", "章", "雲", "蘇", "潘", "葛", "奚", "范", "彭", "郎",
            "魯", "韋", "昌", "馬", "苗", "鳳", "花", "方", "俞", "任", "袁", "柳", "酆", "鮑", "史", "唐", "費", "廉", "岑", "薛", "雷", "賀", "倪", "湯", "滕", "殷",
            "羅", "畢", "郝", "鄔", "安", "常", "樂", "於", "時", "傅", "皮", "卞", "齊", "康", "伍", "余", "元", "卜", "顧", "孟", "平", "黃", "和",
            "穆", "蕭", "尹", "姚", "邵", "湛", "汪", "祁", "毛", "禹", "狄", "米", "貝", "明", "臧", "計", "伏", "成", "戴", "談", "宋", "茅", "龐", "熊", "紀", "舒",
            "屈", "項", "祝", "董", "梁", "杜", "阮", "藍", "閔", "席", "季"};
    private static final String GIRL = "秀娟英華慧巧美娜靜淑惠珠翠雅芝玉萍紅娥玲芬芳燕彩春菊蘭鳳潔梅琳素雲蓮真環雪榮愛妹霞香月鶯媛艷瑞凡佳嘉瓊勤珍貞莉桂娣葉璧璐婭琦晶妍茜秋珊莎錦黛青倩婷姣婉嫻瑾穎露瑤怡嬋雁蓓紈儀荷丹蓉眉君琴蕊薇菁夢嵐苑婕馨瑗琰韻融園藝詠卿聰瀾純毓悅昭冰爽琬茗羽希寧欣飄育瀅馥筠柔竹靄凝曉歡霄楓芸菲寒伊亞宜可姬舒影荔枝思麗 ";
    private static final String BOY = "偉剛勇毅俊峰強軍平保東文輝力明永健世廣志義興良海山仁波寧貴福生龍元全國勝學祥才發武新利清飛彬富順信子傑濤昌成康星光天達安岩中茂進林有堅和彪博誠先敬震振壯會思群豪心邦承樂紹功松善厚慶磊民友裕河哲江超浩亮政謙亨奇固之輪翰朗伯宏言若鳴朋斌梁棟維啟克倫翔旭鵬澤晨辰士以建家致樹炎德行時泰盛雄琛鈞冠策騰楠榕風航弘";

    /** Write test: times XSSF, SXSSF and EasyExcel on the same number of rows. */
    @Test
    public void test() throws IOException {
        int number = 100000;

        StopWatch sw = new StopWatch();

        sw.start();
        poiTest(number, "XSSF");
        sw.stop();
        System.out.println("POI (XSSF) wrote " + number + " rows in " + sw.getLastTaskTimeMillis() + "ms");

        sw.start();
        poiTest(number, "SXSSF");
        sw.stop();
        System.out.println("POI (SXSSF) wrote " + number + " rows in " + sw.getLastTaskTimeMillis() + "ms");

        sw.start();
        easyExcelTest(number);
        sw.stop();
        System.out.println("EasyExcel wrote " + number + " rows in " + sw.getLastTaskTimeMillis() + "ms");

    }

    private void poiTest(int number, String type) throws IOException {
        String path = "D:\\tmp\\test.xlsx";
        try (Workbook wb = "SXSSF".equals(type) ? new SXSSFWorkbook() : new XSSFWorkbook()) {
            Sheet sheet = wb.createSheet();
            // Row 0 is the header, so the sheet holds `number` rows in total.
            Row row = sheet.createRow(0);
            row.createCell(0).setCellValue("ID");
            row.createCell(1).setCellValue("姓名");
            row.createCell(2).setCellValue("年齡");
            row.createCell(3).setCellValue("性別");
            row.createCell(4).setCellValue("是否會員");
            for (int i = 1; i < number; i++) {
                row = sheet.createRow(i);
                List<String> randomData = this.getRandomData();
                row.createCell(0).setCellValue(i); // stored as a numeric cell
                for (int col = 0; col < randomData.size(); col++) {
                    row.createCell(col + 1).setCellValue(randomData.get(col));
                }
            }
            // Close the stream explicitly; the original left it open.
            try (FileOutputStream out = new FileOutputStream(path)) {
                wb.write(out);
            }
            // SXSSF spools flushed rows to temporary files; dispose() deletes them.
            if (wb instanceof SXSSFWorkbook) {
                ((SXSSFWorkbook) wb).dispose();
            }
        }
    }

    private void easyExcelTest(int number) throws IOException {
        // Header plus (number - 1) data rows, matching poiTest.
        List<List<String>> dataList = new ArrayList<>(number);
        dataList.add(Arrays.asList("ID", "姓名", "年齡", "性別", "是否會員"));
        for (int i = 1; i < number; i++) {
            List<String> data = new ArrayList<>();
            data.add(String.valueOf(i));
            data.addAll(this.getRandomData());
            dataList.add(data);
        }
        String path = "D:\\tmp\\test.xlsx";
        File file = new File(path);
        ExcelWriter excelWriter = EasyExcel.write(file).build();
        try {
            WriteSheet writeSheet = EasyExcel.writerSheet(0).build();
            excelWriter.write(dataList, writeSheet);
        } finally {
            // finish() flushes the data and releases the file, even if write() fails.
            excelWriter.finish();
        }
    }

    private List<String> getRandomData() {
        // randomInt(min, max) excludes max, so pass the full length to make every surname reachable.
        int usernameRandom = RandomUtil.randomInt(0, USERNAME.length);
        String name = USERNAME[usernameRandom];
        if (usernameRandom % 2 == 0) {
            // Two-character given name; the modulus keeps start + 2 within the string.
            int start = usernameRandom % (GIRL.length() - 1);
            name += GIRL.substring(start, start + 2);
        } else {
            // One-character given name.
            int start = usernameRandom % BOY.length();
            name += BOY.substring(start, start + 1);
        }
        String age = String.valueOf(RandomUtil.randomInt(10, 50));
        // Gender and membership are both derived from the parity of the surname index.
        String sex = usernameRandom % 2 == 0 ? "女" : "男";
        String isVip = usernameRandom % 2 == 0 ? "是" : "否";

        return Arrays.asList(name, age, sex, isVip);
    }


    /** Read test: times POI and EasyExcel reading the file produced by the write test. */
    @Test
    public void readTest() throws IOException {
        StopWatch sw = new StopWatch();
        sw.start();
        poiReadTest();
        sw.stop();
        System.out.println("POI read the data in " + sw.getLastTaskTimeMillis() + "ms");

        sw.start();
        easyExcelReadTest();
        sw.stop();
        System.out.println("EasyExcel read the data in " + sw.getLastTaskTimeMillis() + "ms");
    }

    private void poiReadTest() throws IOException {
        String path = "D:\\tmp\\test.xlsx";
        File file = new File(path);
        List<List<String>> result = new ArrayList<>();
        // DataFormatter renders any cell type as text. getStringCellValue() throws on
        // numeric cells such as the ID column written by poiTest.
        DataFormatter formatter = new DataFormatter();
        try (Workbook wb = WorkbookFactory.create(file)) {
            Sheet sheet = wb.getSheetAt(0);
            for (Row row : sheet) {
                List<String> data = new ArrayList<>();
                for (Cell cell : row) {
                    data.add(formatter.formatCellValue(cell));
                }
                result.add(data);
            }
        }
        System.out.println("Read " + result.size() + " rows");
    }

    private void easyExcelReadTest() throws IOException {
        String path = "D:\\tmp\\test.xlsx";
        File file = new File(path);
        ExcelReader excelReader = EasyExcel.read(file, new ExcelListener()).build();
        try {
            ReadSheet readSheet = new ReadSheet(0);
            excelReader.read(readSheet);
        } finally {
            // finish() closes the file and cleans up temporary resources, even if read() fails.
            excelReader.finish();
        }
    }

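    /** Collects every row; without a model class, each row arrives as a column-index-to-text map. */
    private class ExcelListener extends AnalysisEventListener<Map<Integer, String>> {

        private final List<Map<Integer, String>> result = new ArrayList<>();

        // Called once per data row. With default settings the first row is treated as the
        // header and is not passed here.
        @Override
        public void invoke(Map<Integer, String> rowData, AnalysisContext analysisContext) {
            result.add(rowData);
        }

        // Called once after the whole sheet has been read.
        @Override
        public void doAfterAllAnalysed(AnalysisContext analysisContext) {
            System.out.println("Read " + result.size() + " rows");
        }
    }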
}

