Reading very large Excel files in Node and extracting their data

Reposted from the original article · 2019-12-16 10:38 · 459 · nodejs

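I used to handle Excel files with node-xlsx, mostly to read data out of them or to generate Excel files from data. However, node-xlsx does not seem to cope with very large Excel files (over 100 MB). For example: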

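var xlsx = require('node-xlsx'); // newer releases export it as require('node-xlsx').default
var sheets = xlsx.parse('./test.xlsx'); // parse all sheets into an array of { name, data } objects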

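One sheet in the file was quite large, and for that sheet I got back an empty array, presumably because it could not be loaded into memory in one go. The only way around this seems to be streaming: pulling the data out of the workbook a piece at a time instead of loading all of it at once.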
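To illustrate the piece-by-piece idea with Node's built-in modules only, here is a minimal sketch that assumes the sheet has already been exported to a plain-text format such as CSV. fs.createReadStream hands the file over in chunks and readline splits those chunks into lines, so only a small window of the file is in memory at any time. The helper name streamLines is hypothetical.

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: reads a text export (e.g. CSV) line by line.
// fs.createReadStream delivers the file in chunks, so memory use stays
// bounded instead of growing with the file size.
async function streamLines(filePath, onLine) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath, { encoding: 'utf8' }),
    crlfDelay: Infinity, // treat "\r\n" as a single line break
  });
  let count = 0;
  for await (const line of rl) {
    onLine(line, count); // handle one row, then let it go
    count += 1;
  }
  return count; // total number of lines read
}
```

For example, streamLines('./big.csv', (line) => { /* parse one row */ }) resolves with the number of lines read; big.csv is a placeholder file name.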

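node-xlsx, however, cannot read Excel files as a stream. Because of the way the format is encoded (an .xlsx file is a ZIP archive of XML documents), the data can only be processed as a stream once it has been turned into XML or CSV.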
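Inside that ZIP archive each worksheet is an XML part (for example xl/worksheets/sheet1.xml) in which rows are <row> elements, cells are <c> elements and the stored values sit in <v>. Once that XML is available as a stream of text chunks, rows can be pulled out one at a time. The hypothetical RowExtractor below is a deliberately simplified sketch of that idea: it assumes the worksheet XML has already been unzipped, and it ignores shared strings, inline strings and XML entities. It is not how xlsx-extract works internally; that library uses a real SAX parser (its parser option selects 'sax' or 'expat').

```javascript
// Hypothetical, simplified row extractor for worksheet XML arriving in chunks.
// Only the unfinished tail of the text is kept in memory. NOT handled:
// shared strings (t="s" cells hold an index into xl/sharedStrings.xml),
// inline strings and XML entities. Self-closing empty rows are skipped.
class RowExtractor {
  constructor(onRow) {
    this.onRow = onRow; // called with an array of raw <v> strings per row
    this.buffer = '';
  }

  write(chunk) {
    this.buffer += chunk;
    let end;
    // Emit every complete <row>...</row> currently in the buffer.
    while ((end = this.buffer.indexOf('</row>')) !== -1) {
      const start = this.buffer.lastIndexOf('<row', end);
      if (start === -1) throw new Error('malformed sheet XML');
      const rowXml = this.buffer.slice(start, end);
      const values = [];
      const cellValue = /<v>([^<]*)<\/v>/g;
      let m;
      while ((m = cellValue.exec(rowXml)) !== null) {
        values.push(m[1]);
      }
      this.onRow(values);
      // Drop everything up to and including the closing tag.
      this.buffer = this.buffer.slice(end + '</row>'.length);
    }
  }
}

// Demo: feed a tiny sheet in 10-character chunks to mimic a stream.
const rows = [];
const extractor = new RowExtractor((values) => rows.push(values));
const xml = '<sheetData><row r="1"><c r="A1"><v>1</v></c><c r="B1"><v>2.5</v></c></row>' +
  '<row r="2"/><row r="3"><c r="A3"><v>7</v></c></row></sheetData>';
for (let i = 0; i < xml.length; i += 10) extractor.write(xml.slice(i, i + 10));
console.log(rows); // prints: [ [ '1', '2.5' ], [ '7' ] ]
```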

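Thanks to a pointer from a fellow developer, I found the xlsx-extract library, which solves the problem neatly by reading Excel files as a stream: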

var powXLSX = require('xlsx-extract').XLSX;
new powXLSX().extract('./test.xlsx', { sheet_all: true }) // read every sheet; by default only the first sheet is read (options listed below)
  .on('sheet', function (sheet) {
    console.log('sheet', sheet); // sheet is an array [sheetname, sheetid, sheetnr]
  })
  .on('row', function (row) {
    console.log('row', row); // row is an array of values, or []
  })
  .on('cell', function (cell) {
    // console.log('cell', cell); // cell is a value or null
  })
  .on('error', function (err) {
    console.error('error', err);
  })
  .on('end', function () {
    console.log('eof');
  });
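Printing every row is fine for a test, but real processing (for example inserting into a database) usually works better on fixed-size batches, and it helps to know when the whole sheet has been read. The sketch below is a hypothetical helper, collectInBatches, that relies only on the 'row', 'error' and 'end' events shown in the example above; to keep it runnable without the library, a plain EventEmitter stands in for the object returned by extract().

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: groups 'row' events into arrays of batchSize rows,
// passes each full batch to onBatch, flushes the remainder on 'end' and
// resolves with the total number of rows. Rejects on 'error'.
function collectInBatches(emitter, batchSize, onBatch) {
  return new Promise((resolve, reject) => {
    let batch = [];
    let total = 0;
    emitter
      .on('row', (row) => {
        batch.push(row);
        total += 1;
        if (batch.length === batchSize) {
          onBatch(batch);
          batch = []; // start a new batch; the old one can be garbage-collected
        }
      })
      .on('error', reject)
      .on('end', () => {
        if (batch.length > 0) onBatch(batch);
        resolve(total);
      });
  });
}

// Demo with a plain EventEmitter standing in for the extractor.
const fake = new EventEmitter();
const sizes = [];
const done = collectInBatches(fake, 2, (rows) => sizes.push(rows.length));
['a', 'b', 'c', 'd', 'e'].forEach((v) => fake.emit('row', [v]));
fake.emit('end');
done.then((total) => console.log(total, sizes)); // prints: 5 [ 2, 2, 1 ]
```

With the real library the call would presumably look like collectInBatches(new powXLSX().extract('./test.xlsx', {}), 1000, handleRows), where handleRows is your own function. The sketch applies no backpressure: onBatch runs synchronously, and rows keep arriving at whatever pace the extractor emits them.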

// extract() options, in TypeScript-style notation ("?" marks an optional field)
options = {
	// sheet selection (provide one of the following)
	sheet_name?: string; // select by sheet name
	sheet_nr?: string; // default "1" - select by sheet number, starting at 1
	sheet_id?: string; // select by sheet id, e.g. "1"
	sheet_rid?: string; // select by internal sheet rid, e.g. "rId1"
	sheet_all?: boolean; // default false - select all sheets
	// SAX parser selection
	parser?: string; // default "sax" - 'sax'|'expat'
	// row selection
	ignore_header?: number; // default 0 - the number of header rows to skip
	include_empty_rows?: boolean; // default false - include empty rows at the start or in the middle
	// output format for sheets, rows and cells
	format?: string; // default 'array' - one of 'array'|'json'|'tsv'|'obj'
	// tsv output options
	tsv_float_comma?: boolean; // default false - use "," as the decimal point for floats
	tsv_delimiter?: string; // default '\t' - character used as the field delimiter
	tsv_endofline?: string; // default depends on your operating system (node os.EOL), e.g. '\n'
	// cell value formats
	raw_values?: boolean; // default false - do not apply cell formats (get values as strings, as stored in the xlsx)
	round_floats?: boolean; // default true - round floats as the cell format defines (otherwise values are reported as parsed floats)
	date1904?: boolean; // default false - use the 1904 date system
	ignore_timezone?: boolean; // default false - ignore the timezone when parsing dates
	convert_values?: { // whether to apply cell number formats (otherwise values are reported as strings)
		ints?: boolean; // round to int if the number format is an integer format
		floats?: boolean; // round floats according to the float number format
		dates?: boolean; // convert xlsx dates to JS dates
		bools?: boolean; // convert xlsx booleans to JS booleans
	};
	// xlsx structure options
	workfolder?: string; // default 'xl' - the workbook subfolder in the zip structure
}
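The sheet-selection options are meant to be used one at a time. As an illustration, the hypothetical guard below (not part of xlsx-extract) rejects an options object that sets more than one selector before a long extraction starts; the example object uses only keys from the option list above.

```javascript
// Hypothetical guard, not part of xlsx-extract: the option list says to
// provide only one sheet selector, so reject objects that set several.
const SHEET_SELECTORS = ['sheet_name', 'sheet_nr', 'sheet_id', 'sheet_rid', 'sheet_all'];

function checkSheetSelection(options) {
  const given = SHEET_SELECTORS.filter(
    (key) => options[key] !== undefined && options[key] !== false
  );
  if (given.length > 1) {
    throw new Error(`Provide only one sheet selector, got: ${given.join(', ')}`);
  }
  // With no selector the first sheet is read (sheet_nr defaults to "1").
  return given.length === 1 ? given[0] : 'sheet_nr';
}

// Example: read the sheet named "Data", skip one header row, keep empty rows.
const options = { sheet_name: 'Data', ignore_header: 1, include_empty_rows: true };
console.log(checkSheetSelection(options)); // prints: sheet_name
```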

For how it works under the hood, see the xlsx-extract repository on GitHub.
