解決POI讀取Excel內存溢出的問題


https://blog.csdn.net/lishengbo/article/details/40711769
原創lishengbo 最后發布於2014-11-02 17:23:11 閱讀數 66825 收藏
展開

          Office軟件一直是一個誨譽參半的軟件,廣大普通計算機用戶用Office來滿足日常辦公需求,於是就產生了很多生產數據和文檔,需要和企業單位的專用辦公系統對接,而Office的解析工作一直是程序員非常頭痛的問題,經常招致程序員的謾罵,也被譽為是微軟最爛的發明之一。POI的誕生解決了Excel的解析難題(POI即“討厭的電子表格”,確實很討厭,我也很討厭Excel),但如果用不好POI,也會導致程序出現一些BUG,例如內存溢出,假空行,公式等等問題。下面介紹一種解決POI讀取Excel內存溢出的問題。

        POI讀取Excel有兩種模式,一種是用戶模式,一種是SAX模式,將xlsx格式的文檔轉換成CVS格式后再進行處理用戶模式相信大家都很清楚,也是POI常用的方式,用戶模式API接口豐富,我們可以很容易的使用POI的API讀取Excel,但用戶模式消耗的內存很大,當遇到很多sheet、大數據網格、假空行、公式等問題時,很容易導致內存溢出。POI官方推薦解決內存溢出的方式使用CVS格式解析,我們不可能手工將Excel文件轉換成CVS格式再上傳,這樣做太麻煩了,好再POI給出了xlsx轉換CVS的例子,基於這個例子我進行了一下改造,即可解決用戶模式讀取Excel內存溢出的問題。下面附上代碼:

    

/* ====================================================================
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==================================================================== */

import java.io.IOException;
import java.io.InputStream;
import java.io.PrintStream;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.apache.poi.hssf.usermodel.HSSFDateUtil;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.ss.usermodel.BuiltinFormats;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFCellStyle;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/**
* 使用CVS模式解決XLSX文件,可以有效解決用戶模式內存溢出的問題
* 該模式是POI官方推薦的讀取大數據的模式,在用戶模式下,數據量較大、Sheet較多、或者是有很多無用的空行的情況
* ,容易出現內存溢出,用戶模式讀取Excel的典型代碼如下: FileInputStream file=new
* FileInputStream("c:\\test.xlsx"); Workbook wb=new XSSFWorkbook(file);
*
*
* @author 山人
*/
public class XLSXCovertCSVReader {

/**
* The type of the data value is indicated by an attribute on the cell. The
* value is usually in a "v" element within the cell.
*/
enum xssfDataType {
BOOL, ERROR, FORMULA, INLINESTR, SSTINDEX, NUMBER,
}

/**
* 使用xssf_sax_API處理Excel,請參考: http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api
* <p/>
* Also see Standard ECMA-376, 1st edition, part 4, pages 1928ff, at
* http://www.ecma-international.org/publications/standards/Ecma-376.htm
* <p/>
* A web-friendly version is http://openiso.org/Ecma/376/Part4
*/
class MyXSSFSheetHandler extends DefaultHandler {

/**
* Table with styles
*/
private StylesTable stylesTable;

/**
* Table with unique strings
*/
private ReadOnlySharedStringsTable sharedStringsTable;

/**
* Destination for data
*/
private final PrintStream output;

/**
* Number of columns to read starting with leftmost
*/
private final int minColumnCount;

// Set when V start element is seen
private boolean vIsOpen;

// Set when cell start element is seen;
// used when cell close element is seen.
private xssfDataType nextDataType;

// Used to format numeric cell values.
private short formatIndex;
private String formatString;
private final DataFormatter formatter;

private int thisColumn = -1;
// The last column printed to the output stream
private int lastColumnNumber = -1;

// Gathers characters as they are seen.
private StringBuffer value;
private String[] record;
private List<String[]> rows = new ArrayList<String[]>();
private boolean isCellNull = false;

/**
* Accepts objects needed while parsing.
*
* @param styles
* Table of styles
* @param strings
* Table of shared strings
* @param cols
* Minimum number of columns to show
* @param target
* Sink for output
*/
public MyXSSFSheetHandler(StylesTable styles,
ReadOnlySharedStringsTable strings, int cols, PrintStream target) {
this.stylesTable = styles;
this.sharedStringsTable = strings;
this.minColumnCount = cols;
this.output = target;
this.value = new StringBuffer();
this.nextDataType = xssfDataType.NUMBER;
this.formatter = new DataFormatter();
record = new String[this.minColumnCount];
rows.clear();// 每次讀取都清空行集合
}

/*
* (non-Javadoc)
*
* @see
* org.xml.sax.helpers.DefaultHandler#startElement(java.lang.String,
* java.lang.String, java.lang.String, org.xml.sax.Attributes)
*/
public void startElement(String uri, String localName, String name,
Attributes attributes) throws SAXException {

if ("inlineStr".equals(name) || "v".equals(name)) {
vIsOpen = true;
// Clear contents cache
value.setLength(0);
}
// c => cell
else if ("c".equals(name)) {
// Get the cell reference
String r = attributes.getValue("r");
int firstDigit = -1;
for (int c = 0; c < r.length(); ++c) {
if (Character.isDigit(r.charAt(c))) {
firstDigit = c;
break;
}
}
thisColumn = nameToColumn(r.substring(0, firstDigit));

// Set up defaults.
this.nextDataType = xssfDataType.NUMBER;
this.formatIndex = -1;
this.formatString = null;
String cellType = attributes.getValue("t");
String cellStyleStr = attributes.getValue("s");
if ("b".equals(cellType))
nextDataType = xssfDataType.BOOL;
else if ("e".equals(cellType))
nextDataType = xssfDataType.ERROR;
else if ("inlineStr".equals(cellType))
nextDataType = xssfDataType.INLINESTR;
else if ("s".equals(cellType))
nextDataType = xssfDataType.SSTINDEX;
else if ("str".equals(cellType))
nextDataType = xssfDataType.FORMULA;
else if (cellStyleStr != null) {
// It's a number, but almost certainly one
// with a special style or format
int styleIndex = Integer.parseInt(cellStyleStr);
XSSFCellStyle style = stylesTable.getStyleAt(styleIndex);
this.formatIndex = style.getDataFormat();
this.formatString = style.getDataFormatString();
if (this.formatString == null)
this.formatString = BuiltinFormats
.getBuiltinFormat(this.formatIndex);
}
}

}

/*
* (non-Javadoc)
*
* @see org.xml.sax.helpers.DefaultHandler#endElement(java.lang.String,
* java.lang.String, java.lang.String)
*/
public void endElement(String uri, String localName, String name)
throws SAXException {

String thisStr = null;

// v => contents of a cell
if ("v".equals(name)) {
// Process the value contents as required.
// Do now, as characters() may be called more than once
switch (nextDataType) {

case BOOL:
char first = value.charAt(0);
thisStr = first == '0' ? "FALSE" : "TRUE";
break;

case ERROR:
thisStr = "\"ERROR:" + value.toString() + '"';
break;

case FORMULA:
// A formula could result in a string value,
// so always add double-quote characters.
thisStr = '"' + value.toString() + '"';
break;

case INLINESTR:
// TODO: have seen an example of this, so it's untested.
XSSFRichTextString rtsi = new XSSFRichTextString(
value.toString());
thisStr = '"' + rtsi.toString() + '"';
break;

case SSTINDEX:
String sstIndex = value.toString();
try {
int idx = Integer.parseInt(sstIndex);
XSSFRichTextString rtss = new XSSFRichTextString(
sharedStringsTable.getEntryAt(idx));
thisStr = '"' + rtss.toString() + '"';
} catch (NumberFormatException ex) {
output.println("Failed to parse SST index '" + sstIndex
+ "': " + ex.toString());
}
break;

case NUMBER:
String n = value.toString();
// 判斷是否是日期格式
if (HSSFDateUtil.isADateFormat(this.formatIndex, n)) {
Double d = Double.parseDouble(n);
Date date=HSSFDateUtil.getJavaDate(d);
thisStr=formateDateToString(date);
} else if (this.formatString != null)
thisStr = formatter.formatRawCellContents(
Double.parseDouble(n), this.formatIndex,
this.formatString);
else
thisStr = n;
break;

default:
thisStr = "(TODO: Unexpected type: " + nextDataType + ")";
break;
}

// Output after we've seen the string contents
// Emit commas for any fields that were missing on this row
if (lastColumnNumber == -1) {
lastColumnNumber = 0;
}
//判斷單元格的值是否為空
if (thisStr == null || "".equals(isCellNull)) {
isCellNull = true;// 設置單元格是否為空值
}
record[thisColumn] = thisStr;
// Update column
if (thisColumn > -1)
lastColumnNumber = thisColumn;

} else if ("row".equals(name)) {

// Print out any missing commas if needed
if (minColumns > 0) {
// Columns are 0 based
if (lastColumnNumber == -1) {
lastColumnNumber = 0;
}
if (isCellNull == false && record[0] != null
&& record[1] != null)// 判斷是否空行
{
rows.add(record.clone());
isCellNull = false;
for (int i = 0; i < record.length; i++) {
record[i] = null;
}
}
}
lastColumnNumber = -1;
}

}

public List<String[]> getRows() {
return rows;
}

public void setRows(List<String[]> rows) {
this.rows = rows;
}

/**
* Captures characters only if a suitable element is open. Originally
* was just "v"; extended for inlineStr also.
*/
public void characters(char[] ch, int start, int length)
throws SAXException {
if (vIsOpen)
value.append(ch, start, length);
}

/**
* Converts an Excel column name like "C" to a zero-based index.
*
* @param name
* @return Index corresponding to the specified name
*/
private int nameToColumn(String name) {
int column = -1;
for (int i = 0; i < name.length(); ++i) {
int c = name.charAt(i);
column = (column + 1) * 26 + c - 'A';
}
return column;
}

private String formateDateToString(Date date) {
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");//格式化日期
return sdf.format(date);

}

}

// /////////////////////////////////////

private OPCPackage xlsxPackage;
private int minColumns;
private PrintStream output;
private String sheetName;

/**
* Creates a new XLSX -> CSV converter
*
* @param pkg
* The XLSX package to process
* @param output
* The PrintStream to output the CSV to
* @param minColumns
* The minimum number of columns to output, or -1 for no minimum
*/
public XLSXCovertCSVReader(OPCPackage pkg, PrintStream output,
String sheetName, int minColumns) {
this.xlsxPackage = pkg;
this.output = output;
this.minColumns = minColumns;
this.sheetName = sheetName;
}

/**
* Parses and shows the content of one sheet using the specified styles and
* shared-strings tables.
*
* @param styles
* @param strings
* @param sheetInputStream
*/
public List<String[]> processSheet(StylesTable styles,
ReadOnlySharedStringsTable strings, InputStream sheetInputStream)
throws IOException, ParserConfigurationException, SAXException {

InputSource sheetSource = new InputSource(sheetInputStream);
SAXParserFactory saxFactory = SAXParserFactory.newInstance();
SAXParser saxParser = saxFactory.newSAXParser();
XMLReader sheetParser = saxParser.getXMLReader();
MyXSSFSheetHandler handler = new MyXSSFSheetHandler(styles, strings,
this.minColumns, this.output);
sheetParser.setContentHandler(handler);
sheetParser.parse(sheetSource);
return handler.getRows();
}

/**
* 初始化這個處理程序 將
*
* @throws IOException
* @throws OpenXML4JException
* @throws ParserConfigurationException
* @throws SAXException
*/
public List<String[]> process() throws IOException, OpenXML4JException,
ParserConfigurationException, SAXException {

ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(
this.xlsxPackage);
XSSFReader xssfReader = new XSSFReader(this.xlsxPackage);
List<String[]> list = null;
StylesTable styles = xssfReader.getStylesTable();
XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader
.getSheetsData();
int index = 0;
while (iter.hasNext()) {
InputStream stream = iter.next();
String sheetNameTemp = iter.getSheetName();
if (this.sheetName.equals(sheetNameTemp)) {
list = processSheet(styles, strings, stream);
stream.close();
++index;
}
}
return list;
}

/**
* 讀取Excel
*
* @param path
* 文件路徑
* @param sheetName
* sheet名稱
* @param minColumns
* 列總數
* @return
* @throws SAXException
* @throws ParserConfigurationException
* @throws OpenXML4JException
* @throws IOException
*/
private static List<String[]> readerExcel(String path, String sheetName,
int minColumns) throws IOException, OpenXML4JException,
ParserConfigurationException, SAXException {
OPCPackage p = OPCPackage.open(path, PackageAccess.READ);
XLSXCovertCSVReader xlsx2csv = new XLSXCovertCSVReader(p, System.out,
sheetName, minColumns);
List<String[]> list = xlsx2csv.process();
p.close();
return list;
}

public static void main(String[] args) throws Exception {
List<String[]> list = XLSXCovertCSVReader
.readerExcel(
"F:\\test.xlsx",
"Sheet1", 17);
for (String[] record : list) {
for (String cell : record) {
System.out.print(cell + " ");
}
System.out.println();
}
}

}

————————————————
版權聲明:本文為CSDN博主「lishengbo」的原創文章,遵循 CC 4.0 BY-SA 版權協議,轉載請附上原文出處鏈接及本聲明。
原文鏈接:https://blog.csdn.net/lishengbo/article/details/40711769


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM