几乎每个后台管理系统都难逃excel导入和导出的功能,目前针对大数据量的导入和导出多数是使用EasyExcel,EasyExcel极大地简单了java对excel的读写操作,从pom的依赖上知道easyexcel依赖POI,那么EasyExcel是如何实现高性能的excel读写操作呢,实现上和POI有什么区别和联系呢?
要讲EasyExcel,必须先讲一下excel版本和POI。
关于Excel版本
版本 | 最大数据量 | 后缀 | 文档类型 | 解压后的文件结构 |
---|---|---|---|---|
2003 | 65535 | .xls | 复合文档 | |
2007 | 1048575 | .xlsx | XML文档 | |
Excel2007及以上版本,其文件由多个xml文件打包组成,其存数据的文件位于xl/worksheets/下,一个sheet以一个xml文件的形成存在,而正是由于这个文件结构,java可凭借其丰富强大的xml解析类库,实现对xml文件的流式追加写入和流式读取,避免全量读入数据而内存溢出,使读取大批量数据成为可能。
POI
可用于操作Microsoft Office的相关文件(可用于excel,word,ppt等)的一套程序库,apache出品。POI用于操作Excel的对象都是实现了Workbook接口的类.
Workboot是创建Excel对象或Sheet等的顶级对象,主要的子类有:E
实现类 | Excel版本 | 特点 |
---|---|---|
HSSFWorkbook | 2003及以前 | 对象常驻内存,易OOM |
XSSFWorkbook | 2007及以后 | 对象常驻内存,易OOM |
SXSSFWorkbook | 2007及以后 | 流式的XSSFWorkbook, 有效避免OOM,仅用于写 |
这三套实现,还对应着各自的家族成员
工作簿 | HSSFWorkbook | XSSFWorkbook | SXSSFWorkbook |
Sheet | HSSFSheet | XSSFSheet | SXSSFSheet |
行 | HSSFRow | XSSFRow | SXSSFRow |
单元格 | HSSFCell | XSSFCell | SXSSFCell |
其中HSSFWorkbook和XSSFWorkbook对Excel的处理和解析是非常吃内存的,每个单元格都是一个 Cell对象,每个Cell对象又有CellStyle对象,想想1个10列*10W行,单单它的Cell对象就已经有100W个,如果需要全部加载进内存,无疑是程序的灾难,于是SXSSFWorkbook随之出现。
SXSSFWorkbook
SXSSFWorkbook是streaming版本的XSSFWorkbook,它只会保存最新的excel rows(默认100行)在内存里供查看,在此之前的excel rows都会被写入到硬盘里。被写入到硬盘里的rows对程序而言是不可见的,详见:
SXSSFSheet#createRow
public SXSSFRow createRow(int rownum) {
int maxrow = SpreadsheetVersion.EXCEL2007.getLastRowIndex();
if (rownum < 0 || rownum > maxrow) {
throw new IllegalArgumentException("Invalid row number (" + rownum
+ ") outside allowable range (0.." + maxrow + ")");
}
// attempt to overwrite a row that is already flushed to disk
if(rownum <= _writer.getLastFlushedRow() ) {
throw new IllegalArgumentException(
"Attempting to write a row["+rownum+"] " +
"in the range [0," + _writer.getLastFlushedRow() + "] that is already written to disk.");
}
// attempt to overwrite a existing row in the input template
if(_sh.getPhysicalNumberOfRows() > 0 && rownum <= _sh.getLastRowNum() ) {
throw new IllegalArgumentException(
"Attempting to write a row["+rownum+"] " +
"in the range [0," + _sh.getLastRowNum() + "] that is already written to disk.");
}
SXSSFRow newRow = new SXSSFRow(this);
_rows.put(rownum, newRow);
allFlushed = false;
if(_randomAccessWindowSize >= 0 && _rows.size() > _randomAccessWindowSize) {
try {
//如果写入的行数>_randomAccessWindowSize(默认100),则将老的记录写入磁盘
flushRows(_randomAccessWindowSize);
} catch (IOException ioe) {
throw new RuntimeException(ioe);
}
}
return newRow;
}
在创建新行的过程中,SXSSFSheet通过flushRows()将多于随机读取窗口大小的Row写入磁盘的临时文件中,同时从内存中移除,从而实现内存减负,避免OOM的出现。
原生POI,通过SXSSF系列API,可以实现100W数据无压力输出(在-Xmx20M下即可运行)
POI版的导出程序(DEMO)
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.lang.reflect.Field;
import java.util.Collection;
import org.apache.commons.beanutils.BeanUtils;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import com.alibaba.excel.annotation.ExcelProperty;
public class PoiWriter<T> {
private SXSSFWorkbook workbook = null;
private SXSSFSheet sheet = null;
private Class<T> head = null;
private boolean headerInited = false;
private int rowNum = 0;
public PoiWriter(Class<T> head) {
this.workbook = new SXSSFWorkbook();
this.sheet = workbook.createSheet("sheet0");
this.head = head;
}
public void write(Collection<T> dataset) throws Exception {
initHeader();
for (T object : dataset) {
writeRow(object);
}
}
private void writeRow(T object) throws Exception {
Field[] fields = head.getDeclaredFields();
Row row = sheet.createRow(rowNum ++);
int index = 0;
for (Field field : fields) {
String value = BeanUtils.getProperty(object, field.getName());
row.createCell(index ++).setCellValue(value);
}
}
public void flush(String fileName) throws IOException {
FileOutputStream fos = new FileOutputStream(new File(fileName));
workbook.write(fos);
fos.close();
}
private void initHeader() {
if(!headerInited) {
createHeader();
headerInited = true;
}
}
private void createHeader() {
Field[] fields = head.getDeclaredFields();
int index = 0;
Row row = sheet.createRow(rowNum ++);
for (Field field : fields) {
ExcelProperty excelProperty = field.getAnnotation(ExcelProperty.class);
String columnName = excelProperty != null ? excelProperty.value()[0] : field.getName();
row.createCell(index ++).setCellValue(columnName);
}
}
}
public class PoiWriterTest {
@Test
public void write() throws Exception {
PoiWriter<Student> poiWriter = new PoiWriter<>(Student.class);
int pageSize = 1000;
for (int pageNo = 1; pageNo <= 100; pageNo++) {
poiWriter.write(pageQuery(pageNo, pageSize));
}
poiWriter.flush("D:/stu.xlsx");
}
//模拟分页查询
private List<Student> pageQuery(int pageNo, int pageSize) {
List<Student> students = new ArrayList<>();
for (int i = 0; i < pageSize; i++) {
int index = (pageNo - 1) * pageSize + i;
students.add(build(index));
}
return students;
}
private Student build(int index) {
Student s = new Student();
s.setName("学生" + index);
s.setAddress("地址" + index);
s.setAge(String.valueOf(RandomUtils.nextInt(20, 30)));
s.setClazz("班别" + index % 8);
s.setGrade("年级" + index % 6);
s.setTel(String.valueOf(13500000000L + RandomUtils.nextInt(1, 10000000)));
s.setChinese(String.valueOf(RandomUtils.nextInt(50, 100)));
s.setEnglish(String.valueOf(RandomUtils.nextInt(50, 100)));
return s;
}
}
@Data
public class Student {
@ExcelProperty(value="英语")
private String english;
@ExcelProperty(value="姓名")
private String name;
@ExcelProperty(value="年龄")
private String age;
@ExcelProperty(value="学校")
private String school;
@ExcelProperty(value="年级")
private String grade;
@ExcelProperty(value="班别")
private String clazz;
@ExcelProperty(value="地址")
private String address;
@ExcelProperty(value="电话")
private String tel;
@ExcelProperty(value="语文")
private String chinese;
@ExcelProperty(value="数学")
private String math;
}
注意,这时的写到磁盘,只是当前用户的一个临时目录,是一个xml文件,里面记录着行的内容,格式化后形如:
<row r="1">
<c r="A1" s="1" t="inlineStr">
<is>
<t>内容1</t>
</is>
</c>
<c r="B1" s="1" t="inlineStr">
<is>
<t>内容2</t>
</is>
</c>
</row>
<row>表示一行,<c>表示一个单元格,<t>记录单元格的内容,其它各自脑补。
SXSSFWorkbook只实现了流式写,并没有实现流式读,流式读取excel需要通过Sax模式的事件API进行实现,该核心实现如下:
POI版Excel解析(DEMO)
public class PoiSaxReader {
public void read(String path, AbstractSheetHandler<?> handler) throws Exception {
// 1.根据excel报表获取OPCPackage
OPCPackage opcPackage = OPCPackage.open(path, PackageAccess.READ);
// 2.创建XSSFReader
XSSFReader reader = new XSSFReader(opcPackage);
// 3.获取SharedStringTable对象
SharedStringsTable table = reader.getSharedStringsTable();
// 4.获取styleTable对象
StylesTable stylesTable = reader.getStylesTable();
// 5.创建Sax的xmlReader对象
XMLReader xmlReader = XMLReaderFactory.createXMLReader();
// 6.注册事件处理器
XSSFSheetXMLHandler xmlHandler = new XSSFSheetXMLHandler(stylesTable, table, handler, false);
xmlReader.setContentHandler(xmlHandler);
// 7.第一个sheet的流数据
XSSFReader.SheetIterator sheetIterator = (XSSFReader.SheetIterator) reader.getSheetsData();
InputStream stream = sheetIterator.next();
InputSource is = new InputSource(stream);
// 8.最关键:执行xml解析,解析过程中会回调XSSFSheetXMLHandler的startElement等方法
xmlReader.parse(is);
}
static class StudentHandler extends AbstractSheetHandler<Student> {
private int total = 0;
@Override
protected void invoke(Student object) {
total += 1;
if(total % 1000 == 0) {
System.out.println("读取到:"+total);
System.out.println(object);
}
}
}
public static void main(String[] args) throws Exception {
PoiSaxReader reader = new PoiSaxReader();
reader.read("D:/stu_100W.xlsx", new StudentHandler());
}
}
/**
* 基于Sax的解析处理器抽象
*/
public abstract class AbstractSheetHandler<T> implements XSSFSheetXMLHandler.SheetContentsHandler {
// 将输出到excel的对象
private T object;
private Class<T> clazz;
@SuppressWarnings("unchecked")
public AbstractSheetHandler() {
Type genType = getClass().getGenericSuperclass();
Type[] params = ((ParameterizedType) genType).getActualTypeArguments();
this.clazz = (Class<T>) params[0];
}
/**
* 当开始解析某一行的时候触发
*/
@Override
public void startRow(int i) {
try {
object = clazz.newInstance();
} catch (InstantiationException | IllegalAccessException e) {
e.printStackTrace();
}
}
/**
* 当结束解析某一行的时候触发
*/
@Override
public void endRow(int i) {
invoke(object);
}
protected abstract void invoke(T object);
/**
* cellReference:单元格引用,形如A1,B1
*/
@Override
public void cell(String cellReference, String value, XSSFComment xssfComment) {
Field[] fields = clazz.getDeclaredFields();
int columnIndex = new CellReference(cellReference).getCol();
try {
BeanUtils.setProperty(object, fields[columnIndex].getName(), value);
} catch (IllegalAccessException | InvocationTargetException e) {
e.printStackTrace();
}
}
}
以上程序,同样在-Xmx20M下可以成功运行。该程序只是孤立地对每一条数据进行处理,真实情况下程序可能需要拿到所有行的完整视图,如果需要将所有对象放进一个List中,那所消耗的内存当然远不止20M,但这种只能说是业务消耗的内存,而非POI消耗的内存。
上面用到的一些类可能都相对陌生,其核心逻辑就是解压xlsx文件,然后通过sax事件驱动模式对excel数据(即xml)进行逐行解释。这里重要的一个点是XSSFSheetXMLHandler
XSSFSheetXMLHandler
该类继承自DefaultHandler,XMLReader执行解析的过程中,会回调其startElement/endElement/characters等方法,后面说到的easyexcel,其解析过程是easyexcel自己搞一个XlsxRowHandler。
原生的POI虽然可以实现大数据量的导入和导出,但对于业务开发人员来说,使用起来还是没那么友好,特别是在处理大数据量导入的时候,EasyExcel基于POI,主要解决易用度的问题,同时也对POI使用Sax事件API进行数据行解释那一块进行了重写,减少了内存消耗。见EasyExcel官网的解释
Java解析、生成Excel比较有名的框架有Apache poi、jxl。但他们都存在一个严重的问题就是非常的耗内存,poi有一套SAX模式的API可以一定程度的解决一些内存溢出的问题,但POI还是有一些缺陷,比如07版Excel解压缩以及解压后存储都是在内存中完成的,内存消耗依然很大。easyexcel重写了poi对07版Excel的解析,一个3M的excel用POI sax解析依然需要100M左右内存,改用easyexcel可以降低到几M,并且再大的excel也不会出现内存溢出;03版依赖POI的sax模式,在上层做了模型转换的封装,让使用者更加简单方便
EasyExcel
读Excel
@Test
public void read() {
EasyExcel.read("D:/stu_100W.xlsx", Student.class, new PageReadListener<Student>(dataList -> {
log.info("读取到:{}", dataList.size());
})).sheet().doRead();
}
EasyExcel的API确实很简洁,一波debug跟进,来到XlsxSaxAnalyser#execute
XlsxSaxAnalyser#execute
@Override
public void execute() {
for (ReadSheet readSheet : sheetList) {
readSheet = SheetUtils.match(readSheet, xlsxReadContext);
if (readSheet != null) {
xlsxReadContext.currentSheet(readSheet);
// 注意这个 XlsxRowHandler
parseXmlSource(sheetMap.get(readSheet.getSheetNo()), new XlsxRowHandler(xlsxReadContext));
readComments(readSheet);
xlsxReadContext.analysisEventProcessor().endSheet(xlsxReadContext);
}
}
}
private void parseXmlSource(InputStream inputStream, ContentHandler handler) {
InputSource inputSource = new InputSource(inputStream);
try {
SAXParserFactory saxFactory;
String xlsxSAXParserFactoryName = xlsxReadContext.xlsxReadWorkbookHolder().getSaxParserFactoryName();
if (StringUtils.isEmpty(xlsxSAXParserFactoryName)) {
saxFactory = SAXParserFactory.newInstance();
} else {
saxFactory = SAXParserFactory.newInstance(xlsxSAXParserFactoryName, null);
}
try {
saxFactory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
} catch (Throwable ignore) {}
try {
saxFactory.setFeature("http://xml.org/sax/features/external-general-entities", false);
} catch (Throwable ignore) {}
try {
saxFactory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
} catch (Throwable ignore) {}
SAXParser saxParser = saxFactory.newSAXParser();
XMLReader xmlReader = saxParser.getXMLReader();
xmlReader.setContentHandler(handler);
//和POI一样的套路
xmlReader.parse(inputSource);
inputStream.close();
} catch (IOException | ParserConfigurationException | SAXException e) {
throw new ExcelAnalysisException(e);
} finally {
if (inputStream != null) {
try {
inputStream.close();
} catch (IOException e) {
throw new ExcelAnalysisException("Can not close 'inputStream'!");
}
}
}
}
XlsxRowHandler
public class XlsxRowHandler extends DefaultHandler {
private final XlsxReadContext xlsxReadContext;
private static final Map<String, XlsxTagHandler> XLSX_CELL_HANDLER_MAP = new HashMap<String, XlsxTagHandler>(32);
static {
CellFormulaTagHandler cellFormulaTagHandler = new CellFormulaTagHandler();
XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.CELL_FORMULA_TAG, cellFormulaTagHandler);
XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.X_CELL_FORMULA_TAG, cellFormulaTagHandler);
CellInlineStringValueTagHandler cellInlineStringValueTagHandler = new CellInlineStringValueTagHandler();
XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.CELL_INLINE_STRING_VALUE_TAG, cellInlineStringValueTagHandler);
XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.X_CELL_INLINE_STRING_VALUE_TAG, cellInlineStringValueTagHandler);
CellTagHandler cellTagHandler = new CellTagHandler();
XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.CELL_TAG, cellTagHandler);
XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.X_CELL_TAG, cellTagHandler);
CellValueTagHandler cellValueTagHandler = new CellValueTagHandler();
XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.CELL_VALUE_TAG, cellValueTagHandler);
XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.X_CELL_VALUE_TAG, cellValueTagHandler);
CountTagHandler countTagHandler = new CountTagHandler();
XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.DIMENSION_TAG, countTagHandler);
XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.X_DIMENSION_TAG, countTagHandler);
HyperlinkTagHandler hyperlinkTagHandler = new HyperlinkTagHandler();
XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.HYPERLINK_TAG, hyperlinkTagHandler);
XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.X_HYPERLINK_TAG, hyperlinkTagHandler);
MergeCellTagHandler mergeCellTagHandler = new MergeCellTagHandler();
XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.MERGE_CELL_TAG, mergeCellTagHandler);
XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.X_MERGE_CELL_TAG, mergeCellTagHandler);
RowTagHandler rowTagHandler = new RowTagHandler();
XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.ROW_TAG, rowTagHandler);
XLSX_CELL_HANDLER_MAP.put(ExcelXmlConstants.X_ROW_TAG, rowTagHandler);
}
public XlsxRowHandler(XlsxReadContext xlsxReadContext) {
this.xlsxReadContext = xlsxReadContext;
}
@Override
public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
XlsxTagHandler handler = XLSX_CELL_HANDLER_MAP.get(name);
if (handler == null || !handler.support(xlsxReadContext)) {
return;
}
xlsxReadContext.xlsxReadSheetHolder().getTagDeque().push(name);
handler.startElement(xlsxReadContext, name, attributes);
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
String currentTag = xlsxReadContext.xlsxReadSheetHolder().getTagDeque().peek();
if (currentTag == null) {
return;
}
XlsxTagHandler handler = XLSX_CELL_HANDLER_MAP.get(currentTag);
if (handler == null || !handler.support(xlsxReadContext)) {
return;
}
handler.characters(xlsxReadContext, ch, start, length);
}
@Override
public void endElement(String uri, String localName, String name) throws SAXException {
XlsxTagHandler handler = XLSX_CELL_HANDLER_MAP.get(name);
if (handler == null || !handler.support(xlsxReadContext)) {
return;
}
handler.endElement(xlsxReadContext, name);
xlsxReadContext.xlsxReadSheetHolder().getTagDeque().pop();
}
}
相比于XSSFSheetXMLHandler,XlsxRowHandler维护一个XlsxTagHandler的集合,XlsxRowHandler充当一个门面,解析逻辑交给各个具体的XlsxTagHandler
XlsxTagHandler
easyexcel实现了如下的XlsxTagHandler:
其中CellTagHandler解析<c>元素,CellValueTagHandler和CellInlineStringValueTagHandler解析<t>和<x:t>等元素,提取出单元格内容,这里不做过多阐述。
写Excel
@Test
public void export() {
ExcelWriter excelWriter = EasyExcel.write("D:/student_100W.xlsx", Student.class).build();
WriteSheet sheet0 = EasyExcel.writerSheet("sheet0").build();
for (int pageNo = 1; pageNo <= 2; pageNo++) {
excelWriter.write(studentService.pageQuery(pageNo, pageSize), sheet0);
}
excelWriter.finish();
}
EasyExcel的POI进行了良好的封装,简单易用,API在使用上很符合大家使用日常操作excel的习惯:
得益于POI的SXSSFSheet.createRow时,会进行自动将老的Row进行刷盘操作,EasyExcel只在SXSSF家族上面做一个封装,即可实现大数据量的写入操作。主要代码见:
ExcelWriteAddExecutor#add
public void add(Collection<?> data) {
if (CollectionUtils.isEmpty(data)) {
data = new ArrayList<>();
}
WriteSheetHolder writeSheetHolder = writeContext.writeSheetHolder();
int newRowIndex = writeSheetHolder.getNewRowIndexAndStartDoWrite();
if (writeSheetHolder.isNew() && !writeSheetHolder.getExcelWriteHeadProperty().hasHead()) {
newRowIndex += writeContext.currentWriteHolder().relativeHeadRowIndex();
}
// BeanMap is out of order, so use sortedAllFieldMap
Map<Integer, Field> sortedAllFieldMap = new TreeMap<>();
int relativeRowIndex = 0;
for (Object oneRowData : data) {
int lastRowIndex = relativeRowIndex + newRowIndex;
addOneRowOfDataToExcel(oneRowData, lastRowIndex, relativeRowIndex, sortedAllFieldMap);
relativeRowIndex++;
}
}
//向Excel写入一行
private void addOneRowOfDataToExcel(Object oneRowData, int rowIndex, int relativeRowIndex,
Map<Integer, Field> sortedAllFieldMap) {
if (oneRowData == null) {
return;
}
RowWriteHandlerContext rowWriteHandlerContext = WriteHandlerUtils.createRowWriteHandlerContext(writeContext,
rowIndex, relativeRowIndex, Boolean.FALSE);
WriteHandlerUtils.beforeRowCreate(rowWriteHandlerContext);
//这里调用:SXSSFSheet#createRow
Row row = WorkBookUtil.createRow(writeContext.writeSheetHolder().getSheet(), rowIndex);
rowWriteHandlerContext.setRow(row);
WriteHandlerUtils.afterRowCreate(rowWriteHandlerContext);
if (oneRowData instanceof Collection<?>) {
addBasicTypeToExcel(new CollectionRowData((Collection<?>)oneRowData), row, rowIndex, relativeRowIndex);
} else if (oneRowData instanceof Map) {
addBasicTypeToExcel(new MapRowData((Map<Integer, ?>)oneRowData), row, rowIndex, relativeRowIndex);
} else {
addJavaObjectToExcel(oneRowData, row, rowIndex, relativeRowIndex, sortedAllFieldMap);
}
WriteHandlerUtils.afterRowDispose(rowWriteHandlerContext);
}
总结:
1.EasyExcel的写,借助于SXSSFSheet.createRow解决导出时的内存消耗问题
2.EasyExcel的读,自己实现了DefaultHandler,对Excel文件包中xml数据文件进行流式解释,解决读时的内存消耗问题。
by simple