Apache POI Java Excel Performance for Large Spreadsheets

lovecheck 2023. 10. 29. 19:48

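I have a spreadsheet that I'm trying to read with POI (I have it in both xls and xlsx formats), and in this case the problem is with the xls file. The spreadsheet has about 10,000 rows and 75 columns. Excel opens it in a few seconds, but reading it with POI can take several minutes. I'm using event-based reading rather than loading the whole file into memory. The core of my code is below. It's a bit messy right now, but it's really just a long switch statement, mostly copied from the POI examples.

Is it normal for POI's event model to be this slow? Is there anything I can do to speed it up? Several minutes is not going to be acceptable for my application.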

    POIFSFileSystem poifs = new POIFSFileSystem(fis);
    InputStream din = poifs.createDocumentInputStream("Workbook");
    try
    {
        HSSFRequest req = new HSSFRequest();
        listener = new FormatTrackingHSSFListener(new HSSFListener() {
            @Override
            public void processRecord(Record rec)
            {
                thisString = null;
                int sid = rec.getSid();
                switch (sid)
                {
                    case SSTRecord.sid:
                        strTable = (SSTRecord) rec;
                        break;
                    case LabelSSTRecord.sid:
                        LabelSSTRecord labelSstRec = (LabelSSTRecord) rec;
                        thisString = strTable.getString(labelSstRec
                                .getSSTIndex()).getString();
                        row = labelSstRec.getRow();
                        col = labelSstRec.getColumn();
                        break;
                    case RKRecord.sid:
                        RKRecord rrk = (RKRecord) rec;
                        thisString = "";
                        row = rrk.getRow();
                        col = rrk.getColumn();
                        break;
                    case LabelRecord.sid:
                        LabelRecord lrec = (LabelRecord) rec;
                        thisString = lrec.getValue();
                        row = lrec.getRow();
                        col = lrec.getColumn();
                        break;
                    case BlankRecord.sid:
                        BlankRecord blrec = (BlankRecord) rec;
                        thisString = "";
                        row = blrec.getRow();
                        col = blrec.getColumn();
                        break;
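                    case BoolErrRecord.sid:
                        BoolErrRecord berec = (BoolErrRecord) rec;
                        row = berec.getRow();
                        col = berec.getColumn();
                        // The record holds either a boolean or an error code.
                        // Ask isBoolean() rather than testing the raw value,
                        // because a boolean TRUE is stored as a non-zero value.
                        thisString = berec.isBoolean() ? Boolean.toString(berec
                                .getBooleanValue()) : ErrorConstants
                                .getText(berec.getErrorValue());
                        break;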
                    case FormulaRecord.sid:
                        FormulaRecord frec = (FormulaRecord) rec;
                        switch (frec.getCachedResultType())
                        {
                            case Cell.CELL_TYPE_NUMERIC:
                                double num = frec.getValue();
                                if (Double.isNaN(num))
                                {
                                    // Formula result is a string
                                    // This is stored in the next record
                                    outputNextStringRecord = true;
                                }
                                else
                                {
                                    thisString = formatNumericValue(frec, num);
                                }
                                break;
                            case Cell.CELL_TYPE_BOOLEAN:
                                thisString = Boolean.toString(frec
                                        .getCachedBooleanValue());
                                break;
                            case Cell.CELL_TYPE_ERROR:
                                thisString = HSSFErrorConstants
                                        .getText(frec.getCachedErrorValue());
                                break;
                            case Cell.CELL_TYPE_STRING:
                                outputNextStringRecord = true;
                                break;
                        }
                        row = frec.getRow();
                        col = frec.getColumn();
                        break;
                    case StringRecord.sid:
                        if (outputNextStringRecord)
                        {
                            // String for formula
                            StringRecord srec = (StringRecord) rec;
                            thisString = srec.getString();
                            outputNextStringRecord = false;
                        }
                        break;
                    case NumberRecord.sid:
                        NumberRecord numRec = (NumberRecord) rec;
                        row = numRec.getRow();
                        col = numRec.getColumn();
                        thisString = formatNumericValue(numRec, numRec
                                .getValue());
                        break;
                    case NoteRecord.sid:
                        NoteRecord noteRec = (NoteRecord) rec;
                        row = noteRec.getRow();
                        col = noteRec.getColumn();
                        thisString = "";
                        break;
                    case EOFRecord.sid:
                        inSheet = false;
                }
                if (thisString != null)
                {
                    // do something with the cell value 
                }
            }
        });
        req.addListenerForAllRecords(listener);
        HSSFEventFactory factory = new HSSFEventFactory();
        factory.processEvents(req, din);
    }
    finally
    {
        // Release the streams even if parsing fails
        din.close();
        fis.close();
    }
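
The `// do something with the cell value` branch is where the application's own work happens, so it is worth keeping that work cheap. The following is only a sketch of one possible handler, not the asker's code: `RowCollector` is a hypothetical helper that gathers the `(row, col, thisString)` values into one row at a time and hands each finished row to a callback. It assumes cells arrive grouped by row and that the column count is known in advance.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: turns a stream of cell events into whole rows. */
class RowCollector {
    private final int columnCount;
    private final Consumer<List<String>> rowSink;
    private String[] current;     // cells of the row being filled, or null
    private int currentRow = -1;  // index of the row being filled

    RowCollector(int columnCount, Consumer<List<String>> rowSink) {
        this.columnCount = columnCount;
        this.rowSink = rowSink;
    }

    /** Call from the listener whenever a cell value has been decoded. */
    void cell(int row, int col, String value) {
        if (row != currentRow) {
            flush();                          // previous row is complete
            currentRow = row;
            current = new String[columnCount];
            Arrays.fill(current, "");         // missing cells stay empty
        }
        if (col >= 0 && col < columnCount) {
            current[col] = value;
        }
    }

    /** Call once after parsing ends so the last row is delivered too. */
    void flush() {
        if (current != null) {
            rowSink.accept(Arrays.asList(current));
            current = null;
            currentRow = -1;
        }
    }
}

class RowCollectorDemo {
    public static void main(String[] args) {
        RowCollector collector = new RowCollector(3, row -> System.out.println(row));
        collector.cell(0, 0, "id");
        collector.cell(0, 2, "name");
        collector.cell(1, 1, "42");
        collector.flush();
    }
}
```

In the listener above, the call would be `collector.cell(row, col, thisString)` inside the `if (thisString != null)` block, followed by `collector.flush()` once `processEvents` returns.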

If you are generating large Excel files with Apache POI, watch out for the following line:

sheet.autoSizeColumn((short) p);

It will degrade performance.
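
One possible way to avoid `autoSizeColumn` is to track the longest text written to each column and set the width once at the end with `sheet.setColumnWidth(column, width)`, which takes a width in units of 1/256 of a character. The sketch below shows only the bookkeeping, with no POI dependency. `ColumnWidthTracker` and its padding value are hypothetical, and counting characters only approximates the rendered width, since it ignores fonts and styles.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: remembers the longest text seen in each column. */
class ColumnWidthTracker {
    // POI column widths are expressed in 1/256ths of a character width,
    // and a column can be at most 255 characters wide.
    static final int UNITS_PER_CHAR = 256;
    static final int MAX_CHARS = 255;
    // Assumed padding so text does not touch the cell border.
    static final int PADDING_CHARS = 2;

    private final Map<Integer, Integer> maxChars = new HashMap<>();

    /** Call once per cell while writing; constant work per cell. */
    void observe(int column, String text) {
        int length = (text == null) ? 0 : text.length();
        maxChars.merge(column, length, Math::max);
    }

    /** Value to pass to sheet.setColumnWidth(column, ...) after all rows are written. */
    int widthUnits(int column) {
        int chars = Math.min(maxChars.getOrDefault(column, 0) + PADDING_CHARS, MAX_CHARS);
        return chars * UNITS_PER_CHAR;
    }
}

class ColumnWidthDemo {
    public static void main(String[] args) {
        ColumnWidthTracker tracker = new ColumnWidthTracker();
        String[][] rows = { {"id", "description"}, {"1", "a fairly long cell value"} };
        for (String[] row : rows) {
            for (int c = 0; c < row.length; c++) {
                tracker.observe(c, row[c]);   // same place the cell would be written
            }
        }
        for (int c = 0; c < 2; c++) {
            System.out.println("column " + c + " width units: " + tracker.widthUnits(c));
        }
    }
}
```

With this approach each column is sized once, after the data is written, instead of having POI measure every cell in the column.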

I have also processed thousands of large Excel files, and in my experience POI is very fast. Loading those files took about a minute in Excel itself. So I would say the problem lies outside the POI code.

I would try the streaming hssf that was introduced in poi-beta3. It solved memory problems for me on large spreadsheets with more than 1,000 columns.

If you are using Apache POI to generate large Excel files, take note of sheet.autoSizeColumn((short) p); because it has an impact on performance.

http://stanicblog.blogspot.sg/2013/07/generate-large-excel-report-by-using.html

I did some more detailed profiling, and the problem appears to be in code outside of POI. I had simply assumed POI was the bottleneck, but I now believe that was wrong.
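
A rough way to check where the time goes before reaching for a full profiler is to time the whole parse and, separately, the work done inside the listener. The sketch below is a minimal illustration using only `System.nanoTime()`. `SectionTimer` is a hypothetical helper, and the loop is a stand-in for a parser delivering cell values, not real POI code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper: accumulates elapsed time per named section. */
class SectionTimer {
    private final Map<String, Long> nanos = new LinkedHashMap<>();

    /** Runs the work and adds its elapsed time to the section's total. */
    <T> T time(String section, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanos.merge(section, System.nanoTime() - start, Long::sum);
        }
    }

    long totalNanos(String section) {
        return nanos.getOrDefault(section, 0L);
    }
}

class ProfilingDemo {
    public static void main(String[] args) {
        SectionTimer timer = new SectionTimer();
        StringBuilder sink = new StringBuilder();

        // "total" wraps the whole parse; "handler" wraps only the per-cell callback.
        timer.time("total", () -> {
            for (int i = 0; i < 10_000; i++) {
                final String cell = "cell" + i;  // stand-in for a decoded cell value
                timer.time("handler", () -> sink.append(cell).length());
            }
            return null;
        });

        long total = timer.totalNanos("total");
        long handler = timer.totalNanos("handler");
        // Whatever is not spent in the handler is attributed to the parser.
        System.out.printf("total=%d ms, handler=%d ms, parser (approx)=%d ms%n",
                total / 1_000_000, handler / 1_000_000, (total - handler) / 1_000_000);
    }
}
```

If the handler accounts for most of the total, the slowdown is in the application code rather than in the library, which matches what the asker found.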

Source URL: https://stackoverflow.com/questions/5992536/apache-poi-java-excel-performance-for-large-spreadsheets
