org.apache.poi.extractor.ExtractorFactory Java Examples

The following examples show how to use org.apache.poi.extractor.ExtractorFactory.

Example #1

Source File: FileDocumentFactory.java From olat with Apache License 2.0


FileDocumentFactory(final SearchModule searchModule, MimeTypeProvider mimeTypeProvider) {
    fileBlackList = searchModule.getFileBlackList();
    pptFileEnabled = searchModule.isPptFileEnabled();
    if (!pptFileEnabled) {
        log.info("PPT files are disabled in indexer.");
    }
    excelFileEnabled = searchModule.isExcelFileEnabled();
    if (!excelFileEnabled) {
        log.info("Excel files are disabled in indexer.");
    }
    checkFileSizeSuffixes = searchModule.getFileSizeSuffixes();
    maxFileSize = searchModule.getMaxFileSize();

    FileDocumentFactory.mimeTypeProvider = mimeTypeProvider;
    // Text extraction for Office Open XML documents (Office >= 2007) can be handled in two ways:
    // model based or event based (similar to DOM vs. SAX parsing of XML).
    // For complex Excel files, model-based extraction leads to intolerably long processing times,
    // so we switched to event-based extraction, even though header/footer extraction
    // is not implemented for that method.
    ExtractorFactory.setAllThreadsPreferEventExtractors(true);
}
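
The comment in Example #1 compares the two extraction modes to DOM and SAX parsing. As a rough illustration of that analogy only, the sketch below extracts text from a small XML string both ways using the JDK's built-in parsers. It does not use POI, and the class and method names (XmlTextExtractionSketch, extractWithDom, extractWithSax) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of POI or of the projects above.
final class XmlTextExtractionSketch {

    // Model based (DOM): build the whole tree in memory, then read text from it.
    static String extractWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getTextContent();
    }

    // Event based (SAX): collect text while the parser streams through the input,
    // without keeping a full document model around.
    static String extractWithSax(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sheet><row><c>A1</c><c>B1</c></row><row><c>A2</c></row></sheet>";
        System.out.println(extractWithDom(xml));
        System.out.println(extractWithSax(xml));
    }
}
```

Both methods return the same text for this input. The difference is in memory use: the DOM variant holds the entire tree before any text is read, which is the kind of cost the comment above attributes to model-based extraction on complex Excel files.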

Example #2

Source File: GetTextFromFile.java From ApiManager with GNU Affero General Public License v3.0


private static String getTextFromExcel(String fileName)
		throws InvalidFormatException, IOException, OpenXML4JException, XmlException {
	File inputFile = new File(fileName);
	// Close the extractor when done so the underlying file handle is released.
	try (POITextExtractor extractor = ExtractorFactory.createExtractor(inputFile)) {
		return extractor.getText();
	}
}