org.apache.poi.POITextExtractor Java Examples

The following examples show how to use org.apache.poi.POITextExtractor.
Example #1
Source File: OLE2ExtractorFactory.java    From lams with GNU General Public License v2.0
public static POITextExtractor createExtractor(InputStream input) throws IOException {
    Class<?> cls = getOOXMLClass();
    if (cls != null) {
        // Use Reflection to get us the full OOXML-enabled version
        try {
            Method m = cls.getDeclaredMethod("createExtractor", InputStream.class);
            return (POITextExtractor)m.invoke(null, input);
        } catch (IllegalArgumentException iae) {
            throw iae;
        } catch (Exception e) {
            throw new IllegalArgumentException("Error creating Extractor for InputStream", e);
        }
    } else {
        // Best hope it's OLE2....
        return createExtractor(new NPOIFSFileSystem(input));
    }
}
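
Example #1 relies on a common pattern: look up an optional class at runtime, call a static factory method on it through reflection if it is present, and fall back to a simpler path otherwise. The following is a minimal JDK-only sketch of that pattern, not the POI implementation. The class name OptionalFactoryDemo, the helper names, and the use of java.lang.Integer as the stand-in "optional" class are all assumptions made for illustration.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class OptionalFactoryDemo {

    /** Returns the named class if it is on the classpath, otherwise null. */
    public static Class<?> findOptionalClass(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    /**
     * Invokes a public static method taking one String on the optional class
     * when that class is available; otherwise returns the supplied fallback.
     */
    public static Object invokeOrFallback(String className, String methodName,
                                          String arg, Object fallback) {
        Class<?> cls = findOptionalClass(className);
        if (cls == null) {
            // Optional dependency is missing: take the simpler path
            return fallback;
        }
        try {
            Method m = cls.getMethod(methodName, String.class);
            return m.invoke(null, arg); // null receiver because the method is static
        } catch (InvocationTargetException e) {
            // The target method itself threw; surface its cause
            throw new IllegalArgumentException("Error invoking " + methodName, e.getCause());
        } catch (ReflectiveOperationException e) {
            // Method missing or not accessible
            throw new IllegalArgumentException("Error invoking " + methodName, e);
        }
    }

    public static void main(String[] args) {
        // Class is present, so the reflective call runs: prints 42
        System.out.println(invokeOrFallback("java.lang.Integer", "parseInt", "42", -1));
        // Hypothetical class that is not on the classpath: prints -1
        System.out.println(invokeOrFallback("com.example.NotOnClasspath", "parse", "42", -1));
    }
}
```

Note that Method.invoke wraps any exception thrown by the target method in an InvocationTargetException, which is why the sketch unwraps the cause before rethrowing.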
 
Example #2
Source File: OLE2ExtractorFactory.java    From lams with GNU General Public License v2.0
/**
 * Create the Extractor, if possible. Generally needs the Scratchpad jar.
 * Note that this won't check for embedded OOXML resources either, use
 *  {@link org.apache.poi.extractor.ExtractorFactory} for that.
 */
public static POITextExtractor createExtractor(DirectoryNode poifsDir) throws IOException {
    // Look for well-known entries in the directory to work out
    // which kind of document this is
    for (String workbookName : WORKBOOK_DIR_ENTRY_NAMES) {
        if (poifsDir.hasEntry(workbookName)) {
            if (getPreferEventExtractor()) {
                return new EventBasedExcelExtractor(poifsDir);
            }
            return new ExcelExtractor(poifsDir);
        }
    }
    if (poifsDir.hasEntry(OLD_WORKBOOK_DIR_ENTRY_NAME)) {
        throw new OldExcelFormatException("Old Excel Spreadsheet format (1-95) "
                + "found. Please call OldExcelExtractor directly for basic text extraction");
    }
    
    // Ask Scratchpad, or fail trying
    Class<?> cls = getScratchpadClass();
    try {
        Method m = cls.getDeclaredMethod("createExtractor", DirectoryNode.class);
        POITextExtractor ext = (POITextExtractor)m.invoke(null, poifsDir);
        if (ext != null) return ext;
    } catch (IllegalArgumentException iae) {
        throw iae;
    } catch (Exception e) {
        throw new IllegalArgumentException("Error creating Scratchpad Extractor", e);
    }

    throw new IllegalArgumentException("No supported documents found in the OLE2 stream");
}
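
The detection logic in Example #2 can be sketched without POI as a lookup over the entry names found in an OLE2 directory. This is a simplified illustration, not the library's code: the class EntrySniffDemo, the method detectKind, the returned labels, and the entry-name constants are assumptions and should be checked against the POI version in use.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EntrySniffDemo {

    // Illustrative entry names only (assumed, not taken from the POI sources)
    static final List<String> WORKBOOK_NAMES = Arrays.asList("Workbook", "WORKBOOK", "BOOK");
    static final String OLD_WORKBOOK_NAME = "Book";

    /**
     * Returns "excel" when a known workbook entry is present, throws for the
     * old workbook format, and returns "delegate" when another factory
     * should be asked to handle the document.
     */
    public static String detectKind(Set<String> entryNames) {
        for (String name : WORKBOOK_NAMES) {
            if (entryNames.contains(name)) {
                return "excel";
            }
        }
        if (entryNames.contains(OLD_WORKBOOK_NAME)) {
            throw new IllegalArgumentException(
                    "Old Excel format found; a dedicated extractor is required");
        }
        return "delegate";
    }

    public static void main(String[] args) {
        Set<String> entries = new HashSet<>(Arrays.asList("Workbook", "SummaryInformation"));
        System.out.println(detectKind(entries)); // prints excel
    }
}
```

The order of the checks matters: the modern workbook names are tested first, so a directory that contains both a current and an old-format entry is treated as a normal spreadsheet.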
 
Example #3
Source File: HPSFPropertiesExtractor.java    From lams with GNU General Public License v2.0
/**
 * Prevent recursion!
 */
public POITextExtractor getMetadataTextExtractor() {
    throw new IllegalStateException("You already have the Metadata Text Extractor, not recursing!");
}
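
The guard in Example #3 exists because a metadata extractor is itself a text extractor, so asking it for its own metadata extractor would otherwise recurse without end. A minimal sketch of the idea follows, using hypothetical stand-in classes (RecursionGuardDemo, DocExtractor, MetadataExtractor) rather than the POI types.

```java
public class RecursionGuardDemo {

    /** Hypothetical extractor that can hand out an extractor for its metadata. */
    public static class DocExtractor {
        public String getText() {
            return "body text";
        }

        public DocExtractor getMetadataExtractor() {
            return new MetadataExtractor();
        }
    }

    /** The metadata extractor refuses to produce another metadata extractor. */
    public static class MetadataExtractor extends DocExtractor {
        @Override
        public String getText() {
            return "Author = unknown";
        }

        @Override
        public DocExtractor getMetadataExtractor() {
            // Fail fast instead of returning a new instance on every call
            throw new IllegalStateException("Already the metadata extractor, not recursing");
        }
    }

    public static void main(String[] args) {
        DocExtractor doc = new DocExtractor();
        DocExtractor meta = doc.getMetadataExtractor();
        System.out.println(meta.getText()); // prints Author = unknown
        try {
            meta.getMetadataExtractor();
        } catch (IllegalStateException e) {
            System.out.println("guard triggered");
        }
    }
}
```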
 
Example #4
Source File: GetTextFromFile.java    From ApiManager with GNU Affero General Public License v3.0
private static String getTextFromExcel(String fileName)
		throws InvalidFormatException, IOException, OpenXML4JException, XmlException {
	File inputFile = new File(fileName);
	// POITextExtractor is Closeable; close it so the underlying file is released
	try (POITextExtractor extractor = ExtractorFactory.createExtractor(inputFile)) {
		return extractor.getText();
	}
}