Java Code Examples for org.apache.poi.xwpf.extractor.XWPFWordExtractor#getText()

The following examples show how to use org.apache.poi.xwpf.extractor.XWPFWordExtractor#getText(). Each example lists its source file, project, and license.
Example 1
Source File: FileBeanParser.java    From everywhere with Apache License 2.0
private static String readDoc (String filePath, InputStream is) throws Exception {
    String text = "";
    // Wrap the stream so its leading bytes can be inspected and then re-read.
    is = FileMagic.prepareToCheckMagic(is);
    try {
        // Detect the container format once and dispatch on it.
        FileMagic magic = FileMagic.valueOf(is);
        if (magic == FileMagic.OLE2) {
            // Legacy binary .doc; try-with-resources closes the extractor even on failure.
            try (WordExtractor ex = new WordExtractor(is)) {
                text = ex.getText();
            }
        } else if (magic == FileMagic.OOXML) {
            // OOXML .docx
            try (XWPFWordExtractor extractor = new XWPFWordExtractor(new XWPFDocument(is))) {
                text = extractor.getText();
            }
        }
    } catch (OfficeXmlFileException e) {
        logger.error(filePath, e);
    } finally {
        if (is != null) {
            is.close();
        }
    }
    // Empty string when the format is not recognized or extraction was logged as failed.
    return text;
}
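
Examples 1 and 2 both choose an extractor by looking at the first bytes of the stream. As a rough, POI-free sketch of what such a check amounts to, the snippet below peeks at the stream header and rewinds it. The names WordFormatSniffer, Kind, and sniff are hypothetical and exist only for this illustration; they are not part of Apache POI, and POI's own detection covers more formats. The two byte signatures are the standard OLE2 compound-file header and the ZIP local-file header. A ZIP signature only says the file is a ZIP container, which is necessary for a .docx but not sufficient.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class WordFormatSniffer {
    /** Hypothetical result type for this sketch (not a POI class). */
    public enum Kind { OLE2, ZIP_BASED, UNKNOWN }

    // OLE2 compound file header (legacy .doc, .xls, .ppt).
    private static final int[] OLE2_SIG = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // ZIP local file header "PK\3\4" (container used by .docx and other OOXML files).
    private static final int[] ZIP_SIG = {0x50, 0x4B, 0x03, 0x04};

    /** Peeks at up to 8 header bytes, then resets the stream to where it started. */
    public static Kind sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] head = new byte[8];
        in.mark(head.length);
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // stream ended before 8 bytes were available
            }
            n += r;
        }
        in.reset();
        if (startsWith(head, n, OLE2_SIG)) {
            return Kind.OLE2;
        }
        if (startsWith(head, n, ZIP_SIG)) {
            return Kind.ZIP_BASED;
        }
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int length, int[] signature) {
        if (length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] zipHeader = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00};
        byte[] plainText = "hello world".getBytes("US-ASCII");
        System.out.println(sniff(new ByteArrayInputStream(zipHeader)));
        System.out.println(sniff(new ByteArrayInputStream(plainText)));
    }
}
```

Because the stream is reset after the peek, the same stream can then be handed to whichever extractor matches, which is the reason Example 2 wraps its input in a BufferedInputStream before checking the header.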
 
Example 2
Source File: IndexerTextExtractor.java    From eplmp with Eclipse Public License 1.0
private String microsoftWordDocumentToString(InputStream inputStream) throws IOException {
    String strRet;

    // Buffer the stream so the header check can peek at the first bytes and rewind.
    try (InputStream wordStream = new BufferedInputStream(inputStream)) {
        if (POIFSFileSystem.hasPOIFSHeader(wordStream)) {
            // OLE2 header found: legacy binary .doc
            try (WordExtractor wordExtractor = new WordExtractor(wordStream)) {
                strRet = wordExtractor.getText();
            }
        } else {
            // Otherwise treat the stream as OOXML .docx
            try (XWPFWordExtractor wordXExtractor = new XWPFWordExtractor(new XWPFDocument(wordStream))) {
                strRet = wordXExtractor.getText();
            }
        }
    }

    return strRet;
}
 
Example 3
Source File: OOXMLWordFormatModule.java    From ontopia with Apache License 2.0
@Override
public void readContent(ClassifiableContentIF cc, TextHandlerIF handler) {
  // try-with-resources closes the extractor even if text handling fails
  try (XWPFWordExtractor extractor = new XWPFWordExtractor(
      OPCPackage.open(new ByteArrayInputStream(cc.getContent())))) {
    String s = extractor.getText();
    char[] c = s.toCharArray();
    // Emit the whole document text as a single region
    handler.startRegion("document");
    handler.text(c, 0, c.length);
    handler.endRegion();
  } catch (Exception e) {
    throw new OntopiaRuntimeException(e);
  }
}
 
Example 4
Source File: MSOfficeBox.java    From wandora with GNU General Public License v3.0
public static String getDocxText(File file) {
    // try-with-resources closes both the file stream and the extractor on every path
    try (FileInputStream in = new FileInputStream(file);
         XWPFWordExtractor extractor = new XWPFWordExtractor(new XWPFDocument(in))) {
        return extractor.getText();
    }
    catch(Exception e) {
        e.printStackTrace();
    }
    // Reached only when extraction failed
    return null;
}