com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy Java Examples

The following examples show how to use com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy.
Example #1
Source File: PDF2WordExample.java    From tutorials with MIT License
private static void generateDocFromPDF(String filename) throws IOException {
	PdfReader reader = new PdfReader(filename);
	// XWPFDocument is Closeable, so try-with-resources releases it even if extraction fails.
	try (XWPFDocument doc = new XWPFDocument()) {
		PdfReaderContentParser parser = new PdfReaderContentParser(reader);

		// iText page numbers are 1-based.
		for (int i = 1; i <= reader.getNumberOfPages(); i++) {
			TextExtractionStrategy strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
			String text = strategy.getResultantText();
			// One paragraph per PDF page, followed by a page break.
			XWPFParagraph p = doc.createParagraph();
			XWPFRun run = p.createRun();
			run.setText(text);
			run.addBreak(BreakType.PAGE);
		}
		// Open the output file only after every page has been extracted.
		try (FileOutputStream out = new FileOutputStream("src/output/pdf.docx")) {
			doc.write(out);
		}
	} finally {
		// Close the reader on both the success and the failure path.
		reader.close();
	}
}
 
Example #2
Source File: OfficeUtils.java    From dk-fitting with Apache License 2.0
public static String itextPdf2Txt(String filePath) throws Exception {
    PdfReader reader = new PdfReader(filePath);
    try {
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        // StringBuilder suffices: the buffer never leaves this method, so no synchronization is needed.
        StringBuilder buff = new StringBuilder();
        // iText page numbers are 1-based; each page's text is appended without a separator.
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            TextExtractionStrategy strategy = parser.processContent(i,
                    new SimpleTextExtractionStrategy());
            buff.append(strategy.getResultantText());
        }
        return buff.toString();
    } finally {
        // Release the reader even if extraction fails.
        reader.close();
    }
}
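
Example #2 appends each page's text directly after the previous page's text, so the last line of one page can run into the first line of the next. A minimal sketch of one way to keep pages apart, assuming the per-page strings have already been collected into a list: the class PageTextJoiner and its method joinPages are hypothetical names used only for illustration and are not part of iText or the original project.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of iText or OfficeUtils.java.
final class PageTextJoiner {
    private PageTextJoiner() {}

    // Joins per-page texts, placing the separator between pages but not after the last one.
    static String joinPages(List<String> pageTexts, String separator) {
        return String.join(separator, pageTexts);
    }

    public static void main(String[] args) {
        // Stand-ins for the strings returned by getResultantText() for two pages.
        List<String> pages = Arrays.asList("end of page one", "start of page two");
        System.out.println(joinPages(pages, "\n"));
    }
}
```

In the extraction loop this would mean adding each getResultantText() result to a list instead of appending it to the buffer, then joining once at the end. The choice of separator (a newline here) is an assumption; a form feed or a visible page label would work the same way.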
 
Example #3
Source File: TextExtraction.java    From testarea-itext5 with GNU Affero General Public License v3.0
String extractSimple(PdfReader reader, int pageNo) throws IOException
{
    return PdfTextExtractor.getTextFromPage(reader, pageNo, new SimpleTextExtractionStrategy()
    {
        // Stays true until the first text chunk is rendered, so a page without text gets no markers.
        boolean empty = true;

        @Override
        public void beginTextBlock()
        {
            if (!empty)
                appendTextChunk("<BLOCK>");
            super.beginTextBlock();
        }

        @Override
        public void endTextBlock()
        {
            if (!empty)
                appendTextChunk("</BLOCK>\n");
            super.endTextBlock();
        }

        @Override
        public String getResultantText()
        {
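            // beginTextBlock() skipped the opening marker of the first block that contains text, so prepend it here.
            if (empty)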
                return super.getResultantText();
            else
                return "<BLOCK>" + super.getResultantText();
        }

        @Override
        public void renderText(TextRenderInfo renderInfo)
        {
            empty = false;
            super.renderText(renderInfo);
        }
        
    });
}
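
The string returned by Example #3 wraps each text block in <BLOCK> and </BLOCK> markers. A minimal sketch of how a caller might split that string back into individual blocks, assuming the page text itself never contains the literal marker strings: the class BlockSplitter and its method splitBlocks are hypothetical names for illustration and are not part of iText or the original project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of iText or TextExtraction.java.
final class BlockSplitter {
    // DOTALL lets '.' match line breaks inside a block; the reluctant '*?' stops at the first closing marker.
    private static final Pattern BLOCK = Pattern.compile("<BLOCK>(.*?)</BLOCK>", Pattern.DOTALL);

    private BlockSplitter() {}

    // Returns the text found between each <BLOCK> ... </BLOCK> pair, in order of appearance.
    static List<String> splitBlocks(String marked) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(marked);
        while (m.find()) {
            blocks.add(m.group(1));
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Stand-in for the marked-up string that extractSimple() would return.
        String sample = "<BLOCK>Hello\nWorld</BLOCK>\n<BLOCK>Second</BLOCK>\n";
        for (String block : splitBlocks(sample)) {
            System.out.println("[" + block + "]");
        }
    }
}
```

Text outside any marker pair is ignored by this sketch, and input without markers yields an empty list.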