com.itextpdf.text.pdf.parser.TextExtractionStrategy Java Examples

The following examples show how to use com.itextpdf.text.pdf.parser.TextExtractionStrategy.

Example #1

Source File: TextExtraction.java From testarea-itext5 with GNU Affero General Public License v3.0

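import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.FilteredTextRenderListener;
import com.itextpdf.text.pdf.parser.RenderFilter;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

// 'format' is a String.format pattern for the per-page output path; it receives the page number.
<E extends TextExtractionStrategy> String extractAndStore(PdfReader reader, String format, Class<E> strategyClass, RenderFilter... filters) throws Exception
{
    StringBuilder builder = new StringBuilder();

    // iText page numbers are 1-based
    for (int page = 1; page <= reader.getNumberOfPages(); page++)
    {
        // A fresh strategy per page keeps text from earlier pages out of the result
        TextExtractionStrategy strategy = strategyClass.getConstructor().newInstance();
        if (filters != null && filters.length > 0)
        {
            strategy = new FilteredTextRenderListener(strategy, filters);
        }
        String pageText = extract(reader, page, strategy);
        Files.write(Paths.get(String.format(format, page)), pageText.getBytes(StandardCharsets.UTF_8));

        // Separate pages with a blank line in the combined result
        if (page > 1)
            builder.append("\n\n");
        builder.append(pageText);
    }

    return builder.toString();
}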
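
The per-page file naming and joining in Example #1 can be tried without iText. The following is a minimal sketch, not code from testarea-itext5: the class name PerPageStoreSketch, the method storeAndJoin, and the list of already extracted page texts are assumptions made for illustration, and IOException is wrapped in an unchecked exception only to keep the sketch short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: mirrors the store-and-join loop, with plain strings standing in for extracted pages.
class PerPageStoreSketch {

    // Writes page N to the path produced by String.format(format, N)
    // and returns all page texts joined by a blank line.
    static String storeAndJoin(List<String> pageTexts, String format) {
        StringBuilder builder = new StringBuilder();
        for (int page = 1; page <= pageTexts.size(); page++) {
            String pageText = pageTexts.get(page - 1);
            try {
                Files.write(Paths.get(String.format(format, page)),
                        pageText.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            if (page > 1) {
                builder.append("\n\n");
            }
            builder.append(pageText);
        }
        return builder.toString();
    }
}
```

With a format such as "out/page-%d.txt", the first page would be written to out/page-1.txt, the second to out/page-2.txt, and so on; the target directory has to exist beforehand.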

Example #2

Source File: PDF2WordExample.java From tutorials with MIT License

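import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.xwpf.usermodel.BreakType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

private static void generateDocFromPDF(String filename) throws IOException {
	PdfReader reader = new PdfReader(filename);
	// try-with-resources closes the Word document even if extraction fails
	try (XWPFDocument doc = new XWPFDocument()) {
		PdfReaderContentParser parser = new PdfReaderContentParser(reader);

		// One paragraph per PDF page (1-based), each followed by a page break
		for (int i = 1; i <= reader.getNumberOfPages(); i++) {
			TextExtractionStrategy strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
			String text = strategy.getResultantText();
			XWPFParagraph p = doc.createParagraph();
			XWPFRun run = p.createRun();
			run.setText(text);
			run.addBreak(BreakType.PAGE);
		}
		try (FileOutputStream out = new FileOutputStream("src/output/pdf.docx")) {
			doc.write(out);
		}
	} finally {
		reader.close();
	}
}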

Example #3

Source File: OfficeUtils.java From dk-fitting with Apache License 2.0

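import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

public static String itextPdf2Txt(String filePath) throws Exception {
    PdfReader reader = new PdfReader(filePath);
    try {
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        StringBuilder buff = new StringBuilder();
        // Extract each page (1-based) and concatenate the text
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            TextExtractionStrategy strategy = parser.processContent(i,
                    new SimpleTextExtractionStrategy());
            buff.append(strategy.getResultantText());
        }
        return buff.toString();
    } finally {
        // Release the underlying file even if extraction fails
        reader.close();
    }
}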

Example #4

Source File: RemappingExtractionFilter.java From testarea-itext5 with GNU Affero General Public License v3.0

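public RemappingExtractionFilter(TextExtractionStrategy strategy) throws NoSuchFieldException, SecurityException
{
    // Strategy that this filter wraps
    this.strategy = strategy;
    // Look up the private "text" field of TextRenderInfo once and make it accessible via reflection
    this.stringField = TextRenderInfo.class.getDeclaredField("text");
    this.stringField.setAccessible(true);
}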
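
The constructor in Example #4 only caches a reflective handle to a private field; the excerpt does not show how the handle is used afterwards. As a rough illustration of the general technique, and not of what RemappingExtractionFilter actually does, the sketch below reads and overwrites a private String field through a cached java.lang.reflect.Field. The classes PrivateFieldSketch and Chunk and the upper-casing "remap" are hypothetical stand-ins chosen so that the example runs without iText.

```java
import java.lang.reflect.Field;
import java.util.Locale;

// Hypothetical sketch of reading and writing a private field via a cached Field handle.
class PrivateFieldSketch {

    // Stand-in for a library class that exposes no setter for its text.
    static final class Chunk {
        private String text;

        Chunk(String text) {
            this.text = text;
        }

        String getText() {
            return text;
        }
    }

    private final Field textField;

    PrivateFieldSketch() {
        try {
            // Look the field up once and lift the access check
            textField = Chunk.class.getDeclaredField("text");
            textField.setAccessible(true);
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the private field, applies a placeholder mapping (upper-casing), and writes it back.
    void remap(Chunk chunk) {
        try {
            String original = (String) textField.get(chunk);
            textField.set(chunk, original.toUpperCase(Locale.ROOT));
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Reflection of this kind depends on the private field's name and type staying the same across library versions, so it is best treated as a workaround rather than a supported extension point.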

Example #5

Source File: TextExtraction.java From testarea-itext5 with GNU Affero General Public License v3.0

String extract(PdfReader reader, int pageNo, TextExtractionStrategy strategy) throws IOException
{
    return PdfTextExtractor.getTextFromPage(reader, pageNo, strategy);
}

Example #6

Source File: RemappedExtraction.java From testarea-itext5 with GNU Affero General Public License v3.0

String extractRemapped(PdfReader reader, int pageNo) throws IOException, NoSuchFieldException, SecurityException
{
    TextExtractionStrategy strategy = new RemappingExtractionFilter(new LocationTextExtractionStrategy());
    return PdfTextExtractor.getTextFromPage(reader, pageNo, strategy);
}