com.itextpdf.text.pdf.parser.TextExtractionStrategy Java Examples

The following examples show how to use com.itextpdf.text.pdf.parser.TextExtractionStrategy, the iText 5 interface for render listeners that collect a page's text during content parsing and return it from getResultantText().
Example #1
Source File: TextExtraction.java From testarea-itext5 with GNU Affero General Public License v3.0
<E extends TextExtractionStrategy> String extractAndStore(PdfReader reader, String format, Class<E> strategyClass, RenderFilter... filters) throws Exception
{
    StringBuilder builder = new StringBuilder();

    for (int page = 1; page <= reader.getNumberOfPages(); page++)
    {
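        // Strategies accumulate text, so create a fresh instance for every page
        TextExtractionStrategy strategy = strategyClass.getConstructor().newInstance();
        if (filters != null && filters.length > 0)
        {
            // Forward only the render events that pass every filter
            strategy = new FilteredTextRenderListener(strategy, filters);
        }
        String pageText = extract(reader, page, strategy);
        // Store each page in its own file; format is a String.format pattern taking the page number
        Files.write(Paths.get(String.format(format, page)), pageText.getBytes(StandardCharsets.UTF_8));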

        if (page > 1)
            builder.append("\n\n");
        builder.append(pageText);
    }

    return builder.toString();
}
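Example #1 wraps the strategy in FilteredTextRenderListener, which passes a text event on to the wrapped strategy only when every RenderFilter allows it, and it builds a new strategy for each page. The sketch below models that design in plain Java so it runs without iText. Chunk, Strategy, SimpleStrategy, FilteredStrategy and extractAll are hypothetical, simplified stand-ins, not iText's API.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class FilterChainSketch {
    /** Hypothetical stand-in for a text render event: the shown text and its position. */
    static final class Chunk {
        final String text; final float x; final float y;
        Chunk(String text, float x, float y) { this.text = text; this.x = x; this.y = y; }
    }

    /** Hypothetical stand-in for a text extraction strategy: receives chunks, then reports text. */
    interface Strategy {
        void renderText(Chunk chunk);
        String getResultantText();
    }

    /** Appends chunks in the order they arrive. */
    static final class SimpleStrategy implements Strategy {
        private final StringBuilder sb = new StringBuilder();
        public void renderText(Chunk chunk) { sb.append(chunk.text); }
        public String getResultantText() { return sb.toString(); }
    }

    /** Forwards a chunk to the delegate only if every filter accepts it. */
    static final class FilteredStrategy implements Strategy {
        private final Strategy delegate;
        private final List<Predicate<Chunk>> filters;
        FilteredStrategy(Strategy delegate, List<Predicate<Chunk>> filters) {
            this.delegate = delegate;
            this.filters = filters;
        }
        public void renderText(Chunk chunk) {
            for (Predicate<Chunk> f : filters) {
                if (!f.test(chunk)) return; // rejected by one filter: drop it
            }
            delegate.renderText(chunk);
        }
        public String getResultantText() { return delegate.getResultantText(); }
    }

    /** Extracts each page with a fresh strategy and joins pages with a blank line. */
    static String extractAll(List<List<Chunk>> pages, Supplier<Strategy> factory,
                             List<Predicate<Chunk>> filters) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < pages.size(); i++) {
            Strategy strategy = factory.get();
            if (!filters.isEmpty()) strategy = new FilteredStrategy(strategy, filters);
            for (Chunk c : pages.get(i)) strategy.renderText(c);
            if (i > 0) builder.append("\n\n");
            builder.append(strategy.getResultantText());
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        List<List<Chunk>> pages = List.of(
            List.of(new Chunk("Header ", 10, 800), new Chunk("Body", 10, 400)),
            List.of(new Chunk("Header ", 10, 800), new Chunk("More", 10, 400)));
        // Keep only chunks below y = 700, a crude region filter that skips the header
        Predicate<Chunk> belowHeader = c -> c.y < 700;
        System.out.println(extractAll(pages, SimpleStrategy::new, List.of(belowHeader)));
    }
}
```

Taking a Supplier instead of a Class<E> is a design alternative: it avoids the reflective getConstructor().newInstance() call and its checked exceptions. In real iText 5 code, a position-based filter like the one above would usually be a RegionTextRenderFilter.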
 
Example #2
Source File: PDF2WordExample.java From tutorials with MIT License
private static void generateDocFromPDF(String filename) throws IOException {
	PdfReader reader = new PdfReader(filename);
	try (XWPFDocument doc = new XWPFDocument()) {
		PdfReaderContentParser parser = new PdfReaderContentParser(reader);

		for (int i = 1; i <= reader.getNumberOfPages(); i++) {
			// processContent returns the strategy it was given, now holding page i's text
			TextExtractionStrategy strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
			XWPFParagraph p = doc.createParagraph();
			XWPFRun run = p.createRun();
			run.setText(strategy.getResultantText());
			run.addBreak(BreakType.PAGE);
		}
		// try-with-resources closes the stream and the document even if writing fails
		try (FileOutputStream out = new FileOutputStream("src/output/pdf.docx")) {
			doc.write(out);
		}
	} finally {
		reader.close();
	}
}
 
Example #3
Source File: OfficeUtils.java From dk-fitting with Apache License 2.0
public static String itextPdf2Txt(String filePath) throws Exception {
    PdfReader reader = new PdfReader(filePath);
    try {
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        // StringBuilder suffices: the buffer is never shared between threads
        StringBuilder buff = new StringBuilder();
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            TextExtractionStrategy strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
            // Pages are concatenated with no separator between them
            buff.append(strategy.getResultantText());
        }
        return buff.toString();
    } finally {
        reader.close();
    }
}
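Examples #2 and #3 assign the result of parser.processContent(i, new SimpleTextExtractionStrategy()) straight to a strategy variable. That works because iText 5 declares PdfReaderContentParser.processContent generically as `<E extends RenderListener> E processContent(int pageNumber, E renderListener)`, returning the listener it was given. The plain-Java sketch below models that signature; Listener, Collector, processContent and extractAll are hypothetical stand-ins, and the form-feed page separator is an assumption added to keep page boundaries that Example #3 loses.

```java
import java.util.List;

public class ProcessContentSketch {
    /** Hypothetical minimal listener interface. */
    interface Listener { void renderText(String text); }

    /** A listener with extra, type-specific methods beyond the interface. */
    static final class Collector implements Listener {
        private final StringBuilder sb = new StringBuilder();
        private int chunks;
        public void renderText(String text) { sb.append(text); chunks++; }
        String getResultantText() { return sb.toString(); }
        int getChunkCount() { return chunks; }
    }

    /**
     * Feeds one page's text to the listener and returns that same listener.
     * Because the parameter type is E rather than Listener, the caller keeps
     * the concrete type and needs no cast.
     */
    static <E extends Listener> E processContent(List<String> page, E listener) {
        for (String text : page) listener.renderText(text);
        return listener;
    }

    /** Extracts all pages, separating them with a form feed. */
    static String extractAll(List<List<String>> pages) {
        StringBuilder buff = new StringBuilder();
        for (int i = 0; i < pages.size(); i++) {
            if (i > 0) buff.append('\f');
            // The call returns a Collector, so getResultantText() is available directly
            buff.append(processContent(pages.get(i), new Collector()).getResultantText());
        }
        return buff.toString();
    }

    public static void main(String[] args) {
        Collector c = processContent(List.of("Hello, ", "world"), new Collector());
        System.out.println(c.getResultantText() + " (" + c.getChunkCount() + " chunks)");
    }
}
```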
 
Example #4
Source File: RemappingExtractionFilter.java From testarea-itext5 with GNU Affero General Public License v3.0
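public RemappingExtractionFilter(TextExtractionStrategy strategy) throws NoSuchFieldException, SecurityException
{
    // Delegate strategy wrapped by this filter
    this.strategy = strategy;
    // TextRenderInfo keeps its text in a private field; open it up via reflection
    this.stringField = TextRenderInfo.class.getDeclaredField("text");
    this.stringField.setAccessible(true);
}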
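The constructor in Example #4 looks up a private field once and calls setAccessible(true) so that later code can overwrite the text a render event carries. The self-contained sketch below shows that reflection technique on a hypothetical RenderInfo class; RenderInfo and remap are illustrative names, not iText types, and the character replacement is just one assumed kind of remapping.

```java
import java.lang.reflect.Field;

public class PrivateFieldRemapSketch {
    /** Hypothetical class whose text lives in a private field with no setter. */
    static final class RenderInfo {
        private String text;
        RenderInfo(String text) { this.text = text; }
        String getText() { return text; }
    }

    private final Field textField;

    PrivateFieldRemapSketch() throws NoSuchFieldException {
        // Look the field up once and make it readable and writable
        textField = RenderInfo.class.getDeclaredField("text");
        textField.setAccessible(true);
    }

    /** Rewrites the private text in place, replacing one character with another. */
    void remap(RenderInfo info, char from, char to) throws IllegalAccessException {
        String original = (String) textField.get(info);
        textField.set(info, original.replace(from, to));
    }

    public static void main(String[] args) throws Exception {
        RenderInfo info = new RenderInfo("gar\u00e7on");
        new PrivateFieldRemapSketch().remap(info, '\u00e7', 'c');
        System.out.println(info.getText()); // prints "garcon"
    }
}
```

Applied to a real library class, this depends on a private field name the library does not promise to keep, so it can break on an upgrade.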
 
Example #5
Source File: TextExtraction.java From testarea-itext5 with GNU Affero General Public License v3.0
String extract(PdfReader reader, int pageNo, TextExtractionStrategy strategy) throws IOException
{
    return PdfTextExtractor.getTextFromPage(reader, pageNo, strategy);
}
 
Example #6
Source File: RemappedExtraction.java From testarea-itext5 with GNU Affero General Public License v3.0
String extractRemapped(PdfReader reader, int pageNo) throws IOException, NoSuchFieldException, SecurityException
{
    TextExtractionStrategy strategy = new RemappingExtractionFilter(new LocationTextExtractionStrategy());
    return PdfTextExtractor.getTextFromPage(reader, pageNo, strategy);
}
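Example #6 uses LocationTextExtractionStrategy, while Examples #2 and #3 use SimpleTextExtractionStrategy. The simple strategy emits text in content-stream order; the location strategy orders text by its position on the page, which matters when a PDF draws lines out of reading order. The sketch below is a deliberately simplified plain-Java model of that difference (top-to-bottom, then left-to-right, ignoring rotation and word spacing); Chunk, simpleOrder and locationOrder are hypothetical and do not reproduce iText's exact algorithm.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OrderingSketch {
    /** Hypothetical text chunk in PDF user space: larger y means higher on the page. */
    static final class Chunk {
        final String text; final float x; final float y;
        Chunk(String text, float x, float y) { this.text = text; this.x = x; this.y = y; }
    }

    /** Stream order: start a new line whenever the baseline changes between consecutive chunks. */
    static String simpleOrder(List<Chunk> chunks) {
        StringBuilder sb = new StringBuilder();
        Chunk prev = null;
        for (Chunk c : chunks) {
            if (prev != null && prev.y != c.y) sb.append('\n');
            sb.append(c.text);
            prev = c;
        }
        return sb.toString();
    }

    /** Position order: sort top-to-bottom, then left-to-right, then build lines the same way. */
    static String locationOrder(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingDouble((Chunk c) -> -c.y).thenComparingDouble(c -> c.x));
        return simpleOrder(sorted);
    }

    public static void main(String[] args) {
        // The content stream draws the lower line before the upper one
        List<Chunk> stream = List.of(
            new Chunk("world", 10, 680), new Chunk("Hello ", 10, 700), new Chunk("again", 50, 700));
        System.out.println(simpleOrder(stream));   // prints "world", then "Hello again"
        System.out.println(locationOrder(stream)); // prints "Hello again", then "world"
    }
}
```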