Java Code Examples for org.apache.pdfbox.pdfparser.PDFStreamParser#getTokens()

The following examples show how to use org.apache.pdfbox.pdfparser.PDFStreamParser#getTokens() . You can vote up the ones you like or vote down the ones you don't like, and go to the original project or source file by following the links above each example. You may check out the related API usage on the sidebar.
Example 1
Source File: PdfContentTypeChecker.java    From tika-server with Apache License 2.0 5 votes vote down vote up
private void calculateTextObjectsOnPage(PDPage page) throws IOException {
    PDFStreamParser parser = new PDFStreamParser(page);
    parser.parse();
    List<Object> pageTokens = parser.getTokens();
    for (Object token : pageTokens) {
        if (token instanceof Operator) {
            String opName = ((Operator) token).getName();
            if (opName.equals("BT")) // Begin Text
                textBlocks++;
        }
    }
}
 
Example 2
Source File: AppearanceGeneratorHelper.java    From gcs with Mozilla Public License 2.0 5 votes vote down vote up
/**
 * Parses an appearance stream into tokens.
 */
private List<Object> tokenize(PDAppearanceStream appearanceStream) throws IOException
{
    PDFStreamParser parser = new PDFStreamParser(appearanceStream);
    parser.parse();
    return parser.getTokens();
}
 
Example 3
Source File: NurminenDetectionAlgorithm.java    From tabula-java with MIT License 5 votes vote down vote up
private PDDocument removeText(PDPage page) throws IOException {

        PDFStreamParser parser = new PDFStreamParser(page);
        parser.parse();
        List<Object> tokens = parser.getTokens();
        List<Object> newTokens = new ArrayList<>();
        for (Object token : tokens) {
            if (token instanceof Operator) {
                Operator op = (Operator) token;
                if (op.getName().equals("TJ") || op.getName().equals("Tj")) {
                    //remove the one argument to this operator
                    newTokens.remove(newTokens.size() - 1);
                    continue;
                }
            }
            newTokens.add(token);
        }

        PDDocument document = new PDDocument();
        PDPage newPage = document.importPage(page);
        newPage.setResources(page.getResources());

        PDStream newContents = new PDStream(document);
        OutputStream out = newContents.createOutputStream(COSName.FLATE_DECODE);
        ContentStreamWriter writer = new ContentStreamWriter(out);
        writer.writeTokens(newTokens);
        out.close();
        newPage.setContents(newContents);
        return document;
    }