Java Code Examples for org.apache.pdfbox.pdmodel.PDDocument#getPage()

The following examples show how to use org.apache.pdfbox.pdmodel.PDDocument#getPage(). The source file, project, and license are listed above each example.
Example 1
Source File: TableDrawer.java    From easytable with MIT License
public void draw(Supplier<PDDocument> documentSupplier, Supplier<PDPage> pageSupplier, float yOffset) throws IOException {
    final PDDocument document = documentSupplier.get();

    // We create one throwaway page to be able to calculate the page data upfront
    float startOnNewPage = pageSupplier.get().getMediaBox().getHeight() - yOffset;
    final Queue<PageData> pageDataQueue = computeRowsOnPagesWithNewPageStartOf(startOnNewPage);

    for (int i = 0; !pageDataQueue.isEmpty(); i++) {
        final PDPage pageToDrawOn;

        if (i > 0 || document.getNumberOfPages() == 0) {
            pageToDrawOn = pageSupplier.get();
            document.addPage(pageToDrawOn);
        } else {
            pageToDrawOn = document.getPage(document.getNumberOfPages() - 1);
        }

        try (final PDPageContentStream newPageContentStream = new PDPageContentStream(document, pageToDrawOn, APPEND, false)) {
            this.contentStream(newPageContentStream)
                    .page(pageToDrawOn)
                    .drawPage(pageDataQueue.poll());
        }

        startY(pageToDrawOn.getMediaBox().getHeight() - yOffset);
    }
}
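
The draw method above depends on knowing, before anything is drawn, which rows end up on which page. A minimal sketch of that kind of upfront calculation follows, assuming rows are described only by their heights. `RowPaginator`, `paginate`, and the parameter names are hypothetical and are not part of easytable or PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: greedily assigns row indices to pages by available height. */
final class RowPaginator {
    private RowPaginator() {}

    /**
     * @param rowHeights     height of each row, in page units
     * @param firstPageSpace vertical space left on the page where drawing starts
     * @param newPageSpace   vertical space available on every freshly added page
     * @return one list of row indices per page
     */
    static List<List<Integer>> paginate(float[] rowHeights, float firstPageSpace, float newPageSpace) {
        List<List<Integer>> pages = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        float remaining = firstPageSpace;
        for (int row = 0; row < rowHeights.length; row++) {
            // Start a new page when the row does not fit, unless the page is still
            // empty (simplification: an oversized row is placed alone and may overflow).
            if (rowHeights[row] > remaining && !current.isEmpty()) {
                pages.add(current);
                current = new ArrayList<>();
                remaining = newPageSpace;
            }
            current.add(row);
            remaining -= rowHeights[row];
        }
        if (!current.isEmpty()) {
            pages.add(current);
        }
        return pages;
    }

    public static void main(String[] args) {
        // Three rows of height 100, 150 units left on the first page, 250 on new pages.
        System.out.println(paginate(new float[] {100f, 100f, 100f}, 150f, 250f));
    }
}
```

Each inner list would then correspond to one pass of the drawing loop, with a new page requested for every list after the first.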
 
Example 2
Source File: TestClipPathFinder.java    From testarea-pdfbox2 with Apache License 2.0
/**
 * <a href="http://stackoverflow.com/questions/28321374/how-to-get-page-content-height-using-pdfbox">
 * How to get page content height using pdfbox
 * </a>
 * <br/>
 * <a href="http://d.pr/f/137PF">
 * test-pdf4.pdf
 * </a>
 * <br/>
 * <a href="http://d.pr/f/15uBF">
 * test-pdf5.pdf
 * </a>
 * <p>
 * The clip paths found here correspond to the Illustrator compound elements.
 * </p>
 */
@Test
public void testTestPdf5() throws IOException
{
    try (InputStream resource = getClass().getResourceAsStream("test-pdf5.pdf"))
    {
        System.out.println("test-pdf5.pdf");
        PDDocument document = Loader.loadPDF(resource);
        PDPage page = document.getPage(0);
        ClipPathFinder finder = new ClipPathFinder(page);
        finder.findClipPaths();
        
        for (Path path : finder)
        {
            System.out.println(path);
        }
        
        document.close();
    }
}
 
Example 3
Source File: AddLink.java    From testarea-pdfbox2 with Apache License 2.0
/**
 * <a href="https://stackoverflow.com/questions/54986135/how-to-use-pdfbox-to-create-a-link-i-can-click-to-go-to-another-page-in-the-same">
 * How to use PDFBox to create a link i can click to go to another page in the same document
 * </a>
 * <p>
 * The OP used destination.setPageNumber which is not ok for local
 * links. Furthermore, he forgot to add the link to the page and
 * to give it a rectangle.
 * </p>
 */
@Test
public void testAddLinkToMwb_I_201711() throws IOException {
    try (   InputStream resource = getClass().getResourceAsStream("/mkl/testarea/pdfbox2/content/mwb_I_201711.pdf")) {
        PDDocument document = Loader.loadPDF(resource);

        PDPage page = document.getPage(1);

        PDAnnotationLink link         = new PDAnnotationLink();
        PDPageDestination destination = new PDPageFitWidthDestination();
        PDActionGoTo action           = new PDActionGoTo();

        //destination.setPageNumber(2);
        destination.setPage(document.getPage(2));
        action.setDestination(destination);
        link.setAction(action);
        link.setPage(page);

        link.setRectangle(page.getMediaBox());
        page.getAnnotations().add(link);

        document.save(new File(RESULT_FOLDER, "mwb_I_201711-with-link.pdf"));
    }
}
 
Example 4
Source File: TestClipPathFinder.java    From testarea-pdfbox2 with Apache License 2.0
/**
 * <a href="http://stackoverflow.com/questions/28321374/how-to-get-page-content-height-using-pdfbox">
 * How to get page content height using pdfbox
 * </a>
 * <br/>
 * <a href="http://d.pr/f/137PF">
 * test-pdf4.pdf
 * </a>
 * <br/>
 * <a href="http://d.pr/f/15uBF">
 * test-pdf5.pdf
 * </a>
 * <p>
 * The clip paths found here correspond to the Illustrator compound elements.
 * </p>
 */
@Test
public void testTestPdf4() throws IOException
{
    try (InputStream resource = getClass().getResourceAsStream("test-pdf4.pdf"))
    {
        System.out.println("test-pdf4.pdf");
        PDDocument document = Loader.loadPDF(resource);
        PDPage page = document.getPage(0);
        ClipPathFinder finder = new ClipPathFinder(page);
        finder.findClipPaths();
        
        for (Path path : finder)
        {
            System.out.println(path);
        }
        
        document.close();
    }
}
 
Example 5
Source File: SetCropBox.java    From testarea-pdfbox2 with Apache License 2.0
/**
 * <a href="http://stackoverflow.com/questions/39689004/pdfbox-2-0-3-set-cropbox-using-textposition-coordinates">
 * PDFBox 2.0.3 Set cropBox using TextPosition coordinates
 * </a>
 * <br/>
 * <a href="http://downloadcenter.samsung.com/content/UM/201504/20150407095631744/ENG-US_NMATSCJ-1.103-0330.pdf">
 * ENG-US_NMATSCJ-1.103-0330.pdf
 * </a>
 * <p>
 * This test shows how to set the crop box on page twelve and render the cropped page as image.
 * </p>
 */
@Test
public void testSetCropBoxImgENG_US_NMATSCJ_1_103_0330() throws IOException
{
    try (   InputStream resource = getClass().getResourceAsStream("ENG-US_NMATSCJ-1.103-0330.pdf"))
    {
        PDDocument pdDocument = Loader.loadPDF(resource);
        PDPage page = pdDocument.getPage(12-1);
        page.setCropBox(new PDRectangle(40f, 680f, 510f, 100f));

        PDFRenderer renderer = new PDFRenderer(pdDocument);
        BufferedImage img = renderer.renderImage(12 - 1, 4f);
        ImageIOUtil.writeImage(img, new File(RESULT_FOLDER, "ENG-US_NMATSCJ-1.103-0330-page12cropped.jpg").getAbsolutePath(), 300);
        pdDocument.close();
    }
}
 
Example 6
Source File: MCRPdfThumbnailGenerator.java    From mycore with GNU General Public License v3.0
private PDPage resolveOpenActionPage(PDDocument pdf) throws IOException {
    PDDestinationOrAction openAction = pdf.getDocumentCatalog().getOpenAction();

    if( openAction instanceof PDActionGoTo){
        final PDDestination destination = ((PDActionGoTo) openAction).getDestination();
        if(destination instanceof PDPageDestination) {
            openAction = destination;
        }
    }

    if (openAction instanceof PDPageDestination) {
        final PDPageDestination namedDestination = (PDPageDestination) openAction;
        final PDPage pdPage = namedDestination.getPage();
        if (pdPage != null) {
            return pdPage;
        } else {
            int pageNumber = namedDestination.getPageNumber();
            if (pageNumber != -1) {
                return pdf.getPage(pageNumber);
            }
        }
    }

    return pdf.getPage(0);
}
 
Example 7
Source File: MCRPDFTools.java    From mycore with GNU General Public License v3.0
/**
 *
 * @param pdf - the pdf document
 * @return
 * @throws IOException
 *
 * @see org.mycore.media.services.MCRPdfThumbnailGenerator
 */
private static PDPage resolveOpenActionPage(PDDocument pdf) throws IOException {
    PDDestinationOrAction openAction = pdf.getDocumentCatalog().getOpenAction();

    if (openAction instanceof PDActionGoTo) {
        final PDDestination destination = ((PDActionGoTo) openAction).getDestination();
        if (destination instanceof PDPageDestination) {
            openAction = destination;
        }
    }

    if (openAction instanceof PDPageDestination) {
        final PDPageDestination namedDestination = (PDPageDestination) openAction;
        final PDPage pdPage = namedDestination.getPage();
        if (pdPage != null) {
            return pdPage;
        } else {
            int pageNumber = namedDestination.getPageNumber();
            if (pageNumber != -1) {
                return pdf.getPage(pageNumber);
            }
        }
    }

    return pdf.getPage(0);
}
 
Example 8
Source File: ExtractText.java    From testarea-pdfbox2 with Apache License 2.0
/**
 * <a href="https://stackoverflow.com/questions/45895768/pdfbox-2-0-7-extracttext-not-working-but-1-8-13-does-and-pdfreader-as-well">
 * PDFBox 2.0.7 ExtractText not working but 1.8.13 does and PDFReader as well
 * </a>
 * <br/>
 * <a href="https://wetransfer.com/downloads/214674449c23713ee481c5a8f529418320170827201941/b2bea6">
 * test-2.pdf
 * </a>
 * <p>
 * Due to the broken <b>ToUnicode</b> maps the output of immediate text
 * extraction from this document is unsatisfying, cf. {@link #testTest2()}.
 * It can be improved by removing these <b>ToUnicode</b> maps as this test
 * shows.
 * </p>
 */
@Test
public void testNoToUnicodeTest2() throws IOException
{
    try (   InputStream resource = getClass().getResourceAsStream("test-2.pdf")    )
    {
        PDDocument document = Loader.loadPDF(resource);

        for (int pageNr = 0; pageNr < document.getNumberOfPages(); pageNr++)
        {
            PDPage page = document.getPage(pageNr);
            PDResources resources = page.getResources();
            removeToUnicodeMaps(resources);
        }

        PDFTextStripper stripper = new PDFTextStripper();
        String text = stripper.getText(document);

        System.out.printf("\n*\n* test-2.pdf without ToUnicode\n*\n%s\n", text);
        Files.write(new File(RESULT_FOLDER, "test-2_NoToUnicode.txt").toPath(), Collections.singleton(text));
    }
}
 
Example 9
Source File: PdfComparator.java    From pdfcompare with Apache License 2.0
public static ImageWithDimension renderPageAsImage(final PDDocument document, final PDFRenderer expectedPdfRenderer, final int pageIndex, Environment environment)
        throws IOException {
    final BufferedImage bufferedImage = expectedPdfRenderer.renderImageWithDPI(pageIndex, environment.getDPI());
    final PDPage page = document.getPage(pageIndex);
    final PDRectangle mediaBox = page.getMediaBox();
    if (page.getRotation() == 90 || page.getRotation() == 270)
        return new ImageWithDimension(bufferedImage, mediaBox.getHeight(), mediaBox.getWidth());
    else
        return new ImageWithDimension(bufferedImage, mediaBox.getWidth(), mediaBox.getHeight());
}
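
Example 9 swaps width and height when the page is rotated by 90 or 270 degrees. The same decision can be isolated in a small helper. The sketch below works on plain floats rather than a PDRectangle; `RotatedPageSize` and `displaySize` are hypothetical names, and normalizing the angle first is an added assumption for callers that might pass a negative or larger-than-360 value.

```java
/** Hypothetical helper: returns the displayed {width, height} of a page for a given rotation. */
final class RotatedPageSize {
    private RotatedPageSize() {}

    static float[] displaySize(float mediaBoxWidth, float mediaBoxHeight, int rotationDegrees) {
        // Bring the angle into the range 0..359 so that -90 is treated like 270.
        int normalized = ((rotationDegrees % 360) + 360) % 360;
        boolean swap = normalized == 90 || normalized == 270;
        return swap
                ? new float[] {mediaBoxHeight, mediaBoxWidth}
                : new float[] {mediaBoxWidth, mediaBoxHeight};
    }

    public static void main(String[] args) {
        float[] size = displaySize(400f, 600f, 90);
        System.out.println(size[0] + " x " + size[1]);
    }
}
```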
 
Example 10
Source File: PDVisibleSignDesigner.java    From gcs with Mozilla Public License 2.0
/**
 * Each page of a document can have a different size. This method calculates the page size based on
 * the page media box.
 * 
 * @param document
 * @param page The 1-based page number for which the page size should be calculated.
 * @throws IllegalArgumentException if the page argument is lower than 1.
 */
private void calculatePageSize(PDDocument document, int page)
{
    if (page < 1)
    {
        throw new IllegalArgumentException("First page of pdf is 1, not " + page);
    }

    PDPage firstPage = document.getPage(page - 1);
    PDRectangle mediaBox = firstPage.getMediaBox();
    pageHeight(mediaBox.getHeight());
    pageWidth = mediaBox.getWidth();
    imageSizeInPercents = 100;
    rotation = firstPage.getRotation() % 360;
}
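
The method above accepts a 1-based page number and subtracts one before calling getPage, which expects a zero-based index. A minimal sketch of that conversion with an added upper-bound check follows; `PageNumbers` and `toPageIndex` are hypothetical names, and the page count is passed in as a plain int rather than read from a PDDocument.

```java
/** Hypothetical helper: converts a 1-based page number into a zero-based page index. */
final class PageNumbers {
    private PageNumbers() {}

    static int toPageIndex(int pageNumber, int numberOfPages) {
        if (pageNumber < 1 || pageNumber > numberOfPages) {
            throw new IllegalArgumentException(
                    "Page number must be between 1 and " + numberOfPages + ", not " + pageNumber);
        }
        return pageNumber - 1;
    }

    public static void main(String[] args) {
        // Page twelve of a twenty-page document.
        System.out.println(toPageIndex(12, 20));
    }
}
```

With PDFBox on the classpath, the result would typically be passed on as `document.getPage(PageNumbers.toPageIndex(page, document.getNumberOfPages()))`.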
 
Example 11
Source File: AddFormField.java    From testarea-pdfbox2 with Apache License 2.0
/**
     * <a href="https://stackoverflow.com/questions/46433388/pdfbox-could-not-find-font-helv">
     * PDFbox Could not find font: /Helv
     * </a>
     * <br/>
     * <a href="https://drive.google.com/file/d/0B2--NSDOiujoR3hOZFYteUl2UE0/view?usp=sharing">
     * 4.pdf
     * </a>
     * <p>
     * The cause is a combination of the OP and the source PDF not providing
     * a default appearance for the text field and PDFBox providing defaults
     * inconsequentially.
     * </p>
     * <p>
     * This is fixed here by setting the default appearance explicitly.
     * </p>
     */
    @Test
    public void testAddFieldLikeEugenePodoliako() throws IOException {
        try (   InputStream originalStream = getClass().getResourceAsStream("4.pdf") )
        {
            PDDocument pdf = Loader.loadPDF(originalStream);
            PDDocumentCatalog docCatalog = pdf.getDocumentCatalog();
            PDAcroForm acroForm = docCatalog.getAcroForm();
            PDPage page = pdf.getPage(0);

            PDTextField textBox = new PDTextField(acroForm);
            textBox.setPartialName("SampleField");
            acroForm.getFields().add(textBox);
            PDAnnotationWidget widget = textBox.getWidgets().get(0);
            PDRectangle rect = new PDRectangle(0, 0, 0, 0);
            widget.setRectangle(rect);
            widget.setPage(page);
//  Unnecessary code from OP
//            widget.setAppearance(acroForm.getFields().get(0).getWidgets().get(0).getAppearance());
//  Fix added to set default appearance accordingly
            textBox.setDefaultAppearance(acroForm.getFields().get(0).getCOSObject().getString("DA"));

            widget.setPrinted(false);

            page.getAnnotations().add(widget);

            acroForm.refreshAppearances();
            acroForm.flatten();
            pdf.save(new File(RESULT_FOLDER, "4-add-field.pdf"));
            pdf.close();
        }
    }
 
Example 12
Source File: PDFCreator.java    From Knowage-Server with GNU Affero General Public License v3.0
private static void writePageNumbering(PDDocument doc, PDFont font, float fontSize, PageNumbering pageNumbering) throws IOException {
	int totalPages = doc.getNumberOfPages();
	int numberOfPages = pageNumbering.isLastIncluded() ? doc.getNumberOfPages() : doc.getNumberOfPages() - 1;
	for (int pageIndex = pageNumbering.isFirstIncluded() ? 0 : 1; pageIndex < numberOfPages; pageIndex++) {
		String footer = "Page " + (pageIndex + 1) + " of " + totalPages;
		PDPage page = doc.getPage(pageIndex);
		PDRectangle pageSize = page.getMediaBox();
		float stringWidth = font.getStringWidth(footer) * fontSize / 1000f;
		float stringHeight = font.getFontDescriptor().getFontBoundingBox().getHeight() * fontSize / 1000f;

		int rotation = page.getRotation();
		boolean rotate = rotation == 90 || rotation == 270;
		float pageWidth = rotate ? pageSize.getHeight() : pageSize.getWidth();
		float pageHeight = rotate ? pageSize.getWidth() : pageSize.getHeight();
		float startX = rotate ? pageHeight / 2f : (pageWidth - stringWidth - stringHeight) / 2f;
		float startY = rotate ? (pageWidth - stringWidth) : stringHeight;

		// append the content to the existing stream
		try (PDPageContentStream contentStream = new PDPageContentStream(doc, page, AppendMode.APPEND, true, true)) {

			// draw rectangle
			contentStream.setNonStrokingColor(255, 255, 255); // white background (RGB 255, 255, 255)
			// Draw a white filled rectangle
			drawRect(contentStream, Color.WHITE, new java.awt.Rectangle((int) startX, (int) startY - 3, (int) stringWidth + 2, (int) stringHeight), true);
			writeText(contentStream, new Color(4, 44, 86), font, fontSize, rotate, startX, startY, footer);
		}
	}
}
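
The footer code above multiplies the result of `font.getStringWidth(footer)` by the font size and divides by 1000, because that width is reported in thousandths of a text-space unit. The arithmetic can be sketched without PDFBox as follows; `FooterLayout`, `textWidth`, and `centeredX` are hypothetical names, the input numbers are made up, and the centering formula is a simplified one rather than the exact expression used in the example.

```java
/** Hypothetical helper: footer layout arithmetic on plain numbers. */
final class FooterLayout {
    private FooterLayout() {}

    /** Converts a width given in 1/1000 text-space units into page units for a font size. */
    static float textWidth(float widthInGlyphUnits, float fontSize) {
        return widthInGlyphUnits * fontSize / 1000f;
    }

    /** Returns the x coordinate at which text of the given width is horizontally centered. */
    static float centeredX(float pageWidth, float textWidth) {
        return (pageWidth - textWidth) / 2f;
    }

    public static void main(String[] args) {
        float width = textWidth(5000f, 12f);  // assumed glyph-unit width and font size
        float startX = centeredX(612f, width); // assumed page width of 612 units
        System.out.println("width=" + width + ", startX=" + startX);
    }
}
```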
 
Example 13
Source File: AddImageSaveIncremental.java    From testarea-pdfbox2 with Apache License 2.0
/** @see #testAddImagesLikeUser11465050Improved() */
void addImageLikeUser11465050Improved(PDDocument document, PDImageXObject image) throws IOException {
    PDPage page = document.getPage(0);
    PDRectangle pageSize = page.getMediaBox();
    PDPageContentStream contentStream = new PDPageContentStream(document, page, PDPageContentStream.AppendMode.APPEND, true, true);
    contentStream.drawImage(image, pageSize.getLowerLeftX(), pageSize.getLowerLeftY(), pageSize.getWidth(), pageSize.getHeight());
    contentStream.close();

    page.getCOSObject().setNeedToBeUpdated(true);
    page.getResources().getCOSObject().setNeedToBeUpdated(true);
    page.getResources().getCOSObject().getCOSDictionary(COSName.XOBJECT).setNeedToBeUpdated(true);
    document.getDocumentCatalog().getPages().getCOSObject().setNeedToBeUpdated(true);
    document.getDocumentCatalog().getCOSObject().setNeedToBeUpdated(true);
}
 
Example 14
Source File: AddImage.java    From testarea-pdfbox2 with Apache License 2.0
/**
 * <a href="https://stackoverflow.com/questions/49958604/draw-image-at-mid-position-using-pdfbox-java">
 * Draw image at mid position using pdfbox Java
 * </a>
 * <p>
 * This is the OP's original code. It mirrors the image.
 * This can be fixed as shown in {@link #testImageAppendNoMirror()}.
 * </p>
 */
@Test
public void testImageAppendLikeShanky() throws IOException {
    try (   InputStream resource = getClass().getResourceAsStream("/mkl/testarea/pdfbox2/sign/test.pdf");
            InputStream imageResource = getClass().getResourceAsStream("Willi-1.jpg")   )
    {
        PDDocument doc = Loader.loadPDF(resource);
        PDImageXObject pdImage = PDImageXObject.createFromByteArray(doc, ByteStreams.toByteArray(imageResource), "Willi");

        int w = pdImage.getWidth();
        int h = pdImage.getHeight();

        PDPage page = doc.getPage(0);
        PDPageContentStream contentStream = new PDPageContentStream(doc, page, PDPageContentStream.AppendMode.APPEND, true);

        float x_pos = page.getCropBox().getWidth();
        float y_pos = page.getCropBox().getHeight();

        float x_adjusted = ( x_pos - w ) / 2;
        float y_adjusted = ( y_pos - h ) / 2;

        Matrix mt = new Matrix(1f, 0f, 0f, -1f, page.getCropBox().getLowerLeftX(), page.getCropBox().getUpperRightY());
        contentStream.transform(mt);
        contentStream.drawImage(pdImage, x_adjusted, y_adjusted, w, h);
        contentStream.close();

        doc.save(new File(RESULT_FOLDER, "test-with-image-shanky.pdf"));
        doc.close();

    }
}
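
The Javadoc above notes that this code mirrors the image. The cause is the matrix (1, 0, 0, -1, x, y): its vertical scale of -1 flips the y axis. The sketch below reproduces the effect with java.awt.geom.AffineTransform, whose six-argument constructor takes the values in the same order as the matrix in the example. `MirrorDemo` and its methods are hypothetical names, and the offset of 792 and the edge coordinates are made-up numbers.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

/** Hypothetical demo: shows how a (1, 0, 0, -1, tx, ty) matrix flips vertical order. */
final class MirrorDemo {
    private MirrorDemo() {}

    /** Builds the flip used in the example: (x, y) is mapped to (x + tx, -y + ty). */
    static AffineTransform verticalFlip(double tx, double ty) {
        return new AffineTransform(1, 0, 0, -1, tx, ty);
    }

    /** Returns the y coordinate of the point (x, y) after applying the transform. */
    static double mapY(AffineTransform transform, double x, double y) {
        return transform.transform(new Point2D.Double(x, y), null).getY();
    }

    public static void main(String[] args) {
        AffineTransform flip = verticalFlip(0, 792);
        double bottom = mapY(flip, 50, 100); // bottom edge of an image drawn at y = 100
        double top = mapY(flip, 50, 300);    // top edge of the same image, 200 units tall
        // After the flip the former top edge lies below the former bottom edge.
        System.out.println("bottom -> " + bottom + ", top -> " + top);
        // A negative determinant indicates that orientation is reversed.
        System.out.println("determinant = " + flip.getDeterminant());
    }
}
```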
 
Example 15
Source File: DenseMerging.java    From testarea-pdfbox2 with Apache License 2.0
/**
 * <a href="https://stackoverflow.com/questions/60052967/how-to-dense-merge-pdf-files-using-pdfbox-2-without-whitespace-near-page-breaks">
 * How to dense merge PDF files using PDFBox 2 without whitespace near page breaks?
 * </a>
 * <p>
 * This test checks the {@link PdfVeryDenseMergeTool} which allows
 * a very dense merging of multiple input PDFs.
 * </p>
 * <p>
 * Beware, as mentioned in the {@link PageVerticalAnalyzer} comments,
 * the processing in particular of curves is incorrect. The curve
 * used in this test is chosen not to create wrong results due to
 * this known issue.
 * </p>
 */
@Test
public void testVeryDenseMerging() throws IOException {
    PDDocument document1 = createTextDocument(new PDRectangle(0, 0, 400, 600), 
            Matrix.getTranslateInstance(30, 300),
            "Doc 1 line 1", "Doc 1 line 2", "Doc 1 line 3");
    PDDocument document2 = createTextDocument(new PDRectangle(0, 0, 400, 600), 
            Matrix.getTranslateInstance(40, 400),
            "Doc 2 line 1", "Doc 2 line 2", "Doc 2 line 3");
    PDDocument document3 = createTextDocument(new PDRectangle(0, -300, 400, 600), 
            Matrix.getTranslateInstance(50, -100),
            "Doc 3 line 1", "Doc 3 line 2", "Doc 3 line 3");
    PDDocument document4 = createTextDocument(new PDRectangle(-200, -300, 400, 600), 
            Matrix.getTranslateInstance(-140, -100),
            "Doc 4 line 1", "Doc 4 line 2", "Doc 4 line 3");
    PDDocument document5 = createTextDocument(new PDRectangle(-200, -300, 400, 600), 
            Matrix.getTranslateInstance(-140, -100),
            "Doc 5 line 1", "Doc 5 line 2", "Doc 5 line 3");
    PDDocument document6 = createTextDocument(new PDRectangle(-200, -300, 400, 600), 
            Matrix.getRotateInstance(Math.PI / 4, -120, 0),
            "Doc 6 line 1", "Doc 6 line 2", "Doc 6 line 3");
    try (   PDPageContentStream content = new PDPageContentStream(document6, document6.getPage(0), AppendMode.APPEND, false, true)) {
        content.setStrokingColor(Color.BLACK);
        content.moveTo(40, 40);
        content.lineTo(80, 80);
        content.lineTo(120, 100);
        content.stroke();

        content.moveTo(40, 140);
        content.curveTo(80, 140, 160, 140, 80, 180);
        content.closeAndFillAndStroke();
    }
    document6.save(new File(RESULT_FOLDER, "Test Text and Graphics.pdf"));

    PdfVeryDenseMergeTool tool = new PdfVeryDenseMergeTool(PDRectangle.A4, 30, 30, 10);
    tool.merge(new FileOutputStream(new File(RESULT_FOLDER, "Merge with Text and Graphics, very dense.pdf")),
            Arrays.asList(document1, document2, document3, document4, document5, document6,
                    document1, document2, document3, document4, document5, document6,
                    document1, document2, document3, document4, document5, document6,
                    document1, document2, document3, document4, document5, document6,
                    document1, document2, document3, document4, document5, document6));
}
 
Example 16
Source File: Debug.java    From tabula-java with MIT License
public static void renderPage(String pdfPath, String outPath, int pageNumber, Rectangle area,
                              boolean drawTextChunks, boolean drawSpreadsheets, boolean drawRulings, boolean drawIntersections,
                              boolean drawColumns, boolean drawCharacters, boolean drawArea, boolean drawCells,
                              boolean drawUnprocessedRulings, boolean drawProjectionProfile, boolean drawClippingPaths,
                              boolean drawDetectedTables) throws IOException {
    PDDocument document = PDDocument.load(new File(pdfPath));

    ObjectExtractor oe = new ObjectExtractor(document);

    Page page = oe.extract(pageNumber + 1);

    if (area != null) {
        page = page.getArea(area);
    }

    PDPage p = document.getPage(pageNumber);

    BufferedImage image = Utils.pageConvertToImage(p, 72, ImageType.RGB);

    Graphics2D g = (Graphics2D) image.getGraphics();

    if (drawTextChunks) {
        debugTextChunks(g, page);
    }
    if (drawSpreadsheets) {
        debugSpreadsheets(g, page);
    }
    if (drawRulings) {
        debugRulings(g, page);
    }
    if (drawIntersections) {
        debugIntersections(g, page);
    }
    if (drawColumns) {
        debugColumns(g, page);
    }
    if (drawCharacters) {
        debugCharacters(g, page);
    }
    if (drawArea) {
        g.setColor(Color.ORANGE);
        drawShape(g, area);
    }
    if (drawCells) {
        debugCells(g, area, page);
    }
    if (drawUnprocessedRulings) {
        debugNonCleanRulings(g, page);
    }
    if (drawProjectionProfile) {
        debugProjectionProfile(g, page);
    }
    if (drawClippingPaths) {
        // TODO: Enable when oe.clippingPaths is done
        //drawShapes(g, oe.clippingPaths,
        //		new BasicStroke(2f, BasicStroke.CAP_BUTT, BasicStroke.JOIN_MITER, 10f, new float[] { 3f }, 0f));
    }
    if (drawDetectedTables) {
        debugDetectedTables(g, page);
    }

    document.close();

    ImageIO.write(image, "jpg", new File(outPath));
}
 
Example 17
Source File: TestUtils.java    From tabula-java with MIT License
@Test
public void testJPEG2000DoesNotRaise() throws IOException {
    // try-with-resources closes the document even if the conversion throws
    try (PDDocument pdf_document = PDDocument.load(new File("src/test/resources/technology/tabula/jpeg2000.pdf"))) {
        PDPage page = pdf_document.getPage(0);
        Utils.pageConvertToImage(page, 360, ImageType.RGB);
    }
}
 
Example 18
Source File: RenderType3Character.java    From testarea-pdfbox2 with Apache License 2.0
/**
 * <a href="http://stackoverflow.com/questions/42032729/render-type3-font-character-as-image-using-pdfbox">
 * Render Type3 font character as image using PDFBox
 * </a>
 * <br/>
 * <a href="https://drive.google.com/file/d/0B0f6X4SAMh2KRDJTbm4tb3E1a1U/view">
 * 4700198773.pdf
 * </a>
 * from
 * <a href="http://stackoverflow.com/questions/37754112/extract-text-with-custom-font-result-non-readble">
 * extract text with custom font result non readble
 * </a>
 * <p>
 * This test shows how one can render individual Type 3 font glyphs as bitmaps.
 * Unfortunately PDFBox out-of-the-box does not provide a class to render contents
 * of arbitrary XObjects, merely for rendering pages; thus, we simply create a page
 * with the glyph in question and render that page.   
 * </p>
 * <p>
 * As the OP did not provide a sample PDF, we simply use one from another
 * stackoverflow question. There obviously might remain issues with the
 * OP's files.
 * </p>
 */
@Test
public void testRenderSdnList() throws IOException, IllegalAccessException, IllegalArgumentException, InvocationTargetException, NoSuchMethodException, SecurityException
{
    Method PDPageContentStreamWrite = PDPageContentStream.class.getSuperclass().getDeclaredMethod("write", String.class);
    PDPageContentStreamWrite.setAccessible(true);

    try (   InputStream resource = getClass().getResourceAsStream("sdnlist.pdf"))
    {
        PDDocument document = Loader.loadPDF(resource);

        PDPage page = document.getPage(1);
        PDResources pageResources = page.getResources();
        COSName f1Name = COSName.getPDFName("R144");
        PDType3Font fontF1 = (PDType3Font) pageResources.getFont(f1Name);
        Map<String, Integer> f1NameToCode = fontF1.getEncoding().getNameToCodeMap();

        COSDictionary charProcsDictionary = fontF1.getCharProcs();
        for (COSName key : charProcsDictionary.keySet())
        {
            COSStream stream = (COSStream) charProcsDictionary.getDictionaryObject(key);
            PDType3CharProc charProc = new PDType3CharProc(fontF1, stream);
            PDRectangle bbox = charProc.getGlyphBBox();
            if (bbox == null)
                bbox = charProc.getBBox();
            Integer code = f1NameToCode.get(key.getName());

            if (code != null)
            {
                PDDocument charDocument = new PDDocument();
                PDPage charPage = new PDPage(bbox);
                charDocument.addPage(charPage);
                charPage.setResources(pageResources);
                PDPageContentStream charContentStream = new PDPageContentStream(charDocument, charPage);
                charContentStream.beginText();
                charContentStream.setFont(fontF1, bbox.getHeight());
                // Zero-pad to two hex digits so that codes below 0x10 still form a full byte.
                //charContentStream.getOutputStream().write(String.format("<%02X> Tj\n", code).getBytes());
                PDPageContentStreamWrite.invoke(charContentStream, String.format("<%02X> Tj\n", code));
                charContentStream.endText();
                charContentStream.close();

                File result = new File(RESULT_FOLDER, String.format("sdnlist-%s-%s.png", key.getName(), code));
                PDFRenderer renderer = new PDFRenderer(charDocument);
                BufferedImage image = renderer.renderImageWithDPI(0, 96);
                ImageIO.write(image, "PNG", result);
                charDocument.save(new File(RESULT_FOLDER, String.format("sdnlist-%s-%s.pdf", key.getName(), code)));
                charDocument.close();
            }
        }
    }
}
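
When a show-text instruction with a hex string operand is written by hand, each single-byte character code needs two hex digits. A format width without the zero flag pads with a space instead, and a hex string with an odd number of digits is read as if a trailing 0 were appended. A minimal sketch of the formatting follows, assuming single-byte codes as used by simple fonts; `HexOperand` and `showGlyph` are hypothetical names.

```java
/** Hypothetical helper: formats a single-byte character code as a hex-string Tj instruction. */
final class HexOperand {
    private HexOperand() {}

    static String showGlyph(int code) {
        if (code < 0 || code > 0xFF) {
            throw new IllegalArgumentException("Expected a single-byte code, got " + code);
        }
        // %02X always yields two upper-case hex digits, e.g. 10 becomes 0A.
        return String.format("<%02X> Tj\n", code);
    }

    public static void main(String[] args) {
        System.out.print(showGlyph(10));
        // For comparison: width 2 without the zero flag pads with a space.
        System.out.println(String.format("<%2X>", 10));
    }
}
```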
 
Example 19
Source File: TestGraphicsCounter.java    From testarea-pdfbox2 with Apache License 2.0
/**
 * <a href="http://stackoverflow.com/questions/28321374/how-to-get-page-content-height-using-pdfbox">
 * How to get page content height using pdfbox
 * </a>
 * <br/>
 * <a href="https://drive.google.com/file/d/0B65bQnJhC1mvbEVQQ0o0QU9STlU/view?usp=sharing">
 * test.pdf
 * </a>, here as <code>test-rivu.pdf</code>
 * <p>
 * Rivu's code from a comment to count lines etc.
 * </p>
 */
@Test
public void testCountTestLikeRivu() throws IOException
{
    try (InputStream resource = getClass().getResourceAsStream("test-rivu.pdf"))
    {
        System.out.println("test-rivu.pdf");
        PDDocument document = Loader.loadPDF(resource);

        PDPage page = document.getPage(4);
        PDFStreamParser parser = new PDFStreamParser(page.getContents());
        List<Object> tokens = parser.parse();
        int lines=0;
        int curves=0;
        int rectangles=0;
        int doOps=0;
        int clipPaths=0;
        for (Object token:tokens){
            if (token instanceof Operator) {
                Operator op=(Operator) token;
                // Operator names are case-sensitive; XObjects are invoked with "Do".
                if ("Do".equals(op.getName()))
                    doOps+=1;
                else if ("W".equals(op.getName())|| "W*".equals(op.getName()))
                    clipPaths+=1;
                else if ("l".equals(op.getName()) || "h".equals(op.getName()))
                    lines+=1;
                else if ("c".equals(op.getName())||"y".equals(op.getName()) ||"v".equals(op.getName())){
                    System.out.println(op);
                    curves+=1;
                }
                else if ("re".equals(op.getName()))
                    rectangles+=1;


            }
        }
        System.out.println(lines + " lines, " + curves + " curves, " + rectangles + " rectangles, " + doOps + " xobjects, " + clipPaths + " clip paths");

        document.close();
    }
}
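
The counting logic above can be tried out without a PDF by working on operator names alone. The sketch below is a simplified stand-in that takes names as plain strings instead of parsed Operator tokens; `OperatorTally`, `categoryOf`, and `tally` are hypothetical names. Operator names in content streams are case-sensitive, so the XObject operator is matched as `Do`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: tallies content stream operator names by category. */
final class OperatorTally {
    private OperatorTally() {}

    /** Returns the category for an operator name, or null if the operator is not counted. */
    static String categoryOf(String operatorName) {
        switch (operatorName) {
            case "Do":
                return "xobjects";
            case "W":
            case "W*":
                return "clip paths";
            case "l":
            case "h": // as in the example, closepath is counted together with lines
                return "lines";
            case "c":
            case "v":
            case "y":
                return "curves";
            case "re":
                return "rectangles";
            default:
                return null;
        }
    }

    static Map<String, Integer> tally(List<String> operatorNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : operatorNames) {
            String category = categoryOf(name);
            if (category != null) {
                counts.merge(category, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // "do" and "m" fall through to the default branch and are not counted.
        System.out.println(tally(Arrays.asList("re", "Do", "do", "l", "h", "c", "W*", "m")));
    }
}
```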
 
Example 20
Source File: LayerUtility.java    From gcs with Mozilla Public License 2.0
/**
 * Imports a page from some PDF file as a Form XObject so it can be placed on another page
 * in the target document.
 * <p>
 * You may want to call {@link #wrapInSaveRestore(PDPage) wrapInSaveRestore(PDPage)} before invoking the Form XObject to
 * make sure that the graphics state is reset.
 * 
 * @param sourceDoc the source PDF document that contains the page to be copied
 * @param pageNumber the zero-based index of the page to be copied
 * @return a Form XObject containing the original page's content
 * @throws IOException if an I/O error occurs
 */
public PDFormXObject importPageAsForm(PDDocument sourceDoc, int pageNumber) throws IOException
{
    PDPage page = sourceDoc.getPage(pageNumber);
    return importPageAsForm(sourceDoc, page);
}