org.apache.pdfbox.io.RandomAccessFile Java Examples

The following examples show how to use org.apache.pdfbox.io.RandomAccessFile.
Example #1
Source File: PDF2TextExample.java    From tutorials with MIT License
private static void generateTxtFromPDF(String filename) throws IOException {
	File f = new File(filename);
	String parsedText;
	// RandomAccessFile here is org.apache.pdfbox.io.RandomAccessFile, not java.io's
	PDFParser parser = new PDFParser(new RandomAccessFile(f, "r"));
	parser.parse();

	// try-with-resources closes both documents even if text extraction throws
	try (COSDocument cosDoc = parser.getDocument();
			PDDocument pdDoc = new PDDocument(cosDoc)) {
		PDFTextStripper pdfStripper = new PDFTextStripper();
		parsedText = pdfStripper.getText(pdDoc);
	}

	// close the writer even if writing fails
	try (PrintWriter pw = new PrintWriter("src/output/pdf.txt")) {
		pw.print(parsedText);
	}
}
 
Example #2
Source File: TxtCreator.java    From pdf-converter with Apache License 2.0
public void process(File pdf, File output){
    File tmpfile = null;
    try {//Kudos for closing: http://stackoverflow.com/questions/156508/closing-a-java-fileinputstream
        // Scratch file that loadNonSeq uses as temporary storage while parsing
        tmpfile = File.createTempFile(String.format("txttmp-%s", UUID.randomUUID()), null);
        // Resources are closed in reverse order (writer, pdDoc, raf), even if an earlier step fails
        try (RandomAccessFile raf = new RandomAccessFile(tmpfile, "rw");
             PDDocument pdDoc = PDDocument.loadNonSeq(pdf, raf);
             FileWriter writer = new FileWriter(output)) {
            PDFTextStripper stripper = new PDFTextStripper();
            int numberOfPages = pdDoc.getNumberOfPages();

            // Page numbers are 1-based; extract and flush one page at a time
            for (int j = 1; j <= numberOfPages; j++) {
                stripper.setStartPage(j);
                stripper.setEndPage(j);
                writer.write(stripper.getText(pdDoc));
                writer.flush();
            }
        }
    } catch (IOException ioe) {
        log.warn(String.format("Failed to create txt for file: %s", pdf.getName()), ioe);
    } finally {
        // Remove the scratch file whether or not the conversion succeeded
        if (tmpfile != null) {
            tmpfile.delete();
        }
    }
}
 
Example #3
Source File: FDFParser.java    From gcs with Mozilla Public License 2.0
/**
 * Constructs a parser for the given FDF file.
 * 
 * @param file the FDF file to be parsed
 * 
 * @throws IOException If something went wrong.
 */
public FDFParser(File file) throws IOException
{
    super(new RandomAccessFile(file, "r"));
    fileLen = file.length();
    init();
}
 
Example #4
Source File: CCITTFactory.java    From gcs with Mozilla Public License 2.0
/**
 * Creates a new CCITT Fax compressed image XObject from a specific image of a TIFF file. Only
 * single-strip CCITT T4 or T6 compressed TIFF files are supported. If you're not sure what TIFF
 * files you have, use
 * {@link LosslessFactory#createFromImage(PDDocument, BufferedImage) }
 * or {@link CCITTFactory#createFromImage(PDDocument, BufferedImage) }
 * instead.
 *
 * @param document the document to create the image as part of.
 * @param file the TIFF file which contains a suitable CCITT compressed image
 * @param number TIFF image number, starting from 0
 * @return a new Image XObject
 * @throws IOException if there is an error reading the TIFF data.
 */
public static PDImageXObject createFromFile(PDDocument document, File file, int number)
        throws IOException
{
    RandomAccessFile raf = new RandomAccessFile(file, "r");
    try
    {
        return createFromRandomAccessImpl(document, raf, number);
    }
    finally
    {
        raf.close();
    }
}
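
Example #4 closes the file in a finally block. On Java 7 and later the same guarantee can be written with try-with-resources. The following is a minimal sketch of that pattern, not code from any of the projects above. It uses the JDK's java.io.RandomAccessFile rather than the PDFBox class so that it runs without PDFBox on the classpath; the class name RandomAccessCloseSketch and the helper readAt are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;

public class RandomAccessCloseSketch {

    // Hypothetical helper: reads up to `count` bytes starting at `offset`.
    // The file is closed automatically when the try block exits,
    // whether it exits normally or because of an exception.
    public static byte[] readAt(File file, long offset, int count) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);
            byte[] buffer = new byte[count];
            // A single read may return fewer bytes than requested, or -1 at end of file
            int read = raf.read(buffer, 0, count);
            return read <= 0 ? new byte[0] : Arrays.copyOf(buffer, read);
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("raf-sketch", ".bin");
        try {
            // Sample content standing in for the start of a PDF file
            Files.write(tmp.toPath(), "%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII));
            byte[] header = readAt(tmp, 0, 8);
            System.out.println(new String(header, StandardCharsets.US_ASCII));
        } finally {
            tmp.delete();
        }
    }
}
```

Compared with an explicit finally block, try-with-resources keeps the close call next to the point where the file is opened and still runs it if seek or read throws.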