Java Code Examples for org.apache.parquet.bytes.BytesUtils#readIntLittleEndian()

The following examples show how to use org.apache.parquet.bytes.BytesUtils#readIntLittleEndian().
Example 1
Source File: FooterGatherer.java    From Bats with Apache License 2.0
/**
 * An updated footer reader that tries to read the entire footer without knowing the length.
 * This should reduce the amount of seek/read roundtrips in most workloads.
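 * @param config the configuration used to resolve the file system of the file
 * @param status the status of the file whose footer is read
 * @return the footer parsed from the end of the file
 * @throws IOException if the file cannot be opened or read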
 */
public static Footer readFooter(final Configuration config, final FileStatus status) throws IOException {
  final FileSystem fs = status.getPath().getFileSystem(config);
  try(FSDataInputStream file = fs.open(status.getPath())) {

    final long fileLength = status.getLen();
    Preconditions.checkArgument(fileLength >= MIN_FILE_SIZE, "%s is not a Parquet file (too small)", status.getPath());

    int len = (int) Math.min( fileLength, (long) DEFAULT_READ_SIZE);
    byte[] footerBytes = new byte[len];
    readFully(file, fileLength - len, footerBytes, 0, len);

    checkMagicBytes(status, footerBytes, footerBytes.length - ParquetFileWriter.MAGIC.length);
    final int size = BytesUtils.readIntLittleEndian(footerBytes, footerBytes.length - FOOTER_METADATA_SIZE);

    if(size > footerBytes.length - FOOTER_METADATA_SIZE){
      // if the footer is larger than our initial read, we need to read the rest.
      byte[] origFooterBytes = footerBytes;
      int origFooterRead = origFooterBytes.length - FOOTER_METADATA_SIZE;

      footerBytes = new byte[size];

      readFully(file, fileLength - size - FOOTER_METADATA_SIZE, footerBytes, 0, size - origFooterRead);
      System.arraycopy(origFooterBytes, 0, footerBytes, size - origFooterRead, origFooterRead);
    }else{
      int start = footerBytes.length - (size + FOOTER_METADATA_SIZE);
      footerBytes = ArrayUtils.subarray(footerBytes, start, start + size);
    }

    final ByteArrayInputStream from = new ByteArrayInputStream(footerBytes);
    ParquetMetadata metadata = ParquetFormatPlugin.parquetMetadataConverter.readParquetMetadata(from, NO_FILTER);
    Footer footer = new Footer(status.getPath(), metadata);
    return footer;
  }
}
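
The footer lookup in Example 1 relies on the Parquet file tail: a 4-byte little-endian footer length followed by the 4-byte magic "PAR1". The sketch below illustrates only the case where the footer fits in the bytes already read, using the JDK alone. The class FooterTailSketch, its constants, and its readIntLittleEndian helper are hypothetical stand-ins written for this illustration; they are not the Bats or parquet-mr implementations.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FooterTailSketch {
  // A Parquet file ends with a 4-byte little-endian footer length, then the magic "PAR1".
  static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);
  static final int FOOTER_LENGTH_SIZE = 4;
  static final int TAIL_SIZE = FOOTER_LENGTH_SIZE + MAGIC.length; // 8 bytes

  /** Hypothetical stand-in for BytesUtils.readIntLittleEndian(byte[], int). */
  public static int readIntLittleEndian(byte[] bytes, int offset) {
    return ByteBuffer.wrap(bytes, offset, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
  }

  /** Extracts the footer from a buffer that holds the complete end of a file. */
  public static byte[] extractFooter(byte[] tail) {
    int magicStart = tail.length - MAGIC.length;
    if (magicStart < FOOTER_LENGTH_SIZE
        || !Arrays.equals(Arrays.copyOfRange(tail, magicStart, tail.length), MAGIC)) {
      throw new IllegalArgumentException("missing trailing magic");
    }
    // The footer length sits immediately before the magic bytes.
    int size = readIntLittleEndian(tail, tail.length - TAIL_SIZE);
    int start = tail.length - TAIL_SIZE - size;
    if (size < 0 || start < 0) {
      // Example 1 handles this case with a second read; the sketch just rejects it.
      throw new IllegalArgumentException("footer does not fit in buffer: " + size);
    }
    return Arrays.copyOfRange(tail, start, start + size);
  }

  public static void main(String[] args) {
    byte[] footer = {10, 20, 30};
    ByteBuffer file = ByteBuffer.allocate(2 + footer.length + TAIL_SIZE)
        .order(ByteOrder.LITTLE_ENDIAN);
    file.put(new byte[] {1, 2});   // stand-in for column data
    file.put(footer);              // stand-in for serialized metadata
    file.putInt(footer.length);    // 4-byte little-endian footer length
    file.put(MAGIC);
    System.out.println(Arrays.toString(extractFooter(file.array()))); // [10, 20, 30]
  }
}
```

Reading a fixed-size tail first and falling back to a second read only when the decoded length exceeds it is what lets Example 1 avoid an extra seek in the common case.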
 
Example 2
Source File: RunLengthBitPackingHybridValuesReader.java    From parquet-mr with Apache License 2.0
@Override
public void initFromPage(int valueCountL, ByteBufferInputStream stream) throws IOException {
  int length = BytesUtils.readIntLittleEndian(stream);
  this.decoder = new RunLengthBitPackingHybridDecoder(
      bitWidth, stream.sliceStream(length));

  // 4 is for the length which is stored as 4 bytes little endian
  updateNextOffset(length + 4);
}
 
Example 3
Source File: BinaryPlainValuesReader.java    From parquet-mr with Apache License 2.0
@Override
public Binary readBytes() {
  try {
    int length = BytesUtils.readIntLittleEndian(in);
    return Binary.fromConstantByteBuffer(in.slice(length));
  } catch (IOException | RuntimeException e) {
    throw new ParquetDecodingException("could not read bytes at offset " + in.position(), e);
  }
}
 
Example 4
Source File: BinaryPlainValuesReader.java    From parquet-mr with Apache License 2.0
@Override
public void skip() {
  try {
    int length = BytesUtils.readIntLittleEndian(in);
    in.skipFully(length);
  } catch (IOException | RuntimeException e) {
    throw new ParquetDecodingException("could not skip bytes at offset " + in.position(), e);
  }
}
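
Examples 2 to 4 share one pattern: each value is prefixed by a 4-byte little-endian length, so a reader either consumes the payload or jumps over it, advancing by length + 4 bytes in total. The sketch below reproduces that pattern with a plain java.nio.ByteBuffer instead of parquet-mr's ByteBufferInputStream. The class LengthPrefixedSketch and its methods are hypothetical names chosen for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class LengthPrefixedSketch {
  /** Reads one value: a 4-byte little-endian length, then that many payload bytes. */
  public static byte[] readBytes(ByteBuffer in) {
    int length = in.getInt();
    byte[] value = new byte[length];
    in.get(value);
    return value;
  }

  /** Skips one value without copying its payload. */
  public static void skip(ByteBuffer in) {
    int length = in.getInt();
    in.position(in.position() + length);
  }

  /** Encodes each string as a 4-byte little-endian length followed by its UTF-8 bytes. */
  public static ByteBuffer encode(String... values) {
    byte[][] encoded = new byte[values.length][];
    int total = 0;
    for (int i = 0; i < values.length; i++) {
      encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
      total += 4 + encoded[i].length;
    }
    ByteBuffer out = ByteBuffer.allocate(total).order(ByteOrder.LITTLE_ENDIAN);
    for (byte[] bytes : encoded) {
      out.putInt(bytes.length);
      out.put(bytes);
    }
    out.flip(); // switch from writing to reading; the byte order is unchanged
    return out;
  }

  public static void main(String[] args) {
    ByteBuffer page = encode("ab", "cde");
    skip(page);                          // jumps over the 4-byte length and "ab"
    System.out.println(page.position()); // 6
    System.out.println(new String(readBytes(page), StandardCharsets.UTF_8)); // cde
  }
}
```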