Java Code Examples for org.apache.avro.file.DataFileConstants#SYNC_SIZE

The following examples show how to use org.apache.avro.file.DataFileConstants#SYNC_SIZE.
Example 1
Source File: AvroSource.java    From beam with Apache License 2.0
/**
 * Reads the {@link AvroMetadata} from the header of an Avro file.
 *
 * <p>This method parses the header of an Avro <a
 * href="https://avro.apache.org/docs/1.7.7/spec.html#Object+Container+Files">Object Container
 * File</a>.
 *
 * @throws IOException if the file is an invalid format.
 */
@VisibleForTesting
static AvroMetadata readMetadataFromFile(ResourceId fileResource) throws IOException {
  String codec = null;
  String schemaString = null;
  byte[] syncMarker;
  try (InputStream stream = Channels.newInputStream(FileSystems.open(fileResource))) {
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(stream, null);

    // The header of an object container file begins with a four-byte magic number, followed
    // by the file metadata (including the schema and codec), encoded as a map. Finally, the
    // header ends with the file's 16-byte sync marker.
    // See https://avro.apache.org/docs/1.7.7/spec.html#Object+Container+Files for details on
    // the encoding of container files.

    // Read the magic number.
    byte[] magic = new byte[DataFileConstants.MAGIC.length];
    decoder.readFixed(magic);
    if (!Arrays.equals(magic, DataFileConstants.MAGIC)) {
      throw new IOException("Missing Avro file signature: " + fileResource);
    }

    // Read the metadata to find the codec and schema.
    ByteBuffer valueBuffer = ByteBuffer.allocate(512);
    long numRecords = decoder.readMapStart();
    while (numRecords > 0) {
      for (long recordIndex = 0; recordIndex < numRecords; recordIndex++) {
        String key = decoder.readString();
        // readBytes() clears the buffer and returns a buffer where:
        // - position is the start of the bytes read
        // - limit is the end of the bytes read
        valueBuffer = decoder.readBytes(valueBuffer);
        byte[] bytes = new byte[valueBuffer.remaining()];
        valueBuffer.get(bytes);
        if (key.equals(DataFileConstants.CODEC)) {
          codec = new String(bytes, StandardCharsets.UTF_8);
        } else if (key.equals(DataFileConstants.SCHEMA)) {
          schemaString = new String(bytes, StandardCharsets.UTF_8);
        }
      }
      numRecords = decoder.mapNext();
    }
    if (codec == null) {
      codec = DataFileConstants.NULL_CODEC;
    }

    // Finally, read the sync marker.
    syncMarker = new byte[DataFileConstants.SYNC_SIZE];
    decoder.readFixed(syncMarker);
  }
  checkState(schemaString != null, "No schema present in Avro file metadata %s", fileResource);
  return new AvroMetadata(syncMarker, codec, schemaString);
}
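
The header layout described in the comments of Example 1 (magic number, metadata map, sync marker) can also be walked by hand. The following is a minimal sketch that uses only the JDK, not the Avro library; the class AvroHeaderSketch, its Header holder, and the helper names are hypothetical and exist only for illustration. It assumes the layout from the Avro specification: a four-byte magic "Obj" followed by the byte 1, a map encoded as blocks of zigzag-varint counts with length-prefixed keys and values, and a 16-byte sync marker. Metadata values are decoded as UTF-8 strings for simplicity, which suits the schema and codec entries but not arbitrary binary metadata.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class AvroHeaderSketch {
  // Local copies of the values that DataFileConstants.MAGIC and SYNC_SIZE hold.
  static final byte[] MAGIC = {(byte) 'O', (byte) 'b', (byte) 'j', (byte) 1};
  static final int SYNC_SIZE = 16;

  /** Hypothetical holder for the parsed header fields. */
  static final class Header {
    final Map<String, String> meta = new LinkedHashMap<>();
    byte[] sync;
  }

  /** Reads one Avro long: a little-endian base-128 varint, zigzag encoded. */
  static long readLong(InputStream in) throws IOException {
    long raw = 0;
    for (int shift = 0; shift < 64; shift += 7) {
      int b = in.read();
      if (b < 0) throw new EOFException("Truncated varint");
      raw |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) return (raw >>> 1) ^ -(raw & 1); // zigzag decode
    }
    throw new IOException("Varint too long");
  }

  /** Reads exactly n bytes, failing with EOFException on a short read. */
  static byte[] readFixed(InputStream in, int n) throws IOException {
    byte[] out = new byte[n];
    new DataInputStream(in).readFully(out);
    return out;
  }

  /** Reads a length-prefixed byte sequence and decodes it as UTF-8. */
  static String readString(InputStream in) throws IOException {
    return new String(readFixed(in, (int) readLong(in)), StandardCharsets.UTF_8);
  }

  static Header readHeader(InputStream in) throws IOException {
    if (!Arrays.equals(readFixed(in, MAGIC.length), MAGIC)) {
      throw new IOException("Missing Avro file signature");
    }
    Header header = new Header();
    // A map is a series of blocks; a count of zero ends the map.
    for (long count = readLong(in); count != 0; count = readLong(in)) {
      if (count < 0) {
        count = -count;
        readLong(in); // a negative count is followed by the block size in bytes
      }
      for (long i = 0; i < count; i++) {
        String key = readString(in);
        header.meta.put(key, readString(in));
      }
    }
    // The header ends with the file's sync marker.
    header.sync = readFixed(in, SYNC_SIZE);
    return header;
  }

  public static void main(String[] args) throws IOException {
    // Build a tiny header by hand: one metadata entry and an all-zero sync marker.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(MAGIC);
    out.write(2);  // zigzag(1): one entry in this block
    out.write(20); // zigzag(10): key length
    out.write("avro.codec".getBytes(StandardCharsets.UTF_8));
    out.write(8);  // zigzag(4): value length
    out.write("null".getBytes(StandardCharsets.UTF_8));
    out.write(0);  // end of map
    out.write(new byte[SYNC_SIZE]);
    Header header = readHeader(new ByteArrayInputStream(out.toByteArray()));
    System.out.println(header.meta + " sync bytes: " + header.sync.length);
  }
}
```

In real code the Avro BinaryDecoder used in Example 1 is the better choice; the sketch only makes visible why the last read in that method asks for exactly SYNC_SIZE bytes.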
 
Example 2
Source File: FlumeEventAvroEventDeserializer.java    From mt-flume with Apache License 2.0
@Override
public void mark() throws IOException {
  // Step back by the size of one sync marker, clamping at the start of the file.
  long pos = fileReader.previousSync() - DataFileConstants.SYNC_SIZE;
  if (pos < 0) pos = 0;
  ((RemoteMarkable) ris).markPosition(pos);
}
 
Example 3
Source File: AvroEventDeserializer.java    From mt-flume with Apache License 2.0
@Override
public void mark() throws IOException {
  // Step back by the size of one sync marker, clamping at the start of the file.
  long pos = fileReader.previousSync() - DataFileConstants.SYNC_SIZE;
  if (pos < 0) pos = 0;
  ((RemoteMarkable) ris).markPosition(pos);
}
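
The offset arithmetic in the mark() methods above can be isolated into a small helper. This is a sketch only: the class SyncMarkSketch and the method markPosition are hypothetical names, and it assumes that previousSync() reports an offset just past a sync marker, so that subtracting SYNC_SIZE lands on the first byte of that marker.

```java
public class SyncMarkSketch {
  // Same value as DataFileConstants.SYNC_SIZE, copied so the sketch needs no Avro dependency.
  static final int SYNC_SIZE = 16;

  /** Returns the offset to mark: one sync marker before previousSync, never negative. */
  static long markPosition(long previousSync) {
    return Math.max(0L, previousSync - SYNC_SIZE);
  }

  public static void main(String[] args) {
    System.out.println(markPosition(100)); // prints 84
    System.out.println(markPosition(5));   // prints 0, clamped to the start of the file
  }
}
```

Math.max expresses the same clamp as the explicit "if (pos < 0) pos = 0" check in the examples.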