Java Code Examples for org.apache.avro.file.DataFileConstants#XZ_CODEC

The following examples show how to use org.apache.avro.file.DataFileConstants#XZ_CODEC. Each example links to the original project and source file.
Example 1
Source File: AvroSource.java    From beam with Apache License 2.0
/**
 * Decodes a byte array as an InputStream. The byte array may be compressed using some codec.
 * Reads from the returned stream will result in decompressed bytes.
 *
 * <p>This supports the same codecs as Avro's {@link CodecFactory}, namely those defined in
 * {@link DataFileConstants}.
 *
 * <ul>
 *   <li>"snappy" : Google's Snappy compression
 *   <li>"deflate" : deflate compression
 *   <li>"bzip2" : Bzip2 compression
 *   <li>"xz" : xz compression
 *   <li>"null" (the string, not the value): Uncompressed data
 * </ul>
 */
private static InputStream decodeAsInputStream(byte[] data, String codec) throws IOException {
  ByteArrayInputStream byteStream = new ByteArrayInputStream(data);
  switch (codec) {
    case DataFileConstants.SNAPPY_CODEC:
      return new SnappyCompressorInputStream(byteStream, 1 << 16 /* Avro uses 64KB blocks */);
    case DataFileConstants.DEFLATE_CODEC:
      // nowrap == true: Do not expect ZLIB header or checksum, as Avro does not write them.
      Inflater inflater = new Inflater(true);
      return new InflaterInputStream(byteStream, inflater);
    case DataFileConstants.XZ_CODEC:
      return new XZCompressorInputStream(byteStream);
    case DataFileConstants.BZIP2_CODEC:
      return new BZip2CompressorInputStream(byteStream);
    case DataFileConstants.NULL_CODEC:
      return byteStream;
    default:
      throw new IllegalArgumentException("Unsupported codec: " + codec);
  }
}
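
For context, here is a minimal, hypothetical sketch (not part of the Beam source) of producing an XZ-compressed Avro file whose blocks the method above could later decompress, using Avro's DataFileWriter with CodecFactory.xzCodec. The single-field schema and the output path are made up for illustration.

// Hypothetical sketch, not from AvroSource.java. Uses org.apache.avro.Schema,
// org.apache.avro.file.{CodecFactory, DataFileWriter}, org.apache.avro.generic.*.
Schema schema = new Schema.Parser().parse(
    "{\"type\":\"record\",\"name\":\"Bird\","
        + "\"fields\":[{\"name\":\"species\",\"type\":\"string\"}]}");
try (DataFileWriter<GenericRecord> writer =
    new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
  writer.setCodec(CodecFactory.xzCodec(6)); // DataFileConstants.XZ_CODEC ("xz"), level 6
  writer.create(schema, new File("/tmp/birds-xz.avro")); // illustrative output path
  GenericRecord record = new GenericData.Record(schema);
  record.put("species", "sparrow");
  writer.append(record);
}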
 
Example 2
Source File: AvroSourceTest.java    From beam with Apache License 2.0
@Test
public void testReadWithDifferentCodecs() throws Exception {
  // Test reading files generated using all codecs.
  String[] codecs = {
    DataFileConstants.NULL_CODEC,
    DataFileConstants.BZIP2_CODEC,
    DataFileConstants.DEFLATE_CODEC,
    DataFileConstants.SNAPPY_CODEC,
    DataFileConstants.XZ_CODEC,
  };
  // As Avro's default block size is 64KB, write 64K records to ensure at least one full block.
  // We could make this smaller than 64KB assuming each record is at least B bytes, but then the
  // test could silently stop testing the failure condition from BEAM-422.
  List<Bird> expected = createRandomRecords(1 << 16);

  for (String codec : codecs) {
    String filename =
        generateTestFile(
            codec, expected, SyncBehavior.SYNC_DEFAULT, 0, AvroCoder.of(Bird.class), codec);
    AvroSource<Bird> source = AvroSource.from(filename).withSchema(Bird.class);
    List<Bird> actual = SourceTestUtils.readFromSource(source, null);
    assertThat(expected, containsInAnyOrder(actual.toArray()));
  }
}
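
The generateTestFile helper is not shown on this page. As a rough sketch of what writing one file per codec could look like, Avro's CodecFactory.fromString maps a codec name such as "xz" to the matching codec; the ReflectData-based handling of Bird below is an assumption for illustration, not the actual helper.

// Hypothetical sketch only; not the actual generateTestFile from AvroSourceTest.
Schema schema = ReflectData.get().getSchema(Bird.class);
File file = File.createTempFile("birds-" + codec, ".avro");
try (DataFileWriter<Bird> writer =
    new DataFileWriter<>(new ReflectDatumWriter<Bird>(schema))) {
  writer.setCodec(CodecFactory.fromString(codec)); // e.g. "xz" -> XZ codec
  writer.create(schema, file);
  for (Bird bird : expected) {
    writer.append(bird);
  }
}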
 
Example 3
Source File: AvroSourceTest.java    From beam with Apache License 2.0
@Test
public void testReadMetadataWithCodecs() throws Exception {
  // Test reading files generated using all codecs.
  String[] codecs = {
    DataFileConstants.NULL_CODEC,
    DataFileConstants.BZIP2_CODEC,
    DataFileConstants.DEFLATE_CODEC,
    DataFileConstants.SNAPPY_CODEC,
    DataFileConstants.XZ_CODEC
  };
  List<Bird> expected = createRandomRecords(DEFAULT_RECORD_COUNT);

  for (String codec : codecs) {
    String filename =
        generateTestFile(
            codec, expected, SyncBehavior.SYNC_DEFAULT, 0, AvroCoder.of(Bird.class), codec);

    Metadata fileMeta = FileSystems.matchSingleFileSpec(filename);
    AvroMetadata metadata = AvroSource.readMetadataFromFile(fileMeta.resourceId());
    assertEquals(codec, metadata.getCodec());
  }
}
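
For comparison, the codec name can also be read without Beam via Avro's DataFileStream, which exposes the same "avro.codec" metadata entry that the AvroMetadata assertion above checks; the file path below is hypothetical.

// Hedged sketch: reading the codec string straight from Avro file metadata.
try (DataFileStream<GenericRecord> stream = new DataFileStream<>(
    new FileInputStream("/tmp/birds-xz.avro"), new GenericDatumReader<GenericRecord>())) {
  String codecName = stream.getMetaString(DataFileConstants.CODEC); // key "avro.codec"
  System.out.println("codec = " + codecName); // e.g. "xz"
}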