Java Code Examples for org.apache.beam.sdk.coders.AvroCoder#of()

The following examples show how to use org.apache.beam.sdk.coders.AvroCoder#of(). They are taken from open-source projects; the source file, project, and license are noted above each example.
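As a quick orientation before the examples, AvroCoder.of() has two common forms: AvroCoder.of(Class<T>) derives the Avro schema from the class (via reflection, or from the generated schema for Avro specific records), while AvroCoder.of(Schema) and AvroCoder.of(Class<T>, Schema) take an explicit schema, typically for GenericRecord. The sketch below is illustrative only and assumes a hypothetical User class; it is not taken from any of the projects listed here.

// Minimal sketch (hypothetical User class): building AvroCoders and attaching one to a PCollection.
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class AvroCoderOfSketch {

  // Hypothetical POJO; AvroCoder.of(Class) infers its Avro schema via reflection.
  public static class User {
    public String name;
    public int favoriteNumber;
  }

  public static void main(String[] args) {
    // Coder for a reflective (or Avro-generated) class.
    AvroCoder<User> userCoder = AvroCoder.of(User.class);

    // Coder for GenericRecord requires an explicit schema.
    Schema schema =
        new Schema.Parser()
            .parse(
                "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
                    + "{\"name\":\"name\",\"type\":\"string\"},"
                    + "{\"name\":\"favoriteNumber\",\"type\":\"int\"}]}");
    AvroCoder<GenericRecord> genericCoder = AvroCoder.of(schema);

    // Attach the reflective coder to a PCollection created from an in-memory element.
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());
    User user = new User();
    user.name = "Ada";
    user.favoriteNumber = 7;
    PCollection<User> users = pipeline.apply(Create.of(user).withCoder(userCoder));
    pipeline.run().waitUntilFinish();
  }
}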
Example 1
Source File: HadoopFormatIOReadTest.java    From beam with Apache License 2.0
/**
 * This test validates behavior of {@link
 * HadoopInputFormatBoundedSource#computeSplitsIfNecessary() computeSplits()} when Hadoop
 * InputFormat's {@link InputFormat#getSplits(JobContext)} returns an empty list.
 */
@Test
public void testComputeSplitsIfGetSplitsReturnsEmptyList() throws Exception {
  InputFormat<?, ?> mockInputFormat = Mockito.mock(EmployeeInputFormat.class);
  SerializableSplit mockInputSplit = Mockito.mock(SerializableSplit.class);
  Mockito.when(mockInputFormat.getSplits(Mockito.any(JobContext.class)))
      .thenReturn(new ArrayList<>());
  HadoopInputFormatBoundedSource<Text, Employee> hifSource =
      new HadoopInputFormatBoundedSource<>(
          serConf,
          WritableCoder.of(Text.class),
          AvroCoder.of(Employee.class),
          null, // No key translation required.
          null, // No value translation required.
          mockInputSplit);
  thrown.expect(IOException.class);
  thrown.expectMessage("Error in computing splits, getSplits() returns a empty list");
  hifSource.setInputFormatObj(mockInputFormat);
  hifSource.computeSplitsIfNecessary();
}
 
Example 2
Source File: HadoopFormatIOReadTest.java    From beam with Apache License 2.0
/**
 * This test validates behavior of HadoopInputFormatBoundedSource if {@link
 * InputFormat#createRecordReader(InputSplit, TaskAttemptContext)} of the InputFormat returns
 * null.
 */
@Test
public void testReadWithNullCreateRecordReader() throws Exception {
  InputFormat<Text, Employee> mockInputFormat = Mockito.mock(EmployeeInputFormat.class);
  thrown.expect(IOException.class);
  thrown.expectMessage(
      String.format("Null RecordReader object returned by %s", mockInputFormat.getClass()));
  Mockito.when(
          mockInputFormat.createRecordReader(
              Mockito.any(InputSplit.class), Mockito.any(TaskAttemptContext.class)))
      .thenReturn(null);
  HadoopInputFormatBoundedSource<Text, Employee> boundedSource =
      new HadoopInputFormatBoundedSource<>(
          serConf,
          WritableCoder.of(Text.class),
          AvroCoder.of(Employee.class),
          null, // No key translation required.
          null, // No value translation required.
          new SerializableSplit());
  boundedSource.setInputFormatObj(mockInputFormat);
  SourceTestUtils.readFromSource(boundedSource, p.getOptions());
}
 
Example 3
Source File: HadoopFormatIOReadTest.java    From beam with Apache License 2.0
/**
 * This test validates behavior of {@link HadoopInputFormatBoundedSource} if RecordReader object
 * creation fails.
 */
@Test
public void testReadIfCreateRecordReaderFails() throws Exception {
  thrown.expect(Exception.class);
  thrown.expectMessage("Exception in creating RecordReader");
  InputFormat<Text, Employee> mockInputFormat = Mockito.mock(EmployeeInputFormat.class);
  Mockito.when(
          mockInputFormat.createRecordReader(
              Mockito.any(InputSplit.class), Mockito.any(TaskAttemptContext.class)))
      .thenThrow(new IOException("Exception in creating RecordReader"));
  HadoopInputFormatBoundedSource<Text, Employee> boundedSource =
      new HadoopInputFormatBoundedSource<>(
          serConf,
          WritableCoder.of(Text.class),
          AvroCoder.of(Employee.class),
          null, // No key translation required.
          null, // No value translation required.
          new SerializableSplit());
  boundedSource.setInputFormatObj(mockInputFormat);
  SourceTestUtils.readFromSource(boundedSource, p.getOptions());
}
 
Example 4
Source File: JDBCInputPTransformRuntime.java    From components with Apache License 2.0
@Override
public ValidationResult initialize(RuntimeContainer container, JDBCInputProperties properties) {
    this.properties = properties;

    // In Beam, JdbcIO always has a repartition event, so we must fetch the schema before any
    // processing occurs in the nodes.
    Schema schema = properties.getDatasetProperties().main.schema.getValue();
    if (schema == null || AvroUtils.isSchemaEmpty(schema) || AvroUtils.isIncludeAllFields(schema)) {
        JDBCDatasetRuntime schemaFetcher = new JDBCDatasetRuntime();
        schemaFetcher.initialize(container, properties.getDatasetProperties());
        schema = schemaFetcher.getSchema();
    }

    this.defaultOutputCoder = AvroCoder.of(schema);

    return ValidationResult.OK;
}
 
Example 5
Source File: PubsubIOTest.java    From beam with Apache License 2.0
@Test
public void testAvroSpecificRecord() {
  AvroCoder<AvroGeneratedUser> coder = AvroCoder.of(AvroGeneratedUser.class);
  List<AvroGeneratedUser> inputs =
      ImmutableList.of(
          new AvroGeneratedUser("Bob", 256, null),
          new AvroGeneratedUser("Alice", 128, null),
          new AvroGeneratedUser("Ted", null, "white"));
  setupTestClient(inputs, coder);
  PCollection<AvroGeneratedUser> read =
      readPipeline.apply(
          PubsubIO.readAvrosWithBeamSchema(AvroGeneratedUser.class)
              .fromSubscription(SUBSCRIPTION.getPath())
              .withClock(CLOCK)
              .withClientFactory(clientFactory));
  PAssert.that(read).containsInAnyOrder(inputs);
  readPipeline.run();
}
 
Example 6
Source File: HadoopFormatIOReadTest.java    From beam with Apache License 2.0
/**
 * This test verifies that the method {@link
 * HadoopInputFormatBoundedSource.HadoopInputFormatReader#getCurrentSource() getCurrentSource()}
 * returns the correct source object.
 */
@Test
public void testGetCurrentSourceFunction() throws Exception {
  SerializableSplit split = new SerializableSplit();
  BoundedSource<KV<Text, Employee>> source =
      new HadoopInputFormatBoundedSource<>(
          serConf,
          WritableCoder.of(Text.class),
          AvroCoder.of(Employee.class),
          null, // No key translation required.
          null, // No value translation required.
          split);
  BoundedReader<KV<Text, Employee>> hifReader = source.createReader(p.getOptions());
  BoundedSource<KV<Text, Employee>> hifSource = hifReader.getCurrentSource();
  assertEquals(hifSource, source);
}
 
Example 7
Source File: PubsubIO.java    From beam with Apache License 2.0
/**
 * Returns a {@link PTransform} that continuously reads binary encoded Avro messages of the
 * specific type.
 *
 * <p>Beam will infer a Beam schema from the Avro schema. This allows the output to be used by SQL and
 * by the schema-transform library.
 */
@Experimental(Kind.SCHEMAS)
public static <T> Read<T> readAvrosWithBeamSchema(Class<T> clazz) {
  if (clazz.equals(GenericRecord.class)) {
    throw new IllegalArgumentException("For GenericRecord, please call readAvroGenericRecords");
  }
  org.apache.avro.Schema avroSchema = ReflectData.get().getSchema(clazz);
  AvroCoder<T> coder = AvroCoder.of(clazz);
  Schema schema = AvroUtils.getSchema(clazz, null);
  return Read.newBuilder(parsePayloadUsingCoder(coder))
      .setCoder(
          SchemaCoder.of(
              schema,
              TypeDescriptor.of(clazz),
              AvroUtils.getToRowFunction(clazz, avroSchema),
              AvroUtils.getFromRowFunction(clazz)))
      .build();
}
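For context, a call site for readAvrosWithBeamSchema might look like the sketch below. The pipeline variable, project, and subscription path are hypothetical placeholders; AvroGeneratedUser is the Avro-generated class already used in Example 5, and the usual PubsubIO and PCollection imports are assumed.

// Hypothetical usage sketch: reading Avro-encoded Pub/Sub messages with an inferred Beam schema.
PCollection<AvroGeneratedUser> users =
    pipeline.apply(
        "ReadUsers",
        PubsubIO.readAvrosWithBeamSchema(AvroGeneratedUser.class)
            .fromSubscription("projects/my-project/subscriptions/users"));
// Because the output PCollection carries a Beam schema, it can feed SqlTransform or other
// schema-aware transforms downstream.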
 
Example 8
Source File: PubsubIO.java    From beam with Apache License 2.0
/**
 * Returns a {@link PTransform} that continuously reads binary encoded Avro messages into the Avro
 * {@link GenericRecord} type.
 *
 * <p>Beam will infer a Beam schema from the Avro schema. This allows the output to be used by SQL and
 * by the schema-transform library.
 */
@Experimental(Kind.SCHEMAS)
public static Read<GenericRecord> readAvroGenericRecords(org.apache.avro.Schema avroSchema) {
  Schema schema = AvroUtils.getSchema(GenericRecord.class, avroSchema);
  AvroCoder<GenericRecord> coder = AvroCoder.of(GenericRecord.class, avroSchema);
  return Read.newBuilder(parsePayloadUsingCoder(coder))
      .setCoder(
          SchemaCoder.of(
              schema,
              TypeDescriptor.of(GenericRecord.class),
              AvroUtils.getToRowFunction(GenericRecord.class, avroSchema),
              AvroUtils.getFromRowFunction(GenericRecord.class)))
      .build();
}
 
Example 9
Source File: HadoopFormatIOReadTest.java    From beam with Apache License 2.0
/**
 * This test validates the method getFractionConsumed() when a bad progress value is returned by
 * the InputFormat.
 */
@Test
public void testGetFractionConsumedForBadProgressValue() throws Exception {
  InputFormat<Text, Employee> mockInputFormat = Mockito.mock(EmployeeInputFormat.class);
  EmployeeRecordReader mockReader = Mockito.mock(EmployeeRecordReader.class);
  Mockito.when(mockInputFormat.createRecordReader(Mockito.any(), Mockito.any()))
      .thenReturn(mockReader);
  Mockito.when(mockReader.nextKeyValue()).thenReturn(true);
  // Set to a bad value, not in the range 0 to 1.
  Mockito.when(mockReader.getProgress()).thenReturn(2.0F);
  InputSplit mockInputSplit = Mockito.mock(NewObjectsEmployeeInputSplit.class);
  HadoopInputFormatBoundedSource<Text, Employee> boundedSource =
      new HadoopInputFormatBoundedSource<>(
          serConf,
          WritableCoder.of(Text.class),
          AvroCoder.of(Employee.class),
          null, // No key translation required.
          null, // No value translation required.
          new SerializableSplit(mockInputSplit));
  boundedSource.setInputFormatObj(mockInputFormat);
  BoundedReader<KV<Text, Employee>> reader = boundedSource.createReader(p.getOptions());
  assertEquals(Double.valueOf(0), reader.getFractionConsumed());
  boolean start = reader.start();
  assertTrue(start);
  if (start) {
    boolean advance = reader.advance();
    assertEquals(null, reader.getFractionConsumed());
    assertTrue(advance);
    if (advance) {
      advance = reader.advance();
      assertEquals(null, reader.getFractionConsumed());
    }
  }
  // Validate that getFractionConsumed() returns null after a few reads, since getProgress()
  // returns the invalid value 2, which is not in the range 0 to 1.
  assertEquals(null, reader.getFractionConsumed());
  reader.close();
}
 
Example 10
Source File: AvroCoderCloudObjectTranslator.java    From beam with Apache License 2.0
@Override
public AvroCoder<?> fromCloudObject(CloudObject cloudObject) {
  Schema.Parser parser = new Schema.Parser();
  String className = Structs.getString(cloudObject, TYPE_FIELD);
  String schemaString = Structs.getString(cloudObject, SCHEMA_FIELD);
  try {
    Class<?> type = Class.forName(className);
    Schema schema = parser.parse(schemaString);
    return AvroCoder.of(type, schema);
  } catch (ClassNotFoundException e) {
    throw new IllegalArgumentException(e);
  }
}
 
Example 11
Source File: HadoopFormatIOReadTest.java    From beam with Apache License 2.0
/** This test validates that a reader and its parent source read the same records. */
@Test
public void testReaderAndParentSourceReadsSameData() throws Exception {
  InputSplit mockInputSplit = Mockito.mock(NewObjectsEmployeeInputSplit.class);
  HadoopInputFormatBoundedSource<Text, Employee> boundedSource =
      new HadoopInputFormatBoundedSource<>(
          serConf,
          WritableCoder.of(Text.class),
          AvroCoder.of(Employee.class),
          null, // No key translation required.
          null, // No value translation required.
          new SerializableSplit(mockInputSplit));
  BoundedReader<KV<Text, Employee>> reader = boundedSource.createReader(p.getOptions());
  SourceTestUtils.assertUnstartedReaderReadsSameAsItsSource(reader, p.getOptions());
}
 
Example 12
Source File: PubsubIO.java    From beam with Apache License 2.0
/**
 * Returns a {@link PTransform} that continuously reads binary encoded Avro messages of the given
 * type from a Google Cloud Pub/Sub stream.
 */
public static <T> Read<T> readAvros(Class<T> clazz) {
  // TODO: Stop using AvroCoder and instead parse the payload directly.
  // We should not be relying on the fact that AvroCoder's wire format is identical to
  // the Avro wire format, as the wire format is not part of a coder's API.
  AvroCoder<T> coder = AvroCoder.of(clazz);
  return Read.newBuilder(parsePayloadUsingCoder(coder)).setCoder(coder).build();
}
 
Example 13
Source File: GenericRecordToRowTest.java    From beam with Apache License 2.0
@Test
public void testConvertsGenericRecordToRow() {
  String schemaString =
      "{\"namespace\": \"example.avro\",\n"
          + " \"type\": \"record\",\n"
          + " \"name\": \"User\",\n"
          + " \"fields\": [\n"
          + "     {\"name\": \"name\", \"type\": \"string\"},\n"
          + "     {\"name\": \"favorite_number\", \"type\": \"int\"},\n"
          + "     {\"name\": \"favorite_color\", \"type\": \"string\"},\n"
          + "     {\"name\": \"price\", \"type\": \"double\"}\n"
          + " ]\n"
          + "}";
  Schema schema = (new Schema.Parser()).parse(schemaString);

  GenericRecord before = new GenericData.Record(schema);
  before.put("name", "Bob");
  before.put("favorite_number", 256);
  before.put("favorite_color", "red");
  before.put("price", 2.4);

  AvroCoder<GenericRecord> coder = AvroCoder.of(schema);

  PCollection<Row> rows =
      pipeline
          .apply("create PCollection<GenericRecord>", Create.of(before).withCoder(coder))
          .apply(
              "convert", GenericRecordReadConverter.builder().beamSchema(payloadSchema).build());

  PAssert.that(rows)
      .containsInAnyOrder(
          Row.withSchema(payloadSchema).addValues("Bob", 256, "red", 2.4).build());
  pipeline.run();
}
 
Example 14
Source File: BigQueryInputRuntime.java    From components with Apache License 2.0
@Override
public ValidationResult initialize(RuntimeContainer container, BigQueryInputProperties properties) {
    this.properties = properties;
    this.dataset = properties.getDatasetProperties();
    this.datastore = dataset.getDatastoreProperties();

    // Data returned by BigQueryIO does not contain its own schema, so we have to retrieve it
    // before the read and write operations.
    Schema schema = properties.getDatasetProperties().main.schema.getValue();
    if (schema == null || AvroUtils.isSchemaEmpty(schema) || AvroUtils.isIncludeAllFields(schema)) {
        BigQueryDatasetRuntime schemaFetcher = new BigQueryDatasetRuntime();
        schemaFetcher.initialize(container, properties.getDatasetProperties());
        schema = schemaFetcher.getSchema();
    }

    Object pipelineOptionsObj = container.getGlobalData(BeamJobRuntimeContainer.PIPELINE_OPTIONS);
    if (pipelineOptionsObj != null) {
        PipelineOptions pipelineOptions = (PipelineOptions) pipelineOptionsObj;
        GcpServiceAccountOptions gcpOptions = pipelineOptions.as(GcpServiceAccountOptions.class);
        if (!"DataflowRunner".equals(gcpOptions.getRunner().getSimpleName())) {
            // When using the Dataflow runner, these properties have already been set at the pipeline level.
            gcpOptions.setProject(datastore.projectName.getValue());
            gcpOptions.setTempLocation(datastore.tempGsFolder.getValue());
            gcpOptions.setCredentialFactoryClass(ServiceAccountCredentialFactory.class);
            gcpOptions.setServiceAccountFile(datastore.serviceAccountFile.getValue());
            gcpOptions.setGcpCredential(BigQueryConnection.createCredentials(datastore));
        }
    }

    this.defaultOutputCoder = AvroCoder.of(schema);

    return ValidationResult.OK;
}
 
Example 15
Source File: AvroCoderCache.java    From component-runtime with Apache License 2.0
@Override
public synchronized AvroCoder<IndexedRecord> get(final Object key) {
    AvroCoder<IndexedRecord> coder = super.get(key);
    if (coder == null) {
        final Schema schema = Schema.class.cast(key);
        coder = AvroCoder.of(IndexedRecord.class, schema);
        put(schema, coder);
    }
    return coder;
}
 
Example 16
Source File: KafkaUnboundedSource.java    From beam with Apache License 2.0
@Override
public Coder<KafkaCheckpointMark> getCheckpointMarkCoder() {
  return AvroCoder.of(KafkaCheckpointMark.class);
}
 
Example 17
Source File: CountingSource.java    From beam with Apache License 2.0
@Override
public Coder<CountingSource.CounterMark> getCheckpointMarkCoder() {
  return AvroCoder.of(CountingSource.CounterMark.class);
}
 
Example 18
Source File: JsonToIndexedRecord.java    From component-runtime with Apache License 2.0
@Override
protected Coder<?> getDefaultOutputCoder() {
    return AvroCoder.of(outputSchema);
}
 
Example 19
Source File: ConfluentSchemaRegistryDeserializerProvider.java    From beam with Apache License 2.0
@Override
public Coder<T> getCoder(CoderRegistry coderRegistry) {
  final Schema avroSchema = new Schema.Parser().parse(getSchemaMetadata().getSchema());
  return (Coder<T>) AvroCoder.of(avroSchema);
}
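To put this getCoder override in context, the provider is typically handed to KafkaIO, which resolves the GenericRecord value coder through this method. The sketch below is only an assumed wiring in the style of the KafkaIO javadoc; the broker address, topic, and schema registry URL are placeholders, and the builder methods should be checked against the KafkaIO version in use.

// Hypothetical wiring: KafkaIO obtains the GenericRecord coder via the provider's getCoder().
PCollection<KafkaRecord<Long, GenericRecord>> records =
    pipeline.apply(
        KafkaIO.<Long, GenericRecord>read()
            .withBootstrapServers("broker-1:9092")
            .withTopic("my_topic")
            .withKeyDeserializer(LongDeserializer.class)
            .withValueDeserializer(
                ConfluentSchemaRegistryDeserializerProvider.of(
                    "http://localhost:8081", "my_topic-value")));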
 
Example 20
Source File: KafkaUnboundedSource.java    From DataflowTemplates with Apache License 2.0
@Override
public Coder<KafkaCheckpointMark> getCheckpointMarkCoder() {
  return AvroCoder.of(KafkaCheckpointMark.class);
}