Java Code Examples for org.apache.beam.sdk.io.UnboundedSource#getCheckpointMarkCoder()

The following examples show how to use org.apache.beam.sdk.io.UnboundedSource#getCheckpointMarkCoder(). Each example names the source file and project it was taken from.

Example 1

Source File: WorkerCustomSources.java From beam with Apache License 2.0

@Override
@SuppressWarnings("unchecked")
public NativeReaderIterator<WindowedValue<ValueWithRecordId<T>>> iterator() throws IOException {
  UnboundedSource.UnboundedReader<T> reader =
      (UnboundedSource.UnboundedReader<T>) context.getCachedReader();
  final boolean started = reader != null;
  if (reader == null) {
    String key = context.getSerializedKey().toStringUtf8();
    // Key is expected to start with a 16-character, zero-padded hexadecimal integer
    // holding the split index plus one (hence the "- 1" below).
    int splitIndex = Integer.parseInt(key.substring(0, 16), 16) - 1;

    UnboundedSource<T, UnboundedSource.CheckpointMark> splitSource = parseSource(splitIndex);

    UnboundedSource.CheckpointMark checkpoint = null;
    if (splitSource.getCheckpointMarkCoder() != null) {
      checkpoint = context.getReaderCheckpoint(splitSource.getCheckpointMarkCoder());
    }

    reader = splitSource.createReader(options, checkpoint);
  }

  context.setActiveReader(reader);

  return new UnboundedReaderIterator<>(reader, context, started);
}
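The key handling in Example 1 can be tried in isolation. The sketch below assumes, based only on the parsing code above, that the key begins with 16 hexadecimal digits encoding the split index plus one. The class and method names (`SplitKeys`, `splitIndexFromKey`, `keyForSplit`) are hypothetical and not part of Beam; `keyForSplit` is simply the inverse assumed here for illustration.

```java
// Hypothetical helper, not part of Beam: mirrors the key parsing in Example 1.
final class SplitKeys {
    private SplitKeys() {}

    // Reads the first 16 characters as a hexadecimal number and subtracts one,
    // exactly as the iterator() method above does.
    static int splitIndexFromKey(String key) {
        return Integer.parseInt(key.substring(0, 16), 16) - 1;
    }

    // Assumed inverse: encodes (splitIndex + 1) as 16 zero-padded hex digits.
    static String keyForSplit(int splitIndex) {
        return String.format("%016x", splitIndex + 1);
    }

    public static void main(String[] args) {
        String key = keyForSplit(2);
        System.out.println(key + " -> split " + splitIndexFromKey(key));
    }
}
```

Anything after the first 16 characters of the key is ignored by the parser, and a key shorter than 16 characters makes `substring` throw `StringIndexOutOfBoundsException`.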

Example 2

Source File: UnboundedSourceWrapper.java From beam with Apache License 2.0

@SuppressWarnings("unchecked")
public UnboundedSourceWrapper(
    String stepName,
    PipelineOptions pipelineOptions,
    UnboundedSource<OutputT, CheckpointMarkT> source,
    int parallelism)
    throws Exception {
  this.stepName = stepName;
  this.serializedOptions = new SerializablePipelineOptions(pipelineOptions);
  this.isConvertedBoundedSource =
      source instanceof UnboundedReadFromBoundedSource.BoundedToUnboundedSourceAdapter;

  if (source.requiresDeduping()) {
    LOG.warn("Source {} requires deduping but Flink runner doesn't support this yet.", source);
  }

  Coder<CheckpointMarkT> checkpointMarkCoder = source.getCheckpointMarkCoder();
  if (checkpointMarkCoder == null) {
    LOG.info("No CheckpointMarkCoder specified for this source. Won't create snapshots.");
    checkpointCoder = null;
  } else {
    Coder<? extends UnboundedSource<OutputT, CheckpointMarkT>> sourceCoder =
        (Coder) SerializableCoder.of(new TypeDescriptor<UnboundedSource>() {});

    checkpointCoder = KvCoder.of(sourceCoder, checkpointMarkCoder);
  }

  // Get the splits early. We assume that the generated splits are stable;
  // this is necessary so that the mapping of state to source is correct
  // when restoring.
  splitSources = source.split(parallelism, pipelineOptions);

  FlinkPipelineOptions options = pipelineOptions.as(FlinkPipelineOptions.class);
  idleTimeoutMs = options.getShutdownSourcesAfterIdleMs();
}
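The constructor above treats a null checkpoint mark coder as "do not create snapshots". A minimal, Beam-free sketch of that pattern follows. Everything in it is a hypothetical stand-in: `SimpleCoder` is not Beam's `Coder`, and `OffsetMark`, `OffsetMarkCoder`, and `CheckpointSnapshots` are invented for illustration. It only shows how a caller might skip encoding and decoding when no coder is available.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a coder; NOT the real Beam Coder interface.
interface SimpleCoder<T> {
    void encode(T value, OutputStream out) throws IOException;
    T decode(InputStream in) throws IOException;
}

// Hypothetical checkpoint mark that records only a read offset.
final class OffsetMark {
    final long offset;
    OffsetMark(long offset) { this.offset = offset; }
}

// Encodes an OffsetMark as a single 8-byte big-endian long.
final class OffsetMarkCoder implements SimpleCoder<OffsetMark> {
    @Override
    public void encode(OffsetMark value, OutputStream out) throws IOException {
        new DataOutputStream(out).writeLong(value.offset);
    }

    @Override
    public OffsetMark decode(InputStream in) throws IOException {
        return new OffsetMark(new DataInputStream(in).readLong());
    }
}

final class CheckpointSnapshots {
    private CheckpointSnapshots() {}

    // Returns null when there is no coder or no mark, meaning "no snapshot taken".
    static <T> byte[] snapshot(SimpleCoder<T> coder, T mark) {
        if (coder == null || mark == null) {
            return null;
        }
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            coder.encode(mark, out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns null when there is nothing to restore, so the reader starts fresh.
    static <T> T restore(SimpleCoder<T> coder, byte[] bytes) {
        if (coder == null || bytes == null) {
            return null;
        }
        try {
            return coder.decode(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, a caller can pass the result of `restore` straight to whatever creates the reader, in the same way Example 1 passes a possibly null checkpoint to `createReader`.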

Example 3

Source File: UnboundedSourceSystem.java From beam with Apache License 2.0

Consumer(
    UnboundedSource<T, CheckpointMarkT> source,
    SamzaPipelineOptions pipelineOptions,
    SamzaMetricsContainer metricsContainer,
    String stepName) {
  try {
    this.splits = split(source, pipelineOptions);
  } catch (Exception e) {
    throw new SamzaException("Fail to split source", e);
  }
  this.checkpointMarkCoder = source.getCheckpointMarkCoder();
  this.pipelineOptions = pipelineOptions;
  this.metricsContainer = metricsContainer;
  this.stepName = stepName;
}