Java Code Examples for org.apache.beam.sdk.options.ValueProvider.NestedValueProvider#of()

The following examples show how to use org.apache.beam.sdk.options.ValueProvider.NestedValueProvider#of().
Example 1
Source File: BigQueryIO.java    From beam with Apache License 2.0
/**
 * Returns the table to write, or {@code null} if writing with {@code tableFunction}.
 *
 * <p>If the table's project is not specified, use the executing project.
 */
@Nullable
ValueProvider<TableReference> getTableWithDefaultProject(BigQueryOptions bqOptions) {
  ValueProvider<TableReference> table = getTable();
  if (table == null) {
    return table;
  }

  if (!table.isAccessible()) {
    LOG.info(
        "Using a dynamic value for table input. This must contain a project"
            + " in the table reference: {}",
        table);
    return table;
  }
  if (Strings.isNullOrEmpty(table.get().getProjectId())) {
    // If user does not specify a project we assume the table to be located in
    // the default project.
    TableReference tableRef = table.get();
    tableRef.setProjectId(bqOptions.getProject());
    return NestedValueProvider.of(
        StaticValueProvider.of(BigQueryHelpers.toJsonString(tableRef)),
        new JsonTableRefToTableRef());
  }
  return table;
}
 
Example 2
Source File: InvoicingUtils.java    From nomulus with Apache License 2.0
/**
 * Returns a provider that creates a BigQuery query for a given project and yearMonth at runtime.
 *
 * <p>We only know yearMonth at runtime, so this provider fills in the {@code
 * sql/billing_events.sql} template at runtime.
 *
 * @param yearMonthProvider a runtime provider that returns which month we're invoicing for.
 * @param projectId the projectId we're generating invoicing for.
 */
static ValueProvider<String> makeQueryProvider(
    ValueProvider<String> yearMonthProvider, String projectId) {
  return NestedValueProvider.of(
      yearMonthProvider,
      (yearMonth) -> {
        // Get the timestamp endpoints capturing the entire month with microsecond precision
        YearMonth reportingMonth = YearMonth.parse(yearMonth);
        LocalDateTime firstMoment = reportingMonth.atDay(1).atTime(LocalTime.MIDNIGHT);
        LocalDateTime lastMoment = reportingMonth.atEndOfMonth().atTime(LocalTime.MAX);
        // Construct the month's query by filling in the billing_events.sql template
        return SqlTemplate.create(getQueryFromFile(InvoicingPipeline.class, "billing_events.sql"))
            .put("FIRST_TIMESTAMP_OF_MONTH", firstMoment.format(TIMESTAMP_FORMATTER))
            .put("LAST_TIMESTAMP_OF_MONTH", lastMoment.format(TIMESTAMP_FORMATTER))
            .put("PROJECT_ID", projectId)
            .put("DATASTORE_EXPORT_DATA_SET", "latest_datastore_export")
            .put("ONETIME_TABLE", "OneTime")
            .put("REGISTRY_TABLE", "Registry")
            .put("REGISTRAR_TABLE", "Registrar")
            .put("CANCELLATION_TABLE", "Cancellation")
            .build();
      });
}
 
Example 3
Source File: TextToBigQueryStreaming.java    From DataflowTemplates with Apache License 2.0
/**
 * Method to read a BigQuery schema file from GCS and return the file contents as a string.
 *
 * @param gcsPath Path string for the schema file in GCS.
 * @return File contents as a string.
 */
private static ValueProvider<String> getSchemaFromGCS(ValueProvider<String> gcsPath) {
  return NestedValueProvider.of(
      gcsPath,
      new SimpleFunction<String, String>() {
        @Override
        public String apply(String input) {
          ResourceId sourceResourceId = FileSystems.matchNewResource(input, false);

          String schema;
          try (ReadableByteChannel rbc = FileSystems.open(sourceResourceId)) {
            try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
              try (WritableByteChannel wbc = Channels.newChannel(baos)) {
                ByteStreams.copy(rbc, wbc);
                schema = baos.toString(Charsets.UTF_8.name());
                LOG.info("Extracted schema: " + schema);
              }
            }
          } catch (IOException e) {
            LOG.error("Error extracting schema: " + e.getMessage());
            throw new RuntimeException(e);
          }
          return schema;
        }
      });
}
 
Example 4
Source File: BigQueryHelpers.java    From beam with Apache License 2.0
@Nullable
static ValueProvider<String> displayTableRefProto(
    @Nullable ValueProvider<TableReferenceProto.TableReference> table) {
  if (table == null) {
    return null;
  }

  return NestedValueProvider.of(table, new TableRefProtoToTableSpec());
}
 
Example 5
Source File: BigQueryHelpers.java    From beam with Apache License 2.0
/** Return a displayable string representation for a {@link TableReference}. */
@Nullable
static ValueProvider<String> displayTable(@Nullable ValueProvider<TableReference> table) {
  if (table == null) {
    return null;
  }
  return NestedValueProvider.of(table, new TableRefToTableSpec());
}
 
Example 6
Source File: ValueProviderTest.java    From beam with Apache License 2.0
@Test
public void testNestedValueProviderCached() throws Exception {
  AtomicInteger increment = new AtomicInteger();
  ValueProvider<Integer> nvp =
      NestedValueProvider.of(
          StaticValueProvider.of(increment), new IncrementAtomicIntegerTranslator());
  Integer originalValue = nvp.get();
  Integer cachedValue = nvp.get();
  Integer incrementValue = increment.incrementAndGet();
  Integer secondCachedValue = nvp.get();
  assertEquals(originalValue, cachedValue);
  assertEquals(secondCachedValue, cachedValue);
  assertNotEquals(originalValue, incrementValue);
}
 
Example 7
Source File: ValueProviderTest.java    From beam with Apache License 2.0
@Test
public void testNestedValueProviderRuntime() throws Exception {
  TestOptions options = PipelineOptionsFactory.as(TestOptions.class);
  ValueProvider<String> rvp = options.getBar();
  ValueProvider<String> nvp = NestedValueProvider.of(rvp, from -> from + "bar");
  ValueProvider<String> doubleNvp = NestedValueProvider.of(nvp, from -> from);
  assertEquals("bar", ((NestedValueProvider) nvp).propertyName());
  assertEquals("bar", ((NestedValueProvider) doubleNvp).propertyName());
  assertFalse(nvp.isAccessible());
  expectedException.expect(RuntimeException.class);
  expectedException.expectMessage("Value only available at runtime");
  nvp.get();
}
 
Example 8
Source File: ValueProviderTest.java    From beam with Apache License 2.0
@Test
public void testNestedValueProviderStatic() throws Exception {
  SerializableFunction<String, String> function = from -> from + "bar";
  ValueProvider<String> svp = StaticValueProvider.of("foo");
  ValueProvider<String> nvp = NestedValueProvider.of(svp, function);
  assertTrue(nvp.isAccessible());
  assertEquals("foobar", nvp.get());
  assertEquals("foobar", nvp.toString());
  assertEquals(nvp, NestedValueProvider.of(svp, function));
}
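
Examples 7 and 8 exercise two cases: a nested provider over a runtime value is not accessible and throws when read, while one over a static value can be read immediately. The sketch below models that behavior with plain Java so it runs without Beam on the classpath. It is a simplified stand-in under stated assumptions, not Beam's implementation: `Provider`, `Deferred`, `fixed`, and `nested` are hypothetical names, and the exception type and message are chosen only for illustration.

```java
import java.util.function.Function;

/** Simplified stand-in for the nested-provider pattern; these are not Beam's classes. */
final class NestedProviderSketch {

  /** Hypothetical miniature of a value provider: a value that may not be known yet. */
  interface Provider<T> {
    T get();

    boolean isAccessible();
  }

  /** A value that is known up front, loosely analogous to a static provider. */
  static <T> Provider<T> fixed(T value) {
    return new Provider<T>() {
      @Override
      public T get() {
        return value;
      }

      @Override
      public boolean isAccessible() {
        return true;
      }
    };
  }

  /** A value that only becomes readable after set(...) is called (hypothetical). */
  static final class Deferred<T> implements Provider<T> {
    private T value;
    private boolean isSet;

    void set(T newValue) {
      value = newValue;
      isSet = true;
    }

    @Override
    public T get() {
      if (!isSet) {
        throw new IllegalStateException("Value only available at runtime");
      }
      return value;
    }

    @Override
    public boolean isAccessible() {
      return isSet;
    }
  }

  /** Wraps a provider; the translator runs only when the wrapped value is read. */
  static <A, B> Provider<B> nested(Provider<A> inner, Function<A, B> translator) {
    return new Provider<B>() {
      @Override
      public B get() {
        return translator.apply(inner.get());
      }

      @Override
      public boolean isAccessible() {
        // Accessibility is delegated to the wrapped provider.
        return inner.isAccessible();
      }
    };
  }

  public static void main(String[] args) {
    // Static case: the value is known, so the nested provider is readable at once.
    Provider<String> known = nested(fixed("foo"), from -> from + "bar");
    System.out.println(known.isAccessible() + " " + known.get());

    // Runtime case: the nested provider is not readable until the inner value arrives.
    Deferred<String> runtime = new Deferred<>();
    Provider<String> pending = nested(runtime, from -> from + "bar");
    System.out.println(pending.isAccessible());
    runtime.set("foo");
    System.out.println(pending.get());
  }
}
```

The point of the wrapper is that the translation can be declared while the pipeline is being constructed, even though the input may only exist once the job runs.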
 
Example 9
Source File: BigQueryIO.java    From beam with Apache License 2.0
/** Returns the table reference, or {@code null}. */
@Nullable
public ValueProvider<TableReference> getTable() {
  return getJsonTableRef() == null
      ? null
      : NestedValueProvider.of(getJsonTableRef(), new JsonTableRefToTableRef());
}
 
Example 10
Source File: FileBasedSink.java    From beam with Apache License 2.0
/** Construct a {@link FileBasedSink} with the given temp directory and output channel type. */
@Experimental(Kind.FILESYSTEM)
public FileBasedSink(
    ValueProvider<ResourceId> tempDirectoryProvider,
    DynamicDestinations<?, DestinationT, OutputT> dynamicDestinations,
    WritableByteChannelFactory writableByteChannelFactory) {
  this.tempDirectoryProvider =
      NestedValueProvider.of(tempDirectoryProvider, new ExtractDirectory());
  this.dynamicDestinations = checkNotNull(dynamicDestinations);
  this.writableByteChannelFactory = writableByteChannelFactory;
}
 
Example 11
Source File: ImportTransform.java    From DataflowTemplates with Apache License 2.0
@Override
public PCollection<Export> expand(PBegin input) {
  NestedValueProvider<String, String> manifestFile =
      NestedValueProvider.of(importDirectory, s -> GcsUtil.joinPath(s, "spanner-export.json"));
  return input
      .apply("Read manifest", FileIO.match().filepattern(manifestFile))
      .apply(
          "Resource id",
          MapElements.into(TypeDescriptor.of(ResourceId.class))
              .via((MatchResult.Metadata::resourceId)))
      .apply(
          "Read manifest json",
          MapElements.into(TypeDescriptor.of(Export.class))
              .via(ReadExportManifestFile::readManifest));
}
 
Example 12
Source File: PubsubIO.java    From beam with Apache License 2.0
@Override
public PCollection<T> expand(PBegin input) {
  if (getTopicProvider() == null && getSubscriptionProvider() == null) {
    throw new IllegalStateException(
        "Need to set either the topic or the subscription for a PubsubIO.Read transform");
  }
  if (getTopicProvider() != null && getSubscriptionProvider() != null) {
    throw new IllegalStateException(
        "Can't set both the topic and the subscription for a PubsubIO.Read transform");
  }

  @Nullable
  ValueProvider<TopicPath> topicPath =
      getTopicProvider() == null
          ? null
          : NestedValueProvider.of(getTopicProvider(), new TopicPathTranslator());
  @Nullable
  ValueProvider<SubscriptionPath> subscriptionPath =
      getSubscriptionProvider() == null
          ? null
          : NestedValueProvider.of(getSubscriptionProvider(), new SubscriptionPathTranslator());
  PubsubUnboundedSource source =
      new PubsubUnboundedSource(
          getClock(),
          getPubsubClientFactory(),
          null /* always get project from runtime PipelineOptions */,
          topicPath,
          subscriptionPath,
          getTimestampAttribute(),
          getIdAttribute(),
          getNeedsAttributes(),
          getNeedsMessageId());
  PCollection<T> read =
      input.apply(source).apply(MapElements.into(new TypeDescriptor<T>() {}).via(getParseFn()));
  return read.setCoder(getCoder());
}
 
Example 13
Source File: Write.java    From gcp-ingestion with Mozilla Public License 2.0
/** Public constructor. */
public AvroOutput(ValueProvider<String> outputPrefix, Duration windowDuration,
    ValueProvider<Integer> numShards, Compression compression, InputType inputType,
    ValueProvider<String> schemasLocation) {
  this.outputPrefix = outputPrefix;
  this.windowDuration = windowDuration;
  this.numShards = numShards;
  this.compression = compression;
  this.inputType = inputType;
  this.schemasLocation = schemasLocation;
  this.pathTemplate = NestedValueProvider.of(outputPrefix, DynamicPathTemplate::new);
}
 
Example 14
Source File: Write.java    From gcp-ingestion with Mozilla Public License 2.0
@Override
public WithFailures.Result<PDone, PubsubMessage> expand(PCollection<PubsubMessage> input) {
  ValueProvider<DynamicPathTemplate> pathTemplate = NestedValueProvider.of(outputPrefix,
      DynamicPathTemplate::new);
  ValueProvider<String> staticPrefix = NestedValueProvider.of(pathTemplate,
      value -> value.staticPrefix);

  FileIO.Write<List<String>, PubsubMessage> write = FileIO
      .<List<String>, PubsubMessage>writeDynamic()
      // We can't pass the attribute map to by() directly since MapCoder isn't deterministic;
      // instead, we extract an ordered list of the needed placeholder values.
      // That list is later available to withNaming() to determine output location.
      .by(message -> pathTemplate.get()
          .extractValuesFrom(DerivedAttributesMap.of(message.getAttributeMap())))
      .withDestinationCoder(ListCoder.of(StringUtf8Coder.of())) //
      .withCompression(compression) //
      .via(Contextful.fn(format::encodeSingleMessage), TextIO.sink()) //
      .to(staticPrefix) //
      .withNaming(placeholderValues -> NoColonFileNaming.defaultNaming(
          pathTemplate.get().replaceDynamicPart(placeholderValues), format.suffix()));

  if (inputType == InputType.pubsub) {
    // Passing a ValueProvider to withNumShards disables runner-determined sharding, so we
    // need to be careful to pass this only for streaming input (where runner-determined
    // sharding is not an option).
    write = write.withNumShards(numShards);
  }

  input //
      .apply(Window.<PubsubMessage>into(FixedWindows.of(windowDuration))
          // We allow lateness up to the maximum Cloud Pub/Sub retention of 7 days documented in
          // https://cloud.google.com/pubsub/docs/subscriber
          .withAllowedLateness(Duration.standardDays(7)) //
          .discardingFiredPanes())
      .apply(write);
  return WithFailures.Result.of(PDone.in(input.getPipeline()),
      EmptyErrors.in(input.getPipeline()));
}
 
Example 15
Source File: BigQueryIO.java    From beam with Apache License 2.0
/** See {@link Read#getTableProvider()}. */
@Nullable
public ValueProvider<TableReference> getTableProvider() {
  return getJsonTableRef() == null
      ? null
      : NestedValueProvider.of(getJsonTableRef(), new JsonTableRefToTableRef());
}
 
Example 16
Source File: LocalResources.java    From beam with Apache License 2.0
public static ValueProvider<ResourceId> fromString(
    ValueProvider<String> resourceProvider, final boolean isDirectory) {
  return NestedValueProvider.of(resourceProvider, input -> fromString(input, isDirectory));
}
 
Example 17
Source File: DynamicDestinationsHelpers.java    From beam with Apache License 2.0
static <T> ConstantTableDestinations<T> fromJsonTableRef(
    ValueProvider<String> jsonTableRef, String tableDescription) {
  return new ConstantTableDestinations<>(
      NestedValueProvider.of(jsonTableRef, new JsonTableRefToTableSpec()), tableDescription);
}
 
Example 18
Source File: FileBasedSink.java    From beam with Apache License 2.0
/**
 * Constructs a WriteOperation using the default strategy for generating a temporary directory
 * from the base output filename.
 *
 * <p>Default is a uniquely named subdirectory of the provided tempDirectory, e.g. if
 * tempDirectory is /path/to/foo/, the temporary directory will be
 * /path/to/foo/.temp-beam-$uuid.
 *
 * @param sink the FileBasedSink that will be used to configure this write operation.
 */
public WriteOperation(FileBasedSink<?, DestinationT, OutputT> sink) {
  this(
      sink,
      NestedValueProvider.of(sink.getTempDirectoryProvider(), new TemporaryDirectoryBuilder()));
}
 
Example 19
Source File: Time.java    From gcp-ingestion with Mozilla Public License 2.0
/**
 * Like {@link #parseSeconds(String)}, but using a {@link ValueProvider}.
 *
 * <p>The value will be parsed once when first used; the result is cached and reused.
 */
public static ValueProvider<Long> parseSeconds(ValueProvider<String> value) {
  return NestedValueProvider.of(value, Time::parseSeconds);
}
 
Example 20
Source File: Time.java    From gcp-ingestion with Mozilla Public License 2.0
/**
 * Like {@link #parseDuration(String)}, but using a {@link ValueProvider}.
 *
 * <p>The value will be parsed once when first used; the result is cached and reused.
 */
public static ValueProvider<org.joda.time.Duration> parseDuration(ValueProvider<String> value) {
  return NestedValueProvider.of(value, Time::parseDuration);
}
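
The Javadoc in Examples 19 and 20 says the value is parsed once when first used and the result is then cached and reused, which is also what the test in Example 6 checks. A minimal sketch of that compute-once behavior follows, written against `java.util.function.Supplier` so it runs without Beam. The names `memoized` and `parseWholeSeconds` are hypothetical, and this is only one way such caching might be written, not the code Beam or gcp-ingestion actually uses.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

/** Illustrative compute-once wrapper; a stand-in, not Beam's NestedValueProvider. */
final class CachedTranslationSketch {

  /** Counts parser invocations so the caching effect is observable. */
  static final AtomicInteger PARSE_CALLS = new AtomicInteger();

  /** Applies the translator to source.get() on the first read, then reuses the result. */
  static <A, B> Supplier<B> memoized(Supplier<A> source, Function<A, B> translator) {
    return new Supplier<B>() {
      private B cached;
      private boolean computed;

      @Override
      public synchronized B get() {
        if (!computed) {
          cached = translator.apply(source.get());
          computed = true;
        }
        return cached;
      }
    };
  }

  /** Hypothetical parser: reads a plain integer number of seconds. */
  static long parseWholeSeconds(String text) {
    PARSE_CALLS.incrementAndGet();
    return Long.parseLong(text);
  }

  public static void main(String[] args) {
    Supplier<Long> seconds =
        CachedTranslationSketch.<String, Long>memoized(
            () -> "90", CachedTranslationSketch::parseWholeSeconds);

    // Nothing has been parsed yet; parsing is deferred until the first get().
    System.out.println("parse calls before first read: " + PARSE_CALLS.get());

    seconds.get();
    seconds.get();

    // The second read reuses the cached result instead of parsing again.
    System.out.println("parse calls after two reads: " + PARSE_CALLS.get());
  }
}
```

Caching like this matters when the translator is expensive or has side effects, as with the GCS schema read in Example 3, since callers may read the provider many times.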