Java Code Examples for org.apache.beam.sdk.io.fs.ResourceId#resolve()

The following examples show how to use org.apache.beam.sdk.io.fs.ResourceId#resolve().
Example 1
Source File: BulkDecompressor.java    From DataflowTemplates with Apache License 2.0
/**
 * Decompresses the inputFile using the specified compression and outputs to the main output of
 * the {@link Decompress} doFn. Files are first written to the output directory as temp files
 * whose names combine the compressed extension and a "-temp-" infix (for example
 * {@code gz-temp-demo.txt}), then renamed to the final name. If a file fails decompression, the
 * filename and the associated error will be output to the dead-letter.
 *
 * @param inputFile The inputFile to decompress.
 * @return A {@link ResourceId} which points to the resulting file from the decompression.
 */
private ResourceId decompress(ResourceId inputFile) throws IOException {
  // Remove the compressed extension from the file. Example: demo.txt.gz -> demo.txt
  String outputFilename = Files.getNameWithoutExtension(inputFile.toString());

  // Resolve the necessary resources to perform the transfer.
  ResourceId outputDir = FileSystems.matchNewResource(destinationLocation.get(), true);
  ResourceId outputFile =
      outputDir.resolve(outputFilename, StandardResolveOptions.RESOLVE_FILE);
  ResourceId tempFile =
      outputDir.resolve(Files.getFileExtension(inputFile.toString())
          + "-temp-" + outputFilename, StandardResolveOptions.RESOLVE_FILE);

  // Resolve the compression
  Compression compression = Compression.detect(inputFile.toString());

  // Perform the copy of the decompressed channel into the destination.
  try (ReadableByteChannel readerChannel =
      compression.readDecompressed(FileSystems.open(inputFile))) {
    try (WritableByteChannel writerChannel = FileSystems.create(tempFile, MimeTypes.TEXT)) {
      ByteStreams.copy(readerChannel, writerChannel);
    }

    // Rename the temp file to the output file.
    FileSystems.rename(
        ImmutableList.of(tempFile),
        ImmutableList.of(outputFile),
        MoveOptions.StandardMoveOptions.IGNORE_MISSING_FILES);
  } catch (IOException e) {
    String msg = e.getMessage();

    LOG.error("Error occurred during decompression of {}", inputFile.toString(), e);
    throw new IOException(sanitizeDecompressionErrorMsg(msg, inputFile, compression));
  }

  return outputFile;
}
 
Example 2
Source File: BulkCompressor.java    From DataflowTemplates with Apache License 2.0
@ProcessElement
public void processElement(ProcessContext context) {
  ResourceId inputFile = context.element().resourceId();
  Compression compression = compressionValue.get();

  // Add the compression extension to the output filename. Example: demo.txt -> demo.txt.gz
  String outputFilename = inputFile.getFilename() + compression.getSuggestedSuffix();

  // Resolve the necessary resources to perform the transfer
  ResourceId outputDir = FileSystems.matchNewResource(destinationLocation.get(), true);
  ResourceId outputFile =
      outputDir.resolve(outputFilename, StandardResolveOptions.RESOLVE_FILE);
  ResourceId tempFile =
      outputDir.resolve("temp-" + outputFilename, StandardResolveOptions.RESOLVE_FILE);

  // Perform the copy of the compressed channel to the destination.
  try (ReadableByteChannel readerChannel = FileSystems.open(inputFile)) {
    try (WritableByteChannel writerChannel =
        compression.writeCompressed(FileSystems.create(tempFile, MimeTypes.BINARY))) {

      // Execute the copy to the temporary file
      ByteStreams.copy(readerChannel, writerChannel);
    }

    // Rename the temporary file to the output file
    FileSystems.rename(ImmutableList.of(tempFile), ImmutableList.of(outputFile));

    // Output the path to the compressed file
    context.output(outputFile.toString());
  } catch (IOException e) {
    LOG.error("Error occurred during compression of {}", inputFile.toString(), e);
    context.output(DEADLETTER_TAG, KV.of(inputFile.toString(), e.getMessage()));
  }
}
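
Examples 1 and 2 both write to a temporary name inside the output directory and only then rename it to the final name. The following is a minimal sketch of that pattern using only java.nio.file, not Beam's FileSystems API; the class name TempThenRename and the method writeViaTemp are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Beam or DataflowTemplates.
final class TempThenRename {

  /** Writes content to "temp-<name>" inside dir, then renames it to <name>. */
  static Path writeViaTemp(Path dir, String name, String content) throws IOException {
    Path target = dir.resolve(name);
    Path temp = dir.resolve("temp-" + name);

    // All bytes go to the temporary name first.
    Files.write(temp, content.getBytes(StandardCharsets.UTF_8));

    // The final name is only created once the temp file is fully written.
    Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
    return target;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("resolve-demo");
    Path out = writeViaTemp(dir, "demo.txt.gz", "payload");
    System.out.println(out.getFileName() + " exists: " + Files.exists(out));
    System.out.println("temp exists: " + Files.exists(dir.resolve("temp-demo.txt.gz")));
  }
}
```

If the write fails partway, only the temp file is left behind, so a consumer listing the directory for final names should not pick up a truncated output.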
 
Example 3
Source File: GcsResourceIdTest.java    From beam with Apache License 2.0
@Test
public void testResolveInvalidNotDirectory() {
  ResourceId tmpDir =
      toResourceIdentifier("gs://my_bucket/")
          .resolve("tmp dir", StandardResolveOptions.RESOLVE_FILE);

  thrown.expect(IllegalStateException.class);
  thrown.expectMessage("Expected the gcsPath is a directory, but had [gs://my_bucket/tmp dir].");
  tmpDir.resolve("aa", StandardResolveOptions.RESOLVE_FILE);
}
 
Example 4
Source File: LocalResourceIdTest.java    From beam with Apache License 2.0
@Test
public void testResolveInvalidNotDirectory() {
  ResourceId tmp =
      toResourceIdentifier("/root/").resolve("tmp", StandardResolveOptions.RESOLVE_FILE);
  thrown.expect(IllegalStateException.class);
  thrown.expectMessage("Expected the path is a directory, but had [/root/tmp].");
  tmp.resolve("aa", StandardResolveOptions.RESOLVE_FILE);
}
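
The GCS and local tests in Examples 3 and 4 pin down the same rule: resolve() may only be called on a resource that is a directory, and a resource produced with RESOLVE_FILE is not one. The sketch below models that rule with a toy class. ToyResourceId is hypothetical and is not Beam's implementation; treating "ends with /" as "is a directory" and using a boolean in place of StandardResolveOptions are assumptions of the sketch.

```java
// Hypothetical toy model of the directory check; not Beam code.
final class ToyResourceId {
  private final String path;

  ToyResourceId(String path) {
    this.path = path;
  }

  // Assumption of this sketch: a trailing '/' marks a directory.
  boolean isDirectory() {
    return path.endsWith("/");
  }

  /** Appends other to this path; only legal when this resource is a directory. */
  ToyResourceId resolve(String other, boolean asDirectory) {
    if (!isDirectory()) {
      throw new IllegalStateException("Expected a directory, but had [" + path + "].");
    }
    return new ToyResourceId(path + other + (asDirectory ? "/" : ""));
  }

  @Override
  public String toString() {
    return path;
  }

  public static void main(String[] args) {
    ToyResourceId tmp = new ToyResourceId("/root/").resolve("tmp", false);
    System.out.println(tmp); // prints /root/tmp
    try {
      tmp.resolve("aa", false); // tmp was resolved as a file, so this is rejected
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```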
 
Example 5
Source File: BigQueryIO.java    From beam with Apache License 2.0
static List<ResourceId> getExtractFilePaths(String extractDestinationDir, Job extractJob)
    throws IOException {
  JobStatistics jobStats = extractJob.getStatistics();
  List<Long> counts = jobStats.getExtract().getDestinationUriFileCounts();
  if (counts.size() != 1) {
    String errorMessage =
        counts.isEmpty()
            ? "No destination uri file count received."
            : String.format(
                "More than one destination uri file count received. First two are %s, %s",
                counts.get(0), counts.get(1));
    throw new RuntimeException(errorMessage);
  }
  long filesCount = counts.get(0);

  ImmutableList.Builder<ResourceId> paths = ImmutableList.builder();
  ResourceId extractDestinationDirResourceId =
      FileSystems.matchNewResource(extractDestinationDir, true /* isDirectory */);
  for (long i = 0; i < filesCount; ++i) {
    ResourceId filePath =
        extractDestinationDirResourceId.resolve(
            String.format("%012d%s", i, ".avro"),
            ResolveOptions.StandardResolveOptions.RESOLVE_FILE);
    paths.add(filePath);
  }
  return paths.build();
}
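
The file names that Example 5 passes to resolve() come from the format string "%012d%s", which zero-pads the index to twelve digits before appending the suffix. A small standalone sketch of just that naming step follows; the class name ExtractFileNames and the method names() are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that reproduces only the naming loop from Example 5.
final class ExtractFileNames {

  /** Returns the zero-padded ".avro" file names for indices 0..filesCount-1. */
  static List<String> names(long filesCount) {
    List<String> names = new ArrayList<>();
    for (long i = 0; i < filesCount; ++i) {
      names.add(String.format("%012d%s", i, ".avro"));
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println(names(3));
    // prints [000000000000.avro, 000000000001.avro, 000000000002.avro]
  }
}
```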
 
Example 6
Source File: FhirIO.java    From beam with Apache License 2.0
/**
 * Init batch.
 *
 * @throws IOException the io exception
 */
@StartBundle
public void initFile() throws IOException {
  // Write each bundle to a newline-delimited JSON file.
  String filename = String.format("fhirImportBatch-%s.ndjson", UUID.randomUUID().toString());
  ResourceId tempDir = FileSystems.matchNewResource(this.tempGcsPath.get(), true);
  this.resourceId = tempDir.resolve(filename, StandardResolveOptions.RESOLVE_FILE);
  this.ndJsonChannel = FileSystems.create(resourceId, "application/ld+json");
  if (mapper == null) {
    this.mapper = new ObjectMapper();
  }
}
 
Example 7
Source File: S3ResourceIdTest.java    From beam with Apache License 2.0
@Test
public void testResolve() {
  for (TestCase testCase : PATH_TEST_CASES) {
    ResourceId resourceId = S3ResourceId.fromUri(testCase.baseUri);
    ResourceId resolved = resourceId.resolve(testCase.relativePath, testCase.resolveOptions);
    assertEquals(testCase.expectedResult, resolved.toString());
  }

  // Tests for common s3 paths.
  assertEquals(
      S3ResourceId.fromUri("s3://bucket/tmp/aa"),
      S3ResourceId.fromUri("s3://bucket/tmp/").resolve("aa", RESOLVE_FILE));
  assertEquals(
      S3ResourceId.fromUri("s3://bucket/tmp/aa/bb/cc/"),
      S3ResourceId.fromUri("s3://bucket/tmp/")
          .resolve("aa", RESOLVE_DIRECTORY)
          .resolve("bb", RESOLVE_DIRECTORY)
          .resolve("cc", RESOLVE_DIRECTORY));

  // Tests absolute path.
  assertEquals(
      S3ResourceId.fromUri("s3://bucket/tmp/aa"),
      S3ResourceId.fromUri("s3://bucket/tmp/bb/").resolve("s3://bucket/tmp/aa", RESOLVE_FILE));

  // Tests bucket with no ending '/'.
  assertEquals(
      S3ResourceId.fromUri("s3://my-bucket/tmp"),
      S3ResourceId.fromUri("s3://my-bucket").resolve("tmp", RESOLVE_FILE));

  // Tests path with unicode
  assertEquals(
      S3ResourceId.fromUri("s3://bucket/输出 目录/输出 文件01.txt"),
      S3ResourceId.fromUri("s3://bucket/输出 目录/").resolve("输出 文件01.txt", RESOLVE_FILE));
}
 
Example 8
Source File: S3ResourceIdTest.java    From beam with Apache License 2.0
@Test
public void testResolveInvalidNotDirectory() {
  ResourceId tmpDir = S3ResourceId.fromUri("s3://my_bucket/").resolve("tmp dir", RESOLVE_FILE);

  thrown.expect(IllegalStateException.class);
  thrown.expectMessage(
      "Expected this resource to be a directory, but was [s3://my_bucket/tmp dir]");
  tmpDir.resolve("aa", RESOLVE_FILE);
}
 
Example 9
Source File: FileBasedSink.java    From beam with Apache License 2.0
/** Constructs a temporary file resource given the temporary directory and a filename. */
@Experimental(Kind.FILESYSTEM)
protected static ResourceId buildTemporaryFilename(ResourceId tempDirectory, String filename)
    throws IOException {
  return tempDirectory.resolve(filename, StandardResolveOptions.RESOLVE_FILE);
}
 
Example 10
Source File: S3ResourceIdTest.java    From beam with Apache License 2.0
@Test
public void testS3ResolveWithFileBase() {
  ResourceId resourceId = S3ResourceId.fromUri("s3://bucket/path/to/file");
  thrown.expect(IllegalStateException.class);
  resourceId.resolve("child-path", RESOLVE_DIRECTORY); // resource is not a directory
}
 
Example 11
Source File: S3ResourceIdTest.java    From beam with Apache License 2.0
@Test
public void testResolveParentToFile() {
  ResourceId resourceId = S3ResourceId.fromUri("s3://bucket/path/to/dir/");
  thrown.expect(IllegalArgumentException.class);
  resourceId.resolve("..", RESOLVE_FILE); // '..' only resolves as dir, not as file
}