org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter Java Examples

The following examples show how to use org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter.
Example #1
Source File: HadoopReadOptions.java    From parquet-mr with Apache License 2.0
private HadoopReadOptions(boolean useSignedStringMinMax,
                          boolean useStatsFilter,
                          boolean useDictionaryFilter,
                          boolean useRecordFilter,
                          boolean useColumnIndexFilter,
                          boolean usePageChecksumVerification,
                          boolean useBloomFilter,
                          FilterCompat.Filter recordFilter,
                          MetadataFilter metadataFilter,
                          CompressionCodecFactory codecFactory,
                          ByteBufferAllocator allocator,
                          int maxAllocationSize,
                          Map<String, String> properties,
                          Configuration conf) {
  super(
      useSignedStringMinMax, useStatsFilter, useDictionaryFilter, useRecordFilter, useColumnIndexFilter,
      usePageChecksumVerification, useBloomFilter, recordFilter, metadataFilter, codecFactory, allocator,
      maxAllocationSize, properties
  );
  this.conf = conf;
}
 
Example #2
Source File: SingletonParquetFooterCache.java    From dremio-oss with Apache License 2.0
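/**
 * An updated footer reader that tries to read the entire footer without knowing the length.
 * This should reduce the number of seek/read round trips in most workloads.
 * @param fs the file system used to open the file
 * @param attributes attributes of the file to read; its path and size are used
 * @param filter the metadata filter to apply when reading the footer
 * @param maxFooterLen the maximum footer length
 * @return the metadata read from the footer
 * @throws IOException if the file cannot be opened or read
 */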
public static ParquetMetadata readFooter(
  final FileSystem fs,
  final FileAttributes attributes,
  ParquetMetadataConverter.MetadataFilter filter,
  long maxFooterLen) throws IOException {
  try (BulkInputStream file = BulkInputStream.wrap(Streams.wrap(fs.open(attributes.getPath())))) {
    return readFooter(file, attributes.getPath().toString(), attributes.size(), filter, fs, maxFooterLen);
  }
}
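
The Javadoc above describes reading the footer without knowing its length in advance. A minimal, standard-library-only sketch of that idea follows, assuming the usual Parquet layout in which a file ends with the serialized footer, a 4-byte little-endian footer length, and the 4-byte magic PAR1. The class FooterTailSketch and its methods are hypothetical names for illustration; this is not Dremio's implementation, whose inner readFooter overload is not shown in the excerpt.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of dremio-oss or parquet-mr.
final class FooterTailSketch {
  // Assumed layout at the end of a file: <footer bytes> <4-byte LE length> <"PAR1">.
  private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);
  private static final int TRAILER_SIZE = 4 + MAGIC.length;

  // Returns the footer length stored in the trailer of a speculatively read tail.
  static int footerLength(byte[] tail) {
    if (tail.length < TRAILER_SIZE) {
      throw new IllegalArgumentException("tail is shorter than the 8-byte trailer");
    }
    int magicStart = tail.length - MAGIC.length;
    for (int i = 0; i < MAGIC.length; i++) {
      if (tail[magicStart + i] != MAGIC[i]) {
        throw new IllegalArgumentException("missing PAR1 magic at end of tail");
      }
    }
    int length = ByteBuffer.wrap(tail, tail.length - TRAILER_SIZE, 4)
        .order(ByteOrder.LITTLE_ENDIAN)
        .getInt();
    if (length < 0) {
      throw new IllegalArgumentException("negative footer length: " + length);
    }
    return length;
  }

  // True when the tail that was already read contains the whole footer,
  // so no second seek/read is needed.
  static boolean tailCoversFooter(byte[] tail) {
    return (long) footerLength(tail) + TRAILER_SIZE <= tail.length;
  }

  public static void main(String[] args) {
    // Three made-up footer bytes, then length 3 (little-endian), then the magic.
    byte[] tail = {10, 20, 30, 3, 0, 0, 0, 'P', 'A', 'R', '1'};
    System.out.println("footer length: " + footerLength(tail));
    System.out.println("tail covers footer: " + tailCoversFooter(tail));
  }
}
```

When tailCoversFooter returns false, a reader would need one more read for the missing bytes. A cap such as maxFooterLen in the example above presumably bounds how large a footer the reader accepts, though the excerpt does not show how it is applied.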
 
Example #3
Source File: ParquetFileReader.java    From parquet-mr with Apache License 2.0
/**
 * Reads the metadata block in the footer of the given input file
 * @param file an {@link InputFile} to read
 * @param filter the filter to apply to row groups
 * @return the metadata blocks in the footer
 * @throws IOException if an error occurs while reading the file
 * @deprecated will be removed in 2.0.0;
 *             use {@link ParquetFileReader#open(InputFile, ParquetReadOptions)}
 */
@Deprecated
public static final ParquetMetadata readFooter(InputFile file, MetadataFilter filter) throws IOException {
  ParquetReadOptions options;
  if (file instanceof HadoopInputFile) {
    options = HadoopReadOptions.builder(((HadoopInputFile) file).getConfiguration())
        .withMetadataFilter(filter).build();
  } else {
    options = ParquetReadOptions.builder().withMetadataFilter(filter).build();
  }

  try (SeekableInputStream in = file.newStream()) {
    return readFooter(file, options, in);
  }
}
 
Example #4
Source File: SingletonParquetFooterCache.java    From dremio-oss with Apache License 2.0
public static ParquetMetadata readFooter(final FileSystem fs, final Path file, ParquetMetadataConverter.MetadataFilter filter,
                                         long maxFooterLen) throws IOException {
  return readFooter(fs, fs.getFileAttributes(file), filter, maxFooterLen);
}
 
Example #5
Source File: ParquetFileReader.java    From parquet-mr with Apache License 2.0
private static MetadataFilter filter(boolean skipRowGroups) {
  return skipRowGroups ? SKIP_ROW_GROUPS : NO_FILTER;
}
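
The helper above only chooses between two predefined filters. As a rough, standard-library-only model of what a metadata filter does to the row groups listed in a footer: NO_FILTER keeps all of them, SKIP_ROW_GROUPS drops them so that only file-level metadata remains, and a range filter keeps the row groups whose midpoint falls inside a byte range. Everything in this sketch (MetadataFilterSketch, RowGroup, Filter) is a hypothetical stand-in, not the parquet-mr classes, and the midpoint rule is written from the documented behavior of range filtering rather than copied from the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model for illustration; not the parquet-mr implementation.
final class MetadataFilterSketch {
  // Stand-in for a row group: where it starts and how many compressed bytes it spans.
  static final class RowGroup {
    final long startOffset;
    final long compressedSize;

    RowGroup(long startOffset, long compressedSize) {
      this.startOffset = startOffset;
      this.compressedSize = compressedSize;
    }

    long midpoint() {
      return startOffset + compressedSize / 2;
    }
  }

  // Stand-in for MetadataFilter: maps the footer's row groups to the ones kept.
  interface Filter {
    List<RowGroup> apply(List<RowGroup> groups);
  }

  static final Filter NO_FILTER = groups -> groups;
  static final Filter SKIP_ROW_GROUPS = groups -> Collections.emptyList();

  // Keeps row groups whose midpoint lies in [start, end).
  static Filter range(long start, long end) {
    return groups -> {
      List<RowGroup> kept = new ArrayList<>();
      for (RowGroup g : groups) {
        long mid = g.midpoint();
        if (mid >= start && mid < end) {
          kept.add(g);
        }
      }
      return kept;
    };
  }

  // Same selection logic as the example above.
  static Filter filter(boolean skipRowGroups) {
    return skipRowGroups ? SKIP_ROW_GROUPS : NO_FILTER;
  }

  public static void main(String[] args) {
    List<RowGroup> groups = Arrays.asList(new RowGroup(4, 100), new RowGroup(104, 100));
    System.out.println("skip: " + filter(true).apply(groups).size());
    System.out.println("none: " + filter(false).apply(groups).size());
    System.out.println("range [0, 100): " + range(0, 100).apply(groups).size());
  }
}
```

Using the midpoint means each row group belongs to exactly one byte range even when it straddles a boundary, which is why range filters suit splitting a file across parallel readers.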
 
Example #6
Source File: ParquetReader.java    From reef with Apache License 2.0
/**
 * Retrieve the Avro schema from a Parquet file.
 * @param configuration Hadoop configuration.
 * @param filter Metadata filter to apply when reading the Parquet footer.
 * @return Avro schema converted from the Parquet file's schema.
 * @throws IOException if the footer of the Parquet file couldn't be read.
 */
private Schema createAvroSchema(final Configuration configuration, final MetadataFilter filter) throws IOException {
  final ParquetMetadata footer = ParquetFileReader.readFooter(configuration, parquetFilePath, filter);
  final AvroSchemaConverter converter = new AvroSchemaConverter();
  final MessageType schema = footer.getFileMetaData().getSchema();
  return converter.convert(schema);
}
 
Example #7
Source File: ParquetFileReader.java    From parquet-mr with Apache License 2.0
/**
 * Reads the metadata in the footer of the file,
 * skipping row groups (or not) based on the provided filter.
 * @param configuration a configuration
 * @param file the Parquet File
 * @param filter the filter to apply to row groups
 * @return the metadata with row groups filtered.
 * @throws IOException if an error occurs while reading the file
 * @deprecated will be removed in 2.0.0;
 *             use {@link ParquetFileReader#open(InputFile, ParquetReadOptions)}
 */
@Deprecated
public static ParquetMetadata readFooter(Configuration configuration, Path file, MetadataFilter filter) throws IOException {
  return readFooter(HadoopInputFile.fromPath(file, configuration), filter);
}
 
Example #8
Source File: ParquetFileReader.java    From parquet-mr with Apache License 2.0
/**
 * Reads the meta data block in the footer of the file
 * @param configuration a configuration
 * @param file the parquet File
 * @param filter the filter to apply to row groups
 * @return the metadata blocks in the footer
 * @throws IOException if an error occurs while reading the file
 * @deprecated will be removed in 2.0.0;
 *             use {@link ParquetFileReader#open(InputFile, ParquetReadOptions)}
 */
@Deprecated
public static final ParquetMetadata readFooter(Configuration configuration, FileStatus file, MetadataFilter filter) throws IOException {
  return readFooter(HadoopInputFile.fromStatus(file, configuration), filter);
}
 
Example #9
Source File: ParquetFileReader.java    From parquet-mr with Apache License 2.0
/**
 * @param conf a configuration
 * @param file a file path to open
 * @param filter a metadata filter
 * @return a parquet file reader
 * @throws IOException if there is an error while opening the file
 * @deprecated will be removed in 2.0.0; use {@link #open(InputFile,ParquetReadOptions)}
 */
@Deprecated
public static ParquetFileReader open(Configuration conf, Path file, MetadataFilter filter) throws IOException {
  return open(HadoopInputFile.fromPath(file, conf),
      HadoopReadOptions.builder(conf).withMetadataFilter(filter).build());
}
 
Example #10
Source File: ParquetFileReader.java    From parquet-mr with Apache License 2.0
/**
 * @param conf the Hadoop Configuration
 * @param file Path to a parquet file
 * @param filter a {@link MetadataFilter} for selecting row groups
 * @throws IOException if the file cannot be opened
 * @deprecated will be removed in 2.0.0.
 */
@Deprecated
public ParquetFileReader(Configuration conf, Path file, MetadataFilter filter) throws IOException {
  this(HadoopInputFile.fromPath(file, conf),
      HadoopReadOptions.builder(conf).withMetadataFilter(filter).build());
}