org.apache.spark.partial.PartialResult Java Examples

The following examples show how to use org.apache.spark.partial.PartialResult.
Example #1
Source File: SparkJavaRDD.java    From incubator-nemo with Apache License 2.0
@Override
public PartialResult<BoundedDouble> countApprox(final long timeout) {
  throw new UnsupportedOperationException(NOT_YET_SUPPORTED);
}
 
Example #2
Source File: SparkJavaRDD.java    From incubator-nemo with Apache License 2.0
@Override
public PartialResult<BoundedDouble> countApprox(final long timeout, final double confidence) {
  throw new UnsupportedOperationException(NOT_YET_SUPPORTED);
}
 
Example #3
Source File: SparkJavaRDD.java    From incubator-nemo with Apache License 2.0
@Override
public PartialResult<Map<T, BoundedDouble>> countByValueApprox(final long timeout) {
  throw new UnsupportedOperationException(NOT_YET_SUPPORTED);
}
 
Example #4
Source File: SparkJavaRDD.java    From incubator-nemo with Apache License 2.0
@Override
public PartialResult<Map<T, BoundedDouble>> countByValueApprox(final long timeout, final double confidence) {
  throw new UnsupportedOperationException(NOT_YET_SUPPORTED);
}
 
Example #5
Source File: SparkJavaPairRDD.java    From incubator-nemo with Apache License 2.0
@Override
public PartialResult<Map<K, BoundedDouble>> countByKeyApprox(final long timeout) {
  throw new UnsupportedOperationException(NOT_YET_SUPPORTED);
}
 
Example #6
Source File: SparkJavaPairRDD.java    From incubator-nemo with Apache License 2.0
@Override
public PartialResult<Map<K, BoundedDouble>> countByKeyApprox(final long timeout,
                                                             final double confidence) {
  throw new UnsupportedOperationException(NOT_YET_SUPPORTED);
}
 
Example #7
Source File: SparkVerifier.java    From tablasco with Apache License 2.0
/**
 * Compares two HDFS datasets and produces a detailed yet compact HTML break report
 * @param dataName the name to use in the output HTML
 * @param actualDataSupplier the actual data supplier
 * @param expectedDataSupplier the expected data supplier
 * @return a SparkResult containing pass/fail and the HTML report
 */
public SparkResult verify(String dataName, Supplier<DistributedTable> actualDataSupplier, Supplier<DistributedTable> expectedDataSupplier)
{
    DistributedTable actualDistributedTable = actualDataSupplier.get();
    if (!new HashSet<>(actualDistributedTable.getHeaders()).containsAll(this.groupKeyColumns)) {
        throw new IllegalArgumentException("Actual data does not contain all group key columns: " + this.groupKeyColumns);
    }
    DistributedTable expectedDistributedTable = expectedDataSupplier.get();
    if (!new HashSet<>(expectedDistributedTable.getHeaders()).containsAll(this.groupKeyColumns)) {
        throw new IllegalArgumentException("Expected data does not contain all group key columns: " + this.groupKeyColumns);
    }
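    // countApprox returns after at most 5 seconds with an estimate at 90% confidence.
    PartialResult<BoundedDouble> countApproxPartialResult = expectedDistributedTable.getRows().countApprox(TimeUnit.SECONDS.toMillis(5), 0.9);
    // Note: getFinalValue() blocks until the count job has fully completed, so the timeout above bounds only the initial estimate.
    int maximumNumberOfGroups = getMaximumNumberOfGroups(countApproxPartialResult.getFinalValue(), maxGroupSize);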
    LOGGER.info("Maximum number of groups : " + maximumNumberOfGroups);
    Set<String> groupKeyColumnSet = new LinkedHashSet<>(this.groupKeyColumns);
    JavaPairRDD<Integer, Iterable<List<Object>>> actualGroups = actualDistributedTable.getRows()
            .mapToPair(new GroupRowsFunction(actualDistributedTable.getHeaders(), groupKeyColumnSet, maximumNumberOfGroups))
            .groupByKey();
    JavaPairRDD<Integer, Iterable<List<Object>>> expectedGroups = expectedDistributedTable.getRows()
            .mapToPair(new GroupRowsFunction(expectedDistributedTable.getHeaders(), groupKeyColumnSet, maximumNumberOfGroups))
            .groupByKey();
    JavaPairRDD<Integer, Tuple2<Optional<Iterable<List<Object>>>, Optional<Iterable<List<Object>>>>> joinedRdd = actualGroups.fullOuterJoin(expectedGroups);
    VerifyGroupFunction verifyGroupFunction = new VerifyGroupFunction(
            groupKeyColumnSet,
            actualDistributedTable.getHeaders(),
            expectedDistributedTable.getHeaders(),
            this.ignoreSurplusColumns,
            this.columnComparatorsBuilder.build(),
            this.columnsToIgnore);
    SummaryResultTable summaryResultTable = joinedRdd.map(verifyGroupFunction).reduce(new SummaryResultTableReducer());
    HtmlOptions htmlOptions = new HtmlOptions(false, HtmlFormatter.DEFAULT_ROW_LIMIT, false, false, false, Collections.emptySet());
    HtmlFormatter htmlFormatter = new HtmlFormatter(null, htmlOptions);
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try
    {
        htmlFormatter.appendResults(dataName, Collections.singletonMap("Summary", summaryResultTable), metadata, 1, null, bytes);
        return new SparkResult(summaryResultTable.isSuccess(), new String(bytes.toByteArray(), StandardCharsets.UTF_8));
    }
    catch (Exception e)
    {
        throw new RuntimeException(e);
    }
}
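 
Example #7 reads the count through getFinalValue(). Spark's PartialResult also exposes initialValue(), which returns whatever was known when the timeout expired, whereas getFinalValue() waits for the job to finish. The sketch below illustrates that split using only the JDK. It is a hypothetical model, not Spark's implementation: the class PartialResultSketch, the nested Partial type, and the countWithTimeout helper are all assumptions made up for illustration, and the estimate is passed in by hand rather than computed from partial task results.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical JDK-only model; none of these names come from Spark or tablasco.
public class PartialResultSketch {

    /** Holds a value known at the timeout and a future for the exact value. */
    static final class Partial<R> {
        private final R initialValue;
        private final CompletableFuture<R> finalValue;

        Partial(R initialValue, CompletableFuture<R> finalValue) {
            this.initialValue = initialValue;
            this.finalValue = finalValue;
        }

        /** Returns immediately with whatever was known when the timeout expired. */
        R initialValue() {
            return initialValue;
        }

        /** Blocks until the computation has finished, however long that takes. */
        R getFinalValue() {
            return finalValue.join();
        }
    }

    /**
     * Waits at most timeoutMillis for the exact count; if it is not ready,
     * falls back to the supplied estimate as the initial value.
     */
    static Partial<Long> countWithTimeout(CompletableFuture<Long> exactCount,
                                          long estimate,
                                          long timeoutMillis) {
        Long initial;
        try {
            initial = exactCount.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            initial = estimate; // exact answer not ready: report the estimate
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            initial = estimate;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return new Partial<>(initial, exactCount);
    }

    public static void main(String[] args) {
        CompletableFuture<Long> slowCount = new CompletableFuture<>();
        // The "job" has not finished after 10 ms, so the estimate is returned.
        Partial<Long> partial = countWithTimeout(slowCount, 950L, 10L);
        System.out.println("initial = " + partial.initialValue());
        slowCount.complete(1000L); // the "job" finishes later
        System.out.println("final   = " + partial.getFinalValue());
    }
}
```

In this model a caller that must respect the timeout would read initialValue(); a caller that needs the exact figure and can afford to wait would call getFinalValue(), as Example #7 does.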
 
Example #8
Source File: JavaRDD.java    From nemo with Apache License 2.0
@Override
public PartialResult<BoundedDouble> countApprox(final long timeout) {
  throw new UnsupportedOperationException("Operation not yet implemented.");
}
 
Example #9
Source File: JavaRDD.java    From nemo with Apache License 2.0
@Override
public PartialResult<BoundedDouble> countApprox(final long timeout, final double confidence) {
  throw new UnsupportedOperationException("Operation not yet implemented.");
}
 
Example #10
Source File: JavaRDD.java    From nemo with Apache License 2.0
@Override
public PartialResult<Map<T, BoundedDouble>> countByValueApprox(final long timeout) {
  throw new UnsupportedOperationException("Operation not yet implemented.");
}
 
Example #11
Source File: JavaRDD.java    From nemo with Apache License 2.0
@Override
public PartialResult<Map<T, BoundedDouble>> countByValueApprox(final long timeout, final double confidence) {
  throw new UnsupportedOperationException("Operation not yet implemented.");
}