org.apache.flink.api.java.Utils Java Examples

The following examples show how to use org.apache.flink.api.java.Utils. They are taken from open source projects; the source file, project, and license are noted above each example.
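Most of the examples on this page pass Utils.getCallLocationName() to the TypeExtractor or to an operator constructor. The method inspects the current stack trace and returns a short description of the call site, which Flink then uses to label the created operator with the line of user code that defined it. Utils is an internal helper class, so application code does not normally call it directly; the following minimal sketch (class name is illustrative, and the exact format of the returned string may differ between Flink versions) only shows what the value looks like:

import org.apache.flink.api.java.Utils;

public class CallLocationDemo {

	public static void main(String[] args) {
		// Returns something along the lines of "main(CallLocationDemo.java:7)";
		// the stack depth used to pick the frame is an internal detail of Utils.
		String location = Utils.getCallLocationName();
		System.out.println(location);
	}
}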
Example #1
Source File: DataStream.java    From Flink-CEPplus with Apache License 2.0
/**
 * Applies the given {@link ProcessFunction} on the input stream, thereby
 * creating a transformed output stream.
 *
 * <p>The function will be called for every element in the input streams and can produce zero
 * or more output elements.
 *
 * @param processFunction The {@link ProcessFunction} that is called for each element
 *                      in the stream.
 *
 * @param <R> The type of elements emitted by the {@code ProcessFunction}.
 *
 * @return The transformed {@link DataStream}.
 */
@PublicEvolving
public <R> SingleOutputStreamOperator<R> process(ProcessFunction<T, R> processFunction) {

	TypeInformation<R> outType = TypeExtractor.getUnaryOperatorReturnType(
		processFunction,
		ProcessFunction.class,
		0,
		1,
		TypeExtractor.NO_INDEX,
		getType(),
		Utils.getCallLocationName(),
		true);

	return process(processFunction, outType);
}
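For context, a minimal, hedged usage sketch of the method above: the anonymous ProcessFunction lets the TypeExtractor call shown in process(...) infer the output type automatically (stream contents and names are illustrative):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class ProcessFunctionDemo {

	public static void main(String[] args) throws Exception {
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		DataStream<String> words = env.fromElements("flink", "utils", "example");

		// The output type Integer is inferred by TypeExtractor.getUnaryOperatorReturnType,
		// exactly as shown in the process(...) method above.
		DataStream<Integer> lengths = words.process(new ProcessFunction<String, Integer>() {
			@Override
			public void processElement(String value, Context ctx, Collector<Integer> out) {
				out.collect(value.length());
			}
		});

		lengths.print();
		env.execute("process function demo");
	}
}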
 
Example #2
Source File: BroadcastConnectedStream.java    From Flink-CEPplus with Apache License 2.0
/**
 * Assumes as inputs a {@link BroadcastStream} and a {@link KeyedStream} and applies the given
 * {@link KeyedBroadcastProcessFunction} on them, thereby creating a transformed output stream.
 *
 * @param function The {@link KeyedBroadcastProcessFunction} that is called for each element in the stream.
 * @param <KS> The type of the keys in the keyed stream.
 * @param <OUT> The type of the output elements.
 * @return The transformed {@link DataStream}.
 */
@PublicEvolving
public <KS, OUT> SingleOutputStreamOperator<OUT> process(final KeyedBroadcastProcessFunction<KS, IN1, IN2, OUT> function) {

	TypeInformation<OUT> outTypeInfo = TypeExtractor.getBinaryOperatorReturnType(
			function,
			KeyedBroadcastProcessFunction.class,
			1,
			2,
			3,
			TypeExtractor.NO_INDEX,
			getType1(),
			getType2(),
			Utils.getCallLocationName(),
			true);

	return process(function, outTypeInfo);
}
 
Example #3
Source File: ConnectedStreams.java    From Flink-CEPplus with Apache License 2.0
/**
 * Applies a CoFlatMap transformation on a {@link ConnectedStreams} and
 * maps the output to a common type. The transformation calls a
 * {@link CoFlatMapFunction#flatMap1} for each element of the first input
 * and {@link CoFlatMapFunction#flatMap2} for each element of the second
 * input. Each CoFlatMapFunction call returns any number of elements
 * including none.
 *
 * @param coFlatMapper
 *            The CoFlatMapFunction used to jointly transform the two input
 *            DataStreams
 * @return The transformed {@link DataStream}
 */
public <R> SingleOutputStreamOperator<R> flatMap(
		CoFlatMapFunction<IN1, IN2, R> coFlatMapper) {

	TypeInformation<R> outTypeInfo = TypeExtractor.getBinaryOperatorReturnType(
		coFlatMapper,
		CoFlatMapFunction.class,
		0,
		1,
		2,
		TypeExtractor.NO_INDEX,
		getType1(),
		getType2(),
		Utils.getCallLocationName(),
		true);

	return transform("Co-Flat Map", outTypeInfo, new CoStreamFlatMap<>(inputStream1.clean(coFlatMapper)));
}
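A hedged usage sketch for the CoFlatMap transformation above: two streams of different types are connected and merged into a common output type (names and data are illustrative):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

public class CoFlatMapDemo {

	public static void main(String[] args) throws Exception {
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		DataStream<String> names = env.fromElements("alice", "bob");
		DataStream<Integer> ids = env.fromElements(1, 2, 3);

		// connect(...) produces a ConnectedStreams<String, Integer>; the output type String
		// is inferred by the getBinaryOperatorReturnType call shown above.
		DataStream<String> merged = names.connect(ids)
				.flatMap(new CoFlatMapFunction<String, Integer, String>() {
					@Override
					public void flatMap1(String value, Collector<String> out) {
						out.collect("name:" + value);
					}

					@Override
					public void flatMap2(Integer value, Collector<String> out) {
						out.collect("id:" + value);
					}
				});

		merged.print();
		env.execute("co-flatmap demo");
	}
}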
 
Example #4
Source File: ConnectedStreams.java    From Flink-CEPplus with Apache License 2.0
/**
 * Applies the given {@link CoProcessFunction} on the connected input streams,
 * thereby creating a transformed output stream.
 *
 * <p>The function will be called for every element in the input streams and can produce zero or
 * more output elements. Contrary to the {@link #flatMap(CoFlatMapFunction)} function, this
 * function can also query the time and set timers. When reacting to the firing of set timers
 * the function can directly emit elements and/or register yet more timers.
 *
 * @param coProcessFunction The {@link CoProcessFunction} that is called for each element
 *                      in the stream.
 *
 * @param <R> The type of elements emitted by the {@code CoProcessFunction}.
 *
 * @return The transformed {@link DataStream}.
 */
@PublicEvolving
public <R> SingleOutputStreamOperator<R> process(
		CoProcessFunction<IN1, IN2, R> coProcessFunction) {

	TypeInformation<R> outTypeInfo = TypeExtractor.getBinaryOperatorReturnType(
		coProcessFunction,
		CoProcessFunction.class,
		0,
		1,
		2,
		TypeExtractor.NO_INDEX,
		getType1(),
		getType2(),
		Utils.getCallLocationName(),
		true);

	return process(coProcessFunction, outTypeInfo);
}
 
Example #5
Source File: CsvReader.java    From flink with Apache License 2.0
/**
 * Configures the reader to read the CSV data and parse it to the given type. The type must be a subclass of
 * {@link Tuple}. The type information for the fields is obtained from the type class. The type
 * consequently needs to specify all generic field types of the tuple.
 *
 * @param targetType The class of the target type, needs to be a subclass of Tuple.
 * @return The DataSet representing the parsed CSV data.
 */
public <T extends Tuple> DataSource<T> tupleType(Class<T> targetType) {
	Preconditions.checkNotNull(targetType, "The target type class must not be null.");
	if (!Tuple.class.isAssignableFrom(targetType)) {
		throw new IllegalArgumentException("The target type must be a subclass of " + Tuple.class.getName());
	}

	@SuppressWarnings("unchecked")
	TupleTypeInfo<T> typeInfo = (TupleTypeInfo<T>) TypeExtractor.createTypeInfo(targetType);
	CsvInputFormat<T> inputFormat = new TupleCsvInputFormat<T>(path, this.lineDelimiter, this.fieldDelimiter, typeInfo, this.includedMask);

	Class<?>[] classes = new Class<?>[typeInfo.getArity()];
	for (int i = 0; i < typeInfo.getArity(); i++) {
		classes[i] = typeInfo.getTypeAt(i).getTypeClass();
	}

	configureInputFormat(inputFormat);
	return new DataSource<T>(executionContext, inputFormat, typeInfo, Utils.getCallLocationName());
}
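A hedged sketch of how tupleType(...) is typically called: because the target class has to pin down all generic field types, the caller defines a small Tuple subclass. The WordCount class and the file path below are hypothetical:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class CsvTupleTypeDemo {

	// A concrete Tuple subclass, so the field types String and Integer are
	// visible to the TypeExtractor as required by tupleType(...).
	public static class WordCount extends Tuple2<String, Integer> {
	}

	public static void main(String[] args) throws Exception {
		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

		// Hypothetical input file with lines such as "flink,3".
		DataSet<WordCount> counts = env
				.readCsvFile("/path/to/wordcounts.csv")
				.tupleType(WordCount.class);

		counts.print();
	}
}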
 
Example #6
Source File: BroadcastConnectedStream.java    From flink with Apache License 2.0
/**
 * Assumes as inputs a {@link BroadcastStream} and a {@link KeyedStream} and applies the given
 * {@link KeyedBroadcastProcessFunction} on them, thereby creating a transformed output stream.
 *
 * @param function The {@link KeyedBroadcastProcessFunction} that is called for each element in the stream.
 * @param <KS> The type of the keys in the keyed stream.
 * @param <OUT> The type of the output elements.
 * @return The transformed {@link DataStream}.
 */
@PublicEvolving
public <KS, OUT> SingleOutputStreamOperator<OUT> process(final KeyedBroadcastProcessFunction<KS, IN1, IN2, OUT> function) {

	TypeInformation<OUT> outTypeInfo = TypeExtractor.getBinaryOperatorReturnType(
			function,
			KeyedBroadcastProcessFunction.class,
			1,
			2,
			3,
			TypeExtractor.NO_INDEX,
			getType1(),
			getType2(),
			Utils.getCallLocationName(),
			true);

	return process(function, outTypeInfo);
}
 
Example #7
Source File: JoinOperatorSetsBase.java    From flink with Apache License 2.0
protected DefaultJoin<I1, I2> createDefaultJoin(Keys<I2> keys2) {
	if (keys2 == null) {
		throw new NullPointerException("The join keys may not be null.");
	}

	if (keys2.isEmpty()) {
		throw new InvalidProgramException("The join keys may not be empty.");
	}

	try {
		keys1.areCompatible(keys2);
	} catch (Keys.IncompatibleKeysException e) {
		throw new InvalidProgramException("The pair of join keys are not compatible with each other.", e);
	}
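	// The explicit stack depth (4) is presumably chosen so that the reported call location
	// points at the user's join(...).where(...).equalTo(...) call rather than at this internal helper.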
	return new DefaultJoin<>(input1, input2, keys1, keys2, joinHint, Utils.getCallLocationName(4), joinType);
}
 
Example #8
Source File: ConnectedStreams.java    From flink with Apache License 2.0
/**
 * Applies the given {@link CoProcessFunction} on the connected input streams,
 * thereby creating a transformed output stream.
 *
 * <p>The function will be called for every element in the input streams and can produce zero or
 * more output elements. Contrary to the {@link #flatMap(CoFlatMapFunction)} function, this
 * function can also query the time and set timers. When reacting to the firing of set timers
 * the function can directly emit elements and/or register yet more timers.
 *
 * @param coProcessFunction The {@link CoProcessFunction} that is called for each element
 *                      in the stream.
 *
 * @param <R> The type of elements emitted by the {@code CoProcessFunction}.
 *
 * @return The transformed {@link DataStream}.
 */
@PublicEvolving
public <R> SingleOutputStreamOperator<R> process(
		CoProcessFunction<IN1, IN2, R> coProcessFunction) {

	TypeInformation<R> outTypeInfo = TypeExtractor.getBinaryOperatorReturnType(
		coProcessFunction,
		CoProcessFunction.class,
		0,
		1,
		2,
		TypeExtractor.NO_INDEX,
		getType1(),
		getType2(),
		Utils.getCallLocationName(),
		true);

	return process(coProcessFunction, outTypeInfo);
}
 
Example #9
Source File: ConnectedStreams.java    From flink with Apache License 2.0
/**
 * Applies a CoMap transformation on a {@link ConnectedStreams} and maps
 * the output to a common type. The transformation calls a
 * {@link CoMapFunction#map1} for each element of the first input and
 * {@link CoMapFunction#map2} for each element of the second input. Each
 * CoMapFunction call returns exactly one element.
 *
 * @param coMapper The CoMapFunction used to jointly transform the two input DataStreams
 * @return The transformed {@link DataStream}
 */
public <R> SingleOutputStreamOperator<R> map(CoMapFunction<IN1, IN2, R> coMapper) {

	TypeInformation<R> outTypeInfo = TypeExtractor.getBinaryOperatorReturnType(
		coMapper,
		CoMapFunction.class,
		0,
		1,
		2,
		TypeExtractor.NO_INDEX,
		getType1(),
		getType2(),
		Utils.getCallLocationName(),
		true);

	return map(coMapper, outTypeInfo);
}
 
Example #10
Source File: KeyedStream.java    From flink with Apache License 2.0
/**
 * Applies the given {@link ProcessFunction} on the input stream, thereby creating a transformed output stream.
 *
 * <p>The function will be called for every element in the input streams and can produce zero
 * or more output elements. Contrary to the {@link DataStream#flatMap(FlatMapFunction)}
 * function, this function can also query the time and set timers. When reacting to the firing
 * of set timers the function can directly emit elements and/or register yet more timers.
 *
 * @param processFunction The {@link ProcessFunction} that is called for each element
 *                      in the stream.
 *
 * @param <R> The type of elements emitted by the {@code ProcessFunction}.
 *
 * @return The transformed {@link DataStream}.
 *
 * @deprecated Use {@link KeyedStream#process(KeyedProcessFunction)}
 */
@Deprecated
@Override
@PublicEvolving
public <R> SingleOutputStreamOperator<R> process(ProcessFunction<T, R> processFunction) {

	TypeInformation<R> outType = TypeExtractor.getUnaryOperatorReturnType(
		processFunction,
		ProcessFunction.class,
		0,
		1,
		TypeExtractor.NO_INDEX,
		getType(),
		Utils.getCallLocationName(),
		true);

	return process(processFunction, outType);
}
 
Example #11
Source File: DataStream.java    From flink with Apache License 2.0
/**
 * Applies the given {@link ProcessFunction} on the input stream, thereby
 * creating a transformed output stream.
 *
 * <p>The function will be called for every element in the input streams and can produce zero
 * or more output elements.
 *
 * @param processFunction The {@link ProcessFunction} that is called for each element
 *                      in the stream.
 *
 * @param <R> The type of elements emitted by the {@code ProcessFunction}.
 *
 * @return The transformed {@link DataStream}.
 */
@PublicEvolving
public <R> SingleOutputStreamOperator<R> process(ProcessFunction<T, R> processFunction) {

	TypeInformation<R> outType = TypeExtractor.getUnaryOperatorReturnType(
		processFunction,
		ProcessFunction.class,
		0,
		1,
		TypeExtractor.NO_INDEX,
		getType(),
		Utils.getCallLocationName(),
		true);

	return process(processFunction, outType);
}
 
Example #12
Source File: BroadcastConnectedStream.java    From flink with Apache License 2.0
/**
 * Assumes as inputs a {@link BroadcastStream} and a non-keyed {@link DataStream} and applies the given
 * {@link BroadcastProcessFunction} on them, thereby creating a transformed output stream.
 *
 * @param function The {@link BroadcastProcessFunction} that is called for each element in the stream.
 * @param <OUT> The type of the output elements.
 * @return The transformed {@link DataStream}.
 */
@PublicEvolving
public <OUT> SingleOutputStreamOperator<OUT> process(final BroadcastProcessFunction<IN1, IN2, OUT> function) {

	TypeInformation<OUT> outTypeInfo = TypeExtractor.getBinaryOperatorReturnType(
			function,
			BroadcastProcessFunction.class,
			0,
			1,
			2,
			TypeExtractor.NO_INDEX,
			getType1(),
			getType2(),
			Utils.getCallLocationName(),
			true);

	return process(function, outTypeInfo);
}
 
Example #13
Source File: KeyedStream.java    From Flink-CEPplus with Apache License 2.0
/**
 * Applies the given {@link KeyedProcessFunction} on the input stream, thereby creating a transformed output stream.
 *
 * <p>The function will be called for every element in the input streams and can produce zero
 * or more output elements. Contrary to the {@link DataStream#flatMap(FlatMapFunction)}
 * function, this function can also query the time and set timers. When reacting to the firing
 * of set timers the function can directly emit elements and/or register yet more timers.
 *
 * @param keyedProcessFunction The {@link KeyedProcessFunction} that is called for each element in the stream.
 *
 * @param <R> The type of elements emitted by the {@code KeyedProcessFunction}.
 *
 * @return The transformed {@link DataStream}.
 */
@PublicEvolving
public <R> SingleOutputStreamOperator<R> process(KeyedProcessFunction<KEY, T, R> keyedProcessFunction) {

	TypeInformation<R> outType = TypeExtractor.getUnaryOperatorReturnType(
			keyedProcessFunction,
			KeyedProcessFunction.class,
			1,
			2,
			TypeExtractor.NO_INDEX,
			getType(),
			Utils.getCallLocationName(),
			true);

	return process(keyedProcessFunction, outType);
}
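A hedged usage sketch for the keyed variant above; an explicit KeySelector is used so that both the key type and the output type can be inferred without extra type hints (data and names are illustrative):

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class KeyedProcessFunctionDemo {

	public static void main(String[] args) throws Exception {
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		DataStream<Tuple2<String, Long>> events = env.fromElements(
				Tuple2.of("sensor-1", 10L), Tuple2.of("sensor-2", 20L), Tuple2.of("sensor-1", 30L));

		DataStream<String> described = events
				.keyBy(new KeySelector<Tuple2<String, Long>, String>() {
					@Override
					public String getKey(Tuple2<String, Long> value) {
						return value.f0;
					}
				})
				.process(new KeyedProcessFunction<String, Tuple2<String, Long>, String>() {
					@Override
					public void processElement(Tuple2<String, Long> value, Context ctx, Collector<String> out) {
						// ctx also exposes timers via ctx.timerService(); they are not used in this sketch.
						out.collect(value.f0 + " -> " + value.f1);
					}
				});

		described.print();
		env.execute("keyed process function demo");
	}
}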
 
Example #14
Source File: BroadcastConnectedStream.java    From Flink-CEPplus with Apache License 2.0
/**
 * Assumes as inputs a {@link BroadcastStream} and a non-keyed {@link DataStream} and applies the given
 * {@link BroadcastProcessFunction} on them, thereby creating a transformed output stream.
 *
 * @param function The {@link BroadcastProcessFunction} that is called for each element in the stream.
 * @param <OUT> The type of the output elements.
 * @return The transformed {@link DataStream}.
 */
@PublicEvolving
public <OUT> SingleOutputStreamOperator<OUT> process(final BroadcastProcessFunction<IN1, IN2, OUT> function) {

	TypeInformation<OUT> outTypeInfo = TypeExtractor.getBinaryOperatorReturnType(
			function,
			BroadcastProcessFunction.class,
			0,
			1,
			2,
			TypeExtractor.NO_INDEX,
			getType1(),
			getType2(),
			Utils.getCallLocationName(),
			true);

	return process(function, outTypeInfo);
}
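A hedged sketch of the broadcast pattern that leads to the process(...) call above: one stream is broadcast with a MapStateDescriptor and connected to a plain, non-keyed stream (descriptor name, data, and the "keep:" convention are illustrative):

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class BroadcastProcessFunctionDemo {

	public static void main(String[] args) throws Exception {
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		final MapStateDescriptor<String, String> rulesDescriptor =
				new MapStateDescriptor<>("rules", Types.STRING, Types.STRING);

		DataStream<String> events = env.fromElements("a", "b", "c");
		BroadcastStream<String> rules = env.fromElements("keep:a", "keep:b").broadcast(rulesDescriptor);

		DataStream<String> tagged = events.connect(rules)
				.process(new BroadcastProcessFunction<String, String, String>() {
					@Override
					public void processElement(String value, ReadOnlyContext ctx, Collector<String> out) throws Exception {
						// The non-broadcast side has read-only access to the broadcast state.
						boolean keep = ctx.getBroadcastState(rulesDescriptor).contains(value);
						out.collect((keep ? "kept:" : "dropped:") + value);
					}

					@Override
					public void processBroadcastElement(String rule, Context ctx, Collector<String> out) throws Exception {
						// The broadcast side may update the broadcast state.
						ctx.getBroadcastState(rulesDescriptor).put(rule.substring("keep:".length()), rule);
					}
				});

		tagged.print();
		env.execute("broadcast process function demo");
	}
}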
 
Example #15
Source File: WindowedStream.java    From flink with Apache License 2.0
/**
 * Applies the given fold function to each window. The window function is called for each
 * evaluation of the window for each key individually. The output of the fold function is
 * interpreted as a regular non-windowed stream.
 *
 * @param function The fold function.
 * @return The data stream that is the result of applying the fold function to the window.
 *
 * @deprecated use {@link #aggregate(AggregateFunction)} instead
 */
@Deprecated
public <R> SingleOutputStreamOperator<R> fold(R initialValue, FoldFunction<T, R> function) {
	if (function instanceof RichFunction) {
		throw new UnsupportedOperationException("FoldFunction can not be a RichFunction. " +
			"Please use fold(FoldFunction, WindowFunction) instead.");
	}

	TypeInformation<R> resultType = TypeExtractor.getFoldReturnTypes(function, input.getType(),
			Utils.getCallLocationName(), true);

	return fold(initialValue, function, resultType);
}
 
Example #16
Source File: WindowedStream.java    From Flink-CEPplus with Apache License 2.0
/**
 * Applies the given fold function to each window. The window function is called for each
 * evaluation of the window for each key individually. The output of the fold function is
 * interpreted as a regular non-windowed stream.
 *
 * @param function The fold function.
 * @return The data stream that is the result of applying the fold function to the window.
 *
 * @deprecated use {@link #aggregate(AggregateFunction)} instead
 */
@Deprecated
public <R> SingleOutputStreamOperator<R> fold(R initialValue, FoldFunction<T, R> function) {
	if (function instanceof RichFunction) {
		throw new UnsupportedOperationException("FoldFunction can not be a RichFunction. " +
			"Please use fold(FoldFunction, WindowFunction) instead.");
	}

	TypeInformation<R> resultType = TypeExtractor.getFoldReturnTypes(function, input.getType(),
			Utils.getCallLocationName(), true);

	return fold(initialValue, function, resultType);
}
 
Example #17
Source File: DataSetUtilsITCase.java    From flink with Apache License 2.0
@Test
public void testIntegerDataSetChecksumHashCode() throws Exception {
	final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

	DataSet<Integer> ds = CollectionDataSets.getIntegerDataSet(env);

	Utils.ChecksumHashCode checksum = DataSetUtils.checksumHashCode(ds);
	Assert.assertEquals(checksum.getCount(), 15);
	Assert.assertEquals(checksum.getChecksum(), 55);
}
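For orientation, a hedged sketch of the same utility outside a test: checksumHashCode executes the program eagerly and reports the element count together with a checksum derived from the elements' hashCode() values (for boxed Integers, hashCode(n) == n):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.Utils;
import org.apache.flink.api.java.utils.DataSetUtils;

public class ChecksumDemo {

	public static void main(String[] args) throws Exception {
		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

		DataSet<Integer> numbers = env.fromElements(1, 2, 3, 4, 5);

		// Triggers execution; the count is 5 and the checksum should be 1 + 2 + 3 + 4 + 5 = 15,
		// since the checksum accumulates the elements' hash codes.
		Utils.ChecksumHashCode checksum = DataSetUtils.checksumHashCode(numbers);
		System.out.println(checksum.getCount() + " elements, checksum " + checksum.getChecksum());
	}
}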
 
Example #18
Source File: JoinOperator.java    From Flink-CEPplus with Apache License 2.0
public <R> EquiJoin<I1, I2, R> with(JoinFunction<I1, I2, R> function) {
	if (function == null) {
		throw new NullPointerException("Join function must not be null.");
	}
	FlatJoinFunction<I1, I2, R> generatedFunction = new WrappingFlatJoinFunction<>(clean(function));
	TypeInformation<R> returnType = TypeExtractor.getJoinReturnTypes(function, getInput1Type(), getInput2Type(), Utils.getCallLocationName(), true);
	return new EquiJoin<>(getInput1(), getInput2(), getKeys1(), getKeys2(), generatedFunction, function, returnType, getJoinHint(), Utils.getCallLocationName(), joinType);
}
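A hedged sketch of the user-facing join that ends in the with(...) call above (field positions and data are illustrative):

import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class JoinWithDemo {

	public static void main(String[] args) throws Exception {
		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

		DataSet<Tuple2<Integer, String>> users = env.fromElements(
				Tuple2.of(1, "alice"), Tuple2.of(2, "bob"));
		DataSet<Tuple2<Integer, Double>> scores = env.fromElements(
				Tuple2.of(1, 0.5), Tuple2.of(2, 0.9));

		// where(0).equalTo(0) joins on the first tuple field of both inputs; the result type
		// String is inferred by TypeExtractor.getJoinReturnTypes as shown above.
		DataSet<String> joined = users.join(scores)
				.where(0)
				.equalTo(0)
				.with(new JoinFunction<Tuple2<Integer, String>, Tuple2<Integer, Double>, String>() {
					@Override
					public String join(Tuple2<Integer, String> user, Tuple2<Integer, Double> score) {
						return user.f1 + " scored " + score.f1;
					}
				});

		joined.print();
	}
}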
 
Example #19
Source File: WindowedStream.java    From flink with Apache License 2.0
/**
 * Applies the given window function to each window. The window function is called for each
 * evaluation of the window for each key individually. The output of the window function is
 * interpreted as a regular non-windowed stream.
 *
 * <p>Arriving data is incrementally aggregated using the given fold function.
 *
 * @param initialValue The initial value of the fold.
 * @param foldFunction The fold function that is used for incremental aggregation.
 * @param windowFunction The window function.
 * @return The data stream that is the result of applying the window function to the window.
 *
 * @deprecated use {@link #aggregate(AggregateFunction, WindowFunction)} instead
 */
@PublicEvolving
@Deprecated
public <R, ACC> SingleOutputStreamOperator<R> fold(ACC initialValue, FoldFunction<T, ACC> foldFunction, ProcessWindowFunction<ACC, R, K, W> windowFunction) {
	if (foldFunction instanceof RichFunction) {
		throw new UnsupportedOperationException("FoldFunction can not be a RichFunction.");
	}

	TypeInformation<ACC> foldResultType = TypeExtractor.getFoldReturnTypes(foldFunction, input.getType(),
			Utils.getCallLocationName(), true);

	TypeInformation<R> windowResultType = getProcessWindowFunctionReturnType(windowFunction, foldResultType, Utils.getCallLocationName());

	return fold(initialValue, foldFunction, windowFunction, foldResultType, windowResultType);
}
 
Example #20
Source File: CoGroupOperator.java    From Flink-CEPplus with Apache License 2.0
/**
 * Finalizes a CoGroup transformation by applying a {@link org.apache.flink.api.common.functions.RichCoGroupFunction} to groups of elements with identical keys.
 *
 * <p>Each CoGroupFunction call returns an arbitrary number of elements.
 *
 * @param function The CoGroupFunction that is called for all groups of elements with identical keys.
 * @return A CoGroupOperator that represents the co-grouped result DataSet.
 *
 * @see org.apache.flink.api.common.functions.RichCoGroupFunction
 * @see DataSet
 */
public <R> CoGroupOperator<I1, I2, R> with(CoGroupFunction<I1, I2, R> function) {
	if (function == null) {
		throw new NullPointerException("CoGroup function must not be null.");
	}
	TypeInformation<R> returnType = TypeExtractor.getCoGroupReturnTypes(function, input1.getType(), input2.getType(),
			Utils.getCallLocationName(), true);

	return new CoGroupOperator<>(input1, input2, keys1, keys2, input1.clean(function), returnType,
			groupSortKeyOrderFirst, groupSortKeyOrderSecond,
			customPartitioner, Utils.getCallLocationName());
}
 
Example #21
Source File: DataSetUtils.java    From flink with Apache License 2.0
/**
 * Generate a sample of DataSet which contains fixed size elements.
 *
 * <p><strong>NOTE:</strong> Sample with fixed size is not as efficient as sample with fraction, use sample with
 * fraction unless you need exact precision.
 *
 * @param withReplacement Whether element can be selected more than once.
 * @param numSamples       The expected sample size.
 * @param seed            Random number generator seed.
 * @return The sampled DataSet
 */
public static <T> DataSet<T> sampleWithSize(
	DataSet<T> input,
	final boolean withReplacement,
	final int numSamples,
	final long seed) {

	SampleInPartition<T> sampleInPartition = new SampleInPartition<>(withReplacement, numSamples, seed);
	MapPartitionOperator<T, IntermediateSampleData<T>> mapPartitionOperator = input.mapPartition(sampleInPartition);

	// There is no previous group, so the parallelism of GroupReduceOperator is always 1.
	String callLocation = Utils.getCallLocationName();
	SampleInCoordinator<T> sampleInCoordinator = new SampleInCoordinator<>(withReplacement, numSamples, seed);
	return new GroupReduceOperator<>(mapPartitionOperator, input.getType(), sampleInCoordinator, callLocation);
}
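A hedged usage sketch for the fixed-size sampling helper above (sequence and sample size are illustrative):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.utils.DataSetUtils;

public class SampleWithSizeDemo {

	public static void main(String[] args) throws Exception {
		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

		DataSet<Long> numbers = env.generateSequence(1, 10000);

		// Draw exactly 100 elements without replacement, with a fixed seed for reproducibility.
		DataSet<Long> sample = DataSetUtils.sampleWithSize(numbers, false, 100, 42L);

		sample.print();
	}
}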
 
Example #22
Source File: AsyncWaitOperatorTest.java    From flink with Apache License 2.0
/**
 * This helper function is needed to check that the temporary fix for FLINK-13063 can be backwards compatible with
 * the old chaining behavior by setting the ChainingStrategy manually. TODO: remove after a proper fix for
 * FLINK-13063 is in place that allows chaining.
 */
private <IN, OUT> SingleOutputStreamOperator<OUT> addAsyncOperatorLegacyChained(
	DataStream<IN> in,
	AsyncFunction<IN, OUT> func,
	long timeout,
	int bufSize,
	AsyncDataStream.OutputMode mode) {

	TypeInformation<OUT> outTypeInfo = TypeExtractor.getUnaryOperatorReturnType(
		func,
		AsyncFunction.class,
		0,
		1,
		new int[]{1, 0},
		in.getType(),
		Utils.getCallLocationName(),
		true);

	// create transform
	AsyncWaitOperatorFactory<IN, OUT> factory = new AsyncWaitOperatorFactory<>(
		in.getExecutionEnvironment().clean(func),
		timeout,
		bufSize,
		mode);

	factory.setChainingStrategy(ChainingStrategy.ALWAYS);

	return in.transform("async wait operator", outTypeInfo, factory);
}
 
Example #23
Source File: AllWindowedStream.java    From Flink-CEPplus with Apache License 2.0
/**
 * Applies the given fold function to each window. The window function is called for each
 * evaluation of the window for each key individually. The output of the fold function is
 * interpreted as a regular non-windowed stream.
 *
 * @param function The fold function.
 * @return The data stream that is the result of applying the fold function to the window.
 *
 * @deprecated use {@link #aggregate(AggregateFunction)} instead
 */
@Deprecated
public <R> SingleOutputStreamOperator<R> fold(R initialValue, FoldFunction<T, R> function) {
	if (function instanceof RichFunction) {
		throw new UnsupportedOperationException("FoldFunction of fold can not be a RichFunction. " +
				"Please use fold(FoldFunction, WindowFunction) instead.");
	}

	TypeInformation<R> resultType = TypeExtractor.getFoldReturnTypes(function, input.getType(),
			Utils.getCallLocationName(), true);

	return fold(initialValue, function, resultType);
}
 
Example #24
Source File: AllWindowedStream.java    From flink with Apache License 2.0
/**
 * Applies the given fold function to each window. The window function is called for each
 * evaluation of the window for each key individually. The output of the fold function is
 * interpreted as a regular non-windowed stream.
 *
 * @param function The fold function.
 * @return The data stream that is the result of applying the fold function to the window.
 *
 * @deprecated use {@link #aggregate(AggregateFunction)} instead
 */
@Deprecated
public <R> SingleOutputStreamOperator<R> fold(R initialValue, FoldFunction<T, R> function) {
	if (function instanceof RichFunction) {
		throw new UnsupportedOperationException("FoldFunction of fold can not be a RichFunction. " +
				"Please use fold(FoldFunction, WindowFunction) instead.");
	}

	TypeInformation<R> resultType = TypeExtractor.getFoldReturnTypes(function, input.getType(),
			Utils.getCallLocationName(), true);

	return fold(initialValue, function, resultType);
}
 
Example #25
Source File: AsyncDataStream.java    From Flink-CEPplus with Apache License 2.0
/**
 * Add an AsyncWaitOperator.
 *
 * @param in The {@link DataStream} where the {@link AsyncWaitOperator} will be added.
 * @param func {@link AsyncFunction} wrapped inside {@link AsyncWaitOperator}.
 * @param timeout for the asynchronous operation to complete
 * @param bufSize The max number of inputs the {@link AsyncWaitOperator} can hold inside.
 * @param mode Processing mode for {@link AsyncWaitOperator}.
 * @param <IN> Input type.
 * @param <OUT> Output type.
 * @return A new {@link SingleOutputStreamOperator}
 */
private static <IN, OUT> SingleOutputStreamOperator<OUT> addOperator(
		DataStream<IN> in,
		AsyncFunction<IN, OUT> func,
		long timeout,
		int bufSize,
		OutputMode mode) {

	TypeInformation<OUT> outTypeInfo = TypeExtractor.getUnaryOperatorReturnType(
		func,
		AsyncFunction.class,
		0,
		1,
		new int[]{1, 0},
		in.getType(),
		Utils.getCallLocationName(),
		true);

	// create transform
	AsyncWaitOperator<IN, OUT> operator = new AsyncWaitOperator<>(
		in.getExecutionEnvironment().clean(func),
		timeout,
		bufSize,
		mode);

	return in.transform("async wait operator", outTypeInfo, operator);
}
 
Example #26
Source File: UnsortedGrouping.java    From flink with Apache License 2.0
/**
 * Applies a special case of a reduce transformation (maxBy) on a grouped {@link DataSet}.
 *
 * <p>The transformation consecutively calls a {@link ReduceFunction}
 * until only a single element remains which is the result of the transformation.
 * A ReduceFunction combines two elements into one new element of the same type.
 *
 * @param fields Keys taken into account for finding the maximum.
 * @return A {@link ReduceOperator} representing the maximum.
 */
@SuppressWarnings({ "unchecked", "rawtypes" })
public ReduceOperator<T> maxBy(int... fields)  {

	// Check for using a tuple
	if (!this.inputDataSet.getType().isTupleType() || !(this.inputDataSet.getType() instanceof TupleTypeInfo)) {
		throw new InvalidProgramException("Method maxBy(int) only works on tuples.");
	}

	return new ReduceOperator<T>(this, new SelectByMaxFunction(
			(TupleTypeInfo) this.inputDataSet.getType(), fields), Utils.getCallLocationName());
}
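A hedged usage sketch for maxBy on a grouped data set (data is illustrative):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class GroupedMaxByDemo {

	public static void main(String[] args) throws Exception {
		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

		DataSet<Tuple2<String, Integer>> scores = env.fromElements(
				Tuple2.of("a", 3), Tuple2.of("a", 7), Tuple2.of("b", 5));

		// For every key in field 0, keep the tuple with the largest value in field 1.
		DataSet<Tuple2<String, Integer>> maxPerKey = scores.groupBy(0).maxBy(1);

		maxPerKey.print();
	}
}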