What is Stream?

The definition of stream in Java 8 is “a sequence of elements from a source that supports aggregate operations.” Streams consume from a source such as collections, arrays, or I/O resources. Streams support the common operations from functional programming languages, such as map, filter, reduce, find, sorted, etc.
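The three kinds of sources named above can be sketched as follows. This is a minimal illustration, not a canonical recipe: the class and method names are hypothetical, and a StringReader stands in for a real file so that the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamSources {
    // Source: a collection. map() squares each element, reduce() adds them up.
    static int sumOfSquares(List<Integer> numbers) {
        return numbers.stream()
                .map(n -> n * n)
                .reduce(0, Integer::sum);
    }

    // Source: an array. sorted() orders the elements by natural order.
    static List<String> sortedNames(String[] names) {
        return Arrays.stream(names)
                .sorted()
                .collect(Collectors.toList());
    }

    // Source: an I/O resource. lines() reads from the reader as the stream is consumed.
    static long nonBlankLines(BufferedReader reader) {
        return reader.lines()
                .filter(line -> !line.trim().isEmpty())
                .count();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(Arrays.asList(1, 2, 3)));                  // 14
        System.out.println(sortedNames(new String[] {"carol", "alice", "bob"}));   // [alice, bob, carol]
        System.out.println(nonBlankLines(
                new BufferedReader(new StringReader("one\n\ntwo\n"))));            // 2
    }
}
```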

A Motivational Example

It is easy to chain multiple methods that accept lambda expressions as parameters. A computation that was complex before Java 8 can be written in far fewer lines.

Here is a very good motivational example from Java doc[1]:

int sum = components.stream()
                      .filter(c -> c.getColor() == RED)
                      .mapToInt(c -> c.getWeight())
                      .sum();

In this example we use components, a Collection, as the source for the stream, and then perform a filter-map-reduce operation on the stream to obtain the sum of the weights of the red components.
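To make the comparison with pre-Java 8 code concrete, here is a runnable sketch of the same pipeline next to an equivalent loop. The Component class and Color enum are hypothetical stand-ins, since the snippet above does not define them.

```java
import java.util.Arrays;
import java.util.List;

class RedWeightSum {
    // Hypothetical types: the original snippet does not show these definitions.
    enum Color { RED, GREEN, BLUE }

    static class Component {
        private final Color color;
        private final int weight;

        Component(Color color, int weight) {
            this.color = color;
            this.weight = weight;
        }

        Color getColor() { return color; }
        int getWeight() { return weight; }
    }

    // Pre-Java 8 style: explicit loop with a mutable accumulator.
    static int sumWithLoop(List<Component> components) {
        int sum = 0;
        for (Component c : components) {
            if (c.getColor() == Color.RED) {
                sum += c.getWeight();
            }
        }
        return sum;
    }

    // Stream style: filter (keep red), map (to weight), reduce (sum).
    static int sumWithStream(List<Component> components) {
        return components.stream()
                .filter(c -> c.getColor() == Color.RED)
                .mapToInt(Component::getWeight)
                .sum();
    }

    public static void main(String[] args) {
        List<Component> components = Arrays.asList(
                new Component(Color.RED, 10),
                new Component(Color.BLUE, 5),
                new Component(Color.RED, 20));

        System.out.println(sumWithLoop(components));   // 30
        System.out.println(sumWithStream(components)); // 30
    }
}
```

Both methods return the same value; the stream version states what is computed, while the loop spells out how.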

Stream vs. Collection

Streams may seem similar to Collections, but they are very different concepts. When you’re working with large datasets, you may not want to process the entire dataset as a collection. A useful metaphor for the difference between Collection and Stream [5] is a movie stored on a DVD versus one streamed over the internet. A movie can be viewed as a sequence of bytes. The DVD is a Collection of bytes because it holds every data element. When the movie is watched over the internet, it is a Stream of bytes: the streaming video player only needs to hold a few bytes ahead of where the user is watching. In this way, the video player can start displaying the movie from the beginning of the Stream before most of the data in the stream has even been computed.

The difference between Stream and Collection can be itemized as follows:

  1. A stream provides an interface to a sequenced set of values of a specific element type. However, unlike collections, streams don’t actually store elements. The elements are computed on demand. Streams can be viewed as lazily constructed Collections, whose values are computed when they are needed.
  2. Stream operations don’t change their source. Instead, they return new streams that carry the results.
  3. Possibly unbounded. Collections have a finite size, but streams need not. Short-circuiting operations such as limit(n) or findFirst() can allow computations on infinite streams to complete in finite time.
  4. Consumable. During the lifetime of a stream, its elements are visited only once. To revisit the same elements, you need to create a new stream from the source.
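The points in the list above can be observed in a small sketch. The class and method names are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamVsCollection {
    // Points 1 and 3: Stream.iterate describes an unbounded sequence whose
    // elements are computed on demand; limit(n) stops after n of them.
    static List<Integer> firstPowersOfTwo(int n) {
        return Stream.iterate(1, x -> x * 2)
                .limit(n)
                .collect(Collectors.toList());
    }

    // Point 2: the pipeline produces new values; the source list is untouched.
    static List<String> upperCased(List<String> source) {
        return source.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Point 4: a second terminal operation on the same stream is rejected.
    static boolean secondTraversalFails(List<String> source) {
        Stream<String> stream = source.stream();
        stream.count(); // the first terminal operation consumes the stream
        try {
            stream.count();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("a", "b", "c"));
        System.out.println(firstPowersOfTwo(5));         // [1, 2, 4, 8, 16]
        System.out.println(upperCased(words));           // [A, B, C]
        System.out.println(words);                       // [a, b, c]
        System.out.println(secondTraversalFails(words)); // true
    }
}
```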

Intermediate Operations vs. Terminal Operations

Intermediate operations transform one stream into another, while terminal operations produce a result or a side effect. Once a terminal operation has been executed on a stream, the stream can no longer be used.

For example, in the code below:

list.stream().filter(s -> s.length() > 2).count();

filter() is an intermediate operation, and count() is a terminal operation. After count() is called, the stream cannot be used any more.
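One consequence of this split is that an intermediate operation does no work until a terminal operation runs. The following sketch counts how often the predicate passed to filter() is evaluated; the class name and the counter are hypothetical additions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazyFilterDemo {
    // Returns {predicate calls before count(), calls after count(), count result}.
    static long[] run(List<String> list) {
        AtomicInteger predicateCalls = new AtomicInteger();

        // Building the pipeline does not evaluate the predicate yet.
        Stream<String> pipeline = list.stream().filter(s -> {
            predicateCalls.incrementAndGet();
            return s.length() > 2;
        });
        int before = predicateCalls.get();

        // The terminal operation pulls every element through filter().
        long matches = pipeline.count();
        int after = predicateCalls.get();

        return new long[] { before, after, matches };
    }

    public static void main(String[] args) {
        long[] r = run(Arrays.asList("a", "bb", "ccc", "dddd"));
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // 0 4 2
    }
}
```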

Frequently used intermediate operations include: filter, map, distinct, sorted, skip, limit, flatMap

Frequently used terminal operations include:

  1. forEach, toArray, collect, reduce, count, min, max
  2. findFirst, findAny, anyMatch, allMatch, noneMatch
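A few of the operations listed above, shown together on one small list. This is an illustrative sketch only; the class name, the method names, and the sample data are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class TerminalOpsDemo {
    static final List<Integer> NUMBERS = Arrays.asList(5, 3, 8, 3, 1);

    // Several intermediate operations chained before the terminal collect().
    static List<Integer> distinctSortedSlice() {
        return NUMBERS.stream()
                .distinct()   // 5, 3, 8, 1
                .sorted()     // 1, 3, 5, 8
                .skip(1)      // 3, 5, 8
                .limit(2)     // 3, 5
                .collect(Collectors.toList());
    }

    // reduce() folds all elements into one value, starting from the identity 0.
    static int sum() {
        return NUMBERS.stream().reduce(0, Integer::sum);
    }

    // max() returns an Optional because the stream might be empty.
    static Optional<Integer> max() {
        return NUMBERS.stream().max(Integer::compare);
    }

    // findFirst() short-circuits at the first element that passes the filter.
    static Optional<Integer> firstOver(int threshold) {
        return NUMBERS.stream().filter(n -> n > threshold).findFirst();
    }

    // anyMatch() also short-circuits, as soon as one element matches.
    static boolean anyEven() {
        return NUMBERS.stream().anyMatch(n -> n % 2 == 0);
    }

    public static void main(String[] args) {
        System.out.println(distinctSortedSlice()); // [3, 5]
        System.out.println(sum());                 // 20
        System.out.println(max().get());           // 8
        System.out.println(firstOver(4).get());    // 5
        System.out.println(firstOver(10).isPresent()); // false
        System.out.println(anyEven());             // true
    }
}
```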

References:
1. Java Doc of Stream
2. Monad in functional programming
3. Processing Data with Java SE 8 Streams, Part 1
4. java.util.stream package
5. Java 8 Lambdas in Action, by Raoul-Gabriel Urma, Mario Fusco, and Alan Mycroft
