Return a Stream vs. Return a Collection

Many Java API classes have methods that return a Collection. Since Java 8, methods can also return a Stream. Since a Stream is more flexible and more efficient in many cases, should our API methods return a Stream or a Collection?

A Motivational Example

To return a collection, every element must first be computed and then stored in memory. So there are two costs: computation and memory allocation.

Consider the following two methods from the java.nio.file.Files class:

static List<String>    readAllLines(Path path)
static Stream<String>  lines(Path path)

To return a list of strings, the readAllLines() method has to read to the end of the file first and then hold the entire file contents in memory as the result. In contrast, the lines() method returns right away and reads lines lazily, only as the stream is consumed. Taking advantage of this laziness greatly reduces the cost. For example, if the caller only needs to find one matching line, the program does not even need to read to the end of the file.

boolean result = Files.lines(path).anyMatch(x -> x.startsWith("z"));
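The one-liner above is fine for illustration, but a stream returned by Files.lines() holds an open file, so real code should close it. The following is a minimal sketch of the same short-circuiting search; the class name FirstMatch and the method anyLineStartsWith are hypothetical names chosen for this example. It uses startsWith() rather than charAt(0) so that an empty line does not throw an exception.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the JDK.
class FirstMatch {
    // Returns true as soon as one line starts with the given prefix.
    static boolean anyLineStartsWith(Path path, String prefix) throws IOException {
        // Files.lines keeps the file open, so close the stream with try-with-resources.
        try (Stream<String> lines = Files.lines(path)) {
            // anyMatch short-circuits: lines after the first match are never read.
            // startsWith is safe on empty lines, unlike charAt(0).
            return lines.anyMatch(line -> line.startsWith(prefix));
        }
    }
}
```

A caller would write something like FirstMatch.anyLineStartsWith(path, "z") and handle the IOException as usual.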

Why Should Stream Be Preferred in Many Cases?

The reasons Stream should be preferred in many (but not all) cases lie in how a Stream behaves. They are summarized below:

  1. If the result might be infinite, Stream should be used.
  2. If the result might be very large, you probably prefer Stream, since materializing the collection requires significant heap space.
  3. If the caller only iterates through the result (search, filter, aggregate), you should prefer Stream, since Stream has these operations built in and there is no need to materialize a collection (especially if the caller might not process the whole result). This is a very common case.
  4. The Collection you choose to hold the results may not be the form the caller wants, in which case the caller has to copy it anyway. If a Stream is returned, the caller can call collect(toCollection(factory)) and get exactly the collection they want.
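Points 1, 3, and 4 can be sketched in a few lines. This is only an illustration under stated assumptions: the class StreamReturningApi and its methods are hypothetical names, and powersOfTwo() stands in for any API method whose result has no natural end.

```java
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical API class used only for illustration.
class StreamReturningApi {
    // Point 1: an unbounded result. A Collection could not represent this at all.
    // Note: the values overflow long after 63 doublings, so callers should bound the stream.
    static Stream<Long> powersOfTwo() {
        return Stream.iterate(1L, n -> n * 2);
    }

    // Point 4: the caller picks the collection type with toCollection(factory).
    static TreeSet<Long> firstAsSortedSet(int count) {
        return powersOfTwo()
                .limit(count)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Point 3: a search that stops at the first hit; nothing is materialized.
    // Assumes the threshold is small enough to be reached before overflow.
    static long firstAbove(long threshold) {
        return powersOfTwo()
                .filter(n -> n > threshold)
                .findFirst()
                .orElseThrow(IllegalStateException::new);
    }
}
```

The same powersOfTwo() method serves both callers; had it returned a List, the API author would have had to pick a size and a collection type on their behalf.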

The one case where you must return a Collection is when there are strong consistency requirements, and you have to produce a consistent snapshot of a moving target. Then, you will want to put the elements into a collection that will not change.
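A minimal sketch of such a snapshot follows. The SessionRegistry class and its methods are hypothetical, and synchronizing on the instance is just one simple way to keep the copy consistent with concurrent updates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class whose contents keep changing (the "moving target").
class SessionRegistry {
    private final List<String> activeUsers = new ArrayList<>();

    synchronized void login(String user) {
        activeUsers.add(user);
    }

    synchronized void logout(String user) {
        activeUsers.remove(user);
    }

    // Copies the current state under the same lock as the writers, then wraps
    // the copy so that callers cannot modify it. Later logins and logouts do
    // not affect a snapshot that has already been returned.
    synchronized List<String> snapshot() {
        return Collections.unmodifiableList(new ArrayList<>(activeUsers));
    }
}
```

Returning activeUsers.stream() here instead would hand the caller a lazy view of a list that may change while the stream is being consumed, which is the situation the snapshot is meant to avoid.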

In summary, Stream is the right choice in most cases: it doesn’t impose materialization costs that are usually unnecessary, and it can easily be turned into the Collection of your choice if needed. But sometimes you may have to return a Collection (say, due to strong consistency requirements), or you may want to return a Collection because you know how the caller will use the result and know that a Collection is the most convenient form for them.

This question was originally posted by FredOverflow on Stack Overflow. This post is reorganized from the two popular answers provided by Brian Goetz and Stuart Marks. This post is licensed under CC BY-SA.
