Processing Data in Memory
The Stream API is probably the second most important feature added to Java SE 8, after the lambda expressions. In a nutshell, the Stream API is about providing an implementation of the well known map-filter-reduce algorithm to the JDK.
map-filter-reduce:
代码语言:javascript复制List<Sale> sales = ...; // this is the list of all the sales
int amountSoldInMarch = 0;
for (Sale sale: sales) {
if (sale.getDate().getMonth() == Month.MARCH) {
amountSoldInMarch = sale.getAmount();
}
}
System.out.println("Amount sold in March: " amountSoldInMarch);
map:通过get取值,将部分字段映射到新数据(select字段)
filter:根据if判断过滤部分数据(where条件)
reduce:聚合,求和(聚合函数)
简而言之,相当于写一段SQL:
代码语言:javascript复制select sum(amount)
from Sales
where extract(month from date) = 3;
看看是如何从原始代码转换为Stream API的:
代码语言:javascript复制List<City> cities = ...;
int sum = 0;
for (City city: cities) {
int population = city.getPopulation();
if (population > 100_000) {
sum = population;
}
}
System.out.println("Sum = " sum);
假设Collection有这几个方法:
代码语言:javascript复制int sum = cities.map(city -> city.getPopulation())
.filter(population -> population > 100_000)
.sum();
为什么Collection不提供这些方法呢?拆分为每一步:
代码语言:javascript复制Collection<Integer> populations = cities.map(city -> city.getPopulation());
Collection<Integer> filteredPopulations = populations.filter(population -> population > 100_000);
int sum = filteredPopulations.sum();
假如有1000个city,那么中间数据也是Collection,就会产生很多冗余的中间数据。而for循环却不存在这个问题,因为它不会存储中间数据。虽然Collection提供方法能让代码看起来更好理解,但却会导致大量的冗余数据。所以不得不设计一套Stream API来支持map-filter-reduce。
代码语言:javascript复制Stream<City> streamOfCities = cities.stream();
Stream<Integer> populations = streamOfCities.map(city -> city.getPopulation());
Stream<Integer> filteredPopulations = populations.filter(population -> population > 100_000);
int sum = filteredPopulations.sum(); // in fact this code does not compile; we'll fix it later
The streams created in this code, streamOfCities
, populations
and filteredPopulations
must all be empty objects.
It leads to a very important property of streams:
A stream is an object that does not store any data.
Using streams is about creating pipelines of operations. A pipeline is made of a series of method calls on a stream. Each call produces another stream. Then at some point, a last call produces a result.
Adding Intermediate Operations
collect()
Stream本身不会存储数据,通过collect存储为List:
代码语言:javascript复制List<String> strings = List.of("one", "two", "three", "four");
Function<String, Integer> toLength = String::length;
Stream<Integer> ints = strings.stream()
.map(toLength);
代码语言:javascript复制List<String> strings = List.of("one", "two", "three", "four");
List<Integer> lengths = strings.stream()
.map(String::length)
.collect(Collectors.toList());
System.out.println("lengths = " lengths);
代码语言:javascript复制lengths = [3, 3, 5, 4]
一些方法
-
distinct()
-
sorted()
-
skip()
-
limit()
contact()
连接流
代码语言:javascript复制List<Integer> list0 = List.of(1, 2, 3);
List<Integer> list1 = List.of(4, 5, 6);
List<Integer> list2 = List.of(7, 8, 9);
// 1st pattern: concat
List<Integer> concat =
Stream.concat(list0.stream(), list1.stream())
.collect(Collectors.toList());
// 2nd pattern: flatMap
List<Integer> flatMap =
Stream.of(list0.stream(), list1.stream(), list2.stream())
.flatMap(Function.identity())
.collect(Collectors.toList());
System.out.println("concat = " concat);
System.out.println("flatMap = " flatMap);
代码语言:javascript复制concat = [1, 2, 3, 4, 5, 6]
flatMap = [1, 2, 3, 4, 5, 6, 7, 8, 9]
连接流,推荐使用flatMap()
With the flatmap pattern, you just create a single stream to hold all your streams and do the flatmap. The overhead is much lower.
concat produces a SIZED
stream, whereas flatmap does not.
Creating Streams
前面我们看到Collection的stream()方法可以创建流,此外还有很多其他方式创建流:
- a vararg argument;
- a supplier;
- a unary operator, that generates the next element from the previous one;
- a builder;
- the characters of a string;
- the lines of a text file;
- the elements created by splitting a string of characters with a regular expressions;
- a random variable, that can create a stream of random numbers.
Iterator<String> iterator = ...;
long estimateSize = 10L;
int characteristics = 0;
Spliterator<String> spliterator = Spliterators.spliterator(strings.iterator(), estimateSize, characteristics);
boolean parallel = false;
Stream<String> stream = StreamSupport.stream(spliterator, parallel);
空流:
代码语言:javascript复制Stream<String> empty = Stream.empty();
List<String> strings = empty.collect(Collectors.toList());
System.out.println("strings = " strings);
Creating a Stream from a Vararg or an Array
代码语言:javascript复制Stream<Integer> intStream = Stream.of(1, 2, 3);
List<Integer> ints = intStream.collect(Collectors.toList());
System.out.println("ints = " ints);
代码语言:javascript复制String[] stringArray = {"one", "two", "three"};
Stream<String> stringStream = Arrays.stream(stringArray);
List<String> strings = stringStream.collect(Collectors.toList());
System.out.println("strings = " strings);
Creating a Stream from a Supplier
代码语言:javascript复制Stream<String> generated = Stream.generate(() -> " ");
List<String> strings =
generated
.limit(10L)
.collect(Collectors.toList());
System.out.println("strings = " strings);
Creating a Stream from a UnaryOperator and a Seed
代码语言:javascript复制Stream<String> iterated = Stream.iterate(" ", s -> s " ");
iterated.limit(5L).forEach(System.out::println);
Creating a Stream from a Range of Numbers
代码语言:javascript复制String[] letters = {"A", "B", "C", "D"};
List<String> listLetters =
IntStream.range(0, 10)
.mapToObj(index -> letters[index % letters.length])
.collect(Collectors.toList());
System.out.println("listLetters = " listLetters);
Creating a Stream of Random Numbers
代码语言:javascript复制Random random = new Random(314L);
List<Integer> randomInts =
random.ints(10, 1, 5)
.boxed()
.collect(Collectors.toList());
System.out.println("randomInts = " randomInts);
Creating a Stream from the Characters of a String
Java SE 10
代码语言:javascript复制String sentence = "Hello Duke";
List<String> letters =
sentence.chars()
.mapToObj(codePoint -> (char)codePoint)
.map(Object::toString)
.collect(Collectors.toList());
System.out.println("letters = " letters);
Creating a Stream from the Lines of a Text File
代码语言:javascript复制Path log = Path.of("/tmp/debug.log"); // adjust to fit your installation
try (Stream<String> lines = Files.lines(log)) {
long warnings =
lines.filter(line -> line.contains("WARNING"))
.count();
System.out.println("Number of warnings = " warnings);
} catch (IOException e) {
// do something with the exception
}
Creating a Stream from a Regular Expression
代码语言:javascript复制String sentence = "For there is good news yet to hear and fine things to be seen";
Pattern pattern = Pattern.compile(" ");
Stream<String> stream = pattern.splitAsStream(sentence);
List<String> words = stream.collect(Collectors.toList());
System.out.println("words = " words);
Creating a Stream with the Builder Pattern
代码语言:javascript复制Stream.Builder<String> builder = Stream.<String>builder();
builder.add("one")
.add("two")
.add("three")
.add("four");
Stream<String> stream = builder.build();
List<String> list = stream.collect(Collectors.toList());
System.out.println("list = " list);
Creating a Stream on an HTTP Source
代码语言:javascript复制// The URI of the file
URI uri = URI.create("https://www.gutenberg.org/files/98/98-0.txt");
// The code to open create an HTTP request
HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder(uri).build();
// The sending of the request
HttpResponse<Stream<String>> response = client.send(request, HttpResponse.BodyHandlers.ofLines());
List<String> lines;
try (Stream<String> stream = response.body()) {
lines = stream
.dropWhile(line -> !line.equals("A TALE OF TWO CITIES"))
.takeWhile(line -> !line.equals("*** END OF THE PROJECT GUTENBERG EBOOK A TALE OF TWO CITIES ***"))
.collect(Collectors.toList());
}
System.out.println("# lines = " lines.size());
Reducing a Stream
Compute a reduction by just providing a binary operator that operates on only two elements. This is how the reduce()
method works in the Stream API.
Stream<Integer> ints = Stream.of(0, 0, 0, 0);
int sum = ints.reduce(10, (a, b) -> a b);
System.out.println("sum = " sum);
Adding a Terminal Operation
In fact, you should use this reduce()
method as a last resort, only if you have no other solution.
要想reduce stream,还有其他更多方法,比如count()、sum()等。
count()
Collection<String> strings =
List.of("one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten");
long count =
strings.stream()
.filter(s -> s.length() == 3)
.count();
System.out.println("count = " count);
forEach()
Stream<String> strings = Stream.of("one", "two", "three", "four");
strings.filter(s -> s.length() == 3)
.map(String::toUpperCase)
.forEach(System.out::println);
collect()
代码语言:javascript复制Stream<String> strings = Stream.of("one", "two", "three", "four");
List<String> result =
strings.filter(s -> s.length() == 3)
.map(String::toUpperCase)
.collect(Collectors.toList());
max() min()
代码语言:javascript复制Stream<String> strings = Stream.of("one", "two", "three", "four");
String longest =
strings.max(Comparator.comparing(String::length))
.orElseThrow();
System.out.println("longest = " longest);
findFirst() findAny()
代码语言:javascript复制Collection<String> strings =
List.of("one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten");
String first =
strings.stream()
// .unordered()
// .parallel()
.filter(s -> s.length() == 3)
.findFirst()
.orElseThrow();
System.out.println("first = " first);
allMatch() anyMatch() noneMatch()
代码语言:javascript复制Collection<String> strings =
List.of("one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten");
boolean noBlank =
strings.stream()
.allMatch(Predicate.not(String::isBlank));
boolean oneGT3 =
strings.stream()
.anyMatch(s -> s.length() == 3);
boolean allLT10 =
strings.stream()
.noneMatch(s -> s.length() > 10);
System.out.println("noBlank = " noBlank);
System.out.println("oneGT3 = " oneGT3);
System.out.println("allLT10 = " allLT10);
Finding the Characteristics
ORDERED | The order in which the elements of the stream are processed matters. |
---|---|
DISTINCT | There are no doubles in the elements processed by that stream. |
NONNULL | There are no null elements in that stream. |
SORTED | The elements of that stream are sorted. |
SIZED | The number of elements this stream processes is known. |
SUBSIZED | Splitting this stream produces two SIZED streams. |
Collection<String> stringCollection = List.of("one", "two", "two", "three", "four", "five");
Stream<String> strings = stringCollection.stream().sorted();
Stream<String> filteredStrings = strings.filtered(s -> s.length() < 5);
Stream<Integer> lengths = filteredStrings.map(String::length);
代码语言:javascript复制Collection<String> stringCollection = List.of("one", "two", "two", "three", "four", "five");
Stream<String> strings = stringCollection.stream().distinct();
Stream<String> filteredStrings = strings.filtered(s -> s.length() < 5);
Stream<Integer> lengths = filteredStrings.map(String::length);
Using a Collector
代码语言:javascript复制List<Integer> numbers =
IntStream.range(0, 10)
.boxed()
.collect(Collectors.toList());
System.out.println("numbers = " numbers);
代码语言:javascript复制Set<Integer> evenNumbers =
IntStream.range(0, 10)
.map(number -> number / 2)
.boxed()
.collect(Collectors.toSet());
System.out.println("evenNumbers = " evenNumbers);
代码语言:javascript复制LinkedList<Integer> linkedList =
IntStream.range(0, 10)
.boxed()
.collect(Collectors.toCollection(LinkedList::new));
System.out.println("linked listS = " linkedList);
couting
代码语言:javascript复制Collection<String> strings = List.of("one", "two", "three");
long count = strings.stream().count();
long countWithACollector = strings.stream().collect(Collectors.counting());
System.out.println("count = " count);
System.out.println("countWithACollector = " countWithACollector);
joining
代码语言:javascript复制String joined =
IntStream.range(0, 10)
.boxed()
.map(Object::toString)
.collect(Collectors.joining(", "));
System.out.println("joined = " joined);
partitioningBy
代码语言:javascript复制Collection<String> strings =
List.of("one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
"ten", "eleven", "twelve");
Map<Boolean, List<String>> map =
strings.stream()
.collect(Collectors.partitioningBy(s -> s.length() > 4));
map.forEach((key, value) -> System.out.println(key " :: " value));
groupingBy
代码语言:javascript复制Collection<String> strings =
List.of("one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
"ten", "eleven", "twelve");
Map<Integer, List<String>> map =
strings.stream()
.collect(Collectors.groupingBy(String::length));
map.forEach((key, value) -> System.out.println(key " :: " value));
groupingBy counting
代码语言:javascript复制Collection<String> strings =
List.of("one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
"ten", "eleven", "twelve");
Map<Integer, Long> map =
strings.stream()
.collect(
Collectors.groupingBy(
String::length,
Collectors.counting()));
map.forEach((key, value) -> System.out.println(key " :: " value));
代码语言:javascript复制3 :: 4
4 :: 3
5 :: 3
6 :: 2
groupingBy joining
代码语言:javascript复制Collection<String> strings =
List.of("one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
"ten", "eleven", "twelve");
Map<Integer, String> map =
strings.stream()
.collect(
Collectors.groupingBy(
String::length,
Collectors.joining(", ")));
map.forEach((key, value) -> System.out.println(key " :: " value));
代码语言:javascript复制3 :: one, two, six, ten
4 :: four, five, nine
5 :: three, seven, eight
6 :: eleven, twelve
toMap
代码语言:javascript复制Collection<String> strings =
List.of("one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
"ten", "eleven", "twelve");
Map<Integer, String> map =
strings.stream()
.collect(
Collectors.toMap(
element -> element.length(),
element -> element,
(element1, element2) -> element1 ", " element2));
map.forEach((key, value) -> System.out.println(key " :: " value));
代码语言:javascript复制3 :: one, two, six, ten
4 :: four, five, nine
5 :: three, seven, eight
6 :: eleven, twelve
element -> element.length()
is the key mapper.element -> element
is the value mapper.(element1, element2) -> element1 ", " element2)
is the merge function, called with the two elements that have generated the same key.
Parallelizing Streams
代码语言:javascript复制int parallelSum =
IntStream.range(0, 10)
.parallel()
.sum();
参考资料: The Stream API https://dev.java/learn/api/streams/