Let's start with two short code snippets:
Snippet 1:
@Test
public void stringTest() {
    List<String> list = new ArrayList<>();
    list.add(new String("hello"));
    list.add("world");
    list.add("hello");
    list.stream().distinct().forEach(System.out::println);
}
Output:
hello
world
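Note that new String("hello") and the literal "hello" are two different objects, yet distinct() still collapses them, because String overrides equals and hashCode to compare character content. A quick check (my own illustration, in the same test style as above):

@Test
public void stringEqualityTest() {
    String a = new String("hello"); // a fresh object on the heap
    String b = "hello";             // the interned literal

    System.out.println(a == b);                       // false: different references
    System.out.println(a.equals(b));                  // true: same character content
    System.out.println(a.hashCode() == b.hashCode()); // true: equal strings hash alike
}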
Snippet 2:
@Test
public void testClass() {
    List<A> list = new ArrayList<>();
    list.add(new A("hello"));
    list.add(new A("world"));
    list.add(new A("hello"));
    list.stream().distinct().forEach(t -> System.out.println(t.getName()));
}
@Getter
public class A {
    private String name;

    public A(String name) {
        this.name = name;
    }
}
If class A does not override the equals and hashCode methods, the output is:
hello
world
hello
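All three elements survive because, without the overrides, A inherits Object.equals and Object.hashCode, which are based on object identity rather than the name field. A quick check (my own illustration):

@Test
public void defaultEqualityTest() {
    A x = new A("hello");
    A y = new A("hello");

    System.out.println(x.equals(y));                  // false: Object.equals is reference equality
    System.out.println(x.hashCode() == y.hashCode()); // false in practice: identity hash codes
}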
If class A is instead defined as follows:
@Getter
public class A {
    private String name;

    public A(String name) {
        this.name = name;
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == this) {
            return true;
        }
        if (!(obj instanceof A)) { // also false when obj is null
            return false;
        }
        return this.name.equals(((A) obj).getName());
    }
}
The output is now:
hello
world
Why exactly do we get these results? Let's look at the DistinctOps source code:
static <T> ReferencePipeline<T, T> makeRef(AbstractPipeline<?, T, ?> upstream) {
    return new ReferencePipeline.StatefulOp<T, T>(upstream, StreamShape.REFERENCE,
                                                  StreamOpFlag.IS_DISTINCT | StreamOpFlag.NOT_SIZED) {

        <P_IN> Node<T> reduce(PipelineHelper<T> helper, Spliterator<P_IN> spliterator) {
            // If the stream is SORTED then it should also be ORDERED so the following will also
            // preserve the sort order
            TerminalOp<T, LinkedHashSet<T>> reduceOp
                    = ReduceOps.<T, LinkedHashSet<T>>makeRef(LinkedHashSet::new, LinkedHashSet::add,
                                                             LinkedHashSet::addAll);
            return Nodes.node(reduceOp.evaluateParallel(helper, spliterator));
        }

        @Override
        <P_IN> Node<T> opEvaluateParallel(PipelineHelper<T> helper,
                                          Spliterator<P_IN> spliterator,
                                          IntFunction<T[]> generator) {
            if (StreamOpFlag.DISTINCT.isKnown(helper.getStreamAndOpFlags())) {
                // No-op
                return helper.evaluate(spliterator, false, generator);
            }
            else if (StreamOpFlag.ORDERED.isKnown(helper.getStreamAndOpFlags())) {
                return reduce(helper, spliterator);
            }
            else {
                // Holder of null state since ConcurrentHashMap does not support null values
                AtomicBoolean seenNull = new AtomicBoolean(false);
                ConcurrentHashMap<T, Boolean> map = new ConcurrentHashMap<>();
                TerminalOp<T, Void> forEachOp = ForEachOps.makeRef(t -> {
                    if (t == null)
                        seenNull.set(true);
                    else
                        map.putIfAbsent(t, Boolean.TRUE);
                }, false);
                forEachOp.evaluateParallel(helper, spliterator);

                // If null has been seen then copy the key set into a HashSet that supports null values
                // and add null
                Set<T> keys = map.keySet();
                if (seenNull.get()) {
                    // TODO Implement a more efficient set-union view, rather than copying
                    keys = new HashSet<>(keys);
                    keys.add(null);
                }
                return Nodes.node(keys);
            }
        }

        @Override
        <P_IN> Spliterator<T> opEvaluateParallelLazy(PipelineHelper<T> helper, Spliterator<P_IN> spliterator) {
            if (StreamOpFlag.DISTINCT.isKnown(helper.getStreamAndOpFlags())) {
                // No-op
                return helper.wrapSpliterator(spliterator);
            }
            else if (StreamOpFlag.ORDERED.isKnown(helper.getStreamAndOpFlags())) {
                // Not lazy, barrier required to preserve order
                return reduce(helper, spliterator).spliterator();
            }
            else {
                // Lazy
                return new StreamSpliterators.DistinctSpliterator<>(helper.wrapSpliterator(spliterator));
            }
        }

        @Override
        Sink<T> opWrapSink(int flags, Sink<T> sink) {
            Objects.requireNonNull(sink);

            if (StreamOpFlag.DISTINCT.isKnown(flags)) {
                return sink;
            } else if (StreamOpFlag.SORTED.isKnown(flags)) {
                return new Sink.ChainedReference<T, T>(sink) {
                    boolean seenNull;
                    T lastSeen;

                    @Override
                    public void begin(long size) {
                        seenNull = false;
                        lastSeen = null;
                        downstream.begin(-1);
                    }

                    @Override
                    public void end() {
                        seenNull = false;
                        lastSeen = null;
                        downstream.end();
                    }

                    @Override
                    public void accept(T t) {
                        if (t == null) {
                            if (!seenNull) {
                                seenNull = true;
                                downstream.accept(lastSeen = null);
                            }
                        } else if (lastSeen == null || !t.equals(lastSeen)) {
                            downstream.accept(lastSeen = t);
                        }
                    }
                };
            } else {
                return new Sink.ChainedReference<T, T>(sink) {
                    Set<T> seen;

                    @Override
                    public void begin(long size) {
                        seen = new HashSet<>();
                        downstream.begin(-1);
                    }

                    @Override
                    public void end() {
                        seen = null;
                        downstream.end();
                    }

                    @Override
                    public void accept(T t) {
                        if (!seen.contains(t)) {
                            seen.add(t);
                            downstream.accept(t);
                        }
                    }
                };
            }
        }
    };
}
The last branch of opWrapSink above is the key: for an ordinary sequential stream, distinct() records every element it emits in a HashSet and drops anything the set already contains (the parallel paths use LinkedHashSet or ConcurrentHashMap, which are hash-based as well). HashSet is in turn backed by HashMap, and just as HashMap keys must override equals and hashCode to behave correctly, any class whose instances you want to deduplicate with stream's distinct() must override equals and hashCode too; otherwise deduplication may silently fail!
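You can reproduce the failure without streams by exercising the HashSet check directly. Here is a minimal sketch (my own illustration, using the no-override version of A from earlier):

@Test
public void hashSetContainsTest() {
    Set<A> seen = new HashSet<>();
    seen.add(new A("hello"));

    // With the default Object.hashCode, the probe object hashes to a
    // different bucket, so contains() never even calls equals on our element
    System.out.println(seen.contains(new A("hello")));
    // prints false without the overrides, true once A overrides equals/hashCode
}

This is precisely the seen.contains(t) check performed by the sink in opWrapSink, which is why the overrides decide whether distinct() works.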