Remove duplicates from a Java List

How to remove duplicates from a List in Java

There are several ways to remove duplicates from a Java List, array or other collection class. The following list describes some of the most commonly used approaches:

  1. Use a method in the Java Streams API to remove duplicates from a List.
  2. Use a HashSet to automatically dedupe the List.
  3. Write your own Java algorithm to remove duplicates from a List.
  4. Optimize your own algorithms with standard Java classes.

Remove duplicates with Java’s Stream API

The fastest and most efficient way to remove duplicates from a List in Java is to use the distinct method of the Stream API.

The following code snippet starts with a List that contains 10 elements, of which only six are unique. When we initially print the size of the List, we get the number 10.

List<Integer> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1);
System.out.println(myList.size());  //prints 10

When we call the distinct method of the Java Stream API, duplicates are removed from the returned List. When we print the List’s size a second time, the output is six, which is the number of unique elements in the List.

myList = myList.stream().distinct().toList();
System.out.println(myList.size());  //prints 6

The Stream API is the fastest and most efficient way to remove duplicates from a Java List, and it is typically the best approach so long as you use version 8 or newer of the JDK. (Java 21 was released in September of 2023, so there’s no excuse to still be on a JDK that old.)
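One version caveat: while distinct has been part of the Stream API since Java 8, the Stream.toList method used above only arrived in Java 16. If you’re on an older JDK, a sketch of the same dedupe that collects with Collectors.toList instead might look like this (the class name is just for illustration):

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamDedupeJava8 {
  public static void main(String[] args) {
    // Arrays.asList works on Java 8; List.of was not added until Java 9
    List<Integer> myList = Arrays.asList(0, 1, 1, 2, 3, 5, 6, 0, 0, 1);

    // distinct() drops the duplicates; Collectors.toList() is the
    // pre-Java-16 way to gather the stream back into a List
    List<Integer> deduped = myList.stream()
        .distinct()
        .collect(Collectors.toList());

    System.out.println(deduped.size());  //prints 6
  }
}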

Dedupe with a Java HashSet

Sometimes a HashSet is better than a List.

A Java List can contain duplicate entries, as shown in the prior example. However, every item in a HashSet must be unique.

If you simply pour a Java List into a HashSet, the HashSet automatically dedupes the contents.

List<Object> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1);
System.out.println(myList.size()); //prints 10
HashSet<Object> set = new HashSet<>(myList);
myList = List.of(set.toArray());
System.out.println(myList.size()); //prints 6

Depending upon the use case, you can leave the data in the HashSet and rely on it to keep the data set unique, or you can use the List.of method, as shown above, to move the HashSet elements back into a List.
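One caveat: a plain HashSet makes no promises about ordering, so the deduped List may come back in a different order than the original. If encounter order matters, a LinkedHashSet, a close cousin of HashSet, rejects duplicates while remembering insertion order. Here is a rough sketch of that variation (the class name is illustrative, and List.copyOf requires Java 10 or newer):

import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class OrderPreservingDedupe {
  public static void main(String[] args) {
    List<Integer> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1);

    // LinkedHashSet rejects duplicates but keeps insertion order
    Set<Integer> set = new LinkedHashSet<>(myList);

    // List.copyOf moves the Set's elements back into an immutable List
    List<Integer> deduped = List.copyOf(set);

    System.out.println(deduped);  //prints [0, 1, 2, 3, 5, 6]
  }
}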

Remove duplicate Java primitive types

These examples removed duplicate Java primitive types from a List (the int values are autoboxed to Integer objects under the hood), but the same logic applies to reference types.
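For reference types, both the distinct method and a HashSet decide what counts as a duplicate by calling equals and hashCode, so the elements’ class must implement both sensibly. The sketch below uses a hypothetical Person record to illustrate the point; records (Java 16 and newer) generate equals and hashCode from their components automatically:

import java.util.List;

public class ReferenceTypeDedupe {
  // The record's generated equals/hashCode treat two Person objects
  // with the same name and age as duplicates
  record Person(String name, int age) {}

  public static void main(String[] args) {
    List<Person> people = List.of(
        new Person("Ada", 36),
        new Person("Ada", 36),   // duplicate by value, not by reference
        new Person("Grace", 45));

    List<Person> deduped = people.stream().distinct().toList();
    System.out.println(deduped.size());  //prints 2
  }
}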

Remove Java List duplicates through code

The first two examples to solve this deduping problem use specialized Java components and APIs. However, it’s a fun exercise to use standard loops and conditionals to remove duplicates from a List in Java. This might even be necessary if your use case requires further customization of the deduping logic.

In this example, we iterate through each element of a List and compare it to elements in a second List. We then add the element to the second List if it does not exist there.

When the loop concludes, the second list holds only unique elements.

List<Object> items = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1);
List<Object> deduped = new ArrayList<>();
for (int i = 0; i < items.size(); i++) {
  if (!deduped.contains(items.get(i))) {
    deduped.add(items.get(i));
  }
}
System.out.println(deduped);

This solution is perfectly functional and relatively easy to understand. However, the performance of this algorithm degrades significantly as the List grows, because the contains check performs a linear scan of the second List for every element, which makes the approach quadratic in the worst case.

A reviewer for a coding job at Google or Facebook would not be particularly fond of this algorithm.

Use standard Java classes to optimize algorithms

One of the problems with the previous example is the speed of the lookup to see if a duplicate entry exists in the second List. One way to improve upon this example is to use a standard data structure that has optimized lookup functionality. One such type is the HashMap.

With a HashMap we must still look up matching elements, but the HashMap stores keys and values in an optimized fashion that provides constant-time lookups on average, regardless of the size of the collection.

List<Object> items = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1);
HashMap<Object, Object> deduped = new HashMap<>();
for (int i = 0; i < items.size(); i++) {
  if (!deduped.containsKey(items.get(i))) {
    deduped.put(items.get(i), 0);
  }
}
System.out.println(deduped.keySet());

This approach performs faster than the List-based algorithm, whose contains call is effectively a hidden inner loop, for all but the smallest of Lists, and when the Lists are small the difference in speed is negligible.
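If you want to see the difference on your own machine, a rough timing sketch along these lines will do, though it is not a rigorous benchmark; a harness such as JMH is the proper tool for serious measurements. The class name, list size and value range below are arbitrary:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Random;

public class DedupeTimingSketch {
  public static void main(String[] args) {
    // Build a large list with plenty of duplicates (sizes are arbitrary)
    Random random = new Random(42);
    List<Integer> items = new ArrayList<>();
    for (int i = 0; i < 50_000; i++) {
      items.add(random.nextInt(1_000));
    }

    // Approach 1: List.contains performs a linear scan on every check
    long start = System.nanoTime();
    List<Integer> listDeduped = new ArrayList<>();
    for (Integer item : items) {
      if (!listDeduped.contains(item)) {
        listDeduped.add(item);
      }
    }
    System.out.println("List.contains loop: "
        + (System.nanoTime() - start) / 1_000_000 + " ms");

    // Approach 2: HashMap.containsKey is a constant-time lookup on average
    start = System.nanoTime();
    HashMap<Integer, Integer> mapDeduped = new HashMap<>();
    for (Integer item : items) {
      if (!mapDeduped.containsKey(item)) {
        mapDeduped.put(item, 0);
      }
    }
    System.out.println("HashMap lookup loop: "
        + (System.nanoTime() - start) / 1_000_000 + " ms");
  }
}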

Deduping Lists in Java

There are a variety of ways to remove duplicates from a List in Java. Which one should you use?

For the vast majority of use cases, the Stream API’s distinct method should fit the bill.

For corner cases that require customization during the deduping process, consider writing your own algorithm to remove duplicates.

If your need is speed, perhaps a HashSet or HashMap makes more sense for your overall goals.

 
