How a Java HashMap internal implementation works

Name: How does a HashMap Work Internally? | Core Java Interview Question 12
Uploaded: 2024-09-09T17:29Z
Duration: 27 min 18 s

Cameron McKenzie, TechTarget

A frequently asked question in a Java interview is: How to implement a Java HashMap? Java job seekers must fully grok this important concept if they want to ace the interview.

The HashMap internal implementation

In this quick tutorial on how the Java HashMap works internally, you'll learn the following aspects:

The basic syntax of the Java HashMap.
How keys, values, hash codes and node references interact.
How hash codes are mapped to buckets.
What happens when hash node collisions occur.
How the HashMap changed in Java 8.
The ways the JDK now optimizes HashMap storage.

The HashMap tutorial also includes some useful code snippets that help developers understand what's going on with a Java HashMap under the covers.

For example, the following code used in the tutorial overrides the JDK's standard equals() and hashCode() methods, forcing hash collisions that would not normally occur. The result clearly demonstrates behaviors that would not commonly be observed with the default implementation.

import java.util.HashMap;
import java.util.HashSet;

public class HashAndProperty {
	
	int hashcode;
	String property;
	
    public HashAndProperty(int hashcode, String property) {
		this.hashcode = hashcode;
		this.property = property;
	}

	public int hashCode() {
        System.out.println("Calling hashCode()");
        return hashcode;
    } 
 
    public boolean equals(Object obj) {
        System.out.println("Calling equals() for key: " + obj);
        return property.equals(obj.toString());
    }
    
    public String toString() {
    	return property;
    }
	
    public static void main(String args[]) {
    	HashMap<HashAndProperty, String> map = new HashMap<>(99);
    	
    	
        System.out.println("storing value for k1");
        map.put(new HashAndProperty(2, "Andrew"), "alpha");
        System.out.println("storing value for k2");
        map.put(new HashAndProperty(2, "Betty"), "bravo");
        System.out.println("storing value for k3");
        map.put(new HashAndProperty(2, "Claire"), "charlie");
        System.out.println("storing value for k4");
        map.put(new HashAndProperty(2, "Darcy"), "devo");
        System.out.println(map.size());
        
        
        HashSet<HashAndProperty> set = new HashSet<>();
        set.add(new HashAndProperty(1, "Andrew"));
        set.add(new HashAndProperty(2, "Betty"));
        set.add(new HashAndProperty(3, "Claire"));
        set.add(new HashAndProperty(3, "Darcy"));
        set.add(new HashAndProperty(3, "Echo"));
        System.out.println(set.size());
        
    }
}

The following transcript provides the full text of the Java HashMap implementation video above.

Cameron McKenzie has been a Java EE software engineer for 20 years. His current specialties include Agile development; DevOps; Spring; and container-based technologies such as Docker, Swarm and Kubernetes.

View All Videos

Transcript - How a Java HashMap internal implementation works

One of the most common Java interview questions, and it is an advanced Java interview question, is: How does a Java HashMap work? They're not just asking how do you use the HashMap class from the Java Collections API, but they're asking: how does it work internally?

The thing is, there's actually a little bit of a trick to this question because the implementation was changed in Java 8 to make it more efficient. So usually, the person doing the interview wants to check if you know what that update is all about.

Hi, I'm Cameron McKenzie. I'm the editor and chief over at TheServerSide.com, and I've helped coach thousands of people to get their first Java job and pass that Java interview. I want to just explain to you how you should respond when asked: How does a Java HashMap work?

What I'd like to do in the next few minutes is quickly take you through just how you use a Java HashMap, but we're not going to spend too much time on that. I'm then going to show you how the Java HashMap is implemented behind the scenes. After that, I'm going to actually show you what the update was that makes these HashMaps so much faster.

Then I'd like to do a little bit of live coding with you, where I go through a little bit of an example, and we take a look at things like the hashCode method and when the equals method is called, and how that HashMap is actually working behind the scenes.

But doing a little example of the HashMap and just getting familiar with the code, that is what we're going to do next.

Starting off, let's take a quick look at just how the HashMap class works. In order to use it, you've got to import java.util.HashMap, and you can create an instance just with HashMap, give it a variable name, equals new HashMap, and a couple of things. Notice I put a number here. Behind the scenes, here's your first piece of information on how things work behind the scenes.

The HashMap actually just uses an array, and if you can guess how many elements are going to be in that array, you can specify that -- the size of that array -- right when you create the HashMap. If you get close to that number, the HashMap will expand its size, so it's not a hard limit, but if you want to stop the HashMap from expanding all the time and wasting clock cycles, if you can peg the, uh, the right HashMap size when you create it, it's not a horrible idea.

Now, the other thing too, I mentioned that the HashMap is part of the Collections API, and I don't know if you want to be a 'it's not part of the Collections API' because it actually doesn't implement the Collections interface. So there's a Collections interface that things like Vector and ArrayList actually implement.

If you actually dig into the HashMap class, you'll notice that it actually extends AbstractMap, and AbstractMap extends Map, and none of those actually implement that Collections interface. So if you want to be a real stickler, you can say, 'Well, it's actually not really one of the Collection classes because, well, it doesn't implement that Collections interface.' But what it does is it implements that Map interface. And really, when we're working with a HashMap, we're working with name-value pairs.

So this is all of my favorites. Let's, uh, find out what my favorite movie is. So, favorites dot put name-value pair. So, I'm going to say my favorite movie is … and what's my favorite movie? Well, let's say it's Shrek -- and there you go, we've got one element inside of our HashMap.

Now, here I've got two strings. A lot of people's first introduction to a HashMap is putting strings as the name on the left-hand side and the value on the right-hand side. But you can actually have objects on both of those sides. It doesn't have to be, uh, a text string.

Now, let me throw, uh, another couple of values in here. We'll throw in my favorite band, which is "The Lowest of the Low." We'll throw in my favorite car, which is a Dodge Viper. My favorite YouTuber, which is Scrumtuous. Control S, that saves.

Now, you can see that idea of putting different values into that HashMap, into that class that's going to keep track of all of this data. You can always do a system.out.println and print out the map. The toString method, I think, just represents that as a funky little JSON string or something like that.

So let's give that a run. Run as a Java application. Yeah, you can see everything printed out there. The name-value pair: car equals Viper, YouTuber equals Scrumtuous. So that all looks good to me.

And then, by the way, if you ever want to pull something out of that HashMap, um, it's just a, uh, matter of saying we'll do system.out.println, and we'll say favorites.get for objects, so we'll get, and we'll get my favorite YouTuber. And if I spelled everything correctly, we can always do a copy and paste just to confirm. That should print out Scrumtuous right after the actual map is printed out.

So there you go. Those are the fundamentals, the very basics of working with a HashMap.

Now, the question that you get on Java interviews is not just the basics of, 'How do you use the API?' It's, 'What's going on behind the scenes here in order to support that?'

Now, that gets very, very interesting because it, uh, requires a bit of knowledge of the hashCode method, and it also requires a little bit of knowledge of the equals method.

And by the way, just to give you an idea of the HashMap method -- uh, the hashCode method -- every time you create, um, an object in Java, each instance that you create gets assigned its own hashCode. It's supposed to be a unique number. I think technically it's not universally unique, but it's going to be pretty rare for you to run into two numbers that are the same when you ask for an object's hashCode.

So if I created, uh, a string, you know, handle = cameronmcnz -- that's my Twitter handle if you want to follow me over there. And then, a System.out.println, and I said handle.hashCode, this is going to print out some long number, probably even put a, uh, negative in front of it, that represents that particular object.

That becomes really important in a moment because, you see, when the Java virtual machine -- or computers, for that matter -- start organizing data, they don't like text strings. They don't like objects. They like numbers. And that number becomes very important when we start looking at the behind-the-scenes implementation of a HashMap. And that is what we're going to take a look at next.

So, here's how a HashMap works behind the scenes. When you create a HashMap, what the Java virtual machine does is it creates a single-dimensional array to hold all of your data, to hold everything that you want to put into your HashMap. Now, there's a little bit of a problem with that because a single-dimensional array can only hold one object per element -- one object per bucket. And, of course, when we put things into a HashMap, it's always a key-value pair, right? It's always two things.

So, here's how the Java Virtual Machine gets around that. It creates its own class called a node, and a node has four properties. You can probably guess the first two. One property points to the key, the other property points to the value, when you put something into a HashMap. The other property is actually the hashCode of the key, right? Every time you have an object in Java, Java assigns it, uh, a unique key, a unique numerical identifier. Computers, the Java virtual machine -- it much prefers working with numbers. So, anytime it can use that hashCode of an object rather than dealing with properties, it prefers it.

So, we've got the key, the value. It keeps track of the hashCode of the key. And then it actually holds a reference to another node, which we'll get to in just a moment.

Now, when you actually start adding objects into your HashMap, what happens is the Java virtual machine will take a look at the key and the value. So, you know, favorites.put(movie, Shrek), it will put 'movie' and 'Shrek' into this new node instance that it creates. It'll take a look at the hashCode for 'movie,' set that as the hashCode, and then it'll take a look at your array.

Let's just imagine we've got a 10-element array behind the scenes to support this HashMap. So, we've got a 10-element array, and when we put our first object into that array -- our first name-value pair, 'movie: Shrek' -- we find out that the hashCode for the key 'movie' is 9722.

Well, what the Java Virtual Machine does is it says, "Well, I've got 10 elements, and, well, you know, the last number of this hashCode is the number two. What I'm going to do is I'm going to take that node object that I just created and put it into the third element." Because, array, zero-based counting, so the third element maps to the number two, you get the idea, right? So, uh, so it says, "OK, the number two is the last number of the hashCode. I've only got 10 elements, so I've only got 10 spots to choose from. Uh, since the hashCode ends in a number two, I'm going to put it in bucket number two." Zero-based counting, that's the third bucket.

Now, let's say another instance is created. So, we now put into our HashMap favorites.put(YouTuber, Scrumtuous). Well, the HashMap is going to look at the hashCode for 'Scrumtuous,' and it'll say, 'Oh, the hash code is 3053.'

And, OK, so now we've got the hash code of 3053. Um, we've got 10 buckets. The last number is a three. I'll actually throw this new node object -- this thing that holds all of that data -- into bucket number three, which is actually the fourth bucket, um, because of zero-based counting. But again, you get the idea. So, it's using that last number, right? That last number. And it's only using the last number 'cause we've only got 10 buckets. So, we'll do a little modulus on the hashCode and figure out where to put it.

And then now, let's assume, you know, we get another one. You know, 'What's your favorite food?' It's ice cream. The key to 'food' is 8320. The last number on that hashCode is a zero. We've only got 10 spots. May as well just throw that right into element zero, right? Essentially, the first element in the array -- zero-based counting.

And so, that's how the Java virtual machine figures out where to put data. It also makes it really, really fast for the Java virtual machine to pull something out. Because imagine we now say, 'OK, you know, I want to get out the information about my favorite food, which is ice cream.' It looks up the hashCode for the word 'food.' It realizes that it ends in a zero. So, it now knows, 'OK, the favorite food is what's in bucket zero.' Right? That hashCode, the last number is the zero. We've only got 10, so we're doing a modulus of 10 to get the bucket number.

OK, we know that whatever is in bucket zero, uh, maps to this. Pull it out. As I said, it's using numbers rather than using properties and all this other stuff. It's incredibly, incredibly efficient.

So, yeah, whenever you add something into, uh, a HashMap, behind the scenes what's happening is the virtual machine is creating a node object. It's populating the node object with the hashCcode of the key, the key, and the value. And then, based on the last numbers of that hashCode, it's mapping that to a corresponding bucket -- a corresponding element in the array that it's using.

So, that is fundamentally how it works.

Now, imagine we ended up adding something into our HashMap, and it ends up having a hashCode of 9973, and we've already got something at 3053.

So, now, all of a sudden, the bucket's not empty, right? Like, it's all nice, it's all fine and dandy when all the buckets are empty. OK, there's a hashCode that ends in zero -- we'll put it into bucket zero. Here's a hashCode that ends in three -- we'll put it into bucket three.

All of a sudden, you get another object. It says it's the hashCode three, and it's like, 'Uh-oh, we've already got something in bucket three.' This is where that node reference comes in.

So, if we see the hashCode is the number three and we've already got a node in bucket three, we still create a new node instance. We still take that new node instance and put it into bucket number three. But that node property, OK -- that node reference that the node itself has -- points to the other node in that bucket.

So, we end up getting a linked list.

So, yeah, the first element that went into element number three -- it's still there. We now put a new node, uh, a new instance, a new key-value pair into bucket number three. We've now got two in there. And so, what we do is that second instance -- we use that node property to point to the first one.

And so, uh, that is kind of, you know, some of the behind-the-scenes stuff of how it's implemented. That's what happens when we get what they call a collision, when we start putting things into our buckets. And we can potentially -- depending on how many elements are being put into the array and what the size of the array is -- we could end up with more and more and more elements all inside one bucket, with all of these different nodes pointing to each other through that linked list that gets created.

Now, there's a problem with that, and it's a big problem with that. With a linked list, what ends up happening is, if you want to go through the list and find a particular element in it, you've actually got to go through every element in the list. And in fact, what happens when we try and get an element out of a HashMap and we end up getting sent to a bucket that's got multiple elements in it, what we do is we go through the elements one at a time.

And we know that the hash code ends in three, but let's just say that, you know, we've got a whole bunch of things where the hashCode ends in three, but we're specifically looking for the key-value pair where the key was 'movie.'

We pull that linked list, and then we start doing a, uh, a dot-equals comparison on each node, saying, 'OK, I know that, you know, the favorite movie is going to be somewhere in this linked list.' So, we get the list, and then one element at a time, one node at a time, we start saying, 'Does the key dot equals movie?' Does the, the, the key's, you know, name -- does the key dot equal 'movie'? We want to do an equals comparison, not just a hash code, and we do that until we find it.

Now, if we've got 10, 20, 30 elements linked together, there's a possibility that the last one we're going after is going to be the last one in the list. Um, and so that ends up becoming a very, very slow search.

What they did in Java 8 was they said, 'If we hit the critical mass of eight elements in this linked list, what we're going to do is we're going to break the linked list and create a binary tree.'

And the thing about a binary tree is that you can do very, very fast searching on it.

So, for example, if I had a binary tree with one, two, three, four, five, six, seven, eight, nine elements in it, um, and potentially I want to find element 8271, I wouldn't have to search through all nine. With a binary tree, what you can do is you can take a look at the first node, and in this case, it might be 8051. And then you can say, 'Uh, you know what, it's actually higher than that.' And then you look at the next node and you do a comparison and you say, 'OK, it's actually going to be lower than that one.' And then you keep going through the tree. You keep breaking the tree in half on every search.

That's a binary search of the tree, and that's extremely efficient. It's the fastest type of search that you can do with, uh, a group of data. And so, that was the big change that they made with Java 8.

So, I've put together this little code example that I really hope you actually code on your own and play around with a little bit because I think it really drives home how the HashMap works and how it works with dot equals, dot hashCode, and the various buckets that get created behind the scenes.

I've created a class here that's going to represent the key in a key-value pair, right? When you put something into a HashMap, you'll say HashMap.put('car,' 'Viper'), and 'car' would be the key, 'Viper' would be the value.

In this case, I'm not going to put 'car' as the key. I'm going to create a HashAndKey as the key, and this HashAndKey will have the key 'car,' and it'll also have a hashCode that I'm going to specify. I'm not going to allow the virtual machine to create a key for a hashCode for the key. I'm going to specify it. The reason I'm going to specify it is it's going to allow me to create multiple keys with the same number and simulate a collision.

Um, and so you can see there's a constructor here, HashAndKey, which takes the hashCode and the key, and I get to assign both of those.

Now, behind the scenes, the virtual machine calls the hashCode method of the, the, the class of the key that's put in as the key-value pair. But you can override that, right? So, you can see that I've overridden that hashCode method here. And as I've overridden it, I'm just returning that value that I specified for the key. I can make it a random number, but I can put every key in with the same number as well and see what type of havoc that creates.

Now, as we discussed earlier, bucket selection -- index selection in that behind-the-scenes array -- is determined by the hashCode of the key. However, if there's more than one element in that array -- more than one node, more than one, uh, set of values inside that bucket in that array -- well, we can identify which bucket to go to with the hashCode. But we have to do an equals comparison on, uh, all of the different nodes that might be linked together in there.

So, I've overridden the equals method as well, and I just put a printout in there saying, 'Calling the equals method' and what object we're calling the equals method for. And then we just do the equals method.

Similarly, in that hashCode method, I printed out, uh, 'Calling hash code,' so we can see when the hash code actually gets called.

Now, uh, what else do I have there? The toString method, which will make things easier. And then I've got the main method here.

OK, so I create the HashMap, and the HashMap has the HashAndKey as the key -- that's the class we've been looking at -- and then just, uh, a String for the value.

So, you can see, I put my first HashAndKey in. The HashAndKey is one, and 'Andrew,' and the value is 'alpha.' Then, I put a second HashAndKey. The hash is two. The key is 'Betty.' The value for the key is 'Betty,' uh, and then the value is actually 'bravo.' HashAndKey three -- 'Claire' -- and then the value was 'charlie.' And then, HashAndKey four -- 'Darcy' -- and the value is 'devo.'

And, uh, so, you can kind of see how this is getting put in there. And I try to make it alphabetical so you can kind of keep track of the order. Now in this case, every single key has a hashCode that is unique: one, two, three and four.

So as I put elements into this HashMap, really there shouldn't be any problem putting things in. And if I pull things out, the only thing we should ever have to take a look at is the, the HashMap. Because the array behind the scenes is only ever going to have one element, one node, one piece of data per element in the array.

So, if I know which element in the array to, to go to -- say, when I look up a, a value based in a property -- I will get to the, the bucket right away. There'll only be one thing, and then there's no problem.

So let's do this. Let's run this code. So, I'll run this as a Java application. And when this code runs, let's take a quick look at it here, maybe even throw that up there. Give me a moment to do some formatting.

Notice that when we store key1, we call hashCode, and that tells us to put it in element one in the array. Key2, we call hashCode, and that puts it in element two of the array. Three -- third element. Four -- fourth element. Everything is fine. Even when we want to go and find the key-value associated with hp here. So, we've got a hash key named hp. We try and pull that out again a little bit later.

Um, we'll call that 'darcy,' 'darcy,' and 'darcy' there -- to make it look a little bit more obvious just which key we're pulling out there.

So, we put in Darcy. We now pull that out. And when we pull that out again, we only need that hashCode call. It tells us which bucket in the array to go to. We pull out the node that's stored there and get the property from it. And then, the property associated with Darcy is devo.

So, this is the happy path. This is perfect when there's no collisions.

Now, here's the thing. We can force the same hash key, hashCode, for each key.

So, notice how over in the console, we only ever called hashCode in order to search for an element or even to figure out where to place an element inside of our array.

Well, now, watch this. I've set everything to have the same ID of one. Watch the madness.

OK, we put the first one in. OK, all we had to do was call hashCode, because there was no conflict.

We put the second one in, but that second one does have a conflict. So, now, we actually have to get that first element. And then, we have to actually go through what's already in there. So, we've got to call for equals on Andrew.

Add the third one in. OK, now we call the hashCode on the third one. It's one.

Uh-oh. Problem now. Um, we've actually got to put that into that third element in the array, but we're going to have to go over the other elements that are already in that same bucket, which are Andrew and Betty.

We now store the key for four. Again, that's a hashCode.

Now, we have to go through Andrew, Betty, and Claire again. And so, now we're putting it in, but it's not good enough just to have the hashCode. Now, we have to go through each element in that array and then figure out, uh, whether that element is in there or not already. And then, if it's not already in there, then we add it in. And, and that's what that equals is doing there, right?

So, you put a new element into a bucket. Uh, what happens is we say, 'OK, we're putting something into bucket number one. OK, there's three things in bucket number one. Let's make sure that this thing we're putting in there doesn't already exist in there, right? Let's make sure there's, you know -- if you're putting something with a key of Betty -- let's make sure that something doesn't already exist in there with a key of Betty.'

So, that is why we're forcing that equals.

It's also really interesting when we go in and we try and get Darcy, right? That Darcy key was the fourth one that was put in there. So, we say, 'OK, let's get Darcy.'

Um, that is now key number one, which takes us to a bucket that has four things in it.

Um, so the first time -- first element in that bucket -- we have to say, 'OK, I know that we're going to bucket number one. We're looking for some key of Darcy. Hey, that first element in there -- do you match the key of Darcy? Do you have a dot equals for Darcy that's true?'

And it says, 'No, I'm Andrew.' We go to the next one. 'No, I'm Betty.' Next one. 'No, I'm Claire.'

And the next one -- it's like, 'OK, that's the last one left there. Then, that's got to be it.' And then we pull that out, and then we get devo.

Um, and so that is the chain of events if we force this to have the same key each time.

And let's just take a look at this again. When every key is unique, right, it's a, a nice comparison. We don't have any of those equals calls. It's all just hashCode.

So, there you go. Play around with this code. There's a link in the description that will help you get the code for this example, as I said. And I think this really drives home how that HashMap is working behind the scenes.

Now, if you enjoy that tutorial, why don't you head over to TheServerSide.com. I'm the editor in chief over there. We got lots of great tutorials on Java, on Spring Boot, on microservices development, on Git, on GitHub, on Scrum and Agile.

Um, if you're interested in my personal antics, you can always follow me on Twitter at CameronMcKenzie. I have written a couple of books, and you can see back there -- well, What Is WebSphere? Doesn't sell too well anymore. Um, there's Hibernate Made Easy, all about working with Hibernate and JPA. Uh, there is also Pickering is Springfield, which is a book that I wrote about the fictional hometown of The Simpsons.

And then also, you'll see Darcy DeClute's Scrum Master Certification Guide. A lot of people have been reading that book and scoring 100% on the Product Owner or Scrum Master certification exam.

So, if you're Agile and working with Scrum and you really want to enhance your career, I highly recommend picking up that book.

And then finally, if you're watching this on YouTube, then why don't you subscribe on YouTube?

+ Show Transcript