Separate Chaining FAQ - Hash Tables

To build our own spatial hash table, we will need to understand how to resolve the hash collisions we encounter when adding elements with separate chaining.

Keywords: javascript, tutorial, spatial hash table, hash map, data structures, leetcode

By Carmen Cincotti  

Last week, we talked about hash tables in order to understand them better.

The goal is to eventually use a spatial hash table for our fabric simulation because we will have to use it to handle particle collisions.

Cloth simulation result

When adding elements to such a structure, we have to be careful of hash collisions. Over the next few articles we will learn how to deal with them as they arise.

What is a hash collision?

A hash collision occurs when two or more keys are hashed at the same index of the underlying array of a hash table.

A hash collision hitting the same index

Without sufficient means to manage such a situation, we will have unexpected results.

Why is hash collision a problem?

Because we will overwrite existing data that already is stored in the hash table.

To illustrate such a situation, imagine that we would like to store the key-value pair { "pomme": 2 }. We have a hash function that hashes the key, pomme, to an index of 22.

Hash collision example one with pomme

Next, we would like to store { "pamplemousse": 3} in our hash table. Coincidentally, the hash function hashes pamplemousse to an index of 22… but the value of pamplemousse is already there!

Hash collision example two with pamplemousse

What can we do to handle this situation?

How do we handle a hash collision?

To resolve a hash collision, we have several options:

  1. Separate Chaining - A linked list is used at each index in the array to store the data ​​of colliding keys.
  2. Open Addressing - If a collision occurs, the array is searched to find an open bucket to store data. An additional data structure is not necessary.

The complexity of each method is measured by what is known as the load factor of the hash table. We will see this concept in the next article.

What is Separated Chaining?

Separate chaining is a way to resolve hash collisions in a hash table. Linked lists are placed at each index of the underlying array. Each key-value pair that matches an index in the array (after hashing the key) is added to the linked list.

Separated chain example

The time complexity for inserting, searching, and deleting on average trends toward O(1)O(1) given that the underlying array is large enough to hold the unique elements we want to store, and that the hash function distributes the elements uniformly.

At the other extreme, the time complexity for these operations can resemble that of a linked list if the elements group together in a single bucket. That is, time complexity degrades to O(n)O(n) for insertion, search, and deletion.

Separate chaining - how to insert an element?

Let’s imagine that there is a key-value pair { "key": 1 } which is hashed at index 44. I could insert this pair by concatenating it to the linked list at that index like so:

Separated chain example one

Next, imagine that I would like to insert the key-value pair { "two": 5 } which is hashed at the same index 44 :

Separated chain example two

You just need to concatenate this new pair to the linked list!

Separate chaining - how to search for an element?

To find an element, we hash the key to find the index in the underlying array. Then, we traverse the linked list that is there. Using the key, we can visit and compare each node in the linked list until we find a match.

Search for element in hash table

Of course - there’s the case where nothing is found. Each language handles this case differently. Maybe it’ll return ‘NULL’ or it may even throw an error.

Separate chaining - how to delete an element?

To delete an element, we do the same thing as to search for an element. Upon finding the node that matches the key, we remove it by disconnecting the node from the linked list.

Delete element in hash table

What is the time complexity?

With the help of this strategy to resolve collisions, the time complexity to find, insert or delete an element (on average) is the following:

Search Insert Deletion
O(1)O(1) O(1)O(1) O(1)O(1)

And in worst-case scenario, we would see these time complexities:

Search Insert Deletion
O(n)O(n) O(n)O(n) O(n)O(n)

A worst-case scenario can be caused by an underlying array not being large enough. Another cause could be a hash function that creates groupings in a single bucket.

Next Time

Next week - we’ll look at open addressing, double hashing and load factor - stay tuned!

Resources


Comments for Separate Chaining FAQ - Hash Tables



Written by Carmen Cincotti, computer graphics enthusiast, language learner, and improv actor currently living in San Francisco, CA.  Follow @CarmenCincotti

Contribute

Interested in contributing to Carmen's Graphics Blog? Click here for details!