Hash Tables

A hash table, also known as a hash map, is an inbuilt data structure that is used to store key-value pairs. It uses a hash function to compute an index into an array in which an element will be inserted or searched. The advantage of the hash table is its complexity. It offers an completxity of O(1)

Key Components

  1. Key: The input provided to the hash function.
  2. Hash Function: A function that converts the key into an index within the hash table.
  3. Hash table Array: Each index in the hash table array, which may contain a chain of key-value pairs.

Hash Function:

A good hash function should distribute keys uniformly across the array to minimize collisions. Common hash functions include division-remainder, multiplication, and cryptographic hash functions.

Why use Hash Tables

The most valuable aspect of a hash table over other abstract data structures is its speed to perform insertion, deletion, and search operations. Hash tables can do them all in constant time.

For those unfamiliar with time complexity (big O notation), constant time is the fastest possible time complexity. Hash tables can perform nearly all methods (except list) very fast in O(1) time.

AlgorithmAverageWorst case
ListO(n)O(n)
SearchO(1)O(n)
InsertO(1)O(n)
DeleteO(1)O(n)

Because of this efficiency, you’ll find hash tables to be pretty dang useful for many use cases. And if you look carefully, you’ll notice that they’re actually implemented in a variety of places throughout your tools like your databases, caches, data-fetching libraries, and so on.

    What is Collision?

    In hash tables, a collision occurs when two or more keys hash to the same index in the hash table. This situation can arise because hash functions map a potentially infinite set of keys to a finite set of indices, leading to the possibility of multiple keys having the same hash value.

    1. Chaining: Each bucket stores a linked list (or another data structure) of key-value pairs that hash to the same index.
    2. Open Addressing: The system searches for the next open slot in the hash table when a collision occurs.

    Types of Collisions:

    1. Chaining (Separate Chaining):
      • In this approach, each bucket in the hash table contains a linked list or another data structure that stores all keys hashing to the same index.
      • When a collision occurs, the new key-value pair is simply added to the linked list at that index.
    2. Open Addressing:
      • In this approach, collisions are resolved by finding an alternative location within the hash table for the key that collided.
      • Various techniques include linear probing (checking the next location), quadratic probing, and double hashing.

    Causes of Collisions:

    1. Limited Range of Hash Values:
      • If the range of hash values is limited, collisions are more likely because different keys may be mapped to the same index.
    2. Poor Hash Function:
      • A poorly designed hash function may not distribute keys evenly across the hash table, increasing the chance of collisions.

    Handling Collisions:

    1. Chaining:
      • In chaining, when a collision occurs, a linked list is used to store all keys that hash to the same index.
      • Each bucket contains a chain of elements, and searching or inserting involves traversing the chain.
    2. Open Addressing:
      • In open addressing, when a collision occurs, the algorithm searches for the next available slot in the hash table.
      • Linear probing checks the next slot, quadratic probing checks slots based on a quadratic function, and double hashing uses a secondary hash function to determine the step size.

    Collision Resolution Strategies:

    1. Linear Probing:
      • If the desired location is occupied, check the next position, and continue until an empty slot is found.
    2. Quadratic Probing:
      • If the desired location is occupied, probe with increasing quadratic steps until an empty slot is found.
    3. Double Hashing:
      • Use a secondary hash function to determine the step size. If a collision occurs, try the next slot based on the secondary hash function.

    Choosing a Good Hash Function:

    To minimize collisions, a good hash function should:

    • Distribute keys evenly across the hash table.
    • Avoid clustering of keys in specific regions.
    • Provide a wide range of hash values.

    Trade-offs:

    • Chaining:
      • Pros: Simple implementation, efficient for small to moderately sized datasets.
      • Cons: Requires additional memory for the linked lists, may become inefficient for large datasets.
    • Open Addressing:
      • Pros: No need for additional data structures, more memory-efficient for large datasets.
      • Cons: More complex implementation, may become inefficient as the load factor increases.

    The choice between chaining and open addressing depends on factors such as the expected dataset size, the efficiency of the hash function, and the desired trade-offs between memory usage and computational complexity.

    Example Implementation (Python):

    class HashTable:
        def __init__(self, size):
            self.size = size
            self.table = [[] for _ in range(size)]
    
        def hash_function(self, key):
            return hash(key) % self.size
    
        def insert(self, key, value):
            index = self.hash_function(key)
            self.table[index].append((key, value))
    
        def search(self, key):
            index = self.hash_function(key)
            for k, v in self.table[index]:
                if k == key:
                    return v
            return None
    
        def delete(self, key):
            index = self.hash_function(key)
            for i, (k, v) in enumerate(self.table[index]):
                if k == key:
                    del self.table[index][i]
                    return
    

    Common Use Cases:

    1. Databases: Hash tables are used to implement database indexes for quick data retrieval.
    2. Caching: They are employed in caching systems to quickly check if a value is already stored.
    3. Language Interpreters: Many programming language interpreters use hash tables for symbol table implementation.

    Thus, we can say that hash tables are efficient data structures for mapping keys to values, providing fast retrieval times when designed well and handling collisions appropriately. They are widely used in various applications due to their efficiency in searching and retrieval.

    Categories DAA

    Leave a Comment

    Share this