A number of people have reported problems with the email processing on this site. I am happy to report that I have finally figured out the cause of the problem ... (more)

**Question:** I want to know about direct addressing, dictionary lookup, and indirect addressing. Can you explain about division remainder, digit analysis/extraction, folding, mid-square, and radix conversion? What is collision and what are the differences between double hashing, chaining, bucket addressing, perfect hashing, and dynamic hashing? What is linear probing and what are synonyms?*anonymous*

**Answer:** Hash tables are one of the most efficient means of searching for data. A hash table establishes a mapping between all of the possible keys and the position where the data associated with those keys is stored. I don't know exactly how all of the hashing methods that you are asking about work but the following should go at least partway towards answering your questions.

*Direct Addressing* is where the hashing algorithm guarantees that no two keys will generate the exact same hash value. This is the ideal situation but is impractical in most situations. Instead most hash functions map multiple keys to the same has value and provide *indirect addressing* mechanisms to handle the situation where two keys map to the same value. The situation where this occurs is called a *collision*. Two keys that hash to the same value like this are called *synonyms*.

A hash function attempts to distribute keys in a random manner so as to minimize the need to resolve collisions. One of the simplest hashing algorithms is *division remainder* where you divide the key value by some fixed number and take the remainder as the key location (also known as modulo arithmetic). For example let's say we have space for one thousand entries, so we divide the key of each value by 1000 and take the remainder so the entry with key 5078950770 will occupy position 770 in our array (5078950770 modulo 1000 is 770). Of course with this particular example every 1000th key will map to the same location.

A *Chained hash table* is one where the data is stored in linked lists which contain all of the data entries whose keys map to the same hash value. The linked list (or *bucket*) can grow to contain as many entries as required but as the list grows it becomes gradually less efficient in the speed at which the entries can be accessed.

An *open addressed hash table* stores the data in a table instead of in a linked list and can use a variety of probing methods to locate the required entry within the table. *Linear probing* is the least efficient method as it involves searching sequentially through the table until the desired entry is found. Where only a few entries map to the same hash value this may be good enough but where lots of entries map to the same hash value, a more efficient method is required.

*Double hashing* is one of the more effective methods of probing an open addressed hash table. It works by adding the hash codings of two auxiliary hash functions. The function for performing double hashing is *h(k,i) = (h _{1}(k) + ih_{2}(k)) mod m* where 0 <=

*Dynamic* or *Universal hashing* involves using a random number generator to dynamically generate the hashing algorithm at run time. This method reduces the chance that your keys will produce a bad distribution across the table.

As you have noticed from the above examples, hashing algorithms work on numeric keys. *digit analysis/extraction* is the method that you use to convert the actual keys into numerical values that cal be fed through your hashing algorithm. To get the best results you should aim to base your hashing values on those parts of the actual keys most likely to vary from one record to another.