Wednesday, August 24, 2016

Creating a Simple Compiler Symbol Table using Hashing C++ and Explanation


Explanations:

Compiler symbol table stores information that is needed in various phases if compiler. It may store information such as the data type, value, memory location, class type(identifier, number, operator), for functions(function name, arguments types, argument number, return type) etc.



In the code below it contains only two columns name and class type. Ex: $ a = a + b $. Operators are internal nodes and variables are leaf nodes. No need to store a variable twice. In the example above, a will be stored exactly once.
Symbol Table for, $ a = a + b $:


Required Functions:
Functions Names:
1. insert 2. delete 3. update 4. search 5. show


Efficient Data Structure:
A 2D array could be used to store the data, but it may have underutilized space. So, linked list is better. Now using linked list if we keep inserting elements then a single chain gets longer and longer. We may have do a linear search on the whole thing to find a necessary element. It is neither good in time and space complexity.

We can get rid of this problem using hashing. Through hashing we can insert elements in various chains. Using this method we can skip searching a lot of unnecessary elements by skipping many chains. Collision resolution is done through separate chaining as new elements are appended to the end of chain when an element present.

Worst case time complexity arises when all the elements are present in the single chain. It is also better in terms of space complexity as space is better utilized when using multiple chains and there is less fragmented space.

Selecting Hashing Method:
If hashing method was chosen based on class type some chains may be completely empty and some have all elements. Ex:

This leaves the name to create chain with. The hashing is done adding the corresponding ASCII characters of a string then positioning it inside the hash table by modulo with table size or, chain size.

Hashing Example:

If the input was $ \text{<} abc, identifier \text{>} $ then, the chain or, bucket is calculated using below method,

$ idx \xrightarrow[ASCII]{SUM} ( 'i' + 'd' + 'x' ) % CHAIN_LENGTH \rightarrow ( 105 + 100 + 120 ) % 53 \rightarrow ( 325 ) % 53 \rightarrow 7 $

How to use the Code:

Insertion:



Insertion and Collision Resolution Diagram:




Showing the Symbol Table:

Code:


No comments: