Monday, September 20, 2010

Lesson 10. PATRICIA trees

Lesson 10. Patricia trees


Another type of trie is the PATRICIA tree. PATRICIA stands for Practical Algorithm To Retrieve Information Coded In Alphanumeric. The Patricia Tree is based on using C-strings as its data, and its nodes look something like this:

struct pNode

{
    char* data;//A C-string
    bool isLeaf;
    int index;
    pNode* left;
    pNode* right;
};

Leaves are the only nodes capable of storing strings; if isLeaf is false, then data is set to “”, an empty C-string. Index means the index of the first difference at the binary levels between the strings inserted.







To understand Patricia trees, one must break down strings/C-strings based on what we know.


A string is essentially an array of characters plus one final character called the null terminator.


A character is essentially an array of eight bits.


Bits are binary digits, which can only be 0 or 1, much like bools can only be false or true.


If a bool is set to the int value 0 it is evaluated as false. If it is set to the int value 1 it is evaluated as true.


So since bools only have two possible values, much like bits,


We can think of a character as an array of eight bools.


So we can think of a string as an array of bools that is eight times the number of characters in the string plus eight 0’s to represent a null terminator at the end of the string.


Now prepare to see the Noobtricia Tree’s nodes.


struct pNode

{
    string data;
    bool bits[40];
    bool isLeaf;
    int index;
    pNode* zero;
    pNode* one;
};

A Noobtricia tree is a version of a Patricia tree that I made up as a way to teach Patricia trees. Instead of dealing with the C-string and its bits as part of the same data, Noobtricia trees deal with them separately, making it so that we have a regular string instead of a C-string as our data, and I also added an array of 40 bools to represent the bits of a string with a length up to four characters (first 8, 16, 24, or 32 bools) plus a null terminator (the eight bits after the character bits; if the string is less than four characters long, then the first eight-bit character to be 00000000 is the null terminator and all bits after that are not used). This makes it so that the Noobtricia tree removes confusion by dealing with the string and its bits separately.



In addition, since nodes that are not leaves are used to separate nodes at the first bit in which there is a difference between them (this bit is stored in that node as index), the pointers left and right were renamed zero and one; if a string’s bool at the current node’s index is false or 0, the string is stored in the node’s zero/left subtree. If that bool is true or 1, it is stored in a node within the current node’s one/right subtree.


If a node is a leaf, then we will give the node an index of -1 because by definition, leaves do not have subtrees, so their pointers point to NULL.


If a node is not a leaf, then we will give the node an empty string because only leaves store strings.






pNodes look like this in Noobtricia trees, and in Patricia trees they look like that except without the array of bits:

However, since if isLeaf is true we know that data will be a string and index will be -1 but if isLeaf is false data will be an empty string and index will be an int, in all diagrams I will use a shorthand drawing in which in leaves only the data will be shown and in non-leaves only the index will be shown.







So, the tree has two pointers and it is used to do a binary search for strings. Sound familiar? Well, if it reminds you of binary search trees there’s a good reason why.


In binary search trees, if you insert an int, to find the subtree it is inserted into, the int can only be one thing or the other; it can either be greater than the current node’s data or less than the current node’s data.


In Patricia trees, if you insert a string and its array of bools, then by the definition of a bool, at each index, the string’s array can only have one value or the other; it can only be false/zero or true/one. Because of this property, Patricia trees make it so that they are essentially the binary search trees of strings.


Interestingly enough, in addition to giving the Patricia tree a binary search property, looking at strings as an array of eight-bit characters ending with an eight-bit null terminator means that the null terminator makes it so that no string is a prefix of another string. For example, THE as a C-string is


THE\0


and THEM is


THEM\0


THE\0 is not a substring of THEM\0 because THE’s fourth character is a null terminator while THEM’s fourth character is an M, with the fifth character being its null terminator. If we look at those words in the form of their bits, we see this:


01010100(T),01001000(H),01000101(E),00000000(\0),00000000(\0) THE


01010100(T),01001000(H),01000101(E),01001101(M), 00000000(\0) THEM


Look at the fourth set of eight bits, the one that corresponds to the first \0 on THE and the M on THEM. You will see that in the second character of that set, which is also position 25 of the whole array, THE has a 0 and THEM has a 1. This means that the first difference between THE\0 and THEM\0 is in the character at position 3 and the bit at position 25. They would be separated by a node with 25 as its index, with THE in its zero subtree and THEM in its one subtree.






Now that the information about bits is established, it’s time to see what the Noobtricia Tree’s class and constructor looks like:

class pTree

{
    private:
        queue nodes;
    public:
    pNode* root;
        pTree();
        ~pTree();
        void clear(pNode* ptr);
        bool isEmpty();
        void insert(pNode* ptr, string word, bool wordBits[]);
        void remove(pNode* ptr, string word, bool wordBits[]);
        void levelOrder(pNode* ptr, bool isRoot);
        bool contains(pNode* ptr, string word, bool wordBits[]);
};
pTree::pTree()
{
    root = NULL;
}

Just as a preview, I will show you the whole Patricia tree that we will be working with, which has every word inserted into the tree.
(Note that we are working with the same ten words from the trie tutorial)
That’s a big tree there.  Yep, looks like now you’re playing in the big leagues!  You’ll also realize you’re in the big leagues because you’re working with this bad boy, a chart of every word in the tree translated to binary:
So how do we code all the functions of this data structure on steroids? Let’s find out!


We’ll start out with one of the easy functions:

bool pTree::isEmpty()

{
    return root == NULL;
}

Okay, no surprises there… it’s just like any other isEmpty function of a linked data structure.  Now you will see another familiar-looking function, contains.
//For contains to work, pass head as ptr in the first function call. Also, if you are passing wordBits to a function, what that means is that you are passsing the array of bools that corresponds to the string you are passing.

bool pTree::contains(pNode* ptr, string word, bool wordBits[])
{
    if(isEmpty())//1
        return false;
    if(!(ptr->isLeaf))//2
    {
        if(wordBits[ptr->index]==false)
        {
            return contains(ptr->zero, word, wordBits);
        }
        else
        {
            return contains(ptr->one, word, wordBits);
        }
    }
    else//3
    {
        if(word.compare(ptr->data) == 0)
            return true;
        else
            return false;
    }
}

1. If the tree is empty, then the tree does not contain the string you are looking for, for obvious reasons.



2. Only leaves contain strings, so the function recurses to get to a leaf. If our string’s bool at the index of the current node (ex if the current node has 1 as its index we check the second bool in our bit array) is 0/false, we call the next recursion in the zero/left subtree of the current node. If the bool is 1/true, we call the next recursion in the one/right subtree of the current node.


3. When we reach a leaf, we check if the leaf’s data is the same string we are searching for. If it is, then the tree contains the string so the function returns true. Otherwise, the tree does not contain the string so the function returns false.






You are now about to learn the really hard functions: insert and remove (which, by the way, wasn’t even in the Patricia trees we used when I took COMP 15). Between these functions, there are 213 lines of code. But before we start that, there’s a function I’d like to introduce to you: firstDiff.


In the insert function, we will be inserting, and one part of the insert function is a recursion section in which we recurse until we reach a leaf. Once we reach that leaf, we compare the string in the leaf to the string that we are trying to insert to find the first bit which is different between the two strings.


We could do that by simply comparing the bit arrays, but in a real Patricia tree, where a bit array is never passed, we first find the first character that is different between the two strings and then we find the first bit that is different.


This is a confusing process since we are using a new operator called the bitwise &, but to simplify the process while still starting the search for the first different bit with a search for the first different character, we use the function firstDiff to find the first different character:

int firstDiff(string wordA, string wordB)//1

{
    if(wordA.compare(wordB)==0)//2
        return -1;
    else
    {
        bool foundDiff = false;//3
        unsigned int diff = 0;
        while(!foundDiff)
        {
            if(diff==wordA.length()||diff==wordB.length())//4
                foundDiff = true;
            else
                if(wordA.at(diff) != wordB.at(diff))//5
                    foundDiff = true;
            if(!foundDiff)//6
            diff++;
        }
    return diff;
    }
}

1. The first thing I would like you to note is that this is not a member function of the pTree class; it can be used on any two strings in and out of the class.



2. If the two strings are the same, then -1 is returned, signifying that the two strings are the same.


3. foundDiff is the value that tells us whether or not we have found the difference between the two strings. diff is the iterator that keeps increasing until we have found the first difference between the two strings, changing foundDiff to true.


4. If diff is equal to the length of one of the words, then one of the words does not have a character at position diff (analogous to the word’s null terminator), so foundDiff is true.


5. If the character at position diff is not the same in both words, then foundDiff is true.


6. If there is no difference between the strings at position diff, then diff increases by one. Otherwise, diff is returned as the position at which we see the first difference between the two strings.


//For insert to work, pass root as ptr in the function call.

void pTree::insert(pNode* ptr, string word, bool wordBits[])
{
    if(isEmpty())//1
    {
        root = new pNode;
        root->data = word;
        root->isLeaf = true;
        root->index = -1;
        for(int i = 0; i < 40; i++)//2
        {
            root->bits[i] = wordBits[i];
        }
        root->zero = NULL;
        root->one = NULL;
    }
    else
    {
        if(!(ptr->isLeaf))//3
        {
            if(wordBits[ptr->index] == false)
                insert(ptr->zero, word, wordBits);
            else
                insert(ptr->one, word, wordBits);
        }
        else
        {
            if(word.compare(ptr->data) == 0)//4
                cout<<"Error: This word is already in the tree!”;
            else
            {
                int bitLoc = 8 * firstDiff(word, ptr->data);//5
                int firstDiff = -1;
                for(int i = bitLoc; i < bitLoc + 8; i++)
                    if(wordBits[i] != ptr->bits[i] && firstDiff == -1)
                firstDiff = i;
                pNode* diffNode = new pNode;//6
                diffNode->data = "";
                diffNode->isLeaf = false;
                diffNode->index = firstDiff;
                diffNode->zero = NULL;
                diffNode->one = NULL;
                if(wordBits[firstDiff] == false)//7
                {
                    diffNode->zero = new pNode;
                    diffNode->zero->data = word;
                    diffNode->zero->isLeaf = true;
                    diffNode->zero->index = -1;
                    diffNode->zero->zero = NULL;
                    diffNode->zero->one = NULL;
                    for(int i = 0; i < 40; i++)
                        diffNode->zero->bits[i] = wordBits[i];
                }
                else
                {
                    diffNode->one = new pNode;
                    diffNode->one->data = word;
                    diffNode->one->isLeaf = true;
                    diffNode->one->index = -1;
                    diffNode->one->zero = NULL;
                    diffNode->one->one = NULL;
                    for(int i = 0; i < 40; i++)
                        diffNode->one->bits[i] = wordBits[i];
                }
                if(root->isLeaf)//8
                {
                    if(wordBits[firstDiff] == false)
                        diffNode->one = root;
                    else
                        diffNode->zero = root;
                    root = diffNode;
                }
                else
                {
                    pNode* before = root;//9
                    bool placed = false;
                    while(!placed)//10
                    {
                        if(wordBits[before->index] == false)
                        {
                            if(before->zero->index < firstDiff && before->zero != ptr)
                                before = before->zero;
                            else
                                placed = true;
                        }
                        else
                        {
                            if(before->one->index < firstDiff && before->one != ptr)
                                before = before->one;
                            else
                                placed = true;
                        }
                    }
                    if(before == root && firstDiff < root->index)//11
                    {
                        if(diffNode->zero == NULL)
                            diffNode->zero = root;
                        else
                            diffNode->one = root;
                        root = diffNode;
                    }
                    else//12
                    {
                        if(wordBits[before->index] == false)
                        {
                            if(diffNode-> zero == NULL)
                                diffNode->zero = before->zero;
                            else
                                diffNode->one = before->zero;
                            before->zero = diffNode;
                        }
                        else
                        {
                            if(diffNode-> zero == NULL)
                                diffNode->zero = before->one;
                            else
                                diffNode->one = before->one;
                            before->one = diffNode;
                        }
                    }
                }
            }
        }
    }
}

Now for the explanation of all this code. Here goes!


1. If the tree is empty, then we make a new leaf node at root and initialize its values. Its pointers both point to NULL, its string is the string we passed in the function, its isLeaf is true, and its index is -1.

2. We iterate through the bit array we passed as wordBits to set the node’s bits one by one. You’ll be seeing that section of code a lot here.

3. Much like with the contains function, there is a recursion section of the function in which we recurse through the tree until we find a leaf node, which works the same way as the recursions in the contains function.

4. When we do find a leaf, we check if the leaf’s data matches the string that we are trying to insert into the tree. If it does, then we are inserting a duplicate value, and the function tells us this and then ends.

5. If we are not making a duplicte entry, then we now check to find the first bit at which the string we are entering and the string at the cureent node differ. To do that, first we find the first character at which they differ, which is found by using the firstDiff function. By definition, a character is a set of eight bits, so if there are two different characters, they can’t have the same eight bits. Therefore, the first bit to differ in the two strings must be one of the eight bits of the first character to differ.

So first, we find the index of the first character to differ between the two strings.

Then, we iterate through the eight bits of that character. To do this, we first create a variable equal to eight times the index of the character. This variable is bitLoc.

Then, we do a for loop to find the first bit of this character to differ. Since that character is eight bits long, we iterate through the bits ranging from the index of the first bit of the character, bitLoc, to the last bit of the character, bitLoc + 7.

The first time we find a bit that differs, we set firstDiff equal to that index. After that, since firstDiff no longer equals the dummy value of -1, any other differences between the bits are not taken into account.

6. We then create a node diffNode. This node is not a leaf, so its isLeaf is set to false, its string is set to “”, we don’t even bother initializing its bit array, its index is set to firstDiff, and for now its pointers will both be set to NULL, but not for long.

7. If the bit of the word we are inserting at the index of firstDiff is a zero/false, that means that the new node is supposed to be to the left/zero of diffNode, so its values are then initialized. If that bit is a one/true, then we do the same thing to the right.

8. If root is a leaf, then that means that the string we compared to the string we are inserting is the only string in the tree. Because of this, whichever pointer of diffNode is not occupied points to this node; at diffNode’s index, if the bit array of the string being inserted has a zero then the bit array of this other string has to have a one and vice versa.

9. Otherwise, we need to have a pointer pointing to the node that is to be before diffNode.

We know that the index of any node that is before diffNode must be lower than diffNode’s index.

Therefore, we have before iterate through the tree following the same path that was followed by ptr’s recursions until the node it is about to reach has an index greater than diffNode’s index.

10. We keep track of this using a bool called placed. When placed is true, before is placed at the node that has a pointer pointing to where we want diffNode to be. Before iterates through the tree, and the pointer it follows from each node is chosen based on the string’s bit that is at the current node’s index; therefore, before goes toward ptr’s leaf in the same direction ptr went.

However, before we make this pointer point to another node, the node that before is about to point to is checked; if the index of the node before is about to point to is greater than the index of diffNode or the node that before is about to point to is ptr’s leaf, then that means that before is now at the position that is before where we want to insert diffNode, so placed is set to true and the loop ends. Otherwise, placed remains set to false, before now points to the node it was about to point to before being checked, and the loop continues.

11. If before is root, then that means one of two things. Either diffNode is supposed to be the new root or diffNode goes directly to the left/zero or right/one of the root. If before is root and diffNode’s index is less than root’s index, then that means that diffNode MUST be the new root node.

Because of this, if diffNode’s zero pointer is pointing to the node we inserted (the string’s bit at diffNode’s index is a zero), then diffNode’s one pointer now points to the root node and the rest of the tree is diffNode’s right/one subtree. If diffNode’s one pointer is pointing to the node we inserted (the string’s bit at diffNode’s index is a one), then diffNode’s zero pointer now points to the root node and the rest of the tree is now diffNode’s left/zero subtree. After this, we have root’s pointer point to diffNode to make diffNode the new root node.

12. If before does not point to root’s node or root’s index is less than diffNode’s index, then that means that before has to be the node that has one of its pointers pointing to diffNode.

If the bit at before’s index is a zero, then before’s zero node points to diffNode and the pointer in diffNode that isn’t pointing to the leaf node of the string we are inserting is now pointing to before’s zero/left subtree.

If the bit at before’s index is a one, then before’s one node points to diffNode and the pointer in diffNode that isn’t pointing to the leaf node of the string we are inserting is now pointing to before’s one/right subtree.

Confused? Have a picture or four!



 

 
Well, that was a lot of text and pictures.  Here’s  some more for the remove function!
//Invariant: Remove is not to be used on an empty tree.
//To make it so that remove works, pass head as ptr.
void pTree::remove(pNode* ptr, string word, bool wordBits[])
{
    if(root->isLeaf)//1
    {
        if(word.compare(root->data) == 0)
        {
            delete root;
            root = NULL;
        }
        else
        {
            cout<<"Error: This word is not in the tree!”;
        }
    }
    else
    {
        if(word.compare(ptr->zero->data)!=0&&word.compare(ptr->one->data)!=0)//2
        {
            if(wordBits[ptr->index] == false && !(ptr->zero->isLeaf))//3
                remove(ptr->zero, word, wordBits);
            else if(wordBits[ptr->index] == true && !(ptr->one->isLeaf))
                remove(ptr->one, word, wordBits);
            else
                cout<<"Error: This word is not in the tree!";
       
}
        else//4
        {
            if(ptr == root)//5
            {
                if(word.compare(ptr->zero->data)==0)
                {
                    delete ptr->zero;
                    root = ptr-> one;
                    delete ptr;
                }
                else
                {
                    delete ptr->one;
                    root = ptr->zero;
                    delete ptr;
                }
            }
            else
            {
                pNode* before = root;//6
                while(before->zero != ptr && before->one != ptr)//7
                {
                    if(wordBits[before->index] == false)
                        before = before -> zero;
                    else
                        before = before -> one;
                }
                if(before->zero == ptr)//8
                {
                    if(word.compare(ptr->zero->data) == 0)
                    {
                        before->zero = ptr->one;
                        delete ptr->zero;
                        delete ptr;
                    }
                    else
                    {
                        before->zero = ptr->zero;
                        delete ptr->one;
                        delete ptr;
                    }
                }
                else
                {
                    if(word.compare(ptr->zero->data) == 0)
                    {
                        before->one = ptr->one;
                        delete ptr->zero;
                        delete ptr;
                    }
                    else
                    {
                        before->one = ptr->zero;
                        delete ptr->one;
                        delete ptr;
                    }
                }
            }
        }
    }
}

1. If root’s node is a leaf, we check to see if root’s node’s data matches the string we want to remove. If it does, then we delete root’s node and the tree is now empty. Otherwise, the tree does not contain the string we want to remove and the program tells us this.



2/3. We now have a recursion section in which we recurse through the tree until we find the node that is THE PARENT OF the leaf containing the string we want to remove (NOT THE LEAF ITSELF).


4. When we reach that node, we are now in the deletion section.


5. If ptr is pointing to root’s node, then one of root’s node’s pointers is pointing to the leaf containing the string we want to remove. We check which pointer that is and delete its node. We then have root point to the root node’s other child while root’s node is deleted.


6. If ptr is not pointing to root’s node, then we need to use a before node to find the node that is the parent of ptr’s node.


7. Before now iterates through the tree guided by the bit array of thestring we want to remove until before’s node has ptr’s node as one of its children.


8. We then take whichever one of before’s pNode*s is pointing to ptr and make it point to whichever one of ptr’s children is not the node with the string we want to remove. We then delete the node containing the string we want to delete and then we delete ptr as well, completing the removal.





We’re now on the home stretch!  All that’s left is levelOrder and the duo of functions known as clear and ~pTree!  All of them should be familiar to you when you compare them to their counterparts in binary search trees.
//For levelOrder to work, pass head as ptr and pass true for isRoot.
void pTree::levelOrder(pNode* ptr, bool isRoot)
{
    if(isRoot)
        nodes.push(ptr);
    if(ptr -> zero != NULL)
        nodes.push(ptr->zero);
    if(ptr -> one != NULL)
    {
        nodes.push(ptr->one);
    }
    if(nodes.front()->isLeaf)//1
    {
        for(unsigned int i = 0; i < nodes.front()->data.length(); i++)
            cout<data.at(i);
        cout<<' ';
    }
    else
    {
        cout<index << ' ';
    }
    nodes.pop();
    if(!nodes.empty())
        levelOrder(nodes.front(), false);
    else
        cout<<” ”;//This is supposed to be an endl, but endl isn’t compatible with Blogger.
}

1. The only thing in levelOrder that really changed is that now, we are dealing with two types of data to print. This if statement and the else statement make up the printing section of the function. If the current node is a leaf, then we will print its data (by the way, in order to print that string we need to print it one character at a time since string variables are incompatible with cout). Otherwise, we will print its index.



Now, here’s the clear function and the destructor:

//For clear to work properly, pass head as ptr.

void pTree::clear(pNode *ptr)
{
    if(!isEmpty())
    {
        if(ptr -> zero != NULL)
            clear(ptr -> zero);
        if(ptr -> one != NULL)
            clear(ptr -> one);
        delete ptr;
    }
    root = NULL;
}
pTree::~pTree()
{
    clear(root);
}

As you can see, the clear function and destructor are pretty much the same as the clear and destructor functions in binary search trees, doing a postorder traversal and deletion of every node in the tree.  Also, please note that the traversal only happens if the tree is not empty so that if the tree is empty a segfault doesn't happen due to accessing a node that isn't there.




You now know how to implement a Noobtricia tree. While this is not a real Patricia tree, every function of the real thing works the same way as the corresponding function of a Noobtrica tree, with a few differences you will see when learning about Patricia trees such as:


-Instead of passing a string and a corresponding array of boolean variables, you are passing a C-string and accessing its bits with the bitwise & operator (not covered in Stranded In Halligan).


-Since you are passing a C-string, you are not limited by array size anymore.


-You won’t be totally clueless when you learn about the real thing since you will have already learned just about every concept of Patricia trees in Noobtricia trees.






CONGRATULATIONS! You now know all of the concepts of the hardest data structure taught in COMP 15!

Lesson 9. Tries

Lesson 9. Tries

NOTE: For whatever reason, on this post Blogger does not show it when I cout an endl, so if you see a single less than sign where there's supposed to be a stream operator, it's probably supposed to be printing an endl there.



So, you just learned about binary search trees, which can store data of enumerated types like ints and chars and cut the search time down to logarithmic runtime. The next question, though, is how do we make a tree like that for strings? Well, if we think of strings as being made of characters, then that would mean that we could possibly have a data structure that would let us reTRIEve a string from a tree that searches for the string one character at a time. That data structure is called a trie, and its nodes have this struct:


struct tNode

{
    vector links;
    string isString;
    char letter;
};


Each tNode has a char, letter, which stores the char that it carries.



Its string isString is an empty string if the char it stores is not the last char of a string THAT IS IN THE TREE (I’ll get back to this point later.)


Links is where it gets interesting. Since each tNode stores one letter of a string, we have a large possible number of pointers, one pointing to a node holding each char. Now, let’s say we’re only working with the 26 letters of the alphabet. That means we would have an array of 26 pointers to tNodes. That is highly space-inefficient and time-inefficient to have to make and initialize all these pointers that may or may not even point to a tNode.


Because of this, instead of having an array of tNode*s, we have a VECTOR of tNode*s, which grows and shrinks depending on the number of tNodes that each tNode has pointers to. (This tutorial will use the vector’s member functions, which are available at cplusplus.com, but they are for the most part self-explanatory due to how they are named).


Now that you know how a tNode works, I will show you the tNode class and its constructor:

class Trie

{
    public:
        tNode* root;
        Trie();
        ~Trie();
        bool isEmpty();
        void remove(string word, unsigned int index, tNode* ptr);
        bool contains(string word, unsigned int index, tNode* ptr);
        void insert(string word, unsigned int index, tNode* ptr);
        void levelOrder(tNode* ptr);
    private:
        void clear(tNode* ptr);
        queue nodes;
};
Trie::Trie()
{
    root = new tNode;
}

One thing you should know about tries is that in my implementation of tries, the root is a dummy node; its letter and isString values are not initialized since this dummy node is only used for its vector of pointers to the tNodes containing the first characters of the strings stored in the trie. However, since that vector is the only useful part of the root node, the rest of the node doesn’t have to be used in an implementation of a trie.



Before we continue, I am going to tell you how to read my drawings of tries.


1. Each tNode has two halves. The top half contains the node’s char (not in quotation marks) if its isString is empty (in other words, the node holds a char that ISN’T the last char of a string that is in the trie). The top half contains the node’s string in quotation marks if isString is not blank (in other words, the node holds a character that is also the last character of a string that IS in the trie). The bottom half represents the vector of pointers to tNodes, with the chars in the vector representing the chars of the tNodes that each pointer points to.


2. In the root node, since the root node’s letter and isString are uninitialized due to the fact that they are not used in the trie, the root node’s top half has a nullset in it.


3. If a tNode’s vector is just a box with a nullset in it, that means that the tNode has an empty vector and therefore does not have any pointers pointing to other tNodes.






Now that you know how to read this drawing, here’s a drawing of a trie holding ten words:

 
Notice here that each level of the trie holds a char at a different position in whatever strings it holds. Also, notice that there are no pointers to NULL; the vectors take care of the pointers for us by completely removing any pointers we don’t need, so we don’t need to have any pointers initialized to point to NULL.







Because the root node is always in the trie, in a trie the isEmpty function looks a little different from other isEmpty functions:

bool Trie::isEmpty()
{
    return root->links.empty();

}

Since a trie’s root node only contains a vector of POINTERS TO the tNodes that contain the first chars of every string in the trie, if the root node’s vector is empty, that means that the root node does not contain any pointers.  Therefore, if the root node’s vector is empty, then the trie does not hold any strings, so the trie is empty.  This is what an empty trie looks like:
Now, you will see the contains function, which checks to see if the trie contains the string we are searching for.
//For contains to work, in the first call of the function pass 0 as index and pass root as ptr.
bool Trie::contains(string word, unsigned int index, tNode* ptr)
{
    if(index < word.length())//1   
    {
        for(unsigned int i = 0; i < ptr->links.size(); i++)//2
        {
            if(ptr->links.at(i)->isString.compare(word) == 0)
                return true;//3
            if(ptr->links.at(i)->letter == word.at(index))
                return contains(word, index+1, ptr->links.at(i));//4
        }
    }
    return false;//5
}

1/5. If the index is less than the length of the word we are searching for, then we look through the current node’s vector for a pointer to a node holding the character of the string at index’s position (ex in the first recursion, when index is 0, we look for a pointer to a node holding the first letter of the string). If the index is at the length of the word, then we are on the node holding the last character of the string, but it is not holding the string we are searching for, so the function returns false.



2. At each node, we iterate through its vector looking for a pointer to a node holding the string’s character at index’s position. For the iterator we use an unsigned int so that is compatible with ints returned from member functions of vectors and strings.


3/4. When the iterator reaches a pointer that points to a node holding the string’s character at index’s position…


If index is less than length - 1, then that pointer brings us to a new recursion of the function with index incremented by one; in that next recursion we will be looking for a pointer to a node with the next character.


If index is length - 1, then the node we found contains the last character of the string as its letter. Because of this, whether or not the trie contains the string we are looking for depends on this node’s isString; if isString contains the string we are looking for, the function returns true. Otherwise, another recursion starts on this node, but index is equal to length so the function returns false.


5. This line is not an else because false is supposed to be returned either if index is at length (we are on a recursion at the node containing the last character of the string but its isString is blank) or another return statement did not start the next recursion of the function (after iterating through the node’s vector, the function could not find a node containing the next char).


Now, here is a picture of the contains function. Please note that that in each drawing, for each recursion the node that ptr points to is the top node of the recursion ring. The node the arrow points to is the node that ptr will point to in the next recursion.

Now, I will show you the insert function for a trie:
//For insert to work properly, in the first call of insert pass 0 as index and root as ptr.

void Trie::insert(string word, unsigned int index, tNode* ptr)
{
    unsigned int loc = -1;//1
    for(unsigned int i = 0; i < ptr->links.size(); i++)
    {
        if(ptr->links.at(i)->letter==word.at(index))
            loc = i;
    }
    if(loc == -1)//2
    {
        loc = ptr->links.size();
        tNode* add = new tNode;
        add->letter = word.at(index);
        if(index == word.length()-1)//3
            add->isString = word;
        else
            add->isString = "";
        ptr->links.push_back(add);
        if(index < word.length() - 1)//4
            insert(word, index+1, ptr->links.back());
    }
    else
    {
        if(index < word.length() - 1)//5
            insert(word, index+1, ptr->links.at(loc));
        else//6
        {
            if(word.compare(ptr->links.at(loc)->isString) == 0)
                cout<< "Error: This word is already in the trie"<
            else
                ptr->links.at(loc)->isString = word;
        }
    }
}

1. For whatever node we are currently on, we iterate through its vector. If we find a pointer to a node containing the character at index’s position, then the pointer’s position in the vector is recorded in loc over its dummy value of -1.



2/3. If loc remains at -1 after the iteration (either the vector does not contain a pointer to a node with the character at index’s position or the vector is empty), then we create a pointer to a node containing the character at index’s position. If this character is the last character of the string, then we set its isString to the string we are inserting; otherwise we set this value to an empty “”. We then push this pointer into the vector of the node we are currently on.


4. If the pointer that we just created does not point to the node holding the last character of the string, then that means there are more characters to insert, so we start a new recursion of the function by incrementing index by 1 and passing the pointer we just inserted as ptr.


5. If loc is not -1, then that means that in our iterations through the current node’s vector we found a pointer to a node containing the char at index’s position. If the current index is not length-1, then we will start a new recursion at the node it that the pointer points to with index incremented by 1.


6. If we are at position length - 1, then if the node that the pointer at position loc of the vector points to currently holds the string that we want to insert, then that means that the trie already contains the string we want to insert. Otherwise, we set the isString of this node equal to the string we are trying to insert.

Now I will show you the remove function.
//Invariant: Remove is not to be used on an empty trie.

//For remove to work, in the first call of the function pass 0 as index and pass root as ptr.
void Trie::remove(string word, unsigned int index, tNode* ptr)
{
    int hasChar = -1;//1
    for(unsigned int i = 0; i < ptr->links.size(); i++)
    {
        if(ptr->links.at(i)->letter == word.at(index))//2
        {
            if(word.compare(ptr->links.at(i)->isString) == 0)
            {
                hasChar = i;
                ptr->links.at(i)->isString = "";//3
            }
            else
            {
                if(index < word.length() - 1)//4
                {
                    hasChar = i;
                    remove(word, index + 1, ptr->links.at(i));
                }
                else//5
                    cout<<"Error: This word is not in the trie!"<
            }
        }
    }
    if(hasChar != -1)//6
    {
        if(ptr->links.at(hasChar)->isString.compare("") == 0 && ptr->links.at(hasChar)->links.empty())
        {
            delete ptr->links.at(hasChar);
            ptr->links.erase(ptr->links.begin()+hasChar);
        }
    }
    else//7
        cout<<”Error: This word is not in the trie!”<
}

1/2. Like with loc in insert, in remove we have an integer variable hasChar. In each recursion, we iterate through the current node’s vector to try and find a pointer to a node holding the character at index’s position; if we do, then hasChar is set to that pointer’s position in the vector. Like loc, hasChar is initialized to a dummy value of -1.




3. If that pointer’s isString is the same string as the word we want to remove, then we set that pointer’s isString to be an empty string.


4. Otherwise, if index is still less than the length of the string we want to remove minus one, then we start another recursion of the function using the pointer we found as ptr and incrementing index by one.


5. However, if index is equal to the length of the string we want to remove minus one, then we have already searched every character of the string and found that the trie does not contain the string we want to remove.


6. After this, in each recursion of the function we still have our pointers that we used for starting the next recursion in place.


To delete unused space, if the vector of a node being pointed to (by a pointer in that recursion’s node’s vector) is empty and that node’s isString is blank, then the node is not in use since only nodes with strings can have empty vectors, so we delete that node, and in the vector that contains the pointer to that node, the pointer is erased.


Since each recursion is halted at the call of the next recursion until that recursion ends, this recursive deletion of nodes happens with the last node deleted first in reverse order, causing a deletion chain reaction until we reach a node that is in use either due to having a non-empty vector or due to being the last character of a string.


7. If hasChar is -1, then the trie does not contain the letter of the string at index’s position, so the trie does not contain the string we want to remove.

Now, we will learn how to do a level-order traversal of a trie.



A trie’s level-order traversal is very similar to a binary tree’s level-order traversal. The only real difference is that now we need to account for the fact that different nodes have different numbers of children. However, since we can iterate through the vector of pointers to each node’s children, that isn’t too hard.

//For levelOrder to work, pass root as ptr.

void Trie::levelOrder(tNode* ptr)
{

    for(unsigned int i = 0; i < ptr->links.size(); i++)//1
        nodes.push(ptr->links.at(i));
    if(ptr != root)//2
    {
        if(ptr->isString.compare("")==0)//3
            cout<letter<<"# ";
        else//4
        {
            for(unsigned int i = 0; i < ptr->isString.length(); i++)
                cout<isString.at(i);
            cout<<"! ";
        }
        nodes.pop();
    }
    if(!nodes.empty())//5
        levelOrder(nodes.front());
    else
        cout<
}

1. In each node, we iterate through its vector to find every child of that node, and we push the pointer to each child into the queue.




2. The code in this if statement is for printing the data of each node. If the node we are currently on is the root, we do not execute this section of the function because otherwise the first node is popped before its data gets printed.


3. If the node that ptr is pointing to has a blank string at isString, then we print the pointer’s letter and a # sign.


4. If the node that ptr is pointing to is holding a string, then we print the pointer’s word followed by an exclamation point (note that because strings are incompatible with cout, the string has to be printed one character at a time.)


5. Pretty much the same as the end of the BST level-order traversal function; if the queue is not empty, then start a recursion on the front element of the queue, otherwise print an endl.


The result is that it prints in order of which position each char is (look at the results below, then follow along with the complete trie at the top of the tutorial), printing the nodes of the first chars first, then the nodes of the second chars, and so on.


For example, this is the result of a level-order traversal of the trie at the beginning of this tutorial (the printing was separated so that you can see the chars being printed as the first chars and one-char strings at the first level, the second chars and two-char strings at the second level, and so on).


(First characters)T# A! (Second characters)H# A# AT! L# (Third characters)THE! A# I# L# ALL! (Fourth characters)THEM! THEY! THAT! THIS! TALL! ALLY!


Finally, I will show you the clear function for a trie, as well as the class destructor.

void Trie::clear(tNode* ptr)

{
    for(unsigned int i = 0; i < ptr->links.size(); i++)
    {
        clear(ptr->links.at(i));
    }
    delete ptr;
}
Trie::~Trie()
{
    clear(root);
}

Essentially what happens is that we start at the root of the trie and iterate through its vector, call recursions of the function on each of the root’s children, and then do the same with its children on each recursion. In each recursion, after clear is called on each of a node’s children, that node is deleted. This therefore deletes every node of the trie, including the root node, which is why it is that clear is only to be called in the trie class destructor. The destructor-clear duo is essentially the same for tries as it is for BSTs, and as you will find in the next tutorial, it also works the same way in Patricia trees.




That’s it for tries. In the next tutorial, you will be learning about Patricia trees, the infamous data structure that is actually pretty awesome when you know how to implement them. How is it awesome? Well, you really get down and dirty with every string by looking not just at the character level but the BINARY level!