Wednesday, September 8, 2010

Lesson 8. Binary Search Trees

Lesson 8. Binary search trees


In case you forgot, one way of searching through a group of items is through linear search. This meant that we would put every item into a list, either an array or a linked list, and then search through the items one by one to find the item we are looking for. Overall, even if the list was sorted, the worst-case runtime of a linear search would be O(n), with that worst case being getting to the end of the list and finding that the item we are looking for is not there. With binary search trees, we have a way of searching for an item of any enumerated data type (in this tutorial I will be demonstrating with ints) and finding the item with a possible runtime of O(lgn). How does this work? I’ll show you!

A binary search tree is a binary tree, so its nodes have this implementation:

struct bNode
{
    int data;
    bNode* left;
    bNode* right;
}

Binary trees also have a pointer to a bNode called root, and the overall the BST and its constructor look like this:

class BST

{
    private:
        queue nodes;//Used for level-order traversals.
    public:
        bNode* root;
        BST();
        ~BST();
        bool isEmpty();
        bool binSearch(int num, bNode* ptr);
        void insert(int num, bNode* ptr);
        void remove(int num, bNode* ptr);
        void inOrder(bNode* ptr);//inOrder and levelOrder can be found on the last lesson.
        void levelOrder(bNode* ptr, bool isRoot);
        void clear(bNode* ptr);
};

BST::BST()
{
    root = NULL;
}

The isEmpty function looks like this:


bool BST::isEmpty()
{
    return (root == NULL);
}

So far, no surprises. Now to see the main attraction, which is how these trees work.


So, how do binary search trees work? Well, each node contains its data, as well as pointers to two other nodes.

This means that when we insert an int into the tree at the root, if the int is less than the root’s data, then we check the root’s left subtree to see if there is anywhere to insert the int. If the int is greater than the root’s data, then we check the root’s right subtree to see if there is anywhere to insert the int. To demonstrate, first I will show you a sample binary search tree with the nodes inserted. Then, I will show you the insert function. As you will soon see, we will be relying a lot on the tree’s recursive nature.

 
You heard right, more recursion. When you’re dealing with trees, often times it helps to intuitively think of a way to recurse through the subtrees. It highlights the recursive property, and it also looks a whole lot cooler and more elegant. To show you an example of the recursion, I will start by showing you how to search for an item in the tree with the search function.
 
//To make binSearch check the whole tree, pass root as search in the first recursion of the function.

bool BST::binSearch(int num, bNode* search)
{
    if(num == search -> data)//1
        return true;
    else
    {
        if(num < search -> data)//2
        {
            if(search -> left != NULL)//3
            {
                return binSearch(num, search-> left);//4
            }
            else
            {
                return false;//5
            }
        }
        else//6
        {
            if(search -> right != NULL)
            {
                return binSearch(num, search -> right);
            }
            else
            {
                return false;
            }
        }
    }
}

1. If the data of the node we are currently on matches the number we are looking for, that means that the tree has the number we are looking for, and the function returns true.


2. If the number we are searching for is less than the data of the node we are currently on, that means that if the number we are looking for is in the tree, then it can only be in the current node’s left subtree.

3. The function checks if the current node’s left pointer points to NULL. If it is not, that means that the left bNode* is pointing to another node, and the current node does have a left subtree.

4. Since there is a left subtree we can check, we return the result of checking the left subtree by calling another recursion of binSearch using the current node’s left bNode* as the root of the current node’s left subtree, and we return its results.

5. If the current node’s left bNode* is pointing to NULL, that means that the current node does not have a left subtree. Therefore, there is nowhere in the tree that could possibly be storing the number we are looking for, so binSearch returns false.

6. If the number we are looking for is greater than the data of the node we are currently on, that means that if the number we are looking for is in the tree, then it can only be in the current node’s right subtree, so we check the right subtree the same way we would check the left subtree.

 
As you can see, each recursion has a runtime of O(1). Each recursion deals with a data set half the size of the last recursion’s data set, and as a result, the search takes a runtime of O(lgn) since to search a tree of seven nodes, it only took three recursions, while to search a linked list of seven nodes, it would take an O(n) runtime of seven recursions.




This binary search tree’s recursive nature also applies to inserting more nodes into the tree.

//For insert to work, in the first recursion of insert, pass root as ptr.

void BST::insert(int num, bNode* ptr)
{
    if(ptr == root && ptr == NULL)//1
    {
        root = new bNode;
        root -> left = NULL;
        root -> right = NULL;
        root -> data = num;
    }
    else if(num == ptr -> data)//2
    {
        cout<< "Error: This number has already been inserted into the tree." << endl;
    }
    else if(num < ptr -> data)//3
    {
        if(ptr -> left == NULL)//4
        {
            ptr -> left = new bNode;
            ptr -> left -> left = NULL;
            ptr -> left -> right = NULL;
            ptr -> left -> data = num;
        }
        else//5
            insert(num, ptr -> left);
    }
    else//3, this is if num is greater than ptr -> data.
    {
        if(ptr -> right == NULL)//4
        {
            ptr -> right = new bNode;
            ptr -> right -> left = NULL;
            ptr -> right -> right = NULL;
            ptr -> right -> data = num;
        }
        else//5
            insert(num, ptr -> right);
    }
}

1. If the tree is empty, we create a new bNode at the root.


2. If you are inserting a number that was already in the tree, an error message is printed.

3. Here’s where the recursion gets familiar. With each recursion, until we get to an empty pointer that we can insert at without violating the binary search tree property, we call recursions of the current node’s left and right subtrees. This is just like calling recursions using binSearch!

4. If the pointer we are checking points to NULL, we create a new node for the item we are inserting that is linked to the current node by the pointer.

5. If the pointer we are checking doesn’t point to NULL, we call another recursion of the function by passing that pointer as search.

Here’s a picture of an insertion into a binary search tree.

 
As you can see, once more the recursions of code with a runtime of O(1) in a data structure which halves its size with each recursion give us a runtime of O(lgn).




Now, we will look at a function that’s a little harder to write, but still proves to look really fly with its recursion: the remove function.

//For remove to work on the whole tree, in the first recursion of the function, pass root as ptr.

//Invariant: This function will not be used on empty trees.
void BST::remove(int num, bNode* ptr)
{
    if(num < ptr -> data)//1
    {
        if(ptr -> left != NULL)//2
        {
            remove(num, ptr -> left);
        }
        else//3
        {

            cout<< "Error: This number isn't in the tree." << endl;
        }

    }
    else if(num > ptr -> data)//1
    {
        if(ptr -> right != NULL)//2
        {
            remove(num, ptr -> right);
        }
        else//3
        {
            cout<< "Error: This number isn't in the tree." << endl;
        }
    }
    else//4
    {
        bNode* turnHere = NULL;
        if(ptr->left != NULL && ptr -> right != NULL)//5
        {
            bNode* swap = ptr -> right;//6
            while(swap -> left != NULL && swap -> right != NULL)//7
            {
                if(turnHere == NULL)
                    turnHere = swap;
                swap = swap -> left;
            }
            int temp = ptr -> data;//8
            ptr -> data = swap -> data;
            swap -> data = temp;
            ptr = swap;
        }
        bNode* prev = root;//9
        if(ptr != root && turnHere == NULL)//10
        {
            while(prev -> left != ptr && prev -> right != ptr)
            {
                if(num < prev -> data)
                    prev = prev -> left;
                else
                {
                    prev = prev -> right;
                }
            }
        }
        else
        {
            if(ptr != root)//11
            {
                prev = turnHere;
                while(prev -> left != ptr)
                    prev = prev -> left;
            }
        }
        if(ptr -> left != NULL)//12

        {
            if(prev -> left == ptr)
                prev -> left = ptr -> left;
            else
                prev -> right = ptr -> left;
            delete ptr;
        }
        else if(ptr -> right != NULL)
        {
            if(prev -> left == ptr)
                prev -> left = ptr -> right;
            else
                prev -> right = ptr -> right;
            delete ptr;
        }
        else//13
        {
            if(prev -> left == ptr)
                prev -> left = NULL;
            else
                prev -> right = NULL;
            delete ptr;
        }
    }
}

This Jumbo-sized drawing will show you the remove function!

 
Finally, there is one more function pair you should know about: clear and the destructor.
 
//For clear to work on the whole tree, in the first recursion call clear with root passed as ptr.
void BST::clear(bNode *ptr)

{
    if(ptr -> left != NULL)
        clear(ptr -> left);
    if(ptr -> right != NULL)
        clear(ptr -> right);
    delete ptr;
    root = NULL;
}

BST::~BST()
{
    clear(root);
}

Clear deletes the tree one node at a time starting at each leaf and then going back up to the root. Does this look familiar yet? Look back to the traversal member function I told you about in the last lesson and you will remember…
 
//Note: To traverse the whole tree with an inorder traversal, in the original recursion of the function, pass root as ptr.

void BST::inOrder(bNode* ptr)
{
    if(ptr -> left != NULL)//1
        inOrder(ptr -> left);
    cout<< ptr -> data << endl;//2
    if(ptr -> right != NULL)//3
        inOrder(ptr -> right);
}

The inorder traversal, in which we call a recursion of the traversal on the current node’s left subtree, then print the data of the current node, and then call a recursion of the traversal on the current node’s right subtree!


You also should remember that by moving that cout statement so that it’s after both recursion calls, you can make a postorder traversal in which a node’s data will always get printed after its children’s data.

Now look back at the clear function and you will see that this is…

a postorder deletion! And since the current leaves are always the nodes to get deleted in each call of the recursion, that means that there will be no memory leaks from deleting a pointer too quickly.

After the clear function is finished, root points to NULL, making the tree empty once more. As you can see, the class destructor just calls clear.

That’s it for binary search trees. In the next couple of lessons you will be learning about tries, a type of tree used to store strings, and you will be getting a preview of another string-storage tree, the infamous Patricia trees.

No comments :

Post a Comment