Wednesday, September 8, 2010

Lesson 7. Introduction to trees

Lesson 7. Introduction to trees


Now that you’re on this tutorial, you can undeniably call yourself a programmer. You are about to learn the data structure that all the big shots use, and that can be made into many different variations. This data structure is called the tree. Trees are extremely versatile; between all of their variations there are versions for storing just about any data type. They can be hard to learn, but once you get good at them, you have a data structure that can turbo charge your computer program, doing things like reducing the runtime of certain functions from O(n), which is pretty good, to an awesome O(lgn) (log base 2 of n).

Before we start, I will give you some tree vocabulary:

Item/node- Regardless of whether or not the tree is linked, we often refer to items of data in the tree as nodes.

Child- Each node can be connected to a certain number of other nodes; the nodes that one node is connected to is called the node’s children

Parent- The node that a child is connected to is its parent

Binary tree- A tree in that is made so that each node has up to two children.

Root- The topmost node of the tree

Leaf- A node in the tree with no children

As a data structure, trees have this as their general shape (the tree below is a binary tree):

Like many other data structures we have seen, a tree can be implemented either as an array or as a linked data structure. Here is what an array-implemented binary tree looks like.
 

 
If you make a tree class with the binary array tree, then when you are inserting a node into the tree, you will probably want to use a dynamic array for the tree and then have a doubleSize function for when the array is completely full. This is because of the fact that each node can have up to two children, so to make sure there is room for every node to have two children, the array would need to double in size so that the last parent node can have array[2x] and array[2x+1] as the spaces for its children. If we doubleSized this array, then array[4] would have room for its children at array[8] and array[9], array[5] would have room for its children at array[10] and array[11], array[6] would have room for its children at array[12] and array[13], and array[7] would have room for its children at array[14] and array[15].


Like I said earlier, there is also a linked version of trees. Linked trees have a new type of node called a treeNode that looks like this:


The treeNode’s struct would look like this:

//Note: since this is a treeNode for a binary tree, the structure is often named accordingly; in the following functions, the treeNode will be referred to as a bNode, and is the node of a binary search tree.

struct treeNode
{
    int data;
    treeNode* left;
    treeNode* right;
};

This is the treeNode for a binary tree. However, there are many variations to the treeNode. You could give the tree three children by also adding treeNode* middle in the struct. You could give the tree four children by adding yet another treeNode* to the struct. You could give the tree an even larger number of children by adding a dynamic array of treeNode*s! You could give it a treeNode* that links to its parent, you could give the treeNode a boolean variable called isLeaf to keep track of whether or not the node is a leaf, and there are many more possibilities. However, in this lesson we will base the functions on binary trees; all functions with code given are member functions of the binary search tree class that aren’t that different in other types of trees.


A linked version of the array tree we saw earlier looks like this:

 
The advantage an array tree has over a linked tree is that the array property makes it so that we can have indexing. The advantage a linked tree has over an array tree is that in linked trees, you don’t waste as much space with functions like doubleSize, so they are space-efficient. Both will be used in our programs, but we will use the linked tree more often; the functions we show here will be based on a linked tree.


Another thing you should notice about trees is that they are recursive data structures. A tree, no matter, what size, is still a tree. Each node’s child is actually the root of a subtree.

 
Now that you can see the recursive nature of the tree, I will show you two member functions of a binary search tree that take advantage of that in traversals of the tree.


We will start with a traversal function:

//Note: To traverse the whole tree with an inorder traversal, in the original recursion of the function, pass root as ptr.

void BST::inOrder(bNode* ptr)
{
    if(ptr -> left != NULL)//1
        inOrder(ptr -> left);
    cout<< ptr -> data << endl;//2
    if(ptr -> right != NULL)//3
    inOrder(ptr -> right);
}

Wow! With that code, we just traversed a whole tree with just five lines of code!


1. In each recursion, the function checks if the node being pointed to has a left pointer pointing to a node. If it is pointing to a node, then another recursion of the function is called with the current node’s left pointer as ptr.

2. The function prints the current node’s data.

3. After this, the function checks if the current node’s right pointer points to a node. If it is, then another recursion of the function is called with the current node’s right pointer as ptr.



The output would look something like this:

10 20 40 50 30 60 70

The way it works is as follows:

1. In the first recursion, the root’s data (10) is printed, and then since the root has a left child, the root’s left child is passed to a second recursion of the traverse function.

2. In the second recursion, the root’s left child is used as the root and its data (20) is printed, and then since the root’s left child has a left child, the root’s left left grandchild child is passed to a third recursion of the traverse function.

3. In the third recursion, the root’s left left grandchild is used as the root and its data (40) is printed, but this node has no children so the third recursion ends.

4. Back in the second recursion, the root’s left child has a right child, so the root’s left right grandchild is passed to a fourth recursion of the traverse function.

5. In the fourth recursion, the root’s left right grandchild is used as the root and its data (50) is printed, but this node has no children so the fourth recursion ends. Since the fourth recursion was the last function call of the second recursion, the second recursion ends too.

6. Back in the first recursion, the root has a right child, so the right child is passed to the traverse function as the root, and a fifth recursion begins.

7. In the fifth recursion, the root’s right child is used as the root and its data (30) is printed, and then since the root’s right child has a left child, the root’s right left grand child is passed to a sixth recursion of the traverse function.

8. In the sixth recursion, the root’s right left grandchild is used as the root and its data (60) is printed, but since this node has no children, the sixth recursion ends.

9. Back in the fifth recursion, the root’s right child has a right child, so the root’s right right grandchild is passed as the root for a seventh recursion of the traverse function.

10. In the seventh recursion, the root’s right right grandchild’s data (70) is printed, but since this node has no children, the seventh recursion ends. Since the seventh recursion was the last function call of the fifth recursion, the fifth recursion ends. Since the fifth recursion was the last function call of the first recursion, the first recursion ends, the recursion stack is empty, and the traversal is complete.

This is what the recursion stack looks like, with the nodes at the top being at the right:

1. Root

2. Root, left child

3. Root, left child, left left grandchild

4. Root, left child

5. Root, left child, left right grandchild

6. Root, left child

7. Root

8. Root, right child

9. Root, right child, right left grandchild

10. Root, right child

11. Root, right child, right right grand child

12. Root, right child

13. Root

14. Empty recursion stack

This is just one way to traverse a tree, called a preorder traversal because in each recursion the traversal prints the data of the root and then prints the data of the root’s children.

By re-ordering the function, you can get an inorder traversal, in which the data are printed “left to right” (left child, root, right child), or a postorder traversal, in which the children’s data are printed first, and then the function prints the root’s data.

There is also another type of traversal called level-order, in which every node of one level is printed before the nodes of the next level are printed. This means that the root’s data will be printed, followed by the data of all of the root’s children, followed by the data of all of the root’s grandchildren. This function uses a queue, which was included from the STL, and that queue, “nodes”, is a member data structure in the implement of binary search trees that you will see in the next tutorial.

//To traverse the whole tree correctly, in the first recursion of the function, pass root as ptr and true for isRoot.

void BST::levelOrder(bNode* ptr, bool isRoot)//1
{
    if(isRoot)//1
        nodes.push(ptr);
    if(ptr -> left != NULL)//2
        nodes.push(ptr->left);
    if(ptr -> right != NULL)
        nodes.push(ptr->right);
    cout<< nodes.front()->data << ' ';//3
    nodes.pop();
    if(!nodes.empty())
        levelOrder(nodes.front(), false);//4
    else
        cout<
}

1. isRoot is passed as true so that the root node’s pointer is added to the queue.


2. If the tree has left and right children, they are enqueued.

3. The front node of the queue (the root of the queue during the first recursion) is accessed and its data gets printed. This node is then popped from the queue.

4. Another recursion starts on whatever node is currently in the front of the queue with isRoot passed as false so the front node is not re-added to the queue in the next recursion. If the queue is empty, though, the function ends.

The queue and what has been printed is as follows:


Current node: Node 1

Queue: Node 1

Printed:

The function starts, and the first node is enqueued.

Current node: Node 1

Queue: Node 2, Node 3

Printed: 10

Node 1 has two children, Nodes 2 and 3, which are enqueued, and Node 1 is dequeued and printed. The next recursion uses node 2 as the root.



Current node: Node 2

Queue: Node 3, Node 4, Node 5

Printed: 10 20

Node 2 has two children, Nodes 4 and 5, which are enqueued, and Node 2 is dequeued and printed. The next recursion uses node 3 as the root.



Current node: Node 3

Queue: Node 4, Node 5, Node 6, Node 7

Printed: 10 20 30

Node 3 has two children, Nodes 6 and 7, which are enqueued, and Node 3 is dequeued and printed. The next recursion uses node 4 as the root.



Current node: Node 4

Queue: Node 5, Node 6, Node 7

Printed: 10 20 30 40

Node 4 has no children, so nothing is enqueued. Node 4 is dequeued and printed. The next recursion uses node 5 as the root.



Current node: Node 5

Queue: Node 6, Node 7

Printed: 10 20 30 40 50

Node 5 has no children, so nothing is enqueued. Node 5 is dequeued and printed. The next recursion uses node 6 as the root.



Current node: Node 6

Queue: Node 7

Printed: 10 20 30 40 50 60

Node 6 has no children, so nothing is enqueued. Node 6 is dequeued and printed. The next recursion uses node 7 as the root.



Current node: Node 7

Queue: Empty

Printed: 10 20 30 40 50 60 70

Node 7 has no children, so nothing is enqueued. Node 7 is dequeued and printed. The queue is empty, so the function ends.



Once again, trees are shown to be a recursive data structure in this function.

In addition to traversal, there are other functions you will probably want to do with trees. Like with other data structures, you will want to add nodes to the tree and remove nodes from the tree. You also may want to search the tree for a certain item. However, the way that you do that is all dependent on what kind of tree you are making. In my tutorial series, you will be learning about several kinds of trees, and I will also give you a preview of the infamous Patricia trees as well. In the next tutorial, you will be learning about binary search trees, an interesting twist on binary trees that makes it so you can use the trees for a binary search that can reduce the runtime of your search to a fast O(lgn) runtime.

No comments :

Post a Comment