Thursday, September 2, 2010

Lesson 1. Singly-linked lists, and a little bit on recursion too.

Basic nodes and singly-linked lists


If you remember from the end of COMP 11, there was something you learned about called a linked list, as well as a struct that linked lists are made of called a Node. These are the building blocks of many of the data structures you will learn about, from linked lists to trees to tries to the dreaded Patricia trees, so it is important that you know this structure.



If you recall, the basic data structure of a node includes a variable of any data type, known as the data, and a pointer to a node (Node*) called something like next.


The code for making this structure with the data being an int is:

struct Node

{

    int data;

    Node* next;
};



This is what the node looks like conceptually.



With one Node* for each node, you can connect the nodes with their Node*s to make a linked list. When there is only one Node* per node and a series of nodes are connected with each Node*, they make a singly-linked list.

A singly-linked list often starts with a Node* (not a whole node, just the pointer to a node) called the head of the list. If the list is empty, the head points to NULL, which means it points to C++’s equivalent to nothing and the head is the end of the list. If the list has nodes in it, the head will point to the first node of the list, the first node will point to the second node of the list (if there is one), and this continues to the last node of the list, which points to NULL. Conceptually, that looks like this. (The blue boxes are just to remind you that this data structure generally will use dynamic memory for everything in the blue boxes).



So, you probably have some questions, like how to add a node to the list, how to remove a node, and how to traverse a singly-linked list. Well, with all data structures we need to think of what functions each data structure will need. In this tutorial, I will give you the functions, but you WILL have to understand them to be able to use them, so don’t just copy-paste them. Pay attention!



Here are the functions you will need for singly-linked lists.

-You will need a function to tell you whether or not the list is empty. This function is often used in combination with a traverse function to make the traverse function easier to read and write.

-You will need a function to traverse the list and do whatever (print, alter data, etc) to each node.

-You will need a function to add data to the list in the form of nodes.

-You will need a function to remove data from the list. Not only that, but while you’re at it you should be able to return the data.

The linked list class and constructor would look like this:

class LinkedList

{

    public:

        LinkedList();
        ~LinkedList();
        bool isEmpty();

        void traverse();

        void insert(int x);
        void insertBack(int x);//Optional
        int remove();

        void insertIth(int x, int pos);

        void removeIth(int pos);

        Node* head;

};

LinkedList::LinkedList()

{

    head = NULL;//Initialize head

}

We will start with the easiest function, the isEmpty function.

bool LinkedList::isEmpty()

{

    return (head == NULL); //1

}

That was easy! Just one line to write this function.

1. head == NULL checks to see if head is pointing to NULL. If it is, then head == NULL evaluates to true, and the function returns true, indicating that the list is empty. If head is not pointing to NULL (the list has nodes in it), then head == NULL evaluates to false, and the function returns false, indicating that the list is not empty.

Now on to the traverse function. This traverse function will print each node’s data.

void LinkedList::traverse()

{

    Node* search = head; //1

    while(search != NULL) //2

    {

        cout<< search -> data << endl; //3

        search = search -> next; //4
    }
}


There you have it, how to traverse a linked list. Here’s how that works.

1. A new Node* called search is created, and it is currently pointing to whatever the head is pointing to.

2. When search points to NULL, you are at the end of the list, so when search != NULL evaluates to true, you are not at the end of the list and you continue traversing the list.

3. This prints out the data of the node search is pointing to.

4. Before checking the loop’s condition for continuing the iteration, search is re-assigned to point to search -> next.



All right! The function works, but while we’re at it, let’s take a quick detour so you can check out what a recursive version of this is like.

void LinkedList::traverse(Node* start)//1

{

    if(start != NULL)//2

    {

        cout<< start -> data << endl;//3

        traverse(start -> next);//4

    }

}

You show this sleek new function to your friends at a cocktail party (if you’re over 21 that is) and you’ll impress them all (or get called a nerd, but it’s awesome being a nerd so take it as a compliment)! You just made it do the exact same thing as the last function but with one less line of code! Anyway, here’s how it works!

1. A pointer to a node is passed (When starting a traversal, that pointer is head).

2. That pointer is then tested as the “start” of a sublist of the list. If this sublist is not empty (start points to a node) then…

3. The data of whatever node start is pointing to (the next node) is printed.

4. Here’s when the function gets real cool. You know how each Node has a component called next, which is a Node*? Well, notice that what is passed to the function as start is not necessarily THE starting node, but it is just a pointer to a node. That means that ANY Node* can be passed to the function. And in this case, what is being passed to the traverse function as start is the Node* next of whatever node the current start is pointing to.



So, you might be wondering, “why all that work of wrapping my head around the concept of recursion just to save one line?” Well, the reason why I showed you recursion is because recursion makes your code compact, and in many cases (especially trees) easier to write, making your code more elegant and professional. Recursion will be really useful with more complex data structures, so you should know recursion as soon as you can.

A big question you may have is when you’re supposed to use recursion. Here’s a simple answer.

You use recursion whenever you are making a function that works with smaller and smaller versions of the original data structure.

You saw in the example above those ripples. Those weren’t just there to make the example look cool (well actually they were). They were to show that with each recursion, you are still working with a linked list, only with each recursion you work with one that is one node smaller than the last one, until finally you work with an empty linked list, in which case the recursion is over.

In my drawings, recursion will normally be signified with ripples.



Hoewever, you need to be careful not to make infinite recursion stacks or THIS could happen to you!



Infinite recursion example (you don’t need to read it):

void yoDawg(bool x)

{

    if(x == true)

        cout<< “Yo dawg I heard you like functions so I put a 
        function in your function so you can recurse ”;
    else

        cout<< “while you recurse ”;

    yoDawg(false);

}



That’s it for now for recursion. You won’t see much more recursion for now because the data structures you learn first are simple enough that they don’t make the program THAT much better. However, keep it in the back of your mind because recursion does become extremely useful later on (especially with trees)!



So, now onto our now function, the insert function. For this function, we will pass data to the function (in the code examples we will use an int), and the function will put the data into a node that is then added to the linked list.

Now, from inserting data into arrays for all of COMP 11, you may think about inserting the data at the end of the list. However, adding at the end of the list means getting a pointer all the way to the end, and that means traversing the whole list with a runtime of O(n). We could do that, or we could insert at the beginning of the list, which has a fast O(1) runtime since we only have to access the head of the list.

void LinkedList::insert(int x) //1

{

    Node* first = new Node; //2

    first -> data = x; //3

    first -> next = head; //4

    head = first; //5

}

Here’s how it works!

1. An integer is passed to x.

2. A new Node* called first is declared and points to a new node in the dynamic memory. First will be the Node* for the node we are inserting.

3. The node that first is pointing to now stores x in its data.

4. The next of the node that first points to now points to the same node that head points to (the old first node), unless the list is empty, in which case the Node* points to NULL.

5. Head now points to the node that first points to, making it so that the node that first points to is the first node of the linked list.


The function for inserting at the back is seen below and has a runtime of O(n). However, if you’re inserting at the back a lot in your linked list, you might want to make the list doubly-linked (this will be covered in a couple lessons) so you can have both insertions happen with a runtime of O(1).  There is no picture for this because for this function it's basically just a combination of traversing and inserting at the end.

void LinkedList::insertBack(int x)
{
    if(isEmpty())//1
    {
        head = new Node;
        head -> data = x;
        head -> next = NULL;
    }
    else
    {
        Node* last = head;
        while(last -> next != NULL)//2
        last = last -> next;
        last -> next = new Node;
        last -> next -> next = NULL;
        last -> next -> data = x;
    }
}

1. If the list is empty, then the node is added at the head of the list, so the regular insert function is called.


2. If the list is not empty, then the list traverses to the last node and then adds a node at the back of the list.


Now it’s time to learn the remove function, but before we do that, I would like to explain invariants. Invariants are essentially the rules of using certain parts of the code like functions that the user or another programmer must follow. The remove function will be an example that we will use.


This is an example of the remove function’s code, which is designed to remove a node from the front of the linked list and return its data, which will be an int. Since we are returning the int that is the data of the first node of the list, is the list is empty there will be no int we can return, so the invariant of the remove function is that the list is not empty.

//Invariant: Remove is not to be used on empty lists.

int LinkedList::remove()

{

    Node* temp = head;//1
    int x = temp -> data; //2

    head = head -> next; //3

    delete temp; //4

    return x; //5

}

1. A Node* called temp is made, and points to the first node, which head also points to.

2. An integer variable x is declared, and equals the data of the first node, which is the node that will be removed.

3. Head now points to what the first node’s next points to, so head now points to the second node, or NULL if there is only one node in the list.

4. The piece of dynamic memory that temp points to, which is the first node, is deleted. The rest of the list is still intact, since head now points to the second node, which is the new first node.

5. x, which holds the data of the old first node, is now returned.



The class destructor also uses remove in order to make sure there is no dynamic memory left over after the linked list is done being used. The destructor is a simple function that looks like this:


LinkedList::~LinkedList()
{
    while(!isEmpty())
        remove();
}

It keeps removing items from the list one by one until the list is empty, coming out to a runtime of O(n).


Two other interesting functions are insertIth and removeIth. Depending on your data structure, you may not always want to insert and remove at the first position. To fix this problem, you can make it so that you can insert and remove at position i, which can be ANY position.

//Invariant: insertIth is never used to insert at a position greater than the size of the list.
//Note: The insertIth and removeIth functions will use indexing that starts with zero to prevent confusion with array indexing.
void LinkedList::insertIth(int x, int pos)
{
    Node* add = new Node;
    add -> data = x;
    if(pos == 0)//1
    {
        insert(x);
    }
        else//2
    {
        Node* temp = head;
        for(int i = 0; i < pos - 1; i++)
            temp = temp -> next;
        if(temp -> next == NULL)//2
        {
            temp -> next = add;
            add -> next = NULL;
        }
        else //3
        {
            add -> next = temp -> next;
            temp -> next = add;
        }
    }
}

The insertIth function really has no surprises if you are inserting at the start or end of the list. If you insert at the start of the list, insertIth will just add in the new node the same way insert did. If you are inserting in the back of the list, insertIth will just add in the new node the same way insertBack did (once again, the runtime of O(n) is slow but unavoidable for a singly-linked list). In the parameters, pos is the parameter being passed for the position we are adding at, which will be referred to as position i in both this function and the removeIth function that is coming up next.


1. The insert function is pretty much the same as using insertIth to insert at position 0. Because of this, insertIth for position 0 just does the same thing as the regular insert function.


2. If you’re not inserting at the start of the list, the first thing we need to do is traverse the list to find the node linked to the current ith node (or pointing to NULL if we are inserting the new last node). This node is the node at position i - 1. To do this we use the Node* temp to get to that node.


3. If node i - 1’s next points to NULL, then node i - 1 is the current last node in the list. Because of this, all we have to do is link node i - 1 to the new node and make the new node point to NULL.


4. If node i - 1’s next points to another node, we are inserting somewhere in the middle of the list. Because of this, in addition to linking node i - 1 to the new node, we also need to link the new node to the current ith node. To do this, we start by making the new node point to temp -> next, which is the current ith node. Now, both temp -> next and add -> next point to the current ith node, so now we need to get node i - 1 pointing to the new node. Since there is now a node pointing to the current ith node (this would be the new node), if we make node i - 1 point to the new node, no dynamic memory is lost, and the new node is now in position i.


Overall, if you are adding at the first position, the function has a runtime of O(1), and if you are adding at another position, the function has a runtime of O(n).





Now, we will look at the removeIth function.

//Invariants: removeIth will not be used on positions bigger than the size of the list, and it will not be used on an empty list.

//Note: Indexing starts at position 0 like in arrays.

int LinkedList::removeIth(int pos)
{
    int x;
    Node* temp = head;//1
    Node* temp2 = head;
    if(pos == 0)//2
    {
        return remove();
    }
    else
    {
        for(int i = 0; i < pos - 1; i++)
            temp2 = temp2 -> next;
        temp = temp2 -> next;
        if(temp -> next == NULL)//4
        {
            x = temp -> data;
            temp2 -> next = NULL;
            delete temp;
        }
        else//5
        {
            x = temp -> data;
            temp2 -> next = temp -> next;
            delete temp;
        }
    }
    return x;
}

1. Temp is supposed to point to the node at position i. Temp2 is supposed to point to the node at position i - 1 (unless we are removing the first node, in which case temp2 is not used).


2. If the position we are removing from is zero, that means that we are removing the very first (zeroeth) node, in which case we pretty much do the same thing we did in the original remove function, so the regular remove function is called to do the work for us.


3. Otherwise, we will traverse the node with a for loop to get temp2 to point to the that is linked to the node we are removing. After this, temp points to the node that temp2’s next points to, which is the node we are removing.


4. If temp -> next points to NULL, that means that temp is pointing to the last node in the list, so the last node will be removed and temp2 points to the new last node. Because of this, x gets temp’s data, temp is deleted, and then temp2 -> next now points to NULL since temp2 is the new last node.


5. Otherwise, temp now points to some node in the middle (the ith node) and temp2 points to the node preceding it. Since the ith node is being removed, node i -1 will need to be linked to node i + 1 so that there is no memory leak. To do this, since we have a pointer to node i and a pointer to node i - 1, we can make it so that node i - 1’s next now points to node i + 1. This makes it so that now node i - 1 and node i are both pointing to node i + 1, so we can remove node i without a memory leak. Before we do this, x gets node i’s data, and then temp is deleted, removing node i.






This gives you a whole lot of things you can do with linked lists. However, there’s more where that came from! You have yet to see what happens when you make the last node link to another node in the list, or what happens when you make each node link to both the next node and the last node! These are coming up in the next two tutorials!

No comments :

Post a Comment