
DATA STRUCTURES AND ALGORITHMS-2015 Edition.pdf



FUNDAMENTALS OF DATA STRUCTURES AND ALGORITHMS
LECTURE MANUAL
By Sam A. Oluwadare, PhD
April, 2014

DATA TYPE
In computer programming, a data type simply refers to a defined kind of data, that is, a set of possible values and basic operations on those values. When applied in programming languages, a data type defines a set of values and the allowable operations on those values. Data types are important in computer programmes because they classify data so that a translator (compiler or interpreter) can reserve appropriate memory storage to hold all possible values; integers, real numbers, characters, strings, and Boolean values, for example, all have very different representations in memory.

A data type consists of:
a. a domain (a set of values);
b. a set of operations that may be applied to the values.

Data Type Classification
Some data items may be used singly, whilst others may be combined and arranged to form other data items. The former are classified as 'simple data types', whereas the latter are classified as 'data structures'. The following classification is appropriate for study at this level. The simple data types are classified as follows:
a. Character
b. Numeric integer
c. Numeric real
d. Boolean (logical).

Examples of Data Types
Almost all programming languages explicitly include the notion of data type, though different languages may use different terminology. Common data types in programming languages include those that represent integers, floating-point numbers, and characters, and a language may support many more.

Example 1: The Boolean or logical data type is provided by most programming languages. It has two values, true and false, and many operations, including AND, OR, NOT, etc.

Example 2: In the Java programming language, the "int" type represents the set of 32-bit integers ranging in value from -2,147,483,648 to 2,147,483,647, together with the operations, such as addition, subtraction and multiplication, that can be performed on integers.
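The two examples above can be checked directly in Java. `Integer.MIN_VALUE` and `Integer.MAX_VALUE` are the standard constants for the bounds of the 32-bit int type; the class name `DataTypeDemo` is made up for this sketch:

```java
// Illustrates the boolean and int data types from Examples 1 and 2.
public class DataTypeDemo {
    public static void main(String[] args) {
        // Domain of int: the 32-bit integers.
        System.out.println(Integer.MIN_VALUE);  // -2147483648
        System.out.println(Integer.MAX_VALUE);  //  2147483647

        // Operations on int values: addition, subtraction, multiplication.
        int a = 6, b = 7;
        System.out.println(a + b);  // 13
        System.out.println(a * b);  // 42

        // Domain of boolean: {true, false}; operations AND, OR, NOT.
        boolean p = true, q = false;
        System.out.println(p && q); // false (AND)
        System.out.println(p || q); // true  (OR)
        System.out.println(!p);     // false (NOT)
    }
}
```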
Abstract Data Type
An Abstract Data Type, commonly referred to as an ADT, is a collection of data objects characterized by how the objects are accessed; it is an abstract human concept meaningful outside of computer science. (Note that "object" here is a general abstract concept as well: it can be an "element" (like an integer), a data structure (e.g. a list of lists), or an instance of a class (e.g. a list of circles).) A data type is abstract in the sense that it is independent of various concrete implementations. Object-oriented languages such as C++ and Java provide explicit support for expressing abstract data types by means of classes. A first-class abstract data type supports the creation of multiple instances of the ADT, and the interface normally provides a constructor, which returns an abstract handle to new data, and several operations, which are functions accepting the abstract handle as an argument.

Examples of Abstract Data Types
Common abstract data types (ADTs) typically implemented in programming languages (or their libraries) include: Arrays, Lists, Queues, Stacks and Trees.

What is a Data Structure?
A data structure is the implementation of an abstract data type in a particular programming language. Data structures can also be referred to as "data aggregates". A carefully chosen data structure will allow the most efficient algorithm to be used. Thus, a well-designed data structure allows a variety of critical operations to be performed using as few resources, both execution time and memory space, as possible.

Classification of Data Structures
Data structures are broadly divided into two:
a. Linear Data Structures
b. Non-Linear Data Structures.

Linear Data Structures
Linear data structures are data structures in which individual data elements are stored and accessed linearly in the computer memory.
For the purpose of this course, the following linear data structures will be studied: lists, stacks, queues and arrays, in order to determine how information is processed during implementation.

Non-Linear Data Structures
A non-linear data structure, as the name implies, is a data structure in which the data items are not stored linearly in the computer memory, but the data items can be processed using certain techniques or rules. The typical non-linear data structures to be studied in this course are Trees.

Data Structures and Programmes
The structure of data in the computer is very important in software programmes, especially where the set of data is very large. When data is properly structured and stored in the computer, access to the data is easier, and the software routines that work with the data are made simpler; time and storage space are also reduced. In the design of many types of programmes, the choice of data structures is a primary design consideration, as experience in building large systems has shown that the difficulty of implementation and the quality and performance of the final result depend heavily on choosing the best data structure.

ARRAYS
In Computer Science, an array is a data structure consisting of a group of elements that are accessed by indexing. Each data item of an array is known as an element, and the elements are referenced by a common name known as the array name.

Arrays and Programming
In Java, as in most programming languages, an array is a structure that holds multiple values of the same type. A Java array is also an object. An array can contain data of the primitive data types. As it is an object, an array must be declared and instantiated. For example:

int[] anArray;
anArray = new int[10];

An array can also be created using a shortcut. For example:

int[] anArray = {1,2,3,4,5,6,7,8,9,10};

An array element can be accessed using an index value.
For example:

int i = anArray[0];

The size of an array can be found using the length attribute. For example:

int len = anArray.length;

Before any array is used in the computer, some memory locations have to be created for storage of the elements. This is often done by using the DIM instruction of the BASIC programming language or the DIMENSION instruction of the FORTRAN programming language. For example, the instruction:

DIM LAGOS (45)

will create 45 memory locations for storage of the elements of the array called LAGOS. In most programming languages, each element has the same data type and the array occupies a contiguous area of storage. Most programming languages have a built-in array data type. Some programming languages support array programming, which generalises operations and functions to work transparently over arrays as they do with scalars, instead of requiring looping over array members.

Declaration of Arrays
Variables normally store only a single value but, in some situations, it is useful to have a variable that can store a series of related values – using an array. For example, suppose a programme is required that will calculate the average age among a group of six students. The ages of the students could be stored in six integer variables in C:

int age1;
int age2;
int age3;

However, a better solution would be to declare a six-element array:

int age[6];

This creates a six-element array; the elements can be accessed as age[0] through age[5] in C.

A two-dimensional array (in which the elements are arranged into rows and columns) declared by, say, DIM X(3,4) can be stored as a linear array in the computer memory by determining the product of the subscripts. The above can thus be expressed as DIM X(3 * 4) or DIM X(12). Multi-dimensional arrays can be stored as linear arrays in order to reduce the computation time and memory.

Multi-dimensional Arrays
Ordinary arrays are indexed by a single integer.
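Before moving on to multiple dimensions, the single-integer indexing just mentioned is exactly what the average-age example above relies on. A Java sketch of that calculation (the ages are made-up sample data, and the class name is illustrative):

```java
// Computes the average age of six students stored in one array,
// with each element accessed by a single integer index from 0 to 5.
public class AverageAge {
    static double average(int[] ages) {
        int sum = 0;
        for (int i = 0; i < ages.length; i++) {
            sum += ages[i];             // single-integer indexing
        }
        return (double) sum / ages.length;
    }

    public static void main(String[] args) {
        int[] age = {18, 19, 20, 21, 22, 20};  // sample data
        System.out.println(average(age));      // prints 20.0
    }
}
```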
Also useful, particularly in numerical and graphics applications, is the concept of a multi-dimensional array, in which we index into the array using an ordered list of integers, such as in a[3,1,5]. The number of integers in the list used to index into the multi-dimensional array is always the same and is referred to as the array's dimensionality, and the bounds on each of these are called the array's dimensions. An array with dimensionality k is often called k-dimensional. One-dimensional arrays correspond to the simple arrays discussed thus far; two-dimensional arrays are a particularly common representation for matrices. In practice, the dimensionality of an array rarely exceeds three.

Mapping a one-dimensional array into memory is obvious, since memory is logically itself a (very large) one-dimensional array. When we reach higher-dimensional arrays, however, the problem is no longer obvious. Suppose we want to represent a simple two-dimensional array. (Figure not reproduced here.) It is most common to index such an array using the RC-convention, where elements are referred to in row, column fashion. Multi-dimensional arrays are typically represented by one-dimensional arrays of references (Iliffe vectors) to other one-dimensional arrays. The sub-arrays can be either the rows or the columns.

Classification of Arrays
Arrays can be classified as static arrays (i.e. arrays whose size cannot change once their storage has been allocated) or dynamic arrays, which can be resized.

Processing Arrays
Although array-based iteration is useful when dealing with very simple data structures, it is quite difficult to construct generalized algorithms that do much more than process every element of an array from start to finish. For example, suppose you want to process only every second item; include or exclude specific values based on some selection criteria; or even process the items in reverse order.
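Each of those variations forces its own loop wiring. A sketch of two such hard-wired traversals in Java (data and names are made up for illustration):

```java
import java.util.Arrays;

// Two hard-wired traversals: one collects every second element,
// the other collects the elements in reverse order.
public class SelectiveLoops {
    static int[] everySecond(int[] a) {
        int[] out = new int[(a.length + 1) / 2];
        for (int i = 0; i < a.length; i += 2) {
            out[i / 2] = a[i];             // step size baked into the loop
        }
        return out;
    }

    static int[] reversed(int[] a) {
        int[] out = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[a.length - 1 - i];  // direction baked into the loop
        }
        return out;
    }

    public static void main(String[] args) {
        int[] items = {10, 20, 30, 40, 50};
        System.out.println(Arrays.toString(everySecond(items))); // [10, 30, 50]
        System.out.println(Arrays.toString(reversed(items)));    // [50, 40, 30, 20, 10]
    }
}
```

Every new traversal rule means another bespoke loop like these, which is the duplication problem discussed in this section.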
Being tied to arrays also makes it difficult to write applications that operate on databases or files without first copying the data into an array for processing. Using simple array-based iteration not only ties algorithms to using arrays, but also requires that the logic for determining which elements stay, which go, and in which order to process them, is known in advance. Even worse, if you need to perform the iteration in more than one place in your code, you will likely end up duplicating the logic. This clearly isn’t a very extensible approach. Instead, what’s needed is a way to separate the logic for selecting the data from the code that actually processes it. An iterator (also known as an enumerator) solves these problems by providing a generic interface for looping over a set of data so that the underlying data structure or storage mechanism—such as an array, database, and so on—is hidden. Whereas simple iteration generally requires you to write specific code to handle where the data is sourced from or even what kind of ordering or preprocessing is required, an iterator enables you to write simpler, more generic algorithms. An iterator provides a number of operations for traversing and accessing data. A Reverse Iterator Sometimes you will want to reverse the iteration order without changing the code that processes the values. Imagine an array of names that is sorted in ascending order, A to Z, and displayed to the user somehow. If the user chose to view the names sorted in descending order, Z to A, you might have to re-sort the array or at the very least implement some code that traversed the array backward from the end. With a reverse iterator, however, the same behavior can be achieved without re-sorting and without duplicated code. When the application calls first(), the reverse iterator actually calls last() on the underlying iterator. When the application calls next(), the underlying iterator’s previous() method is invoked, and so on. 
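The first()/last() and next()/previous() delegation just described might be sketched as follows. The interface and class names here are illustrative, not from any particular library:

```java
// A minimal two-way iterator interface (names are illustrative).
interface TwoWayIterator {
    void first();       // position at the first element
    void last();        // position at the last element
    boolean isDone();   // true when iteration has finished
    void next();        // advance forward
    void previous();    // step backward
    String current();   // the element at the current position
}

// Forward iterator over an array of names.
class ArrayIterator implements TwoWayIterator {
    private final String[] names;
    private int pos;
    ArrayIterator(String[] names) { this.names = names; }
    public void first() { pos = 0; }
    public void last() { pos = names.length - 1; }
    public boolean isDone() { return pos < 0 || pos >= names.length; }
    public void next() { pos++; }
    public void previous() { pos--; }
    public String current() { return names[pos]; }
}

// Reverse iterator: first() delegates to last(), next() to previous().
class ReverseIterator implements TwoWayIterator {
    private final TwoWayIterator it;
    ReverseIterator(TwoWayIterator it) { this.it = it; }
    public void first() { it.last(); }
    public void last() { it.first(); }
    public boolean isDone() { return it.isDone(); }
    public void next() { it.previous(); }
    public void previous() { it.next(); }
    public String current() { return it.current(); }
}
```

The client loop `for (i.first(); !i.isDone(); i.next()) ... i.current() ...` is identical in both cases; only the iterator handed to it changes.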
In this way, the behavior of the iterator can be reversed without changing the client code that displays the results, and without re-sorting the array, which could be quite processing-intensive, as you will see when you write some sorting algorithms.

Applications of Arrays
Arrays are employed in many computer applications in which data items need to be saved in the computer memory for subsequent reprocessing. Due to their performance characteristics, arrays are used to implement other data structures, such as heaps, hash tables, deques, queues, stacks and strings.

LIST DATA STRUCTURE
Lists are the most fundamental data structure, upon which most other data structures are built and upon which many algorithms must operate. It's not hard to find examples of lists in the real world: shopping lists, to-do lists, train timetables, order forms, even this "list of lists." Much like arrays, lists are generally useful in most applications you will write.

A list is an ordered collection of elements supporting random access to each element, much like an array—you can query a list to get the value contained at any arbitrary position. Lists also preserve insertion order so that, assuming there are no intervening modifications, a given list will always return the same value for the same position. Like arrays, lists make no attempt to preserve the uniqueness of values, meaning a list may contain duplicate values. For example, if you had a list containing the values "swimming", "cycling", and "dancing" and you were to add "swimming" again, you would now find that the list had grown to include two copies of "swimming". The major difference between arrays and lists, however, is that whereas an array is fixed in size, lists can resize—growing and shrinking—as necessary.

A list data structure is a sequential data structure, i.e. a collection of items accessible one after the other, beginning at the head and ending at the tail.
It is a widely used data structure for applications which do not need random access. Lists differ from the stack and queue data structures in that additions and removals can be made at any position in the list.

Elements of a List
The sentence "Dupe is not a boy" can be written as a list as follows:

DUPE -> IS -> NOT -> A -> BOY
Fig. 1: Elements of a list

We regard each word in the sentence above as a data item, or datum, which is linked to the next datum by a pointer. A datum plus its pointer make one node of a list. The last pointer in the list is called a terminator. It is often convenient to speak of the first item as the head of the list, and the remainder of the list as the tail.

Operations
The main primitive operations of a list are known as:
Add      adds a new node
Set      updates the contents of a node
Remove   removes a node
Get      returns the value at a specified index
IndexOf  returns the index in the list of a specified element

Additional primitives can be defined:
IsEmpty     reports whether the list is empty
IsFull      reports whether the list is full
Initialise  creates/initialises the list
Destroy     deletes the contents of the list (may be implemented by re-initialising the list)

Initialise
Creates the structure – i.e. ensures that the structure exists but contains no elements.
e.g. Initialise(L) creates a new empty list named L

Add
e.g. Add(1,X,L) adds the value X to list L at position 1 (the start of the list is position 0), shifting subsequent elements up.
Fig. 2: List before adding the value (L: A B C)
Fig. 3: List after adding the value (L: A X B C)

Set
e.g. Set(2,Z,L) updates the value at position 2 to be Z
Fig. 4: List after update (L: A X Z C)

Remove
e.g. Remove(Z,L) removes the node with value Z
Fig. 5: List before removal (L: A X Z C)
Fig. 6: List after removal (L: A X C)

Get
e.g. Get(2,L) returns the value of the third node, i.e. C

IndexOf
e.g. IndexOf(X,L) returns the index of the node with value X, i.e. 1

List Implementation
There are many ways to implement a list, depending on how the programmer will use lists in their programme. The two most common are an array-based implementation and a linked list.
1. Array List: As the name suggests, an array list uses an array to hold the values.
2. Linked List: A linked list, conversely, is a chain of elements in which each item has a reference (or link) to the next (and optionally previous) element.

Array Lists
As the name suggests, an array list uses an array as the underlying mechanism for storing elements. Because of this, the fact that you can index directly into arrays makes implementing access to elements very easy. It also makes an array list the fastest implementation for indexed and sequential access. The downside to using an array is that each time you insert a new element, you need to shift any elements in higher positions one place to the right by physically copying them. Similarly, when deleting an existing element, you need to shift any elements in higher positions one place to the left to fill the gap left by the deleted element. Additionally, because arrays are fixed in size, any time you need to increase the size of the list, you also need to allocate a new array and copy the contents over. This clearly affects the performance of insertion and deletion.

Properties of an Array List:
1. The position of each element is given by an index from 0 to n-1, where n is the number of elements.
2. Given any index, the element with that index can be accessed in constant time – i.e. the time to access does not depend on the size of the list.
3. To add an element at the end of the list, the time taken does not depend on the size of the list. However, the time taken to add an element at any other point in the list does depend on the size of the list, as all subsequent elements must be shifted up. Additions near the start of the list take longer than additions near the middle or end.
4. When an element is removed, subsequent elements must be shifted down, so removals near the start of the list take longer than removals near the middle or end.

Linked List
A linked list is stored as a sequence of linked nodes. Rather than use an array to hold the elements, a linked list contains individual elements with links between them. As in the case of the stack, each node in a linked list contains data AND a reference to the next node. This also makes insertion and deletion much simpler than it is for an array list.

The Linked List has the following properties:
1. The list can grow and shrink as needed.
2. The position of each element is given by an index from 0 to n-1, where n is the number of elements.
3. Given any index, the time taken to access an element with that index depends on the index. This is because each element of the list must be traversed until the required index is found.
4. The time taken to add an element at any point in the list does not depend on the size of the list, as no shifts are required. It does, however, depend on the index: additions near the end of the list take longer than additions near the middle or start. The same applies to the time taken to remove an element.
5. A list needs a reference to the front node.

There are many variations on the Linked List data structure, including:

i. Singly Linked Lists
A singly linked list is a data structure in which the data items are chained (linked) in one direction. Figure 7 shows an example of a singly linked list.
Fig. 7: A singly linked list (header -> a1 -> a2 -> ... -> an -> tail)

ii. Circularly Linked Lists
In a circularly linked list, the tail of the list always points to the head of the list.

iii. Doubly Linked Lists
A doubly linked list permits scanning or searching of the list in both directions. (To go backwards in a simple list, it is necessary to go back to the start and scan forwards.) In this case, the node structure is altered to have two links.
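The two-link node structure just mentioned might be declared as follows in Java (a sketch; class and field names are illustrative):

```java
// A node for a doubly linked list: a datum plus TWO links,
// one to the next node and one to the previous node.
class Node {
    String datum;
    Node next;      // forward link
    Node prev;      // backward link
    Node(String datum) { this.datum = datum; }
}

class DoublyLinked {
    // Inserts node b immediately after node a, fixing up both directions.
    static void linkAfter(Node a, Node b) {
        b.prev = a;
        b.next = a.next;
        if (a.next != null) a.next.prev = b;
        a.next = b;
    }
}
```

With nodes a1 and a2 linked this way, a2.prev.datum recovers "a1", so the chain can be walked backwards as well as forwards.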
This double linking makes it possible to traverse the elements in either direction. It also makes insertion and deletion much simpler than it is for an array list.
Fig. 8: A doubly linked list (header <-> a1 <-> a2 <-> ... <-> an-1 <-> an)

As you might recall from the discussion on array lists, in most cases when deleting or inserting, some portion of the underlying array needs to be copied. With a linked list, however, each time you wish to insert or delete an element, you need only update the references to and from the next and previous elements, respectively. This makes the cost of the actual insertion or deletion almost negligible in all but the most extreme cases. For lists with extremely large numbers of elements, though, the traversal time can be a performance issue. A doubly linked list also maintains references to the first and last elements in the list—often referred to as the head and tail, respectively. This enables you to access either end with equal performance.

iv. Sorted Lists
Lists can be designed to be maintained in a given order. In this case, the Add method will search for the correct place in the list to insert a new data item.

THE STACK DATA STRUCTURE
A stack is a linear data structure in which all insertions and deletions of data are made only at one end of the stack, often called the top of the stack. For this reason, a stack is referred to as a LIFO (last-in-first-out) structure.
Fig. 9: A stack, with Push adding at the top and Pop removing from the top

A frequently used metaphor is the idea of a stack of plates in a spring-loaded cafeteria dispenser. In such a stack, only the top plate is visible and accessible to the user; all other plates remain hidden. As new plates are added, each new plate becomes the top of the stack, hiding each plate below and pushing the stack of plates down. As the top plate is removed from the stack, the plates pop back up, and the second plate becomes the top of the stack.

Application of Stacks
Stacks are used extensively at every level of a modern computer system.
For example, a modern PC uses stacks at the architecture level, and stacks are used in the basic design of an operating system for interrupt handling and operating system function calls. Among other uses, stacks are used to run a Java Virtual Machine, and the Java language itself has a class called "Stack", which can be used by the programmer. Stacks have many other applications. For example, as a processor executes a programme, when a function call is made, the called function must know how to return to the calling programme, so the current address of programme execution is pushed onto a stack. Once the function is finished, the address that was saved is removed from the stack, and execution of the programme resumes. If a series of function calls occurs, the successive return addresses are pushed onto the stack in LIFO order so that each function can return to its calling programme. Stacks support recursive function calls and subroutine calls, especially when "reverse Polish notation" is involved.

Solving a search problem, regardless of whether the approach is exhaustive or optimal, needs stack space. Examples of exhaustive search methods are brute force and backtracking. Examples of optimal search methods are branch and bound and heuristic solutions. All of these algorithms use stacks to remember the search nodes that have been noticed but not yet explored. Another common use of stacks at the architecture level is as a means of allocating and accessing memory.

Fig. 10: Basic Architecture of a Stack

Operations on a Stack
The stack is usually implemented with two basic operations known as "push" and "pop". Thus, two operations applicable to all stacks are:
A push operation, in which a data item is placed at the location pointed to by the stack pointer, and the address in the stack pointer is adjusted by the size of the data item; Push adds a given node to the top of the stack, leaving previous nodes below.
A pop or pull operation, in which a data item at the current location pointed to by the stack pointer is removed, and the stack pointer is adjusted by the size of the data item; Pop removes and returns the current top node of the stack.

The main primitives of a stack are known as:
Push  adds a new node
Pop   removes a node

Figure 11 shows the insertion of three data items X, Y and Z onto a stack, and the removal of two items, Z and Y, from the stack.
Fig. 11: Insertion and removal of data from a stack (Empty; Push X; Push Y; Push Z; Pop Z; Pop Y)

Additional primitives can be defined:
IsEmpty     reports whether the stack is empty
IsFull      reports whether the stack is full
Initialise  creates/initialises the stack
Destroy     deletes the contents of the stack (may be implemented by re-initialising the stack)

Initialise
Creates the structure – i.e. ensures that the structure exists but contains no elements.
e.g. Initialise(S) creates a new empty stack named S

Push
e.g. Push(X,S) adds the value X to the top of the stack S
Fig. 12: Stack after adding the value X

Pop
e.g. Pop(S) removes the top node and returns its value
Fig. 13: Stack after removing the top node

Stack Storage Modes
A stack can be stored in two ways: as a static data structure OR as a dynamic data structure.

Static Data Structures
These define collections of data which are fixed in size when the programme is compiled.

Dynamic Data Structures
These define collections of data which are variable in size and structure. They are created as the programme executes, and grow and shrink to accommodate the data being stored.

THE QUEUE DATA STRUCTURE
Queues are an essential part of algorithms that manage the allocation and scheduling of work, events, or messages to be processed. They are often used as a way of enabling different processes—either on the same or different machines—to communicate with one another.
Customers line up in a bank waiting to be served by a teller and in supermarkets waiting to check out. No doubt you've been stuck waiting in a line to speak to a customer service representative at a call center. In computing terms, however, a queue is a list of data items stored in such a way that they can be retrieved in a definable order. The main distinguishing feature between a queue and a list is that whereas all items in a list are accessible—by their position within the list—the only item you can ever retrieve from a queue is the one at the head. Which item is at the head depends on the specific queue implementation. More often than not, the order of retrieval is indeed the same as the order of insertion (also known as first-in-first-out, or FIFO), but there are other possibilities as well. Some of the more common examples include a last-in-first-out queue and a priority queue, whereby retrieval is based on the relative priority of each item. You can even create a random queue that effectively "shuffles" the contents.

Queues are often described in terms of producers and consumers. A producer is anything that stores data in a queue, while a consumer is anything that retrieves data from a queue.

Queues can be either bounded or unbounded. Bounded queues have limits placed on the number of items that can be held at any one time. These are especially useful when the amount of available memory is constrained—for example, in a device such as a router, or even an in-memory message queue. Unbounded queues, conversely, are free to grow in size as the limits of the hardware allow.

The queue data structure is characterised by the fact that additions are made at the end, or tail, of the queue while removals are made from the front, or head, of the queue. For this reason, a queue is referred to as a FIFO (First-In First-Out) structure. Figure 14 shows a queue of part of the English alphabet.

Fig. 14: Example of a Queue (insertion at the tail, deletion at the head)

Application of Queues
Queues are very important structures in computer simulations, data processing, information management, and in operating systems. In simulations, queue structures are used to represent real-life events such as car queues at traffic light junctions and petrol filling stations, queues of people at the check-out point in supermarkets, queues of bank customers, etc. In operating systems, queue structures are used to represent different programmes in the computer memory in the order in which they are executed. For example, if a programme J is submitted before programme K, then programme J is queued before programme K in the computer memory, and programme J is executed before programme K.

Operations on a Queue
The main primitive operations on a queue are known as:
Enqueue: Stores a value in the queue. The size of the queue will increase by one.
Dequeue: Retrieves the value at the head of the queue. The size of the queue will decrease by one. Throws EmptyQueueException if there are no more items in the queue.
Clear: Deletes all elements from the queue. The size of the queue will be reset to zero (0).
Size: Obtains the number of elements in the queue.
IsEmpty: Determines whether the queue is empty (size() = 0) or not.

Additional primitives can be defined thus:
IsFull      reports whether the queue is full
Initialise  creates/initialises the queue

Initialise
Creates the structure – i.e. ensures that the structure exists but contains no elements.
e.g. Initialise(Q) creates a new empty queue named Q

Add
e.g. Add(X,Q) adds the value X to the tail of Q
Fig. 15: Queue after adding the value X to the tail of Q
then, Add(Y,Q) adds the value Y to the tail of Q
Fig. 16: Queue after adding the value Y to the tail of Q (Q: X Y)

Remove
e.g. Remove(X,Q) removes the head node and returns its value
Fig. 17: Queue after removing X from the head node (Q: Y)

Other Queue Operations
Action          Contents of queue Q after operation    Return value
Initialise(Q)   empty                                  -
Add(A,Q)        A                                      -
Add(B,Q)        A B                                    -
Add(C,Q)        A B C                                  -
Remove(Q)       B C                                    A
Add(F,Q)        B C F                                  -
Remove(Q)       C F                                    B
Remove(Q)       F                                      C
Remove(Q)       empty                                  F

Storing a Queue in a Static Data Structure
This implementation stores the queue in an array. The array indices at which the head and tail of the queue are currently stored must be maintained. The head of the queue is not necessarily at index 0. The array can be a "circular array" in which the queue "wraps round" if the last index of the array is reached. Figure 18 is an example of storing a queue in an array of length 5.

Fig. 18: Storing a queue in an array

Storing a Queue in a Dynamic Data Structure
A queue requires a reference to the head node AND a reference to the tail node. The following describes the storage of a queue called Queue. Each node consists of data (DataItem) and a reference (NextNode).
· The first node is accessed using the name Queue.Head.
· Its data is accessed using Queue.Head.DataItem.
· The second node is accessed using Queue.Head.NextNode.
· The last node is accessed using Queue.Tail.

Adding a Node (Add)
The new node is to be added at the tail of the queue. The reference Queue.Tail should point to the new node, and the NextNode reference of the node previously at the tail of the queue should point to the new node.

Removing a Node (Remove)
The value of Queue.Head.DataItem is returned. A temporary reference, Temp, is declared and set to point to the head node in the queue (Temp = Queue.Head). Queue.Head is then set to point to the second node instead of the first node. The only reference to the original head node is now Temp, and the memory used by this node can then be freed.

Blocking Queues
Queues are often used in multi-threaded environments as a form of interprocess communication.
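A single-threaded sketch of the circular-array storage scheme described above (class and method names are made up for illustration):

```java
// A bounded FIFO queue stored in a circular array of length 5.
// The head index and current size are tracked; indices wrap round
// using the modulo operator, so no elements are ever shifted.
public class CircularQueue {
    private final String[] slots = new String[5];
    private int head = 0;   // index of the front element
    private int size = 0;   // number of stored elements

    void add(String value) {
        if (size == slots.length) throw new IllegalStateException("full");
        slots[(head + size) % slots.length] = value;  // tail wraps round
        size++;
    }

    String remove() {
        if (size == 0) throw new IllegalStateException("empty");
        String value = slots[head];
        head = (head + 1) % slots.length;             // head wraps round
        size--;
        return value;
    }

    int size() { return size; }
}
```

After add("A"), add("B"), add("C"), successive remove() calls return "A", "B", "C", matching the operations table above. Note that nothing here is synchronized, so with multiple threads the head/size pair can be corrupted.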
Unfortunately, a plain FIFO queue is totally unsafe for use in situations where multiple consumers would be accessing it concurrently. Instead, a blocking queue is one way to provide a thread-safe implementation, ensuring that all access to the data is correctly synchronized.

The first main enhancement that a blocking queue offers over a regular queue is that it can be bounded. So far, we have only dealt with unbounded queues—those that continue to grow without limit. A blocking queue enables you to set an upper limit on the size of the queue. Moreover, when an attempt is made to store an item in a queue that has reached its limit, the queue will, you guessed it, block the thread until space becomes available—either by removing an item or by calling clear(). In this way, you guarantee that the queue will never exceed its predefined bounds.

The second major feature affects the behavior of dequeue(). When an attempt is made to retrieve an item from an empty queue, a blocking queue will block the current thread until an item is enqueued. This is good for implementing work queues where multiple, concurrent consumers need to wait until there are more tasks to perform.

The Scheduler Model
The scheduler model consists of a minimum of 5 processors and a maximum of 10 processors. It involves a centralized dynamic scheduling scheme in which all tasks arrive at a central processor, called the scheduler. The scheduler has a task queue attached to it; the task queue holds newly arriving tasks. The role of the central scheduler is to distribute the tasks to the other processors in the system for execution. There is a dispatch queue associated with each processor, and the communication between the scheduler and the processors is through the dispatch queues. The scheduler makes sure that each dispatch queue is filled with a minimum number of tasks, so that a processor can always find a task in its dispatch queue when it finishes execution of a task.
The scheduler determines a feasible schedule based on the worst-case computation times of tasks, satisfying their timing and resource constraints. The scheduling algorithm has full knowledge of the currently active set of tasks, but not of new tasks that may arrive while the current task set is being scheduled. The objective of dynamic scheduling is to minimize the makespan, thereby improving the guarantee ratio. The guarantee ratio is the percentage of tasks arriving in the system whose deadlines are met. The scheduler must also guarantee that tasks already scheduled will meet their deadlines. The scheduler model is shown in Fig. 3.1.

Fig. 3.1: The Scheduler Model (Source: Oluwadare, 2009). [The figure shows tasks arriving at a task queue feeding the central Scheduler, which fills the dispatch queues of processors P1 to P10.]

TREES DATA STRUCTURE

A tree is often used to represent a hierarchy, because the relationships between the items in the hierarchy suggest the branches of a botanical tree.

[Figure: a simple unordered tree. The node labeled 7 has two children, labeled 2 and 6, and one parent, labeled 2. The root node, at the top, has no parent.]

In computer science, a tree is a widely used data structure that emulates a hierarchical tree structure with a set of linked nodes. A node is a structure which may contain a value or a condition, or represent a separate data structure (which could be a tree of its own). Each node in a tree has zero or more child nodes, which are below it in the tree (by convention, trees are drawn growing downwards). A node that has a child is called the child's parent node (or ancestor node, or superior). A node has at most one parent. Nodes that do not have any children are called leaf nodes; they are also referred to as terminal nodes. The height of a node is the length of the longest downward path from that node to a leaf.
The height of the root is the height of the tree. The depth of a node is the length of the path from that node to the root (i.e., its root path). Depth is commonly needed in the manipulation of the various self-balancing trees, AVL trees in particular. Conventionally, the value −1 corresponds to a subtree with no nodes, whereas zero corresponds to a subtree with one node.

The topmost node in a tree is called the root node. Being the topmost node, the root node has no parent. It is the node at which operations on the tree commonly begin (although some algorithms begin with the leaf nodes and work up, ending at the root). All other nodes can be reached from it by following edges or links (in the formal definition, each such path is also unique). In diagrams, the root is typically drawn at the top. In some trees, such as heaps, the root node has special properties. Every node in a tree can be seen as the root node of the subtree rooted at that node.

An internal node or inner node is any node of a tree that has child nodes and is thus not a leaf node. Similarly, an external node or outer node is any node that does not have child nodes and is thus a leaf.

A subtree of a tree T is a tree consisting of a node in T and all of its descendants in T. (This is different from the formal definition of subtree used in graph theory.) The subtree corresponding to the root node is the entire tree; the subtree corresponding to any other node is called a proper subtree (in analogy to the term proper subset).

Fig. 1: General Structure for a Tree. [The figure labels the Root Node, a Right Child Node, Edges or Links, and a Leaf Node.] This forms a complete tree, whose height is defined as the number of links from the root to the deepest leaf; the tree shown has height 3.

Key terms

Root Node: the node at the "top" of a tree, the one from which all operations on the tree commence. The root node may not exist (a NULL tree with no nodes in it) or, in a binary tree, may have 0, 1 or 2 children.
Leaf Node: a node at the "bottom" of a tree, farthest from the root. Leaf nodes have no children.

Complete Tree: a tree in which each leaf is at the same distance from the root. A more precise and formal definition of a complete tree is set out later.

Height: the number of nodes which must be traversed from the root to reach a leaf of a tree.

Binary Trees

The simplest form of tree is a binary tree. A binary tree consists of
a. a node (called the root node), and
b. left and right sub-trees;
c. both sub-trees are themselves binary trees.

[Figure: a binary tree.]

The nodes at the lowest levels of the tree (the ones with no sub-trees) are called leaves.

In an ordered binary tree,
1. the keys of all the nodes in the left sub-tree are less than that of the root,
2. the keys of all the nodes in the right sub-tree are greater than that of the root, and
3. the left and right sub-trees are themselves ordered binary trees.

Traversal methods

There are many different applications of trees. As a result, there are many different algorithms for manipulating them. However, many of the different tree algorithms have in common the characteristic that they systematically visit all the nodes in the tree. That is, the algorithm walks through the tree data structure and performs some computation at each node in the tree. This process of walking through the tree is called a tree traversal. There are three types of walks or traversals in a tree: pre-order, in-order and post-order.
· Pre-order walk: each parent node is traversed before its children;
· Post-order walk: the children are traversed before their respective parents;
· In-order walk: a node's left subtree is traversed, then the node itself, and then its right subtree.

Stepping through the items of a tree, by means of the connections between parents and children, is called walking the tree, and the action is a walk of the tree. Often, an operation might be performed when a pointer arrives at a particular node.
A walk in which each parent node is traversed before its children is called a pre-order walk; a walk in which the children are traversed before their respective parents is called a post-order walk; and a walk in which a node's left subtree, then the node itself, and then finally its right subtree are traversed is called an in-order walk. (This last scenario, referring to exactly two subtrees, a left subtree and a right subtree, assumes specifically a binary tree.)

Preorder Traversal

The first depth-first traversal method we consider is called preorder traversal. Preorder traversal is defined recursively as follows. To do a preorder traversal of a general tree:
1. Visit the root first; and then
2. Do a preorder traversal of each of the subtrees of the root, one by one, in the order given.

Preorder traversal gets its name from the fact that it visits the root first. In the case of a binary tree, the algorithm becomes:
1. Visit the root first; and then
2. Traverse the left subtree; and then
3. Traverse the right subtree.

Notice that a preorder traversal visits the nodes of the tree in precisely the same order in which they are written. A preorder traversal is often done when it is necessary to print a textual representation of a tree.

Postorder Traversal

The second depth-first traversal method we consider is postorder traversal. In contrast with preorder traversal, which visits the root first, postorder traversal visits the root last. To do a postorder traversal of a general tree:
1. Do a postorder traversal of each of the subtrees of the root, one by one, in the order given; and then
2. Visit the root.

To do a postorder traversal of a binary tree:
1. Traverse the left subtree; and then
2. Traverse the right subtree; and then
3. Visit the root.

Inorder Traversal

The third depth-first traversal method is inorder traversal. Inorder traversal only makes sense for binary trees.
Whereas preorder traversal visits the root first and postorder traversal visits the root last, inorder traversal visits the root in between visiting the left and right subtrees:
1. Traverse the left subtree; and then
2. Visit the root; and then
3. Traverse the right subtree.

Common operations on Trees
· Enumerating all the items
· Enumerating a section of a tree
· Searching for an item
· Adding a new item at a certain position on the tree
· Deleting an item
· Removing a whole section of a tree (called pruning)
· Adding a whole section to a tree (called grafting)
· Finding the root for any node

Common uses of Trees
· Manipulating hierarchical data
· Making information easy to search
· Manipulating sorted lists of data
· As a workflow for compositing digital images for visual effects

General n-ary trees

If we relax the restriction that each node can have only one key, we can reduce the height of the tree. An m-way search tree
a. is empty, or
b. consists of a root containing j (1
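The three traversal orders defined above can be sketched recursively for a small ordered binary tree. This is an illustrative sketch; the class and function names are assumptions, and the keys are chosen so that the in-order walk returns them in sorted order.

```python
class TreeNode:
    """A binary tree node with a key and optional left/right subtrees."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def preorder(node, visit):
    """Visit the root first, then the left subtree, then the right subtree."""
    if node is not None:
        visit(node.key)
        preorder(node.left, visit)
        preorder(node.right, visit)

def inorder(node, visit):
    """Visit the left subtree, then the root, then the right subtree."""
    if node is not None:
        inorder(node.left, visit)
        visit(node.key)
        inorder(node.right, visit)

def postorder(node, visit):
    """Visit the left subtree, then the right subtree, then the root last."""
    if node is not None:
        postorder(node.left, visit)
        postorder(node.right, visit)
        visit(node.key)

# An ordered binary tree: keys in each left sub-tree are less than the
# root's key, and keys in each right sub-tree are greater.
root = TreeNode(4,
                TreeNode(2, TreeNode(1), TreeNode(3)),
                TreeNode(6, TreeNode(5), TreeNode(7)))

out = []
inorder(root, out.append)
print(out)   # → [1, 2, 3, 4, 5, 6, 7]
```

Because the tree is ordered, the in-order walk yields the keys in sorted order, while the preorder walk starts at the root (4) and the postorder walk ends at it.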
