Data Structures and Algorithms (DSA) PDF

UNIT I Introduction: Definition and types of Data structures- Asymptotic notations – Complexity analysis – Arrays- Representation of arrays. Linked lists: Singly linked list- Circular linked lists – Doubly linked lists. Stacks: Operations on stack, array representation of a stack, Applications:Conversion of Infix to Postfix Notation- Evaluation of Postfix expression - Queues: Operations on queues - Circular queues. --------------------------------------------------------------------------------------------------------------------- INTRODUCTION The major tasks done in any computer system are based on 3 activities-write, read and update of data. People need to have an abstract thinking to understand the principles and techniques to solve the vast array of unfamiliar problems that arise in a rapidly changing field. Even a good programmer may unable to write new algorithms for the given problem if he has no knowledge in data structures. “Data structure is the organization of data elements in a specific manner and functions are defined to store and retrieve individual data elements in it”. The classification of data structure is as given below: A linear data structure is one in which the elements have a linear ordering between them (ie) an element can be marked as first, second, third etc. A nonlinear data structure is one in which such an ordering is not present. ASYMPTOTIC NOTATIONS Asymptotic Notations are the languages that allow analysing an algorithm’s running time by identifying its behaviour based upon its input size. This is also known as an algorithm’s growth rate. This is to study how the execution time may increase or decrease according to the input size. In asymptotic analysis, only the largest term in the operations count is considered and the constant multipliers are ignored. Eg: Consider the following time complexities of two algorithms... Algorithm 1: 5n2 + 2n + 1 Algorithm 2: 10n2 + 8n + 3 Generally, when we analyse an algorithm, we consider the time complexity for larger values of input data (i.e. 'n' value). In above two-time complexities, for larger value of 'n' the term in algorithm 1- '2n + 1' has least significance than the term '5n2', and the term in algorithm 2- '8n + 3' has least significance than the term '10n2'. So, for larger value of 'n' we ignore the least significant terms to represent overall time required by an algorithm. In asymptotic notation, we use only the most significant terms to represent the time complexity of an algorithm. Types of Asymptotic Notation: Let us imagine an algorithm as a function f, n as the input size, and f(n) being the running time. This result in a graph where the X axis is the input size, Y axis is the runtime, and plot points are the resultants of the amount of time for a given input size. Consider function f(n) the time complexity of an algorithm and g(n) is the most significant term. These are some basic function growth classifications used in various notations. 1. Big-Oh Notation (O): Big-O, is an asymptotic notation for the worst case, or ceiling of growth for a given function. It provides us with maximum time taken for the runtime of an algorithm (ie) describes the worst case of an algorithm time complexity. If f(n) = 1, C > 0, then we can represent f(n) as O(g(n)). f(n) = O(g(n)) 2. Big-Omega notation (Ω): Big Omega notation is used to define the lower bound of an algorithm in terms of time complexity. It always indicates the minimum time required by an algorithm for all input values (ie) describes the best case of an algorithm time complexity. If f(n) >= Cg(n) for all n >= n0, C > 0 and n0 >= 1, then we can represent f(n) as Ω(g(n)). In above graph after a particular input value n0, always Cg(n) is less than f(n) which indicates the algorithm's lower bound. f(n) as Ω(g(n)) 3. Big - Theta Notation (Θ): Big - Theta notation is used to define the average bound of an algorithm in terms of time complexity (ie) the average time required by an algorithm for all input values. If C1g(n) 0 and n0 >= 1, then we can represent f(n) as Θ(g(n)). In the below graph after a particular input value n0, always C1g(n) is less than f(n) and C2g(n) is greater than f(n) which indicates the algorithm's average bound. f(n) = Θ(g(n)) The following list starts at the slowest growing function (logarithmic, fastest execution time) and goes on to the fastest growing (exponential, slowest execution time). Notice that as ‘n’, or the input, increases in each of those functions, the result clearly increases much quicker in quadratic, polynomial, and exponential, compared to logarithmic and linear. Efficiency Big-Oh Logarithmic O(log n) Linear O(n) Linear Logarithmic O(nlogn) Quadratic O(n2) Polynomial O(nk) Exponential O(Cn) Factorial O(n!) COMPLEXITY ANALYSIS The performance of an algorithm can be measured on the scales of time and space. SPACE COMPLEXITY: The space complexity can be defined as the amount of memory required by an algorithm (or) the function of space needed by the algorithm to run to complete. Regarding space factor, the expectation is that the algorithm should assume limited memory space. Consider the following three examples of algorithms to compute space complexity. Eg 1: Algorithm Add(a,b,c) return (a+b+c) If each a,b,c occupies one word size, the memory requirement is S(p)=3. Eg 2: Algorithm Add(x,n) sum=0.0 for i=1 to n do sum=sum+x[i] return (sum) The space requirement S(p) >=(n+3) where 3 is for n,i and sum. Eg 3: Algorithm Add(x,n) return (Add(x,n-1)+x[n]) The internal stack used for recursion includes space for formal parameters, actual parameters and return address. The space required by each call to function add requires 3 word for each function call (space for n values + space for return address + pointer to x[]). The length of recursion is (n+1) (ie) n times call to function and one return call to the main function. Hence S(p)=3(n+1) for the above algorithm. TIME COMPLEXITY: Regarding the factor time, the expectation is the fastest algorithm in minimum possible time. The time complexity of an algorithm is a function of the running time of an algorithm. It is difficult to compute the time complexity in terms of physically clocked time. Some of the factors that have an impact over time are system load, number of other tasks currently running, instruction set used, speed of underlying hardware etc. Time complexity measurements are: 1. Empirical or Posteriori testing: This approach executes the algorithm for various instances of the problem and then compares those results; which result is yielded in least time is considered as the best. The demerit of this method is it is totally machine dependent, programming language and even the programmer dependent. 2. Apriori or Theoretical testing: This approach calls for mathematically determining the resources (time, space) as a function of a parameter related to the instances of the problem considered. The advantage of this method is it is entirely machine, language and programmer independent. Time efficiency is analyzed by determining the number of repetitions of the basic operations as a function of input size. Basic operation is the operation that contributes most towards the running time of the algorithm. Hence T(n)=CopC(n) where T(n)=Running time, Cop=Execution of basic operation and C(n)=Number of times the basic operation is executed. Problem Input Size Measure Basic Operation Searching a value in a list N elements Key comparison Multiplication of two Matrix Multiplication Two nxn matrices numbers Number of vertices and Graph Problems Visiting a vertex edges ARRAYS An array refers to the homogeneous collection of elements stored contiguously in memory. The general features of an array are: A finite collection of elements. All elements of an array are of same data type. It is an ordered/linear collection of elements since we can label the elements as first, second, third etc. Each element of an array is stored adjacent to each other following contiguous memory allocation. Arrays follow static allocation of memory (ie) once the size of the array is declared, it cannot be expanded nor contracted. All elements are referred by same and single name. It is a random-access data structure (ie) any element can be picked up and manipulated without sequential traversal. Individual element of an array is addressed by an integer called as index or subscript. It denotes the relative position of the element within the array from the starting position of the array. The first index of the array is called as Lower Bound while the last index of the array is called as Upper Bound. Bound depends upon the programming language. Eg: char A[1:5] a b c d e Types of Arrays and its Representation in Memory According to the number of indices, the arrays can be broadly classified into three types as One-Dimensional Array, Two-Dimensional Array and Multi-Dimensional Array. One-Dimensional Array: It is the simplest form of array. It indicates a linear set of elements. It is declared as: datatype arrayname[size]; Eg:int a; defines an integer array of size 100. The array declaration involves both the type of element that will be contained in the array as well as the maximum number of elements that will be stored inside the array. It is also called as Vectors. a can be written as a[1:100] or a[0:99]. Number of elements in the array=u-l+1 Address of any ith element in the array=α+(i-l) where u=upper bound, l=lower bound and α=base or starting address of the array. Eg:-The following figure shows the representation of one-dimensional array in memory. char A; a b c d e Here u=5,l=1,α=100 Number of elements in A=5-1+1=5 Address of A=100+(3-1)=100+2=102. Two-Dimensional Array: They are popularly called as matrices. As the name implies, this type of array is having two indices-one for row and other for column. It is usually denoted as Arrayname[l1:u1,l2:u2] where l1=lower bound of row, u1=upper bound of row, l2=lower bound of column, u2=upper bound of column. Eg: Representation of 3 subject marks for 4 students may be given as the following matrix. 80 95 92 85 76 89 91 90 98 77 65 35 2D arrays are logical representation while the memory storage is single dimension. Hence there should be a mapping between 2D representation and single dimension. This mapping should suggest a systematic way of storing the elements of a two-dimensional array into a one-dimensional array. This can be done in two ways as row-major representation in which the row elements gain priority for storage while in column-major- representation, the column elements gain priority for storing in memory. Eg: Row-Major Representation of the above matrix 80 95 92 85 76 89 91 90 98 77 65 35 Column-Major Representation of the above matrix 80 85 91 77 95 76 90 65 92 89 98 35 Number of elements in the array=(u1-l1+1)(u2-l2+1) Address of any (i,j)th element in the array=α+(i-l1) (u2-l2+1)+(j-l2) where α=base or starting address of the array. Eg: int A[4,3] is a matrix with 4 rows and 3 columns. Consider the values at the matrix as, 80 95 92 85 76 89 91 90 98 77 65 35 With l1=1,u1=4,l2=1,u2=3, α=100 Number of elements in the above array=(4-1+1)(3-1+1)=12 Address of [2,3]rd element in the given array=100+(2-1)(3-1+1)+(3-1)=100+(1)(3)+2=105 Multi-Dimensional Array: If the index of the array goes beyond 2, it is normally referred as multi-dimensional array. Eg: To denote the x,y,z axis in image processing, the array can be defined as A[x,y,z]. LINKED LISTS The size of the array is declared and finalized at the compilation itself. We anticipate a higher bound and preserve the memory-but most of the time, we don’t use all. This problem of memory under-utilization can be solved if we allocate memory only when there is a need. This is called as dynamic memory allocation. Here the elements are stored in non- contiguous fashion. So, to maintain the order of elements, we need to know the address of each element. So, a node will store the address of its successive node and the last node will have NULL. These types of data structure are called linked list. They are examples of linear data structures as they permit linear access of element one after other; they are also dynamic data structures as their size can grow or shrink during execution. “Linked list is the natural data abstraction that arises in problems where a data structure maintains a collection of elements without knowing in advance about the number of elements”. Limitations of arrays 1. Inefficient implementation of insertion and deletion since multiple elements have to be move backward and forward respectively which incurs more cost if the array size is large. 2. Inefficient use of memory since array requires contiguous memory even though most of the allotted space remains unused. 3. Array can store only singleton element in its location. Advantages of Linked list 1. Linked list are dynamic data structure-they can grow or shrink during the execution of a program. 2. Efficient memory utilization-In linked list (or dynamic) representation, memory is not pre-allocated. Memory is allocated whenever it is required and it is de-allocated (or removed) when it is not needed. 3. Insertion and deletion are easier and efficient. Linked list provides flexibility in inserting a data item at a specified position and deletion of a data item from the given position. 4. Many complex applications can be easily carried out with linked list. Limitations of Linked List 1. More memory: More memory space is needed even for storing a single data since a simple node should have two parts. 2. Access to an arbitrary data item is little bit cumbersome and also time consuming because it purely supports sequential traversal. Operations on Linked List: The primitive operations performed on the linked list are as follows: 1. Creation operation is used to create a linked list. 2. Insertion operation is used to insert a new node at any specified location in an existing linked list. A new node may be inserted, (a) At the beginning of the linked list (b) At the end of the linked list (c) At any specified position in between in a linked list 3. Deletion operation is used to delete an item (or node) from the linked list. A node may be deleted from the (a) Beginning of a linked list (b) End of a linked list (c) Specified location of the linked list 4. Traversing is the process of going through all the nodes from one end to another end of a linked list. In a singly linked list, we can visit from left to right (ie) traversing only in forward direction. But in doubly linked list, forward and backward traversing are possible. 5. Searching is the process of picking up desired element from the list. 6. Concatenation is the process of appending the second list to the end of the first list. Consider a list A having n nodes and B with m nodes. Then the operation concatenation will place the 1st node of B in the (n+1)th node in A. After concatenation A will contain (n+m) nodes. Mechanisms needed to create linked lists 1. Determine which nodes are free and which have been allotted for use. 2. A mechanism to frame chunks of memory into nodes with required number of data items and fields. 3. Obtain nodes from free storage area [GETNODE(X)]. 4. Return/dispose of nodes again to free pool area [RETURN(X)]. Note: GETNODE() and RETURN() are OS system calls Hence dynamic allocation of memory solves the problem of static allocation of arrays. The classification of linked list done on the basis of the arrangement of nodes is as follows: LINKED LIST SINGLY LINKED LIST DOUBLY LINKED LIST CIRCULARLY LINKED LIST SINGLY LINKED LIST A linked list is a linear collection of specially designed data elements, called nodes, linked to another by means of pointers. Each node is divided into two parts: the first part contains information of the element and the second part contains the address of the next node in the linked list. Address part of the node is also called link or next field. As the name implies only one pointer is in every node. The above diagram shows a linked list with 3 nodes. Each node is divided into two parts. The left part of each node contains the data items and the right part represents the address of the next node; there is an arrow drawn from it to the next node. The next pointer of the last node contains a special value, called the NULL pointer, which does not point to any address of the node (ie) NULL pointer indicates the end of the linked list. START pointer will hold the address of the 1st node in the list START =NULL if there is no list (i.e.; NULL list or empty list). The data field can be integer, character, a string or even a large record. Insert Operation in Singly Linked List The below procedure is used to insert a new node X to the right of an existing node NODE in singly linked list whose first node is pointed by START pointer. It is mandatory to tell the position after or before the insertion has to be performed. Procedure INSERT (START, NODE, ITEM) Call GETNODE(X); DATA(X) = ITEM; If (START=NULL) then { LINK(X)=NULL; START=X; } else { LINK(X)=LINK(NODE); LINK(NODE)=X; } End INSERT. NODE LINK(NODE) X Item 1 2 NODE LINK(NODE) X Delete Operation in Singly Linked List The below procedure is used to delete a node which is right of NODE in the existing singly linked list. It is possible to delete a node if and only if its previous node is known. Procedure DELETE (START, NODE) If (START=NULL) then Call ABONDON-DELETE; else { TEMP=LINK(NODE); LINK(NODE)=LINK(TEMP); Call RETURN(TEMP); } End DELETE. NODE TEMP LINK(TEMP) NODE TEMP X X DOUBLY LINKED LIST A doubly-linked list is a linked data structure consists of a set of sequentially linked records called nodes. As the name implies, each node contains two fields, called links (LLINK, RLINK), which are references to the previous and to the next node in the sequence of nodes. The beginning and ending nodes' previous and next links, respectively, point to some kind of terminator, typically a sentinel node or null, to facilitate traversal of the list. It can be conceptualized as two singly linked lists formed from the same data items, but in opposite sequential orders. The two-node links allow traversal of the list in either direction. Advantages: 1. The availability of 2 links LLINK and RLINK permit forward and backward movement during the processing of list. 2. The deletion of a node X from the list only for the value X to be known. Disadvantages: 1. Obviously, we are in need of more space to construct doubly linked list as a minimum of 3 parts are needed to construct a node. While adding or removing a node in a doubly-linked list requires changing more links than the same operations on a singly linked list, the operations are simpler and potentially more efficient (for nodes other than first nodes) because there is no need to keep track of the previous node during traversal or no need to traverse the list to find the previous node, so that its link can be modified. The availability of two links LLINK and RLINK permit forward and backward movement during the processing of list. Insert Operation in Doubly Linked List The below procedure is used to insert a new node X whose data part is ITEM to the right of an existing node Y in doubly linked list. Procedure INSERT-DL(X,Y,ITEM) Call GETNODE(X); DATA(X)=ITEM; LLINK(X)=Y; RLINK(Y)=X; LLINK(RLINK(Y))=X; RLINK(X)=RLINK(Y); End INSERT-DL. Y RLINK(Y) LLINK RLINK LLINK RLINK X 1 LLINK RLINK 4 Y 3 2 RLINK(Y) X LLINK RLINK LLINK RLINK Delete Operation in Doubly Linked List The below procedure is used to delete a node from a doubly linked list START. Procedure DELETE-DL(START,X) if (START=NULL) then Call ABONDON-DELETE; else { RLINK(LLINK(X))=RLINK(X); LLINK(RLINK(X))=LLINK(X); Call RETURN(X); } End DELETE-DL. LLINK(X) X RLINK(X) LLINK RLINK LLINK RLINK LLINK RLINK 1 LLINK(X) X RLINK(X) LLINK RLINK X LLINK RLINK X LLINK RLINK 2 CIRCULAR LINKED LIST A variant of the singly linked list is circular linked list in which the last node’s link part is pointing to the first node. Advantages:  Accessibility of a node- one can access any node from any given node due to the circular movement permitted by links.  Relative efficiency in the implementation of list-based operations like merging of 2 lists, splitting of a list into two, erasing a list etc. Disadvantages:  During processing, one has to make sure that one does not get into an infinite loop owing to the circular nature of the queue. A solution to the infinite loop in circular linked list is having a special node called header nodes. So, the list may be called a Headed circularly linked list or circular linked list with head node. The list can never be empty and will have a empty head node. If LINK(HEAD)=HEAD, it implies that the circular list is empty. Usually, the data field of the header node is empty, represented by a shaded portion in diagram. Insert Operation in Circular Linked List The following algorithm will insert a new node X to the right of the existing element NODE in a list. Procedure INSERT_CLL (HEAD,X,NODE,ITEM) Call GETNODE(X); DATA(X)=ITEM; IF (LINK(HEAD)=HEAD) THEN { LINK(HEAD)=X; LINK(X)=HEAD; } ELSE { LINK(X)=LINK(NODE); LINK(NODE)=X; } End INSERT_CLL. NODE LINK( NODE) X Item 1 2 NODE LINK( NODE) X Delete Operation in Circular Linked List The following procedure will delete the node existing to the right of the element NODE from a circular linked list. Procedure DELETE_CLL(HEAD,NODE) IF (HEAD=NULL) THEN CALL ABONDON-DELETE; ELSE { TEMP=LINK(NODE) LINK(NODE)=LINK(TEMP) CALL RETURN(TEMP) } End DELETE_CLL. NODE TEMP LINK( TEMP) NODE TEMP X X STACKS Linear Lists allow the insertion and deletion of elements at any position. If we put a restriction on the lists such that the insertion and deletion can occur only at one end, we get an important subclass of lists called as Stack. Eg: Pile of coins, Pile of plates, Coaches of a train. In stack, the user can add or delete the element only from one end called as Top of the stack. Manipulation is done only at the top item. Eg: TOP 21 TOP 12 12 TOP 12 10 10 10 TOP 10 3 3 3 3 A new element 21 is added at the top of the stack. Similarly, the element to be deleted is also 21-the top of the stack. During deletion, the element 12 is removed. The insertion of the element is called as PUSH and the deletion of the element is called as POP. It needs to be observed that during insertion of elements into the stack it is essential that their identities are specified, whereas for removal no identity need be specified since by virtue of its functionality, the element which occupies the top of stack position is automatically removed. Since the elements inserted into the stack join last and those added last are the first to be removed, the stack belongs to LIFO (Last In First Out) data structure. The stack can be represented as an array STACK[n] i.e., it can store “n” elements. It is therefore becoming essential to issue a warning called STACK-FULL when we try to insert more elements (PUSH) than “n”. Similarly, when we try to delete an element (POP) from the stack which is already empty, a warning should be issued to indicate STACK-EMPTY. Therefore, overflow and underflow should be properly indicated to the users during push and pop operations respectively. Primitive operations on Stack PUSH OPERATION: Let STACK [1:n] be an array representing stack and top be the pointer indicating the top element of the stack. Let top is initialized to 0, “item” is the element to be pushed into the stack and “n” is the maximum capacity of the stack. STACK-FULL is the procedure used to take remedial action for overflow of stack and print respective error message. 5 TOP 55 5 TOP 55 5 TOP 44 4 44 4 44 4 33 3 33 3 33 3 22 2 22 2 22 2 11 1 11 1 11 1 TOP=4, n=5 PUSH 55, TOP=4 PUSH 66, Failure! Procedure PUSH (STACK,N,TOP,ITEM) IF (TOP=N) THEN CALL STACK-FULL; ELSE { TOP=TOP+1; STACK[TOP]=ITEM; } End PUSH. POP OPERATION: In case of POP, no element identity need be specified since by default the element occupying the top of stack position is deleted. STACK-EMPTY is the procedure used to do alternative measure to tackle underflow of stack by printing respective error message. Procedure POP (STACK,TOP,ITEM) IF (TOP=0) THEN CALL STACK-EMPTY; ELSE { ITEM=STACK[TOP]; Print “The element deleted is”, ITEM TOP=TOP-1; } End POP. 5 5 5 4 4 4 3 3 3 TOP 22 2 2 2 11 1 TOP 11 1 1 TOP=2, POP TOP=1, POP TOP=0, POP Failure! PEEP OPERATION: Peep is the operation used to display the top element of the stack without removing the element from the stack. Procedure PEEP(STACK,TOP) IF (TOP=0) THEN CALL STACK-EMPTY; ELSE Print “THE TOP ELEMENT=”,STACK[TOP] End PEEP. 3 3 TOP 22 2 TOP 22 2 11 1 11 1 TOP=2, PEEP 22 will be printed & TOP=2 APPLICATIONS OF STACK Stacks have been used in various computer science applications like recursion, syntax checking, checking of well formedness of parenthesis, conversion of infix to postfix expression, evaluation of postfix expression, memory management etc. a) Infix to Postfix Expression (Notation): An expression is the combination of operators and operands. It can be expressed in three forms as infix, prefix and postfix. They differ from each other in the position of operators. If the operator is placed in between operands, it is called as Infix expression. Eg: a+b If the operator is placed in before operands, it is called as Prefix expression. Eg: +ab If the operator is placed in after operands, it is called as Postfix expression. Eg: ab+ To convert an infix to postfix notation, we need to find out the order of execution of the operators based on their hierarchy. The precedence of operators is 1. () Parenthesis 2. ^ Exponential operator 3. * / Multiplication, Division 4. + - Addition, Subtraction Eg: Conversion of infix notation A+ (B-C/D)*E to postfix notation Note: The expression is ended with $ and the bottom of the stack is also padded with $. Procedure INFIX-POSTFIX-CONV(E) x=getnextchar(E); while x ≠ “$”do case x of x is an operand: Print “x”; x =”)”: While (top element of stack ≠ ”(“) do Print top element of stack and pop; End While Pop “(“ from stack; Else: while ICP(x)ISP(*) B $*( AB Since B is an operand + $*(+ AB Since ICP(+)>ISP(() C $*(+ ABC Since C is an operand ) $* ABC+ Pop till “(” comes - $- ABC+* Since ICP(-)

Data Structures and Algorithms (DSA) PDF

Document Details

Tags

Related

Summary

Full Transcript