Data Structures Basics | Introduction to Data Structures and Algorithms | PDF

Summary

This document introduces fundamental data structures and algorithms in computer science. It explains arrays, linked lists, stacks, queues, trees, and graphs, along with their properties and uses. Searching and sorting algorithms are also discussed. Keywords include data structures, algorithms, and the different types of data storage.

Full Transcript

**UNIT-I** **Data Structures** - Definition, Classification of Data Structures, Operations on Data Structures, Abstract Data Type (ADT), Preliminaries of algorithms. Time and Space complexity. **Searching** - Linear search, Binary search **Sorting**- Insertion sort, Selection sort, Exchange (Bubb...

**UNIT-I** **Data Structures** - Definition, Classification of Data Structures, Operations on Data Structures, Abstract Data Type (ADT), Preliminaries of algorithms. Time and Space complexity. **Searching** - Linear search, Binary search **Sorting**- Insertion sort, Selection sort, Exchange (Bubble sort, quick sort), Merge sort. \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-- **Data Structures: Definition** Data structures are fundamental concepts in computer science that help organize, manage, and store data efficiently. They form the backbone of algorithms and programming, enabling better performance and problem-solving. **Classification of Data Structures:-** Data Structures are categorized into two types: Primitive and Non-Primitive. **Primitive Data Structures:**- These are the fundamental data types which are supported by a programming language. Some of the basic data types are integer, real, character and boolean. **Non-primitive Data Structures:-** These are the data structures which are created using primitive data structures. It can further be classified into 2 categories: A. Linear Data Structures. B) Non-Linear Data Structures. A. **Linear Data Structures:-** A data structure is called linear if all of its elements are arranged in linear order. It can be represented in memory in 2 different ways: - Linear relationship between elements by means of sequential memory locations. - Linear relationship between elements by means of links. It can further be classified into 4 Types: 1\) Arrays. 2) Linked Lists. 3) Stacks. 4) Queues. **1) Arrays:-**An array is a collection of similar type of data items and each data item is called an element of the array. ![Creating Arrays in Your Programs - Dev.java](media/image2.png) The data type of the element may be any valid data type like char, int, float or double. The elements of array share the same variable name but each one carries a different index number known as subscript. The array can be one dimensional, two dimensional or multidimensional. **Disadvantages:-** It have fixed size. Data elements are stored in contiguous memory locations which may not be always available. **2) Linked Lists:-** Linked list is a linear data structure which is used to maintain a list in the memory. It can be seen as the collection of nodes stored at non-contiguous memory locations. Each node of the list contains a pointer to its adjacent node. Last node contains the NULL pointer indicates the end of the list. Basic Terminologies of Linked List - GeeksforGeeks **Advantage**:- Easier to insert or delete data elements. **Disadvantage**:- Slow search operation and requires more memory space. **3) Stacks:-** Stack is a linear list in which insertion and deletions are allowed only at one end called top. ![Define stack - Computer Science Class 12 - Teachoo - Past Year - 2 Mar](media/image4.png) MAX is variable used to store the maximum number of elements that stack can store. A stack is an abstract data type (ADT), can be implemented in most of the programming languages. It is also called as **LIFO(Last-In-First-Out).** **Push(), pop(), peep()** are 3 basic operations that stack supports. If top=NULL indicates stack is empty and **top=MAX-1**, indicates stack is full. **4) Queues:-** Queue is a linear list in which elements can be inserted only at one end called rear and deleted only at the other end called front. Queue is opened at both end therefore it follows **First-In-First-Out (FIFO**) methodology for storing the data items. MAX is variable used to store the maximum number of elements that queue can store. If rear=MAX-1, indicates queue is full. If front=NULL and rear=NULL indicates queue is empty. **Non Linear Data Structures:-** 1)Trees. 2)Graphs. **Trees:-** ![](media/image6.jpeg) The bottom most nodes in the hierarchy are called leaf node while the top most node is called root node. Each node contains pointers to point adjacent nodes. If root=NULL , indicates tree is empty. It consists of root node, left and right sub-trees. These subtrees are also binary trees. **Graphs:-** Graphs can be defined as the pictorial representation of the set of elements (represented by vertices) connected by the links known as edges. It is different from tree i.e. a graph can have cycle while the tree cannot have the one. **OPERATIONS ON DATA STRUCTURES:** **The different operations that can be performed on the various data structures previously mentioned.** **1.Traversing:** **It means to access each data item exactly once so that it can be processed. For example, to print the names of all the students in a class.** **2. Searching:** **It is used to find the location of one or more data items that satisfy the given constraint. Such a data item may or may not be present in the given collection of data items. For example, to find the names of all the students who secured 100 marks in mathematics.** **3. Inserting:** **It is used to add new data items to the given list of data items. For example, to add the details of a new student who has recently joined the course?** **4. Deleting*:*** **It means to remove (delete) a particular data item from the given collection of data items. For example, to delete the name of a student who has left the course?** **5. Sorting:** **Data items can be arranged in some order like ascending order or descending order depending on the type of application. For example, arranging the names of students in a class in an alphabetical order, or calculating the top three winners by arranging the participants' scores in descending order and then extracting the top three.** **6.Merging:** **Lists of two sorted data items can be combined to form a single list of sorted data** **items.** **Many a time, two or more operations are applied simultaneously in a given situation.** **For example, if we want to delete the details of a student whose name is X, then we first have to search the list of students to find whether the record of X exists or not and if it exists then at which location, so that the details can be deleted from that particular location.** **ABSTRACT DATA TYPE:** **An *abstract data type* (ADT) is the way we look at a data structure, focusing on what it does and ignoring how it does its job. For example, stacks and queues are perfect examples of an ADT. We can implement both these ADTs using an array or a linked list. This demonstrates the 'abstract' nature of stacks and queues.** **To further understand the meaning of an abstract data type, we will break the term into 'data type' and 'abstract', and then discuss their meanings.** **Data type:** **Data type of a variable is the set of values that the variable can take. We have already read the basic data types in C include int, char, float, and double. When we talk about a primitive type (built-in data type), we actually consider two things: a data item with certain characteristics and the permissible operations on that data. For example, an int variable can contain any whole-number value from --32768 to 32767 and can be operated with the operators +, --, \*, and /. In other words, the operations that can be performed on a data type are an inseparable part of its identity. Therefore, when we declare a variable of an abstract data type (e.g., stack or a queue), we also need to specify the operations that can be performed on it.** **Abstract:** **The word 'abstract' in the context of data structures means considered apart from the detailed specifications or implementation.** **In C, an abstract data type can be a structure considered without regard to its implementation.** **It can be thought of as a 'description' of the data in the structure with a list of operations that can be performed on the data within that structure.** **The end-user is not concerned about the details of how the methods carry out their tasks. They are only aware of the methods that are available to them and are only concerned about calling those methods and getting the results. They are not concerned about how they work.** **For example, when we use a stack or a queue, the user is concerned only with the type of data and the operations that can be performed on it. Therefore, the fundamentals of how the data is stored should be invisible to the user. They should not be concerned with how the methods work or what structures are being used to store the data. They should just know that to work with stacks, they have push() and pop() functions available to them. Using these functions, they can manipulate the data (insertion or deletion) stored in the stack.** **Advantage of using ADTs:** **In the real world, programs evolve as a result of new requirements or constraints, so a modification to a program commonly requires a change in one or more of its data structures. For example, if you want to add a new field to a student's record to keep track of more information about each student, then it will be better to replace an array with a linked structure to improve the program's efficiency. In such a scenario, rewriting every procedure that uses the changed structure is not desirable. Therefore, a better alternative is to separate the use of a data structure from the details of its implementation. This is the principle underlying the use of abstract data types.** **PRELIMINARIES OF ALGORITHMS:** **The typical definition of algorithm is 'a formally defined procedure for performing some calculation'. An algorithm is basically a set of instructions that solve a problem. In general terms, an algorithm provides a blueprint to write a program to solve a particular problem. It is considered to be an effective procedure for solving a problem in finite number of steps. That is, a well-defined algorithm always provides an answer and is guaranteed to terminate.** **Characteristics of an Algorithm:** **I.Input: Externally we have to supply '0' or '1' input.** **II.Output: After supplying of input we have to produce at least one output.** **III.Finiteness: The algorithm must be terminate (or) end at some point.** **IV.Definiteness: We have to define clear and unambiguous statements.** **V.Effectiveness: It should allow only necessary statements, need not to allow unnecessary statements.** **Different approaches to designing an algorithm:** **There are two main approaches to design an algorithm. They are: top-down approach and bottom-up approach, as shown below.** ![](media/image8.png) **Top-down approach: A top-down design approach starts by dividing the complex algorithm into one or more modules. These modules can further be decomposed into one or more sub- modules, and this process of decomposition is iterated until the desired level of module complexity is achieved. Top-down design method is a form of stepwise refinement where we begin with the topmost module and incrementally add modules that it calls. Therefore, in a top-down approach, we start from an abstract design and then at each step, this design is refined into more concrete levels until a level is reached that requires no further refinement.** **Bottom-up approach: A bottom-up approach is just the reverse of top-down approach. In the bottom-up design, we start with designing the most basic or concrete modules and then proceed towards designing higher level modules. The higher level modules are implemented by using the operations performed by lower level modules. Thus, in this approach sub- modules are grouped together to form a higher level module. All the higher level modules are clubbed together to form even higher level modules. This process is repeated until the design of the complete algorithm is obtained.** TIME AND SPACE COMPLEXITY: -------------------------- Analysing an algorithm means determining the amount of resources (such as time and memory) needed to execute it. Algorithms are generally designed to work with an arbitrary number of inputs, so the efficiency or complexity of an algorithm is stated in terms of time and space complexity. The ***time complexity*** of an algorithm is basically the running time of a program as a function of the input size. Similarly, the ***space complexity*** of an algorithm is the amount of computer memory that is required during the program execution as a function of the input size. In other words, the number of machine instructions which a program executes is called its time complexity. This number is primarily dependent on the size of the program's input and the algorithm used. Generally, the space needed by a program depends on the following two parts: - **Fixed part:** It varies from problem to problem. It includes the space needed for storing instructions, constants, variables, and structured variables (like arrays and structures). - **Variable part: It varies from program to program. It includes the space needed for recursion stack, and for structured variables that are allocated space dynamically during the runtime of a program.** **However, running time requirements are more critical than memory requirements.** **Therefore, in this section, we will concentrate on the running time efficiency of algorithms.** **Worst-case, Average-case, Best-case, and Amortized Time Complexity: Worst-case running time*:*** **This denotes the behavior of an algorithm with respect to the worst possible case of the input instance. The worst-case running time of an algorithm is an upper bound on the running time for any input. Therefore, having the knowledge of worst-case running time gives us an assurance that the algorithm will never go beyond this time limit.** **Average-case running time:** **The average-case running time of an algorithm is an estimate of the running time for an 'average' input. It specifies the expected behavior of the algorithm when the input is randomly drawn from a given distribution. Average-case running time assumes that all inputs of a given size are equally likely.** **Best-case running time*:*** **The term 'best-case performance' is used to analyze an algorithm under optimal conditions. For example, the best case for a simple linear search on an array occurs when the desired element is the first in the list. However, while developing and choosing an algorithm to solve a problem, we hardly base our decision on the best-case performance. It is always recommended to improve the average performance and the worst-case performance of an algorithm.** **The commonly used notations used for calculating the running time complexity of an algorithm is given below:** **1)Big oh Notation (Ο). 2)Omega Notation (Ω). 3)Theta Notation (θ).** **Big oh Notation (O):-** - **It is the formal way to express the upper boundary of an algorithm running time.** - It measures the worst case of time complexity or the longest amount of time, algorithm takes to complete their operation. ### Omega Notation (Ω):- - It is the formal way to represent the lower bound of an algorithm\'s running time. - It measures the best amount of time an algorithm can possibly take to complete or the best case time complexity. ### Theta Notation (θ):- **Searching:-** **It means to find whether a particular value is present in array or not. If the value is present in the array, then searching is said to be successful and the searching process gives the location of that value in the array. If the value is not present in the array, the searching process displays an appropriate message and in this case searching is said to be unsuccessful.** **Algorithm:-** **linear search (a, n, val)** **Step1: START** **Step2: set pos=-1** **Step3: set i=0** **Step4: repeat step5 while i\ - - **Space Complexity:** - **Advantages:** - - - - **Dis advantages:** - - **Selection Sort** **Selection Sort** is a simple comparison-based sorting algorithm. It works by dividing the array into two parts: - The sorted part, which initially contains no elements. - The unsorted part, which initially contains all elements of the array. **Steps of Selection Sort:** 1. Start with the entire array as the unsorted portion. 2. Find the smallest (or largest) element from the unsorted part of the array. 3. Swap it with the first unsorted element. 4. Move the boundary of the unsorted portion one step to the right (i.e., the first element becomes part of the sorted portion). 5. Repeat steps 2 to 4 until the entire array is sorted. ![](media/image22.png) **Algorithm:-** SELECTION SORT(ARR, N) ---------------------- Step 1: Repeat Steps 2 and 3 for K = 1 to N-1 Step 2: CALL SMALLEST(ARR, K, N, POS) Step 3: SWAP A\[K\] with ARR\[POS\] \[END OF LOOP\] Step 4: EXIT SMALLEST (ARR, K, N, POS) ------------------------- Step 1: \[INITIALIZE\] SET SMALL = ARR\[K\] Step 2: \[INITIALIZE\] SET POS = K Step 3: Repeat for J = K+1 to N -1 IF SMALL \ ARR\[J\] SET SMALL = ARR\[J\] SET POS = J \[END OF IF\] \[END OF LOOP\] Step 4: RETURN POS **Time Complexity**: Best Case: O(n) (when already sorted). Worst Case:O(n\^2) Average Case: O(n\^2) **Space Complexity**: O(1) (in-place sorting). **Advantages of Selection Sort:** - Simple to implement. - It performs fewer swaps compared to other algorithms like Bubble Sort. **Disadvantages of Selection Sort:** - **Inefficient for large datasets** because of its O(n²) time complexity. - It's not stable, meaning equal elements might not preserve their relative order after sorting. **Exchange (Bubble sort, quick sort):** **Bubble sort:** It is also known as Sinking sort. - It is a very simple method that sorts the array elements by repeatedly moving the largest element to the highest index position of the array segment. - The consecutive adjacent pairs of elements in the array are compared with each other. - If the element at the lower index is greater than the element at the higher index, the two elements can be interchanged so that the element is placed before the bigger one. ### Technique:- - In Pass 1, A\[0\] is compared with A\[1\], A\[1\] is compared with A\[2\], A\[2\] is compared with A\[3\] and so on. At the end of pass 1, the largest element of the list is placed at the highest index of the list. - In Pass 2, A\[0\] is compared with A\[1\], A\[1\] is compared with A\[2\] and so on. At the end of Pass 2 the second largest element of the list is placed at the second highest index of the list. - In pass n-1, A\[0\] is compared with A\[1\], A\[1\] is compared with A\[2\] and so on. At the end of this pass. The smallest element of the list is placed at the first index of the list. array of n elements requires n-1 passes for sorting. The Algorithm:- =============== Step 4: EXIT For example, consider an array a\[\] can be initialized as a\[\] = {30,52,29,87,19} ---- ---- ---- ---- ---- 30 52 29 87 19 ---- ---- ---- ---- ---- ### Pass1: i. compare 30 & 52, 30 \< 52, no swapping done. ---- ---- ---- ---- ---- 30 52 29 87 19 ---- ---- ---- ---- ---- ii. compare 52 & 29, 52 \ 29, swapping done. ---- ---- ---- ---- ---- 30 29 52 87 19 ---- ---- ---- ---- ---- iii. compare 52 & 87, 52 \< 87, no swapping done. ---- ---- ---- ---- ---- 30 29 52 87 19 ---- ---- ---- ---- ---- iv. compare 87 & 19, 87 \> 19, swapping done. ---- ---- ---- ---- ---- 30 29 52 19 87 ---- ---- ---- ---- ---- The last comparison step list in the pass1 will be taken as consideration for next pass. ### Pass2: i. compare 30 & 29, 30 \> 29, swapping done. ---- ---- ---- ---- ---- 29 30 52 19 87 ---- ---- ---- ---- ---- ii. compare 30 & 52, 30 \< 52, no swapping done. ---- ---- ---- ---- ---- 29 30 52 19 87 ---- ---- ---- ---- ---- iii. compare 52 & 19, 52 \> 19, swapping done. ---- ---- ---- ---- ---- 29 30 19 52 87 ---- ---- ---- ---- ---- iv. compare 52 & 87, 52 \< 87, no swapping done. ---- ---- ---- ---- ---- 29 30 19 52 87 ---- ---- ---- ---- ---- The last comparison step list in the pass1 will be taken as consideration for next pass. ### Pass3: i. compare 29 & 30, 29 \< 30, no swapping done. ---- ---- ---- ---- ---- 29 30 19 52 87 ---- ---- ---- ---- ---- ii. compare 30 & 19, 30 \> 19, swapping done. ---- ---- ---- ---- ---- 29 19 30 52 87 ---- ---- ---- ---- ---- iii. compare 30 & 52, 30 \< 52, no swapping done. ---- ---- ---- ---- ---- 29 19 30 52 87 ---- ---- ---- ---- ---- iv. compare 52 & 87, 52 \< 87, no swapping done. ---- ---- ---- ---- ---- 29 19 30 52 87 ---- ---- ---- ---- ---- The last comparison step list in the pass1 will be taken as consideration for next pass. ### ### Pass4: compare 29 & 19, 29 \> 19, swapping done. ---- ---- ---- ---- ---- 19 29 30 52 87 ---- ---- ---- ---- ---- compare 29 & 30, 29 \ - **Dis Advantages:** - - **Quick sort:** It is also known as partition exchange sort. This algorithm follows divide-and-conquer approach. Select an element pivot from the array elements. Rearrange the elements in the array in such a way that all elements that are less than the pivot appear before the pivot and all elements greater than the pivot element come after it. After such a partitioning, the pivot is placed in its final position. This is called the partition operation. **Algorithm:-** ![](media/image26.png) ![](media/image28.png) **Time Complexity:** **Worst-case: O(n^2^)**\ **best-case: O(nlogn)**\ **Average Case : O (n log n)** **Merge Sort:** - Merge sort is the algorithm which follows divide, conquer, and combine algorithmic approach. - **Divide** means partitioning the n-element array to be sorted into two sub-arrays by finding mid position in the list. If the array is of length 0 or 1, then it is easily sorted. - **Conquer** means sorting the two sub-arrays recursively using merge sort. - **Combine** means merging the two sorted sub-arrays to produce the sorted array of n elements. - The merge sort algorithm uses a function merge which combines the sub-arrays to form a sorted array. - While merge sort algorithm recursively divide the list in to smaller lists, the merge algorithm conquers the list to sort the elements in individual lists. - Finally the smaller lists are merged to form one list. ![](media/image31.png)Algorithm:- ================================= Step 1: \[INITIALIZE\] SET I = BEG, J = MID + 1, INDEX = 0 Step 2: Repeat while (I \

Use Quizgecko on...
Browser
Browser