Module-1 (DS) PDF
Document Details
Uploaded by TougherDwarf
KVT
Tags
Summary
This document introduces fundamental data structures concepts, focusing on data types and their characteristics within C programming. It describes various data types, including integers and floating-point values, and explains how they're used and stored. The content also briefly covers important details within data structure algorithms.
Full Transcript
UNIT - 1 Data: Data is derived from the word “Datum” meaning “fact”. Data means raw facts and figures. Data is also called as unprocessed information. Data is a set of qualitative or quantitative values. Data must be interp...
UNIT - 1 Data: Data is derived from the word “Datum” meaning “fact”. Data means raw facts and figures. Data is also called as unprocessed information. Data is a set of qualitative or quantitative values. Data must be interpreted by a human of machine to derive its meaning. So data doesn’t have meaning. It is not useful for business decision making. Data can take many forms Numeric data, alphanumeric data, text data, image data and audio e.t.c. Data Types: A data type is a term that refers to the type of data values that may be used for processing and computing. A data type defines a set of values that a variable can store and a set of operations that can be performed on that variable. C defines five basic data types: character, integer, floating-point, double floating-point, and valueless. These are declared using char, int, float, double, and void, respectively. 1. Integer data type: Integer data type allows a variable to store numeric values. “int” keyword is used to refer integer data type. The storage size of integer data type is 2 or 4 or 8 byte. It varies depend upon the processor in the CPU that we use. If we are using 16 bit processor, 2 byte (16 bit) of memory will be allocated for int data type. Like wise, 4 byte (32 bit) of memory for 32 bit processor and 8 byte (64 bit) of memory for 64 bit processor is allocated for int datatype. int (2 byte) can store values from -32,768 to +32,767, int (4 byte) can store values from - 2,147,483,648 to +2,147,483,647. If you want to use the integer value that crosses the above limit, you can go for “long int” and “long long int” for which the limits are very high. Note: We can’t store decimal values using int data type. If we use int data type to store decimal values, decimal values will be truncated and we will get only whole number. In this case, float data type can be used to store decimal values in a variable. 1 Character data type: Character data type allows a variable to store only one character. Storage size of character data type is 1 byte. We can store only one character using character data type. “char” keyword is used to refer character data type. For example, ‘A’ can be stored using char data type. You can’t store more than one character using char data type. Following table gives you details about standard integer types with its storage sizes and value ranges: Type Storage size Value range Char 1 byte -128 to 127 unsigned char 1 byte 0 to 255 signed char 1 byte -128 to 127 Int 2 or 4 bytes -32,768 to 32,767 or -2,147,483,648 to 2,147,483,647 unsigned int 2 or 4 bytes 0 to 65,535 or 0 to 4,294,967,295 Short 2 bytes -32,768 to 32,767 unsigned short 2 bytes 0 to 65,535 Long 4 bytes -2,147,483,648 to 2,147,483,647 unsigned long 4 bytes 0 to 4,294,967,295 Void 0 bytes Value less 3. Floating point data type: Floating point data type consists of 2 types. They are, float double 1. float: Float data type allows a variable to store decimal values. Storage size of float data type is 4 bytes. This also varies depend upon the processor in the CPU as “int” data type. We can use up-to 6 digits after decimal using float data type. For example, 10.456789 can be stored in a variable using float data type. 2. double: 1. Double data type is also same as float data type which allows up-to 10 digits after decimal. 1. Storage size of float data type is 8 bytes. Following table gives you details about standard floating-point types with storage sizes and value ranges and their precision: Type Storage size Value range Precision Float 4 bytes 1.2E-38 to 3.4E+38 6 decimal places Double 8 bytes 2.3E-308 to 1.7E+308 15 decimal places long double 10 bytes 3.4E-4932 to 1.1E+4932 19 decimal places 2 The void Type Void is an empty data type that has no value. The void type specifies that no value is available. This can be used in functions and pointers. It is used in three kinds of situations: S.N. Types and Description Function returns as void To specify the return type of a function (when the function returns no value). 1 A function with no return value has the return type as void. For example: void exit (int status); Function arguments as void 2 To specify the parameters of the function. A function with no parameter can accept as a void. For example, int rand(void); Pointers to void To create generic pointers (void*) 3 For example, a memory allocation function void *malloc( size_t size ); returns a pointer to void which can be casted to any data type. Data Structure: Data Structure is a way of collecting and organizing data in such a way that we can perform operations on these data in an effective way. A data structure is a particular way of organizing data in a computer so that it can be used efficiently. An implementation of abstract data type is a data structure i.e, a mathematical model or logical model of particular organization of data is called data structures. Classification of data sturctures: Different types of data structures: i) Primitive data structure ii) Non primitive data structure 1. Primitive Data Structures Primitive Data Structures are the basic data structures that directly operate upon the machine instructions. They have different representations on different computers. Integers, floating point numbers, character constants, string constants and pointers come under this category. 3 2. Non -primitive Data Structures Non-primitive data structures are more complex data structures and are derived from primitive data structures. They emphasize on grouping same or different data items with relationship between each data item. Arrays, lists and files come under this category. Non -primitive Data Structures are divided into two types: i. Linear Data Structures ii. Non Linear Data Structures Linear Data Structures: In linear data structures, values are arranged in linear fashion. A list which displays the relationship of adjacency between the elements is said to be linear. Arrays, linked lists, stacks and queues are examples of linear data structures in which values are stored in a sequence. Array: An array is a collection of variables of the same type that are referred to through a common name. Operations on arrays are serarching, insertion, deletion of an element. Linked list: linked list is a linear data structure which consist a group of nodes together represent a sequence. Each node is composed of data and an address (in other words, a link) to the next node in the sequence. The basic operations in a single linked list are: Creation. Insertion. Deletion. Traversing. Stack: A stack is a list of elements in which an element may be inserted or deleted only at one end, called the top of the stack. Stacks are sometimes known as LIFO (last in, first out) lists. i.e. the last item to be added to a stack is the first item to be removed. The two basic operations associated with stacks are: 1. Push: is the term used to insert an element into a stack. 2. Pop: is the term used to delete an element from a stack. Queue: Queue is a linear data structure, in which the first element is inserted from one end called REAR(also called tail), and the deletion of exisiting element takes place from the other end called as FRONT(also 4 called head). This makes queue as FIFO data structure, which means that element inserted first will also be removed first. The following operations on queues are: enqueue: The process of adding an element at the end of the queue is called Enqueue. dequeue: The process of removing an element at the front of the queue is called Dequeue. Non Linear Data Structures: This type is opposite to linear. The data values in this structure are not arranged in order. Trees and graphs are examples of non - linear data structures. Tree: A tree is hierarchical collection of nodes. One of the nodes, known as the root, is at the top of the hierarchy. Each node can have at most one link coming into it. The node where the link originates is called the parent node. The root node has no parent. The links leaving a node (any number of links are allowed) point to child nodes. Trees are recursive structures. Each child node is itself the root of a subtree. At the bottom of the tree are leaf nodes, which have no children. Graph: Graph G is a pair (V, E), where V is a finite set of vertices and E is a finite set of edges. A graph is generally displayed by following figure, in which the vertices are represented by circles and the edges by lines. 5 Operations on data structures: Traversal: accessing each node/element exactly once in the list. (This accessing and processing is sometimes called “visiting” the node or element.). Inserting: Adding a new node/element to the data structure. Deleting: Removing a node/element from the data structure. Searching: Finding the location of the desired node/element in the data structure. Storage structures – Sequential(Contiguous) and linked storage(Non-Contiguous) representations In contiguous structures, terms of data are kept together in memory (either RAM or in a file). An array is an example of a contiguous structure. Since each element in the array is located next to one or two other elements. In contrast, items in a non-contiguous structure, data scattered in memory, but we linked to each other in some way. A linked list is an example of a non-contiguous data structure. Here, the nodes of the list are linked together using pointers stored in each node. Difference between Sequential Storage structure and Linked Storage structure. S. Sequential Allocation Linked Allocation No 1. Insertions and deletions are difficult. Insertions and deletions can be done easily. It needs movements of elements for It does not need movement of elements for 2. insertion and deletion. insertion and deletion. 3. In it space is wasted. In it space is not wasted. 4. It is more expensive. It is less expensive. It requires less space as only It requires more space as pointers are also stored 5. information is stored. along with information. 6. Its size is fixed. Its size is not fixed. It cannot be extended or reduced It can be extended or reduced according to 7. according to requirements. requirements. Same amount of time is required to Different amount of time is required to access 8. access each element. each element. Elements are stored in consecutive Elements may or may not be stored in 9. memory locations. consecutive memory locations. If we have to go to a particular element then we If have to go to a particular element 10. have to go through all those elements that come then we can reach there directly. before that element. 6 File Structures: A file is a collection of data stored on mass storage (e.g., disk or tape) OR A file can be seen as 1. A stream of bytes (no structure), or 2. A collection of records with fields. The data is subdivided into records (e.g., student information). Each record contains a number of fields (e.g., student number, name, address). And each field has a data type. One (or more) field is the key field (e.g., student number). ACCESS METHODS: The access method determines how we will access (retrieve) information from the file. a) Sequential access b) Random access. Sometimes we need to process records one after another i.e, sequential access. Sometimes we need to access a specific record quickly without retrieving the preceding records. This is called random access. 1. sequential files 2. indexed files and 3. hashed files. Sequential Files Records are conceptually organized in a sequential list and can only be accessed sequentially. In sequential file organization, records are stored contiguously on the storage device, such as tape or disk, and there is an EOF (end of file) marker after the last record. Sequential files are read from beginning to end. Some operations are very efficient on sequential files (e.g. finding averages). The Operating system has no information about the record addresses; it only knows where the whole file is stored. The only thing known to the operating system is that the records are sequential. 7 Convenient way to batch (group together) a number of updates: Store the file in sorted order of key field. Sort the updates in increasing order of key field. Scan through the file once, doing each update in order as the matching record is reached. Note: it is not a convenient organization for accessing a particular record quickly. Indexed Files Sequential search is even slower on disk/tape than in main memory. To improve performance use more sophisticated file structures such as indexed files. An index for a file is a list of key field values occurring in the file along with the address of the corresponding record in the mass storage. Typically the key field is much smaller than the entire record, so the index will fit in main memory. The index can be organized as a list, a search tree, a hash table, etc. To find a particular record: Search the index for the desired key. When the search returns the index entry, extract the record’s address on mass storage. Access the mass storage at the given address to get the desired record. Multiple indexes, one per key field, allow searches based on different fields. 8 For a huge database structure, it can be almost next to impossible to search all the index values through all its level and then reach the destination data block to retrieve the desired data. Hashing is an effective technique to calculate the direct location of a data record on the disk without using index structure. Hashed Files A hashed file uses a mathematical function called hash function is used to calculate the address of the block to store the records. The user gives the key, the function maps the key to the address and passes it to the operating system, and the record is retrieved Hash Organization: Bucket − A hash file stores data in bucket format. Bucket is considered a unit of storage. A bucket typically stores one complete disk block, which in turn can store one or more records. Hash Function − A hash function is a mapping function that maps all the set of search-keys (Key) to the address where actual records are placed. It is a function from search keys to bucket addresses. Example: For example we have 5 buckets, 0 to 4. Given data element as a “key”. Hash function : key mod 5 key%5 which returns the address of the bucket where the data(key) is stored. 9 Abstract Data Type (ADT) An abstract data type is a datatype that consists of data as well as the operations to be performed on the data while hiding implementation. An ADT is a mathematical model of a data structure that specifies the type of data stored, the operations supported on them, and the types of parameters of the operations. An ADT specifies what each operation does, but not how it does it. For example, a stack is a typical abstract data type. Items stored in a stack can only be added and removed in certain order – the last item added is the first item removed. We call these operations, pushing and popping. In this definition, we haven’t specified have items are stored on the stack, or how the items are pushed and popped. We have only specified the valid operations that can be performed. For example, if we want to read a file, we wrote the code to read the physical file device. That is, we may have to write the same code over and over again. So we created what is known Lecture Notes 4 Dept. of Information Technology today as an ADT. We wrote the code to read a file and placed it in a library for a programmer to use. As another example, the code to read from a keyboard is an ADT. It has a data structure, character and set of operations that can be used to read that data structure. To be made useful, an abstract data type (such as stack) has to be implemented and this is where data structure comes into ply. For instance, we might choose the simple data structure of an array to represent the stack, and then define the appropriate indexing operations to perform pushing and popping. A good abstract data type should be representation independent. This means that the use of an abstract type is independent of its representation (the actual data structure or data fields used to implement it), so that changes in representation have no effect on code outside the abstract type itself. For example, the operations offered by List are independent of whether the list is represented as a linked list or as an array. You won’t be able to change the representation of an ADT at all unless its operations are fully specified with preconditions (requires), post conditions (effects), and frame conditions (modifies), so that clients know what to depend on, and you know what you can safely change. Arrays: Array: An array is a collection of variables of the same type that are referred to through a common name. A specific element in an array is accessed by an index. In C, all arrays consist of contiguous Memory locations. The lowest address corresponds to the first element and the highest address to the last element. The most common array is the string, which is simply an array of characters terminated by a null. Arrays can have from one to several dimensions. Types of Array:- i. One-dimensional arrays ii. Multi-dimensional arrays a) Two-dimensional arrays b)Three-dimensional array and so on. 10 One-Dimension Arrays: A list of items can be given with one variable name using only one subscript and such a variable is called a single-subscripted variable or a One-dimensional array. The general form for declaring a single-dimension array is type var_name[size]; Here, type declares the base type of the array, which is the type of each element in the array, and size defines how many elements the array will hold. The size of an array must be specified using a constant expression. Thus, the size of an array is fixed at compile time. Array name specifies that all the elements of an array share the same name and the elements of an array are distinguished from one another with the help of an element number (or) index number. Example: We want to represent a set of 5 numbers, declare the array of variable name ‘num’. int num; The computer reserves five storage locations as following. num num num num num 4002 4004 4006 4008 4010 Example2: double balance; It declare a 100-element array called balance of type double. Initializing Array:- Syntax:- < arrayname>[size]={list of values}; Example:- int num={10,20,30,40,50}; Memory Map With One-Dimensional Array:- num num num num num 10 20 30 40 50 4002 4004 4006 4008 4010 The amount of storage required to hold an array is directly related to its type and size. Size of a single dimension array: total bytes = sizeof(base type) × length of array; example: int a; total bytes = 4 x 5 =20 bytes of memory is allocated to the variable ‘a’ at compile time. OPERATIONS ON ARRAYS: 11 1. Retrieval of an element 2. Searching an element 3. Insertion of an element 4. Deletion of an element Retrieval(Accessing) Elements of an Array: To get or access an element from an array by using index or position is called retrieval of an element. An element of an array is accessed by indexing the array name. This is done by placing the index of the element within square brackets after the name of the array. This index number specifies the element’s position in the array. Index number starts with 0. Thus, num is not the second element of the array, but it is the third. Example: void main() { int a={10,20,3,15,25}; int i; //retrieving first and fifth elements from array elements printf(“first element =%d \n”,a); printf(“fifth element =%d \n”,a); //retrieving all the elements from array elements for ( i = 0 ; i