CMP2006 Lecture 1: Introduction to Data Structures PDF

Lecture 1 – Course Overview, Efficiency DATA STRUCTURES & Data Types CMP2006 LECTURER - MR. P. SMITH Email: [email protected] Contact: 876-470-5581 Consultation: Mondays & Tuesdays @5pm – 6pm Module Resources: Moodle – Utech Online EXPECTED OUTCOME At the end of this lecture, the student should:  Have a general understanding of what will be covered in this course  Explain the meaning and application of abstract data types (ADTs)  Demonstrate an understanding of structured data types ASSESSMENT BREAKDOWN  Two (2) lab tests worth - 10% each  Two (2) lecture tests worth - 10% each  Continuous Assessment/Participation (Lab) - 10%  One (1) group programming assignment - 20%  One (1) final exam - 30% TEXTBOOK & OTHER RESOURCES  Recommended Textbooks  Data Structures: A pseudocode approach with C++  C++ How to Program, Deitel and Deitel, Pearson/Prentice Hall  Java How to Program, Deitel and Deitel, Pearson/Prentice Hall  Recommended Websites  Cprogramming.com  Short tutorials with examples, exercises and answers http://www.cprogramming.com/algorithms-and-datastructures.html DATA STRUCTURES OVERVIEW In computer terms, a data structure is a Specific way to store and organize data in a computer's memory so that these data can be used efficiently later. Data may be arranged in many different ways such as the logical or mathematical model for a particular organization of data is termed as a data structure. Example: A house can be identified by the house name, location, number of floors and so on. These structured set of variables depend on each other to identify the exact house. Similarly, data structure is a structured set of variables that are linked to each other, which forms the basic component of a system. WHY STUDY DATA STRUCTURES Why is there a need for data structures? It gives different level of organization data. It tells how data can be stored and accessed in its elementary level. Provide operation on group of data, such as adding an item, looking up highest priority item. Provide a means to manage huge amount of data efficiently. Provide fast searching and sorting of data.  This module shall allow you to:  Understand various data structures and their trade-offs  Rigorously analyze the algorithms that use them  Learn how to pick “the right thing for the job”  Be able to make good design choices as a developer, project  Be able to communicate and justify you design decisions WHAT ARE DATA STRUCTURES FEATURES OF DATA STRUCTURES Data structure basically implements two complementary goals:  Correctness: Data structure is designed such that it operates correctly for all kinds of input, which is based on the domain of interest. In other words, correctness forms the primary goal of data structure, which always depends on the specific problems that the data structure is intended to solve.  Efficiency: Data structure also needs to be efficient. It should process the data at high speed without utilizing much of the computer resources such as memory space. In a real time state, the efficiency of a data structure is an important factor that determines the success and failure of the process. FEATURES OF DATA STRUCTURES Some of the important features of data structures are:  Robustness: Generally, all computer programmers wish to produce software that generates correct output for every possible input provided to it, as well as execute efficiently on all hardware platforms. This kind of robust software must be able to manage both valid and invalid inputs.  Adaptability: Developing software projects such as word processors, Web browsers and Internet search engine involves large software systems that work or execute correctly and efficiently for many years. Moreover, software evolves due to ever changing market conditions or due to emerging technologies.  Reusability: Reusability and adaptability go hand-in-hand. It is a known fact that the programmer requires many resources for developing any software, which makes it an expensive enterprise. However, if the software is developed in a reusable and adaptable way, then it can be implemented in most of the future applications. Thus, by implementing quality data structures, it is possible to develop reusable software, which tends to be cost effective and time saving KEY DATA RELATED DEFINITIONS  A type is a set, and the elements of the set are called the values of the type  A data type is simply a set of data, therefore; there are the INTEGER data type, the FLOAT data type, and so on.  Atomic data are data that we choose to consider as a single, non-decomposable entity. Thus, an atomic data type is set of atomic data having identical properties. These properties distinguish one atomic type from another.  Composite data can be broken out into subfields that have meaning. As an example of a composite data item, consider your telephone number. There are actually three different parts to a telephone number. First, there is the area code. Then, what you consider to be your phone number is actually two different data items, a prefix consisting of three digits and the number within the prefix, consisting of four digits.  An abstract data type (ADT) Is a well-defined collection of data with a well- defined set of operations on it ABSTRACT DATA TYPE (ADT)  An abstract data type is a logical description of how we view the data and the operations that are allowed without regard to how they will be implemented. This means that we are concerned only with what data is representing and not with how it will eventually be constructed.  By providing this level of abstraction, we are creating an encapsulation around the data. The idea is that by encapsulating the details of the implementation, we are hiding them from the user’s view. This is called information hiding.  The implementation of an abstract data type, often referred to as a data structure, will require that we provide a physical view of the data using some collection of programming constructs and primitive data types.  An important aspect of computer programming is the creation of new data types that are appropriate for a particular problem. Data structures are constructs used to represent these new types.  A data structure is a particular method of representing an ADT. A data structure may be an aggregation of atomic and/or composite data types into a set with defined relationships ABSTRACT DATA TYPES CONT… ADT’s comprise of three things:  The defined data type.  The set of operators defined for that data type.  The set of rules (conditions) governing the operations and the manipulation of the data of the type. The concept of abstraction as it relates to Abstract Data Types means:  We know what a data type can do.  How it is done is hidden. DATA STRUCTURE HIERARCHY Data Structur es Non- Primitive Primitive Non- Int Char Bool Linear Linear LinkedLi Array Stack Queue Trees Graphs st CLASSIFICATIONS OF DATA STRUCTURES Primitive Data Structures Non-Primitive Data Structures  Primitive data structures consist  Non-primitive data structures are of the numbers and the those that are derived from primitive characters which are built in data structures. These data programs. These can be structures cannot be operated or manipulated directly by the machine manipulated or operated directly level instructions. They focus on by the machine level instructions. formation of a set of data elements Basic data types such as integer, that is either homogeneous (same real, character, and Boolean data type) or heterogeneous come under primitive data (different data type). structures. These data types are  These are further divided into linear also known as simple data types and non-linear data structure based because they consist of on the structure and arrangement of characters that cannot be data. divided. PROGRAMMING LANGUAGE & COMPILER Course will be taught using the Java and C++ Programming Languages  C++ Compiler – Microsoft Visual Studios Community and Visual Studio Code  Java Compiler – Eclipse, NetBeans and IntelliJ  Online Compiler: www.onlinegdb.com ALGORITHMS REVIEW USE OF ALGORITHMS  An algorithm is a Step-By-Step process to solve a problem, where each step indicates an intermediate task. Algorithm contains finite number of steps that leads to the solution of the problem  An algorithm does not enforce a language or mode for its expression but only demands adherence to its properties.  Given an algorithm to solve a particular problem, we are naturally led to ask: 1. What is it supposed to do? 2. Does it really do what it is supposed to do? 3. How efficiently does it do it?  The technical terms normally used for these three aspects are: 1. Specification 2. Verification 3. Performance analysis EFFICIENCY OF AN ALGORITHM  The performances of algorithms can be measured on the scales of time and space. The performance of a program is the amount of computer memory and time needed to run a program. We use two approaches to determine the performance of a program. One is analytical and the other is experimental. In performance analysis we use analytical methods, while in performance measurement we conduct experiments.  Efficiency describes the how best use is made of the resources at hand Practical Algorithm Design Issues: 1. To save time (Time Complexity): A program that runs faster is a better program. 2. To save space (Space Complexity): A program that saves space over a competing program is considerable desirable. COMPLEXITY ANALYSIS COMPLEXITY ANALYSIS INTRODUCTION  We can discuss the performance of a computer program by measuring the amount of memory and time needed to run it.  It was also implied that to maximize the efficiency of an algorithm we need to minimize the amount of resources (time/space) used  Complexity of an algorithm is a function which is defined in terms of input size that outlines the running time and space requirements  For most of this course we will focus on time efficiency of algorithms as this is usually the larger concern in industry. CHALLENGES WITH EVALUATING EFFICIENCY  First, determine what is most important to us (time or space or some other measure). Then compare the efficiencies (e.g. how much time each algorithm takes to execute, or how much space each algorithm uses)  One challenge we will have is that to compare the efficiencies of two or more algorithms to determine which is more efficient, is to use a common measure (apples with apples). We need a method that makes the comparison fair.  Algorithms may be running on different platforms and in different environments  Eg. running on different machines or on different operating systems, written in different programming languages. These factors may introduce unwanted bias or overheads SPACE EFFICIENCY SPACE EFFICIENCY The space complexity/efficiency of an algorithm or program is a function of the space needed  Suppose the United Nations wants to give each person on the planet a unique personal identification number  Can we use an integer data type to store the id?  Things to consider: how many persons are on the planet? and What are the range of numbers an integer can store?  Pros and Cons. Anybody remember Y2K?  Data types, their sizes, and ranges will be explored in the first lab and tutorial SPACE COMPLEXITY  Space complexity of an algorithm represents the amount of memory space required by the algorithm in its life cycle.  The space required by an algorithm is equal to the sum of the following two components  A fixed part that is a space required to store certain data and variables, that are independent of the size of the problem. For example, simple variables and constants used, program size, etc.  A variable part is a space required by variables, whose size depends on the size of the problem. For example, dynamic memory allocation, recursion stack space, etc. TIME EFFICIENCY The time complexity/efficiency of an algorithm or a program is a function of the running time of the algorithm or a program. In other words, it is the amount of computer time it needs to run to completion. Suppose we have three algorithms that perform the same task and produce the same output  The first one is written in C++ and is running on a HP Z220 workstation running Linux  The second one is written in Java and is running on a Dell Inspiron 15 Laptop running Windows  The third one is written in Objective C and is running on an Apple MacBook Pro running OS X Which is the most efficient algorithm, and which is the least? ALGORITHM ANALYSIS Efficiency of an algorithm can be analysed at two different stages, before implementation and after implementation. They are as followed:  A Priori Analysis or Performance or Asymptotic Analysis − This is a theoretical analysis of an algorithm. Efficiency of an algorithm is measured by assuming that all other factors, for example, processor speed, are constant and have no effect on the implementation.  A Posterior Analysis or Performance Measurement − This is an empirical analysis of an algorithm. The selected algorithm is implemented using programming language. This is then executed on target computer machine. In this analysis, actual statistics like running time and space required, are collected. ASYMPTOTIC ANALYSIS ASYMPTOTIC ANALYSIS  Asymptotic analysis of an algorithm refers to defining the mathematical bounding/framing of its run-time Asymptote of the performance. upper bound  Using asymptotic analysis, we can very well conclude the best case, average case, and worst-case scenario of an algorithm.  Asymptotic analysis is input bound i.e., if there's no input to the algorithm, it is concluded to work in a constant time.  An asymptote is a line that a particular curve approaches but never reaches, even as it gets closer and closer ASYMPTOTIC ANALYSIS The time required by an algorithm falls under three types − Best Case − Minimum time required for program execution. Average Case − Average time required for program execution. Worst Case − Maximum time required for program execution. ASYMPTOTES OF THE UPPER BOUND (BIG OH NOTATION - O)  The Big Oh notation is the formal way to express the upper bound of an algorithm's running time. It measures the worst case time complexity or the longest amount of time an algorithm can possibly take to complete.  Efficiency is expressed as a function of problem/input size n and thus can be used to express how well an algorithm scales (growth rates of the resource usage) with its problem size  This method of examining and approximating the worst case performance of algorithms is also known as “Big O” analysis  Note that the Big O analysis does not give an exact measure, it is an approximation  An algorithm with the worst growth rate when compared to other algorithms design to perform the same task, may perform better than the others, when n (the data) is small BIG O NOTATION CATEGORIES Let us consider a scenario where the size of the dataset that we are working with is 10,000 (n = 10,000) and that each instruction takes 1 microsecond to execute. Efficiency (1-Best, 8- Big O Notation Common Name Iterations Est. Time Worst) 1 O(1) Constant 1 Very minimal 2 O(log n) Logarithmic 14 microseconds 3 O(n) Linear 10,000.1 seconds 4 O(n log n) Linear Logarithmic 140,000 2 seconds 5 O(n2) Quadratic 10,0002 15-20 mins 6 O(nk) Polynomial 10,000k hours 7 O(2n) Exponential 210,000 intraceable 8 O(n!) Factorial 10,000! intraceable BIG O CATEGORY – SCALE GRAPH k O(n!) O(n ) O(n2) O(n log n) O(2n) O(n) Time O(log n) O(1) N IDENTIFYING NOTATION BASED ON CODE SEGMENTS Big O Analysis CONSTANT Helpful Notes void main() { Assess each line first then assign for int x; mathematical value cout

CMP2006 Lecture 1: Introduction to Data Structures PDF

Document Details

Tags

Related

Summary

Full Transcript

Upgrade to continue