Data Storage Concepts
Uploaded by NobleKrypton
Jordan University of Science and Technology
Summary
This document provides an overview of data storage concepts. It covers topics such as bits, main memory, ROM, and mass storage (including magnetic disks, optical disks, and flash drives).
Data Storage
1.1 Bits and Their Storage
1.2 Main Memory
1.3 Mass Storage

1.1 Bits and Their Storage
The CPU is composed of two parts:
– Control unit (CU): contains circuitry that directs and controls the other parts of the computer.
– Arithmetic Logic Unit (ALU): contains circuitry that executes all arithmetic operations (+, −, ×, ÷) and logical operations (AND, OR, XOR, NOT).
Registers: temporary storage areas for instructions, data, or addresses. They reside in the CPU and are faster than main memory. They are used to store, accept, and transfer data.
Figure 2.1 CPU and main memory connected via a bus

Bits and Bit Patterns
Bit: binary digit (0 or 1).
Bit patterns are used to represent information, including numbers, text characters, images, sound, and others.

Boolean Operations
Boolean operation: an operation that manipulates one or more true/false values.
Specific operations: AND, OR, XOR (exclusive or), and NOT.
Figure 1.1 The Boolean operations AND, OR, and XOR (exclusive or)

Gates
Gate: a device that computes a Boolean operation.
– Often implemented as (small) electronic circuits.
– Gates provide the building blocks from which computers are constructed.
– VLSI (Very Large Scale Integration) places very large numbers of gates on a single chip.

In summary, for a given operation:
– Registers hold data immediately related to the operation being executed.
– Main memory is used to store data and programs required in the near future.
– Auxiliary memory is used to store data and programs required later.

Binary Notation
Binary notation is a means of representing numeric values using only the digits 0 and 1 rather than the ten digits 0 through 9.

1.2 Main Memory
Main memory is also called:
– Primary memory
– Primary storage
– Main storage
– Internal storage
– Internal memory
It holds the instructions and data of the program being executed. It does not hold a program that is not being executed because:
– Most memories lose their data when the computer is turned off.
– If the computer is shared, other users need the memory space for their programs.
– There may not be enough space to hold your program.

How the CU finds instructions and data: each location in memory has an address (sometimes called a numerical designator), and the CPU refers to this address to store or retrieve data. In computer programs, symbolic addresses (names) are used instead.

Main Memory Cells
Cell: a unit of main memory (typically 8 bits, which is one byte).
Most significant bit: the bit at the left (high-order) end of the conceptual row of bits in a memory cell.
Least significant bit: the bit at the right (low-order) end of the conceptual row of bits in a memory cell.
Figure 1.7 The organization of a byte-size memory cell

Main Memory Addresses
Address: a “name” that uniquely identifies one cell in the computer's main memory.
– The names are actually numbers.
– These numbers are assigned consecutively, starting at zero.
– Numbering the cells in this manner associates an order with the memory cells.
Figure 1.8 Memory cells arranged by address

Bits, Bytes and Words
A 0 or 1 is called a bit (binary digit).
A group of 8 bits is called a byte.
2^10 = 1024 bytes = 1 kilobyte (KB)
2^10 × 2^10 = 2^20 bytes (about a million bytes) = 1 megabyte (MB)
2^30 bytes (about a billion bytes) = 1 gigabyte (GB)
2^40 bytes (about a trillion bytes) = 1 terabyte (TB)
2^50 bytes (about a quadrillion bytes) = 1 petabyte (PB)

RAM and ROM
There are two types of memory chips:
RAM (Random Access Memory), which comes in two types:
– SRAM (static): faster; keeps its data without refreshing as long as power is maintained.
– DRAM (dynamic): must be constantly refreshed (recharged); used for most PC memory because of its size and cost advantages.
ROM (Read-Only Memory): its data cannot be changed.
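The bit-level ideas in sections 1.1 and 1.2 (the Boolean operations, binary notation, the most and least significant bits of a byte-size cell, and the power-of-two storage units) can be tried out in a few lines of Python. This is a minimal sketch: the function names are illustrative and not part of any library, and Python's bitwise operators `&`, `|`, and `^` stand in for AND, OR, and XOR gates acting on single bits.

```python
# Bit-level sketch of sections 1.1 and 1.2 (illustrative names, not a library API).

def and_gate(a: int, b: int) -> int:
    """AND: 1 only when both input bits are 1."""
    return a & b

def or_gate(a: int, b: int) -> int:
    """OR: 1 when at least one input bit is 1."""
    return a | b

def xor_gate(a: int, b: int) -> int:
    """XOR (exclusive or): 1 when exactly one input bit is 1."""
    return a ^ b

def not_gate(a: int) -> int:
    """NOT: flips a single bit (0 -> 1, 1 -> 0)."""
    return 1 - a

def byte_bits(value: int) -> str:
    """Bit pattern of a byte-size memory cell, written MSB first."""
    if not 0 <= value <= 255:
        raise ValueError("a byte-size cell holds values 0..255")
    return format(value, "08b")

def msb(value: int) -> int:
    """Most significant (high-order, leftmost) bit of a byte."""
    return (value >> 7) & 1

def lsb(value: int) -> int:
    """Least significant (low-order, rightmost) bit of a byte."""
    return value & 1

# Storage units as powers of two: 1 KB = 2**10 bytes, 1 MB = 2**20 bytes, ...
UNITS = {"KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40, "PB": 2**50}

if __name__ == "__main__":
    print("A B | AND OR XOR")
    for a in (0, 1):
        for b in (0, 1):
            print(f"{a} {b} |  {and_gate(a, b)}   {or_gate(a, b)}   {xor_gate(a, b)}")
    cell = 0b10011010  # 154 in decimal
    print(byte_bits(cell), "MSB =", msb(cell), "LSB =", lsb(cell))
    print("1 MB =", UNITS["MB"], "bytes")
```

Running it prints the truth table, then the cell pattern 10011010 with MSB = 1 and LSB = 0, and 1 MB = 1048576 bytes.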
1.3 Mass Storage
Mass (secondary) storage includes:
– Magnetic storage: floppy disks, hard disks, tape
– Optical disk storage: compact disks, DVD-ROM, Blu-ray disks
– Flash storage: flash drives, Secure Digital (SD) memory cards

Secondary Storage – Magnetic Disk Storage: Floppy Disks (no longer in use nowadays)
– Made of flexible Mylar coated with iron oxide
– Data are stored as magnetized spots on tracks on the surface
– Composed of tracks and sectors
– Popular for PCs: 3.5-inch disks holding 1.44 MB
– Portable and convenient for backup
– High-capacity drives: up to 750 MB

Secondary Storage – Magnetic Disk Storage: Hard Disks
– Rigid platters coated with magnetic oxide that can be magnetized
– Available in a variety of sizes
– Several platters can be assembled into a disk pack
– A disk drive is a device that enables data to be read from or written to a disk through a read/write head on the end of an access arm
– The read/write head does not touch the surface; if it does, data are destroyed (a head crash)
– Removable hard disks are available
– Capacities in the hundreds of GB
– Fast
– Composed of tracks and sectors
– A cluster is a fixed number of adjacent sectors that are treated as one unit by the operating system
– A cylinder is track n on all platters
– To reduce access time, related data are stored on the same cylinder (so access-arm movements are reduced)

How Data Is Organized on Hard Disks
The units of organization are the track, sector, cluster, and cylinder.
Track: the circular portion of the disk surface that passes under the read/write head.
Sector: each track is divided into small arcs called sectors, on which information is recorded as a continuous string of bits.
– Each sector holds a fixed number of bytes, typically 512 bytes per sector.
– Zone recording assigns more sectors to tracks in outer zones than to those in inner zones, which uses storage space more fully.
Copyright © 2003 by Prentice Hall
Cluster: a fixed number of adjacent sectors that are treated as a unit of storage; typically two to eight sectors, depending on the operating system.
Cylinder: the track on each surface that is beneath the read/write head at a given position of the read/write heads. When a file is larger than the capacity of a single track, the operating system stores it in tracks within the same cylinder.

Secondary Storage – Magnetic Disk Storage: Disk Access Speed
Access time = seek time + head switching + rotational delay, where:
– Seek time is the time it takes the access arm to move into position over a particular track.
– Head switching is the activation of a particular read/write head over a particular track on a particular surface.
– Rotational delay is the time for the desired data to rotate under the head.
Data transfer: after the data are found, they are transferred from the disk to memory.
A disk cache can be used to improve performance.

Magnetic Tape Storage
– Data are stored as extremely small magnetic spots
– Three forms: 3.5-inch tape wound on a reel, 3.5-inch tape in a data cartridge, and cassette tapes
– Tape capacity is measured in characters per inch (CPI) or bytes per inch (BPI)
– Two heads: a read/write head and an erase head

Disks vs. Magnetic Tapes
– Disks are reliable
– Data on disks can be accessed directly, but tapes are sequential
– Tapes are inexpensive
– Tapes are used as a backup for data on disks

Secondary Storage – Optical Disk Storage
– A metallic material is spread over the surface of a disk
– A laser hits this surface to form spots that represent 0s and 1s

Compact Disks
– CD-ROM: the drive can only read data from CDs; a CD-ROM stores up to 700 MB per disk; it is the primary medium for software distribution
– CD-R: the drive can write to the disk once; the disk can be read by a CD-ROM or CD-R drive
– CD-RW: the drive can erase and record over data multiple times; there are some compatibility problems when reading CD-RW disks on CD-ROM drives

Digital Versatile Disk (DVD)
– A short-wavelength laser can read densely packed spots
– DVD drives can read CD-ROMs
– Capacity up to 17 GB, which allows for full-length movies
– Sound is better than on audio CDs
– Several versions of writable and rewritable DVDs exist

Blu-ray Disks
– About five times the capacity of DVDs

Flash Memory
– Nonvolatile RAM
– Flash chips are used in cellular phones, digital cameras, and similar devices
– Requires less power and is smaller than disk drives
– Examples: flash drives, Secure Digital (SD) memory cards

Files
File: a unit of data stored in a mass storage system.
Fields and key fields.
Physical record: a block of data conforming to the specific characteristics of a storage device.
Logical record: for example, a file containing a text document would consist of paragraphs or pages.
These naturally occurring blocks of data are called logical records (natural divisions determined by the information represented).

File Storage and Retrieval
Data hierarchy:
– Character: the basic unit of data
– Field: a group of characters
– Record: a group of fields
– File: a group of records
– Database: a group of files
Key: a field (or group of fields) that identifies a record.
Buffer: a storage area used to hold data on a temporary basis, usually while they are being transferred from one device to another.

1.4 Representation of Information as Bit Patterns
Representing text, representing numeric values, representing images, representing sound.

Representing Text
Each character (letter, punctuation mark, etc.) is assigned a unique bit pattern.
ASCII: uses patterns of 7 bits to represent most symbols used in written English text.
Extensions to ASCII: several extensions exist, each designed to accommodate a major language group. For example, one standard provides the symbols needed to express the text of most Western European languages; its 128 additional patterns include symbols for the British pound and the German vowels ä, ö, and ü. However, the 256 patterns of such an 8-bit code are simply insufficient to accommodate the alphabets of many Asian and some Eastern European languages.
Unicode: uses patterns of 16 bits (in its original design) to represent the major symbols used in languages worldwide, enough to allow text written in languages such as Chinese, Japanese, and Hebrew to be represented.
Figure 1.13 The message “Hello.” in ASCII

Representing Numeric Values
Binary notation: uses bits to represent a number in base two.
Limitations of computer representations of numeric values:
– Overflow: occurs when a value is too big to be represented.
– Truncation: occurs when a value cannot be represented accurately.

Representing Images
Bit map techniques:
Pixel: short for “picture element”.
In a simple black-and-white image, each pixel can be represented by a single bit whose value depends on whether the corresponding pixel is black or white.
For more elaborate black-and-white photographs, each pixel can be represented by a collection of bits (usually eight), which allows a variety of shades of gray to be represented.
RGB: one byte is normally used to represent the intensity of each color component (red, green, and blue), so three bytes of storage are required to represent a single pixel.
Analytic geometry (vector) techniques: scalable; examples include TrueType and PostScript.

Representing Sound
Sampling techniques: sample the amplitude of the sound wave at regular intervals and record the series of values obtained.
– For example, telephone communication uses a sample rate of 8000 samples per second; these numeric values are then transmitted over the communication line to the receiving end.
– To obtain the quality of sound reproduction of today's musical CDs, a sample rate of 44,100 samples per second is used. The data obtained from each sample are represented in 16 bits (32 bits for stereo recordings).
Figure 1.14 The sound wave represented by the sequence 0, 1.5, 2.0, 1.5, 2.0, 3.0, 4.0, 3.0, 0

Data Science
Large-scale Data Is Everywhere!
There has been enormous data growth in both commercial and scientific databases due to advances in data generation and collection technologies.
New mantra: gather whatever data you can, whenever and wherever possible.
Expectation: gathered data will have value, either for the purpose collected or for a purpose not envisioned.
(Figure panels: E-Commerce, Cyber Security, Social Networking: Twitter, Traffic Patterns, Sensor Networks, Computational Simulations.)

Why Data Science?
Commercial Viewpoint
Lots of data is being collected and warehoused:
– Web data: Google has petabytes of web data; Facebook has billions of active users
– Purchases at department/grocery stores and e-commerce: Amazon handles millions of visits per day
– Bank/credit card transactions
Computers have become cheaper and more powerful.
Competitive pressure is strong:
– Provide better, customized services for an edge (e.g., in Customer Relationship Management)

Why Data Science? Scientific Viewpoint
Data are collected and stored at enormous speeds:
– Remote sensors on a satellite: NASA EOSDIS archives petabytes of earth science data per year
– Telescopes scanning the skies: sky survey data
– High-throughput biological data
– Scientific simulations: terabytes of data generated in a few hours
Data science helps scientists:
– In automated analysis of massive datasets
– In hypothesis formation
(Figure panels: fMRI Data from Brain, Sky Survey Data, Gene Expression Data, Surface Temperature of Earth.)

Great Opportunities to Solve Society's Major Problems
– Improving health care and reducing costs
– Predicting the impact of climate change
– Finding alternative/green energy sources
– Reducing hunger and poverty by increasing agricultural production

What Is Data Science?
Like any emerging field, it isn't yet well defined, but it incorporates elements of:
– Exploratory data analysis and visualization
– Machine learning and statistics
– High-performance computing technologies for dealing with scale

Skill Sets for Data Science
Appreciating Data
Computer scientists do not naturally appreciate data: it's just stuff to run through a program. The usual way to test algorithm performance is to run the implementation on "random data". But interesting data sets are a scarce resource, which requires hard work and imagination to obtain.

Computer vs. Real Scientists (1)
Scientists strive to understand the complicated and messy natural world, while computer scientists build their own clean and organized virtual worlds.
Thus, nothing is ever completely true or false in science, while everything is either true or false in computer science/mathematics.

Computer vs. Real Scientists (2)
– Scientists are data-driven, while computer scientists are algorithm-driven.
– Scientists obsess about discovering things, while computer scientists invent rather than discover.
– Scientists are comfortable with the idea that data has errors; computer scientists are not.

Genius vs. Wisdom
Software developers are hired to produce code. Data scientists are hired to produce insights.
Genius shows in finding the right answer. Wisdom shows in avoiding the wrong answers.
Data science (like most things) benefits more from wisdom than from genius.

Developing Wisdom
– Wisdom comes from experience.
– Wisdom comes from general knowledge.
– Wisdom comes from listening to others.
– Wisdom comes from humility: observing how often you have been wrong, and why and how.
I seek to pass on wisdom through my experience of the difficulty of making good predictions.

Developing Curiosity
The good data scientist develops a curiosity about the domain/application they are working in:
– They talk shop with the people whose data they are working on.
– They read the newspaper every day, to get a broader perspective on the world.

Asking Good Questions
Software developers are not encouraged to ask questions, but data scientists are:
– What exciting things might you be able to learn from a given data set?
– What things do you/your people really want to know?
– What data sets might get you there?

Let's Practice Asking Questions!
Ask who, what, where, when, and why about the following datasets:
– Baseball-reference.com
– Google Ngrams
– NYC taxi cab records

Baseball-Reference.com
Biosketch and statistical record of play: summary statistics of each year's batting, pitching, and fielding record, with teams and awards.

Baseball Questions
– How can we best measure an individual player's skill, value, or performance?
– How fairly do trades between teams work out?
– What is the trajectory of players' performances as they mature and age?
– To what extent does batting performance correlate with the position played?

Demographic Questions
– Do left-handed people have shorter lifespans than right-handers?
– How often do people return to where they were born?
– Do player salaries reflect past, present, or future performance?
– Are heights and weights increasing in the population?

Google Ngrams
Presents an annual time series of how frequently every "popular" word or phrase of 1 to 5 words occurs in scanned books. "Popular" means appearing more than 40 times in total.
Google has scanned about 15% of all books ever published, making this resource quite comprehensive.
(Figure: Google Ngram Viewer.)

Ngram Questions
– How has the amount of cursing changed over time?
– What is the lifespan of fame and technologies? Is it increasing or decreasing?
– How often do new words emerge? Do they stay in common usage?
– What words are associated with other words, i.e., can you build a language model?

NYC Taxi Cab Data
Gives driver/owner, pickup/dropoff location, and fare data for every taxi trip taken.
Data obtained from NYC via a Freedom of Information Act (FOIA) request.

Taxicab Questions
– How much do drivers make each night?
– How far do they travel?
– How much slower is traffic during rush hour?
– Where are people traveling to/from at different times of the day?
– Do faster drivers get tipped better?
– Where should drivers go to pick up their next fare?

Machine Learning Tasks

Predictive Modeling: Classification
Find a model for the class attribute as a function of the values of other attributes.
(Figure: a model for predicting credit worthiness, with the class attribute marked.)

Classification Example
(Figure: a training set is used to learn a model, the classifier, which is then applied to a test set.)
Slide source: Introduction to Data Mining, 2nd Edition, Tan, Steinbach, Karpatne, Kumar; 09/09/2020.

Examples of Classification Tasks
– Classifying credit card transactions as legitimate or fraudulent
– Classifying land covers (water bodies, urban areas, forests, etc.) using satellite data
– Categorizing news stories as finance, weather, entertainment, sports, etc.
– Identifying intruders in cyberspace
– Predicting tumor cells as benign or malignant
– Classifying secondary structures of proteins as alpha-helix, beta-sheet, or random coil

Classification: Application 1
Fraud Detection
– Goal: predict fraudulent cases in credit card transactions.
– Approach: use credit card transactions and the information on the account holder as attributes (when does a customer buy, what does he buy, how often does he pay on time, etc.). Label past transactions as fraud or fair transactions; this forms the class attribute. Learn a model for the class of the transactions. Use this model to detect fraud by observing credit card transactions on an account.

Classification: Application 2
Churn Prediction for Telephone Customers
– Goal: predict whether a customer is likely to be lost to a competitor.
– Approach: use a detailed record of transactions with each of the past and present customers to find attributes (how often the customer calls, where he calls, what time of day he calls most, his financial status, marital status, etc.). Label the customers as loyal or disloyal. Find a model for loyalty.

Classification: Application 3
Sky Survey Cataloging
– Goal: predict the class (star or galaxy) of sky objects, especially visually faint ones, based on telescopic survey images (from Palomar Observatory): 3000 images with 23,040 × 23,040 pixels per image.
– Approach: segment the image. Measure image attributes (features), 40 of them per object. Model the class based on these features.
– Success story: found 16 new high red-shift quasars, some of the farthest objects, which are difficult to find!

Classifying Galaxies (Courtesy: http://aps.umn.edu)
– Class: stages of formation (early, intermediate, late)
– Attributes: image features, characteristics of light waves received, etc.
– Data size: 72 million stars, 20 million galaxies; object catalog: 9 GB; image database: 150 GB

Regression
Predict a value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency. Extensively studied in the statistics and neural network fields.
Examples:
– Predicting sales amounts of a new product based on advertising expenditure
– Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
– Time series prediction of stock market indices

Clustering
Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups: intra-cluster distances are minimized, and inter-cluster distances are maximized.

Applications of Cluster Analysis
Understanding:
– Custom profiling for targeted marketing
– Group related documents for browsing
– Group genes and proteins that have similar functionality
– Group stocks with similar price fluctuations
Summarization:
– Reduce the size of large data sets
(Courtesy: Michael Eisen)

Figure: Clusters for Raw SST and Raw NPP (axes: longitude and latitude; clusters: Land Cluster 1, Land Cluster 2, Sea Cluster 1, Sea Cluster 2, Ice or No NPP). Use of K-means to partition Sea Surface Temperature (SST) and Net Primary Production (NPP) into clusters that reflect the Northern and Southern Hemispheres.

Clustering: Application 1
Market Segmentation
– Goal: subdivide a market into distinct subsets of customers, where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.
– Approach: collect different attributes of customers based on their geographical and lifestyle-related information. Find clusters of similar customers. Measure the clustering quality by observing the buying patterns of customers in the same cluster vs. those from different clusters.
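The clustering slides mention K-means. Below is a minimal pure-Python sketch of the standard k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points, repeat until nothing changes), which directly pursues the goal of small intra-cluster distances. It assumes 2-D points and Euclidean distance, and, for simplicity and repeatability, uses the first k points as the initial centroids; practical libraries use smarter initialization (for example k-means++) and several restarts. The function name `kmeans` and the sample data are illustrative.

```python
import math

Point = tuple[float, float]

def kmeans(points: list[Point], k: int, max_iter: int = 100) -> tuple[list[Point], list[int]]:
    """Basic k-means on 2-D points; returns (centroids, cluster index of each point)."""
    # Assumption: the first k points serve as the initial centroids (simple and repeatable).
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep its old centroid
        if new_centroids == centroids:
            break  # converged: centroids (and therefore assignments) no longer change
        centroids = new_centroids
    return centroids, labels

if __name__ == "__main__":
    data = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
    centers, assignment = kmeans(data, k=2)
    print(assignment)  # [0, 0, 0, 1, 1, 1]
    print(centers)
```

On this toy data the three points near (1, 1) and the three near (8.5, 8.5) end up in separate clusters after a few iterations.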
Clustering: Application 2
Document Clustering
– Goal: find groups of documents that are similar to each other based on the important terms appearing in them.
– Approach: identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster.
(Example dataset: Enron email dataset.)

Deviation/Anomaly/Change Detection
Detect significant deviations from normal behavior.
Applications:
– Credit card fraud detection
– Network intrusion detection
– Identifying anomalous behavior from sensor networks for monitoring and surveillance
– Detecting changes in the global forest cover

Motivating Challenges
– Scalability
– High dimensionality
– Heterogeneous and complex data
– Data ownership and distribution
– Non-traditional analysis

DS Career Path
Introduction
Graduates of a data science program will mostly, and preferably, work as data scientists.
Data scientists can work in any type of organization:
– Private
– Governmental
– Not-for-profit

Industries
Any organization can benefit from the data it has, so data scientists can work in any industry:
– Financial institutions (e.g., banks)
– Government agencies (e.g., the Civil Status and Passports Department and the Police Department)
– Healthcare (e.g., hospitals)
– Online platforms (e.g., Uber)
– Large retailers (e.g., Carrefour and Amazon)
– Agricultural companies
– And much more …

Data Scientist Responsibilities
Data scientists usually need to build models from verified and validated data sets. The employer uses these models to predict, recommend, or evaluate future business decisions.
For example, a data scientist working for a hospital can build a data model that predicts the best treatment for a specific patient. The data scientist will use the data collected by the hospital about patients and the treatments that did and did not work for them in the past.
In another example, a data scientist working for the police department can build a data model that predicts the location and time of the next crime before it happens, using the data collected by the police department about previous crimes.
In a third example, a data scientist working for a large retailer can build a data model that predicts the demand for certain products and services, using the data collected by the retailer about previous purchasing transactions. The data scientist may also use data provided by external entities.
Before building the model, data scientists usually need to clean and normalize the data:
– Data could be collected from internal and/or external sources.
– Data scientists need to communicate with data management teams to make sure that the necessary data is being collected.
– The data compliance department should be involved to make sure that data collection is properly handled from a legal perspective.

More Opportunities
In addition to working as data scientists, graduates of a data science program can work as software development engineers. In this field, they will mostly specialize in developing platforms that help data scientists in their jobs. They can also develop dashboards that present business intelligence charts and reports to users.

CIS Career Path
Introduction
Graduates of the Computer Information Systems (CIS) program can pursue a job in any of the following fields:
– Business analysis
– Software development
– System implementation
Introduction
CIS is an interdisciplinary program that encompasses technology and business courses. This makes graduates of this program knowledgeable about how business works and how technology can make businesses more efficient and more effective.
People who know only the technology will face the following issues while working in the software development field:
– Difficulty in developing software that satisfies the business requirements
– Difficulty in architecting software systems according to international standards
– Difficulty in maintaining existing systems due to a lack of knowledge about the business behind them

Example
The CIS program exposes students to healthcare information systems. When a CIS graduate joins a software development team that is responsible for developing an electronic health record (EHR) system, he/she will already be aware of the features and functionality of the proposed system.

You as a Business Analyst
You will help customers define their requirements for any proposed software system. Because you are already aware of how existing systems work, you can make notes and suggestions on how the proposed software system should look. Also, you are less likely to misinterpret the requirements provided by customers.

You as a Software Developer
You will write code to build a software system. Because you are already aware of how business works, you will be able to choose the right architecture for the system. The right architecture is one that supports future improvements without radical changes to the existing architecture.

You as a System Implementer
You will help users use the software system the right way. Because you are already aware of how business works, you will be able to provide very helpful advice on how the software should be used.

Programming Concepts
Objectives
– Define computer programming languages
– Define a computer program
– Understand the basic terminology
– Explore different types of programming languages
– List the different programming language generations
– Identify different programming tools
– Explore different types of programming structures
– Identify the main steps to solve a problem using programming languages

Programming Languages
Programming languages are made up of keywords and grammar rules designed for creating computer instructions.
Computer program: a set of instructions that tells the computer how to perform a task.
Programmer: a person who writes the program instructions (source code).

Terminologies
Keyword/command: a word with a predefined meaning that is reserved by a programming language; keywords define commands and specific parameters for that language. The number of keywords differs from one language to another.
Executable vs. non-executable statements:
– Executable statement: usually starts with a keyword and initiates an action. In other words, it describes what action the program should take and how.
– Non-executable statement: provides information about the nature of the data or about the way the processing is to be done, without causing any processing action.
Syntax vs. semantics:
– Syntax is the set of grammar rules followed whenever a program is written in a computer language (like grammar in a natural language).
– Semantics is the function of a command (like meaning in a natural language).

Terminologies (cont'd)
Variable vs. constant:
– Variable: a memory location whose value normally changes during the course of program execution. Naming a variable is part of the language syntax; programmers should follow the language guidelines for naming variables, which may differ from one language to another.
– Variable declaration: specifies a variable's name and characteristics.
– Variable data type: a set of possible values and a set of operations allowed on them. A data type tells the compiler or interpreter how the programmer intends to use the data.
– Constant: a value that should not be altered by the program during normal execution. Note: when const is used as a type qualifier, it declares a constant variable, as in the C language.
Variable and constant names should be descriptive.

Terminologies (cont'd)
File name vs. filename extension:
– File naming: a framework for naming files in a way that describes what they contain and how they relate to other files.
– Filename extension: indicates a characteristic of the file contents or its intended use. A filename extension is typically separated from the file name by a period (.). Examples: obj, exe, dat.
File types:
– Source
– Object
– Executable
– Data

Terminologies (cont'd)
Interpreter:
– A program that translates source code into some efficient intermediate representation or object code and immediately executes it.
– It executes one statement at a time.
Compiler:
– A system program that translates a high-level language (source code) into machine language (object code).
– It translates the entire source code (all statements) at one time.
Assembler: translates an assembly program (source code) into object code.

Programming Language Types and Generations
Programming language types:
– Low-level languages
– High-level languages

Low-level languages typically include commands specific to a particular CPU or microprocessor family. They are easy for the machine to understand.
– Machine language: programs are coded in binary (0s and 1s).
– Assembly language: a symbolic coded sequence of instructions.

High-level languages:
– Use command words and grammar based on human languages.
– Are easy for humans to understand and write, using words from the English language.
Examples: Java, C#, C, and C++.

Computer Programming Language Generations
First generation
– Machine language: consists of binary instructions (0s and 1s) that a computer can understand and respond to directly. Example: 0101110110111001
Second generation
– Assembly language: a low-level programming language that uses human-readable instructions. It is used in kernels and hardware drivers. Example instructions: SUB, ADD, MOV.

Computer Programming Language Generations (cont’d)
Third generation
– Easy-to-remember command words
– Procedural languages: specify a series of well-structured steps and procedures within their programming context to compose a program.
– These are high-level languages such as C, Fortran, and BASIC.

Computer Programming Language Generations (cont’d)
Fourth generation
– More closely resembles human language
– Object-oriented languages: follow a programming paradigm that represents concepts as "objects" that have data fields and associated methods.
– Languages that consist of statements similar to statements in human language. These are used mainly in database programming and scripting. Examples include Java and Python.

Computer Programming Language Generations (cont’d)
Fifth generation
– Based on a declarative programming paradigm.
– Languages that provide visual tools to develop a program.
– Example: Prolog.

Programming paradigm
A programming paradigm is a way of conceptualizing and structuring the tasks a computer performs.
Programming Tools
– An SDK (software development kit) is a collection of language-specific programming tools that enables a programmer to develop applications for a specific computer platform.
– An IDE (integrated development environment) is a type of SDK that packages a set of development tools into a sleek programming application.
– A component is a prewritten module, typically designed to accomplish a specific task.
– An API (application programming interface) is a set of application program or operating system functions that programmers can access from within the programs they create.
– A VDE (visual development environment) provides programmers with tools to build substantial sections of a program.

Solving Problems Using a Programming Language
– Identify and understand the problem
– State/outline the solution
– Program planning: algorithm, flowchart, pseudocode
– Write the program source code
– Program testing and documentation

Program Planning
– The problem statement defines certain elements that must be manipulated to achieve a result or a goal.
– You accept assumptions as true in order to proceed with program planning.
– Known information helps the computer to solve a problem.
– Determine the variables and constants.

Algorithms
An algorithm is a set of steps for carrying out a task that can be written down and implemented.
– Start by recording the steps you take to solve the problem manually.
– Specify how to manipulate information.
– Specify what the algorithm should display as a solution.
Expressing an algorithm:
– Flowchart
– Pseudocode

Flowchart
A flowchart is a graphical representation of the solution steps (algorithm and programming logic) for a given problem. Flowchart symbols are the elements used to describe the steps involved in a workflow process.
Example: the following flowchart represents a solution to find the largest of two numbers (A and B):
[Flowchart: Start → Read A → Read B → Is A > B? Yes: Print A / No: Print B → Stop]

Pseudocode
Pseudocode is a step-by-step description of an algorithm.
It uses simple English text because it is intended for human understanding rather than machine reading.
Example: find the largest of two numbers (A and B).
Pseudocode:
Step 1: Start
Step 2: Read number a
Step 3: Read number b
Step 4: If a > b, print a (compare a and b using the greater-than operator)
Step 5: Else, print b
Step 6: Stop (Output: the larger of a and b)

Writing the Program Source Code
Use the appropriate language (for example C, C++, C#, Java, etc.) to write the source code that finds the largest of two numbers.

// C++ program to find the largest of two numbers
#include <iostream>
using namespace std;

int main()
{
    int a, b, largest;
    cin >> a >> b;      // read the two numbers
    if (a > b)
    {
        largest = a;
    }
    else
    {
        largest = b;
    }
    cout