CS Notes on Data Representation, File Organisation and Computers PDF

Summary

These notes provide a comprehensive overview of fundamental concepts in computer science, including data representation, file organization, different types of computer architectures e.g CISC and RISC, and other computer-related topics. The notes cover key ideas through definitions and explanations.

Full Transcript

## Data Representation ### User-defined Data Types - **Composite** - Record Data types - Set Data types - Classes - **Record:** - Consists of different data types. - Use it when multiple values are pointer variable. - **Set Data:** - Allow program to crea...

## Data Representation ### User-defined Data Types - **Composite** - Record Data types - Set Data types - Classes - **Record:** - Consists of different data types. - Use it when multiple values are pointer variable. - **Set Data:** - Allow program to create sets and apply the mathematical operations. - All elements in sets should be unique. - **Classes:** - Used in object-orientation. - Includes variables, function. - **Non-Composite** - Enumerated Data types - Pointer Data types - **Enumerated:** - A list of possible values. - Allow comparisons between values/countable. - **Pointer:** - Related to the type of the variables. - TYPE <Name> = ^ <TYPE> - ^ pointer ### File Type - **Text File** - **Binary File:** - To store data used by computers. - Program/created using specific programs. - File → records → field → value - **Random Files** - **Direct Access File** ### File Organisation - **Serial Files** - No defined order/new data added at the end. - Has to go through every record until the required one is found. - Good temporary storage. - Advantage: Task at hand is straightforward/low cost. - Disadvantage: Difficult to access/Cannot support for quick data access/Batch processing (slow data access, lower data transfer speed)/Backing up data on magnetic tape. - **Sequential Files** - Key fields are ordered. - Provide data integrity, less redundancy. - New records added in correct place. - Check key field's value to access the record. - Advantages: Easier to access records, but still need to go through each record/Use binary search, reduce search time. ### File Access - **Sequential Access** - **Direct Access** - Access to the record without looking up the other records. - Index of key field and used to look up the address location. ### Random Files - Records are stored randomly but accessed directly. - Use Hashing algorithm on the key field. - Use magnetic and optical. ### Direct Access File - Use direct access. - The value in the key field is input to hashing algorithm to output the address. - The position for the record is calculated using algorithm - same calculation is done to search. - Create new files only if the current file is full. - Allow overwritten. - Set flag to deleted record so it is skipped. ## Floating Point - Using binary floating point. - Mantissa: 0.10101 - Left most bit in mantissa is -1. - Exponent: 0010 - Left most bit in exponent is negative. - Left side of decimal: 12 4 8 16 - Right side of decimal: 1/2 1/4 1/8 1/16 1/32 - Decimal is always after first digit. - Normalization: - Positive fraction: starts with 0.1 - Negative fraction: starts with 1.0 - Shift the decimal point to make 0.1 (or) 1.0 - Subtract exponent if decimal shifts to right. - Add exponent if decimal shifts to left. - 1/2 = 0.5, 1/4 = 0.25, 1/8 = 0.125, 1/16 = 0.0625 - Why it can be stored as normalized form? - Not enough bits to store/too much bits to store. - Reduce precision/accuracy. - Describe how many bits are lost/or the range. - Way to improve: Increase bits for mantissa, decrease bits for exponent. ## Turn Fraction into 2^?? - 9/32 = 9 x 2^-5 - 9: mantissa - -5: exponent ## Why Should We Store It in Normalized Form? - Maximum precision can be achieved. - There is standard way to store it so avoid multiple representations. - To store maximum range of numbers in minimum numbers of bits. ## Explain the Problem to Represent +0.2 (or) Other Values. - Cannot be represented exactly, can only be represented closely. - Precision lost due to rounding error. ## Why a Binary Representation Is Only an Approximation to Real Number? - Fixed length of bits for both mantissa and exponent so cannot be stored very large number. - Real numbers has fraction for 0.2 (or) 0.4 which are too close to zero or too small to represent so it cannot be respresented exactly. - Some of the bits are lost when there is no enough bits to store so precision reduces due to rounding error. ## Why Underflow? - The result is too close to 0 or too small with high precision to represent with enough bits. ## Previous Exponent Was 2^28. Exponent Bit Is Reduced to 4 Bits From 6 Bits. Why Cannot Be Accurately Represented in This Format? - Exponent too large to be fit in 4 bits. - Exponent will turn into -8 since leftmost bit is negative so point move the wrong way while calculating. - Precision decreases due to rounding error since there is no enough bits so it can only be represented to nearest number. ## OUTPUT (0.2 * 0.4) → 0.08000000000002 Explain Why? - 0.210.4 cannot be represented exactly in binary since it uses base 2 for fractions as well (1/2 1/4 1/8 ...). - Both values are represented by greater values which increase the difference between output result and actual result. - The output answer is significant enough to be represented in binary. ## TCP/IP Protocols - **Sender:** - **Application Layer:** - **SMTP (Simple Mail Transfer Protocol):** For sending emails through servers. - **POP3 (Post Office Protocol version 3):** and **IMAP (Internet Mail Application Transfer Protocol):** For receiving emails. - Used by applications to communicate with each other over a network. - **HTTP (Hypertext Transfer Protocol):** Use HTTP request and HTTP response to get access to website. - **Transport Layer:** - **TCP (Transmission Control Protocol):** To ensure data is delivered efficiently/use TCP or UDP(TCP has more reliability). - Provide positive acknowledgement, resend lost packets. - **UDP(User Datagram Protocol):** Send data without rechecking. - **Internet Layer:** - To route data packets across the network, determine most efficient route. - Use IP to assign unique addresses. - Provide logical addressing and routing services. - Assign unique IP addresses on each device allowing the data packets to be sent in remote locations. - Can support both wired and wireless network, with different protocols. - **Link Layer:** - Transmit data over physical network hardware or software needed to connect to devices (network cards, cables). - Manage the connection between devices on the same local network. - Used for sending and receiving data frames over ethernet or wi-fi. - **Receiver:** - **POP3:** Emails are deleted from the server once it is downloaded/no future update. - **IMAP:** Emails are remained on the Server even after it is downloaded. - **Provide user-interface for network emailing:** Email clients, web browsers, file transfer. - **Use another protocol (FTP - File Transfer Protocol):** Use to transfer files between computers/use FTP servers to upload or download files. ## Remember: Link Layer Transmits Data Frames and Use (SMAICA and (SMA/CD - To prevent data collision (or) to allow multiple devices share same physical network. - **Collision avoidance:** - **CSMA/CA:** Used in wireless network such as Wi-Fi. - Devices listen signals from other devices before transmitting data. - If another device is detected, it will wait random amount of time before resending. - **CSMA/CD:** Used in wired network such as Ethernet. - Devices listen signals while transmitting data. - If collision is detected, both devices stop transmitting and wait random amount of time. ## TCP/IP Layers - **Application Layer:** - The application layer contains all the programs that exchange data, such as web browsers or server software; it sends files to the transport layer. - HTTP, FTP, POP3, IMAP, SMTP - This contains most of the configuration and coordination associated with the transmission that ensures that all the packets have arrived and that there are no errors in the packets. - **Transport Layer:** - TCP, UDP - **Network/Internet Layer:** - Defines the IP addresses of devices that send and receive data and handles the creation and routing of packets being sent and received. - IP - **Link Layer:** - This layer provides synchronisation of devices so that the receiving device can manage the flow of data being received. - It identifies what network topology is being used and controls the physical signals that transmit the strings of bits around the network/ - It also controls physical characteristics such as data transmission rates and the physical connections in a network. On wireless networks it handles the CSMA/CA protocol. ## HTTP - Hypertext transfer protocol; this is a protocol responsible for correct transfer of files that make up web pages on the world wide web ## SMTP - simple mail transfer protocol; this handles the sending of emails ## POP3/4 - post office protocol; this handles the receiving of emails ## IMAP - internet message access protocol; this handles the receiving of emails ## DNS - domain name service; protocol used to find the IP address, for example, when sending emails ## FTP - file transfer protocol; this is a protocol used when transferring messages and attachments ## RIP - routing information protocol; this is the protocol routers use to exchange routing information over an IP network ## SNMP - simple network management protocol; protocol used when exchanging network management information between network management and network devices (such as routers, servers and other network devices) ## Requesting a Website (Application Layer -HTTP) - Enter URL. - HTTP transmits the request from the application layer to the transport Layer (TCP). - TCP creates data packets and send them to the destination port. - DNS server stores a database of URLs and matching IP address. - DNS server uses the domain name to find matching IP address. - TCP sends back acknowledgement. - Web server sends the web page back in HTML to browser and browser display it. ## Ethernet in Link Layer - Ethernet is a system used to connect computers to form a LAN. - It uses protocol to control the movement of frames to avoid collisions. ## BitTorrent Protocol - BitTorrent client software - Leeches, seed, tracker - Peer-to-peer model, Used for files sharing. - File to be shared is split into pieces. - BitTorrent client software made available to other peers allowing them to work as seeds or leeches. - Peer downloading file can get pieces from different seeds at the same time. - (Peer can act as seed who upload pieces of a file) - (Peer can act as seed when they have 100% of a file and allowing the other to download the files) - Leeches download much more than they upload.&#x20; - Central server (a tracker) keeps records of all the peers and the part of file they have. ## Remember: Data Is Broken Down into Smaller Pieces (Packets) in Transport Layer - **Data transmission type:** - **Circuit Switching:** - Dedicated communication channel between sender and receiver. - It is established before the actual data transmission. - Open circuit is established for the whole duration of data transmission and the links that made up circuit are removed after transmission. - Allows bi-directional communication. - Allows real-time communication. - **Pros:** Use the whole bandwidth so sharing channel/Data transfer rate is faster than packet swiching/Data packets arrive in the same order so data is continuous/It makes sure there is no delay since it is transmitted over single channel so no network traffic/Single dedicated line so no data loss/Better for real-time applications. - **Cons:** Nobody else can use the channel even if it is not used/The circuit is always there/If there is a failure, no alternative routing option/Set up time is longer. - **Packet Switching:** - A message is broken down into packets. - Packets are sent independently from starting point to ending point. - Packets are reassembled into their correct order. - Routers use routing table to determine the most effective path for each packet. - **Pros:** No need of communication line/Easy to expand/Use many routing path so easier to overcome network traffic/Charge user for the duration/High data transmission. - **Cons:** More complex protocol/If packet is loss, sender resend again (more time consuming)/Not for real-time data transmission/Use smaller bandwidth, share bandwidth with other packets/Delay at destination, to reassemble data packets. - **Roles of routers:** Transmit packets/Contain connections to many other routers/When packets arrive at router, it checks destination IP address and decides where to send next, if hop number is 0 deleted. - **Hop Number**: Each packet has maximum hop number/Router decides next router to transmit and hop number is decreased by 1 if it reaches to next router/If hop number is 1, data is deleted when it reaches to next router. - **Packet Header**: Soure/destination IP address/Current hop number/Total size of packet/Checksum. ## Functions of TCP to Transfer Files - Allows application to exchange data and maintain a connection until exchange of data is complete. - Determines how to break applications data into packets. - Add header to each packets. - Sends packets to and accepts packets from the network. - Use acknowledgement to detect data loss. - Handle retransmission of packet loss. - Reassembles packet into the correct order. ## Functions of IP - Routes the packets around the network. - Assign unique IP address to devices. ## Hardware - **CISC vs RICS** - **CISC (Complex Instruction Set Computing):** - Large number of complex instructions. - Execute complex instructions in a single clock cycle. - Large number of addressing mode: Allow to access memory more efficiently. - Can perform multiple operations, suitable for complex arithemetic operations. - Slower execution time due to complex instructions. - Desktop, server, laptops, computers - **RICS (Reduced Instruction Set Computing)** - Small number of instructions. - Execute simple instructions in a single clock cycle. - Smaller addressing mode so less efficient at access memory. - Execute faster because simple instructions. - Embedded systems, mobile devices. ## Pipelining - Breaking down the execution of an instruction into smaller, simpler steps and execute them in parallel. - Allows multiple instructions to be proceed simultaneously by dividing the execution process into several stages. - **Stages:** If → ID → OF → IE → WB - **Instruction Fetch (IF):** CPU reads instructions from address in memory. - **Instruction Decode (ID)**: Instruction is decoded, access register file to get operands. - **Operand Fetch (OF):** Arithmetic operations carried out, access memory for load and store operations. - **Instruction Execution (IE):** Data is read from or written to memory. - **WriteBack (WB):** Results are written back to register file. - **Challenges:** - Different instructions require different numbers of clock cycles in each stage causing delays, reduce the overall performance. - **(Solution):** Branch predictions, forwarding. - Dependencies between stages (need results from the instruction that is not executed yet). - Need careful design and optimization. ## Interrupt Handling and Pipelining - **CISC:** More complex interrupt handling than in RICS due to complex instruction. - **Pipelined Processor:** In general, pipeline is flushed and all instructions are discarded, save current state, service interrupt. - **Precise Interrupt method:** Use it to handle interrupts more precisely. - When interrupt occurs, processor completes WB stage of current instruction before flushing the pipeline. - Reduce the number of instruction to be re-executed. ## Parallel Processing - It allows multiple calculations or data processing tasks to occur through multiple central processors. - **Pros:** Improved peformance (multitasks executed simultaneously (efficiency ↑) → reduce overall time to complete a task (power consumption↓)). - **SISD (Single Instruction Single Data):** Single processor executes single instruction on single data. - **SIMD (Single Instruction Multiple Data):** Multiple processor that executes on same Instruction on multiple data (parallel computing). - **MISD (Multiple Instruction Single Data):** Multiple processors execute different instructions on the same data. - **MIMD (Multiple Instruction Multiple Data):** Multiple processors execute different instructions on different data (parallel computing). - **Cluster:** Network of the number of computers. - **Supercomputer:** Processor from each computer in cluster, can do complex calculations. ## Massively Parallel Computers - Computer architecture that consists a large number of processors interconnected in network structure. - Processors divide tasks between them, perform computations simultaneously. - **Scability:** Additional processors can be easily added, easier to handle more complex computations. - **Distributed Memory:** Each processor has its own local memory and communicate with other processors/divide large datasets. - This allows efficient data sharing, parallel processing. - **Parallel Processing:** Divide a task into smaller subtasks and assign them into different processors. - Each processor works independently, communicate when necessary.&#x20; ## Virtual Machine - Software emulation that enables multiple operating system to run on a single computer. - It allows multiple virtual servers to run on a single physical laptop (reduce hardware costs, improve efficiency). - It allows users to create multiple virtual machines with different operating system to test applications (provide compatibility, minimizing risks). - It allows users to use older applications on modern hardware. - **Pros:** - Multiple operating system (reduce hardware costs, energy consumption↓) on a single hardware. - Provide flexibility to create virtual servers (easy to scale infrastructure). - If one virtual machine is affected (provide safety , risk of malware , virus infection↓) other virtual machines and host system remain still. - **Cons:** - Virtual operating performance ↓ due to extra layer of application emulation software. - Virtual machine share CPU, memory, storage can lead to performance issues. ## Differences Between RICS and CISC - **RICS:** - Fixed length instruction - Fewer instructions - Use many general registers - Simple instruction - Less complex circuit - Fewer addressing mode - **CISC:** - Variable length instructions - More instructions - Use fewer general register - Complex instructions - More complex circuit - More addressing mode ## Virtual Machine - Software that emulates a physical computer system (hardware) using host OS. ### Advantages - New system can be set up on different virtual machines without buying extra software. - Easy to recover if software emulating causes system crash because VM can protest host Os. - Emulate different OS by using guest OS on a computer. - Allow multiple OS to coexist on a single computer by creating new OS (or) VM. ### Disadvantages - Memory or processing time is shared with host OS so take longer to process. - Performance ↓ because both VMs and original need to be maintained. - It cannot emulate some hardware. ## Pipelining During Fetch-Execute Cycle in RICS Processors - Execution of code are divided into 5 subtasks. - IF, ID, OF, IE, WB - Each subtask is completed during one lock cycle. - No two instructions can execute their same stage at same clock cycle. - Second instruction begins in the second clock cycle while first instruction moves on to its second stage. ## The Use of Pipelining in RISC - It allows multiple instructions to be decoded at the same time so increase the number of instructions executed per time. - Different stages of each instructions executed at the same clock Lycle. - No two instructions of same stages cannot be executed at the same clock Lycle. ## Massively Parallel Computer - Large number of processors working together simultaneously on the same program communicating through messaging interface. - (Multiple processors, different bus) - **Hardware Issue:** - Multiple processors and each processor needs a link to another. - More processors so more links. - Takes time to build up the communication link between processors. - **Software Issue:** - Suitable programming algorithm to make multiple processors perform simultaneously.&#x20; ## How to Change the Application into Massive Parallel Computer - Split the code into blocks, assign each block to each processor. - Make execute the code blocks simultaneously allowing them to work independently. ## Guest Operating System vs Host Operating System - Guest operating system runs in a virtual machine. - Host operating system is controlling the physical hardware for physical machine. - Guest OS runs under host OS. ## How Guest OS and Host OS Operate - Guest OS handles request from the application and run as if it is running on its own physical hardware. - Requests are translated by virtual machine software and host OS executed instructions. - Host OS gets data and passes it to VM and VM passes to guest OS. - Guest OS passes it to application. ## App1 App2 App3 - App1, App 2, and App 3 run under GOST, GOSZ, GOSS respectively. - They all run on a VM under HOS. - HOS is on the actual hardware. - Guest OS are secondary OS installed on the hardware. - Host OS is the one who communicate directly with the physical hardware. - All apps run on the guest OS with same results. ## Roles of VM - It creates virtual machine. - It translates the instruction from guest OS and transfer them of host OS. - It provides hardware emulation and allow apps to be testing without affecting host OS in case of error. ## OOP Question - **Properties:** Attributes and data types of those attributes defined in a class. - **Methods:** Procedures in a class implementing algorithm carried out on the attributes. - **Inheritance:** Using attributes and methods implemented in a super class (parent class) in another class (child class). - **Encapsulation:** Putting properties, attributes and method inside a class and making it private. - **Getter:** Method used to return the value of property. - **Setter:** Method used to update the value of property. - **Instance:** A specific object based on the class. - **Polymorphism:** Redefining methods from super class in a derived class. ## Diagram - A, B, Cir are inputs. - Sum and Cout are outputs. - The diagram is of a full adder circuit. - The full adder circuit is made up of a half adder circuit and a few other logic gates. - The half adder circuit is used to add the two input bits. - The Sum output of the half adder circuit is the sum of the two input bits. - The Cout output of the half adder circuit is the carry bit. - The Cout output of the half adder circuit is connected to the input of the AND gate. - The AND gate outputs the carry bit for the full adder. - The Sum output of the half adder circuit is connected to the input of the XOR gate. - The XOR gate outputs the sum bit for the full adder. - The Sum output of the XOR gate is connected to the input of the NOT gate. - The output of the NOT gate is connected to the input of the Cout gate. - The Cout output of the Cout gate is the carry bit for the full adder. - The Sout output of the full adder circuit is the sum of the two input bits and the carry bit. ## Flip-flop Circuits - **SR flip-flops:** - It becomes unstable when both inputs of R and S are 1. Q and Q becomes the same so invalid outputs. - **JK flip-flops:** - Toggle Q value when both Sand R is 1. - Used for shift registers in a computer. - Used to store a single bit /-used for for memory storge.

Use Quizgecko on...
Browser
Browser