Quiz 1 PDF
Summary
This document provides an introduction to informatics and data processing. It details the tasks involved in informatics, including defining information, obtaining information, communicating information, and information processing. It also discusses data, data processing, and display encoding.
1. Tasks of informatics
Today, informatics is an independent discipline belonging to the family of technical sciences. Informatics is the study and application of computer and information technology to collect, process, store, and manage data. It has a wide range of tasks:
▪ Defining the concept of information
Information is an abstract concept that refers to something which has the power to inform. At the most fundamental level, it pertains to the interpretation of what may be sensed, or of its abstractions. Any natural process that is not completely random, and any observable pattern in any medium, can be said to convey some amount of information. Whereas digital signals and other data use discrete signs (0s and 1s) to convey information, other phenomena and artifacts (analog signals, poems, pictures, music or other sounds, and currents) convey information in a continuous form. Information is not knowledge itself, but the meaning that may be derived from a representation through interpretation. The concept of information is related to many other concepts, including constraint, communication, control, data, form, education, knowledge, meaning, understanding, mental stimuli, pattern, perception, proposition, representation, and entropy.
▪ 1. Obtaining information
Information can come from virtually anywhere (media, blogs, personal experience, books, journal and magazine articles, expert opinions, encyclopedias, web pages, ...). The type of information needed changes depending on the question we are trying to answer.
▪ 2. Communication of information (exchanging: sending and receiving)
In the primary information age, information was carried by newspapers, radio, and television. The secondary information age was driven by the Internet, satellite television, and mobile phones.
▪ 3. Information processing
Information processing consists of locating and capturing information, using software to manipulate it into a desired form, and outputting the data. An Internet search engine is an example of an information-processing tool, as is any sophisticated information-retrieval system.
▪ 4. Protection of information → encryption (application of security algorithms).

The collective name for informatics is information and communication technologies (ICT). In another approach, informatics consists of two main subfields:
▪ information technology (IT) - encompasses the use of computers, networks, software, and other electronic or digital devices for the management and communication of information;
▪ information systems (IS) - its primary goal is to manage data as it flows through five stages: input, processing, storage, output, and feedback.
IS focuses more closely on information-handling processes, while IT tends to center on the technologies that support those processes.
Informatics tasks are solved with systems of computers on which programs run. Programs are implementations of algorithms and data structures.

2. Data, data processing, display encoding
Data: a formalized representation of elementary knowledge, facts, concepts, or instructions suitable for communication, display, or processing by humans or automatic devices. New knowledge can be obtained from data as a result of data processing. In a narrower sense, data are things that can be described with numbers and that can be recorded, processed, and displayed using computing devices.
Data processing - performing technical tasks related to data-management operations, provided that the task is performed on data. This holds regardless of the method and tool used to perform the operations and regardless of the place of application. Data processing involves collecting, transforming, and organizing data to extract meaningful insights, support decision-making, and achieve specific goals.
Some examples of data processing:
1. Data cleaning
   o Removing duplicates - Identifying and deleting duplicate records in a dataset.
   o Handling missing values - Filling in missing data points using techniques such as mean imputation, or removing rows/columns with missing values.
   o Correcting errors - Fixing typos, formatting inconsistencies, or erroneous data entries.
2. Data transformation
   o Normalization - Scaling data to a standard range, such as 0 to 1, to ensure comparability.
   o Aggregation - Summarizing data, such as calculating total sales per month from daily sales data.
   o Encoding - Converting categorical data into numerical format, as in one-hot encoding.
3. Data integration
   o Merging datasets - Combining data from different sources into a single dataset based on common keys.
   o Joining tables - Using SQL joins to combine relational database tables.
   o Linking data - Connecting datasets based on unique identifiers.
4. Data reduction
   o Dimensionality reduction - Techniques such as Principal Component Analysis (PCA) that reduce the number of variables while retaining important information.
   o Sampling - Selecting a representative subset of data for analysis to reduce processing time.
5. Data analysis
   o Statistical analysis - Performing descriptive statistics (mean, median, standard deviation) and inferential statistics (hypothesis testing, regression analysis).
   o Data mining - Identifying patterns, correlations, and anomalies in large datasets using algorithms and machine learning techniques.
   o Visualization - Creating charts, graphs, and plots to represent data visually for easier interpretation.
6. Data storage and retrieval
   o Database management - Storing data in structured formats such as SQL databases and performing CRUD (Create, Read, Update, Delete) operations.
   o Data warehousing - Aggregating large volumes of data from different sources into a central repository for analysis and reporting.
7. Real-time data processing
   o Stream processing - Handling and analyzing data in real time as it is generated, using technologies such as Apache Kafka and Apache Flink.
   o Event processing - Responding to events (e.g., user clicks, sensor readings) in real time to trigger immediate actions.
8. Data privacy and security
   o Anonymization - Removing or obfuscating personally identifiable information (PII) to protect user privacy.
   o Encryption - Protecting data confidentiality by converting it into ciphertext that can be read only with the right key. Authenticated encryption also protects integrity.

Excel - a powerful tool for data processing, offering a variety of features and functions to manipulate, analyze, and visualize data. Some examples of data-processing tasks in Excel:
1. Data cleaning
   o Removing duplicates - The "Remove Duplicates" feature on the "Data" tab eliminates duplicate rows.
   o Handling missing values - Functions such as IF, IFERROR, and VLOOKUP fill in missing data or replace errors with appropriate values.
   o Text to Columns - The "Text to Columns" feature splits data in a single column into multiple columns based on a delimiter.
2. Data transformation
   o Normalization - Formulas scale data. For example, =(A1-MIN(A:A))/(MAX(A:A)-MIN(A:A)) normalizes column A to the range 0 to 1. Enter it in a helper column (e.g., B1) and fill it down; placing it in column A itself would create a circular reference.
   o Aggregation - Functions such as SUM, AVERAGE, COUNT, MAX, and MIN aggregate data.
   o PivotTables - PivotTables summarize and aggregate data dynamically, allowing for quick insights and analysis.
3. Data integration
   o Merging datasets - The VLOOKUP, HLOOKUP, INDEX, and MATCH functions combine data from different tables based on common keys.
   o Consolidate - The "Consolidate" feature combines data from multiple ranges or worksheets into a single summary table.
4. Data reduction
   o Filtering - The "Filter" feature displays only the rows that meet certain criteria.
   o Grouping - PivotTables group data by categories and summarize the results.
   o Conditional Formatting - Highlighting important data points or trends with color scales, data bars, or icon sets.
5. Data analysis
   o Statistical analysis - Built-in functions such as AVERAGE, MEDIAN, STDEV, VAR, CORREL, and LINEST perform statistical calculations.
   o Data mining - The Analysis ToolPak add-in provides advanced analyses such as regression, histograms, and t-tests. It must be enabled first, after which it adds the "Data Analysis" command to the "Data" tab.
   o What-If Analysis - "Goal Seek", "Data Table", and "Scenario Manager" explore different scenarios and their outcomes.
6. Data visualization
   o Charts and graphs - Various chart types (line, bar, pie, scatter, etc.) visualize data trends and patterns.
   o Sparklines - Small, simple charts placed within a single cell show trends.
   o Conditional formatting - Color scales, data bars, or icon sets visually represent data ranges and trends.
7. Data storage and retrieval
   o Tables - Converting data ranges into tables makes data management easier, with automatic filtering, sorting, and referencing.
   o Named ranges - Named ranges simplify formula writing and improve readability.
   o Importing data - The "Get & Transform Data" feature imports data from external sources such as databases, web pages, and CSV files.
8. Real-time data processing
   o Refresh data connections - Data connections can be refreshed automatically to update data from external sources.
   o Live data feeds - Web queries and Power Query connect to live data feeds for real-time analysis.
9. Data privacy and security
   o Protect sheets and workbooks - Password protection restricts access to certain sheets or to the entire workbook.
   o Hide data - Rows, columns, or entire sheets can be hidden to keep sensitive information out of view. Hidden data can be unhidden unless the sheet or workbook structure is protected.
   o Data masking - Formulas or custom formats mask sensitive data, displaying only the necessary information.

Coding - the process in which each element of one set of symbols is assigned an element (or sequence of elements) of another set of symbols according to some rule.
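Several of the generic data-processing examples listed earlier can be illustrated with a short Python sketch:
▪ removing duplicates;
▪ mean imputation of missing values;
▪ normalization to the range 0 to 1;
▪ aggregation of sales per month;
▪ one-hot encoding.

The sketch uses only the standard library. The sample records, the field names `date`, `region`, and `sales`, and all function names are hypothetical, chosen for illustration. The `min_max` helper computes the same value as the Excel formula (A1-MIN(A:A))/(MAX(A:A)-MIN(A:A)).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily sales records; None marks a missing value.
records = [
    {"date": "2024-01-05", "region": "North", "sales": 120.0},
    {"date": "2024-01-05", "region": "North", "sales": 120.0},  # exact duplicate
    {"date": "2024-01-17", "region": "South", "sales": None},   # missing value
    {"date": "2024-02-03", "region": "North", "sales": 80.0},
    {"date": "2024-02-20", "region": "East", "sales": 100.0},
]


def remove_duplicates(rows):
    """Data cleaning: keep the first occurrence of each identical record."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_mean(rows, field):
    """Data cleaning: replace missing values of `field` with the mean of the known values."""
    fill = mean(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def min_max(values):
    """Data transformation: scale values to [0, 1], like (A1-MIN(A:A))/(MAX(A:A)-MIN(A:A))."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values equal: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def monthly_totals(rows, field):
    """Aggregation: sum `field` per month, using the YYYY-MM prefix of an ISO date."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["date"][:7]] += r[field]
    return dict(totals)


def one_hot(rows, field):
    """Encoding: turn a categorical field into 0/1 indicator columns."""
    categories = sorted({r[field] for r in rows})
    return [{f"{field}={c}": int(r[field] == c) for c in categories} for r in rows]


clean = impute_mean(remove_duplicates(records), "sales")
print(clean[1]["sales"])                     # → 100.0
print(min_max([r["sales"] for r in clean]))  # → [1.0, 0.5, 0.0, 0.5]
print(monthly_totals(clean, "sales"))        # → {'2024-01': 220.0, '2024-02': 180.0}
print(one_hot(clean, "region")[0])           # → {'region=East': 0, 'region=North': 1, 'region=South': 0}
```

Mean imputation is only one option. Removing incomplete rows, the other approach mentioned above, avoids inventing values at the cost of discarding data.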
Codelist, codetable - each of the allowed sequences from the list of input symbols is assigned a sequence of symbols from the output list:
character(s) → bit group
   o letters (A, B, ...) → 1000001 (A), ...
   o numbers (1, 2, ...)
   o special characters (#, $, ...)
Example: ASCII - alphabet characters ↔ 7-bit codes (ASCII: 2^7 = 128 characters) and 8-bit codes (extended ASCII: 2^8 = 256 characters).
By tradition, the following text is encoded as an example: "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG". Written as decimal (base-10) numbers, the result begins with 84 72 69 … and ends with … 68 79 71.
Unicode
The prefix "U+" indicates a Unicode code point. The digits that follow are the code point's value written in hexadecimal (base 16).
Example: Hello World
U+0048: latin capital letter H
U+0065: latin small letter e
U+006C: latin small letter l
U+006C: latin small letter l
U+006F: latin small letter o
U+0020: space
U+0057: latin capital letter W
U+006F: latin small letter o
U+0072: latin small letter r
U+006C: latin small letter l
U+0064: latin small letter d
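The ASCII and Unicode examples above can be reproduced with Python built-ins (`str.encode`, `ord`, `format`) and the standard `unicodedata` module. This is a small sketch; the helper names `ascii_codes`, `seven_bit`, and `code_points` are hypothetical. Note that `unicodedata.name` returns the official character names in upper case (LATIN CAPITAL LETTER H), whereas the list above writes them in lower case.

```python
import unicodedata

PANGRAM = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG"


def ascii_codes(text):
    """Decimal ASCII code of each character; raises UnicodeEncodeError for non-ASCII text."""
    return list(text.encode("ascii"))  # iterating over bytes yields ints


def seven_bit(ch):
    """7-bit binary group of an ASCII character, e.g. 'A' -> '1000001'."""
    return format(ord(ch), "07b")


def code_points(text):
    """'U+XXXX: NAME' for each character, with the code point in hexadecimal."""
    return [f"U+{ord(ch):04X}: {unicodedata.name(ch)}" for ch in text]


codes = ascii_codes(PANGRAM)
print(codes[:3], codes[-3:])  # → [84, 72, 69] [68, 79, 71]
print(seven_bit("A"))         # → 1000001
print(2 ** 7, 2 ** 8)         # → 128 256 (7-bit vs. 8-bit code space)
for line in code_points("Hello World"):
    print(line)               # first line: U+0048: LATIN CAPITAL LETTER H
```

For the first 128 characters, ASCII codes and Unicode code points have the same values. That is why "H" is 72 in decimal and U+0048 in hexadecimal (0x48 = 72).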