Lec. 2: Intro to Data Science, 2024-2025
Alexandria University
Summary
These lecture notes cover data science concepts, emphasizing the characteristics and sources of big data. Topics include volume, velocity, variety, and complexity, alongside examples and activities.
Lec. 2. What makes data "Big" Data?

Big data, as its name suggests, is very big: its starting size is at least 1 TB.

Volume: data quantity
Velocity: data speed
Variety: data types

1st Character of Big Data: Scale (Volume)

◦ A typical PC might have had 10 gigabytes of storage in 2000.
◦ Today, Facebook ingests 500 terabytes of new data every day.
◦ A Boeing 737 will generate 240 terabytes of flight data during a single flight across the US.
◦ Smartphones, the data they create and consume, and sensors embedded into everyday objects will soon result in billions of new, constantly updated data feeds containing environmental, location, and other information, including video.

Data volume is increasing exponentially: the amount of data collected and generated keeps growing at an exponential rate.

2nd Character of Big Data: Speed (Velocity)

◦ Clickstreams and ad impressions capture user behavior at millions of events per second.
◦ High-frequency stock trading algorithms reflect market changes within microseconds.
◦ Machine-to-machine processes exchange data between billions of devices.
◦ Infrastructure and sensors generate massive log data in real time.
◦ Online gaming systems support millions of concurrent users, each producing multiple inputs per second.

Data is being generated fast and needs to be processed fast.

Online data analytics: late decisions ➔ missed opportunities.

Examples
◦ E-promotions: based on your current location, your purchase history, and what you like ➔ send promotions right now for the store next to you.
◦ Healthcare monitoring: sensors monitor your activities and body ➔ any abnormal measurement requires an immediate reaction.

3rd Character of Big Data: Variety

Big Data isn't just numbers, dates, and strings. Big Data is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media. Traditional database systems were designed to address smaller volumes of structured data, fewer updates, or a predictable, consistent data structure.
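To make the contrast between unstructured text and structured data concrete, here is a minimal Python sketch that turns free-text log lines into structured records. The log format, the field names, and the sample lines are all assumptions invented for illustration; real logs vary widely, which is exactly the variety problem described above.

```python
import re
from datetime import datetime

# Hypothetical log format (timestamp, level, free-text message),
# invented for illustration only.
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)"
)


def parse_log_line(line):
    """Turn one free-text log line into a structured record.

    Returns None when the line does not follow the assumed format.
    """
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the timestamp string into a typed value.
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%Y-%m-%d %H:%M:%S"
    )
    return record


# Made-up sample input: two lines that fit the format, one that does not.
raw_lines = [
    "2024-10-01 09:15:02 INFO user 17 logged in",
    "2024-10-01 09:15:07 ERROR sensor 4 reading out of range",
    "this line has no recognizable structure",
]

records = [parse_log_line(line) for line in raw_lines]
structured = [r for r in records if r is not None]
print(len(structured), "of", len(raw_lines), "lines parsed")
```

Lines that do not match the assumed pattern are returned as None rather than forced into a table, which hints at why a fixed relational schema struggles with this kind of data.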
Big Data analysis includes different types of data:
◦ Various formats, types, and structures
◦ Text, numerical, images, audio, video, sequences, time series, social media data, multi-dimensional arrays, etc.
◦ Static data vs. streaming data

A single application can generate or collect many types of data. To extract knowledge ➔ all these types of data need to be linked together.

The three Vs: Volume (data quantity), Velocity (data speed), Variety (data types).

"Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."

Complicated (intelligent) analysis of data may make small data "appear" to be "big".

Bottom line: any data that exceeds our current processing capability can be regarded as "big".

Big Data is any data that is expensive to manage and hard to extract value from:
◦ Volume: the size of the data
◦ Velocity: the latency of data processing relative to the growing demand for interactivity
◦ Variety and complexity: the diversity of sources, formats, quality, and structures

Sources of data
◦ Data from the internet
◦ Data from military corporations
◦ Hospital data
◦ NASA data
◦ And so on…

Data comes in several forms:
◦ Relational data (tables/transaction/legacy data)
◦ Text data (Web)
◦ Semi-structured data (XML)
◦ Graph data: social network, Semantic Web (RDF), …
◦ Streaming data

What is data science?
◦ An area that manages, manipulates, extracts, and interprets knowledge from tremendous amounts of data.
◦ Data science (DS) is a multidisciplinary field of study whose goal is to address the challenges in big data.
◦ Data science principles apply to all data, big and small.
◦ Data science is the science that uses computer science, statistics, machine learning, visualization, and human-computer interaction to collect, clean, integrate, analyze, visualize, and interact with data to create data products.
◦ Simply put, data science is an umbrella of several techniques used for extracting information and insights from data.
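The collect, clean, integrate, and analyze steps named above can be sketched on a toy scale in Python using only the standard library. Everything here is an assumption made for illustration: the purchase and customer data are made up, and the function names simply mirror the steps. It also shows relational (CSV) and semi-structured (XML) data being linked together through a shared identifier.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# All data below is made up for illustration.
CSV_PURCHASES = """customer_id,amount
1,20.5
2,
1,4.5
3,10.0
"""

XML_CUSTOMERS = """<customers>
  <customer id="1"><city>Alexandria</city></customer>
  <customer id="2"><city>Cairo</city></customer>
  <customer id="3"><city>Alexandria</city></customer>
</customers>"""


def collect():
    """Collect: read relational (CSV) rows and semi-structured (XML) records."""
    purchases = list(csv.DictReader(io.StringIO(CSV_PURCHASES)))
    customers = ET.fromstring(XML_CUSTOMERS).findall("customer")
    return purchases, customers


def clean(purchases):
    """Clean: drop rows with a missing amount and convert strings to numbers."""
    return [
        {"customer_id": int(row["customer_id"]), "amount": float(row["amount"])}
        for row in purchases
        if row["amount"]
    ]


def integrate(purchases, customers):
    """Integrate: link each purchase to the customer's city via the shared id."""
    city_by_id = {int(c.get("id")): c.findtext("city") for c in customers}
    return [dict(p, city=city_by_id[p["customer_id"]]) for p in purchases]


def analyze(linked):
    """Analyze: compute total spending per city."""
    totals = defaultdict(float)
    for row in linked:
        totals[row["city"]] += row["amount"]
    return dict(totals)


purchases, customers = collect()
totals = analyze(integrate(clean(purchases), customers))
print(totals)
```

The same steps apply to small and big data alike; what changes at scale is the tooling needed to run each step, not the order of the steps.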