Data Streaming Lecture Notes PDF
DMU
Dr. Mohamed Moustafa
Summary
These lecture notes cover data streaming, along with class rules, course assessment details, and a few suggestions for students. The notes discuss data challenges, what data streaming is, and the role of IoT in data streaming. They also cover real-time data analytics and the challenges of data streaming.
Full Transcript
Data Streaming
Dr. Mohamed Moustafa
Associate Professor, Computers and Artificial Intelligence
CIO – DMU, SAS, MS
10/17/2024

Class Rules
You can do anything except:
- Make noise (chatting, singing, …)
Feel free to interrupt me if you have questions.
According to university policy, attendance must be taken.
Important: you need at least 80% attendance to be able to sit for the final exam.

Course Assessment
Provisional, depending on the situation:
- Final exam: 50%
- Midterm: 20%
- Quizzes: 10%
- Project: 20%; 2-3 members per group; a report and a presentation are required.
Important: cheating and plagiarism will receive no marks.

A few suggestions…
- Your final grade is based on points, not on an accumulation of grades. You start the class with zero points and earn your way to your final grade.
- If you have an issue or problem, communicate: send me an email.
- If you know you are not going to meet the deadline for a quiz or assignment, email me BEFORE the deadline.

Agenda
- Data challenges …
“Data Everywhere, You have data, You have Everything”

What is Data Streaming?
- A continuous flow of data generated by sources such as sensors, cameras, or IoT devices.
- Streaming data keeps flowing with no discrete beginning or end.
- E.g., data from environment sensors, body sensors, surveillance cameras, log files, transactions, …
- A streaming data source emits data records continuously rather than in batches.
- Most streaming data sources send data continuously in small sizes (often kilobytes) as the data is generated.
- Usually, the data needs to be processed on the fly.

[Figure: Internet of Things application areas: Smart Cities and Homes, Connected Customer, Communications, Surveillance, Connected Car / Transportation, Building Management, Energy, Agriculture, Manufacturing, Finance / Insurance, Retail, Health Care. Copyright © SAS Institute Inc. All rights reserved.]
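
As a minimal sketch of what processing "on the fly" can look like (see "What is Data Streaming?"), the Python below consumes readings one at a time and keeps only a running count and mean, instead of storing a whole batch first. The `sensor_stream` generator and its values are hypothetical stand-ins for a real source; they are not from the slides.

```python
from typing import Iterable, Iterator


def sensor_stream() -> Iterator[float]:
    """Hypothetical source: yields one temperature reading at a time."""
    for reading in [21.0, 21.5, 22.0, 23.5]:  # made-up sample values
        yield reading


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all readings seen so far, using constant memory."""
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        # Incremental update: no history of past readings is stored.
        mean += (value - mean) / count
        yield mean


if __name__ == "__main__":
    for current_mean in running_mean(sensor_stream()):
        print(f"running mean: {current_mean:.2f}")
```

Because the state is just a count and a mean, the same loop would work on a stream with no defined end, which is the situation the slide describes.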
IoT and Its Role in Data Streaming
IoT: a network of interconnected devices that communicate with each other, generating data streams.
Examples:
- Connected car: real-time data from GPS, sensors, and traffic systems.
- Healthcare: data streaming from wearables and remote health monitoring devices.
- Smart cities: surveillance systems and traffic monitoring.

Importance of Data Streaming in IoT
- Real-time decision-making: IoT devices generate time-sensitive data that must be processed immediately (e.g., medical alerts, security systems).
- Scalability challenges: IoT networks generate vast volumes of streaming data, which require scalable processing solutions.

Most IoT Data Remains Unused
- Data from sensors in manufacturing can provide the information needed to detect conditions requiring attention.
- Sensors are pervasive: from wearables to rocket engines.
- Sensor data remains largely untapped (it is not being used for prediction and optimization).
- Imagine a structure that would allow sensor data to be processed as it is produced. Therein lies an opportunity.

Traditional Analytics at Rest
[Figure: pipeline with the stages Data, Data Storage, ETL, Deploy, and Alerts / Reports / Decisioning]

Streaming Analytics
Stream – Understand – Act
[Figure: the at-rest pipeline (Data, Data Storage, ETL, Deploy, Alerts / Reports / Decisioning) together with a streaming path labeled Streaming Data, Streaming Model Execution, Deploy, Enrich, Store]

Challenges in IoT Data Streaming
- Volume and variety: the large volume of diverse data from devices (e.g., cameras, environmental sensors).
- Latency and bandwidth: IoT data often requires real-time processing, which demands low-latency network connections.
- Security and privacy: safeguarding sensitive data streaming from devices (e.g., healthcare data).
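
The "Stream – Understand – Act" idea from the Streaming Analytics slide can be sketched roughly as follows: each reading is scored as it arrives, an alert is raised immediately if needed, and only then is the enriched record stored. Everything named here (`Reading`, `score`, `process`, and the heart-rate limit of 120) is an illustrative assumption, and the simple rule stands in for whatever deployed model a real system would use.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

HEART_RATE_LIMIT = 120  # hypothetical threshold, for illustration only


@dataclass
class Reading:
    device_id: str
    heart_rate: int


def score(reading: Reading) -> bool:
    """Understand: a simple rule standing in for a deployed model."""
    return reading.heart_rate > HEART_RATE_LIMIT


def process(stream: Iterable[Reading],
            store: List[Tuple[Reading, bool]]) -> List[str]:
    """Score each reading on arrival, act on it, then store the enriched record."""
    alerts = []
    for reading in stream:                      # Stream
        needs_attention = score(reading)        # Understand
        if needs_attention:                     # Act, before anything is stored
            alerts.append(
                f"ALERT {reading.device_id}: heart rate {reading.heart_rate}"
            )
        store.append((reading, needs_attention))  # Enrich and store
    return alerts


if __name__ == "__main__":
    history: List[Tuple[Reading, bool]] = []
    wearable_data = [Reading("w1", 80), Reading("w2", 135)]  # made-up readings
    for alert in process(wearable_data, history):
        print(alert)
```

In the at-rest pipeline the alert would only be produced after storage and ETL; here the decision happens while the data is still in motion.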
Characteristics of Data Streams
Unbounded data
- Conceptually: an infinite, ever-growing set of data items/events
- Practically: a continuous stream of data that needs to be processed/analyzed
Push model
- The source controls data production and procession
- Publish/subscribe model
Concept of time
- Often need to reason about when data is produced and when processed data should be output
- Processing time, ingestion time, event time

Real-time Data Analytics

Data Value Continuum
- Data exists on a time continuum.
- The “things” we do with data correlate strongly with its age.
- Over this timeline, the value of data shifts from the individual item to the aggregate.

Data Value Chain

Data Streaming
- Traditionally, data is moved in batches.
- Batch processing handles large volumes of batched data with long latency.
- For many kinds of streaming data, batch processing cannot be used: the data is either prohibitively large to store and process in batch, or it would be stale by the time it is processed.
- Data streaming (or data stream processing, DSP) is the processing of streaming data on the fly (visualizing, summarizing, analytics, …).

Benefits of Data Streaming
- Good for time series analysis
- Well suited to processing IoT data streams
- Can be used for real-time aggregation, correlation, filtering, or sampling
- Enables the analysis of data in real time to gain insights into a wide range of activities
- May be accompanied by planned actions based on the results of real-time analytics
- Can feed back to improve the effectiveness of future monitoring, analytics, and actions

Patterns that Drive Most Streaming Use Cases

Event-Driven Architecture in IoT Streaming
An architecture where systems respond to events (e.g., a temperature sensor reading exceeding a threshold).
Components:
- Producers: IoT devices generating events.
- Consumers: systems or applications processing the event stream.
Example: smart homes reacting to temperature changes by adjusting thermostats in real time.

Static vs Streaming
- In static data computation, questions are asked of static data.
- In streaming data computation, data are continuously evaluated by static questions.

Batch vs. Real-time Processing

Challenges of Streaming
- Streaming data management
- May have only one chance to examine the data
- Arbitrary and interactive exploration
- Real-time analytics
- Recency matters: alerts on recent changes
- Availability

Challenges of DSP
- Streaming architecture and pipeline
- Streaming data ingestion and handling (adaptors, data formats, schema, cleaning, flow control, …)
- Stream processing algorithm design, testing, validation, deployment, and life-cycle management
- Scalability in volume and velocity
- Elastic processing and management of load variations
- Fault tolerance and processing guarantees
- Self-adaptation at run time to pattern shifts
- Automatic feedback and learning
- Security and privacy

Future Trends in IoT and Data Streaming
- Edge computing: moving data processing closer to IoT devices to reduce latency and bandwidth usage.
- 5G networks: enhanced IoT streaming capabilities with faster, more reliable connectivity.
- AI and streaming: real-time AI models applied to IoT data streams for predictive analytics and automation.
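
Returning to the event-driven architecture described earlier, the producer/consumer split and the smart-home thermostat example can be sketched as below. This is a simplified illustration under stated assumptions: `TemperatureEvent`, `produce`, `consume`, and the 26 °C threshold are hypothetical names and values, and Python's standard `queue.Queue` stands in for the channel between producers and consumers. The two sides run one after the other here for simplicity; in practice they would run concurrently.

```python
import queue
from dataclasses import dataclass
from typing import Iterable, List

THRESHOLD_C = 26.0  # hypothetical comfort threshold, for illustration only


@dataclass
class TemperatureEvent:
    room: str
    celsius: float


def produce(events: Iterable[TemperatureEvent], channel: queue.Queue) -> None:
    """Producer: a device publishing events onto the channel."""
    for event in events:
        channel.put(event)
    channel.put(None)  # sentinel marking the end of this demo stream


def consume(channel: queue.Queue) -> List[str]:
    """Consumer: applies the same fixed question to every arriving event."""
    actions = []
    while True:
        event = channel.get()
        if event is None:  # end of the demo stream
            break
        if event.celsius > THRESHOLD_C:
            actions.append(f"cool {event.room}")
    return actions


if __name__ == "__main__":
    channel: queue.Queue = queue.Queue()
    readings = [  # made-up readings
        TemperatureEvent("kitchen", 24.0),
        TemperatureEvent("bedroom", 27.5),
    ]
    produce(readings, channel)
    print(consume(channel))
```

The consumer also illustrates the "Static vs Streaming" point: the question (is the temperature above the threshold?) stays fixed while the data flows past it.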