Data Analytics Notes on All Topics PDF
Document Details
Uploaded by AmusingTundra9312
Imam Muhammad ibn Saud Islamic University
Summary
This document provides an overview of data analytics, encompassing key concepts like data rights, development tools, and different types of analysis. It covers topics such as big data characteristics, Hadoop components, and social media data analysis. The document also discusses the ETL process.
Full Transcript
- GDPR (General Data Protection Regulation): EU regulation that protects the personal data and privacy of individuals.
- Key Rights: Right of access, right to rectification, right to erasure (right to be forgotten), and right to data portability.
- CCPA (California Consumer Privacy Act): California law giving residents more control over their personal data, including the right to know what personal information businesses hold about them and the right to have it deleted.

- Mobile Development Platforms: iOS (Swift, Objective-C) and Android (Java, Kotlin).
- Development Tools: Android Studio for Android apps, Xcode for iOS apps.
- APIs: Used to enable communication between the app and backend services or other apps.

- Cloud Benefits: Cost efficiency, scalability, flexibility, and disaster recovery.
- Cloud Deployment Types: Public, private, hybrid, and multi-cloud.
- Cloud Service Models: IaaS (Infrastructure as a Service), PaaS (Platform as a Service), SaaS (Software as a Service).

- Definition: Blockchain is a decentralized ledger technology that records transactions across many computers.
- Blockchain Applications: Cryptocurrency, supply chain management, identity verification, and smart contracts.
- Blockchain Challenges: Scalability, energy consumption, and regulatory concerns.

- Characteristics: Big Data is defined by its volume, velocity, and variety.
  - Volume: The enormous amounts of data generated daily. Traditional databases struggle to handle such large volumes, so specialized technologies are needed.
  - Velocity: The speed at which data is created and processed. Data arrives at very high rates and requires real-time or near-real-time processing.
  - Variety: The different types of data, including structured data (like databases), semi-structured data (like XML), and unstructured data (like text, images, and videos). This variety makes data integration and analysis considerably harder.

- Hadoop: An open-source framework for the distributed storage and processing of large datasets across clusters of computers. It uses a simple programming model (MapReduce) to handle massive amounts of data efficiently.
- Components:
  - Hadoop Distributed File System (HDFS): Stores large files across multiple machines.
  - MapReduce: A programming model for processing and generating large datasets.
  - YARN (Yet Another Resource Negotiator): Manages and schedules resources in the Hadoop cluster.

- Definition: Data analytics involves inspecting, cleansing, transforming, and modeling data to discover useful information, draw conclusions, and support decision-making.
- Types of Analytics:
  - Descriptive Analytics: What has happened? (e.g., sales reports)
  - Diagnostic Analytics: Why did it happen? (e.g., root cause analysis)
  - Predictive Analytics: What is likely to happen in the future? (e.g., forecasting)
  - Prescriptive Analytics: What actions should be taken? (e.g., recommendations based on data)

- Social Media: Platforms like Facebook and Twitter generate vast amounts of user-generated content, which can be analyzed for trends, sentiments, and behaviors.
- IoT Devices: Devices like smart thermostats and wearable fitness trackers produce continuous streams of data that can be analyzed for patterns and insights.
- Transaction Records: E-commerce and retail businesses create large volumes of transaction data, which can provide insights into customer behavior, inventory management, and sales trends.

- Extract: Retrieving data from various sources (databases, flat files, APIs).
- Transform: Cleaning and converting data into a suitable format for analysis (e.g., normalizing, aggregating, or enriching data).
- Load: Importing the transformed data into a data warehouse or data lake for analysis.
- Importance: ETL processes are crucial for integrating data from multiple sources and preparing it for analytics, ensuring that data is reliable and actionable.
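
The Extract, Transform, and Load steps described in these notes can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the sample CSV text stands in for a flat-file source, the `customer_totals` table name and the function names are hypothetical, and an in-memory SQLite database stands in for a data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for a flat-file source.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,4.00
3,alice,
4,Bob,5.50
"""

def extract(source_text):
    """Extract: read raw rows from the flat-file source."""
    return list(csv.DictReader(io.StringIO(source_text)))

def transform(rows):
    """Transform: clean, normalize, and aggregate the raw rows."""
    totals = {}
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            # Cleaning: skip rows with a missing amount.
            continue
        # Normalizing: trim whitespace and lowercase customer names.
        customer = row["customer"].strip().lower()
        # Aggregating: sum the order amounts per customer.
        totals[customer] = totals.get(customer, 0.0) + float(amount)
    return sorted(totals.items())

def load(records, connection):
    """Load: write the transformed records into the target table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO customer_totals VALUES (?, ?)", records
    )
    connection.commit()

def run_etl(source_text, connection):
    """Run the three stages in order."""
    load(transform(extract(source_text)), connection)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(RAW_CSV, conn)
    for row in conn.execute("SELECT customer, total FROM customer_totals"):
        print(row)
```

Keeping each stage in its own function mirrors the way the notes separate the three steps, and makes it possible to swap the source or the target without touching the transformation logic.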
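
The MapReduce programming model mentioned in the Hadoop notes can be sketched in plain Python as a word count. This is a single-process illustration of the map, shuffle, and reduce phases only; it does not use Hadoop's actual API, and the function names and sample input are assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(documents):
    """Run map over every split, then shuffle and reduce the results."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))

if __name__ == "__main__":
    # Each string stands in for one input split stored on a different machine.
    splits = ["big data big clusters", "data flows fast"]
    print(word_count(splits))
```

In a real cluster the map calls would run in parallel on the machines holding each split, and the shuffle would move data across the network; the sketch keeps only the logical flow.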