Big Data part 1.pdf

Document Details

Uploaded by RapidPrologue

2024

Tags

big data data analytics information technology

Full Transcript

Part 1
Understanding the fundamentals of Big Data
Muzaffer Bogazliyan, M.Sc.
12.04.2024

Structure of the course
12.04.2024 What is big data
19.04.2024 Uses of big data in practice
26.04.2024 Inductive vs. deductive reasoning
12.04.2024 Data understanding process
19.04.2024 Ensuring data quality
26.04.2024 Elements of descriptive statistics
03.05.2024 Statistics (data) and manipulation

What is data? (1)
▪ Information units (like facts or statistics)
▪ Data is not depleted after multiple uses
▪ Data can be collected from anywhere
▪ Data is everywhere and everyone is familiar with it

What is data? (2)
In the context of IT → binary digital form → 0 or 1

Every day, you generate lots of data
▪ You watch a video on YouTube, like it, and share it with a few friends
▪ You then buy food and drinks online
▪ After that, you search for cool places to vacation after the semester
▪ You open Netflix and watch your favorite web series
▪ You pay your phone and electricity bills
▪ You update your details on a health portal to apply for insurance
▪ A friend calls you up to like their content on Instagram, so you log into your account and post comments on a few of their photos
▪ Then, you book your flight to your parents' place for next weekend
With all these transactions, you keep generating data and sharing personal information about yourself and the people you are related to.

BSM 404

How do you recognize whether it is Big Data or not?
In short: if you have trouble opening or storing your data, you are probably dealing with Big Data.

There are billions of internet and social media users
January 2024:
▪ 5.35 billion internet users worldwide
▪ 5.04 billion social media users worldwide

Applications of (Big) data
▪ Forecast, e.g. weather forecasts, natural disasters, machine breakdowns, disease outbreaks
▪ Optimization for efficiency, e.g. traffic flows, efficiency of machine utilization, logistics (goods transport)
▪ Personalization, e.g. medicine, product recommendations (Spotify)
▪ Comfort, e.g. autonomous driving, driving assistants
▪ Intelligence, e.g. automatic translation of texts, PC games, robotics

What are concrete business uses?
Big Data is used to gain practical insights for process and revenue improvements.
▪ Cost optimization: companies are able to improve their business strategies
▪ Innovative products and services: companies are able to understand customer preferences better
▪ Better, quicker decision-making: faster insights and solutions in rapid company decision-making

Why is Big Data so important?
▪ Gain insights (about customers, products, processes…)
▪ Maximize efficiency (of processes, products, services…)
▪ Make business (use data for daily operations)
▪ Generate new business (using massive data in new forms to generate value)

Is Big Data always a good thing?
Low data quality makes it impossible to trust analysis results: garbage in, garbage out.

Where do we get data?
"External" data:
▪ Open Data, e.g. Wikipedia
▪ Hidden Web, e.g. search via a search engine
▪ Government data, e.g. unemployment statistics
▪ Scientific data, e.g. data from a telescope
▪ Publications, e.g. newspaper articles
▪ Historical data, e.g. weather data
▪ …
"Internal" data (company data):
▪ Master data of the company
▪ Transactional data
▪ Sensor data
▪ …

The 5 V's of Big Data
5 V's (Volume, Velocity, Variety, Veracity, and Value)
→ Volume: data size
→ Velocity: data production speed
→ Variety: data in different forms
→ Veracity: data accuracy (trustworthiness)
→ Value: (monetary) value of data

Key characteristics of Big Data

Volume of data = Scope of the data
Big Data: data so large that conventional methods cannot be applied.
Large amounts of data cause problems because operations become more complex:
– Entering new data
– Searching for data
– Sorting data

Velocity of data = How quickly do I receive data?
▪ Data must be processed quickly
▪ Data storage is secondary
▪ Quick reaction is necessary
Examples:
– Stock exchange
– Banks (validation of transfers)
– Autonomous driving

Variety of data = Diversity of data
– Different data models
– Different orders of magnitude
– Different languages (e.g. German, English, Spanish)
– Different standards (e.g. formats)
This creates technical and semantic challenges.
Example: different forms of data
▪ Text data
▪ Sensor data
▪ Tables
▪ …

Veracity of data = Quality of data
▪ Correctness
▪ Completeness
▪ Consistency
▪ Timeliness (being up to date)

Value of data
▪ Monetary reward of collecting and analyzing data
▪ There is no limit to how valuable data can be (in monetary terms)

The amount of data and information is not directly correlated with knowledge generation. But the demand for data scientists will keep growing.

Data aggregation (1): aggregating huge amounts of data into a single value such as the mean, mode, or standard deviation
Data classification (2): grouping "same" objects
Data clustering (3): grouping "similar" objects
Data mining (4): searching huge amounts of data to recognize trends or rules
Machine learning (5): learning with data

What is an algorithm?
A guide to solving a problem through a structured approach, like a cooking recipe.
An algorithm must fulfil five properties:
▪ Executability: each step must be executable
▪ Determinism: there is always only one next step to consider
▪ Determinacy: the algorithm always delivers the same result with the same input
▪ Finiteness: the number of steps in the algorithm must be finite
▪ Termination: the algorithm itself must also end and deliver a result

Algorithm: How to go to work

Who has published the statistics or the data?
▪ Government institution?
▪ Private company?
▪ NGO?
▪ Lobby institution?
▪ Private person?
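The statement that data in IT exists in binary digital form (0 or 1) can be made concrete with a small sketch. The helper name `to_bits` and the sample text "Hi" are assumptions chosen for illustration; they do not come from the slides. The sketch assumes the text is stored as UTF-8.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as groups of eight 0/1 digits."""
    # Each byte is formatted as an 8-digit binary number, padded with zeros.
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


# Hypothetical example input: the two characters "H" and "i".
bits = to_bits("Hi")
print(bits)  # 01001000 01101001
```

Every like, comment, or booking from the everyday examples is ultimately stored as such sequences of zeros and ones.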
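Data aggregation, as described above, reduces many values to a single value such as the mean, mode, or standard deviation. The following is a minimal sketch using Python's standard `statistics` module. The function name `aggregate` and the sample numbers are invented for illustration, and the population standard deviation (`pstdev`) is an assumption, since the slides do not say which variant is meant.

```python
import statistics


def aggregate(values):
    """Reduce a list of numbers to a few summary values."""
    return {
        "mean": statistics.mean(values),     # arithmetic average
        "mode": statistics.mode(values),     # most frequent value
        "stdev": statistics.pstdev(values),  # population standard deviation (assumed variant)
    }


# Hypothetical sample: minutes of video streamed per day by one user.
minutes = [30, 45, 45, 60, 20]
summary = aggregate(minutes)
print(summary["mean"], summary["mode"])  # 40 45
```

Five raw values become three summary numbers here; with billions of records the same idea applies, only the computation is typically distributed across many machines.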
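The five properties of an algorithm listed above can be checked against a small, well-known example. Euclid's algorithm for the greatest common divisor is used here purely as an illustration; it is not taken from the slides, and the comments map its steps to the properties as a sketch, not as a formal proof.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor of two non-negative integers (Euclid's algorithm)."""
    while b != 0:
        # Executability and determinism: exactly one well-defined next step.
        # Finiteness: b strictly decreases and never drops below 0.
        a, b = b, a % b
    # Termination: the loop ends when b reaches 0 and a result is delivered.
    return a


# Determinacy: the same input always produces the same result.
result = gcd(48, 18)
print(result)  # 6
```

A cooking recipe or a route to work can be judged the same way: every step must be doable, the next step must be unambiguous, and the procedure must end with a result.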
