Data, Database, & Data-Driven - Lecture Notes PDF
Universitas Padjadjaran
2024
Anindya Apriliyanti Pravitasari
Summary
These are lecture notes discussing data, databases, and data-driven concepts. The notes also cover big data and its characteristics, including volume, variety, velocity, value, veracity, variability, and virality. The slides are presented by Dr. Anindya Apriliyanti Pravitasari and Sinta Septi Pangastuti, M.Si., and come from Universitas Padjadjaran.
Full Transcript
Topic 1
Lecturers: Dr. Anindya Apriliyanti Pravitasari; Sinta Septi Pangastuti, M.Si.
Undergraduate Statistics Study Program, Even Semester (3 SKS credit units)
Material prepared by: Dr. Anindya Apriliyanti Pravitasari, M.Si.; Sinta Septi Pangastuti, S.Si., M.Si.
Instructors, Even Semester 2024/2025: Dr. Anindya Apriliyanti Pravitasari, M.Si.; Sinta Septi Pangastuti, S.Si., M.Si.

“Numbers have an important story to tell. They rely on you to give them a voice.” – Stephen Few
“Data is a gold mine.” – Bambang Brodjonegoro, Professor at the Faculty of Economics, Universitas Indonesia (FE UI)
“Data is a new kind of wealth for our nation; data is now more valuable than oil.” – Joko Widodo, former President of the Republic of Indonesia
“Whoever controls data will control the world.” – Jaron Lanier, American computer scientist, visual artist, computer philosophy writer, technologist, futurist, and composer of contemporary classical music

What is DATA?
Data is a representation of facts that are recorded in the form of numbers, letters, symbols, text, images, sounds, or a combination of these.

BIG DATA vs. SMALL DATA
▪ Goal. Big data: usually designed with a goal in mind, but the goal is flexible and the questions posed are protean (constantly changing). Small data: usually designed to answer a specific question or serve a particular goal.
▪ Location. Big data: typically spread through electronic space, across multiple Internet servers located anywhere on earth. Small data: typically contained within one institution, often on one computer, sometimes in one file.
▪ Data structure. Big data: absorbs everything from structured to unstructured data, and the resource may cross multiple disciplines. Small data: ordinarily contains highly structured data.
▪ Longevity. Big data: projects typically contain data that must be stored in perpetuity and used for a very long time. Small data: when the data project ends, the data is kept for a limited time.
▪ Analysis. Big data: with few exceptions, such as analyses run on supercomputers or with parallel computing, the data cannot all be analyzed at once and is ordinarily analyzed in incremental steps. Small data: in most instances, all of the data in the project can be analyzed together, all at once.
Source: Jules J. Berman (2013), Principles of Big Data.

The “V”s of BIG DATA
▪ VOLUME: large amounts of data.
▪ VARIETY: structured, semi-structured, and unstructured data.
▪ VELOCITY: the content of the data is constantly changing (high speed).
▪ VALUE: the business value of the data collected.
▪ VERACITY: the degree to which big data can be trusted.
▪ VARIABILITY: the ways big data can be used and formatted.
▪ VIRALITY: how virally the data spreads (the more viral the data, the bigger the data).

Where does it come from?
With the IoT, every human activity is recorded, amounting to billions of gigabytes of data every day. IoT (Internet of Things) is a concept that aims to extend the benefits of continuous internet connectivity; capabilities such as data sharing and remote control also extend to objects in the physical world. (Janssen, Cori - Internet of Things: IoT)

What is a DATABASE?
DATABASE (Indonesian: basis data) = BASIS + DATA
▪ DATA: a representation of facts recorded in the form of numbers, letters, symbols, text, images, sounds, or a combination of these.
▪ BASIS: a base, a gathering place, a nesting place, a warehouse.

BASIS DATA (database):
▪ A collection of interrelated groups of data (records) organized so that they can later be reused quickly and easily.
▪ A collection of interrelated data stored together in such a way, and without unnecessary repetition (redundancy), as to meet various needs.
▪ A collection of interrelated files/tables/records stored on a particular storage medium.

DATABASES for BIG DATA: CLOUD SYSTEM

The paradigm shift has an impact on three areas (Source: Notodiputro (2019) & Suhartono (2020)):

Revolution in Data Measurement and Handling
▪ Data takes many forms: numbers, text, images, sound, video, and combinations of these.
▪ Data handling changes: storage (data is large and multi-structured, so it moves to cloud systems), processing, and use of the data.

Revolution in Data Analysis
Classical Methods vs. Big Data Analytics:
▪ Confirmative vs. explorative
▪ Small data set vs. large data set
▪ Small number of variables vs. large number of variables
▪ Deductive (no prediction) vs. inductive
▪ Numerical data vs. numerical and non-numerical data
▪ Clean data vs. data that needs cleaning

Shift in the Research Paradigm
▪ Research no longer starts from a hypothesis that is then confirmed with data; it starts from EXPLORATION of the data to obtain as much information as possible.
▪ Research shifts from "SEARCH" (directed searching) to "DISCOVERY" (more opportunistic).
▪ From THEORY DRIVEN (MODEL is the KING) to DATA DRIVEN (DATA is the KING).

DATA-DRIVEN?
A data-driven approach involves gathering and analyzing data to make informed decisions.
But before you can do that, you need a system in place that answers some crucial questions about the data:
▪ What kinds of decisions do you need to make?
▪ How much data do you need before you can say it is enough to make a decision?
▪ What kind of data?
▪ Where is this data?
▪ How long do you need to collect it, and how often?
▪ How will you make sense of the data you collect?

Why Learn DATABASES?
The data pyramid, from the base upward:
▪ Data collection: instrumentation, logging, sensors, external data, user-generated content
▪ Data storage: reliable data flow, infrastructure, pipelines, ETL, structured and unstructured data storage
▪ Data transformation & exploration: cleaning, anomaly detection, preparation
▪ Data aggregation: analytics, metrics, segments, aggregates, features, training data
▪ Learning: A/B testing, experimentation, ML algorithms
▪ AI, deep learning

Who handles each layer depends on company size:
▪ Startups: the DATA SCIENTIST covers all layers.
▪ Medium-size companies: the work is split, roughly from the base upward, among the SOFTWARE ENGINEER, the DATA ENGINEER, and the DATA SCIENTIST.
▪ Large-size companies: the work is split, roughly from the base upward, among the SOFTWARE ENGINEER, the DATA ENGINEER, the DATA SCIENTIST, and the ML ENGINEER.
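The lower layers listed above (data collection, data storage, data transformation & exploration, data aggregation) can be sketched as one small Python pipeline. This is a hypothetical illustration, not part of the original slides: the log format, the `users`/`events` tables, the device segments, and the 1.5 × IQR outlier rule are all assumptions chosen for the example. It also echoes the database definition earlier (interrelated data stored together without unnecessary redundancy): each user's device is stored once in its own table and linked to events by `user_id`, instead of being repeated on every event row.

```python
import sqlite3
import statistics

# Hypothetical collected data (collection layer): one "user_id,page,duration_seconds" line per event.
RAW_LOG = [
    "1,home,12.5",
    "2,home,8.0",
    "1,cart,30.0",
    "3,home,not_a_number",  # malformed record, dropped during cleaning
    "2,cart,25.0",
    "3,cart,9000.0",        # implausibly long visit, flagged as an anomaly
    "4,home,15.0",
    "4,cart,20.0",
]

# Hypothetical user attributes, stored once per user rather than repeated on every event.
USERS = [(1, "mobile"), (2, "desktop"), (3, "mobile"), (4, "desktop")]


def build_database(conn):
    """Storage layer: two interrelated tables linked by user_id."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, device TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE events ("
        " event_id INTEGER PRIMARY KEY,"
        " user_id INTEGER NOT NULL REFERENCES users(user_id),"
        " page TEXT NOT NULL,"
        " duration REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?)", USERS)


def parse_and_clean(lines):
    """Transformation layer: drop malformed rows, then flag outliers with the 1.5 * IQR rule."""
    parsed = []
    for line in lines:
        user_id, page, duration = line.split(",")
        try:
            parsed.append((int(user_id), page, float(duration)))
        except ValueError:
            continue  # skip records whose fields cannot be converted to numbers
    q1, _, q3 = statistics.quantiles([row[2] for row in parsed], n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    kept = [row for row in parsed if low <= row[2] <= high]
    flagged = [row for row in parsed if not low <= row[2] <= high]
    return kept, flagged


def aggregate_by_device(conn):
    """Aggregation layer: a per-segment metric computed with a JOIN across both tables."""
    query = (
        "SELECT u.device, COUNT(*), AVG(e.duration) "
        "FROM events AS e JOIN users AS u ON e.user_id = u.user_id "
        "GROUP BY u.device ORDER BY u.device"
    )
    return conn.execute(query).fetchall()


def run_pipeline(raw_lines):
    """Run collection -> storage -> cleaning -> aggregation on an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        build_database(conn)
        kept, flagged = parse_and_clean(raw_lines)
        conn.executemany(
            "INSERT INTO events (user_id, page, duration) VALUES (?, ?, ?)", kept
        )
        conn.commit()
        return aggregate_by_device(conn), flagged
    finally:
        conn.close()


if __name__ == "__main__":
    summary, flagged = run_pipeline(RAW_LOG)
    print(summary)  # [('desktop', 4, 17.0), ('mobile', 2, 21.25)]
    print(flagged)  # [(3, 'cart', 9000.0)]
```

Running the script prints the average visit duration per device segment and the record flagged as an anomaly. The IQR rule is only one simple choice for anomaly detection; a real pipeline would pick its cleaning rules and thresholds based on the data at hand.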