Big Data PDF
Summary
This presentation provides an overview of big data, its characteristics, and uses in various industries. It describes the various types of analytics used with big data, particularly touching on descriptive and predictive analytics and the tools needed to store and process big data. The presentation also explores the importance of big data in business decisions.
Full Transcript
BIG DATA

WHAT IS BIG DATA?
Big data is a collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications.

WHAT IS BIG DATA? CONT..
▪ The data is too big, moves too fast, or doesn't fit the structures of your database architectures.
▪ The scale, diversity, and complexity of the data require new architectures, techniques, algorithms, and analytics to manage it and to extract value and hidden knowledge from it.
▪ Big data is the realization of greater business intelligence by storing, processing, and analyzing data that was previously ignored due to the limitations of traditional data management technologies.

CHARACTERISTICS OF BIG DATA: 5 V'S

WHY BIG DATA? & WHAT MAKES BIG DATA?
Key enablers for the growth of "Big Data" are:
▪ Increase of storage capacities
▪ Increase of processing power
▪ Availability of data
By one widely cited estimate, we create 2.5 quintillion bytes of data every day, and 90% of the data in the world today has been created in the last two years.

WHERE DOES DATA COME FROM?
Data comes from many quarters:
▪ Science: medical imaging, sensor data, genome sequencing, weather data, satellite feeds
▪ Industry: financial, pharmaceutical, manufacturing, insurance, online, retail
▪ Legacy: sales data, customer behavior, product databases, accounting data, etc.
▪ System data: log files, status feeds, activity streams, network messages, spam filters

WHAT IS THE IMPORTANCE OF BIG DATA?
The importance of big data lies in how you use the data you own. Data can be fetched from any source and analyzed to find answers that enable:
1) Cost reductions
2) Time reductions
3) New product development and optimized offerings
4) Smart decision making

WHAT IS THE IMPORTANCE OF BIG DATA? CONT..
Combining big data with high-powered analytics can have a great impact on your business strategy, for example:
1) Finding the root causes of failures, issues, and defects in real-time operations.
2) Generating coupons at the point of sale based on the customer's buying habits.
3) Recalculating entire risk portfolios in just minutes.
4) Detecting fraudulent behavior before it affects and puts your organization at risk.

WHO USES BIG DATA TECHNOLOGY?
▪ Banking
▪ Government
▪ Education
▪ Health Care
▪ Manufacturing
▪ Retail

STORING BIG DATA
❑ Analyzing your data characteristics
▪ Selecting data sources for analysis
▪ Eliminating redundant data
▪ Establishing the role of NoSQL
❑ Overview of Big Data stores
▪ Data models: key value, graph, document, column-family
▪ Hadoop Distributed File System
▪ HBase
▪ Hive

BIG DATA ANALYTICS
It is the process of examining big data to uncover patterns, unearth trends, and find unknown correlations and other useful information to make faster and better decisions.

WHY IS BIG DATA ANALYTICS IMPORTANT?
Big data analytics helps organizations harness their data and use it to identify new opportunities. That, in turn, leads to smarter business moves, more efficient operations, higher profits, and happier customers.

TYPES OF ANALYTICS

BUSINESS INTELLIGENCE (BI)
It is a technology-driven process for analyzing data and presenting actionable information to help executives, managers, and other corporate end users make informed business decisions.

DESCRIPTIVE ANALYSIS
Descriptive statistics is the term given to the analysis of data that helps describe, show, or summarize data in a meaningful way so that, for example, patterns might emerge from the data.
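As a small illustration of descriptive analysis, the sketch below summarizes a list of numbers with Python's standard `statistics` module. The `describe` helper and the `daily_sales` figures are hypothetical and invented for this example; they are not taken from the presentation.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Hypothetical daily sales figures, invented for illustration only.
daily_sales = [120, 135, 128, 300, 131, 126, 129]

summary = describe(daily_sales)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Here the mean (about 152.7) sits well above the median (129) because of the single value of 300. Surfacing that kind of pattern is what a descriptive summary is for.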
PREDICTIVE ANALYSIS
Predictive analytics is the branch of data mining concerned with the prediction of future probabilities and trends. The central element of predictive analytics is the predictor, a variable that can be measured for an individual or other entity to predict future behavior.

There are two types of predictive analytics:
◦ Supervised
Supervised analytics is when we know the truth about something in the past.
Example: We have historical weather data: the temperature, humidity, cloud density, and weather type (rain, cloudy, or sunny). We can then predict today's weather from today's temperature, humidity, and cloud density.
◦ Unsupervised
Unsupervised analytics is when we don't know the truth about something in the past. The result is a set of segments that we need to interpret.
Example: We want to segment students based on their historical exam scores, attendance, and record of lateness.

TOOLS USED IN BIG DATA
▪ Where is processing hosted? Distributed servers / cloud (e.g. Amazon EC2)
▪ Where is data stored? Distributed storage (e.g. Amazon S3)
▪ What is the programming model? Distributed processing (e.g. MapReduce)
▪ How is data stored and indexed? High-performance schema-free databases (e.g. MongoDB)
▪ What operations are performed on data? Analytic / semantic processing

TOP BIG DATA TECHNOLOGIES
1. Apache Hadoop
Apache Hadoop is a Java-based free software framework that can effectively store large amounts of data in a cluster. The Hadoop Distributed File System (HDFS) is the storage system of Hadoop; it splits big data and distributes it across many nodes in a cluster. It also replicates data within the cluster, thus providing high availability. Hadoop uses the MapReduce programming model for processing.

TOP BIG DATA TECHNOLOGIES CONT..
3. Apache Spark
Apache Spark is part of the Hadoop ecosystem, but its use has become so widespread that it deserves a category of its own. It is an engine for processing big data within Hadoop, and for some in-memory workloads it can be up to one hundred times faster than the standard Hadoop engine, MapReduce.

TOP BIG DATA TECHNOLOGIES CONT..
4. R
R, another open source project, is a programming language and software environment designed for working with statistics. Many popular integrated development environments (IDEs), including Eclipse and Visual Studio, support the language.
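To make the MapReduce programming model named in the Hadoop and Spark entries more concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use the Hadoop or Spark APIs; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample documents are assumptions chosen for illustration, and on a real cluster the map and reduce steps would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values collected for one key into one result."""
    return key, sum(values)


def word_count(documents):
    mapped = []
    for doc in documents:  # on a cluster, each input split is mapped on its own node
        mapped.extend(map_phase(doc))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, invented for illustration only.
documents = ["big data is big", "data moves fast"]
counts = word_count(documents)
print(counts)
```

The split into map, shuffle, and reduce is the point of the sketch: each map call looks at only one document and each reduce call at only one key, which is what lets a framework spread the work across many machines.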

APPLICATION OF BIG DATA ANALYTICS
▪ Smarter Healthcare
▪ Multi-channel Sales
▪ Homeland Security
▪ Telecom
▪ Trading Analytics
▪ Traffic Control
▪ Search Quality
▪ Manufacturing