BDA-Chapter 1.pptx
Big Data Analytics: Introduction
Dr. Manju Venugopalan
Dept of CSE, School of Computing, Bangalore

Outline
- Big Data understanding
- Analytics on Big Data
- Three V’s of Big Data
- Types of Digital Data

Big Data
- Datasets that are too large or complex for traditional data processing application software
- How big? Traditionally gigabytes (GB); currently petabytes (1 PB = 1024 terabytes) or exabytes (1 EB = 1024 petabytes)

Big Data Analytics
- Describes the process of uncovering trends, patterns and correlations in large amounts of raw data and drawing useful information from them
- Handling big data needs the appropriate techniques, tools and architecture
- It aims to solve the problems of handling large data in a better way

Three Characteristics of Big Data: the 3 V’s

The First V: Volume
- Big data is about volume: volumes of data that can reach unprecedented heights
- It’s estimated that 2.5 quintillion bytes of data are created each day
- As a result, it is now not uncommon for large companies to have terabytes, and even petabytes, of data in storage devices and on servers
- This data helps to shape the future of a company and its actions, all while tracking progress

The Second V: Velocity
- Online gaming platforms support millions of concurrent users
- Sensors generate massive log data in real time
- Clickstreams and impressions capture user behavior at millions of events per second
- High-frequency stock trading algorithms reflect market changes within microseconds
- Processing has evolved from batch, to periodic, to near real time, to real time

The Third V: Variety
- Data is extremely diverse. It isn’t just numbers, dates or strings
- Video
- Audio
- Geospatial data
- Images
- Unstructured text, including log files and social media

Digital Data
Types of digital data:
- Structured
- Semi-structured
- Unstructured

Structured Data
- Data which is in an organized form (rows and columns)
- Relationships exist between entities
- Data stored in databases is an example

Digital Data Distribution

Structured Data

Sources of Structured Data
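
As a minimal sketch of the structured data described above (rows and columns, with relationships between entities), the following uses Python's built-in sqlite3 module. The table names, column names and sample rows are hypothetical and chosen only for illustration; they do not come from the slides.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE student ("
    "student_id INTEGER PRIMARY KEY, name TEXT, "
    "dept_id INTEGER REFERENCES department(dept_id))"
)

# Every record has the same fixed columns: this is what makes the data structured.
conn.executemany("INSERT INTO department VALUES (?, ?)", [(1, "CSE"), (2, "ECE")])
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Asha", 1), (102, "Ravi", 1), (103, "Meera", 2)],
)

# The relationship between the two entities is followed with a join on dept_id.
rows = conn.execute(
    "SELECT s.name, d.name FROM student AS s "
    "JOIN department AS d ON s.dept_id = d.dept_id "
    "ORDER BY s.student_id"
).fetchall()

for student_name, dept_name in rows:
    print(student_name, dept_name)
```

Because every row follows the same schema, a query can refer to columns by name and join tables on a shared key without first inspecting each record.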

Ease with Structured Data

Semi-structured Data

Semi-structured Data
- It does not conform to any data model but has some structure
- It uses tags to segregate semantic elements
- Examples: XML and other markup languages such as HTML

Sources of Semi-structured Data

Characteristics of Semi-structured Data

Unstructured Data

Unstructured Data
- About 80-90% of data in an organization is in this format
- It does not conform to any data model
- Examples: memos, chat rooms, PPT files, images, videos, email, etc.

Sources of Unstructured Data

Issues with Terminology: Unstructured Data

Dealing with Unstructured Data
- Data Mining: Association Rule Mining, Regression Analysis, Collaborative Filtering
- Text Analytics
- Natural Language Processing
- Noisy Text Analysis
- Manual Tagging with Metadata
- POS (part-of-speech) tagging
- Unstructured Information Management Architecture (UIMA)

Q&A

Thank You
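
Supplementary sketch for the Semi-structured Data slide: the snippet below illustrates how tags segregate semantic elements even though records do not follow a fixed data model. It uses Python's standard xml.etree.ElementTree module; the XML document, tag names and values are hypothetical and not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: the second record omits the <email> tag,
# which a fixed rows-and-columns schema would not allow.
doc = """
<students>
  <student id="101">
    <name>Asha</name>
    <email>asha@example.com</email>
  </student>
  <student id="102">
    <name>Ravi</name>
  </student>
</students>
""".strip()

root = ET.fromstring(doc)

records = []
for student in root.findall("student"):
    records.append({
        "id": student.get("id"),              # attribute on the tag
        "name": student.findtext("name"),     # text inside <name>
        "email": student.findtext("email"),   # None when the tag is absent
    })

for record in records:
    print(record)
```

The tags identify what each value means, so fields can be extracted by name, but the code has to tolerate missing elements because nothing forces every record to have the same shape.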