Big Data Analytics - Module 1.pptx
Document Details
Uploaded by EnoughIsland
Full Transcript
Big Data Analytics Module 1

Definition
Big data analytics is the use of advanced analytic techniques against very large, diverse data sets that include structured, semi-structured and unstructured data, from different sources, and in different sizes. Analysis of big data allows analysts, researchers and business users to make better and faster decisions using data that was previously inaccessible or unusable.

Introduction
Businesses can use advanced analytics techniques such as text analytics, machine learning, predictive analytics, data mining, statistics and natural language processing to gain new insights from previously untapped data sources, independently or together with existing enterprise data. Companies like Amazon and Google are masters at analysing big data, and they use the resulting knowledge to gain a competitive advantage.

Types of Big Data Analysis

Basic analytics
Basic analytics can be used to explore the data when you are not sure what you have but think something in it is of value. It is often used when you have large amounts of disparate data.
Slicing and dicing: Breaking data down into smaller sets that are easier to explore.
Basic monitoring: Monitoring large volumes of data in real time.
Anomaly identification: Identifying anomalies in the data, such as an event where the actual observation differs from what was expected, because they may indicate that something is going wrong with your organization, manufacturing process, and so on.

Advanced analytics
Advanced analytics provides algorithms for complex analysis of either structured or unstructured data. It includes sophisticated statistical models, machine learning, neural networks, text analytics and other advanced data-mining techniques. Advanced analytics can be deployed to find patterns in data and for prediction, forecasting, and complex event processing.
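
To make the anomaly identification idea from basic analytics more concrete, here is a minimal Python sketch. It assumes that "expected" means the average of the observations and that "differs" means lying more than a chosen number of standard deviations away. The function name find_anomalies, the threshold of 2.0 and the sample readings are illustrative assumptions, not part of the original slides.

```python
from statistics import mean, stdev

def find_anomalies(observations, threshold=2.0):
    """Return (index, value) pairs that lie far from the expected value.

    Assumption: "expected" is the mean of the observations, and a value is
    anomalous when it is more than `threshold` sample standard deviations
    away. Needs at least two observations.
    """
    expected = mean(observations)
    spread = stdev(observations)
    if spread == 0:
        # All observations are identical, so nothing stands out.
        return []
    return [
        (i, x)
        for i, x in enumerate(observations)
        if abs(x - expected) / spread > threshold
    ]

# Hypothetical sensor readings with one obvious outlier.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 35.0, 20.2, 20.1]
print(find_anomalies(readings))  # [(5, 35.0)]
```

A single large outlier also inflates the standard deviation, so real monitoring systems often use more robust baselines. This sketch only shows the basic "actual versus expected" comparison.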
Advanced analytics
Predictive modelling: One of the most popular big data advanced analytics use cases. A predictive model is a statistical or data-mining solution consisting of algorithms and techniques that can be used on both structured and unstructured data to determine future outcomes.
Text analytics: The process of analysing unstructured text, extracting relevant information, and transforming it into structured information that can then be leveraged in various ways. It has become an important component of the big data ecosystem.
Other statistical and data-mining algorithms: These may include advanced forecasting, optimization, cluster analysis for segmentation or even micro-segmentation, and affinity analysis.
Advanced analytics doesn't require big data. However, being able to apply advanced analytics to big data can provide some important results.

Operationalized analytics
In operationalized analytics, analytics is part of a business process. For example, statisticians at an insurance company might build a model that predicts the likelihood of a claim being fraudulent. The model, along with some decision rules, could be included in the company's claims-processing system to flag claims with a high probability of fraud.

Monetizing analytics
Analytics can be used to optimize the business, make better decisions, and drive bottom- and top-line revenue. For example, credit card providers take the data they assemble to offer value-added analytics products.

Data
Big data consists of structured, semi-structured, and unstructured data.
- It can come from untrusted sources.
- It can be dirty.
- The signal-to-noise ratio can be low.
- It can be real-time.

Big Data analytics examples

Facebook
Facebook stores enormous amounts of user data, making it a massive data wonderland.
It was estimated that there would be more than 183 million Facebook users in the United States alone by October 2019. Every 60 seconds, 136,000 photos are uploaded, 510,000 comments are posted, and 293,000 status updates are posted.

Types of Text Data
- Documents
- E-mails
- Log files
- Tweets
- Facebook posts

Understanding Text Analytics
Text analytics is the process of analysing unstructured text, extracting relevant information, and transforming it into structured information that can then be leveraged in various ways. The analysis and extraction processes take advantage of techniques that originated in computational linguistics, statistics, and other computer science disciplines. The text analytics process uses various algorithms, such as those for understanding sentence structure, to analyse the unstructured text, extract information, and transform that information into structured data.

Characteristics of big data analysis
Big data analysis should be viewed from two perspectives:
Decision-oriented: Decision-oriented analysis is used to analyse selective subsets and representations of larger data sources and to apply the results to the process of making business decisions.
Action-oriented: Action-oriented analysis is used for rapid response, when a pattern emerges or specific kinds of data are detected and action is required.

It can be programmatic. One of the biggest changes in terms of analysis is that in the past, data sets were manually loaded into an application and then visualized and explored. With big data analysis, we might start with raw data that, because of its scale, often needs to be handled programmatically (using code) to manipulate it or to do any kind of exploration.

It can be data driven. While many data scientists use a hypothesis-driven approach to data analysis, we can also use the data to drive the analysis, especially if we've collected huge amounts of it.
For example, you can use a machine-learning algorithm to do this kind of hypothesis-free analysis.

It can use a lot of attributes. In the past, data analytics had to deal with data having hundreds of attributes or characteristics. Big data analytics deals with hundreds of gigabytes of data that consist of thousands of attributes and millions of observations. Everything is now happening on a larger scale.

It can be iterative. More compute power helps us iterate on the models until we get them the way we want them. Compute cycles can be obtained quickly by leveraging cloud-based Infrastructure as a Service (IaaS). With IaaS platforms like Amazon Web Services (AWS), you can rapidly provision a cluster of machines to ingest large data sets and analyse them quickly.

Characteristics of a Big Data Analysis Framework
Important considerations in selecting a big data analysis framework:
Support for multiple data types: Many organizations are incorporating, or expect to incorporate, all types of data as part of their big data deployments, including structured, semi-structured, and unstructured data.
Handle batch processing and real-time data streams: Action orientation is a product of analysis on real-time data streams, while decision orientation can be adequately served by batch processing. Some users will require both as they evolve to include varying forms of analysis.
Utilize what already exists in your environment: To get the right context, it may be important to leverage existing data and algorithms in the big data analysis framework.
Support NoSQL and other newer forms of accessing data: While organizations will continue to use SQL, many are also looking at newer forms of data access to support faster response times or faster times to decision.
Provide low latency: If you're going to be dealing with high data velocity, you're going to need a framework that can support the requirements for speed and performance.
Provide cheap storage: Big data means potentially lots of storage, depending on how much data you want to process and/or keep. This means that storage management and the resultant storage costs are important considerations.
Integrate with cloud deployments: The cloud can provide storage and compute capacity on demand. More and more companies are using the cloud as an analysis sandbox.
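
The difference between batch processing and real-time data streams, one of the framework considerations listed above, can be illustrated with a small Python sketch. It is not tied to any particular big data framework. The function names batch_average and streaming_alerts, the limit of 40.0 and the sample events are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

def batch_average(values: List[float]) -> float:
    """Batch style (decision-oriented): the whole data set is available
    before the analysis starts."""
    return sum(values) / len(values)

def streaming_alerts(
    stream: Iterable[float], limit: float
) -> Iterator[Tuple[int, float, float]]:
    """Stream style (action-oriented): look at each event as it arrives and
    keep only a running summary instead of the full history.

    Yields (event_number, value, running_average) whenever a value
    exceeds `limit`, so a caller can react immediately.
    """
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        if value > limit:
            yield count, value, total / count

# Hypothetical measurements; in practice these might arrive from a queue.
events = [12.0, 15.0, 11.0, 48.0, 14.0]

print(batch_average(events))                             # 20.0
print(list(streaming_alerts(iter(events), limit=40.0)))  # [(4, 48.0, 21.5)]
```

The batch function needs all the data before it can answer, which suits decision-oriented analysis. The streaming function can raise an alert as soon as the fourth event arrives, which is the kind of rapid response that action-oriented analysis calls for.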