Big Data Visualization Techniques PDF
MIT World Peace University
Summary
This document presents an overview of big data visualization techniques, including various methods and case studies. It discusses techniques like treemaps, circle packing, sunburst diagrams, and parallel coordinates, offering insights into their applications.
Full Transcript
Big Data Technologies (Computer Engineering and Technology) (TYB.Tech)
UNIT V: Big Data Visualization Techniques

Unit V Syllabus
Introduction to data visualization; data visualization factors; challenges in data visualization. Analytics techniques: basic charts, scatter plots, histograms. Advanced visualization techniques: tree map, circle packing, sunburst, circular network diagram, parallel coordinates, streamgraph. Plots, graphs, networks, hierarchies, reports. Introduction to D3.js. Case study: Google Analytics / Twitter Analytics.

Introduction to Big Data Visualization
Data visualization is a technique for presenting data in a pictorial or graphical format. Visuals are presented in the form of graphs, images, diagrams, and animation. Visualization is an excellent medium to analyze, comprehend, and share information. Rolls-Royce believes visualising data is as important as manipulating it.

Data visualization helps in decision making, in finding solutions to problems, in understanding data clearly, in finding relationships among the data, and in comparative analysis.

Data Visualization Considerations
Clarity: Ensure the dataset is complete and relevant.
This enables the data scientist to use the new patterns obtained from the data in the relevant places.
Accuracy: Ensure you use an appropriate graphical representation to convey the intended message.
Efficiency: Use efficient visualization techniques that highlight all the data points.

Data Visualization Factors
There are some basic factors that one needs to be aware of before visualizing the data:
The visual effect includes the usage of appropriate shapes, colors, and sizes to represent the analyzed data.
The coordinate system helps organize the data points within the provided coordinates.
The data types and scale determine the type of data, for example, numeric or categorical.
The informative interpretation helps create visuals in an effective and easily interpretable manner using labels, titles, legends, and pointers.

Challenges in Big Data Visualization
Diversity and heterogeneity in big data create a big problem when visualizing that data.
Analysis speed is the most challenging factor in big data analysis.
To handle big data scalability, cloud computing and advanced GUIs are combined with big data.
Data is usually unstructured; tables, texts, trees, graphs, and other metadata are used to visualize it.
Providing massive parallelization is a challenge in big data visualization.
High complexity and high dimensionality arise during the discovery process because of the huge amount of data.
It is difficult to design a new big data visualization tool that is efficient.
The large size and dimensionality of big data make visualization more challenging.

Analytical Techniques Used in Visualization
The slide names five groups of analytical techniques, including temporal, hierarchical, multidimensional, and geospatial. Chart types listed on the slide:
Temporal: scatter plots, polar area diagrams, time series sequences, timelines, line graphs.
Multidimensional: scatter plots, pie charts, Venn diagrams, stacked bar graphs, histograms.
Geospatial: flow map, density map, cartogram, heat map.
Also listed: node-link diagrams, matrix charts, word clouds.

Analytical Techniques (Contd.)
The most frequently used techniques for big data visualization include the following.

Symbol Maps
The symbols on such maps differ in size, which makes them easy to compare.
Example: Imagine a US manufacturer that has recently launched a new brand. The manufacturer wants to know which regions particularly liked the brand. To find out, they can use a map with symbols representing the number of customers who liked the product (left a positive comment on social media, rated the new product highly in a customer survey, etc.).

Line Charts
Line charts allow looking at the behavior of one or several variables over time and identifying trends. In traditional BI, line charts can show sales, profit, and revenue development for the last 12 months. When working with big data, companies can use this visualization technique to track total application clicks by week, the average number of complaints to the call center by month, etc.
The resulting line gives shape to the change taking place, and can be used to show the volatility, trend, acceleration (peaks), and deceleration (valleys) of your chosen metric.
Example: A line chart that shows the direct influence a hockey game had on water usage in Edmonton, Canada, during the Olympic gold medal hockey game.

Pie Charts
Pie charts show the components of the whole.
Companies that work with both traditional and big data may use this technique to look at customer segments or market shares. The difference lies in the sources from which these companies take raw data for the analysis.
Example: bank customer segments.

Bar Charts
Bar charts allow comparing the values of different variables. In traditional BI, companies can analyze their sales by category, the costs of marketing promotions by channel, etc. When analyzing big data, companies can look at visitors' engagement with their website's multiple pages, the most frequent pre-failure cases on the shop floor, and more.

Heat Maps
A heat map or choropleth map is a data visualization that shows the relationship between two measures and provides rating information. The rating information is displayed using varying colors or saturation and can exhibit ratings such as high to low, bad to awesome, or needs improvement to working well.
Heat maps use colors to represent data. A user may encounter a heat map in Excel that highlights sales in the best-performing store in green and in the worst-performing store in red. If a retailer wants to know the most frequently visited aisles in the store, they will also use a heat map of their sales floor.
In this case, the retailer will analyze big data, such as the data from a video surveillance system.

Word Clouds
A word cloud demonstrates how frequently a word (or phrase) appears in a block of text by connecting its size with its frequency. The bigger the word in the cloud, the more often it appears in the text.
Example: the most frequent keywords used by PM Narendra Modi in his speech.

Big Data Visualization Approaches
The approaches covered are: more than one view per representation; dynamical changes in number of factors; and filtering (dynamic query filters, starfield display, tight coupling).

More Than One View per Representation
Although any data visualization method can be used, the analyst uses only a few similar or nearly similar graphical objects, as shown in the figure.
The researcher must not only compare similar graphical objects but also clearly distinguish different data and make a decision based on different factors. Such an approach can guide the analyst to the desired location and provide enough support to make a decision in the very first stage of research.
Another key point in this approach is the ability to select desired data areas across all related representations, as shown in the figure below.

Dynamical Changes in Number of Factors
An analyst has to indicate which data is to be shown and how it should be shown to ease the perception of information. Typically, for big data, the analyst cannot observe the whole dataset, find anomalies in it, or find any relations at first glance. This process is iterative and can be repeated until the desired pattern has been found.
As shown in the example, the top histogram shows the dependency between the number of cash collector units currently in use by the payment system and the volume of each cash collector. After the analyst chose another factor, i.e. support expense, the diagram type changed to a point diagram.

Filtering
Need for the filtering approach: The issue of value discernibility has always been topical for visual analysis, and it becomes more important in the case of big data. The analyst's area of interest is not static and can change dynamically during the research process.
An overview map combined with filtering can be adopted as an approach to solve this problem. As shown in the example, the detailed view does not have to be limited to one level; the level of detail can increase on each iteration.
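The overview-then-filter loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (city, district, amount), and the helpers `overview` and `drill_down` are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical records (assumption): each payment has a city, a district, and an amount.
RECORDS = [
    {"city": "Pune", "district": "Kothrud", "amount": 120},
    {"city": "Pune", "district": "Baner", "amount": 80},
    {"city": "Mumbai", "district": "Andheri", "amount": 200},
    {"city": "Mumbai", "district": "Bandra", "amount": 50},
    {"city": "Pune", "district": "Kothrud", "amount": 30},
]


def overview(records, level):
    """Aggregate amounts by the given field: the coarse overview."""
    totals = defaultdict(int)
    for row in records:
        totals[row[level]] += row["amount"]
    return dict(totals)


def drill_down(records, level, value):
    """Keep only the rows inside the selected group: the filter step."""
    return [row for row in records if row[level] == value]


# Iteration 1: the analyst sees totals per city.
by_city = overview(RECORDS, "city")

# Iteration 2: the analyst selects one city and sees the next level of detail.
pune_rows = drill_down(RECORDS, "city", "Pune")
by_district = overview(pune_rows, "district")
```

Each further iteration repeats the same two calls on the already filtered rows, which is how the level of detail can grow step by step without ever rendering the whole dataset at full resolution.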
As shown in the example, the analyst can select one of the values within their area of interest and get its distribution across the city map, or highlight similar objects on the overview map.

The main concepts used for data filtering are:

Dynamic Query Filters: Some behavior patterns used in the analytic research process can be identified. The most popular of these patterns can be grouped and linked to simpler user interface components. This gives the analyst direct access to them and eases the routine actions they need to perform.

Starfield Display: This approach is based on the idea that the whole data set is always visible. At the first level, some data need aggregation and the analyst sees only grouped information, but as the analyst makes a more detailed request, expressed through zooming actions, each group expands into more and more detailed data.

Tight Coupling: Some user interface elements can be directly linked to each other, so that their coupling prevents the analyst from making data-input mistakes or from moving the research in an obviously wrong direction. The basic example of such user interface elements is a group of radio buttons: after one radio button in the group is pressed, the other buttons lose their selection. Different filters based on selection inversion are usually developed using this approach.
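The radio-button example of tight coupling, together with a filter based on selection inversion, can be sketched as follows. This is a minimal model rather than a real GUI: the `RadioGroup` class, the `apply_filter` helper, and the store data are all hypothetical names introduced for illustration.

```python
class RadioGroup:
    """Exclusive selection: choosing one option clears the others."""

    def __init__(self, options):
        self.options = list(options)
        self.selected = None

    def press(self, option):
        # Coupling guards against invalid input: only known options are accepted.
        if option not in self.options:
            raise ValueError(f"unknown option: {option!r}")
        self.selected = option  # any previous selection is lost

    def state(self):
        """Map each option to True if it is the selected one."""
        return {opt: opt == self.selected for opt in self.options}


def apply_filter(rows, field, group, invert=False):
    """Keep rows matching the group's selection, or everything else if inverted."""
    if group.selected is None:
        return list(rows)
    return [r for r in rows if (r[field] == group.selected) != invert]


# Hypothetical data (assumption): sales per store.
rows = [
    {"store": "A", "sales": 10},
    {"store": "B", "sales": 7},
    {"store": "C", "sales": 3},
]
group = RadioGroup(["A", "B", "C"])
group.press("A")
group.press("B")  # pressing B removes the selection from A

selected_rows = apply_filter(rows, "store", group)
inverted_rows = apply_filter(rows, "store", group, invert=True)
```

Because the group holds a single `selected` value, two options can never be active at once, and the inverted filter is derived from the same state instead of being entered separately.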
Big Data Visualization Methods
Big data visualization methods are classified based on the following data criteria: (i) large data volume; (ii) data variety; (iii) data dynamics.
The methods covered are tree map, circle packing, sunburst, circular network diagram, parallel coordinates, and streamgraph.

Tree Map
This method is based on space-filling visualization of hierarchical data. There is a strict requirement on the data: data objects must be hierarchically linked. The tree map is represented by a root rectangle divided into groups, which are in turn represented by smaller rectangles that correspond to data objects from a set. The visualization produced by this method can show only two data factors. The first is the factor used to calculate the size of each shape. The second is color, used for grouping the shapes.
Advantages:
(i) hierarchical grouping clearly shows data relations;
(ii) extreme outliers are immediately visible using a special color.
Disadvantages:
(i) data must be hierarchical and, even more, tree maps are better for analyzing data sets where there is at least one important quantitative dimension with wide variations;
(ii) not suitable for examining historical trends and time patterns;
(iii) the factor used for size calculation cannot have negative values.

Circle Packing
This method is a direct alternative to the tree map, except that its primitive shape is the circle; circles can be nested inside circles from a higher hierarchy level.
Advantages:
(i) space-efficient visualization method compared to tree map.
Disadvantages:
(i) data must be hierarchical and, even more, tree maps are better for analyzing data sets where there is at least one important quantitative dimension with wide variations;
(ii) not suitable for examining historical trends and time patterns;
(iii) the factor used for size calculation cannot have negative values.
Figure source: https://www.webdesignerdepot.com/2009/06/50-great-examples-of-data-visualization/

Sunburst
It uses tree map visualization, converted to a polar coordinate system.
The main difference between these methods is that the variable parameters are not width and height but radius and arc length. Because of this difference, a data change does not require repainting the whole diagram; only the sector containing new data is repainted, by changing its radius. The data dynamics criterion can therefore be met using sunburst.
Advantages:
(i) easily perceptible by most humans.
Disadvantages:
(i) data must be hierarchical and, even more, tree maps are better for analyzing data sets where there is at least one important quantitative dimension with wide variations;
(ii) not suitable for examining historical trends and time patterns;
(iii) the factor used for size calculation cannot have negative values.
Figure: Biological data visualization. Source: https://www.researchgate.net/figure/General-comparison-of-features-among-the-different-libraries-described-in-the-main-text_fig4_269187743

Circular Network Diagram
Data objects are placed around a circle and linked by curves based on the rate of their relatedness. Different line widths or color saturation are usually used as a measure of object relatedness. The method also usually provides interactions that make unnecessary links invisible and highlight the selected one.
So, this method underlines the direct relation between multiple objects and shows how strongly they are related.
Example: product transfer diagram between cities, relations between products bought in different shops, and so forth.
Advantages:
(i) allows a relative data representation that can be easily perceived;
(ii) within the circle, the resolution varies linearly, increasing with radial position. This makes the center of the circle ideal for compactly displaying summary statistics or indicating points of interest.
Disadvantages:
(i) the method may end in an imperceptible representation and may need regrouping of data objects on the screen;
(ii) objects with the smallest parameter weight can be suppressed by larger ones, ending up in a total mess on the diagram.

Parallel Coordinates
This method allows visual analysis to be extended with multiple data factors for different objects. Parallel coordinates plots are ideal for comparing many variables together and seeing the relationships between them.
This method can handle several factors for a large number of objects on a single screen, so it satisfies the data variety criterion.
Example: comparing an array of products with the same attributes, say, computer or car specifications across different models.
Advantages:
(i) factor ordering does not influence total diagram perception;
(ii) the method allows us to analyze both the whole data set of objects at once and individual data objects.
Disadvantages:
(i) the method limits the number of factors shown at once;
(ii) visualizing dynamic data ends up changing the whole data representation.
Example 2: parallel coordinates chart for the given data of hourly wages in various countries.

Streamgraph
A streamgraph is a type of stacked area graph that is displaced around a central axis, resulting in a flowing, organic shape. This method shows the trends for different sets of events, the quantity of their occurrences, their relative rates, and so on. A set of similar events can be shown along a timeline in the image. It does not support the data variety criterion, but it can still be applied to large datasets.
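The idea of displacing a stacked area graph around a central axis can be sketched numerically. The sketch below uses the simplest symmetric layout, where at every time step the stack is centred on y = 0; real streamgraphs often use smoother baselines, and the function name `streamgraph_bands` and the sample genre counts are assumptions made for illustration.

```python
def streamgraph_bands(layers):
    """Return, for each layer, a list of (lower, upper) bounds per time step.

    layers: list of equal-length lists of non-negative values, one list per layer.
    """
    n_steps = len(layers[0])
    if any(len(layer) != n_steps for layer in layers):
        raise ValueError("all layers need the same number of time steps")
    bands = [[] for _ in layers]
    for t in range(n_steps):
        total = sum(layer[t] for layer in layers)
        lower = -total / 2  # centre the whole stack on the axis y = 0
        for i, layer in enumerate(layers):
            upper = lower + layer[t]
            bands[i].append((lower, upper))
            lower = upper  # the next layer sits directly on top of this one
    return bands


# Hypothetical counts (assumption): three genres observed at two points in time.
genre_counts = [
    [2, 4],  # genre 1
    [2, 0],  # genre 2
    [4, 4],  # genre 3
]
bands = streamgraph_bands(genre_counts)
```

Filling the area between each layer's lower and upper bounds over time produces the stream shape. For plotting, matplotlib's `stackplot` accepts `baseline="sym"` for the same symmetric layout and `baseline="wiggle"` for a smoother one.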
Examples: musical trends and cinema genre trends.
Advantages:
(i) effective for trends visualization.
Disadvantages:
(i) data representation shows only one data factor;
(ii) the method depends on the sorting of data layers (objects).

Comparison of Big Data Visualization Methods
The methods are compared on three criteria: handling varied data, handling large data volumes, and handling data that changes over time. A + indicates that the method satisfies the criterion.

Sunburst can be used in 2 situations:
1. To depict part of a whole
# libraries
library(tidyverse)
library(treemap)
library(sunburstR)
# Load dataset from github
data

Case Study: Twitter Analytics
Twenty Feet
Twenty Feet is a powerful analytics platform that tracks and graphs stats like Twitter mentions, followers, retweets, and more. Twenty Feet also integrates with other services like Facebook, bitly, Google Analytics, YouTube, and more.

References
URLs:
1. https://www.scnsoft.com/blog/big-data-visualization-techniques
2. https://www.klipfolio.com/resources/articles/what-is-data-visualization
3. https://www.import.io/post/9-ways-make-big-data-visual/
4. https://chezvoila.com/blog/parallel/
5. https://datavizcatalogue.com/methods/parallel_coordinates.html
6. https://www.data-to-viz.com/graph/streamgraph.html
7. https://marketlytics.com/blog/google-analytics-data-visualizations/
8. https://twittertoolsbook.com/10-awesome-twitter-analytics-visualization-tools/
9. https://mumbaiunivercity.academia.edu/MCTA
10. https://business.twitter.com/en/blog/7-useful-insights-Twitter-analytics.html
Technical Papers:
1. Analytical Review of Data Visualization Methods in Application to Big Data, Hindawi Publishing Corporation, Journal of Electrical and Computer Engineering, Volume 2013, Article ID 969458, http://dx.doi.org/10.1155/2013/969458
2. Big Data and Visualization: Methods, Challenges and Technology Progress, Digital Technologies, 2015, Vol. 1, No. 1, 33-38, DOI:10.12691/dt-1-1-7
3. Google Analytics - Case study by Suraj Chande