# Lab 9: Data Visualization with Altair

### Introduction

This lab introduces data visualization using Altair, a declarative statistical visualization library for Python. Altair allows you to create a wide range of visualizations with a concise syntax.

### Objectives

* Learn the basic syntax of Altair
* Create various types of charts, including scatter plots, bar charts, line charts, and histograms
* Customize chart appearance with titles, axis labels, and color schemes
* Create interactive charts with selections and conditions
* Combine multiple charts into composite visualizations

### Materials

* Python 3.6 or higher
* Altair library
* Pandas library
* vega_datasets package (sample datasets used in the examples)
* Jupyter Notebook or similar environment

### 1. Installing Libraries

Open your terminal or command prompt and install the necessary libraries using pip:

```bash
pip install altair pandas vega_datasets
```

### 2. Basic Altair Syntax

Altair visualizations are built using a declarative approach: you specify what you want to see rather than how to draw it. The basic syntax involves creating a `Chart` object and encoding data fields to visual channels such as `x`, `y`, `color`, and `size`.

```python
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [6, 7, 2, 4, 5]})

# Create a chart
chart = alt.Chart(data).mark_point().encode(
    x='x:Q',  # Quantitative data
    y='y:Q'   # Quantitative data
)

chart.show()
```

#### Explanation

* `alt.Chart(data)`: Creates a chart object from the pandas DataFrame.
* `.mark_point()`: Specifies that the data should be drawn as points (a scatter plot).
* `.encode(x='x:Q', y='y:Q')`: Maps the `x` and `y` columns to the x and y axes and declares both as quantitative data (`Q`).
* `chart.show()`: Displays the chart. In a Jupyter notebook, ending a cell with the chart object also renders it.

### 3. Creating Different Types of Charts

#### 3.1 Scatter Plots

As shown in the basic syntax example, scatter plots are created using `.mark_point()`.

```python
import altair as alt
from vega_datasets import data

source = data.cars()

alt.Chart(source).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N'  # Nominal data
).show()
```

#### 3.2 Bar Charts

Bar charts are created using `.mark_bar()`.

```python
import altair as alt
import pandas as pd

data = pd.DataFrame({'category': ['A', 'B', 'C', 'D'], 'value': [4, 6, 2, 7]})

alt.Chart(data).mark_bar().encode(
    x='category:N',  # Nominal data
    y='value:Q'      # Quantitative data
).show()
```

#### 3.3 Line Charts

Line charts are created using `.mark_line()`.

```python
import altair as alt
import pandas as pd

data = pd.DataFrame({'time': range(10), 'value': [1, 3, 2, 4, 5, 7, 6, 8, 9, 7]})

alt.Chart(data).mark_line().encode(
    x='time:Q',   # Quantitative data
    y='value:Q'   # Quantitative data
).show()
```

#### 3.4 Histograms

Histograms are created using `.mark_bar()` with a binned x encoding and `count()` on the y-axis. Passing `bin=True` uses the default binning; pass an `alt.Bin()` object instead (for example `alt.Bin(maxbins=30)`) to control the bins.

```python
import altair as alt
from vega_datasets import data

source = data.movies()

alt.Chart(source).mark_bar().encode(
    alt.X('IMDB_Rating:Q', bin=True),  # Bin the ratings with default settings
    y='count()'                        # Number of movies per bin
).show()
```

### 4. Customizing Chart Appearance

#### 4.1 Titles and Axis Labels

You can add a chart title with `.properties(title=...)` and axis labels with the `axis` parameter of `alt.X` and `alt.Y` inside `.encode()`.

```python
import altair as alt
import pandas as pd

data = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [6, 7, 2, 4, 5]})

alt.Chart(data).mark_point().encode(
    x=alt.X('x:Q', axis=alt.Axis(title='X Axis')),
    y=alt.Y('y:Q', axis=alt.Axis(title='Y Axis'))
).properties(
    title='Scatter Plot Example'
).show()
```

#### 4.2 Color Schemes

You can customize the colors by passing a `scale` to the `color` encoding channel and naming a color palette (scheme) in it.

```python
import altair as alt
from vega_datasets import data

source = data.cars()

alt.Chart(source).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=alt.Color('Origin:N', scale=alt.Scale(scheme='category10'))
).properties(
    title='Scatter Plot with Color Scheme'
).show()
```

### 5. Interactive Charts

#### 5.1 Selections

Selections allow users to interact with the chart to filter data or highlight specific points.

```python
import altair as alt
from vega_datasets import data

source = data.cars()

# Define an interval selection (click and drag to brush a region)
brush = alt.selection_interval()

points = alt.Chart(source).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    # Points inside the brush keep their Origin color; the rest turn gray
    color=alt.condition(brush, 'Origin:N', alt.value('lightgray'))
).add_params(  # Altair 5+; on Altair 4 use .add_selection(brush)
    brush
)

points.show()
```

#### 5.2 Conditions

Conditions allow you to change the appearance of chart elements based on a selection or on a test against the data.

```python
import altair as alt
from vega_datasets import data

source = data.cars()

alt.Chart(source).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    # Red where Horsepower exceeds 150, blue otherwise
    color=alt.condition('datum.Horsepower > 150', alt.value('red'), alt.value('blue'))
).show()
```

### 6. Composite Visualizations

#### 6.1 Layering

You can layer multiple charts on top of each other using the `+` operator.

```python
import altair as alt
import pandas as pd

data = pd.DataFrame({
    'x': range(10),
    'y1': [1, 3, 2, 4, 5, 7, 6, 8, 9, 7],
    'y2': [2, 4, 3, 5, 6, 8, 7, 9, 10, 8]
})

line1 = alt.Chart(data).mark_line(color='red').encode(
    x='x:Q',
    y='y1:Q'
)

line2 = alt.Chart(data).mark_line(color='blue').encode(
    x='x:Q',
    y='y2:Q'
)

(line1 + line2).show()
```

#### 6.2 Concatenation

You can concatenate charts horizontally or vertically using `alt.hconcat` and `alt.vconcat`, respectively (shorthand: the `|` and `&` operators). These helpers build `alt.HConcatChart` and `alt.VConcatChart` objects.

```python
import altair as alt
from vega_datasets import data

source = data.cars()

scatter = alt.Chart(source).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q'
)

histogram = alt.Chart(source).mark_bar().encode(
    alt.X('Miles_per_Gallon:Q', bin=True),
    y='count()'
)

# alt.hconcat builds the HConcatChart; scatter | histogram is equivalent
alt.hconcat(scatter, histogram).show()
```

### Exercises

1. Create a scatter plot of the `iris` dataset from `vega_datasets`, mapping sepal length to the x-axis, sepal width to the y-axis, and species to the color.
2. Create a bar chart showing the total sales for each category in a sample dataset. Customize the chart with a title and axis labels.
3. Create a line chart showing the stock prices of two companies over time. Allow users to select a date range to zoom in on the chart.
4. Create a composite visualization that combines a scatter plot and a histogram to show the relationship between two variables and the distribution of one of them.

### Conclusion

This lab has provided an introduction to data visualization with Altair. You have learned how to create various types of charts, customize their appearance, create interactive charts, and combine multiple charts into composite visualizations. Use these skills to explore and present data effectively.