Summary

This document introduces the Seaborn Python library for data visualization. It covers basic scatter and count plots, adding a third variable with hue, and working with pandas DataFrames. It also includes examples of swarm plots, violin plots, facet grids, and heat maps.

Full Transcript

Introduction to Seaborn
Erin Case, Data Scientist

What is Seaborn?
- Python data visualization library
- Easily create the most common types of plots

Why is Seaborn useful?

Advantages of Seaborn
- Easy to use
- Works well with pandas data structures
- Built on top of matplotlib

Getting started

    import seaborn as sns
    import matplotlib.pyplot as plt

- Samuel Norman Seaborn (sns)
- "The West Wing" television show

Example 1: Scatter plot

    import seaborn as sns
    import matplotlib.pyplot as plt

    height = [62, 64, 69, 75, 66, 68, 65, 71, 76, 73]
    weight = [120, 136, 148, 175, 137, 165, 154, 172, 200, 187]

    sns.scatterplot(x=height, y=weight)
    plt.show()

Example 2: Create a count plot

    import seaborn as sns
    import matplotlib.pyplot as plt

    gender = ["Female", "Female", "Female", "Female",
              "Male", "Male", "Male", "Male", "Male", "Male"]

    sns.countplot(x=gender)
    plt.show()

Let's practice!

Using pandas with Seaborn

What is pandas?
- Python library for data analysis
- Easily read datasets from csv, txt, and other types of files
- Datasets take the form of DataFrame objects

Working with DataFrames

    import pandas as pd

    df = pd.read_csv("masculinity.csv")
    df.head()

       participant_id      age how_masculine how_important
    0               1  18 - 34      Somewhat      Somewhat
    1               2  18 - 34      Somewhat      Somewhat
    2               3  18 - 34          Very      Not very
    3               4  18 - 34          Very      Not very
    4               5  18 - 34          Very          Very

Using DataFrames with countplot()

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    df = pd.read_csv("masculinity.csv")

    sns.countplot(x="how_masculine", data=df)
    plt.show()

Let's practice!

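The file masculinity.csv is not included with this transcript, so the following is only a minimal sketch of the same idea. It assumes a small made-up DataFrame (the name survey and its values are hypothetical) and shows that countplot() draws one bar per distinct value in the column passed as x, with the counts that pandas reports through value_counts(). The Agg backend is selected only so that the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit this line in a notebook

import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the survey data; the values are invented for illustration.
survey = pd.DataFrame({
    "how_masculine": ["Somewhat", "Somewhat", "Very", "Very", "Very", "Not very"],
})

# The per-category counts that the bars of a count plot represent.
counts = survey["how_masculine"].value_counts()
print(counts)

# Pass the column name as x and the DataFrame as data, as in the slide above.
ax = sns.countplot(x="how_masculine", data=survey)

# In an interactive session, call matplotlib.pyplot.show() here to display the plot.
```

Printing counts next to the plot is a quick way to check that the bars match the data.
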
Adding a third variable with hue

Tips dataset

    import pandas as pd
    import seaborn as sns

    tips = sns.load_dataset("tips")
    tips.head()

       total_bill   tip     sex smoker  day    time  size
    0       16.99  1.01  Female     No  Sun  Dinner     2
    1       10.34  1.66    Male     No  Sun  Dinner     3
    2       21.01  3.50    Male     No  Sun  Dinner     3
    3       23.68  3.31    Male     No  Sun  Dinner     2
    4       24.59  3.61  Female     No  Sun  Dinner     4

A basic scatter plot

    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.scatterplot(x="total_bill", y="tip", data=tips)
    plt.show()

A scatter plot with hue

    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.scatterplot(x="total_bill", y="tip", data=tips, hue="smoker")
    plt.show()

Setting hue order

    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.scatterplot(x="total_bill", y="tip", data=tips,
                    hue="smoker", hue_order=["Yes", "No"])
    plt.show()

Specifying hue colors

    import matplotlib.pyplot as plt
    import seaborn as sns

    hue_colors = {"Yes": "black", "No": "red"}

    sns.scatterplot(x="total_bill", y="tip", data=tips,
                    hue="smoker", palette=hue_colors)
    plt.show()

Using HTML hex color codes with hue

    import matplotlib.pyplot as plt
    import seaborn as sns

    hue_colors = {"Yes": "#808080", "No": "#00FF00"}

    sns.scatterplot(x="total_bill", y="tip", data=tips,
                    hue="smoker", palette=hue_colors)
    plt.show()

Using hue with count plots

    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.countplot(x="smoker", data=tips, hue="sex")
    plt.show()

Let's practice!

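The slides set hue_order and palette in separate examples. The two can be combined, and the following minimal sketch shows one way to do that. It assumes a tiny made-up DataFrame (the name bills and its values are hypothetical) rather than the real tips dataset, so that it runs without downloading anything. It then reads the legend entries back from the plot to confirm the order. The Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit this line in a notebook

import pandas as pd
import seaborn as sns

# Hypothetical miniature of the tips data; the values are invented for illustration.
bills = pd.DataFrame({
    "total_bill": [10.0, 15.0, 20.0, 25.0, 30.0, 35.0],
    "tip": [1.0, 2.0, 3.0, 3.5, 4.0, 5.0],
    "smoker": ["No", "No", "Yes", "No", "Yes", "Yes"],
})

# Map each hue value to a color, as in the "Specifying hue colors" slide.
hue_colors = {"Yes": "black", "No": "red"}

ax = sns.scatterplot(
    x="total_bill", y="tip", data=bills,
    hue="smoker", hue_order=["Yes", "No"], palette=hue_colors,
)

# Collect the legend text so the order requested with hue_order can be inspected.
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

A dictionary palette ties each color to a value, so the colors stay attached to "Yes" and "No" even when hue_order changes the order in which they are listed.
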
data_visualization_using_seaborn
April 22, 2018

Data Visualization using Seaborn (a Python library)

Tutorial by: Navie Narula, Digital Centers Teaching Intern
Created for: Research Data Services at Columbia University Libraries

Resources used to create tutorial:
- DataCamp’s Introductory Tutorial
- Pandey’s Visualization Examples
- Seaborn PyData Swarm Plots
- Seaborn PyData Heat Maps
- List of Colors in Python

In:
    # import libraries
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn
    %matplotlib inline

The seaborn library has many in-house datasets. You may find them here. We’ll be starting off with the tips dataset.

In:
    # load in data and save to a variable
    df = seaborn.load_dataset("tips")

In:
    # first five rows of dataset
    df.head()

Out:
       total_bill   tip     sex smoker  day    time  size
    0       16.99  1.01  Female     No  Sun  Dinner     2
    1       10.34  1.66    Male     No  Sun  Dinner     3
    2       21.01  3.50    Male     No  Sun  Dinner     3
    3       23.68  3.31    Male     No  Sun  Dinner     2
    4       24.59  3.61  Female     No  Sun  Dinner     4

In:
    # last five rows of dataset
    df.tail()

Out:
         total_bill   tip     sex smoker   day    time  size
    239       29.03  5.92    Male     No   Sat  Dinner     3
    240       27.18  2.00  Female    Yes   Sat  Dinner     2
    241       22.67  2.00    Male    Yes   Sat  Dinner     2
    242       17.82  1.75    Male     No   Sat  Dinner     2
    243       18.78  3.00  Female     No  Thur  Dinner     2

Swarm Plots

In:
    # use swarmplot to visualize tip observations and amounts
    # by day of the week
    seaborn.swarmplot(x="day", y="tip", data=df)
    seaborn.set_style("whitegrid")
    plt.show()

In:
    # visualize tip observations
    seaborn.swarmplot(x=df["tip"])
    seaborn.set_style("darkgrid")
    plt.show()

In:
    # color points by category
    # create customized palette
    gender_palette = ["#A833FF", "#FFAF33"]
    seaborn.swarmplot(x="day", y="tip", hue="sex",
                      palette=gender_palette, data=df)
    plt.show()

In:
    # control plot order on x-axis
    seaborn.swarmplot(x="smoker", y="total_bill", data=df,
                      palette="husl", order=["Yes", "No"])
    plt.show()

Violin Plots

In:
    # plot tips
    seaborn.violinplot(x=df["tip"], color="gold")
    plt.show()

In:
    # draw plot based on variable
    seaborn.violinplot(x="sex", y="total_bill", data=df)
    plt.show()

In:
    # split drawings to compare with hue/legend variables
    seaborn.violinplot(x="time", y="tip", data=df,
                       hue="sex", palette="dark", split=True)
    plt.legend()
    plt.show()

Facet Grids

In:
    # draw facet grid based on tip variable
    fg = seaborn.FacetGrid(df, col="time", row="sex")
    fg = fg.map(plt.hist, "tip", color="tomato")

In:
    # we can also change the type of plot
    # ...and the colors around the points
    fg = seaborn.FacetGrid(df, col="time", row="sex")
    fg = fg.map(plt.scatter, "total_bill", "tip",
                color="floralwhite", edgecolor="hotpink")

In:
    # plot by category
    x = seaborn.FacetGrid(df, col="time", hue="sex")
    x = x.map(plt.scatter, "total_bill", "tip")
    x = x.add_legend()

Heat Maps

In:
    # create random data
    uniform_data = np.random.rand(5, 3)  # five rows, three columns
    print(uniform_data)
    seaborn.heatmap(uniform_data)
    plt.show()

    [[0.39376482 0.61566449 0.94105178]
     [0.57360765 0.66858876 0.03326495]
     [0.35962929 0.46553437 0.28784689]
     [0.32919801 0.02822342 0.38018925]
     [0.69303348 0.559752   0.61115946]]

In:
    # load in flights dataset
    flights = seaborn.load_dataset("flights")

In:
    # print first five rows
    flights.head()

Out:
       year     month  passengers
    0  1949   January         112
    1  1949  February         118
    2  1949     March         132
    3  1949     April         129
    4  1949       May         121

In:
    # print last five rows
    flights.tail()

Out:
         year      month  passengers
    139  1960     August         606
    140  1960  September         508
    141  1960    October         461
    142  1960   November         390
    143  1960   December         432

In:
    # reshape to one row per month and one column per year
    # (keyword arguments are required by pandas 2.0 and later)
    flights = flights.pivot(index="month", columns="year", values="passengers")
    # draw border
    x = seaborn.heatmap(flights, linewidths=0.3)

In:
    # change color and add value
    x = seaborn.heatmap(flights, annot=True, fmt="d", cmap="YlGnBu")

Now, it’s time for you to start working with your own data of choice and produce the visualizations you like! You can use one of seaborn’s in-house datasets or load in your own.
If you’d like to use your own .csv file, you can load it into a DataFrame by doing something like this:

    import pandas as pd
    df = pd.read_csv("", sep=",")
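As a rough sketch of what loading your own data might look like, the example below reads comma-separated text and reshapes it into the wide layout used for a heat map. The CSV contents and the column names city, quarter, and sales are hypothetical. The text is read from an in-memory string through io.StringIO so that the sketch is self-contained; with a real file you would pass its path to read_csv instead.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a file on disk.
csv_text = """city,quarter,sales
Lyon,Q1,10
Lyon,Q2,12
Oslo,Q1,7
Oslo,Q2,9
"""

# read_csv accepts a path or a file-like object; sep="," is also the default.
long_df = pd.read_csv(io.StringIO(csv_text), sep=",")

# Reshape from one row per observation to one row per city and one column per quarter.
wide_df = long_df.pivot(index="city", columns="quarter", values="sales")
print(wide_df)
```

A table in this wide shape, with one numeric value per cell, is the kind of input that the flights example passes to seaborn.heatmap().
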
