Summary

This presentation discusses data analysis, including the process, techniques, types of analytics, and steps involved. It also includes Python code examples utilizing pandas and matplotlib for data manipulation and visualization.

Full Transcript

Data Analysis

Data analysis is the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data. It involves inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

A simple example of data analysis occurs whenever we make a decision in daily life by evaluating what has happened in the past or what will happen if we make that decision. In essence, we analyze the past or the likely future and make a decision based on that analysis.

Businesses use three types of analytics to drive their decision-making:
1. Descriptive analytics, which tells us what has already happened.
2. Predictive analytics, which shows us what could happen.
3. Prescriptive analytics, which informs us what should happen in the future.

Seven steps of data analysis
Step 1: Understanding the business problem.
Step 2: Analyze data requirements.
Step 3: Data understanding and collection.
Step 4: Data preparation.
Step 5: Data visualization.
Step 6: Data analysis.
Step 7: Deployment.
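
The three types of analytics described above can be illustrated with a minimal sketch. The monthly totals, the straight-line trend fit, and the restocking rule are all made-up assumptions for illustration; they are not part of the presentation's data set.

```python
import numpy as np

# Hypothetical monthly sales totals (made-up numbers, illustration only).
monthly_totals = np.array([100.0, 120.0, 140.0, 160.0])
months = np.arange(1, len(monthly_totals) + 1)

# Descriptive: what has already happened.
average_so_far = monthly_totals.mean()

# Predictive: what could happen, here via a simple straight-line trend.
slope, intercept = np.polyfit(months, monthly_totals, deg=1)
forecast_next = slope * (len(monthly_totals) + 1) + intercept

# Prescriptive: what should happen, here a hypothetical restocking rule.
STOCK_ON_HAND = 150.0  # assumed inventory level
action = "restock" if forecast_next > STOCK_ON_HAND else "hold"

print(average_so_far, round(forecast_next, 2), action)
```

A real predictive or prescriptive model would be chosen to fit the business problem from Step 1; the linear fit here only stands in for that choice.
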
# Load the data set
import pandas as pd
sales = pd.read_csv('sales.csv')
sales

# First 10 rows
sales.head(10)
# Invoice details only
sales['Invoice ID']
# Category only
sales['Category']
# Unique categories
sales['Category'].unique()
# Last 10 rows
sales.tail(10)
# First few rows
sales.head()

# Gender = Male
sales[sales['Gender'] == 'Male']
# First few rows where Gender = Male
sales[sales['Gender'] == 'Male'].head()
# First 10 rows where Gender = Male
sales[sales['Gender'] == 'Male'].head(10)
# Total > 100
sales[sales['Total'] > 100]
# Last few rows where Total > 100
sales[sales['Total'] > 100].tail()
# Unique payment methods
sales['Payment'].unique()
# Payment = Cash
sales[sales['Payment'] == 'Cash']
# City = NewYork
sales[sales['City'] == 'NewYork']
# Rows 100 to 199 (the slice end is exclusive)
sales[100:200]

# Sum of Total
sales['Total'].sum()
# Sum of Quantity
sales['Quantity'].sum()
# Maximum of every column
sales.max()
# Maximum Total
sales['Total'].max()
# Row(s) with the maximum Total
sales[sales['Total'] == sales['Total'].max()]
# Minimum Total
sales['Total'].min()
# Row(s) with the minimum Total
sales[sales['Total'] == sales['Total'].min()]
# Mean Total
sales['Total'].mean()

# Group by city
sales.groupby('City')
# Sum of the numeric columns per city
sales.groupby('City').sum(numeric_only=True)
# Total per city
sales.groupby('City')['Total'].sum()
# Sums and Total per date
sales.groupby('Date').sum(numeric_only=True)
sales.groupby('Date')['Total'].sum()
# Total per category
sales.groupby('Category')['Total'].sum()

# Which location has the highest and the lowest sales?
# Represent the sales on a bar chart, and show the market share of each location using a pie chart.
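
Because sales.csv is not included with the transcript, the filtering and aggregation steps above can be tried on a small in-memory table. This is a sketch under stated assumptions: the rows are invented, and only the column names ('Invoice ID', 'City', 'Gender', 'Total') are taken from the transcript.

```python
import pandas as pd

# Made-up rows; only the column names come from the transcript.
sales = pd.DataFrame({
    "Invoice ID": ["A1", "A2", "A3", "A4"],
    "City": ["NewYork", "Chicago", "NewYork", "Chicago"],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Total": [250.0, 80.0, 120.0, 40.0],
})

# Boolean mask: keep only the rows where the condition is True.
male_rows = sales[sales["Gender"] == "Male"]
big_sales = sales[sales["Total"] > 100]

# Row holding the largest Total (idxmax returns that row's index label).
top_row = sales.loc[sales["Total"].idxmax()]

# Select the column first, then aggregate, so text columns are never summed.
total_by_city = sales.groupby("City")["Total"].sum()

print(male_rows, big_sales, top_row, total_by_city, sep="\n\n")
```

Selecting the column before calling sum() or mean() avoids aggregating text columns, which recent pandas versions either concatenate or reject.
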
location_list = sales.groupby('Location')
location = [x for x, y in location_list]
location
sales.groupby('Location')['Total'].sum()

# Plot bar diagram
import matplotlib.pyplot as plt
plt.bar(location, sales.groupby('Location')['Total'].sum())

# Plot pie diagram
plt.pie(sales.groupby('Location')['Total'].sum(), labels=location, autopct='%1.1f%%')

# Which location has more female customers and which location has more male customers?
location_sales = sales.groupby(['Gender', 'Location'])['Invoice ID'].count()
location_sales
location_sales.unstack(level=0)
unstacked_sales = location_sales.unstack(level=0)
# Plot bar
unstacked_sales.plot(kind='bar')

# Which days of the month make more sales?
sales['Date']
pd.to_datetime(sales['Date'])
pd.to_datetime(sales['Date']).dt.month
pd.to_datetime(sales['Date']).dt.day
sales['Day'] = pd.to_datetime(sales['Date']).dt.day
day_sales = sales.groupby('Day')['Total'].sum()
day_sales
day_sales.plot()

# Which branch has more members and which has fewer?
# Which branch has the highest rating and which has the lowest?
sales.head()
members = sales.groupby(['Member', 'Location'])['Invoice ID'].count()
members
members.unstack(level=0)
# Plot bar
members.unstack(level=0).plot(kind='bar')
sales.groupby(['Location'])['Rating'].mean()
rating = sales.groupby(['Location'])['Rating'].mean()
rating.plot(kind='bar')

# Which city has more female shoppers?
# Who spends more, men or women?
# Which type of customer spends more, members or non-members?
# Which product line sells more?
# Which product line is popular among men vs. women?
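
The location questions above can be answered end to end on a small table. The following is a sketch with invented rows; the location names "North" and "South" are hypothetical, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; "North" and "South" are hypothetical location names.
sales = pd.DataFrame({
    "Invoice ID": ["A1", "A2", "A3", "A4", "A5"],
    "Location": ["North", "North", "South", "South", "South"],
    "Gender": ["Male", "Female", "Female", "Female", "Male"],
    "Total": [100.0, 50.0, 30.0, 20.0, 200.0],
})

# Highest and lowest sales by location.
location_totals = sales.groupby("Location")["Total"].sum()
highest = location_totals.idxmax()
lowest = location_totals.idxmin()

# Customer counts per location, one column per gender.
counts = sales.groupby(["Gender", "Location"])["Invoice ID"].count().unstack(level=0)

# Bar chart of totals and pie chart of market share, side by side.
fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(8, 4))
ax_bar.bar(list(location_totals.index), location_totals.values)
ax_pie.pie(location_totals.values, labels=list(location_totals.index), autopct="%1.1f%%")
plt.close(fig)  # call fig.savefig(...) before this line to keep the figure

print(highest, lowest)
print(counts)
```

After unstack(level=0), the first grouping key (Gender) becomes the columns and Location stays as the row index, which is why a bar plot of the result shows one group of bars per location.
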
# Which city has more female shoppers?
sales.groupby(['City', 'Gender']).count()
sales.groupby(['City', 'Gender'])['Invoice ID'].count()
female_shoppers = sales.groupby(['Gender', 'City'])['Invoice ID'].count()
female_shoppers.unstack(level=0)
# Plot bar
female_shoppers.unstack(level=0).plot(kind='bar')

# Who spends more, men or women?
sales.groupby('Gender').sum(numeric_only=True)
sales.groupby('Gender')['Total'].sum()
spend = sales.groupby('Gender')['Total'].sum()
spend.plot(kind='bar')

# Which type of customer spends more, members or non-members?
sales.groupby('Member').sum(numeric_only=True)
sales.groupby('Member')['Total'].sum()
member_or_non = sales.groupby('Member')['Total'].sum()
member_or_non.plot(kind='bar')

# Which product line sells more?
sales.groupby(['Category']).count()
sales.groupby(['Category'])['Rating'].count()
product_line = sales.groupby(['Category'])['Rating'].count()
product_line.plot(kind='bar')

# Which product line is popular among men vs. women?
sales.groupby(['Category', 'Gender'])['Rating'].count()
products = sales.groupby(['Gender', 'Category'])['Rating'].count()
products.unstack(level=0)
products.unstack(level=0).plot(kind='bar')

# Which month has more sales?
# Assumption: 'Month' is derived from 'Date' in the same way as 'Day'; the transcript does not show this step.
sales['Month'] = pd.to_datetime(sales['Date']).dt.month
sales.groupby('Month')['Total'].sum()
sales_by_month = sales.groupby('Month')['Total'].sum()
sales_by_month.plot(kind='bar')
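
The month question relies on a 'Month' column that the transcript never creates. One plausible way to derive it, sketched here with invented rows and an assumed month/day/year date format, mirrors how 'Day' is extracted from 'Date'.

```python
import pandas as pd

# Made-up rows; the month/day/year format is an assumption about sales.csv.
sales = pd.DataFrame({
    "Date": ["1/5/2019", "1/20/2019", "2/3/2019", "3/9/2019"],
    "Total": [100.0, 50.0, 70.0, 30.0],
})

# Parse the text dates once, then pull out the calendar parts.
dates = pd.to_datetime(sales["Date"], format="%m/%d/%Y")
sales["Month"] = dates.dt.month
sales["Day"] = dates.dt.day

# Total sales per month and the month with the largest total.
sales_by_month = sales.groupby("Month")["Total"].sum()
best_month = sales_by_month.idxmax()

print(sales_by_month)
print(best_month)
```

Passing an explicit format to pd.to_datetime avoids ambiguity between day-first and month-first dates; if the real file uses a different layout, the format string would need to change accordingly.
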
