data-exploration (1).pdf
Uploaded by SportyNash
Pamantasan ng Lungsod ng Maynila

Data exploration - the initial step of data analysis, in which users explore data in an unstructured way to uncover initial patterns, characteristics, and points of interest. Data exploration can use both manual and automated methods, such as data visualization, charts, and reports.

Why is data exploration important?
- Humans process visual data better than numerical data.
- It supports the decision-making process.

Database management system (DBMS) - a software package designed to define, manipulate, retrieve, and process data in a database.
- A DBMS provides data independence: applications do not depend on how the data is physically stored.

4 main types of data organization:
- Relational database: data is organized in logically independent tables.
- Flat database: data is organized in a single kind of record with a fixed number of fields.
- Object-oriented database: similar to object-oriented programming; it stores both data and methods.
- Hierarchical database: data follows a hierarchical (parent-child, one-to-many) relationship, so each record has only one parent; many-to-many relationships cannot be represented directly.

Business intelligence (BI) - a set of technologies for understanding the past and predicting the future.
- Technology categories: gathering, storing, accessing, and analyzing data.
- Application categories: decision support systems, query and reporting, online analytical processing (OLAP), statistical analysis, forecasting, and data mining.

Big data analytics - helps businesses and organizations make better decisions by revealing information that would otherwise stay hidden.

Data warehouse - a decision support system that stores historical data from across the organization.

Data warehouse vs. data lake
- Data warehouse: stores structured data that has been cleaned and organized for specific business purposes.
- Data lake: stores both structured and unstructured data, typically in raw form.

Online analytical processing (OLAP) - a system that supports multidimensional data analysis.
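The example below makes OLAP-style multidimensional analysis more concrete. It is a minimal Python sketch of a roll-up over a tiny star schema, meaning a central fact table whose rows reference dimension tables by key. The tables, keys, and the `roll_up` helper are hypothetical and exist only for illustration. A real OLAP system would run this kind of aggregation against a data warehouse using SQL or a cube engine, not Python dictionaries.

```python
from collections import defaultdict

# Hypothetical dimension tables, keyed by surrogate key (illustrative data only).
DIM_PRODUCT = {
    1: {"name": "Laptop", "category": "Electronics"},
    2: {"name": "Phone", "category": "Electronics"},
    3: {"name": "Desk", "category": "Furniture"},
}
DIM_DATE = {
    10: {"year": 2023, "quarter": "Q1"},
    11: {"year": 2023, "quarter": "Q2"},
    12: {"year": 2024, "quarter": "Q1"},
}

# Hypothetical central fact table: one foreign key per dimension plus a numeric measure.
FACT_SALES = [
    {"product_key": 1, "date_key": 10, "amount": 1200.0},
    {"product_key": 2, "date_key": 10, "amount": 800.0},
    {"product_key": 3, "date_key": 11, "amount": 300.0},
    {"product_key": 1, "date_key": 12, "amount": 1500.0},
    {"product_key": 2, "date_key": 12, "amount": 700.0},
]


def roll_up(facts, *levels):
    """Sum the `amount` measure grouped by one or more dimension attributes.

    Each level is a (dimension_table, foreign_key, attribute) triple. The fact
    row's foreign key looks up the matching dimension row (the join a star
    schema is designed for), and the chosen attribute becomes part of the
    group key.
    """
    totals = defaultdict(float)
    for row in facts:
        group = tuple(table[row[fk]][attr] for table, fk, attr in levels)
        totals[group] += row["amount"]
    return dict(totals)


# Aggregate by product category only, then by year and category together.
by_category = roll_up(FACT_SALES, (DIM_PRODUCT, "product_key", "category"))
by_year_category = roll_up(
    FACT_SALES,
    (DIM_DATE, "date_key", "year"),
    (DIM_PRODUCT, "product_key", "category"),
)
print(by_category)
print(by_year_category)
```

The first line printed is `{('Electronics',): 4200.0, ('Furniture',): 300.0}`. Adding a level to the group key (year, then category) is the plain-Python equivalent of drilling down, and removing a level rolls the totals back up.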
Online transaction processing (OLTP) - a system for transaction processing that handles large numbers of short, simple queries.

Data warehouse architecture
- Bottom tier (storage layer): comprises data media, metadata repositories, data marts, and the database server.
- Middle tier (compute layer): the online analytical processing (OLAP) system, which processes complex queries.
- Top tier (services layer): the user front end, such as visual dashboards.

Schemas of a data warehouse
- Star schema: a central fact table linked to dimension tables.
- Snowflake schema: dimension tables are normalized into multiple child tables.
- Fact constellation: multiple fact tables that share dimension tables.

How does data exploration work?
- Data collection: collecting data from diverse sources.
- Data cleaning: detecting and rectifying outliers.
- Exploratory data analysis (EDA): applying various statistical tools.
- Feature engineering: creating or transforming features to enhance prediction models.
- Model building & validation: preliminary models are developed and validated.

Exploratory data analysis (EDA) - the crucial initial step in data science.

Types of exploratory data analysis (EDA)
- Univariate: focuses on a single variable to understand its internal structure.
- Bivariate: explores the relationship between two variables.
- Multivariate: examines the relationships among more than two variables in a dataset.

Examples of each type of EDA
- Univariate: histogram, box plot
- Bivariate: scatter plot, line graph
- Multivariate: pair plot

Descriptive statistics - summarizing and presenting data.

Types of descriptive statistics
- Central tendency
- Dispersion
- Frequency distribution
- Shape
- Cross-tabulation
- Descriptive graphs

TAKEAWAYS:
Data visualization and data reporting are essential components of data analysis, enabling effective communication of insights and findings.

Data Visualization
Data visualization is the graphical representation of data, making it easier to understand, interpret, and communicate. It involves transforming data into visual elements like charts, graphs, and maps.
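This sketch shows what "transforming data into a visual element" can look like for a single variable, which is a univariate analysis. It computes central tendency, dispersion, and a frequency distribution, then renders that distribution as a text histogram. The scores and the helper names (`describe`, `frequency_distribution`, `text_histogram`) are hypothetical. The text bars stand in for a Matplotlib or Seaborn histogram so that the example needs only the Python standard library.

```python
import statistics
from collections import Counter

# Hypothetical exam scores for a single variable (made-up data for illustration).
SCORES = [55, 62, 67, 70, 71, 74, 78, 78, 81, 85, 88, 93]


def describe(values):
    """Central tendency and dispersion measures for one numeric variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "range": max(values) - min(values),
    }


def frequency_distribution(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def text_histogram(values, bin_width):
    """Render the frequency distribution as text bars, one '#' per value.

    The bin labels (e.g. "50-59") assume integer data.
    """
    return [
        f"{start}-{start + bin_width - 1}: {'#' * count}"
        for start, count in frequency_distribution(values, bin_width).items()
    ]


stats = describe(SCORES)
print({name: round(value, 2) for name, value in stats.items()})
print("\n".join(text_histogram(SCORES, 10)))
```

With a bin width of 10, the bars run from `50-59: #` to `90-99: #`. The tallest bar, `70-79: #####`, shows where most of these scores cluster.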
Key benefits:
- Enhanced understanding: Visualizations can quickly convey complex information.
- Improved decision-making: Visual insights can aid in identifying trends, patterns, and anomalies.
- Effective communication: Visualizations can communicate findings to a wider audience.

Common visualization techniques:
- Bar charts: Comparing categories or groups.
- Line charts: Showing trends over time.
- Pie charts: Representing proportions of a whole.
- Scatter plots: Exploring relationships between variables.
- Histograms: Understanding the distribution of a single variable.
- Maps: Visualizing geographic data.

Tools:
- Python: Matplotlib, Seaborn, Plotly
- R: ggplot2
- Business intelligence tools: Tableau, Power BI, Qlik

Data Reporting
Data reporting is the process of presenting data findings in a structured and organized manner. It involves creating reports that summarize key insights and provide context for decision-making.

Key components of a report:
- Executive summary: A concise overview of the key findings.
- Introduction: Background information and objectives.
- Methodology: Description of the data collection and analysis process.
- Results: Presentation of the findings, supported by visualizations.
- Conclusions: Summary of the main conclusions and recommendations.

Types of reports:
- Dashboards: Interactive reports that provide real-time updates.
- Ad-hoc reports: Customized reports created on demand.
- Scheduled reports: Reports generated automatically at regular intervals.

Best practices for data reporting:
- Clarity and conciseness: Use clear and concise language.
- Relevance: Ensure the report focuses on the most important findings.
- Visual appeal: Use effective visualizations to enhance understanding.
- Customization: Tailor reports to the needs of the audience.

Tools:
- Business intelligence tools: Tableau, Power BI, Qlik
- Spreadsheet software: Excel
- Word processing software: Word
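A report's structure can also be enforced in code. The following Python sketch assembles a plain-text report whose sections follow the key components of a report listed in these notes, in that order. It refuses to build the report if any section is missing. `build_report`, `REPORT_SECTIONS`, the title, and the placeholder text are hypothetical. In practice, a BI tool or word processor would handle layout and embedded visualizations.

```python
# Section order follows the key report components listed in these notes.
REPORT_SECTIONS = (
    "Executive summary",
    "Introduction",
    "Methodology",
    "Results",
    "Conclusions",
)


def build_report(title, content):
    """Assemble a plain-text report with its sections in a fixed order.

    `content` maps each section name to its text. Any missing section raises
    ValueError so that an incomplete report is caught before it is shared.
    """
    missing = [name for name in REPORT_SECTIONS if name not in content]
    if missing:
        raise ValueError("missing sections: " + ", ".join(missing))
    lines = [title, "=" * len(title)]
    for name in REPORT_SECTIONS:
        # Blank line, section heading, underline, then the section body.
        lines += ["", name, "-" * len(name), content[name].strip()]
    return "\n".join(lines)


# Hypothetical placeholder content for illustration only.
report = build_report(
    "Quarterly Sales Review",
    {
        "Executive summary": "Overview of the key findings.",
        "Introduction": "Background information and objectives.",
        "Methodology": "How the data was collected and analyzed.",
        "Results": "Findings, supported by visualizations.",
        "Conclusions": "Main conclusions and recommendations.",
    },
)
print(report)
```

Scheduled reports are generated without anyone reviewing them first. Failing fast on a missing section is one simple way to keep them from going out incomplete.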