Data-Science-Classifications & Algorithms.pptx
Data Science Classifications
by Sharmila Sankar

Classifications by Type of Analysis
1. Descriptive Analytics: This foundational level of analysis focuses on summarizing historical data to provide insights into past events. It answers the question "What happened?" by organizing, tabulating, and visualizing data to reveal patterns and trends.
2. Diagnostic Analytics: Building on descriptive analytics, diagnostic analytics delves deeper to understand the reasons behind past outcomes. It addresses the question "Why did it happen?" through techniques such as drill-down, data discovery, and correlations.
3. Predictive Analytics: This forward-looking approach uses historical data and statistical modeling to forecast future events. It answers "What is likely to happen?" by employing advanced techniques like machine learning and data mining.
4. Prescriptive Analytics: The most advanced form of analytics, prescriptive analytics not only predicts future outcomes but also recommends actions to optimize those outcomes. It addresses "What should we do?" using complex algorithms and simulation models.

Quantitative Data Classifications
Discrete Data: Discrete data represents countable, distinct values. It's typically whole numbers without intermediate values. Examples include the number of employees in a company, the count of products sold, and the number of customer complaints. Discrete data is often analyzed using frequency distributions and probability mass functions.
Continuous Data: Continuous data can take any value within a range and is typically measured rather than counted. Examples include temperature readings, time durations, and weight or height measurements. Continuous data is often analyzed using histograms, density plots, and probability density functions.

Qualitative Data Classifications
Nominal Data: Nominal data represents categories without any inherent order. It's used for labeling variables without providing quantitative value. Examples include gender, race, or blood type. Analysis of nominal data often involves frequency counts and mode calculations.
Ordinal Data: Ordinal data represents categories with a meaningful order or ranking, but the intervals between values may not be consistent. Examples include education levels or customer satisfaction ratings. Analysis techniques for ordinal data include median calculations and non-parametric tests.
Binary Data: A special case of nominal data, binary data has only two possible categories, such as true/false or yes/no. It's often used in logistic regression and decision tree models.
Textual Data: While not always considered a separate category, textual data is increasingly important in data science. It includes unstructured text from sources like social media, customer reviews, or emails. Analysis often involves natural language processing techniques.

Data Science Roles and Specializations
Data Strategist: Develops comprehensive data strategies aligned with business goals, identifying opportunities for data-driven innovation and growth.
Data Architect: Designs and manages complex data structures, ensuring efficient data storage, retrieval, and integration across systems.
Data Engineer: Builds and maintains the data infrastructure, creating pipelines for data collection, processing, and storage.
Data Analyst: Examines data to extract meaningful insights, creating reports and visualizations to communicate findings effectively.

Automating Data Collection, Cleaning, and Analysis
Algorithms and technologies enable efficient and scalable automation of data processes. This presentation explores key methods for extracting, processing, and analyzing data.

Data Extraction Algorithms
Web Scraping and Crawling: Extracts data from websites and web pages. Scraping targets specific pages, while crawling follows links across multiple pages.
API Integration: Collects structured data directly from online sources. Efficient when APIs are available and well-documented.

OCR and Intelligent Data Capture
Optical Character Recognition (OCR): Converts unstructured data from physical documents into structured digital data. Crucial for digitizing paper workflows.
Intelligent Data Capture: Enhanced OCR techniques including data validation and classification. Ensures accuracy and relevance of extracted data.

Machine Learning and AI Algorithms
1. Natural Language Processing (NLP): Extracts and analyzes text data from various sources. Helps categorize and process unstructured data.
2. Speech Recognition: Collects and analyzes audio data. Converts it into text or other usable formats.

Data Processing and Transformation
1. ETL Algorithms: Automate extracting data from sources, transforming it into a consistent format, and loading it into databases.
2. Data Cleaning Algorithms: Handle missing values, remove duplicates, and correct errors. Ensure data quality and accuracy.

Database Querying and Retrieval
Database Querying: Automated queries retrieve specific data from databases at predefined intervals or in response to triggers.
Data Retrieval Algorithms: Locate, access, and retrieve data from various sources based on specific criteria or queries.

Automation Scripts and Tools
Automation Scripts: Scripts in Bash or PowerShell automate repetitive tasks in data collection and processing.
Integration Tools: Tools like Zapier or Apache NiFi integrate different data sources and systems.

Key Technologies and Tools
AI and ML Frameworks: TensorFlow, PyTorch, scikit-learn
Data Processing Frameworks: Apache Spark
Data Visualization Tools: D3.js, Chart.js

Data Science Algorithms: Powering Insights and Predictions
Data science relies on various algorithms to analyze, interpret, and predict from data. These algorithms serve different purposes and address unique challenges in the field.
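Before turning to the individual algorithms, the automation steps covered so far (ETL, data cleaning, and database querying) can be made concrete with a minimal sketch that uses only the Python standard library. The sample CSV text, the `sales` table, the column names, and the choice to fill missing unit counts with 0 are illustrative assumptions, not part of the presentation.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or an API.
RAW_CSV = """product,units
widget,3
gadget,
widget,3
gizmo,5
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names, fill missing units with 0, drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        product = row["product"].strip().lower()
        units = int(row["units"]) if row["units"] else 0  # assumed default
        key = (product, units)
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        cleaned.append(key)
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute("CREATE TABLE sales (product TEXT, units INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Database querying: retrieve rows that match a specific criterion.
top_sellers = conn.execute(
    "SELECT product, units FROM sales WHERE units >= 3 ORDER BY units DESC"
).fetchall()
print(top_sellers)
```

A production pipeline would more likely use a framework such as Apache Spark for the transform step and a scheduler or trigger to run the query at predefined intervals. The three-function split shown here only mirrors the extract, transform, and load stages named above.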
Supervised Learning Algorithms
Linear Regression: Predicts continuous outcomes based on relationships between variables. Uses the equation y = b0 + b1x.
Logistic Regression: Predicts event probabilities for classification problems. Useful for binary classification.
Decision Trees: Uses tree-like models for classification and prediction. Easy to interpret and handles various data types.

Unsupervised Learning Algorithms
1. K-Means Clustering: Groups similar data points into clusters based on features. Used for customer segmentation and image compression.
2. Hierarchical Clustering: Builds cluster hierarchies by merging or splitting existing ones. Reveals data structure at different granularities.
3. Principal Component Analysis (PCA): Reduces data dimensionality while retaining most information. Transforms high-dimensional data to lower dimensions.

Semi-Supervised Learning Algorithms
Self-Training: Trains on labeled data, predicts labels for unlabeled data, then retrains on the combined dataset.
Co-Training: Trains multiple models on different views (feature subsets) of the data. Each model's confident predictions are added to the training set.
Improved Accuracy: Combines labeled and unlabeled data to enhance model performance.

Reinforcement Learning Algorithms
Q-Learning: Estimates the expected return of taking an action in a given state. Used in robotics and game playing.
Deep Q-Networks (DQN): Combines Q-learning with deep learning. Handles high-dimensional state spaces.
Applications: Used in video game playing and autonomous driving.

Statistical Algorithms
1. Regression Analysis: Models relationships between variables. Includes simple, multiple, and polynomial regression.
2. Hypothesis Testing: Tests hypotheses about data. Includes t-tests and ANOVA to determine whether findings are statistically significant.
3. Data Interpretation: Provides insights into data relationships and the statistical significance of findings.

Data Mining Algorithms
Association Rule Mining: Finds patterns in transactional data, such as products that are frequently purchased together.
Clustering Algorithms: Groups similar data points. Includes k-means, DBSCAN, and hierarchical clustering.
Pattern Discovery: Extracts valuable information from large datasets.

Neural Networks and Deep Learning
Type | Description | Application
ANN | Interconnected neurons process inputs to outputs | Image classification, speech recognition
CNN | Uses convolutional and pooling layers | Image and video processing
RNN | Has feedback connections for internal state | Sequential data, time series, text

Key Applications and Benefits
Predictive Modeling: Uses algorithms like linear regression to predict outcomes based on historical data.
Classification: Algorithms like logistic regression and SVM classify data into different categories.
Recommendation Systems: Uses collaborative filtering to recommend products based on user behavior.

Importance in Data Science
1. Automate Tasks: Algorithms automate data collection, cleaning, and analysis, improving efficiency and scalability.
2. Find Patterns: Algorithms identify hidden patterns and relationships in data.
3. Make Predictions: Enables predictive modeling, crucial for decision-making across industries.
4. Drive Innovation: Uncovers hidden insights, helping businesses stay competitive.
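
As a closing illustration of predictive modeling with linear regression, the equation y = b0 + b1x from the supervised learning slide can be fitted by ordinary least squares in a few lines of plain Python. This is a minimal sketch under stated assumptions: the `fit_line` helper and the monthly sales figures are hypothetical, chosen so the fit is exact and easy to verify by hand.

```python
def fit_line(xs, ys):
    """Fit y = b0 + b1*x by ordinary least squares; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical historical data: sales follow exactly 10 + 2 * month.
months = [1, 2, 3, 4, 5]
sales = [12, 14, 16, 18, 20]

b0, b1 = fit_line(months, sales)
forecast = b0 + b1 * 6  # predict the outcome for month 6
print(b0, b1, forecast)
```

With real, noisy data the fit would not be exact, and a library such as scikit-learn (listed under Key Technologies and Tools) would typically replace the hand-written helper.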