Week 6- DataDriven.pdf
Data-Driven Development in AI Applications
Prasara Jakkaew, Ph.D.
Data as the foundation for AI models. How AI systems learn from data.

Data-Driven Development
The approach of using large sets of real-world data to guide the design, development, and improvement of software applications, particularly in AI.
In DDD, data becomes the primary asset, driving decisions related to feature development, system design, and user experience.
The core concept is to rely on empirical evidence (data) rather than intuition or past experience to build solutions.

Importance in AI
Improve decision-making accuracy.
Adapt based on continuous input from real-world data.
Provide personalized experiences, as in the case of recommendation systems.

Difference Between Traditional Development and Data-Driven AI Development
Traditional Development:
Focuses on rule-based programming, where developers define specific logic to solve problems.
Systems are manually coded based on the developer's understanding of the problem.
Changes are usually based on user feedback or observations made during the system's usage, leading to future updates.
Typically more static and less adaptive to real-time changes in user behavior or new data.
Data-Driven AI Development:
AI systems learn from historical and real-time data rather than being explicitly programmed.
AI models are often trained on datasets and can continuously improve as more data is fed into the system.
AI systems, such as predictive analytics, recommendation engines, and natural language processing, adapt based on new data, making them highly flexible.
AI can autonomously identify patterns, trends, and insights that would be challenging for humans to code or foresee manually.

Role of Data in AI
Type of Data
Quality of Data
Quantity of Data

Types of Data
Structured Data. Examples: Databases, spreadsheets. Usage in AI: Machine learning models, predictive analytics.
Unstructured Data. Examples: Text, images, videos. Usage in AI: Natural language processing (NLP), image recognition.
Semi-structured Data. Examples: JSON, XML. How this data bridges structured and unstructured data.

Quality of Data
Characteristics of High-Quality Data: Accuracy, completeness, consistency, timeliness, relevance.
Common Data Quality Issues: Missing data, outliers, noisy data, biased data.
Data Preprocessing Techniques: Data cleaning, normalization, transformation, handling missing data.

Quantity of Data
Large amounts of data typically lead to better performance in AI models.
More data allows AI systems to recognize complex patterns and generalize better across different use cases.
For example, a facial recognition model trained on thousands of diverse faces will outperform a model trained on just a few hundred images.

Data Collection Methods for AI Development
Primary Data Collection: Direct data from users, sensors, experiments.
Secondary Data Sources: Open datasets, third-party APIs, publicly available data.
Data Gathering from Real-Time Sources: IoT devices, social media feeds, web scraping.
Data Augmentation Techniques: Synthetic data generation, oversampling for imbalanced datasets.

Data Labeling and Annotation
Importance of Labeled Data in Supervised Learning: How labeled data trains models.
Techniques for Data Labeling: Manual labeling, crowdsourcing, automated labeling tools.
Challenges in Data Annotation: Time and cost considerations, ensuring accuracy.

Building AI Models from Data
Training, Validation, and Test Datasets: The importance of splitting data to avoid overfitting.
Feature Selection and Engineering: How to extract relevant features from raw data.
Model Training and Evaluation: Training AI models, evaluation metrics (accuracy, precision, recall, F1-score).

Data-Driven Decision Making
How AI Models Make Decisions Based on Data: Predictive analytics, recommendations, classification.
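The dataset split and the evaluation metrics named under Building AI Models from Data can be illustrated with a minimal sketch in plain Python. The 70/15/15 split ratio, the function names, and the sample labels are assumptions made for illustration; the slides do not prescribe them.

```python
import random

def split_dataset(rows, train_pct=70, val_pct=15, seed=0):
    """Shuffle rows and split them into train/validation/test lists.

    The percentages are illustrative assumptions; the remainder goes to test.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1-score for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical data: 100 row ids, plus made-up true and predicted labels.
train_rows, val_rows, test_rows = split_dataset(range(100))
metrics = classification_metrics([1, 0, 1, 1, 0, 0, 1, 0],
                                 [1, 0, 0, 1, 0, 1, 1, 0])
```

Keeping the test rows untouched until the end is what lets the metrics say something about overfitting: a model that scores well on the training rows but poorly on the held-out rows has memorized rather than generalized.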
Case Study: AI-Powered Recommendation System: How data drives recommendations in applications like Netflix, Spotify.

Ethical Considerations in Data-Driven AI
Bias and Fairness: How biased data leads to biased AI outcomes.
Privacy and Data Security: Handling sensitive data responsibly, compliance with GDPR and other regulations.
Transparency and Accountability: Explainability of AI models, ensuring trust in AI outcomes.

Tools and Platforms for Data-Driven AI Development
Popular AI and Machine Learning Frameworks: TensorFlow, PyTorch, Scikit-learn.
Data Management Tools: SQL databases, data lakes, cloud-based storage.
Big Data and AI Integration: How big data technologies (Hadoop, Spark) work with AI models.

Challenges in Data-Driven AI Development
Data Availability: Ensuring access to diverse, high-quality data.
Scalability Issues: Handling large datasets and real-time data processing.
Data Security and Compliance: Ensuring AI development follows regulatory standards.

Future Trends in Data-Driven AI Development
AI and Big Data Synergy: How big data and AI are driving innovation together.
AI Model Deployment: Integrating AI models with real-world applications using continuous data streams.
The Rise of Synthetic Data: How synthetic data can replace real data in certain AI applications.

How data drives recommendations in applications
Prasara Jakkaew, Ph.D.

Data Collection
User Interaction Data: Applications track user behavior, such as clicks, searches, likes, and purchase history.
Profile Data: Information such as user demographics, preferences, and interests is gathered during sign-up or through user inputs.
Contextual Data: The app may also collect real-time contextual data such as location, time, and device usage.

User Interaction Data
Applications use several common methods to track user behavior, such as clicks, searches, likes, and purchase history. Here are some ways to collect this data:
Event Tracking
Session Recording
Log Analysis
Cookies and Local Storage
Purchase History and Transaction Data
Heatmaps
A/B Testing Data Collection
Search Behavior Tracking
Clickstream Data
Social Interaction Data

Event Tracking
Developers create and track specific actions or events such as button clicks, form submissions, or page loads. This allows for monitoring how users interact with specific elements in an application.
Tools: Google Analytics, Mixpanel, Heap Analytics, Firebase Analytics
Example: Tracking every time a user clicks the “Add to Cart” button in an e-commerce platform to see which products are most often added to carts.

Session Recording
Tools record full user sessions, capturing everything from mouse movements to scrolls and clicks. Developers can watch the replays to analyze how users interact with different elements.
Tools: Hotjar, FullStory, Smartlook
Example: Recording a user session to analyze drop-off points during the checkout process.

Log Analysis
Server logs record backend interactions like API calls, page requests, and form submissions. These logs can be analyzed to understand user behavior without directly recording the front-end UI actions.
Tools: Elastic Stack (Elasticsearch, Logstash, Kibana), Splunk, Fluentd
Example: Analyzing server logs to see how frequently a specific product is searched for or added to wishlists.

Cookies and Local Storage
Cookies and local storage store data in the user's browser, tracking behaviors across sessions. This can be used for remembering preferences or tracking session continuity.
Tools: Browser APIs (JavaScript), Google Tag Manager, Firebase for web storage
Example: Using local storage to store a user’s shopping cart items, even after they leave the site and return later.

Purchase History and Transaction Data
Platforms store purchase history for users, which can be analyzed for personalized recommendations and marketing optimizations.
Tools: Custom databases, CRM systems like Salesforce, Shopify Analytics
Example: Analyzing transaction data to offer personalized product recommendations or promotions based on previous purchases.

Heatmaps
Heatmaps visually display where users click, scroll, or hover on a webpage. This helps identify hotspots and cold zones of engagement.
Tools: Crazy Egg, Hotjar, Mouseflow
Example: Using heatmaps to find out if users are interacting with a call-to-action button on a product page or ignoring it.

A/B Testing Data Collection
A/B testing compares two versions of a webpage or feature by measuring user interactions with each version to determine which one performs better.
Tools: Optimizely, VWO, Google Optimize
Example: Testing two different landing page designs to see which one leads to higher signup rates.

Search Behavior Tracking
Tracks search queries entered by users within the application, which helps understand user intent and improves the search functionality.
Tools: Algolia, Elasticsearch, Google Analytics (Search Tracking)
Example: Capturing search queries entered into a website’s search bar to identify popular products and improve search results.

Clickstream Data
Clickstream data tracks the entire journey of a user from entry to exit. It captures every click, page view, and interaction, offering a complete view of user navigation.
Tools: Adobe Analytics, Google Analytics, Piwik PRO
Example: Tracking the user journey from landing on a homepage to purchasing a product to see how they navigate through the site.

Social Interaction Data
Tracks user interactions on social media platforms or social features within an application, such as likes, shares, and comments. This data helps measure content engagement.
Tools: Hootsuite, Sprout Social, Google Analytics (for social sharing buttons)
Example: Analyzing which blog posts or products users are sharing on social media platforms like Facebook or Twitter to understand what content resonates most with the audience.
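Two of the collection methods above, event tracking and A/B testing, reduce to simple aggregations once the events are recorded. The sketch below assumes a hypothetical in-memory event log; the field names (`user`, `event`, `product`, `variant`) and the sample rows are invented for illustration and do not reflect the schema of any of the tools listed.

```python
from collections import Counter

# Hypothetical event log; a real application would read this from an
# analytics store rather than a hard-coded list.
events = [
    {"user": "u1", "event": "page_view", "product": None, "variant": "A"},
    {"user": "u1", "event": "add_to_cart", "product": "shoes", "variant": "A"},
    {"user": "u1", "event": "signup", "product": None, "variant": "A"},
    {"user": "u2", "event": "page_view", "product": None, "variant": "A"},
    {"user": "u2", "event": "add_to_cart", "product": "shoes", "variant": "A"},
    {"user": "u3", "event": "page_view", "product": None, "variant": "B"},
    {"user": "u3", "event": "add_to_cart", "product": "hat", "variant": "B"},
    {"user": "u3", "event": "signup", "product": None, "variant": "B"},
    {"user": "u4", "event": "page_view", "product": None, "variant": "B"},
    {"user": "u4", "event": "add_to_cart", "product": "shoes", "variant": "B"},
    {"user": "u4", "event": "signup", "product": None, "variant": "B"},
]

def most_added_to_cart(events, top_n=3):
    """Event tracking: count 'add_to_cart' events per product."""
    counts = Counter(e["product"] for e in events
                     if e["event"] == "add_to_cart")
    return counts.most_common(top_n)

def signup_rate_by_variant(events):
    """A/B testing: share of visitors in each variant who signed up."""
    visitors, signups = {}, {}
    for e in events:
        if e["event"] == "page_view":
            visitors.setdefault(e["variant"], set()).add(e["user"])
        elif e["event"] == "signup":
            signups.setdefault(e["variant"], set()).add(e["user"])
    return {variant: len(signups.get(variant, set())) / len(users)
            for variant, users in visitors.items()}

top_products = most_added_to_cart(events)
rates = signup_rate_by_variant(events)
```

With only a handful of users per variant, a difference in signup rate is not evidence that one design is better; a real A/B test would also check sample size and statistical significance before declaring a winner.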
https://analytics.google.com

Profile Data
Information such as user demographics, preferences, and interests is gathered during sign-up or through user inputs.
Businesses use profile data to tailor user experiences, personalize content, and drive targeted marketing efforts. It is typically collected during user registration, account creation, or through ongoing interactions with the platform.

Demographic Information
Age, gender, location, education, occupation, marital status
Example: An e-commerce website collecting user age and location to provide personalized product recommendations.

Preferences
Interests, preferred languages, favorite genres, preferred communication channels
Example: A music streaming platform like Spotify collecting user genre preferences during sign-up to suggest personalized playlists.

Behavioral Data
Purchase history, browsing behavior, search history
Example: Netflix tracking viewing history and user ratings to recommend similar content.

Psychographic Data
Lifestyle, personality, values
Example: Facebook gathering user interests based on their interactions with pages, groups, and posts for more personalized ad targeting.

How to Collect Profile Data
Sign-Up Forms: When users create an account, they fill out forms asking for personal information such as name, age, gender, and location.
Tools: Google Forms, Typeform, Firebase Authentication
Example: LinkedIn collects detailed demographic data during sign-up, including job title, company, and industry.

Onboarding Surveys: Upon first use, many apps ask users for preferences to create a personalized experience.
Tools: SurveyMonkey, Qualtrics, in-app surveys
Example: When new users sign up for a fitness app like MyFitnessPal, they are asked for fitness goals, dietary preferences, and activity level to personalize workout and meal plans.

User Inputs via Profile Settings: Users can manually update their profile with preferences, settings, and additional information over time.
Tools: Custom user management systems, Firebase Firestore, Google Cloud Datastore
Example: Amazon asks users to update their shipping preferences, payment methods, and wishlists for better recommendations and streamlined purchases.

Tracking User Behavior: Platforms track users' interaction with content, such as clicks, likes, purchases, and time spent on specific sections, to infer their preferences.
Tools: Google Analytics, Mixpanel, Segment
Example: Spotify tracks songs users listen to and “like” to refine future recommendations.

How to Analyze Profile Data
Segmentation: Divide users into segments based on shared characteristics like age group, gender, interests, or behavior patterns.
Tools: Google Analytics, Mixpanel, Tableau
Example: Amazon segments users based on purchase behavior to create targeted product recommendations.

Predictive Analytics: Use machine learning models to predict future behavior based on past actions.
Tools: IBM Watson Analytics, Google BigQuery, TensorFlow
Example: Netflix uses predictive analytics to recommend TV shows and movies based on a user's previous viewing history.

Personalization: Deliver personalized content and recommendations based on profile data.
Tools: Adobe Target, Salesforce Marketing Cloud, Optimizely
Example: Spotify's "Discover Weekly" playlist is created using personalized algorithms to recommend new music based on a user’s listening patterns.

Customer Lifetime Value (CLV) Analysis: Determine the long-term value of users by analyzing demographic data, preferences, and past behavior to predict future purchases.
Tools: RFM Analysis (Recency, Frequency, Monetary), Microsoft Power BI
Example: E-commerce platforms like Shopify use CLV analysis to identify high-value customers and target them with exclusive offers.
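The RFM analysis mentioned under Customer Lifetime Value, combined with the segmentation step, can be sketched in a few lines of Python. The transactions, the reference date, the segment names, and the thresholds (30 days, 90 days, 3 purchases) are all hypothetical values chosen for illustration, not figures from the slides.

```python
from datetime import date

# Hypothetical purchase records: (customer, purchase date, amount).
transactions = [
    ("alice", date(2024, 5, 1), 40.0),
    ("alice", date(2024, 6, 20), 60.0),
    ("bob", date(2024, 1, 15), 25.0),
    ("carol", date(2024, 4, 10), 90.0),
    ("carol", date(2024, 6, 1), 150.0),
    ("carol", date(2024, 6, 28), 200.0),
]

def rfm_table(transactions, today):
    """Return recency (days since last purchase), frequency, and monetary
    total for each customer."""
    raw = {}
    for customer, day, amount in transactions:
        row = raw.setdefault(customer,
                             {"last": day, "frequency": 0, "monetary": 0.0})
        row["last"] = max(row["last"], day)
        row["frequency"] += 1
        row["monetary"] += amount
    return {customer: {"recency_days": (today - row["last"]).days,
                       "frequency": row["frequency"],
                       "monetary": row["monetary"]}
            for customer, row in raw.items()}

def segment(row):
    """Assign a segment using illustrative, assumed thresholds."""
    if row["recency_days"] <= 30 and row["frequency"] >= 3:
        return "high-value"
    if row["recency_days"] > 90:
        return "at-risk"
    return "regular"

rfm = rfm_table(transactions, today=date(2024, 6, 30))
segments = {customer: segment(row) for customer, row in rfm.items()}
```

In practice the thresholds would be derived from the data itself, for example by ranking customers into quantiles on each of the three dimensions, rather than fixed by hand as they are here.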
Real-World Case Studies of Businesses Impacted by Profile Data
Netflix – Personalization Through Profile Data
Amazon – Tailored Shopping Experience
Spotify – Personalizing Music Recommendations
Facebook – Personalized Ads Based on Interests and Demographics
Starbucks – Personalized Rewards Program

Netflix – Personalization Through Profile Data
How Profile Data is Collected: Netflix gathers user profile data including demographics, viewing history, ratings, search queries, and the time spent watching shows.
Impact: Netflix’s recommendation engine, powered by the collected profile data, accounts for over 80% of the content watched by users. Personalization based on profile data keeps users engaged and reduces churn. By analyzing demographic and behavioral data, Netflix creates personalized recommendations that keep users watching more content and increase subscription retention.
Tools: Netflix uses machine learning and algorithms like collaborative filtering and content-based filtering to deliver recommendations.

Amazon – Tailored Shopping Experience
How Profile Data is Collected: Amazon collects vast amounts of profile data, including purchase history, browsing behavior, demographic data, and product reviews.
Impact: By leveraging this data, Amazon personalizes the user experience through tailored product recommendations, targeted marketing campaigns, and “Frequently Bought Together” suggestions. Personalized recommendations account for approximately 35% of Amazon’s total revenue.
Analysis Method: Amazon uses advanced AI algorithms to analyze shopping patterns and interests to suggest products and deals for individual users.
Tools: Amazon’s personalization algorithms, AWS (Amazon Web Services)

Spotify – Personalizing Music Recommendations
How Profile Data is Collected: Spotify gathers user profile data, including preferences, playlist creation, song skips, and likes/dislikes.
Impact: Spotify uses this data to deliver hyper-personalized playlists such as "Discover Weekly" and "Release Radar," which contribute to longer engagement and increased retention. Personalized playlists are a key feature that keeps users returning to the platform.
Tools: Spotify uses machine learning models like collaborative filtering and deep learning to create personalized music recommendations.

Facebook – Personalized Ads Based on Interests and Demographics
How Profile Data is Collected: Facebook collects detailed user profile data, including age, gender, location, interests (based on likes, posts, and engagement), and behavior on the platform.
Impact: Facebook's ad platform uses profile data to help advertisers create highly targeted ads. This precision targeting leads to higher engagement and conversion rates for advertisers. For example, a travel company may target ads to users who have liked travel-related pages or shown an interest in vacation destinations.
Tools: Facebook Ads Manager, Audience Insights

Starbucks – Personalized Rewards Program
How Profile Data is Collected: Starbucks collects profile data through its mobile app, where users provide demographic information and preferences, and the app tracks purchasing behavior.
Impact: Starbucks uses this data to offer personalized promotions through its loyalty program. For instance, customers receive custom offers based on their buying patterns, encouraging repeat purchases and increasing engagement. Starbucks reported that personalized marketing campaigns contributed to a 15% increase in mobile orders and pay transactions.
Tools: Starbucks Rewards App, Salesforce Marketing Cloud

Contextual Data
Refers to information that provides context about a user's environment and circumstances while interacting with an application or device. This data typically includes factors like location, time, device usage, and network conditions.
Contextual data can be used to enhance the user experience, provide real-time services, and offer personalized recommendations based on the user's current context.

Examples of Contextual Data
Location Data: GPS coordinates, city, country, region
Example: Google Maps using a user's current location to provide navigation directions.
Time Data: Current time, duration of usage, frequency of interaction at specific times
Example: Food delivery apps like Uber Eats suggesting breakfast items in the morning and dinner options in the evening.
Device Data: Device type (smartphone, tablet, PC), operating system, screen resolution, battery status
Example: Netflix adjusting video quality based on the user’s device type and internet speed.
Network and Connectivity: Wi-Fi or mobile data status, signal strength, bandwidth usage
Example: YouTube offering different streaming quality options depending on whether the user is on Wi-Fi or mobile data.
Environmental Context: Weather, surrounding noise levels, movement (e.g., driving or walking)
Example: A weather app providing different notifications based on the current weather conditions at the user’s location.

How to Collect Contextual Data
Location Data Collection
Method: Apps use GPS, cell towers, or Wi-Fi signals to determine the user's location.
Tools: Google Maps API, Apple Core Location, GeoIP
Example: An app like Uber uses GPS data to match drivers with passengers and provide estimated arrival times.

Time Data Collection
Method: Time-related contextual data is often captured by system clocks or through API calls that track the time of events or interactions.
Tools: JavaScript Date Object, Python’s datetime module, Firebase Realtime Database (for event timing)
Example: A fitness app like Strava records the time of a user’s workout to recommend optimal times for future exercises based on past behavior.

Device Data Collection
Method: Apps can detect the device type, operating system, screen size, and battery level using built-in device APIs.
Tools: Firebase Analytics, User-Agent String, Device Information APIs (iOS and Android)
Example: Websites like Amazon optimize layouts based on whether the user is accessing them via mobile or desktop.

Network and Connectivity Data Collection
Method: Apps collect information about the user’s internet connection (Wi-Fi or cellular) using built-in networking APIs.
Tools: Android Connectivity Manager, iOS Network Reachability, Google Network Information API
Example: Streaming services like Netflix or Spotify adjust content quality based on available bandwidth and network speed.

Environmental Context Collection
Method: Apps use sensors such as microphones, accelerometers, and barometers to gather environmental data.
Tools: Android Sensor Manager, iOS Core Motion, OpenWeather API
Example: A smart home assistant like Google Home can detect ambient noise levels and adjust its volume accordingly.

How to Analyze Contextual Data
A travel app like TripAdvisor provides tailored recommendations for nearby attractions based on a user’s current location.
A ride-sharing app like Uber predicts ride demand based on historical location and time data, adjusting surge pricing accordingly.
Starbucks uses geofencing to send push notifications about promotions to users who are near one of their stores.
A fitness app sends reminders to users when they haven’t exercised for a specific period based on their location or time of day.
A mobile banking app adapts the layout and features based on the device’s screen size and battery level, offering low-power options when the battery is running low.
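Two of the contextual-data uses listed above, geofenced promotions and time-of-day suggestions, can be approximated with the standard library alone. This is a rough sketch under stated assumptions: the store names and coordinates, the 0.5 km radius, and the meal-time boundaries are invented, and the haversine great-circle formula is one common way to estimate distance, not necessarily what any named company uses.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a standard approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def stores_in_geofence(user_pos, stores, radius_km=0.5):
    """Return names of stores within radius_km of the user's position."""
    lat, lon = user_pos
    return [name for name, (s_lat, s_lon) in stores.items()
            if haversine_km(lat, lon, s_lat, s_lon) <= radius_km]

def meal_suggestion(hour):
    """Pick a menu category from the hour of day (assumed boundaries)."""
    if 5 <= hour < 11:
        return "breakfast"
    if 11 <= hour < 16:
        return "lunch"
    return "dinner"

# Hypothetical store locations and user position.
stores = {
    "store_near": (13.7570, 100.5020),
    "store_far": (13.8000, 100.5500),
}
nearby = stores_in_geofence((13.7563, 100.5018), stores)
```

A production geofence would normally rely on the platform's location services, which handle battery use and permission prompts; the distance check here only shows the underlying idea.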
Google Maps – Real-Time Traffic and Navigation Adjustments
How Contextual Data is Collected: Google Maps uses GPS, time of day, and real-time traffic data collected from user devices to provide real-time navigation and estimated travel times.
Impact: Google Maps adjusts suggested routes based on real-time traffic conditions, such as accidents or road closures, improving the user experience and cutting down travel time. Contextual data allows it to offer the fastest routes and display estimated times of arrival.
Tools Used: Google Maps utilizes real-time GPS data from millions of users, machine learning algorithms, and geospatial data analysis.

Data Processing and Analysis
Data Aggregation: All collected data is stored and aggregated in databases or data warehouses.
Data Cleaning: To ensure quality, the data is cleaned by removing inconsistencies, duplicates, and irrelevant information.
Feature Engineering: Specific features or attributes are extracted from raw data that help in making more accurate predictions.

Algorithms Used in Recommendations
Collaborative Filtering: This technique recommends items based on the preferences of users who are similar to the current user. It can be user-based (recommending items liked by similar users) or item-based (recommending similar items based on user preferences).
Content-Based Filtering: This method recommends items by analyzing the attributes of the items themselves and matching them with the user’s profile and history. For example, a user who frequently watches action movies might be recommended more action titles.
Hybrid Systems: Many modern applications use a combination of collaborative and content-based filtering to improve recommendation accuracy.

Machine Learning Models
Supervised Learning: The system is trained on historical data where the inputs (user actions) and outputs (recommendations) are known. The model learns to predict what new users might prefer.
Reinforcement Learning: Here, the model learns by interacting with users and refining its recommendations based on user feedback (e.g., likes, purchases, or rejections of recommendations).
Deep Learning: Advanced recommendation systems often use neural networks, especially for complex tasks such as analyzing user preferences from large unstructured datasets like social media activity, reviews, or images.

Real-Time Personalization
Dynamic Adjustments: The application continuously collects real-time data, adjusting recommendations based on immediate user behavior and feedback.
A/B Testing: Applications often run different recommendation strategies in parallel to see which one performs better, refining recommendations based on what users respond to most positively.

Feedback Loop
User Feedback: Positive actions like clicks or purchases validate the recommendation system, while negative actions such as skipping or rejecting an item can trigger adjustments in the algorithm.
Continuous Learning: The system refines its predictions and recommendations over time by learning from user behavior, ensuring that the recommendations remain relevant and engaging.

Common Examples of Data-Driven Recommendations
Streaming Platforms (e.g., Netflix, Spotify): They recommend movies, shows, or songs based on your past viewing or listening habits and the preferences of similar users.
E-commerce (e.g., Amazon): It suggests products based on your browsing history, previous purchases, and items that other customers bought.
Social Media (e.g., Facebook, Instagram): These platforms recommend friends, posts, or ads based on your interaction patterns and the behavior of users in your network.
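The item-based variant of collaborative filtering described under Algorithms Used in Recommendations can be sketched in plain Python. The ratings below are made up, and treating a missing rating as zero when computing cosine similarity is a simplifying assumption; this illustrates the idea only and is not the method used by any of the platforms named above.

```python
import math

# Hypothetical user -> {item: rating} data.
ratings = {
    "ann": {"A": 5, "B": 4, "D": 5},
    "ben": {"A": 4, "B": 5},
    "cat": {"C": 5, "D": 2},
    "dan": {"A": 1, "C": 5},
}

def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating} vectors."""
    vectors = {}
    for user, items in ratings.items():
        for item, score in items.items():
            vectors.setdefault(item, {})[user] = score
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors (missing entries count as 0)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def recommend(ratings, user, top_n=3):
    """Score each unseen item by its similarity to items the user rated,
    weighted by the user's own ratings."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vector in vectors.items():
        if candidate in seen:
            continue
        scores[candidate] = sum(cosine(vector, vectors[item]) * rating
                                for item, rating in seen.items())
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

suggestions = recommend(ratings, "ben")
```

For the user `ben`, item `D` ranks above `C` because `D` was rated highly by a user who also rated `A` and `B` highly, which is the "similar items based on user preferences" behavior the slide describes. A hybrid system would combine this score with content-based signals such as genre or category.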