Data Science Practice Questions PDF

Summary

This document contains practice questions related to data science. The questions cover a range of topics from different units, including big data, data warehousing, exploratory data analysis, statistical modeling, artificial intelligence, and machine learning. The questions aim to test understanding of key concepts and techniques in data science.

Full Transcript

Unit-I

1. What is Data Science?
2. Define Big Data.
3. What is datafication?
4. List any two types of data with examples.
5. Name two essential skills required for a data scientist.
6. What is structured data? Give one example.
7. What is unstructured data? Give one example.
8. State one reason why Data Science has become important now.
9. Mention any two elements of data.
10. What is the difference between data and information?
11. What does "Veracity" in Big Data refer to?
12. What is metadata?
13. Mention two examples of tools used in Data Science.
14. What is the role of a data engineer?
15. Define semi-structured data with an example.
16. Why is domain knowledge important in Data Science?
17. What is the importance of visualization in Data Science?
18. What is the difference between descriptive and predictive analytics?
19. Name two statistical techniques used in Data Science.
20. Mention two industries where Data Science is widely applied.
21. Explain the concept of Data Science. How is it different from traditional data analysis?
22. What is the hype around Big Data and Data Science, and how can we move beyond it?
23. Discuss the key reasons for the rise of Data Science in recent years.
24. What is Datafication? Explain how it is transforming industries with real-world examples.
25. Describe the current landscape of Data Science. Include key trends and roles in the field.
26. Differentiate between structured, semi-structured, and unstructured data with examples.
27. What are the core elements of data? Explain their significance in Data Science.
28. Discuss the essential skill sets required for a Data Scientist. Support your answer with examples and tools.
29. How does Data Science integrate with other domains (e.g., business, healthcare, finance)? Provide examples.
30. Analyze how the types of data impact data analysis techniques in Data Science.
31. Explain the 5 Vs of Big Data with examples.
32. Differentiate between a Data Analyst and a Data Scientist.
33. How has Data Science evolved over the years?
34. Explain the importance of interdisciplinary skills in Data Science.
35. What are the key components of the Data Science life cycle?
36. Explain the challenges faced in Big Data analysis.
37. Compare traditional databases with Big Data technologies.
38. How does Data Science contribute to decision-making?
39. Discuss ethical concerns in Data Science.
40. What are some popular tools and languages used in Data Science? Discuss their use.

Unit-II

1. Define statistical modeling.
2. What is a probability distribution?
3. Give two examples of continuous probability distributions.
4. What is meant by 'fitting a model' to data?
5. State the difference between a probability mass function (PMF) and a probability density function (PDF).
6. What does the term 'parameter estimation' mean in the context of statistical modeling?
7. Write down the formula of the normal distribution.
8. What does a goodness-of-fit test check for?
9. Explain the concept of statistical modeling. Discuss the steps involved in building a statistical model.
10. Describe different types of probability distributions and explain how they are used in modeling real-world data.
11. With examples, differentiate between discrete and continuous probability distributions.
12. Discuss the process of fitting a probability distribution to a given dataset. Include a description of methods like method of moments or maximum likelihood estimation.
13. Explain the role of probability distributions in statistical modeling. How do we choose an appropriate distribution for a dataset?
14. Illustrate with an example how a normal distribution can be used to model real-world phenomena.
15. Discuss model evaluation techniques after fitting a statistical model. Include residual analysis and goodness-of-fit tests.
16. Explain the importance of assumptions in statistical modeling and the implications if these assumptions are violated.
17. What is the difference between a model parameter and a variable?
18. Name two methods of parameter estimation.
19. What is the purpose of the Central Limit Theorem in statistical modeling?
20. Define overfitting in the context of model fitting.
21. What is meant by ‘residual’ in model fitting?
22. State one real-world scenario where a Poisson distribution is applicable.
23. Explain Maximum Likelihood Estimation (MLE) and show how it is used to estimate parameters of a distribution.
24. Describe the process of selecting an appropriate statistical model for a given dataset.
25. Compare and contrast the Binomial and Poisson distributions. When would you use each?
26. Explain multiple linear regression in detail. Also determine the estimates of the regression coefficients for the following setup and interpret them:
    $\beta = [\beta_1, \beta_2, \beta_3]$, $X = \begin{bmatrix} 2 & 7 & 5 \\ 3 & 1 & 8 \\ 1 & 3 & 4 \end{bmatrix}$, $Y = \begin{bmatrix} 24 \\ 14 \\ 32 \end{bmatrix}$
27. Consider the following data:
    $X = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 4 \end{bmatrix}$, $Y = \begin{bmatrix} 5 \\ 7 \\ 9 \end{bmatrix}$
    (i) Calculate the regression coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$ using the least squares method.
    (ii) Interpret the results.
28. Consider the following data:
    $X = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, $Y = \begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix}$
    (1) Calculate the regression coefficient $\hat{\beta}_1$ and the intercept $\hat{\beta}_0$ using the formula.
    (2) Write the regression equation and predict Y when X = 5.
29. Consider the following dataset:
    $X = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 3 & 2 \\ 3 & 4 & 3 \end{bmatrix}$, $Y = \begin{bmatrix} 6 \\ 8 \\ 10 \end{bmatrix}$
    (a) Calculate the regression coefficients $\beta_0$, $\beta_1$ and $\beta_2$ using the least squares method.
    (b) Interpret the results.
30. Consider the following data:
    $X = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 4 \end{bmatrix}$, $Y = \begin{bmatrix} 7 \\ 8 \\ 10 \end{bmatrix}$
    (i) Calculate the regression coefficients $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$ using the least squares method.
    (ii) Write the regression coefficients and interpret them.

Note: For numerical problems on probability distributions, refer to the PDF already provided on Moodle.

Unit-III

1. Define a Data Warehouse.
2. What is the main purpose of a data warehouse in an organization?
3. Name any two characteristics of a data warehouse.
4. Differentiate between OLTP and OLAP.
5. What is an Operational Data Store (ODS)?
6. List two differences between a Data Mart and a Data Warehouse.
7. What does EDW stand for?
8. What is meant by a Data Warehouse Appliance?
9. Mention one key benefit of using a Data Mart.
10. Which type of data warehouse is best suited for real-time operational reporting?
11. Explain the concept of a Data Warehouse. Describe its key characteristics and the role it plays in decision-making.
12. Compare and contrast EDW (Enterprise Data Warehouse), ODS (Operational Data Store), and Data Mart in terms of purpose, scope, and data usage.
13. Discuss the architecture of a data warehouse. Explain the process of ETL (Extract, Transform, Load).
14. What are Data Warehouse Appliances? Describe their advantages and give examples of popular appliances.
15. Differentiate between OLTP and OLAP systems with examples. How are they related to data warehousing?
16. Describe the types of data marts (dependent, independent, hybrid) with suitable examples.
17. Explain how an ODS is used in real-time analytics. What makes it different from traditional data warehouses?
18. Discuss the importance of metadata in a data warehouse. What types of metadata are typically maintained?
19. Explain the advantages and disadvantages of building an EDW over using multiple Data Marts.
20. Describe the role of data warehousing in Business Intelligence. How does it support decision-making and reporting?

Unit-IV

1. Define Exploratory Data Analysis (EDA).
2. What is the philosophy of EDA according to John Tukey?
3. List two graphical tools used in EDA.
4. What is the purpose of using summary statistics in EDA?
5. Name the main stages of the Data Science process.
6. What role does visualization play in EDA?
7. Mention one challenge in data cleaning during EDA.
8. What type of data visualization would be suitable for understanding the distribution of housing prices?
9. What is a box plot used for in EDA?
10. How did EDA help Real Direct understand user engagement patterns? (Case-specific)
11. What is the first step in any EDA process?
12. How is a scatter plot useful in EDA?
13. What is meant by detecting outliers in EDA?
14. Give an example of a summary statistic and what it tells us.
15. What type of data issue can a boxplot help detect?
16. Why is data cleaning important before performing EDA?
17. What is the main purpose of the data science process?
18. How can EDA guide feature selection in modeling?
19. Explain the philosophy and importance of Exploratory Data Analysis in the Data Science process.
20. Describe the main tools and techniques used in EDA, with appropriate examples.
21. Discuss the steps of the Data Science process and explain how EDA fits into the process.
22. Using a case study of Real Direct, explain how EDA was applied to gain insights into the real estate data.
23. Compare and contrast different plotting techniques (histograms, boxplots, scatterplots) used in EDA.
24. How do summary statistics assist in understanding a dataset during the EDA stage? Illustrate with examples.
25. Discuss the challenges faced during EDA and how they were addressed in the Real Direct case.
26. Explain how EDA supports decision-making in business, using examples from the real estate domain.
27. Discuss how visual and statistical summaries complement each other in EDA.
28. Explain the role of EDA in the overall data science process, with examples.
29. Explain how EDA can help improve a real estate recommendation system using the Real Direct case.

Unit-V

1. Define Artificial Intelligence (AI).
2. How is Machine Learning (ML) different from traditional programming?
3. Name two popular supervised machine learning algorithms.
4. What is Deep Learning (DL)?
5. Give an example of AI used in daily life.
6. What is the role of algorithms in Machine Learning?
7. What is a neural network?
8. Differentiate between classification and regression problems.
9. What is the main difference between AI and ML?
10. Mention two real-world applications of Machine Learning.
11. What does "training data" mean in machine learning?
12. What is overfitting in machine learning?
13. What kind of tasks is deep learning particularly good at?
14. Name one deep learning architecture used for image classification.
15. What is the purpose of the activation function in a neural network?
16. Give an example of a reinforcement learning environment.
17. Differentiate between Artificial Intelligence, Machine Learning, and Deep Learning with examples.
18. Explain different types of Machine Learning algorithms with suitable examples.
19. Describe the architecture of a basic neural network used in Deep Learning.
20. Explain supervised and unsupervised learning with real-life examples.
21. Describe the key components of a Machine Learning system and their functions.
22. Compare rule-based AI systems with learning-based AI systems. Provide examples.
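
Worked sketch for the Unit-II least-squares exercises. The following is a minimal illustration, not an official answer key, of how the simple linear regression formulas (slope = Sxy / Sxx, intercept = mean of Y minus slope times mean of X) can be checked by computer. It assumes the data in Unit-II question 28 reads X = (1, 2, 3) and Y = (2, 4, 6); the function name simple_linear_regression is a hypothetical helper introduced only for this example.

```python
def simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sxy: sum of cross-deviations; Sxx: sum of squared deviations of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Assumed reading of the data in Unit-II question 28.
xs = [1, 2, 3]
ys = [2, 4, 6]

b0, b1 = simple_linear_regression(xs, ys)
prediction = b0 + b1 * 5  # predicted Y when X = 5

print(f"intercept = {b0:.3f}, slope = {b1:.3f}, prediction at X = 5: {prediction:.3f}")
```

Under that assumed reading, the points lie exactly on a line through the origin, so the residuals are all zero. The same function can be reused to check hand calculations for any other single-predictor dataset.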