Final Review Business Analysis and Data Mining PDF

Summary

This document contains multiple-choice questions related to Business Analysis and Data Mining, covering topics such as data mining techniques, data preprocessing, and supervised/unsupervised learning. The questions focus on practical applications.

Full Transcript

Business Analysis and Data Mining
Multiple Choice Questions

1) Which of the following is a challenge faced when applying data mining techniques to sensitive fields?
a) Incomplete data
b) Data distribution
c) Data analysis
d) Privacy concerns

2) When conducting data mining, which analysis type is primarily used to answer questions about past occurrences?
a) Descriptive analysis
b) Predictive analysis
c) Prescriptive analysis
d) Time series analysis

3) By integrating business analysis and data mining, organizations can significantly improve their overall performance by:
a) A competitive edge
b) Increased operational efficiency
c) Improved customer satisfaction
d) All of the above

4) Which of the following processes involves summarizing given data or transforming it into relevant information?
a) Classification
b) Descriptive mining
c) Predictive mining
d) Data mining

5) A predictive model in data mining typically involves:
a) Classification
b) Prediction
c) Time series analysis
d) All of the above

6) Which technique in machine learning involves dividing data into multiple sets for training and testing a model, ensuring its quality and generalizability to new data?
a) Data transformation
b) Cross-validation
c) Algorithm evaluation
d) Data processing

7) Which data mining technique is specifically designed to analyze past data and forecast future trends or behaviors within a business context?
a) Classification
b) Descriptive mining
c) Predictive mining
d) Data mining

8) Which step involves handling outliers, removing inconsistencies, and eliminating duplicate entries within the data?
a) Data transformation
b) Data preprocessing
c) Dataset selection
d) Data collection

9) In which type of machine learning is the model trained on a dataset where both the inputs and desired outputs are provided?
a) Machine learning
b) Supervised learning
c) Unsupervised learning
d) Classification

10) Which data mining technique involves using machine learning algorithms to categorize data into predefined groups?
a) Classification
b) Predictive
c) Cluster analysis
d) Summarization

11) In the context of business analytics and data mining, which algorithms are commonly used for supervised learning tasks?
a) Linear regression
b) Decision tree
c) Random forest
d) All of the above

12) Which AI subfield is dedicated to the development of algorithms that allow computers to learn from data, identify patterns, and make predictions?
a) Machine learning
b) Data science
c) Business intelligence
d) Data mining

13) Which technology completely replaces the user's real-world environment with a computer-generated one?
a) Augmented reality
b) Actual reality
c) Variable reality
d) Virtual reality

14) Which technology adds digital elements (images, information, sounds) to the real world in real time?
a) Augmented reality
b) Actual reality
c) Variable reality
d) Virtual reality

15) Which statistical method is primarily used to model the relationship between a dependent variable and one or more independent variables?
a) Prediction
b) Linear regression
c) Clustering
d) Time series analysis

16) In unsupervised learning, which algorithm is widely used for grouping similar data points?
a) K-Means
b)
c)
d)

17) When data mining, which technique is used to find connections between distinct datasets, where conditional statements highlight the potential link between data points?
a) Clustering
b) Prediction
c) Association rule mining
d) Linear regression

18) In business analysis and data mining, which machine learning approach is used to discover hidden patterns in unlabeled data?
a) Machine learning
b) Supervised learning
c) Unsupervised learning
d) Prediction

19) How does business analytics contribute to improving organizational performance?
a) By producing daily reports
b) Discovering hidden patterns and challenges
c) By reducing the workforce
d) Reducing employee morale

20) Which process involves grouping a set of objects so that objects within the same group are more like each other than those in other groups?
a) Prediction
b) Association rules
c) Classification
d) Clustering

21) Which data mining tool offers a comprehensive platform for conducting various data mining tasks, including clustering, classification, and visualization?
a) SAS
b) RapidMiner
c) Rattle
d) All of the above

22) Leveraging market forecasts to recommend optimal marketing strategies falls under the category of...
a) Descriptive analytics
b) Predictive analytics
c) Prescriptive analytics
d) Link Analysis

23) What is the primary challenge that data mining techniques face when applied to search engines?
a) Optimal index organization
b) Data volume
c) Query diversity
d) Options B and C

24) What does business analytics (BA) primarily focus on?
a) Functional analysis
b) Data and statistical analysis
c) Managing customer relationships
d) Product lifecycle analysis

25) Which analytics type is the most advanced and simulates human-like intelligence?
a) Predictive
b) Diagnostic
c) Cognitive
d) Descriptive

26) What is an attribute in data?
a) A type of chart.
b) A table containing numbers.
c) A property or characteristic of a data object.
d) A numerical value

27) Which methodology builds a comprehensive enterprise data warehouse first?
a) SQL Methodology
b) Inmon Methodology
c) Kimball Methodology
d) JSON Methodology

28) Which component of the data warehouse identifies data sources, values, and usage?
a) Central Database
b) Metadata
c) Data Integration
d) Access Tools

29) What is the difference between OLAP and OLTP?
a) Both are used for real-time data analysis.
b) OLAP is for data analysis, and OLTP is for transactions.
c) OLAP is for transactions, and OLTP is for analysis.
d) There is no difference.
30) What is the primary difference between a data warehouse and a data mart?
a) A data mart includes multiple warehouses.
b) A data warehouse serves specific departments.
c) A data mart is based on a relational design
d) A data warehouse is comprehensive, while a data mart is departmental.

31) Which type of system is used in retail to handle transactions at the point of sale?
a) POS
b) TPS
c) ROLAP
d) MOLAP

32) Which process is typically used when high data quality and transformation control are essential?
a) DDL
b) ELT
c) ETL
d) EDL

33) Which data layer in a data warehouse architecture is responsible for managing transactional data?
a) Semantics Layer
b) Data Layer
c) Metadata Layer
d) Analytics Layer

34) ETL processes are used when we need:
a) High-quality data requirements.
b) Full control over the transformation process.
c) Moderate amounts of data.
d) All of the above

35) What type of data cannot be used for arithmetic operations?
a) Nominal data
b) Quantitative data
c) Boolean data
d) Discrete data

36) Which type of chart is used to analyze the relationship between two variables?
a) Scatter plots
b) Pie charts
c) Radar charts
d) Line charts

37) ………. are data that take specific numerical values without fractions.
a) Nominal data
b) Discrete data
c) Quantitative data
d) Boolean data

38) Graph databases are suitable for applications like:
a) Social networks
b) Financial transactions
c) Text processing
d) Record storage

39) What is the difference between batch data ingestion and real-time ingestion?
a) Batch relies on scheduled intervals.
b) Batch is more efficient
c) Real-time takes less time
d) There is no difference

40) Cognitive analysis is inspired by which of the following?
a) Automated processes
b) Human intelligence
c) Mathematical equations
d) Financial models

41) Which database stores data in rows and dynamically expandable columns?
a) Graph databases
b) Columnar databases
c) Document databases
d) Relational databases

42) Which of the following is not a type of relational database?
a) Columnar databases
b) Key-Value databases
c) Document databases
d) All of the above

43) What is the primary function of OLAP?
a) Store data from various sources
b) Perform real-time data transactions
c) Aggregate data for strategic insights
d) Create machine learning models

44) What is the primary purpose of a Box Plot?
a) To display data trends over time
b) To compare multiple datasets across different categories
c) To show the range of values and identify outliers
d) To illustrate the relationship between two variables

45) Which of the following is a disadvantage of data warehouses?
a) High setup and maintenance costs
b) Low storage requirements
c) High data flexibility
d) Real-time data updates

46) Which advantage is specific to ETL over ELT?
a) High speed
b) Greater transformation control
c) Low cost
d) Direct data loading

47) Which system supports multiple transactions without significant performance impact?
a) OLAP
b) OLTP
c) MOLAP
d) ELT

48) In the ELT process, where are transformations usually performed?
a) Data source
b) Middleware
c) Client system
d) Target system

49) Which type of OLAP involves creating a data cube for fast analysis?
a) MOLAP
b) ROLAP
c) HOLAP
d) DOLAP

50) What does a "hot" data lake primarily store?
a) Infrequently accessed data
b) Real-time data
c) Archival data
d) Semi-structured data

Choose "T" for the correct answer and "F" for the incorrect

1. Data mining is also referred to as knowledge discovery in databases (KDD). (√)
2. Time series analysis refers to datasets that are based on chronological time sequences. (√)
3. In unsupervised learning, the model learns from labeled data with predefined features and targets. (F)
4. In supervised learning, predictive or classification models aim to forecast a target value or classify new data. (√)
5. Market basket analysis utilizes association rules in its methodology. (√)
6. Algorithms in unsupervised learning attempt to discover hidden patterns and relationships in data independently, without human intervention. (√)
7. In the KDD process model, the step involving algorithm performance evaluation and pattern interpretation is the final phase. (F)
8. Ten-fold cross-validation is a specific type of cross-validation where data is divided into nine nearly equal parts. (F)
9. In cross-validation, each partition of the dataset is used as test data once, while the remaining partitions serve as training data. (√)
10. The CRISP-DM process model is one of the knowledge discovery models widely used in industrial domains. (√)
11. Search engines heavily rely on advanced data mining techniques. (√)
12. The applications of data mining techniques are diverse and include fraud detection and manufacturing engineering. (√)
13. Data mining operations face challenges when dealing with incomplete or noisy data, which affects result accuracy. (√)
14. Business analytics is the strategic application of insights derived from data mining to achieve organizational goals. (√)
15. Descriptive analytics uses tools such as reports and dashboards to analyze historical data. (√)
16. Predictive analytics relies on statistical models and machine learning techniques to forecast future trends. (√)
17. Prescriptive analytics provides recommendations based on predictive analytics to enhance performance. (√)
18. The Internet of Things (IoT) has no significant impact on the future of business analytics and data mining techniques. (F)
19. Analytics involves the discovery and communication of patterns in data. (√)
20. Business Analytics is a subset of Data Analytics. (√)
21. Descriptive analytics aims to recommend future actions. (×)
22. Business analysis mainly focuses on statistical and data-driven analysis. (×)
23. Prescriptive analytics uses optimization techniques to suggest courses of action. (√)
24. Business analysis mainly focuses on statistical and data-driven analysis. (×)
25. Object-oriented databases store data in the form of objects. (√)
26. Data warehouses are generally designed for historical data analysis rather than real-time processing. (√)
27. The Kimball methodology for data warehouses is known for its rapid development approach. (√)
28. Inmon methodology promotes a single central point for data management, aiming for data consistency. (√)
29. Independent data marts are connected directly to a data warehouse. (×)
30. Data marts are designed to store data relevant to multiple departments within an organization. (×)
31. Data lakes store data in a structured format only. (×)
32. A data mart is always a subset of a data warehouse. (×)
33. OLAP technology is used for managing daily transactions. (×)
34. MOLAP provides slower query performance than ROLAP. (×)
35. OLTP is designed for real-time transaction processing. (√)
36. Power BI is a data visualization tool developed by Google. (×)
37. Pie charts are typically used to show the percentage distribution of categories. (√)
38. ROLAP relies on SQL queries for multidimensional analysis. (√)
39. Qualitative data is also known as categorical data. (√)
40. Continuous data can take any value within a range, including decimal values. (√)
41. Scatter plots are useful for displaying the relationship between two variables. (√)
42. MOLAP involves creating a data cube that represents multidimensional data from a data warehouse. (√)
43. A cloud database is a traditional database benefiting from cloud computing features. (√)

Question 3: Essay question (10 marks)

Explain with a diagram the steps for extracting knowledge from databases and clarify the sub-steps in each basic step.

Answer

The process of Knowledge Discovery in Databases (KDD) involves systematically extracting meaningful patterns and insights from large datasets. It is a multi-step process that transforms raw data into actionable knowledge.
Below are the main steps and their sub-steps:

1. Data Selection
In this step, relevant data is identified and selected from the database.
Sub-steps:
o Identify Objectives: Define the business or research goals.
o Data Scope: Specify the datasets required for the task.
o Filter Data: Select only the data that is relevant to the goals.

2. Data Preprocessing
This step involves cleaning and preparing the data to ensure its quality and consistency.
Sub-steps:
o Data Cleaning: Handle missing values, outliers, and errors.
o Integration: Combine data from multiple sources if needed.
o Normalization: Scale the data to standard formats.
o Reduction: Simplify the dataset by removing redundancies.

3. Data Transformation
Transform the processed data into a format suitable for mining.
Sub-steps:
o Attribute Selection: Identify relevant features or attributes.
o Discretization: Convert continuous attributes into discrete ones.
o Data Encoding: Format categorical data for processing.

4. Data Mining
Apply algorithms to discover patterns or relationships in the data.
Sub-steps:
o Algorithm Selection: Choose appropriate techniques (e.g., clustering, classification).
o Model Building: Train and test the data mining model.
o Evaluation: Assess the accuracy and performance of the model.

5. Interpretation and Evaluation
Analyze the discovered patterns to extract actionable insights.
Sub-steps:
o Pattern Interpretation: Understand the meaning of the results.
o Validation: Cross-check the results with domain knowledge.
o Knowledge Representation: Present the findings in an interpretable format, such as visualizations or reports.
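
Illustrative sketch (not part of the original answer): the five KDD steps above can be walked through on a toy dataset in plain Python. Everything here is an assumption made for illustration: the records in RAW, the function names, the age cutoff of 40, and the choice of a single association-style rule ("older -> high spend") as the mining task. A real project would normally use a dedicated library and far more data.

```python
from statistics import mean

# Hypothetical raw records; all names and values are made up for illustration.
RAW = [
    {"customer": "a", "age": 25, "spend": 120.0, "region": "north"},
    {"customer": "b", "age": 31, "spend": None, "region": "south"},   # missing value
    {"customer": "c", "age": 47, "spend": 560.0, "region": "north"},
    {"customer": "c", "age": 47, "spend": 560.0, "region": "north"},  # duplicate
    {"customer": "d", "age": 52, "spend": 610.0, "region": "south"},
    {"customer": "e", "age": 23, "spend": 90.0, "region": "north"},
]


def select(records, fields):
    """Step 1, data selection: keep only the attributes relevant to the goal."""
    return [{f: r[f] for f in fields} for r in records]


def preprocess(records):
    """Step 2, preprocessing: drop duplicates and rows with missing values."""
    seen, cleaned = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key in seen or any(v is None for v in r.values()):
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned


def transform(records, age_cutoff=40):
    """Step 3, transformation: discretize continuous attributes into bands."""
    avg_spend = mean(r["spend"] for r in records)
    return [
        {
            "age_band": "older" if r["age"] >= age_cutoff else "younger",
            "spend_band": "high" if r["spend"] >= avg_spend else "low",
        }
        for r in records
    ]


def mine(records, antecedent, consequent):
    """Step 4, mining: measure one candidate rule 'antecedent -> consequent'."""
    has_a = [r for r in records if r[antecedent[0]] == antecedent[1]]
    has_both = [r for r in has_a if r[consequent[0]] == consequent[1]]
    support = len(has_both) / len(records)
    confidence = len(has_both) / len(has_a) if has_a else 0.0
    return support, confidence


def interpret(support, confidence):
    """Step 5, interpretation: present the result in a readable form."""
    return f"older -> high spend: support={support:.2f}, confidence={confidence:.2f}"


selected = select(RAW, ["customer", "age", "spend"])
cleaned = preprocess(selected)
banded = transform(cleaned)
support, confidence = mine(banded, ("age_band", "older"), ("spend_band", "high"))
report = interpret(support, confidence)
print(report)
```

Each function maps to one numbered step, so the output of one stage is the input of the next. On this toy data the rule holds for every "older" customer, which says more about the tiny sample than about any real pattern; the Validation sub-step exists to catch exactly that.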