Big Data Bank PDF
Uploaded by EvaluativeProse
Tanta University
Summary
This document contains a set of questions and answers about big data concepts and technologies. It covers various topics within the field of big data, including data types, processing frameworks, tools, and applications.
Full Transcript
Q1 Which of the following is a key advantage of a data lake?
A- Only stores structured data
B- Can only handle small datasets
C- Requires data schema before loading
D- Provides flexible storage for structured, semi-structured, and unstructured data

Q2 What is the importance of "value" in Big Data?
A- Ensures data processing speed
B- Helps make data actionable and relevant
C- Focuses on the structure of the data
D- Limits the variety of data formats

Q3 What role does Big Data play in the media industry?
A- Improves broadcasting signal strength
B- Manages offline marketing campaigns
C- Personalizes recommendations based on user behavior
D- Stores physical copies of content

Q4 What is the key difference between structured and unstructured data in Big Data?
A- Structured data is well-organized and stored in databases, whereas unstructured data lacks a defined format.
B- Structured data is encrypted, while unstructured data is not.
C- Structured data only includes text, while unstructured data includes multimedia.
D- Unstructured data is faster to analyze than structured data.

Q5 Which Big Data challenge does data integration address?
A- Combining data from multiple sources into a unified view
B- Removing duplicate records
C- Encrypting sensitive data
D- Reducing storage costs

Q6 What does the 'Volume' aspect of Big Data refer to?
A- The speed of data generation
B- The variety of data types
C- The sheer amount of data
D- The accuracy of data

Q7 What is a key benefit of Big Data analysis?
A- Reduced hardware requirements
B- Improved decision-making
C- Limited data storage
D- Lower cost of implementation

Q8 Which of the following is the best description of Big Data?
A- A small dataset processed using traditional tools
B- Data that requires new forms of processing due to its size, variety, or speed
C- Data stored in SQL databases
D- Data collected from social media platforms

Q9 Variety in Big Data means data comes only from text-based sources.
A- True
B- False

Q10 What is the role of a DataNode in HDFS?
A- To manage the metadata
B- To store actual data blocks
C- To manage the NameNode
D- To perform data compression

Q11 Which of the following technologies is often used for storing unstructured data in Big Data environments?
A- SQL databases
B- Relational databases
C- NoSQL databases
D- In-memory databases

Q12 Big Data analytics can help predict customer preferences in the retail industry.
A- True
B- False

Q13 Which of the following describes a “data lake”?
A- A centralized repository that stores raw data in its native format
B- A tool for cleaning and preparing data
C- A system for real-time data visualization
D- A technique to back up data securely

Q14 Which Big Data tool is specifically designed for scalable machine learning?
A- Apache Mahout
B- Apache Cassandra
C- Apache Hadoop
D- Google Analytics

Q15 Which of the following is an example of unstructured data?
A- An Excel spreadsheet
B- A YouTube video
C- A SQL table
D- A customer database

Q16 What is a common challenge related to the 'Variety' aspect of Big Data?
A- Maintaining data privacy
B- Analyzing different data formats
C- Ensuring data consistency
D- Reducing data size

Q17 How does the 'Velocity' of Big Data impact data processing?
A- It slows down data generation
B- It increases the need for real-time processing
C- It reduces the variety of data sources
D- It has no significant effect on processing

Q18 Veracity refers to the speed of data generation.
A- True
B- False

Q19 What is the significance of machine learning in Big Data analytics?
A- Automates pattern detection and predictive modeling
B- Simplifies data storage
C- Creates manual algorithms for data visualization
D- Replaces the need for databases

Q20 NoSQL databases are better suited for handling structured data only.
A- True
B- False

Q21 How does Apache Pig handle large datasets in a distributed environment?
A- By using a SQL-like language
B- By using in-memory storage
C- By parallel processing across nodes
D- By replicating data

Q22 Which programming language is commonly used to write Pig Latin scripts?
A- Java
B- Python
C- Pig Latin
D- Scala

Q23 What is the primary purpose of Apache Hive?
A- Distributed file storage
B- Data querying
C- Real-time analytics
D- Data visualization

Q24 How does the Shuffle and Sort phase contribute to the MapReduce process?
A- It compresses data
B- It arranges data in a specific order
C- It distributes data evenly across reducers
D- It aggregates data

Q25 What is Apache Spark primarily used for?
A- Real-time processing
B- Batch processing
C- Data storage
D- File compression

Q26 What is the purpose of partitioning in the MapReduce framework?
A- To increase replication
B- To group key-value pairs for processing
C- To reduce data size
D- To increase memory allocation

Q27 Which of the following is a key characteristic of the MapReduce framework?
A- Real-time processing
B- Distributed processing
C- Sequential processing
D- In-memory processing

Q28 In MapReduce, what is the function of the Reduce phase?
A- To combine intermediate data
B- To map data to key-value pairs
C- To sort data
D- To store data

Q29 What is the primary function of the Map phase in MapReduce?
A- Sort data
B- Filter data
C- Map data to key-value pairs
D- Aggregate data

Q30 How does MapReduce handle large data sets efficiently in Hadoop?
A- By performing real-time analytics
B- By splitting tasks across multiple nodes
C- By using in-memory storage
D- By compressing data

Q31 Which of the following components is responsible for running a Hadoop job?
A- NameNode
B- DataNode
C- YARN
D- HDFS

Q32 In which sector does Big Data play a critical role in reducing fraud and ensuring secure transactions?
A- Retail
B- Banking and Finance
C- Education
D- Healthcare

Q33 Which machine learning algorithm is best suited for classifying data into distinct categories?
A- K-means clustering
B- Linear regression
C- Decision tree
D- K-nearest neighbors

Q34 Which of the following is a common tool used for data analytics in Big Data?
A- Apache Spark
B- MongoDB
C- HDFS
D- NoSQL

Q35 What is a key feature of NoSQL databases like Cassandra and MongoDB?
A- Strong consistency
B- Vertical scalability
C- Horizontal scalability
D- Limited data types

Q36 What is the primary goal of data analytics in Big Data?
A- Data visualization
B- Data encryption
C- Insight discovery
D- Data storage

Q37 What is the primary advantage of using NoSQL databases over relational databases?
A- ACID compliance
B- Scalability
C- Data normalization
D- Primary key usage

Q38 How does Cassandra ensure high availability in a distributed environment?
A- By using a master-slave architecture
B- By replicating data across nodes
C- By using in-memory storage
D- By compressing data

Q39 What is one of the most significant challenges in Big Data analytics?
A- Lack of storage solutions
B- Managing unstructured and semi-structured data
C- Limited application across industries
D- Excessive automation of processes

Q40 How does unsupervised learning differ from supervised learning?
A- It uses labeled data
B- It is used for classification
C- It does not require labeled data
D- It does not handle large datasets

Q41 What does “scalability” mean in Big Data systems?
A- The ability to process only structured data
B- Converting data into structured formats
C- Reducing hardware size
D- The capacity to handle increasing amounts of data and computation

Q42 Identify the slave node among the following.
A- Job node
B- Data node
C- Task node
D- Name node

Q43 The total number of forms of big data is ____
A- 1
B- 2
C- 3
D- 4

Q44 Which of the following is true about big data?
A- Big data can be processed using traditional techniques
B- Big data refers to data sets that are at least a petabyte in size
C- Big data analysis does not involve reporting and data mining techniques
D- Big data has low velocity, meaning that it is generated slowly

Q45 What is the minimum amount of data that a disk can read or write in HDFS?
A- Byte size
B- Block size
C- Heap
D- None of the above

Q46 All of the following accurately describe Hadoop, except:
A- Open source
B- Java-based
C- Real-time
D- Distributed computing approach

Q47 Identify the node which acts as a checkpoint node in HDFS.
A- Secondary Name node
B- Secondary data node
C- Name node
D- Data node

Q48 Identify the incorrect big data technology.
A- Apache PyTorch
B- Apache Kafka
C- Apache Hadoop
D- Apache Spark

Q49 Choose the primary characteristics of big data among the following.
A- Value
B- Variety
C- Volume
D- All of the above

Q50 Data in _____ bytes size is called big data.
A- Mega
B- Giga
C- Tera
D- Peta

Q51 ____ is data about data.
A- HDFS
B- MapReduce
C- YARN
D- All of the above

Q52 What is the use of data cleaning?
A- To remove the noisy data
B- Transformations to correct the wrong data
C- Correct the inconsistencies in data
D- All of the above

Q53 Which of the following are the benefits of Big Data processing?
A- Businesses can utilize outside intelligence while taking decisions.
B- Better operational efficiency
C- Improve customer service
D- All of the above

Q54 Bank transaction data is a type of:
A- Unstructured data
B- Structured data
C- Both a and b
D- None of the above

Q55 The total number of V's of big data is:
A- 3
B- 4
C- 5
D- 6

Q56 In which language is Hadoop written?
A- C++
B- Java
C- Rust
D- Python

Q57 Identify which of the options below is a general-purpose computing model and runtime system for distributed data analytics.
A- HDFS
B- MapReduce
C- Oozie
D- All of the above

Q58 What is one of the main advantages of predictive analytics in Big Data?
A- Increasing storage space for data
B- Forecasting future trends and customer behavior
C- Automating all customer support operations
D- Focusing on offline sales only

Q59 What aspect of Big Data allows companies like Netflix to provide personalized recommendations?
A- Velocity
B- Veracity
C- Pattern recognition and user behavior analysis
D- Data encryption

Q60 Structured data is easier to analyze and store than unstructured data.
A- True
B- False

Q61 Which attribute of big data involves an exponential data growth rate?
A- Variety
B- Value
C- Volume
D- Velocity

Q62 Just collecting and storing information isn't enough to produce real business value. Big data analytics technologies are necessary to:
A- Formulate eye-catching charts and graphs
B- Extract valuable insights from the data
C- Integrate data from internal and external sources
D- Determine business goals and objectives

Q63 One definition of Big Data stated that BIG DATA is/are "information that can be processed or analyzed using traditional processes or tools".
A- True
B- False

Q64 Society is both the generator and the consumer of "Big Data".
A- True
B- False

Q65 Data stored in an e-mail system would most likely be structured data.
A- True
B- False

Q66 Hadoop processes data serially rather than in parallel as in the past.
A- True
B- False

Q67 What is the default replication factor in Hadoop's distributed file system (HDFS)?
A- 1
B- 2
C- 3
D- 4

Q68 What does the Shuffle phase in MapReduce involve?
A- Data compression
B- Data distribution to reducers
C- Data sorting and grouping
D- Data storage in HBase

Q69 In how many stages does a MapReduce program execute?
A- 2
B- 3
C- 4
D- 5

Q70 Identify the type of learning in which labeled training data is used.
A- Semi unsupervised learning
B- Supervised learning
C- Reinforcement learning
D- Unsupervised learning

Q71 Identify the one which is not a type of learning.
A- Semi unsupervised learning
B- Supervised learning
C- Reinforcement learning
D- Unsupervised learning

Q72 The NameNode is responsible for maintaining the file system namespace in HDFS.
A- True
B- False

Q73 What is the role of the NameNode in HDFS?
A- Store data files
B- Manage metadata and file system namespace
C- Perform data replication
D- Process MapReduce jobs

Q74 The NameNode stores actual data blocks.
A- True
B- False

Q75 What is the main function of the Secondary NameNode?
A- Act as a backup for the NameNode
B- Manage DataNode communication
C- Perform periodic checkpointing
D- Facilitate MapReduce operations

Q76 What is the main purpose of a distributed file system in Big Data?
A- Creating a single database for all data
B- Encrypting sensitive information
C- Storing large datasets across multiple machines while enabling parallel processing
D- Visualizing data in dashboards

Q77 Data replication in HDFS enhances fault tolerance.
A- True
B- False

Q78 HDFS read and write mechanisms are entirely managed by the Secondary NameNode.
A- True
B- False

Q79 During the HDFS write process, acknowledgments are sent:
A- From NameNode to DataNode
B- From DataNode to Client
C- From the last DataNode back to the client through the pipeline
D- Directly to the client from each DataNode

Q80 The primary purpose of MapReduce is to:
A- Manage file system replication
B- Process large-scale data in parallel
C- Provide a backup for HDFS
D- Configure YARN

Q81 Which of the following is an example of a MapReduce job?
A- Data replication
B- Word count program
C- NameNode checkpointing
D- Data compression

Q82 What does the Driver code in MapReduce handle?
A- Managing NameNode
B- Coordinating Mapper and Reducer tasks
C- Data storage in HDFS
D- Reducing network latency

Q83 What is the purpose of data visualization in Big Data?
A- Compressing large datasets
B- Representing data insights graphically for better understanding
C- Storing data in relational databases
D- Backing up data on cloud systems

Q84 What is the output format of the Mapper in MapReduce?
A- Key-value pairs
B- XML files
C- Tabular data
D- Raw text

Q85 The Mapper and Reducer tasks can run on different nodes.
A- True
B- False

Q86 MapReduce is not fault-tolerant.
A- True
B- False

Q87 What does “NoSQL” mean in Big Data technologies?
A- It is a query language used in traditional databases
B- It supports flexible and non-relational data models
C- It prevents data from being analyzed
D- It restricts access to structured data only

Q88 Hadoop Distributed File System is optimized for:
A- High-speed computation
B- Storage of small files
C- Write-once-read-many operations
D- In-memory processing

Q89 The replication factor in HDFS is configurable to enhance:
A- Read speed
B- Write speed
C- Fault tolerance
D- Data compression

Q90 Machine learning is a field that gives computers the ability to learn without:
A- Accessing data
B- Human intervention
C- Being explicitly programmed
D- Complex algorithms

Q91 A decision tree always generates a unique solution for a dataset.
A- True
B- False

Q92 In a decision tree, leaf nodes have multiple outgoing edges.
A- True
B- False

Q93 Supervised learning requires labeled data for training.
A- True
B- False

Q94 Clustering is a supervised learning method.
A- True
B- False

Q95 An example of a supervised learning algorithm is:
A- Decision Tree
B- K-Means Clustering
C- Reinforcement Agent
D- Random Walk

Q96 In Hadoop, data is replicated for reliability.
A- True
B- False

Q97 The primary storage system in Hadoop is called:
A- HBase
B- Hadoop Distributed File System (HDFS)
C- Apache Spark
D- ZooKeeper

Q98 Which Big Data characteristic is illustrated by the ability to analyze millions of customer transactions per hour?
A- Velocity
B- Volume
C- Variety
D- Veracity

Q99 What is a major challenge faced in the field of Big Data due to "variety"?
A- Processing high-speed data streams
B- Managing structured, semi-structured, and unstructured data formats
C- Handling unreliable data sources
D- Extracting value from redundant datasets

Q100 Apache Hive allows for SQL-like querying on data in Hadoop.
A- True
B- False

Q101 Apache Pig is used for data compression in Hadoop.
A- True
B- False

Q102 Which Big Data tool is used for real-time stream processing?
A- Apache Kafka
B- Oracle Database
C- Google BigQuery
D- Tableau

Q103 Which tool is used to transfer data between Hadoop and relational databases?
A- Apache Pig
B- Apache Flume
C- Sqoop
D- Apache Kafka

Q104 Which Hadoop mode runs completely on a single machine?
A- Distributed Mode
B- Standalone Mode
C- Pseudo-Distributed Mode
D- Cluster Mode

Q105 Big Data is only about volume, not velocity or variety.
A- True
B- False

Q106 How does Big Data help in the healthcare industry?
A- Reduces the need for doctors
B- Improves quality of life and predicts disease outbreaks
C- Focuses on building new hospitals
D- Replaces manual record-keeping

Q107 MapReduce can handle structured, unstructured, and semi-structured data.
A- True
B- False

Q108 What does "Variety" in Big Data refer to?
A- The speed of data generation
B- The accuracy of data
C- Different types of data sources
D- The value derived from data

Q109 Apache Hive is primarily used for:
A- Real-time data streaming
B- Data warehousing and SQL-like queries
C- Machine learning
D- Data visualization

Q110 "Volume" in Big Data refers to the variety of data types.
A- True
B- False

Q111 Which of the following statements is true about the relationship between Big Data and traditional data processing?
A- Big Data can always be processed with traditional methods
B- Traditional methods can handle the velocity of Big Data
C- Traditional methods struggle with the volume and variety of Big Data
D- There is no difference between Big Data and traditional data

Q112 Social media platforms are significant sources of Big Data.
A- True
B- False

Q113 The concept of "Veracity" in Big Data addresses:
A- Trustworthiness and uncertainties in data
B- The speed of data processing
C- The variety of data sources
D- The volume of data

Q114 Which tool is NOT typically associated with Big Data processing?
A- Apache Hadoop
B- Oracle RDBMS
C- Apache Spark
D- Apache Hive

Q115 What does "predictive modeling" in Big Data refer to?
A- Storing past data for future use
B- Analyzing data at rest
C- Visualizing real-time customer data
D- Using algorithms to forecast future trends

Q116 What does the 'Velocity' characteristic of Big Data refer to?
A- The amount of data
B- The speed at which data is generated
C- The different types of data
D- The source of data

Q117 What type of data does the 'Variety' aspect of Big Data encompass?
A- Structured
B- Unstructured
C- Both structured and unstructured
D- Neither

Q118 A Big Data pipeline is slowing down due to an excessive amount of incoming data. Which aspect of the '3Vs' is causing this issue?
A- Volume
B- Velocity
C- Variety
D- Value

Q119 What is the main purpose of data preprocessing in Big Data pipelines?
A- Storing the data securely
B- Cleaning and organizing raw data for analysis
C- Visualizing the data trends
D- Eliminating duplicate datasets entirely

Q120 Apache Pig is used for writing complex scripts to process data in Hadoop.
A- True
B- False
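Several questions in this bank (Map output as key-value pairs, partitioning, the Shuffle and Sort phase, the Reduce phase, and the word count job) describe the same pipeline. It can be traced end to end in a small single-process Python sketch. This is an illustrative simulation, not Hadoop's Java API: the function names map_phase, partition, shuffle_and_sort, reduce_phase and word_count are hypothetical, and the CRC32-based partitioner is an assumption that only mimics the idea of hash partitioning.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) key-value pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def partition(key: str, num_reducers: int) -> int:
    """Choose the reducer for a key (hypothetical hash partitioner; CRC32 is stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]],
                     num_reducers: int) -> List[Dict[str, List[int]]]:
    """Shuffle: route each pair to its reducer's partition and group values by key.
    Sort: order the keys inside each partition, as a reducer would receive them."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition(key, num_reducers)][key].append(value)
    return [dict(sorted(p.items())) for p in partitions]


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all intermediate values for one key into a final count."""
    return key, sum(values)


def word_count(lines: Iterable[str], num_reducers: int = 2) -> Dict[str, int]:
    """Run map -> shuffle and sort -> reduce over the input lines."""
    intermediate = [pair for line in lines for pair in map_phase(line)]
    counts: Dict[str, int] = {}
    for part in shuffle_and_sort(intermediate, num_reducers):
        for key, values in part.items():
            word, total = reduce_phase(key, values)
            counts[word] = total
    return counts


if __name__ == "__main__":
    docs = ["Big data needs big tools", "Data lakes store raw data"]
    # [('big', 2), ('data', 3), ('lakes', 1), ('needs', 1), ('raw', 1), ('store', 1), ('tools', 1)]
    print(sorted(word_count(docs).items()))
```

Changing num_reducers only changes how keys are spread across partitions; the final counts stay the same, because every value for a given key is routed to the same reducer.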
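The HDFS questions on block size, replication, and the split between NameNode (metadata) and DataNodes (block storage) can be illustrated with a toy model. The sketch assumes the Hadoop 2.x and later defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication). The round-robin placement is a deliberate simplification, since real HDFS uses a rack-aware placement policy, and the function names are hypothetical.

```python
from typing import Dict, List

# Assumed defaults (Hadoop 2.x and later): dfs.blocksize = 128 MB, dfs.replication = 3.
BLOCK_SIZE = 128 * 1024 * 1024
REPLICATION = 3


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size in bytes of each block; only the last block may be smaller."""
    if file_size <= 0:
        return []
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Toy NameNode metadata: block id -> DataNodes that hold a replica.

    Replicas go round-robin onto distinct DataNodes (a simplification;
    real HDFS uses a rack-aware placement policy).
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    return {
        block_id: [datanodes[(block_id + i) % len(datanodes)] for i in range(replication)]
        for block_id in range(num_blocks)
    }


def all_blocks_survive(metadata: Dict[int, List[str]], failed_node: str) -> bool:
    """True if every block still has at least one replica after failed_node is lost."""
    return all(any(node != failed_node for node in nodes) for nodes in metadata.values())


if __name__ == "__main__":
    MB = 1024 * 1024
    blocks = split_into_blocks(300 * MB)
    print([size // MB for size in blocks])  # [128, 128, 44]
    metadata = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
    # {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
    print(metadata)
    print(all_blocks_survive(metadata, "dn3"))  # True
```

For a 300 MB file this gives two full 128 MB blocks and one 44 MB block; in HDFS the final, smaller block only occupies the space it actually needs. Losing any single DataNode leaves at least two copies of every block, which is the fault-tolerance argument behind replication.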
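The questions contrasting supervised and unsupervised learning can be made concrete with two algorithms named among the options: K-nearest neighbors (here with k = 1), which predicts from labeled training data, and k-means clustering, which groups unlabeled points. Both are minimal one-dimensional sketches with hypothetical function names and made-up sample values, not library implementations.

```python
from typing import List, Tuple


def nearest_neighbor_predict(train: List[Tuple[float, str]], x: float) -> str:
    """Supervised: use labeled (value, label) examples to predict a label for x (1-NN)."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def kmeans_1d(points: List[float], centroids: List[float], iterations: int = 10) -> List[float]:
    """Unsupervised: group unlabeled points around k centroids (Lloyd's algorithm)."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its closest centroid.
        clusters: List[List[float]] = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


if __name__ == "__main__":
    labeled = [(1.0, "low"), (2.0, "low"), (10.0, "high"), (11.0, "high")]
    print(nearest_neighbor_predict(labeled, 3.0))  # low
    print(kmeans_1d([1, 2, 3, 10, 11, 12], [0.0, 5.0]))  # [2.0, 11.0]
```

The classifier cannot run without the labels "low" and "high", while k-means finds the two groups from the raw values alone and returns only their centers, leaving any naming of the groups to a human.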