Big Data Bank PDF

Q1 Which of the following is a key advantage of a data lake?
A- Only stores structured data
B- Can only handle small datasets
C- Requires data schema before loading
D- Provides flexible storage for structured, semi-structured, and unstructured data

Q2 What is the importance of "value" in Big Data?
A- Ensures data processing speed
B- Helps make data actionable and relevant
C- Focuses on the structure of the data
D- Limits the variety of data formats

Q3 What role does Big Data play in the media industry?
A- Improves broadcasting signal strength
B- Manages offline marketing campaigns
C- Personalizes recommendations based on user behavior
D- Stores physical copies of content

Q4 What is the key difference between structured and unstructured data in Big Data?
A- Structured data is well-organized and stored in databases, whereas unstructured data lacks a defined format.
B- Structured data is encrypted, while unstructured data is not.
C- Structured data only includes text, while unstructured data includes multimedia.
D- Unstructured data is faster to analyze than structured data.

Q5 Which Big Data challenge does data integration address?
A- Combining data from multiple sources into a unified view
B- Removing duplicate records
C- Encrypting sensitive data
D- Reducing storage costs

Q6 What does the 'Volume' aspect of Big Data refer to?
A- The speed of data generation
B- The variety of data types
C- The sheer amount of data
D- The accuracy of data

Q7 What is a key benefit of Big Data analysis?
A- Reduced hardware requirements
B- Improved decision-making
C- Limited data storage
D- Lower cost of implementation

Q8 Which of the following is the best description of Big Data?
A- A small dataset processed using traditional tools
B- Data that requires new forms of processing due to its size, variety, or speed
C- Data stored in SQL databases
D- Data collected from social media platforms

Q9 Variety in Big Data means data comes only from text-based sources.
A- true
B- false

Q10 What is the role of a DataNode in HDFS?
A- To manage the metadata
B- To store actual data blocks
C- To manage the NameNode
D- To perform data compression

Q11 Which of the following technologies is often used for storing unstructured data in Big Data environments?
A- SQL databases
B- Relational databases
C- NoSQL databases
D- In-memory databases

Q12 Big Data analytics can help predict customer preferences in the retail industry.
A- true
B- false

Q13 Which of the following describes a “data lake”?
A- A centralized repository that stores raw data in its native format
B- A tool for cleaning and preparing data
C- A system for real-time data visualization
D- A technique to back up data securely

Q14 Which Big Data tool is specifically designed for scalable machine learning?
A- Apache Mahout
B- Apache Cassandra
C- Apache Hadoop
D- Google Analytics

Q15 Which of the following is an example of unstructured data?
A- An Excel spreadsheet
B- A YouTube video
C- A SQL table
D- A customer database

Q16 What is a common challenge related to the 'Variety' aspect of Big Data?
A- Maintaining data privacy
B- Analyzing different data formats
C- Ensuring data consistency
D- Reducing data size

Q17 How does the 'Velocity' of Big Data impact data processing?
A- It slows down data generation
B- It increases the need for real-time processing
C- It reduces the variety of data sources
D- It has no significant effect on processing

Q18 Veracity refers to the speed of data generation.
A- True
B- False

Q19 What is the significance of machine learning in Big Data analytics?
A- Automates pattern detection and predictive modeling
B- Simplifies data storage
C- Creates manual algorithms for data visualization
D- Replaces the need for databases

Q20 NoSQL databases are better suited for handling structured data only.
A- true
B- false

Q21 How does Apache Pig handle large datasets in a distributed environment?
A- By using a SQL-like language
B- By using in-memory storage
C- By parallel processing across nodes
D- By replicating data

Q22 Which programming language is commonly used to write Pig Latin scripts?
A- Java
B- Python
C- Pig Latin
D- Scala

Q23 What is the primary purpose of Apache Hive?
A- Distributed file storage
B- Data querying
C- Real-time analytics
D- Data visualization

Q24 How does the Shuffle and Sort phase contribute to the MapReduce process?
A- It compresses data
B- It arranges data in a specific order
C- It distributes data evenly across reducers
D- It aggregates data

Q25 What is Apache Spark primarily used for?
A- Real-time processing
B- Batch processing
C- Data storage
D- File compression

Q26 What is the purpose of partitioning in the MapReduce framework?
A- To increase replication
B- To group key-value pairs for processing
C- To reduce data size
D- To increase memory allocation

Q27 Which of the following is a key characteristic of the MapReduce framework?
A- Real-time processing
B- Distributed processing
C- Sequential processing
D- In-memory processing

Q28 In MapReduce, what is the function of the Reduce phase?
A- To combine intermediate data
B- To map data to key-value pairs
C- To sort data
D- To store data

Q29 What is the primary function of the Map phase in MapReduce?
A- Sort data
B- Filter data
C- Map data to key-value pairs
D- Aggregate data

Q30 How does MapReduce handle large data sets efficiently in Hadoop?
A- By performing real-time analytics
B- By splitting tasks across multiple nodes
C- By using in-memory storage
D- By compressing data

Q31 Which of the following components is responsible for running a Hadoop job?
A- NameNode
B- DataNode
C- YARN
D- HDFS

Q32 In which sector does Big Data play a critical role in reducing fraud and ensuring secure transactions?
A- Retail
B- Banking and Finance
C- Education
D- Healthcare

Q33 Which machine learning algorithm is best suited for classifying data into distinct categories?
A- K-means clustering
B- Linear regression
C- Decision tree
D- K-nearest neighbors

Q34 Which of the following is a common tool used for data analytics in Big Data?
A- Apache Spark
B- MongoDB
C- HDFS
D- NoSQL

Q35 What is a key feature of NoSQL databases like Cassandra and MongoDB?
A- Strong consistency
B- Vertical scalability
C- Horizontal scalability
D- Limited data types

Q36 What is the primary goal of data analytics in Big Data?
A- Data visualization
B- Data encryption
C- Insight discovery
D- Data storage

Q37 What is the primary advantage of using NoSQL databases over relational databases?
A- ACID compliance
B- Scalability
C- Data normalization
D- Primary key usage

Q38 How does Cassandra ensure high availability in a distributed environment?
A- By using a master-slave architecture
B- By replicating data across nodes
C- By using in-memory storage
D- By compressing data

Q39 What is one of the most significant challenges in Big Data analytics?
A- Lack of storage solutions
B- Managing unstructured and semi-structured data
C- Limited application across industries
D- Excessive automation of processes

Q40 How does unsupervised learning differ from supervised learning?
A- It uses labeled data
B- It is used for classification
C- It does not require labeled data
D- It does not handle large datasets

Q41 What does “scalability” mean in Big Data systems?
A- The ability to process only structured data
B- Converting data into structured formats
C- Reducing hardware size
D- The capacity to handle increasing amounts of data and computation

Q42 Identify the slave node among the following.
A- Job node
B- Data node
C- Task node
D- Name node

Q43 The total number of forms of big data is ____
A- 1
B- 2
C- 3
D- 4

Q44 Which of the following is true about big data?
A- Big data can be processed using traditional techniques
B- Big data refers to data sets that are at least a petabyte in size
C- Big data analysis does not involve reporting and data mining techniques
D- Big data has low velocity, meaning that it is generated slowly

Q45 What is the minimum amount of data that a disk can read or write in HDFS?
A- Byte size
B- Block size
C- Heap
D- None of the above

Q46 All of the following accurately describe Hadoop, except:
A- Open source
B- Java-based
C- Real-time
D- Distributed computing approach

Q47 Identify the node which acts as a checkpoint node in HDFS.
A- Secondary Name node
B- Secondary data node
C- Name node
D- Data node

Q48 Identify the incorrect Big Data technology.
A- Apache Pytorch
B- Apache Kafka
C- Apache Hadoop
D- Apache Spark

Q49 Choose the primary characteristics of big data among the following:
A- Value
B- Variety
C- Volume
D- All of the above

Q50 Data in _____ bytes size is called big data.
A- Meta
B- Giga
C- Tera
D- Peta

Q51 ____ is data about data.
A- HDFS
B- MapReduce
C- YARN
D- All of the above

Q52 What is the use of data cleaning?
A- To remove the noisy data
B- To apply transformations that correct wrong data
C- To correct the inconsistencies in data
D- All of the above

Q53 Which of the following are benefits of Big Data processing?
A- Businesses can utilize outside intelligence while making decisions
B- Better operational efficiency
C- Improved customer service
D- All of the above

Q54 Bank transaction data is a type of:
A- Unstructured data
B- Structured data
C- Both a and b
D- None of the above

Q55 The total number of V's of big data is:
A- 3
B- 4
C- 5
D- 6

Q56 In which language is Hadoop written?
A- C++
B- Java
C- Rust
D- Python

Q57 Identify which of the options below is a general-purpose computing model and runtime system for distributed data analytics.
A- HDFS
B- MapReduce
C- Oozie
D- All of the above

Q58 What is one of the main advantages of predictive analytics in Big Data?
A- Increasing storage space for data
B- Forecasting future trends and customer behavior
C- Automating all customer support operations
D- Focusing on offline sales only

Q59 What aspect of Big Data allows companies like Netflix to provide personalized recommendations?
A- Velocity
B- Veracity
C- Pattern recognition and user behavior analysis
D- Data encryption

Q60 Structured data is easier to analyze and store than unstructured data.
A- true
B- false

Q61 Which attribute of big data involves an exponential data growth rate?
A- variety
B- value
C- volume
D- velocity

Q62 Just collecting and storing information isn't enough to produce real business value. Big data analytics technologies are necessary to:
A- Formulate eye-catching charts and graphs
B- Extract valuable insights from the data
C- Integrate data from internal and external sources
D- Determine business goals and objectives

Q63 One definition of Big Data stated that BIG DATA is/are "information that can be processed or analyzed using traditional processes or tools".
A- True
B- False

Q64 Society is both the generator and the consumer of "Big Data".
A- True
B- False

Q65 Data stored in an e-mail system would most likely be structured data.
A- True
B- False

Q66 Hadoop processes data serially rather than in parallel as in the past.
A- True
B- False

Q67 What is the default replication factor in Hadoop's distributed file system (HDFS)?
A- 1
B- 2
C- 3
D- 4

Q68 What does the Shuffle phase in MapReduce involve?
A- Data compression
B- Data distribution to reducers
C- Data sorting and grouping
D- Data storage in HBase

Q69 In how many stages does a MapReduce program execute?
A- 2
B- 3
C- 4
D- 5

Q70 Identify the type of learning in which labeled training data is used.
A- Semi unsupervised learning
B- Supervised learning
C- Reinforcement learning
D- Unsupervised learning

Q71 Identify the one which is not a type of learning.
A- Semi unsupervised learning
B- Supervised learning
C- Reinforcement learning
D- Unsupervised learning

Q72 The NameNode is responsible for maintaining the file system namespace in HDFS.
A- True
B- False

Q73 What is the role of the NameNode in HDFS?
A- Store data files
B- Manage metadata and file system namespace
C- Perform data replication
D- Process MapReduce jobs

Q74 The NameNode stores actual data blocks.
A- True
B- False

Q75 What is the main function of the Secondary NameNode?
A- Act as a backup for the NameNode
B- Manage DataNode communication
C- Perform periodic checkpointing
D- Facilitate MapReduce operations

Q76 What is the main purpose of a distributed file system in Big Data?
A- Creating a single database for all data
B- Encrypting sensitive information
C- Storing large datasets across multiple machines while enabling parallel processing
D- Visualizing data in dashboards

Q77 Data replication in HDFS enhances fault tolerance.
A- True
B- False

Q78 HDFS read and write mechanisms are entirely managed by the Secondary NameNode.
A- True
B- False

Q79 During the HDFS write process, acknowledgments are sent:
A- From NameNode to DataNode
B- From DataNode to Client
C- From the last DataNode back to the client through the pipeline
D- Directly to the client from each DataNode

Q80 The primary purpose of MapReduce is to:
A- Manage file system replication
B- Process large-scale data in parallel
C- Provide a backup for HDFS
D- Configure YARN

Q81 Which of the following is an example of a MapReduce job?
A- Data replication
B- Word count program
C- NameNode checkpointing
D- Data compression

Q82 What does the Driver code in MapReduce handle?
A- Managing NameNode
B- Coordinating Mapper and Reducer tasks
C- Data storage in HDFS
D- Reducing network latency

Q83 What is the purpose of data visualization in Big Data?
A- Compressing large datasets
B- Representing data insights graphically for better understanding
C- Storing data in relational databases
D- Backing up data on cloud systems

Q84 What is the output format of the Mapper in MapReduce?
A- Key-value pairs
B- XML files
C- Tabular data
D- Raw text

Q85 The Mapper and Reducer tasks can run on different nodes.
A- True
B- False

Q86 MapReduce is not fault-tolerant.
A- True
B- False

Q87 What does “NoSQL” mean in Big Data technologies?
A- It is a query language used in traditional databases
B- It supports flexible and non-relational data models
C- It prevents data from being analyzed
D- It restricts access to structured data only

Q88 Hadoop Distributed File System is optimized for:
A- High-speed computation
B- Storage of small files
C- Write-once-read-many operations
D- In-memory processing

Q89 The replication factor in HDFS is configurable to enhance:
A- Read speed
B- Write speed
C- Fault tolerance
D- Data compression

Q90 Machine learning is a field that gives computers the ability to learn without:
A- Accessing data
B- Human intervention
C- Being explicitly programmed
D- Complex algorithms

Q91 A decision tree always generates a unique solution for a dataset.
A- True
B- False

Q92 In a decision tree, leaf nodes have multiple outgoing edges.
A- True
B- False

Q93 Supervised learning requires labeled data for training.
A- True
B- False

Q94 Clustering is a supervised learning method.
A- True
B- False

Q95 An example of a supervised learning algorithm is:
A- Decision Tree
B- K-Means Clustering
C- Reinforcement Agent
D- Random Walk

Q96 In Hadoop, data is replicated for reliability.
A- True
B- False

Q97 The primary storage system in Hadoop is called:
A- HBase
B- Hadoop Distributed File System (HDFS)
C- Apache Spark
D- ZooKeeper

Q98 Which Big Data characteristic is illustrated by the ability to analyze millions of customer transactions per hour?
A- Velocity
B- Volume
C- Variety
D- Veracity

Q99 What is a major challenge faced in the field of Big Data due to "variety"?
A- Processing high-speed data streams
B- Managing structured, semi-structured, and unstructured data formats
C- Handling unreliable data sources
D- Extracting value from redundant datasets

Q100 Apache Hive allows for SQL-like querying on data in Hadoop.
A- True
B- False

Q101 Apache Pig is used for data compression in Hadoop.
A- True
B- False

Q102 Which Big Data tool is used for real-time stream processing?
A- Apache Kafka
B- Oracle Database
C- Google BigQuery
D- Tableau

Q103 Which tool is used to transfer data between Hadoop and relational databases?
A- Apache Pig
B- Apache Flume
C- Sqoop
D- Apache Kafka

Q104 Which Hadoop mode runs completely on a single machine?
A- Distributed Mode
B- Standalone Mode
C- Pseudo-Distributed Mode
D- Cluster Mode

Q105 Big Data is only about volume, not velocity or variety.
A- True
B- False

Q106 How does Big Data help in the healthcare industry?
A- Reduces the need for doctors
B- Improves life quality and predicts disease outbreaks
C- Focuses on building new hospitals
D- Replaces manual record-keeping

Q107 MapReduce can handle structured, unstructured, and semi-structured data.
A- True
B- False

Q108 What does "Variety" in Big Data refer to?
A- The speed of data generation
B- The accuracy of data
C- Different types of data sources
D- The value derived from data

Q109 Apache Hive is primarily used for:
A- Real-time data streaming
B- Data warehousing and SQL-like queries
C- Machine learning
D- Data visualization

Q110 "Volume" in Big Data refers to the variety of data types.
A- True
B- False

Q111 Which of the following statements is true about the relationship between Big Data and traditional data processing?
A- Big Data can always be processed with traditional methods
B- Traditional methods can handle the velocity of Big Data
C- Traditional methods struggle with the volume and variety of Big Data
D- There is no difference between Big Data and traditional data

Q112 Social media platforms are significant sources of Big Data.
A- True
B- False

Q113 The concept of "Veracity" in Big Data addresses:
A- Trustworthiness and uncertainties in data
B- The speed of data processing
C- The variety of data sources
D- The volume of data

Q114 Which tool is NOT typically associated with Big Data processing?
A- Apache Hadoop
B- Oracle RDBMS
C- Apache Spark
D- Apache Hive

Q115 What does "predictive modeling" in Big Data refer to?
A- Storing past data for future use
B- Analyzing data at rest
C- Visualizing real-time customer data
D- Using algorithms to forecast future trends

Q116 What does the 'Velocity' characteristic of Big Data refer to?
A- The amount of data
B- The speed at which data is generated
C- The different types of data
D- The source of data

Q117 What type of data does the 'Variety' aspect of Big Data encompass?
A- Structured
B- Unstructured
C- Both structured and unstructured
D- Neither

Q118 A Big Data pipeline is slowing down due to an excessive amount of incoming data. Which aspect of the '3Vs' is causing this issue?
A- Volume
B- Velocity
C- Variety
D- Value

Q119 What is the main purpose of data preprocessing in Big Data pipelines?
A- Storing the data securely
B- Cleaning and organizing raw data for analysis
C- Visualizing the data trends
D- Eliminating duplicate datasets entirely

Q120 Apache Pig is used for writing complex scripts to process data in Hadoop.
A- True
B- False
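
Study note: several questions above (Q24, Q28, Q29, Q68, Q81, Q84) refer to the Map, Shuffle and Sort, and Reduce phases and to the word count job. The sketch below imitates those phases in plain Python on a single machine. It is an illustration only and does not use Hadoop's API: the function names map_phase, shuffle_and_sort, reduce_phase, and word_count are hypothetical, and a real job would run its mappers and reducers in parallel across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) key-value pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle and sort: group the values by key, then order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Reduce: combine the intermediate values collected for one key."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence (hypothetical single-process driver)."""
    intermediate = []
    for line in lines:
        intermediate.extend(map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle_and_sort(intermediate))


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

Each mapper output is a key-value pair, the shuffle step groups pairs that share a key, and the reducer combines each group into one result, which is the flow the questions describe.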
