Big Data Business Analytics PDF
Document Details
University of Information Technology
Summary
This document contains multiple choice questions on Big Data, Hadoop, HDFS, MapReduce, Hive, and HBase. The questions test understanding of foundational concepts in Big Data processing and related technologies.
Full Transcript
1. What does commodity hardware in the Hadoop world mean? (D) a) Very cheap hardware b) Industry standard hardware c) Discarded hardware d) Low specifications industry grade hardware
2. What does "Velocity" in Big Data mean? (D) a) Speed of input data generation b) Speed of individual machine processors c) Speed of ONLY storing data d) Speed of storing and processing data
3. The term Big Data first originated from: (C) a) Stock Markets domain b) Banking and Finance domain c) Genomics and Astronomy domain d) Social Media domain
4. Which of the following batch processing instances is NOT an example of Big Data batch processing? (D) a) Processing 10 GB sales data every 6 hours b) Processing flight sensor data c) Web crawling app d) Trending topic analysis of tweets for the last 15 minutes
5. Sliding window operations typically fall in the category of __________________. (C) a) OLTP Transactions b) Big Data Batch Processing c) Big Data Real Time Processing d) Small Batch Processing
6. What is HBase used as? (A) a) Tool for random and fast read/write operations in Hadoop b) Faster read-only query engine in Hadoop c) MapReduce alternative in Hadoop d) Fast MapReduce layer in Hadoop
7. What is the default HDFS block size? (D) a) 32 MB b) 64 KB c) 128 KB d) 64 MB
8. What is the default HDFS replication factor? (C) a) 4 b) 1 c) 3 d) 2
9. Which of the following is NOT a type of metadata in the NameNode? (C) a) List of files b) Block locations of files c) No. of file records d) File access control information
10. The mechanism used to create replicas in HDFS is ____________. (C) a) Gossip protocol b) Replicate protocol c) HDFS protocol d) Store and Forward protocol
11. Where is the HDFS replication factor controlled? (D) a) mapred-site.xml b) yarn-site.xml c) core-site.xml d) hdfs-site.xml
12. Which of the following Hadoop config files is used to define the heap size? (C) a) hdfs-site.xml b) core-site.xml c) hadoop-env.sh d) slaves
13. Which of the following is not a valid Hadoop config file? (B) a) mapred-site.xml b) hadoop-site.xml c) Depends on cluster size d) True if co-located with Job tracker
14. Which of the following statement(s) are true about the distcp command? (A) a) It invokes MapReduce in the background b) It invokes MapReduce if source and destination are in the same cluster c) It can't copy data from a local folder to an HDFS folder d) You can't overwrite files through the distcp command
15. Which of the following is NOT a component of Flume? (B) a) Sink b) Database c) Source d) Channel
16. Which of the following is the correct sequence of MapReduce flow? (C) a) Map → Reduce → Combine b) Combine → Reduce → Map c) Map → Combine → Reduce d) Reduce → Combine → Map
17. Which of the following can be used to control the number of part files in a MapReduce program output directory? (B) a) Number of Mappers b) Number of Reducers c) Counter d) Partitioner
18. Which of the following operations can't use the Reducer as a combiner as well? (D) a) Group by Minimum b) Group by Maximum c) Group by Count d) Group by Average
19. Which of the following is/are true about combiners? (D) a) Combiners can be used for mapper-only jobs b) Combiners can be used for any MapReduce operation c) Mappers can be used as a combiner class d) Combiners are primarily aimed to improve MapReduce performance
20. Reduce-side join is useful for (A) a) Very large datasets b) Very small datasets c) One small and other big datasets d) One big and other small datasets
21. Distributed Cache can be used in (D) a) Mapper phase only b) Reducer phase only c) In either phase, but not on both sides simultaneously d) In either phase
22. What is the optimal size of a file for the distributed cache? (C) a) =250 MB c) 900 nodes c) > 5000 nodes d) > 3500 nodes
[Questions 23 through 55 are missing from the transcript; the options shown for question 22 appear to be mixed with those of a later question.]
56. Hive managed tables store the data in (C) a) Local Linux path b) Any HDFS path c) HDFS warehouse path d) None of the above
57. On dropping managed tables, Hive: (C) a) Retains data, but deletes metadata b) Retains metadata, but deletes data c) Drops both data and metadata d) Retains both data and metadata
58. On dropping external tables, Hive: (A) a) Retains data, but deletes metadata b) Retains metadata, but deletes data c) Drops both data and metadata d) Retains both data and metadata
59. The partitioned columns in Hive tables are (B) a) Physically present and can be accessed b) Physically absent but can be accessed c) Physically present but can't be accessed d) Physically absent and can't be accessed
60. Hive data models represent (C) a) Table in Metastore DB b) Table in HDFS c) Directories in HDFS d) None of the above
61. When is the earliest point at which the reduce method of a given Reducer can be called? (C) a) As soon as at least one mapper has finished processing its input split b) As soon as a mapper has emitted at least one record c) Not until all mappers have finished processing all records d) It depends on the InputFormat used for the job
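The sketches below illustrate, in Java, a few of the mechanisms the questions above test. Class names, table names, paths, and values that do not appear in the original document are illustrative assumptions, not part of the question bank. First, question 6: HBase as the tool for random, fast read/write access on top of Hadoop. A minimal client sketch, assuming an existing "users" table with an "info" column family, might look like this:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseRandomAccess {
      public static void main(String[] args) throws Exception {
        // Hypothetical table "users" with column family "info".
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {
          // Random write: a single row keyed by user id, no MapReduce involved.
          Put put = new Put(Bytes.toBytes("user42"));
          put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
          table.put(put);
          // Random read of the same row.
          Result row = table.get(new Get(Bytes.toBytes("user42")));
          System.out.println(Bytes.toString(row.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
      }
    }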
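Questions 7, 8, 11, and 12 point at cluster configuration: block size and replication live in hdfs-site.xml (keys dfs.blocksize, or dfs.block.size in Hadoop 1, and dfs.replication), while the heap size is set in hadoop-env.sh. The following is a hedged sketch of how a client can override those same properties per job or per file; the values simply mirror the defaults quoted in the quiz, and the file path is made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsDefaultsDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();           // loads core-site.xml, hdfs-site.xml, ...
        conf.set("dfs.replication", "3");                   // default replication factor (question 8)
        conf.setLong("dfs.blocksize", 64L * 1024 * 1024);   // 64 MB, the Hadoop 1 default block size (question 7)

        FileSystem fs = FileSystem.get(conf);
        // Replication can also be changed per file after it has been written:
        fs.setReplication(new Path("/data/example.txt"), (short) 3);   // hypothetical path
      }
    }

For question 14, the copy itself is started from the command line, e.g. hadoop distcp hdfs://nn1/src hdfs://nn2/dst, and runs as a MapReduce job behind the scenes, which is why option (a) is correct.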
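Questions 16 to 19 revolve around the Map → Combine → Reduce flow. The sketch below shows the common pattern of reusing the reducer as the combiner for an associative operation such as a count or sum; the class and job names are illustrative only:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get();   // associative, so partial sums combine safely
        context.write(key, new IntWritable(sum));
      }
    }

    // Driver fragment (illustrative):
    //   Job job = Job.getInstance(conf, "group-by-count");
    //   job.setCombinerClass(SumReducer.class);   // fine for min, max, count, sum (question 18, options a-c)
    //   job.setReducerClass(SumReducer.class);
    //   job.setNumReduceTasks(4);                 // 4 reducers -> 4 part files in the output directory (question 17)
    // A "group by average" cannot reuse the reducer this way, because averaging partial
    // averages gives the wrong answer; that is why option (d) answers question 18.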
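Questions 20 to 22 contrast the reduce-side join, used when both datasets are very large, with the distributed cache, which ships a small file to every task so a map-side join can avoid the shuffle. Below is a minimal sketch with the Hadoop 2 MapReduce API; the job name and cached path are hypothetical:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class DistributedCacheDemo {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-side-join");   // hypothetical job name
        // Ship a small lookup file to every node; each task can then open it locally,
        // in either the map or the reduce phase (question 21).
        job.addCacheFile(new URI("/lookup/small_dimension.txt"));          // hypothetical path
        // Inside Mapper.setup(), the cached files are listed by context.getCacheFiles().
        // When both inputs are very large, neither fits in the cache and a
        // reduce-side join is the usual choice (question 20).
      }
    }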
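Questions 56 to 60 concern where Hive keeps table data and what DROP TABLE removes. The following is a hedged sketch over the Hive JDBC driver; the connection string, table names, and location are assumptions, not taken from the document:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class HiveTableDemo {
      public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
        Statement st = conn.createStatement();

        // Managed table: data files land under the HDFS warehouse path (question 56);
        // dropping it deletes both the data and the metadata (question 57).
        st.execute("CREATE TABLE sales_managed (id INT, amount DOUBLE) "
            + "PARTITIONED BY (dt STRING)");   // the partition column becomes an HDFS directory,
                                               // absent from the data files yet queryable (questions 59-60)

        // External table: Hive only records metadata for files at a location we manage;
        // dropping it deletes the metadata but retains the data (question 58).
        st.execute("CREATE EXTERNAL TABLE sales_external (id INT, amount DOUBLE) "
            + "LOCATION '/user/data/sales'");  // hypothetical HDFS location

        st.close();
        conn.close();
      }
    }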