BDA Question Bank PDF

Summary

This document is a question bank for a BDA course. It covers Big Data, Hadoop, NoSQL, Mining Data Streams, Real-time Data Models, and Data Visualization, with theory questions and problems for each module.

Full Transcript

QUESTION BANK FOR BDA

Module 1 Question Bank

1. What is Big Data? Discuss the different sources that generate big data.
2. How much data does it take to be called Big Data? With examples, explain the various data measures.
3. With examples, explain the types of Big Data.
4. What are the 5 Vs of big data? Explain.
5. Compare big data with traditional data.
6. How is the big data approach different from the traditional warehouse approach?
7. Discuss any 4 applications of big data in brief.
8. What are the key use cases of big data? Explain any one use case of your interest.

Module 2 Question Bank

1. What is Hadoop? Explain Hadoop’s core components.
2. Discuss the Hadoop ecosystem (all tools under the Hadoop framework: MapReduce, Flume, Sqoop, HBase, etc.).
3. What is HDFS? How are files written and read in HDFS?
4. Explain the architecture of HDFS.
5. Discuss the role of: a) NameNode b) Secondary NameNode c) DataNode d) JobTracker e) TaskTracker
6. Explain the concept of MapReduce using an example. Write MapReduce pseudo-code for “Group by” and “Aggregations” in a database.
7. Write MapReduce pseudo-code to count distinct words in the input file.
8. Explain the Hadoop architecture.
9. Explain the Google File System.
10. Explain how files are stored in HDFS.
11. List the relational algebra operations. Explain any two using MapReduce.
12. Explain the limitations of Hadoop.
13. Explain the MapReduce algorithm for matrix multiplication.

Question Bank: NoSQL

1. What are the business drivers for NoSQL?
2. Explain CAP. How is CAP different from the ACID properties of a database?
3. Explain why NoSQL is schema-less.
4. When it comes to big data, how does NoSQL score over RDBMS?
5. Discuss the four different architectural patterns of NoSQL.
6. What are replication and sharding?
7. What is the mechanism used by NoSQL to evenly distribute data in a cluster?
8. Explain the four ways in which big data problems are handled by NoSQL.
9. Discuss a few types of big data problems.
10. Mention some uses of a key-value store and also state its weaknesses.
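
As a study aid for the Module 2 question on counting distinct words with MapReduce, the map, shuffle, and reduce steps can be sketched in plain Python. This is only an illustrative simulation that runs in a single process, not Hadoop code. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are assumptions made for this sketch. It also assumes the question means counting the occurrences of each distinct word; the number of distinct words is then simply the number of keys produced by the reducer.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reducer: sum the list of 1s collected for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


# Hypothetical two-line input file, used only for illustration.
lines = ["big data is big", "data is everywhere"]
counts = word_count(lines)
distinct_words = len(counts)  # one reducer key per distinct word

print(counts)
print(distinct_words)
```

In a real cluster the mapper and reducer would run on different nodes and the framework would perform the shuffle; here the three steps are ordinary function calls so that the data flow is easy to trace by hand.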
Question Bank 4: Mining Data Streams

1. Theory and problems on the Flajolet-Martin and Bloom filter algorithms.
2. What is data streaming? Discuss its issues and applications.

Question Bank on Module 5: Real-Time Data Models

1. What are distance measures? What are they used for?
2. Problems on Euclidean distance, Jaccard distance, cosine distance, edit distance, and Hamming distance.
3. Explain content-based recommendations and collaborative filtering.
4. Explain clustering of social-network graphs with an example.

Question Bank on Module 6: Data Visualization

1. Data visualization using R.

Note: Refer to the question pattern of the IA and MSE exams.
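
For practice with the Module 5 distance problems, the five measures named there can be sketched in Python as below. This is a minimal sketch under stated assumptions, not a reference solution. Cosine distance is taken here to be the angle between two nonzero vectors, in degrees, and edit distance is taken to allow insertions and deletions only (computed from the longest common subsequence). Other courses define cosine distance as 1 minus the cosine, or use Levenshtein edit distance with substitutions, so check which definition the syllabus expects. All function names are chosen for this sketch.

```python
import math


def euclidean(p, q):
    """Straight-line (L2) distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def jaccard_distance(s, t):
    """1 minus |intersection| / |union| of two sets."""
    s, t = set(s), set(t)
    union = s | t
    if not union:
        return 0.0  # assumption: two empty sets are treated as identical
    return 1 - len(s & t) / len(union)


def cosine_distance(p, q):
    """Angle in degrees between two nonzero vectors (assumed convention)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    cos = max(-1.0, min(1.0, dot / (norm_p * norm_q)))  # guard rounding error
    return math.degrees(math.acos(cos))


def lcs_length(x, y):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        curr = [0]
        for j, cy in enumerate(y, start=1):
            curr.append(prev[j - 1] + 1 if cx == cy else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def edit_distance(x, y):
    """Insert/delete-only edit distance: |x| + |y| - 2 * LCS(x, y)."""
    return len(x) + len(y) - 2 * lcs_length(x, y)


def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(a != b for a, b in zip(x, y))


print(euclidean((1, 2), (4, 6)))                   # 5.0
print(jaccard_distance({1, 2, 3, 4}, {3, 4, 5, 6}))
print(cosine_distance([1, 2, -1], [2, 1, 1]))
print(edit_distance("abcde", "acfdeg"))            # 3
print(hamming("10101", "11110"))                   # 3
```

The Jaccard example has 2 common elements out of 6 in the union, so the distance is 1 - 2/6, and the two vectors in the cosine example have a cosine of 0.5, which corresponds to an angle of about 60 degrees.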
