Questions and Answers
What is the main focus of the assignment?
- Understanding cloud computing principles
- Learning programming languages
- Practical implementation of Big Data technologies (correct)
- Exploring theoretical concepts of data analytics
The Harvard Referencing System should not be used for this assignment.
False (B)
What are the two main Big Data technologies required for this assignment?
Apache Spark and Hadoop
The assignment submission date is on __________.
Match the following tasks with their respective descriptions:
What percentage of total marks is allocated to the Problem Definition and Business Context?
Students are allowed to reference Wikipedia for their work.
What must the dataset used for the project be greater than in size?
What is characterized by a 'good understanding of techniques applicable to their own research or advanced scholarship'?
An exceptional understanding of techniques involves no limitations and ambiguities.
What type of understanding is indicated by 'limited understanding of techniques applicable to their own research'?
A person with __________ understanding tends to have little to no understanding of advanced techniques.
Match the level of understanding with its corresponding description.
Which of the following describes a person with a very good understanding of techniques?
Advanced techniques are applicable solely to theoretical scholarship.
Someone with __________ understanding is characterized by the ability to work with techniques under certain limitations.
Which phrase best describes 'Excellent conceptual understanding'?
Limited conceptual understanding is characterized by strong arguments and critical evaluation.
What is a key element of 'very good conceptual understanding'?
A student with __________ conceptual understanding can critically evaluate and synthesize a wide range of views.
What does 'Low conceptual understanding' imply?
Match the level of conceptual understanding with its characteristic:
Descriptive explanations contribute to strong argumentation.
What is one outcome of a 'conceptual understanding that enables the student to display originality'?
What ability best describes the skill of critically appraising a wide range of sources?
Demonstrating ethical awareness is not important when interpreting knowledge in the discipline.
What is required for original creative or artistic application in a specific area of study?
The ability to _________ relevant points from sources is critical in advancing academic arguments.
Match the skills with their academic relevance:
Which of the following describes the importance of depth and breadth in academic study?
Good technical skills are sufficient for advancing work without the need for creativity.
Identify the key benefit of using well-established techniques in academic research.
What level of expression indicates competent terminology and minimal errors in spelling and syntax?
Very good expression includes many errors in spelling, grammar, and syntax.
What is necessary for excellent expression in decision-making?
Low use of appropriate terminology indicates a ______ level of expression.
Match the following expression levels with their characteristics:
Which level of digital literacy indicates little to no competency?
Good evidence of numeracy suggests high use of appropriate terminology.
An expression level with many errors in spelling, grammar, and syntax is identified as ______.
Which level represents an excellent ability to manage learning while exercising initiative and personal responsibility?
A person with a low ability to exercise initiative has good skills in decision-making in complex situations.
What is required for employment that necessitates the exercise of initiative?
A person with very good ability to manage learning demonstrates _____ and exercise initiative.
Match the team role levels with their descriptions:
What characterizes a person with a 'good ability to manage learning'?
Everyone who possesses very good ability in managing learning also has an excellent ability to make decisions.
What type of responsibility is emphasized in skills necessary for employment?
Flashcards
Big Data
A massive collection of data that surpasses traditional processing capabilities and presents challenges in storage, processing, and analysis due to its volume, velocity, variety, veracity, and value.
Hadoop
An open-source framework that enables distributed storage and processing of massive datasets across a cluster of computers.
YARN (Yet Another Resource Negotiator)
A component of Hadoop responsible for managing and scheduling resources across the cluster, ensuring efficient data processing.
HDFS (Hadoop Distributed File System)
MapReduce
Apache Spark
Big Data Analytics
Cloud-Based Big Data Analytics Platforms
Limited Understanding of Techniques
Competent Understanding of Techniques
Good Understanding of Techniques
Very Good Understanding of Techniques
Excellent Understanding of Techniques
Exceptional Understanding of Techniques
Low Understanding of Techniques
Little to No Understanding of Techniques
Conceptual Understanding
Critical Evaluation
Synthesis
Advanced Scholarship
Exceptional Conceptual Understanding
Descriptive Approach
Analytical Approach
Critical Approach
Creative or artistic skills
Established techniques
Critically appraise sources
Knowledge in the discipline
Study beyond the usual range
Extract relevant points
Ethical awareness
Advance the work
Decision-making in complex and unpredictable contexts
Excellent expression
Good expression
Limited expression
Very good expression
Low use of appropriate terminology
Exceptional expression
Competent expression
Limited Understanding of Team Roles
Competent Understanding of Team Roles
Good Understanding of Team Roles
Very Good Understanding of Team Roles
Excellent Understanding of Team Roles
Exceptional Understanding of Team Roles
Little to No Ability to Manage Learning
Low Ability to Manage Learning
Study Notes
Module Information
- Degree: MSc Data Analytics
- Module: Big Data Analytics
- Assignment Title: Cloud-Based Big Data Analytics with Apache Spark and Hadoop
- Assignment Type: Report
- Word Limit: 3000 words (+/- 300)
- Weighting: 100%
- Issue Date: 19/11/2024
- Submission Date: 07/02/2025
- Feedback Date: 28/02/2025
Plagiarism
- Students must submit their own original work for assessment
- Submissions will be electronically checked for plagiarism
- Students must adhere to guidelines and regulations regarding plagiarism on InterActive/Canvas
Learner Declaration
- Students must sign a declaration stating the work submitted is their own and research sources are acknowledged.
Harvard Referencing
- The Harvard Referencing System must be used
- Wikipedia, UKEssays.com, and similar websites are not allowed as sources
Learning Outcomes
- LO1: Understand basic concepts of Big Data and its importance in business.
- LO2: Explain Hadoop and HDFS components within the Big Data ecosystem.
- LO3: Summarize Big Data analytics using YARN, HDFS, and MapReduce.
Assignment Tasks
1. Problem Definition and Business Context (15% of total marks)
- Identify: A real-world business problem suitable for Big Data analysis.
- Write: A report (500-800 words) explaining the business context, the need for Big Data, and how Big Data analytics can provide value in solving the problem.
- Suggest: A relevant, publicly available dataset (over 10GB) for the project.
2. Cloud Environment Setup and Data Ingestion (25% of total marks)
- Choose: A cloud platform (AWS, Google Cloud, or Azure) and set up a Big Data processing environment (EMR, Dataproc, or HDInsight).
- Document: Steps taken to configure the cluster, including instance types, scaling options, and cost considerations.
- Upload: The dataset to HDFS.
- Explain: Data ingestion process, including file formats (CSV, JSON, Parquet), and ensuring proper data distribution.
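As a rough illustration of the data-distribution point in Task 2, the sketch below estimates how a file is split into HDFS blocks and how much raw storage its replicas consume. It assumes the common HDFS defaults of a 128 MB block size and a replication factor of 3; the function name and the 10 GB example size are illustrative only, and a real cluster may be configured differently.

```python
import math

# Assumed defaults; both are configurable per cluster
# (dfs.blocksize and dfs.replication in hdfs-site.xml).
BLOCK_SIZE_MB = 128
REPLICATION = 3


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (number of blocks, raw storage in MB) for a single file."""
    # The last block may be only partly filled, so round up.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Each block is stored `replication` times across the DataNodes.
    raw_storage_mb = file_size_mb * replication
    return blocks, raw_storage_mb


blocks, raw_mb = hdfs_footprint(10 * 1024)  # a 10 GB file
print(f"{blocks} blocks, {raw_mb} MB of raw storage")
```

Under these assumptions, each block is a unit that a separate task can process in parallel, which is why the block count matters when sizing the cluster.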
3. Data Processing with Spark and Hadoop (30% of total marks)
- Implement: Two data processing tasks: a Hadoop MapReduce job (e.g., word frequencies, anomaly detection) and an Apache Spark job (e.g., advanced data transformations, EDA).
- Evaluate: Performance of both tasks, comparing MapReduce with Spark concerning speed, scalability, and ease of use.
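The word-frequency example mentioned in Task 3 can be sketched in plain Python to show the three MapReduce phases. This is a single-machine illustration of the programming model, not Hadoop code: the function names and sample lines are made up for the example, and a real job would run the map and reduce steps on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


reviews = ["great product great price", "poor product"]  # made-up sample lines
counts = word_count(reviews)
print(counts)
```

In Spark, the same logic is usually written as a short chain of RDD operations such as `flatMap`, `map`, and `reduceByKey`, which is one aspect of the ease-of-use comparison.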
4. Advanced Analytics and Machine Learning (30% of total marks)
- Implement: A machine learning algorithm (e.g., classification, regression, or clustering) on the dataset using Apache Spark MLlib.
- Detail: The model selection process, including data preprocessing and feature selection, model training, and evaluation.
- Visualize: Results, highlighting any business insights.
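For the evaluation step in Task 4, a binary classifier is commonly summarized with accuracy, precision, recall, and F1. The sketch below computes these in plain Python from made-up labels so the arithmetic is visible. It is not Spark MLlib code, and the function names are hypothetical; an MLlib pipeline would normally use the library's own evaluator classes instead.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate_binary(actual, predicted, positive=1):
    """Return accuracy, precision, recall, and F1 as a dictionary."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 1 = positive review, 0 = negative review.
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
metrics = evaluate_binary(actual, predicted)
print(metrics)
```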
Data Source
- Dataset: Amazon Customer Reviews (E-commerce Dataset).
- Information: Product reviews, customer sentiments, product popularity
- Size: Over 10GB
- Source: AWS Public Dataset.
- Use Case: Sentiment analysis, customer behavior, product trends
- Data Link: https://github.com/futurexskill/bigdata
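For the sentiment-analysis use case, one simple way to obtain training labels is to derive them from the star rating of each review. The sketch below is illustrative only: the column names `star_rating` and `review_body`, the rating thresholds, and the inline sample rows are assumptions, and should be checked against the actual dataset schema before use.

```python
import csv
import io


def rating_to_sentiment(stars):
    """Map a 1-5 star rating to a coarse sentiment label (assumed thresholds)."""
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return "neutral"


# Made-up sample rows; the column names are assumptions about the schema.
SAMPLE = """star_rating,review_body
5,Works perfectly
1,Broke after a day
3,Does the job
"""


def label_reviews(csv_text):
    """Return (review text, sentiment label) pairs parsed from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["review_body"], rating_to_sentiment(int(row["star_rating"])))
            for row in reader]


labelled = label_reviews(SAMPLE)
print(labelled)
```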
Submission Instructions
- Compile: A comprehensive project report or presentation addressing all tasks.
- Report Includes: Steps to set up the Hadoop cluster, data ingestion into HDFS, MapReduce/Spark job code, job submission, results analysis.
- Format: Use the appropriate BSBI template (available on Canvas), Harvard Referencing style, and follow all specified submission instructions on Canvas.
Description
This assignment focuses on the analysis and implementation of cloud-based big data analytics using Apache Spark and Hadoop. Students are required to explore the foundational concepts of big data and its business significance while adhering to academic integrity through original work and proper referencing.