MSc Data Analytics - Big Data Analytics Assignment (PDF)

Document Details

Uploaded by Deleted User

2025

Tags

big data analytics cloud computing apache spark data analysis

Summary

This is a Master's-level assignment on cloud-based big data analytics using Apache Spark and Hadoop. It covers problem definition, cloud environment setup, data processing, and advanced analytics. The suggested dataset is Amazon Customer Reviews, and the expected submission is a 3000-word report.

Full Transcript

DEGREE: MSc Data Analytics
Module: Big Data Analytics
Assignment Title: Cloud-Based Big Data Analytics with Apache Spark and Hadoop Ecosystem
Assignment Type: Report
Word Limit: 3000 words (+/- 300)
Weighting: 100%
Issue Date: 19/11/2024
Submission Date: 07/02/2025
Feedback Date: 28/02/2025

Plagiarism: When submitting work for assessment, students should be aware of the InterActive/Canvas guidance and regulations concerning plagiarism. All submissions should be your own, original work. You must submit an electronic copy of your work. Your submission will be electronically checked.

Learner declaration: I certify that the work submitted for this assignment is my own and research sources are fully acknowledged.
Student signature:
Date:

Harvard Referencing: The Harvard Referencing System must be used. Wikipedia, UKEssays.com or similar websites must not be used or referenced in your work.

Introduction

Learning Outcomes:
LO1. Demonstrate an understanding of the basic concepts of Big Data, its importance and need in a business context.
LO2. Explain the various components of Hadoop and HDFS along with their role in the Big Data ecosystem.
LO3. Summarize the learning on Big Data analytics using YARN, HDFS and MapReduce.

Assessment Criteria: Weighting 100%, 3000 words

Overview: This project-based assignment is designed for master's students to demonstrate their knowledge of Big Data concepts and hands-on experience with cloud computing. Students will be required to use Apache Spark and Hadoop in a cloud environment (AWS EMR, Google Dataproc, Azure HDInsight, or Databricks) to process and analyse a large dataset. The focus of this assignment is on the practical implementation of Big Data technologies and the derivation of meaningful insights from large-scale data.

Assignment Tasks:

1. Problem Definition and Business Context (15% of total marks):
Identify a real-world business problem or case study where Big Data analytics can be used to drive decision-making.
Write a brief report (500-800 words) explaining the business context, the need for Big Data, and how large-scale data analytics can provide value in solving the problem.
Suggest and justify a relevant dataset that will be used for the project (the dataset should be publicly available and of substantial size, >10GB).

2. Cloud Environment Setup and Data Ingestion (25% of total marks):
Choose a cloud platform (AWS, Google Cloud, or Azure) and set up a Big Data processing environment (using EMR, Dataproc, or HDInsight) to run Apache Spark and Hadoop.
Document the steps taken to configure the cluster, including selecting the appropriate instance types, scaling options, and cost considerations.
Upload your dataset into HDFS and explain the data ingestion process, including handling file formats (e.g., CSV, JSON, Parquet) and ensuring data is properly distributed across nodes.

3. Data Processing with Spark and Hadoop (30% of total marks):
Implement two data processing tasks:
   1. Hadoop MapReduce job: Create a basic MapReduce job in Python to process and clean your dataset (e.g., counting word frequencies, detecting anomalies, or aggregating data).
   2. Apache Spark job: Use Spark (via Python) to perform advanced data transformations and processing, such as data aggregation, filtering, and exploratory data analysis (EDA).
Evaluate the performance of both tasks and compare MapReduce with Spark, considering speed, scalability, and ease of use.

4. Advanced Analytics and Machine Learning (30% of total marks):
Using Apache Spark MLlib, implement a machine learning algorithm (e.g., classification, regression, or clustering) on your dataset.
Provide a detailed description of the model selection process, including data preprocessing, feature selection, model training, and evaluation.
Visualize and explain the results of your model, highlighting any business insights derived from the analysis.

Data Source:

1. Amazon Customer Reviews (E-commerce Dataset)
This dataset contains reviews of products on Amazon, providing insights into customer sentiments and product popularity.
Size: >10GB
Source: AWS Public Dataset
Use Case: Sentiment analysis, customer behavior, product trends.
Data Link: https://amazon-reviews-2023.github.io/
Cloud Integration: Easily accessible through AWS S3 and can be processed on AWS EMR or other cloud services.

2. Retail store
Data Link: https://github.com/futurexskill/bigdata

Submission Instructions:
Compile a comprehensive project report or presentation that addresses each task outlined in the preceding section. This report includes:
- Steps to set up the Hadoop cluster
- Steps to ingest data into HDFS
- Source code of the MapReduce job (Mapper and Reducer)
- Instructions on how to submit the job to YARN
- Analysis of the results obtained from the MapReduce job
Ensure that your report is clear, well-organized, and visually appealing.
Prepare a document using the BSBI assignment template available on Canvas.
Use Harvard referencing style for your bibliography.
Refer to the Essay-Guide available on Canvas for further instructions.
Submit your assignment electronically by the specified deadline.
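As an illustration of what the Mapper and Reducer source code for the MapReduce task might look like, here is a minimal Python sketch, not a model answer. It assumes each input line is one JSON review record with an `asin` (product ID) field and a numeric `rating` field; both field names are assumptions and should be checked against the dataset you choose. The helper names `map_line`, `reduce_pairs` and `run_local` are hypothetical. The job drops malformed records (cleaning) and aggregates the review count and mean rating per product.

```python
import json
from itertools import groupby


def map_line(line):
    """Mapper step: turn one JSON review line into a (product_id, rating) pair.

    Lines that are not valid JSON, or that lack the assumed fields,
    are dropped. This is the cleaning part of the job.
    """
    try:
        record = json.loads(line)
        product_id = record["asin"]        # assumed field name
        rating = float(record["rating"])   # assumed field name
    except (ValueError, KeyError, TypeError):
        return []
    return [(product_id, rating)]


def reduce_pairs(sorted_pairs):
    """Reducer step: consume key-sorted pairs, yield (key, count, mean rating)."""
    for product_id, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        ratings = [rating for _, rating in group]
        yield product_id, len(ratings), sum(ratings) / len(ratings)


def run_local(lines):
    """Simulate map -> sort (shuffle) -> reduce on a small in-memory sample."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return list(reduce_pairs(sorted(pairs)))


if __name__ == "__main__":
    sample = [
        '{"asin": "B01", "rating": 5.0}',
        '{"asin": "B02", "rating": 3.0}',
        'not json',
        '{"asin": "B01", "rating": 4.0}',
    ]
    for product_id, count, mean in run_local(sample):
        print(f"{product_id}\t{count}\t{mean:.2f}")
```

Running the logic locally on a small sample like this is a cheap way to check it before paying for cluster time. To run on a cluster, the same two functions could be wrapped in separate mapper and reducer scripts that read from standard input and print tab-separated key/value lines, which is the contract Hadoop Streaming expects; the exact submission command depends on your cluster setup.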
GRADING DESCRIPTORS: LEVEL 7 EXPERIMENTATION & INNOVATION FAIL PASS Threshold Criteria 0-29% 30-39% 40-49% 50-59% 60-69% 70-79% 80-89% 90-100% Deals with complex Little to no ability to Low Limited Competent Good Very Good Excellent range of Exceptional issues both use techniques to utilisation of research or understanding of understanding of problem-solving skills extremely well- problem-solving skills systematically and deal with complex established advanced solving problems, solving problems displaying a developed problem- with sophisticated creatively issues systematically techniques to deal scholarship to their through own through own comprehensive solving displaying an evaluation and demonstrating self- (including those of with complex issues area of study by research or advanced research and understanding of understanding of application of a wide direction and ethics and systematically using a range of scholarship displaying advanced scholarship techniques applicable techniques applicable range of advanced originality in tackling sustainability) and (including those of information and a comprehensive critically selecting to their own research to their own research information and and solving problems creatively to solve ethics and established and understanding of and displaying a or advanced or advanced techniques to problems and/or sustainability) and advanced established and comprehensive scholarship scholarship beyond undertake projects. make decisions. creatively to solve techniques advanced techniques understanding of which is taught. problems and/or established and make decisions, but advanced techniques. with limitations in techniques or approach. 
Comprehensive Little to no Low Limited Competent Good understanding Very good Excellent Exceptional understanding of understanding of understanding of understanding of understanding of of techniques understanding of understanding of understanding of techniques techniques applicable techniques applicable key techniques techniques applicable applicable to their techniques applicable techniques applicable techniques applicable to their to their own research to their own research applicable to their to their own research own research or to their own research to their own research applicable to their own research or or advanced or advanced own research or or advanced advanced scholarship or advanced or advanced own research or advanced scholarship or their scholarship including advanced scholarship including and a some scholarship and a scholarship and advanced scholarship limitations and their limitations and scholarship including their limitations and understanding of some understanding mastery of some scholarship and ambiguities. ambiguities. their limitations and ambiguities more specialised of more specialised more specialised mastery of some ambiguities. techniques. techniques. areas. more specialised areas. 
GRADING DESCRIPTORS: LEVEL 7 RESEARCH & ANALYSIS FAIL PASS Threshold Criteria 0-29% 30-39% 40-49% 50-59% 60-69% 70-79% 80-89% 90-100% Systematic Little to no Low knowledge of Limited knowledge to Competent Good knowledge of Very good knowledge Excellent knowledge Exceptional understanding of knowledge of the the subject lacking deal with terminology, knowledge of ideas ideas or arguments at of ideas or arguments of ideas or arguments knowledge of ideas knowledge, and a subject with limited coherence, breadth, facts and concepts or arguments at the the forefront of any at the forefront of at the forefront of or arguments at the critical awareness of breadth or depth or or detail with only some of which is forefront of any part of the subject the subject some of the subject many of forefront of the current problems deficiencies in major some reference to informed by the part of the subject showing a clear, which are which are subject most of which and/or new insights, areas or currency. ideas or arguments at forefront of defined sufficient to deal critical insight into significantly beyond significantly beyond are significantly much of which is at, the forefront of any areas of the subject. with current issues the discipline as what has been taught what has been taught beyond what has or informed by, the part of the subject. in the discipline, whole and current and show a critical and show a critical been taught and forefront of their generally more issues/problems. insight into the insight into the show a critical insight academic discipline, descriptive than discipline and current discipline and current into the discipline field of study or area critical or issues/problems. issues/problems. and current of professional analytical. issues/problems. 
practice Conceptual Little to no conceptual Low conceptual Limited conceptual Competent Good conceptual Very good conceptual Excellent conceptual Exceptional understanding that understanding or understanding and understanding and conceptual understanding which understanding which understanding which conceptual enables the student argument and a focus arguments are weak argument understanding and critically evaluate and systematically critically apply a wide understanding of to display originality on descriptive or poorly construction with argument synthesise other synthesises a wide range of views publishable quality in the application of explanations which constructed, and the critical evaluation of construction with views and range of views with a through a perceptive with systematic knowledge do not comment on work does not alternative views or critical evaluation information with a critical insight into use of advanced engagement and arguments of others critically evaluate the comment on advanced of a range of views thoughtful advanced scholarship. usage of advanced or alternative views. arguments of others scholarship. and consistent interpretation of scholarship. scholarship. or consider engagement with advanced alternative views. advanced scholarship. scholarship. 
GRADING DESCRIPTORS: LEVEL 7 ENGAGING WITH PRACTICE
FAIL / PASS bands: 0-29%, 30-39%, 40-49%, 50-59%, 60-69%, 70-79%, 80-89%, 90-100%

Threshold Criteria: Practical understanding of how established techniques of research and enquiry are used to create and interpret knowledge in the discipline
0-29%: Little to no evidence of background investigation, analysis, research, enquiry, ethical awareness, and/or study.
30-39%: Low evidence of background investigation, analysis, research, enquiry, ethical awareness, and/or study.
40-49%: Limited background investigation, analysis, research, enquiry, ethical awareness, and/or study using established techniques, with the ability to extract relevant points.
50-59%: Competent investigation, analysis, research, enquiry, ethical awareness, and/or study using established techniques accurately, and can critically appraise and use academic sources.
60-69%: Good background investigation, analysis, research, enquiry, ethical awareness, and/or study using established techniques accurately, and possesses a well-developed ability to critically appraise a wide range of sources.
70-79%: Very good, independent, extensive and appropriate investigation, analysis, research, enquiry, ethical awareness, and/or study beyond the usual range, and critically evaluates this to advance the work and/or direct arguments.
80-89%: Excellent independent, extensive and appropriate investigation, analysis, research, enquiry, ethical awareness, and/or study well beyond the usual range, and critically evaluates this to advance the work and/or direct arguments.
90-100%: Exceptional investigation, analysis, research, enquiry, ethical awareness, and/or study which demonstrates carefully considered depth and breadth and critically synthesises this to advance the work and/or direct arguments.
Originality in Little to no technical, Low technical, Limited Competent technical, Good technical, Very good range of Excellent range of Exceptional range of the creative or artistic creative or artistic technical, creative or artistic creative or artistic technical, creative or technical, creative or technical, creative or application of skills related to their skills related to their creative or skills required for skills required for artistic skills. artistic skills artistic skills knowledge area of study. area of study. artistic skills area of study. area of study. required for area of study. Independently Little to no Low contribution to Limited contribution Competent Good contribution to Very good Excellent Exceptional advance your own contribution to group group activity and/or to group activity contribution to group group activity and/or contribution to group contribution to group contribution to group knowledge and activity and/or undertaking further and/or undertaking activity and/or independently activity and/or activity and/or activity and/or understanding, and undertaking further training at a further training at a independently undertakes further independently independently independently to develop new training at a high/advanced level. high/advanced level. undertakes further training at a undertakes further undertakes further undertakes further skills to a high high/advanced level. training at a high/advanced level training at a training at a training at a level. high/advanced level. with an high/advanced level high/advanced level high/advanced level understanding of with an with teamwork and with teamwork and team roles understanding of leadership strong leadership. 
team roles

GRADING DESCRIPTORS: LEVEL 7 REALISATION & COMMUNICATION
FAIL / PASS bands: 0-29%, 30-39%, 40-49%, 50-59%, 60-69%, 70-79%, 80-89%, 90-100%

Threshold Criteria: Communicate information, ideas, problems and solutions to both specialist and non-specialist audiences.
0-29%: Little to no clarity in the communication of ideas, problems and solutions to audiences.
30-39%: Low clarity in the communication of ideas, problems and solutions to audiences.
40-49%: Limited clarity in the communication of ideas, problems and solutions to audiences.
50-59%: Competent communication of ideas, problems and solutions to audiences.
60-69%: Good, confident and clear communication of ideas, problems and solutions to audiences in a range of means / media.
70-79%: Very good, confident and clear communication of ideas, problems and solutions to audiences in a range of means / media.
80-89%: Excellent communication of ideas, problems and solutions to audiences in a range of means / media.
90-100%: Exceptional communication of ideas, problems and solutions to audiences in a range of means / media.

GRADING DESCRIPTORS: LEVEL 7 PERSONAL & PROFESSIONAL CONNECTIVITY
FAIL / PASS bands: 0-29%, 30-39%, 40-49%, 50-59%, 60-69%, 70-79%, 80-89%, 90-100%

Threshold Criteria: Independently advance your own knowledge and understanding, and develop new skills to a high level.
0-29%: Little to no contribution to group activity and/or undertaking further training at a high/advanced level.
30-39%: Low contribution to group activity and/or undertaking further training at a high/advanced level.
40-49%: Limited contribution to group activity and/or undertaking further training at a high/advanced level.
50-59%: Competent contribution to group activity and/or independently undertakes further training at a high/advanced level.
60-69%: Good contribution to group activity and/or independently undertakes further training at a high/advanced level with an understanding of team roles
70-79%: Very good contribution to group activity and/or independently undertakes further training at a high/advanced level with an understanding of team roles
80-89%: Excellent contribution to group activity and/or independently undertakes further training at a high/advanced level with teamwork and leadership
90-100%: Exceptional contribution to group activity and/or independently undertakes further training at a high/advanced level with teamwork and strong leadership.

Threshold Criteria: Qualities and transferable skills necessary for employment requiring: (a) the exercise of initiative, ethical and personal responsibility (b) decision-making in complex and unpredictable contexts
0-29%: Little to no ability to manage learning and/or exercise initiative, ethical and personal responsibility and/or decision-making in complex and unpredictable situations
30-39%: Low ability to manage learning and/or exercise initiative, ethical and personal responsibility and/or decision-making in complex and unpredictable situations
40-49%: Limited ability to manage learning and exercise initiative, ethical and personal responsibility, and decision-making in complex and unpredictable situations
50-59%: Competent ability to manage learning, and exercise initiative, ethical and personal responsibility, and decision-making in complex and unpredictable situations
60-69%: Good ability to systematically manage learning, and exercise initiative, ethical and personal responsibility, and decision-making in complex and unpredictable situations
70-79%: Very good ability to systematically manage learning, and exercise initiative, ethical and personal responsibility, and decision-making in complex and unpredictable situations.
80-89%: Excellent ability to manage learning on own initiative, and exercise initiative, ethical and personal responsibility, and decision-making in complex and unpredictable situations
90-100%: Exceptional ability to manage learning on own initiative, and exercise initiative, ethical and personal responsibility, and decision-making in complex and unpredictable situations

0-29%: Little to no use of appropriate terminology, limited vocabulary and many errors in spelling, grammar and syntax.
30-39%: Low use of appropriate terminology, with many errors in spelling, vocabulary and syntax.
40-49%: Limited expression, style and appropriate vocabulary with errors in spelling, grammar and syntax which affect understanding.
50-59%: Competent expression, style, and appropriate vocabulary with some errors in spelling, grammar and syntax which do not affect understanding.
60-69%: Good expression, style and appropriate vocabulary with some errors in spelling, grammar and syntax.
70-79%: Very good expression, style and appropriate vocabulary with minimal errors in spelling, grammar and syntax.
80-89%: Excellent expression, style and appropriate vocabulary with minimal errors in spelling, grammar and syntax.
90-100%: Exceptional expression, style and appropriate vocabulary with no errors in spelling, grammar and syntax.

0-29%: Little to no evidence of basic numeracy or digital literacy, hardware and software skills competency.
30-39%: Low evidence of basic numeracy or digital literacy, hardware and software skills competency.
40-49%: Limited evidence of numeracy or digital literacy, hardware and software skills competency.
50-59%: Adequate evidence of numeracy or digital literacy, hardware and software skills competency.
60-69%: Good evidence of numeracy or digital literacy, hardware and software skills competency.
70-79%: Very good evidence of numeracy or digital literacy, hardware and software skills competency.
80-89%: Excellent evidence of numeracy or digital literacy, hardware and software skills competency.
90-100%: Exceptional evidence of numeracy or digital literacy, hardware and software skills competency.

Does not demonstrate achievement of professional competence when assessed against the requirements of a professional, statutory or regulatory body (PSRB).
The student has demonstrated achievement of professional competence when assessed against the requirements of a PSRB.
Inaccurate use of terminology with limited vocabulary and many errors in spelling, grammar and syntax.
The student has adhered to the appropriate rules and/or conventions set by regulators or the industry.
Inaccurate terminology, with many errors in spelling, vocabulary and syntax.
