Reference no: EM133883381, Length: 2000 words
Big Data and Cloud Computing Assignment
Assignment Scenario: "Optimizing Big Data Processing in the Cloud with Data Set Extraction and Analysis"
You are a data architect in a multinational enterprise that handles vast amounts of diverse data across multiple departments. The company is looking to optimize its big data processing capabilities by leveraging cloud computing technologies. Your assignment is to design and implement a solution that utilizes cloud-based services for efficient storage, processing, and analysis of big data. Additionally, include a practical example of data set extraction and analysis. Consider the following aspects in your assignment:
Data Ingestion and Storage:
Describe the types and sources of data that the company deals with.
Propose a cloud-based storage solution, considering scalability, redundancy, and cost-effectiveness.
Explain the data ingestion process, including tools or services used for seamless data transfer to the cloud.
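As an illustration only, the ingestion step above could be prototyped locally before any cloud service is chosen. The sketch below reads CSV rows and hands them to an uploader in fixed-size batches. The names `batched_rows`, `ingest`, and `upload_batch` are hypothetical, and `upload_batch` is a placeholder for whatever transfer tool or SDK call the chosen provider offers; it is not a real cloud API.

```python
import csv
import io
from typing import Callable, Iterable, Iterator, List


def batched_rows(lines: Iterable[str], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most batch_size parsed CSV rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[dict] = []
    for row in csv.DictReader(lines):
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def ingest(lines: Iterable[str], batch_size: int,
           upload_batch: Callable[[List[dict]], None]) -> int:
    """Send every batch to upload_batch (hypothetical) and return rows sent."""
    total = 0
    for batch in batched_rows(lines, batch_size):
        upload_batch(batch)
        total += len(batch)
    return total


# Toy data standing in for a real export file (assumed column names).
sample = io.StringIO("order_id,amount\n1,10.5\n2,3.0\n3,7.25\n")
received: List[List[dict]] = []  # stands in for the cloud storage target
rows_sent = ingest(sample, batch_size=2, upload_batch=received.append)
print(rows_sent, [len(b) for b in received])
```

Batching keeps memory use bounded regardless of file size, which is one reason streaming ingestion is usually preferred over loading a whole file at once.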
Scalable Processing Architecture:
Design a scalable big data processing architecture using cloud computing resources.
Discuss the choice of cloud services for distributed computing and parallel processing.
Explore how the architecture accommodates the company's growing data volumes.
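The partition, map, and reduce pattern that underlies most distributed processing services can be sketched on a single machine. The example below is a minimal stand-in, not a cloud deployment: a thread pool plays the role of the worker nodes, and `partition`, `partial_sum_count`, and `parallel_mean` are hypothetical helper names. In a real architecture the partitions would be processed by separate machines, and Python threads here show the structure of the computation rather than real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def partition(values: Sequence[float], n_parts: int) -> List[Sequence[float]]:
    """Split values into n_parts contiguous slices of near-equal size."""
    if n_parts <= 0:
        raise ValueError("n_parts must be positive")
    size, extra = divmod(len(values), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(values[start:end])
        start = end
    return parts


def partial_sum_count(chunk: Sequence[float]) -> Tuple[float, int]:
    """Map step: summarise one partition independently of the others."""
    return sum(chunk), len(chunk)


def parallel_mean(values: Sequence[float], n_workers: int = 4) -> float:
    """Reduce step: combine per-partition summaries into a global mean."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_count, partition(values, n_workers)))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else float("nan")


print(parallel_mean(list(range(1, 101))))
```

Because each partition is summarised independently, adding workers (or partitions) is the mechanism by which such a design accommodates growing data volumes.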
Data Extraction and Pre-processing:
Select a specific data set from the company's domain for extraction and analysis.
Discuss the extraction process, including data sources, extraction tools, and any transformations applied.
Outline pre-processing steps to clean and prepare the data for analysis.
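Typical pre-processing steps (trimming whitespace, removing duplicates, and dropping rows with missing or invalid values) might look like the sketch below. The column names `order_id` and `amount` and the function `clean_records` are assumptions made for the example; the actual steps depend on the data set selected.

```python
from typing import Dict, Iterable, List


def clean_records(rows: Iterable[Dict[str, str]],
                  key: str = "order_id",
                  numeric: str = "amount") -> List[dict]:
    """Trim whitespace, drop rows whose numeric field is missing or invalid,
    and keep only the first valid row for each key."""
    seen = set()
    cleaned = []
    for row in rows:
        record = {k: (v or "").strip() for k, v in row.items()}
        if not record.get(key) or record[key] in seen:
            continue  # missing key or duplicate of a row already kept
        try:
            record[numeric] = float(record[numeric])
        except (KeyError, ValueError):
            continue  # numeric field absent, empty, or not a number
        seen.add(record[key])
        cleaned.append(record)
    return cleaned


# Toy rows with assumed column names.
raw = [
    {"order_id": " 1 ", "amount": "10.5"},
    {"order_id": "1", "amount": "10.5"},    # duplicate key
    {"order_id": "2", "amount": ""},        # missing value
    {"order_id": "3", "amount": " 7.25 "},
    {"order_id": "4", "amount": "n/a"},     # invalid value
]
print(clean_records(raw))
```

Dropping invalid rows is only one policy; imputing missing values or flagging them for review may suit some data sets better, and the choice should be justified in the report.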
Data Analysis and Insights:
Implement a cloud-based analytics solution for analysing the selected data set.
Provide examples of analytical queries or machine learning algorithms applied to extract meaningful insights.
Discuss how the analysis contributes to informed decision-making within the enterprise.
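An analytical query of the kind requested above can be tried out locally before it is run on a cloud warehouse. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a cloud SQL engine; the `sales` table, its columns, and its rows are invented for the example and do not come from any real data set.

```python
import sqlite3

# In-memory database standing in for a cloud analytics service.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 60.0),
     ("South", 40.0), ("East", 50.0)],  # toy rows, not real data
)

# Aggregate per region: order count, total revenue, and average order value.
query = """
    SELECT region,
           COUNT(*)    AS orders,
           SUM(amount) AS revenue,
           AVG(amount) AS avg_order
    FROM sales
    GROUP BY region
    ORDER BY revenue DESC
"""
results = conn.execute(query).fetchall()
conn.close()

for region, orders, revenue, avg_order in results:
    print(region, orders, revenue, avg_order)
```

A ranked per-region summary like this is a simple example of an insight that can feed a decision, such as where to focus stock or marketing effort.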
Cost Optimization Strategies:
Develop strategies for optimizing costs associated with storing and processing the selected data set in the cloud.
Consider factors such as resource utilization, reserved instances, and cost-effective storage options.
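The cost factors above lend themselves to a small back-of-the-envelope model. In the sketch below every price is a placeholder chosen for the arithmetic, not a quote from any provider; `StorageTier`, `monthly_storage_cost`, and `reserved_saving` are hypothetical names. Real figures should be taken from the chosen provider's pricing pages and cited in the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageTier:
    name: str
    price_per_gb_month: float  # placeholder price, NOT a real provider rate


def monthly_storage_cost(size_gb: float, tier: StorageTier) -> float:
    """Storage cost for one month at a flat per-GB rate."""
    return size_gb * tier.price_per_gb_month


def reserved_saving(on_demand_hourly: float, reserved_hourly: float,
                    hours: float) -> float:
    """Saving from a reserved rate over the same number of compute hours."""
    return (on_demand_hourly - reserved_hourly) * hours


hot = StorageTier("hot", 0.02)     # assumed rate for frequently read data
cold = StorageTier("cold", 0.004)  # assumed rate for archived data

dataset_gb = 500  # assumed data set size
print("hot tier: ", monthly_storage_cost(dataset_gb, hot))
print("cold tier:", monthly_storage_cost(dataset_gb, cold))
print("reserved saving:", reserved_saving(0.10, 0.06, hours=720))
```

A model this simple ignores retrieval fees, data transfer charges, and minimum storage durations, all of which can change which tier is cheaper in practice.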
Security and Compliance:
Address security measures to ensure the confidentiality and integrity of the selected data set.
Discuss compliance considerations, especially if the enterprise operates in regulated industries.
Propose access control mechanisms and encryption practices specific to the analysed data set.
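Two of the measures above, integrity checking and protection of identifiers, can be demonstrated with the Python standard library. The sketch below computes a SHA-256 digest to detect tampering and uses a keyed HMAC to pseudonymize an identifier. It does not implement encryption, which would normally be delegated to the provider's managed encryption and key services; `sha256_digest`, `verify_integrity`, and `pseudonymize` are hypothetical helper names, and the key shown is a dummy value.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Hex digest recorded when the data set is stored."""
    return hashlib.sha256(data).hexdigest()


def verify_integrity(data: bytes, expected_digest: str) -> bool:
    """True if the data still matches the digest recorded earlier."""
    return hmac.compare_digest(sha256_digest(data), expected_digest)


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash so the same input always maps
    to the same token, but the token cannot be recomputed without the key."""
    return hmac.new(secret_key, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()


payload = b"order_id,amount\n1,10.5\n"  # toy file contents
recorded = sha256_digest(payload)
print(verify_integrity(payload, recorded))            # unchanged data
print(verify_integrity(payload + b"2,99\n", recorded))  # modified data

demo_key = b"replace-with-a-managed-secret"  # dummy key for illustration only
print(pseudonymize("customer-42", demo_key))
```

In a real deployment the key would be held in a secrets manager rather than in source code, and access to it would be governed by the same access control mechanisms proposed for the data set.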
Performance Monitoring and Management:
Outline strategies for monitoring the performance of the big data processing system in the cloud, with a focus on the analysed data set.
Discuss how to manage and troubleshoot potential issues or bottlenecks specific to the selected data set.
Explore tools or services that enable efficient system management and monitoring.
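Before a managed monitoring service is adopted, bottlenecks can be located by timing each pipeline stage. The sketch below is a minimal local example: `StageTimer` is a hypothetical helper, and the `time.sleep` calls stand in for real extraction and analysis work on the selected data set.

```python
import time
from contextlib import contextmanager
from typing import Dict, Optional


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self) -> None:
        self.durations: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the stage raised an exception.
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def slowest(self) -> Optional[str]:
        """Name of the stage with the largest total time, or None if empty."""
        if not self.durations:
            return None
        return max(self.durations, key=self.durations.get)


timer = StageTimer()
with timer.stage("extract"):
    time.sleep(0.01)  # placeholder for real extraction work
with timer.stage("analyse"):
    time.sleep(0.05)  # placeholder for real analysis work

for name, seconds in timer.durations.items():
    print(f"{name}: {seconds:.3f}s")
print("slowest stage:", timer.slowest())
```

The same per-stage measurements can later be exported as custom metrics to whichever cloud monitoring tool is selected, so that alerts fire when a stage exceeds its usual duration.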
There are several sources where you can obtain free large datasets for your assignment on big data processing in the cloud. Reputable platforms and repositories include:
E.g. Kaggle Datasets, UCI Machine Learning Repository, Google Cloud Public Datasets, AWS Public Datasets, and PhysioNet.
Please prioritize larger datasets, as the focus of this assignment is on big data, and support your choice by providing the dataset link in your report.
Sample topics are listed below. Please select only one:
Food Nutrition Datasets
VR Experiences
Gen Z Datasets
Mental Health
Chatbot Using NLP
Sports
Retail
General Considerations and Deliverables
Please be aware that each step should be fully described in your assessment. You should support your implementation with written documentation.
Submit a detailed report addressing each aspect of the assignment, including the specifics of the selected data set and the results of the analysis.
Provide references to relevant cloud computing services, frameworks, or case studies supporting your design decisions.
You should submit a report summarizing your findings (screenshots with explanations), including tables and charts to support your analysis. Your report should also include a brief discussion of any limitations or caveats to your design.
Note: The assessment report, including a copy of the programme, must be converted to a PDF file before submission. The code must be submitted separately as an .ipynb file.
Learning Outcomes
Critically apply skills, techniques, and knowledge from a range of data analysis methods and algorithms for enhancing and solving problems in various domains.
7.2 Develop abstract thinking and design ability to analytically demonstrate concepts relating to data science.
7.3 Use research-based knowledge for the design of experiments, analysis, and interpretation of data to provide valid results.
7.4 Critically evaluate and analyse advanced data science topics and concepts, and implement them in the workplace.
7.5 Identify and implement appropriate programming and software tools to critically analyse big data applications in the workplace.
7.8 Critically analyse the data and apply predictive modelling techniques in the field of Machine Learning and Artificial Intelligence.
7.9 Critique legal, social, and ethical issues within the field of data science and applicable ancillary sectors, as applied to contemporary research and industrial practice.