Distributed big data computing frameworks assessment

Reference no: EM134013346

Big Data

Assessment: Case Study Analysis Report

Objectives

This assessment item relates to the unit learning outcomes listed in the unit descriptor. It is designed to improve students' research and writing skills and to give them experience researching journal literature on a specific topic relevant to the unit. Students are required to conduct a comprehensive literature review addressing contemporary issues and challenges in Big Data analytics in information design, and to analyze how data mining architecture helps in various areas. In addition, students must critically analyze academic journal articles and apply their findings through a structured case study analysis.

Assessment description

This assessment will be completed individually. You are required to conduct an in-depth, research-based analysis of real-world examples of Big Data drawn from academic journals, research publications and industry reports in various areas such as education, health care and finance. You will analyze how big data and its methods are applied to solve complex issues and challenges.

You can choose any 3-4 research papers on the same technology/area with the help of your tutor. Each student must decide on a topic and email it to their tutor for approval within 2 weeks. This assessment is worth 20% of the unit's grade.

The deadline for submitting the draft is in week 4, and it will be structured into the following sections:

Introduction
Literature review
Case Study Analysis
Architecture, Tools and methodology
Findings & discussions
References

The final submission of your paper is due in week 4.

Assignment 2

Tableau & Splunk Project

The assignment consists of a 2000-word research report, due in Week 8, with a weighting of 30%.

Problems to be addressed in the report

Higher education institutions and organizations hold large amounts of heterogeneous data exhibiting volume, velocity, variety, variability and value. You will design and implement scalable solutions to support strategic and operational decision making for student information systems (student retention, course viability, resource allocation).

Part A: Scenario: Multi-campus university big data analytics

A university operating across multiple campuses in Australia processes:

10+ years of student enrolment data
Learning Management System (LMS) logs
Financial records
Attendance and engagement data
Demographic information

The university is experiencing issues such as slow retention analysis, no predictive attrition modelling, fragmented reporting systems, no real-time academic performance insights, and increasing data volume each semester.

Your responsibilities include analyzing the data, applying any required transformations, and facilitating the extraction of valuable insights from the processed data.

You must design and implement a scalable high volume data solution.

Dataset: Use any student dataset from Kaggle

Task 1: Problem analysis

5Vs of Big Data
Data growth challenges
Processing bottlenecks
Limitations of traditional RDBMS systems
Batch vs real-time requirements

Task 2: Distributed processing

Load large-scale dataset (Kaggle student dataset or simulated multi-year dataset)
Perform distributed processing transformations:
Cleaning & preprocessing
Aggregations
Cohort analysis
Campus-wise enrolment growth
Implement: Partitioning strategy, Caching strategy
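
As an illustration of the partition-then-aggregate pattern behind Task 2, the following is a minimal sketch in plain Python, not a distributed implementation. It assumes hypothetical enrolment records of the form (student_id, campus, year); the campus names, the records and the helper names are all invented for illustration. Records are partitioned by campus, counted locally within each partition, and the partial counts are merged before campus-wise year-on-year growth is computed. A framework such as Spark would run the same steps across worker nodes.

```python
from collections import Counter, defaultdict
from functools import reduce

# Hypothetical enrolment records: (student_id, campus, year).
RECORDS = [
    ("s1", "Sydney", 2022), ("s2", "Sydney", 2022),
    ("s3", "Melbourne", 2022), ("s1", "Sydney", 2023),
    ("s2", "Sydney", 2023), ("s4", "Sydney", 2023),
    ("s3", "Melbourne", 2023), ("s5", "Melbourne", 2023),
    ("s6", "Melbourne", 2023),
]


def partition_by_campus(records, num_partitions):
    """Assign each record to a partition based on its campus key."""
    partitions = defaultdict(list)
    for rec in records:
        # A character-sum hash keeps the assignment reproducible across runs.
        index = sum(ord(ch) for ch in rec[1]) % num_partitions
        partitions[index].append(rec)
    return list(partitions.values())


def count_partition(partition):
    """Local aggregation: enrolments per (campus, year) in one partition."""
    return Counter((campus, year) for _, campus, year in partition)


def enrolment_growth(records, num_partitions=2):
    """Merge per-partition counts, then compute year-on-year growth per campus."""
    partials = [count_partition(p)
                for p in partition_by_campus(records, num_partitions)]
    totals = reduce(lambda a, b: a + b, partials, Counter())
    growth = {}
    for (campus, year), count in totals.items():
        previous = totals.get((campus, year - 1))
        if previous:  # skip the first year, which has no baseline
            growth[(campus, year)] = (count - previous) / previous
    return growth


growth = enrolment_growth(RECORDS)
```

Partitioning on the same key that the aggregation groups by means each campus is counted entirely within one partition, which is the property that lets a distributed engine avoid moving data between nodes for this query.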

Task 3: Visual analytics using Tableau

Create a dashboard showing:
Retention by program and enrolments
Demographic breakdown
Performance analysis & distribution

The research report must have the format:

Table of contents
Institute detail/information (Executive Summary)
Problem Identification
Implementation
Tableau visualisation report
Analysis and Discussion
Recommendations
References
(include diagrams, such as the big data architecture, wherever possible)

Part B:

In this part of the assessment, you must analyze the log details of the Student Enrollment System in Australia with Splunk.

Execute a search to identify all failed login attempts in the last 24 hours. Export the results as a report.
Identify the top 10 IP addresses generating the highest number of events. Present the results in tabular format.
Create a query to calculate the average response time per host.
Filter events that occurred between two specific timestamps and display only the host and source fields.
Take a screenshot of your 'Activity Jobs Menu' showing the current saved job with its expiration date.
Take a screenshot of your search history. Set a filter to narrow down your search results.

Show where the selected fields, interesting fields and all fields are located. How do you use fields to perform a search?
How do you add a time range when performing a search?
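
To make the expected results of the Part B searches concrete, here is a minimal Python sketch of the logic behind three of them: filtering failed logins, ranking IP addresses by event count, and averaging response time per host. It is not Splunk search syntax, and the event records and field names (host, clientip, action, status, response_ms) are assumptions made for illustration rather than the schema of any real log source.

```python
from collections import Counter, defaultdict

# Hypothetical parsed log events; the field names are assumptions.
EVENTS = [
    {"host": "web01", "clientip": "10.0.0.5", "action": "login",
     "status": "failure", "response_ms": 120},
    {"host": "web01", "clientip": "10.0.0.5", "action": "login",
     "status": "success", "response_ms": 80},
    {"host": "web02", "clientip": "10.0.0.7", "action": "login",
     "status": "failure", "response_ms": 200},
    {"host": "web02", "clientip": "10.0.0.5", "action": "enrol",
     "status": "success", "response_ms": 100},
]


def failed_logins(events):
    """Keep only login events whose status is 'failure'."""
    return [e for e in events
            if e["action"] == "login" and e["status"] == "failure"]


def top_ips(events, n=10):
    """Return the n client IPs with the most events as (ip, count) pairs."""
    return Counter(e["clientip"] for e in events).most_common(n)


def avg_response_per_host(events):
    """Average the response time (in ms) over the events of each host."""
    totals = defaultdict(lambda: [0, 0])  # host -> [sum of ms, event count]
    for e in events:
        totals[e["host"]][0] += e["response_ms"]
        totals[e["host"]][1] += 1
    return {host: total / count for host, (total, count) in totals.items()}


failures = failed_logins(EVENTS)
ranking = top_ips(EVENTS)
averages = avg_response_per_host(EVENTS)
```

Checking a Splunk result against a small hand-computed sample like this is one way to confirm that a search filters and groups on the intended fields.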

This assignment consists of two integrated parts:

Part A: Tableau Research Report (Business Data Analysis)
Part B: Splunk Practical Exercises (Network Log Analysis)

Screenshot requirements:

Screenshots must clearly show the full Splunk interface (not cropped too tightly).
Your Student ID and the current date must be visible on every screenshot.
Screenshots should be pasted directly into the report under each task (a-h), followed by your short explanation.
Screenshots without Student ID visible will not be accepted.

Additional information regarding this Assessment:

Report document standards

Normal font is Calibri, size 11 point for the body of all documents with the text fully justified.
Headings should not exceed 14 points in size except on a title page where larger fonts are appropriate for the title of a report.
Documents should use 1.15 spacing within a paragraph and have an 8-point space between paragraphs.
The report should have a footer that includes the page number.
Up to 15% of the report contents may be quoted or paraphrased from other sources, provided you acknowledge and cite the original source of the material you use.
Use IEEE referencing for all quoted or paraphrased material.

Assessment 3: Distributed Big Data Computing Frameworks Assessment

This assessment focuses on understanding big data frameworks and real-world applications. Students aim to enhance research, analytical, teamwork, and communication skills through critical evaluation of contemporary big data technologies.

You will work in groups of 3-4
Apply theoretical knowledge to real-world applications
Critically evaluate different frameworks and analyze their strengths and limitations
Collaborate effectively within the team to produce the expected results

You are a Big Data Consultant hired by an organization processing large-scale data in ONE of the following domains:

Financial transactions
Smart city sensor data
Healthcare patient records
E-commerce customer behavior

The organization is experiencing: Slow processing, Scalability limitations, Real-time analytics constraints, Data pipeline bottlenecks.

Your task is to:

Compare two distributed big data frameworks.
Conduct a case study analysis of the frameworks for one chosen industry.
Justify your framework selection criteria.
Evaluate performance, scalability, governance, and innovation aspects.

Framework selection (students can choose any two): Apache Hadoop, Apache Spark, Apache Kafka, Apache Flink.

Report Structure:

Introduction
Framework overview 1
Framework overview 2
Comparative Analysis
Processing model
Storage mechanisms
Resource management
Scalability
Performance
ML/AI integration approach
Framework selection justification
Cost efficiency
Data velocity characteristics
Fault tolerance
Scalability
Batch vs real time needs
(comparative evaluation table is mandatory)
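
One possible way to back the comparative evaluation table with an explicit selection rationale is a weighted decision matrix, sketched below in Python. The criteria are the ones listed above; the weights, the scores and the names "Framework A" and "Framework B" are hypothetical placeholders, not an assessment of any real framework. Each group would substitute values justified by its own research.

```python
# Hypothetical importance weights per criterion (higher = more important).
WEIGHTS = {
    "cost efficiency": 2,
    "data velocity": 3,
    "fault tolerance": 2,
    "scalability": 2,
    "batch vs real-time fit": 1,
}

# Hypothetical scores on a 1-5 scale; placeholders, not measured results.
SCORES = {
    "Framework A": {
        "cost efficiency": 4, "data velocity": 2, "fault tolerance": 4,
        "scalability": 3, "batch vs real-time fit": 3,
    },
    "Framework B": {
        "cost efficiency": 3, "data velocity": 5, "fault tolerance": 4,
        "scalability": 4, "batch vs real-time fit": 4,
    },
}


def weighted_total(scores, weights):
    """Sum of weight * score over every criterion."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


# Rank frameworks from highest to lowest weighted total.
ranking = sorted(
    ((name, weighted_total(scores, WEIGHTS)) for name, scores in SCORES.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, total in ranking:
    print(f"{name}: {total}")
```

Stating the weights explicitly makes the justification auditable: a reader can see which criteria drove the recommendation and how the ranking would change under different organizational priorities.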

Organizational recommendation (case analysis)

Which framework is suitable and why?
Business benefits
Improved reporting
Risks and challenges
(like data privacy, implementation, staff, budget)
Future recommendations
(AI/cloud integration plan, predictive analytics)

Present architecture comparison
Justify selected frameworks for case study
Demonstrate understanding (not reading slides)
Slides: 12-15 slides maximum
