Reference no: EM133991203
Big Data
Assessment details
Assessment: Case Study Analysis Report
Objectives
This assessment item relates to the unit learning outcomes listed in the unit descriptor. It is designed to improve students' research and writing skills and to give students experience researching (in research journals) a specific topic relevant to the subject/unit. Students are required to conduct a comprehensive literature review addressing contemporary issues and challenges in Big Data analytics in information design, and to analyze how big data architecture helps in various areas. In addition, students must critically analyze academic journal articles and apply their findings through a structured case study analysis.
Assessment description
This assessment will be completed individually. You are required to conduct an in-depth, research-based analysis of real-world examples of Big Data drawn from academic journals, research publications, and industry reports in various areas such as education, healthcare, and finance. You will analyze how big data architectures and methods are applied to solve complex issues and challenges.
You may choose any 3-4 research papers on the same technology/area, with the help of your tutor. Each student must decide on a topic and email it to their tutor for approval within 2 weeks. This assessment is worth 20% of the unit's grade.
The draft is due in week 4 and must be structured into the following sections:
Introduction
Literature review
Case Study Analysis
Architecture, Tools and methodology
Findings & discussions
References
The final submission of your paper is due in week 4
The final submission should be no less than 1500 words.
Please include at least 10 references; the report should follow IEEE format.
Assignment 2 Overview
Assessment 2
Tableau & Splunk Project
The assignment consists of a 2000-word research report, due in Week 8 and worth 30% of the unit's grade.
Problems to be addressed in the report
Higher education institutions and organizations hold high volumes of heterogeneous data characterized by volume, velocity, variety, variability, and value. You will design and implement scalable solutions to support strategic and operational decision making for student information systems (student retention, course viability, resource allocation).
Part A Scenario: Multi-Campus University Big Data Analytics
A university operating across multiple campuses in Australia processes:
10+ years of student enrolment data
Learning Management System (LMS) logs
Financial records
Attendance and engagement data
Demographic information
The university is experiencing issues such as:
Slow retention analysis
No predictive attrition modelling
Fragmented reporting systems
No real-time academic performance insights
Increasing data volume each semester
Your responsibilities include analyzing the data, applying any required transformations, and facilitating the extraction of valuable insights from the processed data.
Dataset: use any student dataset from Kaggle.
Task 1: Problem analysis
5Vs of Big Data
Data growth challenges
Processing bottlenecks
Limitations of traditional RDBMS systems
Batch vs real-time requirements
Task 2: Distributed processing
Load large-scale dataset (Kaggle student dataset or simulated multi-year dataset)
Perform distributed transformations:
Cleaning & preprocessing
Aggregations
Cohort analysis
Campus-wise enrolment growth
Implement: Partitioning strategy, Caching strategy
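To make the transformation steps above concrete, here is a minimal, purely illustrative Python sketch of the cleaning and campus-wise aggregation logic on a few hypothetical records. In the actual assignment these operations would run as distributed PySpark DataFrame transformations, with `repartition()` for the partitioning strategy and `cache()` for the caching strategy; the field names below are assumptions, not taken from any specific Kaggle dataset.

```python
from collections import defaultdict

# Hypothetical records; a real run would load the chosen Kaggle student dataset.
rows = [
    {"student_id": "s1", "campus": "Sydney",    "year": 2022, "status": "enrolled"},
    {"student_id": "s2", "campus": "Sydney",    "year": 2023, "status": "enrolled"},
    {"student_id": "s3", "campus": "Melbourne", "year": 2023, "status": " Enrolled "},
    {"student_id": "s3", "campus": "Melbourne", "year": 2023, "status": "enrolled"},  # duplicate
]

def clean(rows):
    """Cleaning & preprocessing: normalise text fields and drop
    duplicate (student_id, year) records."""
    seen, out = set(), []
    for r in rows:
        key = (r["student_id"], r["year"])
        if key in seen:
            continue
        seen.add(key)
        out.append({**r, "status": r["status"].strip().lower()})
    return out

def campus_growth(rows):
    """Aggregation / cohort analysis: enrolment counts per campus per year,
    where (campus, year) acts as the cohort grouping key."""
    counts = defaultdict(int)
    for r in clean(rows):
        counts[(r["campus"], r["year"])] += 1
    return dict(counts)

print(campus_growth(rows))
```

In PySpark, `campus_growth` maps onto `df.groupBy("campus", "year").count()`; the point of the sketch is only to show what the grouping key and the deduplication rule are.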
Task 3: Visual analytics using Tableau
Create a dashboard showing:
Retention by program and enrolments
Demographic breakdown
Performance analysis & distribution
The research report must have the format:
Table of contents
Institute detail/information (Executive Summary)
Problem Identification
Implementation
Tableau visualisation report
Analysis and Discussion
Recommendations
References
Part B:
In this part of the assessment, you must analyze the log details of a Student Enrollment System in Australia using Splunk.
Execute a search to identify all failed login attempts in the last 24 hours. Export the results as a report.
Identify the top 10 IP addresses generating the highest number of events. Present the results in tabular format.
Create a query to calculate the average response time per host.
Filter events that occurred between two specific timestamps and display only the host and source fields.
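One possible shape for each of these searches is sketched below. The index name `enrol_logs`, the field names (`src_ip`, `response_time`), and the example timestamps are assumptions that depend on how the logs were onboarded; the (a)-(d) labels are annotations, not part of the queries.

```
(a) Failed login attempts in the last 24 hours:
    index=enrol_logs "failed login" earliest=-24h

(b) Top 10 IP addresses by event count, in tabular form:
    index=enrol_logs | top limit=10 src_ip

(c) Average response time per host:
    index=enrol_logs | stats avg(response_time) AS avg_response_time BY host

(d) Events between two timestamps, showing only host and source:
    index=enrol_logs earliest="06/01/2025:00:00:00" latest="06/07/2025:00:00:00"
    | table host source
```

For (a), the results can be exported as a report from the search's Save As menu once the search has run.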
Take a screenshot of your 'Activity > Jobs' menu showing the current job saved with its expiration date.
Take a screenshot of your search history. Set a filter to narrow down your search results.
Show where Selected Fields, Interesting Fields, and All Fields are located. Explain how you use fields to perform a search.
Explain how you add a time range when performing a search.
This assignment consists of two integrated parts:
Part A: Tableau Research Report (Business Data Analysis)
Part B: Splunk Practical Exercises (Network Log Analysis)
Screenshot requirements:
Screenshots must clearly show the full Splunk interface (not cropped too tightly).
Your Student ID and the current date must be visible on every screenshot.
Screenshots should be pasted directly into the report under each task (a-h), followed by your short explanation.
Screenshots without Student ID visible will not be accepted.
Assessment 3: Distributed Big Data Computing Frameworks Assessment
This assessment focuses on understanding big data frameworks and their real-world applications. It aims to enhance students' research, analytical, teamwork, and communication skills through the critical evaluation of contemporary big data technologies.
You will be working in groups of 3-4. In this assessment you will:
Apply theoretical knowledge to real world applications
Critically evaluate different frameworks and analyze their strengths and limitations
Collaborate effectively within a team to produce the expected results
You are a Big Data Consultant hired by an organization processing large-scale data in ONE of the following domains:
Financial transactions
Smart city sensor data
Healthcare patient records
E-commerce customer behavior
The organization is experiencing: Slow processing, Scalability limitations, Real-time analytics constraints, Data pipeline bottlenecks
Your task is to:
Compare two distributed big data frameworks.
Conduct a case study analysis of the frameworks applied to your chosen industry
Justify your framework selection criteria
Evaluate performance, scalability, governance, and innovation aspects.
Framework selection (students can choose any two): Apache Hadoop, Apache Spark, Apache Kafka, Apache Flink.
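To ground the "processing model" and "batch vs real-time" comparison points below, here is an illustrative stdlib Python sketch contrasting the two models: a batch-style computation that needs the whole dataset up front (the Hadoop MapReduce / Spark batch model) versus a streaming-style computation that updates state one event at a time (the Flink / Kafka Streams model). The event data is invented for the example.

```python
from collections import defaultdict

# Hypothetical (event_type, latency_ms) events from one of the scenario domains.
events = [("checkout", 120), ("search", 15), ("checkout", 90), ("search", 10)]

def batch_avg(events):
    """Batch model: the full dataset is available before computation starts,
    so results only exist after the whole job finishes."""
    totals, counts = defaultdict(int), defaultdict(int)
    for etype, latency in events:
        totals[etype] += latency
        counts[etype] += 1
    return {e: totals[e] / counts[e] for e in totals}

class StreamingAvg:
    """Streaming model: state is updated per event, so the current average
    is queryable at any moment without reprocessing history."""
    def __init__(self):
        self.totals = defaultdict(int)
        self.counts = defaultdict(int)

    def update(self, etype, latency):
        self.totals[etype] += latency
        self.counts[etype] += 1

    def current(self):
        return {e: self.totals[e] / self.counts[e] for e in self.totals}

stream = StreamingAvg()
for etype, latency in events:
    stream.update(etype, latency)

# Both models converge on the same answer; they differ in latency,
# fault-tolerance mechanics, and resource profile - the axes the report compares.
assert batch_avg(events) == stream.current()
```

The design point for the report: batch frameworks optimize throughput over a bounded dataset, while streaming frameworks optimize per-event latency over an unbounded one, which is why "data velocity characteristics" drives framework selection.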
Report Structure:
Introduction
Framework overview 1
Framework overview 2
Comparative Analysis
Processing model
Storage mechanisms
Resource management
Scalability
Performance
ML/AI integration approach
Framework selection justification
Cost efficiency
Data velocity characteristics
Fault tolerance
Scalability
Batch vs real-time needs
(comparative evaluation table is mandatory)
Organizational recommendation (case analysis)
Which framework is suitable and why?
Business benefits
Improved reporting
Risk and challenges
(like data privacy, implementation, staff, budget)
Future recommendations
(AI/cloud integration plan, predictive analytics)
Present architecture comparison
Justify selected frameworks for case study
Demonstrate understanding (not reading from slides). Slides: 12-15 maximum.