Research and compare various techniques for organizing data

Reference no: EM132357664

Big Data Technologies Assignment - Data Lake Architecture

In this assignment you will explore the management of big data using Data Lake technology. This Assessment Task relates to the following Learning Outcomes:

  • Obtain a high level of technical competency in standard and advanced methods for big data technologies.
  • Understand the current status of and recognize future trends in big data technologies.
  • Develop a competency with emerging big data technologies, applications and tools.

Part 1 - Data Lake Components

In the lecture, you were introduced to the high-level concepts of the whys and whats of a Data Lake. The goal of this assignment is to take a deep dive into the architecture of a Data Lake and provide a Design Pattern for the problem of organizing a collection of datasets that holds a vast amount of data gathered from various private/open data islands. Your design should specify the following components in some detail:

Data Ingestion Component:

a. You need to research and identify the different types of data (from structured to unstructured) and data ingest (e.g., batch, micro-batch, real-time), and briefly explain them.

b. Identify the existing Big Data Technologies and Tools for ingesting big data, e.g., Hortonworks DataFlow.
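
The three ingest modes named in (a) differ mainly in how many records are handed to the lake per write. The following is a minimal sketch in plain Python, assuming that a list of writes stands in for the lake. The function names (ingest_batch, ingest_micro_batch, ingest_stream) are hypothetical and are not taken from any ingestion tool.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def ingest_batch(records: Iterable[dict]) -> List[List[dict]]:
    """Batch: collect the whole source, then write it once."""
    return [list(records)]


def ingest_micro_batch(records: Iterable[dict], size: int) -> List[List[dict]]:
    """Micro-batch: write small fixed-size groups as they fill up."""
    it: Iterator[dict] = iter(records)
    writes = []
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            break
        writes.append(chunk)
    return writes


def ingest_stream(records: Iterable[dict]) -> List[List[dict]]:
    """Real-time: write each record as soon as it arrives."""
    return [[r] for r in records]


# Hypothetical events; each inner list below represents one write to the lake.
events = [{"id": i} for i in range(5)]
batch_writes = ingest_batch(events)
micro_writes = ingest_micro_batch(events, size=2)
stream_writes = ingest_stream(events)
print(len(batch_writes), len(micro_writes), len(stream_writes))
```

The same five events produce one write in batch mode, three in micro-batch mode, and five in real-time mode, which is the trade-off between latency and per-write overhead that the comparison should discuss.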

Data Organization Component:

a. You need to research and compare various techniques for organizing data, e.g., Directory Structure, Version Control and Database Management Systems.

b. Identify the existing Database Management Systems for each category, e.g. MySQL in Relational DBs and MongoDB in NoSQL document-oriented DBs.
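
To make the relational versus document-oriented contrast in (b) concrete, here is a small sketch using only the Python standard library. It assumes an in-memory SQLite database as a stand-in for a relational DBMS such as MySQL, and a list of parsed JSON objects as a stand-in for a document store such as MongoDB. The table and field names are invented for illustration.

```python
import json
import sqlite3

# Relational: fixed schema, every row must match the declared columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT, source TEXT)")
conn.executemany(
    "INSERT INTO dataset (id, name, source) VALUES (?, ?, ?)",
    [(1, "tweets", "open"), (2, "invoices", "private")],
)
relational_open = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM dataset WHERE source = ? ORDER BY id", ("open",)
    )
]
conn.close()

# Document-oriented: each record is a self-describing JSON document,
# so fields may differ from one document to the next.
documents = [
    json.loads('{"id": 1, "name": "tweets", "source": "open", "lang": "en"}'),
    json.loads('{"id": 2, "name": "invoices", "source": "private"}'),
]
document_open = [d["name"] for d in documents if d.get("source") == "open"]

print(relational_open, document_open)
```

Both queries return the same dataset name, but only the document form tolerates the extra "lang" field without a schema change.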

Data Security and Governance Component:

a. You need to research and identify the requirements for governing the right data access and the rights for defining and modifying data.

b. Identify the existing trust, security, and privacy issues in Big Data.
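
The access requirements in (a) separate the right to read data from the rights to define and modify it. One possible way to express that is a role-based check, sketched below. The roles and the role-to-rights mapping are assumptions made for illustration and are not prescribed by the assignment.

```python
from enum import Flag, auto


class Right(Flag):
    READ = auto()
    DEFINE = auto()
    MODIFY = auto()


# Hypothetical role-to-rights mapping, for illustration only.
ROLE_RIGHTS = {
    "analyst": Right.READ,
    "steward": Right.READ | Right.MODIFY,
    "owner": Right.READ | Right.DEFINE | Right.MODIFY,
}


def is_allowed(role: str, needed: Right) -> bool:
    """Return True only if the role holds every right in `needed`."""
    granted = ROLE_RIGHTS.get(role, Right(0))  # unknown roles get no rights
    return (granted & needed) == needed


print(is_allowed("analyst", Right.READ))
print(is_allowed("analyst", Right.MODIFY))
print(is_allowed("owner", Right.DEFINE | Right.MODIFY))
```

Denying unknown roles by default is a design choice made here; a governance component could equally be driven by per-dataset policies rather than roles.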

Indexing and Search Component:

a. You need to research the topic of "Federated Search" and identify technologies that facilitate the simultaneous search of multiple searchable resources.

b. Identify the existing Big Data Technologies and Tools for indexing and searching the big data: e.g., Elasticsearch and some research outcomes.
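
The idea behind federated search in (a) is that one query is sent to several searchable resources at the same time and the results are merged into a single ranked list. The sketch below illustrates this under simplifying assumptions: each resource is a toy in-memory collection scored by term frequency, and the names make_source and federated_search are hypothetical. It does not use Elasticsearch or any real search engine API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, str, int]  # (source name, document text, match count)


def make_source(name: str, docs: List[str]) -> Callable[[str], List[Hit]]:
    """Build a toy searchable resource that scores by term frequency."""
    def search(term: str) -> List[Hit]:
        term_l = term.lower()
        hits = []
        for doc in docs:
            count = doc.lower().split().count(term_l)
            if count:
                hits.append((name, doc, count))
        return hits
    return search


def federated_search(
    term: str, sources: Dict[str, Callable[[str], List[Hit]]]
) -> List[Hit]:
    """Send one query to every source at once and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        result_lists = list(pool.map(lambda s: s(term), sources.values()))
    merged = [hit for hits in result_lists for hit in hits]
    # Highest score first; ties are broken by source name for a stable order.
    return sorted(merged, key=lambda h: (-h[2], h[0]))


sources = {
    "papers": make_source("papers", ["data lake service", "graph pattern matching"]),
    "wiki": make_source("wiki", ["a data lake stores raw data", "federated search"]),
}
results = federated_search("data", sources)
for source, doc, score in results:
    print(source, score, doc)
```

Real resources rarely share a scoring scale, so a practical design would also need score normalization before merging; that step is omitted here.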

Analytics Component:

a. You need to research and compare the techniques for analysing the data (from structured to unstructured) and extracting insight from them.

b. Identify the existing Big Data Technologies and Tools for analysing the big data: SAS Tools (such as SAS Text-Analytics), Microsoft ML platform, Amazon ML Platform, and Apache Mahout.
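
As a small illustration of why the techniques in (a) differ across the structured-to-unstructured range, the sketch below aggregates a structured table directly and then counts terms in free text, which first has to be tokenized. The sample rows and sentence are invented, and the whitespace tokenizer is a deliberate simplification of what a text-analytics tool would do.

```python
from collections import Counter
from statistics import mean

# Structured: rows share known, typed columns, so aggregation is direct.
rows = [
    {"region": "north", "sales": 120},
    {"region": "south", "sales": 80},
    {"region": "north", "sales": 100},
]
north_mean = mean(r["sales"] for r in rows if r["region"] == "north")

# Unstructured: free text must be tokenized before anything can be counted.
text = "Data lakes store raw data. Raw data needs structure before analysis."
tokens = [w.strip(".,").lower() for w in text.split()]
top_terms = Counter(tokens).most_common(2)

print(north_mean, top_terms)
```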

Visualization Component:

a. You need to research and identify the techniques for visualizing the data.

b. Identify the existing Big Data Technologies and Tools for visualizing the big data: e.g., SAS Visual Analytics. Other examples include D3.js and vis.js.

Part 2 - Data Lake Architecture

Design Patterns are formalized best practices that one can use to solve common problems when designing a system. Refer to the Data Lake components in Part 1, and propose a Data Lake architecture for the problem of graph search in big graph databases. Read the following papers to gain an understanding of a typical Data Lake architecture and of graph-based search:

1. A. Beheshti, B. Benatallah, R. Nouri, V. Chhieng, H. Xiong, and X. Zhao, CoreDB: a Data Lake Service. Conference on Information and Knowledge Management (CIKM) 2017.

2. G. Sun, G. Liu, Y. Wang, M. A. Orgun, and X. Zhou: Incremental Graph Pattern based Node Matching, IEEE International Conference on Data Engineering (ICDE) 2018.

Attachment:- Big Data Technologies Assignment File.rar

ITEC874 Big Data Technologies Assignment - Data Lake Architecture, Macquarie University, Australia.

Reviews

len2357664

8/14/2019 12:44:18 AM

What to Submit: A single file (Word or PDF) with the name “YourStudentNo+ITEC874A1”.

Total Mark: 100

Part 1. Data Lake Components (60 Marks): For Part 1, you will need to provide 5 tools/technologies for parts a and b of each component. You will need to provide references for the tools and papers. You will need to briefly (in no more than 1 paragraph) discuss and explain each tool/technology.

Part 2. Data Lake Architecture (40 Marks): For Part 2, you will need to draw the architecture (you can use any preferred tool) and provide details of your proposed architecture in no more than 2 pages (including the proposed architecture and the details).

Marking Guideline:

Part 1. (60 Marks, 10 Marks for Each Question)

  • [2.5 Marks] You need to list the name of each tool/technology/method, at 0.5 marks each. If you provide 5 or more, you get the full mark, i.e., 2.5 marks.
  • [7.5 Marks] You need to give a comprehensive explanation of each tool/technology/method, at 1.5 marks each. You need to explain what it is, how it works, and when it is used (at least three sentences). If you miss one aspect, you will lose 0.5 marks. If you provide 5 or more, you will get the full mark, i.e., 7.5 marks.
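
The Part 1 arithmetic can be checked with a short sketch. It follows the figures in the guideline (0.5 marks per listed tool, 1.5 marks per explanation, 0.5 marks lost per missing aspect) and makes one assumption the guideline does not state: when more than five tools are listed, only the five best-explained ones are counted. The function name question_mark is hypothetical.

```python
from typing import List

NAME_MARK = 0.5     # per listed tool, from the guideline
ASPECT_MARK = 0.5   # 1.5 marks per tool spread over 3 aspects (what, how, when)
TOOLS_COUNTED = 5   # five or more tools earn the full mark


def question_mark(aspects_per_tool: List[int]) -> float:
    """Mark out of 10 for one Part 1 question.

    aspects_per_tool holds, for each listed tool, how many of the three
    aspects (what it is, how it works, when it is used) were explained.
    Assumption: only the five best-explained tools are counted.
    """
    counted = sorted(aspects_per_tool, reverse=True)[:TOOLS_COUNTED]
    names = NAME_MARK * len(counted)
    explanations = sum(ASPECT_MARK * min(max(a, 0), 3) for a in counted)
    return names + explanations


full = question_mark([3, 3, 3, 3, 3])   # five tools, all aspects covered
partial = question_mark([3, 3, 2, 3])   # four tools, one aspect missing
print(full, partial)
```

Under these assumptions, five fully explained tools earn 10 marks, while four tools with one missing aspect earn 2.0 for the names plus 5.5 for the explanations, i.e., 7.5 marks.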

Part 2. (40 Marks)

[15 Marks] Draw the data lake architecture. You need to use a tool to draw the architecture, e.g., Visio or OmniGraffle.

  • [9 Marks] You need to draw all 6 data lake components discussed in Part 1 (1.5 marks each, 9 marks in total).
  • [4 Marks] You need to draw the relations between the components.
  • [2 Marks] You need to provide a clear layout and a high-resolution figure.

[25 Marks] The details of your proposed architecture.

  • [18 Marks] You need to provide the details of each of the 6 components (3 marks each, 18 marks in total). You need to explain what it is, how it works, and when it is used (at least three sentences). If you miss one aspect, you will lose 1 mark.
  • [6 Marks] You need to provide a description of the workflow of the architecture, namely how the data lake works when a query arrives, including the input and output of each of the 6 components (1 mark each, 6 marks in total).
  • [1 Mark] You need to provide a well-structured description, e.g., you could use a bulleted list to organize the structure of the paragraphs, or use bold and/or italic fonts to highlight some contents.

Late Submission: No extensions will be granted without an approved application for Special Consideration. A deduction of 10% of the total available marks (10 marks for the assignment, scaled to 1 mark in your final grade) will be made from the total awarded mark for each 24-hour period or part thereof that the submission is late. For example, a submission that is 25 hours late incurs a 20% penalty (20 marks deducted, scaled to 2 marks in your final grade). No submission will be accepted after solutions have been posted.
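
The late penalty rule is a ceiling calculation: every started 24-hour period costs 10 of the 100 available marks. A minimal sketch follows; the function name late_penalty is hypothetical, and capping the deduction at the total mark is an assumption not stated in the rule.

```python
import math

TOTAL_MARKS = 100
PENALTY_PER_PERIOD = 10  # 10% of the 100 available marks


def late_penalty(hours_late: float) -> int:
    """Marks deducted for a submission that is `hours_late` hours late."""
    if hours_late <= 0:
        return 0
    periods = math.ceil(hours_late / 24)  # "or part thereof" rounds up
    # Assumption: the deduction never exceeds the total available marks.
    return min(periods * PENALTY_PER_PERIOD, TOTAL_MARKS)


print(late_penalty(25))  # two started periods, matching the 20-mark example
```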

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd