Identify the challenges that will arise in integrating the data from different sources

Reference no: EM133545280, Length: 1,500 words

Big Data and Analytics

Assessment - Design Data Pipeline

Learning Outcome 1: Explain and evaluate the V's of Big Data (volume, velocity, variety, veracity, valence, and value);

Learning Outcome 2: Identify best practices in data collection and storage, including data security and privacy principles; and

Learning Outcome 3: Effectively report and communicate findings to an appropriate audience.

Task Summary

Critically analyse the online retail business case (see below) and write a 1,500-word report that:
a) Identifies various sources of data to build an effective data pipeline;
b) Identifies challenges in integrating the data from the sources and formulates a strategy to address those challenges; and
c) Describes a design for a storage and retrieval system for the data lake that uses commercial and/or open-source big data tools.

Context
A modern data-driven organisation must be able to collect and process large volumes of data and perform analytics on that data at scale. Establishing a data pipeline is therefore an essential first step in building a data-driven organisation. A data pipeline ingests data from various sources, integrates it and stores it in a ‘data lake’, making it available to everyone in the organisation.

This Assessment prepares you to identify potential sources of data, address challenges in integrating data and design an efficient ‘data lake’ using the big data principles, practices and technologies covered in the learning materials.

Case Study
Big Retail is an online retail shop in Adelaide, Australia. Its website, at which users can explore different products and promotions and place orders, has more than 100,000 visitors per month. During checkout, each customer has three options: 1) to log in to an existing account; 2) to create a new account if they have not already registered; or 3) to check out as a guest. Both the sales and marketing departments maintain customers' account information, each in its own database. The sales department also maintains records of the transactions in its database. The information technology (IT) department maintains the website.

Every month, the marketing team releases a catalogue and promotions, which are made available on the website and emailed to the registered customers. The website is static; that is, all the customers see the same content, irrespective of their location, login status or purchase history.

Recently, Big Retail has experienced a significant slump in sales, despite having a cost advantage over its competitors. A significant reduction in the number of visitors to the website and in the conversion rate (i.e., the percentage of visitors who ultimately buy something) has also been observed. To regain its market share and increase its sales, the management team at Big Retail has decided to adopt a data-driven strategy. Specifically, the management team wants to use big data analytics to enable a customised customer experience through targeted campaigns, a recommender system and product association.
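The conversion rate defined above can be computed as in the following minimal sketch. The function name and the buyer count are assumptions made for illustration; the case study states only the approximate visitor count.

```python
def conversion_rate(visitors: int, buyers: int) -> float:
    """Return the percentage of visitors who ultimately bought something."""
    if visitors < 0 or buyers < 0:
        raise ValueError("counts must be non-negative")
    if buyers > visitors:
        raise ValueError("buyers cannot exceed visitors")
    if visitors == 0:
        return 0.0  # no traffic, so no conversions to report
    return 100.0 * buyers / visitors


# Hypothetical month: 100,000 visitors, of whom 2,500 placed an order.
rate = conversion_rate(100_000, 2_500)
print(f"Conversion rate: {rate:.1f}%")
```

Tracking this figure per month, and later per campaign or customer segment, is one way the effect of a data-driven strategy could be measured.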

The first step in moving towards the data-driven approach is to establish a data pipeline. The essential purpose of the data pipeline is to ingest data from various sources, integrate the data and store the data in a ‘data lake’ that can be readily accessed by both the management team and the data scientists.
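The three stages named above (ingest, integrate, store) can be pictured with a small sketch. It is only an illustration under stated assumptions: the source names, field names, folder layout and file name are hypothetical, and a local temporary directory stands in for real data lake storage.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def ingest(sources):
    """Yield (source_name, record) pairs from every source."""
    for name, records in sources.items():
        for record in records:
            yield name, record


def integrate(tagged_records):
    """Lower-case field names and tag each record with its origin."""
    for name, record in tagged_records:
        row = {key.lower(): value for key, value in record.items()}
        row["source"] = name
        yield row


def store(rows, lake_root, load_date):
    """Append rows as JSON lines under <lake_root>/raw/<source>/<load_date>/."""
    written = 0
    for row in rows:
        folder = Path(lake_root) / "raw" / row["source"] / load_date.isoformat()
        folder.mkdir(parents=True, exist_ok=True)
        with open(folder / "part-0000.jsonl", "a", encoding="utf-8") as handle:
            handle.write(json.dumps(row) + "\n")
        written += 1
    return written


# Hypothetical in-memory sources standing in for real databases and web logs.
sources = {
    "sales": [{"Order_ID": 1, "Amount": 59.90}],
    "web_logs": [{"Page": "/promotions", "Visitor": "guest"}],
}
lake_root = tempfile.mkdtemp()
count = store(integrate(ingest(sources)), lake_root, date(2024, 1, 31))
print(f"Stored {count} records under {lake_root}")
```

Partitioning the stored files by source and load date is one common layout choice; a real design would replace the local folder with the chosen commercial or open-source storage layer.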

Task Instructions

Critically analyse the above case study and write a 1,500-word report. In your report, ensure that you:

- Identify the potential data sources that align with the objectives of the organisation's data-driven strategy. You should consider both the internal and external data sources. For each data source identified, describe its characteristics. Make reasonable assumptions about the fields and format of the data for each of the sources;

- Identify the challenges that will arise in integrating the data from different sources and that must be resolved before the data are stored in the ‘data lake’. Articulate the steps necessary to address these issues;

- Describe the ‘data lake’ that you designed to store the integrated data and make the data available for efficient retrieval by both the management team and data scientists. The system should be designed using a commercial and/or an open-source database, tools and frameworks. Demonstrate how the ‘data lake’ meets the big data storage and retrieval requirements; and

- Provide a schematic of the overall data pipeline. The schematic should clearly depict the data sources, data integration steps, the components of the ‘data lake’ and the interactions among all the entities.
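One integration challenge implied by the case study is that the sales and marketing departments each hold customer account information in separate databases, so the same customer may appear twice with conflicting details. The sketch below shows one possible way to reconcile such records. The field names (`email`, `name`, `phone`, `updated`), the choice of email as the matching key and the "latest non-empty value wins" rule are all assumptions made for illustration, not part of the brief.

```python
def normalise_email(email):
    """Trim whitespace and lower-case an email so equivalent addresses match."""
    return email.strip().lower()


def merge_customers(sales_rows, marketing_rows):
    """Merge rows keyed on normalised email.

    Rows are applied oldest first, so the most recently updated
    non-empty value for each field ends up in the merged record.
    """
    merged = {}
    all_rows = sorted(sales_rows + marketing_rows, key=lambda row: row["updated"])
    for row in all_rows:
        key = normalise_email(row["email"])
        record = merged.setdefault(key, {"email": key})
        for field, value in row.items():
            if field != "email" and value not in (None, ""):
                record[field] = value
    return merged


# Hypothetical rows; "updated" uses ISO dates, which sort correctly as strings.
sales = [
    {"email": "Ann@Example.com ", "name": "Ann Lee", "phone": "", "updated": "2024-01-10"},
]
marketing = [
    {"email": "ann@example.com", "name": "A. Lee", "phone": "0400 000 000", "updated": "2023-11-02"},
]
customers = merge_customers(sales, marketing)
print(customers["ann@example.com"])
```

Guest checkouts have no account record at all, so a rule like this would cover only registered customers; other matching keys and conflict rules are equally defensible as long as the report states them.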


BDA601 Big Data and Analytics, Torrens University Australia