Write a Spark program that loads data and analyzes data quality

Reference no: EM131796182

Write a Spark program that loads the data, analyzes data quality, provides a summary report, and reports your findings. abc is an eCommerce company, and as such our analysts work using the language of eCommerce. Here are some terms used in this task description:

- Shopper - an individual using an eCommerce website

- Session - the experience of a shopper on an eCommerce website within a single continuous period of time (if a shopper visits a site multiple times, sessions are split wherever there was a break of at least 30 minutes)

- Conversion - a session that resulted in a purchase (one conversion can have multiple transactions)

- Marketing Strategy - any modification to an eCommerce website that is targeted to a shopper and is executed with the intent of increasing the likelihood of a purchase

Fields

ssid - session identifier: used to link logs between files. It is a key composed of three values in the following format: user_id:site_id:session_start_time (the session start time is taken from the client side).

st - server timestamp: the time at which a web request was recorded on the server side

gr - determines assignment of a session to a control or experiment group

ad - indicates which marketing strategy a shopper was exposed to

Data Assumptions

- A shopper can have more than one session (each session separated by at least a 30 min break)

- Each session should have exactly one session log

- There is one marketing strategy per session

- Each session has a corresponding features log
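
The assumptions above translate directly into data-quality checks. The following is a minimal plain-Python sketch of that logic, not the Spark program itself; in Spark the same checks would be grouped counts and joins on ssid. The record shape (lists of dicts with "ssid" and "ad" keys) and the function name `check_assumptions` are hypothetical, since the task does not describe the file layout.

```python
from collections import Counter, defaultdict


def check_assumptions(session_logs, feature_logs):
    """Return the ssids that violate each stated data assumption.

    session_logs: list of dicts with "ssid" and "ad" keys (hypothetical shape).
    feature_logs: list of dicts with an "ssid" key (hypothetical shape).
    """
    # "Each session should have exactly one session log"
    log_counts = Counter(row["ssid"] for row in session_logs)

    # "There is one marketing strategy per session"
    ads_per_session = defaultdict(set)
    for row in session_logs:
        ads_per_session[row["ssid"]].add(row["ad"])

    # "Each session has a corresponding features log"
    feature_ssids = {row["ssid"] for row in feature_logs}
    session_ssids = set(log_counts)

    return {
        "duplicate_session_logs": sorted(s for s, n in log_counts.items() if n > 1),
        "multiple_strategies": sorted(
            s for s, ads in ads_per_session.items() if len(ads) > 1
        ),
        "missing_features_log": sorted(session_ssids - feature_ssids),
        # Not a stated assumption, but worth reporting: features without a session.
        "orphan_features_log": sorted(feature_ssids - session_ssids),
    }
```

The size of each returned list, relative to the total number of sessions, is the kind of figure a data-quality report could quote.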

Report format

After loading the data, we expect you to summarize and group it and prepare:

1) a populated table (TSV format) with the following header:

Session start date at hourly granularity, site_id, gr, ad, browser, number of sessions, number of conversions, number of transactions, sum of revenue

Notes:

Each row will contain aggregated data (the key being the first five columns)

Session start date at hourly granularity: 1464742123 -> 2016-06-01 00:00 (UTC)
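
The table can be sketched in plain Python to pin down the logic before porting it to a Spark group-by. This sketch assumes the session start time in ssid is in epoch seconds (as in the example above), that each input record is one session carrying a transaction count and a revenue total, and that a session with at least one transaction counts as a conversion. The field names, column names, and function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

HEADER = ["session_start_hour", "site_id", "gr", "ad", "browser",
          "sessions", "conversions", "transactions", "revenue"]


def parse_ssid(ssid):
    """Split user_id:site_id:session_start_time into its three parts."""
    # rsplit keeps a user_id intact if it happens to contain a colon (assumption).
    user_id, site_id, start = ssid.rsplit(":", 2)
    return user_id, site_id, int(start)


def hour_bucket(epoch_seconds):
    """Truncate an epoch timestamp (seconds, assumed) to the hour in UTC."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.replace(minute=0, second=0, microsecond=0).strftime("%Y-%m-%d %H:%M")


def summarize(sessions):
    """Aggregate per-session records into TSV text keyed by the first five columns."""
    # value = [sessions, conversions, transactions, revenue]
    totals = defaultdict(lambda: [0, 0, 0, 0.0])
    for s in sessions:
        _, site_id, start = parse_ssid(s["ssid"])
        key = (hour_bucket(start), site_id, s["gr"], s["ad"], s["browser"])
        agg = totals[key]
        agg[0] += 1
        agg[1] += 1 if s["transactions"] > 0 else 0  # conversion = session with a purchase
        agg[2] += s["transactions"]
        agg[3] += s["revenue"]
    lines = ["\t".join(HEADER)]
    for key in sorted(totals):
        lines.append("\t".join(str(v) for v in key + tuple(totals[key])))
    return "\n".join(lines)
```

With these definitions, `hour_bucket(1464742123)` returns `"2016-06-01 00:00"`, matching the example in the notes.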

2) a list of means and standard deviations for each feature per every (site_id, ad) pair

Expected outcome

- Source code for Spark program to generate reports
- Report regarding data quality
- Reports with data summary
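
For the second report (means and standard deviations of each feature per (site_id, ad) pair), the grouping logic can be sketched in plain Python as follows. The input shape (dicts with "site_id", "ad", and a "features" dict of numeric values) and the function name are hypothetical, and the choice of the sample standard deviation rather than the population one is an assumption, since the task does not say which is expected.

```python
from collections import defaultdict
from statistics import mean, stdev


def feature_stats(rows):
    """Map (site_id, ad, feature_name) to (mean, sample standard deviation).

    rows: list of dicts with "site_id", "ad" and a "features" dict of
    numeric values (hypothetical shape, joined from the session and features logs).
    """
    values = defaultdict(list)
    for row in rows:
        for name, value in row["features"].items():
            values[(row["site_id"], row["ad"], name)].append(value)

    return {
        # The sample standard deviation is undefined for a single observation.
        key: (mean(v), stdev(v) if len(v) > 1 else None)
        for key, v in values.items()
    }
```

Groups with a single observation get `None` for the standard deviation, which is itself worth noting in the data-quality report.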
