Reference no: EM132871137, Length: 2500 words
KF7032 Big Data and Cloud Computing - Northumbria University
Big Data and Cloud Computing
Aims
The aim of this assignment is to introduce a practical application of Big Data and Cloud Computing using a realistic big data problem. Students will implement a solution using an industry-leading Cloud Computing provider together with an appropriate distributed processing environment such as Apache Spark. This will involve provisioning and configuring appropriate Cloud Computing resources and selecting algorithms and visualization methods appropriate to the problem.
Learning Outcome 1: Apply big data analytic algorithms, including those for visualization and cloud computing techniques to multi-terabyte datasets.
Learning Outcome 2: Critically assess data analytic and machine learning algorithms to identify those that satisfy given big data problem requirements.
Learning Outcome 3: Critically evaluate and select appropriate big data analytic algorithms to solve a given problem, considering the processing time available and other aspects of the problem.
Learning Outcome 4: Design and develop advanced big data applications that integrate with third-party cloud computing services.
Learning Outcome 5: Critically assess and interpret primary research to identify its applicability to a given big data problem scenario.
Big Data Product: Burglary Protection
In this scenario you are a data scientist working with a marketing consultancy. Your client is an insurance company that is developing a highly segmented home insurance product. Since it is hypothesized that customers who live in an area where burglary is prevalent would be more interested in a new insurance policy, the company would like to find out whether burglary is more frequent in particular areas of England. If that is the case, the company needs to determine whether these are areas of affluence, where a premium policy with high benefits could be sold, or areas of relative deprivation, where a low-cost economy policy with proportionately lower pay-outs would be more appropriate.
To solve this problem, you will use publicly available data sets that have been prepared for you and placed in Amazon S3. These include (but are not limited to):
1. Street Level Crime Data: Published by the UK Home Office, this dataset contains 19 million rows, each giving a crime type together with its location as a latitude and longitude.
2. Land Registry Price Paid Data: This gives the postcode of a property, the property type from an enumeration of D (Detached), S (Semi-Detached), T (Terraced), F (Flats/Maisonettes), and the price paid.
3. English Indices of Deprivation Data: The English Indices of Deprivation 2010 data set contains rankings of measures of deprivation at small-area level across England. The 32,000 localities are ranked from the least to the most deprived and scored on seven different dimensions of deprivation.
4. Postcode Data: This data set, provided by the Ordnance Survey, gives a latitude and longitude for every postcode. It is useful for relating the postcodes in the Land Registry Price Paid dataset to the latitude/longitude locations in the crime dataset.
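The link described in item 4 can be pictured as a nearest-neighbour lookup: given the latitude and longitude of a crime, find the postcode whose coordinates are closest. The following is a minimal sketch in plain Python, not the prescribed method. The function names and the sample rows are hypothetical, and the haversine distance is one reasonable choice of metric among several.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_postcode(lat, lon, postcodes):
    """Return the postcode closest to (lat, lon).

    `postcodes` is an iterable of (postcode, lat, lon) tuples.
    This is a brute-force scan over every postcode.
    """
    closest = min(postcodes, key=lambda p: haversine_km(lat, lon, p[1], p[2]))
    return closest[0]


# Hypothetical sample rows, NOT taken from the real Postcode dataset.
SAMPLE_POSTCODES = [
    ("AA1 1AA", 54.9780, -1.6100),
    ("BB2 2BB", 51.5000, -0.1200),
    ("CC3 3CC", 53.4800, -2.2400),
]
```

A brute-force scan like this compares every crime with every postcode, which is unlikely to be practical for 19 million crime rows. On the full data one would probably need to restrict the candidates first, for example by grouping points into coarse grid cells before comparing distances.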
Specifics
1. Process the data prepared for you using Apache Spark.
2. Filter the dataset so that crimes refer to Burglary only.
3. Using appropriate software, determine whether Burglary is more closely associated with areas of affluence, relative deprivation, or neither.
4. Select and prepare no more than three visualizations to support your analytic findings from (3).
5. Explain the reasoning behind your code so that it is clear what each block is intended to achieve, and why.
6. Report critically on the advantages, disadvantages, and limitations of the methods used.
7. Your submission will be a Jupyter Notebook containing both code (typically Python) and explanatory text (Markdown), limited to 2500 words (plus references). References from the scientific literature must be used, and your discussion must be in your own words.
Harvard Referencing
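Steps 2 and 3 of the Specifics can be illustrated on a toy scale as follows: keep only Burglary rows, count them per area, and measure the association between those counts and a deprivation rank. This is a minimal sketch in plain Python rather than Spark. The area codes, the sample rows and the choice of Spearman rank correlation are assumptions made for illustration, not requirements of the brief.

```python
from collections import Counter


def burglary_counts(crimes):
    """Count rows whose crime type is 'Burglary', grouped by area code.

    `crimes` is an iterable of (area_code, crime_type) pairs.
    """
    return Counter(area for area, crime_type in crimes
                   if crime_type == "Burglary")


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical sample data, NOT taken from the real datasets.
SAMPLE_CRIMES = [
    ("A", "Burglary"), ("A", "Burglary"), ("A", "Burglary"),
    ("B", "Burglary"), ("B", "Burglary"), ("B", "Vehicle crime"),
    ("C", "Burglary"), ("C", "Drugs"),
]
SAMPLE_DEPRIVATION_RANK = {"A": 1, "B": 2, "C": 3}

counts = burglary_counts(SAMPLE_CRIMES)
areas = sorted(SAMPLE_DEPRIVATION_RANK)
rho = spearman([counts[a] for a in areas],
               [SAMPLE_DEPRIVATION_RANK[a] for a in areas])
```

How the sign of `rho` should be read depends on which end of the deprivation ranking is rank 1, so that should be checked against the dataset documentation before drawing conclusions. In Spark the filter and the per-area count would normally be expressed as DataFrame operations over the full data; the logic is the same.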