Data engineering technical assignment - data validation and data flow
About this assignment - The objective of this assignment is to help us better understand your technical skills and the way you approach a problem. It's not designed to have a single right answer -- we're more interested in understanding your thought process. We're also looking for the standards and best practices you use in the code you write, and the design decisions you make, so don't be shy about submitting commentary and reflections on the assignment.
Context - At Arup we work a lot with transport data, so we're interested in data-centric services that tell us something about mobility in general, the transport network itself, and/or how it performs.
Assignment brief -
1. Take a look at this dataset about road traffic accidents in 2016.
2. The data contains location information in Eastings/Northings and Longitude/Latitude. Write some code in a language of your choice to validate this location data. Choose whatever validation methods/approaches you feel are appropriate.
3. Pick one of the following scenarios:
a. Managing the arrival of this data in (close to) real time and storing it for later analysis.
b. Distributing this data to a range of consumers, as it arrives.
4. Describe (in words, a sketch, or both) how you would architect a pipeline or service to achieve the task/scenario you chose in point 3 above.
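As an illustration of the validation asked for in point 2, here is a minimal sketch that applies range checks to each record. It is one possible approach, not a reference answer. The column names (`easting`, `northing`, `longitude`, `latitude`) are hypothetical, and the bounds assume the data uses the British National Grid and falls within a loose envelope around Great Britain; both would need checking against the real dataset. A fuller solution might also reproject Eastings/Northings to Longitude/Latitude with a projection library and compare the two pairs, which this sketch does not attempt.

```python
import math
from typing import Dict, List, Mapping, Optional, Tuple

# Assumed bounds (not taken from the brief): the British National Grid spans
# roughly 0-700 km east and 0-1300 km north; the longitude/latitude box is a
# loose envelope around Great Britain.
FIELD_RANGES: Dict[str, Tuple[float, float]] = {
    "easting": (0.0, 700_000.0),      # hypothetical column name
    "northing": (0.0, 1_300_000.0),   # hypothetical column name
    "longitude": (-9.0, 2.5),         # hypothetical column name
    "latitude": (49.0, 61.5),         # hypothetical column name
}


def parse_number(raw: object) -> Optional[float]:
    """Return raw as a finite float, or None if it cannot be parsed."""
    try:
        value = float(str(raw).strip())
    except (TypeError, ValueError):
        return None
    return value if math.isfinite(value) else None


def validate_location(row: Mapping[str, object]) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    errors: List[str] = []
    for field, (low, high) in FIELD_RANGES.items():
        raw = row.get(field)
        if raw is None or str(raw).strip() == "":
            errors.append(f"{field}: missing")
            continue
        value = parse_number(raw)
        if value is None:
            errors.append(f"{field}: not a finite number")
        elif not low <= value <= high:
            errors.append(f"{field}: {value} outside [{low}, {high}]")
    return errors
```

Returning a list of reasons rather than a single boolean keeps every failure visible, which makes it easier to report on data quality per field.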
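For scenario 3a, the core of an ingest-and-store pipeline can be sketched in a few lines. This is a toy stand-in under stated assumptions: an in-process `queue.Queue` takes the place of a real message broker, SQLite takes the place of the analytical store, and the table names (`accidents`, `rejects`) and function names are hypothetical. The validator is passed in as a callable that returns a list of reasons, so any validation approach can be plugged in.

```python
import json
import queue
import sqlite3
from typing import Callable, Dict, List

Record = Dict[str, str]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the two (hypothetical) tables if needed."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS accidents (payload TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rejects "
        "(payload TEXT NOT NULL, reasons TEXT NOT NULL)"
    )
    return conn


def drain(
    inbox: "queue.Queue[Record]",
    conn: sqlite3.Connection,
    validate: Callable[[Record], List[str]],
) -> Dict[str, int]:
    """Process everything currently queued; route bad records to rejects."""
    counts = {"stored": 0, "rejected": 0}
    while True:
        try:
            record = inbox.get_nowait()
        except queue.Empty:
            break
        reasons = validate(record)
        payload = json.dumps(record, sort_keys=True)
        if reasons:
            # Keep invalid records with their reasons instead of dropping them.
            conn.execute(
                "INSERT INTO rejects VALUES (?, ?)", (payload, json.dumps(reasons))
            )
            counts["rejected"] += 1
        else:
            conn.execute("INSERT INTO accidents VALUES (?)", (payload,))
            counts["stored"] += 1
    conn.commit()  # one commit per drained batch
    return counts
```

Routing invalid records to a separate table, rather than discarding them, keeps the raw input available for later inspection and reprocessing once validation rules change.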