Reference no: EM133764076, Length: 1900 words
Big Data
Learning outcome 1: Critically analyze a business objective and design and implement a database solution using an established methodology.
Learning outcome 2: Extract meaningful information from data sets using appropriate tools and techniques.
Learning outcome 3: Apply data visualization techniques to enable insightful information analysis for business decision-making, taking account of appropriate legal, ethical, and professional issues related to data.
Part A
The idea of Big Data has been introduced previously. Companies in many fields use Big Data for different goals, one of which is market research. It gives them insight into the market and helps them solve the issues that arise there. With respect to Big Data, answer the following questions:
Explain the 7 V's of Big Data in your own words, and give real-world examples of how Big Data is applied in today's computing industry (explain with diagrams).
Describe all four (4) types of table joins with real-world examples (the answers must include relevant tables, SQL code, and diagrams).
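As a rough illustration of the join question, the sketch below runs join queries against two tiny in-memory tables using Python's built-in sqlite3 module. It assumes that the "four types" are inner, left, right, and full outer joins, which the brief does not spell out. The customers and orders tables and every value in them are hypothetical. Because older SQLite versions lack RIGHT JOIN and FULL OUTER JOIN, those two are emulated here with a swapped LEFT JOIN and a UNION ALL; other database systems usually support them directly.

```python
import sqlite3

# Hypothetical sample tables, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cho');
    INSERT INTO orders VALUES (10, 1, 'laptop'), (11, 1, 'mouse'), (12, 4, 'desk');
""")

# Inner join: only customers that have at least one matching order.
INNER = """
    SELECT c.name, o.item FROM customers c
    INNER JOIN orders o ON o.customer_id = c.customer_id
"""

# Left join: every customer, with NULL for those without orders.
LEFT = """
    SELECT c.name, o.item FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.customer_id
"""

# Right join, emulated as a left join with the tables swapped:
# every order, with NULL where no customer matches.
RIGHT = """
    SELECT c.name, o.item FROM orders o
    LEFT JOIN customers c ON c.customer_id = o.customer_id
"""

# Full outer join, emulated as the left join plus the unmatched orders.
FULL = LEFT + " UNION ALL " + RIGHT + " WHERE c.customer_id IS NULL"


def run_join(sql):
    """Execute a join query and return its rows as a list of tuples."""
    return conn.execute(sql).fetchall()


for label, sql in [("inner", INNER), ("left", LEFT), ("right", RIGHT), ("full", FULL)]:
    print(label, run_join(sql))
```

Comparing the four row counts shows the difference between the joins: the unmatched customers appear only in the left and full results, and the unmatched order appears only in the right and full results.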
Part B
Big Data is only useful when trends and patterns can be detected in the data. Machine learning algorithms can help with this process. Many tools on the market today offer hands-on features for running machine learning algorithms. With finding the right trends and patterns in mind, answer the following questions:
Choose one machine learning tool of your liking and give the details below:
The history and technical features of the tool.
The following is a set of steps for rainfall prediction using the WeatherAUS dataset.
Download the dataset from the course Blackboard that is named "WeatherAUS.csv".
Observe the dataset, and you will see a column named "RainTomorrow." This column will be your label/target.
Import this dataset into the tool that you selected in Question 1.
Perform a data clean-up, i.e., remove missing data.
Choose a machine learning algorithm from the selected tool and explain why it is suitable for your computational problem.
Perform a computation using the selected tool to find the optimal features from the dataset that accurately predict the label/target. You are not allowed to use "RISK_MM" as a training or testing feature for your machine learning model.
Choose a performance evaluation technique for evaluating your machine learning algorithm and explain with justification why this technique has been selected.
Evaluate the experimental results with in-depth technical analysis and discussion.
Show all the experimental processes step-by-step with diagrams and explanations.
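The clean-up, feature-exclusion, and evaluation steps above can be sketched in plain Python, independent of whichever tool is chosen. This is a minimal sketch under stated assumptions, not a model answer. The sample rows are invented, the column subset is assumed, and missing values are assumed to appear as "NA" or an empty field. The "prediction" is a placeholder baseline (tomorrow's rain equals today's rain) that stands in for a trained model so that the confusion-matrix metrics have something to score.

```python
import csv
import io

# Tiny made-up sample in an assumed WeatherAUS-style layout; values are invented.
SAMPLE = """MinTemp,MaxTemp,Humidity3pm,RainToday,RISK_MM,RainTomorrow
13.4,22.9,22,No,0.0,No
7.4,25.1,NA,No,0.0,No
17.5,32.3,33,Yes,1.2,Yes
14.6,29.7,55,No,3.4,Yes
NA,21.0,80,Yes,0.0,No
9.2,18.4,71,Yes,5.0,Yes
"""

MISSING = {"", "NA"}     # assumed markers for missing values
EXCLUDED = {"RISK_MM"}   # the brief forbids this column as a feature
TARGET = "RainTomorrow"  # label/target column named in the brief


def load_clean_rows(text):
    """Drop rows with any missing value and remove the excluded columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if any(value in MISSING for value in row.values()):
            continue
        rows.append({k: v for k, v in row.items() if k not in EXCLUDED})
    return rows


def confusion_counts(actual, predicted, positive="Yes"):
    """Return (tp, fp, fn, tn) for a binary label."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


rows = load_clean_rows(SAMPLE)
actual = [r[TARGET] for r in rows]
# Placeholder baseline, not a trained model: predict RainTomorrow = RainToday.
predicted = [r["RainToday"] for r in rows]
counts = confusion_counts(actual, predicted)
scores = metrics(*counts)
print(len(rows), counts, scores)
```

Reporting precision and recall alongside accuracy is one reasonable choice when the two classes are unevenly represented, since accuracy alone can look good for a model that rarely predicts rain. Whether that applies to the course dataset should be checked against the actual class counts.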