Reference no: EM132239041
Business Intelligence and Analytics Assignment
1- [Introduction to Data Analytics] Specify whether each of the following activities is a data analytics task. If it is, state what type of task it is (e.g., classification, regression, clustering, association mining, etc.).
a. Reporting total revenue of a product over a year in different branches to the CEO.
b. Predicting the credit score of a customer using historical data.
c. Predicting the outcome of flipping a coin.
d. Partitioning students into similar groups based on their courses, programs, and demographic profile.
2- [Data Warehousing] Jill, a manager in an I.T. department, is trying to explain to Joe, the head of marketing, why the company needs two major information systems: an ERP OLTP system and a data warehouse OLAP system. Write an explanation that Jill can give to Joe. It should discuss why both systems are necessary and explain how his department should use each of them.
3- [OLAP Operations] Assume that we have a data cube consisting of three dimensions: Product, Time and Location. The values (i.e., facts) show the revenue in thousands of dollars.
Assume that we have the following hierarchy:
What OLAP operations should be performed to answer the following questions?
a) Total revenue earned by Ontario in 2015.
b) Total revenue earned by each product in each country in 2012.
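As an illustration of what the OLAP operations do, the following is a minimal Python sketch over a tiny in-memory cube. The fact rows, the revenue values, the province-to-country hierarchy, and the helper names `select` and `roll_up` are all hypothetical assumptions made for the example; they are not the cube or hierarchy given in the assignment.

```python
from collections import defaultdict

# Column order of each fact row. The province -> country hierarchy and every
# value below are hypothetical and serve only to illustrate the operations.
FIELDS = ("product", "year", "province", "country", "revenue")

FACTS = [
    ("Laptop", 2012, "Ontario", "Canada", 120),
    ("Laptop", 2012, "Quebec", "Canada", 80),
    ("Phone", 2012, "Ontario", "Canada", 60),
    ("Phone", 2012, "New York", "USA", 90),
    ("Laptop", 2015, "Ontario", "Canada", 150),
    ("Phone", 2015, "Ontario", "Canada", 70),
    ("Phone", 2015, "Quebec", "Canada", 40),
]


def select(facts, **conditions):
    """Slice (one condition) or dice (several conditions): keep matching rows."""
    return [
        row for row in facts
        if all(row[FIELDS.index(name)] == value for name, value in conditions.items())
    ]


def roll_up(facts, *dims):
    """Sum revenue over every dimension that is not listed in dims."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[FIELDS.index(dim)] for dim in dims)
        totals[key] += row[-1]
    return dict(totals)


# Dice on Location = Ontario and Time = 2015, then roll up over Product.
ontario_2015 = roll_up(select(FACTS, province="Ontario", year=2015))

# Slice on Time = 2012, then roll up Location to the country level.
by_product_country_2012 = roll_up(select(FACTS, year=2012), "product", "country")

print(ontario_2015)
print(by_product_country_2012)
```

In this sketch, a selection on a single dimension plays the role of a slice, a selection on several dimensions plays the role of a dice, and grouping by fewer or coarser dimensions plays the role of a roll-up.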
4- [Introduction to Data Mining] As a business analyst at TD, you were asked to design a data-driven solution for the mortgage approval process. Customers apply for a mortgage and, after approval, some of them cannot pay their mortgage back. The goal is to design a system that evaluates the history of an applicant and gives the advisor useful information about the application before approval. Following the CRISP-DM process, explain each step of the process (e.g., the questions asked and their answers in each step) for designing the mortgage approval solution.
Note: The best answer to this question is the one that customizes all the steps for the given case (TD mortgage) and describes the steps in the context of the problem. Having general questions as the definition of the steps is not very appealing!
5- [Predictive Analytics] You are given the decision tree for the concept "buy_computer" that indicates whether a customer at a company is likely to buy a computer or not.
Determine if the following loan applications should be approved and explain the steps (If the tree cannot provide the value for the instance, explain why):
a) Bob is a young student; his income is 25k and he has a criminal record.
b) Marry's income is 45k and she doesn't have any criminal record.
c) Ellen has been working as a data scientist since 2015 and makes all her credit card payments.
d) Assume that the actual target values of Bob, Marry, and Ellen are "Yes", "No" and "No" respectively. How do you evaluate the precision of this decision tree based on these three instances? Why?
6- [Predictive Analytics] You are given a training dataset shown in the following table:
Credit | Income | Fraud
High | High | NO
High | High | NO
High | High | NO
High | Low | NO
High | Low | YES
Low | Low | YES
Low | Low | YES
Low | Low | YES
High | High | YES
High | Low | YES
*Credit and Income are attributes (features), and Fraud is the target attribute.
Suppose you want to build a decision tree from these training examples to predict fraud. Which attribute between Credit and Income would you choose for the root of the decision tree if the information gain criterion is used? Why? Show your steps.
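The hand calculation can be cross-checked with a short script. The following is a minimal Python sketch of the standard entropy and information gain definitions, applied to the ten training rows from the table; the function names `entropy` and `information_gain` are chosen here for illustration only.

```python
from collections import Counter
from math import log2

# Training rows copied from the table: (Credit, Income, Fraud).
ROWS = [
    ("High", "High", "NO"),
    ("High", "High", "NO"),
    ("High", "High", "NO"),
    ("High", "Low", "NO"),
    ("High", "Low", "YES"),
    ("Low", "Low", "YES"),
    ("Low", "Low", "YES"),
    ("Low", "Low", "YES"),
    ("High", "High", "YES"),
    ("High", "Low", "YES"),
]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index):
    """Entropy of the target minus the weighted entropy after splitting."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in {row[attr_index] for row in rows}:
        subset = [row[-1] for row in rows if row[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


for name, index in (("Credit", 0), ("Income", 1)):
    print(f"Gain({name}) = {information_gain(ROWS, index):.4f}")
```

Under the information gain criterion, the attribute with the larger printed gain is the one placed at the root.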
Note: you can use a calculator to find the values of the log2 function. Submit your assignment as a PDF file.