Data analysis and definition

Reference no: EM13102800

Project Tasks

Task 1: Analytic Objective

Each group is expected to define an analytic objective that applies pattern discovery and predictive modelling using the SAS Enterprise Miner software. A well-drafted business case will help you understand your data set, identify variable roles and measurement levels, and ultimately choose a method for your analysis.

An example of your analytic objective could take this form:

"A radio station wants to analyze the use of Web services such as simulcasts, podcasts, news streams, music streams, archives, and live Web music to see whether any unusual patterns exist in the combinations of services selected by its Web users. In this case study, you perform an association analysis."

Note: Groups are encouraged to come up with different analytic objectives; no two groups should have the same one. Each group should attempt pattern discovery and predictive modelling using the data set assigned for this exercise.

Task 2: Data Analysis and Definition  

Prepare, in tabular form, a data dictionary that defines the variables as they appear in your data set, together with their model roles and measurement levels. An example is shown below.

Name       Model Role   Measurement Level   Description
STOREID    ID           Nominal             Identification number of the store
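A data dictionary like the example above can also be kept in a small script so that roles and measurement levels are checked before they are entered in SAS Enterprise Miner. The following is only a sketch: the `VariableEntry` class, the lists of allowed roles and levels (a subset, chosen for illustration), and the `SALES` row are assumptions, not part of the assignment or of any assigned data set.

```python
from dataclasses import dataclass

# Illustrative subsets only; the tool itself offers more roles and levels.
VALID_ROLES = {"ID", "Input", "Target", "Rejected"}
VALID_LEVELS = {"Nominal", "Ordinal", "Binary", "Interval"}


@dataclass(frozen=True)
class VariableEntry:
    """One row of the data dictionary."""
    name: str
    role: str
    level: str
    description: str

    def __post_init__(self):
        # Reject typos early instead of discovering them in the tool.
        if self.role not in VALID_ROLES:
            raise ValueError(f"unknown model role: {self.role}")
        if self.level not in VALID_LEVELS:
            raise ValueError(f"unknown measurement level: {self.level}")


data_dictionary = [
    VariableEntry("STOREID", "ID", "Nominal", "Identification number of the store"),
    # Hypothetical row, added only to show a second entry.
    VariableEntry("SALES", "Target", "Interval", "Hypothetical: monthly sales amount"),
]


def format_dictionary(entries):
    """Return the dictionary as an aligned plain-text table."""
    header = f"{'Name':<10}{'Model Role':<12}{'Measurement Level':<19}Description"
    rows = [f"{e.name:<10}{e.role:<12}{e.level:<19}{e.description}" for e in entries]
    return "\n".join([header] + rows)


print(format_dictionary(data_dictionary))
```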

 Tip 1: Execute the following steps in SAS Enterprise Miner

(i). Create a project with your group and group number as its name.

(ii). Create a library.

(iii). Create a data source by defining the data set (the one assigned to you) as a data source.

(iv). Determine whether the variable roles and measurement levels assigned to the variables are appropriate. The variable roles and measurement levels should match the values in the data definition table above. Examine the distribution of the variables.

2.1 Answer the following Questions.

1. Are there any unusual data values in any of your assigned input variables? Support your answer with appropriate argument.

2. List two possible strategies for handling cases with unusual values before attaching your desired analysis node. Explain the scenarios in which each strategy is appropriate.

3. Are there missing values in any of the input variables?

4. If you assigned a variable a rejected role, why is this the case?
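As a cross-check on what the variable distributions show in the tool, missing and unusual values can be counted outside it. The sketch below is an illustration under stated assumptions: the 1.5 x IQR rule is one common convention for flagging unusual values (the assignment does not prescribe it), missing values are represented as `None`, and the `ages` list is made-up data.

```python
from statistics import quantiles


def count_missing(values):
    """Number of missing entries, with None standing for a missing value."""
    return sum(1 for v in values if v is None)


def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR outside the quartiles (assumed rule)."""
    present = sorted(v for v in values if v is not None)
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]


# Made-up interval input with one missing value and one implausible age.
ages = [23, 25, 31, 28, None, 27, 30, 26, 29, 140]

print("missing:", count_missing(ages))
print("unusual:", iqr_outliers(ages))
```

Whether a flagged value is an error or a legitimate extreme still has to be argued from the meaning of the variable; the rule only points at candidates.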

Task 3: Cluster and Association analysis

For groups running a cluster or association analysis, the following tips should help; respond to the questions that follow.

Tip 2: Execute the following steps in SAS Enterprise Miner

(v). Add your data source to the diagram workspace.

(vi). Add a Cluster node to the diagram workspace and connect it to the data source node.

(vii). Select the Cluster node and set the Internal Standardization property to Standardization.

(viii). Specify a maximum of six clusters and run the diagram from the Cluster node.

(ix). Add a Segment Profile node to the diagram workspace and connect it to the Cluster node.

(x). Run the diagram from the Segment Profile node.
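The standardization setting in step (vii) matters because distance-based clustering compares inputs on their raw scales. The sketch below illustrates the idea with z-score standardization and Euclidean distance; it is not how the Cluster node is implemented, and the `income` and `age` columns are made-up values.

```python
from math import dist
from statistics import mean, pstdev


def standardize(column):
    """Rescale a column to mean 0 and standard deviation 1 (z-scores)."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


# Made-up inputs on very different scales.
income = [20000.0, 21000.0, 60000.0]
age = [25.0, 60.0, 26.0]

raw = list(zip(income, age))
scaled = list(zip(standardize(income), standardize(age)))

# Raw distances: income dominates, so the 35-year age gap between
# cases 0 and 1 is almost invisible next to the income gap to case 2.
d_raw_01 = dist(raw[0], raw[1])
d_raw_02 = dist(raw[0], raw[2])

# Standardized distances: both inputs contribute on a comparable scale.
d_std_01 = dist(scaled[0], scaled[1])
d_std_02 = dist(scaled[0], scaled[2])

print(f"raw:    d(0,1)={d_raw_01:.1f}  d(0,2)={d_raw_02:.1f}")
print(f"scaled: d(0,1)={d_std_01:.3f}  d(0,2)={d_std_02:.3f}")
```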

3.1 Answer the following Questions.

5. What would happen if you did not standardize your inputs?

6. Using the results of the Segment Profile node, interpret the characteristics of the three largest clusters.

7. Why was cluster analysis chosen?

Tip 3: Execute the following steps in SAS Enterprise Miner

(i). Create a new diagram and name it after your data set.

(ii). Create a new data source using the data set.

(iii). Assign the variable roles to the variables.

(iv). Add the node for the data set and an Association node to the diagram.

(v). Change the setting for Export Rule by ID to Yes.

(vi). Leave the remaining default settings for the Association node and run the analysis.

3.2 Answer the following Questions.

1. What is the highest lift value for the resulting rules?

2. Which rule has this value?  

3. Why was an Association Analysis run?
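For reference when reading the rule table, lift for a rule A => B is the support of A and B together divided by the product of the support of A and the support of B; values above 1 mean the items occur together more often than independence would predict. The sketch below computes it on made-up transactions that borrow service names from the radio station example; the data and function names are illustrative assumptions.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Share of transactions containing lhs that also contain rhs."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


def lift(transactions, lhs, rhs):
    """Confidence of lhs => rhs relative to the baseline support of rhs."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


# Made-up baskets of Web services, one set per user.
transactions = [
    {"podcast", "news"},
    {"podcast", "news", "archive"},
    {"music"},
    {"podcast", "news"},
    {"music", "archive"},
]

rule_lift = lift(transactions, {"podcast"}, {"news"})
print(f"lift(podcast => news) = {rule_lift:.3f}")
```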

Task 4: Predictive Modeling

For groups running their analysis with decision trees, regression, and neural networks, the following tips should help; respond to the questions that follow.

Tip 4: Decision trees - Execute the following steps in SAS Enterprise Miner

(i). Create a new diagram named Predictive Analysis in your project.

(ii). Define the data set as a data source for the project. Set the roles for the analysis variables as shown above.

(iii). Add the data set to the diagram workspace.

(iv). Add a Data Partition node to the diagram and connect it to the Data Source node. Assign 50% of the data for training and 50% for validation.

(v). Add a Decision Tree node to the workspace and connect it to the Data Partition node.

(vi). Create a decision tree model autonomously using average squared error as the model assessment statistic.

(vii). Add a second Decision Tree node to the diagram and connect it to the Data Partition node.

(viii). In the Properties panel of the new Decision Tree node, change the maximum number of branches from a node to 3 to allow for three-way splits.

(ix). Create a second decision tree model autonomously using average squared error as the model assessment statistic.

4.1 Answer the following Questions.

1. Why was the Target Variable assigned that variable role?  

3. How many leaves are there in the optimal tree created in step (vi)? Which variable was used for the first split and explain why this variable was chosen over others?

4. How many leaves are there in the optimal tree created in step (ix)?

5. Which of the decision tree models appears to be better:

a. based on average squared error on training data?

b. based on average squared error on validation data?
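The 50/50 partition and the assessment statistic used above can be sketched as follows. This is a simplified illustration: it assumes a numeric (or 0/1) target and computes average squared error as the mean of squared differences between actual and predicted values; the tool's exact definition for class targets may differ, and the sample values are made up.

```python
import random


def partition(rows, train_fraction=0.5, seed=42):
    """Randomly split rows into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def average_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


train, valid = partition(list(range(10)))
print(len(train), len(valid))

# Made-up 0/1 targets and predicted probabilities from some model.
actual = [1, 0, 1, 0]
predicted = [0.9, 0.2, 0.6, 0.1]
print(f"ASE = {average_squared_error(actual, predicted):.3f}")
```

A model can look better on training data simply because it has more leaves, which is why the validation figure is the one to compare across models.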

Tip 5: Regression - Execute the following steps in SAS Enterprise Miner

(x). Attach the StatExplore tool to the data source and run it. View the results of the StatExplore tool and determine if any of the variables have missing values.

(xi). Add an Impute node to the diagram and connect it to the Data Partition node. Set the node to impute U for unknown class variable values and the overall mean for unknown interval variable values. Create imputation indicators for all imputed inputs.

(xii). Add a Regression node to the diagram and connect it to the Impute node. Choose the stepwise selection and average squared error as the selection criterion. Run the Regression node and view the results.

(xiii). Disconnect the Impute node from the Data Partition node. Add a Transform Variables node to the diagram and connect it to the Data Partition node. Connect the Transform Variables node to the Impute node.

(xiv). Apply a log transformation to the DemAffl and PromTime inputs and run the Transform Variables node.

(xv). Rerun the Regression node.
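The imputation and transformation described in steps (xi) and (xiv) can be sketched in a few lines. This is an illustration, not the nodes' implementation: missing values are represented as `None`, the offset of 1 inside the logarithm is an assumption made here so that zeros stay defined, and the sample `dem_affl` values are made up.

```python
import math
from statistics import mean


def impute_mean_with_indicator(values):
    """Replace missing interval values with the mean and flag imputed rows."""
    fill = mean(v for v in values if v is not None)
    imputed = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator


def impute_class(values, fill="U"):
    """Replace missing class values with the constant 'U' (unknown)."""
    return [fill if v is None else v for v in values]


def log_transform(values):
    """log(x + 1); the +1 offset is an assumption to handle zeros."""
    return [math.log(v + 1) for v in values]


# Made-up values for an interval input and a class input.
dem_affl = [10, None, 6, 8]
gender = ["F", None, "M", "F"]

imputed, flag = impute_mean_with_indicator(dem_affl)
print(imputed, flag)
print(impute_class(gender))
print([round(x, 3) for x in log_transform(imputed)])
```

The indicator column lets a regression model pick up any signal carried by the fact that a value was missing, which mean imputation alone would hide.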

4.2 Answer the following Questions.

6. In preparation for regression, is any missing-value imputation needed? If yes, should you do this imputation before generating the decision tree models? Why or why not?

7. Which variables are included in the final regression model generated in step (xii)? List the variables in the descending order of importance to the model.

8. Which variables are included in the final regression model generated in the last step?

9. Based on average squared error on the validation data, which of the two regression models generated appears to be better?

Tip 6: Neural Networks - Execute the following steps in SAS Enterprise Miner

(xvi). Add a Neural Network tool to the diagram. Connect the Impute node to the Neural Network node.

(xvii). Set the model selection criterion to average squared error. Run the Neural Network node.

4.3 Answer the following Questions.

10. How many weights does the neural network model generated in step (xvii) include?

11. Examine the validation average squared error of the neural network model. How does it compare to the two decision tree models and the regression model generated after applying log transformation?
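When checking the weight count reported by the tool, it can help to know how the number arises. The sketch below assumes a fully connected network with a single hidden layer and one bias per hidden and output unit; the number of hidden units and the way class inputs expand into several input units depend on the node's settings, so the figures used here are placeholders rather than expected answers.

```python
def mlp_weight_count(n_inputs, n_hidden, n_outputs=1):
    """Weights in a one-hidden-layer, fully connected network with biases."""
    # Each hidden unit has one weight per input plus a bias.
    hidden_layer = (n_inputs + 1) * n_hidden
    # Each output unit has one weight per hidden unit plus a bias.
    output_layer = (n_hidden + 1) * n_outputs
    return hidden_layer + output_layer


# Placeholder sizes: 10 input units and 3 hidden units.
print(mlp_weight_count(10, 3))
```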

Task 5: Compare your models

Execute the following steps in SAS Enterprise Miner

(xviii). Add a Model Comparison node to the diagram. Connect it to all the predictive models generated in the earlier steps.

(xix). Run the Model Comparison node.

4.4 Answer the following Questions.

12. Examine the results of the Model Comparison node. Of the predictive models compared, which model has been selected by the Model Comparison node? On what selection criterion was this model selected?

13. Change the default values of the Model Comparison node properties so that it selects the model having the least average squared error on the validation data. Run the Model Comparison node again. Which model has been selected now?

14. Why are the models compared?
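The selection rule in question 13 amounts to picking the model with the smallest validation average squared error. A minimal sketch, with model names and error values that are placeholders and not results from any data set:

```python
# Placeholder validation errors; replace with the values your diagram reports.
validation_ase = {
    "Decision Tree (2-way)": 0.140,
    "Decision Tree (3-way)": 0.138,
    "Regression (log inputs)": 0.135,
    "Neural Network": 0.133,
}

# The model with the least validation error is selected.
best_model = min(validation_ase, key=validation_ase.get)
print(best_model, validation_ase[best_model])
```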

Task 6: Business Implication

1. From the outcome of your analysis of the data set and the business case you have come up with, what can you deduce, recommend, and conclude?

2. What are the business implications that can be drawn from the process of building and comparing these models, and has this practice helped resolve the business issue? Why or why not?


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd