What are the most important Lupus data features in building the tree?

Reference no: EM131049408

Assignment

Problem 1: This problem illustrates the classification approach using decision trees and the Lupus data (you can download the data file "sledata" from the D2L site, under the course documents for week 6). The data consists of 300 patient records, each containing 12 elements: the first 11 represent different symptoms and the 12th indicates the diagnosis. Build a decision tree and report:
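Before building a tree, each record has to be split into its 11 symptom values and the diagnosis. The format of "sledata" is not described here, so the sketch below assumes one record per line, fields separated by whitespace or commas, and numerically coded symptoms; `parse_sle_records` is a hypothetical helper, not part of any course material.

```python
from typing import List, Tuple


def parse_sle_records(lines: List[str]) -> Tuple[List[List[float]], List[str]]:
    """Split each record into 11 symptom values and 1 diagnosis label.

    Assumption: one record per line, fields separated by whitespace or commas,
    symptoms coded numerically. Adjust to the real layout of "sledata".
    """
    features: List[List[float]] = []
    labels: List[str] = []
    for line in lines:
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 12:
            raise ValueError(f"expected 12 elements, got {len(fields)}: {line!r}")
        features.append([float(v) for v in fields[:11]])  # elements 1-11: symptoms
        labels.append(fields[11])  # element 12: diagnosis
    return features, labels
```

With a local copy of the file, a call such as `parse_sle_records(open("sledata").readlines())` should yield 300 feature rows and 300 labels if the assumed layout holds.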

1) The decision tree, and the criteria used to build it: how the best split is chosen and what the stopping condition is (for example, which impurity measure is used and the minimum number of cases in parent and child nodes);

2) How many nodes the final tree has and how many of them are terminal nodes;

3) What are the three most important Lupus data features in building the tree? Explain your answer.

4) Increase the minimum number of cases for parent and child nodes. What happens to the complexity (number of nodes) of the tree? Does it increase? Explain your answer.
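To make the criteria in items 1) to 4) concrete, here is a minimal CART-style sketch in Python, assuming numeric features and Gini impurity as the split criterion. `min_parent` and `min_child` play the role of the minimum parent and child case counts; `fit`, `build_tree`, and `count_nodes` are hypothetical names. The feature ranking uses impurity decrease, which is one common importance measure and not necessarily the one SPSS reports.

```python
from collections import Counter
from typing import Dict, List, Tuple


def gini(labels: List[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X: List[List[float]], y: List[str], min_parent: int,
               min_child: int, importance: Dict[int, float]) -> dict:
    """Greedily grow a binary tree; leaves are {"label": ...} dicts."""
    node_impurity = gini(y)
    majority = Counter(y).most_common(1)[0][0]
    # Stopping rules: too few cases to split, or the node is already pure.
    if len(y) < min_parent or node_impurity == 0.0:
        return {"label": majority}
    best = None  # (weighted child impurity, feature index, threshold)
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if len(left) < min_child or len(right) < min_child:
                continue  # split would create a child that is too small
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None or best[0] >= node_impurity:
        return {"label": majority}  # no admissible split reduces impurity
    score, j, t = best
    # Impurity-decrease importance, weighted by the number of cases in the node.
    importance[j] = importance.get(j, 0.0) + len(y) * (node_impurity - score)
    go_left = [row[j] <= t for row in X]

    def subtree(keep_left: bool) -> dict:
        rows = [row for row, g in zip(X, go_left) if g == keep_left]
        labs = [lab for lab, g in zip(y, go_left) if g == keep_left]
        return build_tree(rows, labs, min_parent, min_child, importance)

    return {"feature": j, "threshold": t, "left": subtree(True), "right": subtree(False)}


def count_nodes(node: dict) -> Tuple[int, int]:
    """Return (total nodes, terminal nodes)."""
    if "label" in node:
        return 1, 1
    lt, lterm = count_nodes(node["left"])
    rt, rterm = count_nodes(node["right"])
    return 1 + lt + rt, lterm + rterm


def fit(X: List[List[float]], y: List[str], min_parent: int = 2, min_child: int = 1):
    """Grow a tree and return it with the per-feature importance scores."""
    importance: Dict[int, float] = {}
    tree = build_tree(X, y, min_parent, min_child, importance)
    return tree, importance


# Tiny made-up data set (one feature), only to exercise the code.
X = [[1], [2], [3], [4], [5], [6]]
y = ["a", "a", "a", "b", "b", "a"]
for mp, mc in [(2, 1), (4, 2)]:
    tree, imp = fit(X, y, min_parent=mp, min_child=mc)
    top3 = sorted(imp, key=imp.get, reverse=True)[:3]  # most important feature indices
    print(mp, mc, count_nodes(tree), top3)
# prints: 2 1 (5, 3) [0]
# prints: 4 2 (3, 2) [0]
```

On this toy data, raising the minimum parent and child sizes shrinks the tree from 5 nodes (3 terminal) to 3 nodes (2 terminal), because the larger minimums block splits of small nodes. Larger minimums generally tend to produce smaller trees, though the exact effect depends on the data.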

Problem 2: This problem illustrates the effect of class imbalance on the accuracy of decision trees. Download the red wine quality data from the UCI Machine Learning Repository at: https://archive.ics.uci.edu/ml/datasets/Wine+Quality
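Why imbalance distorts accuracy can be seen with a trivial model that always predicts the most frequent class: its accuracy equals that class's share of the data, while its recall on rare classes is zero. The labels below are invented for illustration and are not taken from the wine data; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def majority_baseline_accuracy(labels: Sequence[str]) -> float:
    """Accuracy of a model that always predicts the most frequent class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def per_class_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Fraction of each true class that was predicted correctly."""
    recall = {}
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recall[c] = sum(y_pred[i] == c for i in idx) / len(idx)
    return recall


# Made-up labels: 9 cases of quality "5", 1 case of quality "8".
y_true = ["5"] * 9 + ["8"]
y_pred = ["5"] * 10  # always predict the majority class
print(majority_baseline_accuracy(y_true))  # 0.9
print(per_class_recall(y_true, y_pred))    # recall 1.0 for "5", 0.0 for "8"
```

A tree whose accuracy barely beats this baseline may simply be ignoring the rare quality levels, which is why comparing per-class recall is useful alongside overall accuracy.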

1. Report how many classes there are (treat each quality level as a different class) and what the distribution of these classes is in the red wine data.

2. Repeat Problem 1 on the red wine data.

3. Now bin the class variable so that the data is less imbalanced with respect to it. Repeat Problem 1 on the wine data with fewer classes (the binned class variable).

4. How does the performance of the best classification model on the original class variable compare with the accuracy of the best classification model on the binned class variable?

5. Do you have any other ideas on how you can improve the results further?

Showing that your idea actually works earns five extra credit points.
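Items 1 and 3 of Problem 2 come down to counting classes and merging them. The sketch below assumes the downloaded file is semicolon-separated with a header row containing a `quality` column (adjust `delimiter` if your copy differs). The two-class cut at quality 5 is only one possible binning; check the resulting counts against the actual distribution before relying on it. All function names are hypothetical.

```python
import csv
import io
from collections import Counter


def quality_distribution(csv_text: str, delimiter: str = ";") -> Counter:
    """Count how many wines have each quality level (one class per level)."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return Counter(int(row["quality"]) for row in reader)


def bin_quality(q: int, cut: int = 5) -> str:
    """Hypothetical two-class binning: quality <= cut is "low", otherwise "high"."""
    return "low" if q <= cut else "high"


def binned_distribution(dist: Counter, cut: int = 5) -> Counter:
    """Merge per-level counts into the binned classes."""
    out: Counter = Counter()
    for q, n in dist.items():
        out[bin_quality(q, cut)] += n
    return out


# Made-up excerpt in the assumed layout, only to exercise the functions.
sample = '"alcohol";"quality"\n9.4;5\n9.8;5\n10.5;6\n12.8;7\n'
dist = quality_distribution(sample)
print(sorted(dist.items()))                       # [(5, 2), (6, 1), (7, 1)]
print(sorted(binned_distribution(dist).items()))  # [('high', 2), ('low', 2)]
```

For the real data, pass the file contents (for example `open("winequality-red.csv").read()`, a filename assumed here) to `quality_distribution`, then try different `cut` values until the binned classes are reasonably balanced.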

Problem 3: Differentiate between the following terms:

a. feature selection and feature extraction
b. training and testing
c. parametric reduction techniques and non-parametric reduction techniques
d. uniform binning and non-uniform binning
e. covariance matrix and correlation matrix.
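Two of these distinctions can be made concrete in a few lines. The sketch below assumes "uniform binning" means equal-width intervals and uses equal-frequency (equal-depth) binning as one example of non-uniform binning. It also computes a sample covariance matrix (dividing by n - 1) and the correlation matrix obtained by scaling each covariance by the two standard deviations. Function names are illustrative.

```python
import math
from typing import List


def equal_width_bins(values: List[float], k: int) -> List[int]:
    """Uniform binning: k intervals of equal width between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value is placed in the last bin rather than bin k.
    return [min(int((v - lo) / width), k - 1) if width > 0 else 0 for v in values]


def equal_frequency_bins(values: List[float], k: int) -> List[int]:
    """Non-uniform binning example: roughly the same number of values per bin."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def covariance_matrix(cols: List[List[float]]) -> List[List[float]]:
    """Sample covariance between every pair of variables (columns)."""
    n = len(cols[0])
    means = [sum(c) / n for c in cols]
    return [[sum((a - ma) * (b - mb) for a, b in zip(ci, cj)) / (n - 1)
             for cj, mb in zip(cols, means)]
            for ci, ma in zip(cols, means)]


def correlation_matrix(cols: List[List[float]]) -> List[List[float]]:
    """Covariance divided by the product of standard deviations (unit-free)."""
    cov = covariance_matrix(cols)
    sd = [math.sqrt(cov[i][i]) for i in range(len(cols))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cols))]
            for i in range(len(cols))]


values = [1, 2, 3, 4, 100]
print(equal_width_bins(values, 2))      # [0, 0, 0, 0, 1]
print(equal_frequency_bins(values, 2))  # [0, 0, 0, 1, 1]
x, z = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(covariance_matrix([x, z]))   # [[1.0, 2.0], [2.0, 4.0]]
print(correlation_matrix([x, z]))  # [[1.0, 1.0], [1.0, 1.0]]
```

The outlier 100 stretches the equal-width bins so that four of the five values land in bin 0, whereas equal-frequency bins split the sorted values by rank. Doubling a variable doubles its covariances but leaves its correlations unchanged, because correlation does not depend on the units of measurement.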

Verified Expert

The file contains solutions to all three problems, which were solved using IBM SPSS Statistics version 22. The total number of nodes, the terminal nodes, and the estimated risks are shown. Problem 2 uses the red wine data as provided, with the calculations performed and demonstrated. Problem 3 explains the differences between the listed terms.

All rights reserved. Copyright ©2019-2020 ExpertsMind IT Educational Pvt Ltd