Decision tree learning for cancer diagnosis, Computer Engineering

Assignment Help:

Assignment 1: Decision tree learning for cancer diagnosis

In this mini-project, you will implement a decision-tree algorithm and apply it to breast cancer diagnosis. For each patient, an image of a fine needle aspirate (FNA) of a breast mass was taken, and nine features in the image potentially correlated with breast cancer were extracted. Your task is to develop a decision tree algorithm, learn from data, and predict for new patients whether they have breast cancer. Dataset can be downloaded from U.C. Irvine Machine Learning Repository.

1.       Collect the data set from my website. Each patient is represented by one line, with columns separated by commas: the first one is the identifier number, the last is the class (benign or malignant), the rest are attribute values, which are integers ranging from 1 to 10. The attributes are (in case you are curious): Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses. (Note that the UCI document page specifies a different number of attributes, because it refers to a set of several related datasets. For detailed information of the dataset that we use here, see this document.)

2.       Implement the ID3 decision tree learner, as described in Chapter 3 of Mitchell. You may program in C/C++, Java. Your program should assume input in the above format.

3.       Implement both misclassification impurity and information gain for evaluation criterion. Also, implement split stopping using chi-square test.

4.       Divide the data set randomly between training (80%) and testing (20%) sets. Use your algorithm to train a decision tree classifier and report accuracy on test. Run the same experiment 100 times. Then calculate average test performances (accuracy, precision, recall, f-measure, g-mean).

5.       Compare performances by varying the evaluation criteria. Make a table as follows:

Evaluation Criteria

Accuracy

Precision

Recall

F-measure

G-mean

misclassification impurity

 

 

 

 

 

information gain

 

 

 

 

 

6.       Answer the following:

a.       Which evaluation criterion and confidence level work well? Why?

b.       Do you see evidence of overfitting in some experiments? Explain.

 


Related Discussions:- Decision tree learning for cancer diagnosis

Explain about parallel programming environment, Q. Explain about parallel p...

Q. Explain about parallel programming environment? The parallel programming environment comprises of a debugger, an editor, performance evaluator, programme visualizer for incr

Discuss the classifications of switching systems, Discuss the classificatio...

Discuss the classifications of switching systems. The categorizations of switching systems are specified in the block diagram given below:

Why a function should have at least one input, Why a function should have a...

Why a function should have at least one input? There is no strong reason for this in verilog. I think this restriction isn't removed fin SystemVerilog. Some requirements where

ANS, Smugglers are becoming very smart day by day. Now they have developed ...

Smugglers are becoming very smart day by day. Now they have developed a new technique of sending their messages from one smuggler to another. In their new technology, they are send

What are the elements of analysis model, What are the elements of Analysis ...

What are the elements of Analysis model? i. Data Dictionary ii. Entity Relationship Diagram iii. Data Flow Diagram iv. State Transition Diagram v. Control Specifica

Artificial neural networks, Artificial Neural Networks: However imagin...

Artificial Neural Networks: However imagine now in this example as the inputs to our function were arrays of pixels and there actually taken from photographs of vehicles such

How to maintain lists, How to maintain lists? To return from a high li...

How to maintain lists? To return from a high list level to the next-lower level (SY-LSIND), the user chooses back on a secondary list.  The system then gives the currently dis

Truth tables, Any function can be expressed in a truth table.A truth table ...

Any function can be expressed in a truth table.A truth table lists all possible combinations ofinputs and gives the output produced in eachcase.Truth tables must include all combin

What are the difference between $display and $strobe, What are the Differen...

What are the Difference between $display and $strobe Difference between $display and $strobe is that $strobe displays parameters at the very end of current simulation time unit

Write Your Message!

Captcha
Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd