Decision tree learning for cancer diagnosis, Computer Engineering

Assignment 1: Decision tree learning for cancer diagnosis

In this mini-project, you will implement a decision-tree algorithm and apply it to breast cancer diagnosis. For each patient, an image of a fine needle aspirate (FNA) of a breast mass was taken, and nine features of the image that are potentially correlated with breast cancer were extracted. Your task is to develop a decision tree algorithm, learn a tree from the data, and predict whether new patients have breast cancer. The dataset can be downloaded from the UC Irvine Machine Learning Repository.

1. Collect the data set from my website. Each patient is represented by one line, with columns separated by commas: the first column is the identifier number, the last is the class (benign or malignant), and the rest are attribute values, which are integers ranging from 1 to 10. The attributes are (in case you are curious): Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses. (Note that the UCI documentation page specifies a different number of attributes, because it refers to a set of several related datasets. For detailed information on the dataset that we use here, see this document.)
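The line format described above could be read along the following lines. This is only a sketch, written in Python for brevity (the assignment itself asks for C/C++ or Java). The function name `parse_patients`, the tuple layout, the made-up sample rows, and the choice to store any non-integer attribute value as `None` are all assumptions, not part of the assignment.

```python
import io

def parse_patients(lines):
    """Parse comma-separated patient records.

    Each line holds: identifier, nine attribute values, class label.
    Returns a list of (patient_id, attributes, label) tuples.
    """
    patients = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        fields = [f.strip() for f in raw.split(",")]
        patient_id, label = fields[0], fields[-1]
        # Assumption: a field that is not a plain integer is kept as None
        # instead of being rejected.
        attributes = [int(f) if f.isdigit() else None for f in fields[1:-1]]
        patients.append((patient_id, attributes, label))
    return patients

# Hypothetical two-line sample in the described layout (not real data).
sample = io.StringIO(
    "1001,5,1,1,1,2,1,3,1,1,benign\n"
    "1002,8,10,10,8,7,10,9,7,1,malignant\n"
)
records = parse_patients(sample)
```

Keeping the identifier out of the attribute list matters: it is unique per patient, so a tree that split on it would fit the training set perfectly and generalize not at all.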

2. Implement the ID3 decision tree learner, as described in Chapter 3 of Mitchell. You may program in C/C++ or Java. Your program should assume input in the above format.
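As a rough illustration of the recursive structure of ID3, here is a compact Python sketch (it would need porting to C/C++ or Java for submission). It is not Mitchell's pseudocode verbatim: the tuple-based tree representation, the helper names, and the fallback to a node's majority class for attribute values not seen in training are assumptions made for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Return the attribute index with the largest information gain."""
    def gain(a):
        remainder = 0.0
        for value in set(r[a] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[a] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder
    return max(attributes, key=gain)

def id3(rows, labels, attributes):
    """Build a tree: a leaf is a class label, an internal node is a tuple
    (attribute index, {value: subtree}, majority label at this node)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority
    a = best_attribute(rows, labels, attributes)
    remaining = [x for x in attributes if x != a]
    children = {}
    for value in set(r[a] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[a] == value]
        children[value] = id3([rows[i] for i in idx],
                              [labels[i] for i in idx], remaining)
    return (a, children, majority)

def classify(tree, row):
    """Walk the tree until a leaf (a plain label) is reached."""
    while isinstance(tree, tuple):
        a, children, majority = tree
        if row[a] not in children:
            return majority  # assumption: unseen value falls back to majority
        tree = children[row[a]]
    return tree
```

For example, `id3(rows, labels, list(range(9)))` would build a tree over all nine attributes, where `rows` is a list of attribute lists and `labels` the matching class labels.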

3. Implement both misclassification impurity and information gain as evaluation criteria. Also implement split stopping using the chi-square test.
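The two criteria and the chi-square stopping rule might look roughly like this in Python (a sketch only, to be ported to C/C++ or Java). Several choices here are assumptions rather than requirements of the assignment: the function names, the use of Pearson's statistic with (branches - 1) x (classes - 1) degrees of freedom, and the hard-coded 95% critical values, which cover at most nine degrees of freedom (two classes, up to ten branches).

```python
import math
from collections import Counter

def misclassification_impurity(labels):
    """1 minus the fraction of the most common class."""
    return 1.0 - max(Counter(labels).values()) / len(labels)

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def impurity_reduction(parent, children, impurity):
    """Parent impurity minus the size-weighted impurity of the children.
    With impurity=entropy this quantity is the information gain."""
    weighted = sum(len(c) / len(parent) * impurity(c) for c in children)
    return impurity(parent) - weighted

# Upper critical values of the chi-square distribution at the 95% level,
# indexed by degrees of freedom (assumption: only this level is tabulated).
CHI2_CRITICAL_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                    6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919}

def chi_square_statistic(parent, children):
    """Pearson statistic: observed class counts in each (non-empty) child
    versus the counts expected if the split were unrelated to the class."""
    parent_counts = Counter(parent)
    stat = 0.0
    for child in children:
        observed = Counter(child)
        for label, n in parent_counts.items():
            expected = n * len(child) / len(parent)
            stat += (observed[label] - expected) ** 2 / expected
    return stat

def split_is_significant(parent, children, critical=CHI2_CRITICAL_95):
    """True if the split passes the chi-square test; otherwise stop splitting."""
    dof = (len(children) - 1) * (len(set(parent)) - 1)
    if dof < 1:
        return False
    return chi_square_statistic(parent, children) > critical[dof]
```

Here `parent` is the list of class labels at a node and `children` is the list of label lists produced by a candidate split. Supporting other confidence levels would mean supplying a different `critical` table.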

4. Divide the data set randomly into training (80%) and testing (20%) sets. Use your algorithm to train a decision tree classifier and report its accuracy on the test set. Run the same experiment 100 times, then calculate the average test performance (accuracy, precision, recall, F-measure, G-mean).
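A possible shape for the evaluation loop's building blocks, again as a Python sketch to be ported to C/C++ or Java. The assignment does not define the metrics, so the definitions below are assumptions: "malignant" is treated as the positive class, F-measure is the harmonic mean of precision and recall, G-mean is the geometric mean of recall and specificity, and any ratio with a zero denominator is reported as 0.

```python
import math
import random

def train_test_split(records, train_fraction=0.8, rng=random):
    """Shuffle a copy of the records and cut it into train and test parts."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def binary_metrics(actual, predicted, positive="malignant"):
    """Compute the five requested measures from paired label lists."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)

    def ratio(num, den):
        return num / den if den else 0.0  # assumption: undefined ratio -> 0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "g_mean": math.sqrt(recall * specificity),
    }

def average_metrics(runs):
    """Average each metric over a list of per-run metric dictionaries."""
    return {k: sum(r[k] for r in runs) / len(runs) for k in runs[0]}
```

One experiment would call `train_test_split`, train a tree on the first part, score the second part with `binary_metrics`, and append the result to a list. After 100 repetitions, `average_metrics` gives one row of the results table. Passing `rng=random.Random(seed)` makes a run reproducible.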

5. Compare performance across the evaluation criteria. Make a table as follows:

Evaluation Criteria          Accuracy   Precision   Recall   F-measure   G-mean
---------------------------  ---------  ----------  -------  ----------  -------
misclassification impurity
information gain

6. Answer the following:

a. Which evaluation criterion and confidence level work well? Why?

b. Do you see evidence of overfitting in some experiments? Explain.

 

