Naïve Bayes algorithm for text classification, Computer Engineering

Assignment Help:

Assignment 3: Naïve Bayes algorithm for text classification.

First part:

In this assignment, we will redo the document classification task from assignment 2 using the same Reuters dataset. This time, implement the multinomial Naïve Bayes algorithm instead of k-NN. Naïve Bayes used to be the de facto method for text classification. Try various smoothing parameters for the Naïve Bayes learner. What is the accuracy of your learner? Which parameters work best?
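As a starting point, the multinomial model with additive (Laplace/Lidstone) smoothing could be sketched as below. This is a minimal illustration, not the required solution: the function names `train_multinomial_nb`, `predict`, and `accuracy`, the toy documents, and the list of `alpha` values are all assumptions made for the example, and documents are assumed to be already tokenized into lists of words.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model.

    docs   -- list of token lists (assumed pre-tokenized)
    labels -- list of class labels, one per document
    alpha  -- additive smoothing parameter; must be > 0 to avoid log(0)
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    class_doc_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    model = {"vocab": set(vocab), "log_prior": {}, "log_likelihood": {}}
    for c, n_c in class_doc_counts.items():
        # Class prior: fraction of training documents in class c.
        model["log_prior"][c] = math.log(n_c / len(docs))
        # Smoothed estimate: (count(t, c) + alpha) / (tokens in c + alpha * |V|).
        total = sum(token_counts[c].values()) + alpha * len(vocab)
        model["log_likelihood"][c] = {
            tok: math.log((token_counts[c][tok] + alpha) / total) for tok in vocab
        }
    return model


def predict(model, doc):
    """Return the class with the highest log-posterior; unseen tokens are skipped."""
    scores = {}
    for c, log_prior in model["log_prior"].items():
        ll = model["log_likelihood"][c]
        scores[c] = log_prior + sum(ll[tok] for tok in doc if tok in model["vocab"])
    return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(docs)


# Toy data for illustration only; replace with the Reuters train/test split.
train_docs = [["wheat", "grain", "export"], ["grain", "harvest"],
              ["oil", "crude", "barrel"], ["crude", "price", "oil"]]
train_labels = ["grain", "grain", "crude", "crude"]

# Try several smoothing values, as the assignment asks.
for alpha in (0.01, 0.1, 0.5, 1.0):
    model = train_multinomial_nb(train_docs, train_labels, alpha=alpha)
    print(alpha, accuracy(model, train_docs, train_labels))
```

Working in log space avoids numerical underflow on long documents. In practice the accuracy for each `alpha` should be measured on held-out data rather than on the training documents used here.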

Second Part:

In this part, you will compare the performance of the k-NN classifier and the Naïve Bayes classifier for text classification. Follow the steps below:

1. Take the best classifier from your second assignment (k-NN). Choose the value of k and the distance/similarity measure that gave the best performance.

2. Compare the best k-NN classifier with the Naïve Bayes classifier. Run both the k-NN and the Naïve Bayes learner 50 times. Compute the mean and standard deviation of the results. Then compute the t-statistic and, at significance levels of 0.005, 0.01, and 0.05, determine which algorithm (k-NN or Naïve Bayes) is better. Report the results in a paper and submit it.
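The comparison in step 2 could be sketched as follows. The assignment does not say which t-test to use, so this example assumes an unpaired Welch's t-test (unequal variances); if both learners are evaluated on the same 50 train/test splits, a paired t-test on the per-run differences may be more appropriate. The function name `welch_t` and the accuracy values are placeholders for illustration, not real results.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    se_a = statistics.variance(sample_a) / len(sample_a)
    se_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Placeholder accuracies (made up); in the assignment each list would hold
# the 50 per-run accuracies of one learner.
knn_acc = [0.81, 0.79, 0.83, 0.80, 0.82]
nb_acc = [0.86, 0.84, 0.85, 0.87, 0.83]

print("k-NN mean/std:", statistics.mean(knn_acc), statistics.stdev(knn_acc))
print("NB   mean/std:", statistics.mean(nb_acc), statistics.stdev(nb_acc))
t, df = welch_t(knn_acc, nb_acc)
print("t =", t, "df =", df)
```

To decide at each significance level (0.005, 0.01, 0.05), compare |t| with the critical value of the t-distribution for the computed degrees of freedom, taken from a t-table. If SciPy is available, `scipy.stats.ttest_ind(knn_acc, nb_acc, equal_var=False)` returns the same statistic together with a p-value that can be compared to each level directly.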

 

 




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd