Naïve Bayes algorithm for text classification, Computer Engineering

Assignment 3: Naïve Bayes algorithm for text classification.

First Part:

In this assignment, we will redo the document classification task from assignment 2, using the same Reuters dataset. This time, implement the multinomial Naïve Bayes algorithm instead of k-NN. Naïve Bayes used to be the de facto method for text classification. Try various smoothing parameters for the Naïve Bayes learner. What is the accuracy of your learner? Which parameters work best?
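As a starting point, here is a minimal sketch of a multinomial Naïve Bayes learner with additive (Laplace/Lidstone) smoothing. It assumes documents are already tokenized into lists of words; the function names `train_multinomial_nb` and `predict`, and the parameter name `alpha`, are illustrative choices and not part of the assignment. Loading and preprocessing the Reuters dataset is not shown.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model.

    docs   -- list of token lists (assumed to be pre-tokenized)
    labels -- list of class labels, one per document
    alpha  -- additive smoothing parameter; must be > 0 to avoid log(0)
    """
    vocab = {tok for doc in docs for tok in doc}
    class_doc_counts = Counter(labels)

    # Count how often each token occurs in each class.
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    # Log prior: fraction of training documents in each class.
    n_docs = len(docs)
    log_prior = {c: math.log(n / n_docs) for c, n in class_doc_counts.items()}

    # Smoothed log likelihood: (count + alpha) / (class total + alpha * |V|).
    log_likelihood = {}
    for c in class_doc_counts:
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            t: math.log((token_counts[c][t] + alpha) / denom) for t in vocab
        }
    return log_prior, log_likelihood, vocab


def predict(model, doc):
    """Return the class with the highest posterior log score for one document."""
    log_prior, log_likelihood, vocab = model
    scores = {}
    for c in log_prior:
        # Tokens never seen in training are skipped.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][t] for t in doc if t in vocab
        )
    return max(scores, key=scores.get)
```

To explore the smoothing question, one option is to train with several values of `alpha` (for example 0.01, 0.1, 1.0) and compare accuracy on held-out documents.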

Second Part:

In this part, you will compare the performance of the k-NN classifier and the Naïve Bayes classifier for text classification. Follow the steps below:

1. Take the best classifier from your second assignment (k-NN). Choose the value of k and the distance/similarity measure that gave the best performance.

2. Compare the best k-NN classifier with the Naïve Bayes classifier. Run both learners 50 times each, and compute the mean and standard deviation of the results. Then compute the t-statistic and, at significance levels of 0.005, 0.01, and 0.05, determine which algorithm (k-NN or Naïve Bayes) performs better. Report the results in a paper and submit it.
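The comparison in step 2 can be sketched as follows. The assignment does not say which t-test to use; this sketch assumes Welch's two-sample t-test (unequal variances), and the function name `welch_t` is hypothetical. If both learners are evaluated on the same 50 train/test splits, a paired t-test on the per-run differences may be more appropriate.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic and degrees of freedom for two samples.

    a, b -- sequences of per-run scores (e.g. accuracies), one per learner
    """
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # statistics.variance is the sample variance (divides by n - 1).
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)

    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


def summarize(scores):
    """Mean and sample standard deviation of one learner's runs."""
    return statistics.mean(scores), statistics.stdev(scores)
```

With the 50 accuracies of each learner collected in two lists, `welch_t(nb_scores, knn_scores)` gives the statistic to compare against the critical t-value for each significance level (0.005, 0.01, 0.05) at the returned degrees of freedom. The list names here are placeholders. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic together with a p-value.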

 

 



All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd