Explore an application of multinomial Naive Bayes

Reference no: EM132222472

CAP5610 Machine Learning Homework

Written Questions

1. Mean and Variance

The most familiar property of a distribution is its mean, or expected value, denoted by μ or E[X]. For discrete random variables (rv's), it is defined as E[X] = ∑_{x∈X} x p(x), and for continuous rv's, it is defined as E[X] = ∫ x p(x) dx. The variance is a measure of the "spread" of a distribution, denoted by σ². It is defined as var[X] = E[(X - μ)²]. Show that σ² = E[X²] - (E[X])², assuming X is a discrete random variable. Hint: expand the definition of variance.
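Before attempting the proof, it can help to check the identity numerically. The sketch below is a sanity check, not a proof: it evaluates both sides of σ² = E[X²] - (E[X])² for one small discrete distribution. The probability mass function `pmf` and the helper `expectation` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical pmf chosen only for illustration: value -> probability.
pmf = {1: Fraction(1, 2), 2: Fraction(1, 3), 6: Fraction(1, 6)}
assert sum(pmf.values()) == 1  # probabilities must sum to one


def expectation(f, pmf):
    """E[f(X)] = sum over x of f(x) * p(x) for a discrete random variable."""
    return sum(f(x) * p for x, p in pmf.items())


mean = expectation(lambda x: x, pmf)

# Left-hand side: the definition var[X] = E[(X - mu)^2].
var_definition = expectation(lambda x: (x - mean) ** 2, pmf)

# Right-hand side: the identity to be shown, E[X^2] - (E[X])^2.
var_shortcut = expectation(lambda x: x ** 2, pmf) - mean ** 2

print(mean, var_definition, var_shortcut)
```

Exact fractions are used instead of floats so that the two sides can be compared with `==` without rounding error.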

2. Fitting a Naive Bayes spam filter by hand

Consider a Naive Bayes model for spam classification with the vocabulary V = {"secret", "offer", "low", "price", "valued", "customer", "today", "dollar", "million", "sports", "is", "for", "play", "healthy", "pizza"}. We have the following example spam messages: "million dollar offer", "secret offer today", "secret is secret"; and the following normal (non-spam) messages: "low price for valued customer", "play secret sports today", "sports is healthy", "low price pizza". Give the MLEs for the following parameters: θ_{spam}, θ_{secret|spam}, θ_{secret|non-spam}, θ_{sports|non-spam}, θ_{dollar|spam}.
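A hand computation can be cross-checked with a few lines of code. The question does not say which event model is intended, so the sketch below computes the word parameters under two common assumptions: a Bernoulli model (the fraction of documents in a class that contain the word) and a multinomial model (the fraction of all tokens in a class that are the word). The function names are hypothetical, and which of the two the assignment expects is an assumption to confirm against the course material.

```python
from collections import Counter
from fractions import Fraction

spam = ["million dollar offer", "secret offer today", "secret is secret"]
ham = ["low price for valued customer", "play secret sports today",
       "sports is healthy", "low price pizza"]


def bernoulli_mle(word, docs):
    """Fraction of documents in the class containing the word at least once."""
    return Fraction(sum(word in d.split() for d in docs), len(docs))


def multinomial_mle(word, docs):
    """Fraction of all tokens in the class that are the given word."""
    counts = Counter(tok for d in docs for tok in d.split())
    return Fraction(counts[word], sum(counts.values()))


# The class prior is the same under both event models.
theta_spam = Fraction(len(spam), len(spam) + len(ham))
print("theta_spam =", theta_spam)

queries = [("secret", "spam", spam), ("secret", "non-spam", ham),
           ("sports", "non-spam", ham), ("dollar", "spam", spam)]
for word, name, docs in queries:
    print(f"theta_{word}|{name}: Bernoulli = {bernoulli_mle(word, docs)}, "
          f"multinomial = {multinomial_mle(word, docs)}")
```

The two models disagree whenever a word repeats within a message: "secret is secret" counts once toward the Bernoulli estimate but twice toward the multinomial one.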

Machine Problem: Text Classification and Naive Bayes

1. Multinomial Naive Bayes Classifier

The goal of this assignment is for you to gain familiarity with the multinomial Naive Bayes classifier. Specifically, you will look into an existing Python-based implementation, fill out the missing code block, and explore an application of multinomial Naive Bayes to a multiclass text classification task.

In the homework package (HW1.tar.gz), you are provided with the starter code and a dataset. The code is written for Python 2.7 and uses NumPy.

There are two data files in the package: positive.review and negative.review. They correspond to positive and negative book reviews. The text has been preprocessed so that each line contains a review document; each token (e.g., year:2) represents a word and its frequency in the document. The last token (e.g., #label#:negative) in each line indicates the polarity (label) of the document.
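To make the line format concrete, here is a minimal sketch of a parser for one review line. It is not the reader shipped with the starter code: the function name `parse_review_line` is hypothetical, and the sketch assumes that tokens are separated by whitespace and that the count follows the last colon in each token.

```python
def parse_review_line(line):
    """Split one preprocessed review line into (word counts, label).

    Assumes whitespace-separated tokens of the form word:count, plus one
    token of the form #label#:polarity.
    """
    counts = {}
    label = None
    for token in line.split():
        # Split on the last colon so that a word containing ':' stays intact.
        key, sep, value = token.rpartition(":")
        if not sep:
            continue  # token without a colon; skipped in this sketch
        if key == "#label#":
            label = value
        else:
            counts[key] = int(value)
    return counts, label


# Made-up example line in the format described above.
example = "year:2 great_book:1 #label#:negative"
counts, label = parse_review_line(example)
print(counts, label)
```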

The starter code includes four files: linear_classifier.py, multinomial_naive_bayes.py, run_classifier.py, and sentiment_reader.py. The purpose of each file should be evident from its name.

The file multinomial_naive_bayes.py currently has a missing code block. Search for TODO in the file to find it. Your task is to fill in the missing code. Once the code is complete, running python run_classifier.py should print the following results: Accuracy on training set: 0.972500, on test set: 0.835000.
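For orientation, the general technique can be sketched independently of the starter code. The example below is a generic multinomial Naive Bayes with add-one (Laplace) smoothing in plain Python. It does not follow the starter code's interface, the function names and the toy data are hypothetical, and whether the starter code uses the same smoothing is an assumption, so it should not be expected to reproduce the accuracies quoted above.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods.

    docs is a list of (word_counts, label) pairs; alpha is the smoothing
    constant (alpha=1.0 gives add-one smoothing).
    """
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for counts, label in docs:
        class_doc_counts[label] += 1
        word_counts[label].update(counts)  # adds the per-word counts
        vocab.update(counts)

    n_docs = sum(class_doc_counts.values())
    log_prior = {c: math.log(n / n_docs) for c, n in class_doc_counts.items()}

    log_likelihood = {}
    for c in class_doc_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood


def predict(counts, log_prior, log_likelihood):
    """Return the class maximizing log prior + sum of count * log likelihood."""
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for w, n in counts.items():
            if w in log_likelihood[c]:  # words unseen in training are ignored
                score += n * log_likelihood[c][w]
        scores[c] = score
    return max(scores, key=scores.get)


# Made-up toy data, only to exercise the functions.
toy_docs = [({"great": 2, "fun": 1}, "positive"),
            ({"boring": 2, "dull": 1}, "negative"),
            ({"great": 1, "read": 1}, "positive"),
            ({"dull": 2, "read": 1}, "negative")]
log_prior, log_likelihood = train_multinomial_nb(toy_docs)
print(predict({"great": 1}, log_prior, log_likelihood))
print(predict({"dull": 2, "read": 1}, log_prior, log_likelihood))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which matters for long documents.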

Please submit: a report named report_firstname_lastname.pdf. Copy and paste the missing code block into the report.
