Explore an application of multinomial Naive Bayes

Reference no: EM132222472

CAP5610 Machine Learning Homework

Written Questions

1. Mean and Variance

The most familiar property of a distribution is its mean, or expected value, denoted by $\mu$ or $E[X]$. For discrete random variables (rv's), it is defined as $E[X] \triangleq \sum_{x \in \mathcal{X}} x\,p(x)$, and for continuous rv's, it is defined as $E[X] \triangleq \int x\,p(x)\,dx$. The variance is a measure of the "spread" of a distribution, denoted by $\sigma^2$. It is defined as $\mathrm{var}[X] \triangleq E[(X - \mu)^2]$. Show that $\sigma^2 = E[X^2] - E[X]^2$, assuming $X$ is a discrete random variable. Hint: Expand the definition of variance.
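The identity can be sanity-checked numerically, although a numerical check is not a proof. The sketch below (Python 3, not the Python 2.7 of the starter code) computes the variance of a discrete distribution both from the definition $E[(X - \mu)^2]$ and from $E[X^2] - E[X]^2$. The fair six-sided die and the helper names are illustrative choices, not part of the assignment; exact Fraction arithmetic avoids floating-point round-off.

```python
from fractions import Fraction


def mean(dist):
    """E[X] = sum over x of x * p(x), for a discrete distribution given as {x: p(x)}."""
    return sum(x * p for x, p in dist.items())


def variance_by_definition(dist):
    """var[X] = E[(X - mu)^2], computed directly from the definition."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def variance_by_identity(dist):
    """var[X] = E[X^2] - E[X]^2, the identity the exercise asks you to prove."""
    second_moment = sum(x ** 2 * p for x, p in dist.items())
    return second_moment - mean(dist) ** 2


# Fair six-sided die: p(x) = 1/6 for x in {1, ..., 6} (illustrative choice).
die = {x: Fraction(1, 6) for x in range(1, 7)}

print(mean(die))                    # 7/2
print(variance_by_definition(die))  # 35/12
print(variance_by_identity(die))    # 35/12
```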

2. Fitting a Naive Bayes Spam Filter by Hand

Consider a Naive Bayes model for spam classification with the vocabulary V = {"secret", "offer", "low", "price", "valued", "customer", "today", "dollar", "million", "sports", "is", "for", "play", "healthy", "pizza"}. We have the following example spam messages: "million dollar offer", "secret offer today", "secret is secret"; and the following normal (non-spam) messages: "low price for valued customer", "play secret sports today", "sports is healthy", "low price pizza". Give the MLEs for the following parameters: $\theta_{\text{spam}}$, $\theta_{\text{secret}|\text{spam}}$, $\theta_{\text{secret}|\text{non-spam}}$, $\theta_{\text{sports}|\text{non-spam}}$, $\theta_{\text{dollar}|\text{spam}}$.
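The counts behind these estimates can be checked with a short script. The exercise does not say which Naive Bayes event model is intended, so this sketch, an illustration rather than an official solution, computes both common choices: the Bernoulli (document-presence) estimate, i.e. the fraction of a class's messages that contain the word, and the multinomial (token-count) estimate, i.e. the fraction of all tokens in the class equal to the word. Messages are assumed to be tokenized on whitespace, and the function names are hypothetical.

```python
from fractions import Fraction

spam = ["million dollar offer", "secret offer today", "secret is secret"]
non_spam = ["low price for valued customer", "play secret sports today",
            "sports is healthy", "low price pizza"]


def class_prior(in_class, other):
    """MLE of P(class): fraction of training messages that belong to the class."""
    return Fraction(len(in_class), len(in_class) + len(other))


def bernoulli_mle(word, docs):
    """Bernoulli event model: fraction of messages containing the word at least once."""
    return Fraction(sum(word in doc.split() for doc in docs), len(docs))


def multinomial_mle(word, docs):
    """Multinomial event model: fraction of all tokens in the class equal to the word."""
    tokens = [tok for doc in docs for tok in doc.split()]
    return Fraction(tokens.count(word), len(tokens))


print("theta_spam", class_prior(spam, non_spam))
for name, word, docs in [("secret|spam", "secret", spam),
                         ("secret|non-spam", "secret", non_spam),
                         ("sports|non-spam", "sports", non_spam),
                         ("dollar|spam", "dollar", spam)]:
    print(name, "bernoulli:", bernoulli_mle(word, docs),
          "multinomial:", multinomial_mle(word, docs))
```

The two event models can disagree: "secret" appears twice in the single message "secret is secret", which raises its token count but not its document count, so it is worth confirming which model the course intends.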

Machine Problem: Text Classification and Naive Bayes

1. Multinomial Naive Bayes Classifier

The goal of this assignment is for you to gain familiarity with the multinomial Naive Bayes classifier. Specifically, you will examine an existing Python implementation, fill in the missing code block, and explore an application of multinomial Naive Bayes to a multiclass text classification task.

The homework package (HW1.tar.gz) contains the starter code and a dataset. The code is written in Python 2.7 and uses NumPy.

There are two data files in the package: positive.review and negative.review. They correspond to positive and negative book reviews. The text has been preprocessed so that each line contains a review document; each token (e.g., year:2) represents a word and its frequency in the document. The last token (e.g., #label#:negative) in each line indicates the polarity (label) of the document.
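A minimal parser for one line of this format might look like the sketch below. It assumes tokens are separated by whitespace and that each token's frequency follows its last colon; sentiment_reader.py in the starter code is the authoritative reader, and parse_review is a hypothetical name, not a function from that file. The example line is made up, not taken from the dataset.

```python
from collections import Counter


def parse_review(line):
    """Parse 'word:count ... #label#:polarity' into (Counter of word counts, label)."""
    tokens = line.split()
    # The last token carries the label, e.g. '#label#:negative'.
    label = tokens[-1].rsplit(":", 1)[1]
    counts = Counter()
    for token in tokens[:-1]:
        # Split on the last colon so a word that itself contains ':' stays intact.
        word, freq = token.rsplit(":", 1)
        counts[word] += int(freq)
    return counts, label


example = "year:2 great:1 book:3 #label#:positive"  # made-up line for illustration
counts, label = parse_review(example)
print(label)           # positive
print(counts["book"])  # 3
```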

The starter code includes four files: linear_classifier.py, multinomial_naive_bayes.py, run_classifier.py, and sentiment_reader.py. The functionality of each file should be self-evident.

The file multinomial_naive_bayes.py currently has a missing code block; search for TODO in the file to find it. Your task is to fill in the missing code. Once the code is complete, running python run_classifier.py should print the following results: Accuracy on training set: 0.972500, on test set: 0.835000.
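The starter file is not reproduced here, so the following is a standalone NumPy sketch of multinomial Naive Bayes training and prediction, not the missing block itself. It assumes a document-by-word count matrix X, integer labels y, and add-alpha (Laplace) smoothing of the word probabilities; the function names, the alpha parameter, and the toy data are hypothetical. Whether and how the starter code smooths affects the exact accuracies quoted above. It targets Python 3 (the @ operator), unlike the Python 2.7 starter code.

```python
import numpy as np


def fit_multinomial_nb(X, y, alpha=1.0):
    """Estimate log class priors and log word probabilities.

    X: (n_docs, n_words) array of word counts; y: (n_docs,) integer labels.
    alpha: add-alpha smoothing constant (alpha=1 is Laplace smoothing).
    """
    classes = np.unique(y)
    log_prior = np.empty(len(classes))
    log_likelihood = np.empty((len(classes), X.shape[1]))
    for i, c in enumerate(classes):
        Xc = X[y == c]
        # MLE of the prior: fraction of training documents in class c.
        log_prior[i] = np.log(Xc.shape[0] / X.shape[0])
        # (count(w, c) + alpha) / (total word count in c + alpha * |V|)
        word_counts = Xc.sum(axis=0) + alpha
        log_likelihood[i] = np.log(word_counts / word_counts.sum())
    return classes, log_prior, log_likelihood


def predict_multinomial_nb(X, classes, log_prior, log_likelihood):
    """For each document, pick argmax_c log P(c) + sum_w x_w * log P(w | c)."""
    scores = X @ log_likelihood.T + log_prior
    return classes[np.argmax(scores, axis=1)]


# Made-up toy data: vocabulary ["good", "bad", "book"]; 1 = positive, 0 = negative.
X = np.array([[3, 0, 1], [2, 0, 2], [0, 3, 1], [0, 2, 2]])
y = np.array([1, 1, 0, 0])
params = fit_multinomial_nb(X, y)
print(predict_multinomial_nb(np.array([[1, 0, 1], [0, 1, 1]]), *params))  # [1 0]
```

Working in log space keeps the per-document score a sum rather than a product of many small probabilities, which avoids numerical underflow on long reviews.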

Please submit a report named report_firstname_lastname.pdf, and copy and paste the missing code block into the report.

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd