Compute the constants of the classification functions

Reference no: EM131926092

Problem

Detecting Spam E-mail (from the UCI Machine Learning Repository). A team at Hewlett-Packard collected data on a large number of e-mail messages from their postmaster and personal e-mail for the purpose of finding a classifier that can separate spam messages from non-spam (a.k.a. "ham") messages. The spam concept is diverse: it includes advertisements for products or websites, "make money fast" schemes, chain letters, pornography, and so on. The definition used here is "unsolicited commercial e-mail." The file Spambase.csv contains information on 4601 e-mail messages, of which 1813 are tagged "spam." There are 57 predictors, most of which are the average number of times a certain word (e.g., mail, George) or symbol (e.g., #, !) appears in the e-mail. A few predictors relate to the number and length of capitalized words.

a. To reduce the number of predictors to a manageable size, examine how each predictor differs between the spam and non-spam e-mails by comparing the spam-class average and non-spam-class average. Which are the 11 predictors that appear to vary the most between spam and non-spam e-mails? From these 11, which words or signs occur more often in spam?
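One way to approach part (a) is to compute the two class averages for every predictor and rank the predictors by the absolute gap. The sketch below does this in plain Python on a tiny made-up table; the column names and values are illustrative assumptions, not figures from Spambase.csv, and the function name `rank_by_class_mean_gap` is hypothetical. Note that a raw difference of means depends on each predictor's scale, so the capitalized-word length predictors may dominate unless the gaps are standardized first.

```python
from statistics import mean

def rank_by_class_mean_gap(rows, labels, names, k):
    """Return the k predictors with the largest |spam mean - non-spam mean|.

    rows   : list of records, each a list of predictor values
    labels : 1 for spam, 0 for non-spam (same order as rows)
    names  : predictor names, one per column
    """
    spam = [r for r, y in zip(rows, labels) if y == 1]
    ham = [r for r, y in zip(rows, labels) if y == 0]
    ranked = []
    for j, name in enumerate(names):
        spam_avg = mean(r[j] for r in spam)
        ham_avg = mean(r[j] for r in ham)
        # A positive gap means the word or symbol is more frequent in spam.
        ranked.append((name, spam_avg, ham_avg, spam_avg - ham_avg))
    ranked.sort(key=lambda t: abs(t[3]), reverse=True)
    return ranked[:k]

# Made-up toy data (NOT taken from Spambase.csv).
names = ["word_free", "word_george", "char_exclaim"]
rows = [
    [0.50, 0.00, 0.60],
    [0.30, 0.00, 0.40],
    [0.00, 1.20, 0.10],
    [0.10, 0.80, 0.10],
]
labels = [1, 1, 0, 0]

top = rank_by_class_mean_gap(rows, labels, names, k=2)
for name, spam_avg, ham_avg, gap in top:
    side = "spam" if gap > 0 else "non-spam"
    print(f"{name}: spam avg={spam_avg:.2f}, non-spam avg={ham_avg:.2f}, higher in {side}")
```

For the actual exercise, the same function would be called with the 57 Spambase columns and k=11; the sign of each gap then answers which words or symbols occur more often in spam.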

b. Partition the data into training and validation sets, then perform a discriminant analysis on the training data using only the 11 predictors.

c. If we are interested mainly in detecting spam messages, is this model useful? Use the confusion matrix, lift chart, and decile chart for the validation set for the evaluation.
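The three evaluation tools named in part (c) can be sketched as follows, assuming the fitted model has already produced a spam score (for example, an estimated probability of spam) for each validation record. The scores, labels, and the 0.5 cutoff below are made-up illustrations, and the function names are hypothetical. The decile function assumes there are at least as many records as bins.

```python
def confusion_matrix(actual, predicted):
    """Counts keyed by (actual, predicted); 1 = spam, 0 = non-spam."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts

def cumulative_gains(actual, scores):
    """Cumulative spam count with records sorted by score, highest first.

    Plotting this against 1..n gives the lift (gains) curve.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    gains, running = [], 0
    for i in order:
        running += actual[i]
        gains.append(running)
    return gains

def decile_lift(actual, scores, bins=10):
    """Spam rate in each score-ordered bin divided by the overall spam rate."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    overall = sum(actual) / len(actual)
    n = len(order)
    lifts = []
    for b in range(bins):
        chunk = order[b * n // bins:(b + 1) * n // bins]
        rate = sum(actual[i] for i in chunk) / len(chunk)
        lifts.append(rate / overall)
    return lifts

# Made-up validation scores and labels (NOT model output from Spambase.csv).
actual = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cutoff is an assumption

cm = confusion_matrix(actual, predicted)
sensitivity = cm[(1, 1)] / (cm[(1, 1)] + cm[(1, 0)])  # share of spam detected
gains = cumulative_gains(actual, scores)
lifts = decile_lift(actual, scores, bins=5)  # 5 bins because the toy set has 10 rows
print(cm, sensitivity, gains, lifts)
```

Because the question focuses on detecting spam, the sensitivity and the lift in the top bins are the quantities to look at, rather than overall accuracy alone.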

d. In the sample, almost 40% of the e-mail messages were tagged as spam. However, suppose that the actual proportion of spam messages in these e-mail accounts is 10%. Compute the constants of the classification functions to account for this information.
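A common way to account for prior probabilities in linear discriminant analysis is to add ln(p_j) to the constant of the classification function for class j, where p_j is the assumed proportion of class j. The sketch below applies that shift. Whether the software's reported constants already contain the log of the sample proportions depends on the tool, so the function takes an optional `old_priors` argument that removes them first. The starting constants are placeholders chosen for illustration, not results of an analysis on Spambase.csv, and the function name is hypothetical.

```python
import math

def adjust_constants(constants, new_priors, old_priors=None):
    """Shift each class's classification-function constant to new priors.

    constants  : dict class -> constant reported by the software
    new_priors : dict class -> prior probability to build in
    old_priors : priors already included in the constants, or None if the
                 constants were computed without priors (equal priors)
    """
    adjusted = {}
    for cls, c in constants.items():
        if old_priors is not None:
            c -= math.log(old_priors[cls])  # undo the priors already baked in
        adjusted[cls] = c + math.log(new_priors[cls])
    return adjusted

# Placeholder constants (hypothetical, NOT fitted values).
constants = {"spam": -4.0, "nonspam": -1.5}

# Assumed true proportions from part (d): 10% spam, 90% non-spam.
true_priors = {"spam": 0.10, "nonspam": 0.90}

adjusted = adjust_constants(constants, true_priors)
print(adjusted)

# If the software had used the sample proportions instead, remove them first.
sample_priors = {"spam": 1813 / 4601, "nonspam": 2788 / 4601}
readjusted = adjust_constants(constants, true_priors, old_priors=sample_priors)
print(readjusted)
```

Only the constants change; the predictor coefficients stay as fitted. Lowering the spam prior from about 39% to 10% lowers the spam constant relative to the non-spam constant, so fewer messages are classified as spam.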

e. A spam filter based on your model is used, so that only messages classified as non-spam are delivered, while messages classified as spam are quarantined. In this case, misclassifying a non-spam e-mail (as spam) has much more serious consequences. Suppose that the cost of quarantining a non-spam e-mail is 20 times that of not detecting a spam message. Compute the constants of the classification functions to account for these costs (assume that the proportion of spam is reflected correctly by the sample proportion).
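For two classes, misclassification costs can be handled the same way as priors: add ln(p_j * q_j) to the constant for class j, where p_j is the prior and q_j is the cost of misclassifying a member of class j. Here q is 20 for non-spam (quarantining a legitimate message) and 1 for spam (letting a spam message through), and the priors are the sample proportions 1813/4601 and 2788/4601. The sketch below computes the shifts; it assumes the reported constants were computed without priors or costs, and the starting constants are placeholders rather than fitted values.

```python
import math

# Sample proportions from the problem statement.
priors = {"spam": 1813 / 4601, "nonspam": 2788 / 4601}

# Cost of misclassifying a member of each class, in relative units.
# Quarantining a non-spam message costs 20 times as much as missing a spam.
costs = {"spam": 1.0, "nonspam": 20.0}

# Amount to add to each class's constant: ln(prior * cost).
shifts = {cls: math.log(priors[cls] * costs[cls]) for cls in priors}

# Placeholder constants (hypothetical, NOT fitted values), assumed to
# contain no prior or cost term yet.
constants = {"spam": -4.0, "nonspam": -1.5}
cost_adjusted = {cls: constants[cls] + shifts[cls] for cls in constants}

for cls in constants:
    print(f"{cls}: shift={shifts[cls]:+.4f}, adjusted constant={cost_adjusted[cls]:+.4f}")
```

The non-spam constant rises by ln(20) relative to a priors-only adjustment, which makes the filter more reluctant to label a message as spam.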






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd