Examine the term-document matrix

Reference no: EM131926008

Problem

Classifying Classified Ads Submitted Online. Consider the case of a website that caters to the needs of a specific farming community, and carries classified ads intended for that community. Anyone, including robots, can post an ad via a web interface, and the site owners have problems with ads that are fraudulent, spam, or simply not relevant to the community. They have provided a file with 4143 ads, each ad in a row, and each ad labeled as either -1 (not relevant) or 1 (relevant). The goal is to develop a predictive model that can classify ads automatically.

• Open the file farm-ads.csv, and briefly review some of the relevant and non-relevant ads to get a flavor for their contents.

• Following the example in the chapter, preprocess the data in R, and create a term-document matrix and a concept matrix. Limit the number of concepts to 20.
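To make the term-document matrix concrete, here is a minimal sketch of how one is built and how its sparsity can be measured. It is written in Python with only the standard library, not in the R workflow the chapter uses, and it makes several assumptions: the four ads are invented stand-ins for rows of farm-ads.csv, tokens are split on whitespace with no stopword removal or stemming, and entries are raw term counts rather than whatever weighting the chapter applies.

```python
from collections import Counter

# Toy ads standing in for rows of farm-ads.csv (hypothetical text, not from the file).
ads = [
    "tractor for sale low hours",
    "cheap watches buy now",
    "hay bales for sale",
    "used tractor parts and hay equipment",
]

# Tokenize on whitespace; real preprocessing would also remove stopwords and stem.
tokenized = [ad.split() for ad in ads]
vocabulary = sorted({term for tokens in tokenized for term in tokens})

# Rows are terms, columns are documents, entries are raw term counts.
counts = [Counter(tokens) for tokens in tokenized]
tdm = [[c[term] for c in counts] for term in vocabulary]

# Sparsity is the share of entries that are zero.
nonzero = sum(1 for row in tdm for value in row if value > 0)
total = len(vocabulary) * len(ads)
sparsity = 1 - nonzero / total

print(f"{len(vocabulary)} terms x {len(ads)} documents, sparsity = {sparsity:.2f}")
```

A non-zero entry in row "tractor" and column 1 simply says that the term "tractor" occurs in the first ad. With thousands of ads and a large vocabulary, most terms are absent from most ads, so the share of zero entries is typically far higher than in this toy case.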

a. Examine the term-document matrix. i. Is it sparse or dense? ii. Find two non-zero entries and briefly interpret their meaning in words (you do not need to derive their calculation).

b. Briefly explain the difference between the term-document matrix and the concept-document matrix. Relate the latter to what you learned in the principal components chapter (Chapter 4).

c. Partition the data (60% training, 40% validation) and use logistic regression to develop a model that classifies the documents as ‘relevant’ or ‘non-relevant.’ Comment on its efficacy.

d. Why use the concept-document matrix, and not the term-document matrix, to provide the predictor variables?
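The partitioning and modeling step in part (c) can be sketched as follows. This is an illustration in standard-library Python, not the R code the chapter expects, and it rests on stated assumptions: the data are synthetic concept scores rather than the real concept-document matrix, only 2 concepts are used instead of 20, the -1/1 labels are recoded to 0/1, and the model is fit by plain batch gradient descent rather than by a statistical package.

```python
import math
import random

random.seed(1)

# Synthetic stand-in for the concept-document matrix: each row is one ad
# described by 2 concept scores (the assignment uses 20), with label 1
# (relevant) or 0 (not relevant, recoded from -1).
def make_ad(label):
    center = 1.0 if label == 1 else -1.0
    return [random.gauss(center, 1.0), random.gauss(center, 1.0)], label

data = [make_ad(i % 2) for i in range(200)]

# Shuffle, then split 60% training / 40% validation.
random.shuffle(data)
cut = int(0.6 * len(data))
train, valid = data[:cut], data[cut:]

def predict_prob(weights, bias, x):
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Fit logistic regression by batch gradient descent on the log-loss.
weights, bias, rate = [0.0, 0.0], 0.0, 0.1
for _ in range(500):
    grad_w, grad_b = [0.0, 0.0], 0.0
    for x, y in train:
        error = predict_prob(weights, bias, x) - y
        grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
        grad_b += error
    weights = [w - rate * g / len(train) for w, g in zip(weights, grad_w)]
    bias -= rate * grad_b / len(train)

# Validation accuracy at a 0.5 probability cutoff.
correct = sum((predict_prob(weights, bias, x) >= 0.5) == (y == 1) for x, y in valid)
accuracy = correct / len(valid)
print(f"validation accuracy: {accuracy:.2f}")
```

The sketch also hints at the reasoning behind part (d): the model estimates one coefficient per predictor, so a handful of concept scores is far easier to fit on a few thousand ads than one predictor per term in the vocabulary.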
