Perform data mining steps on the given dataset

Reference no: EM13934402

Data Mining Project

In this project you will use the Sentiment Labelled Sentences dataset, available at the following link: https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences

This dataset contains review sentences labeled (classified) as positive or negative, such as the following two sentences from IMDb movie reviews:

Wasted two hours. 0
Saw the movie today and thought it was a good effort, good messages for kids. 1

If a sentence is labeled 0, it is a negative comment; if it is labeled 1, it is a positive comment. There are 3 different files (imdb_labelled.txt, amazon_cells_labelled.txt, yelp_labelled.txt), each containing 500 positive and 500 negative sentences. (The Amazon and Yelp datasets contain more instances, but only the ones labeled 0 or 1 should be considered.) This data is used in the following paper: Dimitrios Kotzias, Misha Denil, Nando de Freitas, Padhraic Smyth: From Group to Individual Labels Using Deep Features. KDD 2015: 597-606
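As a starting point, the files can be read into (sentence, label) pairs. The following is a minimal sketch that assumes each line holds a sentence, a tab character, and then the label; check this against the actual files before relying on it. The name `parse_labelled` is hypothetical, and the sample text reuses the two IMDb sentences quoted above.

```python
from typing import List, Tuple

# Two sample lines in the ASSUMED "sentence<TAB>label" layout.
SAMPLE = (
    "Wasted two hours.\t0\n"
    "Saw the movie today and thought it was a good effort, "
    "good messages for kids.\t1\n"
)


def parse_labelled(text: str) -> List[Tuple[str, int]]:
    """Return (sentence, label) pairs, keeping only rows labelled 0 or 1."""
    rows = []
    for line in text.splitlines():
        # Split on the LAST tab so tabs inside a sentence do not break parsing.
        sentence, sep, label = line.rpartition("\t")
        label = label.strip()
        if sep and label in ("0", "1"):
            rows.append((sentence.strip(), int(label)))
    return rows


rows = parse_labelled(SAMPLE)
for sentence, label in rows:
    print(label, sentence)
```

To load a real file, the same function could be applied to the result of `open("imdb_labelled.txt", encoding="utf-8").read()`. Rows without a 0 or 1 label are skipped, which matches the instruction to consider only labeled instances.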

You will perform data mining steps (data preprocessing, classification) on this dataset and write your results in a project report in the form of an IEEE conference paper.

Steps:

a. Literature review: You should read the following paper to learn what has been done before on this problem: https://www.cs.cornell.edu/home/llee/papers/sentiment.pdf. You should summarize this work in your own words, and this summary will go in the "Related Work" section of your paper.

b. Dataset characteristics: Describe the data, including its size, the training and test sets, the number of attributes, the attribute list, the types of the attributes, the ranges of the attributes, etc. In this dataset, each distinct word should be considered as an attribute/feature.

c. Data preprocessing: Normalization, missing values, outlier detection, smoothing, attribute reduction/attribute selection, sampling etc.

d. Data mining tasks (Classification): Use Weka (preferred) or any other data mining tool. Perform classification experiments using different algorithms, including at least decision trees, naïve Bayes, and rule learning. Analyze performance with the measures covered in the lecture. Discuss the results.
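Steps b to d can be sketched as a small pipeline: tokenize each sentence, treat each distinct word as an attribute, reduce the attribute set, and export word-count vectors as ARFF text that Weka can open. This is only an illustration under stated assumptions. The function names, the document-frequency threshold `min_df`, the tiny stop-word list, and the `w_` attribute prefix are all hypothetical choices, and the last two toy sentences are invented for the example.

```python
import re
from collections import Counter

# Toy rows: the first two come from the assignment text, the last two are invented.
rows = [
    ("Wasted two hours.", 0),
    ("Saw the movie today and thought it was a good effort, "
     "good messages for kids.", 1),
    ("A good movie.", 1),
    ("Two hours of nothing.", 0),
]


def tokenize(sentence):
    """Lowercase the sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def select_attributes(docs, min_df=2, stop_words=frozenset({"a", "the", "of"})):
    """Keep words that occur in at least min_df sentences, minus stop words."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each word once per sentence
    return sorted(w for w, n in df.items() if n >= min_df and w not in stop_words)


def to_arff(rows, attributes, relation="sentiment"):
    """Render word-count vectors as ARFF text with a nominal class attribute."""
    lines = [f"@relation {relation}", ""]
    # The w_ prefix avoids a clash between a word and the class attribute name.
    lines += [f"@attribute w_{w} numeric" for w in attributes]
    lines += ["@attribute class {0,1}", "", "@data"]
    for sentence, label in rows:
        counts = Counter(tokenize(sentence))
        values = [str(counts[w]) for w in attributes]
        lines.append(",".join(values + [str(label)]))
    return "\n".join(lines) + "\n"


docs = [tokenize(s) for s, _ in rows]
attributes = select_attributes(docs)
arff_text = to_arff(rows, attributes)
print(arff_text)
```

Writing `arff_text` to a file gives a dataset that can be loaded in the Weka Explorer. Weka's J48, NaiveBayes, and JRip classifiers are common picks for the decision tree, naïve Bayes, and rule learning experiments, although the assignment does not name specific implementations.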

Project Paper: Write your project report in the form of a conference paper. (January 22, 2016) Follow the IEEE template here: https://www.ieee.org/publications_standards/publications/conferences/2014_04_msw_usltr_format.doc.

Your paper should contain the following sections:

1. Abstract: one paragraph summary of your paper

2. Introduction: Describe the sentiment classification problem and explain why it is important to classify sentiments (give the motivation). Finally, state the contributions of your work in this paper.

3. Related work: You should write the summary of the paper from step a (https://www.cs.cornell.edu/home/llee/papers/sentiment.pdf).

4. Sentiment Classification: Describe the data mining steps that you performed in steps b, c, and d, excluding the classification results.

5. Experimental Results: Report the classification results using the measures covered in the lecture, and discuss the results in this section.

6. Conclusion: Briefly summarize the paper and state your opinions about what can be done to improve classification accuracy further.
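For the Experimental Results section, the source does not say which measures were covered in the lecture. The sketch below assumes the usual set for binary classification (accuracy, precision, recall, and F1) and computes them from true and predicted labels. The function name `evaluate` and the toy label lists are hypothetical and are not results from the dataset.

```python
def evaluate(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and common binary measures."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": accuracy, "precision": precision,
        "recall": recall, "f1": f1,
    }


# Toy labels for illustration only (NOT experimental results).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = evaluate(y_true, y_pred)
for name, value in metrics.items():
    print(name, value)
```

Weka reports comparable figures in its classifier output, so a helper like this is mainly useful when a different tool is used or when results from several algorithms need to be tabulated in one place.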




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd