Perform data mining steps on the given dataset

Reference no: EM13934402

Data Mining Project

In this project you will use the Sentiment Labelled Sentences dataset, available at the following link: https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences. This dataset contains review sentences labeled (classified) as positive or negative, such as the following two sentences from IMDb movie reviews:

Wasted two hours. 0
Saw the movie today and thought it was a good effort, good messages for kids. 1

A sentence labeled 0 is a negative comment; a sentence labeled 1 is a positive comment. There are 3 different files (imdb_labelled.txt, amazon_cells_labelled.txt, yelp_labelled.txt), each containing 500 positive and 500 negative sentences. (The Amazon and Yelp datasets contain more instances, but only the ones labeled 0 or 1 should be considered.) This data is used in the following paper: Dimitrios Kotzias, Misha Denil, Nando de Freitas, Padhraic Smyth: From Group to Individual Labels Using Deep Features. KDD 2015: 597-606
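The labeling rule above can be sketched as a small loader. This is a minimal illustration, not part of the assignment: it assumes each line holds a sentence followed by a tab and then the label (the separator is an assumption; check the downloaded files), and the function name `parse_labelled_lines` is hypothetical.

```python
from typing import List, Tuple


def parse_labelled_lines(lines: List[str]) -> List[Tuple[str, int]]:
    """Parse 'sentence<TAB>label' lines, keeping only labels 0 and 1.

    Assumption: sentence and label are separated by a tab character.
    """
    examples = []
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # no label column: ignore this instance
        # Split on the LAST tab so tabs inside the sentence are preserved.
        sentence, _, label = line.rpartition("\t")
        label = label.strip()
        if label in ("0", "1"):
            examples.append((sentence.strip(), int(label)))
    return examples


# The two example sentences from the brief, plus one unlabeled line.
sample = [
    "Wasted two hours.\t0\n",
    "Saw the movie today and thought it was a good effort, good messages for kids.\t1\n",
    "A sentence without any label\n",
]
parsed = parse_labelled_lines(sample)
```

To read a real file, the same function could be applied to `open("imdb_labelled.txt", encoding="utf-8").readlines()`, assuming the file sits in the working directory.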

You will perform data mining steps (data preprocessing, classification) on this dataset and write your results in a project report in the form of an IEEE conference paper.

Steps:

a. Literature review: You should read the following paper to learn what has been done before on this problem: https://www.cs.cornell.edu/home/llee/papers/sentiment.pdf. Summarize this work in your own words; this summary will go in the "Related Work" section of your paper.

b. Dataset characteristics: Data description, size, training, test, number of attributes, attribute lists, type of attributes, range of attributes, etc. In this dataset, each distinct word should be considered as an attribute/feature.

c. Data preprocessing: Normalization, missing values, outlier detection, smoothing, attribute reduction/attribute selection, sampling, etc.

d. Data mining tasks (Classification): Use Weka (preferred) or any other data mining tool. Perform classification experiments using different algorithms, including at least decision trees, naïve Bayes, and rule learning. Analyze performance with the measures covered in the lecture and discuss the results.
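Steps b to d can be sketched end to end in a few lines. The following is a minimal illustration under stated assumptions, not a substitute for the Weka experiments the brief prefers: it treats each distinct word as an attribute, applies a simple document-frequency threshold as one possible form of attribute reduction, and trains a hand-written multinomial naïve Bayes classifier with add-one smoothing. The names `tokenize`, `select_attributes`, and `NaiveBayes` are hypothetical, and every training sentence except "Wasted two hours." is made-up toy data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def select_attributes(sentences, min_df=1):
    """Steps b/c: one attribute per distinct word, keeping only words that
    occur in at least min_df sentences (a simple attribute reduction)."""
    doc_freq = Counter()
    for s in sentences:
        doc_freq.update(set(tokenize(s)))
    return {w for w, n in doc_freq.items() if n >= min_df}


class NaiveBayes:
    """Step d: multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels, vocab):
        self.vocab = vocab
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for s, y in zip(sentences, labels):
            self.word_counts[y].update(w for w in tokenize(s) if w in vocab)
        return self

    def predict(self, sentence):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for y, n in self.class_counts.items():
            score = math.log(n / total)  # log prior of class y
            denom = sum(self.word_counts[y].values()) + len(self.vocab)
            for w in tokenize(sentence):
                if w in self.vocab:  # words outside the vocabulary are ignored
                    score += math.log((self.word_counts[y][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = y, score
        return best_label


# Toy training data (illustrative only; not taken from the dataset files).
train_sentences = [
    "Wasted two hours.",
    "Terrible plot and bad acting.",
    "A good effort with good messages.",
    "Great movie, loved it.",
]
train_labels = [0, 0, 1, 1]

vocab = select_attributes(train_sentences, min_df=1)
model = NaiveBayes().fit(train_sentences, train_labels, vocab)
pred_positive = model.predict("good movie")
pred_negative = model.predict("bad plot")
```

On the real data, `min_df` would likely be raised above 1 to shrink the attribute list, and predictions should be made on held-out test sentences rather than on the training set.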

Project Paper: Write your project report in the form of a conference paper. (January 22, 2016) Follow the IEEE template here: https://www.ieee.org/publications_standards/publications/conferences/2014_04_msw_usltr_format.doc.

Your paper should contain the following sections:

1. Abstract: one paragraph summary of your paper

2. Introduction: Describe the sentiment classification problem and explain why classifying sentiment is important (give the motivation). Finally, state the contributions of your work in this paper.

3. Related work: You should write the summary of the paper from step a (https://www.cs.cornell.edu/home/llee/papers/sentiment.pdf).

4. Sentiment Classification: Describe the data mining steps that you performed in steps b, c, and d, excluding the classification results.

5. Experimental Results: Report the classification results using the measures covered in the lecture, and discuss the results in this section.

6. Conclusion: Briefly summarize the paper and state your opinions about what can be done to improve classification accuracy further.
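For the Experimental Results section, the brief does not say which measures the lecture covered. As a hedged sketch, the function below computes four measures commonly reported for binary classification (accuracy, precision, recall, F1) from true and predicted labels; the function name `binary_metrics` and the label lists are hypothetical illustrations, not real results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1,
            "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn}}


# Made-up labels purely to exercise the function.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

Weka reports comparable per-class figures in its classifier output, so a function like this is mainly useful when another tool is chosen.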






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd