Determine the best match for a token

Reference no: EM131635118

Knowledge Technologies Project: Lexical Normalisation of Twitter Data

Overview

The goal of this Project is to assess the performance of some spelling correction methods on the problem of tweet normalisation, and to express the knowledge that you have gained in a technical report. This aims to reinforce concepts in approximate matching and evaluation, and to strengthen your skills in data analysis and problem solving.

Deliverables

1. One or more programs, implemented in one or more programming languages, which must:

  • Determine the best match(es) for a token, with respect to a reference collection (dictionary)
  • Process the data input file(s), to determine the best match for each token
  • Evaluate the matches, with respect to the truly intended words, using one or more evaluation metrics

2. A README that briefly details how your program(s) work(s). You may use any external resources for your program(s) that you wish: you must indicate these, and where you obtained them, in your README. The program(s) and README are required submission elements, but will not typically be directly assessed.

3. A technical report, of 1000-1600 words, which must:

  • Give a short description of the problem and data set
  • Briefly summarise some relevant literature
  • Briefly explain the approximate matching technique(s), and how it is (they are) used
  • Present the results, in terms of the evaluation metric(s) and illustrative examples
  • Contextualise the system's behaviour, based on the (admittedly incomplete) understanding from the subject materials
  • Clearly demonstrate some knowledge about the problem
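
As an illustration of the first deliverable, the following is a minimal sketch of one possible approximate matching technique: Levenshtein (edit) distance with unit costs, used to return every dictionary entry tied for the smallest distance to a token. The brief does not prescribe this method; the function names `edit_distance` and `best_matches` and the tiny example dictionary are assumptions made for illustration only.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs for insert, delete, and replace."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (cost 0) or replace (cost 1)
            ))
        prev = curr
    return prev[-1]


def best_matches(token: str, dictionary: List[str]) -> List[str]:
    """Return every dictionary entry tied for the smallest distance to token."""
    best: List[str] = []
    best_dist = None
    for word in dictionary:
        d = edit_distance(token, word)
        if best_dist is None or d < best_dist:
            best, best_dist = [word], d
        elif d == best_dist:
            best.append(word)
    return best


if __name__ == "__main__":
    # Hypothetical four-word dictionary; a real run would load the full word list.
    print(best_matches("makn", ["making", "make", "man", "sense"]))
```

With this toy dictionary, the token "makn" ties between "make" and "man" (distance 1 each), while the intended word "making" is at distance 2. Ties and near misses of this kind are the sort of behaviour the report is asked to discuss.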

Terms of Use

By using this data, you are becoming part of the research community - consequently, as part of your commitment to Academic Honesty, you must cite the curators of the dataset in your report, as the following publication:

Bo Han and Timothy Baldwin (2011) Lexical normalisation of short text messages: Makn sens a #twitter. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, USA. pp. 368-378.

Reports that do not cite this work constitute plagiarism, and will be correspondingly assigned a mark of 0.

Please note that the dataset is a sub-sample of actual data posted to Twitter, with almost no filtering whatsoever. Unfortunately, the Internet is a place where freedom of speech is both empowering and harmful: consequently, some of the information expressed in the tweets is undoubtedly in poor taste. We would ask you to please look beyond this to the task at hand, as much as possible. (For example, it is generally not necessary to actually read the tweets themselves.)

The opinions expressed within the tweets in no way express the official views of the University of Melbourne or any of its employees; using the data in a teaching capacity does not constitute endorsement of the views expressed within. The University accepts no responsibility for offence caused by any content contained within this data.

COMP90049 Knowledge Technologies Project: Lexical Normalisation of Twitter Data.

Submission materials: source code, README, and PDF report. You will attempt a representative sample of approximate matching techniques, which is adequate for deriving some knowledge about the problem of tweet normalisation. You will evaluate your method(s) formally.

You will explain the practical behaviour of your systems, referring to the theoretical behaviour where appropriate. You will support your observations with evidence, in terms of illustrative examples and evaluation metrics. You will derive some knowledge about the problem of tweet normalisation. You will produce a formal report, which is commensurate in style and structure with a (short) research paper. You must express your ideas clearly and concisely, and remain within the word limit (1000-1600 words). You will include a short summary of related research. We will post a marking rubric to indicate what we will be looking for in each of these categories when marking.
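
The formal evaluation mentioned above can be sketched as follows. The brief leaves the choice of metric open, so this is only one common convention for systems that may return several candidates per token: precision is the fraction of returned candidates that are correct, and recall is the fraction of tokens whose intended word appears among the candidates. The function name `evaluate` and the sample data are hypothetical.

```python
from typing import List, Tuple


def evaluate(predictions: List[List[str]], gold: List[str]) -> Tuple[float, float]:
    """Return (precision, recall) for multi-candidate predictions.

    predictions[i] is the list of candidates returned for token i;
    gold[i] is the truly intended word for token i.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    # A token counts as a hit when its intended word is among the candidates.
    hits = sum(1 for cands, truth in zip(predictions, gold) if truth in cands)
    attempted = sum(len(cands) for cands in predictions)
    precision = hits / attempted if attempted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical system output for three tokens and their intended words.
    preds = [["make", "man"], ["sense"], ["to"]]
    truth = ["make", "sense", "too"]
    precision, recall = evaluate(preds, truth)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

When the system returns exactly one candidate per token, precision and recall under these definitions are equal, and both reduce to plain accuracy.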
