DNA sequences, Computer Engineering

Assignment Help:

The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, such data is called an imbalanced dataset. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to biological data. It is a good idea to find relevant publications to see how effective classifiers for motif recognition can be built.
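One of the sampling techniques mentioned above is random undersampling of the majority class. The following is only a minimal sketch, not the method the assignment prescribes: the function name `undersample`, the `ratio` parameter, and the 0/1 label encoding (1 = binding site, 0 = background) are assumptions made for illustration.

```python
import random


def undersample(examples, labels, ratio=1.0, seed=0):
    """Keep every positive and a random subset of the negatives.

    ratio is the (assumed) number of negatives kept per positive.
    Labels are assumed to be 1 for binding sites and 0 otherwise.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more negatives than are available.
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept = pos + rng.sample(neg, n_keep)
    rng.shuffle(kept)
    return [examples[i] for i in kept], [labels[i] for i in kept]


# Toy data: 5 positives among 100 hypothetical k-mers.
toy_examples = ["kmer%d" % i for i in range(100)]
toy_labels = [1 if i < 5 else 0 for i in range(100)]
balanced_x, balanced_y = undersample(toy_examples, toy_labels, ratio=2.0)
```

Resampling of this kind is normally applied to the training partition only, so that the test set keeps the original class distribution.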

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a test dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated using recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
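The four measures can be computed from the counts of true and false positives and negatives. The sketch below assumes binary labels (1 = binding site, 0 = non-binding site), takes F-measure to be the balanced F1 score, and interprets "recognition rate" as overall accuracy; the assignment text does not define these terms, so treat both readings as assumptions.

```python
def evaluate(y_true, y_pred):
    """Return precision, recall, F-measure (F1) and recognition rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against empty denominators (e.g. no positive predictions).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    recognition_rate = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "recognition_rate": recognition_rate,
    }


# Toy example: 2 true sites among 8 k-mers, one found, one false alarm.
scores = evaluate([1, 1, 0, 0, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0, 0, 0])
```

On imbalanced data the recognition rate alone can look high even when few binding sites are found, which is why precision, recall and F-measure are reported alongside it.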

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a k-mer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites, both ACACGGGA, shown in upper case in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8-mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. We then count them as correct predictions because they have 50% and 75% overlap with the true binding sites in sequence (1), respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions because they have at least 50% overlap with a true binding site. Take another 8-mer, GAgaatta, in (1). If a learner model classifies it as a binding site, it is counted as misclassified because it has only 25% overlap with the true binding site ACACGGGA.
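The overlap rule can be checked programmatically. The sketch below reproduces the three cases from sequence (1) under two assumptions: true binding sites are marked in upper case, as in the example (the annotation format of the actual dataset is not stated), and overlap is measured as the fraction of the k-mer's positions that lie inside a site. All function names are hypothetical.

```python
def find_sites(sequence):
    """Return (start, length) for each maximal run of upper-case letters."""
    sites = []
    start = None
    for i, ch in enumerate(sequence):
        if ch.isupper():
            if start is None:
                start = i
        elif start is not None:
            sites.append((start, i - start))
            start = None
    if start is not None:
        sites.append((start, len(sequence) - start))
    return sites


def overlap_fraction(kmer_start, k, site_start, site_len):
    """Fraction of the k-mer's positions that fall inside the site."""
    shared = (min(kmer_start + k, site_start + site_len)
              - max(kmer_start, site_start))
    return max(0, shared) / k


def is_motif_instance(kmer_start, k, sites, threshold=0.5):
    """True if the k-mer overlaps any true site by at least the threshold."""
    return any(overlap_fraction(kmer_start, k, s, n) >= threshold
               for s, n in sites)


SEQ = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"
SITES = find_sites(SEQ)

# The three 8-mers discussed in the text, located by 0-based start index.
for start in (6, 28, 16):
    kmer = SEQ[start:start + 8]
    best = max(overlap_fraction(start, 8, s, n) for s, n in SITES)
    print(kmer, best, is_motif_instance(start, 8, SITES))
```

With these definitions, the true label of every k-mer in a sequence follows from its start position, so predictions can be scored against `is_motif_instance` instead of requiring an exact match with a binding site.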




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd