DNA Sequences, Computer Engineering

The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging; in the machine learning community this is known as an imbalanced dataset. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to this biological data. It is a good idea to consult relevant publications to see how effective classifiers for motif recognition can be built.
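
Sampling is one way to handle the imbalance. The sketch below shows random undersampling of the majority (non-binding) class, one common option among several. The function name `undersample`, the label encoding (1 = binding site, 0 = non-binding) and the default 1:1 ratio are illustrative assumptions, not part of the assignment.

```python
import random


def undersample(samples, labels, ratio=1.0, seed=0):
    """Randomly drop majority-class samples (label 0) so that at most
    `ratio` negatives remain per positive (label 1).
    Assumed encoding: 1 = binding site, 0 = non-binding site."""
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = min(len(neg), int(len(pos) * ratio))
    chosen = pos + rng.sample(neg, keep_neg)
    chosen.sort()  # restore the original order of the kept samples
    return [samples[i] for i in chosen], [labels[i] for i in chosen]


# Hypothetical toy data: two positives, four negatives.
kmers = ["ACACGGGA", "ccttacac", "gaattaat", "tcagatca", "ACACGGGA", "atcaataa"]
labels = [1, 0, 0, 0, 1, 0]
X, y = undersample(kmers, labels)
print(y.count(1), y.count(0))  # 2 2
```

Undersampling should be applied to the training split only, so that the test set keeps the natural class balance the classifier will face in practice.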

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a test dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated using recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
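
The four measures can be computed from the confusion counts of each dataset. This is a minimal sketch that assumes "recognition rate" means overall accuracy, (TP + TN) / total; the function name `evaluate` and the example counts are hypothetical. The example shows why recognition rate alone is misleading on imbalanced data: it stays high even when recall is poor.

```python
def evaluate(tp, fp, fn, tn):
    """Return recall, precision, F-measure and recognition rate from confusion counts.
    Assumption: recognition rate = overall accuracy = (TP + TN) / total."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F-measure as the harmonic mean of precision and recall (F1).
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    recognition_rate = (tp + tn) / (tp + fp + fn + tn)
    return {"recall": recall, "precision": precision,
            "f_measure": f_measure, "recognition_rate": recognition_rate}


# Hypothetical counts: 8 true sites, 92 non-sites.
metrics = evaluate(tp=2, fp=1, fn=6, tn=91)
print({k: round(v, 3) for k, v in metrics.items()})
# {'recall': 0.25, 'precision': 0.667, 'f_measure': 0.364, 'recognition_rate': 0.93}
```

The same function would be called once with the training counts and once with the test counts.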

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is counted as a motif instance if its location overlaps a true binding site in the DNA sequence by at least 50%. For example, consider the two true binding sites, both ACACGGGA, in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)
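
In sequence (1) the true binding sites appear in uppercase. Assuming the same convention holds for the whole dataset (an assumption; the real data files may mark sites differently), the site positions can be recovered and every 8mer window enumerated as sketched below. Positions are zero-based and intervals half-open; `binding_sites` and `kmers` are hypothetical helper names.

```python
import re

SEQ = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"  # sequence (1)


def binding_sites(seq):
    """Return (start, end) half-open intervals of uppercase runs.
    Assumption: uppercase letters mark true binding sites."""
    return [(m.start(), m.end()) for m in re.finditer(r"[ACGT]+", seq)]


def kmers(seq, k=8):
    """Yield (start, kmer) for every length-k window, sliding by one base."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


print(binding_sites(SEQ))          # [(10, 18), (26, 34)]
print(sum(1 for _ in kmers(SEQ)))  # 40
```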

Suppose that a learner model classifies the 8mers acaaACAC and ACGGGAtc as binding sites. We count both as correct predictions, because they overlap the true binding sites in sequence (1) by 50% and 75%, respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions, because each overlaps a true binding site by at least 50%. Now take another 8mer in (1), GAgaatta. If a learner model classifies it as a binding site, it is counted as misclassified, because it overlaps the true binding site ACACGGGA by only 25%.
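
The overlap rule can be checked against the worked example. This sketch again assumes uppercase marks the true sites, measures overlap against a single site (the text speaks of overlap with "a true binding site"), and treats a prediction as correct when it agrees with the 50% rule; `overlap_fraction` and `is_correct` are hypothetical names.

```python
import re

SEQ = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"  # sequence (1)
# Assumption: uppercase runs mark the true binding sites (half-open intervals).
SITES = [(m.start(), m.end()) for m in re.finditer(r"[ACGT]+", SEQ)]


def overlap_fraction(start, k, sites):
    """Largest fraction of the kmer [start, start + k) covered by one true site."""
    best = 0
    for s, e in sites:
        # Negative values mean no overlap; max() keeps best at 0 or more.
        best = max(best, min(start + k, e) - max(start, s))
    return best / k


def is_correct(start, k, predicted_site, sites, threshold=0.5):
    """A kmer is a true motif instance if it overlaps a site by >= threshold;
    a prediction is correct when it agrees with that ground truth."""
    is_true_site = overlap_fraction(start, k, sites) >= threshold
    return predicted_site == is_true_site


for kmer in ["acaaACAC", "ACGGGAtc", "GAgaatta"]:
    start = SEQ.find(kmer)  # case-sensitive, so each example is located uniquely
    frac = overlap_fraction(start, len(kmer), SITES)
    print(kmer, frac, is_correct(start, len(kmer), True, SITES))
# acaaACAC 0.5 True
# ACGGGAtc 0.75 True
# GAgaatta 0.25 False
```

Calling `is_correct(6, 8, False, SITES)` returns `False`, matching the case where acaaACAC is wrongly classified as a non-binding site.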

Copyright ©2019-2020 ExpertsMind IT Educational Pvt Ltd. All rights reserved.