Dna sequences, Computer Engineering

Assignment Help:

The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited and that makes the problem challenging. In machine learning community, this is termed as imbalanced datasets. Some techniques dealing with imbalanced data classification, such as sampling or filtering, can be applied for the biological data. It is a good idea to find some relevant publications to see in which way you can build effective classifiers for motif recognition.

The whole dataset should be partitioned into a training dataset used to build the learner models, and a testing dataset used to evaluate generalization capability of the classification systems. System performance will be evaluated by looking at the recall, precision, F-measure and recognition rate for both the training dataset and the test dataset.

It is very important to notice that unlike traditional way for evaluating classifier's performance, here a kmer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider two true binding sites ACACGGGA and ACACGGGA in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. Then, we will count them as correct prediction because they have 50% and 75% overlaps with the true binding sites in sequence (1), respectively. Conversely, if classifiers classify them as non-binding sites, then we will count them as incorrect prediction because they have at least 50% overlaps with the true binding sites. Take another 8mer, GAgaatta, in (1). If it is classified by a learner model as a binding site, then it will be counted as a misclassified one because it has only 25% overlap with the true binding site ACACGGGA


Related Discussions:- Dna sequences

How much frequency is centered in typical human voice, Typical human voice ...

Typical human voice is centered around                   Hz. (A)  200-400                                     (B)  280-3000  (C)  400-600

Corresponding port numbers, A) Around how many entries are there in this fi...

A) Around how many entries are there in this file on your VM? B) Select and list names and corresponding port numbers for four well-liked services listed in this file?

Electing neil, ELECTING NEIL : The response for this is, of course. In...

ELECTING NEIL : The response for this is, of course. In this case, we considered the point of the search or researching is to find an artifact  -  a word which is an anagram o

Show block diagram of sequential circuits, Q. Show block Diagram of sequent...

Q. Show block Diagram of sequential circuits? A sequential circuit is an interconnection of storage elements and combinational circuits. The storage elements known as flip-flop

What is default look-and-feel of a swing component, The default Look and Fe...

The default Look and Feel of swing components are Java Look-and-Feel.

Network throughput-network properties, Network throughput It is an indi...

Network throughput It is an indicative measure of the point carrying capacity of a network. It is distinct as the total number of messages the network can transmit per unit tim

Define terminal symbols, Define Terminal symbols? Terminal symbols: T...

Define Terminal symbols? Terminal symbols: These are literal strings forming the input of a formal grammar and can't be broken down in slighter units without losing literal m

Application of Theory of computation in DBMS, Sir/Mam, I want to know the a...

Sir/Mam, I want to know the application of Theory of computation in DBMS

Is it possible to change the color and font of the sheet tab, Yes we can mo...

Yes we can modify the color of sheet tabs. By right clicking on sheet tabs and you will get option change color but you didn't find any option to modify the font of sheet tabs.

Explain about api s of olap, Microsoft in the late 1997 introduced a standa...

Microsoft in the late 1997 introduced a standard API called as OLE DB. After which XML was used for analysis specification and this specification was largely used by a lot of vendo

Write Your Message!

Captcha
Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd