DNA Sequences, Computer Engineering

The dataset provided in this assignment contains a collection of real DNA sequences. True binding sites make up only a small fraction of the data, and that makes the problem challenging. In the machine learning community, data like this is known as an imbalanced dataset. Techniques developed for imbalanced data classification, such as sampling or filtering, can be applied to this biological data. It is a good idea to look for relevant publications to see how effective classifiers for motif recognition can be built.
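As an illustration of the sampling idea, the Python sketch below applies random undersampling: it keeps every positive example and draws a random subset of the negatives. The function name `random_undersample`, the label encoding (1 = binding site, 0 = non-binding site) and the `ratio` parameter are assumptions made for this example, not part of the assignment. Oversampling the positives or a filtering step would be equally valid choices.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_undersample(
    samples: Sequence[T], labels: Sequence[int], ratio: float = 1.0, seed: int = 0
) -> Tuple[List[T], List[int]]:
    """Keep all positives (label 1) and about `ratio` negatives per positive,
    drawn uniformly at random without replacement.

    ASSUMPTION: labels are 1 for binding sites and 0 for non-binding sites.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more negatives than exist.
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    # Sort the kept indices so the samples stay in their original order.
    kept = sorted(pos + rng.sample(neg, n_keep))
    return [samples[i] for i in kept], [labels[i] for i in kept]


if __name__ == "__main__":
    kmers = ["acgt", "ttga", "gcca", "aatt", "cgcg", "ACGT"]  # toy data
    labels = [0, 0, 0, 0, 0, 1]
    X, y = random_undersample(kmers, labels, ratio=2.0)
    print(len(X), sum(y))  # 3 1
```

Undersampling is normally applied only to the training partition, so that the test set still reflects the real class distribution.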

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a test dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated using recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
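A minimal sketch of the partitioning and scoring steps follows. It assumes labels are encoded as 1 (binding site) and 0 (non-binding site), uses a stratified split so that the rare positive class appears in both partitions, and reads "recognition rate" as the fraction of all k-mers classified correctly (overall accuracy). That reading, and the names `stratified_split`, `evaluate` and `Scores`, are assumptions rather than definitions taken from the assignment.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), splitting each class separately
    so that the rare positive class appears in both partitions."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = int(round(test_fraction * len(idx)))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


@dataclass
class Scores:
    recall: float
    precision: float
    f_measure: float
    recognition_rate: float  # ASSUMPTION: read here as overall accuracy


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Scores:
    """Score binary predictions, with 1 = binding site (the positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    recognition_rate = (tp + tn) / len(pairs) if pairs else 0.0
    return Scores(recall, precision, f_measure, recognition_rate)
```

Calling `evaluate` once with the training-set predictions and once with the test-set predictions gives the two sets of figures the assignment asks for.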

Note that, unlike the traditional way of evaluating a classifier's performance, here a k-mer is counted as a motif instance if its location overlaps a true binding site in the DNA sequence by at least 50%. For example, consider the two true binding sites (both ACACGGGA, shown in uppercase) in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose a learner model classifies the 8-mers acaaACAC and ACGGGAtc as binding sites. Both count as correct predictions, because they overlap the true binding sites in sequence (1) by 50% and 75%, respectively. Conversely, if a classifier labels them as non-binding sites, both count as incorrect predictions, because each overlaps a true binding site by at least 50%. Now take another 8-mer in (1), GAgaatta. If a learner model classifies it as a binding site, it is counted as misclassified, because it overlaps the true binding site ACACGGGA by only 25%.
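The overlap rule can be sketched in Python as follows. The code slides a window of length k along a sequence, measures how much of each k-mer is covered by a single true binding site, and labels the k-mer as a motif instance when that fraction is at least 0.5. Two points are assumptions: that the sites can be read off as the uppercase runs, as in sequence (1), and that overlap is measured as a fraction of the k-mer length (in this example the k-mers and the sites are both 8 bases long, so the two readings coincide). The helper names `uppercase_sites`, `overlap_fraction` and `label_kmers` are hypothetical.

```python
import re
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) positions in the sequence


def uppercase_sites(seq: str) -> List[Interval]:
    """ASSUMPTION: true binding sites are the uppercase runs, as in sequence (1)."""
    return [m.span() for m in re.finditer(r"[ACGT]+", seq)]


def overlap_fraction(start: int, k: int, sites: List[Interval]) -> float:
    """Largest fraction of the k-mer [start, start + k) covered by a single site."""
    best = 0
    for s, e in sites:
        # Negative values mean no overlap; max() keeps best at 0 or more.
        best = max(best, min(start + k, e) - max(start, s))
    return best / k


def label_kmers(
    seq: str, k: int, sites: List[Interval], threshold: float = 0.5
) -> List[Tuple[str, float, int]]:
    """Return (k-mer, overlap fraction, label) for every window of length k.
    The label is 1 (motif instance) when the overlap reaches the threshold."""
    out = []
    for i in range(len(seq) - k + 1):
        frac = overlap_fraction(i, k, sites)
        out.append((seq[i : i + k], frac, int(frac >= threshold)))
    return out


if __name__ == "__main__":
    seq = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"  # sequence (1)
    sites = uppercase_sites(seq)  # [(10, 18), (26, 34)]
    for kmer, frac, label in label_kmers(seq, 8, sites):
        if kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
            print(kmer, frac, label)
    # prints:
    # acaaACAC 0.5 1
    # GAgaatta 0.25 0
    # ACGGGAtc 0.75 1
```

With these labels as ground truth, a prediction is scored as correct exactly when it matches the label, which reproduces the counting rule described above.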


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd