DNA sequences, Computer Engineering


The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is very limited, which makes the problem challenging. In the machine learning community, such data are called imbalanced datasets. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to this biological data. It is a good idea to consult relevant publications to see how effective classifiers for motif recognition can be built.
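As one illustration of the sampling idea mentioned above, the following is a minimal sketch of random undersampling of the majority (non-binding) class. The function name `undersample`, the label coding (1 for binding site, 0 for non-binding site) and the `ratio` parameter are assumptions made for this example; the assignment does not prescribe a particular sampling method.

```python
import random


def undersample(examples, labels, ratio=1.0, seed=0):
    """Keep every positive example and a random subset of the negatives.

    ratio is the number of negatives kept per positive (hypothetical parameter).
    Labels are assumed to be 1 (binding site) or 0 (non-binding site).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more negatives than are available.
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept = pos + rng.sample(neg, n_keep)
    rng.shuffle(kept)
    return [examples[i] for i in kept], [labels[i] for i in kept]


# Toy data: 3 positives among 100 examples.
toy_examples = list(range(100))
toy_labels = [1] * 3 + [0] * 97
balanced_examples, balanced_labels = undersample(toy_examples, toy_labels)
print(len(balanced_labels), sum(balanced_labels))
```

Resampling of this kind is normally applied to the training data only, so that the data used for evaluation keeps its original class ratio.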

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a test dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated using recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
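The four measures can be computed from the counts of true and false positives and negatives. The sketch below assumes that "recognition rate" means overall accuracy and that F-measure means the balanced F1 score; both are interpretations, not definitions given in the assignment. The function name `evaluate` and the label coding (1 for binding site, 0 otherwise) are also assumptions.

```python
def evaluate(y_true, y_pred):
    """Return precision, recall, F-measure and recognition rate for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    recognition_rate = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "recognition_rate": recognition_rate,
    }


# Toy example: 2 true binding sites among 10 kmers.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = evaluate(y_true, y_pred)
print(scores)
```

In this toy case the recognition rate is 0.8 while precision, recall and F-measure are all 0.5, which shows why accuracy alone can be misleading on imbalanced data.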

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites, both ACACGGGA, shown in upper case in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. Both are counted as correct predictions, because they have 50% and 75% overlap with a true binding site in sequence (1), respectively. Conversely, if a classifier labels them as non-binding sites, they are counted as incorrect predictions, because each overlaps a true binding site by at least 50%. Now take another 8mer in (1), GAgaatta. If a learner model classifies it as a binding site, it is counted as misclassified, because it has only 25% overlap with the true binding site ACACGGGA.
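The overlap rule can be checked with a few lines of code. This is a minimal sketch, assuming that overlap is measured as the fraction of the kmer's positions that lie inside a single true binding site (in the example the kmer and the site have the same length, so the choice of denominator does not matter). The function names and the 0-based site positions are introduced here for illustration only.

```python
def overlap_fraction(kmer_start, k, site_start, site_len):
    """Fraction of the kmer's k positions that fall inside one binding site."""
    kmer_end = kmer_start + k          # exclusive end
    site_end = site_start + site_len   # exclusive end
    shared = max(0, min(kmer_end, site_end) - max(kmer_start, site_start))
    return shared / k


def is_motif_instance(kmer_start, k, sites, threshold=0.5):
    """True if the kmer overlaps any true site by at least the threshold."""
    return any(overlap_fraction(kmer_start, k, start, length) >= threshold
               for start, length in sites)


# Sequence (1); true binding sites are the upper-case ACACGGGA stretches.
sequence = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"
sites = [(10, 8), (26, 8)]  # (0-based start, length)

for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    start = sequence.find(kmer)  # case-sensitive search
    best = max(overlap_fraction(start, len(kmer), s, n) for s, n in sites)
    print(kmer, start, best, is_motif_instance(start, len(kmer), sites))
```

Under these assumptions the three 8mers from the example have overlaps of 0.5, 0.75 and 0.25, so only the first two count as motif instances.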

