DNA sequences, Computer Engineering

The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, this is known as an imbalanced dataset. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to the biological data. It is a good idea to look for relevant publications to see how effective classifiers for motif recognition can be built.
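As one illustration of the sampling idea, the sketch below performs random undersampling of the majority (non-binding) class. It is a minimal example under stated assumptions, not a method prescribed by the assignment: the function name `undersample`, the `ratio` parameter and the 0/1 label encoding are all hypothetical choices made for this example.

```python
import random


def undersample(samples, labels, ratio=1.0, seed=0):
    """Keep every positive and a random subset of the negatives.

    Assumptions for this sketch: labels are 1 (binding site) or 0
    (non-binding site), and `ratio` is the desired number of kept
    negatives per positive.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    positives = [s for s, y in zip(samples, labels) if y == 1]
    negatives = [s for s, y in zip(samples, labels) if y == 0]

    # Never ask for more negatives than actually exist.
    n_keep = min(len(negatives), int(round(ratio * len(positives))))
    kept_negatives = rng.sample(negatives, n_keep)

    data = [(s, 1) for s in positives] + [(s, 0) for s in kept_negatives]
    rng.shuffle(data)  # avoid all positives appearing first
    return data


# Toy data: 2 positives and 10 negatives (made-up kmers, for illustration).
kmers = ["ACACGGGA", "ACACGGGA"] + ["ttttttt%d" % i for i in range(10)]
labels = [1, 1] + [0] * 10
balanced = undersample(kmers, labels, ratio=1.0)
```

Undersampling should be applied to the training dataset only, so that the test dataset keeps the original class distribution.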

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a testing dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated by the recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
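The four measures can be computed from the counts of true/false positives and negatives. The sketch below assumes that "recognition rate" means overall accuracy and that the F-measure is the balanced F1 score; the assignment text does not define either, so both are assumptions. The function name `evaluate` and the 0/1 label encoding are likewise hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return recall, precision, F-measure and recognition rate.

    Assumptions for this sketch: labels are 1 (binding site) or 0
    (non-binding site); F-measure is F1; recognition rate is accuracy.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {"recall": recall, "precision": precision,
            "f_measure": f_measure, "recognition_rate": accuracy}


# Toy example: 2 true sites among 10 kmers, one found, one false alarm.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = evaluate(y_true, y_pred)
```

In this toy case the recognition rate is 0.8 while recall and precision are both 0.5, which shows why accuracy alone can look optimistic on an imbalanced dataset.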

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is counted as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites, both ACACGGGA, shown in upper case in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. We then count them as correct predictions, because they have 50% and 75% overlap with the true binding sites in sequence (1), respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions, because they have at least 50% overlap with the true binding sites. Take another 8mer, GAgaatta, in (1). If a learner model classifies it as a binding site, it is counted as misclassified, because it has only 25% overlap with the true binding site ACACGGGA.
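The overlap rule can be checked on sequence (1) with a short script. This is a sketch under assumptions: upper-case runs are taken to mark the true binding sites (as in the example only; real data would supply site coordinates), overlap is measured as a fraction of the kmer length, and the function names `overlap_fraction` and `is_correct` are hypothetical.

```python
import re

SEQ = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"

# Assumption: in this example, upper-case runs mark the true binding sites.
# Each site is stored as a half-open interval (start, end).
SITES = [m.span() for m in re.finditer(r"[A-Z]+", SEQ)]


def overlap_fraction(start, k, sites):
    """Largest fraction of the kmer [start, start + k) covered by one site."""
    end = start + k
    best = 0
    for s, e in sites:
        best = max(best, min(end, e) - max(start, s))
    return best / k


def is_correct(start, k, predicted_site, sites, threshold=0.5):
    """A prediction is correct when it agrees with the overlap-based label."""
    truly_site = overlap_fraction(start, k, sites) >= threshold
    return predicted_site == truly_site


# The three 8mers discussed in the text, located by their first occurrence.
for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    pos = SEQ.find(kmer)
    frac = overlap_fraction(pos, len(kmer), SITES)
    print(kmer, pos, frac, is_correct(pos, len(kmer), True, SITES))
```

With all three 8mers predicted as binding sites, the first two are scored as correct (overlaps of 0.5 and 0.75) and the third as incorrect (overlap of 0.25), matching the worked example.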


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd