DNA sequences, Computer Engineering

Assignment Help:

The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, this is known as the imbalanced dataset problem. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to the biological data. It is a good idea to consult relevant publications to see how effective classifiers for motif recognition can be built.
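As one illustration of the sampling idea, the sketch below randomly undersamples the non-binding kmers so that the two classes are closer in size. The function name `undersample`, the `ratio` parameter and the 0/1 label encoding are assumptions made for this example and are not part of the assignment. Sampling of this kind is normally applied to the training data only, so that the test data keeps its original class balance.

```python
import random


def undersample(examples, labels, ratio=1.0, seed=0):
    """Keep every positive example and a random subset of the negatives.

    ratio is the number of negatives kept per positive (hypothetical
    parameter); labels are assumed to be 1 for binding sites and 0 otherwise.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(ratio * len(pos)))
    kept = sorted(pos + rng.sample(neg, n_keep))  # preserve original order
    return [examples[i] for i in kept], [labels[i] for i in kept]


# Toy data: 2 binding-site kmers among 12 examples (made-up strings).
kmers = ["ACACGGGA", "CACGGGAg"] + ["ttttttt%d" % i for i in range(10)]
labels = [1, 1] + [0] * 10
balanced_kmers, balanced_labels = undersample(kmers, labels, ratio=1.0)
print(len(balanced_kmers), sum(balanced_labels))
```

A larger `ratio` keeps more negatives. Which value works best is an empirical question for the given dataset.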

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a test dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated by the recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
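A minimal sketch of the partitioning and of the four measures might look as follows. It assumes labels of 1 for binding sites and 0 otherwise, takes "recognition rate" to mean overall accuracy, and uses a stratified split with a 30% test fraction. None of these choices is fixed by the assignment, and `stratified_split` and `evaluate` are hypothetical helper names.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) with a similar class ratio in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def evaluate(y_true, y_pred):
    """Compute precision, recall, F-measure and accuracy for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "recognition_rate": accuracy}


# Toy labels: 10 positives among 100 examples.
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))

# Toy predictions: 2 true positives, 1 false negative, 1 false positive.
scores = evaluate([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
print(scores)
```

On imbalanced data the recognition rate alone can look high even when few binding sites are found, which is why precision, recall and F-measure are reported alongside it.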

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites, both ACACGGGA, in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. Then we count them as correct predictions, because they have 50% and 75% overlap with a true binding site in sequence (1), respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions, because each has at least 50% overlap with a true binding site. Take another 8mer in (1), GAgaatta. If a learner model classifies it as a binding site, it is counted as misclassified, because it has only 25% overlap with the true binding site ACACGGGA.
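The overlap rule can be checked with a short sketch such as the one below. It assumes that true binding sites are marked in upper case, as in sequence (1), and that overlap is measured as the fraction of the kmer's positions that fall inside a single true site. The helper names `find_sites`, `overlap_fraction` and `is_motif_instance` are invented for this illustration.

```python
SEQ = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"


def find_sites(sequence):
    """Return (start, end) spans of upper-case runs, 0-based, end exclusive.

    Assumption: true binding sites are written in upper case.
    """
    sites, start = [], None
    for i, ch in enumerate(sequence + "a"):  # lower-case sentinel closes a final run
        if ch.isupper() and start is None:
            start = i
        elif not ch.isupper() and start is not None:
            sites.append((start, i))
            start = None
    return sites


def overlap_fraction(kmer_start, k, sites):
    """Largest fraction of the kmer covered by any single true site."""
    kmer_end = kmer_start + k
    best = 0
    for site_start, site_end in sites:
        shared = min(kmer_end, site_end) - max(kmer_start, site_start)
        best = max(best, shared)  # negative values mean no overlap
    return best / k


def is_motif_instance(kmer_start, k, sites, threshold=0.5):
    """Apply the 'at least 50% overlap' rule from the assignment."""
    return overlap_fraction(kmer_start, k, sites) >= threshold


SITES = find_sites(SEQ)
for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    start = SEQ.index(kmer)  # case-sensitive search for the example kmer
    frac = overlap_fraction(start, len(kmer), SITES)
    print(kmer, frac, is_motif_instance(start, len(kmer), SITES))
```

For the three 8mers discussed above, this gives overlap fractions of 0.5, 0.75 and 0.25, so only the first two count as motif instances.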




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd