DNA sequences, Computer Engineering


The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, this is known as the imbalanced dataset problem. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to biological data. It is a good idea to look for relevant publications to see how effective classifiers for motif recognition can be built.
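As one illustration of the sampling idea, the sketch below applies random undersampling: it keeps every positive example and a random subset of the negatives. This is only one common option and is not prescribed by the assignment. The function name `undersample`, the `ratio` parameter, and the toy kmers and labels are assumptions made for the example.

```python
import random

def undersample(examples, labels, ratio=1.0, seed=0):
    """Keep all positives and at most `ratio` negatives per positive.

    Hypothetical helper for illustration; labels are 1 (binding site)
    or 0 (non-binding site).
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(ratio * len(pos)))
    kept = sorted(pos + rng.sample(neg, n_keep))  # preserve original order
    return [examples[i] for i in kept], [labels[i] for i in kept]

# Toy data (made up): 2 positives among 10 kmers.
kmers = ["ACACGGGA", "ccttacac", "cttacaca", "ttacacaa", "gaattaat",
         "aattaatA", "ACACGGGA", "tcagatca", "cagatcaa", "agatcaat"]
labels = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]

balanced_kmers, balanced_labels = undersample(kmers, labels, ratio=1.0)
print(len(balanced_kmers), sum(balanced_labels))
```

If resampling is used, it would normally be applied to the training portion only, so that the test data still reflects the original class imbalance.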

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a testing dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated in terms of recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
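The four measures can be computed from the counts of true and false positives and negatives. The sketch below assumes binary labels (1 for a binding site, 0 otherwise) and assumes that "recognition rate" means overall accuracy, which the assignment text does not define explicitly. The function name `evaluate` and the toy label vectors are illustrative only.

```python
def evaluate(y_true, y_pred):
    """Return precision, recall, F-measure and recognition rate.

    Assumption: 'recognition rate' is taken to be overall accuracy.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    rate = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "recognition_rate": rate}

# Toy labels (made up): 3 true sites among 8 kmers.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

On imbalanced data the recognition rate alone can look high even when few binding sites are found, which is why precision, recall and F-measure are reported alongside it.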

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites, both ACACGGGA, in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. Then we count them as correct predictions, because they have 50% and 75% overlap with the true binding sites in sequence (1), respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions, because they have at least 50% overlap with the true binding sites. Take another 8mer, GAgaatta, in (1). If a learner model classifies it as a binding site, it is counted as misclassified, because it has only 25% overlap with the true binding site ACACGGGA.
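The overlap rule can be checked mechanically. The sketch below assumes that true binding sites are written in uppercase, as they are in sequence (1); a real dataset may instead supply site coordinates separately. The function names `true_sites`, `overlap_fraction` and `is_motif_instance` are hypothetical helpers, not part of the assignment.

```python
import re

SEQ = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"  # sequence (1)

def true_sites(seq):
    """Return half-open (start, end) spans of the true binding sites.

    Assumption: sites are the uppercase runs, as in sequence (1).
    """
    return [m.span() for m in re.finditer(r"[A-Z]+", seq)]

def overlap_fraction(start, k, sites):
    """Largest fraction of the kmer [start, start + k) covered by one site."""
    end = start + k
    best = 0
    for s, e in sites:
        best = max(best, min(end, e) - max(start, s))  # negative if disjoint
    return best / k

def is_motif_instance(start, k, sites, threshold=0.5):
    """Apply the 'at least 50% overlap' rule from the text."""
    return overlap_fraction(start, k, sites) >= threshold

sites = true_sites(SEQ)
for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    start = SEQ.find(kmer)  # case-sensitive, so each kmer matches once here
    print(kmer, overlap_fraction(start, 8, sites),
          is_motif_instance(start, 8, sites))
```

For the three 8mers discussed above, this gives overlap fractions of 0.5, 0.75 and 0.25, so the first two count as motif instances and the third does not.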

