DNA sequences, Computer Engineering

The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, this is known as an imbalanced dataset. Techniques for imbalanced-data classification, such as sampling or filtering, can be applied to the biological data. It is a good idea to consult relevant publications to see how effective classifiers for motif recognition can be built.
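One of the sampling techniques mentioned above is random undersampling of the majority (non-binding) class. The sketch below is only an illustration under stated assumptions: the function name `undersample`, the `ratio` parameter and the 0/1 label encoding are hypothetical and not part of the assignment, and other techniques (oversampling, filtering) may suit the data better.

```python
import random

def undersample(kmers, labels, ratio=1.0, seed=0):
    """Keep every positive example and a random subset of negatives.

    Assumptions (not from the assignment): labels are 1 for binding
    sites and 0 for non-binding sites; `ratio` is the number of
    negatives kept per positive.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more negatives than actually exist.
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept = sorted(pos + rng.sample(neg, n_keep))
    return [kmers[i] for i in kept], [labels[i] for i in kept]

if __name__ == "__main__":
    toy_kmers = ["k%d" % i for i in range(10)]
    toy_labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
    xs, ys = undersample(toy_kmers, toy_labels, ratio=1.0)
    print(len(xs), sum(ys))
```

Undersampling should be applied to the training data only, so that the test set keeps the original class distribution.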

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a test dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated by the recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
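The four measures can be computed from the counts of true/false positives and negatives. The following is a minimal sketch, assuming labels are encoded as 1 (binding site) and 0 (non-binding site), that "recognition rate" means overall accuracy, and that F-measure means the balanced F1 score; the function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return recall, precision, F-measure and recognition rate.

    Assumptions: 1 = binding site, 0 = non-binding site;
    recognition rate is taken to be accuracy; F-measure is F1.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against empty denominators (e.g. no positive predictions).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    rate = (tp + tn) / len(pairs) if pairs else 0.0
    return {"recall": recall, "precision": precision,
            "f_measure": f_measure, "recognition_rate": rate}

if __name__ == "__main__":
    print(evaluate([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0]))
```

On an imbalanced dataset the recognition rate can be high even when almost no binding sites are found, which is why recall, precision and F-measure are reported alongside it.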

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites ACACGGGA and ACACGGGA (shown in upper case) in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. We then count them as correct predictions, because they have 50% and 75% overlap with the true binding sites in sequence (1), respectively. Conversely, if the classifier labels them as non-binding sites, we count them as incorrect predictions, because they have at least 50% overlap with the true binding sites. Take another 8mer in (1), GAgaatta. If a learner model classifies it as a binding site, it is counted as misclassified, because it has only 25% overlap with the true binding site ACACGGGA.
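The overlap rule can be sketched in a few lines of code. This is an illustration under assumptions, not the assignment's reference implementation: true sites are given as half-open `(start, end)` index intervals, overlap is measured as the fraction of the kmer's positions that fall inside a single true site, and the names `overlap_fraction` and `label_kmers` are hypothetical.

```python
import re

def overlap_fraction(start, k, sites):
    """Largest fraction of the kmer [start, start + k) covered by one true site.

    `sites` is a list of half-open (begin, end) intervals (an assumption).
    """
    best = 0
    for begin, end in sites:
        shared = min(start + k, end) - max(start, begin)  # negative if disjoint
        best = max(best, shared)
    return best / k

def label_kmers(seq, sites, k=8, threshold=0.5):
    """Label every kmer position 1 (motif instance) or 0, using the 50% rule."""
    return [1 if overlap_fraction(i, k, sites) >= threshold else 0
            for i in range(len(seq) - k + 1)]

# Sequence (1) from the text; true binding sites are the upper-case runs.
SEQ = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"
SITES = [(m.start(), m.end()) for m in re.finditer("ACACGGGA", SEQ)]

if __name__ == "__main__":
    for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
        pos = SEQ.find(kmer)
        print(kmer, overlap_fraction(pos, 8, SITES))
```

For the three 8mers discussed above this yields overlaps of 0.5, 0.75 and 0.25, so the first two are labelled as motif instances and the third is not.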

