DNA sequences, Computer Engineering


The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, this is known as an imbalanced dataset. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to biological data. It is a good idea to consult relevant publications to see how effective classifiers for motif recognition can be built.
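As one illustration of the sampling idea, the Python sketch below randomly undersamples the majority class. The function name `undersample`, the 0/1 label encoding (1 for a binding site, 0 otherwise) and the `ratio` parameter are assumptions made for this example; the assignment does not prescribe a particular technique.

```python
import random


def undersample(samples, labels, ratio=1.0, seed=0):
    """Keep every positive and at most ratio * (#positives) random negatives.

    Assumes labels are 1 (binding site) or 0 (non-binding site).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more negatives than exist.
    n_keep = min(len(neg), int(ratio * len(pos)))
    keep = sorted(pos + rng.sample(neg, n_keep))
    return [samples[i] for i in keep], [labels[i] for i in keep]
```

Undersampling would normally be applied to the training data only, so that the test data keeps its original class balance.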

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a testing dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated by the recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
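A minimal Python sketch of the partitioning and the four measures follows. It rests on several assumptions not stated in the assignment: labels are encoded as 1 (binding site) and 0 (non-binding site), "F-measure" is taken to mean the balanced F1 score, "recognition rate" is taken to mean overall accuracy, and the split is stratified so that the rare positives appear in both parts. The names `stratified_split` and `scores` are hypothetical.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)


def scores(y_true, y_pred):
    """Precision, recall, F1 and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f1, "recognition_rate": accuracy}
```

Calling `scores` once on the training predictions and once on the test predictions gives the two sets of figures the assignment asks for. On heavily imbalanced data the recognition rate alone can look high even when few binding sites are found, which is why precision, recall and F-measure are reported alongside it.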

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites, both ACACGGGA, in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. Then we count them as correct predictions, because they have 50% and 75% overlap with the true binding sites in sequence (1), respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions, because they have at least 50% overlap with the true binding sites. Take another 8mer, GAgaatta, in (1). If a learner model classifies it as a binding site, it is counted as misclassified, because it has only 25% overlap with the true binding site ACACGGGA.
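The overlap rule can be checked with a short Python sketch on sequence (1). The function names are hypothetical, and two points are assumptions: overlap is measured as the number of shared positions divided by the kmer length (the kmer and the binding site are both 8 bases long here, so the choice of denominator does not matter in this example), and the true sites are located by the upper-case motif as printed in (1).

```python
SEQ = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"
MOTIF = "ACACGGGA"


def find_sites(sequence, motif):
    """Start index of every exact (case-sensitive) occurrence of motif."""
    starts, i = [], sequence.find(motif)
    while i != -1:
        starts.append(i)
        i = sequence.find(motif, i + 1)
    return starts


def overlap_fraction(kmer_start, k, site_starts, site_len):
    """Largest share of the kmer's positions covered by any single true site."""
    best = 0
    for s in site_starts:
        shared = min(kmer_start + k, s + site_len) - max(kmer_start, s)
        best = max(best, shared)  # a negative value means no overlap
    return best / k


def is_motif_instance(kmer_start, k, site_starts, site_len, threshold=0.5):
    """True if the kmer counts as a binding site under the 50% rule."""
    return overlap_fraction(kmer_start, k, site_starts, site_len) >= threshold


sites = find_sites(SEQ, MOTIF)  # the two sites in sequence (1)
for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    start = SEQ.find(kmer)
    frac = overlap_fraction(start, len(kmer), sites, len(MOTIF))
    print(kmer, frac, is_motif_instance(start, len(kmer), sites, len(MOTIF)))
```

For the three 8mers discussed above this prints overlaps of 0.5, 0.75 and 0.25, so the first two count as motif instances and the third does not. The resulting True/False values can serve as the ground-truth labels when counting correct and incorrect predictions.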

