DNA sequences, Computer Engineering


The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, this is known as an imbalanced dataset. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to biological data. It is a good idea to consult relevant publications to see how effective classifiers for motif recognition can be built.
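One of the sampling techniques mentioned above, random undersampling of the majority class, can be sketched as follows. The assignment does not say which sampling scheme to use, so this is only an illustration. The function name `undersample`, the `ratio` parameter, and the toy labels are assumptions made for the example.

```python
import random


def undersample(examples, labels, ratio=1.0, seed=0):
    """Keep every positive example and a random subset of the negatives.

    `ratio` is the assumed number of negatives kept per positive.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(ratio * len(pos)))
    keep = sorted(pos + rng.sample(neg, n_keep))  # preserve original order
    return [examples[i] for i in keep], [labels[i] for i in keep]


# Toy data: 2 binding-site kmers (label 1) among 10 candidates.
kmers = [f"kmer{i}" for i in range(10)]
labels = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
balanced_kmers, balanced_labels = undersample(kmers, labels)
print(balanced_kmers, balanced_labels)
```

Undersampling should be applied to the training dataset only, so that the test dataset keeps the original class balance.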

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a test dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated using recall, precision, F-measure, and recognition rate on both the training dataset and the test dataset.
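A minimal sketch of the four measures, computed from binary labels (1 = binding site, 0 = non-binding site). It assumes that "recognition rate" means overall accuracy and that F-measure is the balanced F1 score. The assignment does not define either term, and the function name `evaluate` and the toy labels are illustrative only.

```python
def evaluate(y_true, y_pred):
    """Return precision, recall, F-measure and recognition rate (accuracy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    recognition_rate = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "recognition_rate": recognition_rate,
    }


# Toy imbalanced example: 2 true sites among 10 kmers.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = evaluate(y_true, y_pred)
print(scores)
```

In this toy case the recognition rate is 0.8 while precision and recall are both 0.5, which shows why accuracy alone can be misleading on an imbalanced dataset.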

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is treated as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites, both ACACGGGA (shown in upper case), in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. We then count them as correct predictions, because they have 50% and 75% overlap, respectively, with the true binding sites in sequence (1). Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions, because they have at least 50% overlap with the true binding sites. Take another 8mer, GAgaatta, in (1). If a learner model classifies it as a binding site, it is counted as a misclassification, because it has only 25% overlap with the true binding site ACACGGGA.
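The overlap rule can be checked on sequence (1) with a short sketch. It assumes that overlap is measured as the number of shared positions divided by the kmer length. The assignment does not state the denominator, but here the kmers and the binding sites both have length 8, so the choice does not change the result. The function names are illustrative.

```python
import re


def overlap_fraction(kmer_start, k, site_start, site_len):
    """Share of the kmer's k positions that lie inside one binding site."""
    shared = min(kmer_start + k, site_start + site_len) - max(kmer_start, site_start)
    return max(0, shared) / k


def is_motif_instance(kmer_start, k, sites, threshold=0.5):
    """True if the kmer overlaps any true site by at least `threshold`."""
    return any(overlap_fraction(kmer_start, k, s, n) >= threshold for s, n in sites)


sequence = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"
# True binding sites are the upper-case occurrences of ACACGGGA.
sites = [(m.start(), len(m.group())) for m in re.finditer("ACACGGGA", sequence)]

results = {}
for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    start = sequence.index(kmer)  # 0-based position of the 8mer
    best = max(overlap_fraction(start, len(kmer), s, n) for s, n in sites)
    results[kmer] = (best, is_motif_instance(start, len(kmer), sites))
    print(kmer, best, results[kmer][1])
```

The three 8mers come out at 0.5, 0.75 and 0.25, so the first two are labelled as motif instances and the third is not, in agreement with the example above.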

