DNA sequences, Computer Engineering

The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, this is known as an imbalanced dataset. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to the biological data. It is a good idea to consult relevant publications to see how effective classifiers for motif recognition can be built.
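As one illustration of the sampling idea, the sketch below performs random undersampling of the majority (non-binding) class. It is a minimal example rather than a method prescribed by the assignment: the function name `undersample`, the `ratio` parameter, the label encoding (1 for a binding site, 0 otherwise) and the toy kmers are all assumptions made for illustration.

```python
import random


def undersample(samples, labels, ratio=1.0, seed=0):
    """Randomly drop negatives so that #negatives <= ratio * #positives.

    Assumes labels are 1 (binding site) or 0 (non-binding site).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(len(pos) * ratio))
    kept = pos + rng.sample(neg, n_keep)  # keep every positive example
    rng.shuffle(kept)
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Hypothetical toy data: 2 positive kmers among 10.
kmers = ["ACACGGGA", "ACGGGAtc"] + ["ccttacac"] * 8
labels = [1, 1] + [0] * 8
balanced_kmers, balanced_labels = undersample(kmers, labels, ratio=1.0)
print(len(balanced_kmers), sum(balanced_labels))
```

Resampling of this kind would normally be applied to the training dataset only, so that the test dataset keeps the original class distribution.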

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a test dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated in terms of recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
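The four measures can be computed from the counts of true and false positives and negatives. The sketch below assumes binary labels (1 for a binding site, 0 otherwise), takes the F-measure to be the balanced F1 score, and interprets "recognition rate" as overall accuracy. These interpretations, and the function name `evaluate`, are assumptions, not definitions given in the assignment.

```python
def evaluate(y_true, y_pred):
    """Return recall, precision, F-measure (F1) and recognition rate (accuracy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against empty denominators, which are common with rare positives.
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    rate = (tp + tn) / len(pairs) if pairs else 0.0
    return {"recall": recall, "precision": precision,
            "f_measure": f_measure, "recognition_rate": rate}


# Hypothetical predictions: 1 hit, 1 miss, 1 false alarm, 7 correct rejections.
scores = evaluate([1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                  [1, 0, 1, 0, 0, 0, 0, 0, 0, 0])
print(scores)
```

With an imbalanced dataset the recognition rate can stay high even when most binding sites are missed, which is why recall, precision and F-measure are reported alongside it.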

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two occurrences of the true binding site ACACGGGA in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. Then we count them as correct predictions because they have 50% and 75% overlap with the true binding sites in sequence (1), respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions, because they have at least 50% overlap with the true binding sites. Take another 8mer, GAgaatta, in (1). If it is classified by a learner model as a binding site, it is counted as misclassified because it has only 25% overlap with the true binding site ACACGGGA.
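The overlap rule can be checked on sequence (1) with a short sketch. It assumes 0-based positions, measures overlap as the fraction of the kmer's positions that fall inside a binding site, and relies on the upper-case letters in (1) to locate the true sites. The helper names `overlap_fraction` and `is_motif_instance` are hypothetical.

```python
def overlap_fraction(kmer_start, k, site_start, site_len):
    """Fraction of the kmer's k positions that lie inside the binding site."""
    shared = min(kmer_start + k, site_start + site_len) - max(kmer_start, site_start)
    return max(0, shared) / k


def is_motif_instance(kmer_start, k, sites, threshold=0.5):
    """True if the kmer overlaps any true binding site by at least `threshold`."""
    return any(overlap_fraction(kmer_start, k, start, length) >= threshold
               for start, length in sites)


sequence = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"
site = "ACACGGGA"

# Locate every occurrence of the (upper-case) true binding site.
sites = []
pos = sequence.find(site)
while pos != -1:
    sites.append((pos, len(site)))
    pos = sequence.find(site, pos + 1)

for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    start = sequence.find(kmer)
    best = max(overlap_fraction(start, len(kmer), s, n) for s, n in sites)
    print(kmer, best, is_motif_instance(start, len(kmer), sites))
```

Under these assumptions the three 8mers give overlaps of 0.5, 0.75 and 0.25, matching the percentages in the example, so only the first two count as motif instances.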

