DNA Sequences, Computer Engineering

The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging; in the machine learning community, such data are termed imbalanced datasets. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to the biological data. It is a good idea to find relevant publications to see how you can build effective classifiers for motif recognition.
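One common way to apply sampling here is random undersampling of the majority (non-binding) class in the training data. The sketch below is an illustration under stated assumptions, not a prescribed method: it assumes each kmer already carries a 0/1 label (1 = binding site), and the function name `random_undersample` and its `ratio` parameter are hypothetical. It should be applied to the training partition only, so that the test set keeps its natural class distribution.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_undersample(
    samples: Sequence[T],
    labels: Sequence[int],
    ratio: float = 1.0,
    seed: int = 0,
) -> Tuple[List[T], List[int]]:
    """Hypothetical helper: keep every positive (label 1) and a random
    subset of at most ratio * (#positives) negatives (label 0)."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    rng = random.Random(seed)  # fixed seed for reproducible experiments
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept = sorted(pos + rng.sample(neg, n_keep))  # keep original order
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Toy example: 5 binding-site kmers among 100 candidates.
kmers = [f"kmer{i}" for i in range(100)]
labels = [1] * 5 + [0] * 95
X, y = random_undersample(kmers, labels, ratio=2.0)
print(sum(y), len(y))  # 5 15
```

Oversampling the positives (for example, duplicating them) is the mirror-image alternative; which one works better for motif data is the kind of question the suggested literature search should help settle.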

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a testing dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated using recall, precision, F-measure, and recognition rate on both the training dataset and the test dataset.
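The four measures can be computed from confusion-matrix counts. This sketch assumes binary labels (1 = binding site, 0 = non-binding) and assumes that "recognition rate" means overall accuracy, (TP + TN) / N; the function name `evaluate` is hypothetical, and a zero denominator yields 0.0 by convention.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: recall, precision, F-measure and recognition rate."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    rate = (tp + tn) / len(pairs) if pairs else 0.0  # assumed = accuracy
    return {"recall": recall, "precision": precision,
            "f_measure": f_measure, "recognition_rate": rate}


y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
good = evaluate(y_true, [1, 1, 0, 1, 0, 0, 0, 0, 0, 0])
lazy = evaluate(y_true, [0] * 10)  # always predicts "non-binding"
print(good["recognition_rate"], lazy["recognition_rate"])  # 0.8 0.7
print(lazy["recall"])  # 0.0
```

The lazy classifier still reaches a 0.7 recognition rate with zero recall, which is why recall, precision, and F-measure matter on imbalanced data.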

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer counts as a motif instance if its location overlaps a true binding site in the DNA sequences by at least 50%. For example, consider the two true binding sites (both ACACGGGA, shown in uppercase) in the following DNA sequence:

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that a learner model classifies the 8mers acaaACAC and ACGGGAtc as binding sites. Both count as correct predictions, because they overlap the true binding sites in sequence (1) by 50% (4 of 8 bases) and 75% (6 of 8 bases), respectively. Conversely, if a classifier labels them as non-binding sites, both count as incorrect predictions, because each has at least 50% overlap with a true binding site. Now take the 8mer GAgaatta in (1). If a learner model classifies it as a binding site, it counts as misclassified, because it overlaps the true binding site ACACGGGA by only 25% (2 of 8 bases).
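The overlap rule can be checked mechanically. The sketch below assumes, as in sequence (1), that the true binding sites are the uppercase runs of the sequence, and it measures overlap as shared bases divided by the kmer length k (in the example k equals the site length, so dividing by either gives the same percentages). The helper names `site_spans`, `overlap`, and `kmer_labels` are hypothetical.

```python
import re
from typing import List, Tuple


def site_spans(seq: str) -> List[Tuple[int, int]]:
    """Hypothetical helper: half-open (start, end) of each uppercase run,
    assumed to mark a true binding site."""
    return [(m.start(), m.end()) for m in re.finditer(r"[ACGT]+", seq)]


def overlap(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """Number of positions shared by half-open intervals a and b."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def kmer_labels(seq: str, k: int, min_frac: float = 0.5) -> List[int]:
    """Label each kmer start: 1 if it shares >= min_frac * k bases with some site."""
    sites = site_spans(seq)
    labels = []
    for start in range(len(seq) - k + 1):
        best = max((overlap((start, start + k), s) for s in sites), default=0)
        labels.append(1 if best >= min_frac * k else 0)
    return labels


seq = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"
labels = kmer_labels(seq, 8)
for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    start = seq.find(kmer)
    print(kmer, start, labels[start])
# acaaACAC 6 1
# ACGGGAtc 28 1
# GAgaatta 16 0
```

Under this sketch, a learner's prediction for the kmer at `start` is counted as correct exactly when it equals `labels[start]`.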


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd