DNA sequences, Computer Engineering

The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, this is known as an imbalanced dataset. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to biological data. It is a good idea to consult relevant publications to see how effective classifiers for motif recognition can be built.
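One of the sampling techniques mentioned above is random undersampling of the majority class. The following is a minimal sketch, assuming labels are encoded as 1 (binding site) and 0 (non-binding site). The function name `undersample`, the `ratio` parameter and the toy data are hypothetical and are not part of the assignment.

```python
import random


def undersample(samples, labels, ratio=1.0, seed=0):
    """Keep every positive and a random subset of the negatives.

    `ratio` is the number of negatives kept per positive (an assumption
    made for this sketch, not something the assignment prescribes).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    # Sort the kept indices so the original ordering is preserved.
    keep = sorted(pos + rng.sample(neg, n_keep))
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy data: 2 positives among 10 candidates (hypothetical placeholders).
toy_kmers = [f"kmer{i}" for i in range(10)]
toy_labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]

balanced_kmers, balanced_labels = undersample(toy_kmers, toy_labels)
print(balanced_kmers, balanced_labels)
```

Resampling of this kind would normally be applied to the training partition only, so that the test partition keeps its original class distribution.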

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a test dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated using recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
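The four measures can be computed from the counts of true and false positives and negatives. The sketch below assumes binary labels (1 for binding site, 0 otherwise), takes F-measure to be the balanced F1 score, and interprets "recognition rate" as overall accuracy; the assignment does not define these terms, so both readings are assumptions. The function name `evaluate` and the example labels are hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return precision, recall, F-measure and recognition rate.

    Assumes binary labels where 1 means "binding site". F-measure is
    taken to be F1, and recognition rate is taken to be accuracy.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    recognition_rate = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "recognition_rate": recognition_rate,
    }


# Hypothetical labels, used only to exercise the function.
example_true = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
example_pred = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
scores = evaluate(example_true, example_pred)
print(scores)
```

On an imbalanced dataset the recognition rate alone can look high even when few binding sites are found, which is why precision, recall and F-measure are reported alongside it.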

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites, both ACACGGGA, shown in uppercase in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. Then we count them as correct predictions, because they have 50% and 75% overlap with the true binding sites in sequence (1), respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions, because they have at least 50% overlap with the true binding sites. Take another 8mer, GAgaatta, in (1). If a learner model classifies it as a binding site, it is counted as misclassified, because it has only 25% overlap with the true binding site ACACGGGA.
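The overlap rule can be checked on sequence (1) with a short sketch. It assumes that true binding sites are marked by uppercase letters, as in the example, and that overlap is measured as a fraction of the kmer length (kmer and site are both 8 bases long here, so the two readings coincide). The function names are hypothetical.

```python
import re


def find_sites(seq):
    """Return half-open (start, end) spans of uppercase runs.

    Assumption: uppercase letters mark true binding sites, as in sequence (1).
    """
    return [m.span() for m in re.finditer(r"[A-Z]+", seq)]


def overlap_fraction(start, k, site):
    """Fraction of the kmer beginning at `start` that lies inside `site`."""
    site_start, site_end = site
    shared = min(start + k, site_end) - max(start, site_start)
    return max(shared, 0) / k


def is_motif_instance(start, k, sites, threshold=0.5):
    """True if the kmer overlaps any true site by at least `threshold`."""
    return any(overlap_fraction(start, k, s) >= threshold for s in sites)


SEQ = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"
SITES = find_sites(SEQ)

results = {}
for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    start = SEQ.find(kmer)  # case-sensitive, first occurrence
    best = max(overlap_fraction(start, len(kmer), s) for s in SITES)
    results[kmer] = (best, is_motif_instance(start, len(kmer), SITES))
    print(kmer, start, best, results[kmer][1])
```

The ground-truth label for each kmer comes from `is_motif_instance`; a prediction is then counted as correct when the classifier's output agrees with that label.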

