DNA sequences, Computer Engineering

Assignment Help:

The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, this is known as the imbalanced dataset problem. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to biological data. It is a good idea to look for relevant publications to see how effective classifiers for motif recognition can be built.
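As one illustration of the sampling idea, the sketch below randomly undersamples the majority (non-binding) class. It is not a method prescribed by the assignment: the function name `undersample`, the `ratio` parameter, the label convention (1 = binding site, 0 = non-binding) and the toy kmers are all assumptions made for the example.

```python
import random

def undersample(examples, labels, ratio=1.0, seed=0):
    """Keep every positive and a random subset of the negatives.

    ratio is the desired number of negatives per positive (assumed knob).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(ratio * len(pos)))
    kept = sorted(pos + rng.sample(neg, n_keep))  # preserve original order
    return [examples[i] for i in kept], [labels[i] for i in kept]

# Toy data: 2 binding-site kmers among 10 (made-up values for illustration).
kmers = ["ACACGGGA", "ccttacac", "cttacaca", "ttacacaa", "gaattaat",
         "ACACGGGA", "tcagatca", "cagatcaa", "agatcaat", "gatcaata"]
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]

balanced_kmers, balanced_labels = undersample(kmers, labels, ratio=1.0)
print(len(balanced_kmers), sum(balanced_labels))
```

Resampling of this kind would normally be applied to the training data only, so that the test data keeps the original class distribution.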

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a testing dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated using recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
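The four measures can be computed from the counts of true and false positives and negatives. The sketch below assumes binary labels (1 = binding site, 0 = non-binding), takes "recognition rate" to mean overall accuracy, and uses the balanced F1 form of the F-measure; the function name `evaluate` and the toy labels are illustrative only.

```python
def evaluate(y_true, y_pred):
    """Return precision, recall, F-measure (F1) and recognition rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # Assumption: "recognition rate" is the fraction of all kmers labelled correctly.
    recognition_rate = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, f_measure, recognition_rate

# Toy example: 2 true sites among 10 kmers; one found, one missed, one false alarm.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
precision, recall, f_measure, recognition_rate = evaluate(y_true, y_pred)
print(precision, recall, f_measure, recognition_rate)
```

In this toy case the recognition rate is 0.8 although only half of the true sites are found, which is why precision, recall and F-measure are reported alongside it for imbalanced data.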

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites, both ACACGGGA (shown in upper case), in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. Then we count them as correct predictions, because they overlap the true binding sites in sequence (1) by 50% (4 of 8 positions) and 75% (6 of 8 positions), respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions, because they have at least 50% overlap with the true binding sites. Take another 8mer, GAgaatta, in (1). If a learner model classifies it as a binding site, it is counted as misclassified, because it has only 25% overlap (2 of 8 positions) with the true binding site ACACGGGA.
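The overlap rule can be checked mechanically. The following is a minimal sketch, assuming that true binding sites are given as half-open `(start, end)` index intervals and that overlap is measured as the fraction of the kmer's positions that fall inside a site; the helper names `find_sites`, `overlap_fraction` and `is_motif_instance` are hypothetical and not part of the assignment.

```python
def find_sites(sequence, site):
    """Return half-open (start, end) intervals of every occurrence of `site`."""
    intervals = []
    start = sequence.find(site)  # case-sensitive: sites are upper case in (1)
    while start != -1:
        intervals.append((start, start + len(site)))
        start = sequence.find(site, start + 1)
    return intervals

def overlap_fraction(kmer_start, k, sites):
    """Largest fraction of the kmer's k positions covered by any single site."""
    best = 0
    for site_start, site_end in sites:
        shared = min(kmer_start + k, site_end) - max(kmer_start, site_start)
        best = max(best, shared)  # negative `shared` means no overlap
    return best / k

def is_motif_instance(kmer_start, k, sites, threshold=0.5):
    """Apply the 'at least 50% overlap' rule from the text."""
    return overlap_fraction(kmer_start, k, sites) >= threshold

seq = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"  # sequence (1)
sites = find_sites(seq, "ACACGGGA")

for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    start = seq.find(kmer)
    print(kmer, overlap_fraction(start, len(kmer), sites),
          is_motif_instance(start, len(kmer), sites))
```

With these labels, a kmer predicted as a binding site counts as correct only when `is_motif_instance` is true for its position, which is how the three 8mers above are scored.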



All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd