DNA sequences, Computer Engineering

Assignment Help:

The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, this is known as the imbalanced dataset problem. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to the biological data. It is a good idea to look for relevant publications to see how effective classifiers for motif recognition can be built.
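One simple form of the sampling mentioned above is random undersampling of the majority (non-binding) class. The sketch below is only an illustration: the function name `undersample`, the `ratio` parameter and the 0/1 label encoding are assumptions, not part of the assignment, and other sampling or filtering schemes may suit the data better.

```python
import random

def undersample(kmers, labels, ratio=1.0, seed=0):
    """Keep every positive kmer and a random subset of the negatives.

    ratio is the number of negatives kept per positive (hypothetical
    parameter; the assignment does not prescribe a value).
    Labels are assumed to be 1 (binding site) or 0 (non-binding site).
    """
    rng = random.Random(seed)
    pos = [k for k, y in zip(kmers, labels) if y == 1]
    neg = [k for k, y in zip(kmers, labels) if y == 0]
    # Never ask for more negatives than are available.
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept_neg = rng.sample(neg, n_keep)
    data = [(k, 1) for k in pos] + [(k, 0) for k in kept_neg]
    rng.shuffle(data)  # avoid an ordered block of positives
    return data
```

Undersampling should be applied to the training data only, so that the test data keeps the original class distribution.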

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a testing dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated using recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
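The four measures named above can be computed from the counts of true/false positives and negatives. The following is a minimal sketch, assuming labels are encoded as 1 (binding site) and 0 (non-binding site), that "recognition rate" means overall accuracy, and that F-measure means the balanced F1 score; the function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return precision, recall, F-measure and recognition rate.

    Assumes 1 = binding site, 0 = non-binding site, and interprets
    "recognition rate" as overall accuracy (an assumption).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    rate = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "recognition_rate": rate}
```

On an imbalanced dataset the recognition rate alone can look high even when few binding sites are found, which is why precision, recall and F-measure are reported alongside it.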

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is counted as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites, both ACACGGGA, in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. We then count them as correct predictions, because they have 50% and 75% overlap with the true binding sites in sequence (1), respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions, because they have at least 50% overlap with the true binding sites. Take another 8mer, GAgaatta, in (1). If a learner model classifies it as a binding site, it is counted as misclassified, because it has only 25% overlap with the true binding site ACACGGGA.
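The overlap rule in the example above can be checked with a short sketch. The function names `find_sites`, `overlap_fraction` and `is_motif_instance` are hypothetical, and the sketch assumes the overlap is measured as shared positions divided by the kmer length (here the kmers and the binding sites are both 8 bases long, so the choice of denominator does not matter).

```python
def find_sites(sequence, site):
    """Return the start index of every (case-sensitive) occurrence of site."""
    starts = []
    i = sequence.find(site)
    while i != -1:
        starts.append(i)
        i = sequence.find(site, i + 1)
    return starts

def overlap_fraction(start, k, site_starts, site_len):
    """Largest fraction of the kmer [start, start + k) covered by any site."""
    best = 0
    for s in site_starts:
        shared = min(start + k, s + site_len) - max(start, s)
        best = max(best, shared)  # negative values mean no overlap
    return best / k

def is_motif_instance(start, k, site_starts, site_len, threshold=0.5):
    """Apply the 'at least 50% overlap' rule from the assignment."""
    return overlap_fraction(start, k, site_starts, site_len) >= threshold

seq = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"  # sequence (1)
sites = find_sites(seq, "ACACGGGA")
for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    pos = seq.find(kmer)
    print(kmer, overlap_fraction(pos, 8, sites, 8),
          is_motif_instance(pos, 8, sites, 8))
```

The true labels produced this way are what the classifier's predictions are compared against when computing recall, precision and F-measure.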


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd