DNA sequences, Computer Engineering


The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, this is known as an imbalanced dataset. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to the biological data. It is a good idea to find relevant publications to see how effective classifiers for motif recognition can be built.
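As one illustration of the sampling idea mentioned above, the sketch below performs random undersampling of the majority (non-binding) class. It is only a minimal example, not the method the assignment prescribes: the function name `undersample`, the `ratio` parameter and the label convention (1 = binding site, 0 = non-binding) are assumptions made here for illustration.

```python
import random


def undersample(examples, labels, ratio=1.0, seed=0):
    """Keep every positive example and a random subset of the negatives.

    ratio is a hypothetical parameter: the number of negatives kept per
    positive. Labels are assumed to be 1 (binding site) or 0 (non-binding).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]

    # Never ask for more negatives than are available.
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept = pos + rng.sample(neg, n_keep)
    rng.shuffle(kept)

    return [examples[i] for i in kept], [labels[i] for i in kept]
```

Undersampling would normally be applied to the training dataset only, so that the test dataset keeps the original class distribution.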

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a test dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated by recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
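The four measures named above can be computed from the counts of true/false positives and negatives. The sketch below assumes binary labels (1 = binding site, 0 = non-binding), takes the F-measure to be the balanced F1 score, and interprets "recognition rate" as overall accuracy; these interpretations and the function name `evaluate` are assumptions, not something the assignment text specifies.

```python
def evaluate(y_true, y_pred):
    """Return precision, recall, F-measure and recognition rate.

    Assumes labels are 1 (binding site) or 0 (non-binding site) and that
    "recognition rate" means overall accuracy.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    total = tp + fp + fn + tn
    recognition_rate = (tp + tn) / total if total else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "recognition_rate": recognition_rate,
    }
```

On an imbalanced dataset the recognition rate alone can look high even when few binding sites are found, which is why precision, recall and F-measure are reported alongside it.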

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites, both ACACGGGA, in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. We then count them as correct predictions, because they have 50% and 75% overlap with the true binding sites in sequence (1), respectively. Conversely, if the classifier labels them as non-binding sites, we count them as incorrect predictions, because they have at least 50% overlap with the true binding sites. Take another 8mer, GAgaatta, in (1). If a learner model classifies it as a binding site, it is counted as misclassified, because it has only 25% overlap with the true binding site ACACGGGA.
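The overlap rule in the example above can be checked with a short sketch. It assumes that true binding sites are the uppercase occurrences in the sequence, that overlap is measured as shared positions divided by the binding-site length, and that positions are 0-based; the helper names `find_sites`, `overlap_fraction` and `is_motif_instance` are hypothetical.

```python
SEQ = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"  # sequence (1)
SITE = "ACACGGGA"


def find_sites(sequence, site):
    """Return the start index of every exact, case-sensitive occurrence."""
    starts = []
    i = sequence.find(site)
    while i != -1:
        starts.append(i)
        i = sequence.find(site, i + 1)
    return starts


def overlap_fraction(kmer_start, k, site_starts, site_len):
    """Largest overlap of the kmer with any true site, as a fraction."""
    kmer_end = kmer_start + k  # exclusive end
    best = 0
    for s in site_starts:
        shared = min(kmer_end, s + site_len) - max(kmer_start, s)
        best = max(best, shared)  # negative values mean no overlap
    return best / site_len


def is_motif_instance(kmer_start, k, site_starts, site_len, threshold=0.5):
    """True if the kmer overlaps a true site by at least the threshold."""
    return overlap_fraction(kmer_start, k, site_starts, site_len) >= threshold


sites = find_sites(SEQ, SITE)  # [10, 26]
for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    start = SEQ.find(kmer)
    frac = overlap_fraction(start, len(kmer), sites, len(SITE))
    print(kmer, frac, is_motif_instance(start, len(kmer), sites, len(SITE)))
# acaaACAC 0.5 True
# ACGGGAtc 0.75 True
# GAgaatta 0.25 False
```

With these labels as the ground truth, a prediction of "binding site" for the first two 8mers counts as correct, and the same prediction for GAgaatta counts as a misclassification.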

