DNA sequences, Computer Engineering


The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, such data is called an imbalanced dataset. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to biological data. It is a good idea to look for relevant publications to see how effective classifiers for motif recognition can be built.
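As one illustration of the sampling idea, the following is a minimal sketch of random undersampling of the majority (non-binding) class. The function name `undersample`, the `ratio` parameter and the label convention (1 for a binding site, 0 otherwise) are assumptions made for this example, not part of the assignment. Resampling of this kind is normally applied to the training data only, so that the test data keeps its original class balance.

```python
import random


def undersample(examples, labels, ratio=1.0, seed=0):
    """Keep every positive and a random subset of negatives.

    ratio is the number of negatives kept per positive
    (hypothetical parameter, chosen for illustration).
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y != 1]
    # Never ask for more negatives than are available.
    keep_n = min(len(negatives), int(ratio * len(positives)))
    kept = positives + rng.sample(negatives, keep_n)
    rng.shuffle(kept)
    return [examples[i] for i in kept], [labels[i] for i in kept]


# Toy data: 3 positives among 100 examples.
toy_examples = list(range(100))
toy_labels = [1] * 3 + [0] * 97
xs, ys = undersample(toy_examples, toy_labels, ratio=2.0)
print(len(ys), sum(ys))
```

With `ratio=2.0` the toy call keeps the 3 positives and 6 randomly chosen negatives. Other options, such as oversampling the positives or filtering out uninformative negatives, would follow the same pattern.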

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a testing dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated using recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
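The four measures can be computed from the counts of true and false positives and negatives. The sketch below assumes that "recognition rate" means overall accuracy, that F-measure means the balanced F1 score, and that labels are 1 for a binding site and 0 otherwise; the function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return precision, recall, F-measure and recognition rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # Assumption: "recognition rate" is taken to be overall accuracy.
    recognition_rate = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "recognition_rate": recognition_rate,
    }


# Toy labels: 2 true sites among 10 k-mers, one found, one false alarm.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = evaluate(y_true, y_pred)
print(scores)
```

On the toy labels the recognition rate is 0.8 even though only half of the true sites are found, which is why precision, recall and F-measure are reported alongside it for imbalanced data.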

It is very important to notice that, unlike the traditional way of evaluating a classifier's performance, here a kmer is counted as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites, both ACACGGGA, in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. Then we count them as correct predictions, because they have 50% and 75% overlap with the true binding sites in sequence (1), respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions, because they have at least 50% overlap with the true binding sites. Take another 8mer, GAgaatta, in (1). If it is classified by a learner model as a binding site, it is counted as misclassified, because it has only 25% overlap with the true binding site ACACGGGA.
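The overlap rule can be checked with a short sketch. It assumes 0-based start positions, measures overlap as a fraction of the kmer length (identical to a fraction of the site length here, since both are 8 bases long), and uses the hypothetical helper names `overlap_fraction` and `is_motif_instance`.

```python
def overlap_fraction(kmer_start, k, site_start, site_len):
    """Fraction of the kmer's positions that lie inside one binding site."""
    kmer_end = kmer_start + k          # end positions are exclusive
    site_end = site_start + site_len
    shared = max(0, min(kmer_end, site_end) - max(kmer_start, site_start))
    return shared / k


def is_motif_instance(kmer_start, k, sites, threshold=0.5):
    """True if the kmer overlaps any true site by at least the threshold."""
    return any(overlap_fraction(kmer_start, k, start, length) >= threshold
               for start, length in sites)


# Sequence (1); the true binding sites are the upper-case runs.
sequence = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"
sites = [(10, 8), (26, 8)]  # (start, length) of each ACACGGGA

for start in (6, 28, 16):
    kmer = sequence[start:start + 8]
    best = max(overlap_fraction(start, 8, s, n) for s, n in sites)
    print(kmer, best, is_motif_instance(start, 8, sites))
# acaaACAC 0.5 True
# ACGGGAtc 0.75 True
# GAgaatta 0.25 False
```

A prediction is then scored by comparing the classifier's label for each kmer with the value of `is_motif_instance` at that position, rather than with an exact match against the binding site.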

