DNA sequences, Computer Engineering

The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, this is known as the imbalanced dataset problem. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to the biological data. It is a good idea to find relevant publications to see how effective classifiers can be built for motif recognition.
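As one illustration of the sampling idea mentioned above, the following is a minimal sketch of random undersampling of the majority (non-binding) class. The function name `undersample`, the `ratio` parameter and the 0/1 label encoding are assumptions made for this example; the assignment does not prescribe a particular sampling method.

```python
import random


def undersample(examples, labels, ratio=1.0, seed=0):
    """Keep every positive example and a random subset of the negatives.

    `ratio` is the desired number of negatives per positive (an assumed
    knob, not part of the assignment). Labels are assumed to be 1 for a
    binding site and 0 otherwise.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more negatives than exist.
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept = sorted(pos + rng.sample(neg, n_keep))
    return [examples[i] for i in kept], [labels[i] for i in kept]
```

Undersampling should normally be applied to the training data only, so that the test data keeps its original class distribution.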

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a test dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated using recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
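The four measures can be computed from the counts of true/false positives and negatives. The sketch below assumes that "recognition rate" means overall accuracy, that F-measure is the balanced F1 score, and that labels are encoded as 1 (binding site) and 0 (non-binding site); the function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return precision, recall, F-measure (F1) and recognition rate (accuracy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "recognition_rate": accuracy}
```

On an imbalanced dataset, a classifier that predicts "non-binding" for everything can still reach a high recognition rate while its recall is zero, which is why precision, recall and F-measure are reported alongside it.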

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequence. For example, consider the two true binding sites, both ACACGGGA, in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. We then count them as correct predictions because they have 50% and 75% overlap with the true binding sites in sequence (1), respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions because they have at least 50% overlap with the true binding sites. Take another 8mer, GAgaatta, in (1). If a learner model classifies it as a binding site, it is counted as misclassified because it has only 25% overlap with the true binding site ACACGGGA.
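The overlap rule can be checked mechanically on sequence (1). The sketch below assumes that the true binding sites are the upper-case occurrences of ACACGGGA, that positions are 0-based, and that overlap is measured as a fraction of the binding-site length (which equals the kmer length in this example). All function names are hypothetical.

```python
SEQ = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"  # sequence (1)


def find_sites(sequence, site):
    """Return half-open (start, end) spans of every case-sensitive occurrence."""
    spans = []
    start = sequence.find(site)
    while start != -1:
        spans.append((start, start + len(site)))
        start = sequence.find(site, start + 1)
    return spans


def overlap_fraction(kmer_start, k, site_span):
    """Fraction of the binding site covered by the kmer starting at kmer_start."""
    site_start, site_end = site_span
    shared = min(kmer_start + k, site_end) - max(kmer_start, site_start)
    return max(0, shared) / (site_end - site_start)


def is_motif_instance(kmer_start, k, site_spans, threshold=0.5):
    """True if the kmer overlaps any true site by at least `threshold`."""
    return any(overlap_fraction(kmer_start, k, span) >= threshold
               for span in site_spans)


def prediction_is_correct(predicted_site, kmer_start, k, site_spans):
    """A prediction is correct when it agrees with the overlap-based label."""
    return predicted_site == is_motif_instance(kmer_start, k, site_spans)


SITES = find_sites(SEQ, "ACACGGGA")
for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    start = SEQ.find(kmer)
    best = max(overlap_fraction(start, len(kmer), s) for s in SITES)
    print(kmer, start, best, is_motif_instance(start, len(kmer), SITES))
```

The same `is_motif_instance` function can be used to assign training labels to every kmer position in a sequence, so that labelling and evaluation follow one consistent rule.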

