DNA sequences, Computer Engineering

The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is very small compared with the number of non-binding positions, which makes the problem challenging. In the machine learning community, this is known as an imbalanced dataset. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to biological data. It is a good idea to read some relevant publications to see how effective classifiers for motif recognition can be built.
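As one illustration of the sampling idea mentioned above, the sketch below randomly undersamples the majority (non-binding) class. The function name `undersample`, the `ratio` parameter, and the 0/1 label encoding are assumptions made for this example; the assignment does not prescribe any particular technique.

```python
import random

def undersample(samples, labels, ratio=1.0, seed=0):
    """Keep every positive (label 1) and a random subset of negatives (label 0).

    At most ratio * (number of positives) negatives are kept.
    The name, parameters, and label encoding are illustrative assumptions.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(ratio * len(pos)))
    keep = sorted(pos + rng.sample(neg, n_keep))  # preserve original order
    return [samples[i] for i in keep], [labels[i] for i in keep]
```

If resampling of this kind is used, it would normally be applied to the training data only, so that the test data keeps its original class distribution.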

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a testing dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated using recall, precision, F-measure and recognition rate on both the training dataset and the testing dataset.
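The four measures can be computed from the counts of true and false positives and negatives. The sketch below assumes that labels are encoded as 1 (binding site) and 0 (non-binding site), that "F-measure" means the balanced F1 score, and that "recognition rate" means overall accuracy; none of these choices is stated in the assignment, and the function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return precision, recall, F-measure (F1) and recognition rate (accuracy).

    Assumes labels are 1 for binding sites and 0 otherwise.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    recognition_rate = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "recognition_rate": recognition_rate,
    }
```

On a heavily imbalanced dataset the recognition rate can be high even when few binding sites are found, which is one reason precision, recall and F-measure are reported alongside it.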

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a k-mer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two occurrences of the true binding site ACACGGGA in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8-mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. Then we count them as correct predictions, because they have 50% and 75% overlap with the true binding sites in sequence (1), respectively. Conversely, if the classifier labels them as non-binding sites, we count them as incorrect predictions, because they have at least 50% overlap with the true binding sites. Take another 8-mer, GAgaatta, in (1). If it is classified by a learner model as a binding site, it is counted as misclassified, because it has only 25% overlap with the true binding site ACACGGGA.
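The overlap rule can be checked with a few lines of code. The sketch below reproduces the three cases from sequence (1); the helper names are hypothetical, positions are 0-based, and the overlap is expressed as a fraction of the k-mer length, which is an assumption (in this example the k-mer and the binding site are both 8 bases long, so the choice of denominator makes no difference).

```python
def find_sites(sequence, motif):
    """Return the 0-based start of every case-sensitive occurrence of motif."""
    starts = []
    pos = sequence.find(motif)
    while pos != -1:
        starts.append(pos)
        pos = sequence.find(motif, pos + 1)
    return starts

def overlap_fraction(start, k, site_starts, site_len):
    """Largest overlap between the k-mer at `start` and any true site,
    as a fraction of the k-mer length k."""
    best = 0
    for s in site_starts:
        shared = min(start + k, s + site_len) - max(start, s)
        best = max(best, shared)  # negative values mean no overlap
    return best / k

def is_motif_instance(start, k, site_starts, site_len, threshold=0.5):
    """Apply the 'at least 50% overlap' labelling rule."""
    return overlap_fraction(start, k, site_starts, site_len) >= threshold

seq = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"
sites = find_sites(seq, "ACACGGGA")  # true binding sites are upper case here

for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    start = seq.find(kmer)
    frac = overlap_fraction(start, len(kmer), sites, 8)
    print(kmer, start, frac, is_motif_instance(start, len(kmer), sites, 8))
# acaaACAC 6 0.5 True
# ACGGGAtc 28 0.75 True
# GAgaatta 16 0.25 False
```

The same function can be used to assign ground-truth labels to every k-mer position before the recall, precision and F-measure are computed.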

