Implement smith-waterman algorithm

Assignment Help Other Subject
Reference no: EM132846268

Problem #1 - Implement Smith-Waterman algorithm

A. Write a function in C++ that would implement the Smith-Waterman alignment between two genomic sequences.
o The function must take a genomic similarity scoring matrix (+2 for match, -1 for mismatch) and a gap penalty (-3) as an input.
o The function must return:
• score for best alignment
• alignment in text format (hint: use a struct of 3 character arrays, 2 for sequences one for alignment codes {x, I , } - use whitespace for gaps). See below.
o Test your code using the SARS-COV2 viral genome found in appendix A at the bottom of this homework assignment and sequence fragments found in appendix B at the bottom of this homework.
o You will need to submit a screenshot of the output as part of the homework write up (it should look something like the text below)

B. Generate 1K, 10K, 100K, and 1M (million) completely random genomic sequences (50nt) to use as targets for alignment and use SARS-COV2 genome as subject. Perform alignment of the queries to the subject sequence and record time to completion (in seconds / minutes).
Special note: Depending on the speed of your alignment implementation, this homework may take hours or days to complete. The goal is to get a sense for how slow these ‘optimal' alignment algorithms are... for the explicit purpose of establishing a baseline to be able to compare the improved algorithms and data structures we will be discussing later in the course. Please be aware of this and ‘give up' on larger benchmarks as appropriate.

Problem #2 - Having a BLAST

A. Implement a seed-based Smith Waterman. This means:
o Use the genome in Appendix A and break I down into seeds with word size = 11.
o Load your seeds (created in part A) into memory
o For each read disassemble it into k-mers (of size 11). Compare your read k-mers to the SARS-COV2 seeds.
o If a seed-match is found, extend the seed by cutting out the appropriate segment of the subject (Genome of SARS-COV2) and running the Smith Waterman on the two sequences (original read and the segment from SARS-COV2)
• Beware of edge cases
• Ok to just expand one seed and be done (multiple seeds from a read can be found, typically necessitating multiple seed expansions and decision on what is the best alignment)
o Test your code on the 50-mers I have provided below in Appendix B. You must report alignment for the 50-mers I've provided as part of the homework solution.

B. Test your code on a set of 1K, 10K, 100K, and 1M (million) completely random 50-mers, aligning them to SARS-COV2 genome. How long did it take? Compare it to the results in problem 1B.

C. Test algorithm's exhaustiveness. Randomly select 100,000 fragments from the SARS-COV2 genome and use these fragments to query the SARS-COV2 genome using the seed-based SW you implemented in part A. How many fragments were you able to find? Now introduce random errors into your 100,000 fragments at a 5% per-base error rate (every character has a 5% change of being changed to some other random character). Use these error-filled 100,000 fragments to query the SARS-COV2 genome again. How many fragments were you able to find?

Attachment:- algorithm_Homework.rar

Reference no: EM132846268

Questions Cloud

What is the change in operating income for the year : What is the change in operating income for the year if $6.50 is the new selling price and costs remain the same but sales increase to 500,000
Should original sources be credited since technology driven : Should the original sources be credited since it is technology driven, or is it similar to a person learning about a topic and creating their own work?
Prepare the journal entry to record the expense : Equipment with a cost of $120,000 has an estimated salvage value of $15,000. Prepare the journal entry to record the expense under the unit of activity method
Discuss the financial benefits of chatbots : Discuss how IBM Watson will reach 1 billion people by 2018 and what the implications of that are. Discuss the financial benefits of chatbots.
Implement smith-waterman algorithm : Implement Smith-Waterman algorithm - Test your code using the SARS-COV2 viral genome found in appendix A at the bottom
Describe various ways that knowledge management systems : Describe various ways that knowledge management systems could help firms with sales and marketing or with manufacturing and production.
Find the balance of the investment in associate account : Berry Limited revalued its Plant downwards by $50,000 during the current financial period. Find the balance of the investment in associate account
Compute and display the average sales value : Compute and display the average sales value and the largest and the smallest daily sales values of the numbers entered. Write program that requests daily sale.
How much will service revenue will be reflected : If Peter & Sons makes the appropriate adjusting entry, how much will service revenue will be reflected on the December 31, 2019 income statement

Reviews

Write a Review

Other Subject Questions & Answers

  Cross-cultural opportunities and conflicts in canada

Short Paper on Cross-cultural Opportunities and Conflicts in Canada.

  Sociology theory questions

Sociology are very fundamental in nature. Role strain and role constraint speak about the duties and responsibilities of the roles of people in society or in a group. A short theory about Darwin and Moths is also answered.

  A book review on unfaithful angels

This review will help the reader understand the social work profession through different concepts giving the glimpse of why the social work profession might have drifted away from its original purpose of serving the poor.

  Disorder paper: schizophrenia

Schizophrenia does not really have just one single cause. It is a possibility that this disorder could be inherited but not all doctors are sure.

  Individual assignment: two models handout and rubric

Individual Assignment : Two Models Handout and Rubric,    This paper will allow you to understand and evaluate two vastly different organizational models and to effectively communicate their differences.

  Developing strategic intent for toyota

The following report includes the description about the organization, its strategies, industry analysis in which it operates and its position in the industry.

  Gasoline powered passenger vehicles

In this study, we examine how gasoline price volatility and income of the consumers impacts consumer's demand for gasoline.

  An aspect of poverty in canada

Economics thesis undergrad 4th year paper to write. it should be about 22 pages in length, literature review, economic analysis and then data or cost benefit analysis.

  Ngn customer satisfaction qos indicator for 3g services

The paper aims to highlight the global trends in countries and regions where 3G has already been introduced and propose an implementation plan to the telecom operators of developing countries.

  Prepare a power point presentation

Prepare the power point presentation for the case: Santa Fe Independent School District

  Information literacy is important in this environment

Information literacy is critically important in this contemporary environment

  Associative property of multiplication

Write a definition for associative property of multiplication.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd