Identifying all possible gene coding sequences

Reference no: EM132553338

Assignment -

In this assignment, you will help identify all possible gene coding sequences from a bacterial genome sequence. This bacterium has been shown to reduce gut inflammation in humans.

To summarise, here is how a bacterial genome "works":

The sequence we give is always the "top" strand, and genes can exist on the "bottom" strand as well. To get the "bottom" strand from the "top" strand, you need to 'reverse complement' the sequence (complement by A<=>T and G<=>C, and then reverse so the first letter is last, etc.).
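The reverse-complement step can be sketched in a few lines of Python. This is a minimal illustration, not a required implementation: the function name `reverse_complement` is hypothetical, it uses standard Watson-Crick pairing (A with T, G with C), and it assumes the input contains only the letters A, C, G and T.

```python
# Complement table for standard base pairing: A<->T and G<->C.
# Assumption: the sequence holds only A/C/G/T (no N or ambiguity codes).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the bottom strand, read 5'->3', for a top-strand sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]
```

For example, `reverse_complement("ATGC")` complements to TACG and then reverses to GCAT. Applying the function twice returns the original sequence, which is a cheap sanity check.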

In between the genes (note that there are likely to be multiple genes in a genome sequence) is the non-coding space.

Each gene sequence is composed of triplets of A/T/C/G nucleotides called codons (each codon eventually forms an amino acid of a protein).

In bacteria, each gene is composed of only one coding exon and there are no introns. However, a distinctive feature of a bacterial genome is that genes can overlap: where one gene might contain the codons AAT-GCT, the same stretch might also be read as A-ATG-CT, so a different gene might start "within" the first gene.
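To make the overlap idea concrete, the same six letters can be split into codons at three different offsets (reading frames). The helper name `codons` below is hypothetical and the snippet is only a sketch of the idea.

```python
def codons(seq, frame=0):
    """Split seq into complete triplets, starting at offset frame (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


fragment = "AATGCT"
for frame in range(3):
    # Each frame reads the same letters as a different series of codons.
    print(frame, codons(fragment, frame))
```

Frame 0 reads the fragment as AAT, GCT, while frame 1 reads it as ATG, which is a possible start codon for a second gene. Together with the three frames of the reverse-complemented strand, this gives six frames to examine.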

In bacteria, each gene begins with one of three start codons: ATG (usually), GTG (rare) or TTG (very rare), and finishes with one of the stop codons TAA, TGA or, more rarely, TAG.

Because codons are triplets, the length of a gene's coding sequence is a multiple of three. The non-coding sequence does not have to be a multiple of three and can be any length.

The stop codons don't appear anywhere else in the gene's reading frame but may be present in the non-coding areas.

In the non-coding regions the A/T/C/G letters have a different distribution than in coding sequence, since they do not contribute to codons.
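One way (among several) to exploit this difference in distribution is to compare in-frame triplet frequencies learned from known coding and known non-coding sequences. The sketch below is an assumption about how such a feature could be built, not a method prescribed by the assignment; the function names `codon_frequencies` and `log_odds` and the pseudocount of 1 are illustrative choices.

```python
from collections import Counter
from itertools import product
from math import log

ALL_TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_frequencies(seqs, pseudocount=1.0):
    """Relative frequency of each of the 64 triplets, read in frame 0."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        counts.update(s[i:i + 3] for i in range(0, len(s) - 2, 3))
    # The pseudocount keeps unseen triplets from getting probability zero.
    total = sum(counts[c] for c in ALL_TRIPLETS) + pseudocount * len(ALL_TRIPLETS)
    return {c: (counts[c] + pseudocount) / total for c in ALL_TRIPLETS}


def log_odds(seq, coding_freq, noncoding_freq):
    """Sum of log(P_coding / P_noncoding) over in-frame triplets.

    A positive score means the triplets look more like the coding sample.
    """
    seq = seq.upper()
    score = 0.0
    for i in range(0, len(seq) - 2, 3):
        triplet = seq[i:i + 3]
        if triplet in coding_freq:  # skip triplets with N or other symbols
            score += log(coding_freq[triplet] / noncoding_freq[triplet])
    return score
```

The two frequency tables would be trained on the sample coding and non-coding files, and each candidate ORF scored with `log_odds`. Where to place the decision threshold is something the benchmark should determine rather than something to assume up front.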

ORFs are Open Reading Frames, which for bacteria are equivalent to putative genes (as bacteria have no introns). An ORF starts with a start codon, ends with a stop codon, and has no stop codon in between, but it may or may not be a real gene.

Note: I'm only interested in the coding sequence (CDS) and not the Untranslated Parts (UTR). The coding sequences will typically be no less than 100 codons (300 nucleotide bases).
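The start/stop rules and the length guideline above can be combined into a simple six-frame ORF scan. This is a sketch under stated assumptions rather than a definitive solution: the name `find_orfs` is hypothetical, only the longest ORF per stop codon is reported (the first start codon after the previous in-frame stop), the length threshold counts the start codon but not the stop codon, and ORFs that run off the end of the sequence without a stop are ignored.

```python
STARTS = {"ATG", "GTG", "TTG"}
STOPS = {"TAA", "TGA", "TAG"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq, min_codons=100):
    """Yield (strand, start, end, orf_sequence) for each ORF found.

    start and end are 0-based, end-exclusive offsets on the strand that was
    scanned, so '-' strand offsets refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    strands = (("+", seq), ("-", seq.translate(_COMPLEMENT)[::-1]))
    for strand, s in strands:
        for frame in range(3):
            start = None  # first start codon seen since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i
                elif codon in STOPS:
                    # (i - start) // 3 is the codon count excluding the stop.
                    if start is not None and (i - start) // 3 >= min_codons:
                        yield strand, start, i + 3, s[start:i + 3]
                    start = None
```

For example, `list(find_orfs("CCATGAAATAGCC", min_codons=2))` gives `[("+", 2, 11, "ATGAAATAG")]`. If alternative start codons inside an ORF should also be reported as separate candidates, the scan would need to record every start position instead of only the first.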

Write a report with your solution and an analysis of the method you used (there may be more than one way). Don't forget to tell me how you benchmark your model.
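For the benchmarking part, one common option is to hold out some of the labelled coding and non-coding sequences, classify them, and report standard confusion-matrix metrics. The sketch below assumes labels are encoded as truthy for "coding" and falsy for "non-coding"; the function name `classification_metrics` is hypothetical and other evaluation schemes (for example cross-validation or ROC analysis) are equally valid.

```python
def classification_metrics(y_true, y_pred):
    """Return precision, recall, F1 and accuracy for binary labels."""
    pairs = [(bool(t), bool(p)) for t, p in zip(y_true, y_pred)]
    tp = sum(t and p for t, p in pairs)          # coding, predicted coding
    fp = sum((not t) and p for t, p in pairs)    # non-coding, predicted coding
    fn = sum(t and (not p) for t, p in pairs)    # coding, predicted non-coding
    tn = sum((not t) and (not p) for t, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}
```

Reporting precision and recall separately is useful here because missing a real gene and calling a spurious ORF a gene are different kinds of error.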

Data access: You will be given 4 FASTA files:

1. Some sample coding sequences that we have already identified as coding genes using other methods,

2. Some sample noncoding sequences

3. A file containing a number of possible Open Reading Frames (ORFs) which are putative genes but could belong to either the coding or non-coding region

4. A 20,000 base pair (20 kb) sequence from the genome sequence.
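The four files can be read with a small FASTA parser such as the sketch below. The function name `parse_fasta` is hypothetical, and the file name in the usage note is a placeholder, since the actual file names inside the attachment are not given here.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped; join them into one string.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)
```

A typical call would be `with open("coding_samples.fasta") as handle: records = list(parse_fasta(handle))`, where `coding_samples.fasta` stands in for whichever file is being loaded. Established libraries such as Biopython offer the same functionality if third-party packages are allowed.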

Attachment:- Assignment & Assignment Data Files.rar
