Obtain the sequence for the gene and the sequence for the RNA transcript in FASTA format. If there is more than one transcript, choose an appropriate one and explain your choice. Using Perl, convert the FASTA files into simple strings containing only nucleotides (save these for later) and determine the amino acid sequence of the protein.
Using the sequences prepared with Perl, write a Java program that calculates the molecular weights of the gene, the mRNA and the protein, and compare the three.
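One possible starting point for the Java step is sketched below. It assumes each molecule is a single linear chain, so its weight is the sum of its residue masses plus one water molecule for the chain ends. The class name, method names and mass tables are illustrative assumptions rather than part of the task: the nucleotide masses are rounded average values for a single strand, the amino acid values are average residue masses, and all of them should be checked against a reference you can cite. A double-stranded gene, or an mRNA with a 5' cap and poly-A tail, would need further terms.

```java
import java.util.Map;

// Hypothetical helper class; names and mass tables are illustrative assumptions.
class MolecularWeightSketch {
    static final double WATER = 18.015; // g/mol, added once per linear chain

    // Approximate average masses (g/mol) of nucleotide residues inside a DNA strand.
    static final Map<Character, Double> DNA =
        Map.of('A', 313.21, 'C', 289.18, 'G', 329.21, 'T', 304.20);

    // Approximate average masses (g/mol) of nucleotide residues inside an RNA strand.
    static final Map<Character, Double> RNA =
        Map.of('A', 329.21, 'C', 305.18, 'G', 345.21, 'U', 306.17);

    // Approximate average masses (g/mol) of amino acid residues inside a peptide chain.
    static final Map<Character, Double> PROTEIN = Map.ofEntries(
        Map.entry('G', 57.0519),  Map.entry('A', 71.0788),  Map.entry('S', 87.0782),
        Map.entry('P', 97.1167),  Map.entry('V', 99.1326),  Map.entry('T', 101.1051),
        Map.entry('C', 103.1388), Map.entry('L', 113.1594), Map.entry('I', 113.1594),
        Map.entry('N', 114.1038), Map.entry('D', 115.0886), Map.entry('Q', 128.1307),
        Map.entry('K', 128.1741), Map.entry('E', 129.1155), Map.entry('M', 131.1926),
        Map.entry('H', 137.1411), Map.entry('F', 147.1766), Map.entry('R', 156.1875),
        Map.entry('Y', 163.1760), Map.entry('W', 186.2132));

    // Sum the residue masses of a sequence and add one water for the chain ends.
    // Throws if the sequence contains a symbol missing from the table (e.g. 'N' or '*').
    static double weight(String sequence, Map<Character, Double> masses) {
        double total = WATER;
        for (char symbol : sequence.toUpperCase().toCharArray()) {
            Double mass = masses.get(symbol);
            if (mass == null) {
                throw new IllegalArgumentException("Unexpected symbol: " + symbol);
            }
            total += mass;
        }
        return total;
    }

    public static void main(String[] args) {
        // Toy sequences only; substitute the strings saved from the Perl step.
        System.out.printf("DNA:     %.3f g/mol%n", weight("ATGGCC", DNA));
        System.out.printf("RNA:     %.3f g/mol%n", weight("AUGGCC", RNA));
        System.out.printf("Protein: %.3f g/mol%n", weight("MA", PROTEIN));
    }
}
```

Keeping the mass tables separate from the summing routine means the same method serves all three molecules, and a stop-codon marker or ambiguity code in the input fails loudly instead of being silently ignored.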
You now have sequences for gene, RNA and protein. Write an R script to calculate answers to the following:
By taking the current estimate of global population and multiplying that by the estimate for the average number of cells in the human body, determine the total number of nucleotides representing the coding part of the gene in living humans.
If all of these nucleotides were printed out using 12 pt Arial font on A4 with 3 cm margins and the sheets laid end-to-end, how long would it take to drive along the paper at 30 km/h?
For comparison, how long would it take to drive the length of this sequence of nucleotides at the same speed if it were in the form of a double-stranded DNA helix, with the molecules laid end-to-end?
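The arithmetic behind the three questions above is independent of language. The task asks for an R script, but the steps are sketched here in Java to match the previous step and translate line for line. Every input in `main` is a placeholder assumption: the population, the cells-per-body estimate, the coding length and the characters per page must be replaced with figures you have sourced or measured yourself. The sketch uses the A4 long side of 0.297 m and the commonly quoted rise of about 0.34 nm per base pair for B-form DNA. It also follows the task's simple multiplication, so it does not adjust for two gene copies per diploid cell or for cells without a nucleus.

```java
// Hypothetical helper class; all names and sample inputs are illustrative assumptions.
class DriveTimeSketch {
    static final double A4_LENGTH_M = 0.297;      // long side of an A4 sheet, metres
    static final double NM_PER_BASE_PAIR = 0.34;  // approximate rise per base pair, B-form DNA

    // Total coding nucleotides across all people: population x cells x coding length.
    static double totalNucleotides(double population, double cellsPerBody, double codingLength) {
        return population * cellsPerBody * codingLength;
    }

    // Length in km of the printed sheets laid end-to-end along their long side.
    static double paperLengthKm(double nucleotides, double charsPerPage) {
        double pages = Math.ceil(nucleotides / charsPerPage);
        return pages * A4_LENGTH_M / 1000.0;
    }

    // Length in km of the same number of base pairs as a double helix.
    static double helixLengthKm(double basePairs) {
        double metres = basePairs * NM_PER_BASE_PAIR * 1e-9; // nm to m
        return metres / 1000.0;                              // m to km
    }

    // Driving time in hours at a constant speed.
    static double driveHours(double distanceKm, double speedKmPerHour) {
        return distanceKm / speedKmPerHour;
    }

    public static void main(String[] args) {
        double population = 8.0e9;      // placeholder: look up the current estimate
        double cellsPerBody = 3.7e13;   // placeholder: cite the estimate you use
        double codingLength = 1500;     // placeholder: coding length of your gene
        double charsPerPage = 3000;     // placeholder: measure for 12 pt Arial, 3 cm margins
        double speed = 30.0;            // km/h, from the task

        double n = totalNucleotides(population, cellsPerBody, codingLength);
        System.out.printf("Total nucleotides: %.3e%n", n);
        System.out.printf("Paper: %.3e hours%n", driveHours(paperLengthKm(n, charsPerPage), speed));
        System.out.printf("Helix: %.3e hours%n", driveHours(helixLengthKm(n), speed));
    }
}
```

Double-precision values are used throughout because the totals exceed the range of a 64-bit integer long before they lose useful precision for an estimate of this kind.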