Reference no: EM131985119
Problem
1. Extract the gene names from column 9 of the GFF3 file by vectorized regular expression parsing. Save these gene names in a vector whose length equals the total number of annotation lines in the GENCODE file. This is an overhead step that needs to be run only once for a given GFF3 file.
2. Sort the gene name vector alphabetically using the sort() function in R. To track the original row number of each sorted gene, name the vector elements by their row numbers before sorting. This is also an overhead step. The sorted vector should be saved for future use and regenerated only if a new GENCODE release is to be used.
3. Write a logarithmic-time (binary) search function to report the range of sorted names that are identical to the query gene. The input is a gene name and a sorted gene name vector. The output is a range, which is a vector of two elements: the beginning and ending indices of the query gene in the sorted vector. Because the vector is sorted, all elements within that range are equal to the query gene. If the gene is not found, the function returns NULL. The run time must be O(log n), where n is the length of the sorted vector, and must be independent of how many times the query gene appears in the sorted vector.
4. With the range from step 3, extract the corresponding rows of the GFF3 data frame to form a new data frame that contains all annotations for the query gene.
5. Develop a test function that checks several cases to make sure the function is correct. The test function should check more than the number of rows containing the given gene name, because the total count can still be correct when the exact row numbers are wrong.
6. Report the run time of the above logarithmic-time search on the entire GENCODE annotation with three genes of your choice.
7. Report the run time for the first three steps. Compare the run time of step 3 with the for-loop, apply, and vectorized-operation implementations of linear search.
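Steps 1 and 2 can be sketched as follows. This is only an illustration under stated assumptions: the attribute strings are made-up stand-ins for column 9, the `gene_name=` key is assumed to be how the gene name appears in that column, and the variable names `attributes`, `gene_names`, and `sorted_names` are hypothetical.

```r
# Toy stand-in for column 9 of a GFF3 file (hypothetical values).
attributes <- c(
  "ID=gene1;gene_type=protein_coding;gene_name=TP53;level=2",
  "ID=gene2;gene_type=lncRNA;gene_name=XIST;level=2",
  "ID=tx1;Parent=gene1;gene_name=TP53;level=2",
  "ID=gene3;gene_type=protein_coding;gene_name=BRCA1;level=1"
)

# Step 1: one vectorized call; capture the text after "gene_name="
# up to the next ";" (assumed key, see the note above).
gene_names <- sub(".*gene_name=([^;]+).*", "\\1", attributes)

# sub() returns a line unchanged when the pattern does not match,
# so mark lines without a gene_name field as NA.
gene_names[!grepl("gene_name=", attributes, fixed = TRUE)] <- NA

# Step 2: name each element by its original row number, then sort.
# sort() keeps the names attached to the values it reorders.
names(gene_names) <- seq_along(gene_names)
sorted_names <- sort(gene_names)

print(sorted_names)
```

The names of `sorted_names` are the original row numbers as character strings, so `as.integer(names(sorted_names))` recovers them later. The sorted vector could be stored with `saveRDS()` and reloaded with `readRDS()` so that both steps run once per GENCODE release.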
Turn in your R source code files and a summary of the run time recorded for the algorithms.
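One possible shape for the range search in step 3, the row extraction in step 4, and a few checks in the spirit of step 5 is sketched below. It runs two binary searches, one for the leftmost match and one for the position just past the rightmost match, so the cost stays O(log n) however often the gene repeats. The function names `lower_bound`, `upper_bound`, and `gene_range`, and the toy data frame, are hypothetical.

```r
# Leftmost index i with sorted[i] >= query (length + 1 if there is none).
lower_bound <- function(sorted, query) {
  lo <- 1L
  hi <- length(sorted) + 1L
  while (lo < hi) {
    mid <- (lo + hi) %/% 2L
    if (sorted[[mid]] < query) lo <- mid + 1L else hi <- mid
  }
  lo
}

# Leftmost index i with sorted[i] > query (length + 1 if there is none).
upper_bound <- function(sorted, query) {
  lo <- 1L
  hi <- length(sorted) + 1L
  while (lo < hi) {
    mid <- (lo + hi) %/% 2L
    if (sorted[[mid]] <= query) lo <- mid + 1L else hi <- mid
  }
  lo
}

# Step 3: c(first, last) indices of query in sorted, or NULL if absent.
gene_range <- function(query, sorted) {
  first <- lower_bound(sorted, query)
  if (first > length(sorted) || sorted[[first]] != query) return(NULL)
  c(first, upper_bound(sorted, query) - 1L)
}

# Toy data standing in for the gene name vector and the GFF3 data frame.
genes <- c("TP53", "XIST", "TP53", "BRCA1", "TP53")
names(genes) <- seq_along(genes)
sorted <- sort(genes)
gff3 <- data.frame(row_id = seq_along(genes), gene = unname(genes),
                   stringsAsFactors = FALSE)

# Step 4: map the range back to original row numbers and subset the rows.
r <- gene_range("TP53", sorted)
rows <- sort(as.integer(names(sorted)[r[1]:r[2]]))
hits <- gff3[rows, ]

# Step 5 style checks: compare exact row numbers, not just the row count.
stopifnot(identical(rows, unname(which(genes == "TP53"))))
stopifnot(all(hits$gene == "TP53"))
stopifnot(is.null(gene_range("MYC", sorted)))
```

The comparisons `<` and `<=` on strings and the default `sort()` both follow the current locale's collation, so they agree with each other. Sorting with `method = "radix"`, which orders character vectors in the C locale, while still comparing with `<` could break that agreement. For steps 6 and 7, each search call can be wrapped in `system.time()`; a vectorized linear search such as `which(genes == "TP53")` is a natural baseline for the comparison.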