Genome4u is a scientific research project at a large

Reference no: EM13347888

Genome4U is a scientific research project at a large university in the United States. Genome4U has recently started a large-scale project to sequence the genomes of 250,000 volunteers with a goal of preparing a set of publicly accessible databases with human genomic, trait, and medical data.

The project's founder, a brilliant man with several talents and interests, tells you that the public databases will serve the world's scientific community in general, not just those interested in medical research. Genome4U is trying not to prejudge how the data will be used, because computers may find interconnections and correlations that people would have missed.

The founder envisions clusters of servers that may be accessible by researchers all over the world. The databases will be used by end users to study their own genetic heritage, with the help of their doctors and genetic counselors. In addition, the data will be used by computer scientists, mathematicians, social scientists, physicists, and other researchers.

The genome for a single human consists of complementary DNA strands wound together in a double helix. The strands hold about 6 billion base pairs of nucleotides connected by hydrogen bonds. To store the research data, 1 byte of capacity is used for each base pair.

As a result, 6 gigabytes (GB) of storage capacity is required to hold the genetic information of just one person. The project plans to use network-attached storage (NAS) clusters. A system has been prototyped using the current version of the FreeNAS software. The production system is expected to migrate to an HBase and Hadoop cloud computing infrastructure.
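The per-genome figure can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function and constant names are made up for this example, and it assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) with no compression, replication, or file-system overhead.

```python
# Figures taken from the case description.
BASE_PAIRS_PER_GENOME = 6_000_000_000   # about 6 billion base pairs
BYTES_PER_BASE_PAIR = 1                 # 1 byte of capacity per base pair
TARGET_VOLUNTEERS = 250_000             # project goal

def genome_bytes(base_pairs=BASE_PAIRS_PER_GENOME,
                 bytes_per_bp=BYTES_PER_BASE_PAIR):
    """Raw storage for one genome, in bytes."""
    return base_pairs * bytes_per_bp

def total_raw_bytes(volunteers=TARGET_VOLUNTEERS):
    """Raw nucleotide storage for all volunteers, in bytes (assumes no overhead)."""
    return volunteers * genome_bytes()

if __name__ == "__main__":
    print(f"One genome:  {genome_bytes() / 1e9:.1f} GB")
    print(f"All genomes: {total_raw_bytes() / 1e15:.2f} PB")
```

Under these assumptions, the raw nucleotide data for the full set of 250,000 volunteers comes to about 1.5 PB before any medical or trait data is added.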

Genome4U has developed new techniques to sequence a person's genome quickly, accurately, and, most importantly, at low cost. The research group is a contestant for the $10,000,000 X-Prize offered by Archon-Genomics. With their current funding and the equipment they are currently using, they expect to complete the sequencing of 25,000 individuals by December 2012 and to sequence 5,000 individuals every month thereafter.

In addition to genetic information, the project will ask volunteers to provide detailed information about their traits so that researchers can find correlations between traits and genes. Volunteers will also provide their medical records. Storage will be needed for these data sets and for the raw nucleotide data. The detailed medical information is expected to require no more than 100 megabytes (MB) of storage for each individual.
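One way to turn these figures into a month-by-month storage estimate is sketched below. It is not a definitive model. It assumes month 0 is December 2012, that the 100 MB upper bound covers both trait and medical data (the case does not size the trait data separately), that sequencing stops at the 250,000-volunteer goal, and that no replication or file-system overhead is counted. All names are hypothetical.

```python
GENOME_BYTES = 6 * 10**9       # 6 GB of raw nucleotide data per person
MEDICAL_BYTES = 100 * 10**6    # upper bound per person (assumed to include trait data)
INITIAL_INDIVIDUALS = 25_000   # sequenced by December 2012 (month 0)
PER_MONTH = 5_000              # sequenced each month thereafter
TARGET = 250_000               # project goal, used as a cap

def individuals_sequenced(month):
    """Cumulative individuals sequenced `month` months after December 2012."""
    return min(INITIAL_INDIVIDUALS + PER_MONTH * month, TARGET)

def storage_bytes(month):
    """Cumulative storage in bytes, ignoring replication and overhead."""
    return individuals_sequenced(month) * (GENOME_BYTES + MEDICAL_BYTES)

def monthly_storage_tb(months=36):
    """Storage in terabytes (10**12 bytes) for months 0..months inclusive."""
    return [storage_bytes(m) / 1e12 for m in range(months + 1)]

if __name__ == "__main__":
    for m, tb in enumerate(monthly_storage_tb()):
        if m % 6 == 0:
            print(f"month {m:2d}: {individuals_sequenced(m):7,d} people, {tb:8.1f} TB")
```

The list returned by `monthly_storage_tb()` can be passed straight to a plotting tool to draw the three-year storage curve. With these assumptions the curve is linear, growing from about 152.5 TB at month 0 to about 1,250.5 TB at month 36.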

Since the data is to be publicly shared, an initial community of 25,000 active users is expected, and this community is expected to double every 18 months. Active users are expected to access 10% of the entire database daily, which is expected to place a heavy demand on the networking infrastructure.
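The growth and access figures above can be expressed as two small formulas. The sketch below rests on a stated interpretation: the active-user community as a whole reads 10% of the database per day, spread evenly over 24 hours. If instead each user reads 10% of the database daily, the result would have to be multiplied by the number of active users. The function names and the even-load assumption are illustrative only; real traffic would peak well above this average.

```python
SECONDS_PER_DAY = 86_400

def active_users(month, initial=25_000, doubling_months=18):
    """Active users after `month` months, doubling every `doubling_months`."""
    return initial * 2 ** (month / doubling_months)

def avg_network_bps(db_bytes, daily_fraction=0.10):
    """Average bits per second needed to move `daily_fraction` of the
    database each day (assumes a perfectly even load, no protocol overhead)."""
    return db_bytes * daily_fraction * 8 / SECONDS_PER_DAY

if __name__ == "__main__":
    # Example database size: 25,000 people at 6.1 GB each (assumed, month 0).
    db_bytes = 25_000 * 6_100_000_000
    print(f"users at month 0:  {active_users(0):,.0f}")
    print(f"users at month 36: {active_users(36):,.0f}")
    print(f"average demand at month 0: {avg_network_bps(db_bytes) / 1e9:.2f} Gbit/s")
```

Under this interpretation the average demand scales with database size rather than with user count, so it grows with the number of genomes stored. The user count would then mainly affect concurrency and peak load.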

You have been brought in as a network design consultant to help the Genome4U project, and the management team has asked you to help them organize their needs.

They would appreciate your analysis in answering the following questions:

1. List the major user communities.

2. List the major data stores and the user communities for each data store.

3. Prepare a graph of the storage requirements for the project monthly for the next 3 years.

4. Based on the size of the database, and the demands of the active users, what is the expected network capacity required to support the growing community of users? Add this capacity demand to the storage graph you drew above.

5. Can you evaluate the relationship between the storage size, number of genomes, number of users, and network capacity requirements? If possible, express this as an equation.

6. Review the capabilities of the FreeNAS software. Will the FreeNAS software scale to the projected requirements of this application? If you find limits to its scalability, what other solutions are possible?

7. Characterize the network traffic in terms of flow, load, behavior, and QoS requirements. You will not be able to precisely characterize the traffic but provide some theories about it and document the types of tests you would conduct to prove your theories right or wrong.

8. What additional questions would you ask Genome4U's founder about this project? Who besides the founder would you talk to and what questions would you ask them?




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd