What are the analyses that can be done using big data

Reference no: EM131197293

Assignment 1

Part A

Exercise 1: Data Science

Read the article at http://datascience.berkeley.edu/about/what-is-data-science/ and answer the following:

What is Data Science?

According to IBM's estimate, what percentage of the data in the world today was created in the past two years?

What is the value of petabyte storage?

Exercise 2: Characteristics of Big Data

Read the following research paper from the IEEE Xplore Digital Library:

Ali-ud-din Khan, M.; Uddin, M.F.; Gupta, N., "Seven V's of Big Data understanding Big Data to extract value," 2014 Zone 1 Conference of the American Society for Engineering Education (ASEE Zone 1), pp. 1-5, 3-5 April 2014.

Answer the following questions:

Summarise the authors' motivation in one paragraph.

What are the 7 V's mentioned in the paper? Briefly describe each V in one paragraph.

Explore the authors' future work using reference [4] in the research paper. In 300 words, summarise your understanding of how Big Data can improve the healthcare sector.

Exercise 3: Big Data Platform

To build a big data platform, one has to acquire, organize, and analyse big data. Go through the following links and answer the questions that follow them:
- http://www.infochimps.com/infochimps-cloud/how-it-works/
- http://www.youtube.com/watch?v=TfuhuA_uaho
- http://www.youtube.com/watch?v=IC6jVRO2Hq4
- http://www.youtube.com/watch?v=2yf_jrBhz5w
Please note: You are encouraged to watch all the videos in that series from Oracle.

How can enterprises acquire big data, and how can it be used?

How can big data be organized and handled?

What are the analyses that can be done using big data?

Part B

Exercise 4: Big Data Products

Google is a master at creating data products. Below are a few examples from Google. Describe each of the products below and explain how large-scale data is used effectively in them.

a. Google's PageRank

b. Google's Spell Checker

c. Google's Flu Trends

d. Google's Trends

Like Google, Facebook and LinkedIn also use large-scale data effectively. How?

Exercise 5: Big Data Tools

Briefly explain why a traditional relational database management system (RDBMS) is not effective for storing big data.

What is NoSQL Database?

Name and briefly describe at least 5 NoSQL Databases

What is MapReduce and how does it work?
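As a hedged illustration of the MapReduce idea, the classic word-count job can be simulated in a single Python process: a map step emits `(word, 1)` pairs for each input split, a shuffle step groups the intermediate pairs by key, and a reduce step sums each group. The function names are hypothetical; a real framework such as Hadoop runs these phases in parallel across a cluster and also handles partitioning, sorting, and fault tolerance, none of which this sketch models.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split: str) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values by their key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data needs big tools", "data tools"]))
    # {'big': 2, 'data': 2, 'needs': 1, 'tools': 2}
```

Because each map call depends only on its own split and each reduce call only on one key's values, both phases can be distributed across many machines, which is what makes the model suit large data sets.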

Briefly describe some notable MapReduce products (at least 5)

Amazon's S3 service lets users store large chunks of data on an online service. List five features of Amazon's S3 service.

Getting concise, valuable information from a sea of data can be challenging, so we need statistical analysis tools to deal with Big Data. Name and describe at least 3 statistical analysis tools.

Exercise 6: Big Data Application

Name 3 industries that should use Big Data, and justify your claim in 250 words for each industry using proper references.

Assignment 2

Part A

Exercise 1: Storage Methods

Based on your lecture and the video link given below, write a paragraph about memory virtualization.

Watch the YouTube link mentioned below and, based on the video, answer the following questions:

What is RAID 0?

Describe Striping, Mirroring and Parity.
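For reference, striping and parity can be illustrated in a few lines of Python. This is a simplified sketch under stated assumptions, not how any particular RAID controller works: the hypothetical `stripe` function deals fixed-size strips round-robin across disks as in RAID 0, and `parity` computes the byte-wise XOR used by parity RAID levels such as RAID 5, which lets any single lost strip be rebuilt from the surviving strips. Strip sizes and sample bytes are made up for the example. Mirroring (RAID 1) would simply write an identical copy of each block to a second disk, so it is not shown.

```python
from functools import reduce


def stripe(data: bytes, n_disks: int, strip_size: int) -> list[bytes]:
    """RAID 0-style striping: deal fixed-size strips round-robin across disks."""
    disks = [bytearray() for _ in range(n_disks)]
    for offset in range(0, len(data), strip_size):
        disk_index = (offset // strip_size) % n_disks
        disks[disk_index] += data[offset:offset + strip_size]
    return [bytes(d) for d in disks]


def parity(strips: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized strips (XOR parity as in RAID 5)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def rebuild(surviving: list[bytes], parity_strip: bytes) -> bytes:
    """Recover a single missing strip by XOR-ing the parity with the survivors."""
    return parity(surviving + [parity_strip])


if __name__ == "__main__":
    print(stripe(b"ABCDEFGH", n_disks=2, strip_size=2))  # [b'ABEF', b'CDGH']
    strips = [b"\x0f\x10", b"\xf0\x01", b"\x33\x44"]  # hypothetical data strips
    p = parity(strips)
    recovered = rebuild([strips[0], strips[2]], p)  # pretend strips[1] was lost
    print(p.hex(), recovered == strips[1])  # cc55 True
```

XOR works for parity because any value XOR-ed with itself is zero, so XOR-ing the parity with all surviving strips cancels them out and leaves only the missing strip.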

Exercise 2: Storage Design

Summarize storage repository design based on the following video link:

What are the 3 main components of the intelligent storage system (ISS)?

How does cache work in an ISS?

Storage Area Network (SAN) and Network Attached Storage (NAS) are widely used concepts in the data storage arena. The following YouTube video links give a detailed description of these concepts:
- http://www.youtube.com/watch?v=csdJFazj3h0
- http://www.youtube.com/watch?v=vdf6CvGQZrk
- [Not working link] http://www.youtube.com/watch?v=MKZU8zOMiqE
Based on the videos, answer the following questions:

Briefly describe NAS and SAN using diagrams.

What are the advantages of SAN over NAS?

What are two common NAS file sharing protocols? How are they different from each other?

Part B

Exercise 3: Storage Design

Design a Storage Solution for a New Application

An organization is deploying a new business application in its environment. The new application requires 1 TB of storage space for business and application data. During peak workload, the application is expected to generate 4900 IOPS (I/O operations per second) with a typical I/O block size of 4 KB.

The disk drive available from the vendor is a 15,000 rpm drive with 100 GB capacity. Other specifications of the drive are:

Average seek time = 5 ms and data transfer rate = 40 MB/s.

Calculate the number of disk drives required to meet both the capacity and the performance requirements of the application.

Hint: To calculate the IOPS from the average seek time, data transfer rate, disk rpm, and data block size, refer to slide 15 of the week 7 lecture slides. Once you have the IOPS, refer to slide 16 of week 7 to calculate the required number of disks.
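The calculation the hint describes can be sketched in Python. This is a hedged worked example, not the lecture's own solution: it assumes the usual textbook model in which disk service time is seek time plus rotational latency (half a revolution) plus transfer time, decimal units (1 TB = 1000 GB, 1 MB = 1000 KB), and a 70% maximum disk utilization as the rule of thumb for sizing. The week 7 slides are not reproduced here, so the 70% figure is an assumption that should be checked against slide 16. The function names are hypothetical.

```python
import math


def service_time_ms(seek_ms: float, rpm: int, block_kb: float, transfer_mb_s: float) -> float:
    """Disk service time Ts = seek time + rotational latency + transfer time (ms)."""
    rotational_latency_ms = 0.5 * 60_000 / rpm  # half a revolution on average
    transfer_ms = block_kb / transfer_mb_s  # KB / (MB/s) = ms when 1 MB = 1000 KB
    return seek_ms + rotational_latency_ms + transfer_ms


def disks_required(capacity_gb: float, disk_gb: float, peak_iops: float,
                   ts_ms: float, max_utilization: float = 0.7) -> tuple[int, int, int]:
    """Return (disks for capacity, disks for performance, disks required)."""
    iops_per_disk = 1000 / ts_ms * max_utilization  # usable IOPS per disk (assumed 70%)
    by_capacity = math.ceil(capacity_gb / disk_gb)
    by_performance = math.ceil(peak_iops / iops_per_disk)
    return by_capacity, by_performance, max(by_capacity, by_performance)


if __name__ == "__main__":
    ts = service_time_ms(seek_ms=5, rpm=15_000, block_kb=4, transfer_mb_s=40)
    print(round(ts, 2))  # 7.1  (5 ms seek + 2 ms rotation + 0.1 ms transfer)
    print(disks_required(capacity_gb=1000, disk_gb=100, peak_iops=4900, ts_ms=ts))
    # (10, 50, 50)
```

Under these assumptions the design is performance-bound: about 141 IOPS per disk at 100% utilization drops to about 98.6 IOPS at 70%, so 10 drives would hold 1 TB but 50 are needed to sustain 4900 IOPS (35 if the disks were allowed to run at full utilization, which leaves no headroom).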

Exercise 4: Storage Evolution

Watch the following videos on Fibre Channel over Ethernet (FCoE) and answer the questions that follow:

- http://www.youtube.com/watch?v=hSFyf-rmjA8
- http://www.youtube.com/watch?v=iCfJCzfNLrw

What is FCoE and why do we need it?

In your opinion, how is FCoE more cost-effective than a traditional connection? Give a brief explanation.

You have read about and answered questions on SAN in Part A. Based on your understanding and some research, answer the following questions:

What is a Virtual SAN?

What are IP SAN protocols and Fibre Channel over IP (FCIP)?

Watch the video below, "Introduction to Object-based and Unified Storage", and:

Choose the correct answer to each of the following questions:

What is an advantage of a flat address space over a hierarchical address space?
a. Highly scalable with minimal impact on performance
b. Provides access to data, based on retention policies
c. Provides access to block, file, and object with same interface
d. Consumes less bandwidth on network while accessing data

What is the role of the metadata service in an OSD node?
a. Responsible for storing data in the form of objects
b. Stores unique IDs generated for objects
c. Stores both objects and objects IDs
d. Controls functioning of storage devices

What is used to generate an object ID in a CAS system?
a. File metadata
b. Source and destination address
c. Binary representation of data
d. File system type and ownership

What accurately describes block I/O access in unified storage?
a. I/O traverses NAS head and storage controller to disk
b. I/O traverses OSD node and storage controller to disk
c. I/O traverses storage controller to disk
d. I/O is directly sent to the disk

What accurately describes unified storage?
a. Provides block, file, and object-based access within one platform
b. Provides block and file storage access using objects
c. Supports block and file access using flat address space
d. Specialized storage device purposely built for archiving

Assignment 3

Part A

Exercise 1: Green Computing

The questions in this exercise can be answered by doing an internet search and/or from the YouTube videos. The answer to each question should be one paragraph in your own words.

What is the greenhouse effect?

We are legally, ethically, and socially required to green our IT products, applications, services, and practices. Is this statement true? Why?

What is Green IT and what are the benefits of greening IT?

Exercise 2: Environmental Sustainability

Read the article at the link below and answer the questions that follow:

According to the article how do you build a greener environment?

Summarize the article in 150 words

Exercise 3: Environmentally Sound Practices
The questions in this exercise can be answered by doing internet search.

Briefly explain the following terms - a paragraph for each term:

- Power usage effectiveness (PUE) and its reciprocal

- Data center efficiency (DCE)

- Data center infrastructure efficiency (DCiE)
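As a small worked example of the reciprocal relationship, The Green Grid defines PUE as total facility power divided by IT equipment power, and DCiE as IT equipment power divided by total facility power, expressed as a percentage, so DCiE = 100% / PUE. The sketch below covers PUE and DCiE only; the meter readings and function names are hypothetical.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    return total_facility_kw / it_equipment_kw


def dcie_percent(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Data center infrastructure efficiency: IT power / total power, in percent."""
    return it_equipment_kw / total_facility_kw * 100


if __name__ == "__main__":
    total_kw, it_kw = 1500.0, 1000.0  # hypothetical meter readings
    print(pue(total_kw, it_kw))  # 1.5
    print(round(dcie_percent(total_kw, it_kw), 1))  # 66.7
    # DCiE is the reciprocal of PUE expressed as a percentage
    print(round(100 / pue(total_kw, it_kw), 1))  # 66.7
```

A PUE closer to 1.0 (equivalently, a DCiE closer to 100%) means less of the facility's power goes to overhead such as cooling and power distribution.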

List 5 universities that offer a Green Computing course. For each, give the university, the course name, and a brief description of the course.

Exercise 4: Major Cloud APIs
The following companies are major cloud service providers: Amazon, GoGrid, Google, and Microsoft.

List and briefly describe (in 2 lines each) the APIs provided by these vendors.

Part B

Exercise 1: Greening IT Standards and Regulations
To design green computers and other IT hardware, the following standards and regulations are mainly used: EPEAT (www.epeat.net), the Energy Star 4.0 standard, and the Restriction of Hazardous Substances Directive (www.rhos.gov.uk). Using the links provided and some internet search, summarize each standard and regulation in 150 words.

Exercise 2: Green cloud computing

Xiong, N.; Han, W.; Vandenberg, A., "Green cloud computing schemes based on networks: a survey," IET Communications, vol. 6, no. 18, pp. 3294-3300, Dec. 18, 2012.

Most of the power consumption in data centers comes from computation processing, disk storage, networking, and cooling systems. New technologies and methods have been proposed to reduce energy costs in data centers. From the above paper, summarize (in 300 words) the recent work done in these fields.

Exercise 3: Cloud API Functionalities

List the functionalities that can be achieved by using the APIs mentioned in the following link:

What API is used in the following link, and how is it used?

OpenStack is an open-source collaborative software project that meets many cloud needs. The links below give extensive information about OpenStack:
- https://support.rc.nectar.org.au/docs/openstack
- http://docs.openstack.org/api/quick-start/content/
Write a report (2 pages) about OpenStack features and functionalities.

