How to balance a dataset in Spark

Reference no: EM133189666

Question: The COVID-19 pandemic has been devastating for hospitals, whose limited resources can be stretched thin. One area under investigation is the use of simulators that estimate the number of active cases a hospital is likely to see. Such a simulator uses machine learning algorithms to predict the number of patients who may be admitted, which helps hospitals anticipate their resource needs. In this assignment, you will run a machine learning algorithm in Spark that predicts fatalities.

Using the above dataset, write a Spark machine learning algorithm to predict fatality rates in the Toronto area.

Note that since the majority of COVID-19 cases result in recovery, this dataset is not balanced. For example, an algorithm that simply marks every case as "resolved" would be 99% accurate (since 99% of the cases are "resolved") even though it did not predict a single fatality correctly! As a result, you cannot use the dataset as is and must balance it first. We did not cover balancing in the Spark Machine Learning lessons, but you should have been exposed to the concept in other courses. You will therefore need to investigate how to balance a dataset in Spark.
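One common way to balance a dataset in Spark is to undersample the majority class with `DataFrame.sampleBy`. The sketch below is a minimal illustration under stated assumptions, not the assignment's expected solution: the column name `Outcome` and the labels `RESOLVED` and `FATAL` are hypothetical, since the dataset's schema is not shown here, and the function names are invented for this example.

```python
def undersample_fractions(class_counts):
    """Map each label to the fraction of its rows to keep so that every
    class shrinks to roughly the size of the rarest class."""
    smallest = min(class_counts.values())
    return {label: smallest / count for label, count in class_counts.items()}


def balance_by_undersampling(df, label_col="Outcome", seed=42):
    """Return an approximately balanced copy of a Spark DataFrame.

    `label_col` is a hypothetical column name; substitute the dataset's own.
    """
    # Count rows per class, e.g. {"RESOLVED": 990, "FATAL": 10}.
    counts = {row[label_col]: row["count"]
              for row in df.groupBy(label_col).count().collect()}
    # sampleBy draws a stratified sample without replacement, so the
    # resulting class sizes are approximate rather than exact.
    return df.sampleBy(label_col,
                       fractions=undersample_fractions(counts),
                       seed=seed)
```

With 990 "resolved" rows and 10 fatal ones, the helper keeps about 1% of the majority class and all of the minority class. Undersampling discards data; oversampling the minority class or passing per-row class weights to an estimator are alternatives worth investigating.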

Once you have a balanced dataset, you can run your algorithm on it and report your accuracy.
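The training and reporting step could be sketched as follows. This is one possible arrangement, not a prescribed one: logistic regression is swapped in as an example classifier, `feature_cols` is assumed to list numeric columns, and `Outcome` is a hypothetical label column name. The small `majority_baseline_accuracy` helper gives the score a "predict the majority class" model would get, which is a useful number to report next to the model's accuracy.

```python
def majority_baseline_accuracy(class_counts):
    """Accuracy of a trivial model that always predicts the largest class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


def train_and_report(balanced_df, feature_cols, label_col="Outcome", seed=42):
    """Fit a classifier on a balanced Spark DataFrame and return test accuracy.

    `feature_cols` (assumed numeric) and `label_col` are hypothetical names.
    """
    # Imported here so the helper above stays usable without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.ml.feature import StringIndexer, VectorAssembler

    # Hold out 20% of the balanced rows for evaluation.
    train, test = balanced_df.randomSplit([0.8, 0.2], seed=seed)

    pipeline = Pipeline(stages=[
        # Convert string labels to numeric indices.
        StringIndexer(inputCol=label_col, outputCol="label"),
        # Pack the feature columns into a single vector column.
        VectorAssembler(inputCols=feature_cols, outputCol="features"),
        LogisticRegression(featuresCol="features", labelCol="label"),
    ])
    predictions = pipeline.fit(train).transform(test)

    evaluator = MulticlassClassificationEvaluator(
        labelCol="label", predictionCol="prediction", metricName="accuracy")
    return evaluator.evaluate(predictions)
```

On a balanced two-class dataset the majority baseline is about 0.5, so an accuracy well above that indicates the model has learned something, unlike the 99% obtained by labelling every case "resolved" on the original data.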

1. Your machine learning algorithm code in Spark (as a simple text file)






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd