Prepare heritage data for classification learning

Reference no: EM131231115

Assignment 1:

1. Using heritage data (release 1) in SQL

a. Find support for all single itemsets

b. List all itemsets with 2 elements and support of at least 0.2

c. List all itemsets with 3 elements and support at least 0.2
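As a rough illustration of question 1, the sketch below computes itemset support in plain Python. The `transactions` list is hypothetical toy data, not the heritage release 1 data, and the helper names `itemset_support` and `frequent_itemsets` are made up for this example. Support is taken as the fraction of transactions that contain every element of the itemset; the same counts could be obtained in SQL with self-joins and GROUP BY.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions; in the assignment each set would hold
# the items recorded for one member in the heritage release 1 data.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]


def itemset_support(transactions, size):
    """Return {itemset: support} for every itemset of the given size."""
    counts = Counter()
    for items in transactions:
        # Sorting makes each itemset appear in one canonical order.
        for combo in combinations(sorted(items), size):
            counts[combo] += 1
    total = len(transactions)
    return {combo: count / total for combo, count in counts.items()}


def frequent_itemsets(transactions, size, min_support):
    """Keep only itemsets whose support is at least min_support."""
    supports = itemset_support(transactions, size)
    return {k: v for k, v in supports.items() if v >= min_support}


singles = itemset_support(transactions, 1)         # part a
pairs = frequent_itemsets(transactions, 2, 0.2)    # part b
triples = frequent_itemsets(transactions, 3, 0.2)  # part c

for name, result in (("singles", singles), ("pairs", pairs), ("triples", triples)):
    print(name, result)
```

Enumerating every combination is only practical for small item counts; Apriori-style pruning would be the usual choice on the full data.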

2. In Weka

a. Load heritage data (release 1)

b. Apply at least two association rule generation algorithms and compare results

c. Apply FPTree algorithm with at least two measures of rule metrics

Assignment 2:

1. In SQL/Weka:

a. Prepare heritage data for classification learning

b. Load heritage data release 3 (preprocessed to binary representation, including demographics and output attribute(s))

c. Perform exploratory analysis

d. Create at least three classification models for predicting hospitalization based on Year 1 data.

e. Which model performs the best on year 2 data?

f. Create regression model for predicting hospitalization days.

g. What is the difference between regression and classification models?

h. Present your results in the form of a short report that includes screenshots, tables, and any needed description.

Assignment 3:

Classification Part 2

1. Using heritage release 3 data prepared last assignment

a. Include drug information into data

b. Include laboratory information into data

c. Import newly created data into Weka and run classification algorithms

d. Does inclusion of the information improve predictions?

There are many ways to complete question 4, so you need to make different decisions.

Try not to overcomplicate the problem.

2. In Weka using heritage 3 dataset

a. Apply k-means algorithm for k=2, 3, 5, 10

b. Apply EM algorithm. What is the optimal number of clusters obtained by EM?

c. Compare the created clusters to classification based on hospitalization in year 2.

Assignment 4:

3. Using the data table shown below.

a. Calculate distance between all points in 1-norm, 2-norm and infinity-norm. Show dissimilarity matrix.

b. Is there any need to preprocess the data to be more suitable for clustering? If so, describe the operations and show the resulting data table.

c. Apply k-means clustering algorithm with k=2.

ID   Age   BMI   Gender   Total Cholesterol
1    30    24    M        180
2    70    19    M        190
3    65    26    M        220
4    40    32    F        260
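One possible way to approach parts a and b for the table above is sketched here. It assumes that only the numeric attributes (Age, BMI, Total Cholesterol) enter the distance and that Gender is left out; encoding Gender as 0/1 would be another reasonable choice. The function names are hypothetical, and min-max scaling is shown as just one candidate preprocessing step.

```python
import math

# Rows copied from the table: (ID, Age, BMI, Gender, Total Cholesterol).
rows = [
    (1, 30, 24, "M", 180),
    (2, 70, 19, "M", 190),
    (3, 65, 26, "M", 220),
    (4, 40, 32, "F", 260),
]

# Assumption: use numeric attributes only; Gender is dropped here.
points = [(age, bmi, chol) for _, age, bmi, _, chol in rows]


def minkowski(p, q, order):
    """Distance between p and q in the given norm (1, 2 or math.inf)."""
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if order == math.inf:
        return max(diffs)
    return sum(d ** order for d in diffs) ** (1 / order)


def dissimilarity_matrix(points, order):
    """Symmetric matrix of pairwise distances."""
    return [[minkowski(p, q, order) for q in points] for p in points]


def min_max_scale(points):
    """Rescale each column to [0, 1]; assumes no column is constant."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]


d1 = dissimilarity_matrix(points, 1)
d2 = dissimilarity_matrix(points, 2)
dinf = dissimilarity_matrix(points, math.inf)
scaled = min_max_scale(points)

for name, matrix in (("1-norm", d1), ("2-norm", d2), ("inf-norm", dinf)):
    print(name)
    for row in matrix:
        print([round(v, 2) for v in row])
```

On the raw values Total Cholesterol has the widest range and tends to dominate the distances, which is one argument for scaling before running k-means.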

Assignment 5:

Text Mining

1. Write regular expressions to:

a. Detect zip codes in text

b. Find last names of all patients whose first name is John (note that regular expressions may have some false positives/false negatives).
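A minimal sketch of both patterns using Python's `re` module follows. It assumes US-style zip codes (five digits with an optional four-digit extension) and assumes the last name is a capitalized word directly after "John". The sample text is invented. As the question notes, errors are expected: any standalone five-digit number counts as a zip code, and names with a middle name or in "Smith, John" order are missed.

```python
import re

# Invented sample text, not taken from any real clinical data.
text = (
    "Patient John Smith lives in Fairfax, VA 22030. "
    "John O'Neil moved to ZIP 20001-1234; call 5551234."
)

# Five digits, optionally followed by a hyphen and four digits.
# Word boundaries keep longer digit runs such as 5551234 from matching.
ZIP_PATTERN = re.compile(r"\b\d{5}(?:-\d{4})?\b")

# "John", whitespace, then a capitalized word that may contain
# apostrophes or hyphens (O'Neil, Smith-Jones).
JOHN_PATTERN = re.compile(r"\bJohn\s+([A-Z][A-Za-z'-]+)")

zip_codes = ZIP_PATTERN.findall(text)
last_names = JOHN_PATTERN.findall(text)

print(zip_codes)
print(last_names)
```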

2. List challenges in automatically retrieving ICD-9 codes from clinical notes. Search the literature to find relevant published work. Also, include your own observations and comments.

3. Using the SMS data

a. Split data into training (80%) and testing (20%) sets

b. Build naïve Bayes classifier for detecting spam based on bag of words

i. List all words in the documents

ii. Count occurrences in spam and ham

iii. Assign likelihoods P(word|spam) and P(word|ham) for all words

iv. Convert test data into a list of words. For each message you need two columns: message id and word

v. Classify test data. This can be done by a series of joins with the data prepared in (iii).

vi. Evaluate your model (accuracy, precision, recall)
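Steps i to vi can be sketched in a few lines of Python. The messages below are hypothetical stand-ins for the SMS data, and the train/test split is fixed by hand rather than drawn at random. Add-one (Laplace) smoothing and skipping words unseen in training are assumptions of this sketch; the assignment does not prescribe them. In SQL, the `classify` step corresponds to joining the (message id, word) table with the likelihood table and summing log-likelihoods per message.

```python
import math
from collections import Counter

# Hypothetical toy messages standing in for the SMS data: (label, text).
train = [
    ("spam", "win cash prize now"),
    ("spam", "free prize call now"),
    ("ham", "are we meeting for lunch"),
    ("ham", "call me when you are free"),
]
test = [
    ("spam", "free cash prize"),
    ("ham", "lunch when you are free"),
]


def tokenize(text):
    return text.lower().split()


# Steps i-ii: vocabulary and word counts per class.
counts = {"spam": Counter(), "ham": Counter()}
docs = Counter()
for label, text in train:
    counts[label].update(tokenize(text))
    docs[label] += 1
vocab = set(counts["spam"]) | set(counts["ham"])


# Step iii: P(word | class) with add-one smoothing (an assumption here).
def likelihood(word, label):
    total_words = sum(counts[label].values())
    return (counts[label][word] + 1) / (total_words + len(vocab))


# Steps iv-v: split a message into words and sum log-probabilities.
def classify(text):
    scores = {}
    for label in counts:
        score = math.log(docs[label] / sum(docs.values()))  # class prior
        for word in tokenize(text):
            if word in vocab:  # words unseen in training are skipped
                score += math.log(likelihood(word, label))
        scores[label] = score
    return max(scores, key=scores.get)


# Step vi: accuracy, precision and recall with spam as the positive class.
predictions = [classify(text) for _, text in test]
actual = [label for label, _ in test]
pairs = list(zip(actual, predictions))
tp = sum(1 for a, p in pairs if a == "spam" and p == "spam")
fp = sum(1 for a, p in pairs if a == "ham" and p == "spam")
fn = sum(1 for a, p in pairs if a == "spam" and p == "ham")
accuracy = sum(1 for a, p in pairs if a == p) / len(pairs)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0

print(predictions, accuracy, precision, recall)
```

Working in log space avoids numerical underflow when messages contain many words.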


