Describe operations and show resulting data table

Reference no: EM131231200

Assignment 1:

1. Using heritage data (release 1) in SQL

a. Find support for all single itemsets

b. List all itemsets with 2 elements and support of at least 0.2

c. List all itemsets with 3 elements and support at least 0.2
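
The support of an itemset is the fraction of transactions that contain every item in it. The assignment asks for this in SQL; as a way to cross-check such queries, here is a minimal Python sketch of the same computation. The `toy` transactions and the function names `itemset_support` and `frequent_itemsets` are hypothetical and are not taken from the heritage data.

```python
from collections import Counter
from itertools import combinations


def itemset_support(transactions, size):
    """Return {itemset: support} for every itemset of the given size."""
    counts = Counter()
    for items in transactions:
        # Sort so that each itemset has one canonical tuple form.
        for combo in combinations(sorted(set(items)), size):
            counts[combo] += 1
    n = len(transactions)
    return {itemset: count / n for itemset, count in counts.items()}


def frequent_itemsets(transactions, size, min_support):
    """Keep only the itemsets whose support is at least min_support."""
    supports = itemset_support(transactions, size)
    return {k: v for k, v in supports.items() if v >= min_support}


# Hypothetical toy transactions (NOT the heritage data).
toy = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]

singles = itemset_support(toy, 1)
pairs = frequent_itemsets(toy, 2, 0.2)
triples = frequent_itemsets(toy, 3, 0.2)
```

This brute-force version enumerates every combination in every transaction, which is fine for checking small samples but would likely be too slow for wide transactions.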

2. In Weka

a. Load heritage data (release 1)

b. Apply at least two association rule generation algorithms and compare results

c. Apply FPTree algorithm with at least two measures of rule metrics

Assignment 2:

1. In SQL/Weka:

a. Prepare heritage data for classification learning

b. Load heritage data release 3 (preprocessed to binary representation, including demographics and output attribute(s))

c. Perform exploratory analysis

d. Create at least three classification models for predicting hospitalization based on Year 1 data.

e. Which model performs the best on year 2 data?

f. Create regression model for predicting hospitalization days.

g. What is the difference between regression and classification models?

h. Present your results in the form of a short report that includes screenshots, tables, and any needed description.

Assignment 3:

Classification Part 2

1. Using heritage release 3 data prepared last assignment

a. Include drug information into data

b. Include laboratory information into data

c. Import newly created data into Weka and run classification algorithms

d. Does inclusion of the information improve predictions?

There are many ways to complete question 4, so you will need to make your own decisions.

Try not to overcomplicate the problem.

2. In Weka using heritage 3 dataset

a. Apply kmeans algorithm for k=2, 3, 5, 10

b. Apply EM algorithm. What is the optimal number of clusters obtained by EM?

c. Compare the created clusters to classification based on hospitalization in year 2.

Assignment 4:

3. Using the data table shown below:

a. Calculate distance between all points in 1-norm, 2-norm and infinity-norm. Show dissimilarity matrix.

b. Is there any need to preprocess the data to be more suitable for clustering? If so, describe the operations and show the resulting data table.

c. Apply k-means clustering algorithm with k=2.

ID   Age   BMI   Gender   Total Cholesterol
1    30    24    M        180
2    70    19    M        190
3    65    26    M        220
4    40    32    F        260
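
One possible way to approach the three parts is sketched below in plain Python. It rests on several assumptions that are not stated in the assignment: Gender is encoded as M=0 and F=1, preprocessing is min-max scaling of every column to [0, 1], and k-means starts from the first two rows as initial centroids. Other encodings, scalings, or initializations are equally defensible and may give different clusters.

```python
import math

# Rows from the data table as (Age, BMI, Gender, Total Cholesterol).
# Assumption: Gender is encoded as M=0, F=1.
points = [
    (30, 24, 0, 180),
    (70, 19, 0, 190),
    (65, 26, 0, 220),
    (40, 32, 1, 260),
]


def manhattan(a, b):
    """1-norm distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """2-norm distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chebyshev(a, b):
    """Infinity-norm distance: largest absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def dissimilarity_matrix(rows, dist):
    """Pairwise distance matrix for the given distance function."""
    return [[dist(a, b) for b in rows] for a in rows]


def min_max_scale(rows):
    """Rescale each column to [0, 1] so no single attribute dominates."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def kmeans(rows, initial_centroids, max_iter=100):
    """Plain Lloyd's algorithm; stops when assignments no longer change."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: euclidean(row, centroids[k]))
            for row in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


d1 = dissimilarity_matrix(points, manhattan)
d2 = dissimilarity_matrix(points, euclidean)
dinf = dissimilarity_matrix(points, chebyshev)

scaled = min_max_scale(points)
# Assumption: the first two rows serve as initial centroids.
labels, centroids = kmeans(scaled, [scaled[0], scaled[1]])
```

On the raw values, Total Cholesterol and Age contribute most of each distance simply because of their larger numeric ranges, which is the usual argument for scaling before clustering.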

Assignment 5:

Text Mining

1. Write regular expressions to:

a. Detect zip codes in text

b. Find last names of all patients whose first name is John (note that regular expressions may have some false positives/false negatives).
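
A minimal sketch of both patterns using Python's `re` module follows. It assumes US ZIP codes in five-digit or ZIP+4 form and assumes the last name is the capitalized word immediately after "John"; the sample text is hypothetical.

```python
import re

# Five digits, optionally followed by a hyphen and four more digits (ZIP+4).
ZIP_PATTERN = re.compile(r"\b\d{5}(?:-\d{4})?\b")

# "John" followed by whitespace and a capitalized word, captured as the last name.
JOHN_PATTERN = re.compile(r"\bJohn\s+([A-Z][A-Za-z'-]+)")

# Hypothetical sample text, not taken from any real clinical note.
text = (
    "John Smith lives at 123 Main St, Fairfax, VA 22030. "
    "John O'Neil moved to 90210-1234."
)

zip_codes = ZIP_PATTERN.findall(text)
last_names = JOHN_PATTERN.findall(text)
```

Both patterns are approximate. The ZIP pattern also matches any other five-digit number, and the name pattern treats a middle name as a last name and misses forms such as "Smith, John".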

2. List challenges in automatically retrieving ICD-9 codes from clinical notes. Search the literature to find relevant published work. Also, include your own observations and comments.

3. Using the SMS data

a. Split data into training (80%) and testing (20%) sets

b. Build naïve Bayes classifier for detecting spam based on bag of words

i. List all words in the documents

ii. Count occurrences in spam and ham

iii. Assign likelihoods P(word|spam) and P(word|ham) for all words

iv. Convert test data into a list of words. For each message you need two columns: message id and word

v. Classify test data. This can be done by a series of joins with the data prepared in (iii).

vi. Calculate accuracy of your model (accuracy, precision, recall)
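
Steps (a) and (i) through (vi) can be sketched in Python as below. This is only one possible reading of the task: the assignment suggests doing the classification with joins, whereas this sketch uses dictionaries, and it adds Laplace (add-one) smoothing and log-probabilities, neither of which the assignment mentions. The tiny `train_set` and `test_set` are hypothetical and stand in for the SMS data.

```python
import math
import random
import re
from collections import Counter


def split_data(messages, train_fraction=0.8, seed=0):
    """Shuffle and split into training and testing sets."""
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def tokenize(message):
    """Bag of words: lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", message.lower())


def train(labeled_messages):
    """Count words per class and turn counts into P(word|class)."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for label, message in labeled_messages:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(message))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_docs = sum(doc_counts.values())
    # Assumes both classes occur in the training data (otherwise log(0) later).
    priors = {c: doc_counts[c] / total_docs for c in word_counts}
    likelihoods = {}
    for c, counts in word_counts.items():
        # Assumption: Laplace (add-one) smoothing to avoid zero likelihoods.
        denom = sum(counts.values()) + len(vocab)
        likelihoods[c] = {w: (counts[w] + 1) / denom for w in vocab}
    return priors, likelihoods


def classify(message, priors, likelihoods):
    """Pick the class with the larger log-posterior; unseen words are skipped."""
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for w in tokenize(message):
            if w in likelihoods[c]:
                score += math.log(likelihoods[c][w])
        scores[c] = score
    return max(scores, key=scores.get)


def evaluate(test_messages, priors, likelihoods):
    """Accuracy, plus precision and recall with spam as the positive class."""
    tp = fp = fn = correct = 0
    for label, message in test_messages:
        pred = classify(message, priors, likelihoods)
        correct += int(pred == label)
        tp += int(pred == "spam" and label == "spam")
        fp += int(pred == "spam" and label == "ham")
        fn += int(pred == "ham" and label == "spam")
    return {
        "accuracy": correct / len(test_messages),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical toy messages (NOT the SMS data).
train_set = [
    ("spam", "win a free prize now"),
    ("spam", "free cash prize claim now"),
    ("ham", "are we meeting for lunch"),
    ("ham", "see you at lunch tomorrow"),
]
test_set = [("spam", "claim your free prize"), ("ham", "lunch tomorrow")]

priors, likelihoods = train(train_set)
metrics = evaluate(test_set, priors, likelihoods)
```

With the real data, `split_data` would be applied to the full list of labeled messages first, and the resulting training portion passed to `train`.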

Attachment:- Assignment 1.rar


