Prepare heritage data for classification learning

Reference no: EM131299370

Database Assignment 1:

1. Using heritage data (release 1) in SQL

a. Find support for all single itemsets

b. List all itemsets with 2 elements and support of at least 0.2

c. List all itemsets with 3 elements and support at least 0.2

2. In Weka

a. Load heritage data (release 1)

b. Apply at least two association rule generation algorithms and compare results

c. Apply FPTree algorithm with at least two measures of rule metrics
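The support queries in question 1 can be sketched with self-joins. The example below is only an illustration: it assumes a hypothetical table `claims(member_id, item)` filled with toy rows, whereas the real heritage release 1 schema and column names will differ. Support is taken here as the fraction of members whose records contain every item in the itemset. The SQL is run through Python's built-in `sqlite3` module so that the sketch is self-contained.

```python
import sqlite3

# Hypothetical toy data: (member_id, item). Not the real heritage schema.
rows = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "B"),
    (3, "A"), (3, "C"),
    (4, "B"),
    (5, "A"), (5, "B"), (5, "C"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (member_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO claims VALUES (?, ?)", rows)

# Number of "transactions" (here: distinct members).
n = conn.execute("SELECT COUNT(DISTINCT member_id) FROM claims").fetchone()[0]
params = {"n": n, "minsup": 0.2}

# (a) Support of every single item.
single = conn.execute("""
    SELECT item, COUNT(DISTINCT member_id) * 1.0 / :n AS support
    FROM claims
    GROUP BY item
    ORDER BY item
""", params).fetchall()

# (b) 2-itemsets: self-join on member; a.item < b.item avoids duplicates.
pairs = conn.execute("""
    SELECT a.item, b.item, COUNT(DISTINCT a.member_id) * 1.0 / :n AS support
    FROM claims a
    JOIN claims b ON a.member_id = b.member_id AND a.item < b.item
    GROUP BY a.item, b.item
    HAVING COUNT(DISTINCT a.member_id) * 1.0 / :n >= :minsup
    ORDER BY a.item, b.item
""", params).fetchall()

# (c) 3-itemsets: one more join with the same ordering trick.
triples = conn.execute("""
    SELECT a.item, b.item, c.item,
           COUNT(DISTINCT a.member_id) * 1.0 / :n AS support
    FROM claims a
    JOIN claims b ON a.member_id = b.member_id AND a.item < b.item
    JOIN claims c ON a.member_id = c.member_id AND b.item < c.item
    GROUP BY a.item, b.item, c.item
    HAVING COUNT(DISTINCT a.member_id) * 1.0 / :n >= :minsup
    ORDER BY a.item, b.item, c.item
""", params).fetchall()

for row in single + pairs + triples:
    print(row)
```

On a real table the same queries should carry over to most SQL dialects, although the parameter syntax (`:n`, `:minsup`) is specific to how `sqlite3` binds values.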

Assignment 2:

1. In SQL/Weka:

a. Prepare heritage data for classification learning

b. Load heritage data release 3 (preprocessed to binary representation, including demographics and output attribute(s))

c. Perform exploratory analysis

d. Create at least three classification models for predicting hospitalization based on Year 1 data.

e. Which model performs the best on year 2 data?

f. Create regression model for predicting hospitalization days.

g. What is the difference between regression and classification models?

h. Present your results in the form of a short report that includes screenshots, tables, and any needed description.

Assignment 3:

Classification Part 2

1. Using heritage release 3 data prepared last assignment

a. Include drug information into data

b. Include laboratory information into data

c. Import newly created data into Weka and run classification algorithms

d. Does inclusion of the information improve predictions?

There are many ways to complete question 4, so you need to make different decisions.

Try not to overcomplicate the problem.

2. In Weka using heritage 3 dataset

a. Apply the k-means algorithm for k=2, 3, 5, 10

b. Apply EM algorithm. What is the optimal number of clusters obtained by EM?

c. Compare the created clusters to classification based on hospitalization in year 2.
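To see what the k-means runs in question 2 are doing, the following is a minimal sketch of Lloyd's algorithm in plain Python. It is not Weka's SimpleKMeans implementation; the toy points, the seeded random initialization, and the function names `kmeans` and `sse` are assumptions made for illustration. The within-cluster sum of squared errors is one common way to compare runs with different k.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init: k random data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no change means convergence
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


def sse(points, labels, centroids):
    """Within-cluster sum of squared Euclidean distances."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Toy data: two well-separated groups (not the heritage data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

for k in (2, 3, 5):
    labels, centroids = kmeans(points, k)
    print(k, labels, round(sse(points, labels, centroids), 3))
```

The error always shrinks as k grows, so a drop in SSE alone does not identify the best k; that is one reason the assignment also asks about the number of clusters chosen by EM.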

Assignment 4:

Using the data table shown below.

a. Calculate distance between all points in 1-norm, 2-norm and infinity-norm. Show dissimilarity matrix.

b. Is there any need to preprocess the data to be more suitable for clustering? If so, describe the operations and show the resulting data table.

c. Apply k-means clustering algorithm with k=2.

ID   Age   BMI   Gender   Total Cholesterol
1    30    24    M        180
2    70    19    M        190
3    65    26    M        220
4    40    32    F        260
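Parts (a) and (b) can be explored with a short sketch like the one below. It uses the four rows of the table, drops the ID column, and makes two assumptions that are not given in the assignment: Gender is encoded as 0 for M and 1 for F, and preprocessing is done with min-max scaling to [0, 1]. Other encodings and scalings (for example z-scores) are equally defensible.

```python
import math

# Rows from the table: (ID, Age, BMI, Gender, Total Cholesterol).
rows = [
    (1, 30, 24, "M", 180),
    (2, 70, 19, "M", 190),
    (3, 65, 26, "M", 220),
    (4, 40, 32, "F", 260),
]


def encode(row):
    """Drop the ID and map Gender to 0/1 (an assumed encoding)."""
    _id, age, bmi, gender, chol = row
    return [float(age), float(bmi), 1.0 if gender == "F" else 0.0, float(chol)]


def min_max_scale(vectors):
    """Rescale every column to [0, 1] so no attribute dominates the distance."""
    cols = list(zip(*vectors))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # guard constant columns
    return [[(x - lo) / s for x, lo, s in zip(v, lows, spans)] for v in vectors]


def distance(u, v, p):
    """p-norm distance; p = math.inf gives the infinity-norm."""
    diffs = [abs(a - b) for a, b in zip(u, v)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def dissimilarity_matrix(vectors, p):
    return [[distance(u, v, p) for v in vectors] for u in vectors]


raw = [encode(r) for r in rows]
scaled = min_max_scale(raw)

for name, p in [("1-norm", 1), ("2-norm", 2), ("infinity-norm", math.inf)]:
    print(name)
    for line in dissimilarity_matrix(scaled, p):
        print("  " + "  ".join(f"{d:6.3f}" for d in line))
```

Calling `dissimilarity_matrix(raw, p)` instead shows the unscaled distances, where Age and Total Cholesterol dominate because of their larger ranges. That contrast is the usual argument for preprocessing before clustering.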

Assignment 5:

Text Mining

1. Write regular expressions to:

a. Detect zip codes in text

b. Find last names of all patients whose first name is John (note that regular expressions may have some false positives/false negatives).
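One possible starting point for both patterns is sketched below with Python's `re` module. The patterns and the sample sentence are assumptions for illustration: the zip code pattern targets US five-digit codes with an optional four-digit extension, and the name pattern assumes the form "John [middle initial] Lastname". As the note in (b) warns, the second pattern produces false positives, which the sample text demonstrates.

```python
import re

# US zip code: five digits, optionally followed by -NNNN (ZIP+4).
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")

# Last name after the first name "John", allowing an optional middle initial.
# The capture group takes a capitalized word that may contain ' or -.
JOHN_RE = re.compile(r"\bJohn\s+(?:[A-Z]\.\s+)?([A-Z][A-Za-z'-]+)")

# Made-up sample text, not taken from any real clinical note.
text = (
    "Patient John Smith (Fairfax, VA 22030-4444) was referred by "
    "John A. O'Neil, zip 20001. Johnny Walker was seen at St. John "
    "Hospital on 2019-03-12."
)

zips = ZIP_RE.findall(text)
last_names = JOHN_RE.findall(text)

print(zips)
print(last_names)
```

In this sample "Hospital" is captured from "St. John Hospital", a false positive, while "Johnny Walker" is correctly skipped because `John` must be followed by whitespace. Any other five-digit number in the text would also be reported as a zip code.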

2. List challenges in automatically retrieving ICD-9 codes from clinical notes. Search the literature to find relevant published work. Also, include your own observations and comments.

3. Using the SMS data

a. Split data into training (80%) and testing (20%) sets

b. Build naïve Bayes classifier for detecting spam based on bag of words

i. List all words in the documents

ii. Count occurrences in spam and ham

iii. Assign likelihoods P(word|spam) and P(word|ham) for all words

iv. Convert test data into a list of words. For each message you need two columns: message id and word.

v. Classify test data. This can be done by a series of joins with the data prepared in (iii).

vi. Evaluate your model (accuracy, precision, recall)
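Steps (a) through (vi) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the expected SQL solution: the messages are invented rather than taken from the SMS data, the tokenizer is a simple lowercase split, add-one (Laplace) smoothing is used so that unseen words do not zero out a class, and all function names are hypothetical.

```python
import math
import random
import re
from collections import Counter


def split(messages, train_frac=0.8, seed=0):
    """(a) Shuffle and split into training and testing sets."""
    shuffled = messages[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def tokenize(text):
    """(i)/(iv) Turn a message into a list of lowercase words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(messages, alpha=1.0):
    """(ii)/(iii) Count words per class and derive smoothed log-likelihoods."""
    counts = {"spam": Counter(), "ham": Counter()}
    docs = Counter()
    for label, text in messages:
        docs[label] += 1
        counts[label].update(tokenize(text))
    vocab = set(counts["spam"]) | set(counts["ham"])
    loglik = {}
    for label, c in counts.items():
        total = sum(c.values()) + alpha * len(vocab)
        loglik[label] = {w: math.log((c[w] + alpha) / total) for w in vocab}
    n_docs = sum(docs.values())
    prior = {label: math.log(docs[label] / n_docs) for label in counts}
    return loglik, prior, vocab


def classify(text, loglik, prior, vocab):
    """(v) Pick the class with the highest log-posterior score."""
    scores = dict(prior)
    for w in tokenize(text):
        if w in vocab:  # words never seen in training are ignored
            for label in scores:
                scores[label] += loglik[label][w]
    return max(scores, key=scores.get)


def evaluate(pairs):
    """(vi) Accuracy, precision and recall with spam as the positive class."""
    tp = sum(1 for t, p in pairs if t == "spam" and p == "spam")
    fp = sum(1 for t, p in pairs if t == "ham" and p == "spam")
    fn = sum(1 for t, p in pairs if t == "spam" and p == "ham")
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Invented toy messages; with the real SMS data, use split() instead.
train_set = [
    ("spam", "win a free prize now"),
    ("spam", "free cash call now"),
    ("spam", "claim your free prize"),
    ("ham", "are we still meeting for lunch"),
    ("ham", "call me when you get home"),
    ("ham", "see you at the meeting"),
]
test_set = [
    ("spam", "free prize call now"),
    ("ham", "lunch meeting at home"),
]

loglik, prior, vocab = train(train_set)
results = [(label, classify(text, loglik, prior, vocab)) for label, text in test_set]
print(results, evaluate(results))
```

Working with log-probabilities avoids numerical underflow when many small likelihoods are multiplied. In SQL, the same sum of logs per message and class is what the series of joins in step (v) would compute.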


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd