Explore the bank data available on the LMS

Reference no: EM131963793

Data Engineering and Mining

Part I:

For this part, you need to explore the bank data (bankdata_csv_all.csv), available on the LMS, and an accompanying description (bankdataDescription.doc) of the attributes and their values. The dataset contains attributes describing each person's demographics and banking information, which can be used to determine whether they will want to obtain the new PEP (Personal Equity Plan).

Your goal is to perform Association Rule discovery on the dataset using R.

First, perform the preprocessing steps required for association rule mining: the id field needs to be removed, and several numeric fields need to be discretized or otherwise converted to nominal values.
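The preprocessing step can be sketched in plain Python. The assignment itself calls for R, where base `cut()` or the arules `discretize()` function would typically do the binning. This is a minimal illustration under stated assumptions: each record is a dict of strings read from the CSV, the identifier column is named `id`, and numeric field names such as `age` or `income` are assumed rather than taken from the data description. Equal-width binning into three intervals is only one choice. Equal-frequency bins or business-meaningful cut points may give more readable rules.

```python
def equal_width_edges(values, n_bins=3):
    """Split the range [min, max] of values into n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def bin_label(value, edges):
    """Map a numeric value to a nominal label 'bin1', 'bin2', ..."""
    last = len(edges) - 2
    for i in range(last + 1):
        # Bins are half-open [low, high); the last bin also includes the maximum.
        if value < edges[i + 1] or i == last:
            return f"bin{i + 1}"


def preprocess(rows, numeric_fields, id_field="id", n_bins=3):
    """Drop the id field and replace each numeric field with a nominal bin label.

    rows: list of dicts (e.g. from csv.DictReader); field names are assumptions.
    """
    edges = {f: equal_width_edges([float(r[f]) for r in rows], n_bins)
             for f in numeric_fields}
    cleaned = []
    for r in rows:
        record = {k: v for k, v in r.items() if k != id_field}
        for f in numeric_fields:
            record[f] = bin_label(float(r[f]), edges[f])
        cleaned.append(record)
    return cleaned


def to_transactions(rows):
    """Turn each cleaned record into a set of 'attribute=value' items."""
    return [frozenset(f"{k}={v}" for k, v in r.items()) for r in rows]
```

For example, `to_transactions(preprocess(rows, ["age", "income"]))` turns each customer into a set of items such as `"age=bin1"` and `"pep=YES"`. This item-set form is the input that association rule mining works on.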

Next, set PEP as the right-hand side (RHS) of the rules and see what rules are generated.
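Restricting the right-hand side to PEP can be illustrated with a brute-force Python sketch. It assumes transactions are sets of `"attribute=value"` items and that PEP appears as `"pep=YES"` / `"pep=NO"`. These are hypothetical labels; use whatever the preprocessing actually produces. Unlike Apriori, the sketch enumerates every left-hand side up to `max_lhs` items without candidate pruning, so it is only suitable for explaining the idea on small data. In R's arules package, the same restriction is usually passed to `apriori()` through its `appearance` argument, for example `appearance = list(rhs = c("pep=YES", "pep=NO"), default = "lhs")`, with thresholds set via `parameter = list(supp = ..., conf = ...)`. The default thresholds below (0.1 and 0.8) are arbitrary starting points.

```python
from itertools import combinations


def mine_pep_rules(transactions, rhs_items=("pep=YES", "pep=NO"),
                   min_support=0.1, min_conf=0.8, max_lhs=3):
    """Enumerate rules LHS => RHS where RHS is a single PEP item.

    transactions: list of sets of 'attribute=value' strings.
    Returns (lhs, rhs, support, confidence, lift) tuples, highest lift first.
    """
    n = len(transactions)
    rhs_count = {r: sum(r in t for t in transactions) for r in rhs_items}
    # Candidate LHS items: everything except the PEP items themselves.
    lhs_items = sorted({i for t in transactions for i in t} - set(rhs_items))
    rules = []
    for k in range(1, max_lhs + 1):
        for lhs in combinations(lhs_items, k):
            lhs_set = set(lhs)
            covered = [t for t in transactions if lhs_set <= t]
            # A rule's support can never exceed its LHS support, so skip early.
            if not covered or len(covered) / n < min_support:
                continue
            for rhs in rhs_items:
                both = sum(rhs in t for t in covered)
                support = both / n
                confidence = both / len(covered)
                if both == 0 or support < min_support or confidence < min_conf:
                    continue
                lift = confidence / (rhs_count[rhs] / n)
                rules.append((lhs, rhs, support, confidence, lift))
    rules.sort(key=lambda r: (r[4], r[3]), reverse=True)
    return rules
```

Lowering `min_support` or `min_conf` produces more, weaker rules. Recording how the rule count changes across a few settings is one way to document the parameter experiments the report asks for.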

Select the top 5 most "interesting" rules and, for each, specify the following:

• Support, Confidence and Lift values

• An explanation of the pattern and why you believe it is interesting based on the business objectives of the company.

• Any recommendations based on the discovered rule that might help the company better understand the behavior of its customers or develop a business opportunity.

Note that the 5 most interesting rules are most likely not the top 5 strong rules. Interesting rules are those that, in addition to having high lift and confidence, also provide non-trivial, actionable knowledge based on the underlying business objectives.
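Before the manual, business-driven selection, trivial rules can be set aside automatically. The sketch below keeps rules whose lift reaches a threshold and drops redundant ones. A rule counts as redundant if a more general rule has at least the same confidence, where "more general" means a proper subset of its left-hand side with the same right-hand side. This follows the idea behind `is.redundant()` in R's arules. The rule tuple layout `(lhs, rhs, support, confidence, lift)` and the `min_lift=1.2` default are assumptions for illustration. Filtering only shortens the list. Judging which rules are non-trivial and actionable still requires reading them against the bank's objectives.

```python
def prune_rules(rules, min_lift=1.2):
    """Keep rules with lift >= min_lift that are not redundant.

    Each rule is (lhs, rhs, support, confidence, lift), lhs a tuple of items.
    A rule is redundant when another rule with the same RHS has a strictly
    smaller LHS (proper subset) and at least the same confidence.
    """
    kept = []
    for lhs, rhs, sup, conf, lift in rules:
        if lift < min_lift:
            continue  # too close to independence to be worth a business action
        more_general_is_as_good = any(
            r2 == rhs and set(l2) < set(lhs) and c2 >= conf
            for l2, r2, _, c2, _ in rules
        )
        if not more_general_is_as_good:
            kept.append((lhs, rhs, sup, conf, lift))
    return kept
```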

To complete this assignment, write a short report describing your association rule mining process and the 5 resulting interesting rules, each with the three items of explanation and recommendations listed above. For at least one of the rules, discuss its support, confidence, and lift values and how they are interpreted in this data set.
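For reference, the three measures for a rule X => Y over a data set D of N customers are defined as follows. Here supp(Z) is the fraction of customers whose records contain every item in Z.

```latex
\begin{aligned}
\operatorname{supp}(Z) &= \frac{\lvert \{\, t \in D : Z \subseteq t \,\} \rvert}{N}, \\
\operatorname{supp}(X \Rightarrow Y) &= \operatorname{supp}(X \cup Y), \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}, \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}
  = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)\,\operatorname{supp}(Y)}.
\end{aligned}
```

The following counts are purely hypothetical and are not results from the bank data. Suppose N = 100, 20 customers match X, 16 of those have Y = pep=YES, and 45 customers overall have pep=YES. Then support = 16/100 = 0.16, confidence = 16/20 = 0.8, and lift = 0.8/0.45 ≈ 1.78. In plain terms, customers matching X would be about 1.78 times as likely as an average customer to take up the PEP. A lift of 1 would mean that X tells the bank nothing about PEP uptake.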

You should write your answers as if you were working for a client who knows little about data mining. Your report should give your client insightful and reliable suggestions on what kinds of potential buyers to contact, and convince your client that your suggestions are reliable based on the evidence gathered from your experiment results.

In more detail, your answers should include:

• Description of preprocessing steps

• Description of the parameters and experiments used to obtain strong rules

• Give the top 5 most interesting rules and the 3 items listed above for each rule.

Part II:

In this part of the homework, you are expected to apply a decision tree induction algorithm to solve a mystery in history: who wrote the disputed essays, Hamilton or Madison?

1. About the Federalist Papers

Quote from the Library of Congress

The Federalist Papers were a series of eighty-five essays urging the citizens of New York to ratify the new United States Constitution. Written by Alexander Hamilton, James Madison, and John Jay, the essays originally appeared anonymously in New York newspapers in 1787 and 1788 under the pen name "Publius."

A bound edition of the essays was first published in 1788, but it was not until the 1818 edition published by the printer Jacob Gideon that the authors of each essay were identified by name. The Federalist Papers are considered one of the most important sources for interpreting and understanding the original intent of the Constitution.

2. About the disputed authorship

The original essays can be downloaded from the Library of Congress.

In the author column, you will find 74 essays with identified authors: 51 written by Hamilton, 15 by Madison, 3 jointly by Hamilton and Madison, and 5 by Jay. The remaining 11 essays, however, are attributed to "Hamilton or Madison". These are the famous essays with disputed authorship. Hamilton claimed authorship in writing before he was killed in a duel, and Madison later claimed authorship as well. Historians have long tried to determine which of them was the real author.

3. Computational approach for authorship attribution

In the 1960s, statisticians Mosteller and Wallace analyzed the frequency distributions of common function words in the Federalist Papers and drew their conclusions. This was pioneering work in applying mathematical approaches to authorship attribution.

Nowadays, authorship attribution has become a classic problem in the data mining field, with applications in forensics (e.g., deception detection) and information organization.

The Federalist Papers data set (fedPapers85.csv) is provided on the LMS. The features are a set of "function words", for example "upon". Each feature value is the percentage of an essay's words that are occurrences of that function word. For example, if the function word "upon" appears 3 times in the essay "Hamilton_fed_31.txt" and the essay contains 1,000 words in total, the feature value is 3/1000 = 0.3%.
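The feature computation described above can be sketched in Python. The tokenizer, which uses lowercased runs of letters and apostrophes, is an assumption. The provided CSV may have been built with different tokenization rules, so values computed this way may not match it exactly. The function returns percentages: 3 occurrences of "upon" in a 1,000-word essay give 0.3, matching the example.

```python
import re


def function_word_percentages(text, function_words):
    """Percentage of the essay's words accounted for by each function word.

    Tokenization (lowercased runs of letters/apostrophes) is an assumption.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = {w: 0 for w in function_words}
    for word in words:
        if word in counts:
            counts[word] += 1
    total = len(words)
    if total == 0:
        return {w: 0.0 for w in function_words}
    return {w: 100.0 * c / total for w, c in counts.items()}
```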

Organize your report using the following template:

Section 1: Data preparation

You will need to separate the original data set into training and testing data for classification experiments. Describe which examples are in your training data and which are in your test data.
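One possible split is sketched below, under assumptions that are worth stating in the report:

- Only essays labeled Hamilton or Madison are used for training, since the question is Hamilton versus Madison. Jay's essays and the joint essays are left out.
- A stratified portion of those labeled essays is held out to measure accuracy.
- The 11 disputed essays form the prediction set.

The `author` key, the `"dispt"` label for disputed essays, and the 25% holdout fraction are hypothetical choices, so check the actual column values in fedPapers85.csv. Stratifying by author keeps some of Madison's 15 essays in both the training and holdout sets.

```python
import random


def split_federalist(rows, train_authors=("Hamilton", "Madison"),
                     disputed_label="dispt", holdout_frac=0.25, seed=0):
    """Split rows (dicts with an 'author' key) into train, holdout and disputed sets.

    'dispt' is a hypothetical label for the disputed essays.
    Holdout examples are drawn per author (stratified).
    """
    rng = random.Random(seed)
    train, holdout = [], []
    for author in train_authors:
        papers = [r for r in rows if r["author"] == author]
        rng.shuffle(papers)
        n_hold = max(1, round(holdout_frac * len(papers)))
        holdout.extend(papers[:n_hold])
        train.extend(papers[n_hold:])
    disputed = [r for r in rows if r["author"] == disputed_label]
    return train, holdout, disputed
```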

Section 2: Build and tune decision tree models

First build a DT model using the default settings, and then tune the parameters to see whether a better model can be generated. Compare these models using appropriate evaluation measures, and describe and compare the patterns learned by each model.
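To make the build-then-tune loop concrete, here is a small CART-style decision tree written from scratch in Python, using Gini impurity and binary threshold splits on numeric features. It stands in for whichever R package the course uses. With rpart, for instance, `rpart.control()` settings such as `maxdepth` and `minsplit` play roles similar to `max_depth` and `min_samples_split` here. `tune_depth` compares a few depth settings by holdout accuracy, and the depth values are arbitrary. `tree_to_text` prints the learned splits so the patterns of different models can be compared. With so few Madison essays, cross-validation would likely give a more stable comparison than a single holdout.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(X, y):
    """Return the (feature, threshold) pair that most reduces Gini impurity, or None."""
    best, best_score = None, gini(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (j, t), score
    return best


def build_tree(X, y, max_depth=None, min_samples_split=2, depth=0):
    """Grow a tree of (feature, threshold, left, right) tuples; leaves are labels."""
    majority = max(sorted(set(y)), key=y.count)  # sorted() makes ties deterministic
    if (len(set(y)) == 1 or len(y) < min_samples_split
            or (max_depth is not None and depth >= max_depth)):
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    j, t = split
    parts = []
    for goes_left in (True, False):
        idx = [i for i, row in enumerate(X) if (row[j] <= t) == goes_left]
        parts.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                max_depth, min_samples_split, depth + 1))
    return (j, t, parts[0], parts[1])


def predict(tree, row):
    """Follow thresholds from the root down to a leaf label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


def tree_accuracy(tree, X, y):
    return sum(predict(tree, row) == yi for row, yi in zip(X, y)) / len(y)


def tune_depth(X_train, y_train, X_hold, y_hold, depths=(1, 2, 3, None)):
    """Holdout accuracy for each max_depth setting (None = grow until pure)."""
    return {d: tree_accuracy(build_tree(X_train, y_train, max_depth=d), X_hold, y_hold)
            for d in depths}


def tree_to_text(tree, names, indent=""):
    """Render the tree as nested if/else lines so learned splits can be compared."""
    if not isinstance(tree, tuple):
        return [f"{indent}-> {tree}"]
    j, t, left, right = tree
    return ([f"{indent}if {names[j]} <= {t:.3f}:"] + tree_to_text(left, names, indent + "  ")
            + [f"{indent}else:"] + tree_to_text(right, names, indent + "  "))
```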

Section 3: Prediction

After building the classification model, apply it to the disputed papers to determine their authorship, and report the accuracy of your models.
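The true authors of the disputed essays are unknown, so accuracy can only be measured on labeled essays that were held out from training. For the disputed essays, the model yields predictions, not a score. The Python sketch below reports holdout accuracy, a confusion matrix, and a tally of predicted authors for the disputed set. The function names and the default label pair are illustrative assumptions.

```python
from collections import Counter


def holdout_accuracy(actual, predicted):
    """Fraction of held-out essays whose predicted author matches the known author."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def confusion_matrix(actual, predicted, labels=("Hamilton", "Madison")):
    """Counts keyed by (actual, predicted) for every pair of labels."""
    counts = Counter(zip(actual, predicted))
    return {(a, p): counts[(a, p)] for a in labels for p in labels}


def summarize(holdout_actual, holdout_pred, disputed_pred, labels=("Hamilton", "Madison")):
    """Holdout evaluation plus a tally of predicted authors for the disputed essays."""
    return {
        "holdout_accuracy": holdout_accuracy(holdout_actual, holdout_pred),
        "confusion": confusion_matrix(holdout_actual, holdout_pred, labels),
        "disputed_predictions": dict(Counter(disputed_pred)),
    }
```

The default and tuned trees may disagree on particular disputed essays. In that case, listing those essays individually is more informative than the tally alone.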

Copyright © 2019-2020 ExpertsMind IT Educational Pvt Ltd. All rights reserved.