Survey for the distribution of income of households


BMAN20162 Business Intelligence & Big Data Analytics

Assignment Specification and Deliverables

You are given a dataset that contains a survey of the distribution of household income. The survey collects a mix of continuous and discrete data values on the source and amount of income, labour force information, and general demographic characteristics. The full dataset is provided in two Excel files, income_data and income_test. Each file contains one table: income_data holds the training data (32,561 rows) and income_test holds the data that can be used for testing (16,281 rows).

You are asked to complete the following task:

Predict the income of individuals in the income_test table. The prediction task is to determine whether a person makes over 50K a year.
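The task is a binary classification problem: every row of income_test is assigned one of two classes (over 50K a year, or not). Whatever stream is built in Modeler, its predictions can be judged with a confusion matrix. The Python sketch below is not a Modeler stream; it only illustrates the arithmetic behind the evaluation. It assumes, since the brief does not say so, that true labels are available for the test rows and that the classes are written as the strings ">50K" and "<=50K". The toy labels in the example are invented for illustration and are not survey data.

```python
"""Minimal sketch: scoring binary income predictions.

Assumptions (not stated in the brief): true labels exist for the test rows,
and both labels and predictions use the strings ">50K" and "<=50K".
"""

POSITIVE = ">50K"  # hypothetical label string for "makes over 50K a year"


def confusion_counts(actual, predicted, positive=POSITIVE):
    """Return (tp, fp, fn, tn) for two equal-length label sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # correctly predicted over 50K
        elif p == positive:
            fp += 1  # predicted over 50K, actually not
        elif a == positive:
            fn += 1  # missed an over-50K earner
        else:
            tn += 1  # correctly predicted 50K or less
    return tp, fp, fn, tn


def accuracy(actual, predicted, positive=POSITIVE):
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def precision(actual, predicted, positive=POSITIVE):
    """Share of predicted over-50K rows that really are over 50K."""
    tp, fp, _, _ = confusion_counts(actual, predicted, positive)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(actual, predicted, positive=POSITIVE):
    """Share of real over-50K rows that the model found."""
    tp, _, fn, _ = confusion_counts(actual, predicted, positive)
    return tp / (tp + fn) if tp + fn else 0.0


def majority_baseline(train_labels):
    """Class a trivial 'always predict the most common class' model outputs.

    train_labels must be non-empty.
    """
    counts = {}
    for label in train_labels:
        counts[label] = counts.get(label, 0) + 1
    return max(counts, key=counts.get)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not values from the survey.
    actual = [">50K", "<=50K", "<=50K", ">50K", "<=50K"]
    predicted = [">50K", "<=50K", ">50K", "<=50K", "<=50K"]
    print(confusion_counts(actual, predicted))  # (1, 1, 1, 2)
    print(accuracy(actual, predicted))          # 0.6
    print(precision(actual, predicted))         # 0.5
    print(recall(actual, predicted))            # 0.5
```

Comparing a model's accuracy with the majority baseline is useful because a model that always predicts the more common class can already score well. For the bank scenario in the Conclusion Section, precision (how many households offered a loan really earn over 50K) is arguably the more relevant figure than accuracy alone.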

Complete the task using the techniques discussed in the lectures, implemented in Modeler.

The deliverables that you should produce are:

• A report describing the approach that you have followed, the study scenarios or streams that you have attempted, the pre-processing of the data and the best predictions that you have achieved.

• The income predictions (stored as a table), the streams that you have created (stored as .str files) and the models that you have generated (stored as .gen files). All deliverables should be included in a zip file and submitted through Blackboard.

Below are the parts that your report should include, together with the overall weighting attached to each part where stated.

1. A Cover page gives the title of your report, your name, student number and degree programme.

2. An Introduction Section of at most two pages introduces the problem and the approach followed, introduces the analysis and outlines the contents of each section of the report.

3. The Main Section of at most ten pages develops the approach that you have followed, the study scenarios or streams that you have attempted, the pre-processing of the data and the best predictions that you have achieved.

4. A Conclusion Section of at most two pages summarises the main points of the report and draws your overall conclusions. Assume that a bank had requested this survey in order to offer low-interest loans to households with income over 50K. What would be your overall recommendations to the bank? Justify your answer based on your data mining experiments and results (overall weighting 20%).

5. Use of sources, presentation, language and referencing (if needed).

Dataset Description: the attributes in the dataset are described below.

Format: attribute (type or permitted values): description

• age (continuous): The age of the household income earner.

• workclass (Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked): Employment status of the household income earner.

• education (Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool): The highest level of school completed or degree received by the household income earner.

• education-num (continuous): Number of years in education of the household income earner.

• marital-status (Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse): Marital status of the household income earner.

• occupation (Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces): The occupation of the household income earner.

• relationship (Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried): The relationship of the interviewee with the income earner.

• race (White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black): The race of the household income earner.

• sex (Female, Male): The sex of the household income earner.

• capital-gain (continuous): The increase in income relative to the previous year, when the last survey was carried out.

• capital-loss (continuous): The decrease in income relative to the previous year, when the last survey was carried out.

• hours-per-week (continuous): The number of hours worked on average each week by the household income earner.
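The attributes mix continuous fields with nominal fields that take a fixed set of values, so one possible pre-processing step is to turn each nominal field into a block of 0/1 indicator columns (one-hot encoding). The Python sketch below only illustrates that idea; it is not how Modeler stores or transforms fields. It assumes each record is a dict keyed by lowercase attribute name and that value strings match the category lists above exactly; the example record is hypothetical, not a row from income_data.

```python
"""One-hot encoding sketch for the attributes in the dataset description.

Assumptions: each record is a dict keyed by lowercase attribute name, and
category strings match the dataset description exactly. A value outside the
listed categories yields an all-zero block for that attribute.
"""

# Continuous attributes, passed through unchanged as floats.
CONTINUOUS = ["age", "education-num", "capital-gain", "capital-loss", "hours-per-week"]

# Nominal attributes and their permitted values, copied from the description.
CATEGORIES = {
    "workclass": ["Private", "Self-emp-not-inc", "Self-emp-inc", "Federal-gov",
                  "Local-gov", "State-gov", "Without-pay", "Never-worked"],
    "education": ["Bachelors", "Some-college", "11th", "HS-grad", "Prof-school",
                  "Assoc-acdm", "Assoc-voc", "9th", "7th-8th", "12th", "Masters",
                  "1st-4th", "10th", "Doctorate", "5th-6th", "Preschool"],
    "marital-status": ["Married-civ-spouse", "Divorced", "Never-married", "Separated",
                       "Widowed", "Married-spouse-absent", "Married-AF-spouse"],
    "occupation": ["Tech-support", "Craft-repair", "Other-service", "Sales",
                   "Exec-managerial", "Prof-specialty", "Handlers-cleaners",
                   "Machine-op-inspct", "Adm-clerical", "Farming-fishing",
                   "Transport-moving", "Priv-house-serv", "Protective-serv",
                   "Armed-Forces"],
    "relationship": ["Wife", "Own-child", "Husband", "Not-in-family",
                     "Other-relative", "Unmarried"],
    "race": ["White", "Asian-Pac-Islander", "Amer-Indian-Eskimo", "Other", "Black"],
    "sex": ["Female", "Male"],
}


def feature_names():
    """Column names of the encoded vector, in order."""
    names = list(CONTINUOUS)
    for attr, values in CATEGORIES.items():
        names.extend(f"{attr}={value}" for value in values)
    return names


def encode(record):
    """Map one record to a flat list of floats aligned with feature_names()."""
    vector = [float(record[name]) for name in CONTINUOUS]
    for attr, values in CATEGORIES.items():
        observed = record.get(attr)
        # Exactly one 1.0 per attribute if the value is listed, else all zeros.
        vector.extend(1.0 if observed == value else 0.0 for value in values)
    return vector


if __name__ == "__main__":
    # Hypothetical record for illustration; not a row from income_data.
    example = {
        "age": 45, "workclass": "Private", "education": "Masters",
        "education-num": 14, "marital-status": "Married-civ-spouse",
        "occupation": "Exec-managerial", "relationship": "Husband",
        "race": "White", "sex": "Male", "capital-gain": 0,
        "capital-loss": 0, "hours-per-week": 50,
    }
    names = feature_names()
    vector = encode(example)
    print(len(names), len(vector))  # 63 63
    print([n for n, x in zip(names, vector) if x == 1.0 and "=" in n])
```

One-hot encoding is only one option: tree-based models such as C5.0 can split on nominal fields directly, so this step matters mainly for models that need numeric inputs, such as logistic regression or neural networks.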

