Provide explanation about the applied binarisation technique

Reference no: EM131038163

Scenario

You have just started working as a data miner/analyst in the Analytics Unit of a company. The Head of the Analytics Unit has brought you a data set [a welcome present ;-))]. The data set includes two files: a description of the attributes and a table with the actual values of these attributes. The Head of the Analytics Unit has mentioned that this is some sort of demographic data that a potential client has provided for analysis. She would like a report with some insights into the data that she can deliver to the client. Your tasks include:

- understanding the specifics of the data set
- extracting information about each of the attributes, possible associations between them and other specifics of the data set.

The tasks in the assignment are specified below.

Data sets

The description of the attributes is the same for all students and comes in a tiny documentation file (download it from UTS Online). Each student is assigned an individual table with the actual values of these attributes. Please download the file that is linked to your name from UTS Online.

Tasks

1A. Initial data exploration

1. Identify the type of each attribute (nominal, ordinal, interval or ratio). If the type is not clear, you may need to justify your choice.

2. Identify the values of the summarising properties for each attribute including frequency, location and spread (e.g. value ranges of the attributes, frequency of values, distributions, medians, means, variances, percentiles, etc.; that is, the statistics that have been covered in the lectures and materials given). Note that not all of these summary statistics will make sense for all the attribute types, so use your judgement! Where necessary, use proper visualisations for the corresponding statistics.

3. Using KNIME or other tools, explore your data set and identify any outliers, clusters of similar instances, "interesting" attributes and specific values of those attributes. Note that you may need to 'temporarily' recode attributes to numeric or from numeric to nominal. In the report, include the corresponding snapshots from the tools and an explanation of what has been identified there.

Present your findings in the assignment report.
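As a rough illustration of task 2, the summary statistics could be computed along the following lines. This is only a sketch using the Python standard library: the sample values and the function names `numeric_summary` and `nominal_summary` are hypothetical, and the real values come from your own data file.

```python
from collections import Counter
import statistics

# Hypothetical sample values; the real ones come from your assigned table.
ages = [23, 35, 35, 41, 52, 19, 67, 35, 28, 44]
education = ["Bachelors", "HS-grad", "HS-grad", "Masters", "HS-grad"]

def numeric_summary(values):
    """Location and spread statistics for an interval or ratio attribute."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "q1": q1,
        "q3": q3,
    }

def nominal_summary(values):
    """Frequency table and mode for a nominal attribute."""
    counts = Counter(values)
    return {"frequencies": dict(counts), "mode": counts.most_common(1)[0][0]}

age_stats = numeric_summary(ages)
education_stats = nominal_summary(education)
```

Means and variances only make sense for the numeric attribute here; for a nominal attribute such as Education, the frequency table and the mode are the meaningful summaries.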

1B. Data preprocessing

Perform each of the following data preparation tasks (each task applies to the original data):

a. Use the following binning techniques to smooth the values of the Age attribute:
- equi-width binning
- equi-depth binning.

In the assignment report, for each of these techniques you need to illustrate your steps. In your Excel workbook file place the results in separate columns in the corresponding spreadsheet. Use your judgement in choosing the appropriate number of bins, and justify this in the report.
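The two binning schemes can be sketched in Python as follows. This is one possible reading of the task, assuming that "smoothing" means replacing each value with the mean of its bin (smoothing by bin medians or bin boundaries would be equally valid). The function names and sample ages are hypothetical.

```python
import statistics

def equi_width_bins(values, k):
    """Assign each value to one of k bins that all span the same width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equi_depth_bins(values, k):
    """Assign each value to one of k bins holding (nearly) equal counts.

    Note: tied values can end up in different bins with this simple
    rank-based split.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

def smooth_by_bin_means(values, bins):
    """Replace each value with the mean of the bin it belongs to."""
    groups = {}
    for v, b in zip(values, bins):
        groups.setdefault(b, []).append(v)
    means = {b: statistics.mean(g) for b, g in groups.items()}
    return [means[b] for b in bins]

# Hypothetical ages, not taken from the assignment data.
ages = [4, 8, 15, 21, 21, 24, 25, 28, 34]
width_bins = equi_width_bins(ages, 3)
depth_bins = equi_depth_bins(ages, 3)
smoothed = smooth_by_bin_means(ages, depth_bins)
```

With three bins, equi-width splits the range 4 to 34 into intervals of width 10, while equi-depth puts three values in each bin regardless of how far apart they are.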

b. Use the following techniques to normalise the attribute Age:
- min-max normalisation to transform the values onto the range [0.0, 1.0].
- z-score normalisation to transform the values.

In the assignment report, provide an explanation of each of the applied techniques. In your Excel workbook file place the results in separate columns in the corresponding spreadsheet.
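Both normalisations can be sketched in a few lines of Python. Min-max maps each value v to (v - min) / (max - min), and z-score maps it to (v - mean) / standard deviation. The sketch below assumes the population standard deviation; the task does not say whether the population or sample version is expected, so check against the lecture material. The function names and sample ages are hypothetical.

```python
import statistics

def min_max_normalise(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to rescale.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]

def z_score_normalise(values):
    """Centre values on the mean and divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]

# Hypothetical ages, not taken from the assignment data.
ages = [20, 30, 40, 50, 60]
scaled = min_max_normalise(ages)
z_scores = z_score_normalise(ages)
```

After z-score normalisation the values have mean 0 and standard deviation 1, which is a quick way to check the result in the spreadsheet as well.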

c. Discretise the Age attribute into the following categories: Teenager = 1-20; Young = 21-30; Mid_Age = 31-45; Mature = 46-65; Old = 66+. Provide the frequency of each category in your data set.

In the assignment report, provide an explanation of the applied discretisation technique. In your Excel workbook file place the results in a separate column in the corresponding spreadsheet.
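The category boundaries above translate directly into a lookup, sketched here in Python. The function name `age_category` and the sample ages are hypothetical, and rejecting ages below 1 is an assumption, since the task does not define a category for them.

```python
from collections import Counter

# Inclusive upper bounds taken from the task statement; "Old" is open-ended.
AGE_CATEGORIES = [(20, "Teenager"), (30, "Young"), (45, "Mid_Age"), (65, "Mature")]

def age_category(age):
    """Map an age in years to its category label."""
    if age < 1:
        # Assumption: ages below 1 fall outside every listed category.
        raise ValueError(f"age {age} is outside the defined categories")
    for upper, label in AGE_CATEGORIES:
        if age <= upper:
            return label
    return "Old"

# Hypothetical ages, not taken from the assignment data.
ages = [18, 20, 21, 30, 31, 45, 46, 65, 66, 80]
labels = [age_category(a) for a in ages]
frequencies = Counter(labels)
```

The `frequencies` counter gives the number of instances in each category, which is the frequency table the task asks for.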

d. Binarise the Education variable [with values "0" or "1"].

In the assignment report, provide an explanation of the applied binarisation technique. In your Excel workbook file place the results in separate columns in the corresponding spreadsheet.
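One common way to binarise a nominal attribute is to create one 0/1 column per distinct value (often called one-hot encoding), which fits the instruction to place the results in separate columns. The Python sketch below assumes that interpretation; the function name `binarise` and the Education values shown are hypothetical, since the actual values are listed in the attribute description file.

```python
def binarise(values):
    """Return one 0/1 indicator column per distinct category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}

# Hypothetical Education values, not taken from the assignment data.
education = ["HS-grad", "Bachelors", "HS-grad", "Masters"]
columns = binarise(education)
```

Each row has exactly one 1 across the generated columns, marking the category that row belongs to.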

Attachment: Data.csv



All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd