Build decision tree regression models using Python

Reference no: EM132116889

Big Data Assignment -

Regression Models - Regression models are concerned with target variables that can take any real value. The underlying principle is to find a model that maps input features to predicted target variables. Regression is also a form of supervised learning.

Regression models can be used to predict just about any variable of interest. A few examples include the following:

  • Predicting stock returns and other economic variables
  • Predicting loss amounts for loan defaults (this can be combined with a classification model that predicts the probability of default, while the regression model predicts the amount in the case of a default)
  • Recommendations (the Alternating Least Squares factorization model from Chapter 5, Building a Recommendation Engine with Spark, uses linear regression in each iteration)
  • Predicting customer lifetime value (CLTV) in a retail, mobile, or other business, based on user behavior and spending patterns

In the different sections of this chapter, we will do the following:

  • Introduce the various types of regression models available in ML
  • Explore feature extraction and target variable transformation for regression models
  • Train a number of regression models using ML
  • Build a regression model with Spark
  • See how to make predictions using the trained model
  • Investigate the impact on performance of various parameter settings for regression using cross-validation

Types of regression models - The core idea of linear models (or generalized linear models) is that we model the predicted outcome of interest (often called the target or dependent variable) as a function of a simple linear predictor applied to the input variables (also referred to as features or independent variables).

$y = f(w^T x)$

Here, $y$ is the target variable, $w$ is the vector of parameters (known as the weight vector), and $x$ is the vector of input features. $w^T x$ is the linear predictor (the vector dot product) of the weight vector $w$ and the feature vector $x$. To this linear predictor, we apply a function $f$ (called the link function). Linear models can, in fact, be used for both classification and regression simply by changing the link function. Standard linear regression uses an identity link (that is, $y = w^T x$ directly), while binary classification uses alternative link functions, such as the logistic function.
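The relationship between the linear predictor and the function applied to it can be sketched in plain Python. This is only an illustration, not Spark's implementation: the weights and features are made-up values, and the function names (`linear_predictor`, `identity`, `logistic`, `predict`) are hypothetical.

```python
import math

def linear_predictor(w, x):
    """Dot product w^T x of the weight vector w and the feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def identity(z):
    """Identity function: used for standard linear regression."""
    return z

def logistic(z):
    """Logistic function: maps the linear predictor to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x, f=identity):
    """Apply the chosen function f to the linear predictor."""
    return f(linear_predictor(w, x))

# Illustrative (made-up) weights and features.
w = [0.5, -1.0, 2.0]
x = [2.0, 1.0, 0.5]

y_regression = predict(w, x)               # 0.5*2 - 1.0*1 + 2.0*0.5 = 1.0
p_class = predict(w, x, f=logistic)        # logistic(1.0), roughly 0.731
print(y_regression, p_class)
```

The same weights and features give a real-valued prediction under the identity function and a probability-like score under the logistic function, which is the sense in which one linear model family serves both tasks.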

Spark's ML library offers different regression models, which are as follows:

  • Linear regression
  • Generalized linear regression
  • Logistic regression
  • Decision trees
  • Random forest regression
  • Gradient boosted trees
  • Survival regression
  • Isotonic regression
  • Ridge regression

Regression models define the relationship between a dependent variable and one or more independent variables. Training builds the model that best fits the observed values of the independent variables (features).

Unlike classification models such as support vector machines and logistic regression, linear regression predicts a continuous value for the dependent variable rather than a discrete class label.

Linear regression models are essentially the same as their classification counterparts. The only difference is that linear regression models use a different loss function, related link function, and decision function. Spark ML provides a standard least squares regression model (although other types of generalized linear models for regression are planned).
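To make the phrase "a different loss function" concrete, here is a small sketch comparing the squared loss used by least squares regression with two losses commonly paired with linear classifiers (logistic loss and hinge loss). The function names and the example score are assumptions chosen for illustration; labels for the classification losses are assumed to be -1 or +1.

```python
import math

def squared_loss(prediction, target):
    """Least squares loss for regression: 0.5 * (prediction - target)^2."""
    return 0.5 * (prediction - target) ** 2

def logistic_loss(score, label):
    """Logistic loss for classification; label is assumed to be -1 or +1."""
    return math.log(1.0 + math.exp(-label * score))

def hinge_loss(score, label):
    """Hinge loss (used by linear SVMs); label is assumed to be -1 or +1."""
    return max(0.0, 1.0 - label * score)

score = 0.5  # made-up value of the linear predictor w^T x

reg_loss = squared_loss(score, 3.0)   # 0.5 * (0.5 - 3.0)^2 = 3.125
log_loss = logistic_loss(score, +1)   # log(1 + exp(-0.5))
svm_loss = hinge_loss(score, +1)      # max(0, 1 - 0.5) = 0.5
print(reg_loss, log_loss, svm_loss)
```

The linear predictor is identical in all three cases; only the way its error is measured changes.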

Assignment -

1. Using Python 3, build the following regression models:

  • Decision Tree
  • Gradient Boosted Tree
  • Linear regression

2. Select a dataset (other than the example dataset given in section 3) and apply the Decision Tree and Linear regression models created above. Choose a dataset from Kaggle.

3. Build the following in relation to the gradient boosted tree and the dataset chosen in step 2:

  • Gradient boosted tree iterations
  • Gradient boosted tree Max Bins
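The effect of the two settings in step 3 can be illustrated with a toy gradient boosting sketch in plain Python. It is not Spark's gradient boosted tree: it boosts single-split stumps on one feature with squared loss, and every name here (`candidate_thresholds`, `fit_gbt`, `max_bins`, `num_iterations`, `learning_rate`) is a hypothetical stand-in. The data is synthetic and assumed to contain at least two distinct feature values.

```python
import math

def candidate_thresholds(values, max_bins):
    """Return at most max_bins - 1 split thresholds (midpoints of distinct values)."""
    distinct = sorted(set(values))
    mids = [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
    if len(mids) <= max_bins - 1:
        return mids
    step = len(mids) / (max_bins - 1)          # keep an evenly spaced subset
    return [mids[int(i * step)] for i in range(max_bins - 1)]

def fit_stump(xs, residuals, max_bins):
    """Find the single split that minimises squared error on the residuals."""
    best = None
    for t in candidate_thresholds(xs, max_bins):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        cost = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or cost < best[0]:
            best = (cost, t, lm, rm)
    return best[1:]                             # (threshold, left value, right value)

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def fit_gbt(xs, ys, num_iterations, max_bins, learning_rate=0.5):
    """Each iteration fits a stump to the current residuals (squared loss)."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(num_iterations):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals, max_bins)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, learning_rate

def predict_gbt(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def rmse(model, xs, ys):
    return math.sqrt(sum((predict_gbt(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys))

# Synthetic data: y = x^2 on twenty points.
xs = [float(i) for i in range(1, 21)]
ys = [x * x for x in xs]

rmse_1 = rmse(fit_gbt(xs, ys, num_iterations=1, max_bins=8), xs, ys)
rmse_20 = rmse(fit_gbt(xs, ys, num_iterations=20, max_bins=8), xs, ys)
print(rmse_1, rmse_20)   # training error shrinks as iterations increase
```

For the assignment itself, the same comparison would be run on held-out data rather than the training set, since training error alone always improves with more iterations.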

4. Build the following in relation to the decision tree and the dataset chosen in step 2:

  • Decision Tree Categorical features
  • Decision Tree Log
  • Decision Tree Max Bins
  • Decision Tree Max Depth
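As a rough illustration of two of the items above (maximum depth and a log-transformed target), the following sketch builds a tiny regression tree on one feature in plain Python. It is an assumption-laden toy, not Spark's decision tree: the names `build_tree`, `best_threshold`, and `max_depth` are hypothetical, the data is synthetic, and the log transform shown is `log1p` with `expm1` as its inverse.

```python
import math

def sse(values):
    """Sum of squared errors around the mean of the values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_threshold(xs, ys):
    """Threshold that minimises the total squared error, or None if no split exists."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue                            # cannot split between equal values
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best is None or cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2.0)
    return None if best is None else best[1]

def build_tree(xs, ys, max_depth):
    """A leaf is a float (mean target); an internal node is (threshold, left, right)."""
    threshold = best_threshold(xs, ys) if max_depth > 0 else None
    if threshold is None:
        return sum(ys) / len(ys)
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return (threshold,
            build_tree([x for x, _ in left], [y for _, y in left], max_depth - 1),
            build_tree([x for x, _ in right], [y for _, y in right], max_depth - 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree

def rmse(preds, ys):
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys))

# Synthetic, strongly skewed target.
xs = [float(i) for i in range(1, 17)]
ys = [2.0 ** (i / 2.0) for i in range(1, 17)]

shallow = build_tree(xs, ys, max_depth=1)
deep = build_tree(xs, ys, max_depth=3)
rmse_shallow = rmse([predict(shallow, x) for x in xs], ys)
rmse_deep = rmse([predict(deep, x) for x in xs], ys)

# Log-transformed target: train on log1p(y), map predictions back with expm1.
log_tree = build_tree(xs, [math.log1p(y) for y in ys], max_depth=3)
log_preds = [math.expm1(predict(log_tree, x)) for x in xs]
print(rmse_shallow, rmse_deep, rmse(log_preds, ys))
```

A deeper tree always fits the training data at least as well as a shallower one, so the depth comparison in the assignment is more informative when measured on a held-out split.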

5. Build the following in relation to the linear regression and the dataset chosen in step 2:

a) Linear regression Cross Validation

  • Intercept
  • Iterations
  • Step size
  • L1 Regularization
  • L2 Regularization

b) Linear regression Log (see section 5.4)
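The cross-validation settings listed under a) can be explored with a sketch like the one below. It is a plain-Python illustration under stated assumptions, not the Spark API: `train_linear` is a hypothetical batch gradient descent trainer exposing intercept, iterations, step size, and L1/L2 penalties, `cross_val_rmse` is a hypothetical k-fold helper, and the data is synthetic and noiseless.

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

def train_linear(X, y, iterations=500, step=0.5, l1=0.0, l2=0.0, intercept=True):
    """Batch gradient descent for least squares with optional L1/L2 penalties."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iterations):
        errors = [sum(wj * xj for wj, xj in zip(w, row)) + b - yi
                  for row, yi in zip(X, y)]
        grads = [sum(e * row[j] for e, row in zip(errors, X)) / n
                 + l2 * w[j] + l1 * sign(w[j])          # L1 term is a subgradient
                 for j in range(d)]
        w = [wj - step * g for wj, g in zip(w, grads)]
        if intercept:
            b -= step * sum(errors) / n
    return w, b

def predict(model, row):
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, row)) + b

def cross_val_rmse(X, y, k=4, **params):
    """Average RMSE over k folds; fold f holds out rows f, f+k, f+2k, ..."""
    n = len(X)
    fold_scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_rows = [i for i in range(n) if i not in test_idx]
        model = train_linear([X[i] for i in train_rows],
                             [y[i] for i in train_rows], **params)
        mse = sum((predict(model, X[i]) - y[i]) ** 2 for i in test_idx) / len(test_idx)
        fold_scores.append(math.sqrt(mse))
    return sum(fold_scores) / k

# Synthetic noiseless data: y = 3 + 2x.
X = [[i / 10.0] for i in range(20)]
y = [3.0 + 2.0 * row[0] for row in X]

grid = [0.0, 0.1, 1.0]                                   # candidate L2 strengths
scores = {l2: cross_val_rmse(X, y, k=4, l2=l2) for l2 in grid}
best_l2 = min(scores, key=scores.get)
print(scores, best_l2)
```

The same loop can sweep `step`, `iterations`, `l1`, or `intercept` by changing the grid and the keyword passed to `cross_val_rmse`. On this noiseless data regularization can only hurt; on a real Kaggle dataset the best setting has to be read off the cross-validated scores.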

6. Follow the provided example of the bike sharing dataset and the guidelines in the sections that follow this section to develop the requirements given in steps 1, 3, 4 and 5.

Attachment:- Assignment Files.rar

Verified Expert

Simple linear regression is an approach for predicting a response using a single feature. It assumes that the two variables are linearly related. Hence, we try to find a linear function that predicts the response value (y) as accurately as possible as a function of the feature or independent variable (x). Consider a dataset in which we have a response value y for every feature value x.
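For a single feature, the least squares line has a closed-form solution: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. A minimal sketch, assuming made-up data and a hypothetical helper name `fit_simple_linear`:

```python
def fit_simple_linear(xs, ys):
    """Closed-form least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_simple_linear(xs, ys)
print(intercept, slope)   # 1.0 2.0
```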

ICT707 Big Data Assignment, University of the Sunshine Coast, Australia.

A Word file is also needed for the program and its comments. Do not include the program in the report: write a 1000-word report that describes only the logic and explains how the programming concepts work. The report carries additional marks, and no marks will be given if programs are added to the report part.


The Big Data Assignment comprises two parts. The first part is to create the algorithms in the tasks, namely Decision Tree, Gradient Boosted Tree, and Linear regression, and then to apply them to the bike sharing dataset provided, trying to reproduce the output given in the task sections (also given in the Big-Data Assignment.docx provided on Blackboard). The second part is to apply the algorithms created in the first part to another dataset chosen from Kaggle (other than the bike sharing dataset provided).


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd