Develop a model using logistic regression

Reference no: EM131374169

Question 1

The box office collections of 150 Bollywood movies were analysed using the variables described in Table 1.

Table 1. Data Dictionary

S.No | Variable | Variable Type | Code in SPSS output
1 | Box office Collection (Y) | Numerical (in crores of rupees) | Box Office Collection
2 | Release Time | Categorical with 4 levels | Releasing_Time_Festival Season, Releasing_Time_Holiday Season, Releasing_Time_Long Weekend, Releasing_Time_Normal_Season
3 | Genre | Categorical with 5 levels | Genre_Action (Action), Genre_Drama (Drama), Genre_Romance (Romance), Genre_Comedy (Comedy), Genre_Others (Other-G)
4 | Movie Content | Categorical with 3 levels | Masala (Masala), Sequel (Sequel), Others (Other_C)
5 | Director Category | Categorical with 3 levels | Director_A, Director_B, Director_O
6 | Lead Actor Category | Categorical with 3 levels | Actor_A, Actor_B, Actor_O
7 | Music Director Category | Categorical with 3 levels | Music_Dir_CAT A, Music_Dir_CAT B, Music_Dir_CAT C
8 | Production House Category | Categorical with 3 levels | Prod_House_CAT A, Prod_House_CAT B, Prod_House_CAT C
9 | Item Song | Binary variable | Item_Song (1 implies that the movie has an item song, 0 otherwise)
10 | Budget | Numerical (in crores of rupees) | Budget
11 | YouTube Views | Numerical | YouTube-V
12 | YouTube Likes | Numerical | YouTube-L
13 | YouTube Dislikes | Numerical | YouTube-D
14 | Budget More than 35 crores | Categorical | Budget_35_Cr (1 if the budget is more than 35 crores, 0 otherwise)

A simple linear regression model was developed between Box office collection and budget. SPSS output of the model is shown in Tables 2-3 and Figures 1-2.

Model 1

Y (Box Office Collection) = β0 + β1 × Budget

Table 2. Model Summary

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .650a | | | 72.02261

a. Predictors: (Constant), Budget
b. Dependent Variable: Box_Office_Collection

Table 3. Coefficients(a)

Model | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig.
1 (Constant) | -8.354 | 8.535 | | -.979 | .329
1 Budget | 2.175 | .210 | .650 | 10.381 | .000

a. Dependent Variable: Box_Office_Collection
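The blank cells in Table 2 and the t ratios in Table 3 follow from the reported values by simple arithmetic. The snippet below is a minimal sketch of that arithmetic, not part of the SPSS output; the function names are illustrative assumptions.

```python
# Values copied from Tables 2 and 3 (SPSS output for Model 1).
r = 0.650                       # multiple R from Table 2
coefficients = {                # name: (B, standard error) from Table 3
    "(Constant)": (-8.354, 8.535),
    "Budget": (2.175, 0.210),
}

def r_squared(multiple_r: float) -> float:
    """Share of the variation in Y explained by the fitted model."""
    return multiple_r ** 2

def t_statistic(b: float, std_error: float) -> float:
    """t ratio for the null hypothesis that the coefficient is zero."""
    return b / std_error

print(f"R^2 = {r_squared(r):.4f}")
for name, (b, se) in coefficients.items():
    print(f"{name}: t = {t_statistic(b, se):.3f}")
```

The t ratio recomputed for Budget differs slightly from the 10.381 in Table 3 because the table shows B and its standard error rounded to three decimals.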

Figure 1. Normal P-P plot for Model 1

Figure 2. Residual plot for Model 1

Question 1.1

Which of the following statements are correct (more than one may be correct)? Tick (✓) all right answers or highlight the correct statements with color.

1. The model explains 42.25% of variation in box office collection.
2. There are outliers in the model.
3. The residuals do not follow a normal distribution.
4. The model cannot be used since R-square is low.
5. Box office collection increases as the budget increases.

Question 1.2

Mr Chellappa, CEO of Oho Productions (OP), claims that the regression model in Table 3 is incorrect since it has a negative constant. Comment on whether Mr Chellappa is correct in his assessment of the model.

A second model is developed between ln(Box office collection) and movie release time:

Model 2

ln(Y) = β0 + β1 × Release Time Festival Season + β2 × Release Time Long Weekend + β3 × Release Time Normal Season + ε

The regression output for Model 2 is given in Table 4.

Table 4 Coefficients

Model | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig.
(Constant) | 2.685 | .396 | | 6.776 | .000
Releasing_Time_Festival_Season | .727 | .568 | .136 | 1.278 | .203
Releasing_Time_Long_Weekend | 1.247 | .588 | .221 | 2.122 | .036
Releasing_Time_Normal_Season | .147 | .431 | .041 | .340 | .734

a. Dependent Variable: Ln(Box Office Collection)
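Because the dependent variable in Model 2 is ln(Y), each dummy coefficient acts multiplicatively on Y relative to the omitted release time. The sketch below illustrates that back-transformation with the Table 4 coefficients. It assumes the omitted (baseline) level is Holiday Season, the only release time without a dummy in Model 2, and the function names are illustrative.

```python
import math

# Coefficients copied from Table 4. The baseline release time is assumed
# to be Holiday Season, the only level without a dummy in Model 2.
INTERCEPT = 2.685
DUMMY_COEFFICIENTS = {
    "Festival Season": 0.727,
    "Long Weekend": 1.247,
    "Normal Season": 0.147,
}

def ratio_to_baseline(beta: float) -> float:
    """Multiplicative effect on Y implied by a dummy coefficient on ln(Y)."""
    return math.exp(beta)

def fitted_collection(level: str) -> float:
    """Back-transformed fitted value exp(fitted ln Y), in crores of rupees."""
    # Any level without a dummy is treated as the baseline (contributes 0).
    beta = DUMMY_COEFFICIENTS.get(level, 0.0)
    return math.exp(INTERCEPT + beta)

for level in ["Holiday Season", *DUMMY_COEFFICIENTS]:
    print(f"{level}: fitted collection {fitted_collection(level):.2f} crores")
```

Exponentiating a fitted log value gives a typical (geometric-mean type) collection rather than the arithmetic mean, and a coefficient that is not statistically significant does not support a claim of any difference from the baseline.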

Question 1.3

What is the average difference in the box office collection when a movie is released during a holiday season (Releasing_Time_holiday_season) versus movies released during normal season (Releasing_Time_Normal_Season)? Use a significance value of 5%.

Question 1.4

Mr Chellappa of Oho productions claims that the movies released during long weekend (Releasing_Time_Long_Weekend) earn at least 5 crores more than the movies released during normal season (Releasing_Time_Normal_Season). Check whether this claim is true (use α = 0.05).

A stepwise regression model is developed between ln(Box Office Collection) and all the predictor variables listed in Table 1. The outputs are shown in Tables 5-6.

Table 5 Model Summary

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .709a | .503 | .499 | 1.20651
2 | .763b | .581 | .576 | 1.11050
3 | .787c | .620 | .612 | 1.06210
4 | .802d | .643 | .633 | 1.03307
5 | .810e | | | 1.01749
6 | | | |

Table 6. Coefficients in the model (in the order in which it was added to the model)

Model (Step 6) | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Zero-order (direct) correlation | Partial correlation | Part correlation
(Constant) | 3.573 | .249 | | 14.346 | | |
Budget_35_Cr | 1.523 | .207 | .443 | 7.342 | .709 | .525 | .356
Youtube_Views | 1.17 × 10^-7 | .000 | .242 | 4.426 | .538 | .348 | .214
Prod_House_CAT A | .562 | .185 | .165 | 3.033 | .444 | .247 | .147
Music_Dir_CAT C | -.645 | .199 | -.177 | -3.245 | -.483 | -.263 | -.157
GenreComedy | .456 | .197 | .115 | 2.312 | .006 | .190 | .112
Director_CAT C | -.434 | .203 | -.123 | -2.143 | -.509 | -.177 | -.104
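In a stepwise fit, the squared part (semipartial) correlation of a variable equals the amount by which R Square would fall if that variable were dropped from the model. The sketch below uses that identity to relate the step 5 value of R in Table 5 to the last variable listed in Table 6. It assumes, as the Table 6 caption states, that the variables are listed in order of entry, and the function name is illustrative.

```python
# Values copied from Table 5 (step 5) and Table 6 (last variable entered).
R_STEP_5 = 0.810           # multiple R after five variables
PART_CORR_LAST = -0.104    # part correlation of Director_CAT C at step 6

def r_squared_after_adding(r_previous: float, part_correlation: float) -> float:
    """R Square after adding one variable.

    The increase in R Square equals the squared part correlation of the
    variable that was added.
    """
    return r_previous ** 2 + part_correlation ** 2

r2_step_5 = R_STEP_5 ** 2
r2_step_6 = r_squared_after_adding(R_STEP_5, PART_CORR_LAST)
print(f"R Square at step 5: {r2_step_5:.4f}")
print(f"R Square at step 6: {r2_step_6:.4f}")
```

Both inputs are rounded to three decimals in the SPSS output, so the result is approximate.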

Question 1.5

What is the variation in response variable, ln(Box office collection), explained by the model after adding all 6 variables?

Question 1.6

Which factor has the maximum impact on the box office collection of a movie? What will be your recommendation to a production house based on the variable that has maximum impact on the box office collection?

Question 1.7

Compare the regressions in Model 2 (Table 4) and Model 3 (Tables 5 and 6). None of the variables in Model 2 are statistically significant in Model 3. Can we conclude that the variables in Model 2 have no association with Box Office Collection? Explain clearly.

Question 1.8

Among the variables in Table 6, which variable is not useful for practical application of the model? Clearly state your reasons.

Question 2

Yearly US sales of domestically produced cars were collected for the period 1970-1999, along with data on the following:
PriceIndex - CPI for Transportation
Income - Total Disposable Income in the US (billions of dollars)
Interest - Prime Interest Rate (%) Charged by Banks

 

Correlation | Year | Sales | PriceIndex | Income | Interest
Year | 1 | | | |
Sales | -0.5453 | 1.0000 | | |
PriceIndex | | -0.6089 | 1.0000 | |
Income | | -0.5033 | | 1.0000 |
Interest | | -0.3842 | | | 1.0000

SPSS was used to carry out Stepwise Regression in order to predict Sales. The summary of the models fitted in the first 2 steps; the ANOVA table and Coefficients table obtained are given below.

Model Summary

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | | | |
2 | .708b | .502 | .465 | 760459.004

a. Predictors: (Constant),
b. Predictors: (Constant), , Interest
c. Dependent Variable: Sales

ANOVA

Model | Source | Sum of Squares | df | Mean Square | F | Sig.
1 | Regression | | | | |
1 | Residual | | | | |
1 | Total | | | | |
2 | Regression | 1.571E13 | | | | .000b
2 | Residual | 1.561E13 | | | |
2 | Total | 3.132E13 | | | |

Coefficients

Model | Predictor | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig.
1 | (Constant) | 9102897.600 | 433224.149 | | 21.012 | .000
1 | (name not shown) | -17258.553 | 4248.694 | -.609 | -4.062 | .000
2 | (Constant) | 1.023E7 | 576427.867 | | 17.740 | .000
2 | (name not shown) | -16873.654 | 3853.737 | -.595 | -4.379 | .000
2 | Interest | -124592.212 | 46820.081 | -.362 | -2.661 | .013

a. Dependent Variable: Sales
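The missing cells of the Model 2 ANOVA table can be recovered from the sums of squares. The sketch below shows that arithmetic, assuming n = 30 observations (one per year, 1970-1999) and k = 2 predictors; the function names are illustrative.

```python
# Sums of squares copied from the ANOVA table for Model 2.
SS_REGRESSION = 1.571e13
SS_RESIDUAL = 1.561e13
N_OBS = 30        # assumption: one observation per year, 1970-1999
K_PREDICTORS = 2  # Model 2 has two predictors plus the constant

def r_squared(ss_reg: float, ss_res: float) -> float:
    """Proportion of total variation explained by the regression."""
    return ss_reg / (ss_reg + ss_res)

def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """R Square penalised for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def overall_f(ss_reg: float, ss_res: float, n: int, k: int) -> float:
    """F statistic for H0: all slope coefficients are zero."""
    mean_square_regression = ss_reg / k
    mean_square_residual = ss_res / (n - k - 1)
    return mean_square_regression / mean_square_residual

r2 = r_squared(SS_REGRESSION, SS_RESIDUAL)
adj_r2 = adjusted_r_squared(r2, N_OBS, K_PREDICTORS)
f_stat = overall_f(SS_REGRESSION, SS_RESIDUAL, N_OBS, K_PREDICTORS)
print(f"R Square = {r2:.3f}, Adjusted R Square = {adj_r2:.3f}, F = {f_stat:.2f}")
```

The recomputed R Square and Adjusted R Square agree with the Model Summary (.502 and .465), which supports the assumed sample size. The F statistic would then be compared with the tabulated 5% critical value for (2, 27) degrees of freedom, about 3.35.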

Question 2.1

a) What is the predictor variable used in Model 1? Explain clearly.
b) What proportion of variation in Sales does this predictor variable explain in model 1? Explain clearly.
c) What is the Std. Error of the Estimate for Model 1? Explain clearly.

Question 2.2

a) What is the magnitude of the semipartial (or part) correlation for the variable ‘Interest' in Model 2? Explain.
b) Carry out an appropriate test, at 95% confidence level, to determine if Model 2 as a whole is valid (significant). State the null and alternate hypotheses and show all work.
c) Given no change in the other significant explanatory variables, can it be concluded from Model 2 that ‘Interest' has a higher impact on ‘Sales' than the other variable used in the model? Explain clearly.

Question 2.3: Can it be concluded, at 95% confidence level, that an increase in ‘Interest' rate by 5% decreases yearly Sales by at least 250000 units or more? Show all work.

Question 2.4: What can you say about the relationship between ‘Interest' and the other predictor variable used in Models 1 and 2? Explain clearly.

Question 2.5: The partial correlation of the excluded variables; after Model 2 was fitted; are 0.184 and Conduct an appropriate test, at 95% confidence level, to determine if one of these excluded variables should be added to the regression model. State the null and alternate hypotheses and show all work.

Question 3:

A large grocery store in the US wishes to understand the key drivers that determine the amount spent per transaction by their customers. Therefore, it obtained a random sample of 4000 transactions with information on the amount spent (Revenue), the product category on which the transaction was made (Product Family), the annual income of the customer (Annual income), the number of children in the household the customer belongs to, and finally whether the customer owns a home or not. A ‘snapshot' of part of the data is provided below.

Homeowner | Children | Annual Income | Product Family | Revenue
Y | 2 | $30K - $50K | Food | $27.38
Y | 5 | $70K - $90K | Food | $14.90
N | 2 | $50K - $70K | Food | $5.52
Y | 3 | $30K - $50K | Food | $4.44
Y | 3 | $130K - $150K | Drink | $14.00
Y | 3 | $10K - $30K | Food | $4.37
Y | 2 | $30K - $50K | Food | $13.78
Y | 2 | $150K + | Food | $7.34
Y | 3 | $10K - $30K | Non-Consumable | $2.41
N | 1 | $50K - $70K | Non-Consumable | $8.96
N | 0 | $30K - $50K | Food | $11.82

In order to enable regression analysis, the following indicator (dummy) variables were created:

Own_Hm = 1 (Yes to Homeowner), 0 otherwise
Ann_Inc2 = 1 (Annual Income in the range $30K - $50K), 0 otherwise
Ann_Inc3 = 1 (Annual Income in the range $50K - $70K), 0 otherwise
Ann_Inc4 = 1 (Annual Income in the range $70K - $90K), 0 otherwise
Ann_Inc5 = 1 (Annual Income in the range $90K and above), 0 otherwise
Prod_Fam2 = 1 (Product Family is Drink), 0 otherwise
Prod_Fam3 = 1 (Product Family is Non-Consumable), 0 otherwise
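The dummy coding described above can be sketched in a few lines. This is an illustrative encoding only: the function names are assumptions, and income bands are assumed to be written as in the data snapshot (for example "$30K - $50K" or "$150K +"), with every band starting at $90K or higher mapped to Ann_Inc5.

```python
import re

def lower_bound_in_thousands(income_band: str) -> int:
    """Extract the lower bound of a band such as '$30K - $50K' or '$150K +'."""
    match = re.match(r"\$(\d+)K", income_band.strip())
    if match is None:
        raise ValueError(f"Unrecognised income band: {income_band!r}")
    return int(match.group(1))

def encode_transaction(homeowner: str, income_band: str, product_family: str) -> dict:
    """Return the indicator variables for one transaction.

    Base categories (all indicators 0): non-homeowner, income $10K - $30K,
    product family Food.
    """
    lower = lower_bound_in_thousands(income_band)
    return {
        "Own_Hm": int(homeowner == "Y"),
        "Ann_Inc2": int(lower == 30),
        "Ann_Inc3": int(lower == 50),
        "Ann_Inc4": int(lower == 70),
        "Ann_Inc5": int(lower >= 90),
        "Prod_Fam2": int(product_family == "Drink"),
        "Prod_Fam3": int(product_family == "Non-Consumable"),
    }

# Fifth row of the data snapshot: homeowner, $130K - $150K, Drink.
print(encode_transaction("Y", "$130K - $150K", "Drink"))
```

Leaving one level of each categorical variable without an indicator avoids perfect collinearity with the intercept, so each coefficient is read as a difference from that base level.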

The following outputs were generated using this data

Regression Output 1 (Revenue($) Response Var)

Regression Statistics
Multiple R | 0.0340
R Square | 0.0012
Adjusted R Square | 0.0002
Standard Error | 8.1499
Observations | 4000.0000

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 4.0000 | 306.4494 | 76.6123 | 1.1535 | 0.3294
Residual | 3995.0000 | 265348.3672 | 66.4201 | |
Total | 3999.0000 | 265654.8166 | | |

Variable | Coefficients | Standard Error | t Stat | P-value
Intercept | 12.6841 | 0.2786 | 45.5352 | 0.0000
Ann_Inc2 | 0.2787 | 0.3588 | 0.7766 | 0.4374
Ann_Inc3 | 0.7617 | 0.4106 | 1.8551 | 0.0637
Ann_Inc4 | 0.4524 | 0.4712 | 0.9602 | 0.3370
Ann_Inc5 | -0.0130 | 0.4229 | -0.0307 | 0.9755

 

Regression Output 2 (Revenue($) Response Var) 

Regression Statistics
Multiple R | 0.059
R Square | 0.004
Adjusted R Square | 0.003
Standard Error | 8.137
Observations | 4000.000

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1.000 | 933.834 | 933.834 | 14.103 | 0.000
Residual | 3998.000 | 264720.983 | 66.213 | |
Total | 3999.000 | 265654.817 | | |

Variable | Coefficients | Standard Error | t Stat | P-value
Intercept | 12.136 | 0.255 | 47.564 | 0.000
Children | 0.326 | 0.087 | 3.755 | 0.000

 

Regression Output 3 (Revenue($) Response Var)

Regression Statistics
Multiple R | 0.046
R Square | 0.002
Adjusted R Square | 0.002
Standard Error | 8.144
Observations | 4000.000

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 2.000 | 550.538 | 275.269 | 4.150 | 0.016
Residual | 3997.000 | 265104.278 | 66.326 | |
Total | 3999.000 | 265654.817 | | |

Variable | Coefficients | Standard Error | t Stat | P-value
Intercept | 13.192 | 0.152 | 86.958 | 0.000
Prod_Fam2 | -0.975 | 0.458 | -2.131 | 0.033
Prod_Fam3 | -0.743 | 0.332 | -2.240 | 0.025

 

Regression Output 4 (Revenue($) Response Var)

Regression Statistics
Multiple R | 0.075
R Square | 0.006
Adjusted R Square | 0.005
Standard Error | 8.131
Observations | 4000.000

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 3 | 1496.798 | 498.933 | 7.548 | 0
Residual | 3996 | 264158.02 | 66.106 | |
Total | 3999 | 265654.82 | | |

Variable | Coefficients | Standard Error | t Stat | P-value
Intercept | 12.361 | 0.267 | 46.345 | 0
Children | 0.329 | 0.087 | 3.783 | 0
Prod_Fam2 | -0.992 | 0.457 | -2.171 | 0.03
Prod_Fam3 | -0.747 | 0.331 | -2.256 | 0.024
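The coefficients of Regression Output 4 give a predicted revenue per transaction, and the reported standard error gives a rough spread around that prediction. The sketch below turns the two into a probability statement. It is an approximation resting on stated assumptions: the errors are treated as normally distributed with standard deviation equal to the regression standard error (8.131), and uncertainty in the estimated coefficients is ignored. The function names are illustrative.

```python
import math

# Coefficients and standard error copied from Regression Output 4.
INTERCEPT = 12.361
B_CHILDREN = 0.329
B_DRINK = -0.992            # Prod_Fam2
B_NON_CONSUMABLE = -0.747   # Prod_Fam3
RESIDUAL_SD = 8.131         # "Standard Error" in the regression statistics

def predicted_revenue(children: int, drink: int = 0, non_consumable: int = 0) -> float:
    """Point prediction of revenue ($); Food is the base product family."""
    return (INTERCEPT + B_CHILDREN * children
            + B_DRINK * drink + B_NON_CONSUMABLE * non_consumable)

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_revenue_exceeds(threshold: float, mean: float, sd: float = RESIDUAL_SD) -> float:
    """P(revenue > threshold) under the normal-error assumption."""
    return 1.0 - normal_cdf((threshold - mean) / sd)

mean_food_3_children = predicted_revenue(children=3)
print(f"Predicted revenue: {mean_food_3_children:.3f}")
print(f"P(revenue > 10): {prob_revenue_exceeds(10.0, mean_food_3_children):.3f}")
```

With an R Square of only 0.006 the prediction barely differs across customers, so the probability is driven mostly by the overall spread of revenue rather than by the predictors.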

Regression Output 5 (Revenue($) Response Var)

Child_Fam3 = Children × Prod_Fam3

Regression Statistics
Multiple R | 0.080
R Square | 0.006
Adjusted R Square | 0.006
Standard Error | 8.127
Observations | 4000.000

ANOVA
Source | df | SS | MS | F | Sig F
Regression | 3.000 | 1708.947 | 569.649 | 8.624 | 0.000
Residual | 3996.000 | 263945.870 | 66.053 | |
Total | 3999.000 | 265654.817 | | |

Variable | Coefficients | Standard Error | t Stat | P-value
Intercept | 12.214 | 0.258 | 47.399 | 0.000
Children | 0.393 | 0.090 | 4.379 | 0.000
Prod_Fam2 | -1.010 | 0.455 | -2.218 | 0.027
Child_Fam3 | -0.322 | 0.112 | -2.882 | 0.004

 

Use the information given above to answer the following questions.

For each question give adequate explanation and support your answer with given information precisely. Wherever required assume α = 0.05 for significance level.

a) Rank the income groups based on average revenue obtained per transaction in the sample data from largest to smallest. Provide precise reasons as to how you obtained this ranking. Is this ranking valid for the population? What is the average revenue per transaction obtained for the income group ($10K-$30K)?

b) The grocery store wishes to estimate the average amount spent per transaction on non-consumables. Provide the most accurate estimate possible. Provide details on how you obtained this estimate.

c) If, in Regression Output 3, the base category chosen for product family is drinks (Prod_Fam2), what will be the corresponding prediction equation?

d) Is there a significant difference in the average amount spent per transaction between that on drinks and non-consumables? Why or Why not? Provide precise reasons.

e) The grocery store wishes to target those customers, as well as items on which the amount spent is maximum. Assuming that no customer has more than five children, identify the appropriate customer segment as well as the appropriate product family. Provide precise reasons behind your answer.

f) What is the chance that a customer with 3 children will spend more than $10.00 on food items per transaction? Provide details on your calculations.

g) Does the number of children affect food purchases more than non-consumables? Why or why not? State your reasons precisely. (3 points)
h) Suppose the grocery store has reason to believe that, in addition to the effects of the independent variables considered in Regression Output 4, homeowners spend significantly more on non-consumables than non-homeowners do on any product category. How will you modify the model provided in Regression Output 4? Provide the model in β terms. If you are adding new variables to the model, provide details on what you expect the β value to be. Positive? Negative?

Question 4

Go through the case, "Oakland A" and the spreadsheet supplement (Ref: Moodle/Cases and Materials/Module 3). Does Mark Nobel increase attendance? If so, how much is the increase worth for Oakland? Support your decision through an appropriate regression model.

Question 5

The box office success of Bollywood movies was analysed with a logistic regression model using the following variables. The data dictionary is provided in the following table.

Sl. No | Variable | Variable Type | Code in SPSS output
1 | Box office success (Y) | Categorical | 1 = Success, 0 = Failure
2 | Release Data | Categorical with 4 levels | 1 = Festival Season (FS), 2 = Holiday Season (HS), 3 = Long Weekend (LW), 4 = Other Season (OS)
3 | Genre | Categorical with 5 levels | 1 = Action (Action), 2 = Drama (Drama), 3 = Romance (Romance), 4 = Comedy (Comedy), 5 = Others (Other-G)
4 | Movie Content | Categorical with 3 levels | Masala (Masala), Sequel (Sequel), Others (Other_C)
5 | Director Category | Categorical with 3 levels | Director_A, Director_B, Director_O
6 | Lead Actor Category | Categorical with 3 levels | Actor_A, Actor_B, Actor_O
7 | Item Song | Binary variable | 1 (Movie has an item song), 0 (otherwise)
8 | Budget | Numerical (in crores of rupees) | Budget
9 | YouTube Views | Numerical | YouTube-V
10 | YouTube Likes | Numerical | YouTube-L
11 | YouTube Dislikes | Numerical | YouTube-D

A logistic regression model was developed using Budget as the independent variable and box office success as the dependent variable:

ln(Π/(1-Π)) = β0 + β1 × Budget

The SPSS model-output is shown below (Tables 1-3)

Table 1 Omnibus Tests of Model Coefficients

 

Step 1 | Chi-square | df | Sig.
Step | 4.000 | 1 | .046
Block | 4.000 | 1 | .046
Model | 4.000 | 1 | .046

Table 2 Classification Table

 

Step 1: Observed Success/Failure | Predicted 0 | Predicted 1 | Percentage Correct
0 | 2 | 17 | 10.5
1 | 3 | 41 | 93.2
Overall Percentage | | | 68.3

a. The cut value is .500
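The percentages in the classification table are ratios of its four counts. The sketch below recomputes them, reading rows as observed classes and columns as predicted classes (the usual SPSS layout, assumed here); the function name is illustrative.

```python
def classification_rates(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Rates implied by a 2x2 classification table.

    tn: observed 0, predicted 0    fp: observed 0, predicted 1
    fn: observed 1, predicted 0    tp: observed 1, predicted 1
    """
    total = tn + fp + fn + tp
    return {
        "specificity": tn / (tn + fp),   # share of failures classified correctly
        "sensitivity": tp / (tp + fn),   # share of successes classified correctly
        "accuracy": (tn + tp) / total,   # overall percentage correct
    }

# Counts copied from Table 2 (cut value .500).
rates = classification_rates(tn=2, fp=17, fn=3, tp=41)
for name, value in rates.items():
    print(f"{name}: {100 * value:.1f}%")
```

A high overall percentage can hide very poor performance on one class: here most observed failures are classified as successes.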

Table 3 Variables in the Equation

 

Step 1a | B | S.E. | Wald | df | Sig. | Exp(B)
Budget | -.016 | .008 | 3.825 | 1 | .050 | .984
Constant | 1.621 | .503 | 10.395 | 1 | .001 | 5.058

a. Variable(s) entered on step 1: Budget.
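The fitted logit in Table 3 converts to a success probability through the logistic function, and success and failure are equally likely where the logit is zero. The sketch below illustrates both calculations with the Table 3 coefficients; the function names are illustrative.

```python
import math

# Coefficients copied from Table 3 (Variables in the Equation).
B0 = 1.621    # Constant
B1 = -0.016   # Budget (crores of rupees)

def success_probability(budget: float) -> float:
    """P(success) from the fitted logit ln(p / (1 - p)) = B0 + B1 * budget."""
    logit = B0 + B1 * budget
    return 1.0 / (1.0 + math.exp(-logit))

def break_even_budget() -> float:
    """Budget at which the logit is zero, so that p = 0.5."""
    return -B0 / B1

print(f"Break-even budget: {break_even_budget():.2f} crores")
print(f"P(success | budget = 100): {success_probability(100.0):.4f}")
print(f"Odds ratio per extra crore: {math.exp(B1):.3f}")
```

The odds ratio printed last matches the Exp(B) column of Table 3: each additional crore of budget multiplies the odds of success by about .984.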

Question 5.1

Calculate the budget for which the box office success and failure are equally likely.

Question 5.2

Is there sufficient evidence to conclude that higher budget movies are more likely to fail at the box office?

Question 5.3

A production house is making a movie with 100 crore budget; what is the success probability for this movie?

Question 5.4

Calculate the optimal cut-off probability when the cost of classifying failure at box office (0) as success at the box office (1) is five times costlier than the cost of classifying success (1) as failure (0). Show all calculations.

[Classification plot, step number 1: Observed Groups and Predicted Probabilities. Vertical axis: frequency (ticks at 2, 4, 6, 8). Horizontal axis: predicted probability of membership for group 1, from 0 to 1. The cut value is .50. Symbols: 0 = 0, 1 = 1; each symbol represents .5 cases.]

Figure 1. Classification plot for model 1

A second model is developed using the variable, "item song", the SPSS output is shown in tables 4-5.

Table 4 Classification Table

 

Step 1: Observed Success/Failure | Predicted 0 | Predicted 1 | Percentage Correct
0 | 11 | 8 | 57.9
1 | 20 | 24 | 54.5
Overall Percentage | | | 55.6

a. The cut value is .700

Table 5 Variables in the Equation

 

Step 1a | B | S.E. | Wald | df | Sig. | Exp(B)
ItemSong | -.501 | .202 | 6.151 | 1 | .013 | .606
Constant | 1.099 | .408 | 7.242 | 1 | .007 | 3.000

a. Variable(s) entered on step 1: ItemSong.
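With a single binary predictor, the fitted model yields exactly two success probabilities, one for each value of the predictor. The sketch below computes both from the Table 5 coefficients; the function names are illustrative.

```python
import math

# Coefficients copied from Table 5 (Variables in the Equation).
B0 = 1.099           # Constant
B_ITEM_SONG = -0.501 # ItemSong (1 = movie has an item song)

def sigmoid(logit: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-logit))

def success_probability(item_song: int) -> float:
    """P(success) for a movie with (1) or without (0) an item song."""
    return sigmoid(B0 + B_ITEM_SONG * item_song)

p_with = success_probability(1)
p_without = success_probability(0)
print(f"With item song:    {p_with:.4f}")
print(f"Without item song: {p_without:.4f}")
print(f"Difference:        {p_with - p_without:.4f}")
```

The constant's Exp(B) of 3.000 in Table 5 is the odds of success without an item song, which corresponds to a probability of 3/4.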

Question 5.5

Calculate the difference in success probabilities for movies with item song and movies without item song.

Question 5.6

Which is the better model: the one with budget as the independent variable or the one with item song as the independent variable? Clearly state your reasons.

A stepwise logistic regression model is shown in tables 6 and 7 using significance α = 0.10. 35_Cr_Budget is a derived variable which takes value 1 if the movie budget is more than 35 crores and 0 otherwise.

Table 6 Classification Table

 

Step | Observed Success/Failure | Predicted 0 | Predicted 1 | Percentage Correct
Step 1 | 0 | 14 | 5 | 73.7
Step 1 | 1 | 17 | 27 | 61.4
Step 1 | Overall Percentage | | | 65.1
Step 2 | 0 | 14 | 5 | 73.7
Step 2 | 1 | 10 | 34 | 77.3
Step 2 | Overall Percentage | | | 76.2
Step 3 | 0 | 12 | 7 | 63.2
Step 3 | 1 | 9 | 35 | 79.5
Step 3 | Overall Percentage | | | 74.6
Step 4 | 0 | 13 | 6 | 68.4
Step 4 | 1 | 9 | 35 | 79.5
Step 4 | Overall Percentage | | | 76.2
Step 5 | 0 | 15 | 4 | 78.9
Step 5 | 1 | 9 | 35 | 79.5
Step 5 | Overall Percentage | | | 79.4
Step 6 | 0 | 13 | 6 | 68.4
Step 6 | 1 | 10 | 34 | 77.3
Step 6 | Overall Percentage | | | 74.6

a. The cut value is .700

Table 7 Variables in the Equation

 

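Step | Variable | B | S.E. | Wald | df | Sig. | Exp(B)
Step 1a | 35_Cr_Budget | -1.492 | .606 | 6.063 | 1 | .014 | .225
Step 1a | Constant | 1.686 | .487 | 11.998 | 1 | .001 | 5.400
Step 2b | YoutubeL | .000 | .000 | 4.294 | 1 | .038 | 1.000
Step 2b | 35_Cr_Budget | -2.227 | .694 | 10.285 | 1 | .001 | .108
Step 2b | Constant | 1.108 | .550 | 4.055 | 1 | .044 | 3.028
Step 3c | Budget | -.027 | .017 | 2.356 | 1 | .125 | .974
Step 3c | YoutubeL | .000 | .000 | 5.903 | 1 | .015 | 1.000
Step 3c | 35_Cr_Budget | -1.243 | .911 | 1.860 | 1 | .173 | .289
Step 3c | Constant | 1.596 | .624 | 6.554 | 1 | .010 | 4.935
Step 4d | Budget | -.034 | .020 | 2.877 | 1 | .090 | .967
Step 4d | YoutubeL | .000 | .000 | 6.858 | 1 | .009 | 1.000
Step 4d | DirectorA | 1.544 | .890 | 3.008 | 1 | .083 | 4.683
Step 4d | 35_Cr_Budget | -1.621 | .981 | 2.730 | 1 | .098 | .198
Step 4d | Constant | 1.556 | .650 | 5.733 | 1 | .017 | 4.742
Step 5e | Budget | -.032 | .021 | 2.393 | 1 | .122 | .969
Step 5e | YoutubeL | .000 | .000 | 7.067 | 1 | .008 | 1.000
Step 5e | DirectorA | 1.669 | .902 | 3.427 | 1 | .064 | 5.308
Step 5e | ActorA | -1.327 | .934 | 2.019 | 1 | .155 | .265
Step 5e | 35_Cr_Budget | -1.046 | 1.038 | 1.015 | 1 | .314 | .351
Step 5e | Constant | 1.972 | .774 | 6.492 | 1 | .011 | 7.187
Step 6e | Budget | -.043 | .018 | 5.579 | 1 | .018 | .958
Step 6e | YoutubeL | .000 | .000 | 7.370 | 1 | .007 | 1.000
Step 6e | DirectorA | 1.602 | .895 | 3.206 | 1 | .073 | 4.961
Step 6e | ActorA | -1.622 | .862 | 3.543 | 1 | .060 | .197
Step 6e | Constant | 2.132 | .745 | 8.177 | 1 | .004 | 8.429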

a. Variable(s) entered on step 1: lt_35_Cr_Budget.

b. Variable(s) entered on step 2: YoutubeL.

c. Variable(s) entered on step 3: Budget.

d. Variable(s) entered on step 4: DirectorA.

e. Variable(s) entered on step 5: ActorA.

Question 5.7

Considering all the information in Tables 1 to 7, which model would you recommend to predict movie success at the box office? Clearly state your reasons.

Question 6

Read the case,"Breaking Barriers - Micro-mortgage analytics". Using the data provided, develop a credit rating model that Shubham can use. (Ref: Moodle/Cases and Materials/Module 3)

Question 7

A Micro-Mortgage company classifies customers into three categories (1, 2 and 3). Category 1 applicants are denied loan, Category 2 applicants are charged an interest rate of 14% per annum and Category 3 applicants are charged an interest rate of 18%. The variables considered in the model are shown below:

Sl. No | Variable | Variable Type | Code in SPSS output
1 | Customer Classification | Categorical (3 levels) | 1 = Category 1, 2 = Category 2, 3 = Category 3
2 | Disposable Income | Numerical | DI
3 | Loan to Value ratio | Numerical | LTV
4 | Instalment to Income Ratio | Numerical | IIR
5 | Marital Status | Categorical | MS = 1 (Married), MS = 0 (Unmarried)
6 | Age | Numerical | Age
7 | Old EMI | Categorical | Old EMI = 1 (applicant with old EMI), Old EMI = 0 (otherwise)

SPSS regression output using category 3 as base category is provided below:

 

Match_Oa | Variable | B | Standard Error | Wald | Sig. | Exp(B)
1 | Intercept | 1.720 | 0.850 | 4.095 | 0.043 | 5.585
1 | DI | -0.120 | 0.050 | 5.760 | 0.016 | 0.887
1 | LTV | -0.521 | 0.260 | 4.015 | 0.045 | 0.594
1 | IIR | -0.220 | 0.100 | 4.840 | 0.028 | 0.803
1 | MS | 0.850 | 0.620 | 1.880 | 0.170 | 2.340
1 | AGE | -0.340 | 0.240 | 2.007 | 0.157 | 0.712
1 | OLD EMI | -1.120 | 0.390 | 8.247 | 0.004 | 0.326
2 | Intercept | 0.650 | 0.280 | 5.389 | 0.020 | 1.916
2 | DI | -0.580 | 0.270 | 4.615 | 0.032 | 0.560
2 | LTV | 0.960 | 0.850 | 1.276 | 0.259 | 2.612
2 | IIR | 0.540 | 0.320 | 2.848 | 0.092 | 1.716
2 | MS | 0.710 | 0.330 | 4.629 | 0.031 | 2.034
2 | AGE | -0.220 | 0.150 | 2.151 | 0.142 | 0.803
2 | OLD EMI | 0.150 | 0.850 | 0.031 | 0.860 | 1.162

The reference category is 3.
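In a multinomial logistic model with a reference category, each non-reference category has its own linear predictor and the reference category's predictor is fixed at zero. The sketch below shows how those predictors convert to category probabilities. The function names and the example predictor values are hypothetical and only illustrate the mechanics.

```python
import math

def linear_predictor(intercept: float, coefficients: dict, values: dict) -> float:
    """Intercept plus the sum of coefficient * value over the listed variables."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)

def category_probabilities(etas: dict, reference: int = 3) -> dict:
    """Convert linear predictors into category probabilities.

    `etas` maps each non-reference category to its linear predictor; the
    reference category has a linear predictor of 0 by construction.
    """
    denominator = 1.0 + sum(math.exp(eta) for eta in etas.values())
    probabilities = {k: math.exp(eta) / denominator for k, eta in etas.items()}
    probabilities[reference] = 1.0 / denominator
    return probabilities

# Hypothetical linear predictors for categories 1 and 2, for illustration only.
example = category_probabilities({1: -0.5, 2: 0.3})
for category in sorted(example):
    print(f"Category {category}: {example[category]:.3f}")
```

To use only the statistically significant variables, the `coefficients` dictionary passed to `linear_predictor` would simply omit the others for each category.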

Question 7.1
Comment whether the marital status has any statistical significance on the probability of loan denial. Clearly state your reasons.

Question 7.2
What percentage of the applicants with a DI=20, LTV = 0.5, IIR = 0.8, MS = 0 and Old EMI = 0 will be given a loan at 18% interest? Use only statistically significant variables and assume that the changes in the coefficient values are negligible due to dropping of insignificant variables.

Question 8
Read the case, "Fraud analytics at MCA technology solutions - Predicting Earnings Manipulation by Indian Firms". Develop a model using logistic regression and Random Forest to predict fraudulent transactions. (Ref: Moodle/Cases and Materials/Module 3)
