To illustrate the importance of model selection, we will use only the first four observations to estimate the function f such that init.h = f(h.d) for two different models.
We will then use the last three observations to compare the quality of the different estimations.
(a) We first consider a quadratic regression model: f(x) = ax^2 + bx + c. We carry out the regression of h.d on the vector (init.h, init.h^2).
f1=g1[1:4]
f2=g2[1:4]
LinReg1=lm(f2 ~ f1+I(f1^2))
summary(LinReg1)
What are the estimated values of a, b, c? What is the value of the coefficient of determination?
(b) We next consider the regression model: f(x) = ax^2 + bx + c + d exp(x/20). We carry out the regression of h.d on the vector (init.h, init.h^2, exp(init.h/20)).
LinReg2=lm(f2 ~ f1+I(f1^2)+I(exp(f1/20)))
summary(LinReg2)
What are the estimated values of a, b, c, d? What is the value of the coefficient of determination? Is this sufficient to conclude that this model is better than the quadratic model?
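As a hint for the last question, it can help to look at what happens when a model has as many coefficients as there are observations. The sketch below uses made-up values (x and y are hypothetical and are not the exercise data g1 and g2) and computes the coefficient of determination by hand.

```r
# Hypothetical data: four observations, NOT the exercise's g1 and g2.
x <- c(10, 20, 30, 40)
y <- c(12, 26, 40, 61)

# Four coefficients (intercept, x, x^2, exp(x/20)) fitted to four points.
fit <- lm(y ~ x + I(x^2) + I(exp(x/20)))

# Residual degrees of freedom: number of observations minus number of coefficients.
rdf <- df.residual(fit)

# Coefficient of determination computed from its definition.
r2 <- 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)

print(rdf)
print(r2)
```

With no residual degrees of freedom left, the fitted curve passes through every point, so the coefficient of determination alone says nothing about how the model behaves on new observations.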
(c) To better answer the previous question, we compute and display the predictions obtained with each of the previous models, and compare them with the last three observations.
t=(2500:6000)/10
new=data.frame(f1=t)
pred1=predict(LinReg1, new, interval ="none")
pred2=predict(LinReg2, new, interval ="none")
par(bg='cornsilk')
plot(g1,g2,pch = 20,col="black",cex=2,ylim=c(0,1000))
points(t,pred1,pch = 20,col="blue",cex=0.2)
points(t,pred2,pch = 20,col="red",cex=0.2)
What is your conclusion?
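The visual comparison can be complemented by a numerical one. The following is a minimal sketch, assuming made-up data (the vectors x and y are hypothetical placeholders, not g1 and g2) and a helper named rmse introduced here for illustration. It fits both models on the first four points and measures the root mean squared prediction error on the last three.

```r
# Hypothetical seven observations (placeholders, not the exercise data).
d <- data.frame(x = c(10, 20, 30, 40, 50, 60, 70),
                y = c(12, 26, 40, 61, 81, 108, 134))

train <- 1:4   # observations used for estimation
test  <- 5:7   # observations held out for comparison

m1 <- lm(y ~ x + I(x^2), data = d[train, ])
m2 <- lm(y ~ x + I(x^2) + I(exp(x/20)), data = d[train, ])

# Root mean squared error of a fitted model on the held-out rows.
rmse <- function(model) {
  p <- predict(model, newdata = d[test, ])
  sqrt(mean((d$y[test] - p)^2))
}

errors <- c(quadratic = rmse(m1), with_exp = rmse(m2))
print(errors)
```

The model with the smaller held-out error predicts better on data it has not seen, which is the comparison the plot makes visually.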
(d) We now use the R package leaps to perform the model selection.
library(leaps)
f1=g1[1:5]
f2=g2[1:5]
explain=matrix(c(f1,f1^2,exp(f1/20)),ncol=3)
leaps(x=explain, y=f2)
What do we use the instruction explain=matrix(c(f1,f1^2,exp(f1/20)),ncol=3) for? Given the value of the Cp criterion, which one of the seven models do you select?
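To see what the matrix instruction produces, one can run it on a small vector in base R. The values of f1 below are hypothetical and chosen only to keep the output readable.

```r
# Hypothetical values, not the exercise data.
f1 <- c(1, 2, 3)

# c() concatenates the three vectors; matrix() fills column by column,
# so each vector ends up in its own column.
explain <- matrix(c(f1, f1^2, exp(f1/20)), ncol = 3)

print(dim(explain))   # number of rows and columns
print(explain)
```

Each column then holds one candidate explanatory variable, and each row holds one observation.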