To illustrate the importance of model selection, we will use only the first four observations to estimate the function $f$ such that $\text{h.d} = f(\text{init.h})$ for two different models.

We will then use the last three observations to compare the quality of the different estimations.

(a) We consider first a quadratic regression model: $f(x) = ax^2 + bx + c$. We carry out the regression of h.d on the vector $(\text{init.h}, \text{init.h}^2)$:

f1=g1[1:4]

f2=g2[1:4]

LinReg1=lm(f2 ~ f1+I(f1^2))

summary(LinReg1)

What are the estimated values of $a$, $b$, $c$? What is the value of the coefficient of determination?
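The quantities asked for can also be read programmatically from the fitted object rather than from the printed summary. The sketch below uses made-up numbers (the vectors `x` and `y` are hypothetical and are not the worksheet's `g1` and `g2`) and only shows where the estimates and the coefficient of determination are stored.

```r
# Hypothetical data: NOT the worksheet's g1 and g2.
x <- c(10, 20, 30, 40)
y <- c(12, 30, 59, 92)

# Quadratic model y = a*x^2 + b*x + c
fit <- lm(y ~ x + I(x^2))

# coef() returns a named vector: "(Intercept)", "x", "I(x^2)"
coefs <- coef(fit)
c_hat <- unname(coefs["(Intercept)"])
b_hat <- unname(coefs["x"])
a_hat <- unname(coefs["I(x^2)"])

# Coefficient of determination stored in the summary object
r2 <- summary(fit)$r.squared

print(c(a = a_hat, b = b_hat, c = c_hat))
print(r2)
```

With four observations and three coefficients, only one residual degree of freedom remains, so a value of the coefficient of determination close to 1 is to be expected and should be read with caution.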

(b) We consider next the regression model: $f(x) = ax^2 + bx + c + d\exp(x/20)$. We carry out the regression of h.d on the vector $(\text{init.h}, \text{init.h}^2, e^{\text{init.h}/20})$:

LinReg2=lm(f2 ~ f1+I(f1^2)+I(exp(f1/20)))

summary(LinReg2)

What are the estimated values of $a$, $b$, $c$, $d$? What is the value of the coefficient of determination? Is this sufficient to conclude that this model is better than the quadratic model?
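One point worth checking before comparing the two coefficients of determination is how many parameters each model has relative to the number of observations. The following sketch, again on hypothetical data (`x` and `y` are invented for illustration), counts the residual degrees of freedom of both fits and measures the largest residual of the four-parameter model.

```r
# Hypothetical data: NOT the worksheet's g1 and g2.
x <- c(10, 20, 30, 40)
y <- c(12, 30, 59, 92)

fit_quad <- lm(y ~ x + I(x^2))                    # 3 coefficients
fit_exp  <- lm(y ~ x + I(x^2) + I(exp(x / 20)))   # 4 coefficients

# Residual degrees of freedom = number of observations - number of coefficients
df_quad <- df.residual(fit_quad)
df_exp  <- df.residual(fit_exp)

# Largest absolute residual of the 4-parameter fit (numerically zero)
max_res_exp <- max(abs(residuals(fit_exp)))

# R^2 computed by hand as 1 - RSS/TSS
r2_exp <- 1 - sum(residuals(fit_exp)^2) / sum((y - mean(y))^2)

print(c(df_quad = df_quad, df_exp = df_exp))
print(max_res_exp)
print(r2_exp)
```

A model with as many coefficients as observations passes through every training point, so its coefficient of determination on those points says nothing about how well it predicts new observations.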

(c) To better answer the previous question, we compute and display the predictions obtained with each of the previous models, and compare them with the last three observations.

t=(2500:6000)/10

new=data.frame(f1=t)

pred1=predict(LinReg1, new, interval ="none")

pred2=predict(LinReg2, new, interval ="none")

par(bg='cornsilk')

plot(g1,g2,pch = 20,col="black",cex=2,ylim=c(0,1000))

points(t,pred1,pch = 20,col="blue",cex=0.2)

points(t,pred2,pch = 20,col="red",cex=0.2)

What is your conclusion?
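The visual comparison can be complemented by a numerical one: fit on the first four points, predict the remaining three, and summarise the prediction error. The sketch below is one possible way to do this, assuming seven invented observations (`g1_demo` and `g2_demo` are hypothetical, as is the helper `rmse`); it does not reproduce the worksheet's data or its result.

```r
# Hypothetical data (seven points): NOT the worksheet's g1 and g2.
g1_demo <- c(10, 20, 30, 40, 50, 60, 70)
g2_demo <- c(12, 30, 59, 92, 140, 195, 260)

# First four observations for fitting, last three held out
train   <- data.frame(f1 = g1_demo[1:4], f2 = g2_demo[1:4])
holdout <- data.frame(f1 = g1_demo[5:7], f2 = g2_demo[5:7])

fit1 <- lm(f2 ~ f1 + I(f1^2), data = train)
fit2 <- lm(f2 ~ f1 + I(f1^2) + I(exp(f1 / 20)), data = train)

# Predictions at the held-out abscissas
pred1 <- predict(fit1, newdata = holdout)
pred2 <- predict(fit2, newdata = holdout)

# Root mean squared prediction error (hypothetical helper)
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
rmse1 <- rmse(holdout$f2, pred1)
rmse2 <- rmse(holdout$f2, pred2)

print(c(quadratic = rmse1, with_exp = rmse2))
```

The model with the smaller held-out error is the one that generalises better on these points, regardless of which one fits the training points more closely.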

(d) We now use the R package leaps to perform the model selection.

library(leaps)

f1=g1[1:5]

f2=g2[1:5]

explain=matrix(c(f1,f1^2,exp(f1/20)),ncol=3)

leaps(x=explain,y=f2)

What do we use the instruction explain=matrix(c(f1,f1^2,exp(f1/20)),ncol=3) for? Given the value of the $C_p$ criterion, which one of the seven models do you select?
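To see where the seven models and their $C_p$ values come from, the criterion can be recomputed by hand in base R. The sketch below assumes the usual definition of Mallows' criterion, $C_p = \mathrm{RSS}_p / \hat\sigma^2 - n + 2p$, with $\hat\sigma^2$ estimated from the full model and $p$ counting the intercept. The data `x` and `y` are hypothetical, and whether the values agree exactly with the output of `leaps()` is something to verify rather than take for granted.

```r
# Hypothetical data (five points): NOT the worksheet's g1 and g2.
x <- c(10, 20, 30, 40, 50)
y <- c(12, 30, 59, 92, 140)

# One column per candidate explanatory variable
explain <- matrix(c(x, x^2, exp(x / 20)), ncol = 3)
n <- length(y)

# Residual variance estimated from the full model (all three columns)
full   <- lm(y ~ explain)
sigma2 <- sum(residuals(full)^2) / df.residual(full)

# All 2^3 - 1 = 7 non-empty subsets of the three columns
subsets <- as.matrix(expand.grid(c(FALSE, TRUE), c(FALSE, TRUE), c(FALSE, TRUE)))[-1, ]

# Mallows' Cp for each subset; p counts the intercept as well
cp <- apply(subsets, 1, function(keep) {
  fit <- lm(y ~ explain[, keep, drop = FALSE])
  p <- sum(keep) + 1
  sum(residuals(fit)^2) / sigma2 - n + 2 * p
})

print(cbind(subsets, Cp = cp))
best <- subsets[which.min(cp), ]   # subset with the smallest Cp
print(best)
```

By construction the full model always has $C_p = p$ (here 4), so the comparison of interest is whether a smaller model reaches a lower value.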