# Introduction to Linear Regression Analysis, 5th Edition PDF By Douglas C. Montgomery, Elizabeth A. Peck and G. Geoffrey Vining

## Introduction to Linear Regression Analysis, Fifth Edition

By Douglas C. Montgomery, Elizabeth A. Peck and G. Geoffrey Vining CONTENTS

PREFACE xiii

1. INTRODUCTION 1

1.1 Regression and Model Building / 1

1.2 Data Collection / 5

1.3 Uses of Regression / 9

1.4 Role of the Computer / 10

1. SIMPLE LINEAR REGRESSION 12

2.1 Simple Linear Regression Model / 12

2.2 Least-Squares Estimation of the Parameters / 13

2.2.1 Estimation of β0 and β1 / 13

2.2.2 Properties of the Least-Squares Estimators

and the Fitted Regression Model / 18

2.2.3 Estimation of σ2 / 20

2.2.4 Alternate Form of the Model / 22

2.3 Hypothesis Testing on the Slope and Intercept / 22

2.3.1 Use of t Tests / 22

2.3.2 Testing Significance of Regression / 24

2.3.3 Analysis of Variance / 25

2.4 Interval Estimation in Simple Linear Regression / 29

2.4.1 Confidence Intervals on β0, β1 and σ2 / 29

2.4.2 Interval Estimation of the Mean Response / 30

2.5 Prediction of New Observations / 33

2.6 Coefficient of Determination / 35

2.7 A Service Industry Application of Regression / 37

2.8 Using SAS® and R for Simple Linear Regression / 39

2.9 Some Considerations in the Use of Regression / 42

2.10 Regression Through the Origin / 45

2.11 Estimation by Maximum Likelihood / 51

2.12 Case Where the Regressor x is Random / 52

2.12.1 x and y Jointly Distributed / 53

2.12.2 x and y Jointly Normally Distributed:

Correlation Model / 53

Problems / 58

1. MULTIPLE LINEAR REGRESSION 67

3.1 Multiple Regression Models / 67

3.2 Estimation of the Model Parameters / 70

3.2.1 Least-Squares Estimation of the Regression Coefficients / 71

3.2.2 Geometrical Interpretation of Least Squares / 77

3.2.3 Properties of the Least-Squares Estimators / 79

3.2.4 Estimation of σ2 / 80

3.2.5 Inadequacy of Scatter Diagrams in Multiple Regression / 82

3.2.6 Maximum-Likelihood Estimation / 83

3.3 Hypothesis Testing in Multiple Linear Regression / 84

3.3.1 Test for Significance of Regression / 84

3.3.2 Tests on Individual Regression Coefficients and Subsets of Coefficients / 88

3.3.3 Special Case of Orthogonal Columns in X / 93

3.3.4 Testing the General Linear Hypothesis / 95

3.4 Confidence Intervals in Multiple Regression / 97

3.4.1 Confidence Intervals on the Regression Coefficients / 98

3.4.2 CI Estimation of the Mean Response / 99

3.4.3 Simultaneous Confidence Intervals on Regression

Coefficients / 100

3.5 Prediction of New Observations / 104

3.6 A Multiple Regression Model for the Patient Satisfaction Data / 104

3.7 Using SAS and R for Basic Multiple Linear Regression / 106

3.8 Hidden Extrapolation in Multiple Regression / 107

3.9 Standardized Regression Coefficients / 111

3.10 Multicollinearity / 117

3.11 Why Do Regression Coefficients Have the Wrong Sign? / 119

Problems / 121

4.1 Introduction / 129

4.2 Residual Analysis / 130

4.2.1 Definition of Residuals / 130

4.2.2 Methods for Scaling Residuals / 130

4.2.3 Residual Plots / 136

4.2.4 Partial Regression and Partial Residual Plots / 143

4.2.5 Using Minitab®, SAS, and R for Residual Analysis / 146

4.2.6 Other Residual Plotting and Analysis Methods / 149

4.3 PRESS Statistic / 151

4.4 Detection and Treatment of Outliers / 152

4.5 Lack of Fit of the Regression Model / 156

4.5.1 Formal Test for Lack of Fit / 156

4.5.2 Estimation of Pure Error from Near Neighbors / 160

Problems / 165

1. TRANSFORMATIONS AND WEIGHTING

5.1 Introduction / 171

5.2 Variance-Stabilizing Transformations / 172

5.3 Transformations to Linearize the Model / 176

5.4 Analytical Methods for Selecting a Transformation / 182

5.4.1 Transformations on y: The Box–Cox Method / 182

5.4.2 Transformations on the Regressor Variables / 184

5.5 Generalized and Weighted Least Squares / 188

5.5.1 Generalized Least Squares / 188

5.5.2 Weighted Least Squares / 190

5.5.3 Some Practical Issues / 191

5.6 Regression Models with Random Effect / 194

5.6.1 Subsampling / 194

5.6.2 The General Situation for a Regression Model with a Single Random Effect / 198

5.6.3 The Importance of the Mixed Model in Regression / 202

Problems / 202

1. DIAGNOSTICS FOR LEVERAGE AND INFLUENCE 211

6.1 Importance of Detecting Influential Observations / 211

6.2 Leverage / 212

6.3 Measures of Influence: Cook’s D / 215

6.4 Measures of Influence: DFFITS and DFBETAS / 217

6.5 A Measure of Model Performance / 219

6.6 Detecting Groups of Influential Observations / 220

6.7 Treatment of Influential Observations / 220

Problems / 221

1. POLYNOMIAL REGRESSION MODELS 223

7.1 Introduction / 223

7.2 Polynomial Models in One Variable / 223

7.2.1 Basic Principles / 223

7.2.2 Piecewise Polynomial Fitting (Splines) / 229

7.2.3 Polynomial and Trigonometric Terms / 235

7.3 Nonparametric Regression / 236

7.3.1 Kernel Regression / 237

7.3.2 Locally Weighted Regression (Loess) / 237

7.3.3 Final Cautions / 241

7.4 Polynomial Models in Two or More Variables / 242

7.5 Orthogonal Polynomials / 248

Problems / 254

1. INDICATOR VARIABLES 260

8.1 General Concept of Indicator Variables / 260

8.2 Comments on the Use of Indicator Variables / 273

8.2.1 Indicator Variables versus Regression on Allocated

Codes / 273

8.2.2 Indicator Variables as a Substitute for a Quantitative

Regressor / 274

8.3 Regression Approach to Analysis of Variance / 275

Problems / 280

1. MULTICOLLINEARITY 285

9.1 Introduction / 285

9.2 Sources of Multicollinearity / 286

9.3 Effects of Multicollinearity / 288

9.4 Multicollinearity Diagnostics / 292

9.4.1 Examination of the Correlation Matrix / 292

9.4.2 Variance Inflation Factors / 296

9.4.3 Eigensystem Analysis of X’X / 297

9.4.4 Other Diagnostics / 302

9.4.5 SAS and R Code for Generating Multicollinearity

Diagnostics / 303

9.5 Methods for Dealing with Multicollinearity / 303

9.5.1 Collecting Additional Data / 303

9.5.2 Model Respecification / 304

9.5.3 Ridge Regression / 304

9.5.4 Principal-Component Regression / 313

9.5.5 Comparison and Evaluation of Biased Estimators / 319

9.6 Using SAS to Perform Ridge and Principal-Component

Regression / 321

Problems / 323

1. VARIABLE SELECTION AND MODEL BUILDING 327

10.1 Introduction / 327

10.1.1 Model-Building Problem / 327

10.1.2 Consequences of Model Misspecification / 329

10.1.3 Criteria for Evaluating Subset Regression Models / 332

10.2 Computational Techniques for Variable Selection / 338

10.2.1 All Possible Regressions / 338

10.2.2 Step wise Regression Methods / 344

10.3 Strategy for Variable Selection and Model Building / 351

10.4 Case Study: Gorman and Toman Asphalt Data Using SAS / 354

Problems / 367

1. VALIDATION OF REGRESSION MODELS 372

11.1 Introduction / 372

11.2 Validation Techniques / 373

11.2.1 Analysis of Model Coefficients and Predicted Values / 373

11.2.2 Collecting Fresh Data—Confirmation Runs / 375

11.2.3 Data Splitting / 377

11.3 Data from Planned Experiments / 385

Problems / 386

1. INTRODUCTION TO NONLINEAR REGRESSION 389

12.1 Linear and Nonlinear Regression Models / 389

12.1.1 Linear Regression Models / 389

12.2.2 Nonlinear Regression Models / 390

12.2 Origins of Nonlinear Models / 391

12.3 Nonlinear Least Squares / 395

12.4 Transformation to a Linear Model / 397

12.5 Parameter Estimation in a Nonlinear System / 400

12.5.1 Linearization / 400

12.5.2 Other Parameter Estimation Methods / 407

12.5.3 Starting Values / 408

12.6 Statistical Inference in Nonlinear Regression / 409

12.7 Examples of Nonlinear Regression Models / 411

12.8 Using SAS and R / 412

Problems / 416

1. GENERALIZED LINEAR MODELS 421

13.1 Introduction / 421

13.2 Logistic Regression Models / 422

13.2.1 Models with a Binary Response Variable / 422

13.2.2 Estimating the Parameters in a Logistic Regression Model / 423

13.2.3 Interpretation of the Parameters in a Logistic Regression Model / 428

13.2.4 Statistical Inference on Model Parameters / 430

13.2.5 Diagnostic Checking in Logistic Regression / 440

13.2.6 Other Models for Binary Response Data / 442

13.2.7 More Than Two Categorical Outcomes / 442

13.3 Poisson Regression / 444

13.4 The Generalized Linear Model / 450

13.4.1 Link Functions and Linear Predictors / 451

13.4.2 Parameter Estimation and Inference

in the GLM / 452

13.4.3 Prediction and Estimation with the GLM / 454

13.4.4 Residual Analysis in the GLM / 456

13.4.5 Using R to Perform GLM Analysis / 458

13.4.6 Overdispersion / 461

Problems / 462

1. REGRESSION ANALYSIS OF TIME SERIES DATA 474

14.1 Introduction to Regression Models for Time Series Data / 474

14.2 Detecting Autocorrelation: The Durbin-Watson Test / 475

14.3 Estimating the Parameters in Time Series Regression Models / 480

Problems / 496

1. OTHER TOPICS IN THE USE OF REGRESSION ANALYSIS 500

15.1 Robust Regression / 500

15.1.1 Need for Robust Regression / 500

15.1.2 M-Estimators / 503

15.1 .3 Properties of Robust Estimators / 510

15.2 Effect of Measurement Errors in the Regressors / 511

15.2.1 Simple Linear Regression / 511

15.2.2 The Berkson Model / 513

15.3 Inverse Estimation—The Calibration Problem / 513

15.4 Bootstrapping in Regression / 517

15.4.1 Bootstrap Sampling in Regression / 518

15.4.2 Bootstrap Confidence Intervals / 519

15.5 Classification and Regression Trees (CART) / 524

15.6 Neural Networks / 526

15.7 Designed Experiments for Regression / 529

Problems / 537

APPENDIX A. STATISTICAL TABLES 541

APPENDIX B. DATA SETS FOR EXERCISES 553

APPENDIX C. SUPPLEMENTAL TECHNICAL MATERIAL 574

C.1 Background on Basic Test Statistics / 574

C.2 Background from the Theory of Linear Models / 577

C.3 Important Results on SSR and SSRes / 581

C.4 Gauss-Markov Theorem, Var(ε) = σ2I / 587

C.5 Computational Aspects of Multiple Regression / 589

C.6 Result on the Inverse of a Matrix / 590

C.7 Development of the PRESS Statistic / 591

C.8 Development of S2 (i) / 593

C.9 Outlier Test Based on R-Student / 594

C.10 Independence of Residuals and Fitted Values / 596

C.11 Gauss–Markov Theorem, Var(ε) = V / 597

C.12 Bias in MSRes When the Model Is Underspecified / 599

C.13 Computation of Influence Diagnostics / 600

C.14 Generalized Linear Models / 601

APPENDIX D. INTRODUCTION TO SAS 613

D.1 Basic Data Entry / 614

D.2 Creating Permanent SAS Data Sets / 618

D.3 Importing Data from an EXCEL File / 619

D.4 Output Command / 620

D.5 Log File / 620

D.6 Adding Variables to an Existing SAS Data Set / 622

APPENDIX E. INTRODUCTION TO R TO PERFORM LINEAR

REGRESSION ANALYSIS 623

E.1 Basic Background on R / 623

E.2 Basic Data Entry / 624

E.3 Brief Comments on Other Functionality in R / 626

E.4 R Commander / 627

REFERENCES 628

INDEX 642

This book is US\$10. Order for this book: