
R for Everyone: Advanced Analytics and Graphics

By Jared P. Lander

Published by Addison-Wesley Professional

Published Date: Dec 20, 2013

Description


Statistical Computation for Programmers, Scientists, Quants, Excel Users, and Other Professionals


Using the open source R language, you can build powerful statistical models to answer many of your most challenging questions. R has traditionally been difficult for non-statisticians to learn, and most R books assume far too much knowledge to be of help. R for Everyone is the solution.

Drawing on his unsurpassed experience teaching new users, professional data scientist Jared P. Lander has written the perfect tutorial for anyone new to statistical programming and modeling. Organized to make learning easy and intuitive, this guide focuses on the 20 percent of R functionality you’ll need to accomplish 80 percent of modern data tasks.

Lander’s self-contained chapters start with the absolute basics, offering extensive hands-on practice and sample code. You’ll download and install R; navigate and use the R environment; master basic program control, data import, and manipulation; and walk through several essential tests. Then, building on this foundation, you’ll construct several complete models, both linear and nonlinear, and use some data mining techniques.

By the time you’re done, you won’t just know how to write R programs; you’ll be ready to tackle the statistical problems you care about most.

 

COVERAGE INCLUDES

• Exploring R, RStudio, and R packages

• Using R for math: variable types, vectors, calling functions, and more

• Exploiting data structures, including data.frames, matrices, and lists

• Creating attractive, intuitive statistical graphics

• Writing user-defined functions

• Controlling program flow with if, ifelse, and complex checks

• Improving program efficiency with group manipulations

• Combining and reshaping multiple datasets

• Manipulating strings using R’s facilities and regular expressions

• Creating normal, binomial, and Poisson probability distributions

• Programming basic statistics: mean, standard deviation, and t-tests

• Building linear, generalized linear, and nonlinear models

• Assessing model quality and selecting variables

• Preventing overfitting with the Elastic Net and Bayesian methods

• Analyzing univariate and multivariate time series data

• Grouping data via K-means and hierarchical clustering

• Preparing reports, slideshows, and web pages with knitr

• Building reusable R packages with devtools and Rcpp

• Getting involved with the R global community

 

Table of Contents

Foreword xiii

Preface xv

Acknowledgments xix

About the Author xxi

 

Chapter 1: Getting R 1

1.1 Downloading R 1

1.2 R Version 2

1.3 32-bit vs. 64-bit 2

1.4 Installing 2

1.5 Revolution R Community Edition 10

1.6 Conclusion 11

 

Chapter 2: The R Environment 13

2.1 Command Line Interface 14

2.2 RStudio 15

2.3 Revolution Analytics RPE 26

2.4 Conclusion 27

 

Chapter 3: R Packages 29

3.1 Installing Packages 29

3.2 Loading Packages 32

3.3 Building a Package 33

3.4 Conclusion 33

 

Chapter 4: Basics of R 35

4.1 Basic Math 35

4.2 Variables 36

4.3 Data Types 38

4.4 Vectors 43

4.5 Calling Functions 49

4.6 Function Documentation 49

4.7 Missing Data 50

4.8 Conclusion 51

 

Chapter 5: Advanced Data Structures 53

5.1 data.frames 53

5.2 Lists 61

5.3 Matrices 68

5.4 Arrays 71

5.5 Conclusion 72

 

Chapter 6: Reading Data into R 73

6.1 Reading CSVs 73

6.2 Excel Data 74

6.3 Reading from Databases 75

6.4 Data from Other Statistical Tools 77

6.5 R Binary Files 77

6.6 Data Included with R 79

6.7 Extract Data from Web Sites 80

6.8 Conclusion 81

 

Chapter 7: Statistical Graphics 83

7.1 Base Graphics 83

7.2 ggplot2 86

7.3 Conclusion 98

 

Chapter 8: Writing R Functions 99

8.1 Hello, World! 99

8.2 Function Arguments 100

8.3 Return Values 103

8.4 do.call 104

8.5 Conclusion 104

 

Chapter 9: Control Statements 105

9.1 if and else 105

9.2 switch 108

9.3 ifelse 109

9.4 Compound Tests 111

9.5 Conclusion 112

 

Chapter 10: Loops, the Un-R Way to Iterate 113

10.1 for Loops 113

10.2 while Loops 115

10.3 Controlling Loops 115

10.4 Conclusion 116

 

Chapter 11: Group Manipulation 117

11.1 Apply Family 117

11.2 aggregate 120

11.3 plyr 124

11.4 data.table 129

11.5 Conclusion 139

 

Chapter 12: Data Reshaping 141

12.1 cbind and rbind 141

12.2 Joins 142

12.3 reshape2 149

12.4 Conclusion 153

 

Chapter 13: Manipulating Strings 155

13.1 paste 155

13.2 sprintf 156

13.3 Extracting Text 157

13.4 Regular Expressions 161

13.5 Conclusion 169

 

Chapter 14: Probability Distributions 171

14.1 Normal Distribution 171

14.2 Binomial Distribution 176

14.3 Poisson Distribution 182

14.4 Other Distributions 185

14.5 Conclusion 186

 

Chapter 15: Basic Statistics 187

15.1 Summary Statistics 187

15.2 Correlation and Covariance 191

15.3 T-Tests 200

15.4 ANOVA 207

15.5 Conclusion 210

 

Chapter 16: Linear Models 211

16.1 Simple Linear Regression 211

16.2 Multiple Regression 216

16.3 Conclusion 232

 

Chapter 17: Generalized Linear Models 233

17.1 Logistic Regression 233

17.2 Poisson Regression 237

17.3 Other Generalized Linear Models 240

17.4 Survival Analysis 240

17.5 Conclusion 245

 

Chapter 18: Model Diagnostics 247

18.1 Residuals 247

18.2 Comparing Models 253

18.3 Cross-Validation 257

18.4 Bootstrap 262

18.5 Stepwise Variable Selection 265

18.6 Conclusion 269

 

Chapter 19: Regularization and Shrinkage 271

19.1 Elastic Net 271

19.2 Bayesian Shrinkage 290

19.3 Conclusion 295

 

Chapter 20: Nonlinear Models 297

20.1 Nonlinear Least Squares 297

20.2 Splines 300

20.3 Generalized Additive Models 304

20.4 Decision Trees 310

20.5 Random Forests 312

20.6 Conclusion 313

 

Chapter 21: Time Series and Autocorrelation 315

21.1 Autoregressive Moving Average 315

21.2 VAR 322

21.3 GARCH 327

21.4 Conclusion 336

 

Chapter 22: Clustering 337

22.1 K-means 337

22.2 PAM 345

22.3 Hierarchical Clustering 352

22.4 Conclusion 357

 

Chapter 23: Reproducibility, Reports and Slide Shows with knitr 359

23.1 Installing a LaTeX Program 359

23.2 LaTeX Primer 360

23.3 Using knitr with LaTeX 362

23.4 Markdown Tips 367

23.5 Using knitr and Markdown 368

23.6 pandoc 369

23.7 Conclusion 371

 

Chapter 24: Building R Packages 373

24.1 Folder Structure 373

24.2 Package Files 373

24.3 Package Documentation 380

24.4 Checking, Building and Installing 383

24.5 Submitting to CRAN 384

24.6 C++ Code 384

24.7 Conclusion 390

 

Appendix A: Real-Life Resources 391

A.1 Meetups 391

A.2 Stackoverflow 392

A.3 Twitter 393

A.4 Conferences 393

A.5 Web Sites 393

A.6 Documents 394

A.7 Books 394

A.8 Conclusion 394

 

Appendix B: Glossary 395

 

List of Figures 409

List of Tables 417

General Index 419

Index of Functions 429

Index of Packages 433

Index of People 435

Data Index 437

Purchase Info

ISBN-10: 0-13-325714-2

ISBN-13: 978-0-13-325714-4

Format: eBook (Watermarked)

This eBook includes the following formats, accessible from your Account page after purchase:

EPUB: The open industry format known for its reflowable content and usability on supported mobile devices.

MOBI: The eBook format compatible with the Amazon Kindle and Amazon Kindle applications.

PDF: The popular standard, used most often with the free Adobe® Reader® software.

This eBook requires no passwords or activation to read. We customize your eBook by discreetly watermarking it with your name, making it uniquely yours.

Includes EPUB, MOBI, and PDF

$31.99 $25.59
