14  Additional resources

14.1 Textbooks & Notes used

These textbooks and class notes provide the basis for much of the material and code presented.

  1. (Shalizi 2013): Advanced data analysis from an elementary point of view

  2. (Hoff 2009): A first course in Bayesian statistical methods

  3. Rob McCulloch’s homepage: https://www.rob-mcculloch.org/

  4. Richard Hahn’s homepage: https://math.la.asu.edu/~prhahn/resources.html

  5. (Gramacy 2020): Surrogates

  6. Hedibert Lopes’s homepage: https://hedibert.org/previous-teaching/

Other resources that Demetri and Samantha have found useful or that have been strongly recommended to them:

  • Bayes Rules!: (Johnson, Ott, and Dogucu 2022) is a nice quarto bookdown project on Bayesian statistics.

  • Statistical Rethinking: (McElreath 2018) is a practical introduction to Bayesian statistics.

  • Gaussian Processes for Machine Learning: (Williams and Rasmussen 2006) is a cool textbook on Gaussian processes.

  • Bayesian Optimization: (Garnett 2023) is a good tutorial on Bayesian optimization, which includes a lot of mathematical detail as well.


  • Linear Models in Statistics: (Rencher and Schaalje 2008) is a good place to learn a lot of the linear algebra necessary to understand methodological statistical work.

  • Machine learning: a probabilistic perspective (version 1): (Murphy 2012) provides a thorough examination of many seminal topics in machine learning, with well-motivated distinctions between different tools. In particular, this book casts many machine learning methods in terms of the output being a sum of many functions of the inputs and the parameters, which are learned from the data. Understanding this “adaptive basis” framework is a crucial step to a thorough comprehension of many modern statistical tools.


  • Modern Data Science with R: (Baumer, Kaplan, and Horton 2017) is an introduction to data analytics, and is a nice resource for data visualization. Demetri’s first statistics book. Has a big emphasis on baseball data and one of the authors was even a college player!

  • Python for Data Science: This bookdown project is a nice introduction to data analytics using Python.


  • Probability Theory: The Logic of Science: (Jaynes 2003) talks about the philosophy behind probability and Bayesian statistics.

  • Basic Statistics: (Blackwell 1969) is a classic statistical textbook.

  • All of Statistics and All of Non-Parametric Statistics: Both by the esteemed Larry Wasserman (Wasserman 2004) & (Wasserman 2006).

  • Introduction to Mathematical Statistics: (Hogg et al. 2013) is a personal favorite undergraduate/graduate statistics text.


  • A Guide on Data Analysis: (Nguyen 2020) is a great bookdown with thorough explanations on a lot of topics.

  • Causal Inference: The Mixtape: (Cunningham 2021) has a nice bookdown as well. This is a really nice and accessible introduction to causal inference with many applications in economics and sociology.

  • Causal inference in R: (Barrett, McGowan D’Agostino, and Gerke 2025) once again, a good bookdown introduction to causal inference with R.

  • Counterfactuals and Causal Inference: (Morgan 2015) provides a more thorough rundown of the basis of causal inference, again with applications to the social sciences.

  • Statistics and Causal Inference: (Holland 1986) provides a really elegant overview of the fundamental difficulties in causal analyses. Here is a link to this highly recommended (and fairly brief, by academic standards) read.

  • Interpretable machine learning: Linked here, (Molnar 2020) is a nice summary of the interpretability in machine learning hoopla, with valid criticisms of the field. Supervised Machine Learning for Science is another read by the same author, which also looks promising (Molnar and Freiesleben 2024).

  • Identification for Prediction and Decision: (Manski 2009) is a good primer on statistical identification, with applications to the social sciences.

  • Data Analysis for Social Science: A Friendly and Practical Introduction: (Llaudet and Imai 2022) is another application book designed for quantitative social science.

  • Probability and Bayesian Modeling: (Albert and Hu 2019) is a nice book on Bayesian modeling with a specific focus on baseball applications. Albert and Hu wrote a few nice Bayesian textbooks with baseball applications, and there are actually a lot of baseball data you can scrape easily using the R package associated with the book.


  • Stats of 1: This initiative promotes the idea that studies with only 1 person, where data on the individual constitutes the population, are the way to go for future medical research. Worth a glance at least.


  • Fancy data tables in R: A cool bookdown on how to make cool visuals in R.

  • D3.JS in observable: An Observable page with tutorials on how to make cool interactive visuals for webpages.

  • Bayesian data analysis blog: Blog from Danielle Navarro, focusing on Bayesian pharmacology. Fun writing style with cool plots and well thought out analyses.

  • R data science blog: Blog from Andrew Heiss with beautiful visualizations.

  • Another blog on Bayesian statistics and data science: This blog from Brian Lookabough has some really cool entries with fun data examples and interesting teaching points made via simulation. Additionally, the aesthetic appeal of the blog posts is high, and Brian does a good job documenting a proper data analysis workflow in the code, which is a good trait for a young professional data scientist eager to enter the industry.

  • Probably overthinking it: A cool Bayesian statistics blog by Allen Downey.

  • storytelling with data: A cool handbook on data visualization by Cole Nussbaumer Knaflic. Good suggestions we try to follow, even if she makes the graphics in Excel!

A shoutout is also due for Andrej Prsa’s “Modeling” course given at Villanova in spring 2018, which motivated some topics included in this course that we do not often see in a statistics course. Other useful courses for the authors whose content/assignments helped motivate some of the topics or were the source of different datasets include classes taught by Sebastien Motsch, Nicolas Lanchier, Shuang Zhou, and Eric Kostelich at Arizona State University, as well as courses taught by Georgia Papaefthymiou-Davis, Michael Posner, and David Chuss at Villanova University.

General statistical and modeling discussions with Drew Herren, Richard Hahn, and Rafael Alcantara have been very helpful in shaping the contents of this course (knowingly or unknowingly to them). Similar discussions with JJ Ruby and the R&D department with the Houston Astros, as well as with former colleagues Bryce Barclay, Vincent Mutolo, and Hezekiah Grayer also warrant significant praise.

Finally, we owe a great debt to the wonderful stochtree project, the hard work of Andrew Herren, Richard Hahn, Jared Murray, and others. This project is a valiant effort and will hopefully help more people have access to BART, the most talked about model in these notes and one that we very strongly advocate for.

14.2 Color schemes

options(warn=-1)
suppressMessages(library(gt))
df = data.frame(color=c('#55AD89', '#073d6d', '#012296', '#3f59b5', '#2552d9', '#263762',
                        '#6f95d2', '#7BAFD4', '#6F372C', '#8b1a1a', '#cd2626', '#7c1d1d',
                        '#760e1e', '#631b36', '#551842', '#7f3b08', '#d47c17', '#FD8700',
                        '#32d9d9', '#028090', '#42656b', '#212E52', '#404A79', '#727AA0',
                        '#7D82BB', '#A3A7D2', '#CED0E8', '#A49FB6', '#592f8c', '#2d004b',
                        '#674897', '#aba3cd', '#702963', '#Da70D6', '#E0B0FF', '#f8f9fa')) %>%
  dplyr::arrange(color)
# Uncomment to order the colors
#noquote(paste0("'", df$color, "'", sep="", collapse=","))
df %>%
  gt() %>%
  data_color(columns=vars(color),
             colors=c('#012296', '#028090', '#073d6d', '#212E52', '#2552d9', '#263762',
                      '#2d004b', '#32d9d9', '#3f59b5', '#404A79', '#42656b', '#551842',
                      '#55AD89', '#592f8c', '#631b36', '#674897', '#6F372C', '#6f95d2',
                      '#702963', '#727AA0', '#760e1e', '#7BAFD4', '#7c1d1d', '#7D82BB',
                      '#7f3b08', '#8b1a1a', '#A3A7D2', '#A49FB6', '#aba3cd', '#cd2626',
                      '#CED0E8', '#d47c17', '#Da70D6', '#E0B0FF', '#f8f9fa', '#FD8700'))
color
#012296
#028090
#073d6d
#212E52
#2552d9
#263762
#2d004b
#32d9d9
#3f59b5
#404A79
#42656b
#551842
#55AD89
#592f8c
#631b36
#674897
#6F372C
#6f95d2
#702963
#727AA0
#760e1e
#7BAFD4
#7D82BB
#7c1d1d
#7f3b08
#8b1a1a
#A3A7D2
#A49FB6
#CED0E8
#Da70D6
#E0B0FF
#FD8700
#aba3cd
#cd2626
#d47c17
#f8f9fa

14.3 Some helpful R functions

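# Super helpful function: the negation of %in%
# https://stackoverflow.com/questions/5831794/opposite-of-in-exclude-rows-with-values-specified-in-a-vector
'%!in%' <- function(x, y) !('%in%'(x, y))

# Root mean squared error between m and o
RMSE <- function(m, o) {
  sqrt(mean((m - o)^2))
}

# Rescale x to the [0, 1] interval; na.rm is passed on to min() and max()
normalize <- function(x, na.rm = TRUE) {
  return((x - min(x, na.rm = na.rm)) / (max(x, na.rm = na.rm) - min(x, na.rm = na.rm)))
}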
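
A quick sketch of how these helpers might be used. The helper definitions are repeated so the snippet runs on its own, and the toy vectors (`states`, `excluded`, `predicted`, `observed`) are hypothetical values made up for illustration. The `normalize` shown here forwards `na.rm` to `min()` and `max()`, which is an assumption about the intended behavior.

```r
# Helper definitions repeated so the example is self-contained.
'%!in%' <- function(x, y) !('%in%'(x, y))
RMSE <- function(m, o) sqrt(mean((m - o)^2))
# Assumption: na.rm is forwarded to min() and max().
normalize <- function(x, na.rm = TRUE) {
  lo <- min(x, na.rm = na.rm)
  hi <- max(x, na.rm = na.rm)
  (x - lo) / (hi - lo)
}

# Hypothetical toy data, for illustration only.
states   <- c("AZ", "PA", "TX")
excluded <- c("PA")
kept <- states[states %!in% excluded]   # keep states not in the excluded set

predicted <- c(1, 2, 3)
observed  <- c(1, 2, 5)
err <- RMSE(predicted, observed)        # sqrt(mean(c(0, 0, 4)))

scaled    <- normalize(c(2, 4, 6, 10))  # minimum maps to 0, maximum to 1
scaled_na <- normalize(c(1, NA, 3))     # the NA stays NA, the rest is rescaled

print(kept)
print(err)
print(scaled)
print(scaled_na)
```

Note that `normalize` divides by the range of `x`, so a constant vector gives a zero denominator and returns `NaN` values.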

14.5 Visual of gas prices in the U.S. by state (unique to us!)

A datawrapper equivalent: