## Bayesian linear regression for applied mathematicians' salary

In this document, I show a simple Bayesian linear regression model for applied mathematicians’ salary. I compare the same model with two different priors on the coefficients: normal and Laplace.

```
library(R2jags)
library(tidyverse)
library(GGally)
library(knitr)
library(gridExtra)
set.seed(124362)
theme_set(theme_bw())
opts_chunk$set(echo = T, message=F, warning=FALSE, cache = T, eval = F)
```

Consider the data given shown in the following table which describes a data set used to evaluate the relation between intermediate and senior level annual salaries of bachelor’s and master’s level mathematicians (\(y\), in thousand dollars) and an index of work quality (\(x1\)), number of years of experience (\(x2\)), and an index of publication success (\(x3\)).

```
salaries_string <-
" y x1 x2 x3
33.2 3.5 9.0 6.1
40.3 5.3 20.0 6.4
38.7 5.1 18.0 7.4
46.8 5.8 33.0 6.7
41.4 4.2 31.0 7.5
37.5 6.0 13.0 5.9
39.0 6.8 25.0 6.0
40.7 5.5 30.0 4.0
30.1 3.1 5.0 5.8
52.9 7.2 47.0 8.3
38.2 4.5 25.0 5.0
31.8 4.9 11.0 6.4
43.3 8.0 23.0 7.6
44.1 6.5 35.0 7.0
42.8 6.6 39.0 5.0
33.6 3.7 21.0 4.4
34.2 6.2 7.0 5.5
48.0 7.0 40.0 7.0
38.0 4.0 35.0 6.0
35.9 4.5 23.0 3.5
40.4 5.9 33.0 4.9
36.8 5.6 27.0 4.3
45.2 4.8 34.0 8.0
35.1 3.9 15.0 5.0
"
dat <- read.table(textConnection(salaries_string), header = T)
dat %>%
kable()
```

y | x1 | x2 | x3 |
---|---|---|---|

33.2 | 3.5 | 9 | 6.1 |

40.3 | 5.3 | 20 | 6.4 |

38.7 | 5.1 | 18 | 7.4 |

46.8 | 5.8 | 33 | 6.7 |

41.4 | 4.2 | 31 | 7.5 |

37.5 | 6.0 | 13 | 5.9 |

39.0 | 6.8 | 25 | 6.0 |

40.7 | 5.5 | 30 | 4.0 |

30.1 | 3.1 | 5 | 5.8 |

52.9 | 7.2 | 47 | 8.3 |

38.2 | 4.5 | 25 | 5.0 |

31.8 | 4.9 | 11 | 6.4 |

43.3 | 8.0 | 23 | 7.6 |

44.1 | 6.5 | 35 | 7.0 |

42.8 | 6.6 | 39 | 5.0 |

33.6 | 3.7 | 21 | 4.4 |

34.2 | 6.2 | 7 | 5.5 |

48.0 | 7.0 | 40 | 7.0 |

38.0 | 4.0 | 35 | 6.0 |

35.9 | 4.5 | 23 | 3.5 |

40.4 | 5.9 | 33 | 4.9 |

36.8 | 5.6 | 27 | 4.3 |

45.2 | 4.8 | 34 | 8.0 |

35.1 | 3.9 | 15 | 5.0 |

The data can be found in any of the following links:

- http://www.stat.purdue.edu/~boli/stat512/datasets/CH06PR18.DAT
- http://www.stat.purdue.edu/~wsharaba/stat512/Datasets/ch06pr18.dat
- http://www.stat.missouri.edu/~dsun/321/hw8.dat
- http://allman.rhon.itam.mx/~lnieto/index_archivos/salarios.txt
- http://www.utsc.utoronto.ca/~butler/r/mathsal.txt
- http://people.stat.sfu.ca/~pritamr/stat350_05/data/MathSalaries-data_dat.htm

Let’s suppose we’re interested in the relationship between the salary and the covariates already mentioned.

We can see the relationship between the variables in the following plot.

`ggpairs(dat) `

Salary has a positive correlation with the three covariates, although x3 seems to have a decreasing variance, while x1 and x2 seem to have a constant variance. We’ll ignore this for now and assume that there’s a linear relation with constant variance between the salary and the covariates.

### 1) Normal a priori distribution

We’ll adjust a linear model of the following form:

\[ Y_i = \beta_1 + \beta_2 x_{i1} + \beta_3 x_{i2} + \beta_4 x_{i3} + \varepsilon_i \\ \varepsilon \sim \mathrm{N}(\underline{0}, \lambda I) \\ \beta_j \sim \mathrm{N}(0, \lambda_j) \\ \lambda \sim \mathrm{Ga}(a, b) \]

for \(i = 1, 2, ..., n\), \(j = 1, ..., 4\), and \(n = 24\). Where \(\lambda\) is the precision, i.e., if \(\sigma^2\) is the variance, then \(\lambda = \frac{1}{\sigma^2}\). We’’l use non-informative prior distributions, so \(\lambda_j = 0.001\) for each \(j = 1, ..., 4\), \(a = 0.001\) and \(b = 0.001\).

We’ll implement the model in JAGS. The description of the model is:

```
mod_string_1 <- "
model
{
#Likelihood
for (i in 1:n) {
y[i] ~ dnorm(mu[i], lambda)
mu[i]<-beta[1]+beta[2]*x1[i] + beta[3]*x2[i] + beta[4]*x3[i]
}
#Priors
for (j in 1:4) {
beta[j] ~ dnorm(0,0.001)
}
lambda ~ dgamma(0.001,0.001)
sigma = 1.0/sqrt(lambda)
#Prediction
for (i in 1:length(y)) {
yf1[i] ~ dnorm(mu[i], lambda)
}
}
"
```

We run a chain of 10,000 iterations and a warm-up of 1000.

We can see in the following plots the convergence for each parameter we’re monitoring. The subplot on the top left shows the convergence of the chains, on the top right we see the posterior distribution with the mean and 95% quantile-based probability interval, on the bottom left the cummulative mean of the parameter and on the bottom right the autocorrelation of the samples.

```
simulations_1 <- as_tibble(mod_jags_sims_1$BUGSoutput$sims.array) %>%
select(c(grep("beta", names(.)),
grep("sigma", names(.)))) %>%
set_names(c("beta_1", "beta_2", "beta_3", "beta_4", "sigma"))
for(x in names(simulations_1)){
check_chains(simulations_1, x)
}
```