Statistical analysis: A simple example
Descriptive statistics
I want to compare the racial difference in education, income, and voting behavior. Here is what I did:
library(reshape2)
library(pander)
panderOptions("table.style", "rmarkdown")
new <- melt(turnout, id = c("race", "age"))
newcast <- dcast(new, race ~ variable, mean)
pandoc.table(newcast, digits = 3)
race | educate | income | vote |
---|---|---|---|
others | 11 | 2.93 | 0.627 |
white | 12.2 | 4.05 | 0.766 |
The table can be further polished, but it does the job. The point is that the reshape2 package can transform the data into arbitrary format, which can then be converted into HTML table using the pander package.
Here is another way to do it. This time I use the “htmlTable” function in the Gmisc package (not yet in CRAN):
library(Gmisc)
# newcast <- as.matrix(newcast) rownames(newcast) <- NULL
rownames(newcast) <- newcast[, 1]
newcast <- newcast[, -1]
htmlTable(newcast, digits = 3, ctable = T, rowlabel = "Race")
Race | educate | income | vote |
---|---|---|---|
others | 11.0 | 2.93 | 0.627 |
white | 12.2 | 4.05 | 0.766 |
The second table looks a lot nicer than the first one.
Graphical exploration
ggplot(turnout, aes(age, educate)) + geom_bar(stat = "identity") + theme_few()
Estimation
Zelig is a powerful R package (Imai et al. 2008 ). It provides an unified interface to a large number of existing R packages for statistical estimation. In addition, it implements the powerful statistical simulation methodology proposed by King et al. (2000) .
The package “texreg” has a function called “htmlreg()'' that can be used to produce regression tables in R Markdown. Here is a simple example showing how "texreg” and “Zelig” can work together:
library(Zelig)
data(turnout)
z.out1 <- zelig(vote ~ age, model = "logit", data = turnout)
z.out2 <- zelig(vote ~ age + race, model = "logit", data = turnout)
z.out3 <- zelig(vote ~ age + race + educate, model = "logit", data = turnout)
The “texreg” package is very powerful and flexible. I also like the “stargazer” package but, unfortunately, it does not produce html code so it cannot be used here.
library(texreg)
htmlreg(list(z.out1, z.out2, z.out3), doctype = F, html.tag = F, inline.css = T,
head.tag = F, body.tag = F, center = F, single.row = T, caption = "")
Model 1 | Model 2 | Model 3 | |
---|---|---|---|
(Intercept) | 0.55 (0.14)^{***} | 0.04 (0.18) | -3.05 (0.33)^{***} |
age | 0.01 (0.00)^{***} | 0.01 (0.00)^{***} | 0.03 (0.00)^{***} |
racewhite | 0.65 (0.13)^{***} | 0.38 (0.14)^{**} | |
educate | 0.22 (0.02)^{***} | ||
AIC | 2254.91 | 2234.82 | 2080.03 |
BIC | 2266.12 | 2251.62 | 2102.43 |
Log Likelihood | -1125.46 | -1114.41 | -1036.01 |
Deviance | 2250.91 | 2228.82 | 2072.03 |
Num. obs. | 2000 | 2000 | 2000 |
^{***}p < 0.001, ^{**}p < 0.01, ^{*}p < 0.05 |
Statistical simulation
Here are some simulation results:
x.out2 <- setx(z.out2, age = 36, race = "white")
x.out2
s.out2 <- sim(z.out2, x = x.out2)
plot(s.out2)
Now what do these figures show?
Of course, you also need R, Rstudio, knitr, ggplot2, ggthemes, pander, Gmisc, and knitcitations, which are all free. The source file can be found from GitHub.
REFERENCES
- Kosuke Imai, Gary King, Olivia Lau, (2008) Toward A Common Framework For Statistical Analysis And Development. Journal of Computational And Graphical Statistics 17 892-913 10.1198/106186008X384898
- Gary King, Michael Tomz, Jason Wittenberg, (2000) Making The Most of Statistical Analyses: Improving Interpretation And Presentation. American Journal of Political Science 44 347-NA 10.2307/2669316