Summary
This document attempts to list all the features of R and this package that are described in the other vignettes. The intent is that anything you’d need to know how to do for this class is listed here.
Getting Started
Various basic R ideas are covered in the Getting Started page, including how to:
- Open your class “Project”
- Use the panes in RStudio
- Use the assignment operator `<-` to save the result of an operation
- Use functions, that is, be able to call a function with one or more (possibly named) parameters
- Use the pipe `|>` to send a value to a function
- Use the `c` function to “combine” values together
- Get unstuck by using [Control-C] to cancel the current input line
- Load packages using the `library` function
- Create scripts and Quarto files and send commands from them to the console
- Use chunk options in Quarto to control the output (e.g., `#| message: false`)
- Render Quarto files and open the result in your browser
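As a small illustration of the assignment operator, `c`, named parameters, and the pipe, here is a minimal sketch. The values and variable names are made up for illustration, and the native pipe `|>` assumes R 4.1 or later.

```r
# c() combines values into a vector; <- saves the result under a name
heights <- c(150, 160, 170)

# Call a function with a named parameter
avg <- mean(heights, na.rm = TRUE)

# The pipe sends the value on its left to the function on its right,
# so this is the same call written in pipe form
avg_piped <- heights |> mean(na.rm = TRUE)
```

Both `avg` and `avg_piped` hold the same value; the pipe form tends to read more naturally once several steps are chained together.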
About Data
The About Data page covers how to read in data, compute descriptive statistics, and work with data sets, along with functions for summarizing, comparing, creating new variables, and working with factors.
- Reading in data
  - Best practices for creating a data set in a spreadsheet
  - Reading in a data set using `read_csv` and `read_excel`, with optional parameters `na`, `skip`, and `sheet`
  - `skim` to check that it was read properly
  - `mutate`, `as_factor`, and optionally `fct_recode` to create factor variables
- Descriptive statistics
  - `descriptive_statistics` to get a summary of descriptive statistics for all variables; use the parameter `by` to split by another variable first
  - `count` and `mutate` (optionally with `.by` parameter) to get counts and percents for categorical data
  - `summarize` (optionally with `.by` parameter) to get summary statistics for continuous data
- Working with data
  - `select` to select by column
  - `filter` to select by row (see comparing functions)
  - `arrange` (and `desc`) to sort
  - `mutate` to make new variables
- Functions for summarizing
  - `count` and `mutate`, to get counts and percents
  - `summarize` for continuous variables, with `mean`, `sd`, `var`, `median`, `quantile` (with `probs` parameter), `min`, `max`, `IQR`, and `n()`
  - Use `na.rm = TRUE` to remove missing values first
- Functions for comparing
  - `<`, `<=`, `>`, `>=`, `==`, `!=`
  - `%in%`
  - `|`, `&`, `!`
  - `is.na`
- Functions for creating new variables
  - arithmetic functions (`+`, `-`, `*`, `/`)
  - `log`, `log10`, `log2`
  - `if_else`, `case_when`, `cut`
- Functions for factors
  - `as_factor`
  - `fct_recode`
  - `fct_relevel`
  - `fct_infreq`, `fct_reorder`
  - `droplevels`
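Several of the data functions listed above can be chained together. The following is a minimal sketch using only dplyr and forcats; the `patients` data set and its columns are hypothetical, invented for illustration, and the `.by` parameter assumes dplyr 1.1.0 or later.

```r
library(dplyr)
library(forcats)

# Hypothetical data set, made up for illustration
patients <- tibble(
  id     = 1:6,
  group  = c("a", "b", "a", "b", "a", "b"),
  weight = c(60, 72, NA, 80, 65, 90)
)

# Create a factor with readable labels and a new yes/no variable
patients <- patients |>
  mutate(
    group = as_factor(group) |> fct_recode(Control = "a", Treatment = "b"),
    heavy = if_else(weight > 70, "yes", "no")
  )

# Counts and percents for a categorical variable
group_counts <- patients |>
  count(group) |>
  mutate(percent = 100 * n / sum(n))

# Summary statistics for a continuous variable, split by group;
# na.rm = TRUE removes missing values before computing the mean
weight_summary <- patients |>
  summarize(
    mean_weight = mean(weight, na.rm = TRUE),
    n = n(),
    .by = group
  )

# Keep rows with a known weight, sort from heaviest to lightest,
# and keep only a few columns
sorted <- patients |>
  filter(!is.na(weight)) |>
  arrange(desc(weight)) |>
  select(id, group, weight)
```

Note that `n()` counts all rows in each group, including the one with a missing weight, while `mean` with `na.rm = TRUE` skips it.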
About Graphics
Controlling output
- Combining plots using `+` and `/`
- Using `#| fig-width` and `#| fig-height` to control figure size in Quarto
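A minimal sketch of both ideas follows. It assumes the `+` and `/` plot-combining operators come from the patchwork package, which this page does not state explicitly; the chunk option values and the use of the built-in `mtcars` data are also just illustrative.

```r
#| fig-width: 8
#| fig-height: 4
library(ggplot2)
library(patchwork)  # assumption: source of the + and / operators for plots

p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()

side_by_side <- p1 + p2  # plots placed next to each other
stacked      <- p1 / p2  # one plot above the other
```

In a Quarto file the `#|` lines must sit at the top of the chunk; when run as plain R they are treated as ordinary comments.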
Basics of ggplot2
The ggplot2 library uses a “grammar of graphics”
to specify the aspects of a plot. The following pseudo-code plots data
from a data set `data_set`, maps the variable
`x_var` to the `x` aesthetic and `y_var`
to the `y` aesthetic (more as needed), and then adds a
geometric object (`XXX`); these could be points, lines, or
bars. You can then optionally facet the plot, change the scales, change
the labels, and more.
ggplot(data_set,
       mapping = aes(x = x_var, y = y_var, fill = fill_var, color = color_var)) +
  geom_XXX() +
  facet_XXX() +
  scale_XXX() +
  labs(...)
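To make the template concrete, here is one possible way to fill it in, using the built-in `mtcars` data set. The choice of variables, geom, facet, scale, and labels is purely illustrative.

```r
library(ggplot2)

template_plot <- ggplot(mtcars,
       mapping = aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +                    # geom_XXX: draw points
  facet_wrap(~ am) +                # facet_XXX: one panel per value of am
  scale_y_continuous(limits = c(0, 40)) +  # scale_XXX: set the y-axis range
  labs(x = "Weight (1000 lbs)",     # labs: axis and legend titles
       y = "Miles per gallon",
       color = "Cylinders")
```

Typing `template_plot` at the console, or leaving the expression unassigned in a Quarto chunk, draws the plot.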
Scatterplots
- `geom_point()` to add points
- Color points by another variable by mapping it to the `color` aesthetic
- Use `stat_smooth()`, with optional parameters `method="lm"` and `se=FALSE`, to add a fitted line
- Use `scale_x_log10()` or `scale_y_log10()` to put the `x` or `y` axes on the log scale
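The scatterplot pieces above can be combined as in the following sketch, which uses the `mpg` data set that ships with ggplot2; the variable choices are illustrative only.

```r
library(ggplot2)

scatter <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +                             # one point per car
  stat_smooth(method = "lm", se = FALSE) +   # straight-line fit per color group
  scale_x_log10()                            # x-axis on the log scale
```

Because `color` is mapped inside `aes`, the fitted line is drawn separately for each value of `drv`.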
Bar plots
- Only need an `x` mapping; the `y` will be the count of the `x` variable
- `geom_bar()` to have one bar per value
- `geom_bars()` to have multiple bars per value, with the variable to color by specified by mapping a variable to the `fill` aesthetic
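A minimal sketch of both kinds of bar plot, using the `mpg` data set from ggplot2. Since the exact behavior of this package's `geom_bars()` is not shown here, the second plot uses plain ggplot2's `geom_bar(position = "dodge")` as a stand-in, on the assumption that it gives a similar side-by-side layout.

```r
library(ggplot2)

# One bar per value of class; y is the count, computed automatically
one_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# Several bars per value of class, one for each value of drv
# (stand-in for geom_bars(), using plain ggplot2)
grouped_bars <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")
```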
Histograms and Density plots
- Also only need an `x` mapping; the `y` will be computed appropriately
- Use `geom_histogram` to make a histogram; use parameters `binwidth` and `boundary` to control the bins
- Use `geom_density` to make a density plot; use the `color` aesthetic to do separately by another variable
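For example, the following sketch draws a histogram of the built-in `faithful` data and a density plot of the `mpg` data from ggplot2. The bin width and boundary values are arbitrary choices for illustration.

```r
library(ggplot2)

# Bins are 5 units wide and line up with 40, so they run 40-45, 45-50, ...
hist_plot <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(binwidth = 5, boundary = 40)

# One density curve per value of drv
dens_plot <- ggplot(mpg, aes(x = hwy, color = drv)) +
  geom_density()
```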
Box plots
- `geom_boxplot()`, usually has a continuous `y` and a categorical `x`
  - Flip `x` and `y` to plot horizontally
  - If only a single continuous variable, say `var`, use `x=var, y=0` to plot horizontally, and add `hide_y_axis()`
- Add points on top of the boxplot by
  - first turning off outliers using `geom_boxplot(outlier.shape = NA)`
  - then adding swarmed points with `geom_beeswarm`; use the `spacing` parameter to control the swarm; the parameters `pch=21` and `fill="white"` also help to make the swarm more apparent
- Use `scale_x_log10()` or `scale_y_log10()` to put the `x` or `y` axes on the log scale
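The box plot options above might look like this in practice. This sketch uses only plain ggplot2 and the `mpg` data set; `geom_jitter` is used as a stand-in for the swarmed points, since the `geom_beeswarm` variant with a `spacing` parameter is not part of ggplot2 itself.

```r
library(ggplot2)

# Vertical: categorical x, continuous y
vertical <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()

# Horizontal: flip x and y
horizontal <- ggplot(mpg, aes(x = hwy, y = class)) +
  geom_boxplot()

# Points on top: hide the boxplot's own outliers so they are not drawn twice,
# then add every observation (jittered here as a stand-in for a swarm)
with_points <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2, height = 0, shape = 21, fill = "white") +
  scale_y_log10()
```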
Logistic Regression plots
- Use `geom_beeswarm` with a continuous `x` variable and a binary `y` variable
- Use `scale_y_binary` to make the y-axis on 0-1 scale
- Use `geom_smooth_logistic` to add a logistic smooth
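For readers who want to see the idea behind a logistic smooth, the following sketch builds a comparable plot from plain ggplot2 pieces. It is a stand-in under stated assumptions, not the package's own functions: `geom_point` replaces the swarm, `scale_y_continuous` replaces `scale_y_binary`, and `geom_smooth` with a binomial `glm` replaces `geom_smooth_logistic`.

```r
library(ggplot2)

# am is a 0/1 variable in the built-in mtcars data; wt is continuous
logistic_plot <- ggplot(mtcars, aes(x = wt, y = am)) +
  geom_point() +
  geom_smooth(method = "glm",
              method.args = list(family = binomial),  # fit a logistic curve
              se = FALSE) +
  scale_y_continuous(breaks = c(0, 1))                # label only 0 and 1
```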
Statistical Inference
These functions all use a “formula” notation, like this:
function(response ~ explanatory, data=dataset).
- Inference for Proportions
  - `one_proportion_inference`
  - `two_proportion_inference`
  - `pairwise_proportion_inference`
  - `paired_proportion_inference`
  - `independence_test`
- Inference for Means
  - `one_t_inference`
  - `two_t_inference`
  - `pairwise_t_inference`
  - `paired_t_inference`
  - These functions can all handle log-transformed responses, with a `backtransform` parameter to specify whether output is on the log or original scale.
The functions about models (those starting with model_)
apply to linear and logistic models. They also have a
backtransform parameter to specify whether output is on the
log or original scale (for linear models with log-transformed response)
or on the logistic or probability scale (for logistic models).
- Fitting models:
  - linear models: `lm(y ~ x, data = dataset)`
  - logistic models: `glm(y ~ x, data = dataset, family=binomial)`
  - For multiple predictors, use `~ x1 + x2` for an additive model or `~ x1 * x2` to include interactions
- Inference about Models
  - `correlation_inference`
  - `model_anova`
  - `model_glance`
  - `model_coefs`
  - `model_means`, `pairwise_model_means`
  - `model_slopes`, `pairwise_model_slopes`
    - For means and slopes, use `|` in the formula to specify groupings and `at` to specify the values at which to obtain the means or slopes.
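The model-fitting calls listed under “Fitting models” use only base R, so they can be sketched with the built-in `mtcars` data. The choice of response and predictor variables here is illustrative only.

```r
# Linear model: continuous response
linear_fit <- lm(mpg ~ wt, data = mtcars)

# Logistic model: binary (0/1) response, with family = binomial
logistic_fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Two predictors, additive: separate effects of wt and hp
additive_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Two predictors with interaction: wt, hp, and the wt:hp term
interaction_fit <- lm(mpg ~ wt * hp, data = mtcars)

# The interaction model has one more coefficient than the additive one
n_additive    <- length(coef(additive_fit))
n_interaction <- length(coef(interaction_fit))
```

The fitted model objects are what would then be passed on to the `model_` functions listed above.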
Additional Options
Several additional options for controlling the output are available.
- `as_gt` to use `gt` formatting options
- `tab_compact` to change font size and spacing
- `set_digits` to control rounding (except for p-values)
- `fmt_pvalue` to control rounding of p-values
- `as_tibble` to get the underlying result as a data set
- Using `+` and `|` to run multiple tests at the same time
- `combine_tests` to combine results together in a single table