Public Health Data Tabulation with {gtsummary}

Abhijit P Pakhare

Introduction

Acknowledgements

drawing

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA4.0).

Creators of original presentation

Daniel D. Sjoberg

danieldsjoberg.com

@statistishdan

linkedin.com/in/ddsjoberg

github.com/ddsjoberg

Shannon Pileggi

pipinghotdata.com

@pipinghotdata

linkedin.com/in/shannon-m-pileggi

github.com/shannonpileggi

Karissa Whiting

karissawhiting.com

@karissawhiting

linkedin.com/in/karissa-whiting-48877a52

github.com/karissawhiting

Checklist

Install recent R release

Current version 4.2.3

Install RStudio

I am on version 2022.07.1+554

Install packages

install.packages(c("gtsummary", "tidyverse"))

Ensure you can knit Rmarkdown files

{gtsummary} overview

Create tabular summaries with sensible defaults but highly customizable
Types of summaries:
- “Table 1”-types
- Cross-tabulation
- Regression models
- Survival data
- Survey data
- Custom tables

Report statistics from {gtsummary} tables inline in R Markdown
Stack and/or merge any table type
Use themes to standardize across tables
Choose from different print engines

Example Dataset

The “steps.xlsx” data set is available for downloading under Dataset Tab
This is a randomly sampled subset of WHO STEPS Survey conducted for NCD risk factor surveillance
It consists of selected variables from all three steps of the survey.

library(gtsummary)
library(tidyverse)
library(rio)
library(here)
df <- import(here("datasets", "steps.xlsx"))
head(df) |> gt::gt()

Gender	Age	Age_Group	Education	SBP	DBP	BMI	LDL	Smoking	Physical_Inactivity	Raised_BP	District	WSTEP3
Women	60	59-69	Illiterate	116.6667	77.00000	16.10928	49.2	No	No	No	Balaghat	8398.779
Men	67	59-69	Higher Secondary +	166.6667	97.33333	25.29558	74.6	No	No	Yes	Satna	6269.137
Men	18	18-30	Higher Secondary +	127.6667	79.33333	22.17560	53.8	No	No	No	Ratlam	30503.785
Women	67	59-69	Illiterate	106.3333	67.00000	17.28791	42.0	No	Yes	No	Dindori	8898.991
Men	38	30-44	Secondary	114.0000	74.00000	18.14509	28.0	Yes	No	No	Damoh	16746.247
Women	58	44-59	Illiterate	158.3333	87.33333	26.18594	95.6	No	No	Yes	Ratlam	15545.019

Example Dataset

This presentation will use a variables from the dataset to demonstrate different tables.

Variable
Gender
Age
Age_Group
Education
SBP
DBP
BMI
LDL
Smoking
Physical_Inactivity
Raised_BP
District
WSTEP3

sm_trial <-
  trial |> 
  select(trt, age, grade, response)

tbl_summary()

Basic tbl_summary()

df |> 
  tbl_summary()

Characteristic	N = 3,000¹
Gender
Men	1,039 (35%)
Women	1,961 (65%)
Age	40 (30, 51)
Age_Group
18-30	691 (23%)
30-44	1,097 (37%)
44-59	799 (27%)
59-69	413 (14%)
Education
Higher Secondary +	832 (28%)
Illiterate	1,162 (39%)
Primary	493 (16%)
Secondary	513 (17%)
SBP	125 (114, 138)
Unknown	26
DBP	78 (71, 85)
Unknown	26
BMI	20.2 (18.0, 23.3)
Unknown	78
LDL	57 (42, 75)
Smoking	275 (9.2%)
Physical_Inactivity	594 (20%)
Raised_BP	825 (28%)
District
Balaghat	357 (12%)
Bhopal	438 (15%)
Damoh	218 (7.3%)
Dhar	429 (14%)
Dindori	141 (4.7%)
Hoshangabad	235 (7.8%)
Ratlam	281 (9.4%)
Satna	452 (15%)
Sheopur	115 (3.8%)
Shivpuri	334 (11%)
WSTEP3	16,031 (14,218, 17,604)
¹ n (%); Median (IQR)

Four types of summaries: continuous, continuous2, categorical, and dichotomous
Statistics are median (IQR) for continuous, n (%) for categorical/dichotomous
Variables coded 0/1, TRUE/FALSE, Yes/No treated as dichotomous
Lists NA values under “Unknown”
Label attributes are printed automatically

Customize tbl_summary() output

tbl_summary(
  df,
  by = Raised_BP,
  include = c(Gender, Age_Group, BMI)
)

Characteristic	No, N = 2,175¹	Yes, N = 825¹
Gender
Men	744 (34%)	295 (36%)
Women	1,431 (66%)	530 (64%)
Age_Group
18-30	636 (29%)	55 (6.7%)
30-44	868 (40%)	229 (28%)
44-59	495 (23%)	304 (37%)
59-69	176 (8.1%)	237 (29%)
BMI	19.7 (17.7, 22.4)	21.8 (18.9, 25.3)
Unknown	70	8
¹ n (%); Median (IQR)

by: specify a column variable for cross-tabulation

Customize tbl_summary() output

tbl_summary(
  df,
  by = Raised_BP,
  include = c(Gender, Age_Group, BMI),
  type = BMI ~ "continuous2",
)

Characteristic	No, N = 2,175¹	Yes, N = 825¹
Gender
Men	744 (34%)	295 (36%)
Women	1,431 (66%)	530 (64%)
Age_Group
18-30	636 (29%)	55 (6.7%)
30-44	868 (40%)	229 (28%)
44-59	495 (23%)	304 (37%)
59-69	176 (8.1%)	237 (29%)
BMI
Median (IQR)	19.7 (17.7, 22.4)	21.8 (18.9, 25.3)
Unknown	70	8
¹ n (%)

by: specify a column variable for cross-tabulation
type: specify the summary type

Customize tbl_summary() output

tbl_summary(
  df,
  by = Raised_BP,
  include = c(Gender, Age_Group, BMI),
  type = BMI ~ "continuous2",
  statistic = 
    list(
      BMI ~ c("{mean} ({sd})", 
              "{min}, {max}"), 
      Gender ~ "{n} / {N} ({p}%)"
    ),
)

Characteristic	No, N = 2,175¹	Yes, N = 825¹
Gender
Men	744 / 2,175 (34%)	295 / 825 (36%)
Women	1,431 / 2,175 (66%)	530 / 825 (64%)
Age_Group
18-30	636 (29%)	55 (6.7%)
30-44	868 (40%)	229 (28%)
44-59	495 (23%)	304 (37%)
59-69	176 (8.1%)	237 (29%)
BMI
Mean (SD)	20.5 (3.9)	22.5 (4.8)
Range	11.9, 47.8	13.5, 47.8
Unknown	70	8
¹ n / N (%); n (%)

by: specify a column variable for cross-tabulation
type: specify the summary type
statistic: customize the reported statistics

Customize tbl_summary() output

tbl_summary(
  df,
  by = Raised_BP,
  type = BMI ~ "continuous2",
  include = c(Gender, Age_Group, BMI),
  statistic = 
    list(
      BMI ~ c("{mean} ({sd})", 
              "{min}, {max}"), 
      Gender ~ "{n} / {N} ({p}%)"
    ),
  label = 
    Age_Group ~ "Age Group",
)

Characteristic	No, N = 2,175¹	Yes, N = 825¹
Gender
Men	744 / 2,175 (34%)	295 / 825 (36%)
Women	1,431 / 2,175 (66%)	530 / 825 (64%)
Age Group
18-30	636 (29%)	55 (6.7%)
30-44	868 (40%)	229 (28%)
44-59	495 (23%)	304 (37%)
59-69	176 (8.1%)	237 (29%)
BMI
Mean (SD)	20.5 (3.9)	22.5 (4.8)
Range	11.9, 47.8	13.5, 47.8
Unknown	70	8
¹ n / N (%); n (%)

by: specify a column variable for cross-tabulation
type: specify the summary type
statistic: customize the reported statistics

label: change or customize variable labels

Customize tbl_summary() output

tbl_summary(
  df,
  by = Raised_BP,
  type = BMI ~ "continuous2",
  include = c(Gender, Age_Group, BMI),
  statistic = 
    list(
      BMI ~ c("{mean} ({sd})", 
              "{min}, {max}"), 
      Gender ~ "{n} / {N} ({p}%)"
    ),
  label = 
    Age_Group ~ "Age Group",
  digits = list(all_continuous()~1,
                all_categorical()~ c(0,1))
)

Characteristic	No, N = 2,175¹	Yes, N = 825¹
Gender
Men	744 / 2,175.0 (34%)	295 / 825.0 (36%)
Women	1,431 / 2,175.0 (66%)	530 / 825.0 (64%)
Age Group
18-30	636 (29.2%)	55 (6.7%)
30-44	868 (39.9%)	229 (27.8%)
44-59	495 (22.8%)	304 (36.8%)
59-69	176 (8.1%)	237 (28.7%)
BMI
Mean (SD)	20.5 (3.9)	22.5 (4.8)
Range	11.9, 47.8	13.5, 47.8
Unknown	70	8
¹ n / N (%); n (%)

by: specify a column variable for cross-tabulation
type: specify the summary type
statistic: customize the reported statistics

label: change or customize variable labels
digits: specify the number of decimal places for rounding

{gtsummary} + formulas

Named list are OK too! label = list(age = "Patient Age")

Add-on functions in {gtsummary}

tbl_summary() objects can also be updated using related functions.

add_*() add additional column of statistics or information, e.g. p-values, q-values, overall statistics, treatment differences, N obs., and more
modify_*() modify table headers, spanning headers, footnotes, and more
bold_*()/italicize_*() style labels, variable levels, significant p-values

Update tbl_summary() with add_*()

df |>
  tbl_summary(
    by = Raised_BP
  ) |> 
  add_p()

Characteristic	No, N = 2,175¹	Yes, N = 825¹	p-value²
Gender			0.4
Men	744 (34%)	295 (36%)
Women	1,431 (66%)	530 (64%)
Age	35 (28, 46)	50 (40, 60)	<0.001
Age_Group			<0.001
18-30	636 (29%)	55 (6.7%)
30-44	868 (40%)	229 (28%)
44-59	495 (23%)	304 (37%)
59-69	176 (8.1%)	237 (29%)
Education			<0.001
Higher Secondary +	610 (28%)	222 (27%)
Illiterate	790 (36%)	372 (45%)
Primary	372 (17%)	121 (15%)
Secondary	403 (19%)	110 (13%)
SBP	119 (111, 127)	149 (142, 161)	<0.001
Unknown	26	0
DBP	75 (69, 80)	91 (83, 97)	<0.001
Unknown	26	0
BMI	19.7 (17.7, 22.4)	21.8 (18.9, 25.3)	<0.001
Unknown	70	8
LDL	55 (42, 72)	61 (43, 82)	<0.001
Smoking	187 (8.6%)	88 (11%)	0.080
Physical_Inactivity	426 (20%)	168 (20%)	0.6
District			<0.001
Balaghat	236 (11%)	121 (15%)
Bhopal	303 (14%)	135 (16%)
Damoh	168 (7.7%)	50 (6.1%)
Dhar	292 (13%)	137 (17%)
Dindori	103 (4.7%)	38 (4.6%)
Hoshangabad	182 (8.4%)	53 (6.4%)
Ratlam	218 (10%)	63 (7.6%)
Satna	356 (16%)	96 (12%)
Sheopur	82 (3.8%)	33 (4.0%)
Shivpuri	235 (11%)	99 (12%)
WSTEP3	16,469 (14,982, 22,945)	14,644 (9,505, 16,230)	<0.001
¹ n (%); Median (IQR)
² Pearson’s Chi-squared test; Wilcoxon rank sum test

add_p(): adds a column of p-values

Update tbl_summary() with add_*()

df |>
  tbl_summary(
    by = Raised_BP,
    missing = "no"
  ) |> 
  add_overall()

Characteristic	Overall, N = 3,000¹	No, N = 2,175¹	Yes, N = 825¹
Gender
Men	1,039 (35%)	744 (34%)	295 (36%)
Women	1,961 (65%)	1,431 (66%)	530 (64%)
Age	40 (30, 51)	35 (28, 46)	50 (40, 60)
Age_Group
18-30	691 (23%)	636 (29%)	55 (6.7%)
30-44	1,097 (37%)	868 (40%)	229 (28%)
44-59	799 (27%)	495 (23%)	304 (37%)
59-69	413 (14%)	176 (8.1%)	237 (29%)
Education
Higher Secondary +	832 (28%)	610 (28%)	222 (27%)
Illiterate	1,162 (39%)	790 (36%)	372 (45%)
Primary	493 (16%)	372 (17%)	121 (15%)
Secondary	513 (17%)	403 (19%)	110 (13%)
SBP	125 (114, 138)	119 (111, 127)	149 (142, 161)
DBP	78 (71, 85)	75 (69, 80)	91 (83, 97)
BMI	20.2 (18.0, 23.3)	19.7 (17.7, 22.4)	21.8 (18.9, 25.3)
LDL	57 (42, 75)	55 (42, 72)	61 (43, 82)
Smoking	275 (9.2%)	187 (8.6%)	88 (11%)
Physical_Inactivity	594 (20%)	426 (20%)	168 (20%)
District
Balaghat	357 (12%)	236 (11%)	121 (15%)
Bhopal	438 (15%)	303 (14%)	135 (16%)
Damoh	218 (7.3%)	168 (7.7%)	50 (6.1%)
Dhar	429 (14%)	292 (13%)	137 (17%)
Dindori	141 (4.7%)	103 (4.7%)	38 (4.6%)
Hoshangabad	235 (7.8%)	182 (8.4%)	53 (6.4%)
Ratlam	281 (9.4%)	218 (10%)	63 (7.6%)
Satna	452 (15%)	356 (16%)	96 (12%)
Sheopur	115 (3.8%)	82 (3.8%)	33 (4.0%)
Shivpuri	334 (11%)	235 (11%)	99 (12%)
WSTEP3	16,031 (14,218, 17,604)	16,469 (14,982, 22,945)	14,644 (9,505, 16,230)
¹ n (%); Median (IQR)

add_overall(): adds a column of overall statistics

Update tbl_summary() with add_*()

df |>
  tbl_summary(
    by = Raised_BP,
    missing = "no"
  ) |> 
  add_overall() |> 
  add_n()

Characteristic	N	Overall, N = 3,000¹	No, N = 2,175¹	Yes, N = 825¹
Gender	3,000
Men		1,039 (35%)	744 (34%)	295 (36%)
Women		1,961 (65%)	1,431 (66%)	530 (64%)
Age	3,000	40 (30, 51)	35 (28, 46)	50 (40, 60)
Age_Group	3,000
18-30		691 (23%)	636 (29%)	55 (6.7%)
30-44		1,097 (37%)	868 (40%)	229 (28%)
44-59		799 (27%)	495 (23%)	304 (37%)
59-69		413 (14%)	176 (8.1%)	237 (29%)
Education	3,000
Higher Secondary +		832 (28%)	610 (28%)	222 (27%)
Illiterate		1,162 (39%)	790 (36%)	372 (45%)
Primary		493 (16%)	372 (17%)	121 (15%)
Secondary		513 (17%)	403 (19%)	110 (13%)
SBP	2,974	125 (114, 138)	119 (111, 127)	149 (142, 161)
DBP	2,974	78 (71, 85)	75 (69, 80)	91 (83, 97)
BMI	2,922	20.2 (18.0, 23.3)	19.7 (17.7, 22.4)	21.8 (18.9, 25.3)
LDL	3,000	57 (42, 75)	55 (42, 72)	61 (43, 82)
Smoking	3,000	275 (9.2%)	187 (8.6%)	88 (11%)
Physical_Inactivity	3,000	594 (20%)	426 (20%)	168 (20%)
District	3,000
Balaghat		357 (12%)	236 (11%)	121 (15%)
Bhopal		438 (15%)	303 (14%)	135 (16%)
Damoh		218 (7.3%)	168 (7.7%)	50 (6.1%)
Dhar		429 (14%)	292 (13%)	137 (17%)
Dindori		141 (4.7%)	103 (4.7%)	38 (4.6%)
Hoshangabad		235 (7.8%)	182 (8.4%)	53 (6.4%)
Ratlam		281 (9.4%)	218 (10%)	63 (7.6%)
Satna		452 (15%)	356 (16%)	96 (12%)
Sheopur		115 (3.8%)	82 (3.8%)	33 (4.0%)
Shivpuri		334 (11%)	235 (11%)	99 (12%)
WSTEP3	3,000	16,031 (14,218, 17,604)	16,469 (14,982, 22,945)	14,644 (9,505, 16,230)
¹ n (%); Median (IQR)

add_overall(): adds a column of overall statistics
add_n(): adds a column with the sample size

Update tbl_summary() with add_*()

df |>
  tbl_summary(
    by = Raised_BP,
    missing = "no"
  ) |> 
  add_overall() |> 
  add_n() |> 
  add_stat_label(
    label = all_categorical() ~ "No. (%)"
  )

Characteristic	N	Overall, N = 3,000	No, N = 2,175	Yes, N = 825
Gender, No. (%)	3,000
Men		1,039 (35%)	744 (34%)	295 (36%)
Women		1,961 (65%)	1,431 (66%)	530 (64%)
Age, Median (IQR)	3,000	40 (30, 51)	35 (28, 46)	50 (40, 60)
Age_Group, No. (%)	3,000
18-30		691 (23%)	636 (29%)	55 (6.7%)
30-44		1,097 (37%)	868 (40%)	229 (28%)
44-59		799 (27%)	495 (23%)	304 (37%)
59-69		413 (14%)	176 (8.1%)	237 (29%)
Education, No. (%)	3,000
Higher Secondary +		832 (28%)	610 (28%)	222 (27%)
Illiterate		1,162 (39%)	790 (36%)	372 (45%)
Primary		493 (16%)	372 (17%)	121 (15%)
Secondary		513 (17%)	403 (19%)	110 (13%)
SBP, Median (IQR)	2,974	125 (114, 138)	119 (111, 127)	149 (142, 161)
DBP, Median (IQR)	2,974	78 (71, 85)	75 (69, 80)	91 (83, 97)
BMI, Median (IQR)	2,922	20.2 (18.0, 23.3)	19.7 (17.7, 22.4)	21.8 (18.9, 25.3)
LDL, Median (IQR)	3,000	57 (42, 75)	55 (42, 72)	61 (43, 82)
Smoking, No. (%)	3,000	275 (9.2%)	187 (8.6%)	88 (11%)
Physical_Inactivity, No. (%)	3,000	594 (20%)	426 (20%)	168 (20%)
District, No. (%)	3,000
Balaghat		357 (12%)	236 (11%)	121 (15%)
Bhopal		438 (15%)	303 (14%)	135 (16%)
Damoh		218 (7.3%)	168 (7.7%)	50 (6.1%)
Dhar		429 (14%)	292 (13%)	137 (17%)
Dindori		141 (4.7%)	103 (4.7%)	38 (4.6%)
Hoshangabad		235 (7.8%)	182 (8.4%)	53 (6.4%)
Ratlam		281 (9.4%)	218 (10%)	63 (7.6%)
Satna		452 (15%)	356 (16%)	96 (12%)
Sheopur		115 (3.8%)	82 (3.8%)	33 (4.0%)
Shivpuri		334 (11%)	235 (11%)	99 (12%)
WSTEP3, Median (IQR)	3,000	16,031 (14,218, 17,604)	16,469 (14,982, 22,945)	14,644 (9,505, 16,230)

add_overall(): adds a column of overall statistics
add_n(): adds a column with the sample size
add_stat_label(): adds a description of the reported statistic

Update with bold_()/italicize_()

df |>
  tbl_summary(
    by = Raised_BP
  ) |>
  add_p() |> 
  bold_labels() |> 
  italicize_levels() |> 
  bold_p(t = 0.8)

Characteristic	No, N = 2,175¹	Yes, N = 825¹	p-value²
Gender			0.4
Men	744 (34%)	295 (36%)
Women	1,431 (66%)	530 (64%)
Age	35 (28, 46)	50 (40, 60)	<0.001
Age_Group			<0.001
18-30	636 (29%)	55 (6.7%)
30-44	868 (40%)	229 (28%)
44-59	495 (23%)	304 (37%)
59-69	176 (8.1%)	237 (29%)
Education			<0.001
Higher Secondary +	610 (28%)	222 (27%)
Illiterate	790 (36%)	372 (45%)
Primary	372 (17%)	121 (15%)
Secondary	403 (19%)	110 (13%)
SBP	119 (111, 127)	149 (142, 161)	<0.001
Unknown	26	0
DBP	75 (69, 80)	91 (83, 97)	<0.001
Unknown	26	0
BMI	19.7 (17.7, 22.4)	21.8 (18.9, 25.3)	<0.001
Unknown	70	8
LDL	55 (42, 72)	61 (43, 82)	<0.001
Smoking	187 (8.6%)	88 (11%)	0.080
Physical_Inactivity	426 (20%)	168 (20%)	0.6
District			<0.001
Balaghat	236 (11%)	121 (15%)
Bhopal	303 (14%)	135 (16%)
Damoh	168 (7.7%)	50 (6.1%)
Dhar	292 (13%)	137 (17%)
Dindori	103 (4.7%)	38 (4.6%)
Hoshangabad	182 (8.4%)	53 (6.4%)
Ratlam	218 (10%)	63 (7.6%)
Satna	356 (16%)	96 (12%)
Sheopur	82 (3.8%)	33 (4.0%)
Shivpuri	235 (11%)	99 (12%)
WSTEP3	16,469 (14,982, 22,945)	14,644 (9,505, 16,230)	<0.001
¹ n (%); Median (IQR)
² Pearson’s Chi-squared test; Wilcoxon rank sum test

bold_labels(): bold the variable labels
italicize_levels(): italicize the variable levels
bold_p(): bold p-values according a specified threshold

Update tbl_summary() with modify_*()

tbl <-
  df |> 
  tbl_summary(by = Raised_BP, 
              missing = "no") |>
  modify_header(
      stat_1 ~ "**No raised BP**",
      stat_2 ~ "**Raised BP**"
  ) |> 
  modify_spanning_header(
    all_stat_cols() ~ "**BP Status**") |> 
  modify_footnote(
    all_stat_cols() ~ 
      paste("median (IQR) for continuous;",
            "n (%) for categorical")
  )
tbl

Characteristic	BP Status
Characteristic	No raised BP¹	Raised BP¹
Gender
Men	744 (34%)	295 (36%)
Women	1,431 (66%)	530 (64%)
Age	35 (28, 46)	50 (40, 60)
Age_Group
18-30	636 (29%)	55 (6.7%)
30-44	868 (40%)	229 (28%)
44-59	495 (23%)	304 (37%)
59-69	176 (8.1%)	237 (29%)
Education
Higher Secondary +	610 (28%)	222 (27%)
Illiterate	790 (36%)	372 (45%)
Primary	372 (17%)	121 (15%)
Secondary	403 (19%)	110 (13%)
SBP	119 (111, 127)	149 (142, 161)
DBP	75 (69, 80)	91 (83, 97)
BMI	19.7 (17.7, 22.4)	21.8 (18.9, 25.3)
LDL	55 (42, 72)	61 (43, 82)
Smoking	187 (8.6%)	88 (11%)
Physical_Inactivity	426 (20%)	168 (20%)
District
Balaghat	236 (11%)	121 (15%)
Bhopal	303 (14%)	135 (16%)
Damoh	168 (7.7%)	50 (6.1%)
Dhar	292 (13%)	137 (17%)
Dindori	103 (4.7%)	38 (4.6%)
Hoshangabad	182 (8.4%)	53 (6.4%)
Ratlam	218 (10%)	63 (7.6%)
Satna	356 (16%)	96 (12%)
Sheopur	82 (3.8%)	33 (4.0%)
Shivpuri	235 (11%)	99 (12%)
WSTEP3	16,469 (14,982, 22,945)	14,644 (9,505, 16,230)
¹ median (IQR) for continuous; n (%) for categorical

Use show_header_names() to see the internal header names available for use in modify_header()

Column names

show_header_names(tbl)

Column Name	Column Header
label	Characteristic
stat_1	No raised BP
stat_2	Raised BP

all_stat_cols() selects columns "stat_1" and "stat_2"

Update tbl_summary() with add_*()

df |>
  select(Raised_BP, SBP, BMI, LDL, Smoking) |>
  tbl_summary(
    by = Raised_BP,
    statistic = list(c(SBP, BMI, LDL) ~ "{mean} ({sd})",
                     Smoking ~ "{p}%"),
    missing = "no"
  ) |> 
  add_difference()

Characteristic	No, N = 2,175¹	Yes, N = 825¹	Difference²	95% CI^2,3	p-value²
SBP	119 (11)	153 (21)	-34	-35, -32	<0.001
BMI	20.5 (3.9)	22.5 (4.8)	-2.0	-2.4, -1.7	<0.001
LDL	58 (28)	64 (31)	-5.4	-7.9, -3.0	<0.001
Smoking	8.6%	11%	-2.1%	-4.6%, 0.43%	0.092
¹ Mean (SD); %
² Welch Two Sample t-test; Two sample test for equality of proportions
³ CI = Confidence Interval

add_difference(): mean and rate differences between two groups. Can also be adjusted differences

Update tbl_summary() with add_*()

df |>
  tbl_summary(
    by = Raised_BP,
    missing = "no"
  ) |> 
  add_stat(...)

Customize statistics presented with add_stat()
Added statistics can be placed on the label or the level rows
Added statistics may be a single column or multiple

Add-on functions in {gtsummary}

And many more!

See the documentation at http://www.danieldsjoberg.com/gtsummary/reference/index.html

And a detailed tbl_summary() vignette at http://www.danieldsjoberg.com/gtsummary/articles/tbl_summary.html

Cross-tabulation with tbl_cross()

tbl_cross() is a wrapper for tbl_summary() for n x m tables

df |>
  tbl_cross(
    row = District, 
    col = Raised_BP,
    percent = "row",
    margin = "row"
  ) |>
  add_p(source_note = TRUE) |>
  bold_labels()

	Raised_BP
	No	Yes
District
Balaghat	236 (66%)	121 (34%)
Bhopal	303 (69%)	135 (31%)
Damoh	168 (77%)	50 (23%)
Dhar	292 (68%)	137 (32%)
Dindori	103 (73%)	38 (27%)
Hoshangabad	182 (77%)	53 (23%)
Ratlam	218 (78%)	63 (22%)
Satna	356 (79%)	96 (21%)
Sheopur	82 (71%)	33 (29%)
Shivpuri	235 (70%)	99 (30%)
Total	2,175 (72%)	825 (28%)
Pearson’s Chi-squared test, p<0.001

Continuous Summaries with tbl_continuous()

tbl_continuous() summarizes a continuous variable by 1, 2, or more categorical variables

df |>
  tbl_continuous(
    variable = SBP,
    by = Raised_BP,
    include = c(Age_Group, Gender)
  )

Characteristic	No, N = 2,175¹	Yes, N = 825¹
Age_Group
18-30	116 (109, 125)	141 (134, 144)
30-44	119 (112, 127)	145 (140, 153)
44-59	121 (115, 129)	150 (142, 161)
59-69	125 (116, 132)	157 (147, 175)
Gender
Men	122 (113, 129)	148 (142, 160)
Women	118 (111, 126)	149 (142, 161)
¹ SBP: Median (IQR)

Survey data with tbl_svysummary()

survey::svydesign(
  ids = ~1, 
  data = df, 
  weights = ~WSTEP3
) |>
  tbl_svysummary(
    by = Raised_BP,
    include = c(Age_Group, Gender, District),
    percent = "row"
  ) |>
  add_p() |>
  modify_spanning_header(
    all_stat_cols() ~ "**Raised BP**")

Characteristic	Raised BP		p-value²
Characteristic	No, N = 39,432,956¹	Yes, N = 11,491,276¹	p-value²
Age_Group			<0.001
18-30	16,646,634 (92%)	1,445,876 (8.0%)
30-44	14,073,391 (79%)	3,635,210 (21%)
44-59	7,322,516 (62%)	4,524,442 (38%)
59-69	1,390,415 (42%)	1,885,747 (58%)
Gender			0.9
Men	13,472,904 (78%)	3,889,460 (22%)
Women	25,960,052 (77%)	7,601,815 (23%)
District			0.001
Balaghat	3,785,643 (71%)	1,533,116 (29%)
Bhopal	5,737,817 (75%)	1,935,086 (25%)
Damoh	3,123,656 (81%)	717,595 (19%)
Dhar	4,812,133 (72%)	1,829,268 (28%)
Dindori	1,843,049 (78%)	505,601 (22%)
Hoshangabad	3,468,386 (82%)	750,387 (18%)
Ratlam	4,065,569 (81%)	931,610 (19%)
Satna	6,266,329 (82%)	1,414,737 (18%)
Sheopur	1,814,049 (77%)	536,462 (23%)
Shivpuri	4,516,325 (77%)	1,337,415 (23%)
¹ n (%)
² chi-squared test with Rao & Scott’s second-order correction

Survey data with tbl_svysummary()

survey::svydesign(
  ids = ~1, 
  data = df, 
  weights = ~WSTEP3
) |>
  tbl_svysummary(
    by = Raised_BP,
    include = c(Age_Group, Gender, District),
    percent = "row",
    digits = list(all_categorical() ~ 1),
    statistic = list(all_categorical() ~ "{p}")
  ) |>
  add_p() |>
  modify_spanning_header(
    all_stat_cols() ~ "**Raised BP**")

Characteristic	Raised BP		p-value²
Characteristic	No, N = 39,432,956¹	Yes, N = 11,491,276¹	p-value²
Age_Group			<0.001
18-30	92.0	8.0
30-44	79.5	20.5
44-59	61.8	38.2
59-69	42.4	57.6
Gender			0.9
Men	77.6	22.4
Women	77.3	22.7
District			0.001
Balaghat	71.2	28.8
Bhopal	74.8	25.2
Damoh	81.3	18.7
Dhar	72.5	27.5
Dindori	78.5	21.5
Hoshangabad	82.2	17.8
Ratlam	81.4	18.6
Satna	81.6	18.4
Sheopur	77.2	22.8
Shivpuri	77.2	22.8
¹ %
² chi-squared test with Rao & Scott’s second-order correction

tbl_regression()

Traditional model summary()

df <- df %>% 
  mutate(Raised_BP = if_else(Raised_BP=="Yes", 1,0))
m1 <- 
  glm(
    Raised_BP ~ Age + Smoking + BMI,
    data = df,
    family = binomial(link = "logit")
  )

summary(m1)

Call: glm(formula = Raised_BP ~ Age + Smoking + BMI, family = binomial(link = “logit”), data = df)

Deviance Residuals: Min 1Q Median 3Q Max
-2.4395 -0.7723 -0.4914 0.9125 2.4430

Coefficients: Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.154933 0.298385 -20.628 <2e-16 Age 0.067720 0.003669 18.460 <2e-16 SmokingYes -0.008596 0.148722 -0.058 0.954
BMI 0.104944 0.010593 9.907 <2e-16 ***

Signif. codes: 0 ‘’ 0.001 ’’ 0.01 ’’ 0.05 ‘.’ 0.1 ’ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 3463.0  on 2921  degrees of freedom

Residual deviance: 2930.8 on 2918 degrees of freedom (78 observations deleted due to missingness) AIC: 2938.8

Number of Fisher Scoring iterations: 4

Looks messy and it’s not easy to digest

Basic tbl_regression()

::: {.cell output-location=‘column’}

tbl_regression(m1)

Characteristic	log(OR)¹	95% CI¹	p-value
Age	0.07	0.06, 0.07	<0.001
Smoking
No	—	—
Yes	-0.01	-0.30, 0.28	0.9
BMI	0.10	0.08, 0.13	<0.001
¹ OR = Odds Ratio, CI = Confidence Interval

:::

Displays p-values for covariates
Shows reference levels for categorical variables
Model type recognized as logistic regression with odds ratio appearing in header

Customize tbl_regression() output

::: {.cell output-location=‘column’}

tbl_regression(
  m1,
  exponentiate = TRUE
) |> 
  add_global_p() |>
  add_glance_table(
    include = c(nobs,
                logLik,
                AIC,
                BIC)
  )

Characteristic	OR¹	95% CI¹	p-value
Age	1.07	1.06, 1.08	<0.001
Smoking			0.9
No	—	—
Yes	0.99	0.74, 1.32
BMI	1.11	1.09, 1.13	<0.001
No. Obs.	2,922
Log-likelihood	-1,465
AIC	2,939
BIC	2,963
¹ OR = Odds Ratio, CI = Confidence Interval

:::

Display odds ratio estimates and confidence intervals
Add global p-values
Add various model statistics

Supported models in tbl_regression()

biglm::bigglm()
biglmm::bigglm()
brms::brm()
cmprsk::crr()
fixest::feglm()
fixest::femlm()
fixest::feNmlm()
fixest::feols()
gam::gam()
geepack::geeglm()
glmmTMB::glmmTMB()
lavaan::lavaan()
lfe::felm()

lme4::glmer.nb()
lme4::glmer()
lme4::lmer()
logitr::logitr()
MASS::glm.nb()
MASS::polr()
mgcv::gam()
mice::mira
multgee::nomLORgee()
multgee::ordLORgee()
nnet::multinom()
ordinal::clm()
ordinal::clmm()

parsnip::model_fit
plm::plm()
rstanarm::stan_glm()
stats::aov()
stats::glm()
stats::lm()
stats::nls()
survey::svycoxph()
survey::svyglm()
survey::svyolr()
survival::clogit()
survival::coxph()
survival::survreg()
tidycmprsk::crr()
VGAM::vglm()

Custom tidiers can be written and passed to tbl_regression() using the tidy_fun= argument.

Univariate models with tbl_uvregression()

tbl_uvreg <- 
  df |> 
  select(-District, -WSTEP3, - SBP, -DBP) |> 
  tbl_uvregression(
    method = glm,
    y = Raised_BP,
    method.args = 
      list(family = binomial),
    exponentiate = TRUE
  )
tbl_uvreg

Characteristic	N	OR¹	95% CI¹	p-value
Gender	3,000
Men		—	—
Women		0.93	0.79, 1.11	0.4
Age	3,000	1.07	1.06, 1.08	<0.001
Age_Group	3,000
18-30		—	—
30-44		3.05	2.25, 4.20	<0.001
44-59		7.10	5.25, 9.77	<0.001
59-69		15.6	11.2, 22.0	<0.001
Education	3,000
Higher Secondary +		—	—
Illiterate		1.29	1.06, 1.58	0.010
Primary		0.89	0.69, 1.15	0.4
Secondary		0.75	0.58, 0.97	0.031
BMI	2,922	1.12	1.09, 1.14	<0.001
LDL	3,000	1.01	1.00, 1.01	<0.001
Smoking	3,000
No		—	—
Yes		1.27	0.97, 1.65	0.080
Physical_Inactivity	3,000
No		—	—
Yes		1.05	0.86, 1.28	0.6
¹ OR = Odds Ratio, CI = Confidence Interval

Specify model method, method.args, and the response variable
Arguments and helper functions like exponentiate, bold_*(), add_global_p() can also be used with tbl_uvregression()

tbl_merge()/tbl_stack()

tbl_merge() for side-by-side tables

A univariable table:

tbl_uvreg <- 
  df |> 
  select(-District, -WSTEP3, - SBP, -DBP) |> 
  tbl_uvregression(
    method = glm,
    y = Raised_BP,
    method.args = 
      list(family = binomial),
    exponentiate = TRUE
  )
tbl_uvreg

Characteristic	N	OR¹	95% CI¹	p-value
Gender	3,000
Men		—	—
Women		0.93	0.79, 1.11	0.4
Age	3,000	1.07	1.06, 1.08	<0.001
Age_Group	3,000
18-30		—	—
30-44		3.05	2.25, 4.20	<0.001
44-59		7.10	5.25, 9.77	<0.001
59-69		15.6	11.2, 22.0	<0.001
Education	3,000
Higher Secondary +		—	—
Illiterate		1.29	1.06, 1.58	0.010
Primary		0.89	0.69, 1.15	0.4
Secondary		0.75	0.58, 0.97	0.031
BMI	2,922	1.12	1.09, 1.14	<0.001
LDL	3,000	1.01	1.00, 1.01	<0.001
Smoking	3,000
No		—	—
Yes		1.27	0.97, 1.65	0.080
Physical_Inactivity	3,000
No		—	—
Yes		1.05	0.86, 1.28	0.6
¹ OR = Odds Ratio, CI = Confidence Interval

A multivariable table:

tbl_mvreg <- 
  glm(
    Raised_BP ~ Age_Group + Smoking + BMI + LDL,
    data = df,
    family = binomial(link = "logit")
  ) |> 
  tbl_regression(
    exponentiate = TRUE
  ) |> 
  add_global_p() 
tbl_mvreg

Characteristic	OR¹	95% CI¹	p-value
Age_Group			<0.001
18-30	—	—
30-44	2.47	1.79, 3.44
44-59	5.82	4.23, 8.12
59-69	14.6	10.3, 21.0
Smoking			0.8
No	—	—
Yes	1.03	0.77, 1.38
BMI	1.11	1.09, 1.13	<0.001
LDL	1.00	1.00, 1.00	0.7
¹ OR = Odds Ratio, CI = Confidence Interval

tbl_merge() for side-by-side tables

tbl_merge(
  list(tbl_uvreg, tbl_mvreg),
  tab_spanner = c("**Univariable**", "**Multivariable**")
)

Characteristic	Univariable				Multivariable
Characteristic	N	OR¹	95% CI¹	p-value	OR¹	95% CI¹	p-value
Gender	3,000
Men		—	—
Women		0.93	0.79, 1.11	0.4
Age	3,000	1.07	1.06, 1.08	<0.001
Age_Group	3,000						<0.001
18-30		—	—		—	—
30-44		3.05	2.25, 4.20	<0.001	2.47	1.79, 3.44
44-59		7.10	5.25, 9.77	<0.001	5.82	4.23, 8.12
59-69		15.6	11.2, 22.0	<0.001	14.6	10.3, 21.0
Education	3,000
Higher Secondary +		—	—
Illiterate		1.29	1.06, 1.58	0.010
Primary		0.89	0.69, 1.15	0.4
Secondary		0.75	0.58, 0.97	0.031
BMI	2,922	1.12	1.09, 1.14	<0.001	1.11	1.09, 1.13	<0.001
LDL	3,000	1.01	1.00, 1.01	<0.001	1.00	1.00, 1.00	0.7
Smoking	3,000						0.8
No		—	—		—	—
Yes		1.27	0.97, 1.65	0.080	1.03	0.77, 1.38
Physical_Inactivity	3,000
No		—	—
Yes		1.05	0.86, 1.28	0.6
¹ OR = Odds Ratio, CI = Confidence Interval

tbl_strata() for stratified tables

df |>
  mutate(Raised_BP = if_else(Raised_BP == 1, "Raised", "Normal")) |>
  tbl_strata(
    strata = District,
    ~tbl_summary(.x, by = Raised_BP,
                 percent = "row",
                 missing = "no",
                 include = c(Age_Group, Gender),
                 digits = list(all_categorical() ~ 1),
                 statistic = list(all_categorical() ~ "{p}")) |>
      modify_header(all_stat_cols() ~ "**{level}**")
  )

Characteristic	Balaghat		Bhopal		Damoh		Dhar		Dindori		Hoshangabad		Ratlam		Satna		Sheopur		Shivpuri
Characteristic	Normal¹	Raised¹	Normal¹	Raised¹	Normal¹	Raised¹	Normal¹	Raised¹	Normal¹	Raised¹	Normal¹	Raised¹	Normal¹	Raised¹	Normal¹	Raised¹	Normal¹	Raised¹	Normal¹	Raised¹
Age_Group
18-30	89.5	10.5	91.8	8.2	93.3	6.7	87.5	12.5	97.0	3.0	93.8	6.2	93.8	6.2	90.7	9.3	93.9	6.1	96.3	3.7
30-44	74.2	25.8	77.5	22.5	89.8	10.2	68.0	32.0	78.7	21.3	90.2	9.8	76.8	23.2	84.8	15.2	81.8	18.2	76.5	23.5
44-59	60.9	39.1	58.0	42.0	60.0	40.0	59.5	40.5	63.4	36.6	64.3	35.7	69.2	30.8	68.9	31.1	35.0	65.0	61.0	39.0
59-69	37.7	62.3	26.8	73.2	44.0	56.0	34.2	65.8	40.0	60.0	46.2	53.8	63.4	36.6	60.3	39.7	44.4	55.6	33.3	66.7
Gender
Men	67.2	32.8	69.9	30.1	87.1	12.9	63.2	36.8	66.7	33.3	78.7	21.3	73.0	27.0	79.0	21.0	71.1	28.9	62.5	37.5
Women	65.5	34.5	69.0	31.0	69.6	30.4	71.3	28.7	76.3	23.7	76.6	23.4	81.3	18.7	78.6	21.4	71.4	28.6	74.8	25.2
¹ %

In Closing

{gtsummary} website

http://www.danieldsjoberg.com/gtsummary/

Package Authors/Contributors

Daniel D. Sjoberg

Michael Curry

Joseph Larmarange

Jessica Lavery

Karissa Whiting

Emily C. Zabor

Xing Bai

Esther Drill

Jessica Flynn

Margie Hannum

Stephanie Lobaugh

Shannon Pileggi

Amy Tin

Gustavo Zapata Wainberg

Other Contributors

@ablack3, @ABorakati, @aghaynes, @ahinton-mmc, @aito123, @akarsteve, @akefley, @albertostefanelli, @alexis-catherine, @amygimma, @anaavu, @andrader, @angelgar, @arbet003, @arnmayer, @aspina7, @asshah4, @awcm0n, @barthelmes, @bcjaeger, @BeauMeche, @benediktclaus, @berg-michael, @bhattmaulik, @BioYork, @brachem-christian, @bwiernik, @bx259, @calebasaraba, @CarolineXGao, @ChongTienGoh, @Chris-M-P, @chrisleitzinger, @cjprobst, @clmawhorter, @CodieMonster, @coeus-analytics, @coreysparks, @ctlamb, @davidgohel, @davidkane9, @dax44, @dchiu911, @ddsjoberg, @DeFilippis, @denis-or, @dereksonderegger, @dieuv0, @discoleo, @djbirke, @dmenne, @ElfatihHasabo, @emilyvertosick, @ercbk, @erikvona, @eweisbrod, @feizhadj, @fh-jsnider, @ge-generation, @ghost, @gjones1219, @gorkang, @GuiMarthe, @hass91, @HichemLa, @hughjonesd, @iaingallagher, @ilyamusabirov, @IndrajeetPatil, @IsadoraBM, @j-tamad, @jalavery, @jeanmanguy, @jemus42, @jenifav, @jennybc, @JeremyPasco, @JesseRop, @jflynn264, @jjallaire, @jmbarajas, @jmbarbone, @JoanneF1229, @joelgautschi, @jojosgithub, @JonGretar, @jordan49er, @jthomasmock, @juseer, @jwilliman, @karissawhiting, @kendonB, @kmdono02, @kwakuduahc1, @lamhine, @larmarange, @leejasme, @loukesio, @lspeetluk, @ltin1214, @lucavd, @LuiNov, @maia-sh, @Marsus1972, @matthieu-faron, @mbac, @mdidish, @MelissaAssel, @michaelcurry1123, @mljaniczek, @moleps, @motocci, @msberends, @mvuorre, @myensr, @MyKo101, @oranwutang, @palantre, @Pascal-Schmidt, @pedersebastian, @perlatex, @philsf, @polc1410, @postgres-newbie, @proshano, @raphidoc, @RaviBot, @rich-iannone, @RiversPharmD, @rmgpanw, @roman2023, @ryzhu75, @sachijay, @saifelayan, @sammo3182, @sandhyapc, @sbalci, @sda030, @shannonpileggi, @shengchaohou, @ShixiangWang, @simonpcouch, @slb2240, @slobaugh, @spiralparagon, @StaffanBetner, @Stephonomon, @storopoli, @szimmer, @tamytsujimoto, @TarJae, @themichjam, @THIB20, @tibirkrajc, @tjmeyers, @tldrcharlene, @tormodb, @toshifumikuroda, @UAB-BST-680, @uakimix, @uriahf, @Valja64, @vvm02, @xkcococo, @yonicd, @yoursdearboy, @zabore, @zachariae, @zaddyzad, @zeyunlu, @zhengnow, @zlkrvsm, @zongell-star, and @Zoulf001.

Thank you

Ask on stackoverflow.com

Use the gtsummary tag

Hundreds of Qs already answered!

danieldsjoberg.com

@statistishdan

linkedin.com/in/ddsjoberg/

github.com/ddsjoberg

Public Health Data Tabulation with {gtsummary}

Introduction

Acknowledgements

Creators of original presentation

Daniel D. Sjoberg

Checklist

{gtsummary} overview

Example Dataset

Example Dataset

tbl_summary()

Basic tbl_summary()

Customize tbl_summary() output

Customize tbl_summary() output

Customize tbl_summary() output

Customize tbl_summary() output

Customize tbl_summary() output

{gtsummary} + formulas

Add-on functions in {gtsummary}

Update tbl_summary() with add_*()

Update tbl_summary() with add_*()

Update tbl_summary() with add_*()

Update tbl_summary() with add_*()

Update with bold_*()/italicize_*()

Update tbl_summary() with modify_*()

Column names

Update tbl_summary() with add_*()

Update tbl_summary() with add_*()

Add-on functions in {gtsummary}

Cross-tabulation with tbl_cross()

Continuous Summaries with tbl_continuous()

Survey data with tbl_svysummary()

Survey data with tbl_svysummary()

tbl_regression()

Traditional model summary()

Basic tbl_regression()

Customize tbl_regression() output

Supported models in tbl_regression()

Univariate models with tbl_uvregression()

tbl_merge()/tbl_stack()

tbl_merge() for side-by-side tables

tbl_merge() for side-by-side tables

tbl_strata() for stratified tables

In Closing

{gtsummary} website

Package Authors/Contributors

Other Contributors

Thank you

Update with bold_()/italicize_()