Implements the sample selection and assess external validity. This software is free for private, non-commercial use only. Please contact the authors for commercial usage.
This code describes respondent participation and creates three sets of variables describing the respondent characteristics. The first examines survey participation and confirms the assignment of the 18 quotas, 4 blocks, and coma quiz. The second creates 11 categorical variables to assess dropout biases and to comparison the analytical sample with the US 2021 ACS. These results are shown in three Appendix tables. The third creates five categorical variables to produce Table 1 for the primary manuscript describing the analytical sample. This code does not confirm the experimental design or examine response behaviors or preference evidence.
# set directory
setwd("C:/Users/maksat/Box Sync/USF PC/EuroQol/Kaisen task/paper1/Wave 1/markdown")
# clear memory
rm(list = ls())
# load libraries
library(gtsummary) # for command tbl_summary()
library(reshape2) # for command mint and dcast
library(writexl) # for command write_xlsx
library(dplyr) # for command bind_rows
library(stringr) # for command substr and str_sub
library(expss) # for command cross_cases
library(readxl) # for command read_excel
Below, we present data notations and definitions of new variables created.
Subject, i = 1 to N; task, t = 1 to T;
T1 is the number of coma and paired comparison tasks including warmup.
T2 is the number of kaizen tasks including warmup.
‘freqpc’ is the frequency of paired comparison responses to inform dropout
‘freqkz’ is the frequency of kaizen task responses to inform dropout
‘dropout_cat’ is an indicator variable for dropouts by
component:
0 respondent completed coma comparison, paired comparison and kaizen
tasks
1 respondents who dropped out before completing coma quiz
2 respondents who dropped out after completing coma quiz, but before CC
3 respondents who dropped out during coma component
4 respondents who dropped out during paired comparison
5 respondents who dropped out during kaizen tasks
6 respondents who dropped out during follow-up
‘dropout’ is a binary 0/1 variable for respondent who dropped
out
1 the respondent dropped out
0 the respondent completed the survey
Three demographic variable to identify respondent quota
‘female’ is an indicator variable for female respondents (1: Female, 0:
Other)
‘raceth’ (1: Hispanic, 2: non-Hispanics Black, 3: Other)
‘age_range’ (1: 18 to 34, 2: 35 to 54, 3: 55 +)
‘actual_quota’ shows the actual quota to be assigned to the
respondent
‘assigned_quota’ shows the quota that was assigned to the
respondent
‘quota_confirmed’ is an indicator variable showing whether the
respondent
has the same ‘assigned_quota’ and ‘actual_quota’
‘actual_subjectindex’ shows the actual block to be assigned to the
respondent
‘subjectindex_confirmed’ is an indicator variable showing whether
the
respondent has the same ‘subjectindex’ and ‘actual_subjectindex’
‘attempts’ calculates how many coma quiz attempts each respondent took
‘race_cat’ will be categorized as following:
0 the respondent did not choose any race
1 the respondent chose White alone
2 the respondent chose Black or African American alone
3 the respondent chose American Indian or Alaska Native alone
4 the respondent chose Asian alone
5 the respondent chose Native Hawaiian or Other Pacific Islander
alone
6 the respondent chose Some other race alone
7 the respondent chose Two or more races
‘hispanic’ is an indicator variable for Hispanic respondents
1 the respondent is Hispanic
0 the respondent is non-Hispanic
‘region’ is an indicator variable for the US regions
1 the state is in the Northeast region of the US
2 the state is in the Midwest region of the US
3 the state is in the South region of the US
4 the state is in the West region of the US
‘work_status’ is a categorical variable indicating work
categories
1 working for pay at a job or business
2 with a job or business, but not at work
3 looking for work
4 working, but not for pay, at a family-owned job or business
5 not working at a job or business and not looking for work
6 retired
7 don’t know/not sure
8 refuse to answer
‘total_children’ shows the number of children the respondent has
‘community’ is a categorical variable for the community the
respondent lives in 1 urban
2 suburban
3 rural
4 don’t know/not sure/refuse to answer
‘marital_status’ is a categorical variable for mariatal status
1 married
2 divorced
3 separated
4 never married
5 living with a partner
6 don’t know/not sure
7 refuse to answer
‘educational’ is a categorical variable for educational
attainment
1 18 to 24 years of age
2 less than 9th grade and over 25 years of age
3 9th to 12th grade, no diploma and over 25 years of age
4 High school graduate and over 25 years of age
5 Some college, no degree and over 25 years of age
6 Associate’s degree and over 25 years of age
7 Bachelor’s degree and over 25 years of age
8 Graduate or professional degree and over 25 years of age
‘income’ is a categorical variable for respondent’s income
1 Less than 10,000 USD
2 10,000 USD to 49,999 USD
3 50,000 USD to 74,999 USD
4 75,000 USD to 99,999 USD
5 100,000 USD to 149,999 USD
6 150,000 USD or more
‘political_aff’ is a categorical varaiable for Political
affiliation
1 a Democrat (strong)
2 a Democrat (moderate)
3 a Republican (strong)
4 a Republican (moderate)
5 an Independent (lean Democrat)
6 an Independent (lean Republican)
7 an Independent (don’t lean)
8 don’t know/not sure/refuse to answer
‘political_pers’ is a categorical variable for Political
perspective
1 a liberal (very)
2 a liberal (somewhat)
3 a conservative (very)
4 a conservative (somewhat)
5 a moderate (lean liberal)
6 a moderate (lean conservative)
7 a moderate (don’t lean)
8 don’t know/not sure/refuse to answer
Note: ‘hispanic’ is already defined above
age_range2 is an indicator variable
1: 18 to 24
2: 25 to 34
3: 35 to 54
4: 55 to 64
5: 65+
‘race_cat2’ is an indicator variable
1 White alone
2 Black or African American alone
3 Asian alone
4 Other
‘male’ is an indicator variable
0: female/other
1: male
‘educ2’ is an indicator variable
1: high school or less
2: associate/some college
3: bachelors or more
# specify the setting
N = 1681 ## number of survey records in final dataset
Z1 = 144 ## the number of respondent-specific variables in final dataset
Z2 = 21 ## the number of task-specific variables in final pc and kz datasets
Z3 = 3 ## the number of temporal variables (e.g., page times) in final dataset
T1 = 17 ## number of paired comparisons per respondent
T2 = 11 ## number of kaizen tasks per respondent
spec = c(N, Z1, Z2, Z3, T1, T2) ## summary of setting specification
# function to map EQ-5D attributes and levels
map_levels <- function(var, levels_map) {
levels <- word(var, 3)
ifelse(levels %in% names(levels_map), levels_map[levels], "")
}
# load the respondent data
resp_i = read.csv("resp_i_231010.csv")
# load the paired comparison data
pc_it = read.csv("pc_it_231010.csv")
# load the kaizen task data
kz_it = read.csv("kz_it_231010.csv")
# load the ACS tables (Appendix 1)
table1_acs = data.frame(read_excel("ACS230923_2.xlsx", sheet = "Table 1 and 3",
skip = 2, col_names = F)[1:24,6:7])
table2_acs = data.frame(t(read_excel("ACS230923_2.xlsx", sheet = "Table 2",
col_names = F)[7:8,1:51]))
table3_acs = data.frame(read_excel("ACS230923_2.xlsx", sheet = "Table 1 and 3",
skip = 2, col_names = F)[1:24,15:16])
# Blocks (from design matrix, Appendix 2)
block_b <- read_excel("230824_ExpDes_App2.xlsx",
sheet = "design matrix for instrument", skip=4)
In this section we recode the data to create the variables defined above
# create 'freqpc' and 'freqkz'
resp_i = merge(resp_i, as.data.frame(table(pc_it$id, dnn = list("id")),
responseName = "freqpc"), by="id", all.x=T)
resp_i$freqpc[is.na(resp_i$freqpc)] <- 0
resp_i = merge(resp_i, as.data.frame(table(kz_it$id, dnn = list("id")),
responseName = "freqkz"), by="id", all.x=T)
resp_i$freqkz[is.na(resp_i$freqkz)] <- 0
# page number 52 is the coma warm up, and 54 is the actual coma comparison, \\
# 60 is either pc or kz warm up, 62 is pc or kz (main)
# create 'dropout_cat'
resp_i$dropout_cat = ifelse(resp_i$lastpage == 101, 0, # completed
ifelse(resp_i$exclusion == "N" & resp_i$lastpage <= 22 &
resp_i$Exclusion2 != "Y", 1, # dropped out before completing coma quiz
ifelse(resp_i$lastpage > 22 &
resp_i$lastpage <52, 2, # dropped out after completing coma quiz, but before CC
ifelse(resp_i$lastpage > 22 & resp_i$freqpc < 6, 3, # dropped out during CC
ifelse(resp_i$freqpc >= 6 & resp_i$freqpc != 17, 4, # dropped out during PC
ifelse(resp_i$freqpc >= 6 & resp_i$freqkz != 11, 5, # dropped out during KZ
ifelse(resp_i$freqpc == 17 & resp_i$freqkz == 11 & # dropped out during follow-up
resp_i$lastpage != 101, 6, ""))))))) # did not drop out
# create 'dropout'
resp_i$dropout = ifelse(resp_i$dropout_cat == 0, 0,
ifelse(resp_i$dropout_cat > 0, 1, "")) # create binary variable from 'dropout_cat'
# create 'assigned_quota'
resp_i$assigned_quota = apply(subset(resp_i, select = c(quota01:quota18)), 1,
function(x) ifelse(x[1]!="",as.integer(which(x == "yes")),""))
# create 'female'
resp_i$female = as.numeric(ifelse(resp_i$G02Q03 == "female", 1,
ifelse(resp_i$G02Q03 == "", "",0)))
# create 'raceth'
resp_i$raceth = as.numeric(ifelse((resp_i$G02Q05.AO01. == "N/A") |
(resp_i$G02Q04==""), "", ifelse(substr(resp_i$G02Q04,1,1) == "y", 1,
ifelse(resp_i$G02Q05.AO01. == "Yes", 2, 3))))
# create 'age_range'
resp_i$age_range = as.numeric(cut(resp_i$G02Q02, breaks = c(18, 34, 54,
max(resp_i$G02Q02, na.rm = T)), labels = c(1, 2, 3), include.lowest = T))
# create all combinations of quotas and order them by first:female, \\
# second:raceth, third:age_range
quota_comb = expand.grid(female = c(0,1), raceth = c(1, 2, 3), age_range =
c(1, 2, 3))[with(expand.grid(female = c(0,1), raceth = c(1, 2, 3),
age_range = c(1, 2, 3)), order(-female, raceth, age_range)),]
# create 'actual_quota' from these three variables: 'female', 'age_range', \\
# and 'raceth'
quota_comb$actual_quota = as.numeric(1:nrow(quota_comb)) # label quota numbers
resp_i = merge(resp_i, quota_comb, by = c("female", "raceth", "age_range"),
all.x = T)
# combine pc and kz data, merge with block_b, and create actual_subjectindex
block_b = block_b %>% rename("actual_subjectindex" = "b")
block_b$vec <- paste(do.call(paste, c(block_b[, seq(2,56,by=2)], sep = " ")),
do.call(paste, c(block_b[, seq(3,57,by=2)], sep = " ")))
block_b <- block_b[,c("actual_subjectindex", "vec")]
block_i = rbind(pc_it[,c("id", "task", "alt1", "alt2")], kz_it[,c("id", "task", "alt1",
"alt2")])[with(rbind(pc_it[,c("id", "task", "alt1", "alt2")],
kz_it[,c("id", "task", "alt1", "alt2")]), order(id)),]
block_i <- dcast(block_i, id~task, value.var=c("alt1","alt2"))
block_i$vec <- do.call(paste, c(block_i[,2:57], sep = " "))
block_i = merge(block_i, block_b, by = "vec", all.x = T)
block_i <- block_i[,c("id", "actual_subjectindex")]
resp_i = merge(resp_i, block_i, by = "id", all.x = T)
# create a data consisting of quiz variables
quiz = subset(resp_i, select = c(id, G04Q01.SQ001.:G04Q03.SQ005.))
# convert true/false into 0/1
quiz[,2:ncol(quiz)] = apply(quiz[,2:ncol(quiz)], 2, function(x) as.integer(as.logical(x)))
# rename columns to make it easier to understand
names(quiz) = c("id", paste(rep(paste0("quiz",1:3),each = 5),"_q",1:5, sep = ""))
# transform quiz into 0/1 where 1 represents whether answer is correct
quiz[,2:16] = t(apply(quiz, 1, function(x) x[2:16] == rep(c(0,1,0,1,0),3)))*1
# create 'attemps'
quiz$attempts = ifelse(is.na(quiz$quiz1_q1), 0,
ifelse(is.na(quiz$quiz2_q1), 1,
ifelse(is.na(quiz$quiz3_q1), 2, 3)))
# 3 age groups are already recoded above under 'age_range'
# gender (1: female; 2: male; 3: other/prefer not to say)
resp_i$gender = ifelse(resp_i$G02Q03 == "", "", ifelse(resp_i$G02Q03 ==
"female", 1, ifelse(resp_i$G02Q03 == "male", 2, 3)))
# create 'race_cat'
race = subset(resp_i, select = c(id, G02Q05.AO00.:G02Q05.AO14.))
names(race) = c("id", "White", "Black", "AmerIndianAlaska", "Chinese",
"Filipino", "AsianInd", "Vietnamese", "Korean", "Japanese",
"NatHawaii", "Samoan", "Chamorro", "OtherAsian", "OtherPacific",
"SomeOtherRace")
race[,2:ncol(race)] = as.matrix(race[,2:ncol(race)] == "Yes")*1
race$sum = apply(race[,2:ncol(race)], 1, function(x) sum(x))
race$race_cat = ifelse(race$sum == 0, 0, "")
race$race_cat = ifelse(race$White == 1 & race$sum == 1, 1, race$race_cat)
race$race_cat = ifelse(race$Black == 1 & race$sum == 1, 2, race$race_cat)
race$race_cat = ifelse(race$AmerIndianAlaska == 1 & race$sum == 1, 3, race$race_cat)
race$race_cat = ifelse((race$Chinese == 1 | race$Filipino == 1 |
race$AsianInd == 1 | race$Vietnamese == 1 | race$Korean == 1 |
race$Japanese == 1 | race$OtherAsian == 1) & race$sum == 1, 4, race$race_cat)
race$race_cat = ifelse((race$NatHawaii == 1 | race$Samoan == 1 |
race$Chamorro == 1 | race$OtherPacific == 1) & race$sum == 1, 5, race$race_cat)
race$race_cat = ifelse(race$SomeOtherRace == 1 & race$sum == 1, 6, race$race_cat)
race$race_cat = ifelse(race$sum >= 2, 7, race$race_cat)
# merge 'race_cat' with 'resp_i'
resp_i = merge(resp_i, subset(race, select = c(id, race_cat)),
by = "id", all.x = T)
# create 'hispanic'
resp_i$hispanic = ifelse(resp_i$raceth == "" | is.na(resp_i$raceth), "",
ifelse(resp_i$raceth == 1, 1, 0))
# create a list consisting of four US regions
states = subset(resp_i, select = c(id, G02Q01))
regions = list(
northeast = c("Connecticut", "Maine", "Massachusetts", "New Hampshire",
"Rhode Island", "Vermont", "New York", "Pennsylvania", "New Jersey"),
midwest = c("Indiana", "Illinois", "Michigan", "Ohio", "Wisconsin", "Iowa",
"Kansas", "Minnesota", "Missouri", "Nebraska", "North Dakota",
"South Dakota"),
south = c("Delaware", "Washington, DC", "Florida", "Georgia", "Maryland",
"North Carolina", "South Carolina", "Virginia", "West Virginia",
"Alabama", "Kentucky", "Mississippi", "Tennessee", "Arkansas",
"Louisiana", "Oklahoma", "Texas"),
west = c("Arizona", "Colorado", "Idaho", "New Mexico", "Montana", "Utah",
"Nevada", "Wyoming", "Alaska", "California", "Hawaii", "Oregon",
"Washington"))
# create a 0/1 variable or each state indicating which region they belong
for(region in names(regions)) {
resp_i[region] = as.integer(as.logical(sapply(resp_i$G02Q01,
function(x) any(trimws(x) %in% regions[[region]]))))}
# convert 0/1 to a categorical variable
# (1: Northeast, 2: Midwest, 3: South, 4: West)
resp_i$region = apply(resp_i[,c("northeast", "midwest", "south", "west")], 1,
function(x) ifelse(x[1] == 1, 1, ifelse(x[2] == 1, 2, ifelse(x[3] == 1, 3,
ifelse(x[4] == 1, 4, "")))))
# rename columns 'G11Q03.SQ001_SQ001' through 'G11Q03.SQ006_SQ004' to make it \\
# easier to read. The child categories are 'Newborn', 'Toddler', 'Preschool',
# 'Elementary', 'Adolescent', and 'Adult'. \\
# The numbers after each child category are defined as following:\\
# 1 They do not live with me \\
# 2 They live with me part-time
# 3 They live with me full-time
# 4 They are no longer living
resp_i = resp_i %>% rename("Newborn1" = "G11Q03.SQ001_SQ001.",
"Newborn2" = "G11Q03.SQ001_SQ002.", "Newborn3" = "G11Q03.SQ001_SQ003.",
"Newborn4" = "G11Q03.SQ001_SQ004.", "Toddler1" = "G11Q03.SQ002_SQ001.",
"Toddler2" = "G11Q03.SQ002_SQ002.", "Toddler3" = "G11Q03.SQ002_SQ003.",
"Toddler4" = "G11Q03.SQ002_SQ004.", "Preschool1" = "G11Q03.SQ003_SQ001.",
"Preschool2" = "G11Q03.SQ003_SQ002.", "Preschool3" = "G11Q03.SQ003_SQ003.",
"Preschool4" = "G11Q03.SQ003_SQ004.", "Elementary1" = "G11Q03.SQ004_SQ001.",
"Elementary2" = "G11Q03.SQ004_SQ002.", "Elementary3" = "G11Q03.SQ004_SQ003.",
"Elementary4" = "G11Q03.SQ004_SQ004.", "Adolescent1" = "G11Q03.SQ005_SQ001.",
"Adolescent2" = "G11Q03.SQ005_SQ002.", "Adolescent3" = "G11Q03.SQ005_SQ003.",
"Adolescent4" = "G11Q03.SQ005_SQ004.", "Adult1" = "G11Q03.SQ006_SQ001.",
"Adult2" = "G11Q03.SQ006_SQ002.", "Adult3" = "G11Q03.SQ006_SQ003.",
"Adult4" = "G11Q03.SQ006_SQ004.")
# create 'total_children'
resp_i$total_children = apply(subset(resp_i, select = c(Newborn1:Adult4)), 1,
function(x) sum(x, na.rm = T))
# create 'work_status'
resp_i$work_status = ifelse(resp_i$G11Q04 == "", "", ifelse(resp_i$G11Q04 ==
"working for pay at a job or business", 1, ifelse(resp_i$G11Q04 ==
"with a job or business, but not at work", 2, ifelse(resp_i$G11Q04 ==
"looking for work", 3, ifelse(resp_i$G11Q04 ==
"working, but not for pay, at a family-owned job or business", 4,
ifelse(resp_i$G11Q04 == "not working at a job or business and not looking for work", 5,
ifelse(resp_i$G11Q04 == "retired", 6, ifelse(resp_i$G11Q04 ==
"don't know/not sure", 7, 8))))))))
# create 'community'
resp_i$community = ifelse(resp_i$G11Q05 == "", "", ifelse(resp_i$G11Q05 ==
"urban", 1, ifelse(resp_i$G11Q05 == "suburban", 2, ifelse(resp_i$G11Q05
== "rural", 3, 4))))
# create 'marital_status'
resp_i$marital_status = ifelse(resp_i$G11Q06 == "", "", ifelse(resp_i$G11Q06 ==
"now married" | resp_i$G11Q06 == "married", 1, ifelse(resp_i$G11Q06 == "widowed", 2,
ifelse(resp_i$G11Q06 == "divorced", 3, ifelse(resp_i$G11Q06 ==
"separated", 4, ifelse(resp_i$G11Q06 == "never married", 5, 6))))))
# create 'education'
resp_i$education = ifelse(resp_i$G11Q07 == "", "", ifelse(resp_i$G02Q02<24,
1 , ifelse((resp_i$G02Q02>=25) & (resp_i$G11Q07 == "less than 1st grade" |
resp_i$G11Q07 == "1st, 2nd, 3rd, or 4th grade" | resp_i$G11Q07 ==
"5th or 6th grade" | resp_i$G11Q07 == "7th and 8th grade"), 2,
ifelse((resp_i$G02Q02>=25) & (resp_i$G11Q07 == "9th grade" |
resp_i$G11Q07 == "10th grade" | resp_i$G11Q07 == "11th grade" |
resp_i$G11Q07 == "12th grade no diploma"), 3, ifelse((resp_i$G02Q02>=25) &
(resp_i$G11Q07 == "high school graduate - high school diploma or equivalent"), 4,
ifelse((resp_i$G02Q02>=25) & (resp_i$G11Q07 == "some college but no degree"), 5,
ifelse((resp_i$G02Q02>=25) & (resp_i$G11Q07 ==
"associate degree in college - occupation/vocation program" |
resp_i$G11Q07 == "associate degree in college - academic program"), 6,
ifelse((resp_i$G02Q02>=25) & (resp_i$G11Q07 ==
"bachelor's degree (for example: BA, AB, BS)"), 7,8))))))))
# create 'income'
resp_i$income = ifelse(resp_i$G11Q08 == "", "", ifelse(resp_i$G11Q08 ==
"under $10,000", 1, ifelse(resp_i$G11Q08 == "$10,000 to under $20,000" |
resp_i$G11Q08 == "$20,000 to under $30,000" | resp_i$G11Q08 ==
"$30,000 to under $40,000" | resp_i$G11Q08 == "$40,000 to under $50,000", 2,
ifelse(resp_i$G11Q08 == "$50,000 to under $75,000",3,
ifelse(resp_i$G11Q08 == "$75,000 to under $100,000", 4, ifelse(
resp_i$G11Q08 == "$100,000 to under $150,000", 5, 6))))))
# create 'political_aff'
resp_i$political_aff = ifelse(resp_i$G11Q09 == "", "", ifelse(
resp_i$G11Q09 == "a Democrat" & resp_i$G11Q10A == "a strong Democrat", 1,
ifelse(resp_i$G11Q09 == "a Democrat" & resp_i$G11Q10A == "a moderate Democrat",
2, ifelse(resp_i$G11Q09 == "a Republican" & resp_i$G11Q10C ==
"a strong Republican", 3, ifelse(resp_i$G11Q09 == "a Republican" &
resp_i$G11Q10C == "a moderate Republican", 4, ifelse(resp_i$G11Q09 ==
"an independent" & resp_i$G11Q10B =="lean Democrat", 5, ifelse(
resp_i$G11Q09 == "an independent" & resp_i$G11Q10B =="lean Republican", 6,
ifelse(resp_i$G11Q09 == "an independent" & resp_i$G11Q10B =="don't lean", 7,
8))))))))
# create 'political_pers'
resp_i$political_pers = ifelse(resp_i$G11Q11 == "", "", ifelse(
resp_i$G11Q11 == "a liberal" & resp_i$G11Q10A == "very liberal", 1,
ifelse(resp_i$G11Q11 == "a liberal" & resp_i$G11Q10A == "somewhat liberal",
2, ifelse(resp_i$G11Q11 == "a conservative" & resp_i$G11Q11C ==
"very conservative", 3, ifelse(resp_i$G11Q11 == "a conservative" &
resp_i$G11Q11C == "somewhat conservative", 4, ifelse(resp_i$G11Q11 ==
"a moderate" & resp_i$G11Q11B =="lean liberal", 5, ifelse(
resp_i$G11Q11 == "a moderate" & resp_i$G11Q11B =="lean conservative", 6,
ifelse(resp_i$G11Q11 == "a moderate" & resp_i$G11Q11B =="don't lean", 7,
8))))))))
# create 'age_range2'
resp_i$age_range2 = as.numeric(cut(resp_i$G02Q02, breaks = c(18, 24, 34, 54, 64,
max(resp_i$G02Q02, na.rm = T)), labels = c(1, 2, 3, 4, 5), include.lowest = T))
# create 'male'
resp_i$male = ifelse(resp_i$G02Q03 == "male", 1, ifelse(resp_i$G02Q03 ==
"female" | resp_i$G02Q03 == "other", 2 , NA))
# create 'race_cat2'
resp_i$race_cat2 = ifelse(resp_i$race_cat == "", "", ifelse(resp_i$race_cat ==
1, 1, ifelse(resp_i$race_cat == 2, 2, ifelse(
resp_i$race_cat == 4, 3, 4))))
# create 'educ2'
resp_i$educ2 = as.numeric(cut(as.numeric(resp_i$education),
breaks = c(1, 4, 6 , 8), labels = c(1, 2, 3), include.lowest = T))
# create 'quota_confirmed'
resp_i$quota_confirmed = ifelse(resp_i$assigned_quota ==
resp_i$actual_quota, 1, 0)
ifelse(length(table(resp_i$quota_confirmed[resp_i$incomplete == 0])) == 1,
"Quota confirmed", "Quota is not confirmed")
## [1] "Quota confirmed"
# create 'subjectindex_confirmed'
resp_i$subjectindex_confirmed = ifelse(resp_i$subjectindex ==
resp_i$actual_subjectindex, 1, 0)
ifelse(length(table(resp_i$quota_confirmed[resp_i$incomplete == 0])) == 1,
"Subjectindex confirmed", "Subjectindex is not confirmed")
## [1] "Subjectindex confirmed"
quiz[nrow(quiz) + 1,] = c("Total", colSums(quiz[,2:17], na.rm = T))
# coma quiz analysis of completion by question and attempt
quiz_table = quiz[-nrow(quiz),] %>% select(c(quiz1_q1:attempts)) %>% tbl_summary(missing = "no")
quiz_table
| Characteristic | N = 1,6811 |
|---|---|
| quiz1_q1 | |
| 0 | 762 (63%) |
| 1 | 455 (37%) |
| quiz1_q2 | |
| 0 | 136 (11%) |
| 1 | 1,081 (89%) |
| quiz1_q3 | |
| 0 | 251 (21%) |
| 1 | 966 (79%) |
| quiz1_q4 | |
| 0 | 55 (4.5%) |
| 1 | 1,162 (95%) |
| quiz1_q5 | |
| 0 | 378 (31%) |
| 1 | 839 (69%) |
| quiz2_q1 | |
| 0 | 393 (40%) |
| 1 | 580 (60%) |
| quiz2_q2 | |
| 0 | 144 (15%) |
| 1 | 829 (85%) |
| quiz2_q3 | |
| 0 | 244 (25%) |
| 1 | 729 (75%) |
| quiz2_q4 | |
| 0 | 58 (6.0%) |
| 1 | 915 (94%) |
| quiz2_q5 | |
| 0 | 315 (32%) |
| 1 | 658 (68%) |
| quiz3_q1 | |
| 0 | 298 (48%) |
| 1 | 317 (52%) |
| quiz3_q2 | |
| 0 | 113 (18%) |
| 1 | 502 (82%) |
| quiz3_q3 | |
| 0 | 191 (31%) |
| 1 | 424 (69%) |
| quiz3_q4 | |
| 0 | 57 (9.3%) |
| 1 | 558 (91%) |
| quiz3_q5 | |
| 0 | 246 (40%) |
| 1 | 369 (60%) |
| attempts | |
| 0 | 464 (28%) |
| 1 | 244 (15%) |
| 2 | 358 (21%) |
| 3 | 615 (37%) |
| 1 n (%) | |
# create variables on EQ-5D-5L and EQ-VAS
eq5d = subset(resp_i, select = c(id, EQ5D5LMO:EQ5D5LVAS))
# Create a data frame with the levels mapping
levels_map <- c(
"no" = 1,
"slight" = 2,
"moderate" = 3,
"severe" = 4,
"unable" = 5,
"extreme" = 5,
"not" = 1,
"slightly" = 2,
"moderately" = 3,
"severely" = 4,
"extremely" = 5
)
# Apply the mapping function to your columns
eq5d$EQ5D5LMO <- map_levels(eq5d$EQ5D5LMO, levels_map)
eq5d$EQ5D5LSC <- map_levels(eq5d$EQ5D5LSC, levels_map)
eq5d$EQ5D5LUA <- map_levels(eq5d$EQ5D5LUA, levels_map)
eq5d$EQ5D5LPD <- map_levels(eq5d$EQ5D5LPD, levels_map)
eq5d$EQ5D5LAD <- map_levels(eq5d$EQ5D5LAD, levels_map)
# create EQ-5D
eq_table = eq5d[!is.na(eq5d$EQ5D5LVAS),] %>% select(c(EQ5D5LMO:EQ5D5LVAS)) %>%
tbl_summary(missing = "no")
eq_table
| Characteristic | N = 1,2281 |
|---|---|
| EQ5D5LMO | |
| 1 | 972 (79%) |
| 2 | 159 (13%) |
| 3 | 67 (5.5%) |
| 4 | 21 (1.7%) |
| 5 | 9 (0.7%) |
| EQ5D5LSC | |
| 1 | 1,105 (90%) |
| 2 | 72 (5.9%) |
| 3 | 35 (2.9%) |
| 4 | 9 (0.7%) |
| 5 | 7 (0.6%) |
| EQ5D5LUA | |
| 1 | 931 (76%) |
| 2 | 180 (15%) |
| 3 | 81 (6.6%) |
| 4 | 26 (2.1%) |
| 5 | 10 (0.8%) |
| EQ5D5LPD | |
| 1 | 568 (46%) |
| 2 | 399 (32%) |
| 3 | 191 (16%) |
| 4 | 47 (3.8%) |
| 5 | 23 (1.9%) |
| EQ5D5LAD | |
| 1 | 637 (52%) |
| 2 | 268 (22%) |
| 3 | 196 (16%) |
| 4 | 81 (6.6%) |
| 5 | 46 (3.7%) |
| EQ5D5LVAS | 79 (65, 89) |
| 1 n (%); Median (IQR) | |
dropout_analysis = resp_i[resp_i$dropout != "",] %>% select(c(dropout_cat, pc_cnum2)) %>%
mutate(pc_cnum2 = case_when(pc_cnum2 == "2"~"Paired comparison first",
pc_cnum2 == "3"~"Kaizen task first"),
dropout_cat = case_when(dropout_cat == "0"~"Completed",
dropout_cat == "4"~"During paired comparison",
dropout_cat == "5"~"During kaizen task",
dropout_cat == "6"~"During follow-up"),
pc_cnum2 = factor(pc_cnum2, levels = c("Paired comparison first",
"Kaizen task first")),
dropout_cat = factor(dropout_cat, levels = c("Completed",
"During paired comparison",
"During kaizen task", "During follow-up"))) %>%
tbl_summary(by = "pc_cnum2", missing = "no", statistic = list(all_categorical()~"{p}% ({n})"),
label = list(dropout_cat~"Dropout category")) %>% add_overall() %>%
add_p(everything() ~ "chisq.test")
options(knitr.kable.NA = '')
dropout_analysis %>% knitr:: kable()
| Characteristic | Overall, N = 651 | Paired comparison first, N = 319 | Kaizen task first, N = 332 | p-value |
|---|---|---|---|---|
| Dropout category | 0.093 | |||
| Completed | 97% (631) | 97% (310) | 97% (321) | |
| During paired comparison | 1.7% (11) | 0.9% (3) | 2.4% (8) | |
| During kaizen task | 0.6% (4) | 1.3% (4) | 0% (0) | |
| During follow-up | 0.8% (5) | 0.6% (2) | 0.9% (3) |
We now create and present tables for appendices
# create table1_appx and compare complete vs dropped out by demographic characteristics
table1_appx = resp_i %>% select(c(dropout, age_range, gender,
race_cat,hispanic, region)) %>% mutate(dropout = case_when(
dropout == "1" ~ "Dropped out", dropout == "0" ~ "Completed"), age_range =
case_when(age_range == 1 ~ "18 to 34", age_range == 2 ~ "35 to 54",
age_range == 3 ~ "55+"), gender = case_when(gender == "1" ~ "Female",
gender == "2" ~ "Male", gender == "3" ~ "Other/prefer not to say"),
race_cat = case_when(race_cat == "1" ~ "White alone", race_cat == "2" ~
"Black or African American alone", race_cat == "3" ~
"American Indian or Alaska Native alone", race_cat == "4" ~ "Asian alone",
race_cat == "5" ~ "Native Hawaiian or Other Pacific Islander alone",
race_cat == "6" ~ "Some other race alone", race_cat == "7" ~
"Two or more races"), hispanic = case_when(hispanic == "0" ~ "Other",
hispanic == "1" ~"Hispanic or Latino"), region = case_when(region ==
"1"~"Northeast", region == "2"~"Midwest", region == "3"~"South",
region=="4"~"West"), region = factor(region, levels = c("Northeast",
"Midwest", "South","West")),race_cat = factor(race_cat,
levels = c("White alone", "Black or African American alone",
"American Indian or Alaska Native alone", "Asian alone",
"Native Hawaiian or Other Pacific Islander alone", "Some other race alone",
"Two or more races"))) %>%
tbl_summary(by = dropout, label = list(age_range~"Age range",
gender~"Gender", race_cat~"Race category",
hispanic~"Hispanic", region~"Region"), missing = "no",
statistic = list(all_categorical()~"{p} ({n})")) %>%
add_p(everything() ~ "chisq.test")
# create table1_2appx to extract only the complete, so that i can compare complete vs ACS
table1_2appx = resp_i[resp_i$dropout==0 & !is.na(resp_i$dropout),] %>% select(c(age_range, gender,
race_cat,hispanic, region)) %>% mutate(age_range =
case_when(age_range == 1 ~ "18 to 34", age_range == 2 ~ "35 to 54",
age_range == 3 ~ "55+"), gender = case_when(gender == "1" ~ "Female",
gender == "2" ~ "Male", gender == "3" ~ "Other/prefer not to say"),
race_cat = case_when(race_cat == "1" ~ "White alone", race_cat == "2" ~
"Black or African American alone", race_cat == "3" ~
"American Indian or Alaska Native alone", race_cat == "4" ~ "Asian alone",
race_cat == "5" ~ "Native Hawaiian or Other Pacific Islander alone",
race_cat == "6" ~ "Some other race alone", race_cat == "7" ~
"Two or more races"), hispanic = case_when(hispanic == "0" ~ "Other",
hispanic == "1" ~"Hispanic or Latino"), region = case_when(region ==
"1"~"Northeast", region == "2"~"Midwest", region == "3"~"South",
region=="4"~"West"), region = factor(region, levels = c("Northeast",
"Midwest", "South","West")),race_cat = factor(race_cat,
levels = c("White alone", "Black or African American alone",
"American Indian or Alaska Native alone", "Asian alone",
"Native Hawaiian or Other Pacific Islander alone", "Some other race alone",
"Two or more races"))) %>% tbl_summary(label = list(age_range~"Age range",
gender~"Gender", race_cat~"Race category",
hispanic~"Hispanic", region~"Region"), missing = "no",
statistic = list(all_categorical()~"{p}"))
# extract the column for 'completed' from table1_2appx and calculate the \\
# p-value for each of the demographic characteristics
table1_acs$Completed = as.numeric(table1_2appx$table_body$stat_0)
table1_acs[1,4] = chisq.test(table1_acs[2:4,2:3])$p.value
table1_acs[5,4] = chisq.test(table1_acs[6:7,2:3])$p.value
table1_acs[9,4] = chisq.test(table1_acs[10:16,2:3])$p.value
table1_acs[17,4] = chisq.test(table1_acs[18:19,2:3])$p.value
table1_acs[20,4] = chisq.test(table1_acs[21:24,2:3])$p.value
# combine ACS, Complete and Dropout columns into one table and rename columns
table1_acs$Completed = table1_appx$table_body$stat_1
table1_acs$Dropped_out = table1_appx$table_body$stat_2
table1_acs$p_value = as.numeric(table1_appx[["table_body"]][["p.value"]])
table1_acs = table1_acs[,c(1,3,5,6,2,4)]
names(table1_acs) = c("Quota", "Completed", "Dropped-out", "p-value", "ACS",
"p-value_acs")
# extract the number of respondents for each column from table1_appx and paste it \\
# into table1_acs
a = c("N",table1_appx[["df_by"]][["n"]][1], table1_appx[["df_by"]][["n"]][2],"","","")
table1_acs = rbind(a, table1_acs)
table1_acs$`p-value` = as.character(round(as.numeric(table1_acs$`p-value`), digits = 4))
table1_acs$ACS = as.character(round(as.numeric(table1_acs$ACS), digits = 2))
table1_acs$`p-value_acs` = as.character(round(as.numeric(table1_acs$`p-value_acs`), digits = 4))
table1_acs[is.na(table1_acs)] = ""
# print Table 1 of the appendix
table1_acs %>% knitr:: kable()
| Quota | Completed | Dropped-out | p-value | ACS | p-value_acs |
|---|---|---|---|---|---|
| N | 631 | 168 | |||
| Age in years | 4e-04 | 0.2558 | |||
| 18 to 34 years | 35 (224) | 22 (37) | 29.13 | ||
| 35 to 54 years | 37 (234) | 37 (62) | 32.66 | ||
| 55 years and over | 27 (173) | 41 (69) | 38.2 | ||
| Gender | 7e-04 | 0.9464 | |||
| Female | 49 (309) | 65 (109) | 50.98 | ||
| Male | 50 (314) | 33 (56) | 49.02 | ||
| Other/prefer not to say | 1.3 (8) | 1.8 (3) | |||
| Race | 0.0125 | 0.4773 | |||
| White alone | 75 (476) | 67 (113) | 63.66 | ||
| Black or African American alone | 12 (77) | 23 (39) | 11.8 | ||
| American Indian and Alask Native alone | 0.6 (4) | 0.6 (1) | 0.89 | ||
| Asian alone | 4.3 (27) | 1.2 (2) | 5.97 | ||
| Native Hawaiian and Other Pacific Islander alone | 0.2 (1) | 0 (0) | 0.14 | ||
| Some Other Race alone | 2.4 (15) | 3.0 (5) | 6.67 | ||
| Two or More Races | 4.9 (31) | 4.8 (8) | 10.87 | ||
| Ethnicity | 0.9662 | 0.3166 | |||
| Hispanic or Latino | 11 (72) | 12 (20) | 16.91 | ||
| Other | 89 (559) | 88 (148) | 83.09 | ||
| US regions | 0.2068 | 0.9507 | |||
| Northeast | 20 (126) | 18 (31) | 17.58 | ||
| Midwest | 22 (138) | 18 (30) | 20.67 | ||
| South | 38 (237) | 46 (78) | 38.12 | ||
| West | 21 (130) | 17 (29) | 23.64 |
# create table2_appx and compare complete vs dropped out by state
table2_appx = resp_i[resp_i$dropout != "",] %>% select(c(dropout, G02Q01)) %>%
tbl_summary(by = dropout, missing = "no",
statistic = list(all_categorical()~"{p} ({n})")) %>%
add_p(everything() ~ "chisq.test")
# create table2_2appx to extract only the complete, so that i can compare complete vs ACS
table2_2appx = resp_i[resp_i$dropout==0,] %>% select(c(G02Q01)) %>%
tbl_summary(statistic = list(all_categorical()~"{p}"))
# extract necessary variables from table2_appx and table2_2appx and \\
# paste them into table2_acs
table2_3appx = as.data.frame(table2_2appx[["table_body"]][["label"]])
table2_3appx$b = table2_2appx$table_body$stat_0
table2_3appx = table2_3appx[2:nrow(table2_3appx),]
names(table2_3appx) = c("States", "Completed")
rownames(table2_acs) = c(1:nrow(table2_acs))
names(table2_acs) = c("States", "ACS")
table2_acs$ACS = round(as.numeric(table2_acs$ACS), digits = 4)
table2_3appx$Completed = as.numeric(table2_3appx$Completed)
#table2_3appx$States = str_sub(table2_3appx$States,2,-2)
# merge table2_3appx with table2_acs
table2_acs = merge(table2_acs, table2_3appx, by = "States", all.x = T)
table2_acs[is.na(table2_acs)] = 0
table2_acs[1,4] = round(as.numeric(chisq.test(table2_acs[,2:3])$p.value), digits = 4)
# extract necessary completes, dropouts, and p-values from table2_appx, \\
# round the number, and get the table ready for presentation
table2_4appx = as.data.frame(table2_appx[["table_body"]][["label"]])
table2_4appx$Completed = table2_appx$table_body$stat_1
table2_4appx$dropped = table2_appx$table_body$stat_2
table2_4appx[2,4] = round(as.numeric(table2_appx[["meta_data"]][["p.value"]]), digits = 4)
names(table2_4appx) = c("States", "Completed", "Dropped_out", "p-value")
table2_4appx = table2_4appx[2:nrow(table2_4appx),]
table2_acs = merge(table2_acs, table2_4appx, by = "States", all.x = T)
table2_acs = table2_acs[,c(1,5:7,2,4)]
names(table2_acs) = c("States", "Completed", "Dropped_out", "p-value", "ACS", "p-vapue-acs")
a = c("N",table2_appx[["df_by"]][["n"]][1], table2_appx[["df_by"]][["n"]][2],"","","")
table2_acs = rbind(a, table2_acs)
table2_acs[is.na(table2_acs)] = ""
# print Table 2 of the appendix
table2_acs %>% knitr:: kable()
| States | Completed | Dropped_out | p-value | ACS | p-vapue-acs |
|---|---|---|---|---|---|
| N | 631 | 168 | |||
| Alabama | 1.7 (11) | 2.4 (4) | 0.4609 | 1.5171 | 1 |
| Alaska | 0.2142 | ||||
| Arizona | 3.3 (21) | 3.0 (5) | 2.1922 | ||
| Arkansas | 1.3 (8) | 0 (0) | 0.8958 | ||
| California | 8.2 (52) | 7.1 (12) | 11.7966 | ||
| Colorado | 2.5 (16) | 1.2 (2) | 1.7695 | ||
| Connecticut | 1.4 (9) | 1.8 (3) | 1.1139 | ||
| Delaware | 0.8 (5) | 0.6 (1) | 0.3066 | ||
| Florida | 7.1 (45) | 11 (18) | 6.773 | ||
| Georgia | 3.3 (21) | 4.2 (7) | 3.2032 | ||
| Hawaii | 0.3 (2) | 0 (0) | 0.4387 | ||
| Idaho | 0.5 (3) | 1.2 (2) | 0.5535 | ||
| Illinois | 3.5 (22) | 1.8 (3) | 3.8203 | ||
| Indiana | 2.1 (13) | 0.6 (1) | 2.0208 | ||
| Iowa | 0.6 (4) | 2.4 (4) | 0.9509 | ||
| Kansas | 1.0 (6) | 0 (0) | 0.8638 | ||
| Kentucky | 1.7 (11) | 4.2 (7) | 1.3515 | ||
| Louisiana | 0.5 (3) | 1.2 (2) | 1.3703 | ||
| Maine | 0.3 (2) | 1.2 (2) | 0.4345 | ||
| Maryland | 1.6 (10) | 1.8 (3) | 1.8593 | ||
| Massachusetts | 1.7 (11) | 2.4 (4) | 2.1764 | ||
| Michigan | 2.9 (18) | 4.2 (7) | 3.0575 | ||
| Minnesota | 1.4 (9) | 1.2 (2) | 1.7024 | ||
| Mississippi | 1.0 (6) | 0 (0) | 0.8741 | ||
| Missouri | 1.7 (11) | 3.6 (6) | 1.8504 | ||
| Montana | 0.3 (2) | 0.6 (1) | 0.3347 | ||
| Nebraska | 0.6 (4) | 0 (0) | 0.5734 | ||
| Nevada | 0.8 (5) | 1.2 (2) | 0.9472 | ||
| New Hampshire | 0.2 (1) | 0.6 (1) | 0.4381 | ||
| New Jersey | 2.5 (16) | 1.8 (3) | 2.8044 | ||
| New Mexico | 0.3 (2) | 0.6 (1) | 0.6361 | ||
| New York | 6.8 (43) | 6.0 (10) | 6.0884 | ||
| North Carolina | 3.0 (19) | 6.0 (10) | 3.193 | ||
| North Dakota | 0.2 (1) | 0 (0) | 0.225 | ||
| Ohio | 6.5 (41) | 1.8 (3) | 3.5513 | ||
| Oklahoma | 1.3 (8) | 0.6 (1) | 1.1727 | ||
| Oregon | 1.6 (10) | 1.2 (2) | 1.3116 | ||
| Pennsylvania | 6.0 (38) | 3.6 (6) | 3.9824 | ||
| Rhode Island | 0.5 (3) | 1.2 (2) | 0.3421 | ||
| South Carolina | 1.4 (9) | 1.2 (2) | 1.5769 | ||
| South Dakota | 0.2 (1) | 0 (0) | 0.261 | ||
| Tennessee | 1.6 (10) | 3.0 (5) | 2.1036 | ||
| Texas | 7.4 (47) | 7.1 (12) | 8.5456 | ||
| Utah | 0.3 (2) | 0 (0) | 0.9263 | ||
| Vermont | 0.5 (3) | 0 (0) | 0.201 | ||
| Virginia | 3.3 (21) | 3.0 (5) | 2.6154 | ||
| Washington | 2.2 (14) | 1.2 (2) | 2.3474 | ||
| Washington, DC | 0.2102 | ||||
| West Virginia | 0.5 (3) | 0.6 (1) | 0.5469 | ||
| Wisconsin | 1.3 (8) | 2.4 (4) | 1.789 | ||
| Wyoming | 0.2 (1) | 0 (0) | 0.1697 |
# create marital variables for table 3 of the appendix and paste it into table3_acs
marital_t3 = as.data.frame(prop.table(table(resp_i$marital_status[resp_i$dropout == 0]))*100)
marital_t3$Freq2 = paste("(",table(resp_i$marital_status[resp_i$dropout == 0]),")",sep = "")
a = as.data.frame(c(1:6))
names(a) = "Var1"
b = merge(a, marital_t3, by = "Var1", all.x = T)
table3_acs[2:7,3:4] = b[1:6,2:3]
# create education variables for table 3 of the appendix and paste it into table3_acs
educ_t3 = as.data.frame(prop.table(table(resp_i$education[resp_i$dropout == 0]))*100)
educ_t3$Freq2 = paste("(",table(resp_i$education[resp_i$dropout == 0]),")",sep = "")
a = as.data.frame(c(1:8))
names(a) = "Var1"
b = merge(a, educ_t3, by = "Var1", all.x = T)
table3_acs[9,3:4] = b[1,2:3]
table3_acs[10,3] = sum(b[2:nrow(b),2], na.rm = T)
table3_acs[10,4] = paste("(",sum(as.numeric(gsub(".*?([0-9]+).*", "\\1", b[2:nrow(b),3])),
na.rm = T),")",sep = "")
table3_acs[11:17,3:4] = b[2:nrow(b),2:3]
# create income variables for table 3 of the appendix and paste it into table3_acs
income_t3 = as.data.frame(prop.table(table(resp_i$income[resp_i$dropout == 0]))*100)
income_t3$Freq2 = paste("(",table(resp_i$income[resp_i$dropout == 0]),")",sep = "")
a = as.data.frame(c(1:6))
names(a) = "Var1"
b = merge(a, income_t3, by = "Var1", all.x = T)
table3_acs[19:24,3:4] = b[,2:3]
# perform chi square test for marital, education and income variable
table3_acs[is.na(table3_acs)] = 0
table3_acs[1,5] = ifelse(table3_acs[7,2] == 0 & table3_acs[7,3] == 0,
chisq.test(table3_acs[2:6,2:3])$p.value,
chisq.test(table3_acs[2:7,2:3])$p.value)
table3_acs[8,5] = chisq.test(table3_acs[9:17,2:3])$p.value
table3_acs[18,5] = chisq.test(table3_acs[19:24,2:3])$p.value
table3_acs[table3_acs == 0] = NA
# round integers and rename columns of table3_acs to make it more presentable
table3_acs[,2:3] = round(table3_acs[,2:3], digits = 2)
table3_acs[,5] = round(table3_acs[,5], digits = 4)
table3_acs[,5] = apply(data.frame(table3_acs[,5]),1,function(x) ifelse(is.na(x),x,ifelse(x==0,"<0.001",x)))
table3_acs[is.na(table3_acs)] = ""
table3_acs$Freq3 = paste(table3_acs$Freq,table3_acs$Freq2,sep = " ")
table3_acs = table3_acs[,c(1,6,2,5)]
names(table3_acs) = c("Variables","Completed", "ACS","p-value")
# print Table 3 of the appendix
table3_acs %>% knitr:: kable()
| Variables | Completed | ACS | p-value |
|---|---|---|---|
| Marital status | 0.6536 | ||
| Married | 43.9 (277) | 50.41 | |
| Widowed | 3.49 (22) | 5.74 | |
| Divorced | 9.98 (63) | 11.15 | |
| Separated | 1.74 (11) | 1.79 | |
| Never married | 40.89 (258) | 30.91 | |
| don’t know/not sure/refuse | |||
| Educational attainment | 0.4565 | ||
| 18 to 24 years | 12.68 (80) | 11.7 | |
| 25 years and over | 87.32 (551) | 88.3 | |
| Less than 9th grade | 4.2 | ||
| 9th to 12th grade, no diploma | 2.22 (14) | 5.19 | |
| High school graduate | 32.01 (202) | 23.22 | |
| Some college, no degree | 12.2 (77) | 17.05 | |
| Associate’s degree | 7.45 (47) | 7.73 | |
| Bachelor’s degree | 20.92 (132) | 18.76 | |
| Graduate or professional degree | 12.52 (79) | 12.16 | |
| Household Income in 2021 | 0.5147 | ||
| Less than $10,000 | 4.28 (27) | 6.03 | |
| $10,000 to $44,999 | 33.91 (214) | 30.44 | |
| $50,000 to $74,999 | 22.19 (140) | 16.81 | |
| $75,000 to $99,999 | 15.85 (100) | 12.76 | |
| $100,000 to $149,999 | 14.42 (91) | 16.26 | |
| $150,000 or more | 9.35 (59) | 17.7 |
# create table 1 of the primary paper
table1_primary = resp_i[resp_i$dropout==0 & !is.na(resp_i$dropout),] %>%
select(c(age_range2, male, race_cat2, hispanic, educ2)) %>%
mutate(age_range2 = case_when(age_range2 == 1~"18 to 24", age_range2 == 2 ~
"25 to 34", age_range2 == 3~"35 to 54", age_range2 == 4~"55 to 64",
age_range2 == 5~"65 and over"), male = case_when(male == 1~"Male",
male == 2~"Female/Other"), race_cat2 = case_when(race_cat2 == "1" ~ "White alone",
race_cat2 == "2" ~ "Black or African American alone", race_cat2 == "3" ~ "Asian alone",
race_cat2 == "4"~ "Other"), race_cat2 = factor(race_cat2, levels = c("White alone",
"Black or African American alone", "Asian alone", "Other")),
hispanic = case_when(hispanic == "0" ~ "Other", hispanic == "1" ~"Hispanic or Latino"),
educ2 = case_when(educ2 == "1"~"High school or less", educ2 == "2"~"Associate/some college",
educ2 == "3"~"Bachelors or more"), educ2 = factor(educ2, levels = c("High school or less",
"Associate/some college", "Bachelors or more"))) %>%
tbl_summary(label = list(age_range2~"Age range", male~"Gender",
race_cat2~"Race category", hispanic~"Hispanic", educ2~"Education"),
statistic = list(all_categorical()~"{p} ({n})"), missing = "no")
# print Table 1 of the primary paper
table1_primary
| Characteristic | N = 6311 |
|---|---|
| Age range | |
| 18 to 24 | 15 (92) |
| 25 to 34 | 21 (132) |
| 35 to 54 | 37 (234) |
| 55 to 64 | 6.5 (41) |
| 65 and over | 21 (132) |
| Gender | |
| Female/Other | 50 (315) |
| Male | 50 (314) |
| Race category | |
| White alone | 75 (476) |
| Black or African American alone | 12 (77) |
| Asian alone | 4.3 (27) |
| Other | 8.1 (51) |
| Hispanic | |
| Hispanic or Latino | 11 (72) |
| Other | 89 (559) |
| Education | |
| High school or less | 47 (296) |
| Associate/some college | 20 (124) |
| Bachelors or more | 33 (211) |
| 1 % (n) | |
# Appendix 1, Table 1
write.csv(table1_acs, "AppendixTable1.csv", row.names = F)
# Appendix 1, Table 2
write.csv(table2_acs, "AppendixTable2.csv", row.names = F)
# Appendix 1, Table 3
write.csv(table3_acs, "AppendixTable3.csv", row.names = F)
# Primary paper, Table 1
table1_primary %>% gtsummary::as_tibble() %>%
writexl::write_xlsx(., "PrimaryTable1.xlsx")