1 Purpose

Implements the sample selection and assess external validity. This software is free for private, non-commercial use only. Please contact the authors for commercial usage.

2 Summary

This code describes respondent participation and creates three sets of variables describing the respondent characteristics. The first examines survey participation and confirms the assignment of the 18 quotas, 4 blocks, and coma quiz. The second creates 11 categorical variables to assess dropout biases and to comparison the analytical sample with the US 2021 ACS. These results are shown in three Appendix tables. The third creates five categorical variables to produce Table 1 for the primary manuscript describing the analytical sample. This code does not confirm the experimental design or examine response behaviors or preference evidence.

3 Load the libraries

# set directory
setwd("C:/Users/maksat/Box Sync/USF PC/EuroQol/Kaisen task/paper1/Wave 1/markdown")
# clear memory
rm(list = ls())
# load libraries
library(gtsummary) # for command tbl_summary()
library(reshape2) # for command mint and dcast
library(writexl) # for command write_xlsx
library(dplyr) # for command bind_rows
library(stringr) # for command substr and str_sub
library(expss) # for command cross_cases
library(readxl) # for command read_excel

4 Data notation

Below, we present data notations and definitions of new variables created.

4.1 Subject and task index

Subject, i = 1 to N; task, t = 1 to T;

T1 is the number of coma and paired comparison tasks including warmup.

T2 is the number of kaizen tasks including warmup.

4.2 Survey participation and assignment variables

‘freqpc’ is the frequency of paired comparison responses to inform dropout

‘freqkz’ is the frequency of kaizen task responses to inform dropout

‘dropout_cat’ is an indicator variable for dropouts by component:
0 respondent completed coma comparison, paired comparison and kaizen tasks
1 respondents who dropped out before completing coma quiz
2 respondents who dropped out after completing coma quiz, but before CC 3 respondents who dropped out during coma component
4 respondents who dropped out during paired comparison
5 respondents who dropped out during kaizen tasks
6 respondents who dropped out during follow-up

‘dropout’ is a binary 0/1 variable for respondent who dropped out
1 the respondent dropped out
0 the respondent completed the survey

Three demographic variable to identify respondent quota
‘female’ is an indicator variable for female respondents (1: Female, 0: Other)
‘raceth’ (1: Hispanic, 2: non-Hispanics Black, 3: Other)
‘age_range’ (1: 18 to 34, 2: 35 to 54, 3: 55 +)

‘actual_quota’ shows the actual quota to be assigned to the respondent
‘assigned_quota’ shows the quota that was assigned to the respondent
‘quota_confirmed’ is an indicator variable showing whether the respondent
has the same ‘assigned_quota’ and ‘actual_quota’

‘actual_subjectindex’ shows the actual block to be assigned to the respondent
‘subjectindex_confirmed’ is an indicator variable showing whether the
respondent has the same ‘subjectindex’ and ‘actual_subjectindex’

‘attempts’ calculates how many coma quiz attempts each respondent took

4.3 Additional categorical Variables created for Tables 1-3 of the appendix

‘race_cat’ will be categorized as following:
0 the respondent did not choose any race
1 the respondent chose White alone
2 the respondent chose Black or African American alone
3 the respondent chose American Indian or Alaska Native alone
4 the respondent chose Asian alone
5 the respondent chose Native Hawaiian or Other Pacific Islander alone
6 the respondent chose Some other race alone
7 the respondent chose Two or more races

‘hispanic’ is an indicator variable for Hispanic respondents
1 the respondent is Hispanic
0 the respondent is non-Hispanic

‘region’ is an indicator variable for the US regions
1 the state is in the Northeast region of the US
2 the state is in the Midwest region of the US
3 the state is in the South region of the US
4 the state is in the West region of the US

‘work_status’ is a categorical variable indicating work categories
1 working for pay at a job or business
2 with a job or business, but not at work
3 looking for work
4 working, but not for pay, at a family-owned job or business
5 not working at a job or business and not looking for work
6 retired
7 don’t know/not sure
8 refuse to answer

‘total_children’ shows the number of children the respondent has

‘community’ is a categorical variable for the community the respondent lives in 1 urban
2 suburban
3 rural
4 don’t know/not sure/refuse to answer

‘marital_status’ is a categorical variable for mariatal status
1 married
2 divorced
3 separated
4 never married
5 living with a partner
6 don’t know/not sure
7 refuse to answer

‘educational’ is a categorical variable for educational attainment
1 18 to 24 years of age
2 less than 9th grade and over 25 years of age
3 9th to 12th grade, no diploma and over 25 years of age
4 High school graduate and over 25 years of age
5 Some college, no degree and over 25 years of age
6 Associate’s degree and over 25 years of age
7 Bachelor’s degree and over 25 years of age
8 Graduate or professional degree and over 25 years of age

‘income’ is a categorical variable for respondent’s income
1 Less than 10,000 USD
2 10,000 USD to 49,999 USD
3 50,000 USD to 74,999 USD
4 75,000 USD to 99,999 USD
5 100,000 USD to 149,999 USD
6 150,000 USD or more

‘political_aff’ is a categorical varaiable for Political affiliation
1 a Democrat (strong)
2 a Democrat (moderate)
3 a Republican (strong)
4 a Republican (moderate)
5 an Independent (lean Democrat)
6 an Independent (lean Republican)
7 an Independent (don’t lean)
8 don’t know/not sure/refuse to answer

‘political_pers’ is a categorical variable for Political perspective
1 a liberal (very)
2 a liberal (somewhat)
3 a conservative (very)
4 a conservative (somewhat)
5 a moderate (lean liberal)
6 a moderate (lean conservative)
7 a moderate (don’t lean)
8 don’t know/not sure/refuse to answer

4.4 Additional categorical variables for Table 1 of the primary paper

Note: ‘hispanic’ is already defined above

age_range2 is an indicator variable
1: 18 to 24
2: 25 to 34
3: 35 to 54
4: 55 to 64
5: 65+

‘race_cat2’ is an indicator variable
1 White alone
2 Black or African American alone
3 Asian alone
4 Other

‘male’ is an indicator variable
0: female/other
1: male

‘educ2’ is an indicator variable
1: high school or less
2: associate/some college
3: bachelors or more

5 Specify the settings

# specify the setting
N = 1681 ## number of survey records in final dataset
Z1 = 144 ## the number of respondent-specific variables in final dataset 
Z2 = 21  ## the number of task-specific variables in final pc and kz datasets
Z3 = 3  ## the number of temporal variables (e.g., page times) in final dataset   
T1 = 17 ## number of paired comparisons per respondent
T2 = 11 ## number of kaizen tasks per respondent
spec = c(N, Z1, Z2, Z3, T1, T2)  ## summary of setting specification

6 Specify the functions

# function to map EQ-5D attributes and levels
map_levels <- function(var, levels_map) {
  levels <- word(var, 3)
  ifelse(levels %in% names(levels_map), levels_map[levels], "")
}

7 Load the data

# load the respondent data
resp_i = read.csv("resp_i_231010.csv")

# load the paired comparison data
pc_it = read.csv("pc_it_231010.csv")

# load the kaizen task data
kz_it = read.csv("kz_it_231010.csv")

# load the ACS tables (Appendix 1) 
table1_acs = data.frame(read_excel("ACS230923_2.xlsx", sheet = "Table 1 and 3", 
                        skip = 2, col_names = F)[1:24,6:7])

table2_acs = data.frame(t(read_excel("ACS230923_2.xlsx", sheet = "Table 2",
                        col_names = F)[7:8,1:51]))

table3_acs = data.frame(read_excel("ACS230923_2.xlsx", sheet = "Table 1 and 3", 
                        skip = 2, col_names = F)[1:24,15:16])

# Blocks (from design matrix, Appendix 2)
block_b <- read_excel("230824_ExpDes_App2.xlsx",
                    sheet = "design matrix for instrument", skip=4)

8 Recode the data

In this section we recode the data to create the variables defined above

8.1 Survey participation

# create 'freqpc' and 'freqkz'
resp_i = merge(resp_i, as.data.frame(table(pc_it$id, dnn = list("id")),
                                     responseName = "freqpc"), by="id", all.x=T)
resp_i$freqpc[is.na(resp_i$freqpc)] <- 0
resp_i = merge(resp_i, as.data.frame(table(kz_it$id, dnn = list("id")),
                                     responseName = "freqkz"), by="id", all.x=T)
resp_i$freqkz[is.na(resp_i$freqkz)] <- 0

# page number 52 is the coma warm up, and 54 is the actual coma comparison, \\
# 60 is either pc or kz warm up, 62 is pc or kz (main)
# create 'dropout_cat'
resp_i$dropout_cat = ifelse(resp_i$lastpage == 101, 0, # completed
       ifelse(resp_i$exclusion == "N" & resp_i$lastpage <= 22 & 
       resp_i$Exclusion2 != "Y", 1, # dropped out before completing coma quiz
       ifelse(resp_i$lastpage > 22 & 
       resp_i$lastpage <52, 2, # dropped out after completing coma quiz, but before CC
       ifelse(resp_i$lastpage > 22 & resp_i$freqpc < 6, 3, # dropped out during CC
       ifelse(resp_i$freqpc >= 6 & resp_i$freqpc != 17, 4, # dropped out during PC
       ifelse(resp_i$freqpc >= 6 & resp_i$freqkz != 11, 5, # dropped out during KZ
       ifelse(resp_i$freqpc == 17 & resp_i$freqkz == 11 &  # dropped out during follow-up
       resp_i$lastpage != 101, 6, ""))))))) # did not drop out

# create 'dropout'
resp_i$dropout = ifelse(resp_i$dropout_cat == 0, 0, 
                ifelse(resp_i$dropout_cat > 0, 1, "")) # create binary variable from 'dropout_cat'

8.2 Quota membership

# create 'assigned_quota'
resp_i$assigned_quota = apply(subset(resp_i, select = c(quota01:quota18)), 1,
                  function(x) ifelse(x[1]!="",as.integer(which(x == "yes")),""))

# create 'female' 
resp_i$female = as.numeric(ifelse(resp_i$G02Q03 == "female", 1, 
                       ifelse(resp_i$G02Q03 == "", "",0)))

# create 'raceth' 
resp_i$raceth = as.numeric(ifelse((resp_i$G02Q05.AO01. == "N/A") | 
    (resp_i$G02Q04==""), "", ifelse(substr(resp_i$G02Q04,1,1) == "y", 1, 
                ifelse(resp_i$G02Q05.AO01. == "Yes", 2, 3))))

# create 'age_range' 
resp_i$age_range = as.numeric(cut(resp_i$G02Q02, breaks = c(18, 34, 54, 
      max(resp_i$G02Q02, na.rm = T)), labels = c(1, 2, 3), include.lowest = T))

# create all combinations of quotas and order them by first:female, \\
# second:raceth, third:age_range 
quota_comb = expand.grid(female = c(0,1), raceth = c(1, 2, 3), age_range = 
    c(1, 2, 3))[with(expand.grid(female = c(0,1), raceth = c(1, 2, 3), 
    age_range = c(1, 2, 3)), order(-female, raceth, age_range)),]
# create 'actual_quota' from these three variables: 'female', 'age_range', \\
# and 'raceth'
quota_comb$actual_quota = as.numeric(1:nrow(quota_comb)) # label quota numbers
resp_i = merge(resp_i, quota_comb, by = c("female", "raceth", "age_range"), 
               all.x = T)

8.3 Block assignment

# combine pc and kz data, merge with block_b, and create actual_subjectindex
block_b = block_b %>% rename("actual_subjectindex" = "b")
block_b$vec <- paste(do.call(paste, c(block_b[, seq(2,56,by=2)], sep = " ")),
                     do.call(paste, c(block_b[, seq(3,57,by=2)], sep = " ")))
block_b <- block_b[,c("actual_subjectindex", "vec")]

block_i = rbind(pc_it[,c("id", "task", "alt1", "alt2")], kz_it[,c("id", "task", "alt1", 
        "alt2")])[with(rbind(pc_it[,c("id", "task", "alt1", "alt2")], 
                             kz_it[,c("id", "task", "alt1", "alt2")]), order(id)),]
block_i <- dcast(block_i, id~task, value.var=c("alt1","alt2"))
block_i$vec <- do.call(paste, c(block_i[,2:57], sep = " "))
block_i = merge(block_i, block_b, by = "vec", all.x = T)
block_i <- block_i[,c("id", "actual_subjectindex")]
resp_i = merge(resp_i, block_i, by = "id", all.x = T)

8.4 Coma quiz

# create a data consisting of quiz variables
quiz = subset(resp_i, select = c(id, G04Q01.SQ001.:G04Q03.SQ005.))

# convert true/false into 0/1
quiz[,2:ncol(quiz)] = apply(quiz[,2:ncol(quiz)], 2, function(x) as.integer(as.logical(x)))

# rename columns to make it easier to understand 
names(quiz) = c("id", paste(rep(paste0("quiz",1:3),each = 5),"_q",1:5, sep = ""))

# transform quiz into 0/1 where 1 represents whether answer is correct 
quiz[,2:16] = t(apply(quiz, 1, function(x) x[2:16] == rep(c(0,1,0,1,0),3)))*1

# create 'attemps'
quiz$attempts = ifelse(is.na(quiz$quiz1_q1), 0, 
                ifelse(is.na(quiz$quiz2_q1), 1,
                ifelse(is.na(quiz$quiz3_q1), 2, 3)))

8.5 Respondent characteristics to assess dropout and external validity

8.5.1 Gender, race and ethnicity

# 3 age groups are already recoded above under 'age_range'
# gender (1: female; 2: male; 3: other/prefer not to say)
resp_i$gender = ifelse(resp_i$G02Q03 == "", "", ifelse(resp_i$G02Q03 == 
                      "female", 1, ifelse(resp_i$G02Q03 == "male", 2, 3)))

# create 'race_cat'
race = subset(resp_i, select = c(id, G02Q05.AO00.:G02Q05.AO14.))
names(race) = c("id", "White", "Black", "AmerIndianAlaska", "Chinese", 
                "Filipino", "AsianInd", "Vietnamese", "Korean", "Japanese", 
                "NatHawaii", "Samoan", "Chamorro", "OtherAsian", "OtherPacific",
                "SomeOtherRace")
race[,2:ncol(race)] = as.matrix(race[,2:ncol(race)] == "Yes")*1
race$sum = apply(race[,2:ncol(race)], 1, function(x) sum(x))
race$race_cat = ifelse(race$sum == 0, 0, "")
race$race_cat = ifelse(race$White == 1 & race$sum == 1, 1, race$race_cat)
race$race_cat = ifelse(race$Black == 1 & race$sum == 1, 2, race$race_cat)
race$race_cat = ifelse(race$AmerIndianAlaska == 1 & race$sum == 1, 3, race$race_cat)
race$race_cat = ifelse((race$Chinese == 1 | race$Filipino == 1 | 
    race$AsianInd == 1 | race$Vietnamese == 1 | race$Korean == 1 | 
    race$Japanese == 1 | race$OtherAsian == 1) & race$sum == 1, 4, race$race_cat)
race$race_cat = ifelse((race$NatHawaii == 1 | race$Samoan == 1 | 
    race$Chamorro == 1 | race$OtherPacific == 1) & race$sum == 1, 5, race$race_cat)
race$race_cat = ifelse(race$SomeOtherRace == 1 & race$sum == 1, 6, race$race_cat)
race$race_cat = ifelse(race$sum >= 2, 7, race$race_cat)

# merge 'race_cat' with 'resp_i'
resp_i = merge(resp_i, subset(race, select = c(id, race_cat)), 
              by = "id", all.x = T)

# create 'hispanic' 
resp_i$hispanic = ifelse(resp_i$raceth == "" | is.na(resp_i$raceth), "", 
                         ifelse(resp_i$raceth == 1, 1, 0))

8.5.2 US Census regions

# create a list consisting of four US regions
states = subset(resp_i, select = c(id, G02Q01))
regions = list(
  northeast = c("Connecticut", "Maine", "Massachusetts", "New Hampshire", 
          "Rhode Island", "Vermont", "New York", "Pennsylvania", "New Jersey"),
  midwest = c("Indiana", "Illinois", "Michigan", "Ohio", "Wisconsin", "Iowa", 
          "Kansas", "Minnesota", "Missouri", "Nebraska", "North Dakota", 
          "South Dakota"),
  south = c("Delaware", "Washington, DC", "Florida", "Georgia", "Maryland", 
          "North Carolina", "South Carolina", "Virginia", "West Virginia", 
          "Alabama", "Kentucky", "Mississippi", "Tennessee", "Arkansas", 
          "Louisiana", "Oklahoma", "Texas"),
  west = c("Arizona", "Colorado", "Idaho", "New Mexico", "Montana", "Utah", 
          "Nevada", "Wyoming", "Alaska", "California", "Hawaii", "Oregon", 
          "Washington"))

# create a 0/1 variable or each state indicating which region they belong
for(region in names(regions)) {
  resp_i[region] = as.integer(as.logical(sapply(resp_i$G02Q01, 
                      function(x) any(trimws(x) %in% regions[[region]]))))}

# convert 0/1 to a categorical variable 
# (1: Northeast, 2: Midwest, 3: South, 4: West)
resp_i$region = apply(resp_i[,c("northeast", "midwest", "south", "west")], 1, 
      function(x) ifelse(x[1] == 1, 1, ifelse(x[2] == 1, 2, ifelse(x[3] == 1, 3, 
      ifelse(x[4] == 1, 4, "")))))

8.6 Variables for secondary table

# rename columns 'G11Q03.SQ001_SQ001' through 'G11Q03.SQ006_SQ004' to make it \\
# easier to read. The child categories are 'Newborn', 'Toddler', 'Preschool', 
# 'Elementary', 'Adolescent', and 'Adult'. \\
# The numbers after each child category are defined as following:\\
# 1 They do not live with me \\
# 2 They live with me part-time
# 3 They live with me full-time
# 4 They are no longer living
resp_i = resp_i %>% rename("Newborn1" = "G11Q03.SQ001_SQ001.", 
        "Newborn2" = "G11Q03.SQ001_SQ002.", "Newborn3" = "G11Q03.SQ001_SQ003.",
        "Newborn4" = "G11Q03.SQ001_SQ004.", "Toddler1" = "G11Q03.SQ002_SQ001.",
        "Toddler2" = "G11Q03.SQ002_SQ002.", "Toddler3" = "G11Q03.SQ002_SQ003.",
  "Toddler4" = "G11Q03.SQ002_SQ004.", "Preschool1" = "G11Q03.SQ003_SQ001.",
  "Preschool2" = "G11Q03.SQ003_SQ002.", "Preschool3" = "G11Q03.SQ003_SQ003.",
  "Preschool4" = "G11Q03.SQ003_SQ004.", "Elementary1" = "G11Q03.SQ004_SQ001.",
  "Elementary2" = "G11Q03.SQ004_SQ002.", "Elementary3" = "G11Q03.SQ004_SQ003.",
  "Elementary4" = "G11Q03.SQ004_SQ004.", "Adolescent1" = "G11Q03.SQ005_SQ001.",
  "Adolescent2" = "G11Q03.SQ005_SQ002.", "Adolescent3" = "G11Q03.SQ005_SQ003.",
  "Adolescent4" = "G11Q03.SQ005_SQ004.", "Adult1" = "G11Q03.SQ006_SQ001.",
  "Adult2" = "G11Q03.SQ006_SQ002.", "Adult3" = "G11Q03.SQ006_SQ003.",
  "Adult4" = "G11Q03.SQ006_SQ004.")

# create 'total_children' 
resp_i$total_children = apply(subset(resp_i, select = c(Newborn1:Adult4)), 1, 
                              function(x) sum(x, na.rm = T))

# create 'work_status'
resp_i$work_status = ifelse(resp_i$G11Q04 == "", "", ifelse(resp_i$G11Q04 == 
    "working for pay at a job or business", 1, ifelse(resp_i$G11Q04 == 
    "with a job or business, but not at work", 2, ifelse(resp_i$G11Q04 == 
    "looking for work", 3, ifelse(resp_i$G11Q04 == 
    "working, but not for pay, at a family-owned job or business", 4, 
    ifelse(resp_i$G11Q04 == "not working at a job or business and not looking for work", 5,
    ifelse(resp_i$G11Q04 == "retired", 6, ifelse(resp_i$G11Q04 == 
    "don't know/not sure", 7, 8))))))))

# create 'community'
resp_i$community = ifelse(resp_i$G11Q05 == "", "", ifelse(resp_i$G11Q05 == 
      "urban", 1, ifelse(resp_i$G11Q05 == "suburban", 2, ifelse(resp_i$G11Q05 
      == "rural", 3, 4))))

# create 'marital_status'
resp_i$marital_status = ifelse(resp_i$G11Q06 == "", "", ifelse(resp_i$G11Q06 == 
      "now married" | resp_i$G11Q06 == "married", 1, ifelse(resp_i$G11Q06 == "widowed", 2, 
      ifelse(resp_i$G11Q06 == "divorced", 3, ifelse(resp_i$G11Q06 == 
      "separated", 4, ifelse(resp_i$G11Q06 == "never married", 5, 6))))))

# create 'education'
resp_i$education = ifelse(resp_i$G11Q07 == "", "", ifelse(resp_i$G02Q02<24,
      1 , ifelse((resp_i$G02Q02>=25) & (resp_i$G11Q07 == "less than 1st grade" | 
      resp_i$G11Q07 == "1st, 2nd, 3rd, or 4th grade" | resp_i$G11Q07 == 
      "5th or 6th grade" | resp_i$G11Q07 == "7th and 8th grade"), 2, 
      ifelse((resp_i$G02Q02>=25) & (resp_i$G11Q07 == "9th grade" | 
      resp_i$G11Q07 == "10th grade" | resp_i$G11Q07 == "11th grade" | 
      resp_i$G11Q07 == "12th grade no diploma"), 3, ifelse((resp_i$G02Q02>=25) & 
      (resp_i$G11Q07 == "high school graduate - high school diploma or equivalent"), 4, 
      ifelse((resp_i$G02Q02>=25) & (resp_i$G11Q07 == "some college but no degree"), 5, 
      ifelse((resp_i$G02Q02>=25) & (resp_i$G11Q07 == 
      "associate degree in college - occupation/vocation program" | 
      resp_i$G11Q07 == "associate degree in college - academic program"), 6, 
      ifelse((resp_i$G02Q02>=25) & (resp_i$G11Q07 == 
      "bachelor's degree (for example: BA, AB, BS)"), 7,8))))))))

# create 'income'
resp_i$income = ifelse(resp_i$G11Q08 == "", "", ifelse(resp_i$G11Q08 ==
    "under $10,000", 1, ifelse(resp_i$G11Q08 == "$10,000 to under $20,000" | 
    resp_i$G11Q08 == "$20,000 to under $30,000" | resp_i$G11Q08 == 
    "$30,000 to under $40,000" | resp_i$G11Q08 == "$40,000 to under $50,000", 2, 
    ifelse(resp_i$G11Q08 == "$50,000 to under $75,000",3, 
    ifelse(resp_i$G11Q08 == "$75,000 to under $100,000", 4, ifelse(
    resp_i$G11Q08 == "$100,000 to under $150,000", 5, 6))))))

# create 'political_aff'
resp_i$political_aff = ifelse(resp_i$G11Q09 == "", "", ifelse(
  resp_i$G11Q09 == "a Democrat" & resp_i$G11Q10A == "a strong Democrat", 1, 
  ifelse(resp_i$G11Q09 == "a Democrat" & resp_i$G11Q10A == "a moderate Democrat", 
  2, ifelse(resp_i$G11Q09 == "a Republican" & resp_i$G11Q10C == 
  "a strong Republican", 3, ifelse(resp_i$G11Q09 == "a Republican" & 
  resp_i$G11Q10C == "a moderate Republican", 4, ifelse(resp_i$G11Q09 == 
  "an independent" & resp_i$G11Q10B =="lean Democrat", 5, ifelse(
  resp_i$G11Q09 == "an independent" & resp_i$G11Q10B =="lean Republican", 6, 
  ifelse(resp_i$G11Q09 == "an independent" & resp_i$G11Q10B =="don't lean", 7,
  8))))))))

# create 'political_pers'
resp_i$political_pers = ifelse(resp_i$G11Q11 == "", "", ifelse(
  resp_i$G11Q11 == "a liberal" & resp_i$G11Q10A == "very liberal", 1, 
  ifelse(resp_i$G11Q11 == "a liberal" & resp_i$G11Q10A == "somewhat liberal", 
  2, ifelse(resp_i$G11Q11 == "a conservative" & resp_i$G11Q11C == 
  "very conservative", 3, ifelse(resp_i$G11Q11 == "a conservative" & 
  resp_i$G11Q11C == "somewhat conservative", 4, ifelse(resp_i$G11Q11 == 
  "a moderate" & resp_i$G11Q11B =="lean liberal", 5, ifelse(
  resp_i$G11Q11 == "a moderate" & resp_i$G11Q11B =="lean conservative", 6, 
  ifelse(resp_i$G11Q11 == "a moderate" & resp_i$G11Q11B =="don't lean", 7,
  8))))))))

8.7 Variable for primary table

# create 'age_range2' 
resp_i$age_range2 = as.numeric(cut(resp_i$G02Q02, breaks = c(18, 24, 34, 54, 64,
  max(resp_i$G02Q02, na.rm = T)), labels = c(1, 2, 3, 4, 5), include.lowest = T))

# create 'male'
resp_i$male = ifelse(resp_i$G02Q03 == "male", 1, ifelse(resp_i$G02Q03 == 
              "female" | resp_i$G02Q03 == "other", 2 , NA))

# create 'race_cat2'
resp_i$race_cat2 = ifelse(resp_i$race_cat == "", "", ifelse(resp_i$race_cat == 
                   1, 1, ifelse(resp_i$race_cat == 2, 2, ifelse(
                     resp_i$race_cat == 4, 3, 4))))

# create 'educ2'
resp_i$educ2 = as.numeric(cut(as.numeric(resp_i$education), 
      breaks = c(1, 4, 6 , 8), labels = c(1, 2, 3), include.lowest = T))

9 Descriptive analyses

9.1 Quota membership and block assignment (Appendix 1, confirmation)

# create 'quota_confirmed'
resp_i$quota_confirmed = ifelse(resp_i$assigned_quota == 
                                  resp_i$actual_quota, 1, 0)
ifelse(length(table(resp_i$quota_confirmed[resp_i$incomplete == 0])) == 1,
       "Quota confirmed", "Quota is not confirmed")

## [1] "Quota confirmed"

# create 'subjectindex_confirmed'
resp_i$subjectindex_confirmed = ifelse(resp_i$subjectindex == 
                                         resp_i$actual_subjectindex, 1, 0)
ifelse(length(table(resp_i$quota_confirmed[resp_i$incomplete == 0])) == 1,
       "Subjectindex confirmed", "Subjectindex is not confirmed")

## [1] "Subjectindex confirmed"

9.2 Coma quiz (Appendix 1, Confirmation)

quiz[nrow(quiz) + 1,] = c("Total", colSums(quiz[,2:17], na.rm = T))

# coma quiz analysis of completion by question and attempt
quiz_table = quiz[-nrow(quiz),] %>% select(c(quiz1_q1:attempts)) %>% tbl_summary(missing = "no")
quiz_table

Characteristic	N = 1,681¹
quiz1_q1
0	762 (63%)
1	455 (37%)
quiz1_q2
0	136 (11%)
1	1,081 (89%)
quiz1_q3
0	251 (21%)
1	966 (79%)
quiz1_q4
0	55 (4.5%)
1	1,162 (95%)
quiz1_q5
0	378 (31%)
1	839 (69%)
quiz2_q1
0	393 (40%)
1	580 (60%)
quiz2_q2
0	144 (15%)
1	829 (85%)
quiz2_q3
0	244 (25%)
1	729 (75%)
quiz2_q4
0	58 (6.0%)
1	915 (94%)
quiz2_q5
0	315 (32%)
1	658 (68%)
quiz3_q1
0	298 (48%)
1	317 (52%)
quiz3_q2
0	113 (18%)
1	502 (82%)
quiz3_q3
0	191 (31%)
1	424 (69%)
quiz3_q4
0	57 (9.3%)
1	558 (91%)
quiz3_q5
0	246 (40%)
1	369 (60%)
attempts
0	464 (28%)
1	244 (15%)
2	358 (21%)
3	615 (37%)
¹ n (%)

10 EQ-5D-5L analysis

# create variables on EQ-5D-5L and EQ-VAS
eq5d = subset(resp_i, select = c(id, EQ5D5LMO:EQ5D5LVAS))

# Create a data frame with the levels mapping
levels_map <- c(
  "no"       = 1,
  "slight"   = 2,
  "moderate" = 3,
  "severe"   = 4,
  "unable"   = 5,
  "extreme"  = 5,
  "not"       = 1,
  "slightly"   = 2,
  "moderately" = 3,
  "severely"   = 4,
  "extremely"  = 5
)

# Apply the mapping function to your columns
eq5d$EQ5D5LMO <- map_levels(eq5d$EQ5D5LMO, levels_map)
eq5d$EQ5D5LSC <- map_levels(eq5d$EQ5D5LSC, levels_map)
eq5d$EQ5D5LUA <- map_levels(eq5d$EQ5D5LUA, levels_map)
eq5d$EQ5D5LPD <- map_levels(eq5d$EQ5D5LPD, levels_map)
eq5d$EQ5D5LAD <- map_levels(eq5d$EQ5D5LAD, levels_map)

# create EQ-5D
eq_table = eq5d[!is.na(eq5d$EQ5D5LVAS),] %>% select(c(EQ5D5LMO:EQ5D5LVAS)) %>% 
  tbl_summary(missing = "no")
eq_table

Characteristic	N = 1,228¹
EQ5D5LMO
1	972 (79%)
2	159 (13%)
3	67 (5.5%)
4	21 (1.7%)
5	9 (0.7%)
EQ5D5LSC
1	1,105 (90%)
2	72 (5.9%)
3	35 (2.9%)
4	9 (0.7%)
5	7 (0.6%)
EQ5D5LUA
1	931 (76%)
2	180 (15%)
3	81 (6.6%)
4	26 (2.1%)
5	10 (0.8%)
EQ5D5LPD
1	568 (46%)
2	399 (32%)
3	191 (16%)
4	47 (3.8%)
5	23 (1.9%)
EQ5D5LAD
1	637 (52%)
2	268 (22%)
3	196 (16%)
4	81 (6.6%)
5	46 (3.7%)
EQ5D5LVAS	79 (65, 89)
¹ n (%); Median (IQR)

10.1 Dropout analysis by component and component order

dropout_analysis = resp_i[resp_i$dropout != "",] %>% select(c(dropout_cat, pc_cnum2)) %>% 
  mutate(pc_cnum2 = case_when(pc_cnum2 == "2"~"Paired comparison first", 
           pc_cnum2 == "3"~"Kaizen task first"),
           dropout_cat = case_when(dropout_cat == "0"~"Completed",
           dropout_cat == "4"~"During paired comparison", 
           dropout_cat == "5"~"During kaizen task", 
           dropout_cat == "6"~"During follow-up"),
        pc_cnum2 = factor(pc_cnum2, levels = c("Paired comparison first",
           "Kaizen task first")),
        dropout_cat = factor(dropout_cat, levels = c("Completed", 
        "During paired comparison", 
        "During kaizen task", "During follow-up"))) %>%  
  tbl_summary(by = "pc_cnum2", missing = "no", statistic = list(all_categorical()~"{p}% ({n})"),
  label = list(dropout_cat~"Dropout category")) %>% add_overall() %>%
  add_p(everything() ~ "chisq.test")
options(knitr.kable.NA = '')
dropout_analysis %>% knitr:: kable()

Characteristic	Overall, N = 651	Paired comparison first, N = 319	Kaizen task first, N = 332	p-value
Dropout category				0.093
Completed	97% (631)	97% (310)	97% (321)
During paired comparison	1.7% (11)	0.9% (3)	2.4% (8)
During kaizen task	0.6% (4)	1.3% (4)	0% (0)
During follow-up	0.8% (5)	0.6% (2)	0.9% (3)

10.2 Dropout analysis by demographic characteristics and by US region (Appendix 1, Table 1)

We now create and present tables for appendices

10.2.1 Table 1 of the appendix

# create table1_appx and compare complete vs dropped out by demographic characteristics 
table1_appx = resp_i %>% select(c(dropout, age_range, gender,
  race_cat,hispanic, region)) %>% mutate(dropout = case_when(
  dropout == "1" ~ "Dropped out", dropout == "0" ~ "Completed"), age_range = 
  case_when(age_range == 1 ~ "18 to 34", age_range == 2 ~ "35 to 54", 
  age_range == 3 ~ "55+"), gender = case_when(gender == "1" ~ "Female",
  gender == "2" ~ "Male", gender == "3" ~ "Other/prefer not to say"), 
  race_cat = case_when(race_cat == "1" ~ "White alone", race_cat == "2" ~ 
  "Black or African American alone", race_cat == "3" ~ 
  "American Indian or Alaska Native alone", race_cat == "4" ~ "Asian alone", 
  race_cat == "5" ~ "Native Hawaiian or Other Pacific Islander alone", 
  race_cat == "6" ~ "Some other race alone", race_cat == "7" ~ 
  "Two or more races"), hispanic = case_when(hispanic == "0" ~ "Other", 
  hispanic == "1" ~"Hispanic or Latino"), region = case_when(region == 
  "1"~"Northeast", region == "2"~"Midwest", region == "3"~"South", 
  region=="4"~"West"), region = factor(region, levels = c("Northeast", 
  "Midwest", "South","West")),race_cat = factor(race_cat, 
  levels = c("White alone", "Black or African American alone", 
  "American Indian or Alaska Native alone", "Asian alone", 
  "Native Hawaiian or Other Pacific Islander alone", "Some other race alone", 
  "Two or more races"))) %>% 
  tbl_summary(by = dropout, label = list(age_range~"Age range", 
  gender~"Gender", race_cat~"Race category", 
  hispanic~"Hispanic", region~"Region"), missing = "no",
  statistic = list(all_categorical()~"{p} ({n})")) %>% 
  add_p(everything() ~ "chisq.test") 

# create table1_2appx to extract only the complete, so that i can compare complete vs ACS
table1_2appx = resp_i[resp_i$dropout==0 & !is.na(resp_i$dropout),] %>% select(c(age_range, gender,
   race_cat,hispanic, region)) %>% mutate(age_range = 
   case_when(age_range == 1 ~ "18 to 34", age_range == 2 ~ "35 to 54", 
   age_range == 3 ~ "55+"), gender = case_when(gender == "1" ~ "Female",
   gender == "2" ~ "Male", gender == "3" ~ "Other/prefer not to say"), 
   race_cat = case_when(race_cat == "1" ~ "White alone", race_cat == "2" ~ 
   "Black or African American alone", race_cat == "3" ~ 
   "American Indian or Alaska Native alone", race_cat == "4" ~ "Asian alone", 
   race_cat == "5" ~ "Native Hawaiian or Other Pacific Islander alone", 
   race_cat == "6" ~ "Some other race alone", race_cat == "7" ~ 
   "Two or more races"), hispanic = case_when(hispanic == "0" ~ "Other", 
   hispanic == "1" ~"Hispanic or Latino"), region = case_when(region == 
   "1"~"Northeast", region == "2"~"Midwest", region == "3"~"South", 
   region=="4"~"West"), region = factor(region, levels = c("Northeast", 
   "Midwest", "South","West")),race_cat = factor(race_cat, 
   levels = c("White alone", "Black or African American alone", 
   "American Indian or Alaska Native alone", "Asian alone", 
   "Native Hawaiian or Other Pacific Islander alone", "Some other race alone", 
   "Two or more races"))) %>% tbl_summary(label = list(age_range~"Age range", 
   gender~"Gender", race_cat~"Race category", 
   hispanic~"Hispanic", region~"Region"), missing = "no",
   statistic = list(all_categorical()~"{p}"))

# extract the column for 'completed' from table1_2appx and calculate the \\
# p-value for each of the demographic characteristics
table1_acs$Completed = as.numeric(table1_2appx$table_body$stat_0)
table1_acs[1,4] = chisq.test(table1_acs[2:4,2:3])$p.value
table1_acs[5,4] = chisq.test(table1_acs[6:7,2:3])$p.value
table1_acs[9,4] = chisq.test(table1_acs[10:16,2:3])$p.value
table1_acs[17,4] = chisq.test(table1_acs[18:19,2:3])$p.value
table1_acs[20,4] = chisq.test(table1_acs[21:24,2:3])$p.value

# combine ACS, Complete and Dropout columns into one table and rename columns
table1_acs$Completed = table1_appx$table_body$stat_1
table1_acs$Dropped_out = table1_appx$table_body$stat_2
table1_acs$p_value = as.numeric(table1_appx[["table_body"]][["p.value"]])
table1_acs = table1_acs[,c(1,3,5,6,2,4)]
names(table1_acs) = c("Quota", "Completed", "Dropped-out", "p-value", "ACS",
                      "p-value_acs")

# extract the number of respondents for each column from table1_appx and paste it \\
# into table1_acs
a = c("N",table1_appx[["df_by"]][["n"]][1], table1_appx[["df_by"]][["n"]][2],"","","")
table1_acs = rbind(a, table1_acs)
table1_acs$`p-value` = as.character(round(as.numeric(table1_acs$`p-value`), digits = 4))
table1_acs$ACS = as.character(round(as.numeric(table1_acs$ACS), digits = 2))
table1_acs$`p-value_acs` = as.character(round(as.numeric(table1_acs$`p-value_acs`), digits = 4))
table1_acs[is.na(table1_acs)] = ""

# print Table 1 of the appendix
table1_acs %>% knitr:: kable()

Quota	Completed	Dropped-out	p-value	ACS	p-value_acs
N	631	168
Age in years			4e-04		0.2558
18 to 34 years	35 (224)	22 (37)		29.13
35 to 54 years	37 (234)	37 (62)		32.66
55 years and over	27 (173)	41 (69)		38.2
Gender			7e-04		0.9464
Female	49 (309)	65 (109)		50.98
Male	50 (314)	33 (56)		49.02
Other/prefer not to say	1.3 (8)	1.8 (3)
Race			0.0125		0.4773
White alone	75 (476)	67 (113)		63.66
Black or African American alone	12 (77)	23 (39)		11.8
American Indian and Alask Native alone	0.6 (4)	0.6 (1)		0.89
Asian alone	4.3 (27)	1.2 (2)		5.97
Native Hawaiian and Other Pacific Islander alone	0.2 (1)	0 (0)		0.14
Some Other Race alone	2.4 (15)	3.0 (5)		6.67
Two or More Races	4.9 (31)	4.8 (8)		10.87
Ethnicity			0.9662		0.3166
Hispanic or Latino	11 (72)	12 (20)		16.91
Other	89 (559)	88 (148)		83.09
US regions			0.2068		0.9507
Northeast	20 (126)	18 (31)		17.58
Midwest	22 (138)	18 (30)		20.67
South	38 (237)	46 (78)		38.12
West	21 (130)	17 (29)		23.64

10.2.2 Table 2 of the appendix

# create table2_appx and compare complete vs dropped out by state 
table2_appx = resp_i[resp_i$dropout != "",] %>% select(c(dropout, G02Q01)) %>%
  tbl_summary(by = dropout, missing = "no", 
  statistic = list(all_categorical()~"{p} ({n})")) %>% 
  add_p(everything() ~ "chisq.test")


# create table2_2appx to extract only the complete, so that i can compare complete vs ACS
table2_2appx = resp_i[resp_i$dropout==0,] %>% select(c(G02Q01)) %>%
  tbl_summary(statistic = list(all_categorical()~"{p}"))


# extract necessary variables from table2_appx and table2_2appx and \\
# paste them into table2_acs
table2_3appx = as.data.frame(table2_2appx[["table_body"]][["label"]])
table2_3appx$b = table2_2appx$table_body$stat_0
table2_3appx = table2_3appx[2:nrow(table2_3appx),]
names(table2_3appx) = c("States", "Completed")
rownames(table2_acs) = c(1:nrow(table2_acs)) 
names(table2_acs) = c("States", "ACS")
table2_acs$ACS = round(as.numeric(table2_acs$ACS), digits = 4)
table2_3appx$Completed = as.numeric(table2_3appx$Completed)
#table2_3appx$States = str_sub(table2_3appx$States,2,-2)

# merge table2_3appx with table2_acs
table2_acs = merge(table2_acs, table2_3appx, by = "States", all.x = T)
table2_acs[is.na(table2_acs)] = 0
table2_acs[1,4] = round(as.numeric(chisq.test(table2_acs[,2:3])$p.value), digits = 4)

# extract necessary completes, dropouts, and p-values from table2_appx, \\
# round  the number, and get the table ready for presentation
table2_4appx = as.data.frame(table2_appx[["table_body"]][["label"]])
table2_4appx$Completed = table2_appx$table_body$stat_1
table2_4appx$dropped = table2_appx$table_body$stat_2
table2_4appx[2,4] = round(as.numeric(table2_appx[["meta_data"]][["p.value"]]), digits = 4)
names(table2_4appx) = c("States", "Completed", "Dropped_out", "p-value")
table2_4appx = table2_4appx[2:nrow(table2_4appx),]
table2_acs = merge(table2_acs, table2_4appx, by = "States", all.x = T)
table2_acs = table2_acs[,c(1,5:7,2,4)]
names(table2_acs) = c("States", "Completed", "Dropped_out", "p-value", "ACS", "p-vapue-acs")
a = c("N",table2_appx[["df_by"]][["n"]][1], table2_appx[["df_by"]][["n"]][2],"","","")
table2_acs = rbind(a, table2_acs)
table2_acs[is.na(table2_acs)] = ""

# print Table 2 of the appendix
table2_acs %>% knitr:: kable()

States	Completed	Dropped_out	p-value	ACS	p-vapue-acs
N	631	168
Alabama	1.7 (11)	2.4 (4)	0.4609	1.5171	1
Alaska				0.2142
Arizona	3.3 (21)	3.0 (5)		2.1922
Arkansas	1.3 (8)	0 (0)		0.8958
California	8.2 (52)	7.1 (12)		11.7966
Colorado	2.5 (16)	1.2 (2)		1.7695
Connecticut	1.4 (9)	1.8 (3)		1.1139
Delaware	0.8 (5)	0.6 (1)		0.3066
Florida	7.1 (45)	11 (18)		6.773
Georgia	3.3 (21)	4.2 (7)		3.2032
Hawaii	0.3 (2)	0 (0)		0.4387
Idaho	0.5 (3)	1.2 (2)		0.5535
Illinois	3.5 (22)	1.8 (3)		3.8203
Indiana	2.1 (13)	0.6 (1)		2.0208
Iowa	0.6 (4)	2.4 (4)		0.9509
Kansas	1.0 (6)	0 (0)		0.8638
Kentucky	1.7 (11)	4.2 (7)		1.3515
Louisiana	0.5 (3)	1.2 (2)		1.3703
Maine	0.3 (2)	1.2 (2)		0.4345
Maryland	1.6 (10)	1.8 (3)		1.8593
Massachusetts	1.7 (11)	2.4 (4)		2.1764
Michigan	2.9 (18)	4.2 (7)		3.0575
Minnesota	1.4 (9)	1.2 (2)		1.7024
Mississippi	1.0 (6)	0 (0)		0.8741
Missouri	1.7 (11)	3.6 (6)		1.8504
Montana	0.3 (2)	0.6 (1)		0.3347
Nebraska	0.6 (4)	0 (0)		0.5734
Nevada	0.8 (5)	1.2 (2)		0.9472
New Hampshire	0.2 (1)	0.6 (1)		0.4381
New Jersey	2.5 (16)	1.8 (3)		2.8044
New Mexico	0.3 (2)	0.6 (1)		0.6361
New York	6.8 (43)	6.0 (10)		6.0884
North Carolina	3.0 (19)	6.0 (10)		3.193
North Dakota	0.2 (1)	0 (0)		0.225
Ohio	6.5 (41)	1.8 (3)		3.5513
Oklahoma	1.3 (8)	0.6 (1)		1.1727
Oregon	1.6 (10)	1.2 (2)		1.3116
Pennsylvania	6.0 (38)	3.6 (6)		3.9824
Rhode Island	0.5 (3)	1.2 (2)		0.3421
South Carolina	1.4 (9)	1.2 (2)		1.5769
South Dakota	0.2 (1)	0 (0)		0.261
Tennessee	1.6 (10)	3.0 (5)		2.1036
Texas	7.4 (47)	7.1 (12)		8.5456
Utah	0.3 (2)	0 (0)		0.9263
Vermont	0.5 (3)	0 (0)		0.201
Virginia	3.3 (21)	3.0 (5)		2.6154
Washington	2.2 (14)	1.2 (2)		2.3474
Washington, DC				0.2102
West Virginia	0.5 (3)	0.6 (1)		0.5469
Wisconsin	1.3 (8)	2.4 (4)		1.789
Wyoming	0.2 (1)	0 (0)		0.1697

10.2.3 Table 3 of the appendix

# create marital variables for table 3 of the appendix and paste it into table3_acs
marital_t3 = as.data.frame(prop.table(table(resp_i$marital_status[resp_i$dropout == 0]))*100)
marital_t3$Freq2 = paste("(",table(resp_i$marital_status[resp_i$dropout == 0]),")",sep = "")
a = as.data.frame(c(1:6))
names(a) = "Var1"
b = merge(a, marital_t3, by = "Var1", all.x = T)
table3_acs[2:7,3:4] = b[1:6,2:3]

# create education variables for table 3 of the appendix and paste it into table3_acs
educ_t3 = as.data.frame(prop.table(table(resp_i$education[resp_i$dropout == 0]))*100)
educ_t3$Freq2 = paste("(",table(resp_i$education[resp_i$dropout == 0]),")",sep = "")
a = as.data.frame(c(1:8))
names(a) = "Var1"
b = merge(a, educ_t3, by = "Var1", all.x = T)
table3_acs[9,3:4] = b[1,2:3]
table3_acs[10,3] = sum(b[2:nrow(b),2], na.rm = T)
table3_acs[10,4] = paste("(",sum(as.numeric(gsub(".*?([0-9]+).*", "\\1", b[2:nrow(b),3])), 
                                 na.rm = T),")",sep = "")
table3_acs[11:17,3:4] = b[2:nrow(b),2:3]

# create income variables for table 3 of the appendix and paste it into table3_acs
income_t3 = as.data.frame(prop.table(table(resp_i$income[resp_i$dropout == 0]))*100)
income_t3$Freq2 = paste("(",table(resp_i$income[resp_i$dropout == 0]),")",sep = "")
a = as.data.frame(c(1:6))
names(a) = "Var1"
b = merge(a, income_t3, by = "Var1", all.x = T)
table3_acs[19:24,3:4] = b[,2:3]

# perform chi square test for marital, education and income variable
table3_acs[is.na(table3_acs)] = 0
table3_acs[1,5] = ifelse(table3_acs[7,2] == 0 & table3_acs[7,3] == 0, 
                  chisq.test(table3_acs[2:6,2:3])$p.value, 
                  chisq.test(table3_acs[2:7,2:3])$p.value)
table3_acs[8,5] = chisq.test(table3_acs[9:17,2:3])$p.value
table3_acs[18,5] = chisq.test(table3_acs[19:24,2:3])$p.value
table3_acs[table3_acs == 0] = NA

# round integers and rename columns of table3_acs to make it more presentable
table3_acs[,2:3] = round(table3_acs[,2:3], digits = 2)
table3_acs[,5] = round(table3_acs[,5], digits = 4)
table3_acs[,5] = apply(data.frame(table3_acs[,5]),1,function(x) ifelse(is.na(x),x,ifelse(x==0,"<0.001",x)))
table3_acs[is.na(table3_acs)] = ""
table3_acs$Freq3 = paste(table3_acs$Freq,table3_acs$Freq2,sep = " ")
table3_acs = table3_acs[,c(1,6,2,5)]
names(table3_acs) = c("Variables","Completed", "ACS","p-value")

# print Table 3 of the appendix
table3_acs %>% knitr:: kable()

Variables	Completed	ACS	p-value
Marital status			0.6536
Married	43.9 (277)	50.41
Widowed	3.49 (22)	5.74
Divorced	9.98 (63)	11.15
Separated	1.74 (11)	1.79
Never married	40.89 (258)	30.91
don’t know/not sure/refuse
Educational attainment			0.4565
18 to 24 years	12.68 (80)	11.7
25 years and over	87.32 (551)	88.3
Less than 9th grade		4.2
9th to 12th grade, no diploma	2.22 (14)	5.19
High school graduate	32.01 (202)	23.22
Some college, no degree	12.2 (77)	17.05
Associate’s degree	7.45 (47)	7.73
Bachelor’s degree	20.92 (132)	18.76
Graduate or professional degree	12.52 (79)	12.16
Household Income in 2021			0.5147
Less than $10,000	4.28 (27)	6.03
$10,000 to $44,999	33.91 (214)	30.44
$50,000 to $74,999	22.19 (140)	16.81
$75,000 to $99,999	15.85 (100)	12.76
$100,000 to $149,999	14.42 (91)	16.26
$150,000 or more	9.35 (59)	17.7

10.2.4 Table 1 of the primary paper

# create table 1 of the primary paper
table1_primary = resp_i[resp_i$dropout==0 & !is.na(resp_i$dropout),] %>% 
  select(c(age_range2, male, race_cat2, hispanic, educ2)) %>% 
  mutate(age_range2 = case_when(age_range2 == 1~"18 to 24", age_range2 == 2 ~ 
  "25 to 34", age_range2 == 3~"35 to 54", age_range2 == 4~"55 to 64", 
  age_range2 == 5~"65 and over"), male = case_when(male == 1~"Male", 
  male == 2~"Female/Other"), race_cat2 = case_when(race_cat2 == "1" ~ "White alone", 
  race_cat2 == "2" ~ "Black or African American alone", race_cat2 == "3" ~ "Asian alone", 
  race_cat2 == "4"~ "Other"), race_cat2 = factor(race_cat2, levels = c("White alone", 
  "Black or African American alone", "Asian alone", "Other")), 
  hispanic = case_when(hispanic == "0" ~ "Other", hispanic == "1" ~"Hispanic or Latino"),
  educ2 = case_when(educ2 == "1"~"High school or less", educ2 == "2"~"Associate/some college",
  educ2 == "3"~"Bachelors or more"), educ2 = factor(educ2, levels = c("High school or less",
  "Associate/some college", "Bachelors or more"))) %>%
  tbl_summary(label = list(age_range2~"Age range", male~"Gender", 
   race_cat2~"Race category", hispanic~"Hispanic", educ2~"Education"), 
   statistic = list(all_categorical()~"{p} ({n})"), missing = "no")

# print Table 1 of the primary paper
table1_primary

Characteristic	N = 631¹
Age range
18 to 24	15 (92)
25 to 34	21 (132)
35 to 54	37 (234)
55 to 64	6.5 (41)
65 and over	21 (132)
Gender
Female/Other	50 (315)
Male	50 (314)
Race category
White alone	75 (476)
Black or African American alone	12 (77)
Asian alone	4.3 (27)
Other	8.1 (51)
Hispanic
Hispanic or Latino	11 (72)
Other	89 (559)
Education
High school or less	47 (296)
Associate/some college	20 (124)
Bachelors or more	33 (211)
¹ % (n)

11 Save the data

# Appendix 1, Table 1
write.csv(table1_acs, "AppendixTable1.csv", row.names = F)
# Appendix 1, Table 2
write.csv(table2_acs, "AppendixTable2.csv", row.names = F)
# Appendix 1, Table 3
write.csv(table3_acs, "AppendixTable3.csv", row.names = F) 
# Primary paper, Table 1
table1_primary %>% gtsummary::as_tibble() %>% 
  writexl::write_xlsx(., "PrimaryTable1.xlsx")

Sample Selection

Maksat Jumamyradov, maksat@usf.edu; Benjamin M. Craig, bcraig@usf.edu

2023-12-24