1 Purpose

Use this R Markdown file to load the DCE survey data and produce the analytical files. This file and the data attached to it are free for private, non-commercial use only. Please contact the authors for commercial usage.

2 Summary

This R Markdown file loads the DCE survey data from LimeSurvey (Community Edition) version 5.6 (including the question themes for paired comparison and kaizen tasks) and produces analytical files for descriptive, primary and sensitivity analyses. Specifically, it loads the LimeSurvey data, recodes the data into long format, parses them where needed, and saves four comma-separated-values files: respondent data (i.e., resp_i), temporal data in long format (i.e., time_it), paired comparison data in long format (i.e., pc_it), and kaizen task data in long format (i.e., kz_it). These data are de-identified and complete (i.e., no loss of information).

3 Load the libraries

# set directory
setwd("C:/Users/maksat/Box Sync/USF PC/EuroQol/Kaisen task/paper1/Wave 1/markdown")
# clear memory
rm(list = ls())
# load libraries
library(dplyr) ## for command relocate()
library(stringr) ## for command str_split() (strsplit() is base R)
library(reshape2) ## for command melt() and dcast()

4 Data notation

Below, we present data notations and definitions of new variables created.

4.1 Subject and task index

subject, i = 1 to N; task, t = 1 to T;

T1 is the number of coma and paired comparison tasks including two warmups

T2 is the number of kaizen tasks including warmup

4.2 Definitions of recoded variables by analytical file

4.2.1 Respondent data in wide format

‘incomplete’ is the indicator for incomplete surveys (1 if lastpage < 103).

‘pc_cnum1’ is the order of paired comparison component based on HS.

‘pc_cnum2’ is the order of paired comparison component based on E.

‘kz_cnum1’ is the order of kaizen component based on HS.

‘kz_cnum2’ is the order of kaizen component based on E.

4.2.2 Temporal data in long format

‘id’ is the identification number for each respondent

‘page’ is the page title

‘ptime’ is time spent for each page in seconds

4.2.3 Paired comparison data in long format

‘id’ is the identification number for each respondent.

‘task’ is the label for each task as it appears in the LimeSurvey.

‘task_str’ is the string that contains necessary information about each task.

‘ADO’ is the order of the attributes without position randomization.

‘AMO’ is the order of the attributes with position randomization.

‘CA’ shows which attribute information buttons were clicked.

‘E’ is the time and date when the task was completed.

‘HS’ shows attribute levels both in written and coded format.

‘ODO’ is the order of the objects without position randomization

‘OMO’ is the order of the objects with position randomization

‘Q’ is the question indicator

‘TP’ shows page time for each task in seconds.

‘TT’ shows the total time spent on survey in seconds.

‘U’ is the alternative that was clicked on (0 is left, 1 is right).

‘incomplete’ is an indicator of an incomplete survey (1 if incomplete).

‘alt1’ shows the attribute levels for the alternative on the left.

‘alt2’ shows the attribute levels for the alternative on the right.

‘tnum1’ is the order of tasks after randomization of tasks based on HS.

‘tnum2’ is the order of tasks after randomization of tasks based on E.

‘final_choice’ is the location of the chosen object after randomization.

‘actual_choice’ is the location of the chosen object before randomization.

4.2.4 Kaizen task data in long format

‘id’ is the identification number for each respondent.

‘task’ is the label for each task as it appears in the LimeSurvey.

‘task_str’ is the string that contains necessary information about each task.

‘ADO’ is the order of the attributes without position randomization.

‘ASO’ is the order of the attributes with position randomization.

‘final_choice’ is the location of the chosen attribute after randomization, first, second, third and fourth, respectively.

‘actual_choice’ is the location of the chosen attribute before randomization, first, second, third and fourth, respectively.

‘holdout_actual’ is the location of hold-out attribute for ‘actual_choice’.

‘holdout_final’ is the location of hold-out attribute for ‘final_choice’.

‘CA’ shows which attribute information buttons were clicked.

‘E’ is the time and date when the task was completed.

‘HS’ shows attribute levels both in written and coded format.

‘ODO’ is the order of the objects without position randomization

‘OMO’ is the order of the objects with position randomization

‘Q’ is the question indicator

‘TP’ shows page time for each task in seconds.

‘TT’ shows the total time spent on survey in seconds.

‘U’ is the alternative that was clicked on (0 is left, 1 is right).

‘incomplete’ is an indicator of an incomplete survey (1 if incomplete).

‘alt1’ shows the attribute levels for the alternative on the left.

‘alt2’ shows the attribute levels for the alternative on the right.

‘tnum1’ is the order of tasks after randomization of tasks based on HS.

‘tnum2’ is the order of tasks after randomization of tasks based on E.

5 Specify the settings and functions

In this section, we specify the settings and define the functions used in our analysis.

5.1 Specify the settings

N = 1681 ## the number of survey records in raw data
Z0 = 237 ## the number of variables per survey record in raw data
Z1 = 144 ## the number of respondent-specific variables in final dataset 
Z2 = 21  ## the number of task-specific variables in final pc and kz datasets
Z3 = 3  ## the number of temporal variables (e.g., page times) in final dataset   
T1 = 17 ## number of paired comparisons per respondent
T2 = 11 ## number of kaizen tasks per respondent
spec = c(N, Z0, Z1, Z2, Z3, T1, T2)  ## summary of setting specification

5.2 Specify the functions

This function converts ‘final_choice’ to ‘actual_choice’ accounting for attribute position randomization.

actual_choice = function(ASO, final_choice) {
  choice = 0
  for (i in seq_along(ASO)) {
    if (final_choice[i] == 0) {choice[i] = substr(ASO[i], 1,1)}
    if (final_choice[i] == 1) {choice[i] = substr(ASO[i], 3,3)}
    if (final_choice[i] == 2) {choice[i] = substr(ASO[i], 5,5)}
    if (final_choice[i] == 3) {choice[i] = substr(ASO[i], 7,7)}
    if (final_choice[i] == 4) {choice[i] = substr(ASO[i], 9,9)}
  }
  return(choice)
}
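
As an illustration, the same lookup can be written in vectorized form. The sketch below assumes that ASO holds five single-digit attribute codes separated by "|" (e.g., "2|0|4|1|3"), which is what the substr() positions 1, 3, 5, 7 and 9 imply. The helper name actual_choice_vec and the example values are hypothetical and are not part of the original script.

```r
## Vectorized sketch of the position lookup (hypothetical helper, not the original function)
actual_choice_vec <- function(ASO, final_choice) {
  pos <- 2 * as.integer(final_choice) + 1  ## display position k sits at character 2k + 1
  substr(ASO, pos, pos)                    ## attribute code shown at that position
}

ASO <- c("2|0|4|1|3", "4|3|2|1|0")  ## assumed format: five codes separated by "|"
final_choice <- c("0", "3")         ## positions clicked on screen (0-based)
example_actual <- actual_choice_vec(ASO, final_choice)
print(example_actual)
```

In the first row, the attribute displayed in position 0 is attribute 2; in the second row, the attribute displayed in position 3 is attribute 1.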

6 Load the data

This section loads the LimeSurvey data and converts them to respondent, temporal and task-specific data.

## load the limesurvey data
all_i <- read.csv("markdown_data.csv") ## load the limesurvey data

## recode prior to separating respondent, task, and time data
all_i = all_i %>% rename("id" = "ï..id", "dynata" = "panelid") ## rename id (read with a byte-order-mark prefix) and panelid
all_i <- cbind(Filter(function(x)!all(is.na(x)), 
               all_i %>% select(-contains(c("G11Q03")))),
               all_i %>% select( contains(c("G11Q03")))) ## remove empty columns
all_i$iminutes <- all_i$interviewtime/60 ## recode interview time in minutes

## respondent data
resp_i <- all_i %>% select(-contains(c("G05Q04", "G06", "G07Q02", "G08", 
                                       "G09Q02", "G10", "Time" )))  ## N by Z1 

## temporal data with id
time_i <- all_i %>% select(contains(c("id", "Time" )))  ## N by (1 + Z2)

## task data with id
task_i <- all_i %>% select(contains(c("id", "G05Q04", "G06", "G07Q02", 
                                      "G08", "G09Q02", "G10")))  ## N by (1+ Z3)
task_i <- task_i %>% select(-contains(c("Time" )))
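
The empty-column filter applied to all_i above can be tried on a toy data frame. The column names below are hypothetical; the sketch only shows how Filter() keeps the columns that have at least one non-missing value.

```r
## Toy data frame with one entirely missing column (hypothetical names)
toy <- data.frame(id = 1:3,
                  q1 = c("a", NA, "c"),
                  q2 = c(NA, NA, NA))

## keep only the columns that are not entirely NA
toy_kept <- Filter(function(x) !all(is.na(x)), toy)
print(names(toy_kept))
```

Column q1 is kept because it is only partly missing, whereas q2 is dropped.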

7 Format the data

This section formats the respondent, temporal and task data into a long format and creates paired comparison and kaizen task datasets.

7.1 Temporal data in long format

time_it <- melt(time_i, id.vars = "id")  ## long format: id, variable, value 
names(time_it)[names(time_it) == 'variable'] <- 'page' ## rename variable to page
names(time_it)[names(time_it) == 'value'] <- 'ptime' ## rename value to ptime
time_it <- time_it[!is.na(time_it$ptime), ] ## drop missing values
time_it$page <- data.frame(do.call(rbind,strsplit(as.character(time_it$page),
                                          "T", fixed = TRUE)))[,1] ## keep the text before the first "T" as the page label

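The wide-to-long step for page times can be sketched on a small example. The column names (G01Time, G02Time) and values are hypothetical, and the sketch strips the "Time" suffix with sub() rather than the strsplit() call used above, so it illustrates the idea and is not a copy of the original step.

```r
library(reshape2)  ## for melt()

## Toy wide page-time data (hypothetical column names ending in "Time")
toy_time <- data.frame(id = c(1, 2),
                       G01Time = c(10.5, 8.0),
                       G02Time = c(NA, 3.2))

toy_long <- melt(toy_time, id.vars = "id")      ## columns: id, variable, value
names(toy_long) <- c("id", "page", "ptime")     ## rename to the notation used here
toy_long <- toy_long[!is.na(toy_long$ptime), ]  ## drop pages with no recorded time
toy_long$page <- sub("Time", "", as.character(toy_long$page), fixed = TRUE)  ## keep the page label only
print(toy_long)
```

Respondent 1 has no recorded time for the second page, so the long data contain three rows rather than four.
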
7.2 Paired comparison data in long format

pc_i <- subset(task_i, select = c(1:(1+T1) )) ## wide format: N by (1+T1)
pc_it <- melt(pc_i, id.vars = "id")  ## long format: id, variable, value   
names(pc_it)[names(pc_it) == 'variable'] <- 'task' ## rename variable to 'task'
names(pc_it)[names(pc_it) == 'value'] <- 'task_str' ## rename value to 'task_str'
pc_it <- pc_it[pc_it$task_str!="", ] ## drop missing values 
pc_it <- cbind(pc_it, data.frame(do.call(rbind,str_split(pc_it$task_str,"#")))) ## parse value
pc_itz <- melt(pc_it, 
          id.vars = c("id", "task", 
          "task_str"))  ## long format: 'id', 'task', 'task_str', variable, value   
pc_itz <- pc_itz[pc_itz$value!="", ] ## drop missing values 
pc_itz$variable <- data.frame(do.call(rbind,
                        strsplit(pc_itz$value,"[",fixed = TRUE)))[,1] ## renames the variables
pc_itz$value <- data.frame(do.call(rbind,
                        strsplit(pc_itz$value,"[",fixed = TRUE)))[,2] ## removes the left bracket in value
pc_itz$value <- data.frame(do.call(rbind,
                        strsplit(pc_itz$value,"]",fixed = TRUE)))[,1] ## removes the right bracket in value
pc_it <- dcast(pc_itz, id+task+task_str~variable, 
              value.var = "value") ## long format: 'id', 'task', 'task_str', and parsed variables
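
The parsing above can be followed on a single task string. The sketch assumes, as the split characters in the code suggest, that each string consists of fields separated by "#" and that each field is written as NAME[value]. The example string is hypothetical.

```r
## Hypothetical task string: fields separated by "#", each written as NAME[value]
task_str <- "ODO[0|1]#U[1]#TP[12.5]"

fields <- strsplit(task_str, "#", fixed = TRUE)[[1]]     ## one element per field
parts  <- strsplit(fields, "[", fixed = TRUE)            ## split the name from the value
keys   <- vapply(parts, function(p) p[1], character(1))  ## text before "["
vals   <- vapply(parts, function(p) p[2], character(1))  ## text after "["
vals   <- sub("]", "", vals, fixed = TRUE)               ## drop the closing bracket
parsed <- setNames(vals, keys)
print(parsed)
```

Each name then becomes a column and each value a cell, which is what the dcast() call does across all respondents and tasks.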

7.3 Kaizen task data in long format

kz_i <- subset(task_i, select = c(1, (1+T1+1):(1+T1+T2)))  ## wide format: N by (1+T2)
kz_it <- melt(kz_i, id.vars = "id")  # long format (3 variables): id, variable, value  
names(kz_it)[names(kz_it) == 'variable'] <- 'task' ## rename variable to task
names(kz_it)[names(kz_it) == 'value'] <- 'task_str' ## rename value to 'task_str'
kz_it <- kz_it[kz_it$task_str!="", ] ## drop missing values
kz_it <- cbind(kz_it, data.frame(do.call(rbind,str_split(kz_it$task_str,"#")))) ## parse value
kz_itz <- melt(kz_it, 
          id.vars = c("id", "task", 
          "task_str"))  ## long format: 'id', 'task', 'task_str', variable, value   
kz_itz <- kz_itz[kz_itz$value!="", ] ## drop missing values 
kz_itz$variable <- data.frame(do.call(rbind,
                        strsplit(kz_itz$value,"[",fixed = TRUE)))[,1] ## renames the variables
kz_itz$value <- data.frame(do.call(rbind,
                        strsplit(kz_itz$value,"[",fixed = TRUE)))[,2] ## removes the left bracket in value
kz_itz$value <- data.frame(do.call(rbind,
                        strsplit(kz_itz$value,"]",fixed = TRUE)))[,1] ## removes the right bracket in value
kz_it <- dcast(kz_itz, id+task+task_str~variable, 
              value.var = "value") ## long format: 'id', 'task', 'task_str', and parsed variables

8 Recode the data

In this section, we recode the data to create the variables defined above.

8.1 Paired comparisons

# resp_i for sample selection
resp_i$incomplete = ifelse(resp_i$lastpage < 
                          max(resp_i$lastpage), 1, 0) ## indicator variable for incomplete survey
time_it <- merge(time_it, subset(resp_i, select = c(id, incomplete)), by = "id", all.x = T)
pc_it <- merge(pc_it, subset(resp_i, select = c(id, incomplete)), by = "id", all.x = T)
kz_it <- merge(kz_it, subset(resp_i, select = c(id, incomplete)), by = "id", all.x = T)


# pc_hs for component sequence, task sequence, alternatives
pc_hs <- data.frame(cbind(pc_it$id,pc_it$task,data.frame(do.call(rbind,
                str_split(pc_it$HS,"\\|"))))) ## parse value
pc_hs <- pc_hs %>% rename("HS1" = "X1", "HS2" = "X2") # rename variables
pc_hs <- data.frame(cbind(pc_hs, data.frame(do.call(rbind,
                str_split(pc_hs$HS1,"@"))))) ## parse value
pc_hs <- pc_hs %>% rename("HS1_1" = "X1", "HS1_2" = "X2") # rename variables
pc_hs <- data.frame(cbind(pc_hs, data.frame(do.call(rbind,
                str_split(pc_hs$HS1_1," "))[,c(1,4)]))) ## parse value
pc_hs <- pc_hs %>% rename("alt1"="X1", "tnum" = "X2") # rename variables
pc_hs <- data.frame(cbind(pc_hs, data.frame(do.call(rbind,
                str_split(pc_hs$HS2,"@"))))) ## parse value
pc_hs <- pc_hs %>% rename("alt2" = "X1", "HS2_2" = "X2") # rename variables
pc_hs <- subset(pc_hs, select = c(pc_it.id, pc_it.task, alt1, alt2, tnum))

# add pc_num component sequence to resp_i 
pc_cnum <- subset(pc_hs[pc_hs$pc_it.task == "G07Q02",], select = c(pc_it.id, tnum))
pc_cnum <- pc_cnum %>% rename("id" = "pc_it.id", "pc_cnum" = "tnum") # rename variables
resp_i <- merge(resp_i, pc_cnum, by="id", all.x=TRUE)

# pc_it task sequence and alternatives 
pc_hs$tnum = ifelse(pc_hs$pc_it.task == "G05Q04" | pc_hs$pc_it.task == "G07Q02", 
                    0, pc_hs$tnum)  ## recode the two warmup tnum to zero 
pc_hs <- pc_hs %>% rename("id" = "pc_it.id", "task" = "pc_it.task") # rename variables
pc_it <- merge(pc_it, pc_hs, by=c("id","task"), all.x=TRUE)

# create 'tnum2' for paired comparison and coma comparison
pc_it$E[pc_it$id==1632 & pc_it$task=="G05Q04"] <- -1  # error in E for 1632
pc_it$E[pc_it$id==1632 & pc_it$task=="G06Q03"] <- -2  # error in E for 1632
pc_it <- pc_it[with(pc_it, order(id, E)),]
pc_it <- pc_it %>% group_by(id) %>% mutate(tnum2=1:n())

# pc_it for final_choice and actual_choice
pc_it$final_choice = as.numeric(substr(pc_it$U, nchar(pc_it$U), nchar(pc_it$U)))
pc_it$actual_choice = ifelse(pc_it$ODO == "1|0", 1-pc_it$final_choice, pc_it$final_choice)

# remove S and O from pc_it (always undefined and zero, respectively)
pc_it <- subset(pc_it, select = -c(S,O))
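
The conversion from ‘final_choice’ to ‘actual_choice’ can be checked on a few hand-made rows. The values of U and ODO below are hypothetical. The sketch assumes, as in the code above, that the last character of U is the clicked side and that an ODO of "1|0" means the two objects were swapped on screen.

```r
## Hypothetical rows: U ends in the clicked side (0 = left, 1 = right)
toy_pc <- data.frame(U   = c("x0", "x0", "x1"),
                     ODO = c("0|1", "1|0", "1|0"))

## the last character of U is the displayed choice
toy_pc$final_choice <- as.numeric(substr(toy_pc$U, nchar(toy_pc$U), nchar(toy_pc$U)))

## undo the swap when the objects were shown in reverse order
toy_pc$actual_choice <- ifelse(toy_pc$ODO == "1|0",
                               1 - toy_pc$final_choice,
                               toy_pc$final_choice)
print(toy_pc)
```

Only the rows with ODO equal to "1|0" are flipped; the first row keeps its displayed choice.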

8.2 Kaizen tasks

# kz_hs for component sequence, task sequence, actual choice, alternatives
kz_hs <- data.frame(cbind(kz_it$id, kz_it$task, data.frame(do.call(rbind,
                  str_split(kz_it$HS,"\\|"))))) ## parse value
kz_hs <- kz_hs %>% rename("HS1" = "X1", "HS2" = "X2") # rename variables
kz_hs <- data.frame(cbind(kz_hs, data.frame(do.call(rbind,
                  str_split(kz_hs$HS1,"@"))))) ## parse value
kz_hs <- kz_hs %>% rename("HS1_1" = "X1", "HS1_2" = "X2") # rename variables
kz_hs <- data.frame(cbind(kz_hs, data.frame(do.call(rbind,
                  str_split(kz_hs$HS1_1," "))[,c(1,4)]))) ## parse value
kz_hs <- kz_hs %>% rename("alt1"="X1", "tnum" = "X2") # rename variables
kz_hs <- data.frame(cbind(kz_hs, data.frame(do.call(rbind,
                  str_split(kz_hs$HS2,"@"))))) ## parse value
kz_hs <- kz_hs %>% rename("alt2" = "X1", "HS2_2" = "X2") # rename variables
kz_hs <- subset(kz_hs, select = c(kz_it.id, kz_it.task, alt1, alt2, tnum))

# add kz_cnum component sequence to resp_i 
kz_cnum <- subset(kz_hs[kz_hs$kz_it.task == "G09Q02",], select = c(kz_it.id, tnum))
kz_cnum <- kz_cnum %>% rename("id" = "kz_it.id", "kz_cnum" = "tnum") # rename variables
resp_i <- merge(resp_i, kz_cnum, by="id", all.x=TRUE)

# recode missing values of pc_cnum and kz_cnum
resp_i$pc_cnum[is.na(resp_i$pc_cnum) & 
                 (resp_i$kz_cnum==1)] <- 2 ## drop out prior to paired comparisons
resp_i$kz_cnum[is.na(resp_i$kz_cnum) & (resp_i$pc_cnum==1)] <- 2 ## drop out prior to kaizen tasks

# kz_it task sequence and alternatives
kz_hs$tnum = ifelse(kz_hs$kz_it.task == "G09Q02" , 
                    0, kz_hs$tnum)  ## recode the warmup tnum to zero 
kz_hs <- kz_hs %>% rename("id" = "kz_it.id", "task" = "kz_it.task") ## rename variables
kz_it <- merge(kz_it, kz_hs, by=c("id","task"), all.x=TRUE)

# create 'tnum2' for kaizen tasks
kz_it <- kz_it[with(kz_it, order(id, E)),]
kz_it <- kz_it %>% group_by(id) %>% mutate(tnum2=1:n())

# kz_it for actual choice
kz_it$final_choice = substr(kz_it$U, nchar(kz_it$U) - 4, nchar(kz_it$U))
kz_it$actual_choice = paste(actual_choice(kz_it$ASO, 
    substr(kz_it$final_choice,1,1)), "|" , actual_choice(kz_it$ASO, 
    substr(kz_it$final_choice,3,3)), "|" , actual_choice(kz_it$ASO, 
    substr(kz_it$final_choice,5,5)), sep = "")
kz_it = kz_it %>% relocate(c(final_choice, actual_choice), .after = ASO)

# finding holdout location for actual choice
kz_it$holdout_actual = apply(as.matrix(data.frame(do.call(rbind,
    strsplit(kz_it$alt1,"",5)))==data.frame(do.call(rbind,strsplit(kz_it$alt2,
    "",5))))*1, 1, function(x) which(x==1)) - 1

# incorporating the fourth improvement to actual choices
kz_it$actual_choice2 = paste(kz_it$actual_choice, "|", 
  ifelse(kz_it$holdout_actual==t(apply(data.frame(do.call(rbind,
  strsplit(gsub("\\D", "", kz_it$actual_choice),"",3))), 1, function(x) 
  setdiff(as.matrix(c(0:4), nrow = 1), x)))[,1],t(apply(data.frame(do.call(rbind,
  strsplit(gsub("\\D", "", kz_it$actual_choice),"",3))), 1, function(x) 
  setdiff(as.matrix(c(0:4), nrow = 1), x)))[,2],t(apply(data.frame(do.call(rbind,
  strsplit(gsub("\\D", "", kz_it$actual_choice),"",3))), 1, function(x) 
  setdiff(as.matrix(c(0:4), nrow = 1), x)))[,1]), sep = "")

# incorporating the fourth improvement to final choices
kz_it$final_choice2 = paste(kz_it$final_choice, "|", 
  apply(data.frame(data.frame(do.call(rbind,strsplit(gsub("\\D", "", 
  kz_it$ASO),"",5)))==as.matrix(replicate(5,substr(kz_it$actual_choice2,
  nchar(kz_it$actual_choice2),nchar(kz_it$actual_choice2))),
  nrow=length(substr(kz_it$actual_choice2,nchar(kz_it$actual_choice2),
  nchar(kz_it$actual_choice2)))))*1,1,function(x) which(x==1))-1, sep = "")
kz_it$holdout_final = apply(data.frame(do.call(rbind,strsplit(gsub("\\D", "", 
  as.matrix(kz_it$final_choice2)),"",4))),1,function(x) setdiff(c(0:4),x))
kz_it$actual_choice = kz_it$actual_choice2 # replace columns
kz_it$final_choice = kz_it$final_choice2 # replace columns
kz_it = subset(kz_it,select = -c(actual_choice2, final_choice2)) # drop columns

# relocate holdout after actual choice
kz_it = kz_it %>% relocate(c(holdout_actual, holdout_final), 
                           .after = actual_choice)

# remove O from kz_it (always zero)
kz_it <- subset(kz_it, select = -c(O))

# create 'pc_cnum2' and 'kz_cnum2' using E
warmup <- rbind(subset(pc_it,select=c("id", "task", "E")),
                subset(kz_it,select=c("id", "task", "E"))) ## combine tasks 
warmup = warmup[warmup$task=="G07Q02" |warmup$task=="G09Q02", ] ## only warmups 
warmup <- warmup[with(warmup, order(id, E)),] ## sort by id and E
warmup <- warmup %>% group_by(id) %>% mutate(cnum=1:n()) # create cnum
warmup$cnum[warmup$cnum>0] <- warmup$cnum+1
warmup <- dcast(warmup, id~task, value.var = "cnum")
warmup[is.na(warmup)] <- 3
warmup <- warmup %>% rename("pc_cnum2" = "G07Q02", "kz_cnum2" = "G09Q02")
resp_i <- merge(resp_i, warmup, by = "id", all.x=TRUE)
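
The hold-out and fourth-improvement steps for the kaizen tasks are compact, so a single-task sketch may help. It assumes that alt1 and alt2 are five-character strings with one level per attribute, that they match in exactly one position, and that the first three choices are 0-based codes separated by "|". All values below are hypothetical.

```r
## Hypothetical kaizen task: five attributes, one level digit per attribute
alt1 <- "33333"
alt2 <- "22322"           ## matches alt1 only at position 2 (0-based)
actual_choice <- "0|3|4"  ## three attributes chosen, coded 0 to 4

## hold-out: the (0-based) position where both profiles share the same level
same <- strsplit(alt1, "")[[1]] == strsplit(alt2, "")[[1]]
holdout <- which(same) - 1

## fourth improvement: the attribute that is neither chosen nor held out
chosen <- as.numeric(strsplit(actual_choice, "|", fixed = TRUE)[[1]])
fourth <- setdiff(0:4, c(chosen, holdout))

actual_choice4 <- paste(actual_choice, fourth, sep = "|")
print(actual_choice4)
```

Here attributes 1 and 2 remain after the three choices; attribute 2 is the hold-out, so attribute 1 is appended as the fourth improvement.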

9 Save the data

## Respondents (N by questions, regardless of completion)
write.csv(resp_i, file = "resp_i_231010.csv", row.names = F) 

## Page times (long format, N*pages, only completed pages)
write.csv(time_it, file = "time_it_231010.csv", row.names = F)  

## Paired comparisons (long format, N*tasks, only completed tasks)
write.csv(pc_it, file = "pc_it_231010.csv", row.names = F) 

## Kaizen tasks (long format, N*tasks, only completed tasks)
write.csv(kz_it, file = "kz_it_231010.csv", row.names = F)

Format Database/Load DCE Survey Data in R, Version 1.2

Maksat Jumamyradov, maksat@usf.edu; Benjamin M. Craig, bcraig@usf.edu

2023-12-24