In this entry, we will lead you through the steps to perform a simple analysis of a DCE with a conditional logit model. For more information on different models, choices you have to made in the analysis you can consult the following article: http://www.valueinhealthjournal.com/article/S1098-3015(16)30291-1/abstract. Finally we will reproduce all results from Table 3 in this article.
We start first with importing and wrangling of the data. The dataset dce_data.csv contains the answers on a discrete choice experiment with 27 questions and 1000 respondents. The dataset Design_broad.csv contains the design of the DCE.
First we load both datasets with read_csv() from the library readr.
library(readr)
dce_data <- read_csv("data/dce_data.csv")
broad_design <- read_csv("data/Design_broad.csv")
With the function head you can check the first lines of both datasets:
head(dce_data)
## # A tibble: 6 x 28
## id q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 0 0 1 0 0 1 0 0 1 0 0
## 2 2 1 1 1 1 1 0 0 1 0 1 1 1
## 3 3 1 1 1 1 1 0 0 0 0 1 1 1
## 4 4 1 0 1 0 1 1 0 0 0 1 1 1
## 5 5 0 1 0 0 1 1 0 1 0 1 1 0
## 6 6 1 1 0 1 1 0 1 0 0 1 1 1
## # … with 15 more variables: q13 <dbl>, q14 <dbl>, q15 <dbl>, q16 <dbl>,
## # q17 <dbl>, q18 <dbl>, q19 <dbl>, q20 <dbl>, q21 <dbl>, q22 <dbl>,
## # q23 <dbl>, q24 <dbl>, q25 <dbl>, q26 <dbl>, q27 <dbl>
head(broad_design)
## # A tibble: 6 x 6
## Eff1 Side1 Mode1 Eff2 Side2 Mode2
## <dbl> <chr> <chr> <dbl> <chr> <chr>
## 1 10 Mild Tablet 5 Moderate Injection
## 2 5 Mild Tablet 3 Moderate Injection
## 3 3 Mild Tablet 10 Moderate Injection
## 4 10 Moderate Tablet 5 Severe Injection
## 5 5 Moderate Tablet 3 Severe Injection
## 6 3 Moderate Tablet 10 Severe Injection
The object dce_data contains 1000 rows and 28 columns, the first column named id is the identifier for the respondents and the remaining columns contain the answers to the 27 DCE questions (paired comparison).
The object broad_design has 27 rows and 6 columns with the design of the DCE. The dataset doesn’t contain an identifier for the choiceset, but the rows are in the same order as the answer to the questions in the other dataset. The attributes of a choiceset, containing two profiles are represented in the columns, next to each other. In this example there are three attributes: Effectivess (Eff for left profile and Eff2 for the right profile, levels 3, 5 and 10), Mode of administration (Mode1 and Mode2, levels Infusion, Injection and Tablet) and Side effects (Side1 and Side2, levels Mild, Moderate and Severe)..
To analyse the data we need a so-called long format dataset where each profile has a seperate row and the design is attached to the answers. To generate this dataset we first transform dce_data into a dataset where each question has a row with the function pivot_longer(). With the attribute cols you can indicate which columns to include (in this case q1:q27). With the attribute names_to = "choiceset" you can indicate that the name of the question will be stored in the column choiceset. With names_prefix = "q" it will remove the q from the name of the question (so only a number is left) and with values_to = "choice" you indicate that the values should be stored in the column choice.
library(tidyverse)
dce_long <- pivot_longer(dce_data, cols = q1:q27,
names_to = "choiceset",
names_prefix = "q",
values_to = "choice"
)
head(dce_long)
## # A tibble: 6 x 3
## id choiceset choice
## <dbl> <chr> <dbl>
## 1 1 1 1
## 2 1 2 0
## 3 1 3 0
## 4 1 4 1
## 5 1 5 0
## 6 1 6 0
The column choiceset is still a character vector, so we change it into an integer vector with the function as.integer() and mutate.
dce_long <- dce_long %>%
mutate(choiceset = as.numeric(choiceset))
In the design file the attributes of the two profiles per question are in seperate columns next to each other. First we add a column choiceset with the rownumber with the function mutate and row_number. Then, we transform the design file into a file where the profiles are below each other with pivot_longer(), using the following attributes: cols = Eff1:Mode2 to indicate which columns to transform, names_to = c(".value","profile") to indicate how the columns should be names in the transformed table and in the new column profile the number of the profile will be stored, names_pattern = "(.*)(.)" indicates the way how the names of the columns are in the original dataset (this is a so-called regular expression). Next we transform the column profile from character to numeric.
temp <- broad_design %>%
mutate(choiceset = row_number()) %>%
pivot_longer(cols = Eff1:Mode2,
names_to = c(".value","profile"),
names_pattern = "(.*)(.)") %>%
mutate(profile = as.numeric(profile))
Now we can join the data of the questionnaire and the design into one dataset with the function full_join(), where we join based on the column choiceset:
data <- dce_long %>% full_join(temp, by = "choiceset")
The column choice contains the answer whether the left (first) profile was chosen over the second. Therefore we change the value for the second profile such that it is 0 if choice is 1 and 1 if choice is zero.
data <- mutate(data, choice = if_else(profile == 2, (choice+1)%%2, choice))
## or alternatively:
#data <- mutate(data, choice =
# if_else((profile == 2 & choice == 0), 1,
# if_else((profile == 2 & choice == 1), 0, choice)))
If you want to save this long dataset as csv-file you can use write_csv.
Categorical attributes can be included in the regression model after coding the levels into for example dummy or effects coding. The dummy variables will be created automatically when estimating the model if those variables are changed into so-called factor variables. By default the contrasts of these factor variables are contr.treatment which is dummy coding. With the function constrast() you can check the contrast matrix.
In the following code we change the columns Eff, Side and Mode into factor variables with mutate and factor and check the contrastmatrix of the three attributes:
data <- data %>%
mutate(Eff = factor(Eff),
Side = factor(Side),
Mode = factor(Mode))
contrasts(data$Eff)
## 5 10
## 3 0 0
## 5 1 0
## 10 0 1
contrasts(data$Side)
## Moderate Severe
## Mild 0 0
## Moderate 1 0
## Severe 0 1
contrasts(data$Mode)
## Injection Tablet
## Infusion 0 0
## Injection 1 0
## Tablet 0 1
You can see that for Eff level 3 is the reference level, for Mode it is Infusion and for Side it is Mild. By default the reference levels are the alphabetically first levels in the categorical variable levels.
Suppose you want to use the level Tablet as reference level for Mode instead of Infusion. You can explicitly set the order by adding as attribute to the factor function the levels of a variable in your preferred order: levels = c("Tablet", "Injection", "Infusion"). In case of dummy coding the reference level is the first of the levels.
data <- data %>%
mutate(Side = factor(Side, levels = c("Mild", "Moderate", "Severe")),
Eff = factor(Eff, levels = c(10,5,3)),
Mode = factor(Mode, levels = c("Tablet","Injection","Infusion")))
contrasts(data$Mode)
## Injection Infusion
## Tablet 0 0
## Injection 1 0
## Infusion 0 1
To change the dummy coding into effects coding you can define this globally (for all categorical variables) with options(contrasts = c("contr.sum","contr.poly")). With effects coding the last level is the reference level. To change again to dummy coding globally you can use options(contrasts = c("contr.treatment","contr.poly"))).
You can also change the contrasts for each categorical variable separately with the function contr.treatment (dummy coding) or cont.sum (effects coding). With the attribute base you indicate which level the reference level is (only for dummy coding). We will set the reference levels the same as in the article.
contrasts(data$Eff) <- contr.treatment(3, base = 3)
contrasts(data$Side) <- contr.treatment(3, base = 3)
contrasts(data$Mode) <- contr.treatment(3, base = 3)
contrasts(data$Mode)
## 1 2
## Tablet 1 0
## Injection 0 1
## Infusion 0 0
Finally, to define manually the contrasts you can also explicitly overwrite the contrast matrix with the following code.
my.contrasts <- matrix(c(-1,1,0,-1,0,1),ncol = 2)
contrasts(data$Eff) <- my.contrasts
contrasts(data$Eff)
Now we are ready to estimate the conditional logit model. For this we use the package mlogit. In order to apply the function mlogit you need first to turn your data object into a specific data frame with three indexes representing the respondents (id), the profiles (profile) and choiceset (csetid). The index csetid is a number of the choicesets different for each respondent, and this index needs to be added to the data.
library(mlogit)
temp <- data %>%
mutate(csetid = ((row_number()+1)%/%2)) %>%
dfidx(choice = "choice",
idx = list(c("csetid","id"),"profile"),
idnames = c("csetid", "profile"))
The conditional logit model can be estimated now as follows (the -1 indicates that no intercept is estimated):
fit1 <- mlogit(choice ~ Eff + Side + Mode |-1, data = temp)
summary(fit1) ## to display the summary of the regression analysis
##
## Call:
## mlogit(formula = choice ~ Eff + Side + Mode | -1, data = temp,
## method = "nr")
##
## Frequencies of alternatives:choice
## 1 2
## 0.50289 0.49711
##
## nr method
## 4 iterations, 0h:0m:1s
## g'(-H)^-1g = 0.00401
## successive function values within tolerance limits
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## Eff1 0.545827 0.018286 29.850 < 2.2e-16 ***
## Eff2 0.307805 0.018078 17.026 < 2.2e-16 ***
## Side1 0.656178 0.018375 35.711 < 2.2e-16 ***
## Side2 0.359332 0.018047 19.911 < 2.2e-16 ***
## Mode1 0.236449 0.018078 13.080 < 2.2e-16 ***
## Mode2 0.395129 0.018234 21.670 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -17388
The AIC of this model equals 3.4787138^{4} (estimated with AIC(fit1)).
Finally we reproduce the analysis with effects coding (with the same reference levels as in dummy coding, namely respectively 3 for Eff, Severe for Side and Infusion for Mode:
contrasts(temp$Eff) <- contr.sum(3)
contrasts(temp$Side) <- contr.sum(3)
contrasts(temp$Mode) <- contr.sum(3)
fit2 <- mlogit(choice ~ Eff + Side + Mode |-1, data = temp)
summary(fit2) ## to display the summary of the regression analysis
##
## Call:
## mlogit(formula = choice ~ Eff + Side + Mode | -1, data = temp,
## method = "nr")
##
## Frequencies of alternatives:choice
## 1 2
## 0.50289 0.49711
##
## nr method
## 4 iterations, 0h:0m:1s
## g'(-H)^-1g = 0.00401
## successive function values within tolerance limits
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## Eff1 0.261283 0.010499 24.8864 < 2e-16 ***
## Eff2 0.023261 0.010378 2.2413 0.02501 *
## Side1 0.317675 0.010527 30.1775 < 2e-16 ***
## Side2 0.020828 0.010336 2.0152 0.04389 *
## Mode1 0.025923 0.010394 2.4940 0.01263 *
## Mode2 0.184603 0.010485 17.6068 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -17388