/**
* jQuery Plugin: Sticky Tabs
*
* @author Aidan Lister
// Set the correct tab when the page loads showStuffFromHash(context);
// Set the correct tab when a user uses their back/forward button $(window).on('hashchange', function() { showStuffFromHash(context); });
// Change the URL when tabs are clicked $('a', context).on('click', function(e) { history.pushState(null, null, this.href); showStuffFromHash(context); });
return this; }; }(jQuery));
window.buildTabsets = function(tocID) {
// build a tabset from a section div with the .tabset class function buildTabset(tabset) {
// check for fade and pills options var fade = tabset.hasClass("tabset-fade"); var pills = tabset.hasClass("tabset-pills"); var navClass = pills ? "nav-pills" : "nav-tabs";
// determine the heading level of the tabset and tabs var match = tabset.attr('class').match(/level(\d) /); if (match === null) return; var tabsetLevel = Number(match[1]); var tabLevel = tabsetLevel + 1;
// find all subheadings immediately below var tabs = tabset.find("div.section.level" + tabLevel); if (!tabs.length) return;
// create tablist and tab-content elements var tabList = $('
'); $(tabs[0]).before(tabList); var tabContent = $('
'); $(tabs[0]).before(tabContent);
// build the tabset var activeTab = 0; tabs.each(function(i) {
// get the tab div var tab = $(tabs[i]);
// get the id then sanitize it for use with bootstrap tabs var id = tab.attr('id');
// see if this is marked as the active tab if (tab.hasClass('active')) activeTab = i;
// remove any table of contents entries associated with // this ID (since we'll be removing the heading element) $("div#" + tocID + " li a[href='#" + id + "']").parent().remove();
// sanitize the id for use with bootstrap tabs id = id.replace(/[.\/?&!#<>]/g, '').replace(/\s/g, '_'); tab.attr('id', id);
// get the heading element within it, grab it's text, then remove it var heading = tab.find('h' + tabLevel + ':first'); var headingText = heading.html(); heading.remove();
// build and append the tab list item var a = $('' + headingText + ''); a.attr('href', '#' + id); a.attr('aria-controls', id); var li = $('
'); li.append(a); tabList.append(li);
// set it's attributes tab.attr('role', 'tabpanel'); tab.addClass('tab-pane'); tab.addClass('tabbed-pane'); if (fade) tab.addClass('fade');
// move it into the tab content div tab.detach().appendTo(tabContent); });
// set active tab $(tabList.children('li')[activeTab]).addClass('active'); var active = $(tabContent.children('div.section')[activeTab]); active.addClass('active'); if (fade) active.addClass('in');
if (tabset.hasClass("tabset-sticky")) tabset.rmarkdownStickyTabs(); }
// convert section divs with the .tabset class to tabsets var tabsets = $("div.section.tabset"); tabsets.each(function(i) { buildTabset($(tabsets[i])); }); };
In this entry, through the analysis of a data set, we examine how to
make histograms, tables, transitive graphs, and scatter plots and the
way to calculate medians and interquartile ranges (IQR).
In health preference research (HPR), it is naturally difficult to
determine patients’ real necessities. This is because it is often the
case that even patients themselves are not necessarily able to tell
specifically what medical procedure they hope to receive, facing complex
choices and variable expected consequences. A discrete choice experiment
(DCE), which presents multiple health descriptions and infers the
patients’ weighting factors from the responses, is one of the effective
ways to address this problem.
In this entry, before entering the details of DCEs, we are going to
analyze the data set accessorily generated in the process of DCE; our
focus here is the time spent on choosing a choice for each question,
which is called the first response time. Respondents may change their
answer to the other before proceeding to the next question, but the
first response time only measures the time from starting the question to
selecting the first choice. In the DCE held in 2016, 4088 US
participants responded to 20 questions regarding hypothetical health
conditions. The data set that we are going to analyze contains each
individual’s first response time for 20 questions.
library(knitr) #this is for the function "kable," which makes well-organized tables.
library(tidyverse) #this is for the function "read_csv."
library(tinytex)
library(gt)
data1 <- read_csv("C:\\Users\\aaa\\OneDrive\\USF\\Dr. Craig\\resp1wave1_220723.csv")
Please replace the inside of quotation marks with the address of the
file location.
Let \(x_i\) be the value of each
individual observation (\(1 \le i \le
n\)) \[n = \sum_{k=1}^{K}m_k\]
where
\[\begin{aligned}
n &= \text{the total number of observations } \\
m_k &= \text{the number of}\ x_i\ \text{within the}\ k\ \text{th
bin}\ (1\le k \le K)\\
&= \text{the number of}\ x_i\ s.t.\ r_{k-1}\le x_i < r_k\\
&= \text{frequency}\\
r_j &= \text{the upper boundary of the}\ j\ \text{th bin}\
(min(k)-1\le j \le max(k),\ \text{that is,}\ 0\le j \le K)\\
&= \text{the lower boundary of the}\ j+1\ \text{th bin}\\
r_j-r_{j-1} &= \text{the width of each bin}\\
&=\frac{max(r_j)-min(r_j)}{K}=\frac{r_{K}-r_0}{K}
\end{aligned}\]
It is necessary to decide the range, \([min(r_j), max(r_j)) = [r_0, r_K)\), such
that \(r_0=min(r_j) \le min(x_i)\ \wedge\
max(x_i) \le r_{K+1}=max(r_j)\), to create an understandable
histogram.
Now, let \(K=50\) (breaks = 50) and
\(x_i = data1\$time_i\), which is the
first response time (\(n=4088\)). We
get the following histogram.
hist(data1$time,
breaks = 50,
main = "Histogram 3.2: Histogram of the first response time",
xlab = "the first response time (milliseconds)",
col = "lightgreen")
Now, let \(K=50\) (breaks = 50) and
\(x_i = log(data1\$time_i)\) (\(n=4088\)) instead of \(x_i = data1\$time_i\) . we get the
following histogram.
hist(log(data1$time),
breaks = 50,
main = "Histogram 3.3: Log of the first response time",
xlab = "Log of the first response time",
col = "lightblue")
Comparing the two histograms above, we observe that by taking a
logarithmic scale of the variable, the first response time
(milliseconds), the originally right-skewed histogram, Histogram 3.1,
was converted into the more bellshaped histogram, Histogram 3.2.
Let \(n = \text{the total number of
observations }\).
Assume we have data set, \(x[1]\le x[2]\le
...\le x[n]\).
When \(n\) is even, that is, \(\exists\ a \in Z \ s.t.\ n = 2a\), \(median =
\frac{1}{2}(x[\frac{n}{2}]+x[\frac{n}{2}+1])\).
When \(n\) is odd, that is, \(\exists\ b \in Z \ s.t.\ n = 2b+1\), \(median = x[\frac{n+1}{2}]\).
\(\forall n\ in\ N, \exists\ k \in N\ s.t\
\)
\[ n =
\begin{cases}
{4k}\\
{4k-1}\\
{4k+1}\\
{4k+2}
\end{cases}
\]
\[\begin{aligned}
\text{(i)}\ n=4k \ \ \ \\
Q1&=\frac{3}{4}x[k]+\frac{1}{4}x[k+1]\\
Q3&=\frac{1}{4}x[3k]+\frac{3}{4}x[3k+1]\\
IQR&=Q3-Q1\\
&=(\frac{1}{4}x[3k]+\frac{3}{4}x[3k+1])-(\frac{3}{4}x[k]+\frac{1}{4}x[k+1])
\\
\text{(ii)}\ n=4k-1\\
Q1&=\frac{1}{2}x[k]+\frac{1}{2}x[k+1] \\
Q3&=\frac{1}{2}x[3k+1]+\frac{1}{2}x[3k+2] \\
IQR&=Q3-Q1\\
&=(\frac{1}{2}x[3k+1]+\frac{1}{2}x[3k+2])-(\frac{1}{2}x[k]+\frac{1}{2}x[k+1])
\\
\text{(iii)}\ n=4k+1 \\
Q1&=x[k] \\
Q3&=x[3k] \\
IQR&=Q3-Q1\\
&=x[3k]-x[k] \\
\text{(iv)}\ n=4k+2 \\
Q1&=\frac{1}{4}x[k]+\frac{3}{4}x[k+1] \\
Q3&=\frac{3}{4}x[3k+2]+\frac{1}{4}x[3k+2] \\
IQR&=Q3-Q1\\
&=(\frac{3}{4}x[3k+2]+\frac{1}{4}x[3k+2])-(\frac{1}{4}x[k]+\frac{3}{4}x[k+1])
\\
\end{aligned}\]
Let’s suppose we hope to calculate the median and interquartile range
(IQR) for Question 1. Then, we firstly need to gather the individual
values of the first response time for Question 1 from the data set and
secondly, calculate the median and IQR for that question. When we want
to gain the median and IQR for each question, we are going to simply
repeat that process 20 times since there are 20 questions in the
experiment. Here we use function “quantile” to calculate IQR.
m2 =NULL
ir2 = NULL
iqr2 = NULL
for(i in 1:20) {d1 <- data1[data1$task == i, "time"]
m0 = median(d1$time)
m2 = rbind(m2, m0)
ir0 = quantile(d1$time)
ir2 = rbind(ir2, ir0)
iqr0 = IQR(d1$time)
iqr2 = rbind(iqr2, iqr0)
table1 = cbind(m2,iqr2, ir2)
}
rownames(table1) = paste0("Question ", 1:20)
colnames(table1) <- c("Median", "IQR", "0% percentile", "25% percentile",
"50% percentile", "75% percentile ","100% percentile")
table4.2 <- data.frame(cbind(c(1:20),table1))
colnames(table4.2) <- c("n th Question", "Median", "IQR", "0% percentile", "25% percentile", "50% percentile", "75% percentile ","100% percentile")
kable(table4.2, align = "c", caption = "**Table 4.2: Median and IQR**", row.names = FALSE, escape = FALSE, centering = T)
| n th Question | Median | IQR | 0% percentile | 25% percentile | 50% percentile | 75% percentile | 100% percentile |
|---|---|---|---|---|---|---|---|
| 1 | 29645.0 | 23640.25 | 2023 | 20152.00 | 29645.0 | 43792.25 | 3e+05 |
| 2 | 20553.0 | 18588.75 | 2050 | 12991.25 | 20553.0 | 31580.00 | 3e+05 |
| 3 | 19209.5 | 17642.50 | 1475 | 11748.75 | 19209.5 | 29391.25 | 3e+05 |
| 4 | 17394.5 | 16838.00 | 1590 | 10983.00 | 17394.5 | 27821.00 | 3e+05 |
| 5 | 16897.5 | 16081.25 | 1998 | 10359.75 | 16897.5 | 26441.00 | 3e+05 |
| 6 | 16023.5 | 15845.00 | 1732 | 9773.25 | 16023.5 | 25618.25 | 3e+05 |
| 7 | 15416.5 | 14887.75 | 1787 | 9423.50 | 15416.5 | 24311.25 | 3e+05 |
| 8 | 15081.0 | 15779.50 | 1700 | 9047.00 | 15081.0 | 24826.50 | 3e+05 |
| 9 | 14317.0 | 14732.75 | 1786 | 8566.75 | 14317.0 | 23299.50 | 3e+05 |
| 10 | 14210.5 | 14608.50 | 1669 | 8305.75 | 14210.5 | 22914.25 | 3e+05 |
| 11 | 13901.0 | 13757.50 | 1723 | 8545.75 | 13901.0 | 22303.25 | 3e+05 |
| 12 | 9884.0 | 10562.50 | 1938 | 5845.75 | 9884.0 | 16408.25 | 3e+05 |
| 13 | 8982.0 | 9519.50 | 1818 | 5109.25 | 8982.0 | 14628.75 | 3e+05 |
| 14 | 8580.0 | 9704.75 | 1944 | 4813.25 | 8580.0 | 14518.00 | 3e+05 |
| 15 | 8158.5 | 9037.50 | 1829 | 4461.75 | 8158.5 | 13499.25 | 3e+05 |
| 16 | 7905.5 | 8648.75 | 1801 | 4377.00 | 7905.5 | 13025.75 | 3e+05 |
| 17 | 7386.0 | 8255.75 | 1847 | 4082.25 | 7386.0 | 12338.00 | 3e+05 |
| 18 | 7250.5 | 8094.75 | 1538 | 4111.00 | 7250.5 | 12205.75 | 3e+05 |
| 19 | 7133.5 | 7991.25 | 1290 | 4044.25 | 7133.5 | 12035.50 | 3e+05 |
| 20 | 6908.5 | 7638.25 | 1731 | 3905.50 | 6908.5 | 11543.75 | 3e+05 |
Now, we try to plot the results of the table above and connect plots
to reveal the transition of median and IQR as the number of questions
increases. The black points represent the median and red circles denote
the 25% percentile and the 75% percentile for each question.
We prepare a plain where x-axis is the number of question (Question
in Table 4.2) and y-axis is the first response time
(milliseconds).
To plot medians, \((x,y)=(n\ th\ Question,
Median)\)
To plot 25% percentile, let \((x,y)=(n\ th\
Question,25\%\ percentile)\) To plot 75% percentile, let \((x,y)=(n\ th\ Question,75\%\
percentile)\)
xmax <- 20
xmin <- 1
ymax <- 60000
ymin <- 0
plot(table4.2$'n th Question', table4.2$Median, bty = "l", pch = 16, type ="o",
xlim = c(xmin, xmax), ylim = c(ymin, ymax),
xlab = NA, ylab =NA, )
par(new=T)
plot(table4.2$'n th Question', table4.2$`25% percentile`, bty = "l", pch = 1, col = "red", type ="o",
xlim = c(xmin, xmax), ylim = c(ymin, ymax), xlab = NA, ylab =NA,)
par(new=T)
plot(table4.2$'n th Question', table4.2$`75% percentile`, bty = "l", pch = 1, col = "red", type ="o",
xlim = c(xmin, xmax), ylim = c(ymin, ymax),
xlab = "n th Question", ylab ="The first response time (milliseconds)",
main = "Graph 5: The median and interquartile range of the first response time")
In Graph 5, we can observe the tendency where the median and IQR
decline as the experiment proceeds and the number of questions
increases. Now, then, we focus on each individual. Since each
participant answered 20 questions, let us classify the questions into
two groups: the first ten questions and the second ten questions.
Focusing on a specific individual, We are able to calculate his or her
medians of the first response time for the first group of questions as
well as the second group of questions. Let us call them median1 and
median2. Preparing the plane with median1 in the x-axis and median2 in
the y-axis, we can plot that individual in the plane. We repeat that
process for all participants, and at the end, we can create the scatter
plot with all individuals (Graph 6).
Again, we set \((x,y)=(Median1,Median2)\), where
Median1: Median of the first response time for the first ten
questions
Median2: Median of the first response time for the second ten
questions
inv2 = NULL
for (i in 1:max(data1$survey_id)) {
ften <- subset(data1, subset = survey_id == i & task <10.5)
sten <- subset(data1, subset = survey_id == i & task >10.5)
inv0 = c(median(ften$time), median(sten$time))
inv2 = rbind(inv2, inv0)
}
table3 <- data.frame(cbind(c(1:max(data1$survey_id)),inv2))
colnames(table3) <- c("ID", "First_ten", "Second_ten" )
rownames(table3) = paste0("ID ", 1:max(data1$survey_id))
plot(table3$First_ten, table3$Second_ten, bty = "l", pch = 1, cex = 0.5,
xlab = "Median1",
ylab ="Median2",
main ="Graph 6: Scatter plots of the first median and second median"
)
par(new=T)
x1 <- seq(0:160000)
y1 <- x1
plot(x=x1, y=y1, type ="l", xlab = NA, ylab =NA, xaxt="n", yaxt="n",
bty ="n", col = "red")
The red line is the line where Median1 = Median2. We can observe that
majority of plots are under the red line, and thus, for the majority of
individuals, Median1 < Median2. This indicates that even when
focusing on individual respondents, we can assume the first response
time declines as the number of questions becomes larger.
Jakubczyk, M., Craig, B. M., Barra, M., Groothuis-Oudshoorn, C. G.
M., Hartman, J. D., Huynh, E., Ramos-Goñi, J. M., Stolk, E. A., &
Rand, K. (2017). Choice defines value: A predictive modeling competition
in Health Preference Research. Value in Health, 21(2), 229–238.
https://doi.org/10.1016/j.jval.2017.09.016
Okubo, S. (yyyy, month dd). Analysis of initial response time across
20 pair comparisons. R4HPR. https://r4hpr.org/visor/?e=analysis-of-initial-response-time-across-20-pair-comparisons
// add bootstrap table styles to pandoc tables function bootstrapStylePandocTables() { $('tr.odd').parent('tbody').parent('table').addClass('table table-condensed'); } $(document).ready(function () { bootstrapStylePandocTables(); });