Study design
Data were from an open-label, multicenter, randomized, crossover study (ClinicalTrials.gov identifier: NCT03724981) [9, 10] that assessed patient preference between the dulaglutide single-use pen [11] and the semaglutide single-patient-use pen among injection-naïve patients with T2D [12]. The devices used in the study were those commercially available in the United States. The study design is illustrated in Fig. 1. Participants were recruited at 13 clinical sites across the United States: nine general practice clinics and four endocrinology clinics. After providing consent, participants were randomly assigned to one of two device orders (dulaglutide first or semaglutide first, followed by the other device). After being trained on each device according to its instructions for use (IFU), participants performed all steps of injection preparation and administered mock injections into an injection pad. Further details of the study design, inclusion/exclusion criteria, and methods have been published previously [9].
Measures
After completing training and performing mock injections with both devices, participants completed the measures described below. Both measures (the global preference item and the DID-PQ) were administered on paper and referred to the devices by brand name (Trulicity for dulaglutide; Ozempic for semaglutide). Color images of the injection devices were shown at the top of the page to avoid any confusion about which device corresponded to each question and response option.
Global preference item
The global preference item evaluated patient preference between the devices. The item asked “Overall, which device do you prefer?” Response options were Ozempic, Trulicity, or No Preference. All participants completed the global preference item before completing the DID-PQ.
Diabetes Injection Device Preference Questionnaire (DID-PQ)
The DID-PQ was designed to assess patient preference between two non-insulin injection devices [6, 7]. Its 10 items were developed based on qualitative research with patients. Items 1 to 7 address preference for specific characteristics of the injection delivery systems, and items 8 to 10 are global items assessing preference based on overall satisfaction, ease of use, and convenience. Each item is rated on a five-point scale on which respondents indicate whether they prefer or strongly prefer one device over the other, or have no preference between them. Because the five response options are categorical, mean scores are not calculated.
Statistical analysis
Analyses were performed using data from participants who had (1) been randomized to a device order, (2) been exposed to both devices regardless of whether they successfully completed the mock injection, and (3) completed the global preference item. No imputations were performed for missing data. All statistical tests were two-sided with a significance level of 5%. Descriptive statistics (mean, standard deviation, range, and frequency) were used to summarize demographic and clinical characteristics, as well as responses to questionnaires.
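As a minimal sketch of the analysis-set rules above, the following Python code keeps only participants who meet all three criteria and then summarizes a continuous characteristic and the global preference responses without imputing missing values. The `Participant` record and its field names are hypothetical; the study's actual data structures are not described here.

```python
import statistics
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # Hypothetical record layout; field names are illustrative only.
    pid: str
    randomized: bool            # (1) assigned to a device order
    exposed_both: bool          # (2) exposed to both devices; mock-injection success not required
    global_pref: Optional[str]  # (3) "dulaglutide", "semaglutide", "none", or None if missing
    age: Optional[float] = None


def analysis_population(participants):
    """Keep participants who meet all three criteria; nothing is imputed."""
    return [p for p in participants
            if p.randomized and p.exposed_both and p.global_pref is not None]


def describe(values):
    """Mean, SD, and range of the non-missing values (needs at least two values)."""
    xs = [v for v in values if v is not None]
    return {"n": len(xs), "mean": statistics.mean(xs), "sd": statistics.stdev(xs),
            "min": min(xs), "max": max(xs)}


def frequencies(values):
    """Frequency count of a categorical variable, ignoring missing values."""
    return dict(Counter(v for v in values if v is not None))
```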
Because the DID-PQ response options are categorical, they cannot be treated as continuous scores. Consequently, the correlations with a criterion measure that are typically used to examine the construct validity of PRO instruments could not be computed. Instead, each of the 10 DID-PQ items was compared with the global preference item using categorical analyses to assess concordance between the two instruments. For these analyses, the five DID-PQ response options were collapsed into three categories by combining the “prefer” and “strongly prefer” options for each device. The DID-PQ items and the global preference item therefore had the same three response levels: prefer dulaglutide device, prefer semaglutide device, and no preference between devices.
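The collapsing step and the resulting cross-tabulation of one DID-PQ item against the global preference item might look like the following Python sketch. The response labels are hypothetical stand-ins for the DID-PQ wording, and pairs with a missing response are simply skipped, consistent with the no-imputation rule.

```python
from collections import Counter

# Hypothetical labels for the five DID-PQ response options; the actual form's
# wording and order may differ.
COLLAPSE = {
    "strongly prefer dulaglutide": "dulaglutide",
    "prefer dulaglutide": "dulaglutide",
    "no preference": "none",
    "prefer semaglutide": "semaglutide",
    "strongly prefer semaglutide": "semaglutide",
}
# Three shared levels, in the same order for rows and columns.
LEVELS = ("dulaglutide", "semaglutide", "none")


def collapse(response: str) -> str:
    """Merge "prefer" and "strongly prefer" for each device into one level."""
    return COLLAPSE[response]


def crosstab(item_responses, global_responses):
    """3 x 3 table: rows = collapsed DID-PQ item, columns = global preference item.

    Pairs with a missing response on either measure are skipped (no imputation).
    """
    counts = Counter(
        (collapse(a), b)
        for a, b in zip(item_responses, global_responses)
        if a is not None and b is not None
    )
    return [[counts[(r, c)] for c in LEVELS] for r in LEVELS]
```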
These three-level responses were compared with responses on the global preference item in three ways: (1) percent agreement, (2) Gwet’s AC1 statistic [13, 14], and (3) the prevalence-adjusted and bias-adjusted Kappa (PABAK) statistic [15]. Gwet’s AC1 and PABAK were used instead of the traditional Kappa statistic because Kappa is sensitive to uneven data distributions [16]: when responses are concentrated in one response option (i.e., high prevalence of one option), Kappa can be low even when observed agreement is high, so it may not accurately represent concordance [16]. Gwet’s AC1 is similar to Kappa but uses a different definition of chance agreement, with the more realistic assumption that only a portion of the observed ratings will potentially lead to agreement by chance [13]; it is therefore more robust to uneven distributions of data. The PABAK statistic incorporates both a bias index and a prevalence index into its estimate of chance agreement, thereby mitigating potential effects of rater bias and overall prevalence [15]. Gwet’s AC1 and PABAK values were interpreted using benchmarks commonly applied to agreement statistics; for example, values above 0.80 are considered to indicate “almost perfect” or “very good” agreement [17, 18].
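To make the agreement statistics concrete, the following is a minimal Python sketch, not the authors' analysis code, that computes percent agreement, Cohen's Kappa, Gwet's AC1, and PABAK from a k × k cross-tabulation of one DID-PQ item against the global preference item. It assumes the standard two-rater formulas: AC1 chance agreement of Σ π_q(1 − π_q)/(k − 1), where π_q averages the two marginal proportions for category q, and the k-category form of PABAK, (k·p_o − 1)/(k − 1), which reduces to 2p_o − 1 for two categories. The function name and the example counts are hypothetical.

```python
def agreement_stats(table):
    """Agreement between two categorical ratings summarized in a k x k table.

    table[i][j] = number of participants whose DID-PQ item response is
    category i and whose global preference response is category j
    (same category order for rows and columns).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    po = sum(table[i][i] for i in range(k)) / n  # observed (percent) agreement
    row = [sum(table[i]) / n for i in range(k)]  # marginal proportions, DID-PQ item
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # global item

    # Cohen's Kappa: chance agreement is the sum of products of the marginals.
    pe_kappa = sum(r * c for r, c in zip(row, col))
    kappa = (po - pe_kappa) / (1 - pe_kappa)

    # Gwet's AC1: chance agreement from the averaged marginals, scaled by 1/(k - 1).
    pi = [(r + c) / 2 for r, c in zip(row, col)]
    pe_ac1 = sum(p * (1 - p) for p in pi) / (k - 1)
    ac1 = (po - pe_ac1) / (1 - pe_ac1)

    # PABAK (k-category form): chance agreement fixed at 1/k.
    pabak = (k * po - 1) / (k - 1)
    return {"percent_agreement": po, "kappa": kappa, "ac1": ac1, "pabak": pabak}


# Made-up, highly skewed table: most participants favor the first device on both measures.
example = [[90, 3, 2],
           [2, 1, 0],
           [1, 0, 1]]
stats = agreement_stats(example)
```

On this made-up table, observed agreement is 92%, yet Kappa is only about 0.30 because the marginals are so concentrated, whereas AC1 is about 0.92 and PABAK is 0.88. This is the kind of divergence that motivates reporting AC1 and PABAK alongside percent agreement.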
To determine whether significantly more participants preferred one device over the other on each DID-PQ item, the devices were compared in three steps: (1) participants who gave a “no preference” response to an item were excluded from the analysis of that item; (2) the remaining responses to that item were grouped into two categories (prefer dulaglutide device or prefer semaglutide device); and (3) a two-sided binomial test was performed. For each item, the null hypothesis was that the probability of preferring either device was 0.5, i.e., that equal numbers of respondents preferred each device. A significant p-value (p < 0.05) would lead to rejection of the null hypothesis, indicating that significantly more participants preferred one device over the other.
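The three-step comparison can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's code: responses are assumed to be already collapsed to the hypothetical labels "dulaglutide", "semaglutide", and "none", and the exact two-sided p-value is obtained by doubling the smaller tail, which for a null probability of 0.5 matches the usual definition because the binomial distribution is then symmetric. In practice, a library routine such as SciPy's `scipy.stats.binomtest` could be used instead.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of H0: P(prefer a given device) = 0.5."""
    lo = min(k, n - k)
    # P(X <= lo) under H0; by symmetry the upper tail has the same probability.
    tail = sum(comb(n, i) for i in range(lo + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def compare_devices(responses):
    """Step 1: drop "none"; step 2: count the two preferences; step 3: test vs 0.5."""
    prefs = [r for r in responses if r in ("dulaglutide", "semaglutide")]
    n = len(prefs)
    k = prefs.count("dulaglutide")
    p = binomial_two_sided_p(k, n) if n else float("nan")
    return {"n": n, "prefer_dulaglutide": k, "prefer_semaglutide": n - k, "p_value": p}
```

For example, 9 of 10 participants with a preference favoring one device gives p = 22/1024 ≈ 0.021, which would be significant at the 5% level, while a 5-to-5 split gives p = 1.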