Design
A Latin square crossover design with four arms (sequences) and four periods (schedules), balanced for first-order carryover, was employed [17, 18]. The design comprises blocks of four sequences of four individual administrations, with sequences randomly allocated within each block. Each sequence contains each administration exactly once, such that within each block the treatment periods contain the same number of each administration and each administration is preceded by every other administration the same number of times (balanced first-order carryover). This design reduces error arising from imbalance in the contribution of the interventions and requires a relatively small sample size to conduct the trial.

In each period, one of the following formats was administered: 1) a provisioned device not requiring scrolling (Samsung Galaxy J7; screen size: 5.5 in.; screen resolution: 720 × 1280 pixels); 2) a provisioned device requiring scrolling to reveal all item text and including a “smart-scrolling” feature that disabled the “next” navigation button until all information was viewed (Samsung Galaxy Core Prime; screen size: 4.5 in.; screen resolution: 480 × 800 pixels); 3) a provisioned device requiring scrolling (Samsung Galaxy Core Prime; screen size: 4.5 in.; screen resolution: 480 × 800 pixels) without the smart-scrolling feature (the user could advance without scrolling to reveal all information); and 4) BYOD (Android or iOS) with smart-scrolling. We gave participants no instruction regarding the type of Android or iOS mobile device they could bring for the BYOD administration period. The format layout differences and the smart-scrolling feature are illustrated in Fig. 1.

A washout period of 1 h was used between ePROM administration schedules. The washout included a distraction task, a Paced Visual Serial Addition Test (PVSAT), developed using Apple ResearchKit by ICON Clinical Research (Dublin, Ireland) and CRF Bracket (Arlington, VA). The task was a working-memory addition test, with numbers presented every 3 s for 60 repetitions, deployed on an iPad Mini device.
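A Latin square balanced for first-order carryover of this kind can be constructed as a Williams design. The following sketch is illustrative only (it is not the randomization code used in the study, and treatments are coded 0–3 rather than the four formats above); it builds such a square and checks the carryover balance:

```python
from collections import Counter

def williams_square(k: int) -> list[list[int]]:
    """Build a Williams Latin square for an even number of treatments k:
    each sequence uses each treatment once, and each treatment is
    preceded by every other treatment exactly once across sequences."""
    base, lo, hi = [0], 1, k - 1
    while len(base) < k:              # interleave 1, k-1, 2, k-2, ...
        base.append(lo)
        lo += 1
        if len(base) < k:
            base.append(hi)
            hi -= 1
    # Remaining sequences are cyclic shifts of the base sequence.
    return [[(t + i) % k for t in base] for i in range(k)]

square = williams_square(4)           # four sequences x four periods

# Verify first-order carryover balance: every ordered pair of distinct
# treatments occurs exactly once as (preceding, following).
pairs = Counter((a, b) for seq in square for a, b in zip(seq, seq[1:]))
assert len(pairs) == 4 * 3 and set(pairs.values()) == {1}
```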
Included in the study were US English-speaking and US Spanish-speaking participants, aged 18 years and older, with a self-reported chronic medical condition causing daily pain or discomfort. Participants completed a selected set of PROMs. Study procedures were conducted at ICON’s office (Maryland, USA); all participants were recruited from the Washington, DC metropolitan area by Shugoll Research (Bethesda, USA) using its client database, referrals, and social media. All participants provided written informed consent, and Salus Institutional Review Board (Austin, TX) provided ethical approval for the study. Participants were randomized to an administration schedule according to a pre-defined randomization list, and received training from research staff on using the provisioned electronic smartphone devices to complete the PROMs.
The PROMs were delivered using the mProve Health ePRO platform (CRF Bracket, Arlington, VA). The platform was available in US-English and US-Spanish versions, and participants were provided with the version corresponding to their primary language. The PROMs included the 12-Item Short Form Health Survey (SF-12) [19]; the EuroQol 5-Dimension 5-Level (EQ-5D-5L) and EuroQol Visual Analog Scale (EQ-VAS) [20,21,22,23]; and three items measuring pain over the past week: a visual analogue scale (VAS), an 11-point numeric rating scale (NRS), and a 7-point Likert scale (LIK). The electronic implementations of the SF-12 and EQ-5D instruments were approved by the license holders, and the VAS, NRS, and LIK pain items were implemented according to ePRO design best practices [24]. Participants’ attitudes towards BYOD use, along with their familiarity with smartphone devices, were collected via an end-of-study questionnaire administered on paper. The ePRO platform was configured so that no item could be skipped; missing data were therefore possible only at the schedule level, not at the item level. Participants could, however, withdraw during a schedule or after finishing one; to preserve a balanced crossover design, only participants who completed all four schedules were included.
To calculate the required sample size, we assumed 80% power, a one-sided significance level (alpha) of 0.05, and a true underlying intraclass correlation (ICC) of 0.85, to be tested against a lower acceptable bound for the ICC of 0.70 [7, 9]. Using the formula of Walter et al. [25], the required sample size per arm (sequence) of the study was calculated to be 26 subjects. To compensate for the five degrees of freedom lost to the additional variables in the model, we added 5 to this initial sample size (N = 31). The target recruitment sample size of 165 participants (assuming 25% dropout) was determined to provide 124 fully evaluable subjects, with approximately 31 participants per sequence. No power analysis was performed for the logistic regression analyses; a two-sided alpha of 0.05 was used as the significance level when interpreting those results.
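For illustration, this calculation can be reproduced with the approximate formula of Walter et al. [25] as we read it; the function and parameter names below are ours, not from the original paper:

```python
import math
from scipy.stats import norm

def walter_icc_n(rho0: float, rho1: float, k: int,
                 alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of subjects needed to show that the ICC
    exceeds rho0 when the true ICC is rho1, with k observations per
    subject and a one-sided test (Walter, Eliasziw & Donner, 1998)."""
    z = norm.ppf(1 - alpha) + norm.ppf(power)
    c0 = (1 + k * rho0 / (1 - rho0)) / (1 + k * rho1 / (1 - rho1))
    return math.ceil(1 + 2 * k * z ** 2 / ((k - 1) * math.log(c0) ** 2))

print(walter_icc_n(rho0=0.70, rho1=0.85, k=4))  # -> 26, as reported above
```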
Statistical analysis
Analyses were conducted using SAS 9.4 (SAS Institute, Inc., Cary, NC), Stata 15 (StataCorp LLC, College Station, TX), and SPSS 25 (IBM, Armonk, NY). Mixed-effects generalized linear models (ME-GLMs) were employed to fit the data and test the association between the treatment variables (e.g., scrolling vs. non-scrolling) and each PRO score. A random-intercept model treating study participants as random effects was specified, with all covariates (schedule and sequence of administration) modelled as fixed effects. ICCs with 95% confidence intervals were calculated using the method of McGraw and Wong, applying ICC(A,k) for a two-way mixed-effects model with absolute agreement among more than two experiments (here, schedules) [26]. Additionally, ICCs were calculated by dividing the variance of the random intercept by the total variance of the ME-GLM, i.e., the sum of the random-intercept and error-term variances; the 95% confidence interval was obtained using the delta method [27, 28]. The more conservative of the two approaches (the one yielding the lower estimate) was used as the primary method. Measurement equivalence was concluded when the lower bound of the 95% confidence interval for the estimated ICC was at least 0.70 [7, 9]. Post-estimation ICCs were compared between SAS 9.4 and Stata 15 for consistency.
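As an illustrative sketch of the variance-components approach (the study itself used SAS and Stata), a random-intercept model can be fitted in Python with statsmodels; the file epro_long.csv and its columns subject, schedule, sequence, and score are hypothetical names:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per participant per schedule.
df = pd.read_csv("epro_long.csv")

# Random intercept per participant; schedule and sequence as fixed
# effects, mirroring the model described above.
fit = smf.mixedlm("score ~ C(schedule) + C(sequence)", df,
                  groups=df["subject"]).fit(reml=True)

var_subject = float(fit.cov_re.iloc[0, 0])  # random-intercept variance
var_error = fit.scale                       # residual (error) variance
icc = var_subject / (var_subject + var_error)
print(f"ICC = {icc:.3f}")  # the study's CI used the delta method [27, 28]
```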
Sensitivity analyses were conducted to examine differences between participants with any missing schedules and those who completed all four schedules. We fitted logistic regression models with sex and age group as predictor variables and schedule completion status as the outcome variable. We also generated ICCs using all available data (complete and missing schedules) as well as complete schedules only, to evaluate how the results differed by input. Statistical significance was evaluated at the two-sided 0.05 level throughout.
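A minimal sketch of this sensitivity model, again assuming a hypothetical per-participant file participants.csv with columns sex, age_group, and a 0/1 completed indicator:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical wide-format data: one row per participant, with an
# indicator of whether all four schedules were completed.
participants = pd.read_csv("participants.csv")

# Sex and age group as predictors of schedule completion status.
logit_fit = smf.logit("completed ~ C(sex) + C(age_group)",
                      data=participants).fit()
print(logit_fit.summary())  # Wald p-values read at the two-sided 0.05 level
```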