
Does scrolling affect measurement equivalence of electronic patient-reported outcome measures (ePROM)? Results of a quantitative equivalence study

Abstract

Background

Scrolling is a perceived barrier to the use of bring your own device (BYOD) for capturing electronic patient-reported outcomes (ePROs). This study assessed the measurement equivalence of electronic patient-reported outcome measures (ePROMs) completed in the presence and absence of scrolling.

Methods

Adult participants with a chronic condition involving daily pain completed ePROMs on four devices with different scrolling properties: a large provisioned device not requiring scrolling; two provisioned devices requiring scrolling – one with a “smart-scrolling” feature that disabled the “next” button until all information was viewed, and a second without this feature; and BYOD with smart-scrolling. The ePROMs included the SF-12, EQ-5D-5L, and three pain measures: a visual analogue scale, a numeric rating scale, and a Likert scale. Participants completed English or Spanish versions according to their first language. Associations between ePROM scores were assessed using intraclass correlation coefficients (ICCs), with a lower bound of the 95% confidence interval (CI) of at least 0.70 indicating equivalence.

Results

One hundred fifteen English- or Spanish-speaking participants (aged 21–75 years) completed all four administrations. High associations between scrolling and non-scrolling administrations were observed (ICCs: 0.71–0.96). The equivalence threshold was met for all but one SF-12 domain score (bodily pain; lower 95% CI bound: 0.65) and two EQ-5D-5L item scores (pain/discomfort and usual activities; lower 95% CI bounds: 0.64 and 0.67, respectively). Differences in scores by age, language, and device size were not statistically significant.

Conclusions

The measurement properties of PROMs are preserved even in the presence of scrolling on a handheld device. Further studies assessing the impact of scrolling over long-term, repeated use are recommended.

Introduction

Patient-reported outcome (PRO) measures have gained momentum in clinical outcome research, driven by the recent movement toward patient-centeredness in both clinical practice and research [1, 2]. In the last two decades, the Food and Drug Administration (FDA) and European Medicines Agency (EMA) have progressively contributed to patient-focused drug development by requiring PRO endpoints in new drug applications [3] and including data from PROMs in drug labelling [4,5,6].

An increasing number of clinical research studies employ electronic formats to collect PRO measures (PROMs) in field-based and in-clinic settings [7]. This has been driven by the availability, low cost, and reliability of modern mobile devices such as smartphones and tablets, along with the requirement to improve the integrity and quality of data collected while limiting missing data entries and ensuring the timeliness of PROM completion [8]. Because most PROMs were originally developed and validated in pen-and-paper forms, migrating a PROM to an electronic format (ePROM) requires care to ensure the measurement properties of the original instrument are unaffected by the change in format [9].

Many clinical trials provide an electronic mobile device (provisioned device: PD) of a common make and model to all participants, to ensure that PROM presentation is identical for everyone. However, the drive to make clinical studies more patient-centric has led to increasing interest in collecting PROMs using the participants’ own devices (bring your own device: BYOD), with the aim of making PROM collection more convenient. Due to smartphone screen size, ePRO solution providers typically aim to present a single PROM question per screen and to ensure all content is displayed without the requirement to scroll [10, 11]. When collecting PROMs using BYOD, the screen size and resolution of the participants’ devices may vary, which may require the user to scroll the screen to reveal both the question and the response options for some or all PROM items.

Previous studies have provided some evidence on the equivalence of PROMs after migration from paper to electronic formats [12, 13]. However, past research examining the comprehension of information presented on computer monitors has reported mixed results regarding the impact of the requirement to scroll to retrieve information [14, 15]. One concern for studies utilizing ePROMs is that, when scrolling is required, a user may not review the complete question and response options before answering a questionnaire item, and this behaviour may adversely affect the PROM measurement properties. While the measurement equivalence of PROMs comparing BYOD to PD has been studied [16], the impact of scrolling features on the response pattern associated with PROM completion has not been addressed. In this study, we aimed to evaluate the measurement equivalence of ePROMs in the presence and absence of scrolling on a set of provisioned smartphones as well as BYOD smartphones.

Methods

Design

A Latin square crossover design enabling the randomization of four arms (sequences) and four periods (schedules), balanced for first-order carryover, was employed [17, 18]. This design incorporates blocks of 4 sequences of 4 individual administrations, with sequences randomly allocated within each block. Each sequence contains a single instance of each administration, such that within each block the treatment periods contain the same number of each administration and each administration is preceded by each other administration the same number of times (balanced first-order carryover). This design reduces error arising from an imbalanced contribution of the interventions and requires a relatively small sample size. In each period, one of the following formats was administered: 1) a provisioned device not requiring scrolling (Samsung Galaxy J7: screen size: 5.5 in.; screen resolution: 720 × 1280 pixels); 2) a provisioned device requiring scrolling to reveal all item text, with a “smart-scrolling” feature that disabled the “next” navigation button until all information was viewed (Samsung Galaxy Core Prime: screen size: 4.5 in.; screen resolution: 480 × 800 pixels); 3) a provisioned device requiring scrolling (Samsung Galaxy Core Prime: screen size: 4.5 in.; screen resolution: 480 × 800 pixels) without the smart-scrolling feature (the user could advance without scrolling to reveal all information); and 4) BYOD (Android or iOS) with smart-scrolling. We provided no instruction to participants regarding the type of Android or iOS mobile device that they could bring for the BYOD administration period. The format layout differences and the smart-scrolling feature are illustrated in Fig. 1. A washout period of 1 hour was used between each ePROM administration schedule.
This washout period included a distraction task comprising a Paced Visual Serial Addition Test (PVSAT), developed using Apple ResearchKit by ICON Clinical Research (Dublin, Ireland) and CRF Bracket (Arlington, VA). The task was a working-memory addition test with numbers presented every 3 s for 60 repeats, deployed on an iPad Mini device.
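The carryover-balanced Latin square described above can be constructed programmatically. The sketch below is illustrative only (it is not the study's randomization code); it builds a 4 × 4 Latin square in which each ordered pair of administrations occurs as immediate neighbours exactly once, which is the first-order carryover balance property the design relies on.

```python
from string import ascii_uppercase

def carryover_balanced_square(k):
    """Build a k x k Latin square (k even) balanced for first-order
    carryover: each treatment appears once per row and column, and
    each ordered pair of treatments is adjacent exactly once."""
    # First row interleaves ascending and descending treatment indices.
    first, lo, hi = [], 0, k - 1
    for i in range(k):
        first.append(lo if i % 2 == 0 else hi)
        if i % 2 == 0:
            lo += 1
        else:
            hi -= 1
    # Remaining rows are cyclic shifts of the first (add 1 modulo k).
    return [[(t + r) % k for t in first] for r in range(k)]

square = carryover_balanced_square(4)
for row in square:
    # Label administrations A-D, one sequence per line.
    print(" ".join(ascii_uppercase[t] for t in row))
```

Each of the four printed rows is one administration sequence; randomizing participants across these rows (as in the study's blocks of four) gives every administration the same number of appearances in each period and after each other administration.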

Fig. 1

Format and layout differences for (a) no scrolling, (b) scrolling with the smart-scrolling feature, and (c) scrolling without the smart-scrolling feature

The study included a mix of US English-speaking and US Spanish-speaking participants, aged 18 years and older, with a self-reported chronic medical condition causing daily pain or discomfort. Participants completed a selected set of PROMs. Study procedures were conducted at ICON’s office (Maryland, USA), with all participants recruited from the Washington, DC metropolitan area by Shugoll Research (Bethesda, USA) using their client database, referrals, and social media. All participants provided written informed consent. Salus Institutional Review Board (Austin, TX) provided ethical approval for the study. Participants were randomized to an administration schedule according to a pre-defined randomization list. Participants received training from research staff on the use of the provisioned smartphones to complete the PROMs.

The PROMs were delivered using the mProve Health ePRO platform (CRF Bracket, Arlington, VA). The ePRO platform was available in both US-English and US-Spanish versions, and participants were provided with the version corresponding to their primary language. The PROMs included the 12-Item Health Survey (SF-12) [19], EuroQol-5 Dimension-5 Level (EQ-5D-5L), EuroQol Visual Analog Scale (EQ-VAS) [20,21,22,23], and three items measuring pain over the past week: a visual analogue scale (VAS), an 11-point numeric rating scale (NRS), and a 7-point Likert scale (LIK). The electronic implementations of the SF-12 and EQ-5D instruments were approved by the license holders, and the VAS, NRS, and LIK for pain were implemented according to ePRO design best practices [24]. Information on participants’ attitudes towards BYOD use and their familiarity with smartphone devices was collected via a paper end-of-study questionnaire. The ePRO platform was configured so that no item could be skipped; missing information was therefore possible only at the schedule level, not at the item level. A participant could, however, withdraw during or after a schedule; to preserve a balanced crossover design, only participants who completed all four schedules were included in the analyses.

To calculate the required sample size, we assumed 80% power, a one-sided alpha of 0.05, and a true underlying intraclass correlation (ICC) of 0.85, with an equivalence margin requiring the lower bound of the ICC to be at least 0.70 [7, 9]. Using the formula of Walter et al. [25], the required sample size per arm was calculated to be 26 subjects. To compensate for losing five degrees of freedom to the extra variables in the model, we added 5 to this initial sample size (N = 31 per sequence). The target recruitment sample size of 165 participants (assuming 25% dropout) was determined to provide 124 fully evaluable subjects, approximately 31 per sequence. No power analysis was performed for the logistic regression analyses; we used a two-sided alpha of 0.05 as the significance level to interpret their results.
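The per-arm figure can be reproduced with the Walter, Eliasziw & Donner sample-size formula for reliability studies. The sketch below is an illustration under the study's stated assumptions (one-sided alpha 0.05, 80% power, true ICC 0.85, lower limit 0.70, k = 4 administrations), not the authors' actual calculation code.

```python
import math
from statistics import NormalDist

def icc_sample_size(rho0, rho1, k, alpha=0.05, power=0.80):
    """Approximate subjects per group for testing H0: ICC = rho0
    against H1: ICC = rho1 with k repeated measurements per subject
    (Walter, Eliasziw & Donner, 1998)."""
    z = NormalDist().inv_cdf
    za, zb = z(1 - alpha), z(power)          # one-sided alpha, power
    theta0 = rho0 / (1 - rho0)
    theta1 = rho1 / (1 - rho1)
    c0 = (1 + k * theta0) / (1 + k * theta1)
    n = 1 + (2 * (za + zb) ** 2 * k) / ((k - 1) * math.log(c0) ** 2)
    return math.ceil(n)

# Lower equivalence limit 0.70, assumed true ICC 0.85, four schedules.
print(icc_sample_size(0.70, 0.85, 4))  # 26
```

Rounding the continuous result (about 25.01) up gives the 26 subjects per arm reported in the text; adding 5 for the extra model degrees of freedom yields the 31 per sequence.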

Statistical analysis

Analyses were conducted using SAS 9.4 (SAS Institute, Inc., NC, USA), Stata 15 (StataCorp LLC, College Station, TX), and SPSS 25 (IBM, Armonk, NY). Mixed-effects generalized linear models (ME-GLM) were employed to fit the data and test the association between the treatment variables (e.g. scrolling vs. non-scrolling) and each PRO score. A random intercept model with study participants treated as random effects was specified, with all covariates (schedule and sequence of administration) modelled as fixed effects. ICCs with 95% confidence intervals were calculated using the method specified by McGraw & Wong: ICC(A,k) for a two-way mixed-effects model with absolute agreement among more than two experiments (here, schedules) was applied to the PROMs [26]. Additionally, ICCs were calculated by dividing the variance of the random intercept by the total variance of the ME-GLM model, i.e. the sum of the random-intercept and error-term variances; the 95% confidence interval was obtained using the “delta method” [27, 28]. The more conservative of the two ICC estimates (the lower one) was used as the primary result. Measurement equivalence was concluded when the lower bound of the 95% confidence interval for the estimated ICC was at least 0.70 [7, 9]. The post-estimation ICCs were compared between SAS 9.4 and Stata 15 for consistency.
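The ICC(A,k) estimand can be written in terms of two-way ANOVA mean squares. The sketch below implements the McGraw & Wong mean-squares formulation on a small toy array (illustrative data, not the study's data or its mixed-model estimation code).

```python
import numpy as np

def icc_a_k(scores):
    """ICC(A,k): two-way model, absolute agreement, average of k
    measurements (McGraw & Wong, 1996). `scores` is an n-subjects
    by k-schedules array."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)   # per-subject means
    col_means = scores.mean(axis=0)   # per-schedule means
    # Mean squares for rows (subjects), columns (schedules), and error.
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)
    sse = np.sum((scores - row_means[:, None]
                  - col_means[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (msc - mse) / n)

# Toy check: identical scores across four schedules give ICC = 1.
data = np.tile([[1.0], [2.0], [3.0], [4.0]], (1, 4))
print(round(icc_a_k(data), 3))  # 1.0
```

The study's equivalence criterion would then be applied to the lower bound of a 95% confidence interval around such an estimate, not to the point estimate itself.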

Sensitivity analyses were conducted to examine differences between participants with any missing schedules and those who completed all four schedules. We fitted logistic regression models with sex and age group as predictor variables and schedule completion status as the outcome variable. We also generated ICCs using all available information (complete and incomplete schedules) as well as complete schedules only, to evaluate the sensitivity of the results to this choice. Statistical significance was assessed at the two-sided 0.05 level throughout.

Results

Participants

Of the 151 eligible participants (42 US Spanish-speaking and 109 US English-speaking) initially recruited, 36 were excluded from the analysis for reasons described in Fig. 2. The final analyses included 115 participants (95 English-speaking and 20 Spanish-speaking), aged 21 to 75 years, who completed all four schedules. Table 1 presents detailed demographic information for the participants included in the final analyses. The most common self-reported cause of pain was arthritis (33.9%), followed by back pain (13%). Approximately 41% (N = 47) of the participants reported a heterogeneous array of other morbidities, including diabetes; for these, only an indirect causal link to chronic pain was conceivable.

Fig. 2

Flow of data-gathering completion

Table 1 Demographics and health conditions of participants

Familiarity with and attitudes toward BYOD

Table 2 provides further details on BYOD familiarity, preferences, and attitudes. Of the 115 participants, 66 (57.4%) used an Apple device and 49 (42.6%) used an Android device as their BYOD device in this study. Seventy-eight participants (67.8%) carried large devices, arbitrarily defined as those with a diagonal screen size of at least 140 mm (5.5 in.). Only two of the 115 participants were unable to download and run the study app on their own device and required assistance from a research assistant. Ninety-nine participants (86.1%) indicated that they were “definitely willing” to use their own device for a clinical trial. Finally, 49 participants (57.4%) expressed that it was “essential/very important” that others could not see their data on their device.

Table 2 Patient familiarity, preferences, and attitudes towards BYOD devices

Measurement equivalence

Table 3 presents the mean (SD) for each scale or item score under each of the four schedules, together with the estimated ICCs (95% CI). Comparing the scrolling and non-scrolling schedules, the equivalence threshold criterion (lower bound of the 95% confidence interval for the ICC of at least 0.70) was met for all scale/item scores except the bodily pain scale score of the SF-12 and the usual activities and pain/discomfort items of the EQ-5D-5L. Estimated ICCs ranged from 0.72 to 0.96 for the SF-12, from 0.71 to 0.90 for the EQ-5D-5L items and scores, and from 0.81 to 0.95 for the three pain scales. The lower bound of the 95% CI was 0.65 for bodily pain of the SF-12 and 0.67 and 0.64, respectively, for the usual activities and pain/discomfort items of the EQ-5D-5L. The same pattern of meeting the measurement equivalence criteria held for the overall ICC (a model with no comparison), for the BYOD schedule versus the non-scrolling schedule, and for the smart-scrolling schedule versus the non-smart-scrolling schedule. The equivalence threshold criterion was met for eleven of twelve SF-12 items across all three comparisons and for the overall estimated ICCs (results not shown).

Table 3 Intra-class Correlations for questionnaire items between scrolling features

Table 4 provides the estimated ICCs (95% CI) for the models assessing covariate impact. The pattern of meeting the reliability threshold remained unchanged across all three covariates: language (Spanish versus English), device size (large versus normal), and age (45–64 years versus 18–44 years, and 65+ versus 18–44 years). For all three covariate effects, the estimated ICCs ranged from 0.71 to 0.96 across the PROMs. For bodily pain of the SF-12 and the usual activities and pain/discomfort items of the EQ-5D-5L, the lower bound of the 95% CI for the ICCs ranged between 0.62 and 0.71.

Table 4 Intraclass Correlations for covariate impacts (language, device size, and age)

Sensitivity analysis

Reducing the analytic sample from the full sample (N = 151) to the balanced sample (N = 115) had a trivial effect on the ICCs and their confidence intervals. For instance, the overall ICC for the SF-12 Physical Component Summary (PCS) score was estimated as 0.91 (95% CI: 0.89–0.93) using the full sample and 0.92 (95% CI: 0.89–0.94) using the balanced sample. The equivalence analysis producing the ICCs with 95% CIs was compared among SAS, Stata, and SPSS. SAS and Stata generated identical results; however, by ignoring the covariate effects, SPSS consistently generated inflated ICCs. For example, the overall ICC for the SF-12 PCS score was calculated as 0.98 (0.97–0.98) using SPSS versus 0.92 (0.89–0.94) using SAS and Stata. Mean differences in the scale/item scores across the four schedules, assessed using one-way analysis of variance and by including only the first administration (e.g., excluding administrations B, C, and D in the ABCD sequence), were not statistically significant (one-sided P-value > 0.1 throughout).

Discussion

ePRO design good-practice guidelines, such as those reported by the Critical Path Institute’s ePRO Consortium, require the visibility of the full item stem text and all of its response options on the electronic device [24]. It follows that a principal concern when migrating an existing pen-and-paper PROM to an ePROM is that the participant may respond differently when the question and its response options are displayed in full than when they are only partially displayed on a single screen. The difference in participants’ response patterns could theoretically stem from unawareness of response options that appear off screen. In addition, participants could find it inconvenient to scroll and therefore select a visible response option so that they can move to the next question quickly. For these reasons, we examined the hypothesis that scrolling can alter participants’ response patterns.

This study provided a strong indication that the presence of scrolling is unlikely to affect PROM measurement properties. More specifically, we demonstrated measurement equivalence of the SF-12, EQ-5D-5L, and three different pain scales using common response scale types in the presence and absence of scrolling on provisioned and BYOD smartphones. Measurement equivalence was also satisfied when comparing BYOD smartphones with non-scrolling provisioned devices, and was likewise preserved when comparing smart-scrolling with non-scrolling devices. The bodily pain scale score of the SF-12 and the usual activities and pain/discomfort items of the EQ-5D-5L were the only scale/items that did not pass the measurement equivalence test; notably, the lower bound of the 95% confidence interval for all three pain scales exceeded the threshold of 0.70. Such inconsistencies may indicate discrepancies in item-level properties across different instruments that measure similar constructs. The impact of age, language, and smartphone size on measurement equivalence was negligible and not statistically significant. A sensitivity analysis preserving only the first administration, which converted the crossover design into a parallel design at the price of losing some power, showed no difference in mean PROM scores across the four schedules in the analysis of variance model. These analyses supported the conclusion that sequential testing had no significant impact on the measurement equivalence results.

It is worth emphasizing that the focus of the current study was not to test the psychometric properties of these instruments on electronic devices; rather, it evaluated whether changes in the question-and-answer display format on smartphone screens may change subjects’ responses. A number of approaches are currently offered by ePRO solution vendors to mitigate the need for scrolling. One is to detect device features (make, model, etc.) on app installation and block devices that do not meet minimum size/specification criteria. Such an approach typically employs a look-up table of device specifications. While commercial databases exist, they have limitations, as it is hard to keep up to date with all makes and models (especially Android) to cover all possible devices. A second method is to detect scrolling on a per-page basis and provide a scrolling indicator, or to disable navigation until scrolling has been completed (smart-scrolling). Finally, one can place the navigation buttons at the foot of the page so that the user must scroll past the entire questionnaire item before it is possible to advance to the next question. We utilized the smart-scrolling approach in this investigation.

In terms of design and analysis, we employed a Latin square crossover design, which allowed the randomization of the four schedules and four different sequences. We followed previous research [7, 9] in selecting the acceptable lower bound of the 95% confidence interval (i.e., 0.70) to serve as the equivalence threshold. The fixed one-hour washout, filled with a distraction task, between each pair of consecutive ePROM administrations was assumed to effectively mitigate participants’ recall of their response pattern from one administration to the next. By including two covariates, sequence and schedule, in the regression model for the equivalence estimation, we sought to further mitigate any carryover effect.

The study has some limitations. While we were able to demonstrate measurement equivalence in the presence or absence of scrolling during repeated administration on a single day, we did not study the possible effects of scrolling during the repeated use typical of a clinical trial. It would be valuable to study whether scrolling has a negative effect on completion compliance during longitudinal use, and whether response behaviour might be affected longitudinally if scrolling imposes additional completion burden on the patient. Secondly, we examined only one method of mitigating scrolling, although it is likely that the other scrolling mitigation approaches would yield similar results. Finally, we had a small sample of participants who brought small BYOD smartphones, and we were unable to break down the sample for a detailed analysis of the BYOD size effect. According to the latest data on smartphone sales by screen size, small smartphones are still used by some people [29]. Hence, the results of this study on the impact of BYOD size should be interpreted with caution.

Conclusions

This study is, to our knowledge, the first to evaluate scrolling, and it provides some positive signals to help mitigate concerns over the use of a scrolling feature when it is necessary. While the need for scrolling is unlikely on larger devices and can be completely prevented by providing a provisioned smartphone to study participants, it cannot be completely eliminated in a BYOD setting where no pre-defined criterion excludes small BYOD devices. Based on the results of our study, we make the following recommendations for future ePRO design: 1) continue to design ePROMs to avoid scrolling when using a provisioned device; 2) mitigate scrolling by using one of the approaches described (smart-scrolling, a scrolling indicator/pop-up, or navigation buttons at the foot of the screen requiring scrolling to progress); 3) override certain user-adjusted screen display settings within the app display where possible; and 4) always provide partial provisioning as an option for patients with unsuitable smartphones, facilitated by defining a minimum specification that can be easily identified by the patient/site [9].

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Notes

  1. Tan P. Pham was an employee of ICON at the time the study was conducted.

Abbreviations

BYOD :

Bring your own device

CI :

Confidence interval

EMA :

European Medicines Agency

ePRO :

Electronic patient-reported outcome

ePROM :

Electronic patient-reported outcome measure

EQ-VAS :

EuroQol Visual Analog Scale

EQ-5D-5L :

EuroQol-5 Dimension- 5 Level

FDA :

Food and Drug Administration

ICC :

Intraclass correlation coefficient

LIK :

Likert scale

ME-GLM :

Mixed-effects generalized linear models

NRS :

Numeric rating scale

PCS :

Physical Component Summary

PRO :

Patient-reported outcome

PROM :

Patient-reported outcome measure

PVSAT :

Paced Visual Serial Addition Test

SD :

Standard Deviation

SF-12 :

12-Item Health Survey

VAS :

Visual analogue scale

References

  1. 1.

    Weldring, T., & Smith, S. (2013). Article commentary: Patient-reported outcomes (PROs) and patient-reported outcome measures (PROMs). Health Services Insights, 6, 61–68.

    Article  Google Scholar 

  2. 2.

    Cappelleri, J. C., Zou, K. H., & Bushmakin, A. G. (2013). Patient-reported outcomes: Measurement, implementation and interpretation, (1st ed., ). New York: Chapman&Hall/CRC Press.

    Google Scholar 

  3. 3.

    Burke, L., Kennedy, D., Miskala, P., Papadopoulos, E., & Trentacosti, A. (2008). The use of patient-reported outcome measures in the evaluation of medical products for regulatory approval. Clinical Pharmacology and Therapeutics, 84(2), 281–283.

    CAS  Article  Google Scholar 

  4. 4.

    Gnanasakthy, A., Mordin, M., Evans, E., Doward, L., & DeMuro, C. (2017). A review of patient-reported outcome labeling in the United States (2011-2015). Value in Health, 20, 420–429.

    Article  Google Scholar 

  5. 5.

    U.S. Food and Drug Administration (2009). Guidance for industry - patient-reported outcome measures: Use in medical product Develpment to support labeling claims Available from: http://www.fda.gov/downloads/Drugs/Guidances/UCM193282.pdf., Accessed 04/04/2019.

    Google Scholar 

  6. 6.

    European Medicines Agency, Committee for Medicinal Products for Human Use (CHMP) (2005). Reflection paper on the regulatory guidance for the use of health-related quality of life (HRQL) measures in the evaluation of medicinal products Available from: http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500003637.pdf. Accessed 4 Apr 2019.

  7. 7.

    Byrom, B., Gwaltney, C., Slagle, A., et al. (2019). Measurement equivalence of patient-reported outcome measures migrated to electronic formats: A review of evidence and recommendations for clinical trials and bring your own device. Ther Innov Regul Sci, 53, 426–430. https://doi.org/10.1177/2168479018793369

  8. 8.

    Coons, S. J., Eremenco, S., Lundy, J. J., et al. (2015). Capturing patient-reported outcome (PRO) data electronically: The past, present, and promise of ePRO measurement in clinical trials. Patient, 8(4), 301–309.

    Article  Google Scholar 

  9. 9.

    Coons, S. J., Gwaltney, C. J., Hays, R. D., et al. (2009). Recommendations on evidence needed to support measurement equivalence between electronic and paper-based patient-reported outcome (PRO) measures: ISPOR ePRO good research practices task force report. Value in Health, 12(4), 419–429.

    Article  Google Scholar 

  10. 10.

    Oxford University Innovation (2016). Patient-reported outcomes – From paper to ePROs: Good practice for migration Available from: https://innovation.ox.ac.uk/wp-content/uploads/2016/05/ePRO_guide_2016.pdf. Accessed 04/02/2019.

  11. 11.

    Critical Path Institute ePRO Consortium (2018). Best practices for electronic implementation of response scales for patient-reported outcome measures Available from: https://c-path.org/wp-content/uploads/2018/09/BestPractices2_Response_Scales.pdf. Accessed 4 Apr 2019.

  12. 12.

    Muehlhausen, W., Doll, H., Quadri, N., et al. (2015). Equivalence of electronic and paper administration of patient-reported outcome measures: A systematic review and meta-analysis of studies conducted between 2007 and 2013. Health and Quality of Life Outcomes, 13, 167.

    Article  Google Scholar 

  13. 13.

    Gwaltney, C. J., Shields, A. L., & Shiffman, S. (2008). Equivalence of electronic and paper-and-pencil administration of patient-reported outcome measures: A meta-analytic review. Value in Health, 11(2), 322–333.

    Article  Google Scholar 

  14. 14.

    Sanchez, C. A., & Wiley, J. (2009). To scroll or not to scroll: Scrolling, working memory capacity, and comprehending complex texts. Human Factors, 51(5), 730–738.

    Article  Google Scholar 

  15. 15.

    Klyszejko, A., Wieczorek, A. M., Sarzynska, J., et al. (2014). Mode of text presentation and its influence on reading efficiency: Scrolling versus pagination. Studia Pscyhologica, 56(4), 309–321.

    Article  Google Scholar 

  16. 16.

    Byrom, B., Doll, H., Muehlhausen, W., et al. (2017). Measurement equivalence of patient-reported outcome measure response scale types collected using bring your own device compared to paper and a provisioned device: Results of a randomized equivalence trial. Value in Health, 21(5), 581–589.

    Article  Google Scholar 

  17. 17.

    Gao, L. (2015). Latin squares in experimental design Available from: http://compneurosci.com/wiki/images/9/98/Latin_square_Method.pdf. [Accessed 1 Nov 2018].

    Google Scholar 

  18. 18.

    Kirk, R. E. (2013). Experimental design: Procedures for the behavioral sciences, (4th ed., ). Thousand Oaks: SAGE Publications.

    Google Scholar 

  19. 19.

    Ware, J., Kosinski, M., & Keller, S. D. (1996). A 12-item short-form health survey: Construction of scales and preliminary tests of reliability and validity. Medical Care, 34(3), 220–233.

    Article  Google Scholar 

  20. 20.

    The EuroQol Group (1990). EuroQol-a new facility for the measurement of health-related quality of life. Health Policy, 16(3), 199–208.

    Article  Google Scholar 

  21. 21.

    Brooks, R. (1996). EuroQol: The current state of play. Health Policy, 37(1), 53–72.

    CAS  Article  Google Scholar 

  22. 22.

    Herdman, M., Gudex, C., Lloyd, A., et al. (2011). Development and preliminary testing of the new five-level version of EQ-5D (EQ-5D-5L). Quality of Life Research, 20(10), 1727–1736.

    CAS  Article  Google Scholar 

  23. 23.

    Janssen, M. F., Pickard, A. S., Golicki, D., et al. (2013). Measurement properties of the EQ-5D-5L compared to the EQ-5D-3L across eight patient groups: A multi-country study. Quality of Life Research, 22(7), 1717–1727.

    CAS  Article  Google Scholar 

  24. 24.

    Critical Path Institute ePRO Consortium (2018). Best practices for migrating existing patient-reported outcome instruments to a new data collection mode Available from: https://c-path.org/wp-content/uploads/2018/09/BestPractices3_Migrating.pdf. [Accessed 26 Mar 2019].

    Google Scholar 

  25. 25.

    Walter, S. D., Eliasziw, M., & Donner, A. (1998). Sample size and optimal designs for reliability studies. Statistics in Medicine, 17(1), 101–110.

    CAS  Article  Google Scholar 

  26. 26.

    McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1, 30–46.

  27.

    Hankinson, S. E., Manson, J. E., Spiegelman, D., et al. (1995). Reproducibility of plasma hormone levels in postmenopausal women over a two to three year period. Cancer Epidemiology, Biomarkers & Prevention, 4(6), 649–654.

  28.

    Rao, C. R. (1973). Linear statistical inference and its applications. New York: Wiley.

  29.

    Statista (2019). Smartphone unit shipments worldwide by screen size from 2018 to 2022 (in millions). Available from: https://www.statista.com/statistics/684294/global-smartphone-shipments-by-screen-size/. [Accessed 4 June 2019].



Acknowledgments

The authors would like to acknowledge Hayley Johnson, Jack Mardekian, Parthena Psyllos, Christopher Eliopulos, and Maya Hardigan for their contribution to the study, as well as Matthew Miera for his editorial assistance.

Funding

This study was sponsored by Pfizer, Inc. The data used in this study are proprietary.

Author information

Contributions

Each named author has substantially contributed to conducting the underlying research and drafting of this manuscript. MB, SP, SN, JC, CL, PZ, JL, and BB contributed to the study’s conception. SK, CD, MB, SP, SN, JC, CL, PZ, BB, SS, MG, and MDLC contributed to the study’s design. MG and MDLC contributed to the acquisition of the data. JC and SS contributed to the analysis of the data. SK, CD, SP, JC, BB, and SS contributed to data interpretation. PZ, JL, and BB developed the software used in the study. SK, CD, SP, SN, JC, PZ, BB, SS, and MDLC drafted and/or revised the manuscript. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Saeid Shahraz.

Ethics declarations

Ethics approval and consent to participate

The study was conducted in accordance with the Declaration of Helsinki, and approved by Salus IRB. All participants were given a Salus IRB-approved informed consent form to complete prior to study participation.

Consent for publication

Not applicable.

Competing interests

Patrick Zornow, Jeff Lee, and Bill Byrom are employees of Signant Health (known as Bracket Health at the time of conducting the study), who were paid consultants to Pfizer in connection with the development of this study and manuscript. Saeid Shahraz, Tan P. Pham, Marc Gibson, and Marie De La Cruz are employees of ICON Clinical Research and were contracted through Signant Health on behalf of Pfizer. Suyash Nigam is an employee of Infosys, who were paid contractors to Pfizer in the development of this manuscript and for study project management.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Tan P. Pham was an employee of ICON at the time the study was conducted.

Craig Lipset and Munther Baara were employees of Pfizer at the time the study was conducted.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Shahraz, S., Pham, T.P., Gibson, M. et al. Does scrolling affect measurement equivalence of electronic patient-reported outcome measures (ePROM)? Results of a quantitative equivalence study. J Patient Rep Outcomes 5, 23 (2021). https://doi.org/10.1186/s41687-021-00296-z


Keywords

  • Patient-reported outcome
  • Patient-reported outcome measures
  • Intraclass correlation
  • Scrolling
  • BYOD
  • Measurement equivalence
  • Latin Square crossover design
  • ePRO
  • ePROM