Patient-proxy agreement on change in acute stroke patient-reported outcome measures: a prospective study

Objectives Research has indicated that proxies overestimate symptoms on patients' behalf; however, it is unclear whether patients and proxies agree on meaningful change across domains over time. The objective of this study is to assess patient-proxy agreement over time, as well as agreement on identification of meaningful change, across 10 health domains in patients who underwent acute rehabilitation following stroke.
Methods Stroke patients were recruited from an ambulatory clinic or inpatient rehabilitation unit, and were included in the study if they were undergoing rehabilitation. At baseline and again after 30 days, patients and their proxies completed PROMIS Global Health and eight domain-specific PROMIS short forms. Reliability of patient-proxy assessments at baseline, at follow-up, and for the change in T-score was evaluated for each domain using intra-class correlation coefficients (ICC(2,1)). Agreement on meaningful improvement or worsening, defined as a change of 5+ T-score points, was compared using percent exact agreement.
Results Forty-one patient-proxy dyads were included in the study. Proxies generally reported worse symptoms and functioning compared to patients at both baseline and follow-up, and reported less change than patients. ICCs for baseline and change were primarily poor to moderate (range: 0.06 (for depression change) to 0.67 (for physical function baseline)), and were better at follow-up (range: 0.42 (for anxiety) to 0.84 (for physical function)). Percent exact agreement between indicating meaningful improvement versus no improvement ranged from 58.5–75.6%. Only a small proportion indicated meaningful worsening.
Conclusions Patient-proxy agreement across 10 domains of health was better following completion of rehabilitation compared to baseline or change. Overall change was minimal, but the majority of patient-proxy dyads agreed on meaningful change. Our study provides important insight for clinicians and researchers when interpreting change scores over time for questionnaires completed by both patients and proxies.


Introduction
Multiple domains of health are impacted in patients with stroke, including physical health, fatigue, pain interference, cognitive function, and overall global health [1]. Patient-reported outcome measures (PROMs) are increasingly utilized as endpoints for assessing these areas, which are best evaluated through self-report. One challenge in the interpretation of PROMs arises when caregivers, or proxies, respond instead of the patient, which can occur for as many as 30% of stroke patients [2,3]. Research has indicated that proxies overestimate symptoms on patients' behalf, and this overestimation is greater for more subjective domains such as emotional or cognitive functioning [4,5]. Patient-proxy disagreement has implications both for research studies and clinical care. In research studies, inclusion of unbalanced numbers of proxy respondents in different treatment groups may bias analyses of outcomes. At the patient level, this disagreement could affect the clinical treatment of symptoms, which could differentially impact more subjective domains such as anxiety or depression.
Prior work by our group has demonstrated that patient-proxy disagreement results in small effect sizes for group-level analyses but large, meaningful differences at the individual level, which affects the interpretability, and thus utilization, of PROMs during clinical care [6]. Furthermore, it is unclear whether patients and proxies agree on meaningful change across domains over time. Prior research evaluating stroke patient-proxy agreement on PROMs over time has been limited and results have been inconsistent, with one study finding low agreement between change scores [7] and another finding moderate agreement [8]. To our knowledge, no studies have investigated patient-proxy agreement on detecting meaningful change.
The objective of this study is to expand upon previous work and assess patient-proxy agreement over time, as well as agreement on identification of meaningful change, across 10 health domains in patients who underwent acute rehabilitation following stroke.

Methods
Patients with ischemic stroke or intracerebral hemorrhage were recruited from an ambulatory clinic, an inpatient rehabilitation unit, and an outpatient rehabilitation unit. Patients were included in the study if they were currently undergoing or about to undergo rehabilitation, cognitively and physically able to complete questionnaires, and had a proxy available with them to answer questionnaires. Informed consent was obtained for participating patients and their proxy prior to clinic visit or during their rehabilitation admission. The full study protocol has been previously published [6]. Briefly, each proxy participant was instructed to answer the questions in the way they believed the patient would answer, according to the "patient-proxy" perspective [9]. For patients who were unable to complete the questionnaires at the time of the visit, surveys were collected by emails sent via REDCap electronic data capture tools [10]. Following completion of an initial set of surveys, patients and their proxies each received a $25 stipend for participation. Patients and proxies received an additional $15 stipend after completing a second set of the same surveys 30 days following completion of the initial surveys. Patients attended rehabilitation during this time, and it is anticipated that patients improved in these measured domains during this window.
As part of the questionnaire set at both time points, patients and proxies completed 9 PROMs: PROMIS Global Health (resulting in global mental and global physical health summary scores); PROMIS 8-item short forms for physical function, satisfaction with participation in social roles and activities, anxiety, fatigue, pain interference, and sleep disturbance; Neuro-QoL cognitive function; and the Patient Health Questionnaire 9 depression screen, which was calibrated to the PROMIS Depression metric [11]. PROMIS measures are transformed to a T-score metric with a mean of 50 and standard deviation (SD) of 10, which is representative of the mean and SD of the general United States population [12].

Statistical analyses
Descriptive statistics were utilized to present patient and proxy characteristics, as well as responses to PROMs at baseline, at follow-up, and for the change in PROM scores. Differences between patient-reported and proxy-reported PROMs were compared using t-tests. Significant change in PROMs reported by patients and proxies was evaluated using paired t-tests. Reliability of patient-proxy assessments at baseline, at follow-up, and for the change in T-score was assessed for each domain using intra-class correlation coefficients (ICC(2,1)) with 95% confidence intervals based on two-way random effects models for single rater agreement [13].
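As an illustration of the reliability statistic described above, the following is a minimal sketch of ICC(2,1) computed from the two-way ANOVA mean squares (Shrout and Fleiss form). It is written in Python for illustration only; the study's analyses were run in R, and this is not the authors' code. The function name `icc_2_1` and the T-scores in the example are hypothetical, and the sketch does not compute the 95% confidence intervals.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of rows, one per dyad, each row holding one score per rater
    (here: [patient_t_score, proxy_t_score]).
    """
    n = len(scores)        # number of dyads (subjects)
    k = len(scores[0])     # number of raters per dyad
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns), and residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-subject mean square
    msc = ss_cols / (k - 1)                 # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical T-scores for five dyads (NOT study data): [patient, proxy]
baseline = [[42.0, 38.5], [55.1, 50.2], [47.3, 49.0], [60.4, 52.8], [36.9, 35.0]]
followup = [[48.0, 40.1], [57.5, 51.0], [52.2, 50.3], [61.0, 55.5], [44.6, 37.2]]
change = [[f[0] - b[0], f[1] - b[1]] for b, f in zip(baseline, followup)]

for label, data in [("baseline", baseline), ("follow-up", followup), ("change", change)]:
    print(f"ICC(2,1) {label}: {icc_2_1(data):.2f}")
```

Because ICC(2,1) measures absolute agreement, a systematic offset between raters (for example, proxies consistently scoring a few points worse) lowers the coefficient even when patient and proxy scores are perfectly correlated.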
To identify agreement on meaningful improvement or worsening, a minimal important difference (MID) was calculated as half an SD, or 5 T-score points [14,15]. Agreement between patients and proxies in reporting MIDs was assessed using percent exact agreement and unweighted kappa with 95% CI. Analyses were conducted using R version 4.0.0. Statistical significance was established throughout at p < 0.05.
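The MID-based agreement analysis can be sketched as follows, again in Python for illustration rather than as the authors' R code. The function names, the `higher_is_better` flag (an assumption to handle symptom domains where a lower T-score is better), and the change scores are all hypothetical; confidence intervals for kappa are omitted.

```python
MID = 5.0  # half of the T-score SD of 10


def meaningful_improvement(change, higher_is_better=True, mid=MID):
    """True if the change score is an improvement of at least one MID."""
    directed = change if higher_is_better else -change
    return directed >= mid


def percent_exact_agreement(a, b):
    """Percentage of dyads in which both raters gave the same classification."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two lists of categorical ratings."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal proportions
    p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    if p_exp == 1.0:
        return float("nan")  # undefined when both raters use a single category
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical change scores (follow-up minus baseline) for six dyads, NOT study data
patient_change = [6.2, 1.0, 7.5, -0.4, 5.0, 2.1]
proxy_change = [5.5, 0.3, 1.2, -1.0, 6.1, 4.9]

patient_flags = [meaningful_improvement(c) for c in patient_change]
proxy_flags = [meaningful_improvement(c) for c in proxy_change]

print(f"Percent exact agreement: {percent_exact_agreement(patient_flags, proxy_flags):.1f}%")
print(f"Unweighted kappa: {cohen_kappa(patient_flags, proxy_flags):.2f}")
```

Meaningful worsening can be classified the same way by negating the change score before calling `meaningful_improvement`.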

Results
Forty-one patient-proxy dyads were included in the study with PROMs completed by both patients and their proxies at two time points (mean ± SD 35.0 ± 13.9 days apart). The majority of patients were male (58.5%), white (85.4%), and married (85.4%), with a mean age of 60.8 (± 13.3) years (Table 1). Proxies were predominantly female (78.0%) and spouses of the patient (73.2%).
Proxies reported worse symptoms and functioning compared to patients on the domains of cognitive function, anxiety, depression, and fatigue at baseline, and on all domains but pain interference and sleep disturbance at follow-up (Table 2). These findings were statistically significant at baseline for the domain of cognitive function and at follow-up for the domains of cognitive function, global mental health, anxiety, and fatigue. Proxies typically reported less change than patients, with statistically significant proxy-patient differences on the domains of global mental health, social role satisfaction, and fatigue. Patients reported improvement on all domains, and significant improvement on 5 domains, compared to proxies who reported minimal change on domains and significant worsening on global mental health (−2.6 T-score points).
At baseline, ICCs were poor to substantial, ranging from 0.09 for depression to 0.67 for physical function (Table 2). Compared to baseline, reliability was better at follow-up for all domains except sleep disturbance, and ranged from 0.42 for anxiety to 0.84 for physical function. Compared to baseline and follow-up, agreement with determining change had the lowest ICCs for the majority of domains, and was generally poor, ranging from 0.06 for depression to 0.53 for global physical health.
Based on MIDs, patients indicated more meaningful improvement than proxies across the majority of domains (Fig. 1). The number of patient-proxy dyads that both indicated meaningful improvement ranged from 2 (4.9%) for global mental health to 11 (26.8%) for anxiety (Table 3). Percent exact agreement between indicating meaningful improvement versus no improvement was fairly high, from 58.5% for social role satisfaction to 75.6% for global mental health. Based on the kappa statistic, agreement between dyads on meaningful improvement was generally slight, with the lowest agreement on the domains of social role satisfaction and depression (kappa statistic = 0.04 and 0.05, respectively) (Table 3). The highest agreement was on the domains of sleep disturbance, pain interference, and physical function (kappa = 0.47, 0.34, and 0.34, respectively).
Overall, only a small proportion of patients and proxies indicated meaningful worsening on PROMs (Fig. 1). Fewer than 10% of dyads designated meaningful worsening among the different domains (ranging from 0% for physical function, social role satisfaction, and fatigue to 9.8% for global mental health and anxiety) (Table 3). Percent exact agreement between patient and proxy scores on meaningful worsening ranged from 62.5-

Discussion
Our study assessed patient-proxy agreement, both over time and with identifying meaningful change, for 10 PROM domains in 41 patients who underwent rehabilitation following stroke. Patient-proxy agreement was better at follow-up compared to baseline or change, with higher agreement on more objective domains (ICC = 0.84 for physical function) and lower agreement on more subjective domains (ICC = 0.42 for anxiety). This is similar to a study of 164 stroke patients and their proxies where greater agreement was found on PROMs 6 months post-stroke compared to time of stroke [8]. Agreement was higher for more objective domains of ambulation/dexterity (ICCs = 0.75-0.87) and lower on more subjective domains such as hearing and cognition (ICCs = 0.20-0.31). In a study of 65 patient-proxy dyads, however, higher agreement was found at the time of stroke (ICCs > 0.69 for SF-12 physical and mental component scores) as compared to 6 months later or change in scores [7]. Generally, results from cross-sectional studies have been mixed when assessing patient-proxy agreement as time from stroke increases. Prior studies by our group have not shown an association between time from stroke and patient-proxy agreement [6,16]. Overall, patients indicated significantly more improvement over time than proxies. Patient-proxy agreement on PROM change scores, as well as kappa statistics for assessing improvement, was better for more objective domains (physical function, global physical health, pain interference) and worse for more subjective domains (social role satisfaction, fatigue, depression, global mental health). When evaluating patient-proxy agreement on detecting worsening, there were minimal differences by domain, potentially owing to the low level of worsening overall. Similarly, the literature has indicated a lack of clinical change across health-related domains following stroke.
Studies have shown that common post-stroke symptoms, such as fatigue, pain, anxiety, and depression, remain issues 6 months after stroke [17-19]. Minimal functional recovery has been demonstrated following mild stroke [20], and studies have indicated worse Neuro-QoL cognitive function scores at 3 months [21]. In our study, proxies indicated worse functioning and symptoms at follow-up and less change than patients. It has been posited that observers tend to place more weight on negative information than positive when providing impressions of others [22]. Our study is novel in that it evaluated patient-proxy agreement on reporting meaningful change. A prior study by our group found high levels of meaningful patient-proxy disagreement in a cross-sectional analysis, with 40-57% of dyads differing by 5+ T-score points across domains [6]. Our current study indicates patient-proxy agreement on meaningful change over time may be more reliable, as the majority of dyads agreed on meaningful improvement (59-76%) and worsening (63-81%) across domains. This has practical implications for the interpretation of assessing change in PROMs based on proxy-reports. Given the variability in patient-proxy disagreement at the individual level in our prior study [6], and the current finding that proxies report worse scores and indicate less change than patients, it is unclear how reliable the clinical interpretation of a change in PROMs would be if patients answered at one time point and proxies at another. At a minimum, PROMs should include a question identifying whether they were completed by a patient or proxy, and clinicians should take this information into account when interpreting PROMs for use in clinical care.
There are limitations to our study, the most apparent being the small study sample. The full range of scores may not be observed in studies with small sample sizes, and patient-proxy agreement and correlations may be inflated by a few large standard errors [23]. Second, kappa statistics offer the added benefit of accounting for chance agreement [24]; however, they are limited when the marginal probability of one group is much smaller than the other [25]. Since the number of dyads indicating meaningful change was low in this study compared to dyads indicating no change, percent exact agreement may be more accurate than kappa statistics for assessing patient-proxy agreement. Third, there was variability in the amount of time that passed between the two assessments (range 17-93 days), further limiting the interpretation of the results. Fourth, our study sample was largely male, of White race, and married, which could limit generalizability of results. Lastly, our study did not include a clinical assessment for cognitive impairment and is limited to patients who were able to self-report their health status. Larger longitudinal studies over longer time periods that include clinical indicators are necessary to determine if proxies, and patients, can accurately assess meaningful change over time.
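The kappa limitation noted above can be illustrated with a small numerical sketch. The two 2x2 tables below are entirely hypothetical (they are not study data) and share the same percent exact agreement; only the prevalence of "meaningful change" differs. The helper name `kappa_from_counts` is likewise an assumption made for this example.

```python
def kappa_from_counts(both_yes, only_patient, only_proxy, both_no):
    """Return (percent agreement, unweighted kappa) for a 2x2 agreement table."""
    n = both_yes + only_patient + only_proxy + both_no
    p_obs = (both_yes + both_no) / n
    p_patient_yes = (both_yes + only_patient) / n
    p_proxy_yes = (both_yes + only_proxy) / n
    # Agreement expected by chance given each rater's marginal proportions
    p_exp = p_patient_yes * p_proxy_yes + (1 - p_patient_yes) * (1 - p_proxy_yes)
    return 100.0 * p_obs, (p_obs - p_exp) / (1 - p_exp)


# Hypothetical tables of 100 dyads each, both with 90% exact agreement
balanced = kappa_from_counts(both_yes=45, only_patient=5, only_proxy=5, both_no=45)
rare = kappa_from_counts(both_yes=2, only_patient=5, only_proxy=5, both_no=88)

print(f"Balanced prevalence: agreement {balanced[0]:.1f}%, kappa {balanced[1]:.2f}")
print(f"Rare change:         agreement {rare[0]:.1f}%, kappa {rare[1]:.2f}")
```

When meaningful change is rare, most of the observed agreement is already expected by chance, so kappa falls sharply even though the proportion of concordant dyads is unchanged.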
In conclusion, our study found patient-proxy agreement was better at follow-up in a study of 41 patient-proxy pairs who completed PROMs across 10 domains of health at baseline and again following completion of rehabilitation. When evaluating change, patient-proxy agreement on detecting improvement was better for more objective domains than more subjective domains. Although change was minimal, the majority of patient-proxy dyads agreed on meaningful improvement and worsening. Our study provides important insight for clinicians and researchers when interpreting change scores over time for PROMs completed by both patients and proxies.