Responsiveness of PROMIS® to change in chronic obstructive pulmonary disease

Background Chronic obstructive pulmonary disease (COPD) is a progressive chronic disease characterized by airflow obstruction that leads to shortness of breath and substantial negative impacts on health-related quality of life (HRQL). The course of COPD includes periodic acute exacerbations that require changes in treatment and/or hospitalizations. This study was designed to examine the responsiveness of Patient-Reported Outcomes Measurement Information System® (PROMIS®) measures to changes associated with COPD exacerbation recovery.
Methods A longitudinal analysis using mixed-effects models was conducted of people who were enrolled while stable (n = 100) and those who experienced an acute exacerbation (n = 85). PROMIS (physical function, pain interference, pain behavior, fatigue, anxiety, depression, anger, social roles, discretionary social activities, Global Health, dyspnea severity and dyspnea functional limitations) and COPD-targeted HRQL measures were completed at baseline and at 12 weeks.
Results We administered PROMIS measures using computer adaptive testing (CAT), followed by administration of any remaining short form (SF) items that had not yet been administered by CAT. Examination of the difference between group differences from baseline to 12 weeks in the stable and exacerbation groups revealed that the exacerbation group changed (improved) significantly more than the stable group in anxiety (p < .001 to p < .01; f² effect size [ES] = 0.023/0.021), fatigue (p < .0001; ES = 0.036/0.047) and social roles (p < .001 to p < .05; ES = 0.035/0.024). All effect sizes were small in magnitude and smaller than hypothesized. Depression was also statistically significant (p < .05, SF only) but the ES was trivial. For all other PROMIS domains, the differences were not significant and ES were trivial.
Conclusions This longitudinal study provides some support for the validity of the PROMIS fatigue, anxiety, and social roles domains in COPD, but further evaluation of responsiveness is warranted.


Background
Chronic obstructive pulmonary disease (COPD) is a heterogeneous group of slowly progressive diseases characterized by airflow obstruction that interferes with normal breathing and leads to shortness of breath or dyspnea that can limit physical activity [1]. COPD is the third leading cause of death in the U.S. and the only leading cause of death increasing in prevalence [2,3]. Those with COPD experience limitations in functioning and well-being or health-related quality of life (HRQL) comparable to or worse than patients with advanced lung cancer [4].
COPD leads to progressive decline in lung function associated with worsening of symptoms. Many patients experience periodic exacerbations, defined as an acute sustained worsening of their COPD, that result in unscheduled clinic or emergency department (ED) visits and require antibiotics and/or steroids, with severe cases requiring hospitalization for observation and treatment. The decline in lung function, symptoms, and physical function associated with exacerbations represent substantial negative impacts on HRQL and contribute to the downward trajectory of disease over time [5,6].
The Patient Reported Outcomes Measurement Information System® (PROMIS®) quantifies self-reported health with domains relevant to multiple chronic diseases and conditions, permitting selective or comprehensive outcome assessment based on user interests and needs. Each measure comprising this system was developed using standardized, rigorous psychometric methods [32,33] with testing in the general population and across a number of different patient subgroups [34-39]. Most PROMIS measures are universal (i.e., not disease-specific), but some are particularly relevant to patients with COPD such as fatigue, physical function, anxiety, depression, dyspnea and social function. Although other instruments exist to measure some or all of these outcomes in various combinations [7,40], PROMIS offers single-site open access (www.healthmeasures.net) to each measure, computer adaptive test (CAT) versions for efficient assessments, and information on normative values. Cross-sectional analyses in patients with COPD provide support for the reliability and validity of PROMIS measures [41].
Acute exacerbations of COPD are clinically relevant events with important therapeutic and prognostic implications and with considerable heterogeneity in terms of their clinical presentation [42]. The majority of the literature on impacts of acute exacerbations of COPD has focused on respiratory symptoms, morbidity, mortality, hospitalizations and disease-specific health-related quality of life questionnaires (e.g., St. George's Respiratory Questionnaire [43]) [44-51]. In contrast, this study examines the responsiveness of PROMIS measures of specific physical, mental and social health status domains over 12 weeks in patients with COPD who were recovering from an acute COPD exacerbation. We also studied patients with COPD who were enrolled in a stable (non-exacerbating) state and followed for 12 weeks to allow us to explore any changes in the measures that were unrelated to exacerbation recovery. We hypothesized that there would be significant improvement in most of the PROMIS domain measures in COPD patients during recovery from exacerbation to a stable period, and that the largest magnitude changes would be in physical function, fatigue, anxiety, dyspnea and social function. While those in the stable condition might change over time due to unmeasured factors, we hypothesized that there would be little to no change over 12 weeks, and any observed change would be smaller than that of patients recovering from an acute exacerbation.

Methods
This study used a longitudinal, multisite prospective cohort of two groups: 1) patients with COPD enrolled at the time of an exacerbation, and 2) patients with COPD enrolled at a time of stability. Both groups were followed for 12 weeks and completed assessments at enrollment (baseline) and then weekly for the remaining 11 weeks. In this paper, we report on the baseline and 12-week findings. Subjects were recruited from outpatient clinics and hospitals at four research sites (University of North Carolina Health System, NorthShore University HealthSystem, Pittsburgh VA Medical Center, and Durham VA Medical Center).

Participants
We enrolled patients 40 years and older with an established clinical history of COPD in accordance with the GOLD definition at the time of the study [52] and at least a 10 pack-year history of smoking. Participants had to be able to read and speak English and be able to see and interact with a computer screen, mouse, and keyboard. People were excluded if, based on the input of clinic staff or evidence in the medical chart, they had a concurrent medical or psychiatric condition that precluded participation in the study or completion of self-report questionnaires (e.g., dementia, uncontrolled schizophrenia). They were also excluded if they had a history of asthma without co-existent COPD or were experiencing a heart failure exacerbation. For those enrolled into the exacerbation group, participants recruited in the outpatient setting may have started treatment no more than 3 days prior to the day of enrollment, and participants recruited in the inpatient setting, no more than 6 days prior to the day of enrollment. Those enrolled into the stable state group had to be exacerbation-free for a minimum of 2 months prior to enrollment. An exacerbation was defined as sustained worsening of COPD symptoms from stable state and beyond normal day-to-day variations that is acute in onset and necessitates a change in regular medication in a patient with underlying COPD; in addition, the exacerbation had to be established by a clinic visit or hospitalization with a medical diagnosis of COPD exacerbation and treatment with antibiotics or corticosteroids [53].
The study was conducted in accordance with the amended Declaration of Helsinki and was approved by the Institutional Review Board (IRB) at each site (University of North Carolina, 08-0138; NorthShore University HealthSystem, EH04-179; Pittsburgh VA Medical Center, 02683; Duke University, Pro00006904). At the time of enrollment and prior to beginning the baseline assessment, informed consent was obtained from all participants included in the study.

Procedures
For those stable at enrollment, the baseline assessment included questionnaire measures, pulmonary function testing with GOLD classification [54], and a six-minute walk test (6MWT). Because of the compromised health of those in an exacerbation state at enrollment, only the questionnaires were administered; the 6MWT and pulmonary function testing with GOLD classification were performed at the 12-week follow-up when patients had returned to stable state. Thus, all analysis of clinical measures (i.e., pulmonary function testing, GOLD classification, 6MWT) reflected data obtained when patients were deemed stable.
All baseline questionnaires were completed by patients on a laptop computer in the clinic or in the hospital and included demographics, comorbid conditions, COPD history (symptoms, duration of diagnosis, number of exacerbations, and recent hospitalizations and ED visits), and the PRO measures. The research assistant also reviewed the clinical chart to record clinical variables such as body mass index (BMI) and COPD medications. If patients completed pulmonary function tests in-clinic that same day, the values were obtained from the medical chart and they were not asked to repeat the spirometry for the study. The follow-up at 12 weeks (+ 30-day window) was completed in-person in the clinic on the laptop computer. If, during the course of follow-up, a participant had a recurrent exacerbation, as defined previously, they were censored from analysis.

PROMIS measures
The primary goal of this study was to assess the responsiveness of PROMIS version 1.0 measures to changes associated with COPD exacerbation recovery. These measures assessed anxiety, depression, anger [55], fatigue [56], pain behavior [57], pain interference [58], physical function [59], satisfaction with participation in discretionary social activities [discretionary social activities], satisfaction with participation in social roles [social roles] [60], the 10-item Global Health short form [61] (producing scores for physical health and mental health) and dyspnea (Functional Assessment of Chronic Illness Therapy (FACIT) dyspnea severity and functional limitations measures). The FACIT dyspnea short forms (now also included as PROMIS measures) consist of items that assess dyspnea severity (10 items) and related functional limitations (10 items) and were newly developed, with this being the first longitudinal administration [62,63].
PROMIS measures can be administered as fixed-length short forms (SF) or dynamically by CAT. Questionnaires were administered using Assessment Center℠, a web-based data collection platform [64]. For PROMIS CATs, the first item administered is usually one in the middle of the range of function or symptom severity. After a person provides a response, an estimated score is calculated. The CAT algorithm then selects the best item in the item bank for refining the estimated score. After the person responds to that item, the estimated score is recalculated. The CAT continues to administer items until a specified level of measurement precision is reached (standard error < 0.3 on the theta metric or 3.0 on the T-score metric) or a specified maximum number of items (12) has been administered. For this study, Assessment Center administered the PROMIS measures using CAT, followed by administration of any remaining SF items that had not yet been administered by CAT. These SF items were derived from the version 1.0 anger 8a, anxiety 7a, depression 8b, fatigue 7a, physical function 10a, pain interference 6b, pain behavior 7a, discretionary social activities 7a, and social roles 7a short forms.
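The adaptive loop described above can be illustrated with a minimal sketch. It is not the Assessment Center implementation: it assumes a simplified two-parameter logistic model with dichotomous items (PROMIS item banks use polytomous items), a made-up item bank, maximum-information item selection, and expected a posteriori scoring under a standard normal prior. The names `ITEM_BANK`, `run_cat`, and `eap` are hypothetical. Only the stopping rule (standard error below 0.3 on the theta metric, or 12 items) is taken from the text.

```python
import math

# Hypothetical item bank: (discrimination a, difficulty b). Not PROMIS parameters.
ITEM_BANK = [(1.5 + 0.1 * (i % 5), -2.4 + 0.2 * i) for i in range(25)]
GRID = [-4 + 0.1 * k for k in range(81)]  # theta grid for numerical integration


def prob(theta, a, b):
    """Probability of endorsing an item under the (assumed) 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = prob(theta, a, b)
    return a * a * p * (1.0 - p)


def eap(responses):
    """Posterior mean and SD of theta under a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for (a, b), endorsed in responses:
            p = prob(t, a, b)
            w *= p if endorsed else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(answer, se_stop=0.3, max_items=12):
    """Administer items until SE < se_stop or max_items have been given."""
    remaining = list(ITEM_BANK)
    responses = []
    theta, se = 0.0, 1.0  # start in the middle of the range (prior mean and SD)
    while remaining and len(responses) < max_items and se >= se_stop:
        item = max(remaining, key=lambda ab: information(theta, *ab))
        remaining.remove(item)
        responses.append((item, answer(item)))
        theta, se = eap(responses)  # re-estimate the score after each response
    return theta, se, len(responses)


# Simulated respondent (hypothetical): endorses every item with difficulty below 0.5.
theta, se, n_items = run_cat(lambda item: item[1] < 0.5)
print(f"theta={theta:.2f}, SE={se:.2f}, items={n_items}")
```

In this sketch the precision criterion and the item cap act as alternative stopping conditions, so a bank with less informative items will simply run to the 12-item maximum.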
PROMIS scores are estimated using item response theory parameters and scored on a T-score metric, with 50 typically representing the mean and 10 the standard deviation in the U.S. general population (for most domains). Exceptions to this include the PROMIS dyspnea severity and functional limitations measures, for which the mean and standard deviation (50/10) reflect the sample on which the measures were developed (people with COPD). For all PROMIS measures, the direction of scoring is guided by the domain name; higher scores indicate more of the construct being measured. Thus, for the PROMIS domains of anger, anxiety, depression, fatigue, dyspnea severity, dyspnea functional limitations, pain behavior and pain interference, higher scores indicate worse health, and for domains of physical function, discretionary social activities, social roles, and Global Health (physical health and mental health), higher scores indicate better health.
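As a small illustration of the T-score metric, the sketch below applies the standard linear rescaling of a standardized (theta) score to a metric with mean 50 and standard deviation 10. The function names are hypothetical, and the sketch assumes theta is expressed in standard deviation units of the reference population.

```python
def theta_to_t(theta, mean=50.0, sd=10.0):
    """Rescale a theta estimate (SD units) to the T-score metric."""
    return mean + sd * theta


def se_theta_to_t(se_theta, sd=10.0):
    """Rescale a standard error from the theta metric to the T-score metric."""
    return sd * se_theta


print(theta_to_t(-1.5))     # a score 1.5 SD below the reference mean
print(se_theta_to_t(0.3))   # the CAT precision criterion on the T-score metric
```

Under this rescaling, a standard error of 0.3 on the theta metric corresponds to 3.0 on the T-score metric, which is why the two stopping criteria quoted for the CAT are equivalent.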

Additional measures
We administered two PRO assessments commonly used to assess COPD or related symptoms at baseline and week 12. The St. George's Respiratory Questionnaire (SGRQ) contains three domains (symptoms, activity, and impacts) and a summary score on a 0-100 possible range with 100 representing the worst HRQL [43]. The Modified Medical Research Council (MMRC) dyspnea scale is scored on a scale of 0 to 4 (0 = not troubled with breathlessness except with strenuous exercise, to 4 = too breathless to leave the house or breathless when dressing or undressing) [65,66]. We transformed the MMRC score linearly to a 0-100 possible range (1 = 0; 2 = 25; 3 = 50; 4 = 75; 5 = 100) for some analyses.
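The linear transformation can be sketched as follows. The function name `rescale_0_100` is hypothetical, and the default bounds assume the grades are coded 1 to 5, as in the mapping listed in parentheses above; for a 0 to 4 coding the same function applies with `lo=0, hi=4`.

```python
def rescale_0_100(score, lo=1, hi=5):
    """Linearly map an ordinal grade in [lo, hi] onto a 0-100 range."""
    if not lo <= score <= hi:
        raise ValueError(f"score {score} outside [{lo}, {hi}]")
    return 100.0 * (score - lo) / (hi - lo)


# Grades coded 1-5 (assumed coding) map to 0, 25, 50, 75, 100.
print([rescale_0_100(s) for s in range(1, 6)])
```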

Clinical measures
Two clinical assessments, FEV1 and the 6MWT, were obtained from participants in a stable state (either baseline or week 12). FEV1 was measured using a portable spirometer administered by a research assistant trained in spirometry. Study-related spirometry was not performed if the patient had already undergone testing in-clinic that same day. The 6MWT measured the distance in meters that a participant was able to walk in a six-minute time span [67].

Data analysis
Sample size considerations
Sample size estimates were based on an ability to detect approximately medium effect sizes (ES) in PROMIS scores between stable and acute COPD patients during the 12-week period [68-70]. With an intraclass correlation of 0.5, a sample size of 81 in each group would enable us to detect a medium between-group ES with 80% power when using a two-sided alpha level of 0.05. Based on our previous research experience and the literature, we anticipated a 10% attrition rate over the 12-week study period. Thus, we targeted a total sample size of 180 to account for potential study dropout and missing data. All power calculations are based on the methods recommended in Cohen [71] and Kraemer and Thieman [72].
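The attrition adjustment is simple arithmetic and can be checked with a short sketch. It does not reproduce the power calculation itself (the figure of 81 per group is taken from the text as given); it only inflates the required total so that the expected number of completers still meets it. The function name is hypothetical.

```python
def inflate_for_attrition(n_required, attrition_pct):
    """Smallest enrollment total whose expected completers reach n_required."""
    retained_pct = 100 - attrition_pct
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-n_required * 100 // retained_pct)


n_per_group = 81                                       # from the power calculation
print(inflate_for_attrition(2 * n_per_group, 10))      # total enrollment target
```

With 162 completers required and 10% expected attrition, this gives the stated target of 180.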
Data were summarized using descriptive statistics (e.g., means and standard deviations for continuous (and ordered) variables; frequency, mode and percentages for categorical variables) for demographic variables and all PRO and clinical measures. All item responses were examined using measures of central tendency (mean, median), spread (standard deviation, range), and response category (frequencies).
Responsiveness is an aspect of construct validity [73] and is estimated by evaluating the relationship between changes in clinical or patient-reported "anchors" and changes in the PRO scores over time; it can be evaluated in intervention studies, clinical trials and observational studies [74]. The COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) initiative [75] has proposed a definition of "responsiveness" as "The ability of an HR-PRO instrument to detect change over time in the construct to be measured." In this paper, we examine responsiveness of the PROMIS measures over 12 weeks in patients with COPD who were recovering from an acute COPD exacerbation.
As detailed in Table 1, we hypothesized that some PROMIS domains (physical function, fatigue, anxiety, dyspnea severity, dyspnea functional limitations, discretionary social activities and social roles) would be impaired during a COPD exacerbation, and recovery from the exacerbation would be associated with improvements in these domains. Associated with changes in these physical and mental health domains, we also hypothesized responsiveness during recovery in Global Health (physical health and mental health). We hypothesized no or small changes over the 12-week recovery period for depression, anger, pain interference, and pain behavior. We hypothesized little change between baseline and 12 weeks on domains reported by participants enrolled in a stable state, and any change observed would be of a lesser magnitude than that observed in the exacerbation group. Thus, we compared the slope of changes in PROMIS and other PRO scores using mixed models (details below). We used Cohen's f² as a measure of local effect within a multivariate, mixed-effects regression model [71]. Cohen's f² convention is to classify these ES as small (≥0.02), medium (≥0.15) or large (≥0.35).
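To make the effect size convention concrete, the sketch below computes a local f² and applies the thresholds quoted above. It assumes the commonly used local form f² = (R²_full − R²_reduced) / (1 − R²_full), where the full model includes the effect of interest and the reduced model omits it; the paper does not state how the variance-explained terms were obtained from the mixed models, so the R² inputs here are purely illustrative. The label "trivial" for values below 0.02 follows the usage in this paper.

```python
def cohens_f2(r2_full, r2_reduced):
    """Local effect size for the term(s) dropped from the reduced model."""
    return (r2_full - r2_reduced) / (1.0 - r2_full)


def classify_f2(f2):
    """Apply Cohen's conventional thresholds for f-squared."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "trivial"


# Illustrative R-squared values (not study data).
f2 = cohens_f2(r2_full=0.30, r2_reduced=0.25)
print(round(f2, 3), classify_f2(f2))

# Effect sizes reported in this paper fall in the small or trivial range.
for reported in (0.036, 0.047, 0.018):
    print(reported, classify_f2(reported))
```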
The mixed-effects model approach provides the flexibility of modeling not only the means of the data (as in the standard linear model), but also individual patients' variation over time. Mixed models take into account the fact that measurements taken close in time are more highly correlated than measurements taken far apart in time. With the mixed-effects model approach, the slope of change for a fixed effect (e.g., between baseline and 12 weeks later) is equivalent to the average group change. Other advantages of mixed models include their handling of unbalanced research designs and missing data. In model fitting, the mixed model uses all available data from each subject rather than only including data from the subjects who have complete data at all of the time points. While we attempted to over-sample to compensate for the foreseeable loss of subjects due to dropout, the use of mixed models minimizes the impact of data missing throughout the course of the study. We hypothesized that there would be significant improvement (positive slope) in most PROMIS and clinical measures in COPD patients during recovery to a stable period. For each PROMIS and clinical outcome measure, the mixed model included group (fixed effect; i.e., stable and exacerbation), time (fixed effect; i.e., baseline and 12 weeks), and group by time interaction. We anticipated there would be significant group by time interaction effects in the PRO measures hypothesized to show large magnitude change (i.e., physical function, fatigue, anxiety, dyspnea severity, dyspnea functional limitations, discretionary social activities, social roles, Global physical health and Global mental health), illustrating a different slope of change between the stable and exacerbation groups across the 12-week study period for these domains.
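The quantity targeted by the group by time interaction can be illustrated with a simplified sketch. With two time points and complete data, the interaction term in a model containing group, time, and group by time corresponds to the difference between the two groups' mean changes (a difference in differences). The sketch below computes only that contrast from made-up scores; it does not fit a mixed model, and so it ignores the within-person correlation and missing-data handling described above. All data and function names are hypothetical.

```python
from statistics import mean

# Hypothetical fatigue T-scores (not study data): (group, time, score).
ROWS = [
    ("stable", "baseline", 52.0), ("stable", "baseline", 54.0), ("stable", "baseline", 50.0),
    ("stable", "week12", 51.0), ("stable", "week12", 54.0), ("stable", "week12", 51.0),
    ("exacerbation", "baseline", 60.0), ("exacerbation", "baseline", 58.0),
    ("exacerbation", "baseline", 62.0),
    ("exacerbation", "week12", 55.0), ("exacerbation", "week12", 54.0),
    ("exacerbation", "week12", 56.0),
]


def cell_mean(rows, group, time):
    """Mean score for one group at one time point."""
    return mean(s for g, t, s in rows if g == group and t == time)


def mean_change(rows, group):
    """Week-12 mean minus baseline mean for one group."""
    return cell_mean(rows, group, "week12") - cell_mean(rows, group, "baseline")


def difference_in_differences(rows):
    """Change in the exacerbation group minus change in the stable group."""
    return mean_change(rows, "exacerbation") - mean_change(rows, "stable")


print(mean_change(ROWS, "stable"), mean_change(ROWS, "exacerbation"))
print(difference_in_differences(ROWS))
```

For a domain such as fatigue, where higher scores indicate worse health, a negative contrast means the exacerbation group improved more than the stable group.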
Product-moment correlations between clinical measures (6MWT and FEV1) and PROMIS and SGRQ scores were estimated at baseline for stable patients and for patients enrolled in an exacerbated state, when the patients were deemed stable (i.e., 12 weeks after the baseline visit).
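For reference, a product-moment (Pearson) correlation can be computed as in the sketch below. The paired values are invented for illustration (for example, a walk distance paired with a physical function T-score) and are not study data; the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations for four participants.
walk_distance_m = [250.0, 300.0, 350.0, 400.0]
physical_function_t = [36.0, 34.0, 40.0, 38.0]
print(round(pearson_r(walk_distance_m, physical_function_t), 2))
```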

Results
Across four sites, 770 individuals were screened. Of those, 288 were ineligible (e.g., heart failure exacerbation, smoking status, altered mental status, etc.), 212 refused participation, 11 consented but withdrew before completing an assessment, and 74 were not enrolled for various and unknown reasons. One hundred individuals with COPD were enrolled in a stable state and 85 were enrolled in an exacerbated state. Of the stable subjects, 90 (90%) completed the study (3 withdrew due to health, 2 died, 3 lost to follow up, 2 withdrew for unknown reasons) and 11 were censored due to exacerbations. Of the 85 in the exacerbation group, 61 (72%) completed the study (2 withdrew due to health, 2 died, 11 lost to follow-up, 9 withdrew for unknown reasons), and 15 had subsequent exacerbations. Thus, at week 12 follow-up, there were 79 (79%) individuals remaining in the stable group and 46 (54%) in the exacerbation group.
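The participant flow above can be verified with a short tally. The counts are taken directly from the text; the dictionary layout and function names are assumptions made for this sketch.

```python
SCREENED = 770
NOT_ENROLLED = {"ineligible": 288, "refused": 212,
                "withdrew_before_assessment": 11, "other_or_unknown": 74}

FLOW = {
    "stable": {"enrolled": 100, "withdrew_health": 3, "died": 2,
               "lost_to_follow_up": 3, "withdrew_unknown": 2, "censored": 11},
    "exacerbation": {"enrolled": 85, "withdrew_health": 2, "died": 2,
                     "lost_to_follow_up": 11, "withdrew_unknown": 9, "censored": 15},
}


def completed(group):
    """Participants who did not withdraw, die, or become lost to follow-up."""
    g = FLOW[group]
    dropped = (g["withdrew_health"] + g["died"]
               + g["lost_to_follow_up"] + g["withdrew_unknown"])
    return g["enrolled"] - dropped


def remaining(group):
    """Completers who were not censored for a subsequent exacerbation."""
    return completed(group) - FLOW[group]["censored"]


enrolled_total = SCREENED - sum(NOT_ENROLLED.values())
print(enrolled_total)
for name in FLOW:
    n = FLOW[name]["enrolled"]
    print(name, completed(name), remaining(name), round(100 * remaining(name) / n))
```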
We conducted post-hoc analyses to examine whether drop-out differed by group (exacerbation vs stable). Results of the binary logistic regression indicated that the exacerbation group experienced a higher rate of dropout than the stable group (χ²(1) = 12.5, p = .0004). We examined whether age, gender, education, physical function and fatigue were associated with dropout and found that only physical function approached significance (p = .0512). When we added physical function in the model, the drop-out rate difference between the groups remained significant (χ²(1) = 8.45, p = .0037). There were no significant group by covariate interactions.
Baseline demographic and clinical characteristics of the enrolled subjects are presented in Table 2. Those enrolled in stable and exacerbation states did not differ on any demographic characteristic other than age, with the exacerbation group patients being younger. Exacerbation group participants also reported worse COPD and health at baseline. More patients enrolled in an exacerbation state reported having a COPD diagnosis for less than a year (p = .03), having more exacerbations in the past 12 months (80%, p < .0001), more COPD-related hospitalizations in the past 12 months (72%, p < .0001) and more COPD-related ED visits in the past 12 months (50%, p < .0001). The PRO scores at baseline and 12-week follow-up by baseline enrollment status, as well as the responsiveness of the PRO measures over 12 weeks (evaluated as the difference between the stable and exacerbation groups' changes over 12 weeks), are summarized in Table 1 and detailed in Table 3. Significant differences were found in anxiety SF/CAT (p = .001/p < .01), fatigue (p < .0001/p < .0001), social roles (p < .001/p < .05) and depression (p < .05, SF only), with the exacerbation group reporting greater change (improvement) than the stable group. The magnitudes of change (ES) for SF/CAT anxiety (0.023/0.021), fatigue (0.036/0.047) and social roles (0.035/0.024), which were hypothesized to be medium/large/large, respectively, were all small. The ES for depression (0.018) was trivial (less than the threshold for small). Physical function, discretionary social activities, and the Global-physical and Global-mental health scales, all hypothesized to improve during recovery, did not significantly change from baseline and had no to trivial ES. Dyspnea severity and dyspnea functional limitations, also hypothesized to improve during recovery, did not change significantly but showed near-small ES (0.017, 0.015, respectively).
All other PROMIS domains that were hypothesized to have small to large ES demonstrated trivial to no effect sizes. The SGRQ symptoms, impacts and total scores (but not activities) also showed significant change from baseline (all p < .0001) and all (but activities) demonstrated small effect sizes.

Discussion
The HRQL of COPD patients during an exacerbation is known to be significantly poorer than COPD patients in a stable state [16,41,68,76], and in our study, participants enrolled during an exacerbation reported worse baseline health than those enrolled in stable status on nearly all HRQL measures. Given the significant symptom burden associated with exacerbations, these known group differences were expected and provide some evidence of construct validity of some PROMIS measures for people with COPD.
Of the nine domains hypothesized to be responsive to recovery from an exacerbation, only three demonstrated statistically significant change using both SF and CAT, and in none of the three was the hypothesized magnitude of change supported by the findings; all demonstrated small effect sizes. We had hypothesized that pain interference, pain behavior, depression and anger would not change over the 12 weeks. Only the change in the depression SF (not CAT) was statistically significant and the ES was trivial (below the threshold for small). Finally, we hypothesized that stable participants (exacerbation-free at enrollment and throughout the study period) would not demonstrate any change in PROs from baseline to the 12-week follow-up. The changes in PRO scores over 12 weeks in the stable group did reflect some change, but the changes were uniformly of lesser magnitude than the changes in the exacerbation group. Given the stability in their COPD during this time, such changes may reflect variations in other life and non-measured health events in these patients.
While some of the findings were unexpected, the most curious and unexpected finding was the lack of change in PROMIS physical function. Of note, the SGRQ did not demonstrate change in the activity section of the instrument, which is most similar to the PROMIS physical function. One possibility is that physical function does not change during recovery from an exacerbation, but we reject this notion based on obvious clinical characteristics of these patients. Declines in physical function with the onset of exacerbations are well documented [48,77,78]. Furthermore, physical function (as measured with PROMIS) has been shown to be affected in people across the spectrum of disease severity, including mild disease, whereas mental health was impaired only in patients with more severe disease [78]. Lung function does not always return to pre-exacerbation levels following an exacerbation, but the trajectory of recovery of physical function/activity is less well established [79-81]. Some reports have documented recovery to (or nearly to) baseline levels [82]. One of the few longitudinal studies evaluating "objectively" measured (accelerometer) physical activity for up to 6 months during COPD exacerbations and periods of clinical stability found that physical activity decreased significantly during exacerbations and that the decrease persisted for about 2 weeks after symptomatic recovery.
Others have reported a slow recovery trajectory or failure to return to baseline [5,83], leading some to suggest that recovery in health status after an exacerbation may take longer than previously expected [48]. It is possible that improvement in physical function following an exacerbation takes longer than 12 weeks. In addition to the literature that contradicts this [81,84,85], we censored individuals with subsequent exacerbations, which is likely to have eliminated those with an extended recovery trajectory. Finally, these patients may be sedentary in the absence of exacerbation and not regularly testing their own physical function [86,87]. An alternative explanation is that physical recovery occurs, but the PROMIS physical function measures (and SGRQ activity measure) did not detect it. This might have occurred for several reasons. The severity of exacerbations experienced by our participants might not have been of sufficient magnitude to be reflected in physical function recovery; however, 38 of the participants enrolled during an exacerbated state were hospitalized for their exacerbation, suggesting severe exacerbations. It is also possible that the sample size limited the power to detect significant change. The dropout in the exacerbation group was larger than anticipated and resulted in a sample size at follow-up smaller than the power analysis indicated was needed to detect hypothesized change. However, some other domain measures demonstrated significant change, so there appeared to be adequate power for the other domains. Another explanation is the possibility that, for the physical function items, patient responses during an exacerbation did not reflect their true current (i.e., exacerbation) state. Most PROMIS domains have a 7-day context ("In the past 7 days …"), but the physical function items do not have a specific time interval. 
It is possible that patients were not reflecting on their exacerbation (and often hospitalized) state, but rather they were considering their physical capability prior to deterioration. Thus, the lack of a context might represent an opportunity for participants to variably interpret their "current" state of health. The lack of context may work well in a state without rapid changes (e.g., arthritis), but may not work as well for conditions with acute periods of worsening (e.g., COPD). Some support for this derives from analyses of other chronic conditions. For example, the PROMIS physical function scores for stable patients and those recovering from a COPD exacerbation in this study (34-38) are not altogether dissimilar from physical function scores from samples of patients with back pain and chronic heart failure prior to receiving an intervention (38 and 35, respectively) [36]. However, the back pain and heart failure samples demonstrated notable improvements in physical function scores in response to clinical interventions (all p ≤ 0.001) [36]. Investigators hypothesized that, in contrast with the relatively stable disability associated with back pain and heart failure, acute worsening associated with COPD exacerbations may lead patients to over-report their physical function because they reference their usual state rather than acutely ill state [36]. However, to our knowledge, there is no evidence at present to support the possibility that reframing the question (e.g., "Considering how you feel right now…") would produce different results, so this remains a hypothesis for future research to evaluate.
Similarly, we did not demonstrate significant change in the PROMIS dyspnea severity and functional limitation measures. It's reasonable to expect dyspnea severity and functional limitations to improve during recovery from a COPD exacerbation, as dyspnea is one of the primary symptoms associated with an exacerbation. However, these were newly developed measures and this was the first longitudinal study in which they had been administered. Items on both measures reference the same set of activities, and individuals may not have had the opportunity to perform some of these activities during an exacerbation state, especially if hospitalized (e.g., preparing meals, washing dishes). The MMRC dyspnea scale score also did not improve during the recovery, and it is similarly based on activity (e.g., walking, strenuous exercise, etc.). Although the effect sizes reflecting change in PROMIS dyspnea severity and functional limitations approached the threshold for small magnitude and did not reach statistical significance in this study, prior cross-sectional studies have provided support for the reliability and validity of these measures in COPD [62,63,78] and other chronic lung diseases [88], suggesting that the small sample size might have limited the responsiveness observed here.
Our findings indicate that responsiveness to change was demonstrated for three of the nine PROMIS domains in which change was hypothesized. While there is only limited support for responsiveness among the domains tested, the score differences and the effect sizes reflecting the difference in differences between the two groups reflected greater magnitude of change in the exacerbation group. With the exception of depression, responsiveness to changes was similar for the PROMIS dynamic CATs and corresponding static SFs, suggesting there is no advantage of one administration option over the other, with equivalent precision and responsiveness to clinicians and researchers. CAT administration offers the advantage of minimal participant burden without sacrificing measurement precision, but requires a computer for administration. Short forms can be administered via paper and pencil and do not require a computer for administration. Both were developed with rigorous qualitative and quantitative methodology and offer the advantages of comparability across conditions, reliability, validity, and precision.
SGRQ total, symptoms, and impacts scores significantly discriminated longitudinal change between the stable and exacerbation groups, indicating the health status of these two groups was different, consistent with the intent of the study design. Three PROMIS measures (fatigue, anxiety and social roles) also demonstrated differential longitudinal change between the two groups. We note also that the magnitude of effect sizes for the SGRQ symptoms and impacts subscale scores and total score were in the small range, albeit larger than that of the PROMIS measures. This is not unexpected, because one of the putative benefits of disease-specific compared to generic measures is their sensitivity to the disease itself. The SGRQ mean score for symptoms, impacts and total score (but not activities) for the exacerbation group also exceeded the minimal clinically important difference threshold estimates reported in the literature (4-7), including the higher threshold (7+) for patients with severe disease [89-93].
There are several limitations to this study worth noting. Both groups experienced drop-out, but this was most prominent in the exacerbation group, which reduced the available sample size and precision of our estimates of change. However, we were still able to demonstrate responsiveness, i.e., change over the course of recovery from an exacerbation, in some measures that were hypothesized to reflect such a change. Nevertheless, the high dropout rate likely limited our power to detect the hypothesized effect size. The demographic and clinical characteristics of patients lost to follow-up were not significantly different from those of patients who completed the study. Fatigue and physical function (as measured by both PROMIS SF and CAT) were associated with dropout, however, with those who dropped out from the exacerbation group demonstrating the highest levels of fatigue and lowest levels of physical function. The findings still showed strong evidence of responsiveness in the fatigue measures in the exacerbation group, but it is unclear if or how this might have impacted responsiveness of the physical function measures. We recruited participants across four distinct clinical sites, but our sample size precluded our ability to analyze site differences. However, a rigorous three-day face-to-face training was held for all study staff before the study was launched to standardize implementation of the study protocol and ensure consistent recruitment, enrollment and assessment procedures.

Conclusion
This longitudinal study provides some initial support for the responsiveness of the PROMIS fatigue, anxiety and social roles domains relevant to COPD. As responsiveness is one element of a measure's validity [73], this study provides some preliminary evidence of validity of some of the PROMIS measures as well. Further longitudinal studies with larger sample sizes are required to evaluate additional aspects of reliability and validity of the PROMIS measures and their performance in the COPD population. Additional evaluation of selected domains, especially the PROMIS physical function domain, is warranted, given the lack of responsiveness in this study.
Because PROMIS measures are generic, they can be used to assess the relative health status of respondents across diseases or functional characteristics. The multiple options for PROMIS administration (various short form lengths and CAT) allow for a customizable and efficient solution for measuring health outcomes important to patients. PROMIS is a useful tool for tracking select domains of HRQL in patients with COPD for research and possibly for clinical care.