Meaningful score changes for SF-36v2, FACIT-fatigue, and RASIQ in rheumatoid arthritis



Interpretation thresholds for patient-reported outcome (PRO) scores are of crucial importance, particularly when interpreting treatment benefit. This study was designed to determine the within-patient meaningful improvement (WPMI) thresholds for the Short-Form 36 Health Survey version 2 (SF-36v2), the Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-Fatigue), and the novel Rheumatoid Arthritis Symptoms and Impact Questionnaire (RASIQ) among patients with rheumatoid arthritis (RA).


In this post-hoc analysis, anchor-based and supportive distribution-based methods were used to derive WPMI based on blinded data from all treatment arms in two Phase 2 RA trials with otilimab. Patient’s Global Assessment of Disease Activity (PtGA) was the general anchor for all SF-36v2 scales. SF-36 Patient’s Global Impression of Status (PGIS), PtGA, and VT03 (an SF-36v2 item) were used as anchors for FACIT-Fatigue. SF-36 PGIS, PtGA, and Patient’s Assessment of Arthritis Pain (PAIN) were anchors for RASIQ. Mean change was calculated for the anchor category associated with minimal meaningful improvement from baseline to Week 24 for SF-36v2 and FACIT-Fatigue, and to Week 12 for RASIQ. Sensitivity and specificity were used to evaluate the accuracy of estimated WPMI values.


For the SF-36v2 physical functioning, role physical, bodily pain, general health, vitality, social functioning, role emotional, and mental health domains, anchor-based estimates of WPMI based on 0–100 scores were 24.5, 24.5, 25.4, 13.6, 21.5, 20.5, 16.9, and 14.3, respectively. Anchor-based WPMI estimates were 9.7 for the Physical Component Summary score and 7.6 for the Mental Component Summary score (using norm-based T-score metric). For FACIT-Fatigue (range 0–52), WPMI estimates ranged from 9.7 to 11.3 points. For RASIQ (range 0–100), anchor-based WPMI was determined as a change between -32.7 and -21.7 points for the Joint Pain scale, -26.7 to -23.7 for the Joint Stiffness scale, and -21.1 to -17.4 for the Impact scale.


This study derived WPMI thresholds for SF-36v2, FACIT-Fatigue, and RASIQ among patients with RA, using multiple anchors. Derivation of WPMI thresholds for these PRO instruments will enable their broader use in evaluating and interpreting treatment benefit in future RA studies.

Plain English summary

When assessing medical treatments in clinical trials, it is important to understand whether the treatment improves symptoms or impacts of a disease to an extent which is meaningful for patients. Patients are often asked to complete questionnaires about their symptoms throughout clinical trials to measure if and how symptoms change. Questionnaire responses are used to calculate a score that is compared before and after treatment. This study was designed to investigate how much scores in three questionnaires (SF-36v2, FACIT-Fatigue, and RASIQ) changed for patients with rheumatoid arthritis who reported experiencing meaningful symptom improvement based on data from two clinical trials. As the RASIQ is a new questionnaire that was designed specifically for rheumatoid arthritis, this research is particularly important for interpretation of RASIQ results.


Thresholds of meaningful score change for patient-reported outcomes (PROs) are of crucial importance, particularly when assessing and interpreting treatment benefit. Within-patient meaningful improvement (WPMI) represents the smallest difference in an outcome measure which is considered by patients to be beneficial [1, 2]. As recommended by the United States Food and Drug Administration, appropriate thresholds that indicate clinically meaningful within-patient change should be established a priori via anchor-based methods, using anchors such as the Patient Global Impression of Severity [3]. These thresholds for WPMI can subsequently be used to interpret clinical trial data.

The Short-Form 36 Health Survey version 2 (SF-36v2) and Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-Fatigue) are both PRO instruments used to quantify key concepts important to patients with various diseases, including rheumatoid arthritis (RA) [4]. Although both are well-established instruments, this study aims to contribute to the ability to interpret results obtained from these instruments by estimating thresholds for WPMI. More recently, the Rheumatoid Arthritis Symptoms and Impact Questionnaire (RASIQ) was developed to specifically evaluate the symptoms of RA and their impact on patients [5]. Establishing interpretation thresholds of the score change for both new and established PRO instruments furthers understanding of results obtained from these PROs.

This post-hoc analysis used data from the Phase 2 BAROQUE (NCT02504671) [6] and RENAISSANCE (NCT02799472) [7] otilimab trials to determine the WPMI thresholds for SF-36v2, FACIT-Fatigue, and RASIQ among patients with RA. This study extends prior work by comparing previously established interpretation thresholds for SF-36v2 and FACIT-Fatigue [8,9,10] to those obtained using data from the otilimab trials, and establishing WPMI thresholds for RASIQ.


WPMI thresholds for SF-36v2, FACIT-Fatigue, and RASIQ were established using anchor-based methods, with supportive distribution-based methods and measures of accuracy (sensitivity and specificity) used to further triangulate across the estimates obtained from different anchors. Cumulative distribution function (CDF) plots were also generated to illustrate how well anchor-based change categories were separated across the entire range of RASIQ scale change scores.

Survey content and scoring

The SF-36v2 is a 36-item, self-report survey of functional health and well-being that is scored as two component summary scores (physical and mental health) and as eight domain scores: physical functioning (PF), role limitations due to physical health (RP), bodily pain (BP), general health perceptions (GH), vitality (VT), social functioning (SF), role limitations due to emotional problems (RE), and mental health (MH) [11].

For the eight domain scores, results are presented using a score range from 0 (worst possible health) to 100 (best possible health). Additional File 1 reports results for norm-based scores (NBS), which standardize scale and component scores using the means and standard deviations (SD) from a US general population normative sample [11]. The Physical Component Summary (PCS) and Mental Component Summary (MCS) scores are always based on NBS, using a mean of 50 and a SD of 10 in the US adult general population, with higher scores indicating better health.
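As an illustration of the norm-based metric described above, the following minimal sketch rescales a 0–100 domain score to a T-score with mean 50 and SD 10. The function name and the normative mean and SD are hypothetical placeholders, not values from the SF-36v2 manual, and any further steps needed to score PCS and MCS are not reproduced here.

```python
def norm_based_score(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Rescale a 0-100 domain score to a T-score (mean 50, SD 10 in the norm sample)."""
    z = (raw_score - norm_mean) / norm_sd
    return 50.0 + 10.0 * z


# Hypothetical normative values, for illustration only.
HYPOTHETICAL_NORM_MEAN = 80.0
HYPOTHETICAL_NORM_SD = 20.0

# A score one normative SD below the normative mean maps to a T-score of 40.
example_t_score = norm_based_score(60.0, HYPOTHETICAL_NORM_MEAN, HYPOTHETICAL_NORM_SD)
print(example_t_score)
```

Under this linear rescaling, a change of d points on the 0–100 metric corresponds to a change of 10 × d / SD points on the norm-based metric, which is why thresholds on the two metrics are not directly comparable.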

The 13-item FACIT-Fatigue questionnaire assesses self-reported fatigue and its impact upon daily activities and function over the past 7 days; item responses are added with equal weight to obtain the total score which ranges from 0 (most fatigue) to 52 (least fatigue) [12].

RASIQ is a novel measure comprising 16 items across three domains (Joint Pain [JP], Joint Stiffness [JS], and Impact [IM]). Scores from each item are summed and transformed to a metric ranging from 0 (least pain/stiffness/impact) to 100 (most pain/stiffness/impact) [5].

Data sources

BAROQUE [6] was a randomized, Phase 2b, dose-adaptive, multi-center, double-blind, placebo-controlled trial which assessed the efficacy of the anti-granulocyte-macrophage colony-stimulating factor monoclonal antibody, otilimab, in patients with active, moderate-to-severe RA despite treatment with methotrexate. RENAISSANCE [7] was a Phase 2a, multi-center, double-blind, placebo-controlled trial which evaluated change from baseline in various exploratory biomarkers among patients with RA treated with otilimab. While both trials included the RASIQ, the SF-36v2 and FACIT-Fatigue were only used in the BAROQUE trial, and completed at baseline, Weeks 4, 12, 24, 36, and 52, and follow-up. The RASIQ was completed at screening, baseline, Weeks 1, 6, and 12, and follow-up in the RENAISSANCE study, and at Weeks 1, 12, 24, 36, and 52, and follow-up in the BAROQUE trial.

Data from baseline to Week 24 in the BAROQUE trial were used in the SF-36v2 and FACIT-Fatigue analyses. Pooled data from baseline to Week 12 in the BAROQUE and RENAISSANCE trials were used in the analyses of RASIQ.

Anchor items

The general anchor for all SF-36v2 scales was the Patient’s Global Assessment of Disease Activity (PtGA), with scores ranging from 0 (very well) to 100 (very poor). In addition, Patient’s Assessment of Arthritis Pain (PAIN; scores range from 0 [no pain] to 100 [most severe pain]) was used as an anchor for the BP scale. One item from the FACIT-Fatigue questionnaire, AN5 (I have energy; Not at all / A little bit / Somewhat / Quite a bit / Very much), was used as an anchor for the VT scale.

The PtGA item and two items from the SF-36v2 were used as anchors for FACIT-Fatigue. The first SF-36v2 item assessed the Patient’s Global Impression of Status (PGIS; In general, would you say your health is: Excellent / Very good / Good / Fair / Poor), and the second item focused on fatigue (How much of the time during the past 4 weeks did you feel worn out? Not at all / A little bit / Somewhat / Quite a bit / Very much).

WPMI analyses of RASIQ were based on the SF-36 PGIS, PtGA, PAIN, and additional items on pain and overall impact. The SF-36 PGIS and PtGA were used as anchors for all RASIQ scales. The PAIN and one SF-36v2 item focused on pain (How much bodily pain have you had during the past 4 weeks? None / Very mild / Mild / Moderate / Severe / Very severe) were used as additional anchors for the JP scale, and two FACIT-Fatigue items (I feel tired and I feel listless [washed out]: response scale for both; Not at all / A little bit / Somewhat / Quite a bit / Very much) were used as additional anchors for the IM scale. Full details of the anchors used are shown in Table 1.

Table 1 Anchors used for SF-36v2, FACIT-Fatigue, and RASIQ

For categorical anchors, a one-point (or one-category) improvement was deemed to be associated with the smallest meaningful change indicating improvement. The categorizations of change groups for anchors that used a continuous metric were based on results from studies that established thresholds for within-person change for the same measure and among a sample of patients with RA [13]. For PtGA, a value of -18 was used, and for PAIN, a value of -20 was used.
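The anchor categorization described above can be sketched as follows. The thresholds of -18 (PtGA) and -20 (PAIN) come from the text; the function names, the numeric coding of categorical responses (lower code = better health), and the treatment of a change exactly equal to the threshold as improved are assumptions made for illustration.

```python
PTGA_THRESHOLD = -18.0  # from the text: change in PtGA (0-100, lower is better)
PAIN_THRESHOLD = -20.0  # from the text: change in PAIN (0-100, lower is better)


def improved_on_continuous_anchor(baseline: float, follow_up: float, threshold: float) -> bool:
    """True when the change (follow-up minus baseline) reaches the threshold.

    Assumption: a change exactly equal to the threshold counts as improved.
    """
    return (follow_up - baseline) <= threshold


def categories_improved(baseline_code: int, follow_up_code: int) -> int:
    """Number of response categories improved.

    Assumption: responses are coded so that a lower code means better health,
    e.g. 1 = Excellent ... 5 = Poor for the PGIS item.
    """
    return baseline_code - follow_up_code


def minimally_improved(baseline_code: int, follow_up_code: int) -> bool:
    """True for exactly a one-category improvement on a categorical anchor."""
    return categories_improved(baseline_code, follow_up_code) == 1


print(improved_on_continuous_anchor(60.0, 40.0, PTGA_THRESHOLD))  # change of -20
print(minimally_improved(4, 3))  # e.g. Fair -> Good under the assumed coding
```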

Statistical analysis

The association between change in each PRO score and the proposed anchors was evaluated using the Spearman correlation coefficient with a recommended value of at least 0.30 indicating adequacy of the anchor [14, 15]. WPMI was estimated as the mean score change from baseline to Week 12 or 24 in the group associated with the smallest meaningful improvement in each corresponding anchor. Effect sizes were calculated using standardized response mean (SRM) to better compare the magnitude of the mean change scores, using:

$$ SRM = \frac{\bar{X}_{change}}{SD_{change}} $$

where the numerator consists of the mean of the change scores and the denominator is the SD of the same change score.
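A minimal sketch of the mean change and SRM calculations is given below. The data and group labels are invented for illustration, and the sketch assumes that the SRM is computed within the same anchor group as the mean change and that the SD is the sample (n - 1) standard deviation; neither detail is stated in the text.

```python
from statistics import mean, stdev


def standardized_response_mean(change_scores):
    """Mean of the change scores divided by the SD of the same change scores."""
    return mean(change_scores) / stdev(change_scores)


def anchor_based_estimate(change_scores, anchor_groups, target_group):
    """Mean change and SRM among patients in one anchor category."""
    selected = [c for c, g in zip(change_scores, anchor_groups) if g == target_group]
    return mean(selected), standardized_response_mean(selected)


# Hypothetical change scores (follow-up minus baseline) and anchor categories.
changes = [10.0, 20.0, 30.0, 5.0, 0.0, -5.0]
groups = ["improved_1", "improved_1", "improved_1",
          "no_change", "no_change", "no_change"]

wpmi_estimate, srm = anchor_based_estimate(changes, groups, "improved_1")
print(wpmi_estimate, srm)
```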

The reliable change index (RCI) was used to identify change that can be considered beyond measurement error [16]. First, the standard error of the measurement (SEM) was calculated using:

$$ SEM = SD_{baseline} \times \sqrt{1 - reliability} $$

Reliability was estimated using Cronbach’s alpha [17], a measure based on inter-item correlations. As a sensitivity analysis, reliability was also estimated using the omega coefficient [18] and the greatest lower bound [19]. These analyses gave very similar results. Next, the RCI was calculated using:

$$ RCI= \sqrt{2 } \times SEM \times 1.282$$

In the equation above, 1.282 is taken from the standardized normal distribution; it represents the half-width of the 80% confidence interval, which is a reasonable criterion for individual respondents, providing an appropriate balance between the risks of falsely identifying change and overlooking true change [11]. Half of a standard deviation (based on baseline scores) is also reported for completeness, as this has been advocated by researchers in the field [20].
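The distribution-based quantities defined above (SEM, RCI, and 0.5 SD) can be computed as in the following sketch. The baseline SD of 20 and reliability of 0.90 are hypothetical inputs chosen for illustration, not values from the trials.

```python
import math

Z_80_PERCENT = 1.282  # half-width multiplier for the 80% confidence interval (from the text)


def standard_error_of_measurement(sd_baseline: float, reliability: float) -> float:
    """SEM = SD_baseline * sqrt(1 - reliability)."""
    return sd_baseline * math.sqrt(1.0 - reliability)


def reliable_change_index(sd_baseline: float, reliability: float,
                          z: float = Z_80_PERCENT) -> float:
    """RCI = sqrt(2) * SEM * z."""
    return math.sqrt(2.0) * standard_error_of_measurement(sd_baseline, reliability) * z


def half_sd(sd_baseline: float) -> float:
    """Half of the baseline standard deviation."""
    return 0.5 * sd_baseline


# Hypothetical inputs: baseline SD of 20 points and reliability (e.g. alpha) of 0.90.
example_rci = reliable_change_index(20.0, 0.90)
print(round(example_rci, 2), half_sd(20.0))
```

With these inputs the SEM is about 6.3 points and the RCI about 11.5 points, so an observed individual change smaller than that would not be distinguishable from measurement error at the 80% level.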

Sensitivity and specificity were used as measures of accuracy to characterize and compare the various anchor-based estimates. Sensitivity indicates the likelihood of correctly identifying a truly improved individual, while specificity indicates the likelihood that an individual that has not improved is correctly classified as such. For the current analyses, the anchor was used as the gold-standard, while the PRO measure was used as the classification or ‘test’ variable.
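The accuracy calculation can be sketched as below, with the anchor classification treated as the reference standard and the PRO change score compared with a candidate threshold. The data are invented, and the inclusive comparison at the threshold is an assumption; the `higher_is_better` flag distinguishes scales such as SF-36v2 (improvement is an increase) from RASIQ (improvement is a decrease).

```python
def sensitivity_specificity(change_scores, anchor_improved, threshold,
                            higher_is_better=True):
    """Sensitivity and specificity of a change-score threshold against an anchor.

    Assumes at least one improved and one non-improved patient on the anchor.
    """
    tp = fn = tn = fp = 0
    for change, truly_improved in zip(change_scores, anchor_improved):
        if higher_is_better:
            flagged = change >= threshold
        else:
            flagged = change <= threshold
        if truly_improved and flagged:
            tp += 1
        elif truly_improved:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical RASIQ-style change scores (negative = improvement).
changes = [-30.0, -25.0, -10.0, -28.0, -5.0, 0.0]
improved_on_anchor = [True, True, True, False, False, False]

sens, spec = sensitivity_specificity(changes, improved_on_anchor,
                                     threshold=-24.0, higher_is_better=False)
print(sens, spec)
```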

The CDF plots of change scores were used to better understand the separation between anchor-based change groups across the entire range of observed RASIQ change scores. CDF plots were obtained for each RASIQ scale using the respective anchors, focusing on the anchor category where the patients are defined by the anchor measure as having experienced meaningful change. A consistent separation across the score range between the curve for this category of change and those of adjacent groups indicates support for the anchor.
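To make the idea of separation between CDF curves concrete, the sketch below evaluates the empirical CDF of change scores in two anchor groups on a common grid and reports the vertical gap between the curves. The group data, grid, and function names are hypothetical; the published figures were plots, and this sketch only computes the quantities such a plot would display.

```python
def proportion_at_or_below(change_scores, value):
    """Empirical CDF: share of patients with a change score <= value."""
    return sum(1 for c in change_scores if c <= value) / len(change_scores)


def cdf_separation(group_a, group_b, grid):
    """Vertical gap between two empirical CDFs at each grid point."""
    return [(x, proportion_at_or_below(group_a, x) - proportion_at_or_below(group_b, x))
            for x in grid]


# Hypothetical RASIQ-style change scores (negative = improvement) by anchor group.
improved = [-40.0, -30.0, -25.0, -20.0]
no_change = [-20.0, -10.0, 0.0, 5.0]

separation = cdf_separation(improved, no_change, grid=[-30.0, -20.0, -10.0])
print(separation)
```

A gap that keeps the same sign and stays clearly away from zero across the grid corresponds to the consistent separation described above.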


Estimated WPMIs of SF-36v2

The correlation between the SF-36v2 PF, RP, BP, VT, SF, and PCS change scores and change in PtGA ranged between -0.30 and -0.48 (absolute value), as shown in Table 1. For the four remaining SF-36v2 scales (GH, RE, MH, and MCS), correlations ranged between -0.22 and -0.29, indicating that the PtGA is not an empirically adequate anchor for these scales.
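The adequacy check applied here (an absolute Spearman correlation of at least 0.30 between change in the PRO score and change in the anchor) can be sketched with the standard library alone. Ties receive average ranks; the example data and function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def anchor_is_adequate(rho, minimum=0.30):
    """Adequacy criterion from the text: |rho| of at least 0.30."""
    return abs(rho) >= minimum


# Hypothetical data: PRO score rises as the anchor (lower = better) falls.
rho = spearman([1.0, 2.0, 3.0, 4.0], [10.0, 9.0, 8.0, 7.0])
print(rho, anchor_is_adequate(rho))
```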

Anchor-based WPMI values for the eight SF-36v2 0–100 domain scores ranged between 13.6 for the GH scale (with an SRM of 0.87) and 26.6 for the BP scale (with an SRM of 1.73) (Table 2). PCS and MCS had WPMI estimates based on NBS of 9.7 and 7.6, and SRM of 1.47 and 0.70, respectively. The accuracy measures of these threshold values for identifying meaningful improvement indicated that for most scales, the thresholds have better sensitivity (0.66 to 0.87) than specificity (0.43 to 0.58).

Table 2 Anchor- and distribution-based estimates for the SF-36v2, FACIT-Fatigue, and RASIQ

RCI-based estimates were 12.7 for PF, 10.7 for RP, 11.1 for BP, 16.7 for GH, 13.1 for VT, 19.7 for SF, 12.1 for RE, 13.3 for MH, 4.1 for PCS, and 6.0 for MCS (Table 2). Estimates based on 0.5 SD were 9.8, 9.1, 7.6, 8.2, 8.3, 10.5, 12.4, 9.5, 3.4, and 5.4, respectively.

WPMI estimates based on mean change were generally similar, although slightly smaller in some cases (e.g., for the MH scale), to those provided by the cut point associated with the best balance between sensitivity and specificity (Additional File 2, Supplementary Table S1). CDF curves generally mirrored correlation values, with PtGA-based curves being less separated for GH, SF, and RE domains (Additional File 2, Supplementary Figure S1); similarly, PtGA-based CDF curves for PCS were more clearly separated when compared to MCS (Additional File 2, Supplementary Figure S2).
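One way to locate a cut point that balances sensitivity and specificity is sketched below. The text does not state which criterion was used, so minimizing the absolute difference between sensitivity and specificity is an assumption made here (maximizing their sum is a common alternative), and the data are invented.

```python
def best_balanced_cut_point(change_scores, anchor_improved):
    """Scan observed change scores and return the cut point where sensitivity
    and specificity are closest (higher change = improvement).

    Assumes at least one improved and one non-improved patient on the anchor.
    """
    improved = [c for c, a in zip(change_scores, anchor_improved) if a]
    not_improved = [c for c, a in zip(change_scores, anchor_improved) if not a]
    best = None
    for cut in sorted(set(change_scores)):
        sens = sum(1 for c in improved if c >= cut) / len(improved)
        spec = sum(1 for c in not_improved if c < cut) / len(not_improved)
        gap = abs(sens - spec)
        if best is None or gap < best[0]:
            best = (gap, cut, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical SF-36v2-style change scores (positive = improvement).
changes = [2.0, 5.0, 8.0, 12.0, 15.0, 20.0]
improved_on_anchor = [False, False, True, False, True, True]

cut, sens, spec = best_balanced_cut_point(changes, improved_on_anchor)
print(cut, sens, spec)
```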

Estimated WPMIs of FACIT-Fatigue

Anchor-based WPMI estimates ranged from 9.7 to 11.3 (SRM 0.99–1.15; Table 2). RCI generated a value of 4.9. The cut point associated with the best sensitivity/specificity balance was slightly smaller than the values obtained with mean change analyses (Additional File 2, Supplementary Table S2). A clear separation between all CDF curves was observed (Additional File 2, Supplementary Figure S3).

Estimated WPMIs of RASIQ

Joint pain scale

Analysis of mean change scores for a one-point improvement in the SF-36 PGIS indicated that a meaningful improvement in the RASIQ’s JP scale was equal to a 24.0-point reduction in score (Table 2). The estimate for the BP anchor (BP01) was -21.7, while the two anchors that are based on a binary categorization of a continuous scale (PAIN and PtGA) provided WPMI estimates that were larger in absolute value (-32.7 and -31.0, respectively).

RCI generated a value of -6.8. A clear separation between the CDF curves was observed (Additional File 2, Supplementary Figure S4).

For the values found in the anchor-based analyses (Table 2), sensitivity (range: 0.73–0.94) was higher than specificity (range: 0.49–0.58). For example, with SF-36 PGIS as the anchor, at a threshold of -24.0, the sensitivity was 0.73 while the specificity was 0.56, indicating better performance at correctly classifying patients who have improved than those who have not improved.

Joint stiffness scale

Analysis of mean change scores based on an improvement of one point or better in SF-36 PGIS indicated that a meaningful improvement in the RASIQ JS scale was equal to a 23.3-point reduction in score (Table 2). When using PtGA as the anchor, the estimate was approximately 3 points higher (-26.1) in absolute value. Estimates based on RCI (-13.1) were approximately half of those obtained under the mean change score analysis, with the estimate based on 0.5 SD equal to -9.1.

A clear separation between the CDF curves was observed (Additional File 2, Supplementary Figure S5).

Impact scale

Analysis of mean change scores based on an improvement of at least one point in SF-36 PGIS indicated that a meaningful improvement in the RASIQ’s Impact scale translated to a 21.0-point reduction in score, which was nearly identical to the estimate obtained under the PtGA anchor (Table 2). The remaining two anchors, AN2 (I feel tired) and AN1 (I feel listless), resulted in estimates that were slightly smaller (in absolute value) at -17.4 and -17.8, respectively. The 0.5 SD and RCI criteria resulted in values of -7.8 and -12.8, respectively.

The CDF plots indicate that the curves obtained for each change group were generally separated, except for the plot corresponding to the AN1 anchor (Additional File 2, Supplementary Figure S6).

At a threshold of -21.0, the sensitivity and specificity were 0.89 and 0.48, respectively, when SF-36 PGIS was the anchor; the estimate of -21.1 associated with the PtGA anchor resulted in values of sensitivity and specificity equal to 0.93 and 0.44, respectively. For the smaller WPMI estimates based on the two FACIT-Fatigue items (AN2 and AN1), sensitivity values were slightly lower (0.85 and 0.79, respectively) and specificity values slightly higher (0.57 and 0.58, respectively).


Understanding the thresholds for within-patient meaningful change scores for PRO instruments is important for assessing and interpreting benefits of a treatment. In this study, we sought to determine the WPMI thresholds for SF-36v2, FACIT-Fatigue, and RASIQ among patients with RA. As the RASIQ is a new questionnaire that was designed specifically for RA, this research will allow increased use of the measure in the future.

For SF-36v2 NBS scores, most of the WPMI estimates obtained in the current study (using mean change score in the anchor category) were similar to, or up to 2 times greater than, those recommended by the developers, which were derived from the US general population (average number of chronic conditions reported was 2.6 [SD = 2.5]) [11]. It should be noted that the thresholds for within-individual change recommended by the developers were based on SEM around the change score (similar to RCI) rather than confidence intervals for observed change based on patient-rated anchors. WPMI estimates for SF-36v2 items based on 0–100 scores were substantially greater (by a factor of 3 to 5) than those that have been applied to RA trial data, which identify meaningful within-patient change using a change score of 5 points for the eight SF-36v2 scales [21, 22]. For FACIT-Fatigue, the WPMI estimates ranged between 9.7 and 11.3; again, these estimates are higher than those used in previous studies with patients with RA [22]. Two factors should be noted as likely contributors to overestimation of WPMI. Firstly, the operationalization of PtGA and PAIN (i.e., their dichotomization) did not distinguish between a large and a small improvement in health status; however, the analysis of mean change assumes a category of small but meaningful change. In addition, simulation studies have shown that mean change analyses often overestimate the threshold for meaningful change [23].

Factors beyond methodological aspects of the analyses should be noted as potential drivers of the differences between current and previous results. Earlier studies have used different methods/anchors, whereas the anchors used in this study were specific for patients with moderate/severe RA. Patient demographic and clinical characteristics can also influence WPMI estimates. In addition, the commonly used 5-point thresholds for 0–100 scores of the eight domains of the SF-36v2 and 2.5-point thresholds for the two summary measures (PCS and MCS), as well as those recommended for NBS, were established some time ago and have not been frequently re-evaluated, particularly in the RA patient population. Over time, meaningful improvement scores may have changed with the improvement of treatments, more effective patient care, and increased patient awareness of disease management. A likely driver behind using the 2.5- and 5-point thresholds is the metric underlying these scores, rather than empirical findings based on analyses similar to those carried out in the current study. NBS scores are set to have a mean of 50 and a SD of 10 (based on the US general population); 0.5/0.25 of a 10-point SD is ~ 5 points and 2.5 points, which have been common metrics [24, 25].

Based on available anchors, of which SF-36 PGIS was considered the primary anchor, our analyses for RASIQ indicate a change between approximately -33 and -22 points in the JP scale score (range: 0–100) could be interpreted as being meaningful for patients; for the JS scale (range: 0–100), this range was approximately -24 to -27, while for the IM scale (range: 0–100), the range of change scores was approximately -21 to -17. For all three scales, distribution-based results indicated that the changes within these ranges were well beyond error that would occur by chance in the measurement process. Overall, anchor-based estimates were associated with high values for sensitivity, indicating that the WPMI estimates were good at identifying patients who improved; values for specificity were low, indicating that these thresholds may have included a lot of patients that were not “truly” improved. Further studies and/or assessment with other measures is therefore warranted.


Only anchors that were included in the two Phase 2 trials were available in the current study. These anchors were not specifically developed for the purposes of deriving WPMI thresholds and did not include patients’ direct assessment of change. As a result, for some SF-36v2 domain scales, the anchor used was not sufficiently correlated with the scale whose change it was intended to reflect. RASIQ is a novel PRO instrument, hence there is limited published literature against which our findings can be compared. Because trial assessments were too far apart, we calculated the SEM using measures of internal consistency reliability across all PROs, which is a further shortcoming given that some researchers would recommend that SEM be calculated from a measure of test-retest reliability. For all three PRO instruments, our analyses were limited to estimation of thresholds related to improvement. Further work is needed to estimate interpretation thresholds that indicate decline and worsening of symptoms, to confirm the values derived in the current study, and to allow exploration of the potential non-linearity across score distributions (the latter of which was not possible due to insufficient sample size).


This study derived WPMI thresholds for SF-36v2, FACIT-Fatigue, and RASIQ, using multiple anchors. Derivation of WPMI thresholds for these PRO instruments will enable their broader use to assist with evaluation and interpretation of treatment benefit in future RA studies.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.





Abbreviations

BP: bodily pain
CDF: cumulative distribution function
CI: confidence interval
FACIT-Fatigue: Functional Assessment of Chronic Illness Therapy-Fatigue
GH: general health perceptions
JP: Joint Pain
JS: Joint Stiffness
MCS: Mental Component Summary
MH: mental health
NBS: norm-based scores
PAIN: Patient’s Assessment of Arthritis Pain
PCS: Physical Component Summary
PF: physical functioning
PGIS: Patient’s Global Impression of Status
PROs: patient-reported outcomes
PtGA: Patient’s Global Assessment of Disease Activity
RA: rheumatoid arthritis
RASIQ: Rheumatoid Arthritis Symptoms and Impact Questionnaire
RCI: reliable change index
RE: role limitations due to emotional problems
ROC: receiver operating characteristic
RP: role limitations due to physical health
SD: standard deviation
SEM: standard error of the measurement
SF: social functioning
SF-36v2: Short-Form 36 Health Survey version 2
SRM: standardized response mean
WPMI: within-patient meaningful improvement


  1. Crosby RD, Kolotkin RL, Williams GR (2003) Defining clinically meaningful change in health-related quality of life. J Clin Epidemiol 56(5):395–407.

  2. Byrom B, Breedon P, Tulkki-Wilke R, Platko JV (2020) Meaningful change: defining the interpretability of changes in endpoints derived from interactive and mHealth technologies in healthcare and clinical research. J Rehabil Assist Technol Eng 7:2055668319892778.

  3. US Food and Drug Administration (2019) Incorporating clinical outcome assessments into endpoints for regulatory decision-making. Available via Accessed 19 April 2023

  4. Kalyoncu U, Dougados M, Daurès JP, Gossec L (2008) Reporting of patient-reported outcomes in recent trials in rheumatoid arthritis: a systematic literature review. Ann Rheum Dis 68(2):183–190.

  5. Becker B, Bracher M, Chauhan D, Rendas-Baum R, Lin X, Raymond K, O’Connor M, Kosinski M (2021) Development, psychometric evaluation and cognitive debriefing of the rheumatoid arthritis symptom and impact questionnaire (RASIQ). J Patient Rep Outcomes 5(1):129.

  6. Buckley CD, Simón-Campos JA, Zhdan V, Becker B, Davy K, Fisheleva E, Gupta A, Hawkes C, Inman D, Layton M, Mitchell N, Patel J, Saurigny D, Williamson R, Tak PP (2020) Efficacy, patient-reported outcomes, and safety of the anti-granulocyte macrophage colony-stimulating factor antibody otilimab (GSK3196165) in patients with rheumatoid arthritis: a randomised, phase 2b, dose-ranging study. Lancet Rheumatol 2(11):e677–e688.

  7. Genovese MC, Berkowitz M, Conaghan PG, Peterfy C, Davy K, Fisheleva E, Gupta A, Inman D, Janiczek R, Layton M, Mitchell N, Patel J, Roberts A, Saurigny D, Smith JE, Williamson R, Tak PP (2020) MRI of the joint and evaluation of the granulocyte–macrophage colony-stimulating factor–CCL17 axis in patients with rheumatoid arthritis receiving otilimab: a phase 2a randomised mechanistic study. Lancet Rheumatol 2(11):e666–e676.

  8. Witt S, Krauss E, Barbero MAN, Müller V, Bonniaud P, Vancheri C, Wells AU, Vasakova M, Pesci A, Klepetko W, Seeger W, Crestani B, Leidl R, Holle R, Schwarzkopf L, Guenther A (2019) Psychometric properties and minimal important differences of SF-36 in idiopathic pulmonary fibrosis. Respir Res 20(1):47.

  9. Gossec L, Steinberg G, Rouanet S, Combe B (2015) Fatigue in rheumatoid arthritis: quantitative findings on the efficacy of tocilizumab and on factors associated with fatigue. The French multicentre prospective PEPS study. Clin Exp Rheumatol 33(5):664–670

  10. Jolly M, Annapureddy N, Arnaud L, Devilliers H (2019) Changes in quality of life in relation to disease activity in systemic lupus erythematosus: post-hoc analysis of the BLISS-52 trial. Lupus 28(14):1628–1639.

  11. Maruish ME (2011) User’s manual for the SF-36v2 Health Survey, 3rd edn. Quality Metric Incorporated: Lincoln, RI

  12. Cella D, Yount S, Sorensen M, Chartash E, Sengupta N, Grober J (2005) Validation of the Functional Assessment of Chronic illness therapy fatigue scale relative to other instrumentation in patients with rheumatoid arthritis. J Rheumatol 32(5):811–819

  13. Ward MM, Guthrie LC, Alba MI (2015) Clinically important changes in individual and composite measures of rheumatoid arthritis activity: thresholds applicable in clinical trials. Ann Rheum Dis 74(9):1691–1696.

  14. Coon C (2016) Empirical telling the interpretation story: the case for strong anchors and multiple methods. Qual Life Res 25(1):1–2.

  15. Revicki D, Hays RD, Cella D, Sloan J (2008) Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol 61(2):102–109.

  16. Jacobson NS, Truax P (1991) Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. J Consult Clin Psychol 59(1):12–19.

  17. Cronbach LJ (1951) Coefficient alpha and the internal structure of tests. Psychometrika 16(3):297–334.

  18. McDonald RP (1999) Test theory: a unified treatment. Lawrence Erlbaum Associates, Mahwah, NJ

  19. Sijtsma K (2009) On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika 74(1):107–120.

  20. Norman GR, Sloan JA, Wyrwich KW (2004) The truly remarkable universality of half a standard deviation: confirmation through another look. Expert Rev Pharmacoecon Outcomes Res 4(5):581–585.

  21. van Mulligen E, Weel A, Kuijper TM, Hazes JMW, van der Helm-van Mil AHM, de Jong PHP (2020) The impact of a disease flare during tapering of DMARDs on the lives of rheumatoid arthritis patients. Semin Arthritis Rheum 50(3):423–431.

  22. Emery P, Kavanaugh A, Bao Y, Ganguli A, Mulani P (2015) Comprehensive disease control (CDC): what does achieving CDC mean for patients with rheumatoid arthritis? Ann Rheum Dis 74(12):2165–2174.

  23. Bjorner JB, Terluin B, Trigg A, Hu J, Brady KJS, Griffiths P (2022) Establishing thresholds for meaningful within-individual change using longitudinal item response theory. Qual Life Res 32(5):1267–1276.

  24. Lubeck DP (2004) Patient-reported outcomes and their role in the assessment of rheumatoid arthritis. PharmacoEconomics 22(2 Suppl 1):27–38.

  25. Strand V, Boers M, Idzerda L, Kirwan JR, Kvien TK, Tugwell PS, Dougados M (2011) It’s good to feel better but it’s better to feel good and even better to feel good as soon as possible for as long as possible. Response criteria and the importance of change at OMERACT 10. J Rheumatol 38(8):1720–1727.


Editorial support (in the form of writing assistance, including preparation of the draft manuscript under the direction and guidance of the authors, collating and incorporating authors’ comments for each draft, assembling tables and figures, grammatical editing, and referencing) was provided by Kathryn Wardle of Apollo, OPEN Health Communications, and funded by GSK, in accordance with Good Publication Practice (GPP) guidelines (


This study was funded by GSK (GSK study 212734). The sponsor was involved in study conception and design, data interpretation, and the decision to submit the article for publication. The sponsor was also given the opportunity to review the manuscript for medical and scientific accuracy as well as intellectual property considerations.

Author information

Authors and Affiliations



Study concept/design: RRB, XL, MK, WHC, MB, JBB. Data acquisition, analysis, or interpretation: RRB, XL, MK, WHC, MB, JBB. All authors reviewed and critically revised the manuscript for important intellectual content. All authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.

Corresponding author

Correspondence to Wen-Hung Chen.

Ethics declarations

Ethics approval and consent to participate

Ethical approval was not required for this post-hoc secondary analysis of existing Phase 2 clinical trial data. However, patients provided consent for such secondary data analyses in the informed consent forms for the BAROQUE and RENAISSANCE clinical trials.

Consent for publication

Not applicable.

Competing interests

RR-B, MK, and JBB are employees of QualityMetric Inc, who received consulting fees for the conduct of this analysis but not for manuscript development. XL was an employee of QualityMetric Inc at the time of the study and is currently an employee of, and holds stocks/shares in, Biogen. MGB and W-HC are employees of, and hold stocks/shares in, GSK.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Additional File 1

Additional File 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Rendas-Baum, R., Lin, X., Kosinski, M. et al. Meaningful score changes for SF-36v2, FACIT-fatigue, and RASIQ in rheumatoid arthritis. J Patient Rep Outcomes 8, 9 (2024).
