Psychometric properties of FACIT-Fatigue in systemic lupus erythematosus: a pooled analysis of three phase 3 randomised, double-blind, parallel-group controlled studies (BLISS-SC, BLISS-52, BLISS-76)

Background Fatigue is a key symptom in patients with systemic lupus erythematosus (SLE), and regulatory bodies recommend its assessment in clinical trials of SLE therapies. Methods This post hoc pooled analysis of the three BeLimumab In Subjects with Systemic lupus erythematosus (BLISS) Phase 3 randomised, double-blind, parallel-group controlled trials evaluated the measurement properties of the Functional Assessment of Chronic Illness Therapy (FACIT)-Fatigue. Patients (N = 2520) completed the FACIT-Fatigue every 4 weeks from baseline until the end of each study period. Internal consistency, test–retest reliability, convergent validity, and ability to detect changes in SLE were evaluated for the FACIT-Fatigue. Results The FACIT-Fatigue showed good internal consistency reliability (Cronbach’s alpha > 0.90), very good test–retest reliability (0.76 ≤ intraclass correlation coefficient ≤ 0.92), and moderate-strong convergent validity (0.49 ≤ |r| ≤ 0.86) against scale and summary measure scores from the Short Form 36 Health Survey Version 2. Correlations between FACIT-Fatigue and British Isles Lupus Assessment Group (BILAG) General/Musculoskeletal scores (0.24 ≤ |r| ≤ 0.43) supported convergent validity. Correlations between FACIT-Fatigue and the Safety of Estrogens in Lupus Erythematosus National Assessment-Systemic Lupus Erythematosus Disease Activity Index (SELENA-SLEDAI) scores and SLE annualised flare rate were weak but in the expected direction (ranging from − 0.02 to − 0.25). Known-groups validity testing showed that the FACIT-Fatigue can significantly discriminate between patient groups with differing scores for SELENA-SLEDAI, BILAG (General and Musculoskeletal) ratings, and Physician’s Global Assessment (PGA). 
Patients showing improvement in PGA and meeting the BILAG responder criteria had significantly higher mean improvement in FACIT-Fatigue scores than those without improvements in either measure (Week 52 mean score difference [95% confidence interval]: − 4.0 [− 5.0, − 3.0] and −2.2 [−3.1, −1.2], respectively; both p < 0.0001). The range of important (i.e. meaningful) change in FACIT-Fatigue, based on multiple anchors, was 3–6 points. Conclusions The FACIT-Fatigue demonstrated adequate psychometric properties in patients with SLE. The body of evidence from the three BLISS trials (both pooled and individually) supports the FACIT-Fatigue as a reliable and valid measure of SLE-related fatigue in clinical trials. Clinical trial identifiers BLISS-SC (NCT01484496), BLISS-52 (NCT00424476), and BLISS-76 (NCT00410384). Supplementary Information The online version contains supplementary material available at 10.1186/s41687-021-00298-x.


Background
Systemic lupus erythematosus (SLE) is a chronic autoimmune disease characterised by diverse clinical manifestations and flares associated with organ damage [1]. Fatigue is the most common symptom experienced by patients with SLE, affecting approximately 50-86% of the SLE population [2][3][4]. Fatigue in SLE has a multifactorial aetiology [5], and is considered one of the most debilitating symptoms of the disease by many patients, associated with decreased quality of life and increased work disability [4,6]. Given the substantial burden of fatigue on patients with SLE, regular assessment is important to ensure it is managed optimally, with strategies tailored to the individual patient [7]. Furthermore, both the US Food and Drug Administration (FDA) and the European Medicines Agency (EMA) recognise that fatigue is an important symptom of SLE, and recommend that this should be assessed in clinical trials of SLE therapies [8,9], using a reliable patient-reported outcome (PRO) measure [8].
The Functional Assessment of Chronic Illness Therapy (FACIT)-Fatigue questionnaire is a PRO instrument that captures multiple aspects of physical and mental fatigue, and their effects on function and daily living [10]. There is only limited evidence currently available regarding whether the FACIT-Fatigue is an appropriate instrument to measure SLE-related fatigue in clinical trials. Based on this limited evidence, the FACIT-Fatigue appears to have good psychometric properties in patients with SLE [10,11], and findings from cognitive debriefing interviews demonstrate that items of the FACIT-Fatigue scale are relevant and understood by patients with SLE [10]. In an analysis of data from a 52-week placebo-controlled trial in patients with moderately to severely active SLE [12], the FACIT-Fatigue showed good internal consistency reliability (Cronbach's alpha > 0.95 at all visits) and detected improvement in clinical outcome measures of SLE disease activity (British Isles Lupus Assessment Group [BILAG] General and BILAG Musculoskeletal) and worsening in patients' global assessment [11]. Results also showed significant and longitudinally consistent correlations between the FACIT-Fatigue and other measures including pain intensity and interference, the Short Form 36 Health Survey Version 2 (SF-36v2) Vitality domain, Physical component score and Mental component score, and patients' global assessment of disease activity, supporting the convergent validity of the FACIT-Fatigue in patients with SLE [11]. Furthermore, a systematic literature review of PRO measures targeting key SLE symptoms and impacts concluded that of the three most relevant fatigue instruments, the FACIT-Fatigue demonstrated the strongest evidence of reliability, validity, and ability to detect change in an SLE population, as well as qualitative evidence of the relevance of item content to patients with SLE [13].
To strengthen the currently available evidence in this area, the present study investigated the psychometric properties (internal consistency, test-retest reliability, convergent validity, known-groups validity, and ability to detect changes in SLE) of the FACIT-Fatigue in SLE using data from three randomised controlled trials (RCTs) of belimumab [14][15][16], a human immunoglobulin G1-λ monoclonal antibody approved for the treatment of SLE in adults (Europe and USA) and children (USA only) who are receiving standard therapy [17][18][19].

Study design
This was a post hoc analysis (GSK Study 209013) of data from three Phase 3 randomised, double-blind, parallel-group controlled BeLimumab In Subjects with Systemic lupus erythematosus (BLISS) studies that compared the safety and efficacy of belimumab in patients with SLE. Full details of these three trials have been published elsewhere: BLISS-SC (NCT01484496) [16], BLISS-52 (NCT00424476) [14], and BLISS-76 (NCT00410384) [15].
Briefly, the primary efficacy endpoint in all three trials was the SLE Responder Index (SRI) response rate at Week 52 (this endpoint was met in all trials, and showed a significantly higher SRI response rate with belimumab than placebo) [14][15][16]. Patients were randomised and treated as follows: 2:1 to subcutaneous belimumab 200 mg (n = 556) or placebo (n = 280) in BLISS-SC, 1:1:1 to intravenous placebo (n = 287) or belimumab 1 mg/kg (n = 288) or 10 mg/kg (n = 290) in BLISS-52, and 1:1:1 to intravenous placebo (n = 275) or belimumab 1 mg/kg (n = 271) or 10 mg/kg (n = 273) in BLISS-76. All patients also received standard therapy. Patients were considered for inclusion if they were ≥ 18 years of age, had a clinical diagnosis of SLE according to American College of Rheumatology criteria, had active SLE, were autoantibody-positive, and were on a stable SLE treatment regimen (that may have included corticosteroids and/or immunosuppressants). Exclusion criteria included severe lupus nephritis, central nervous system lupus or prior treatment with a B-cell-targeted therapy (including rituximab), intravenous cyclophosphamide, or prednisone. Studies were performed in accordance with the Declaration of Helsinki 2008 and approval of institutional review boards; all patients read and signed an informed consent form in addition to providing verbal consent to participate and be audio recorded during interviews.

Outcome measures FACIT-Fatigue
The FACIT-Fatigue is a self-administered 13-item questionnaire that assesses patient-reported fatigue and its impact upon daily activities and function over the prior 7 days [10]. The questionnaire assesses physical fatigue (e.g. "I feel tired"), functional fatigue (e.g. "trouble finishing things"), emotional fatigue (e.g. "I am frustrated by being too tired to do the things I want to do"), and social consequences of fatigue (e.g. "limits social activity") [10] (Supplementary Table S1). Patients are asked to answer each of the questions using a 5-point Likert-type scale (0 = Not at all, 1 = A little bit, 2 = Somewhat, 3 = Quite a bit, and 4 = Very much). Each of the 13 items contributes equally to a single conceptual domain representing fatigue. FACIT-Fatigue total scores are the sum of responses and range from 0 to 52, with lower scores indicating greater fatigue and higher scores indicating less fatigue [10].
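The total-score rule described above can be illustrated with a minimal Python sketch. The function name `facit_fatigue_total` is hypothetical, and the sketch assumes each item score has already been oriented so that a higher value means less fatigue; any item-level scoring step specified in the instrument's official scoring guidelines is outside its scope.

```python
def facit_fatigue_total(item_scores):
    """Sum 13 item scores (each 0-4) into a 0-52 total.

    Assumption: every item is already oriented so that a higher
    value means less fatigue. Higher totals therefore mean less fatigue.
    """
    if len(item_scores) != 13:
        raise ValueError("FACIT-Fatigue has 13 items")
    if any(score not in (0, 1, 2, 3, 4) for score in item_scores):
        raise ValueError("each item score must be an integer from 0 to 4")
    return sum(item_scores)


# Illustrative (made-up) responses for one patient
example_total = facit_fatigue_total([3, 2, 4, 1, 3, 2, 2, 3, 4, 1, 2, 3, 2])
```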
In all three studies, patients completed the FACIT-Fatigue every 4 weeks from baseline until the end of each trial period (with the exception of Weeks 56 and 64 for BLISS-76).
The current study used endpoints from the original trials as criterion measures in the evaluation of the FACIT-Fatigue measurement properties. These measures included Safety of Estrogens in Lupus Erythematosus National Assessment-Systemic Lupus Erythematosus Disease Activity Index (SELENA-SLEDAI), a measure of reduction in global disease activity; Physician's Global Assessment (PGA), which measures overall worsening of the patient's condition; BILAG, which assesses worsening in specific organ systems; and the SF-36v2, a widely used health-related quality of life measure, consisting of 8 distinct domains that are subsequently aggregated into two summary scores, representing physical and mental health status. The schedule of assessments for each study is summarised in Supplementary Table S2.
With a few exceptions, the analyses were conducted at baseline (Week 0) and Weeks 24 and 52, as FACIT-Fatigue assessments were performed at these time points across all included studies. Analyses to evaluate ability to detect change used data from baseline through Week 24 and from baseline to Week 52 to calculate BILAG response at Week 24 and Week 52, respectively. Intraclass correlation coefficients (ICCs) were calculated using SELENA-SLEDAI and PGA score data between Weeks 8 and 12.

Data analysis
The current analysis was a psychometric validation of the FACIT-Fatigue. Data from the intent-to-treat populations of the three trials were used to assess reliability, construct validity and responsiveness (ability to detect change) of the FACIT-Fatigue scale. Post hoc analyses were conducted for each trial separately and for the pooled samples, in accordance with FDA guidance that validation of PRO measures be conducted in samples reflective of the patient populations of the trials in which these measures were used [20]. The similarities in study design of each of the trials, in particular the inclusion/exclusion criteria, and study length enabled pooling of patient-level data, allowing overall estimates to be obtained in a larger sample size than the individual trials. Data analyses were conducted using SAS version 9.4 software (SAS Institute Inc., Cary, NC, USA). Significance testing was two-sided and at a level of 0.05 for all analyses. With the exceptions of the use of BILAG response rate as a criterion measure to evaluate the FACIT-Fatigue's ability to detect change in SLE and the evaluation of test-retest reliability, statistical analyses were conducted with cross-sectional data from baseline, and Weeks 24 and 52.

Internal consistency and test-retest reliability
Cronbach's alpha and the ICC were calculated with the R/MBESS package (https://www3.nd.edu/~kkelley/site/MBESS.html) and SAS version 9.4 software 'PROC MIXED' and used to evaluate internal consistency and test-retest reliability, respectively [21,22]. Given that the items of the FACIT-Fatigue are answered on a 5-point scale, a polychoric correlation matrix was used to calculate Cronbach's alpha [23].
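As a rough illustration of the internal consistency statistic, the sketch below computes the classical (covariance-based) Cronbach's alpha in plain Python. This is a deliberate simplification and not the study's method: the analysis described above used a polychoric correlation matrix via R/MBESS, whereas this sketch works directly on observed item scores. The function name `cronbach_alpha` and the toy data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Classical Cronbach's alpha from observed scores.

    rows: list of respondents, each a list of k item scores.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: three respondents, two items (made-up values)
toy_alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
meets_minimum = toy_alpha >= 0.70  # the acceptability threshold used in the text
```

With real data, each row would hold one patient's 13 item scores at a given visit.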
A sample of patients with stable disease activity (based on constant SELENA-SLEDAI and PGA scores between Weeks 8 and 12) was used to calculate the ICC as $\mathrm{ICC} = \sigma_s^2 / (\sigma_s^2 + \sigma_e^2)$, where $\sigma_s^2$ and $\sigma_e^2$ are the between-subject variance and the measurement error variance, respectively, from a random effects model [21].
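The ICC is the share of total variance attributable to differences between subjects, that is, the between-subject variance divided by the sum of the between-subject and measurement error variances. The sketch below estimates these two components with a one-way ANOVA (method-of-moments) estimator for balanced data. This is an assumption made for illustration: the study fitted a random effects model in SAS PROC MIXED, which is not reproduced here. The function name `icc_one_way` and the example scores are hypothetical.

```python
def icc_one_way(scores):
    """One-way random-effects ICC for balanced data.

    scores: list of subjects, each a list of k repeated measurements.
    Returns var_subject / (var_subject + var_error), with the variance
    components estimated from ANOVA mean squares.
    """
    n = len(scores)
    k = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]

    # Between-subject and within-subject mean squares
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))

    var_subject = (ms_between - ms_within) / k
    var_error = ms_within
    return var_subject / (var_subject + var_error)


# Made-up total scores at two visits for three stable patients
example_icc = icc_one_way([[9, 11], [19, 21], [29, 31]])
```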
The minimum standard for acceptable reliability for both internal consistency and test-retest reliability was ≥0.70 [24].

Construct validity
Spearman correlations between FACIT-Fatigue scores and the SELENA-SLEDAI scores, annualised flare rate, BILAG General and Musculoskeletal systems ratings, and the SF-36v2 were computed at baseline, and Weeks 24 and 52. Correlations ≥0.30 in absolute value were considered indicative of good convergent validity [25].
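For readers unfamiliar with the statistic, a Spearman correlation is a Pearson correlation computed on ranks, with tied values sharing their average rank. The following plain-Python sketch shows that calculation and the |r| ≥ 0.30 criterion described above; all function names and the sample values are hypothetical and are not taken from the trial data.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def supports_convergent_validity(r, threshold=0.30):
    """Apply the |r| >= 0.30 criterion used in the text."""
    return abs(r) >= threshold
```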

Confirmatory factor analysis
The measurement model of the FACIT-Fatigue was evaluated further using confirmatory factor analysis (CFA) methods appropriate for categorical data (for more details please see Supplementary Materials).

Ability to detect changes in SLE
The ability of the FACIT-Fatigue to detect change was evaluated using two different approaches: (1) by computing correlations between changes in FACIT-Fatigue scores and changes in SF-36v2 scales, SELENA-SLEDAI scores, rate of BILAG response, and PGA, with values interpreted as weak (r < 0.3), moderate (r ≥ 0.3 and < 0.5) or strong (r ≥ 0.5); and (2) by evaluating differences in mean changes in FACIT-Fatigue across change in PGA (improved vs same/worse) and BILAG response rate (≥50% vs < 50% of assessments).
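The interpretation bands for change-score correlations can be written as a small helper. This sketch assumes the bands apply to the absolute value of r (so that negative correlations in the expected direction are classified by magnitude) and takes the weak band as |r| < 0.3, contiguous with the moderate band; the function name `correlation_strength` is hypothetical.

```python
def correlation_strength(r):
    """Classify a correlation as weak, moderate or strong by magnitude.

    Assumed bands: weak (|r| < 0.3), moderate (0.3 <= |r| < 0.5),
    strong (|r| >= 0.5).
    """
    size = abs(r)
    if size < 0.3:
        return "weak"
    if size < 0.5:
        return "moderate"
    return "strong"
```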
The rate of BILAG responses was evaluated as the ratio of the total number of BILAG responses (i.e. no new BILAG A organ domain score or 2 new BILAG B organ domain scores compared with baseline at the time of assessment) divided by the total number of assessments within the period considered. Estimated mean FACIT-Fatigue change scores and tests of statistical significance for differences between BILAG responder groups or PGA improvement groups at Week 24 and Week 52 were evaluated using a model in which ij represents the jth observation for the ith patient, GROUP (either BILAG response rate or PGA improvement) and Week (24 or 52) are fixed effects, and FACIT_Baseline,i represents a continuous adjustment for the baseline score of the ith patient. An unstructured covariance matrix was used to take into account repeated measurements for the same patient [26].
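The BILAG response rate and the ≥50% versus <50% grouping can be sketched as follows. The sketch assumes that each assessment in the period has already been classified as a response or non-response; the function names and the boolean input format are hypothetical.

```python
def bilag_response_rate(responses):
    """Number of BILAG responses divided by number of assessments.

    responses: list of booleans, one per assessment in the period
    (True = BILAG response at that assessment).
    """
    if not responses:
        raise ValueError("at least one assessment is required")
    return sum(responses) / len(responses)


def responder_group(responses, cutoff=0.5):
    """Assign the patient to the >=50% or <50% response-rate group."""
    if bilag_response_rate(responses) >= cutoff:
        return ">=50%"
    return "<50%"
```

For example, a patient with a response at three of four assessments has a rate of 0.75 and falls in the ≥50% group.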

Patient population
Data from 836, 865, and 819 patients treated with belimumab or placebo in the BLISS-SC, BLISS-52, and BLISS-76 trials, respectively, were included in this pooled analysis. The demographics and baseline characteristics of patients (N = 2520) across the three trials are shown in Table 1. In total, 94% of patients were female, which is in line with the proportion of female participants in the FACIT-Fatigue validation study [11], as well as the general population of patients with SLE [27].

Construct validity
Across two of the three BLISS trials that used the SF-36, Week 24 FACIT-Fatigue score correlations were strong with SF-36v2 scores (0.60 ≤ |r| ≤ 0.83), of which the largest correlations were observed between the FACIT-Fatigue and the SF-36v2 Vitality scales (Table 2), which is as expected because both instruments measure the underlying construct of energy/fatigue. The correlations between FACIT-Fatigue and the SF-36v2 Physical component summary and Mental component summary measures were very similar in magnitude, suggesting that the FACIT-Fatigue captures both the physical and mental aspects underlying the fatigue experienced by patients with SLE (Table 2). At Week 24, correlations between FACIT-Fatigue and SELENA-SLEDAI total scores were weak (ranging from − 0.04 to − 0.17). Correlations between FACIT-Fatigue and BILAG General or BILAG Musculoskeletal scores were low to moderate, ranging from 0.27 to 0.43 and 0.24 to 0.41, respectively (Table 2).
Correlations between mean FACIT-Fatigue score across two time points (baseline and Week 52, and Weeks 24 and 52) and SLE annualised flare rate were weak, ranging from − 0.02 to − 0.25 (Table 3). In the pooled sample, mean FACIT-Fatigue scores differed across criterion groups derived from SELENA-SLEDAI scores, PGA of SLE severity, and BILAG General and Musculoskeletal ratings, at all three time points (Table 4). Study participants with higher SELENA-SLEDAI scores (indicating more active disease) had lower mean FACIT-Fatigue scores (i.e. felt more fatigued) (Table 4). Mean FACIT-Fatigue scores were higher (indicating less fatigue) among patients who had SLE severity classified as none or mild (according to PGA); in contrast, those classified as having severe SLE by PGA had the lowest mean FACIT-Fatigue score (indicating more fatigue) (Table 4). When assessing both BILAG General and Musculoskeletal ratings, mean FACIT-Fatigue scores were statistically significantly higher (indicating less fatigue) among the less severe group of patients (those with BILAG disease activity grades of C/D/E) (Table 4).
In the CFA, item-to-factor loadings were acceptable, with only 5 of a total of 78 loadings (13 items evaluated at two time points across the three trials) lower than 0.6, and fit statistics generally supporting a unidimensional model for the FACIT-Fatigue items (details of the CFA are provided in the Supplementary Tables S4 and S5).

Ability to detect changes in SLE
At Week 52, correlations between the change in FACIT-Fatigue scores and the change in SF-36v2 scale scores were moderate to strong and in the expected direction (ranging from 0.42 to 0.67), with the strongest correlation seen between the FACIT-Fatigue and Vitality subscale (Table 5). The correlations observed between changes in FACIT-Fatigue scores and changes in the clinical assessments of SLE were considerably smaller than those observed with the SF-36v2 scales, ranging from − 0.08 to − 0.12 (SELENA-SLEDAI) and − 0.12 to − 0.31 (PGA) (Table 5). Findings were similar for Week 24 assessments (data not shown). Across the three trials, improvements in FACIT-Fatigue scores at Week 24 and Week 52 were significantly greater (p < 0.0001) among patients with, compared with those without, an improvement in PGA (Table 6). At both Weeks 24 and 52, the difference in mean change from baseline in FACIT-Fatigue scores between these PGA groups was at the threshold of meaningful change of 3-4 points established for the FACIT-Fatigue [11].
Study participants who had a BILAG response in at least 50% of assessments between baseline and Week 24 or baseline and Week 52 had significantly greater mean improvements in FACIT-Fatigue scores compared with those who had a BILAG response less frequently (Table 6). Mean [95% confidence interval] improvement in FACIT-Fatigue scores for BILAG responders (5.4 [4.9, 5.9] for baseline-Week 24 and 5.5 [4.9, 6.0] for baseline-Week 52) exceeded the meaningful threshold of change established for the FACIT-Fatigue [11].

Discussion
Fatigue remains an important concern for patients with SLE, even those who have mild disease, and is considered one of the most debilitating symptoms of the disease by many patients, associated with decreased quality of life and increased work disability [4,6]. Fatigue in people with SLE is a complex issue and has a multifactorial aetiology, being associated with physical activity, obesity, sleep quality, depression, anxiety, and cognitive dysfunction or can be related to the treatments for SLE itself [5]. Consequently, there is a need for a reliable PRO measure to clearly describe fatigue in SLE including physical and mental components, and measures of fatigue symptoms and effects in the presence of comorbid factors (e.g. depression and medication) [8]. The results of the present post hoc psychometric analysis from the three BLISS RCTs of belimumab build on the limited evidence from clinical trials and qualitative research studies of patients with SLE, showing that the FACIT-Fatigue scale is capable of capturing the multi-dimensional nature of SLE-related fatigue and includes assessment of the physical and mental symptoms of fatigue and their impact on the patient's daily life. It is true that the results of psychometric analyses alone do not indicate which parameter a particular instrument measures. This aspect can be informed by the evidence from previous qualitative analyses, which were overall supportive of the content validity of the FACIT-Fatigue, even if patients with SLE raised some questions about 4 of the 13 items ("feeling listless", "having energy", "too tired to eat", "needing help doing one's usual activities") [10].
In the present analysis, the FACIT-Fatigue demonstrated good internal consistency reliability and very good test-retest reliability in SLE. Convergent validity was supported by strong correlations between the FACIT-Fatigue and SF-36v2 scale and summary measure scores, as well as moderate correlations with BILAG General and Musculoskeletal system scores. Tests of known-groups validity showed that the FACIT-Fatigue can discriminate between groups of study participants who differed in categories of SELENA-SLEDAI scores, BILAG (General and Musculoskeletal) ratings, and PGA. These results strengthen the validity of the FACIT-Fatigue scale to reliably and reproducibly assess fatigue among patients with SLE and its ability to detect treatment-related changes in these patients. Results from the pooled BLISS analysis were similar to those observed in each individual BLISS clinical trial, which highlights the robustness of the findings.
Calculation of Cronbach's alpha based on the polychoric correlation matrix may overestimate reliability, because it does not take into account possible inaccuracies due to the ordinal nature of the items. It is based on the underlying variable that is assumed to be normally distributed, and not the observed variable, which is used to calculate the FACIT-Fatigue score [28], a concept that is widely used in various statistical models, such as the ordinal model using the logit link [29]. An advantage of using Cronbach's alpha based on the Spearman correlation coefficient is that it can be seen as a lower bound of the reliability coefficient [30]. Our results, which included estimates of reliability that used both types of correlation coefficients, consistently demonstrate very good reliability coefficient estimates, supporting the use of the FACIT-Fatigue score in assisting group-based decision-making.
Table 3 Spearman correlations between mean FACIT-Fatigue scores, and number of SLE flares by severity and trial
Given the substantial burden of fatigue in patients with SLE, it is important to determine whether therapies to treat the condition result in improvements in fatigue that are meaningful to patients. Consequently, it is important to ascertain the minimal clinically important difference (MCID) for the FACIT-Fatigue score, or the smallest change in this score that represents a change in symptoms that is clinically meaningful to the patient. The present analysis demonstrated FACIT-Fatigue score changes of 5.9 points at both Weeks 24 and 52 in patients with PGA improvements. For patients with BILAG response in ≥50% of assessments, mean change in FACIT-Fatigue was 5.4 and 5.5 points at Weeks 24 and 52, respectively. These values are within the range of prior minimal important difference estimates of 3-6 points, derived specifically in patients with SLE by Lai et al. (2011) [11]. It should be noted that in that study, the authors determined that the MCID was 3-4 points, based on previously published minimal important difference estimates derived in patients with cancer or rheumatoid arthritis (RA). Any variations in anchor-based MCID estimates likely reflect the choice of anchors selected for estimation [11]; therefore, based on the available evidence from prior studies and the present one, we would consider the 3-6 points estimate to be a reasonable MCID in patients with SLE. Future studies using estimation of the MCID based on different anchors may be of interest to further clarify the strength of this finding.
The present analysis has some limitations that should be considered. While modern psychometric methods, such as item response theory, can provide additional insights and improved understanding of how each item of the FACIT-Fatigue contributes to the measurement of the underlying latent variable, our evaluation was limited to classical test theory methods. The generalisability of the findings to the general lupus population may be limited

Conclusion
This post hoc analysis of the BLISS trials demonstrated measurement properties of the FACIT-Fatigue that are