Study design
This was a post hoc analysis (GSK Study 209013) of data from three Phase 3 randomised, double-blind, parallel-group, placebo-controlled BeLimumab In Subjects with Systemic lupus erythematosus (BLISS) studies that evaluated the safety and efficacy of belimumab in patients with SLE. Full details of these three trials have been published elsewhere: BLISS-SC (NCT01484496) [16], BLISS-52 (NCT00424476) [14], and BLISS-76 (NCT00410384) [15].
Briefly, the primary efficacy endpoint in all three trials was the SLE Responder Index (SRI) response rate at Week 52; this endpoint was met in all three trials, with a significantly higher SRI response rate with belimumab than with placebo [14,15,16]. Patients were randomised and treated as follows: 2:1 to subcutaneous belimumab 200 mg (n = 556) or placebo (n = 280) in BLISS-SC; 1:1:1 to intravenous placebo (n = 287), belimumab 1 mg/kg (n = 288) or belimumab 10 mg/kg (n = 290) in BLISS-52; and 1:1:1 to intravenous placebo (n = 275), belimumab 1 mg/kg (n = 271) or belimumab 10 mg/kg (n = 273) in BLISS-76. All patients also received standard therapy. Patients were considered for inclusion if they were ≥ 18 years of age, had a clinical diagnosis of SLE according to American College of Rheumatology criteria, had active SLE, were autoantibody-positive, and were on a stable SLE treatment regimen (that may have included corticosteroids and/or immunosuppressants). Exclusion criteria included severe lupus nephritis, central nervous system lupus or prior treatment with a B-cell-targeted therapy (including rituximab), intravenous cyclophosphamide, or prednisone. Studies were performed in accordance with the Declaration of Helsinki 2008 and approval of institutional review boards; all patients read and signed an informed consent form in addition to providing verbal consent to participate and be audio recorded during interviews.
Outcome measures
FACIT-Fatigue
The FACIT-Fatigue is a self-administered 13-item questionnaire that assesses patient-reported fatigue and its impact upon daily activities and function over the prior 7 days [10]. The questionnaire assesses physical fatigue (e.g. “I feel tired”), functional fatigue (e.g. “trouble finishing things”), emotional fatigue (e.g. “I am frustrated by being too tired to do the things I want to do”), and social consequences of fatigue (e.g. “limits social activity”) [10] (Supplementary Table S1). Patients are asked to answer each of the questions using a 5-point Likert-type scale (0 = Not at all, 1 = A little bit, 2 = Somewhat, 3 = Quite a bit, and 4 = Very much). Each of the 13 items contributes equally to a single conceptual domain representing fatigue. FACIT-Fatigue total scores are the sum of responses and range from 0 to 52, with lower scores indicating greater fatigue and higher scores indicating less fatigue [10].
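The scoring rule described above can be illustrated with a minimal sketch. It assumes that each of the 13 item responses has already been oriented so that a higher value means less fatigue; any item reversal required by the published scoring guideline is not reproduced here. The function name `facit_fatigue_total` is hypothetical and is not taken from the study's analysis code.

```python
def facit_fatigue_total(oriented_items):
    """Sum 13 oriented item scores (each 0-4) into a 0-52 total.

    Assumption: items are already oriented so that higher = less fatigue.
    """
    if len(oriented_items) != 13:
        raise ValueError("FACIT-Fatigue has 13 items")
    if any(score not in (0, 1, 2, 3, 4) for score in oriented_items):
        raise ValueError("each item score must be an integer from 0 to 4")
    return sum(oriented_items)


# Hypothetical respondent: the best possible score on every item.
best_case = facit_fatigue_total([4] * 13)  # 52, i.e. least fatigue
```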
In all three studies, patients completed the FACIT-Fatigue every 4 weeks from baseline until the end of each trial period (with the exception of Weeks 56 and 64 for BLISS-76).
The current study used endpoints from the original trials as criterion measures in the evaluation of the FACIT-Fatigue measurement properties. These measures included the Safety of Estrogens in Lupus Erythematosus National Assessment-Systemic Lupus Erythematosus Disease Activity Index (SELENA-SLEDAI), a measure of reduction in global disease activity; the Physician’s Global Assessment (PGA), which measures overall worsening of the patient’s condition; BILAG, which assesses worsening in specific organ systems; and the SF-36v2, a widely used health-related quality of life measure consisting of 8 distinct domains that are subsequently aggregated into two summary scores, representing physical and mental health status. The schedule of assessments for each study is summarised in Supplementary Table S2.
With a few exceptions, the analyses were conducted at baseline (Week 0) and Weeks 24 and 52, as FACIT-Fatigue assessments were performed at these time points across all included studies. Analyses to evaluate ability to detect change used data from baseline through Week 24 and from baseline to Week 52 to calculate BILAG response at Week 24 and Week 52, respectively. Intraclass correlation coefficients (ICCs) were calculated using SELENA-SLEDAI and PGA score data between Weeks 8 and 12.
Data analysis
The current analysis was a psychometric validation of the FACIT-Fatigue. Data from the intent-to-treat populations of the three trials were used to assess reliability, construct validity and responsiveness (ability to detect change) of the FACIT-Fatigue scale. Post hoc analyses were conducted for each trial separately and for the pooled samples, in accordance with FDA guidance that validation of PRO measures be conducted in samples reflective of the patient populations of the trials in which these measures were used [20]. The similarities in study design of each of the trials, in particular the inclusion/exclusion criteria, and study length enabled pooling of patient-level data, allowing overall estimates to be obtained in a larger sample size than the individual trials. Data analyses were conducted using SAS version 9.4 software (SAS Institute Inc., Cary, NC, USA). Significance testing was two-sided and at a level of 0.05 for all analyses. With the exceptions of the use of BILAG response rate as a criterion measure to evaluate the FACIT-Fatigue’s ability to detect change in SLE and the evaluation of test–retest reliability, statistical analyses were conducted with cross-sectional data from baseline, and Weeks 24 and 52.
Internal consistency and test–retest reliability
Internal consistency was evaluated with Cronbach’s alpha, calculated with the R/MBESS package (https://www3.nd.edu/~kkelley/site/MBESS.html), and test–retest reliability was evaluated with the ICC, calculated with ‘PROC MIXED’ in SAS version 9.4 software [21, 22]. Given that the items of the FACIT-Fatigue are answered on a 5-point scale, a polychoric correlation matrix was used to calculate Cronbach’s alpha [23].
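As an illustration of how Cronbach’s alpha follows from an inter-item matrix, a minimal sketch is given here. It is not the MBESS implementation used in the study: it assumes that the polychoric correlation matrix has already been estimated elsewhere and is supplied as a list of lists, and the function name `cronbach_alpha_from_matrix` is hypothetical. The formula used is alpha = k/(k − 1) × (1 − trace/sum of all entries), where k is the number of items.

```python
def cronbach_alpha_from_matrix(matrix):
    """Cronbach's alpha from a k x k covariance or correlation matrix.

    Assumption: `matrix` is symmetric and was estimated beforehand
    (for example, a polychoric correlation matrix for ordinal items).
    """
    k = len(matrix)
    if k < 2 or any(len(row) != k for row in matrix):
        raise ValueError("matrix must be square with at least 2 items")
    trace = sum(matrix[i][i] for i in range(k))   # sum of item variances
    total = sum(sum(row) for row in matrix)       # variance of the sum score
    return (k / (k - 1)) * (1 - trace / total)


# Hypothetical 3-item matrix with every inter-item correlation equal to 0.5.
example = [[1.0, 0.5, 0.5],
           [0.5, 1.0, 0.5],
           [0.5, 0.5, 1.0]]
alpha = cronbach_alpha_from_matrix(example)  # 0.75
```

With a correlation matrix as input, this reduces to the standardised alpha, k × r̄ / (1 + (k − 1) × r̄), where r̄ is the mean inter-item correlation.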
A sample of patients with stable disease activity (based on constant SELENA-SLEDAI and PGA scores between Weeks 8 and 12) was used to calculate the ICC as: \( \frac{\upsigma_{\mathrm{s}}^2}{\upsigma_{\mathrm{s}}^2+{\upsigma}_{\mathrm{e}}^2} \), where \( {\upsigma}_{\mathrm{s}}^2 \) and \( {\upsigma}_{\mathrm{e}}^2 \) are the between-subject and measurement error variances, respectively, from a random effects model [21].
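The ICC ratio above can be sketched for balanced test–retest data as follows. This is an assumption-laden illustration, not the PROC MIXED fit used in the study: it estimates the two variance components by the method of moments from a one-way random-effects ANOVA, truncates a negative between-subject estimate at zero, and requires the same number of measurements per patient. The function name `icc_one_way` and the example scores are hypothetical.

```python
def icc_one_way(scores_by_subject):
    """ICC = var_subject / (var_subject + var_error) for balanced data.

    scores_by_subject: one list per patient, each holding the same number
    of repeated scores (e.g. [week_8_score, week_12_score]).
    """
    n = len(scores_by_subject)
    k = len(scores_by_subject[0])
    if n < 2 or k < 2 or any(len(s) != k for s in scores_by_subject):
        raise ValueError("need >= 2 subjects with the same >= 2 measurements")
    means = [sum(s) / k for s in scores_by_subject]
    grand = sum(means) / n
    # Mean squares between and within subjects.
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for s, m in zip(scores_by_subject, means) for x in s
    ) / (n * (k - 1))
    var_error = ms_within
    var_subject = max(0.0, (ms_between - ms_within) / k)  # truncate at zero
    if var_subject + var_error == 0:
        raise ValueError("no variability in the data")
    return var_subject / (var_subject + var_error)


# Hypothetical Week 8 and Week 12 totals for three stable patients.
icc = icc_one_way([[10, 12], [20, 18], [30, 32]])
meets_standard = icc >= 0.70  # reliability threshold applied in this study
```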
The minimum standard for acceptable reliability for both internal consistency and test–retest reliability was ≥0.70 [24].
Construct validity
Spearman correlations between FACIT-Fatigue scores and the SELENA-SLEDAI scores, annualised flare rate, BILAG General and Musculoskeletal systems ratings, and the SF-36v2 were computed at baseline, and Weeks 24 and 52. Correlations ≥0.30 in absolute value were considered indicative of good convergent validity [25].
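For readers unfamiliar with the statistic, a Spearman correlation is a Pearson correlation computed on ranks, with tied values given their average rank. The sketch below shows this under stated assumptions: the function names and the example scores are hypothetical, the study itself used SAS, and both inputs must contain some variation or the denominator is zero.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


def indicates_convergent_validity(rho, threshold=0.30):
    """Apply the |rho| >= 0.30 criterion stated in the text."""
    return abs(rho) >= threshold


# Hypothetical FACIT-Fatigue totals and SF-36v2 domain scores.
rho = spearman([12, 25, 31, 40, 47], [30, 45, 40, 60, 70])
```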
Known-groups validity was tested by evaluating differences in mean FACIT-Fatigue scores at baseline, Week 24, and Week 52, across mutually exclusive groups of patients who differed in: (1) SELENA-SLEDAI (< 6, 6–9, ≥10), PGA (none/mild, moderate, severe); (2) BILAG Musculoskeletal and General measure scores (A/B vs C/D/E); (3) normal/high baseline levels of C3 and C4. Analysis of variance was conducted to test for statistically significant (p < 0.05) differences in mean FACIT-Fatigue scores across these criterion groups.
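The known-groups comparison rests on the one-way ANOVA F statistic, the ratio of between-group to within-group mean squares. A minimal sketch is given below; the function name `one_way_anova_f` and the example scores are hypothetical. The p-value is deliberately omitted because it requires the F distribution, which would come from a statistics library or, as in this study, from SAS.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of FACIT-Fatigue scores per
    mutually exclusive criterion group.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n <= k:
        raise ValueError("need >= 2 non-empty groups and n > number of groups")
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three SELENA-SLEDAI groups (< 6, 6-9, >= 10).
f_stat, df_b, df_w = one_way_anova_f([[35, 36, 37], [30, 31, 32], [25, 26, 27]])
```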
Confirmatory factor analysis
The measurement model of the FACIT-Fatigue was evaluated further using confirmatory factor analysis (CFA) methods appropriate for categorical data (for more details please see Supplementary Materials).
Ability to detect changes in SLE
The ability of the FACIT-Fatigue to detect change was evaluated using two different approaches: (1) by computing correlations between changes in FACIT-Fatigue scores and changes in SF-36v2 scales, SELENA-SLEDAI scores, rate of BILAG response, and PGA, with values interpreted as weak (r < 0.3), moderate (r ≥ 0.3 and < 0.5) or strong (r ≥ 0.5); and (2) by evaluating differences in mean changes in FACIT-Fatigue across change in PGA (improved vs same/worse) and BILAG response rate (≥50% vs < 50% of assessments).
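The interpretation bands in approach (1) can be expressed as a small helper, using cut-offs of 0.3 and 0.5. One assumption is made explicit here: the bands are applied to the absolute value of the correlation, so that a negative correlation (for example with a disease activity score) is classified by its magnitude. The function name `correlation_strength` is hypothetical.

```python
def correlation_strength(r):
    """Classify a correlation as weak, moderate or strong.

    Assumption: the cut-offs apply to |r|, so the sign is ignored.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    magnitude = abs(r)
    if magnitude < 0.3:
        return "weak"
    if magnitude < 0.5:
        return "moderate"
    return "strong"
```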
The rate of BILAG responses was calculated as the total number of BILAG responses (i.e. no new BILAG A organ domain score or 2 new BILAG B organ domain scores compared with baseline at the time of assessment) divided by the total number of assessments within the period considered. Estimated mean FACIT-Fatigue change scores and tests of statistical significance for differences between BILAG responder groups or PGA improvement groups at Week 24 and Week 52 were evaluated using the following model:
$$ \Delta \mathrm{FACIT}\mbox{-}{\mathrm{Fatigue}}_{\mathrm{i}\mathrm{j}}={\upbeta}_0+{\upbeta}_1{\mathrm{GROUP}}_{\mathrm{i}}+{\upbeta}_2{\mathrm{Week}}_{\mathrm{j}}+{\upbeta}_3{\mathrm{GROUP}}_{\mathrm{i}}\bullet {\mathrm{Week}}_{\mathrm{j}}+\mathrm{FACIT}\mbox{-}{\mathrm{Fatigue}}_{\mathrm{Baseline},\mathrm{i}} $$
where the subscript ij denotes the jth observation for the ith patient, GROUP (either BILAG response rate or PGA improvement) and Week (24 or 52) are fixed effects, and \( \mathrm{FACIT}\mbox{-}{\mathrm{Fatigue}}_{\mathrm{Baseline},\mathrm{i}} \) represents a continuous adjustment for the baseline score of the ith patient. An unstructured covariance matrix was used to take into account repeated measurements for the same patient [26].
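The GROUP variable for BILAG response in this model depends on the response rate defined earlier in this section, that is, the number of BILAG responses divided by the number of assessments in the period, dichotomised at 50%. A minimal sketch of that derivation follows. The function names, the group labels and the representation of each assessment as a boolean are hypothetical choices made for illustration.

```python
def bilag_response_rate(responses):
    """Proportion of assessments in the period classed as a BILAG response.

    responses: one boolean per assessment (True = BILAG response).
    """
    if not responses:
        raise ValueError("at least one assessment is required")
    return sum(1 for r in responses if r) / len(responses)


def bilag_responder_group(responses, cutoff=0.5):
    """Assign the >=50% vs <50% group used as GROUP in the model."""
    return ">=50%" if bilag_response_rate(responses) >= cutoff else "<50%"


# Hypothetical patient with four assessments, three of them responses.
rate = bilag_response_rate([True, True, False, True])     # 0.75
group = bilag_responder_group([True, True, False, True])  # ">=50%"
```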