Content validity of the FACIT-Fatigue was evaluated in qualitative interviews with patients with CLL in the first-line (1 L) setting or in the relapsed or refractory (R/R) setting. The interviews included cognitive debriefing of the instrument. Reliability and validity of the FACIT-Fatigue were assessed in patients with R/R CLL enrolled in a phase 3 trial assessing acalabrutinib in R/R CLL (ASCEND; NCT02970318).
The FACIT-Fatigue includes a five-item symptom subscale and an eight-item impact subscale that together make up the 13-item total score. Item responses range from 0 (‘not at all’) to 4 (‘very much’). Scores for negatively worded items are reversed, such that higher scores are better (i.e. less fatigue). The FACIT-Fatigue total score ranges from 0 to 52 (the general population mean score is 43 [5, 13]). The recall period for each item is the past 7 days.
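The scoring rule described above can be sketched as follows. This is a minimal illustration, not the official scoring algorithm: the item labels (`item1` to `item13`) and the choice of which items count as negatively worded are placeholders supplied by the caller, and handling of missing responses is not covered.

```python
from typing import Mapping, Set


def facit_fatigue_total(responses: Mapping[str, int],
                        negatively_worded: Set[str]) -> int:
    """Sum the 13 item responses (0-4) after reversing negatively worded items.

    Higher totals mean less fatigue; the possible range is 0 to 52.
    """
    if len(responses) != 13:
        raise ValueError("expected responses to all 13 items")
    total = 0
    for item, value in responses.items():
        if not 0 <= value <= 4:
            raise ValueError(f"response for {item} must be between 0 and 4")
        # Reverse negatively worded items so that a higher score is always better.
        total += (4 - value) if item in negatively_worded else value
    return total


# Hypothetical item labels and a hypothetical set of negatively worded items.
items = [f"item{i}" for i in range(1, 14)]
negative = set(items[:11])
no_fatigue = {it: (0 if it in negative else 4) for it in items}
print(facit_fatigue_total(no_fatigue, negative))  # best possible total score
```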
The European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire-Core 30-questions (EORTC QLQ-C30) contains five multi-item function scales (physical, role, cognitive, emotional, social), three symptom scales (fatigue, pain, nausea/vomiting), five single-item symptoms (dyspnea, insomnia, appetite loss, constipation, diarrhea), a global health status scale and a single-item financial impact question. High function scale or global health status scale scores represent a high level of functioning and a high quality of life, respectively, whereas a high symptom score represents a high level of symptomatology/problems.
EQ-5D-5L and EQ-VAS
The 5-level, 5-dimension EuroQol questionnaire (EQ-5D-5L) comprises five impairment-related dimensions (mobility, self-care, usual activities, pain/discomfort, anxiety/depression). Each dimension is rated from 1, indicating no problems, to 5, indicating extreme problems. Its global health visual analogue scale (EQ-VAS) is a 0–100 scale of a patient’s health status, where 0 represents the ‘worst health you can imagine’ and 100 the ‘best health you can imagine’.
Cognitive debriefing interviews
As part of a qualitative interview study, cognitive debriefing interviews were conducted with 40 patients with 1 L CLL or R/R CLL resident in the United States. Full methods and results of the concept elicitation part of the interview study have been published previously. Potential participants were identified via a patient advocacy organization (CLL Society; https://cllsociety.org) and two market research firms (Liberating Research, www.liberatingresearch.com; and Rare Patient Voice, https://rarepatientvoice.com), and were contacted by email and telephone about study details and participation. To be eligible, patients needed to be aged 18 years or older, be diagnosed with CLL, have a self-reported Eastern Cooperative Oncology Group (ECOG) Performance Status score ≤ 2, be proficient in English and have experienced at least one constitutional symptom of CLL (fatigue, weight loss, fever or night sweats) in the past week. Patients in the R/R CLL group had to have received two or more lines of treatment specifically to treat CLL.
The qualitative interviews were carried out by telephone and generally lasted 60–75 min in total for the concept elicitation and cognitive debriefing parts combined. Interviews were conducted by trained interviewers (O. Meyers, C. Krogh, S. Lee; IQVIA). As part of cognitive debriefing, patients completed the FACIT-Fatigue and were then asked to review it. Patients’ observations on the FACIT-Fatigue were grouped into feedback on the instrument instructions (clarity, difficulty understanding), individual items, response options and the questionnaire as a whole (missing and redundant items).
De-identified transcripts of patient interviews were coded using ATLAS.ti software (version 8). Two coders, who had also moderated most of the patient interviews, coded the results of and feedback on the FACIT-Fatigue. Inter-coder agreement was assessed periodically throughout the coding process, and any disagreement was discussed and addressed.
Psychometric analysis in CLL
Data for the psychometric analysis of the FACIT-Fatigue were from baseline assessments in the phase 3 ASCEND trial (NCT02970318), a multicenter, open-label study that enrolled patients with R/R CLL. Eligible patients were aged 18 years or older, had previously been treated with at least one systemic therapy and had an ECOG Performance Status score ≤ 2. Patients were randomized 1:1 to acalabrutinib 100 mg twice daily or investigator’s choice of therapy (either idelalisib 150 mg twice daily plus rituximab [375 mg/m² intravenously on day 1 of cycle 1, then 500 mg/m² intravenously every 2 weeks for 4 doses and thereafter every 4 weeks for 3 doses] or bendamustine 70 mg/m² intravenously on days 1 and 2 of each 28-day cycle plus rituximab [375 mg/m² intravenously on day 1 of cycle 1, then 500 mg/m² intravenously on day 1 of cycles 2–6]).
Patients in the ASCEND trial completed the FACIT-Fatigue, EORTC QLQ-C30, EQ-5D-5L and EQ-VAS at baseline and during the study. Mean total, subscale and item scores were calculated. The presence of floor effects (> 25% of patients scoring ‘worst possible health state’) and ceiling effects (> 25% of patients scoring ‘best possible health state’) was assessed.
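The floor and ceiling criteria above can be illustrated with a short sketch. The function name and the example scores are hypothetical; for the FACIT-Fatigue total score, the worst and best possible values are taken to be 0 and 52.

```python
from typing import Dict, Sequence, Union


def floor_ceiling(scores: Sequence[float], worst: float, best: float,
                  threshold: float = 0.25) -> Dict[str, Union[float, bool]]:
    """Flag a floor (ceiling) effect when more than `threshold` of patients
    score the worst (best) possible value."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores supplied")
    floor_share = sum(s == worst for s in scores) / n
    ceiling_share = sum(s == best for s in scores) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,      # strictly greater than 25%
        "ceiling_effect": ceiling_share > threshold,
    }


# Hypothetical total scores for eight patients (illustrative only).
result = floor_ceiling([0, 0, 10, 52, 30, 40, 45, 20], worst=0, best=52)
print(result)
```

Because the criterion is strictly greater than 25%, a share of exactly 25% at either extreme is not flagged.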
Confirmatory factor analysis
Confirmatory factor analysis was employed to evaluate the latent structure (i.e. underlying subscales) of the FACIT-Fatigue instrument. First, a single factor model of the FACIT-Fatigue was examined to determine the unidimensionality of all 13 items of the instrument. If the model fits the data well and all item factor loadings are greater than 0.3, the FACIT-Fatigue can be considered unidimensional. Next, a bifactor model was examined [14,15,16]. The bifactor model comprised a general factor defined by all 13 items, and two sub-domain factors defined by the five symptom items and the eight impact items, respectively. If all general factor item loadings are greater than 0.3 and each item loads more highly on the general factor than on its sub-domain factor, the general factor can be considered measurable even in the presence of sub-domain factors.
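The two loading criteria can be expressed as simple checks, sketched below. The function names and the example loadings are hypothetical, and the model-fit part of the unidimensionality criterion is passed in as a flag rather than computed here.

```python
from typing import Sequence


def is_unidimensional(loadings: Sequence[float], model_fits_well: bool,
                      min_loading: float = 0.3) -> bool:
    """Single factor criterion: good model fit and every loading above 0.3."""
    return model_fits_well and all(l > min_loading for l in loadings)


def general_factor_measurable(general: Sequence[float],
                              subdomain: Sequence[float],
                              min_loading: float = 0.3) -> bool:
    """Bifactor criterion: each item's general factor loading exceeds 0.3 and
    exceeds that item's loading on its own sub-domain factor."""
    if len(general) != len(subdomain):
        raise ValueError("one general and one sub-domain loading per item")
    return all(g > min_loading and g > s for g, s in zip(general, subdomain))


# Hypothetical loadings for four items (illustrative values only).
general = [0.82, 0.75, 0.68, 0.71]
subdomain = [0.30, 0.41, 0.12, 0.25]
print(general_factor_measurable(general, subdomain))
```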
Unidimensionality was evaluated by examining fit statistics of the confirmatory factor analysis (CFA) models and by investigating factor loadings to assess the relative impact of the secondary dimensions. The following fit indices were evaluated: the root mean square error of approximation (RMSEA), the standardized root mean square residual (SRMR) and the comparative fit index (CFI). The RMSEA and SRMR measure the discrepancy between the observed sample and the hypothesized model. The CFI is an incremental fit index with the null hypothesis that all components in the model are uncorrelated. In addition, factor loadings from single factor modeling and loadings on the general factor of the bifactor model were compared to assess the level of disturbance due to multidimensionality in the data. Standard cutoff values were used for RMSEA (< 0.06), SRMR (< 0.08) and CFI (> 0.95) [17,18,19,20].
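As a small illustration, the cutoff values stated above can be applied to a set of fit indices as follows. The function name and the example values are hypothetical; the fit indices themselves would come from the fitted CFA model.

```python
from typing import Dict


def fit_within_cutoffs(rmsea: float, srmr: float, cfi: float) -> Dict[str, bool]:
    """Compare fit indices with the cutoffs used in the text:
    RMSEA < 0.06, SRMR < 0.08 and CFI > 0.95."""
    return {
        "rmsea": rmsea < 0.06,
        "srmr": srmr < 0.08,
        "cfi": cfi > 0.95,
    }


# Hypothetical fit indices (illustrative values only).
checks = fit_within_cutoffs(rmsea=0.05, srmr=0.04, cfi=0.97)
print(checks, all(checks.values()))
```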
Model identification was ensured by restricting the factor variance to 1, making sure that at least three indicator variables per latent factor were considered and verifying that the number of datapoints was larger than the number of parameters to be estimated. Mplus 8.0 was used to perform the factor analyses. We employed the mean and variance-adjusted weighted least-squares (WLSMV) estimator, suitable for the analysis of categorical data, polychoric correlations and theta parameterization – in which residual variances of observed categorical outcome variables are allowed to be parameters in the models. A pairwise present approach to missing data was used, as it is the default in Mplus with the WLSMV estimator.
Internal consistency reliability
Internal consistency reliability is a measure that summarizes the correlations across instrument items. Cronbach’s coefficient α was used to assess internal consistency reliability of the FACIT-Fatigue symptom subscale, the impact subscale and the total scale. A Cronbach’s coefficient α ≥ 0.70 indicates acceptable reliability. In addition, McDonald’s omega (ω) and omega hierarchical (ωH) coefficients were calculated as they provide better estimates of measurement precision (reliability) than the traditional Cronbach’s α. Within the bifactor framework, ω estimates the proportion of variance in the unit-weighted total score attributable to all sources of common variance, and ωH the proportion attributable to the general factor alone [16, 25, 26]. A high ω value suggests a highly reliable multidimensional composite and a high ωH value (> 0.80), when a bifactor structure is employed, suggests that the general factor is the dominant source of systematic variance with sub-domain factors having less influence. The unidimensionality of the index was also evaluated by calculating the Explained Common Variance index (ECV) [27, 28]. Higher values of ECV indicate a strong general factor, allowing a unidimensional model to be fitted even to multidimensional data.
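The coefficients above can be illustrated with the following sketch. It assumes complete item data for Cronbach's α and, for ω, ωH and ECV, standardized loadings from a bifactor model with orthogonal factors in which each item loads on the general factor and exactly one sub-domain factor. The function names and example values are hypothetical and do not reproduce the Mplus analysis.

```python
from statistics import variance
from typing import Dict, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """rows: one list of item scores per respondent (complete data)."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def bifactor_indices(general: Sequence[float], specific: Sequence[float],
                     membership: Sequence[str]) -> Dict[str, float]:
    """omega, omega hierarchical and ECV from standardized bifactor loadings.

    general[i]   : loading of item i on the general factor
    specific[i]  : loading of item i on its own sub-domain factor
    membership[i]: label of the sub-domain factor item i belongs to
    """
    gen_part = sum(general) ** 2
    sums: Dict[str, float] = {}
    for lam, label in zip(specific, membership):
        sums[label] = sums.get(label, 0.0) + lam
    spec_part = sum(s ** 2 for s in sums.values())
    # Residual variance of each standardized item.
    residual = sum(1 - g ** 2 - s ** 2 for g, s in zip(general, specific))
    total = gen_part + spec_part + residual
    gen_sq = sum(g ** 2 for g in general)
    spec_sq = sum(s ** 2 for s in specific)
    return {
        "omega": (gen_part + spec_part) / total,
        "omega_h": gen_part / total,
        "ecv": gen_sq / (gen_sq + spec_sq),
    }


# Hypothetical loadings for four items in two sub-domains (illustrative only).
print(bifactor_indices([0.8] * 4, [0.3] * 4, ["symptom", "symptom", "impact", "impact"]))
```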
Construct validity examines the relationship among scales that measure similar concepts (convergent validity) and among scales that measure different concepts (divergent validity). Convergent validity and divergent validity were assessed to explore associations between the FACIT-Fatigue and the EORTC QLQ-C30 and EQ-VAS, using Spearman’s rank correlation coefficients. Spearman’s rank correlation coefficients ≥ 0.50 were considered to demonstrate convergent validity; Spearman’s rank correlation coefficients < 0.30 demonstrated divergent validity. Moderate to high correlations were expected between the FACIT-Fatigue scores and the fatigue scale from the EORTC QLQ-C30, supporting convergent validity. Although fatigue is likely to affect most aspects of quality of life, low correlations were expected between the FACIT-Fatigue scores and gastrointestinal-related scales (e.g. constipation) from the EORTC QLQ-C30, supporting divergent validity.
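A minimal sketch of the correlation step is given below, using average ranks for ties. One assumption is labeled explicitly: the thresholds are applied to the absolute value of the coefficient, because higher FACIT-Fatigue scores indicate less fatigue whereas higher EORTC QLQ-C30 symptom scores indicate more symptoms, so a negative sign is expected for those pairs. The function names and the "intermediate" label are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def classify(rho: float) -> str:
    """Assumption: thresholds are applied to |rho|."""
    size = abs(rho)
    if size >= 0.50:
        return "convergent"
    if size < 0.30:
        return "divergent"
    return "intermediate"
```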
Known-groups validity is a form of construct validity that explores whether scales differentiate between groups that are hypothesized a priori to differ. FACIT-Fatigue scores between groups known to be different were compared using analysis of variance (ANOVA) on baseline data. Known-groups comparisons were explored based on ECOG Performance Status score (0 [fully active] vs 1 or 2 [restricted activity but still ambulatory and capable of all self-care]), Hb level (≥ 110 g/L [no/mild anemia] vs < 110 g/L [moderate/severe anemia]) and constitutional symptoms (night sweats, fever, unexplained weight loss, significant fatigue [none vs ≥ 1 symptom]). It was hypothesized that patients with an ECOG score ≥ 1, with moderate to severe anemia or with constitutional symptoms would have lower FACIT-Fatigue scores (more fatigue) than patients with an ECOG score of 0, with no or mild anemia or with no constitutional symptoms. The following baseline covariates were included: sex (male vs female) and age. Because so few patients had an ECOG Performance Status score of 2, the known groups employed here differ from the stratification of 0 or 1 vs 2 which was used as part of the stratified randomization in the ASCEND trial.
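The grouping rules above can be written out as a short sketch. The function name, argument names and group labels are hypothetical; the subsequent ANOVA with sex and age as covariates is not shown.

```python
from typing import Dict


def known_groups(ecog: int, hb_g_per_l: float,
                 n_constitutional_symptoms: int) -> Dict[str, str]:
    """Assign a patient to the three known-groups comparisons in the text."""
    if ecog not in (0, 1, 2):
        raise ValueError("ECOG Performance Status must be 0, 1 or 2")
    if n_constitutional_symptoms < 0:
        raise ValueError("symptom count cannot be negative")
    return {
        # ECOG 0 vs 1 or 2
        "ecog": "0" if ecog == 0 else "1-2",
        # Hb >= 110 g/L (no/mild anemia) vs < 110 g/L (moderate/severe anemia)
        "anemia": "no/mild" if hb_g_per_l >= 110 else "moderate/severe",
        # none vs at least one constitutional symptom
        "constitutional": "none" if n_constitutional_symptoms == 0 else ">=1",
    }


# Hypothetical patient (illustrative values only).
print(known_groups(ecog=1, hb_g_per_l=104.0, n_constitutional_symptoms=2))
```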
Defining the severity cut-off score
Severity cut-off scores were explored for differentiating between patients with low symptom levels and those with higher symptom levels, to define a severe fatigue population. Cluster analysis was performed to identify a FACIT-Fatigue severity cut-off score. Clusters were formed using the FACIT-Fatigue symptom subscale and EORTC QLQ-C30 Fatigue scale scores on one hand, and using individual FACIT-Fatigue and EORTC QLQ-C30 item scores on the other hand. Scores were first standardized on their ranges to equalize the influence of variables with different scale lengths on the cluster solution. A two-step cluster analysis using SPSS was then used to determine the cluster membership. An analysis by Cella et al. suggested one standard deviation (SD) below the general population mean of 43 (SD: 9) to denote the threshold for fatigue impairment, resulting in a cut-off value of 34. In addition, a FACIT-Fatigue threshold of 30 for fatigue impairment, suggested by Piper and Cella, was also considered in our analysis. Agreement between the clusters and thresholds was assessed using Cohen’s kappa coefficient. Cohen characterized values ≤ 0 as indicating no agreement, and 0.01–0.20 as slight, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial and 0.81–1.00 as almost perfect agreement.
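The threshold arithmetic, range standardization and agreement statistic can be sketched as follows. Two assumptions are labeled: impairment is taken to mean a score strictly below the cut-off (the text does not state whether the cut-off value itself counts as impaired), and the example scores are hypothetical. The two-step cluster analysis itself is not reproduced here.

```python
from typing import List, Sequence


def range_standardize(values: Sequence[float], low: float, high: float) -> List[float]:
    """Rescale scores to 0-1 using the scale's possible range."""
    return [(v - low) / (high - low) for v in values]


def impaired(scores: Sequence[float], cutoff: float) -> List[bool]:
    """Assumption: impairment means a score strictly below the cut-off."""
    return [s < cutoff for s in scores]


def cohen_kappa(a: Sequence, b: Sequence) -> float:
    """Cohen's kappa for two classifications of the same patients.
    Undefined (division by zero) if chance agreement equals 1."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    expected = sum((list(a).count(l) / n) * (list(b).count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)


def agreement_label(kappa: float) -> str:
    """Bands as listed in the text."""
    if kappa <= 0:
        return "no agreement"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# One SD below the general population mean: 43 - 9 = 34.
cutoff_cella = 43 - 9
scores = [20, 33, 34, 50]  # hypothetical FACIT-Fatigue total scores
kappa = cohen_kappa(impaired(scores, cutoff_cella), impaired(scores, 30))
print(kappa, agreement_label(kappa))
```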