Study design and population
Data for this analysis are from the RAY trial, a randomized, controlled, open-label, multicenter Phase 3 study of ibrutinib (n = 139) versus temsirolimus (n = 141) conducted in 21 countries [13]. To be included in the RAY trial, patients had to have received at least one previous rituximab-containing chemotherapy regimen, have documented relapse or disease progression after the last anti-mantle-cell lymphoma treatment, have measurable disease by Revised Response Criteria for Malignant Lymphoma [15], and have an Eastern Cooperative Oncology Group (ECOG) performance status of 0 or 1 [16]. Treatment was administered orally once a day on continuous cycles (ibrutinib) or intravenously on days 1, 8 and 15 of each 21-day cycle (temsirolimus). Disease progression was assessed by an independent review committee using the revised International Working Group Criteria for non-Hodgkin’s lymphoma [17]. Clinical cutoff was defined as the time at which approximately 178 progression-free survival events had been observed.
Patient-reported outcomes
Patient-reported outcomes (PROs) in the RAY trial were assessed using the FACT-Lym and EQ-5D-5L, administered on day 1 of every treatment cycle during the first 6 months, then every 9 weeks up to 15 months after the first dose of the study drug. Beyond that point, PROs were collected every 24 weeks until disease progression, death, clinical cutoff (FACT-Lym) or study end (EQ-5D-5L), whichever came first. Instruments were administered at the beginning of clinic visits, prior to any procedures or physician interventions. The main PRO methods and results are presented in Hess et al. [18].
FACT-8D
The FACT-8D was developed to contribute utility weights for cost-effectiveness analysis in cancer and was derived from secondary analysis of FACT-G results from 17 pooled data sets, which included 6912 patients encompassing 14 primary cancer sites [19]. Items were selected based on a series of psychometric analyses, including assessment of response distribution, confirmatory factor analysis, Rasch analysis, sensitivity to clinical features, and responsiveness. A patient survey was also performed to assess the relative importance of items within each domain.
The items in the FACT-8D cover 8 dimensions of health (pain, fatigue, nausea, problems sleeping, problems doing work, problems with support from family/friends, sadness, worries about health) with 5 levels of severity in each dimension (None, A little bit, Some, Quite a bit, Very much) over the past 7 days. The instrument generates 5^8 = 390,625 possible health states.
To date, societal valuation of the FACT-8D has only been conducted in Australia [20]. Valuations were elicited using a Discrete Choice Experiment (DCE) from a panel of individuals (n = 1737) drawn from the general population. Health states were presented in combination with duration. Utility decrements, expressed as coefficients, were derived for each level of each of the eight dimensions. Index scores for FACT-8D health states range from −0.5 to 1.0.
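As an illustration of how a decrement-based index of this kind can be computed, the following minimal Python sketch counts the health states and scores one state. It assumes an additive form (1 minus the sum of the decrements for the reported levels), levels coded 0 to 4, and the same decrements in every dimension. The decrement values are hypothetical placeholders, not the published Australian value set [20], and they do not reproduce the published range of index scores.

```python
# Dimension names follow the text; everything numeric below is HYPOTHETICAL.
DIMENSIONS = [
    "pain", "fatigue", "nausea", "sleep",
    "work", "support", "sadness", "worry",
]
N_LEVELS = 5  # None, A little bit, Some, Quite a bit, Very much

# Placeholder decrements per level (level 0 = "None" carries no decrement).
# These are NOT the published Australian weights.
HYPOTHETICAL_DECREMENTS = {
    dim: [0.0, 0.02, 0.05, 0.10, 0.15] for dim in DIMENSIONS
}


def count_health_states(n_dimensions=len(DIMENSIONS), n_levels=N_LEVELS):
    """Number of distinct health states: levels raised to the number of dimensions."""
    return n_levels ** n_dimensions


def index_score(state, decrements=HYPOTHETICAL_DECREMENTS):
    """Additive index (an assumption): 1 minus the summed decrements.

    `state` maps each dimension name to a level coded 0 (None) to 4 (Very much).
    """
    missing = set(DIMENSIONS) - set(state)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return 1.0 - sum(decrements[dim][state[dim]] for dim in DIMENSIONS)


best_state = {dim: 0 for dim in DIMENSIONS}
worst_state = {dim: N_LEVELS - 1 for dim in DIMENSIONS}
```

With these placeholders, `index_score(best_state)` equals 1.0 by construction. A real application would substitute the published coefficients for each level of each dimension.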
In the present study, FACT-8D scores were derived from responses on the FACT-G, which is incorporated into the FACT-Lym. The FACT-Lym consists of the four Functional Assessment of Cancer Therapy - General (FACT-G) subscales (physical, social, functional, and emotional well-being) and a 15-item lymphoma-specific additional concerns subscale (LymS). Two summary scores can be calculated: the FACT-Lym total score (FACT-G + LymS) and the FACT-Lym trial outcome index (TOI) score (physical well-being + functional well-being + LymS), with higher scores representing better outcomes.
EQ-5D-5L
The EQ-5D-5L [5] measures health status in five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression) each with five levels of severity (no problems, slight problems, moderate problems, severe problems and either extreme problems or unable to perform activity). Respondents describe their health status ‘today’ by selecting one statement in each dimension.
Societal preference weights generated using trade-off techniques are available in the form of a single index value for each of the 3125 possible health states derived from the descriptive system (EQ-5D-5L Index). We used UK and English values to assign preference weights to EQ-5D-5L health states in the current study, the first via the use of a crosswalk algorithm developed to map EQ-5D-3L values to the 5L version [21], and the second by using values elicited directly for EQ-5D-5L health states using time trade-off (TTO) and DCE in a representative sample of the general population of England [22]. Values range from −0.594 to 1 in the crosswalk set and from −0.285 to 1 in the England value set (EVS). For the present study, analyses involving EQ-5D-5L were performed using both the crosswalk value set and the EVS to be able to compare results using the two scoring systems [23].
The EQ-5D-5L also includes a ‘0’ (worst imaginable health) to ‘100’ (best imaginable health) visual analogue scale (EQ VAS) on which respondents rate their overall health.
Analysis
Given that between-arm comparisons were not relevant for the study objectives, all analyses were performed on pooled trial data and the analysts were blinded to treatment arm. Because of substantial drop-out (complete EQ-5D and FACT-8D data were available from 250 patients at baseline, 130 patients [51%] at week 31, and 87 [34%] at week 58), we restricted the analysis to data collected up to week 31 to ensure sufficient numbers of patients for all analyses. Patients who died were considered missing rather than being given a utility value of 0. Any differences in sample size from earlier studies reporting results from the RAY trial [13, 18] arise because our analyses were performed only on patients with available PRO data.
All analyses were carried out in SAS version 9.4 and results were considered statistically significant at p < 0.05.
Descriptive analysis
Means and standard deviations (SD) or absolute numbers and proportions were used to describe responses by dimension and for EQ-5D-5L and FACT-8D Index scores at baseline and at 4, 7, 16, and 31 weeks.
Ceiling and floor effects were calculated for the descriptive systems of the two instruments as the n (%) of patients reporting best (ceiling effect) or worst health state (floor effect).
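The ceiling and floor calculation described above can be sketched as follows. This is a minimal illustration only: it assumes each patient's responses are coded as a tuple of levels from 1 (best) to 5 (worst), as in the EQ-5D-5L descriptive system, and that a patient counts towards the ceiling (or floor) only when reporting the best (or worst) level on every dimension. The function name and data layout are hypothetical.

```python
def ceiling_floor(responses, best_level=1, worst_level=5):
    """Return n and % of patients in the best and worst health state.

    `responses` is a list with one tuple of dimension levels per patient.
    The 1-to-5 level coding is an assumption made for this sketch.
    """
    n = len(responses)
    if n == 0:
        raise ValueError("no responses supplied")
    n_ceiling = sum(all(level == best_level for level in r) for r in responses)
    n_floor = sum(all(level == worst_level for level in r) for r in responses)
    return {
        "ceiling": (n_ceiling, 100.0 * n_ceiling / n),
        "floor": (n_floor, 100.0 * n_floor / n),
    }


# Made-up example data: four patients, five dimensions each.
example = [
    (1, 1, 1, 1, 1),
    (1, 2, 1, 1, 1),
    (5, 5, 5, 5, 5),
    (1, 1, 1, 1, 1),
]
result = ceiling_floor(example)
```

For an instrument coded from 0 (best) to 4 (worst), the same function can be called with `best_level=0` and `worst_level=4`.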
Convergent validity
Convergent validity was assessed by calculating the correlation between the two preference-based measures (FACT-8D and EQ-5D-5L) and the FACT-Lym total score, the TOI, the LymS, the EQ VAS, and haemoglobin (HgB) levels, which were used as an indirect indicator of fatigue. We hypothesised that FACT-8D Index scores would show a strong correlation with the LymS (and, given the shared items, with the FACT-Lym total score and TOI), moderate correlations with the EQ VAS, and weak or no correlation with HgB (continuous values). A similar pattern of correlations was expected for EQ-5D-5L, though with a potentially weaker correlation with the FACT-Lym scores, due to the lack of shared content and its generic rather than disease-specific perspective. It was expected that EQ-5D-5L would show a weaker correlation with HgB than the FACT-8D, given that the latter includes a fatigue domain and EQ-5D-5L does not. Pearson’s correlation coefficient was used to calculate correlations between continuous measures and Spearman’s rank correlation was used when at least one of the variables was categorical. Correlations were classed as non-existent or weak (0–0.2), moderate (0.2–0.5), or strong or very strong (> 0.5) [24].
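The two correlation coefficients and the classification bands above can be sketched in Python as follows. This is an illustrative sketch, not the SAS code used in the study. It assumes that the bands apply to the absolute value of the coefficient, that a value of exactly 0.2 is classed as weak, and that a value of exactly 0.5 is classed as moderate; the source does not state how band boundaries were handled.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def classify_correlation(r):
    """Bands from the text; boundary handling is an assumption of this sketch."""
    size = abs(r)
    if size > 0.5:
        return "strong or very strong"
    if size > 0.2:
        return "moderate"
    return "non-existent or weak"
```

Computing Spearman's coefficient as the Pearson correlation of average ranks handles tied categories, which matter when one variable is an ordinal score with few levels.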
Known-groups validity
Mean (SD) FACT-8D and EQ-5D-5L Index scores were estimated and compared for groups classified according to the following variables at baseline: presence of lymphoma symptoms, ECOG performance status, simplified Mantle Cell Lymphoma International Prognostic Index (MIPI) score with patients classified as low, medium, and high risk [25], haemoglobin (HgB) levels (dichotomised as below versus above 120 g/l for women and 130 g/l for men [26]), and number of previous lines of therapy. ECOG performance status is assigned by the attending clinician, with patients being classified in one of six categories: 0 (fully active, no performance restriction); 1 (restricted in physically strenuous activity but ambulatory, able to carry out work of a light or sedentary nature); 2 (ambulatory and capable of all self-care but unable to carry out any work activities; up and about more than 50% of waking hours); 3 (capable of only limited self-care, confined to bed or chair more than 50% of waking hours); 4 (completely disabled, cannot carry out any self-care; totally confined to bed or chair); or 5 (dead) [16].
Between-groups comparisons were carried out using ANCOVA models with adjustment for potential confounders. In all models, the potential confounders included were age, gender, ECOG status, MIPI, and prior lines of therapy, except when ECOG status or MIPI was itself the grouping variable, in which case it was not also included as a confounder.
Responsiveness
Responsiveness was assessed by analysing the extent to which the FACT-8D and EQ-5D-5L reflected change on the following variables: patients showing deterioration vs no change vs improvement on the FACT-Lym total score and the LymS, using previously defined minimal important difference (MID) thresholds of 6.5 points for the FACT-Lym and 5 points for the LymS [27, 28]; change in ECOG status; and change in HgB. Effect sizes, calculated using Cohen’s d, were used to show the magnitude of change and categorised as small (0.2), medium (0.5), or large (0.8 or over) [24]. For this analysis, data from the baseline (n = 250) and week 31 (n = 130) visits were used.
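The classification of change by MID and the effect size calculation can be sketched as follows. Several details are assumptions of this sketch rather than statements about the study's method: a change exactly equal to the MID is counted as a change, Cohen's d is computed with the pooled standard deviation of two sets of scores (one common formulation), values between the anchor points are banded by the nearest lower anchor, and the label "negligible" for values below 0.2 is added here for completeness.

```python
import math
from statistics import mean, stdev

# MID thresholds stated in the text (points on each scale).
MID = {"FACT-Lym total": 6.5, "LymS": 5.0}


def classify_change(baseline, follow_up, mid):
    """Deterioration / no change / improvement, where higher scores are better."""
    change = follow_up - baseline
    if change >= mid:
        return "improvement"
    if change <= -mid:
        return "deterioration"
    return "no change"


def cohens_d(scores_a, scores_b):
    """Cohen's d (mean of b minus mean of a) using the pooled SD (an assumption)."""
    n_a, n_b = len(scores_a), len(scores_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each set of scores needs at least two values")
    pooled_var = (
        (n_a - 1) * stdev(scores_a) ** 2 + (n_b - 1) * stdev(scores_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(scores_b) - mean(scores_a)) / math.sqrt(pooled_var)


def effect_size_label(d):
    """Anchors from the text; the banding between them is an assumption."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"
```

For example, `classify_change(100, 107, MID["FACT-Lym total"])` returns "improvement", because the 7-point gain exceeds the 6.5-point threshold.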
Responsiveness was also assessed by analysing the correlation between changes on the two preference-based measures and changes on the FACT-Lym total score, TOI, LymS, EQ VAS, and HgB, using Pearson’s correlation coefficient.
EQ-5D-5L and fatigue
Given the importance of fatigue in this population, a cross-sectional exploratory analysis was also conducted to assess the sensitivity of the EQ-5D-5L to fatigue. Crosswalk and EVS utilities were calculated by response level on the FACT-G/FACT-8D and LymS ‘lack of energy’ and ‘tiring easily’ items.
Analysis by ECOG status
Only patients with ECOG performance status 0 or 1 were included in the RAY trial and, of those, 47.9% were classed as ECOG 0 (fully active, able to carry on all pre-disease performance without restriction). Real-world data suggest this may not be representative of relapsed/refractory patients in UK clinical practice [29]. We therefore included an exploratory analysis to investigate differences in utility between patients who were ECOG 0 at baseline and those who were ECOG 1. We analysed differences between the two groups on the FACT-8D and EQ-5D-5L dimensions at baseline and compared change over time using mean Index scores for all available patients at each visit.