Measuring patient-reported physical functioning and fatigue in myelodysplastic syndromes using a modular approach based on EORTC QLQ-C30

Physical functioning and fatigue are key patient concerns in myelodysplastic syndromes (MDS), chronic myelomonocytic leukemia (CMML), and acute myeloid leukemia (AML). The objective of this research was to generate supportive quantitative evidence for modular physical functioning and fatigue measures based on the European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire Core 30 items (QLQ-C30) and a customized selection of 10 supplemental items from the EORTC Item Library. The 40 items were completed online cross-sectionally by 51 patients (higher risk [HR] MDS: 53%; CMML: 26%; AML: 10%). Psychometric analyses based on Rasch measurement theory (RMT) were conducted on the QLQ-C30 physical functioning and fatigue domains as well as measures combining QLQ-C30 and supplemental items. A measure of anemia-related symptoms composed of QLQ-C30 and supplemental items covering fatigue, dyspnea, and dizziness was also investigated. The QLQ-C30 physical functioning and fatigue domains showed good targeting to the sample and adequate reliability, with few conceptual gaps identified. Combining the QLQ-C30 and supplemental physical functioning and fatigue items improved the conceptual coverage and the reliability of the measures. The patient-reported anemia-related symptom measure showed good measurement performance, underpinned by a clinically meaningful characterization of severity of these symptoms over a spectrum, starting with fatigue, then dyspnea, and finally dizziness (most severe). The modular measurement approach of combining EORTC QLQ-C30 and Item Library offers a promising pragmatic solution to the measurement of physical functioning and fatigue, as well as anemia-related symptoms in clinical trials conducted in HR MDS, CMML, and AML.


Introduction
Myelodysplastic syndromes (MDS), chronic myelomonocytic leukemia (CMML), and acute myeloid leukemia (AML) are rare blood cancers that affect the myeloid cells. They are associated with anemia, neutropenia, and thrombocytopenia, which lead to a variety of symptoms and functional impacts for affected patients [1]. Measuring patient-reported outcomes (PROs) is key to understanding the patient experience in the field of hematology in both research and practice [2,3]. Three core PRO concepts have been identified for oncology trials and should be measured to integrate the experience of patients and to demonstrate the benefit of new treatments as well as inform decision making: physical functioning, disease-related symptoms, and symptomatic adverse events [4]. The measurement of these core PRO concepts in oncology trials is increasingly performed using a modular approach. In a modular measurement approach, only the key concepts for a specific context of use are carefully selected and measured in the study, using items that are thoughtfully selected from existing static questionnaires and item banks or libraries [4,5]. This approach allows a bespoke and efficient measurement of the concepts that are meaningful to patients.
Assessing symptomatic adverse events of a new treatment for MDS, CMML, and AML will be dependent on the treatment under consideration, as different treatments may have different safety profiles. Patient reporting of symptomatic adverse events would typically be done using the PRO-Common Terminology Criteria for Adverse Events (PRO-CTCAE) [6,7], with a selection of PRO-CTCAE items defined on a case-by-case basis depending on the expected adverse event symptom profile of each treatment [8].
In contrast, a measurement strategy for physical functioning and disease-related symptoms in MDS, CMML, and AML can be achieved independently of the treatment under consideration. For this purpose, one must identify the specific, key concepts to measure and the most appropriate instrument to use to assess these symptoms and aspects of physical functioning in this context of use. Previous research identified the key concepts of importance for patients and organized them into a conceptual model for the measurement of PROs in higher risk (HR) MDS, CMML, and AML [9]. Fatigue was identified as a core symptom, as well as other anemia-related symptoms, such as dyspnea and dizziness.
A modular approach based on the European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire-Core 30 items (QLQ-C30) and Item Library is most appropriate to measure physical functioning and core symptoms of HR MDS, CMML, and AML [9]. The EORTC QLQ-C30 alone already includes domain scores for physical functioning and core symptoms of HR MDS, CMML, and AML, such as fatigue, and has shown good measurement performance for this population [10]. A modular approach combining the QLQ-C30 and a customized selection of items from the EORTC Item Library makes it possible to bridge the conceptual gaps identified in the QLQ-C30 relative to the framework previously created for HR MDS, CMML, and AML. Ten items from the EORTC Item Library were identified as potential supplementary items to the QLQ-C30 to create a customized measure of key PROs to capture clinical benefit in HR MDS, CMML, and AML for inclusion in clinical trials. This modular approach also has the advantage of offering a pragmatic solution to the challenges associated with the measurement of PROs in rare diseases [11], such as MDS, AML, or CMML, as it builds on existing established material and does not require researchers to develop new, disease-specific PRO measures.
The objective of this research was to generate supportive quantitative evidence on the appropriateness of using the EORTC QLQ-C30 and 10 supplemental items from the EORTC Item Library to assess key PROs for people with HR MDS, CMML, and AML; specifically, physical functioning, fatigue, and other anemia-related symptoms.

Methods
The items of the EORTC QLQ-C30 and 10 supplemental items chosen from the EORTC Item Library were compiled into an online survey. The survey was administered to people living with HR MDS, CMML, and AML at one time point.

Patient sample
Patients were recruited using convenience sampling (non-probability sampling) through the MDS Foundation, social media, and market research agencies. Patients were included if they met all of the following criteria: were willing to consent; spoke, read, and understood English; were located in the US at the time of survey completion; were 18 years old or older; had a self-reported diagnosis of HR MDS, CMML, or AML; and had a patient-rated Eastern Cooperative Oncology Group (ECOG) status of 0, 1, or 2 [12]. Patients were excluded if they met any of the following criteria: were receiving daunorubicin or idarubicin; had participated in a clinical trial for treatments related to MDS, CMML, or AML within the previous 2 weeks; underwent major surgery within the previous 2 weeks; had a diagnosis or had been treated for another cancer in the previous 2 years; had a life-threatening illness unrelated to cancer; had a visual, cognitive, or linguistic impairment preventing them from understanding and answering the survey questions; or had a patient-rated ECOG status of 3 or 4.
Recruitment was stopped when a total of 50 participants completing the online survey was reached. Given the practical challenges of recruitment in a rare disease context, a sample of 50 patients was deemed sufficient to generate early evidence on the item sets under scrutiny using the anticipated methods, especially as the items included in the analyses were carefully selected to be appropriate to the context of use and were therefore expected to be well targeted to the sample.

Ethics
All study documents and procedures were approved by an independent institutional review board (Copernicus Group IRB).

EORTC QLQ-C30 and item library
The EORTC QLQ-C30 questionnaire is a standard PRO instrument in cancer patient populations [13]. It has five functional scales, eight symptom scales, a finance item, and a global health and Quality of Life (QoL) scale (Table 1). All items have four response choices, except the global health QoL scale, which has seven. QLQ-C30 scale scores range from 0 to 100. Higher scores represent better functioning/health status for the functioning scales and the global health status/QoL, but more severe symptoms for the symptom scales.
The EORTC Item Library is an online platform that compiles more than 900 individual items from over 60 EORTC questionnaires [14]. Users can select ad-hoc sets of items that were developed with the same general principles as the EORTC QLQ-C30; items have the same response options and recall period as the QLQ-C30 and, for some items, translations are already available in a number of languages.
In our research, a set of 10 items from the EORTC Item Library was identified based on the conceptual model resulting from previous qualitative research and testing [9] (Table 1). These items were selected because they measured concepts strongly endorsed by patients or considered a core symptom/impact by clinicians, were not primarily considered a side-effect of treatment, and could potentially capture a treatment benefit. These 10 items were shown to be clear and relevant in cognitive debriefing interviews with patients with HR MDS, CMML, or AML. Of note, in addition to the items added to measure physical functioning (Difficulty climbing stairs, Feeling slowed down), fatigue (Weakness in arms or legs, Becoming easily tired, Lacking energy), and other anemia-related symptoms (Shortness of breath on exertion, Having to stop for breath when walking, and Dizziness), two items assessing role functioning were also selected to be tested as complementary to the original QLQ-C30 role functioning (RF) domain.

Statistical analyses
Responses to the EORTC QLQ-C30 and 10 supplemental items from the EORTC Item Library were described. Psychometric analyses were performed in the Rasch measurement theory (RMT) framework. RMT uses a mathematical model, the Rasch model, to examine the legitimacy of creating scores from a set of items [15][16][17]. The following properties were explored in the RMT framework. Scale-to-sample targeting was assessed by visual inspection of the relative distribution of item locations and person measurements on their shared continuum. Adequacy of response scales was assessed by examining whether all 'item thresholds' (i.e., the points on the continuum where the most probable response changes from one response category to the adjacent one) were properly ordered. Item fit was assessed by joint examination of statistical parameters (log residuals and chi-square values comparing observed and expected responses to an item) and item characteristic curves (ICC) that graphically display the expected responses across the continuum of person scores and the observed values for each class interval of person scores. Fit residuals outside the recommended range of −2.5 to +2.5 and statistically significant chi-square values are indicative of possible fit issues [17]. [18]. Local item dependency was assessed by examining the correlations between the standardized item residuals. Any positive correlation greater than 0.3 was considered as possibly indicative of item dependency [17].
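As an illustration of what an ordered item threshold means, the following minimal sketch computes response category probabilities for a single four-category item. It assumes the partial credit parameterization of the polytomous Rasch model; the software and exact parameterization used in the study are not stated here, and the function names and threshold values are hypothetical.

```python
import math


def category_probabilities(theta, thresholds):
    """Category probabilities for one item under the polytomous Rasch
    (partial credit) model.

    theta: person location in logits.
    thresholds: item thresholds in logits; m thresholds imply m + 1 categories.
    """
    # Category k has exponent sum_{j<=k} (theta - tau_j); category 0 has exponent 0.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    # Subtract the largest exponent before exponentiating, for numerical stability.
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_ordered(thresholds):
    """True when each threshold is strictly higher than the previous one."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


# Hypothetical thresholds for a 'Not at all' ... 'Very much' item.
example_thresholds = [-1.5, 0.2, 1.8]
print(thresholds_ordered(example_thresholds))
print(category_probabilities(0.2, example_thresholds))
```

At a person location equal to a threshold, the two adjacent categories are equally probable under this model. When thresholds are ordered, each response category is therefore the most probable one over some interval of the continuum, which is what the response scale check looks for.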
Two sets of items for physical functioning and fatigue were tested separately using the Rasch model: first, items from the EORTC QLQ-C30 PF and FA domains and then the combination of the same QLQ-C30 items with the supplementary items from the EORTC Item Library. All items reflecting patient-reported anemia-related symptoms (i.e., items pertaining to fatigue, dyspnea, and dizziness) were also tested together in a separate Rasch model. The Rasch model was also applied to the items pertaining to role functioning, but the results of these analyses are not presented here.

Patient demographics
A total of 51 patients participated in the online survey, with most patients diagnosed with HR MDS (HR MDS: 53%; CMML: 26%; AML: 10%). All patients confirmed a diagnosis of HR MDS, CMML, or AML to be included in the study. However, 6 patients did not report their exact diagnosis. The mean age was 68 years (SD: 12) and 49% of patients were female (Table 2).

Description of responses to EORTC QLQ-C30 and item library items
A possible floor effect for several EORTC QLQ-C30 items was identified (i.e., a substantial percentage of responders reported the lowest value ['Not at all']), whereas the responses to the supplemental items were well distributed across the response categories, with the exception of "dizziness" for which a greater number of patients responded 'Not at all' (Fig. 1). This confirmed that the customized supplemental item selection was well targeted to the patient sample overall.

RMT analysis of physical functioning items
The EORTC QLQ-C30 PF items demonstrated acceptable measurement performance: good targeting of the items to the patient sample (Fig. 2), adequate reliability, fit of all items to the Rasch model, and intended function of response scales for all items but one (hygiene- (Table 3 and Supplementary material 1 and 2). No pairs of items showed high correlations in standardized residuals (see Supplementary material 1). Some gaps in the coverage of the physical functioning continuum were uncovered; for example, few items provided an opportunity to differentiate between scores of patients with higher levels of physical functioning (Table 3).
The addition of the two physical functioning items from the EORTC Item Library to the QLQ-C30 PF items bridged some gaps in the coverage of the functioning continuum, especially thresholds between 0 and 1 logit and between 2 and 4 logits (where no threshold was originally present) and at about 5 logits (Fig. 2). The new items therefore added information for a part of the physical functioning continuum where most patients were distributed. It also marginally improved reliability, without adding any issue in terms of fit, item dependency, or adequacy of the response options (Table 3 and Supplementary material 1).

RMT analysis of fatigue and other anemia-related symptoms
The EORTC QLQ-C30 FA items demonstrated acceptable measurement performance: good targeting of the items to the patient sample (Fig. 3), adequate reliability, fit of all items to the Rasch model, and intended functioning of all response scales (Table 4 and Supplementary material 1 and 2). No pairs of items showed high correlations in standardized residuals (see supplementary material 1). However, some gaps in the coverage of the fatigue continuum were uncovered (Fig. 3).
The addition of the three fatigue items from the EORTC Item Library to the QLQ-C30 FA items bridged some gaps in the coverage of the fatigue continuum (Table 4). It also improved reliability to a very good level, without adding any issue in terms of fit or adequacy of the response options (Table 4). The items "Becoming easily tired" and "Lacking energy" showed some possible dependency (standardized residual correlation: 0.45; Supplementary material 1).
Combining the items assessing fatigue, dyspnea, and dizziness in a single scale intended to measure the severity of symptoms related to anemia among patients with HR MDS, CMML, and AML led to a measure with adequate measurement performance: good targeting (data not shown), good reliability, fit of all items to the Rasch model, and no issues in terms of response scales. Some possible dependency was found between three pairs of items ("Becoming easily tired" and "Lacking energy" (0.69); "Shortness of breath" and "Shortness of breath on exertion" (0.65); "Becoming easily tired" and "Weakness in arms or legs" (0.35); Supplementary material 1). Most importantly, the RMT analysis uncovered a meaningful hierarchy of the anemia-related symptom items, in which fatigue comes first, followed by dyspnea, and then dizziness (Fig. 4).
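The screening rule behind these flagged pairs (a positive residual correlation greater than 0.3) can be sketched as follows. This is only an illustration under stated assumptions: the standardized residuals are taken as already exported from the Rasch software, the residual values in the usage example are made up, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_dependent_pairs(residuals, cutoff=0.3):
    """Return (item_a, item_b, r) for item pairs whose residual correlation
    is positive and greater than the cutoff.

    residuals: dict mapping item name -> list of standardized residuals,
    one value per person, in the same person order for every item.
    """
    flagged = []
    for item_a, item_b in combinations(residuals, 2):
        r = pearson(residuals[item_a], residuals[item_b])
        if r > cutoff:
            flagged.append((item_a, item_b, r))
    return flagged


# Made-up residuals for three items and four persons (illustration only).
demo = {
    "Becoming easily tired": [1.0, -1.0, 0.5, -0.5],
    "Lacking energy": [0.9, -1.1, 0.4, -0.6],
    "Dizziness": [-1.0, 1.0, -0.5, 0.5],
}
for item_a, item_b, r in flag_dependent_pairs(demo):
    print(item_a, "/", item_b, round(r, 2))
```

Only positive correlations are flagged, in line with the criterion described in the statistical analyses; strongly negative residual correlations are ignored by this rule.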

Proposed new scores specific to HR MDS, CMML and AML using EORTC QLQ-C30 and supplemental items
The RMT analyses warrant the creation of scores combining the items from the QLQ-C30 and Item Library for PF, FA, and a combination of all items assessing fatigue and other symptoms related to anemia. For consistency, we recommend that the calculation of these scores follow the same principles as the calculation of the QLQ-C30 scores, keeping a range from 0 to 100.
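A minimal sketch of such a calculation is given below. It applies the standard QLQ-C30 linear transformation to four-category items (coded 1 to 4) and assumes the usual QLQ-C30 convention that a scale is scored when at least half of its items are answered. Whether the proposed combined scores use exactly this missing-data rule is an assumption, and the function name and example responses are hypothetical.

```python
def scale_score(responses, functional, item_range=3, min_item=1):
    """Score a multi-item scale on a 0-100 range, QLQ-C30 style.

    responses: item responses coded 1-4, with None for a missing answer.
    functional: True for functioning scales (higher score = better functioning),
                False for symptom scales (higher score = more severe symptoms).
    Returns None when fewer than half of the items were answered.
    """
    answered = [r for r in responses if r is not None]
    if len(answered) * 2 < len(responses):
        return None
    # Raw score: mean of the answered items.
    raw = sum(answered) / len(answered)
    # Rescale the raw score to the 0-1 interval.
    scaled = (raw - min_item) / item_range
    if functional:
        # Functioning scales are reversed so that 100 means best functioning.
        scaled = 1.0 - scaled
    return scaled * 100.0


# Hypothetical responses: five QLQ-C30 PF items plus two supplemental items.
print(scale_score([1, 2, 1, 1, 1, 2, 2], functional=True))
# Hypothetical responses: three QLQ-C30 FA items plus three supplemental items.
print(scale_score([3, 2, None, 3, 4, 2], functional=False))
```

Treating the combined fatigue and anemia-related symptom scores as symptom scales (higher score meaning more severe symptoms) and the combined physical functioning score as a functioning scale mirrors the direction of the original QLQ-C30 scores.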

Discussion
In this research, we created customized PRO measures of physical functioning and fatigue for HR MDS, CMML, and AML using items from the EORTC QLQ-C30 and Item Library. We also explored a wider measure targeting the symptomatic manifestations of anemia, a core clinical feature of HR MDS, CMML, and AML, based on 10 PRO items covering fatigue plus dyspnea and dizziness. Quantitative evidence from the application of modern psychometrics in a sample of 51 patients supported the appropriateness of the customized measures in this context. Our modular measurement approach to the key PROs in HR MDS, CMML, and AML was conceptually driven. We first conducted sound qualitative research including a literature review, interviews with clinicians and patients, and the development of a conceptual model summarizing the key outcomes for patients with MDS, CMML, and AML [9]. We then used the EORTC Item Library to complement the QLQ-C30 based on the conceptual model. The analyses described here generated supportive evidence of the relevance of the previously described modular measurement approach, which uses the EORTC Item Library as a resource to supplement legacy EORTC instruments; our research shows how this method ensures that PROs are well suited for specific contexts of use such as rare cancers. We showed that the EORTC QLQ-C30 scores had acceptable performance in patients with HR MDS, CMML, and AML and that the supplemental items selected from the EORTC Item Library improved them. The conceptual clarity made possible by the EORTC QLQ-C30 and supplemental items is illustrated by the clinically meaningful ordering of anemia-related symptoms uncovered by the RMT analysis of these data: it showed that the perception of symptoms of anemia by patients with HR MDS, CMML, and AML appears to start with fatigue (first fatigability, then weakness), then dyspnea (first on exertion, then at rest), and finally dizziness.
The positive results of our RMT analyses for the various item sets that we investigated, and the informative findings from these analyses, certainly stem from the early, thoughtful conceptualization and qualitative work conducted with patients to carefully craft the item sets used to measure the concepts of interest in the specific context of the research. Thus, we would recommend that any modular measurement research endeavor put emphasis on these initial stages. Our research focused on the measurement of core outcomes for patients with HR MDS, CMML, or AML, namely physical functioning, fatigue, and anemia-related symptoms. Yet, other concepts are relevant to capture the full patient experience in this context. First, role functioning is recognized as an important PRO concept and was identified in our conceptual model. However, while we added two items from the EORTC Item Library to the two QLQ-C30 RF items based on our predefined conceptualization of patient experience with MDS, CMML, and AML, we were not able to demonstrate that the resulting item set had strong measurement properties. Including more role functioning items, for example from the 10-item bank used for the EORTC role functioning computer adaptive test [19], may be an option to create a measure of role functioning with better measurement properties based on the EORTC measurement system. In addition, PRO instruments such as the QOL-E [20] and the Quality of Life in Myelodysplasia Scale (QUALMS) [21] have been developed to measure more distal concepts, such as health-related quality of life, in MDS. The possibility that these instruments may be relevant to capture role functioning should be explored. Finally, patients with HR MDS, CMML, or AML also experience other symptoms related to cytopenia, for example related to thrombocytopenia or neutropenia. Further research on an acceptable measurement approach to these symptoms would allow a more comprehensive understanding of the experience of patients with HR MDS, CMML, or AML.
We also acknowledge some limitations in the data collected for our online study. Firstly, the sample consisted of only 51 subjects. The RMT parameters could be estimated in this relatively small sample, with a reasonable amount of uncertainty, but our results should be confirmed in larger samples. In particular, the measures need to be examined within each diagnosis subgroup (HR MDS, CMML, and AML) to confirm the absence of major differences in responses between subgroups. It also would be relevant to explore whether the measure of anemia-related symptoms allows characterizing the disease continuum from MDS to AML in a larger sample including enough patients within each subgroup. This may show, for example, that patients with AML are more likely to report more severe anemia-related symptoms, such as dizziness. Secondly, our sample was a convenience sample, and all the variables were reported by patients, including their diagnosis. This raises questions about how representative this sample is of the population of patients with HR MDS, CMML, and AML and the extent to which our findings in terms of coverage of the physical functioning and fatigue continuums by our item selection, as well as the good fit of the selected items to the Rasch model, are generalizable. Replicating our analyses in a larger, better clinically defined sample would reinforce our findings. For example, having a sample where the QLQ-C30 and supplemental items are collected together with good quality hemoglobin level data would allow for exploration of the association between hemoglobin level and severity of fatigue or anemia-related symptoms, hence consolidating their validity. Similarly, the construct validity of the scores based on the QLQ-C30 and supplemental items could be consolidated in future research by exploring their association with other measures of fatigue and physical functioning.
Finally, the study was cross-sectional, preventing the examination of the longitudinal measurement properties of the PRO measures, such as test-retest reliability, ability to detect change over time, or exploration of what constitutes a meaningful change in score. Thus, more data will be needed to confirm our positive early results on the appropriateness of these modular measures of physical functioning and fatigue in this context.

Conclusion
The EORTC QLQ-C30 and the selection of items from the EORTC Item Library offer a promising pragmatic