Translation and validation of two disease-specific patient-reported outcome measures (Bladder Cancer Index and FACT-Bl-Cys) in Dutch bladder cancer patients

Background The Bladder Cancer Index (BCI) and Functional Assessment of Cancer Therapy-Bladder-Cystectomy (FACT-Bl-Cys) were developed to measure disease-specific health-related quality of life (HRQOL) in bladder cancer patients and patients treated with radical cystectomy, respectively. Both patient-reported outcome measures (PROMs) are frequently used in clinical practice, but are not yet validated according to the COSMIN criteria and not yet available in Dutch. Therefore, the aim of this study was to translate the BCI and FACT-Bl-Cys into Dutch and to evaluate their measurement properties according to the COSMIN criteria. Methods The BCI and FACT-Bl-Cys were translated into Dutch using a forward-backward method, and subsequently administered at baseline (pre-operatively) and 3 months post-operatively in bladder cancer patients who received a radical cystectomy. Validity (content and construct), reliability (internal consistency, test-retest reliability, and measurement error), floor and ceiling effects, and responsiveness were assessed according to the COSMIN criteria. Results Forward-backward translation encountered no particular linguistic problems. In total 260 patients completed the baseline measurement, while 182 patients completed the three-month measurement. Only a ceiling effect was identified for the BCI. Hypotheses testing for construct validity was satisfying, as 67% and 92% of the hypothesized correlations were confirmed. Structural validity was moderate for both measures, as confirmatory factor analyses showed limited fit. Reliability of both PROMs was good. The intraclass correlation coefficient (ICC) of the BCI domains ranged from 0.47 to 0.93, minimal value of Cronbach’s α was 0.70, smallest detectable change on group level (SDC group) ranged from 1.9 to 8.6. The ICC of the FACT-Bl-Cys domains ranged from 0.43 to 0.83, minimal value of Cronbach’s α was 0.77, SDC group was around 1. Only the FACT-Bl-Cys total score was found to be responsive to changes in generic quality of life. Conclusions The Dutch versions of the BCI and FACT-Bl-Cys were shown to be reliable and have good content validity. Structural validity was limited for both measures. Only the FACT-Bl-Cys total score was responsive to changes in generic HRQOL. Despite some limitations, both PROMs seem suitable for use in clinical practice and research. Electronic supplementary material The online version of this article (10.1186/s41687-019-0149-7) contains supplementary material, which is available to authorized users.


Introduction
Bladder cancer (BCa) ranks ninth in worldwide cancer incidence and is one of the most expensive malignancies to manage [1,2]. The spectrum of BCa includes nonmuscle-invasive, muscle-invasive, and metastatic disease, each with its own clinical behavior, prognosis, and treatment.
Due to the heterogeneous BCa population, the various treatments (e.g. transurethral surgery, chemotherapy, radical cystectomy), and the lack of long-term follow-up data, not much is known regarding the patient burden imposed by BCa [3][4][5]. Since BCa patients will usually undergo several treatments, measuring health-related quality of life (HRQOL) with valid instruments is important for clinicians and patients when making informed decisions about treatments, based on patients' experiences [6,7]. Increasingly more clinical trials [8] and comparative effectiveness studies [9] include patient-reported outcomes (PROs), in addition to objective outcomes.
In 2016, a systematic review was published in which Danna et al. [10] advised to use the Functional Assessment of Cancer Therapy-Bladder-Cystectomy (FACT-Bl-Cys, formerly known as the FACT-VCI) [11,12] when studying HRQOL in patients undergoing radical cystectomy and to use the Bladder Cancer Index (BCI) [13,14] when studying HRQOL in patients with nonmuscle-invasive or muscle-invasive BCa disease of any stage. Authors report that the FACT-Bl-Cys was the only available measure for muscle-invasive BCa patients and the BCI has unique "bother" scales, which quantifies the symptoms' impact [10]. Both patient-reported outcomes measures (PROMs) are frequently used and have been translated into various languages (i.e. BCI is available in Spanish, French, Hungarian [15][16][17], and FACT-Bl-Cys is available in Korean and Swedish [18,19]), but were not yet available in Dutch. In 2018, Mason et al. revealed in a systematic review that most PROMs related to BCa had limited information reported on their measurement properties [4]. The FACT-Bl-Cys and BCI are suggested to be most promising, but some measurement properties (e.g. content validity, measurement error, and responsiveness) are still unknown [4].
To assess whether both PROMs provide valuable information in clinical practice and research, it is warranted to study their measurement properties. The COnsensusbased Standards for the selection of health Measurement INstruments (COSMIN) checklist was developed to assess the quality of validation studies and can be used to design and report on these studies [20][21][22]. The aim of this study was to translate the BCI and FACT-Bl-Cys into Dutch and to evaluate the validity, reliability, floor and ceiling effects, and responsiveness according to the COSMIN criteria.

Translation procedure
The BCI and FACT-Bl-Cys were translated into Dutch, using a forward-backward method according to published guidelines, involving six steps [23,24]. A schematic overview of the translation procedure is presented in Fig. 1. First, two bilingual native Dutch translators independently translated the United States English versions of the BCI and FACT-Bl-Cys into Dutch (forward translation). Second, two independent native Dutch speakers (JG, CM) reconciled the two forward translations into one forward translation. Any uncertainties were discussed among JG, CM and a urologist (CW), and resolved by consensus. Third, one independent native English translator, who is fluent in Dutch, translated the forward translation back to English. The translator was uninformed of the concepts explored and blinded to the original instruments. Fourth, three independent bilingual experts (two native Dutch and one native Flemish speaker, all fluent in English) reviewed and commented on all documents (original version, forward and backward translations). The synthesis process was carefully documented and outcomes were evaluated by three Dutch speakers (a language coordinator, JG, CM). Conceptual rather than literal translation was leading, aiming to preserve the original meaning of each item. This resulted in pre-final versions of the Dutch BCI and FACT-Bl-Cys. Fifth, both pre-final versions were tested during a pilot study, in which cognitive interviews were carried out among 20 BCa patients who underwent a radical cystectomy. Patients completed both pre-final versions and were interviewed about the comprehension of items and the chosen response, according to Functional Assessment of Chronic Illness Therapy (FACIT) guideline and instructions [24]. Patients were randomly recruited from Radboud university medical center (Nijmegen, the Netherlands) and Rijnstate Hospital (Arnhem, the Netherlands). Finally, experts (JG, CM, CW, and FACIT representatives) discussed the outcomes and final versions were realized to enable the evaluation of their psychometrics.

Study design and sample
As part of the RACE study (trial identifying number NTR5362, Dutch Trial Registry, www.trialregister.nl) patients were asked to complete four measures (i.e. BCI, FACT-Bl-Cys, EQ-5D-5 L, and EQ-VAS). Patients that completed the three-month follow-up period were included in this substudy. RACE is a comparative effectiveness study aiming to determine the (cost-)effectiveness of robot-assisted and open radical cystectomy [25]. Radical cystectomy is the standard treatment of non-metastatic, invasive BCa and is curative in the majority of patients with localized disease [1]. This surgery is associated with a high complication rate, varying between 49% and 68% [26]. Studies suggest that radical cystectomy affects urinary, sexual and bowel function, and body image which can lead to anxiety and depression [27][28][29][30].
All participants in this validation study fulfilled the inclusion criteria of the RACE study: they were 18 years or older, had an oncological indication for radical cystectomy, a histologically proven primary muscle invasive urothelial carcinoma or therapy resistant high-risk non muscle-invasive BCa (CIS, refractair pTa-1), a non-metastatic tumor (cT1a-cT4a, cN0M0), they were able to complete Dutch questionnaires, and provided written informed consent. Patients were excluded from RACE when they met one of the following criteria: previous major abdominal surgery (i.e. existing stomata, low anterior resection of the rectum or rectal amputation, status after open aortabifemoral graft, status after right hemicolectomy), pregnancy, morbid obesity (BMI ≥ 40 kg/m 2 ), radical cystectomy performed in combination with a nephrectomy or a partial colon resection.
Measures used in this validation study were the translated Dutch versions of the BCI and FACT-Bl-Cys, and the validated Dutch versions of the EuroQol 5-Domain 5-Level (EQ-5D-5 L) [31] and the EuroQol Visual Analogue Scale (EQ-VAS). Participants could choose to complete the PROMs on paper or electronically. Participants completed the PROMs at baseline (T0, pre-operatively) and 3 months post-operatively (T3). Patients who completed the three-month measurement (T3) were asked to complete the PROMs again within 2 weeks (T3 retest).

Bladder Cancer index (BCI)
The BCI is developed to measure disease-specific HRQOL in patients with nonmuscle-invasive and muscle-invasive BCa disease of any stage. It consists of 36 items covering three domains: urinary (14 items), bowel (10 items), and sexual (12 items) [13,14]. Each domain consists of two subdomains (function and bother). Item responses are based on 4-or 5-point Likert scales; domain and subdomain scores are standardized to a 0-100 point scale where higher scores indicate better HRQOL. No total BCI score is calculated. Function items focus on the frequency of the disease symptoms and bother items reflect the individual perception of these symptoms. According to BCI scoring instructions, the number of non-missing items needed to compute domain scores varied from 10 (urinary domain and sexual domain) to 4 ("urinary function", "bowel function", "sexual bother"), otherwise scores were set to missing.
Functional assessment of Cancer therapy-bladdercystectomy (FACT-Bl-Cys) The FACT-Bl-Cys is developed to measure condition-specific HRQOL in patients treated with radical cystectomy. It consists of the four general FACT-General (FACT-G) [32] domains (physical, social/family, emotional, and functional well-being) plus one additional Bl-Cys domain (17 items) covering urinary, sexual and bowel function, and body image [11,12]. Thus, the FACT-Bl-Cys measure consists of five domains (physical, social/family, emotional, functional well-being, and Bl-Cys). The measure consists of 44 items and responses are based on 5-point Likert scales, with higher scores indicating better HRQOL. The total FACT-Bl-Cys score can range from 0 to 168. According to FACT-Bl-Cys scoring instructions, more than 50% of the items per domain need to be answered in order to calculate a domain score.

Statistical analyses
COSMIN recommendations were used as a guide for evaluating the measurement properties of the Dutch BCI and FACT-Bl-Cys [22]. Validity, reliability, floor and ceiling effects, and responsiveness were assessed as described below. Analyses were performed using IBM SPSS Statistics software (version 22.0, IBM Corp., Armonk, NY, USA). Additionally, R version 3.5.2 (R Foundation for Statistical Computing, Vienna, Austria), including the packages mice (version 3.5.0) [33] and SemTools (version 0.5-1.928) [34] was used for multiple imputation and confirmatory factor analyses (CFA).
Because the BCI is developed for generic BCa patients, we determined its measurement properties at baseline (T0, pre-operatively), with the exception of test-retest and measurement error which were based on T3 and T3 retest. The test-retest and measurement error could not be assessed at T0, because there was no possibility to distribute and complete the PROMs again within two weeks since patients received a cystectomy on short term after T0. Because the FACT-Bl-Cys is developed for patients treated with radical cystectomy, we determined its measurement properties after the cystectomy had been performed, i.e. at T3 and T3 retest.

Floor and ceiling effects
Floor and ceiling effects are considered to be present if more than 15% of respondents achieved the lowest or highest possible score, respectively [21,35]. We evaluated the presence of floor and ceiling effects of single scores in this study to assess interpretability. Interpretability [20] is the degree to which one can assign a qualitative meaning on a quantitative score or change in score. Additionally, floor and ceiling effects can also impact the responsiveness (if scores cannot further improve or deteriorate) and validity of an instrument (as patients with the lowest or highest possible score cannot be distinguished from each other).

Validity
Validity in this study was assessed by content validity and construct validity [20]. Content validity was evaluated from the perspective of researchers, patients, and urologists. First, the component of content validity known as face validity was determined by discussing with the research team whether the two measures gave the impression of adequately reflecting the construct to be measured. Second, content validity was evaluated during the above described cognitive interviews (n = 20).
An interview guide was used for the interviews and respondents were invited to appraise the two measurement instruments on relevance, comprehensiveness, and comprehensibility [36]. Input from the interviews was summarized and notable similarities and differences in opinion among respondents were registered and closely evaluated in our research team.
Construct validity was evaluated by structural validity and hypotheses testing. First, structural validity was assessed by confirmatory factor analyses (CFA) to confirm the previously suggested dimensionality of both PROMs [37]. Regarding BCI, the original validation study [13] identified three primary domains (urinary, bowel and sexual) and one translation study [16] reported that data fitted to six subdomains (function and bother). Therefore, in this study BCI items were hypothesized to load on three or six factors. Regarding FACT-Bl-Cys, the original validation study [11] identified three factors in the Bl-Cys domain and one translation study [18] reported that the Bl-Cys domain fitted on three other factors. Therefore, the FACT-G (four domains: physical, social/family, emotional, functional well-being) was hypothesized to load on four factors and the individual Bl-Cys domain on two different three-factor structures. Which items are structured within which factor is presented in Table 4.
For a CFA, COSMIN advises to use a sample size of 7 patients per item, with a minimum number of 100 patients [38]. Since our sample size was too small according to the COSMIN criteria as cases with missing items are not analyzed in a CFA (i.e. listwise deletion), we performed multiple imputation by chained equations generating 25 independent imputation datasets. Data were imputed at item level using a predictive mean matching approach. CFA was performed on the imputed datasets. Model fit was evaluated based on the Comparative Fit Index (CFI), Tucker-Lewis index (TLI), Root Means Square Error of Approximation (RMSEA), and Standardized Root Mean Square Residual (SRMR). The criteria for unidimensionality include CFI ≥ 0.95, TLI ≥ 0.95, RMSEA≤0.06, and SRMR≤0.08 [39].
Second, we tested various hypotheses by analyzing the association between the BCI domains and FACT-Bl-Cys domains, using Pearson's correlation coefficients. Coefficients were considered low (< 0.30), moderate (0.30-0.69) or high (≥0.70). If ≥75% of the set number of hypotheses were confirmed, the construct validity was considered good [21]. Pre-specified hypotheses regarding BCI at T0 were that: a) correlations among different 3 domains (urinary, bowel, sexual) are low as they measure different constructs; b) correlations between the BCI domains and FACT-G, EQ-5D-5 L, EQ-VAS are moderate, due to differences between generic and disease-specific instruments. Pre-specified hypotheses regarding FACT-Bl-Cys at T3 were that: a) correlations among 5 different domains (physical, social/family, emotional, functional, Bl-Cys) are moderate based on original validation studies [11,40]; b) correlation between the Bl-Cys domain and EQ-5D-5 L is moderate; c) correlation between FACT-Bl-Cys total score and the EQ-5D-5 L and EQ-VAS is moderate, due to differences between generic and disease-specific instruments. For the BCI and FACT-Bl-Cys there were 12 and 13 pre-specified hypotheses formulated, respectively.

Reliability
Reliability was assessed by analyzing internal consistency, test-retest reliability, and measurement error [20].
Internal consistency [20] of subscale items was measured using Cronbach's α, to assess the degree of interrelatedness among the items of (sub) scales of the instrument. Values between 0.70 and 0.95 are considered adequate [21,41].
Test-retest reliability [20] was assessed by calculating the intraclass correlation coefficient (ICC), using a twoway random effects model to compute absolute agreement (single measures) [21,42]. Values ≥0.70 are considered to be reliable [21]. Patients who completed the three-month measure were assumed to be in a stable physical state and were asked to complete the PROMs again after 2 weeks.
Measurement error [20] was assessed by calculating the standard error of measurement (SEM) using the formula: [22,43]. Additionally, the smallest detectable change on group level (SDC group) was calculated from the SDC at individual level (SDC individual = 1.96*√2*SEM), using the formula: SDC individual /√n [44].

Responsiveness
Responsiveness should be determined in a population that shows actual change [20]. First, we assessed the change scores of both measures between T0 and T3. Second, we evaluated the occurrence of complications (defined as grade 1 to 5 according to the Clavien-Dindo classification) in relation to the change scores of both measures between T0 and T3. Third, we assessed the correlation and effect size of change score of both measures with the change score of the EQ-VAS between T0 and T3. We used the EQ-VAS score as this might cover a broader underlying concept of overall health than the EQ-5D-5 L, which seems applicable for this population considering the heterogeneous and multidimensional character of BCa disease. If there is an actual change in HRQOL, we assumed that the EQ-VAS score shows this change. We constructed hypotheses about the change scores between T0 and T3 of the BCI and FACT-Bl-Cys, in comparison to the change scores of the EQ-VAS. We expected that patients with an improved or decreased EQ-VAS score (i.e. improvement or deterioration of ≥10 points), the BCI and FACT-Bl-Cys values also increased or declined, respectively. Previous studies in other patient populations reported a minimal important change (MIC) varying from 6.9 to 8.9 [45][46][47][48]. Based on these MIC values we assumed that an EQ-VAS change of 10 points might indicate actual change. Pre-specified hypotheses regarding the correlations and effect sizes between changes were that in patients with an improved or declined EQ-VAS score (i.e. plus or minus ≥10 points), we expected a) to find moderate correlations of ≥0.30 between change in EQ-VAS and the change in BCI and FACT-Bl-Cys values; b) to find a larger effect size, in comparison to patients that did not show a change in EQ-VAS (i.e. plus or minus ≤5 points). We analyzed the effect size between the changes by calculating Cohens d, using the formula: (T3 mean -T0 mean )/T0 SD .

Translation procedure
Conceptual rather than literal translation was leading to preserve the original meaning of each item. There was agreement between the forward and backward translations, and no cultural adaptions regarding translation were made. Based on the cognitive interviews, two items regarding the interpretation and meaning were discussed with the authors (JG, CM, CW) and FACIT representatives. First, we added one additional phrase "for men only" (in Dutch: "alleen voor mannen") to the beginning of FACT-Bl-Cys item Bl5 "I am able to maintain an erection", as this item is specifically aimed at men. According to FACIT scoring instructions Bl5 was not included in the scoring algorithm of the scale. Therefore, adapting this item could not influence the total score. Second, the BCI items where patients are asked about their "urine loss" could be interpreted according to their own specific situation. In case of a neobladder, stoma, or own bladder questions about "urine loss" could be answered as problems with catheterizing incontinence, emptying the stoma bag or leakage from the stoma bag, or urination, respectively.

Patient population
In total 260 patients completed the baseline measure (T0), followed by 182 patients who completed the three-month measure post-operatively (T3). The T3 evaluation was completed at a mean of 106 days (SD:14 days). Sixty-two patients participated in the test-retest measurement (T3 retest). The T3 retest was completed at a mean of 14 days (SD:6 days) after T3. In Table 1 the characteristics of the study population are shown. Of the 260 patients who completed T0, the mean age was 67 years (SD:10 years), 207 (80%) were male, 177 (68%) were married, and 149 (57%) were retired. These characteristics were not notably different at T3. We did not find any major change scores between T0 and T3 for both measures, as presented in Table 2.

Floor and ceiling effects
Seven out of nine BCI domains showed a ceiling effect (meaning few problems), while no floor effects were observed (see Table 3). Most missing values were reported in the BCI "sexual function" domain at T0; for 112 of 260 patients (43%) we could not calculate a domain score.
Regarding the FACT-Bl-Cys, no floor or ceiling effects were detected (see Table 3). The FACT-EWB domain at T3 was missing for 9 out of 182 patients (5%). For 13 out of 182 patients (7%) the FACT-Bl-Cys total score could not be calculated since they did not complete all items.

Validity
Face validity was considered to be good for both measures, as they appeared to address all relevant aspects of HRQOL in Dutch BCa adults treated with radical cystectomy.
Regarding content validity, interviewed patients expressed an overall positive opinion regarding relevance and comprehensiveness of the measures and their evaluative purposes. Nonetheless, one item of FACT-Bl-Cys (Bl1 "I have trouble controlling my urine") raised concerns by some respondents, as patients with different bladder diversions might interpret this differently (e.g. stoma, neobladder of   bladder replacement). We discussed these findings with FACIT representatives, and decided that patients could respond to this item in any way they feel appropriate. The CFA was performed on the imputed datasets and the model fit was evaluated based on Comparative Fit Index (CFI), Tucker-Lewis index (TLI), Root Means Square Error of Approximation (RMSEA), and Standardized Root Mean Square Residual (SRMR). Considering structural validity for BCI, the CFA resulted in a moderate fit (see Table 4). The BCI in six factors showed a better fit (CFI:0.88, TLI:0.87, RMSEA:0.06, SRMR:0.07) in comparison to three factors (CFI:0.69, TLI:0.67, RMSEA:0.09, SRMR:0.09). Considering structural validity for FACT-Bl-Cys, the CFA also resulted in a moderate fit (see Table 4). For FACT-G in four factors, the CFI was 0.88, the TLI was 0.86, the RMSEA was 0.05, and the SRMR was 0.09. The Bl-Cys domain in three factors as proposed by Kim et al. [18] showed better fit (CFI: 0.85, TLI:0.81, RMSEA:0.06, SRMR:0.09). None of the hypothesized factor structures for both PROMs fulfilled the CFI or TLI thresholds of ≥0.95.
Considering hypotheses testing for BCI, 67% of prior hypotheses (8 of 12) were confirmed (see Table 5). Correlations among the three BCI domains were expected to be low, however we found a moderate correlation between the urinary domain and the bowel domain (r = 0.35). Although we expected moderate correlations when comparing the urinary domain with FACT-G, the sexual domain with EQ-5D-5 L, and the sexual domain with EQ-VAS, we instead found low correlations of r = 0.29, r = 0.24, r = 0.28, respectively. For FACT-Bl-Cys, 92% of all a priori hypotheses (12 of 13) were confirmed (see Table 5). As hypothesized for FACT-Bl-Cys, we found moderate correlations when comparing the Bl-Cys domain with EQ-5D-5 L (r = 0.60), FACT-Bl-Cys with EQ-5D-5 L (r = 0.70), and FACT-Bl-Cys with EQ-VAS (r = 0.58). All correlations between the five domains were moderate as expected, with the exception of one low correlation was observed (EWB-SWB r = 0.29).

Reliability
Internal consistency (Cronbach's α), test-retest reliability (ICC), and measurement error (SEM, SDC group, SDC individual) are presented in Tables 3 and 6. For BCI, Cronbach's α of the domains ranged from 0.70 ("bowel function") to 0.91 (sexual domain), which indicates a good internal consistency. All ICCs were above 0.70, except for "sexual bother" which showed an ICC of 0.47. The SEM ranged from 4.17 ("sexual function") to 20.07 ("sexual bother"), which resulted in a SDC individual score ranging from 11.56 to 55.63. The SDC group ranged from 1.90 (urinary domain) to 8.58 ("sexual bother"). Thus for the domain "sexual bother", an average group difference in score ≥ 8.58 cannot be attributed to only measurement error, but actual change will have taken place.
For FACT-Bl-Cys, Cronbach's α of the domains ranged from 0.77 (Bl-Cys) to 0.85 (FWB), which    indicates a good internal consistency. All ICCs were above 0.70, except for SWB which showed an ICC of 0.43. The SEM ranged from 1.57 (EWB) to 3.14 (Bl-Cys domain), which resulted in a SDC individual score ranging from 4.36 to 8.69. Our SDC individual values show that, to determine a treatment effect a difference of at least 4.36 to 8.69 points between two scores from an individual patient is required to ensure that the difference is not attributed to measurement error. The SDC group ranged from 0.59 (EWB) to 1.18 (Bl-Cys). See Additional file 1 for separate variance components.

Responsiveness
Our total study population did not show major changes between T0 and T3 for the BCI and FACT-Bl-Cys (see Table 2). When evaluating the occurrence of complications, it appeared that our dataset at the time contained too few registered complications to assess responsiveness, see Additional file 2. Only 42 patients had at least one registered complication (Clavien-Dindo classification grade 1 or higher). The number of patients that completed enough items to calculate a total score ranged from 0 to 27, depending on the domain. Additionally, the change score between T0 and T3 of BCI and FACT-Bl-Cys in comparison to the change score of the EQ-VAS was studied. As presented in Additional file 3, mean change scores between T0 and T3 were calculated for patients who showed an improvement in the EQ-VAS score of ≥10 points (n = 55, mean change:25, SD:16) or a deterioration in this score of ≥10 points (n = 89, mean change:-50, SD:30).
For BCI, the number of patients with an improved EQ-VAS ranged from 17 to 47, depending on the domain. We hypothesized to find a larger effect size in the improved group compared to the no-change group, but only 1 of 9 hypotheses was confirmed (see Table 7). We only found a larger effect size for the urinary domain in the improved group, in comparison to the no-change group (Cohens d: 0.36 > 0.06). Regarding hypotheses on correlations, only 22% of all a priori hypotheses (2 of 9) were confirmed. We only found the expected moderate correlations for the change score of the bowel domain (r = 0.42) and the "bowel bother" domain (r = 0.39).
For BCI, the number of patients with a deteriorated EQ-VAS ranged from 10 to 40, depending on the domain. Regarding hypotheses on effect sizes for the deterioration group, only 3 out of 9 hypotheses were confirmed. We found larger effect sizes in the urinary domain (Cohens d: 0.44 > 0.06), "urinary function" (Cohens d: − 0.56 > − 0.45), and the sexual domain (Cohens d: − 1.10 > − 0.85). Regarding hypotheses on correlations, only 11% of all a priori hypotheses (1 of 9) were confirmed. We only found a moderate correlation for the change score of the "sexual function" domain.
For FACT-Bl-Cys, the number of patients with an improved EQ-VAS ranged from 45 to 52, depending on the domain. For both effect sizes and correlations, 83% of all a priori hypotheses (5 of 6) were confirmed. Only for FACT-SWB we found a smaller effect size (Cohens d: − 0.17 < − 0.50) and a low correlation (r = 0.20).
For FACT-Bl-Cys, the number of patients with a deteriorated EQ-VAS ranged from 40 to 42, depending on the domain. Regarding hypotheses on effect sizes for the deterioration group, 67% of all a priori hypotheses (4 of 6) were confirmed (see Table 7). We found lower effect sizes in the FACT-SWB (Cohens d: 0.00 < -0.50) and FACT-EWB (Cohens d: 0.25 < 0.60). Regarding hypotheses on correlations, 83% of all a priori hypotheses (5 of 6) were confirmed. We only found a low correlation for the change in FACT-SWB (r = 0.24), which was contrary to our expectations. Overall, these results indicate that the FACT-Bl-Cys total score seems responsive to change in generic HRQOL.

Discussion
The aim of this study was to translate two disease-specific measures (BCI and FACT-Bl-Cys) into Dutch and to evaluate their measurement properties according to the COSMIN criteria. Both measures were found to be reliable and have good content validity, but limited structural validity. This study showed that the Dutch version of the BCI was not very responsive to changes in generic HRQOL, and showed a ceiling effect. In the Dutch version of the FACT-Bl-Cys no floor or ceiling effects were observed and the FACT-Bl-Cys total score was responsive to changes in generic HRQOL.
The structural validity of the BCI was assessed as moderate, as the hypothesized domains based on the original validation [13] and translational study [16] could partially be confirmed. For BCI, we found a better fit for a six-factor than a three-factor structure, which could indicate that BCI distinguishes function and bother. The structural validity of the FACT-Bl-Cys was assessed as moderate as the majority of hypothesized domains could not be confirmed. This limited fit could be caused by the heterogeneous characteristics of BCa disease, as many different topics are covered in the domains. The FACT-Bl-Cys domains contain a wide range of complaints that do not necessarily relate to each other. For instance, the Bl-Cys domain contains items related to body image, physical functioning, satisfaction, and socials aspects. Future exploration of whether there is a more adequate component solution is recommended for further research.
Although experts debate on the minimum number of patients needed for a factor analysis (ranging from 100 to 300 patients) [49][50][51][52][53][54], COSMIN advises to use a sample size of 7 patients per item, with a minimum number NA.  of 100 patients [38]. Though we handled the missing data with multiple imputation in order to perform the CFA with an adequate sample size, when interpreting the results it should be considered that due to multiple imputation, in which missing values are imputed with the assumption that the distribution of the missing values is similar to the observed data (i.e. predictive mean matching approach), indices might be overfitted. Regarding BCI at T0, high ceiling effects were observed for both urinary and bowel domains. The maximum scores indicate good function and no bother, which might be when patients are recently diagnosed and have minor symptoms (e.g. painless hematuria). Similar findings are reported in previous studies [13,16,17].

BCI
Regarding FACT-Bl-Cys, there were uncertainties concerning the calculation of total scores, as we excluded the items BL4 'I am interested in sexual activity' and BL5 'I am able to have and maintain an erection', according to FACIT instructions. These items were also not included in the CFA. Previously published studies each handled these items differently [11,12,18,19].
In comparison to previous BCI [13,16,17] and FACT-Bl-Cys [11,18,19] studies, we found similar internal consistency values for all domains. Reliability of both measures in terms of test-retest scores was good. To our knowledge, the SEM and SDC have not yet been assessed before for both measures. It should be noticed that our design methodologically differs from previous conducted studies. Previous FACT-Bl-Cys studies [11,12,18,19] assessed test-retest reliability using an interval of 4 weeks, while COSMIN advises to use an interval of 2 weeks [38]. Additionally, FACT-Bl-Cys was completed at different moments (e.g. 7 months to 3 years post-operatively) [11,12,18,19]. Anderson et al. did not show an ICC, but reported a test-retest Spearman correlation of 0.89 for the Bl-Cys domain [11]. Three other studies reported an ICC of 0.79 [12], 0.75 [19], and 0.82 [18] for the Bl-Cys domain, which is similar to our ICC of 0.82.
Regarding construct validity of the BCI, our results confirm the suggestion that the BCI captures additional information which is not covered by generic instruments. Schmidt et al. found similar results as they reported low to moderate correlations (range: 0. 16-0.48) between BCI and the SF-36 [17]. Regarding construct validity of the FACT-Bl-Cys, only Kim et al. correlated FACT-Bl-Cys with a generic measure [18]. They found moderate correlations (range: 0.29-0.69) between FACT-Bl-Cys and the SF-36 [18]. These results are comparable to our moderate correlations between FACT-Bl-Cys and the generic measures. In contrast to our results, three previous studies showed a low correlation (range: 0.16-0.24) between SWB and Bl-Cys domain [12,18,19]. Cookson et al. suggested that this low correlation reflects the predominant focus on physical/functional aspects of HRQOL in the FACT-Bl-Cys [12]. We found moderate correlations between all four domains (PWB, SWB, EWB, FWB) and the Bl-Cys domain, which indicates an equal share in HRQOL.
Only one BCI study mentioned responsiveness [17]. In 2014, Schmidt et al. reported an improvement 12 months after treatment for the urinary domain (Cohens d: 0.38) and "urinary bother" (Cohens d: 0.53). Based on these findings, the authors concluded that the BCI is responsive [17]. In 2018, a systematic review performed by Mason et al., however, rated the BCI responsiveness as unknown because Schmidt et al. only calculated effect sizes and did not construct hypotheses a priori which is not conform COSMIN recommendations [4].
We found one study that reported on longitudinal changes of FACT-Bl-Cys [11]. Anderson et al. evaluated change in 190 patients who underwent a radical cystectomy (pre-operative and 12 months post-operative) [11]. Authors did not calculate an effect size or correlations regarding responsiveness according to COSMIN, but they reported that patients with an ileal conduit diversion had an average 5 points higher score at 12 months, than those with an orthotopic neobladder diversion.

Strengths and limitations
As far as we are aware, this is the first study determining the measurement properties of BCI and FACT-Bl-Cys, according to COSMIN guidelines, in a large study population. COSMIN enhances clarity and stimulates uniform usage of terminology, enables researchers to evaluate each measurement property individually, and improves comparability with other studies.
This study also has some limitations. First, our study population comprised only patients treated with radical cystectomy. Therefore, we were unable to determine whether both measures will be valid and reliable in a more generic BCa population. Second, due to our study design test-retest and measurement error had to be determined at 3 months (T3, post-operatively) rather than at baseline (T0, pre-operatively). For FACT-Bl-Cys this is an adequate moment, but for BCI this might result in a bias since this measure is originally intended for BCa patients in general. Third, to explore responsiveness we studied the correlation and effect size of change score of both measures with the change score of the EQ-VAS between T0 and T3. Although we expected that radical cystectomy impacts HRQOL, the BCI and FACT-Bl-Cys did not detect large changes between T0 and T3. Two previous studies did also not find major changes between T0 and T3 for FACT-Bl-Cys [55,56]. This small change could indicate that radical cystectomy does not impact HRQOL largely, but it is also possible that the BCI and FACT-Bl-Cys do not adequately detect the actual impact and change over time. Particularly since we did find a large change in EQ-VAS scores: the population with an improved EQ-VAS score (n = 55) showed a mean increase of 25 points, and the population with a deteriorated EQ-VAS score (n = 89) showed a mean decrease of 50 points. Previous studies in other patient populations reported MIC values varying from 6.9 to 8.9 [45][46][47][48], but comparability between these different populations should be considered. The FACT-Bl-Cys appeared to be responsive to changes in generic HRQOL, though limited structural validity must be considered as this could negatively impact the responsiveness of specific domains. The possibility remains that in some cases responsiveness was not detected, because there was no meaningful change. In our study population we found too few registered complications to assess responsiveness based on the occurrence of complications. This study was a first exploration of responsiveness of both measures, further research is warranted to evaluate whether the BCI and FACT-Bl-Cys are responsive to objective measures of change (e.g. occurrence of complications).

Clinical implications
Based on this study, both measures seem suitable for cross-sectional use in clinical practice and for research purposes. Consulted urologists mentioned that all topics of both measures are usually discussed with patients in clinical practice, and usage of measures could be valuable for clinicians in order to provide more personalized care. Additionally, urologists noticed that the measures seem quite long for daily practice which limits the suitability and feasibility, but removing or combining items influences validity.
When using the BCI, it should be considered that the largest proportion of missing values was found in the sexual domain, which is also noticed by two previous studies [16,17]. During cognitive interviews, some patients mentioned that the sexual domain items were clear but 'irrelevant' as they had no sexual life. Thus a 'never' response or a missing value could indicate no sexual bother or function, or the lack of activity. Therefore, it is questionable whether BCI is suitable to measure "sexual function" and "sexual bother". Patients also indicated that BCI items about "urine loss" and FACT-Bl-Cys items about "controlling my urine" could be interpreted in different ways, as patients might have different bladder diversions (e.g. stoma, neobladder of bladder replacement). We recommend that future users of the measures provide an instruction which states that patients should answer these items according to their own specific situation.