
Translation and cultural validation of the University of Washington Caregiver Stress and Benefit Scales



English versions of the University of Washington Caregiver Stress (UW-CSS) and Benefit (UW-CBS) Scales were developed in the United States (US) to measure the impact on caregivers of caring for a child or children. Caregiving stress and benefit are important constructs to study worldwide. The purpose of this study was to translate the UW-CSS and UW-CBS into French, German, Italian, and Spanish and to validate the translations.


UW-CSS and UW-CBS were translated using forward and backward translation with reconciliation. Cognitive interviews (CIs) were completed with caregivers of children < 18 years with severe epilepsy. Translated versions were also administered to at least 100 caregivers in each of the four countries: France, Germany, Italy, and Spain. Differential item functioning (DIF) analyses were used to assess linguistic and cultural bias by country. The US development sample of 722 caregivers was used as a comparison sample for DIF analyses. DIF adjusted scores were calculated to determine impact of DIF on the item response theory (IRT)-based T-score. Benefit and stress scores were also calculated and compared across countries and health condition subgroups. Finally, short forms were modified to minimize the impact of DIF on the UW-CSS and UW-CBS T-scores and to reflect feedback from CIs.


Interviews were completed with 47 caregivers (German n = 14; Spanish n = 10; French n = 13; Italian n = 10). UW-CSS and UW-CBS were administered to 456 (German n = 117, Spanish n = 114, French n = 115, Italian n = 110) caregivers of children with and without health conditions. All stress items functioned well in CIs, though results indicated statistically significant DIF for three items in multiple countries and in the overall sample. Four of the 13 benefit items were problematic based on CI feedback, and six items showed DIF in one or more countries or in the combined sample. However, average differences between DIF adjusted and non-adjusted scores were minimal for both scales and all comparisons, indicating the impact of DIF on the total score was negligible.


Modified short forms functioned well in all four of the translated versions. All language versions are freely publicly available.


Taking care of a child or children with chronic illness, such as severe epilepsy, can be stressful and exhausting for caregivers, but can also bring significant benefits and rewards. In order to better understand the impact of caregiving on caregivers, Amtmann et al. (2020) developed the University of Washington (UW) Caregiver Stress (UW-CSS) and Caregiver Benefit (UW-CBS) scales using item response theory (IRT) [1]. The UW-CSS and UW-CBS were developed in English for caregivers of healthy children or children with chronic conditions, and both scales demonstrated strong reliability and validity [1]. The measures were also designed to be relevant to caregivers of children with severe epilepsy, as caring for a child with severe epilepsy can have unique challenges. For both instruments, an IRT-based T-score of 50 represents the mean caregiver stress or benefit in a United States (US) community sample of caregivers [1].

Because societies differ in how they support families, and because of cultural differences, the stresses and benefits experienced by caregivers may differ across countries. In order to understand these differences, researchers need culturally and linguistically appropriate measures. The Functional Assessment of Chronic Illness Therapy (FACIT) translation methodology is a rigorous approach that includes forward and backward translation and pretest item review, and is recommended for translation of health outcome measures [2]. The objective of this study was to translate the UW-CSS and UW-CBS into French, German, Italian, and Spanish using FACIT translation methodology and to validate the translations.


Participants and procedures

The UW-CSS and UW-CBS were translated into four languages using forward and backward translation with reconciliation. Semi-structured cognitive interviews (CIs) with caregivers were used to evaluate translated versions. Data collected in a large-scale administration were used to evaluate differential item functioning (DIF) and compare stress and benefit scores between European Union (EU) countries and the USA.


The UW-CSS and UW-CBS translations were conducted by The Academy of Languages Translation and Interpretation Services (AOLTI), which specializes in medical translations. All language translations were back-translated to English by AOLTI. Trained native speakers of each language worked with AOLTI to arrive at the final translations to be tested in CIs.

Cognitive interviews

Trained native speaker interviewers completed semi-structured CIs [3] over the phone or via web teleconference software (e.g., Zoom) with caregivers of children (< 18 years) with epileptic encephalopathies (EE). Interviews were recorded and the interviewers’ notes were used to compile summaries of feedback on each item. Caregivers were defined as a parent or legal guardian who coordinates and provides most of the unpaid day-to-day care for a child. Eligibility criteria included residing in France, Germany, Italy, or Spain, and ability to read, speak and understand French, German, Italian, or Spanish, respectively. Participants were recruited with help from clinicians who see patients with EE and from participants in previous studies [4]. A minimum of five caregivers reviewed each item in each language, with at least one male and two caregivers of younger children with EE (< 9 years). The interviews assessed the comprehension, clarity, and cultural applicability of the items. Items that required significant modifications after CI testing were tested in a second round of interviews with at least three participants. Two additional German caregivers of healthy children were recruited due to difficulties translating the term “caregiving” into German. Caregivers also completed a short online survey with demographic and clinical information. Surveys were administered through the REDCap (Research Electronic Data Capture) web-based software platform [5, 6]. Participants provided informed consent and were sent a €43 electronic gift card.

Large scale administration

The final translated and revised UW-CSS and UW-CBS items were administered to a larger sample (N = 400 target sample size) along with demographic and child health questions via an online survey also using REDCap [5, 6]. Adult caregivers (> 18 years) residing in France, Italy, Germany, or Spain and fluent in the native language of the country, and caring for at least one child under age 18 years were eligible. A minimum of 100 caregivers per country was targeted, with additional subsample targets per country: 50 caregivers of a child with EE, 25 caregivers of a child with a chronic health condition, and 25 caregivers of children with no health conditions. Caregivers were recruited from the CI study and by Op4G, a market research organization. Participants recruited by Op4G were not paid, but participants recruited from the CI study were sent a €23 electronic gift card after completing the survey.


Cognitive interviews

Any problematic or confusing items were flagged and addressed. Minor changes were made to the English version to keep content and constructs as consistent as possible across all versions.

Differential item functioning

DIF analyses were conducted using data from the large-scale administration to examine the linguistic and cultural equivalence of the translations. The original US development sample, described in detail in Amtmann et al. [1], was used as a comparison for the analyses. The US development sample included a mix of caregivers of children with EE, Down syndrome, muscular dystrophy, or children with no specific health care needs. Prior to running DIF analyses, unidimensionality of the scales was examined using 1-factor confirmatory factor analysis (CFA) in Mplus version 8.2 [7]. A comparative fit index (CFI) of 0.90 or higher was considered sufficient support for unidimensionality [8]. DIF was assessed for each country individually (e.g., US vs Spain) as well as for the combined sample (i.e., US vs EU) using the program lordif [9] in R [10] with an R² change criterion of 0.02, as is recommended for translation validity analyses [11]. If statistically significant DIF was observed, DIF-adjusted scores were calculated and compared to non-adjusted scores to determine the scale-level impact of DIF [12].
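The R² change criterion can be illustrated with a minimal sketch. It assumes that three nested ordinal logistic regression models have already been fitted for one item (trait only; trait plus group; trait plus group plus their interaction) and that McFadden's pseudo-R² is the measure used. The function names, the log-likelihood values, and the choice to flag on the total change from model 1 to model 3 are illustrative assumptions, not the lordif implementation.

```python
def mcfadden_r2(loglik_model: float, loglik_null: float) -> float:
    """McFadden's pseudo-R2: 1 - LL(model) / LL(intercept-only model)."""
    return 1.0 - loglik_model / loglik_null


def flag_dif(loglik_null, loglik_m1, loglik_m3, threshold=0.02):
    """Flag an item when pseudo-R2 rises by at least `threshold`
    from model 1 (trait only) to model 3 (trait + group + interaction).

    Returns (flagged, r2_change)."""
    change = mcfadden_r2(loglik_m3, loglik_null) - mcfadden_r2(loglik_m1, loglik_null)
    return change >= threshold, change


# Hypothetical log-likelihoods (null, model 1, model 3); made-up numbers.
items = {
    "item_a": (-1200.0, -950.0, -948.0),  # group membership adds almost nothing
    "item_b": (-1200.0, -950.0, -915.0),  # group membership adds a noticeable amount
}
results = {name: flag_dif(*ll) for name, ll in items.items()}
for name, (flagged, change) in results.items():
    print(f"{name}: R2 change = {change:.4f}, DIF flagged = {flagged}")
```

In this toy example only item_b exceeds the 0.02 threshold. A flagged item is a candidate for closer review rather than automatic removal, which is why the scale-level impact of DIF is examined separately.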

US and EU comparisons

Sample demographics were compared using Student’s t-tests or chi-squared tests. UW-CSS and UW-CBS 6-item short form scores were generated and summarized across countries and subgroups. Using the Student’s t-test, stress and benefit scores in EU countries were also compared to scores in the US sample utilized for the DIF analyses.
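As a rough illustration of the score comparisons, the sketch below computes a two-sample Student's t statistic with pooled variance from summary statistics. The function name and all input values are made up for illustration and are not results from this study. A p-value would additionally require the t distribution, which is not computed here.

```python
import math


def student_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df


# Made-up T-score summaries for two groups (not values from this study).
t_stat, df = student_t(mean1=52.0, sd1=10.0, n1=100, mean2=48.0, sd2=10.0, n2=100)
print(f"t = {t_stat:.3f}, df = {df}")
```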

Short Forms

Fixed-length short forms developed by Amtmann et al. [1] were revised based on the results of the CIs and DIF analyses. Items that were identified as problematic were removed and/or replaced with better-functioning items and items without DIF. Internal consistency of the new short forms was examined using Cronbach’s alpha [13], and item convergent validity was examined by calculating corrected item-total score correlations. Alpha values between 0.7 and 0.9 and correlations > 0.40 were considered acceptable [14].
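The two reliability statistics named above can be sketched in a few lines. This is a minimal illustration using only the Python standard library. The helper names and the response data are hypothetical, and sample variances are used throughout.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """items: list of equal-length lists, one list of responses per item."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1.0 - item_var_sum / variance(totals))


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(items):
    """Correlate each item with the sum of the *other* items."""
    out = []
    for i, col in enumerate(items):
        rest = [sum(resp) - resp[i] for resp in zip(*items)]
        out.append(pearson(col, rest))
    return out


# Made-up responses: 3 items (rows) answered by 5 respondents (columns).
responses = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(responses)
item_total = corrected_item_total(responses)
print(f"alpha = {alpha:.3f}")
print("corrected item-total correlations:", [round(r, 3) for r in item_total])
```

The correlation is "corrected" because each item is removed from the total before correlating, so an item is not correlated with itself.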


Cognitive interview study

Interviews were completed with 47 parent caregivers (German n = 14; Spanish n = 10; French n = 13; Italian n = 10) (see Table 1 for sample demographics). All but two participants cared for a child with EE.

Table 1 Demographics of study samples in the European Union and the United States

Based on CI feedback the instructions for both scales were modified in all languages (including English) to clarify that “caregiving” refers to “all aspects” of taking care of a child or children and to take into account how “having a child or children you take care of affects all areas of your life.” Because the instructions define caregiving as “typically unpaid,” in the German translation we added a statement that the government stipend paid to German parents and guardians to help with caregiving (“Pflegegeld”) did not count as paid caregiving when responding to the questions.

The UW-CSS translated items functioned well, although some translations required minor changes to improve comprehension and to clarify meaning. CIs identified issues with three UW-CBS items that did not work well in all languages (being a better advocate, putting life in perspective and feeling closer to other adults) and a fourth item (being more accepting) was problematic in German (Table 2). The concepts in these items were difficult to translate into other languages, and the translated items were difficult for caregivers to understand. Short forms were modified to exclude these problematic items. Several caregivers also felt that the benefit items were repetitive (for example, the benefits of “finding new strengths” and “becoming a stronger person”).

Table 2 Summary of cognitive interview and differential item functioning results for the final translated UW-CSS and UW-CBS items

An additional issue relating to translation of “caregiving” in German was also identified, as there is more than one German term for caregiving. Erziehung and Betreuung are more commonly used and describe the process of raising and educating (Erziehung) and caring for and supporting (Betreuung) a child, respectively. Fürsorge is used to describe caring for a child with a chronic health condition. In addition to the 12 German caregivers of children with EE, two German participants who cared for healthy children with no chronic health conditions were interviewed to get their thoughts on the best word or words to describe “caregiving” in German. Feedback from all 14 German participants indicated that combining the two more common terms into one (i.e., “Erziehung/Betreuung”) would be acceptable.

Large scale administration study

A total of 456 caregivers from France (n = 115), Germany (n = 117), Italy (n = 110), and Spain (n = 114) completed the study (see Table 1 for sample demographics).

Differential item functioning

Unidimensionality of both scales was supported by 1-factor CFA, with CFI values of 0.90 for stress and 0.98 for benefit. DIF analyses identified three stress items with statistically significant DIF in multiple countries and in the overall sample (see Table 2). Similarly, six benefit items displayed DIF in one or more countries or in the combined sample. However, average differences between DIF-adjusted and non-adjusted scores were less than 1 point on the T-score metric for both scales and for all comparisons.
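To make the scale-level comparison concrete, the sketch below converts trait estimates to the T-score metric and averages the difference between DIF-adjusted and non-adjusted scores. It assumes the conventional T-score scaling (mean 50, SD 10); the source states only that 50 is the US community mean. The trait estimates are made up, and using the mean absolute difference is an assumption about how the average was taken.

```python
from statistics import mean


def to_t_score(theta: float) -> float:
    """Convert an IRT trait estimate (theta, mean 0, SD 1) to the T-score
    metric, assuming the conventional scaling of mean 50 and SD 10."""
    return 50.0 + 10.0 * theta


# Made-up trait estimates for five respondents, scored twice:
# once ignoring DIF and once with DIF-adjusted item parameters.
theta_unadjusted = [-0.50, 0.10, 0.80, 1.20, -1.00]
theta_adjusted = [-0.46, 0.13, 0.78, 1.25, -1.04]

differences = [
    to_t_score(adj) - to_t_score(raw)
    for adj, raw in zip(theta_adjusted, theta_unadjusted)
]
mean_abs_diff = mean(abs(d) for d in differences)
print(f"Mean absolute difference: {mean_abs_diff:.2f} T-score points")
```

With an SD of 10 on this metric, a difference below 1 point corresponds to less than one tenth of a standard deviation.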

US and EU score comparisons

Sample demographic differences are shown in Table 1. The epilepsy and community subsamples in the EU reported less stress than corresponding US subsamples (both p’s < 0.001) (see Table 3). The overall EU sample and community subsamples also reported less benefit than parents in the US (both p’s < 0.001). Additional comparisons between samples and subgroups are shown in Table 3.

Table 3 Caregiver stress and benefit scores and comparisons across subgroups and countries in the European Union and the United States

Short forms

Previously published short forms were modified by excluding problematic items to minimize the impact of DIF on the UW-CSS and UW-CBS T-scores and to reflect feedback from CIs. New 6-item and 3-item short forms are recommended as indicated in Table 2. Correlations between the 6-item and 3-item UW-CSS short forms scores and the full bank were 0.96 and 0.94, respectively. Similarly, correlations for UW-CBS were 0.97 and 0.92. Cronbach’s alpha values for the 6-item short forms were 0.83 for stress and 0.85 for benefit. Corrected item-total correlations ranged from 0.52 to 0.66 for stress and 0.56 to 0.74 for benefit.


The UW-CSS and UW-CBS were translated into French, Italian, German, and Spanish using rigorous methods. CI feedback resulted in minor changes to the English version to improve functioning of items and to harmonize items across all languages. CIs also identified benefit items that were difficult to translate. The items “put life into perspective” and “be a better advocate” were both difficult to translate into each of the four languages and hard to understand conceptually; caregivers often said they did not see how these concepts were related to caregiving. For the item “feel closer to other adults who are important to you,” caregivers did not know who was meant (for example, their partner or their parents) or whether the item referred to spending time with these people or feeling connected to them. This feedback indicated important cultural differences between US and EU countries, and perhaps explains why EU samples generally report less caregiver benefit. To address this difference, new short forms were created that prioritize items that did not show bias by country, to minimize any potential DIF impact. All language versions of the short forms for the UW-CSS and UW-CBS are free and publicly available, including user guides with scoring [15]. The 6-item short forms are recommended in most situations. The 3-item short forms minimize respondent burden and are appropriate for group-level comparisons.

The primary limitation of this study is the limited power to detect DIF by country, together with the relatively small number of caregivers in specific subgroups in each country. The EU sample sizes for DIF by country were below those recommended to detect DIF [16], and additional DIF may exist that was not detected because of sample size limitations. We also note that the DIF detected may be partially related to differences in education and child health status between the samples and not just reflective of linguistic differences, though no DIF was detected by education or child diagnosis in the original validation study [1]. An additional limitation is that the study included only a limited number of health conditions, and cognitive interviews were primarily conducted with caregivers of children with epilepsy. Future studies should further validate the measures in health conditions not included in this study, and could further explore the responsiveness of the scales and the nature of the differences in caregiver benefit between US and EU caregivers.

This study developed translations of UW-CSS and UW-CBS in four EU languages. New English, Spanish, Italian, French and German short forms were also created. The scales can be used in research and clinical practice to examine caregiver stress and benefit experienced by caregivers of healthy children, as well as caregivers of children with health conditions, including severe epilepsy.

Availability of data and materials

The datasets analyzed during the current study are available from the corresponding author on reasonable request.



UW-CSS: University of Washington Caregiver Stress Scale

UW-CBS: University of Washington Caregiver Benefit Scale

US: United States

CI: Cognitive interview

DIF: Differential item functioning

IRT: Item response theory

FACIT: Functional Assessment of Chronic Illness Therapy

EU: European Union

AOLTI: Academy of Languages Translation and Interpretation Services

EE: Epileptic encephalopathies

SD: Standard deviation


  1. Amtmann D, Liljenquist KS, Bamer A, Gammaitoni AR, Aron CR, Galer BS et al (2020) Development and validation of the University of Washington caregiver stress and benefit scales for caregivers of children with or without serious health conditions. Qual Life Res 29(5):1361–1371

  2. Eremenco SL, Cella D, Arnold BJ (2005) A comprehensive method for the translation and cross-cultural validation of health status questionnaires. Eval Health Prof 28(2):212–232

  3. Willis G (2005) Cognitive interviewing: a tool for improving questionnaire design. Sage Publications, Thousand Oaks

  4. Jensen MP, Liljenquist KS, Bocell F, Gammaitoni AR, Aron CR, Galer BS et al (2017) Life impact of caregiving for severe childhood epilepsy: results of expert panels and caregiver focus groups. Epilepsy Behav 74:135–143

  5. Harris PA, Taylor R, Minor BL, Elliott V, Fernandez M, O’Neal L et al (2019) The REDCap consortium: building an international community of software platform partners. J Biomed Inform 95:103208

  6. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG (2009) Research electronic data capture (REDCap)—a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform 42(2):377–381

  7. Muthén LK, Muthén BO (1998–2012) Mplus user’s guide, 7th edn. Muthén & Muthén, Los Angeles

  8. Hu LT, Bentler PM (1999) Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Model 6:1–55

  9. Choi SW, Gibbons LE, Crane PK (2011) lordif: an R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. J Stat Softw 39(8):1–30

  10. R Core Team (2020) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

  11. PROMIS® (2014) Minimum requirements for the release of PROMIS instruments after translation and recommendations for further psychometric evaluation. Retrieved April 14, 2021, from

  12. Crane PK, Gibbons LE, Narasimhalu K, Lai J, Cella D (2007) Rapid detection of differential item functioning in assessments of health-related quality of life: the functional assessment of cancer therapy. Qual Life Res 16(1):101–114

  13. Cronbach LJ (1951) Coefficient alpha and the internal structure of tests. Psychometrika 16(3):297–334

  14. Streiner DL, Norman GR (2002) Health Measurement Scales: a practical guide to their development and use, 3rd edn. Oxford Medical Publications, Oxford

  15. University of Washington Center on Outcomes Research in Rehabilitation (2021) Measures. Retrieved June 15, 2021, from

  16. Scott NW, Fayers PM, Aaronson NK, Bottomley A, de Graeff A, Groenvold M et al (2009) A simulation study provided sample size guidance for differential item functioning (DIF) studies using short scales. J Clin Epidemiol 62(3):288–295


Not applicable.


The contents of this manuscript were developed under a grant awarded to the University of Washington from Zogenix, Inc. (Contract Number #ZXIIS2015-005).

Author information

Authors and Affiliations



All authors contributed to the study conception and design. Data analyses were performed by DA, AB, RS, and MJ. The first draft of the manuscript was written by AB and RS and all authors provided feedback on or edits to the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Dagmar Amtmann.

Ethics declarations

Ethics approval and consent to participate

All procedures performed were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. The study was approved by the Institutional Review Board at the University of Washington. Freely-given, written informed consent was obtained from all participants.

Consent for publication

Not applicable.

Competing interests

Authors DA, RS, AB, and MJ receive salary support from grant funding from Zogenix, Inc. Authors AG and BG are employed by Zogenix, Inc., Emeryville, CA.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Amtmann, D., Bamer, A.M., Salem, R. et al. Translation and cultural validation of the University of Washington Caregiver Stress and Benefit Scales. J Patient Rep Outcomes 5, 113 (2021).
