Psychometrics of three Swedish physical pediatric item banks from the Patient-Reported Outcomes Measurement Information System (PROMIS)®: pain interference, fatigue, and physical activity
Journal of Patient-Reported Outcomes volume 5, Article number: 105 (2021)
The Patient-Reported Outcomes Measurement Information System (PROMIS®) aims to provide self-reported item banks for several dimensions of physical, mental and social health. Here we investigate the psychometric properties of the Swedish pediatric versions of the Physical Health item banks for pain interference, fatigue and physical activity, which can be used in school health care and other clinical pediatric settings. Physical health is increasingly recognized as important for teenagers’ well-being because of its links to several somatic and mental conditions. The item banks are not yet available in Sweden.
Participants aged 12 to 19 years (n = 681) were recruited in public school settings and at a child- and adolescent psychiatric outpatient clinic. Three one-factor confirmatory factor analysis (CFA) models were fitted to evaluate scale dimensionality. We analyzed monotonicity and local independence. The items were calibrated by fitting the graded response model. Differential item functioning (DIF) analyses were performed for age, gender and language.
The three one-factor models supported that each item bank measures a unidimensional construct. No violations of monotonicity and no local dependence were found. Eleven items showed significant lack of fit in the item response theory (IRT) analyses. The results also showed DIF for age (seven items) and language (nine items). However, the differences in item fit and the McFadden effect sizes were mostly negligible. After considering the analytic results, graphical illustrations, item content and clinical relevance, we decided to keep all items in the item banks.
We translated the U.S. PROMIS item banks for pain interference, fatigue and physical activity into Swedish and validated them by applying CFA, IRT and DIF analyses. The results suggest that the translations are psychometrically adequate. The questionnaires can be used in school health care and other pediatric care. Future studies could use computerized adaptive testing (CAT), which presents fewer items to the respondent than classical testing while maintaining reliability.
Physical inactivity has implications both for somatic medical conditions and for mental health in teenagers. A sedentary lifestyle is linked to the development of several medical conditions, such as heart disease and type 2 diabetes [1, 2], that increase the risk of mental health problems and shorten lifespan by 3–5 years . Physical inactivity is also directly linked to psychiatric symptoms and disorders, such as major depressive disorder, independent of somatic medical conditions [4, 5]. Since adolescence is a critical developmental phase for the establishment of behavioral habits , the level of physical activity during this period may have long-term implications for future levels of physical activity [7,8,9].
Chronic pain, defined as persistent or recurring pain lasting 3 months or more , occurs in adolescents with prevalence rates of up to 30% . Chronic pain is debilitating and impacts daily functioning, which is here described by the concept of pain interference. Many teenagers with chronic pain complain about fatigue . Fatigue, in a clinical sense, is defined as an overwhelming, incapacitating, and sustained sense of exhaustion that diminishes one’s ability to perform daily activities . It is a subjective feeling of tiredness which can be either acute or chronic. Prevalence rates vary from 2 to 21% in this age group [8, 9]. Fatigue can be described conceptually either as the experience of fatigue or as its impact on physical capacity, cognitive function, and social activities [11, 12]. In this article we use the latter concept of fatigue.
Chronic pain and fatigue often emerge at the onset of puberty and are often linked to decreased physical activity, creating a multi-directional causal relationship . We therefore consider it important to monitor physical activity, chronic pain and fatigue in schools [10, 14] and in pediatric clinical settings [5, 15], and to provide validated measures of all three constructs for safer diagnostics and treatments.
The National Institutes of Health (NIH) has identified a need for patient-reported outcome measures that are better validated, more dynamic, and developed with modern test methodology (www.healthmeasures.net). The pediatric Patient-Reported Outcomes Measurement Information System (PROMIS®) item banks were initially developed through an extensive review of research, expert review of items, qualitative methods with focus groups reviewing items  and cognitive interviewing of children . The PROMIS item banks of pain interference [18,19,20], fatigue [21, 22], and physical activity [23,24,25] have recently been implemented internationally [23, 26, 27], but are not yet available in Sweden.
Several pediatric scales have been developed using classical test theory to measure pain (e.g. the Faces Pain Scale-Revised ), fatigue (e.g. the pediatric Functional Assessment of Chronic Illness Therapy-Fatigue, pedsFACIT-F ) and physical activity (e.g. the Physical Activity questionnaire for Older Children, PAC-C ). Modern test methodology, such as item response theory (IRT), has recently been introduced [21, 23, 29]; it calibrates items and respondents onto the same metric, regardless of which latent trait is being measured. Unlike classical methods, precise measurement may require only a few items, because the calibration or weighting of each item is built into the results. In computerized adaptive testing (CAT), the answer to one question is used to select the next question, namely the one that will most reduce the error of the estimated score. With CAT, respondents do not need to answer the same items as each other to produce comparable scores: different questions within the same item bank can be used to arrive at a score for that domain. Thus, IRT techniques minimize the number of items presented to each respondent and further prevent test fatigue, because different questions can be answered at each test occasion.
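To make the CAT idea concrete, the sketch below selects items by maximum Fisher information under Samejima's graded response model and updates an expected a posteriori (EAP) θ estimate after each answer. The three-item bank, its parameters, the N(0, 1) prior, the quadrature grid and the fixed-length stopping rule are all hypothetical choices for illustration; operational PROMIS CAT engines use calibrated PROMIS parameters and their own stopping rules.

```python
import math

# Hypothetical GRM parameters: item -> (slope a, thresholds b1..b4 for 5 categories)
BANK = {
    "A": (2.5, [-1.0, -0.3, 0.3, 1.0]),
    "B": (1.0, [-1.0, -0.3, 0.3, 1.0]),
    "C": (2.5, [2.0, 2.5, 3.0, 3.5]),
}
GRID = [-4.0 + 0.1 * i for i in range(81)]  # quadrature points for the EAP estimate


def boundary(theta, a, bs):
    """P*(X >= k) for k = 0..m, padded with 1 (k = 0) and 0 (k = m)."""
    return [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]


def category_probs(theta, a, bs):
    """P(X = k) as the difference between adjacent boundary curves."""
    p = boundary(theta, a, bs)
    return [p[k] - p[k + 1] for k in range(len(bs) + 1)]


def item_information(theta, a, bs):
    """Fisher information of a GRM item: sum over categories of (dP_k/dtheta)^2 / P_k."""
    p = boundary(theta, a, bs)
    d = [a * q * (1.0 - q) for q in p]  # derivatives of the boundary curves (0 at the pads)
    return sum((d[k] - d[k + 1]) ** 2 / (p[k] - p[k + 1])
               for k in range(len(bs) + 1) if p[k] - p[k + 1] > 1e-12)


def eap(responses, bank):
    """Posterior mean of theta under a N(0, 1) prior, given {item: category}."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for item, x in responses.items():
            a, bs = bank[item]
            w *= category_probs(t, a, bs)[x]
        weights.append(w)
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)


def run_cat(bank, answer, max_items=3):
    """Administer the most informative unused item, re-estimate theta, repeat."""
    responses, theta = {}, 0.0
    while len(responses) < min(max_items, len(bank)):
        unused = [i for i in bank if i not in responses]
        item = max(unused, key=lambda i: item_information(theta, *bank[i]))
        responses[item] = answer(item)  # observed category, 0..4
        theta = eap(responses, bank)
    return theta, list(responses)
```

For example, `run_cat(BANK, lambda item: 4, max_items=2)` simulates a respondent who always chooses the highest category. Item "A" is administered first because, at the starting estimate θ = 0, it has both the steepest slope and thresholds centred on 0, so it carries the most information there.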
This study is part of a Swedish PROMIS cooperative research group  aiming to translate and standardize PROMIS measures across global initiatives and settings. We work to create a shared unified terminology and metric to report common symptoms and functional life domains. PROMIS item banks offer great potential for improving Swedish and global assessment in clinical trials and evaluation of treatment and health care in clinical settings.
In this study, we validated the Swedish translations of three PROMIS pediatric item banks. The PROMIS pediatric pain interference item bank has been used in studies of child and adolescent populations, for example with juvenile fibromyalgia and sickle cell disease [17,18,19,20, 32], and has shown good psychometric properties. The PROMIS pediatric fatigue item bank has previously been applied in several studies of child and adolescent populations [20,21,22]. In an IRT study, Lai et al.  showed that the fatigue scale demonstrated satisfactory psychometric properties after two items were removed. The PROMIS pediatric physical activity item bank [23,24,25] has also previously been shown to be a precise and valid measure of children’s lived experiences of physical activity .
The Swedish versions of the item banks need to be validated to ensure that the quality and consistency of the original English PROMIS versions are maintained. The aim of this study was to validate three item banks in a Swedish population: the PROMIS Pediatric Pain Interference v.2.0, Pediatric Fatigue v.2.0 and Pediatric Physical Activity v.1.0 item banks. These item banks were recently translated into Swedish .
The study was conducted in the northern part of Sweden and was approved by the Regional Swedish Ethical Review Board in Umeå (number 2018/59-31). The authors have been working with the PROMIS Health Organization since 2016. Authorization to translate the item banks was granted in the fall of 2016.
Adolescents (n = 681) were recruited between September 2018 and May 2019 from four community high schools (n = 638) and one child- and adolescent psychiatric (CAP) clinic (n = 43). To be eligible for the study, participants had to be fluent in spoken and written Swedish. Oral and written informed consent was gathered from participants and their parents (for children under 15 years).
All participants completed the online survey in approximately 30–45 min and received a gift card for their participation.
High-school students (n = 897) and CAP patients (n = 160) were asked to participate, and 71% of the high-school students (n = 638) and 27% (n = 43) of the CAP clinic patients agreed to participate, which rendered a total sample of 419 girls and 262 boys between 12 and 19 years of age (M = 15.75, SD = 1.77). Most participants were of Swedish origin (91%). The socioeconomic status of the households was distributed as follows: 17% manual workers, 28% clerical or office workers, 32% higher civil servants and executives, 7% self-employed of different kinds, 1% students, and 15% unknown. A subset of the adolescents (n = 238 girls and n = 110 boys, mean age 15.39, SD = 1.68) was invited for retesting approximately 3 weeks after the first assessment.
US sample for DIF analyses
For comparative analyses of language, a US sample  was used in the DIF analyses. Only the variables analyzed in the present article were extracted from it. US data were available only for the pain interference and fatigue PROMIS item banks. The sample consisted of N = 356 adolescents (173 girls) between 12 and 17 years of age (M = 14.70, SD = 1.72). All participants had one of several medical conditions (19% cancer, 40% kidney problems, 15% rheumatic conditions, and 26% sickle cell anemia). The sample has been described in further detail elsewhere .
Translation and adaptation of the item banks
The Functional Assessment of Chronic Illness Therapy (FACIT) Multilingual Translation Methodology [34, 35] was used for translation, with some modifications. Forward translation, reconciliation, expert reviews, back-translation, cognitive debriefing, and pilot testing were performed. For more details, see Blomqvist et al. [29, 31]. See Fig. 1 for an overview of the Swedish translation and adaptation processes. The current translated item banks are found in the step “Reports of validation” in Fig. 1.
The Patient-Reported Outcomes Measurement Information System consists of item banks measuring generic health . In the present study, the item banks for pain interference, fatigue, and physical activity were used.
PROMIS Pediatric Pain Interference v.2.0. 
The pain interference questionnaire measures the perceived extent to which pain has disrupted daily living over the last 7 days. It consists of 20 items rated on a 5-point scale ranging from 1 (never) to 5 (almost always).
PROMIS Pediatric Fatigue v.2.0. 
The fatigue questionnaire measures how tired the child has felt during the last 7 days. Its 25 items are rated on a 5-point scale ranging from 1 (never) to 5 (almost always).
PROMIS Pediatric Physical Activity v.1.0.
The physical activity questionnaire measures how much physical activity the child has had during the last 7 days. Its 10 items are rated on a 5-point scale of 1 (no days), 2 (1 day), 3 (2–3 days), 4 (4–5 days), and 5 (6–7 days), except for one item (“On a usual day, how physically active were you?”), which is answered with 1 (Not at all), 2 (A little bit), 3 (Somewhat), 4 (Quite a bit), or 5 (Very much).
Statistical and psychometric methods
The analyses were performed in IBM SPSS, Version 26.0 and in R . Psychometric calculations followed the method described in Reeve et al., . First, descriptive statistics were calculated. Thereafter, corrected item-total correlations (ritc) were estimated. A correlation below 0.3 indicates that the corresponding item does not correlate well with the overall scale and should be removed . The reliability of the scales was calculated using Cronbach’s α (good internal consistency is proposed to be between 0.70 and 0.90 ). Further, the IRT Test Information Function (TIF), Item Information Curves (IIC) and Standard Errors (SE) were calculated. The TIF is inversely related to the SE, $SE(\theta) = 1/\sqrt{TIF(\theta)}$, so the smaller the SE, the better the reliability. On the θ metric, reliability can be approximated as $r = 1 - SE^2$: an SE of 0.32 corresponds to a reliability of about 0.90 ($1 - 0.32^2 \approx 0.90$), and an SE of 0.30 to $1 - 0.3^2 = 1 - 0.09 = 0.91$ .
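A minimal sketch of the two classical statistics named above, Cronbach’s α and the corrected item-total correlation (each item correlated with the sum of the remaining items), for complete data in pure Python. The toy data, in which the last item is deliberately reverse-keyed, and the 0.3 flagging line are illustrations only; this is not the SPSS routine the authors used.

```python
import math
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (complete data)."""
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(rows):
    """Correlation of each item with the sum of the other items (the item itself is excluded)."""
    totals = [sum(r) for r in rows]
    out = []
    for j in range(len(rows[0])):
        item = [r[j] for r in rows]
        rest = [t - x for t, x in zip(totals, item)]
        out.append(pearson(item, rest))
    return out


# Toy data: 5 respondents x 4 items; the last item runs against the others
ROWS = [[1, 2, 1, 5], [2, 2, 3, 4], [3, 4, 3, 3], [4, 4, 5, 2], [5, 5, 4, 1]]
flagged = [j for j, r in enumerate(corrected_item_total(ROWS)) if r < 0.3]
```

Here `flagged` contains only the index of the reverse-keyed item, and that item also drags α below zero, which is why both checks are usually inspected together.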
We performed a test–retest analysis with 3 weeks between the tests, and agreement was measured with intraclass correlation coefficients (ICCs) using a two-way fixed effects model . Values below 0.40 were considered poor, values from 0.40 to 0.75 fair to good, and values greater than 0.75 excellent, according to the criteria of Fleiss .
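The sketch below computes the two-way fixed-effects (consistency) ICCs of Shrout and Fleiss from an n × k table of scores (here k = 2 test occasions). It is an illustrative implementation, not the SPSS routine the authors used, and the data in the usage line are made up. For the average-measures form, ICC(3,k) = 1 − 1/F with F = BMS/EMS; applied to the F values reported in the Results (6.07, 9.04, 6.94), this identity gives 0.84, 0.89 and 0.86, which suggests, although the article does not state it, that the average-measures ICC was reported.

```python
from statistics import mean


def icc_two_way_fixed(scores):
    """scores: n subjects x k occasions. Returns (ICC(3,1), ICC(3,k), F) per Shrout & Fleiss."""
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(col) for col in zip(*scores)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_error = max(ss_total - ss_rows - ss_cols, 0.0)        # guard against rounding below 0
    bms = ss_rows / (n - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    icc_single = (bms - ems) / (bms + (k - 1) * ems)
    icc_average = (bms - ems) / bms
    f_value = bms / ems if ems > 0 else float("inf")
    return icc_single, icc_average, f_value


# Made-up test and retest scores for five respondents
single, average, f = icc_two_way_fixed([[10, 11], [14, 13], [20, 21], [7, 8], [16, 16]])
```

The consistency form ignores a uniform shift between occasions (everyone scoring 2 points higher at retest still gives ICC = 1), which matches the "fixed effects" choice described in the text.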
Before using IRT, we checked for unidimensionality (all items must load on a single factor) in each item bank with three single-factor Confirmatory Factor Analyses (CFA) of the inter-item polychoric correlation matrices, as recommended by Reeve . Because the data were non-normally distributed and ordinal, we used the diagonally weighted least squares estimator with robust standard errors  in the R package lavaan for structural equation modeling, version 0.6-3 . The goodness-of-fit indices used were the Comparative Fit Index (CFI), Tucker Lewis Index (TLI), Root Mean Square Error of Approximation (RMSEA) and Standardized Root Mean Square Residual (SRMR). Following the recommendations of Hu and Bentler  and the PROMIS analysis plan , we used the criteria CFI > 0.95, TLI > 0.95, RMSEA < 0.06 and SRMR < 0.08 for unidimensionality.
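As a rough check on reported fit statistics, the RMSEA point estimate can be computed from the model χ², its degrees of freedom and the sample size, and each index compared with the cut-offs above. This is the ordinary unscaled formula; lavaan’s robust DWLS output may be based on a scaled χ², so values need not match in general. Plugging in the fatigue and physical activity χ² values reported in the Results with N = 681 reproduces the reported RMSEA of 0.10 and 0.16 to two decimals. The helper names are hypothetical.

```python
import math

# Cut-offs quoted in the text (Hu & Bentler; PROMIS analysis plan)
CUTOFFS = {"CFI": (">", 0.95), "TLI": (">", 0.95), "RMSEA": ("<", 0.06), "SRMR": ("<", 0.08)}


def rmsea(chi2, df, n):
    """Unscaled RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def check_fit(indices):
    """Map each reported index to True if it meets its cut-off."""
    result = {}
    for name, value in indices.items():
        direction, cut = CUTOFFS[name]
        result[name] = value > cut if direction == ">" else value < cut
    return result


# Fatigue model from the Results: chi2(275) = 2100.35, N = 681
fatigue = {"CFI": 0.96, "TLI": 0.96, "RMSEA": round(rmsea(2100.35, 275, 681), 2), "SRMR": 0.06}
verdict = check_fit(fatigue)
```

`verdict` marks CFI, TLI and SRMR as meeting their cut-offs and RMSEA as not, mirroring the pattern discussed later in the article.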
Monotonicity and local independence
We assessed monotonicity and local independence using a non-parametric IRT approach, Mokken scale analysis, with the R package mokken (version 3.0.3) . Coefficients of homogeneity (H) were examined; item values of 0.30 or above and a total-scale value of at least 0.50 were taken to indicate monotonicity . Local independence was checked by conditional association and reported as true/false values; if all values are true, the items show local independence.
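The homogeneity coefficients can be sketched directly from their definition for polytomous items: H for an item pair is the observed covariance divided by the largest covariance attainable given the two items’ marginal distributions (obtained by pairing the sorted scores), and item-level and scale-level H sum these numerators and denominators. This pure-Python version only illustrates that definition; it is not the mokken package’s code and omits standard errors and the conditional-association checks.

```python
from statistics import mean


def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def max_cov(x, y):
    """Largest covariance possible with the same marginals: pair the sorted scores."""
    return cov(sorted(x), sorted(y))


def scalability(rows):
    """Return (item-level H_j list, scale-level H) for complete polytomous data."""
    cols = [list(c) for c in zip(*rows)]
    k = len(cols)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    num = {p: cov(cols[p[0]], cols[p[1]]) for p in pairs}
    den = {p: max_cov(cols[p[0]], cols[p[1]]) for p in pairs}
    item_h = []
    for i in range(k):
        mine = [p for p in pairs if i in p]
        item_h.append(sum(num[p] for p in mine) / sum(den[p] for p in mine))
    scale_h = sum(num.values()) / sum(den.values())
    return item_h, scale_h


def meets_criteria(item_h, scale_h, item_min=0.30, scale_min=0.50):
    """Apply the item (>= 0.30) and total-scale (>= 0.50) thresholds quoted in the text."""
    return all(h >= item_min for h in item_h) and scale_h >= scale_min
```

A perfect Guttman pattern gives H = 1, and a pair of items running in opposite directions gives a negative H, which fails both thresholds.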
Graded response models
In addition, the items were fitted with the graded response model  using the R package ltm . The discrimination (slope) and difficulty (threshold) parameters were calculated for each item. The four threshold parameters (beta coefficients for the five response alternatives) indicate the level of pain interference, fatigue, or physical activity at which a response in a particular category or higher becomes likely. The goodness of fit of the IRT model (item fit) was examined using the S-χ2 statistic for polytomous response data . A non-significant value (p > 0.001 ) indicated adequate fit of the model to the data.
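Under Samejima’s graded response model, each threshold b_k is the θ at which P(X ≥ k) = 0.5, category probabilities are differences of adjacent boundary curves, and item information is Σ_k (P_k′)² / P_k. The sketch below uses these standard formulas with hypothetical item parameters to show how item information sums to the TIF and how SE(θ) = 1/√TIF(θ) maps onto r = 1 − SE². It is not the ltm implementation, and the logistic form without a 1.7 scaling constant is an assumption.

```python
import math


def boundary(theta, a, bs):
    """P*(X >= k | theta), padded with 1 for k = 0 and 0 for k = m."""
    return [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]


def category_probs(theta, a, bs):
    p = boundary(theta, a, bs)
    return [p[k] - p[k + 1] for k in range(len(bs) + 1)]


def item_information(theta, a, bs):
    p = boundary(theta, a, bs)
    d = [a * q * (1.0 - q) for q in p]  # slopes of the boundary curves
    return sum((d[k] - d[k + 1]) ** 2 / (p[k] - p[k + 1])
               for k in range(len(bs) + 1) if p[k] - p[k + 1] > 1e-12)


def standard_error(theta, items):
    """SE(theta) = 1 / sqrt(TIF(theta)), where TIF is the sum of item informations."""
    tif = sum(item_information(theta, a, bs) for a, bs in items)
    return 1.0 / math.sqrt(tif)


def reliability_from_se(se):
    """r = 1 - SE^2 on the theta metric (population SD of theta = 1)."""
    return 1.0 - se * se


# Hypothetical 5-category items: (slope a, thresholds b1..b4)
ITEMS = [(2.8, [-0.5, 0.2, 0.9, 1.6]), (2.1, [-0.2, 0.5, 1.2, 2.0]), (1.4, [0.0, 0.8, 1.5, 2.3])]
```

Adding items can only increase the TIF, so `standard_error(0.5, ITEMS)` is smaller than the SE from the first item alone; this is the sense in which some items in a bank contribute more precision than others.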
Differential item function (DIF)
DIF for gender, age (median split), and language (Swedish translation vs US original pediatric PROMIS item banks of pain interference and fatigue)  was calculated for each item on each scale using the IRT likelihood ratio DIF approach  with LR χ2 item fit statistics, as implemented in the R package mirt . The Benjamini–Hochberg procedure  was used to control for multiple comparisons in DIF (see Table 2). McFadden’s R2 was used to evaluate DIF when it was detected (> 2%) . McFadden’s R2 can be interpreted as < 0.035 = negligible DIF, 0.035–0.07 = moderate DIF, and > 0.07 = large DIF . For items with significant DIF, the level of the effect size was evaluated in tables and graphically using methods outlined by Steinberg and Thissen .
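Two bookkeeping steps from the DIF procedure are easy to sketch: Benjamini–Hochberg adjustment of the per-item p-values, and classification of the McFadden R² effect size with the cut-offs given above. Computing McFadden’s R² as 1 − LL_model/LL_null and taking the DIF effect size as the change in R² between nested models is an assumption about the workflow (it is how logistic-regression DIF tools commonly define it), and the log-likelihoods in the usage line are invented.

```python
def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down, enforcing monotonicity
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def mcfadden_r2(loglik_model, loglik_null):
    return 1.0 - loglik_model / loglik_null


def dif_effect_size(loglik_null, loglik_base, loglik_dif):
    """Change in McFadden R^2 when group terms are added (assumed workflow)."""
    return mcfadden_r2(loglik_dif, loglik_null) - mcfadden_r2(loglik_base, loglik_null)


def classify_dif(delta_r2):
    """Cut-offs from the text: < 0.035 negligible, 0.035-0.07 moderate, > 0.07 large."""
    if delta_r2 < 0.035:
        return "negligible"
    if delta_r2 <= 0.07:
        return "moderate"
    return "large"


# Invented log-likelihoods: null, item model without group terms, item model with group terms
label = classify_dif(dif_effect_size(-1000.0, -700.0, -690.0))
```

With these invented numbers the R² change is 0.01, so `label` is "negligible"; an item can therefore be statistically significant after adjustment yet carry a negligible effect, which is the pattern reported in the Results.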
We transformed the theta scores into T-scores as recommended by PROMIS, using the formula T = 10θ + 50. A T-score of 50 (SD = 10) corresponds to the mean of the PROMIS reference population.
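Because the T-score transformation is linear, a standard error on the θ metric scales by the same factor of 10. The sketch below converts a θ estimate and its SE to the T metric and forms an approximate 95% interval; the ±1.96 SE interval is a common convention assumed here, not something the article reports.

```python
def theta_to_t(theta, se=None):
    """T = 10 * theta + 50; an SE on the theta metric is multiplied by 10."""
    t = 10.0 * theta + 50.0
    if se is None:
        return t
    se_t = 10.0 * se
    # Approximate 95% interval (assumed convention, not from the article)
    return t, se_t, (t - 1.96 * se_t, t + 1.96 * se_t)


def t_to_theta(t):
    """Inverse transformation back to the theta metric."""
    return (t - 50.0) / 10.0
```

For example, θ = −0.34 with SE = 0.30 maps to T ≈ 46.6 with a T-metric SE of 3.0.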
Descriptive statistics and confirmatory factor analysis
The data showed good range and response distribution within the items. Descriptive statistics are shown in Table 1. Missing data analysis showed 0.3% missing data in each of the three item banks. Missing values were imputed using linear regression, assuming the data were missing at random.
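The article does not specify the regression model used for imputation. As one hedged possibility, the sketch below regresses a single item on the mean of each respondent’s other observed items (fitted on the respondents who answered that item) and rounds the prediction to the nearest valid category. The predictor choice, the rounding and the 1–5 clipping are all assumptions for illustration.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def impute_item(rows, j, lo=1, hi=5):
    """Return a copy of rows with None in column j replaced by a regression prediction
    from the respondent's mean on the other observed items (hypothetical model)."""
    def rest_mean(row):
        return mean(v for k, v in enumerate(row) if k != j and v is not None)

    observed = [r for r in rows if r[j] is not None]
    intercept, slope = fit_line([rest_mean(r) for r in observed], [r[j] for r in observed])
    filled = []
    for row in rows:
        row = list(row)  # copy so the input is not modified
        if row[j] is None:
            row[j] = min(hi, max(lo, round(intercept + slope * rest_mean(row))))
        filled.append(row)
    return filled
```

Rounding keeps imputed values on the 1–5 response scale so that they can enter the ordinal CFA and IRT models, at the cost of slightly understating uncertainty compared with multiple imputation.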
Corrected item-total correlations (ritc) were greater than 0.3 in the total sample (ranging from 0.52 to 0.85) and in the male and female subsamples (0.62 to 0.88 vs. 0.46 to 0.86, respectively). The corresponding items correlated well with the overall scales.
The internal consistency (Cronbach’s α) was very high for all three item banks: pain interference (α = 0.97, 95% CI [0.97, 0.97]), fatigue (α = 0.97, 95% CI [0.97, 0.97]) and physical activity (α = 0.94, 95% CI [0.93, 0.94]).
Test–retest reliability was calculated using a subsample of n = 348 adolescents (55% of the original sample of N = 638 answered the questionnaire again 3 weeks later). The test–retest ICCs were 0.84 for the total score of the pain interference item bank (95% CI 0.80, 0.87; F = 6.07; p ≤ 0.001), 0.89 for the fatigue item bank (95% CI 0.86, 0.91; F = 9.04; p ≤ 0.001), and 0.86 for the physical activity item bank (95% CI 0.82, 0.88; F = 6.94; p ≤ 0.001). Based on the criteria of Fleiss , the ICCs were considered excellent.
The three single-factor CFAs supported unidimensionality within each scale. The results were as follows: χ2 (1375) = 2768.09, CFI = 0.98, TLI = 0.98, RMSEA = 0.08, 90% CI [0.07, 0.09], SRMR = 0.05 for pain interference; χ2 (275) = 2100.35, CFI = 0.96, TLI = 0.96, RMSEA = 0.10, 90% CI [0.10, 0.10], SRMR = 0.06 for fatigue; and χ2 (35) = 626.45, CFI = 0.98, TLI = 0.97, RMSEA = 0.16, 90% CI [0.15, 0.17], SRMR = 0.04 for physical activity. The goodness-of-fit indices showed a good fit of the models to the data, except for the RMSEA, which indicated a moderate fit, and a relatively poor fit (0.16) for physical activity. All items had standardized factor loadings greater than 0.40 (pain interference, 0.81 to 0.94; fatigue, 0.71 to 0.90; physical activity, 0.59 to 0.91; factor loadings are available on request). Moreover, the items were conditionally independent in the model, with no pairs of items showing significant residual correlations.
Monotonicity and local independence
The basic IRT assumptions were evaluated and supported monotonicity (H for pain interference items ranged from 0.59 to 0.70 [total scale H = 0.68], for fatigue items from 0.53 to 0.69 [total scale H = 0.63] and for physical activity items from 0.48 to 0.72 [total scale H = 0.65]), and local independence was found among the items.
Graded response models
The item parameter estimates and the χ2 mean square item fit statistics are shown in Table 2. In this table, the items are sorted in order of decreasing discrimination (a), so the generally best indicators of pain interference, fatigue, and physical activity are near the top of each section. The best and worst discriminating items are shown as category characteristic curves in Fig. 2.
For the pain interference items, five items exhibited significant lack of fit as indicated by the S-χ2 item fit statistic after Benjamini–Hochberg correction for multiplicity (p < 0.001, χ2 ranged from 503.88 to 754.07, df = 391) (Table 2). For the fatigue items, three items showed significant lack of fit (p < 0.05, χ2 ranged from 887.04 to 1232.74, df = 636), and for the physical activity items, three items showed significant lack of fit (p < 0.05, χ2 ranged from 856.52 to 1007.04, df = 662).
The TIF, IIC, and SE were satisfactory (see Fig. 3). SE for pain interference items ranged from 0.07 to 0.62 (M = 0.35, SD = 0.68), SE for fatigue items ranged from 0.11 to 0.49 (M = 0.19, SD = 0.70), and SE for physical activity items ranged from 0.16 to 0.52 (M = 0.22, SD = 0.70).
Differential item function
DIF analyses were used to detect whether gender, age group or language biased an item. No DIF by gender was found in any of the subscales. For age groups (12–15 years and 16–19 years), seven items showed significant DIF after Benjamini–Hochberg correction. One of them had moderate DIF: “I have trouble starting things because I was too tired” (fatigue item bank). For language (measured only for pain interference and fatigue), nine items showed significant DIF after Benjamini–Hochberg correction. Most of them had negligible McFadden effect sizes, and only three items had moderate DIF (“Being tired kept me from having fun”, “I had trouble starting things because I was too tired”, and “I was too tired to go up and down a lot of stairs”, all three from the fatigue item bank). See Table 2 for the DIF results and McFadden effect sizes.
For the items with DIF by age and language, we further investigated whether the DIF was due to the item’s discrimination (slope) or difficulty (thresholds) by fitting a model in which equal slopes were imposed and the thresholds were estimated freely for both groups. The difference was not significant for the seven age-DIF items and for four of the language-DIF items, indicating uniform DIF. For five items in the fatigue item bank (marked with a star in Table 2 for DIF by language), non-uniform DIF was found, meaning that the items had different slopes in the two groups. After considering the analytic results, graphical illustrations, item content and clinical relevance, we decided to keep all items in the item banks.
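The slope comparison described above is a nested-model likelihood-ratio test: freeing one item’s slope across two groups adds one parameter, so G² = 2(LL_free − LL_constrained) is referred to a χ² distribution with 1 degree of freedom, whose upper tail equals erfc(√(G²/2)). The log-likelihoods in the usage line are hypothetical (in practice they would come from the fitted mirt models), and the 0.05 level is an assumed default rather than the article’s adjusted criterion.

```python
import math


def lr_test_one_parameter(loglik_constrained, loglik_free):
    """G^2 = 2 * (LL_free - LL_constrained), referred to chi-square with 1 df."""
    g2 = max(2.0 * (loglik_free - loglik_constrained), 0.0)
    p = math.erfc(math.sqrt(g2 / 2.0))  # upper tail of chi-square(1)
    return g2, p


def dif_pattern(p, alpha=0.05):
    """Significant slope difference -> non-uniform DIF; otherwise uniform (thresholds only)."""
    return "non-uniform" if p < alpha else "uniform"


# Hypothetical log-likelihoods: equal-slope model vs model with group-specific slopes
g2, p = lr_test_one_parameter(-500.0, -490.0)
pattern = dif_pattern(p)
```

Here G² = 20 and the p-value is far below 0.05, so `pattern` is "non-uniform"; a log-likelihood gain of only 0.1 would give p ≈ 0.65 and a "uniform" label.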
The T-score calculations were based on the full original English item bank (general and clinical population), obtained from www.assessmentcenter.net/ac_scoringservice. The mean T-scores of the study sample were as follows: for pain interference (M = 46.60, SD = 6.11, range 42.60–64.20), for fatigue (M = 48.57, SD = 7.77, range 40.00–63.70) and for physical activity (M = 48.46, SD = 8.44, range 23.50–72.20). Our T-scores can be provided on request.
One major challenge prior to using IRT models is to resolve issues of dimensionality. For all three item banks (pain interference, fatigue and physical activity), we found good values on the fit indices CFI, TLI and SRMR. However, for all three item banks, the RMSEA values indicated a moderate fit, and for physical activity a relatively poor fit (0.16). Values over 0.06 have been reported for many other PROMIS item banks, e.g. [41, 58]. Traditional goodness-of-fit indices have been criticized as unsuitable for establishing the unidimensionality of health item banks , and RMSEA is sensitive to model complexity and skewed data distributions , the latter being the case in our data. SRMR has been shown to generate more robust results across different populations and estimation methods .
Internal consistency, or scale reliability, was high in all three item banks (Cronbach’s α ranged from 0.94 to 0.97). The high values of Cronbach’s α are probably partly due to the large number of items in the scales, some of which were quite similar. The IRT-based TIF, IIC and SE curves confirmed this picture but added nuance: at the total-scale level, all item banks had satisfactory reliability, whereas at the item level, the items varied more in the information (precision) they contribute. We conclude that items with low information could be set aside in future studies.
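The dependence of α on test length can be illustrated with the standardized-α (Spearman–Brown) formula α = k·r̄ / (1 + (k − 1)·r̄), where r̄ is the mean inter-item correlation. Holding r̄ fixed, α rises with the number of items k. Because raw α only approximates standardized α, inverting the formula to back out r̄ from the reported α values gives rough figures only.

```python
def standardized_alpha(k, mean_r):
    """Spearman-Brown form of alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1.0 + (k - 1) * mean_r)


def implied_mean_r(alpha, k):
    """Invert the formula above: the mean inter-item correlation implied by alpha and k."""
    return alpha / (k - (k - 1) * alpha)


# Same modest inter-item correlation, increasing test length
by_length = {k: round(standardized_alpha(k, 0.3), 3) for k in (5, 10, 20, 25)}
```

With r̄ = 0.3, α is about 0.81 for 10 items but about 0.91 for 25 items. Conversely, `implied_mean_r(0.97, 20)` is roughly 0.62 and `implied_mean_r(0.94, 10)` roughly 0.61, so a very high α for a 20- or 25-item bank does not by itself imply unusually strong inter-item correlations.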
Test–retest reliability assessed with the ICC  was excellent over a period of three weeks (0.84 to 0.89 for all subscales). This indicates good temporal stability of the scales.
Systematic measurement variability across groups can lead to a number of problems, including errors in hypothesis testing (e.g. it may be assumed that the test works equally well for all genders, ages or cultures when it does not) and misguided research . Ensuring measurement equivalence is thus important before making comparisons among individuals or groups . We investigated DIF for gender, age group and language in the three item banks. No gender DIF was found for any item (unlike Lai et al. 2013 , who found gender-based DIF in three items), and the subscales measured symptoms equally well for girls and boys. However, some items had DIF by age and language, although the effect sizes were mostly negligible (three were moderate for language), and we cannot draw any firm conclusions. DIF by age and language suggests that, for these items, symptoms may not have been measured equivalently across age groups (12–15 years and 16–19 years) or language groups (Swedish-speaking children in the Swedish sample compared with English-speaking children in the US sample). For fatigue and age, this was in line with a previous study (Lai et al., 2013 , which found DIF for age in 16 of 25 fatigue items), whereas for the other two subscales (pain interference and physical activity), age DIF was a new finding. There may be several explanations for this, including that the concept of “fatigue” may differ between age groups. Another potential source of item bias, DIF by psychiatric and physical symptoms, was not tested because our clinical sample was too small; our sample was more normative than the more clinical US sample.
When comparing the results with our previous review of the translated items (see ), we found agreement for only one item: “how many days did you run for 10 min or more?”. This item was problematic in the translation process because it is ambiguous and lacks a precise definition in the PROMIS definition list [31, 62]. During cognitive interviews with Swedish children [31, 63], some of them wondered whether the item meant 10 min of continuous running or whether the 10 min of running could be accumulated over a day. Even though we translated this item word by word, some children may therefore have interpreted it differently. DIF by age for this item was not found in the original English version . Several items contained the wording “how many days did you … for 10 min or more”, and all of them were at the lower end on all psychometric measures in our current study as well as in the study by Tucker et al., . Measures of distance and time often need context and a qualitative description to be understandable .
A common strategy for dealing with DIF items is to set them aside . However, in brief questionnaires this strategy is not advisable, because it might decrease reliability and validity. Moreover, shortening a scale can modify the construct it is intended to measure , and removing DIF items from well-established questionnaires decreases comparability between research studies.
An interesting finding in this study was that the average T-scores of all three item banks were lower than the expected 50.0 (general and clinical US population). This may indicate that Swedish adolescents, on average, experience less pain interference, are less tired, and are less physically active than US adolescents. However, the samples differ: our relatively healthy sample has fewer symptoms overall than the US sample. Further analyses are needed to explore possible alternative explanations.
Limitations and strengths
The present study had sufficient statistical power and all participants answered all questions, but some limitations should be noted. Participants were not geographically stratified and did not fully match the Swedish general pediatric population; for example, the unbalanced gender ratio limits generalizability. Instead, the participants came from four different schools, along with a smaller sample from a child- and adolescent psychiatric clinic. When using IRT statistics, a mixed sample is theoretically preferable because IRT offers the property of item invariance, in which item parameters are constant even if estimated in different samples . However, our clinical sample was too small to test for DIF, and future studies need to investigate whether this also holds empirically. For DIF by language, a sample more similar to ours would have been preferable, as the US sample contained a greater variety of medical diagnoses, which may have biased the results.
The three PROMIS pediatric item banks were translated into and adapted for Swedish to meet the need for short, effective and valid tests based on modern test theory (IRT) and evaluated with DIF analyses, for use in Swedish health care [4, 31]. A major advantage of using IRT for health-related outcomes is that it enables adaptive testing, either through multiple short forms or via computerized adaptive testing , which is less burdensome for patients but not always available in research or clinical settings. Thus, short forms can be valuable alternatives.
The PROMIS pediatric item banks of pain interference, physical activity, and fatigue showed sufficient psychometric properties in a Swedish population. Future studies could use computerized adaptive testing (CAT), which presents fewer items to the respondent than classical testing while maintaining reliability (e.g. ). This approach reduces test fatigue.
We hope that the item banks will be implemented both in Swedish school-based health care and in pediatric clinics.
Availability of data and materials
The data from the current study are available from the corresponding author on reasonable request.
Abbreviations
CAP: Child- and Adolescent Psychiatry
CFA: Confirmatory Factor Analysis
CFI: Comparative Fit Index
DIF: Differential item functioning
FACIT: Functional Assessment of Chronic Illness Therapy
ICC: Intraclass correlation coefficient
IRT: Item response theory
NIH: National Institutes of Health
PAC-C: Physical Activity questionnaire for Older Children
PROMIS: Patient-Reported Outcomes Measurement Information System
RMSEA: Root Mean Square Error of Approximation
SRMR: Standardized Root Mean Square Residual
TLI: Tucker Lewis Index
Swedish National Institute of Public Health (SNIPH) (2010) Physical activity in the prevention and treatment of disease. http://www.fyss.se/wp-content/uploads/2018/01/fyss_2010_english.pdf
World Health Organization (2017) Physical activity. https://www.who.int/health-topics/physical-activity
World Health Organization (2017) Depression and other common mental disorders: global health estimates. http://apps.who.int/iris/bitstream/handle/10665/254610/WHO-MSD-MER-2017.2-eng.pdf;jsessionid=5C6C365CE6CA46022D8AE57751200AE9?sequence=1
Malm C, Jakobsson J, Isaksson A (2019) Physical activity and sports-real health benefits: a review with insight into the public health of Sweden. Sports (Basel) 7(5):1–28. https://doi.org/10.3390/sports7050127
Firth J, Siddiqi N, Koyanagi A, Siskind D, Rosenbaum S, Galletly C et al (2019) The Lancet Psychiatry Commission: a blueprint for protecting physical health in people with mental illness. Lancet Psychiatry 6(8):675–712. https://doi.org/10.1016/s2215-0366(19)30132-4
Sawyer SM, Afifi RA, Bearinger LH, Blakemore S-J, Dick B, Ezeh AC et al (2012) Adolescence: a foundation for future health. The Lancet 379(9826):1630. https://doi.org/10.1016/S0140-6736(12)60149-4
King S, Chambers CT, Huguet A, Macnevin RC, McGrath PJ, Parker L et al (2011) The epidemiology of chronic pain in children and adolescents revisited: a systematic review. Pain 152(12):2729–2738. https://doi.org/10.1016/j.pain.2011.07.016
ter Wolbeek M, van Doornen LJ, Kavelaars A, Heijnen CJ (2006) Severe fatigue in adolescents: a common phenomenon? Pediatrics 117(6):e1078-1086. https://doi.org/10.1542/peds.2005-2575
Farmer A, Fowler T, Scourfield J, Thapar A (2004) Prevalence of chronic disabling fatigue in children and adolescents. Br J Psychiatry 184:477–481. https://doi.org/10.1192/bjp.184.6.477
Borde R, Smith JJ, Sutherland R, Nathan N, Lubans DR (2017) Methodological considerations and impact of school‐based interventions on objectively measured physical activity in adolescents: a systematic review and meta‐analysis, vol 18, pp 476–490
Glaus A (1998) Fatigue in patients with cancer. Analysis and assessment. Recent Results Cancer Res 145:1–172
Cella D, Riley W, Stone A, Rothrock N, Reeve B, Yount S et al (2010) The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. J Clin Epidemiol 63(11):1179–1194. https://doi.org/10.1016/j.jclinepi.2010.04.011
Leeuw M, Goossens M, Linton S, Crombez G, Boersma K, Vlaeyen J (2007) The fear-avoidance model of musculoskeletal pain: current state of scientific evidence. J Behav Med 30(1):77–94. https://doi.org/10.1007/s10865-006-9085-0
Dobbins M, Husson H, DeCorby K, LaRocca RL (2013) School-based physical activity programs for promoting physical activity and fitness in children and adolescents aged 6 to 18. Cochrane Database Syst Rev 2013(2):Cd007651. https://doi.org/10.1002/14651858.CD007651.pub2
Lazaridou A, Martel MO, Cornelius M, Franceschelli O, Campbell C, Smith M et al (2019) The association between daily physical activity and pain among patients with knee osteoarthritis: the moderating role of pain catastrophizing. Pain Med 20(5):916–924. https://doi.org/10.1093/pm/pny129
Walsh T, Irwin D, Meier A, Varni J, DeWalt D (2008) The use of focus groups in the development of the PROMIS pediatrics item bank. Qual Life Res 17(5):725–735. https://doi.org/10.1007/s11136-008-9338-1
Fussner L, Black W, Lynch-Jordan A, Morgan E, Ting TV, Kashikar-Zuck S (2019) Utility of the PROMIS Pediatric Pain Interference Scale in juvenile fibromyalgia. J Pediatr Psychol 44(4):436–441. https://doi.org/10.1093/jpepsy/jsy110
Kashikar-Zuck S, Carle A, Barnett K, Goldschneider K, Sherry DD, Mara C et al (2016) Longitudinal evaluation of patient-reported outcomes measurement information systems measures in pediatric chronic pain. Pain 157(2):339–347. https://doi.org/10.1097/j.pain.0000000000000378
Singh A, Dasgupta M, Simpson PM, Panepinto JA (2019) Use of the new pediatric PROMIS measures of pain and physical experiences for children with sickle cell disease. Pediatr Blood Cancer 66(5):e27633. https://doi.org/10.1002/pbc.27633
Cox ED, Connolly JR, Palta M, Rajamanickam VP, Flynn KE (2020) Reliability and validity of PROMIS® pediatric family relationships short form in children 8–17 years of age with chronic disease. Qual Life Res 29(1):191–199. https://doi.org/10.1007/s11136-019-02266-x
Lai J-S, Stucky BD, Thissen D, Varni JW, Dewitt EM, Irwin DE et al (2013) Development and psychometric properties of the PROMIS® pediatric fatigue item banks. Qual Life Res 22(9):2417. https://doi.org/10.1007/s11136-013-0357-1
Karimi M, Cox AD, White SV, Karlson CW (2019) Fatigue, physical and functional mobility, and obesity in pediatric cancer survivors. Cancer Nurs. https://doi.org/10.1097/NCC.0000000000000712
Tucker CA, Bevans KB, Becker BD, Teneralli R, Forrest CB (2020) Development of the PROMIS Pediatric Physical Activity Item Banks. Phys Ther 100(8):1393–1410. https://doi.org/10.1093/ptj/pzaa074
Withycombe JS, Baek MJ, Jordan DH, Thomas NJ, Hale S (2018) Pilot study evaluating physical activity and fatigue in adolescent oncology patients and survivors during summer camp. J Adolesc Young Adult Oncol 7(2):254–257. https://doi.org/10.1089/jayao.2017.0074
Tucker CA, Bevans KB, Teneralli RE, Smith AW, Bowles HR, Forrest CB (2014) Self-reported pediatric measures of physical activity, sedentary behavior, and strength impact for PROMIS: item development. Pediatr Phys Ther 26(4):385–392. https://doi.org/10.1097/PEP.0000000000000074
Terwee CB, Roorda LD, de Vet HC, Dekker J, Westhovens R, van Leeuwen J et al (2014) Dutch–Flemish translation of 17 item banks from the patient-reported outcomes measurement information system (PROMIS). Qual Life Res 23(6):1733–1741. https://doi.org/10.1007/s11136-013-0611-6
Liu SY, Hinds JP, Wang JJ, Correia JH, Du JS, Ding JJ et al (2013) Translation and linguistic validation of the pediatric patient-reported outcomes measurement information system measures into Simplified Chinese using cognitive interviewing methodology. Cancer Nurs 36(5):368–376. https://doi.org/10.1097/NCC.0b013e3182962701
Hicks CL, von Baeyer CL, Spafford PA, van Korlaar I, Goodenough B (2001) The Faces Pain Scale-Revised: toward a common metric in pediatric pain measurement. Pain 93(2):173–183. https://doi.org/10.1016/s0304-3959(01)00314-1
Lai J-S, Cella D, Kupst MJ, Holm S, Kelly ME, Bode RK et al (2007) Measuring fatigue for children with cancer: development and validation of the pediatric Functional Assessment of Chronic Illness Therapy-Fatigue (pedsFACIT-F). J Pediatr Hematol Oncol 29(7):471–479. https://doi.org/10.1097/MPH.0b013e318095057a
Crocker PRE, Bailey DA, Faulkner RA, Kowalski KC, McGrath R (1997) Measuring general levels of physical activity: preliminary evidence for the Physical Activity Questionnaire for Older Children. Med Sci Sports Exerc 29(10):1344–1349. https://doi.org/10.1097/00005768-199710000-00011
Blomqvist I, Chaplin JE, Nilsson E, Henje E, Dennhag I (2021) Swedish translation and cross-cultural adaptation of eight pediatric item banks from the Patient-Reported Outcomes Measurement Information System (PROMIS®). J Patient Rep Outcomes 5(1):80. https://doi.org/10.1186/s41687-021-00353-7
Chan SF, Connelly M, Wallace DP (2017) The relationship between pain characteristics, peer difficulties, and emotional functioning among adolescents seeking treatment for chronic pain: a test of mediational models. J Pediatr Psychol 42(9):941–951. https://doi.org/10.1093/jpepsy/jsx074
DeWalt D (2016) PROMIS 1 Pediatric Supplement. (V1 ed.), Harvard Dataverse
Cella D (1997) Manual of the Functional Assessment of Chronic Illness Therapy (FACIT) measurement system (Version 4). Evanston
Eremenco SL, Cella D, Arnold BJ (2005) A comprehensive method for the translation and cross-cultural validation of health status questionnaires. Eval Health Prof 28(2):212–232. https://doi.org/10.1177/0163278705275342
Varni JW, Magnus B, Stucky BD, Liu Y, Quinn H, Thissen D et al (2014) Psychometric properties of the PROMIS® pediatric scales: precision, stability, and comparison of different scoring and administration options. Qual Life Res 23(4):1233–1243. https://doi.org/10.1007/s11136-013-0544-0
R Core Development Team (2020) R: a language and environment for statistical computing. https://www.r-project.org/
Reeve BB, Hays RD, Bjorner JB, Cook KF, Crane PK, Teresi JA et al (2007) Psychometric evaluation and calibration of health-related quality of life item banks. Plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care 45(1):22–31. https://doi.org/10.1097/01.mlr.0000250483.85507.04
Field AP (2018) Discovering statistics using IBM SPSS statistics, 5th edn. Sage, London
Nunnally JC, Bernstein IH (1994) Psychometric theory. McGraw-Hill, New York
Terwee CB, Crins MHP, Boers M, de Vet HCW, Roorda LD (2019) Validation of two PROMIS item banks for measuring social participation in the Dutch general population. Qual Life Res 28(1):211–220. https://doi.org/10.1007/s11136-018-1995-0
Koo TK, Li MY (2016) A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 15(2):155–163. https://doi.org/10.1016/j.jcm.2016.02.012
Fleiss JL (1986) The design and analysis of clinical experiments. Wiley, New York
Li CH (2016) The performance of ML, DWLS, and ULS estimation with robust corrections in structural equation models with ordinal variables. Psychol Methods 21(3):369–387. https://doi.org/10.1037/met0000093
Rosseel Y (2018) lavaan: latent variable analysis, version 0.6-3
Hu LT, Bentler PM (1999) Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Model 6(1):1–55. https://doi.org/10.1080/10705519909540118
van der Ark LA, Koopman L, Straat JH, van den Bergh D (2020) Package 'mokken'. cran.r-project.org. https://cran.r-project.org/web/packages/mokken/mokken.pdf
Mokken RJ (1971) A theory and procedure of scale analysis: with applications in political research. De Gruyter, Berlin
Samejima F (1969) Estimation of latent ability using a response pattern of graded scores, vol 17. Frederiction, Baltimore
Rizopoulos D (2006) ltm: an R package for latent variable modelling and item response theory analyses. J Stat Softw 17(5):1–25
Kang T, Chen TT (2011) Performance of the generalized S-X2 item fit index for the graded response model. Asia Pac Educ Rev 12(1):89–96. https://doi.org/10.1007/s12564-010-9082-4
McKinley RL, Mills CN (1985) A comparison of several goodness-of-fit statistics. Appl Psychol Meas 9(1):49–57. https://doi.org/10.1177/014662168500900105
Paek I, Cole K (2020) Using R for item response theory model applications. Routledge, London and New York
Chalmers RP (2012) mirt: a multidimensional item response theory package for the R environment. J Stat Softw 48(6):1–29
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Stat Methodol 57(1):289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x
Jodoin MG, Gierl MJ (2001) Evaluating type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Appl Meas Educ 14(4):329–349. https://doi.org/10.1207/S15324818AME1404_2
Steinberg L, Thissen D (2006) Using effect sizes for research reporting: examples using item response theory to analyze differential item functioning. Psychol Methods 11(4):402–415. https://doi.org/10.1037/1082-989X.11.4.402
Flens G, Smits N, Terwee CB, Dekker J, Huijbrechts I, Spinhoven P et al (2017) Development of a computerized adaptive test for anxiety based on the Dutch–Flemish version of the PROMIS item bank. Assessment. https://doi.org/10.1177/1073191117746742
Cook KF, Kallen MA, Amtmann D (2009) Having a fit: impact of number of items and distribution of data on traditional criteria for assessing IRT’s unidimensionality assumption. Qual Life Res 18(4):447–460. https://doi.org/10.1007/s11136-009-9464-4
Shi D, Maydeu-Olivares A (2019) The effect of estimation methods on SEM fit indices. Educ Psychol Measur 80(3):421–445. https://doi.org/10.1177/0013164419885164
Choi SW, Gibbons LE, Crane PK (2011) lordif: an R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. J Stat Softw 39(8):1–30
PROMIS organization (2018) PROMIS pediatric banks item definitions. www.promishealth.org
Blomqvist I, Ekback E, Dennhag I, Henje E (2021) Validation of the Swedish version of the Reynolds Adolescent Depression Scale second edition (RADS-2) in a normative sample. Nord J Psychiatry 75(4):292–300. https://doi.org/10.1080/08039488.2020.1850858
Devine J, Klasen F, Moon J, Herdman M, Hurtado MP, Castillo G et al (2018) Translation and cross-cultural adaptation of eight pediatric PROMIS® item banks into Spanish and German. Qual Life Res 27(9):2415–2430. https://doi.org/10.1007/s11136-018-1874-8
Teresi JA, Ramirez M, Lai J-S, Silver S (2008) Occurrences and sources of differential item functioning (DIF) in patient-reported outcome measures: description of DIF methods, and review of measures of depression, quality of life and general health. Psychol Sci Q 50(4):538–538
Nguyen TH, Han H-R, Kim MT, Chan KS (2014) An introduction to item response theory for patient-reported outcome measurement. The Patient 7(1):23–35. https://doi.org/10.1007/s40271-013-0041-0
Cella D, Gershon R, Lai JS, Choi S (2007) The future of outcomes measurement: item banking, tailored short-forms, and computerized adaptive assessment. Qual Life Res 16(1):133–141. https://doi.org/10.1007/s11136-007-9204-6
The authors thank the children who participated in the study.
Open access funding provided by Umeå University. This study was supported by clinical research funding from Visare Norr, Oskar’s Foundation, Västerbotten County Council (ALF) and Region Västerbotten (SE) (Grant No. RV-931721).
Ethics approval and consent to participate
The study was approved by the Swedish Regional Ethical Review Board (number 2018/59-31).
Consent for publication
Consent for publication was given by the children and their parents.
Competing interests
The authors have no conflicts of interest to report.
Cite this article
Carlberg Rindestig, F., Wiberg, M., Chaplin, J.E. et al. Psychometrics of three Swedish physical pediatric item banks from the Patient-Reported Outcomes Measurement Information System (PROMIS)®: pain interference, fatigue, and physical activity. J Patient Rep Outcomes 5, 105 (2021). https://doi.org/10.1186/s41687-021-00382-2