Test–retest reliability of the FitMáx©-questionnaire in a clinical and healthy population

Abstract

Purpose

The FitMáx© was developed as a questionnaire-based instrument to estimate cardiorespiratory fitness (CRF), expressed as oxygen uptake at peak exercise (VO2peak). Test–retest reliability is a clinometric measurement property that describes how stable scores remain over time when the measurement is repeated in subjects whose condition has not changed. The present study aimed to assess the test–retest reliability of the FitMáx©-questionnaire in different patient groups and in healthy subjects.

Patients and methods

A total of 127 cardiac, pulmonary and oncology patients and healthy subjects aged 19–84 years, who completed the questionnaire twice within an average of 18 days, were included in the analysis. Participants were in a stable clinical situation (no acute disease and not participating in a training program). To determine the test–retest reliability, the Intraclass Correlation Coefficient (ICC) and Standard Error of the Measurement (SEM) were calculated between the first (T0) and second (T1) administration of the questionnaire.

Results

Excellent agreement was found between the FitMáx©-questionnaire scores at T0 and T1, with an ICC of 0.97 (SEM 1.91 ml/kg/min) in the total study population and ICCs ranging from 0.93 to 0.98 (SEM 1.52–2.27 ml/kg/min) in the individual subject groups.

Conclusion

The FitMáx©-questionnaire provides reliable estimates of CRF that are stable over time in patients and healthy subjects.

Trial registration: Netherlands Trial Register (NTR), NL8846. Registered 25 August 2020, https://trialsearch.who.int/Trial2.aspx?TrialID=NL8846

Introduction

Cardiorespiratory fitness (CRF) is an important variable that influences several health outcomes including quality of life [1, 2]. Cardiopulmonary exercise testing (CPET) is the gold standard to objectively measure CRF expressed as the oxygen uptake at peak exercise (VO2peak) and is clinically used to determine the underlying cause of limitations in exercise capacity [3,4,5]. However, CPET is costly and labour-intensive, whereas Patient-Reported Outcome Measures (PROMs) are a simple, safe and cost-effective alternative, especially in repeated testing such as rehabilitation programs [6, 7].

Máxima Medical Centre (MMC) developed the FitMáx©-questionnaire (FitMáx), which consists of only three single-answer, multiple-choice questions [8]. The FitMáx was developed to estimate CRF, expressed as VO2peak, from the self-reported maximum capacity for walking, stair climbing and cycling. The FitMáx scores are combined with the subject’s age, sex and Body Mass Index (BMI) to estimate VO2peak. A previous validation study showed a strong correlation between VO2peak estimated by the FitMáx (FitMáx-VO2peak) and VO2peak measured with CPET (CPET-VO2peak): r = 0.94 (0.92–0.95), ICC = 0.93 (0.91–0.95), and a Standard Error of the Estimate (SEE) of 4.14 ml/kg/min. Moreover, the FitMáx outperformed commonly used questionnaires such as the Veterans Specific Activity Questionnaire (VSAQ) and the Duke Activity Status Index (DASI) [8,9,10].

The clinical usefulness and applicability of PROMs depend on several clinometric properties, including validity, responsiveness and reliability [11, 12]. Reliability is defined as the extent to which the test results of subjects whose condition has not changed are the same over time. To assess this test–retest reliability of an instrument, repeated measurements are performed under the same conditions [11, 13]. This makes it possible to quantify the proportion of the total variance in repeated measurements that is due to true differences between subjects. The measurement error describes the systematic and random error in subjects’ results that is not caused by true changes in the construct to be measured [11].
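
The variance decomposition described above can be written compactly. The expression below is a sketch in generic notation and is not taken from the paper: the symbols for the between-subject variance and the measurement-error variance are assumptions introduced here for illustration.

```latex
\[
\text{reliability} \;=\;
\frac{\sigma^2_{\text{subjects}}}{\sigma^2_{\text{subjects}} + \sigma^2_{\text{error}}}
\]
```

A value close to 1 means that almost all observed variance reflects true differences between subjects; a value close to 0 means that the observed variance mostly reflects measurement error.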

The present short report aimed to assess the test–retest reliability of the FitMáx in four different groups (healthy subjects, pulmonary, oncology, and cardiac patients) and in the total study population.

Material and methods

Setting

Pulmonary, oncology, and cardiac patients were recruited prospectively in MMC, Veldhoven and Eindhoven, the Netherlands. Healthy subjects were included at Ancora Health in Eindhoven, the Netherlands. The authorized Medical Research Ethics Committee of the MMC reviewed the study protocol and concluded that the rules laid down in the Medical Research Involving Human Subjects Act (also known by its Dutch abbreviation WMO) do not apply to this study (reference number N20.086). The study was registered as NL8846 in the Netherlands Trial Register.

Study population

Subjects were eligible for inclusion if they were aged ≥ 18 years, had a good command of the Dutch language, and were not expected to change in CRF within 31 days of the enrollment date. During their visit to MMC or Ancora Health, cardiac and pulmonary patients and healthy subjects who were scheduled to perform CPET, either for medical reasons or as part of a health check, were asked to participate in a study about CRF questionnaires. The CPET protocol is extensively described in our validation study [8]. Since oncology patients do not perform CPET as part of standard care, they were included from the outpatient clinic of the sports department without performing a CPET. Oncology patients were not eligible if, within the study period, they were undergoing active disease-specific treatment that could affect their CRF.

Similar to our validation study, subjects were asked to complete the FitMáx, VSAQ and DASI questionnaires. The questionnaires were administered on paper twice to the same subject: at baseline (T0) and again (T1) two weeks later, when all participants received a second information letter and questionnaire. Subjects were excluded from analysis if the FitMáx was incomplete or if the period between T0 and T1 was > 31 days. To minimize a possible ‘subject expectancy effect’, participants were deliberately not told that the study aimed to determine the test–retest reliability of these questionnaires. We did not explicitly question participants about experienced change in CRF. All participants gave written informed consent to the use of their anonymized CPET and questionnaire data.

Statistical analysis

We performed a sample size calculation with an expected ICC of 0.85, a minimum acceptable ICC of 0.60 and two measurements per individual, requiring a sample size of n = 26 per subject group to achieve a power of 80%.
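
The text does not name the method behind this calculation. As an illustration only, the sketch below uses the approximation usually attributed to Walter, Eliasziw and Donner for testing whether an ICC exceeds a minimum acceptable value. The significance level (0.05) and the two-sided test are assumptions, as is the function name `icc_sample_size`; the paper states only the two ICC values, the number of measurements and the power.

```python
from math import ceil, log
from statistics import NormalDist


def icc_sample_size(icc_expected, icc_minimum, k=2,
                    alpha=0.05, power=0.80, two_sided=True):
    """Approximate number of subjects needed to show ICC > icc_minimum.

    alpha and two_sided are assumptions: the paper reports only the
    expected ICC, the minimum acceptable ICC, k and the power.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)

    # Variance ratios theta = rho / (1 - rho) under the null and alternative.
    theta0 = icc_minimum / (1 - icc_minimum)
    theta1 = icc_expected / (1 - icc_expected)
    c0 = (1 + k * theta0) / (1 + k * theta1)

    n = 1 + (2 * (z_alpha + z_beta) ** 2 * k) / (log(c0) ** 2 * (k - 1))
    return ceil(n)


print(icc_sample_size(0.85, 0.60))  # -> 26 under the assumptions above
```

Under these assumptions the formula returns 26 subjects per group. With a one-sided test at the same significance level it returns a smaller number, so the reported n = 26 is consistent with, but does not confirm, a two-sided calculation of this kind.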

Statistical analyses were performed using R, version 4.2.1 (R Foundation for Statistical Computing, Vienna, Austria) [14]. Normality of data was tested using the Shapiro–Wilk test, and checked qualitatively by means of histograms and Q–Q plots. Descriptive statistics were provided for demographic characteristics and reported as mean ± standard deviation (SD) in case of normal distribution, and as median and interquartile range (IQR) otherwise. For categorical variables, we reported frequencies and corresponding percentages.

Pearson correlation coefficient (r) was used to evaluate the linear relationship between CPET-VO2peak and Questionnaire-VO2peak at T0 [15].
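
A minimal sketch of how Pearson's r and a standard error of the estimate can be computed for paired values. The equation actually used for the SEE is given in Additional file 1 and is not reproduced in this text, so the definition below (residual standard deviation of a simple linear regression with an n − 2 denominator) is an assumption, and the function name and data are hypothetical.

```python
from math import sqrt


def pearson_r_and_see(x, y):
    """Return Pearson r and the SEE of a simple linear regression of y on x.

    The n - 2 denominator is an assumption; the paper's own SEE
    equation is in Additional file 1.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    r = sxy / sqrt(sxx * syy)

    # Least-squares line y = intercept + slope * x and its residuals.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    see = sqrt(sse / (n - 2))
    return r, see


# Hypothetical values in ml/kg/min, for illustration only.
questionnaire_vo2peak = [18.0, 22.5, 27.0, 31.0, 36.5]
cpet_vo2peak = [19.5, 21.0, 28.5, 30.0, 38.0]
r, see = pearson_r_and_see(questionnaire_vo2peak, cpet_vo2peak)
print(f"r = {r:.2f}, SEE = {see:.2f} ml/kg/min")
```

Because the SEE is expressed in ml/kg/min, it complements r by showing how far the estimate typically lies from the measured value.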

To evaluate the test–retest reliability of the questionnaires, the Intraclass Correlation Coefficient (ICC) with 95% confidence interval (95%-CI) was determined (two-way mixed, absolute agreement, single measurement) [16]. The Standard Error of the Measurement (SEM, see Additional file 1: equations) [17] is related to the ICC but is clinically easier to interpret, because it is expressed in the same unit as the measurement of interest (VO2peak). The ICC and SEM were calculated between T0 and T1 for all questionnaires, in all subject groups together and in each group separately. An ICC < 0.50 indicates poor test–retest reliability, 0.50–0.75 indicates moderate test–retest reliability, 0.75–0.90 indicates good test–retest reliability, and > 0.90 indicates excellent test–retest reliability [16]. For a given spread of scores, a higher ICC corresponds to a lower SEM and vice versa; there is no standard threshold for the SEM, as it depends on the standard deviation of the data.
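
The following sketch shows one way to compute a single-measurement, absolute-agreement ICC from two-way ANOVA mean squares, together with an SEM. It is an illustration under stated assumptions, not the authors' R code: the SEM definition SD × √(1 − ICC) is a commonly used form, whereas the paper's own equation is in Additional file 1; the function names and data are hypothetical; and the handling of values that fall exactly on a category boundary is a choice made here.

```python
from math import sqrt
from statistics import mean, stdev


def icc_absolute_single(scores):
    """ICC for absolute agreement, single measurement, two-way model.

    scores: one row per subject, one column per administration,
    e.g. [[t0, t1], ...].
    """
    n, k = len(scores), len(scores[0])
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between administrations
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def sem_from_icc(sd, icc):
    """SEM = SD * sqrt(1 - ICC); an assumed, commonly used definition."""
    return sd * sqrt(1 - icc)


def interpret_icc(icc):
    """Categories as listed in the text; boundary handling is a choice."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Hypothetical VO2peak estimates (ml/kg/min) at T0 and T1.
scores = [[18.2, 19.0], [25.4, 24.8], [31.0, 32.1], [22.7, 22.0], [40.3, 39.5]]
icc = icc_absolute_single(scores)
sd = stdev(v for row in scores for v in row)
print(f"ICC = {icc:.3f} ({interpret_icc(icc)}), "
      f"SEM = {sem_from_icc(sd, icc):.2f} ml/kg/min")
```

The absolute-agreement form penalises a systematic shift between administrations: if every subject scored exactly one unit higher at T1, the consistency between administrations would be perfect, but this ICC would still fall below 1.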

In addition, Bland–Altman plots were used to present systematic errors with 95% limits of agreement (95%-LoA), by plotting the difference between Questionnaire-VO2peak at T0 and T1 against the mean Questionnaire-VO2peak from T0 and T1 [18].
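
The quantities behind such a plot can be sketched as follows. The sign convention (T0 minus T1) and the function names are assumptions made for illustration; the bias is the mean of the paired differences and the 95% limits of agreement are the bias ± 1.96 times the standard deviation of those differences.

```python
from statistics import mean, stdev


def bland_altman(t0, t1):
    """Return (bias, lower LoA, upper LoA) for paired measurements."""
    diffs = [a - b for a, b in zip(t0, t1)]  # assumed convention: T0 - T1
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def bland_altman_points(t0, t1):
    """(mean of pair, difference of pair): the x and y values of the plot."""
    return [((a + b) / 2, a - b) for a, b in zip(t0, t1)]


# Hypothetical VO2peak estimates (ml/kg/min), for illustration only.
t0 = [20.0, 25.0, 30.0, 35.0]
t1 = [21.0, 24.0, 31.0, 34.0]
bias, lower, upper = bland_altman(t0, t1)
print(f"bias = {bias:.2f}, 95%-LoA {lower:.2f} to {upper:.2f} ml/kg/min")
```

A bias close to zero indicates no systematic difference between the two administrations, while the width of the limits of agreement shows how much an individual's two estimates can be expected to differ.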

Results

In this study, 213 subjects participated. A total of 73 subjects did not return the T1-questionnaire, resulting in a response rate of 66%. Eleven subjects returned it more than 31 days after T0 and were excluded. Although we did not explicitly ask about changes in CRF, two subjects reported on paper that their CRF had changed due to a COVID-19 infection; they were excluded as well. As such, a total of 127 participants (84 men and 43 women) were included in the analysis. The time between completing the questionnaires and CPET ranged from 11 to 31 days.
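
The participant flow can be checked with a few lines of arithmetic. The variable names are illustrative, and the sketch assumes that the two subjects with a self-reported change in CRF are distinct from the eleven late responders, which is consistent with the reported total.

```python
enrolled = 213
no_t1_returned = 73
late_t1 = 11       # T1 returned more than 31 days after T0
changed_crf = 2    # self-reported change in CRF (COVID-19 infection)

returned = enrolled - no_t1_returned
response_rate = returned / enrolled
analysed = returned - late_t1 - changed_crf

print(returned, f"{response_rate:.0%}", analysed)  # -> 140 66% 127
```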

Since data collection was completed sooner for some groups than for others, we continued until at least n = 26 was reached in every included group (pulmonary, oncology and cardiac patients, and healthy subjects). The age of the total study population ranged from 19 to 84 years. Ancora Health included healthy subjects during the COVID-19 period and used viral filters (MicroGard II, Vyaire Medical GmbH), which resulted in inaccurate CPET data; we therefore omitted the VO2peak data of this group [19]. As mentioned before, oncology patients were included from the outpatient clinic and did not perform CPET as part of standard care. Therefore, we present the CPET data of the total group without the healthy subjects and oncology patients. In this subgroup, the median VO2peak was 21.94 (IQR 16.89–31.29) ml/kg/min, which is 94.1 (85.7–134.5)% of the predicted reference value for healthy Dutch persons of the same age and sex [20]. Anthropometric data, CPET data and questionnaire data are presented in Tables 1 and 2. Data of the VSAQ and DASI questionnaires can be found in Additional file 2: Table S1.

Table 1 Participant characteristics
Table 2 Intraclass correlations of the questionnaires between T0 and T1

The FitMáx-VO2peak correlated strongly with CPET-VO2peak (r = 0.94 (0.91–0.97); SEE 3.70 ml/kg/min). The correlations of the VSAQ and DASI with CPET-VO2peak were lower (r = 0.85 (0.76–0.91); SEE 5.89 ml/kg/min and r = 0.76 (0.63–0.85); SEE 6.99 ml/kg/min, respectively), as was expected from the results of the validation study [8].

Test–retest reliability

The ICCs and corresponding 95%-CIs for each subject group are displayed in Table 2. The ICC of the FitMáx-VO2peak between T0 and T1 in the total population was 0.97 (0.96–0.98). As a sensitivity analysis, we performed our ICC analysis in a two-way model examining potential systematic difference and found similar results, as expected. We found similarly high ICC values for the VSAQ [0.94 (0.92–0.96)] and DASI [0.90 (0.85–0.93)] (more information in Additional file 2: Table S1). A Bland–Altman plot is provided in Fig. 1 (Additional file 3: Figure S1 for all questionnaires), showing the difference between the FitMáx-VO2peak values at T0 and T1 against their mean. The mean difference was −0.39 (95%-LoA −5.68 to 4.84) ml/kg/min, 0.31 (95%-LoA −8.75 to 9.37) ml/kg/min and 0.20 (95%-LoA −5.56 to 5.96) ml/kg/min for the FitMáx, VSAQ and DASI, respectively.

Fig. 1 Bland–Altman plot for the FitMáx questionnaire. Notes: The colors indicate the reason for the CPET visit. The dashed lines represent the limits of agreement (mean difference ± 1.96 SD). The solid line represents the bias and the dotted line is the zero-bias line

Discussion

The use of PROMs to assess CRF seems a simple, safe and cost-effective alternative to objective measurement with CPET in clinical settings [7]. The applicability of such PROMs, collected via self-reported questionnaires, depends on several clinometric properties. An important aspect in the validation of a new questionnaire is its test–retest reliability. The FitMáx showed excellent test–retest reliability between the VO2peak estimated at T0 and T1, with an ICC of 0.97 (95%-CI 0.96–0.98) in the total population. In the different subject groups, the ICC ranged from 0.93 to 0.98 for the FitMáx, from 0.83 to 0.95 for the VSAQ and from 0.84 to 0.95 for the DASI. The ICC values (and the corresponding SEM values) support the precision and reliability of the FitMáx, VSAQ and DASI.

A study by Ravani et al. [21] assessed the test–retest reliability of the DASI. The study was performed in pre‐dialysis patients and in patients who had received a kidney transplant, and obtained ICCs of 0.71 and 0.81, respectively. These ICC values are lower than those found in the current study. This difference may be caused by the 6‐month window used in their study, which could have allowed true changes in CRF and therefore resulted in lower reliability [21].

Strengths

The strength of the current study lies in its diverse study population. We included healthy subjects and oncology, pulmonary and cardiac patients. Although oncology patients and healthy subjects did not perform (valid) CPET, a wide range of VO2peak values was observed in the current study population, from (extremely) low to above average [median 21.94 (range 9.8–53.3) ml/kg/min]. The FitMáx proves to be widely applicable in a clinical population with both low and high VO2peak. Moreover, the ICC values of the FitMáx vary little between the subject groups. We therefore conclude that the reliability of the FitMáx, as expressed by the ICC, is independent of CPET-VO2peak and of subject group. Finally, we minimized a possible ‘subject expectancy effect’: participants were not told that the study aimed to determine the test–retest reliability of the FitMáx, only that they might be approached a second time for the purpose of this study.

Clinical applicability

The FitMáx is an inexpensive tool with low burden for subjects to assess CRF. Moreover, the questionnaire proves to be effective in various populations and provides information on daily life activities in several dimensions (intensity, frequency and duration). The current study shows that the FitMáx is reliable to assess CRF over time when no change in CRF has occurred. This makes FitMáx a useful tool to assess self-reported CRF among patients and healthy subjects in clinical settings.

Limitations

The study reached a response rate of only 66%. This might be explained by patients assuming that they had already completed the exact same questionnaires. The test–retest interval in the current study was on average 18 days, which could have been too short to prevent subjects from recalling their earlier FitMáx responses. However, following recommendations, we deliberately chose this short interval to reduce reporting error in estimates of CRF due to fluctuations in experienced physical fitness, especially in patients [2, 22]. The small sample size prohibited statistical testing to compare the ICC between questionnaires. Although inspection of the ICCs in the supplementary material might suggest a higher reproducibility for the FitMáx in most subject groups, all three questionnaires showed high ICC values, and this possible difference may not be statistically or clinically relevant.

Conclusion

The FitMáx is highly reliable across repeated measurements when assessing CRF in patients with different conditions and in healthy subjects in whom no change in CRF is expected. This increases the applicability and clinical usefulness of the FitMáx.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

BMI: Body mass index

cm: Centimetres

COPD: Chronic obstructive pulmonary disease

CPET: Cardiopulmonary exercise testing

CRF: Cardiorespiratory fitness

DASI: Duke Activity Status Index

FEV1: Forced expiratory volume in 1 s

FVC: Forced vital capacity

FitMáx: FitMáx©-questionnaire

GOLD: Global Initiative for Chronic Obstructive Lung Disease

HR: Heart rate

ICC: Intraclass correlation coefficient

IQR: Interquartile range

kg: Kilograms

kg/m2: Kilograms per square meter

L: Liters

m: Meters

min: Minutes

ml: Milliliters

MMC: Máxima Medical Centre

N: Number of subjects

PROMs: Patient-reported outcome measures

r: Pearson’s correlation

RER: Respiratory exchange ratio

SD: Standard deviation

SEE: Standard error of the estimate

SEM: Standard error of the measurement

T0: Baseline measurement

T1: Second measurement

VO2peak: Oxygen uptake at peak exercise

VSAQ: Veterans Specific Activity Questionnaire

W: Watts

95%-CI: 95% Confidence interval

95%-LoA: 95% Limits of agreement

References

  1. Campbell KL, Winters-Stone KM, Wiskemann J et al (2019) Exercise guidelines for cancer survivors: consensus statement from international multidisciplinary roundtable. Med Sci Sports Exerc 51(11):2375–2390

  2. Ross R, Blair SN, Arena R et al (2016) Importance of assessing cardiorespiratory fitness in clinical practice: a case for fitness as a clinical vital sign: a scientific statement from the American Heart Association. Circulation 134(24):e653–e699

  3. ATS/ACCP Statement on cardiopulmonary exercise testing (2003) Am J Respir Crit Care Med 167(2):211–277

  4. American College of Sports Medicine, Riebe D, Ehrman JK, Liguori G, Magal M (2018) ACSM's guidelines for exercise testing and prescription

  5. Albouaini K, Egred M, Alahmar A, Wright DJ (2007) Cardiopulmonary exercise testing and its application. Postgrad Med J 83(985):675–682

  6. Chevalier L, Kervio G, Doutreleau S et al (2017) The medical value and cost-effectiveness of an exercise test for sport preparticipation evaluation in asymptomatic middle-aged white male and female athletes. Arch Cardiovasc Dis 110(3):149–156

  7. Deshpande PR, Rajan S, Sudeepthi BL, Abdul Nazir CP (2011) Patient-reported outcomes: a new era in clinical research. Perspect Clin Res 2(4):137–144

  8. Meijer R, van Hooff M, Papen-Botterhuis NE et al (2022) Estimating VO2peak in 18–90 year-old adults: development and validation of the FitMax(c)-questionnaire. Int J Gen Med 15:3727–3737

  9. Hlatky MA, Boineau RE, Higginbotham MB et al (1989) A brief self-administered questionnaire to determine functional capacity (the Duke Activity Status Index). Am J Cardiol 64(10):651–654

  10. Myers J, Do D, Herbert W, Ribisl P, Froelicher VF (1994) A nomogram to predict exercise capacity from a specific activity questionnaire and clinical data. Am J Cardiol 73(8):591–596

  11. Mokkink LB, Terwee CB, Patrick DL et al (2010) The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol 63(7):737–745

  12. Mokkink LB, Terwee CB, Patrick DL et al (2010) The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res 19(4):539–549

  13. Qin S, Nelson L, McLeod L, Eremenco S, Coons SJ (2019) Assessing test-retest reliability of patient-reported outcome measures using intraclass correlation coefficients: recommendations for selecting and documenting the analytical formula. Qual Life Res 28(4):1029–1033

  14. R: a language and environment for statistical computing [computer program] (2022) Version 4.2.1. R Foundation for Statistical Computing

  15. Lane DM. Standard error of the estimate. Onlinestatbook. https://onlinestatbook.com/2/regression/accuracy.html. Accessed 23 March 2023

  16. Koo TK, Li MY (2016) A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 15(2):155–163

  17. Weir JP (2005) Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res 19(1):231–240

  18. Bland JM, Altman DG (1986) Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1(8476):307–310

  19. Marques MS, Fonseca A, Lima R, Ladeira I, Gomes J, Guimarães M (2022) Effect of a viral filter on cardiopulmonary exercise testing. Pulmonology 28(2):140–141

  20. van der Steeg GE, Takken T (2021) Reference values for maximum oxygen uptake relative to body mass in Dutch/Flemish subjects aged 6–65 years: the LowLands Fitness Registry. Eur J Appl Physiol 121(4):1189–1196

  21. Ravani P, Kilb B, Bedi H, Groeneveld S, Yilmaz S, Mustata S (2012) The Duke Activity Status Index in patients with chronic kidney disease: a reliability study. Clin J Am Soc Nephrol 7(4):573–580

  22. Matthews CE, Moore SC, George SM, Sampson J, Bowles HR (2012) Improving self-reports of active and sedentary behaviors in large epidemiologic studies. Exerc Sport Sci Rev 40(3):118–126

Acknowledgements

We would like to thank B. Dörssers and J. Verheij for their contribution to this research. We would also like to thank Richard Post for answering our statistical questions.

Funding

Not applicable.

Author information

Contributions

RM was involved in study design, data collection, composing the database, statistical analysis and was a major contributor in writing the manuscript. GS was involved in study design and writing the manuscript. MR was involved in study design, statistical analysis and writing the manuscript. NEPB was involved in study design and writing the manuscript. HHCMS was involved in study design and writing the manuscript. MH was involved in study design, data collection, composing the database, statistical analysis and was a major contributor in writing the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Renske Meijer.

Ethics declarations

Ethics approval and consent to participate

The authorized Medical Research Ethics Committee of the MMC has reviewed the study protocol and concluded that the rules laid down in the Medical Research Involving Human Subjects Act (also known by its Dutch abbreviation WMO), do not apply to this study (reference number N20.086). All participants gave written informed consent to the use of their anonymized CPET and questionnaire data.

Consent for publication

Not applicable.

Competing interests

The authors report no conflicts of interest in this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Equations used to calculate the Standard Error of the Estimate (SEE) and the Standard Error of the Measurement (SEM).

Additional file 2: Table S1. Questionnaire data between T0 and T1.

Additional file 3: Fig. S1. A–C Bland–Altman plots for the FitMáx, VSAQ and DASI questionnaires. Notes: The colors indicate the reason for the CPET visit. The dashed lines represent the limits of agreement (mean difference ± 1.96 SD). The solid line represents the bias and the dotted line is the zero-bias line.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Meijer, R., Schep, G., Regis, M. et al. Test–retest reliability of the FitMáx©-questionnaire in a clinical and healthy population. J Patient Rep Outcomes 8, 3 (2024). https://doi.org/10.1186/s41687-023-00682-9
