A systematic review of the measurement properties of patient reported outcome measures used for adults with an ankle fracture

Background Ankle fractures are painful and debilitating injuries that pose a significant burden to society and healthcare systems. Patient reported outcome measures (PROMs) are commonly used outcome measures in clinical trials of interventions for ankle fracture but there is little evidence on their validity and reliability. This systematic review aims to identify and appraise evidence for the measurement properties of ankle specific PROMs used in adults with an ankle fracture using Consensus Based Standards for the Selection of Health Measurement Instrument (COSMIN) methodology. Methods We searched MEDLINE, Embase and CINAHL online databases for evidence of measurement properties of ankle specific PROMs. Articles were included if they assessed or described the development of the PROM in adults with ankle fracture. Articles were ineligible if they used the PROM to assess the measurement properties of another instrument. Abstracts without full articles and conference proceedings were ineligible, as were articles that adapted the PROM under evaluation without any formal justification of the changes as part of a cross-cultural validation or translation process. Two reviewers completed the screening. To assess methodological quality we used COSMIN risk of bias checklist and summarised evidence using COSMIN quality criteria and a modified Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach. Two reviewers assessed the methodological quality and extracted the data for a sample of articles. Results The searches returned a total of 377 articles. From these, six articles were included after application of eligibility criteria. These articles evaluated three PROMs: A-FORM, OMAS and AAOS. The A-FORM had evidence of a robust development process within the patient population, however lacks post-formulation testing. The OMAS showed sufficient levels of reliability, internal consistency and construct validity. The AAOS showed low quality evidence of sufficient construct validity. Conclusions There is insufficient evidence to support the recommendation of a particular PROM for use in adult ankle fracture research based on COSMIN methodology. Further validation of these outcome measures is required in order to ensure PROMs used in this area are sufficiently valid and reliable to assess treatment effects. This would enable high quality, evidenced-based management of adults with ankle fracture.


Background
Ankle fractures cause significant pain, reduced mobility and subsequent limitation of usual activities [1]. The injury overall demonstrates a bimodal distribution, most commonly affecting young active males and older females. However some fracture patterns, such as more severe bi-malleolar and tri-malleolar ankle fractures demonstrate a unimodal distribution, most commonly affecting an older female population, indicative of being an osteoporotic injury [2,3]. Epidemiological studies have shown that the incidence of ankle fractures is rising, likely due to the ageing population, many of whom continue to remain physically active into later life [4,5]. Ankle fractures contribute to the increasing health and social care costs that accompanies an ageing population, specifically the cost of managing fragility fractures [6]. This cost was approximately €37.5billion across six European countries in 2017; a figure that is forecasted to rise to €47.4 billion by the year 2030 [7]. Fractures of the lower limb have a significant impact on the lives of individuals affected, not only on mobility and usual activities but they have also been linked to the development of anxiety and depression [8]. Evidence based treatment of burdensome and prevalent injuries such as ankle fractures is important, yet there is a lack of consensus surrounding the optimal management strategies for this injury [9]. It is therefore of paramount importance that funding bodies continue to allocate resources for the conduct high quality clinical trials in order to establish the most cost-effective management strategies for ankle fractures [9,10].
Clinical trials of interventions for fractures of the lower limb often utilise Patient Reported Outcome Measures (PROMs) as primary outcomes [11][12][13]. It is important that the instruments used to measure treatment effects in clinical trials demonstrate adequate measurement properties, such as validity, reliability and responsiveness, for the population they intend to assess. However, there is evidence that some widely used PROMs in trauma and orthopaedic research lack evidence for their measurement properties [14].
Conducting a randomised controlled trial is expensive, time consuming and relies on the good will of participants to be randomised to an intervention and complete questionnaires. If the PROM used in a clinical trial does not measure the treatment effects of the interventions in a valid and reliable way, this places the unnecessary burden of randomisation and trial processes onto participants. Using PROMs with insufficient measurement properties in randomised controlled trial is therefore a waste of resource and unethical [15]. A systematic review assessing the psychometric properties of PROMs for ankle fracture has been completed previously [16], which concluded that the Ankle Fracture Outcome of Rehabilitation Measure (A-FORM) was the most appropriate measure to use. However, considering the small number of articles included in this review, the growing incidence of ankle fractures and subsequent need for research in this area, an update is deemed timely, with a particular focus on PROMs currently and previously used in randomised controlled trials of interventions for ankle fractures.
The aim of this review is to identify and critically appraise the available evidence for the measurement properties of foot and ankle specific PROMs for use in adults with an ankle fracture. The results of this review will aim to determine the most appropriate instrument for use in evaluating change resulting from interventions in the context of randomised controlled trials in this research area.

Methods
We prospectively registered this review with PROSPERO International Prospective Register of Systematic Reviews (Reference CRD42018103112). Consensus Based Standards for the Selection of Health Measurement Instrument (COS-MIN) Methodology for Systematic Reviews of Measurement properties of PROMs was adhered to [15] and this review utilises definitions according to published COSMIN consensus based terminology [17]. This systematic review is reported using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist (Additional file 2) [18].
This review was completed following a previous systematic review looking to assess all outcome measures collected in clinical trials of interventions for ankle fracture [19]. The outcome measures included all both primary and secondary outcome measures and we formulated a comprehensive list of all ankle specific PROMs collected. These PROMs formed the prespecified list we used to identify evidence for and evaluate during this current review. The PROMs on the pre-specified list being evaluated in this review are: the AAOS Foot and Ankle Outcome Questionnaire (AAOS) [20], the Ankle Fracture Outcome of Rehabilitation Measure (A-FORM) [21], the Foot and Ankle Ability Measure (FAAM) [22], the Karlsson Score (KS) [23], the KOOS Foot and Ankle Outcome Survey (FAOS) [24] the Manchester-Oxford Foot and Ankle Questionnaire (MOXFQ) [25] and the Olerud Molander Ankle Score (OMAS) [26].

Eligibility criteria
Included articles assessed the measurement properties, development or interpretability of one or more of the PROMs included in the pre-specified list in a majority patient population of adults with ankle fracture. Here, majority is defined as equal to or greater than 50% of the sample. In articles which did not reach the criteria of 50% but performed a separate analysis on the ankle fracture sub-sample of patients, these articles were included and only the analyses performed on the single subsample of individuals with ankle fracture were included; any analyses on the sample as a whole or comparing the two clinical groups were not included.
Articles were ineligible for inclusion if they use the PROM/s only for outcome measurement in an experimental study, where no formal evaluation of a measurement property is completed. Articles which use the PROM in question to validate another PROM (not on the pre-specified list here) were also ineligible for inclusion. Studies were excluded if the authors adapted the PROM in any way without formal justification of the changes as part of a translation or cross-cultural validation process. Abstracts without full articles and conference proceedings were not eligible for inclusion.

Search strategy and study selection
A systematic search of the literature was completed using the MEDLINE, EMBASE and CINAHL databases on 16/04/2019 up to the present date with no date limits applied using search strategies developed by the COSMIN group specifically for this type of review [27]. Additional file 1 details the search strategies. We also reviewed the reference lists of all included studies for any other potentially eligible papers for inclusion.
The lead author and a second reviewer (AR) independently screened the articles by title and abstract for possible inclusion. The reviewers selected any articles which were potentially eligible from title and abstract review and retrieved the full text. If it was unclear at the initial title and abstract review, the full text was retrieved and reviewed for purposes of completeness. If at least one of the reviewers felt that a study might be eligible based upon the initial title and abstract screening, then both researchers independently reviewed the full text to assess eligibility for inclusion. The reviewers then discussed findings and reached consensus on inclusion of articles. In instances of disagreement, a third reviewer (RSK) was consulted for a final decision.

Assessment of methodological quality and assessment of measurement properties
The methodological quality of the articles included in this review was assessed using the COSMIN risk of bias checklist [28]. Evidence for the measurement properties in the included articles was extracted and assessed against the COSMIN criteria of good measurement properties. The overall evidence from all articles was pooled and summarised using the modified Grading of Recommendations Assessment, Development and Evaluation (GRADE) quality of evidence method [15]. The assessment of methodological quality and the data extraction was completed for all articles by the lead author initially. A second reviewer (EK) independently reviewed the methodological quality and performed data extraction in a sample of the articles (> 50%) to ensure a reduction of bias in the methodological quality assessment and data extraction process. Following independent review, authors discussed their results and reached consensus. When unable to reach a consensus, a third reviewer (RSK) was consulted for a final decision.
A decision was made that the criteria and box for criterion validity was not to be completed as there is no accepted gold-standard measure for assessing outcome in adults with ankle fracture, therefore this measurement property does not apply in this particular case. If reported, data on the interpretability and feasibility of the PROMs were also extracted and reviewed. We contacted developers of the PROMs where possible to obtain a copy of the user manual (if available) and to ensure that, to their knowledge, there were no further validation studies on the scores which may not have been identified in the database searches.

Hypotheses for construct validity
Hypotheses for assessing the construct validity evidence in the instances that this was assessed in the included articles was pre-defined [29]. The following thresholds of correlation were used for the hypothesis setting:

1.
A weak correlation is defined as < 0.30 2. A weak to moderate correlation is defined as > 0.20 -< 0.40 3. A moderate correlation is defined as > 0.30 -< 0.70 4. A moderate to high correlation is defined as > 0.60 to < 0.80 5. A high correlation is defined as > 0.70 The hypotheses tested during this review for construct validity are outlined in Table 1:

Search results
The searches produced a total of 377 returns. Following initial screening of the titles and abstracts, 353 records were excluded, leaving 24 articles for full text review. Following full-text review of the 24 articles, six articles were included in this review [30][31][32][33][34] and details of the application of the eligibility criteria can be found in the PRISMA Diagram in Fig. 1. The included six articles assessed three of the eight pre-specified PROMs; the AAOS, A-FORM and OMAS. There was no evidence for the measurement properties of the remaining PROMs in the pre-specified list (FAAM, FAOS, KS and MOXFQ) in the population of adults with ankle fracture. Table 2 shows the characteristics of the PROMs included in this review.

Characteristics of included PROMs
All of the PROMs included in this review are paper based questionnaires self-administered by the patient either in a clinical or research context. The AAOS consists of 25 questions including stiffness (one item), swelling (one item), pain (nine items), giving way (three items), function (six items) and footwear (five items). The score consists of a core score (AAOS-CS) comprising of 20 items and a shoe comfort scale (AAOS-SCS) comprising of five items. The scores are calculated to a normative score for each of these two scales, which is then converted to a summative mean for both the AAOS-CS and AAOS-SCS. The summative score for each subscale ranges between 0 and 100 with higher scores indicating a more favourable outcome.
The A-FORM consists of 15 items including pain, swelling, stiffness, anxiety regarding footwear, sleeping, jumping, waking, social aspects, anxiety related to future ankle function, depression and fatigue. The raw score is converted to a summary score which ranges between 0 and 100, with lower scores indicating more favourable outcomes. The footwear item is not included in the summary score conversion, so users are asked to omit this item from the summary score conversion process. The summary score conversion table is found in the user manual which can be requested from the developers at no cost to users. The summary score conversion was based on the Rasch analysis presented in the development article included in this review [32].
The OMAS is a nine-item questionnaire including pain, stiffness, swelling, stairs, squatting, supports, jumping, running and usual activities. Final scores range between 0 and 100 with higher scores indicating more favourable outcomes. The score is totalled using the scoring system provided in the development paper included in this review [26]. Different items of the score provide varying numbers of points which contribute to the overall score. For example, the item for pain is awarded between 0 and 25 points depending on the answer selected, work and activities of daily living between 0 and 20 points and squatting between 0 and 5 points. Table 3 shows the characteristics of the six studies included in this review. As Table 4 demonstrates, none of the articles included here scored higher than adequate Correlation with scores of instruments measuring a similar construct or another PROM included in the pre-specified list will be highly or moderately to highly correlated.

Study characteristics and methodological quality assessment
2 Correlation with scores of instruments measuring related but not the same constructs, for example generic disability scores or health related quality of life measures will be either moderately to highly or moderately correlated.

3
A weak to moderate correlation will be observed between PROM/s scores of instruments included here and two different subgroups of patients. These subgroups will be individuals who have had their fracture managed operatively and those who have had their fracture managed non-operatively. Here, fracture management is used as a surrogate for severity of fracture (i.e. more severe fractures usually managed operatively). Therefore, we would expect to see a weak to moderate correlation between the PROM score and the severity of the fracture. on the methodological quality assessment checklist. Whilst several articles [30][31][32][33][34] translated the PROM and then performed analyses of measurement properties on the translated PROM, these studies did not crossculturally validate the translated PROMs using an analysis of measurement invariance. Therefore, it was not possible to determine any differences in scores secondary to cultural contextual factors and the box for cross-cultural validity was not deemed to be relevant in these instances. The developers of the A-FORM instrument [21] did perform an assessment of internal consistency using Cronbach's alpha and structural validity using a Rasch Analysis, however these analyses were not completed on the final set of questions but on a larger set of the initial items for purposes of determining inclusion in the questionnaire. Therefore, this article was not scored for internal consistency and structural validity in this case as these analyses were completed for purposes of item reduction.
Following the COSMIN guidance for PROM development, an article encountered in the reference list of the  A-FORM development articles [32] was taken into consideration as it involved the development of the A-FORM [1]. Whilst this article did not meet the inclusion criteria of the review, the review team felt this article provided important developmental work for the PROM, therefore the information presented in this article was included when completing the box for PROM development of the A-FORM. Table 5 shows the results presented for each of the measurement properties in the included articles in this   review. Table 6 shows the summary of findings table, demonstrating the overall evidence for measurement properties against the COSMIN GRADE Assessment. The AAOS demonstrated low levels of evidence for sufficient construct validity. Zelle et al. [34] correlated the scores of the AAOS-CS and AAOS-SCS with the scores of the SF-36 subscales: the Physical Component Score (PCS) and Mental Component Score (MCS). The results of these four correlations performed met hypothesis 2 of the pre-defined hypotheses detailed in Table 1. The authors also assessed the test-retest reliability of the translated questionnaire, however, this result was indeterminate for this measurement property as the ICC or weighted Kappa were not reported in the results.

Measurement properties
McPhail et al. [21] detailed the development of the A-FORM through completion of item reduction exercises including a Delphi study and Rasch analysis. The development of the article was thorough and included both patients and clinicians in the concept elicitation interviews and the item-reduction Delphi exercise. However there was a gap in the evidence here with regards to content validity as there was no cognitive interview testing done on the final version of the questionnaire to assess relevance and comprehensiveness of the instrument, therefore the content validity box was not completed [35].
Authors of the included studies assessed the translated versions of the OMAS for structural validity in Norwegian and internal consistency, reliability and construct validity in both Norwegian and Turkish languages. The OMAS Norwegian version achieved high level evidence for sufficient construct validity; Garratt et al. [33] correlated the OMAS scores with the scores of the Self-Reported Foot and Ankle Score (SEFAS) which met hypothesis 1 of the pre-defined hypotheses in Table 1. They also correlated the OMAS scores with the EQ-5D and the SF-36 scores respectively, both of which met hypothesis 2 of those pre-defined in Table 1. The Norwegian OMAS achieved high level evidence for sufficient structural validity. The OMAS in both Buker et al. [30] and Turhan et al. [31] correlated the scores of the Turkish version of the OMAS with various patient reported outcome measures, all of which met hypotheses 1 or 2 in the predefined hypotheses in Table 1. Turkish and Norwegian versions achieved low-level evidence for sufficient reliability where reported. Both The OMAS was assessed for the measurement error through assessment of the minimal detectable change however as no data is available on the minimal important change for this PROM, results for this measurement property were indeterminate against COSMIN criteria.
Interpretability and feasibility Table 7 shows the information reported in the articles on the interpretability and feasibility of the PROMs included in this review.
There was no information reported in any of the included studies on response shift or minimal importance difference of the measures therefore these facets of interpretability have not been included in Table 7. Some articles did not report any data on the interpretability of the scores evaluated. Whilst the majority of articles included here do not report aspects of feasibility in there research, throughout the process of the review, we could conclude that they were all available free of charge without the need to purchase a licence. The instruments are easy and relatively quick to complete in a clinic setting or remotely and returned in the post, placing minimal burden on participants completing them. We found no information or guidance available on any of the included PROMs regarding completion electronically or via telephone. Like most questionnaires, the PROMs included here require the ability to read, comprehend and respond to the questions, with no evidence found during this review of these instruments being suitable for measurements by proxy. COSMIN methodology advises that in order to recommend a PROM, it should demonstrate any level of content validity and a minimum of low level evidence for internal consistency [15]. None of the instruments included in the review have met this criteria, therefore we are unable to recommend any of these PROMs for use in this patient population. However, there is no evidence of insufficient measurement properties in these PROMs, therefore further validation studies are required before they can be recommended for use in this patient population [15].

Discussion
This review demonstrates that at the time this review was undertaken, none of the PROMs used in clinical trials of interventions for ankle fracture had adequate evidence of measurement properties and we are therefore unable to recommend a particular PROM for use in this context and patient population. Furthermore, there were four additional PROMs (FAAM, FAOS, KS, and MOXFQ) which have been or are currently being used in clinical trials of interventions for ankle fracture for which the current review did not find any evidence of their measurement properties within the patient population. Whilst the OMAS demonstrates sufficient internal consistency, structural validity and construct validity, the PROM development scored poorly against COSMIN criteria used in this review. In contrast, the A-FORM demonstrates some evidence for PROM development within the patient population, but there is limited postformulation testing of this PROM.
This review updates the one completed in 2016 by Ng et al. [16] which assessed the psychometric properties of PROMs for ankle fractures. The current review includes four additional recently published articles and focussed on only ankle specific PROMs, whereas the previous review also included articles assessing both ankle and generic health-related quality of life PROMs. This review differs in that we used a pre-specified list of ankle specific PROMs which have been and are currently used in clinical trials for ankle fracture interventions. Ng et al. [16] recommended the use of the A-FORM suggesting it has a robust development process within the patient population. Whilst we agree that the A-FORM has more a more adequate development process when compared to other PROMs presented here, we do not think it is appropriate for recommendation due to the lack of evidence of sufficient internal consistency of the final version of the instrument. This is based on the updated COSMIN guidance on systematic reviews of this nature. Other studies have completed similar reviews on outcome measures used in generic foot and ankle research with similar results presented. A review assessing all foot and ankle PROMs for use in any foot and ankle disorder concluded that there was no region specific outcome measure with appropriate levels of evidence for their measurement properties for use in individuals with foot and ankle disorders [36].
Strengths of this review include the use of a welldeveloped, thorough and consensus based methodology and search filters for finding and reviewing the evidence for development and measurement properties of PROMs. Limitations of the review include the inherent difficulty in defining the construct under analysis; there is little research into the experiences of individuals recovering from an ankle fracture and further research into the construct of interest would be beneficial. The construct of outcome in ankle fracture recovery may vary depending on several individual factors, such as age, gender and whether the fracture is treated operatively or non-operatively. When considering the varied distributions of the different ankle fracture patterns which has been demonstrated in the epidemiological literature [3], one could argue that osteoporotic fractures in older adults are a different injury to those sustained by younger adults. Subsequently, the construct in question between these two different patient groups might vary considerably and may require different PROMs or versions of PROMs. Furthermore, the articles included here assessed differing populations with regard to fracture management; some assessed only operatively managed ankle fractures [26,30,33] and others included a mixture of operatively and non-operatively managed fractures [21,31]. One article also included non-ankle fractures patients, which may have further confounded the results for the measurement properties assessed here [34]. Four of the included articles here were concerned with the OMAS [26,[30][31][32][33], only one article did so for the AAOS [34] and another one for the A-FORM [21], making it difficult to compare evidence between the three PROMs. We encountered difficulty in applying the COSMIN methodology and assessment criteria to older articles such as the development of the OMAS instrument [26]. We acknowledge that the age of an instrument does not excuse it from critical review and analysis and further research into the acceptability of these instruments to patients is warranted to inform the ongoing use of older PROMs.

Conclusions and implications
This review shows that currently there is no PROM that can be recommended for use for the purpose of assessing outcome in clinical trials of interventions for ankle fracture. Further validation work should focus on ascertaining the acceptability, relevance and comprehensiveness of commonly used questionnaires such as OMAS in a population of adults with ankle fracture. Future research studies in this area should make use of COSMIN based standards for designing and reporting validation research to ensure that the appropriate evidence base is acquired for a PROM to be recommended. As this review demonstrates, there is no evidence that this PROM was formulated with the input of individuals who have ankle fractures and understanding the content validity of this widely used instrument would enable an understanding of whether it is fit for purpose in the patient population or whether the use of this outcome measure should be discontinued. Furthermore, the OMAS demonstrated ceiling effects in excess of the widely recognised acceptable level of 15% [37,38], which warrants further investigation.
Future exploratory research should aim to understand the patient experience of ankle fracture and the factors of most importance to individuals with this injury, with an understanding that this may differ between age group of the individuals and possibly fracture management. It might well be that the construct between these groups differs so much that it is not appropriate for the same PROM to be used between these populations. Exploring the relevance and comprehensiveness of PROMs such as the OMAS which were not developed with input from the patient population would be beneficial to ascertain the appropriateness of the ongoing use of these outcome measure. None of the articles here assessed the responsiveness of the PROMs and future research should seek to ensure that the instruments are suitably responsive to detect treatment effects in resource-intensive clinical trials. Furthermore, validation of the A-FORM questionnaire to ascertain the measurement properties of this PROM in its final format would be advantageous. Further validation research of the PROMs used in ankle fracture is warranted here to ensure that randomised controlled trials in this clinical area answer the questions needed to manage these individuals most effectively. Furthermore, the preparation of an agreed core outcome set for use in this patient population would be advantageous, enabling the conduct of high quality trials using an appropriate and standardised set of outcome measures for this important injury.