This multi-centre prospective study was conducted at two tertiary teaching hospitals in Australia. Orthopaedic surgeons operate routinely at both hospitals, performing approximately 500 hip arthroplasty procedures per year. Owing to COVID-19-related restrictions on elective surgery, this number fell to approximately 300 in 2020. The local Human Research Ethics Committee granted multi-centre approval (SALHN/329.17).
All consecutive adult patients undergoing elective total hip arthroplasty were prospectively enrolled over a period of almost three years, from 8th January 2018 to 1st October 2020, with six-month follow-up continuing until 2nd April 2021. Informed consent was obtained from all participants. Baseline demographics were recorded for all patients, including age, gender, body mass index (BMI) and Charlson comorbidity index (CCI) [15].
Data were recorded by a dedicated research assistant using scripted questionnaires, administered either by telephone or as a written survey sent by post. The same English-language script was used at three time points: preoperatively, and at six weeks and six months postoperatively. At all three time points, two validated PROMs were used: the Oxford Hip Score (OHS) [16] and the EQ-5D-5L [3], including its stand-alone EQ-VAS component. Data were entered into a password-protected database stored on the hospital computer network.
Patients were included for analysis if they had complete quality of life data. This was defined as completing the EQ-5D-5L and OHS preoperatively and at six weeks postoperatively. The validation of the EQ-5D-5L utility values was established using a discrete choice experiment approach [17].
Oxford Hip Score
The OHS is a joint-specific PROM [18] that has been used extensively over the last 20 years [19,20,21]. It assesses six fields, each with 2 questions (12 questions in total): pain, walking, physical activity, function, quality of life and psychological wellbeing. Each question is scored on a 5-point discrete visual analogue scale, with higher scores indicating better function (Appendix 1). The final score is the sum of the individual question scores. In this study, the OHS effectively served as the comparator against which the EQ-5D-5L was assessed.
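The summation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the item range of 0 to 4 is an assumption (the actual response coding is given in Appendix 1, which is not reproduced here), and the function name `oxford_hip_score` is hypothetical.

```python
def oxford_hip_score(item_scores, min_item=0, max_item=4):
    """Sum the 12 item scores; higher totals indicate better function.

    min_item and max_item are ASSUMED bounds for the 5-point items,
    not values taken from the study.
    """
    if len(item_scores) != 12:
        raise ValueError("the OHS has 12 questions")
    for score in item_scores:
        if not (min_item <= score <= max_item):
            raise ValueError(f"item score {score} is outside {min_item}-{max_item}")
    return sum(item_scores)


# Hypothetical patient who answers every question at the middle level.
print(oxford_hip_score([2] * 12))  # 24
```

Keeping the item bounds as parameters means the same sketch works if the questionnaire is coded 1 to 5 instead.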
EQ-5D-5L index and EQ-VAS
The EQ-5D-5L is a standardized health-related quality of life (HRQoL) PROM that the EuroQol Group designed to quantify generic health in the adult population in the fields of mobility, self-care, usual activities, anxiety/depression and pain/discomfort. Response levels are on a 5-point scale of none, slight, moderate, severe and extreme/unable to perform. Based on Australian general population preference weights determined through a discrete choice experiment approach [17], a utility index ranging from −0.676 to 1 can be attached to each of the EQ-5D-5L health states. Higher utilities represent better HRQoL.
The EQ-VAS is a vertical visual analogue scale that forms part of the EQ-5D-5L. It asks patients to rate their general health from 0 to 100. Higher numeric scores represent better patient function.
Statistical analysis
All statistical analyses were performed using Stata version 17 (StataCorp, College Station, Texas, USA). Continuous variables (age, BMI, CCI) were expressed as mean and standard deviation, whereas the categorical variable (gender) was expressed as percentage (count). A p-value of < 0.05 was considered statistically significant.
Concurrent validity, predictive validity and agreement
For analysis of concurrent validity, Spearman’s correlation coefficient (rho, ρ) was used to compare the EQ-5D-5L index score, the EQ-5D-5L dimension scores and the EQ-VAS against the OHS. The strength of the relationship was considered low/weak (ρ < 0.25), fair (ρ = 0.25–0.50), good (ρ = 0.50–0.75) or excellent (ρ > 0.75). These thresholds were taken from previous publications in the same area [22, 23].
Predictive validity was assessed in a regression framework that controlled for confounders. We fitted generalized linear models with the 6-week and 6-month postoperative PROMs as the dependent variables and the preoperative values and baseline characteristics as independent variables. Where models used different distribution families, they were compared using the average marginal effect of the preoperative score.
Agreement between the EQ-5D-5L index score and the OHS was measured using Krippendorff’s alpha, a reliability coefficient designed to measure agreement among observers, coders, judges, raters or measuring instruments [7, 24]. Agreement was interpreted as poor (below 0.00), slight (0.00 to 0.20), fair (0.21 to 0.40), moderate (0.41 to 0.60), substantial (0.61 to 0.80) or almost perfect (0.81 to 1.00) [25]. Two measures of absolute agreement were considered as alternatives to Krippendorff’s alpha: Lin’s concordance correlation coefficient (CCC), which is robust to departures from normality [26], and the intraclass correlation coefficient (ICC), for which the PROM data were first power-transformed to meet the assumptions of normality and stable variance [27,28,29]. The ICC was based on a two-way mixed-effects model in which the individual effect was random and the instrument effect was fixed. Data were analyzed using Intercooled Stata software version 17.1 for Windows (Stata Corp. College Station, TX, USA).
Values of the ICC and CCC above 0.9 were considered to indicate excellent reliability, values between 0.75 and 0.9 good, between 0.5 and 0.75 moderate, and below 0.5 poor [27].
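To make the distinction between rank correlation and absolute agreement concrete, the following is a minimal Python sketch of Spearman’s ρ and Lin’s CCC with the interpretation bands quoted above. It is not the Stata code used in the study. The function names and the toy data are hypothetical, the CCC uses 1/n moments as in Lin’s original definition, the bands are applied to the magnitude of ρ, and a value falling exactly on a shared boundary (for example ρ = 0.50) is assigned to the lower band; these last two choices are assumptions.

```python
from statistics import fmean


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def lins_ccc(x, y):
    """Lin's CCC: 2*cov / (var_x + var_y + (mean_x - mean_y)^2)."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def correlation_strength(rho):
    """Bands for Spearman's rho as quoted in the text (magnitude assumed)."""
    r = abs(rho)
    if r < 0.25:
        return "low/weak"
    if r <= 0.50:
        return "fair"
    if r <= 0.75:
        return "good"
    return "excellent"


def reliability_band(value):
    """Bands for ICC and CCC as quoted in the text."""
    if value > 0.9:
        return "excellent"
    if value >= 0.75:
        return "good"
    if value >= 0.5:
        return "moderate"
    return "poor"


# Hypothetical scores from two instruments, assumed to share a common scale.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(spearman_rho(x, y), correlation_strength(spearman_rho(x, y)))
print(round(lins_ccc(x, y), 3), reliability_band(lins_ccc(x, y)))
```

In this toy example the two score vectors are perfectly rank-correlated, yet the CCC is low because one is systematically double the other, which is why a measure of absolute agreement is reported alongside the correlation.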
Responsiveness
Responsiveness is the ability of a PROM to detect change in health status over time. For this study, we compared scores at baseline with those at 6-week and 6-month follow-up using paired t-tests. Responsiveness was further quantified using the effect size (ES) and the standardized response mean (SRM).
Effect size was calculated using the formula:
$$ES = \frac{Mean\,Difference\,from\,Baseline}{{Standard\,Deviation\,at\,Baseline}}$$
The standardized response mean was calculated using the formula:
$$SRM = \frac{Mean\,Difference\,from\,Baseline}{{Standard\,Deviation\,of\,Difference}}$$
ES and SRM were classified according to Cohen’s rule of thumb, as large (≥ 0.8), moderate (0.5–0.79) or small (< 0.5). Both ES and SRM are standardized measures of change over time in health, independent of sample size.
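The two formulas and the classification above can be illustrated with a short Python sketch. The function names and the paired scores are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which form was used.

```python
from statistics import fmean, stdev


def effect_size(baseline, follow_up):
    """ES = mean change from baseline / SD of the baseline scores."""
    diffs = [f - b for b, f in zip(baseline, follow_up)]
    return fmean(diffs) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """SRM = mean change from baseline / SD of the individual changes."""
    diffs = [f - b for b, f in zip(baseline, follow_up)]
    return fmean(diffs) / stdev(diffs)


def cohen_band(value):
    """Classify the magnitude of an ES or SRM using the thresholds in the text."""
    magnitude = abs(value)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    return "small"


# Hypothetical paired scores for four patients (not study data).
baseline = [10, 20, 30, 40]
follow_up = [22, 30, 44, 50]

es = effect_size(baseline, follow_up)
srm = standardized_response_mean(baseline, follow_up)
print(round(es, 2), cohen_band(es))
print(round(srm, 2), cohen_band(srm))
```

Because the SRM divides by the spread of the individual changes rather than the baseline spread, a uniform improvement across patients produces a much larger SRM than ES, as in this toy example.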
Influence of baseline characteristics on PROMs
Regression analysis using generalised linear models was performed with the baseline characteristics (age, gender, BMI and CCI) and the preoperative EQ-5D-5L index score, EQ-VAS and OHS as independent variables. The postoperative PROMs were the dependent variables. An appropriate distribution family and canonical link function were chosen according to the distribution of the dependent variable. When the appropriate family was difficult to ascertain, multiple families were trialled and the best-fitting model was selected on the basis of the lowest Akaike information criterion (AIC) and Bayesian information criterion (BIC) values. Coefficients, standard errors and p-values were recorded.
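For reference, the two information criteria are conventionally defined as follows, where $\hat{L}$ is the maximized likelihood of the fitted model, $k$ is the number of estimated parameters and $n$ is the number of observations. These symbols are introduced here for illustration and are not taken from the study output.

$$AIC = 2k - 2\ln \hat{L}, \qquad BIC = k\ln n - 2\ln \hat{L}$$

For both criteria a lower value indicates a better trade-off between goodness of fit and model complexity, and the BIC penalizes additional parameters more heavily once $\ln n > 2$.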
Because the EQ-5D-5L index scores included negative values, the Gaussian family with its canonical identity link was judged most appropriate. Both the OHS and the EQ-VAS took only non-negative values. For these outcomes, multiple families with their canonical links, including the Gaussian, inverse Gaussian, Poisson and Gamma distributions, were fitted and tested for best fit. For both the OHS and the EQ-VAS, the Gamma distribution provided the best fit and was therefore used in the final model.