Skip to main content

Psychometric properties of a custom Patient-Reported Outcomes Measurement Information System (PROMIS) physical function short form and worst stiffness numeric rating scale in tenosynovial giant cell tumors

Abstract

Purpose

The purpose of this study was to evaluate the psychometric properties of the PROMIS-Physical Function (PF) and Worst Stiffness Numeric Rating Scale (NRS) among patients with tenosynovial giant cell tumors (TGCT).

Methods

Measurement properties of the customized lower extremity (LE) and upper extremity (UE) PROMIS-PF scales and Worst Stiffness NRS were assessed using data from the Phase 3 ENLIVEN trial (n = 120). Anchor- and distribution-based analyses were utilized to derive a responder threshold for meaningful change over time. The Patient Global Rating of Concept (PGRC)-Physical Functioning and Patient Global Impression of Change (PGIC)-Stiffness served as anchors. Responsiveness and responder threshold analyses were from baseline to week 25.

Results

Cronbach’s alpha values for internal consistency reliability were 0.93 and 0.91 for the PROMIS-PF LE and UE, respectively. Test-retest reliability intra-class correlation coefficients were > 0.75 for both instruments. Convergent validity for both instruments was supported by moderate to strong correlations (≥0.30) with the Brief Pain Inventory and EQ-5D. Known-groups validity was established between subgroups stratified by pain level (p < 0.05). Responsiveness was supported by evaluating change scores among different levels of change in PGRC-Physical Functioning and PGIC-Stiffness (overall F values < 0.001). Triangulation of responder definition analyses resulted in a threshold of ≥3 for the PROMIS-PF and ≥ 1 for the Worst Stiffness NRS.

Conclusion

This study is the first to establish the psychometric properties of patient-reported outcome measures in TGCT. The evidence demonstrates that the PROMIS-PF and Worst Stiffness NRS have good reliability, validity, and responsiveness, and provides guidance for the interpretation of meaningful change.

Background

Tenosynovial giant cell tumors (TGCT) are rare non-malignant neoplasms that involve the synovium or tendon sheath. They typically present in young and middle-aged adults of both sexes, and result in functional limitations, morbidity, and decreased quality of life (QOL) [21]. Symptoms often include pain, stiffness, swelling, and reduced range of motion (ROM) of the affected joint. TGCT can be subdivided into 2 main subtypes: localized and diffuse, with localized presenting as a single nodule and diffuse presenting as an infiltrative, locally aggressive tumor. The main treatment option for TGCT is surgery, but diffuse disease can be challenging to manage surgically and recurrence rates are high (8%–56%) [14], so systemic anti-tumor agent options are of interest.

In the recently completed ENLIVEN Phase 3 trial (NCT02371369), pexidartinib, a novel, orally active, small molecule receptor tyrosine kinase inhibitor has demonstrated efficacy in reducing tumor size and improving functional outcomes [18]. The overall response rate per Response Evaluation Criteria in Solid Tumors version 1.1 (RECIST v1.1) [5] and TVS was 39% vs. 0% and 56% vs. 0% in pexidartinib and placebo patients, respectively [18]. Patient-reported physical functioning and stiffness were included in the trial as key secondary endpoints. Unlike many oncologic diseases, TGCT is non-fatal; thus, functional disability due to disease and improvement of physical function on therapy were considered of seminal importance. Through qualitative work completed in preparation of the ENLIVEN trial, which included interviews with both patients and clinicians, it was demonstrated that physical functioning and stiffness were important treatment outcomes to patients with TGCT [9].

Despite the importance of physical functioning and stiffness as outcomes in the treatment of TGCT, there were no PRO measures specific to this population available for inclusion in the ENLIVEN trial. Therefore, based on the qualitative work [9, 10], items from the Patient-Reported Outcomes Measurement Information System Physical Function (PROMIS-PF) item bank [1, 17] were included in the ENLIVEN trial to assess physical functioning. In addition, a single-item Worst Stiffness NRS was developed to assess stiffness. While the content validity of these items has been established in the TGCT patient population [9, 10], the psychometric properties, including item performance, reliability, validity, ability to detect change, and identification of a responder definition threshold, have yet to be demonstrated. This psychometric evidence is integral to having robust, valid, and reliable PRO measures for use in future clinical trials of therapies for TGCT. Therefore, the purpose of the current work was to describe the methods and present the results of the psychometric evaluation of the PROMIS-PF and Worst Stiffness NRS using data from the ENLIVEN trial.

Methods

Patients

ENLIVEN was a 2-part, multi-center, double-blind, randomized, placebo-controlled Phase 3 study designed to compare the response rate of pexidartinib with that of placebo per RECIST 1.1 at Week 25 in subjects with symptomatic TGCT for whom surgical resection would be associated with potentially worsening functional limitation or severe morbidity (locally advanced disease) [18]. In Part 1, the double-blind phase, eligible candidates were enrolled from May 11, 2015, to September 30, 2016, and centrally randomized in a 1:1 ratio to receive either pexidartinib or placebo for 24 weeks. Randomization was stratified by United States (US) versus non-US sites and by upper extremity (UE) versus lower extremity (LE) involvement.

Eligible patients were age 18 or older, had a histologically confirmed TGCT diagnosis, and had advanced disease for which surgical resection would be associated with potentially worsening functional limitation or severe morbidity. They had symptomatic disease defined as a worst pain or worst stiffness score of at least 4 at any time during the week preceding the Screening Visit (based on scale of 0 to 10, with 10 representing “pain as bad as you can imagine” or “stiffness as bad as you can imagine”), and measurable disease per RECIST v1.1 with a minimum size of 2 cm. 120 subjects across approximately 45 study sites in the US, Canada, EU, and Australia were treated, 61 with pexidartinib and 59 with placebo.

Instruments

PROMIS-PF

Items from the validated PROMIS-PF item bank, which was designed to assess mobility, dexterity, axial, and complex activity function irrespective of specific anatomic location or acuteness of disease [1, 17], were used to assess physical functioning. Due to the heterogeneity in the physical impacts based on the tumor location, items for two customized tumor location-specific scales were selected based on input directly from patients on which activities were impacted by their TGCT [9, 10]. From the 121 validated items available, a 13-item scale and 11-item scale were customized to assess physical function among patients with tumors in the LE and UE, respectively. Nine of the PROMIS-PF items were overlapping across the two customized forms (i.e., included in both LE and UE scales).

Each PROMIS-PF question had five response options ranging in value from 1 to 5. Item-response theory-based parameters were used to calculate person-specific scores. A fixed-parameter calibration with no estimation was done using subject’s responses to the PROMIS-PF items to estimate person latent trait scores. Missing items were not imputed. The item parameters used to estimate person-latent trait scores were obtained from the PROMIS Assessment Center (https://www.assessmentcenter.net/). As is customary for PROMIS, the results are reported as T-scores, which represents physical functioning as a standardized score with a mean of 50 and a standard deviation (SD) of 10. A higher PROMIS T-score represents more of the concept being measured. For positively-worded concepts like physical function, a T-score of 60 is one SD better than average, and a person with a T-score of 40 is one SD worse than the average.

Worst stiffness NRS

The Worst Stiffness NRS was a single-item, which stated, “The following question asks about stiffness at the site of your tumor. Please rate your stiffness by circling the one number that best describes your stiffness at its worst in the last 24 hours.” For consistency the item had a response scale similar to that of the Brief Pain Inventory (BPI) Worst Pain NRS item [2, 4], that was a 0–10 NRS where zero is “no stiffness” and 10 was “stiffness as bad as you can imagine.” The item was included in ENLIVEN because qualitative interviews with patients and clinicians demonstrated that stiffness was an important treatment outcome [9].

The stiffness score was calculated using the number on the 11-point NRS selected by the patient for each day. The range for the score was 0 to 10. The weekly score was calculated as the average of non-missing records during each seven-day period, where the patient-reported entries on an outpatient basis were completed in at least 4 of the 7 days. (i.e., Mean weekly score = [sum of daily scores/# diary days completed]). Patients with fewer than 4 days of Worst Stiffness NRS entries had their stiffness scores for the week set to missing.

Other measures

The BPI Worst Pain NRS administered in ENLIVEN was a single-item, which stated, “The following question asks about pain at the site of your tumor. Please rate your pain by selecting the one number that best describes your pain at its worst in the last 24 hours.” The item was adapted from item 3 of the BPI-short form [2, 4] to include “pain at the site of your tumor.” The item has a response scale that is a 0–10 NRS where zero is “no pain” and 10 is “pain as bad as you can imagine.”

The EQ-5D-5L (heretofore referred to as EQ-5D) is a standardized measure of health status developed by the EuroQol Group in order to provide a simple, generic measure of health for clinical and economic appraisal [6]. The EQ-5D descriptive system includes five dimensions: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Each dimension has five levels: no problems, slight problems, moderate problems, severe problems and unable to/extreme. The EQ visual analogue scale (VAS) records the respondent’s self-rated health on a vertical VAS from 0 to 100 where the endpoints are labeled “Best imaginable health state (100)” and “Worst imaginable health state (0).”

The Patient Global Rating of Concept (PGRC)- Physical Functioning item was a single item that assessed the subject’s perception of physical functioning. Subjects were asked to indicate how much their tumor limits their ability to carry out every day physical activities on a 5-point Likert scale from “Not at all” to “Extremely”.

The Patient Global Impression of Change (PGIC) – Stiffness was a single item that assessed the subject’s perception of change in stiffness at the site of their tumor. Subjects were asked to indicate how much the stiffness at the site of their tumor had changed at Week 25 from Baseline on a 7-point Likert scale from “Much improved” to “Much worse.”

The Tumor Volume Score (TVS) was a semi-quantitative magnetic resonance imaging (MRI) scoring system that described tumor mass. The TVS was based on 10% increments of the estimated volume of the maximally distended synovial cavity or tendon sheath involved. Thus, a tumor that was equal to the volume of a maximally distended synovial cavity or tendon sheath was scored 10, whereas a tumor that was 70% of that volume was scored 7, a tumor that was twice the volume of the maximally distended synovial cavity or tendon sheath was scored 20, etc.

Finally, a passive range of motion (ROM) assessment, standardized according to American Medical Association disability criteria and uses standard goniometers [11], was completed as an objective measure of physical functioning.

Assessments

All PROs were completed via electronic handheld device in the local language of the study participant. The assessment time points for these analyses focus on the double-blind phase and are shown in Fig. 1.

Fig. 1
figure1

Schedule of Assessments, Double-Blind Phase

Statistical analysis

The analytical methods were undertaken to assess item performance, reliability, validity, ability to detect change, and identification of responder definition thresholds for the PROMIS-PF and Worst Stiffness NRS. The January 31, 2018, data cutoff was used for these analyses. Descriptive statistics were used to characterize the socio-demographic and clinical characteristics of the sample, as well as the Baseline and Week 25 PROMIS-PF and Worst Stiffness NRS scores. Confirmatory factor analysis (CFA) was conducted for the PROMIS-PF LE and UE item sets to confirm that the 15 PF candidate items comprised a single underlying factor in patients with TGCT. Model fit was assessed with comparative fit index (CFI), root mean square error approximation (RMSEA), and average weighted correlation residuals (SRMR). CFI > 0.95 was considered a good fit, as well as RMSEA < 0.05 and SRMR < 0.08.

Internal consistency reliability of the PROMIS-PF LE and UE item sets was assessed at Baseline to determine the extent to which individual items in the instrument were related to one another. Cronbach’s alphas ≥0.70 are considered acceptable [15]. Test-retest reliability of the PROMIS-PF and the Worst Stiffness NRS was evaluated to assess the reproducibility of scores when patients were presumed to be stable. Specifically, the test-retest reliability of the PROMIS-PF was assessed among all subjects between Screening and Baseline, and from Week 9 to 17 among subjects with no change on the PGRC – Physical Functioning. For the Worst Stiffness NRS, data from all subjects between each of 2 consecutive days from Day − 1 to Day-7 (e.g., Day − 2 vs Day − 3, Day − 3 vs. Day − 4) was used. Weekly scores (i.e., 7-day average estimates) for Baseline compared with Screening were also analyzed, as well as from Week 9 to 17 among subjects with no change on the PGIC – Stiffness measure. Intraclass correlation coefficients (ICC) were calculated. The ICC ranges from 0.00–1.00; an ICC ≥0.70 among stable subjects is considered acceptable to demonstrate test-retest reliability [16].

Construct validity of the PROMIS-PF and Worst Stiffness NRS was evaluated at Baseline by examining the relationships with the BPI Worst Pain NRS, EQ-5D, TVS, and ROM. All relationships were assessed via the Spearman’s rank-order correlation coefficient. Cohen’s conventions were used to interpret the absolute value of the correlation results, where a correlation > 0.5 is large, 0.3 to ≤0.5 is moderate, 0.1 to < 0.3 is small, and < 0.1 is insubstantial [3]. It was hypothesized that both measures would have large correlations with BPI Worst Pain NRS, and moderate correlations with each other. It was hypothesized that the correlations with the EQ-5D mobility, self-care, usual activities, and pain/discomfort items would be moderate to large.

To assess known-groups validity, which is the extent to which scores from an instrument are different for groups of participants that differ on a relevant clinical or other indicator, the PROMIS-PF and Worst Stiffness NRS were analyzed by levels of pain (no pain, mild, moderate, and severe categories), TVS (small, medium, and large categories), PF limitation (no limitation, low, medium, and high categories), and stiffness (no stiffness, low, medium, and high categories). Mean scores for the PROMIS-PF and Worst Stiffness NRS were compared for each of the groups using analysis of covariance (ANCOVA) (PROC GLM) at Baseline, controlling for age, gender, race, and body mass index (BMI).

A responsiveness analysis of the PROMIS-PF and Worst Stiffness NRS item was completed to evaluate the instruments’ ability to detect changes in participants who had an established change in clinical status. The association between changes in the scores on the PROMIS-PF and Worst Stiffness NRS from Baseline and Week 25 with change scores on the PGRC – Physical Functioning for PROMIS-PF, and PGIC – Stiffness for the Stiffness NRS, and tumor response status (complete response, partial response, progressive disease, and stable disease) defined by RECIST 1.1 response criteria and TVS for both measures, were examined.

Methods to establish the responder definition threshold included triangulation of anchor- and distribution-based analyses. Anchor-based methods are preferred by the FDA for interpretation of PRO scores [8] and were considered the primary analysis. The anchor for the PROMIS-PF was a change in PGRC-Physical Functioning from Baseline to Week 25. Improvement of “-1” was defined as a change in response in any of the following ways: Extremely to Severely; Severely to Somewhat; Somewhat to A little; or A little to Not at all. The mean change in the PROMIS-PF scale observed in the small improvement group (“-1”) was examined as a key anchor-based indicator of a responder. The anchor for Worst Stiffness NRS was change in PGIC-Stiffness from Baseline to Week 25. The mean change score among patients who reported that they were “a little improved” was examined as a key anchor-based indicator of a responder. Distribution-based analyses included the 0.50 and 0.30 baseline SD, as well as one standard error of measurement (SEM). Empirical cumulative distribution function (eCDF) curves were generated for the PROMIS-PF and Worst Stiffness NRS. The eCDF is a continuous (both positive and negative) presentation of the change scores from Baseline to Week 25 on the X-axis and a cumulative proportion of patients with that level of score change on the Y-axis.

Results

A summary of patients’ baseline demographic and disease characteristics for ENLIVEN are shown in Table 1. The mean ± SD age was 44.5 years ±13.35 years with a range of 18 years to 79 years. More subjects were female (n = 71, 59.2%) with the majority identifying as white (n = 106, 88.3%). Most tumors were in the lower extremities (n = 110, 91.7%), most commonly the knee (n = 73, 60.8%), and ankle (n = 21, 17.5%). Based on responses available from 94 subjects, the most disturbing symptom was reported as ‘difficulty with everyday activities’ (n = 54, 57.4%), followed by ‘pain’ (n = 25, 26.6%) and ‘stiffness’ (n = 15, 6.0%).

Table 1 Demographic and Baseline Characteristics (ITT Analysis Set)

Individual unidimensional CFA models were fit for the nine PROMIS-PF items that overlapped between the UE and LE scales, and the 13 PROMIS-PF lower extremity items. The model could not be estimated for the 11 PROMIS-PF UE items, as there were only 7 participants with UE tumors with PROMIS-PF data available. Data from 104 LE and UE subjects were included for the model that included the 9 items that overlapped across tumor location, the factor loadings ranged from 0.609–0.881, with the exception of item PFA16R1 (“Are you able to dress yourself, including tying shoelaces and buttoning up your clothes?”), which had a factor loading of 0.394. The model showed moderate fit with CFI and RMSEA values (0.874 and 0.155, respectively) and an SRMR of 0.063. Post-hoc analyses revealed that removing PFA16R1 did not substantially improve the fit of the model. Data from 97 LE subjects were included for the model that included the 13 PROMIS-PF LE items, the factor loadings ranged from 0.626–0.840, with the exception of item PFA12 (“Are you able to push open a heavy door?”), which had a factor loading of 0.425. The model showed moderate fit with CFI and RMSEA values (0.804 and 0.159, respectively) and an SRMR of 0.069. Post-hoc analyses revealed that removing PFA12 did not substantially improve the fit of the model.

In analyses that included 93 and 7 subjects, respectively, Cronbach’s alphas were 0.93 and 0.91 for the PROMIS-PF LE and UE items, demonstrating that the items are highly related to each other and show good internal consistency reliability. The test-retest reliability ICC for the PROMIS-PF was 0.80 among all patients (n = 101) during the screening period; it was 0.88 among stable patients (n = 33) from Weeks 9–17, defined as those that were stable on the PGRC-Physical Functioning item during this period. When the test-retest reliability of the Worst Stiffness NRS was evaluated on a daily basis (n = 84 to 108) (e.g., Day − 7 to Day − 6), the ICCs ranged from 0.81–0.90. From Baseline to Week 25 the ICC was 0.76 among stable patients (n = 22) (those whose PGIC-Stiffness scores did not change), and when evaluated as a weekly score (n = 29) (Day − 14 to Day − 8 compared to Baseline) the ICC was 0.94.

Construct validity of the PROMIS-PF was supported by a moderate correlation with the BPI (− 0.52) and moderate to strong correlations with the pain/discomfort, mobility, and usual activities items of the EQ-5D (− 0.48, − 0.63, and − 0.67, respectively). Construct validity of the Worst Stiffness NRS was supported by a strong correlation with the BPI (0.83) and moderate correlations with the pain/discomfort, mobility, and usual activities items of the EQ-5D (0.47, 0.37, and 0.31, respectively) (Table 2). The correlations with more clinical measures (such as tumor volume score) were weaker, ranging from − 0.06 to 0.35.

Table 2 Construct Validity: Spearman Correlations of the PROMIS-PF and Worst Stiffness NRS item with Other Related Measures (Baseline)

Known-groups validity of the PROMIS-PF and Worst Stiffness NRS was strongly supported when evaluated by pain level (all p-values < 0.05) (Table 3). PROMIS-PF scores differed among subjects categorized by stiffness level, and Worst Stiffness NRS scores differed among subjects categorized by varying degree of physical function limitations. TVS did not provide evidence of known-groups validity, which is consistent with the concurrent validity findings of a lower correlation with TVS and the rationale for including the PROs as an endpoint in clinical trials.

Table 3 Known-Groups Validity: PROMIS-PF and Worst Stiffness NRS

Responsiveness of the PROMIS-PF and Worst Stiffness NRS item was supported by evaluating change scores between Baseline and Week 25 among different levels of change in PGRC-Physical Functioning and PGIC-Stiffness (overall F values < 0.001) (Table 4). Analyses by tumor responder status (RECIST 1.1 criteria) and tumor volume responder status showed trends in the expected direction to support responsiveness, but were not statistically significant (data not shown).

Table 4 Responsiveness of the PROMIS-PF and Worst Stiffness NRS from Baseline to Week 25

Anchor based analyses demonstrated that improvement of “-1” on the PGRC-Physical Functioning item from Baseline to Week 25, was associated with a least square mean change of 4.04 on the PROMIS-PF. The distribution-based estimates (1/3 SD, 1/2 SD, and 1 SEM) for the PROMIS-PF were 1.85, 2.77, and 2.47, respectively. For the Worst Stiffness NRS, “A little improved” on the perception of stiffness item from Baseline to Week 25, was associated with a least square mean change of − 1.11. The distribution-based estimates (1/3 SD, 1/2 SD, and 1 SEM) for the Worst Stiffness NRS item were 0.61, 0.92, and 0.47, respectively. Triangulation involved examination of the range of estimates and as directed by the FDA PRO guidance, with more consideration allotted to the anchor-based estimates. In selecting a responder definition, the minimum amount of change that is possible on the scale is also considered. This resulted in the responder definition threshold of ≥3 for the PROMIS-PF scale, and ≥ 1 for the Worst Stiffness NRS. The eCDFs are shown in Figs. 2 and 3 for the PROMIS-PF and Worst Stiffness NRS, respectively.

Fig. 2
figure2

Empirical Cumulative Distribution Function of PROMIS-PF by PGRC-Physical Functioning

Fig. 3
figure3

Empirical Cumulative Distribution Function of Worst Stiffness NRS by PGIC-Stiffness

Discussion

This study provides strong support for the psychometric properties of the PROMIS-PF and Worst Stiffness NRS in the TGCT patient population. Specifically, the internal consistency reliability of the PROMIS-PF was acceptable, the test-retest reliability of both instruments was good, the convergent validity with other PRO measures was adequate, and both instruments were able to differentiate between known groups and detect change over time. In addition, the responder definition thresholds for both instruments was ascertained, which informs the interpretation of meaningful within-person change.

Triangulation of the anchor- and distribution-based methods and the eCDFs resulted in responder definition thresholds of ≥3 for the PROMIS-PF scale, and ≥ 1 for the Worst Stiffness NRS. As seen in the eCDFs there is clear separation between the improved, no change, and worsened groups as the proposed thresholds. For the PROMIS-PF, over 50% of the improved subjects achieved the ≥3 threshold, as compared to roughly 10% of the subjects with no change and none of the subjects that worsened. For the Worst Stiffness NRS, nearly 90% of subjects that improved achieved the ≥1 threshold, as compared to roughly 40% and 30% of the subjects that had no change or worsened, respectively. In the context of clinical practice or research, when a patient is initiating new therapy or undergoing an intervention, these thresholds for change scores can be used to complement the primary clinical outcomes and give clinicians and patients a tangible expectation for measurable benefit.

The responder definition thresholds of ≥3 for the PROMIS-PF scale is consistent with estimates that have been calculated in other patient populations. A minimal important difference range of 4.0–6.0 was estimated by Yost and colleagues [22] using anchor-based methods among a cohort of advanced stage cancer patients. Among patients with rheumatoid arthritis, Hays and colleagues [12] used anchor-based analysis and estimated the minimal important difference to be 2 points (about 0.20 of a standard deviation). Finally, Lee and colleagues [13] used anchor- and distribution-based methods to estimate a range of 1.9–2.2 points as a minimal important difference among patients with knee osteoarthritis.

Although CFA results were not entirely definitive in terms of the item content, it did appear that there was a single common physical functioning latent trait that was defined by each of the respective PROMIS-PF scales. Thus, we proceeded with scoring the PROMIS-PF measures using all available items. This decision was supported by the prior qualitative work in which physicians experienced in the treatment of TGCT indicated that there is a very high degree of heterogeneity in terms of PF impacts in this population [9]. Additionally, patients also exhibited variability in terms of the items that they reported being relevant on an individual basis. Thus, the decision to be more inclusive and retain PROMIS items with lower factor loadings was a conscious one.

As hypothesized and observed in the construct validity analysis, the correlations between the PRO and clinical measures, particularly TVS were weak. The coefficients for PROMIS-PF and Worst Stiffness NRS with TVS were − 0.06 and 0.08, respectively. However, over time, from Baseline to Week 25 the correlations were moderate (− 0.34 and 0.43, respectively). These results support the fact that the PROMIS-PF and Worst Stiffness NRS do measure unique, and patient-relevant outcomes, which are complementary to more morphological tumor response metrics.

A major strength of this study is that it is the first to conduct psychometric validation work on PROs in the TGCT patient population. Generic and orthopedic-related PROs have been used historically among patients with TGCT [19,20,21], however, none had established content validity or psychometric properties for TGCT. Completion of this current work, and the prior content validity work [9, 10], provides evidence that the PROMIS-PF and Worst Stiffness NRS are fit for purpose [7]. Specifically, this work has demonstrated these measures are appropriate for the patient population and study design, they are valid and reliable concepts that are clinically relevant, and well-defined.

The analytic methods utilized in this study are consistent with the FDA guidance on the use and interpretation of PRO scores in medical product development [8]. However, there is a limitation in the generalizability of the findings to be considered. Only 10 (8.3%) subjects in ENLIVEN had UE tumors. Confidence in the relevance of these results to subjects with UE tumors is limited and replication of these results among a sufficient sample of these subjects would be worthwhile. Further, subgroup analyses such as known-groups validity among LE tumor type were of particular interest given the predominance of knee tumors, however the small sample size (< 10) in many of the criterion groups hindered the interpretability of those results. Another limitation to acknowledge is the considerable amount of post-baseline data that was missing, due mostly to early discontinuations, technical issues with electronic data capture, site and patient compliance, and enrolment being halted just short of the target because of hepatotoxicity. In the context of a psychometric analysis such as this, missing data could impact analyses using post-baseline data, which in this case was the responsiveness analysis and examination of the responder definition thresholds. Despite limited sample size due to missing data, the responsiveness analysis for both the PROMIS-PF and Worst Stiffness NRS were statstitically significant and impressive in the magnitude of difference in score changes between groups. Further, as evidenced by the eCDFs the change groups had clear separation in score changes, giving confidence in use of the data that was available.

Conclusion

This study is the first to establish the psychometric properties of PRO measures in the TGCT patient population. The evidence provided demonstrates that the PROMIS-PF and Worst Stiffness NRS have good reliability, validity, responsiveness, and provide guidance for their interpretation in this patient population. The PROMIS-PF and Worst Stiffness NRS are well-defined PRO measures that are suitable for use in future trials of therapies for TGCT.

Availability of data and materials

Not applicable.

Abbreviations

ANCOVA:

Analysis of covariance

BMI:

Body mass index

BPI:

Brief Pain Inventory

CFA:

Confirmatory factor analysis

CFI:

Comparative fit index

eCDF:

Empirical cumulative distribution function

EQ-5D:

EuroQol-5D

ICC:

Intraclass correlation coefficients

LE:

Lower extremity

NRS:

Numeric Rating Scale

PF:

PROMIS-Physical Function

PGIC:

Patient Global Impression of Change

PGRC:

Patient Global Rating of Concept

PRO:

Patient-reported outcome

PROMIS:

Patient-Reported Outcomes Measurement Information System

QOL:

Quality of life

RMSEA:

Root mean square error approximation

SD:

Standard deviation

SEM:

Standard error of measurement

SRMR:

Standard root mean squared residual

TGCT:

Tenosynovial giant cell tumors

TVS:

Tumor volume score

UE:

Upper extremity

VAS:

Visual analogue scale

References

  1. 1.

    Bruce, B., Fries, J. F., Ambrosini, D., Lingala, B., Gandek, B., Rose, M., & Ware Jr., J. E. (2009). Better assessment of physical function: Item improvement is neglected but essential. Arthritis Research & Therapy, 11, R191. https://doi.org/10.1186/ar2890.

    Article  Google Scholar 

  2. 2.

    Cleeland, C. S., & Ryan, K. M. (1994). Pain assessment: Global use of the brief pain inventory. Annals of the Academy of Medicine, Singapore, 23, 129–138.

    CAS  PubMed  Google Scholar 

  3. 3.

    Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale: Lawrence Erlbaum Associates Inc.

    Google Scholar 

  4. 4.

    Daut, R. L., Cleeland, C. S., & Flanery, R. C. (1983). Development of the Wisconsin brief pain questionnaire to assess pain in cancer and other diseases. Pain, 17, 197–210. https://doi.org/10.1016/0304-3959(83)90143-4.

    CAS  Article  PubMed  Google Scholar 

  5. 5.

    Eisenhauer, E. A., et al. (2009). New response evaluation criteria in solid tumours: Revised RECIST guideline (version 1.1). European Journal of Cancer, 45, 228–247. https://doi.org/10.1016/j.ejca.2008.10.026.

    CAS  Article  PubMed  Google Scholar 

  6. 6.

    EuroQol (October 2013) EQ-5D-3L. User Guide: Basic information on how to use the EQ-5D-3L instrument. Version 5.0. https://euroqol.org/wp-content/uploads/2016/09/EQ-5D-5L_UserGuide_2015.pdf. Accessed 13 Sep 2019.

  7. 7.

    Food and Drug Administration (FDA). (2018). Patient-focused drug development guidance public workshop: Methods to identify what is important to patients & select, develop or modify fit-for-purpose clinical outcomes assessments Workshop date: October 15-16 2018. https://www.fda.gov/media/116277/download. Accessed 13 Sep 2019.

    Google Scholar 

  8. 8.

    Food Drug Administration (FDA). (2009). Guidance for industry patient-reported outcome measures: Use in medical product development to support labeling claims. Federal Register, 74, 65132–65133.

    Google Scholar 

  9. 9.

    Gelhorn, H. L., et al. (2016). Patient-reported symptoms of Tenosynovial Giant cell tumors. Clinical Therapeutics, 38, 778–793. https://doi.org/10.1016/j.clinthera.2016.03.008.

    Article  PubMed  PubMed Central  Google Scholar 

  10. 10.

    Gelhorn, H. L., et al. (2019). The measurement of physical functioning among patients with Tenosynovial Giant cell tumor (TGCT) using the patient-reported outcomes measurement information system (PROMIS). Journal of Patient Report Outcomes, 3, 6. https://doi.org/10.1186/s41687-019-0099-0.

    Article  Google Scholar 

  11. 11.

    Gerhardt, J. J., Cocchiarella, L., Lea, R. D., & American Medical Association. (2002). The practical guide to range of motion assessment. Chicago: American Medical Association.

    Google Scholar 

  12. 12.

    Hays, R. D., Spritzer, K. L., Fries, J. F., & Krishnan, E. (2015). Responsiveness and minimally important difference for the patient-reported outcomes measurement information system (PROMIS) 20-item physical functioning short form in a prospective observational study of rheumatoid arthritis. Annals of the Rheumatic Diseases, 74, 104–107. https://doi.org/10.1136/annrheumdis-2013-204053.

    Article  PubMed  Google Scholar 

  13. 13.

    Lee, A. C., Driban, J. B., Price, L. L., Harvey, W. F., Rodday, A. M., & Wang, C. (2017). Responsiveness and minimally important differences for 4 patient-reported outcomes measurement information system short forms: Physical function, pain interference, depression, and anxiety in knee osteoarthritis. The Journal of Pain, 18, 1096–1110. https://doi.org/10.1016/j.jpain.2017.05.001.

    Article  PubMed  PubMed Central  Google Scholar 

  14. 14.

    Murphey, M. D., Rhee, J. H., Lewis, R. B., Fanburg-Smith, J. C., Flemming, D. J., & Walker, E. A. (2008). Pigmented villonodular synovitis: Radiologic-pathologic correlation. Radiographics, 28, 1493–1518. https://doi.org/10.1148/rg.285085134.

    Article  PubMed  Google Scholar 

  15. 15.

    Nunnally, J. C. (1978). Psychometric theory. New York: McGraw-Hill.

    Google Scholar 

  16. 16.

    Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw Hill.

    Google Scholar 

  17. 17.

    Rose, M., Bjorner, J. B., Becker, J., Fries, J. F., & Ware, J. E. (2008). Evaluation of a preliminary physical function item bank supported the expected advantages of the patient-reported outcomes measurement information system (PROMIS). Journal of Clinical Epidemiology, 61, 17–33. https://doi.org/10.1016/j.jclinepi.2006.06.025.

    CAS  Article  PubMed  Google Scholar 

  18. 18.

    Tap, W. D., et al. (2019). Pexidartinib versus placebo for advanced tenosynovial giant cell tumour (ENLIVEN): A randomised phase 3 trial. Lancet, 394, 478–487. https://doi.org/10.1016/S0140-6736(19)30764-0.

    CAS  Article  PubMed  Google Scholar 

  19. 19.

    van der Heijden, L., Mastboom, M. J., Dijkstra, P. D., & van de Sande, M. A. (2014). Functional outcome and quality of life after the surgical treatment for diffuse-type giant-cell tumour around the knee: A retrospective analysis of 30 patients. Bone Joint Journal, 96-B, 1111–1118. https://doi.org/10.1302/0301-620X.96B8.33608.

    Article  PubMed  Google Scholar 

  20. 20.

    van der Heijden, L., van de Sande, M., & Facebook PVNS research group. (2017). Diffuse-type giant cell tumor of the knee and hip-first Facebook-based functional outcome and quality of life study after arthroscopic or open synovectomy in 71 patients. Vienna: 27th Annual Meeting of the European Musculo-Skeletal Oncology Society.

    Google Scholar 

  21. 21.

    Verspoor, F. G., Zee, A. A., Hannink, G., van der Geest, I. C., Veth, R. P., & Schreuder, H. W. (2014). Long-term follow-up results of primary and recurrent pigmented villonodular synovitis. Rheumatology (Oxford), 53, 2063–2070. https://doi.org/10.1093/rheumatology/keu230.

    CAS  Article  Google Scholar 

  22. 22.

    Yost, K. J., Eton, D. T., Garcia, S. F., & Cella, D. (2011). Minimally important differences were estimated for six patient-reported outcomes measurement information system-cancer scales in advanced-stage cancer patients. Journal of Clinical Epidemiology, 64, 507–516. https://doi.org/10.1016/j.jclinepi.2010.11.018.

    Article  PubMed  PubMed Central  Google Scholar 

Download references

Acknowledgements

Not applicable.

Funding

Funding for this work was provided by Daiichi Sankyo Company, Limited.

Author information

Affiliations

Authors

Contributions

Study Conception: R Speck, X Yin, H Gelhorn; Study Design: R Speck, X Yin, H Gelhorn; Data Collection: none; Data Analysis: R Speck, H Gelhorn; Data Interpretation: R Speck, X Yin, N Bernthal, H Gelhorn; Manuscript Development: R Speck; Manuscript Review and Approval: R Speck, X Yin, N Bernthal, H Gelhorn. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Heather L. Gelhorn.

Ethics declarations

Ethics approval and consent to participate

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. This article does not contain any studies with animals performed by any of the authors.

Written informed consent was obtained from all individual participants prior to completing any study measures or procedures.

Consent for publication

Not applicable.

Competing interests

RMS and HLG are are employed by Evidera, which provides consulting and other research services to pharmaceutical, medical device, and related organizations. In their salaried positions, they work with a variety of companies and organizations, and are precluded from receiving payment or honoraria directly from these organizations for services rendered. Evidera received funding from Daiichi Sankyo to perform the study and develop this manuscript. XY is an employee of Daiichi Sankyo Inc. which provided funding for this study. NMB has and may receive honoraria for serving as a consultant.

All authors listed meet the requirements for authorship, believe that this manuscript adheres to ICMJE requirements, and have read and approved this final version of the manuscript and the authorship list.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Speck, R.M., Ye, X., Bernthal, N.M. et al. Psychometric properties of a custom Patient-Reported Outcomes Measurement Information System (PROMIS) physical function short form and worst stiffness numeric rating scale in tenosynovial giant cell tumors. J Patient Rep Outcomes 4, 61 (2020). https://doi.org/10.1186/s41687-020-00217-6

Download citation

Keywords

  • TGCT
  • Psychometrics
  • Patient-reported outcomes
  • PROMIS
  • Worst stiffness NRS