Study design
The design and results of TELESTAR have been previously described [13]. Briefly, patients entered a Screening/Run-in Period of 3 or 4 weeks to establish Baseline symptoms. They were then randomly assigned (1:1:1) on Day 1 to receive one of two dose levels of telotristat ethyl (250 or 500 mg) or placebo thrice daily for 12 weeks. All patients remained on their baseline dose of SSA therapy during the Treatment Period. Subsequently, they participated in a 36-week Open-label Extension Period in which all patients received 500 mg of the active study drug thrice daily. This study received Institutional Review Board approval.
The present analysis focuses on the primary endpoint: change from Baseline in BM frequency during the Double-blind Treatment Period. The intent-to-treat (ITT) analysis population included all randomized patients. All analysis populations were derived from the ITT dataset. All patients participating in the patient interview substudy after the Double-blind Treatment Period were included in the patient interview subpopulation (ISP).
Study instruments
Evaluation of meaningful change in BM frequency required the inclusion of other supportive clinical outcomes assessments: a Yes/No question about CS gastrointestinal symptom relief; European Organization for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire - Core Questionnaire (QLQ-C30) and EORTC Gastrointestinal NET questionnaire (GI.NET21) scales; and patient exit interview responses.
Number of daily BMs
Patients electronically recorded the number of daily BMs. Average BM frequency was summarized for the analyses in two ways: the difference between average Baseline BM frequency and the overall average BM frequency, and the difference between average Baseline BM frequency and average BM frequency at Week 12.
EORTC QLQ-C30 and GI.NET21
The QLQ-C30 contained 30 questions incorporated into 5 functional domains (Physical, Role, Cognitive, Emotional, and Social), 9 symptom scales (Fatigue, Pain, Nausea and Vomiting, Dyspnea, Insomnia, Appetite Loss, Constipation, Diarrhea, and Financial Difficulties), and a single global HRQoL/Global Health Status score [14].
The GI.NET21 module contained 21 questions: 4 single-item assessments about muscle and/or bone pain, body image, information, and sexual functioning, plus 17 items organized into 5 scales: Endocrine Symptoms (3 items), GI Symptoms (5 items), Treatment-related Symptoms (3 items), Social Functioning (3 items) and Disease-related Worries (3 items) [15].
Exit interviews
English- and German-speaking patients were invited to participate in the exit interview study as prespecified in the TELESTAR protocol [16]. All participants consented to the interview, which was to be conducted within 2 weeks after completion of the 12-week Double-blind Treatment Period or early termination. Patients were categorized based on reported satisfaction with improvement over the course of treatment (“very satisfied”; “somewhat satisfied”; “neither satisfied nor dissatisfied”; “somewhat dissatisfied”; or “very dissatisfied”). Patients were also categorized according to perception of BM frequency reduction (“a great deal better”; “much better”; “a little better”; “the same”; “a little worse”; “much worse”; or “a great deal worse”).
Analytic methods
Analyses focused on the derivation and evaluation of thresholds to interpret meaningful change and responsiveness in BM frequency. All patients were included irrespective of whether they received treatment (n = 90) or placebo (n = 45). Analyses were conducted using SAS Version 9.3 or higher (SAS Institute, Cary, NC, USA) [17].
Meaningful change on a patient-centered endpoint referred to the smallest difference in scores in the domain of interest (e.g., symptom or functional score) that patients perceived as beneficial. This could then be used to discriminate between treatment groups and to develop a thorough understanding of the HRQoL impact of BM frequency reduction [18, 19].
Change in BM frequency from Baseline was used to develop two individual distribution-based estimates: (1) overall change from Baseline, defined as the difference between average BM frequency during the Run-In Period and average BM frequency during the Double-Blind Treatment Period; and (2) change from Baseline at Week 12, defined as the difference between average BM frequency during the Run-In Period and 7-day average BM frequency at Week 12. Distribution-based thresholds were derived for each estimate independently using the ½ standard deviation (SD) rule [18], under which the threshold is ½ the SD of the respective estimate.
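The ½ SD rule described above can be illustrated with a minimal Python sketch. This is not the study's SAS code; the function names, the use of the sample (n - 1) SD, and the example values are assumptions made for illustration only and are not TELESTAR data.

```python
from statistics import mean, stdev


def change_from_baseline(run_in_daily_bms, period_daily_bms):
    """Average daily BMs in the analysis period minus the Run-in average.

    Negative values indicate fewer BMs per day than at Baseline.
    """
    return mean(period_daily_bms) - mean(run_in_daily_bms)


def half_sd_threshold(change_scores):
    """Distribution-based threshold: half the SD of the change scores.

    Assumption: the sample SD (n - 1 denominator) is used.
    """
    return 0.5 * stdev(change_scores)


# Hypothetical per-patient change scores in BMs/day (illustrative only)
example_changes = [-2.0, -1.0, 0.0, -3.0, -1.5]
example_threshold = half_sd_threshold(example_changes)
```

The same function would be applied separately to the overall change scores and to the Week 12 change scores, giving one threshold per estimate.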
The anchor-based approach to deriving meaningful change thresholds consisted of mapping change from Baseline in BM frequency to other patient-reported assessments of change. Before inclusion in the anchor-based analysis, the relationship between BM frequency and each continuous patient-reported outcome (PRO) anchor was evaluated using correlational analyses. The criterion for determining that an anchor was correlated with the outcome was a correlation coefficient > 0.30 at Baseline, at Week 12, or for change from Baseline [18]. Anchor-based thresholds were developed by calculating the mean change and the associated effect size (ES) statistic for each anchor. The ES was calculated as the difference between average BM frequency over 12 weeks and average Baseline BM frequency, divided by the SD of average Baseline BM frequency. A negative ES represented a reduction in BM frequency compared to Baseline [19, 20]. In an additional analysis, ES was calculated as the mean of average BM frequency at Week 12 minus the average Baseline BM frequency, divided by the SD of the group average Baseline BM frequency. For both analyses, a single value (or range of values for interpreting change in BM frequency) was developed for the full ITT population by selecting the mean improvement for each analytic group. These thresholds could be applied to stratify patients within each treatment arm by mean change in BM frequency relative to the identified threshold.
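As a rough illustration of the anchor screening step and the ES calculation, the following Python sketch may help. It is not the authors' implementation: the function names are hypothetical, the Pearson coefficient is an assumed choice of correlation measure, and comparing the absolute value of the coefficient against 0.30 is an assumption (the text states only "> 0.30").

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def anchor_qualifies(bm_values, anchor_values, cutoff=0.30):
    """Screen an anchor: keep it if |r| exceeds the cutoff.

    Assumption: the magnitude of r is compared, so that anchors scored
    in the opposite direction to BM frequency are not excluded.
    """
    return abs(pearson_r(bm_values, anchor_values)) > cutoff


def effect_size(baseline_bm, followup_bm):
    """ES = (mean follow-up BM - mean Baseline BM) / SD of Baseline BM.

    A negative ES indicates a reduction in BM frequency from Baseline.
    """
    return (mean(followup_bm) - mean(baseline_bm)) / stdev(baseline_bm)


# Hypothetical per-patient average BMs/day within one anchor group
example_es = effect_size([6, 4, 8, 6], [4, 3, 6, 5])
```

Passing the 12-week averages or the Week 12 averages as `followup_bm` corresponds to the two ES analyses described in the text.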
Responsiveness refers to the ability of an assessment to detect change where it exists [18]. To assess responsiveness, patients were classified as improved or worsened based on meaningful change in prespecified categorical endpoints. An absolute improvement of 10 points from Day 1 to Week 12 defined improvement in each of the EORTC domains [20]. The within-group level of change in individual scores was expressed as a standardized effect size (SES), calculated as the mean change score between Baseline and Week 12 divided by the SD of the pooled population at Baseline. Based on Cohen’s recommendations, the following values represent the magnitudes of responsiveness: small change (SES = 0.20), moderate change (SES = 0.50), and large change (SES = 0.80) [21]. Between-group differences in change scores were tested for statistical significance using an analysis of covariance (ANCOVA) model adjusted for age, sex, and race.
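The SES calculation and Cohen's benchmarks can be sketched as follows. This is an illustrative example rather than the study's code: the function names and data are hypothetical, and treating 0.20, 0.50, and 0.80 as lower bounds of the small, moderate, and large bands (with "negligible" below 0.20) is an assumed convention.

```python
from statistics import mean, stdev


def standardized_effect_size(group_baseline, group_week12, pooled_baseline):
    """SES = mean within-group change (Week 12 minus Baseline)
    divided by the SD of the pooled population at Baseline."""
    changes = [w - b for b, w in zip(group_baseline, group_week12)]
    return mean(changes) / stdev(pooled_baseline)


def cohen_label(ses):
    """Label the magnitude of an SES against Cohen's benchmarks.

    Assumption: each benchmark is treated as the lower bound of its band.
    """
    size = abs(ses)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.20:
        return "small"
    return "negligible"


# Hypothetical domain scores (0-100 scale); not TELESTAR data
example_ses = standardized_effect_size(
    group_baseline=[50, 60, 70],
    group_week12=[60, 70, 80],
    pooled_baseline=[40, 50, 60, 70, 80],
)
example_label = cohen_label(example_ses)
```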
To demonstrate the application of MCT in responsiveness evaluation, unblinded cumulative distribution function (CDF) curves are presented, calculated as the cumulative percentage of patients achieving various threshold levels of change from Baseline in either the overall average daily BMs or the average BMs during Week 12 by treatment arm.
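The points on such a CDF curve could be computed along the following lines. This is a minimal sketch with hypothetical names and data; counting a patient as achieving a threshold when the change from Baseline is less than or equal to it (that is, a reduction at least that large) is an assumption about how the curves were constructed.

```python
def cumulative_percent(changes, thresholds):
    """Percentage of patients whose change from Baseline is <= each threshold.

    Changes are in BMs/day; negative values are reductions.
    """
    n = len(changes)
    return [100.0 * sum(c <= t for c in changes) / n for t in thresholds]


def cdf_by_arm(changes_by_arm, thresholds):
    """Compute the cumulative percentages separately for each treatment arm."""
    return {
        arm: cumulative_percent(changes, thresholds)
        for arm, changes in changes_by_arm.items()
    }


# Hypothetical change scores per arm (illustrative only)
example_curves = cdf_by_arm(
    {"placebo": [-1, 0, 0, 1], "active": [-3, -2, -1, 0]},
    thresholds=[-2, -1, 0],
)
```

Plotting each arm's percentages against the thresholds gives one curve per arm; separation between the curves at a candidate threshold shows how the arms differ in the proportion of patients reaching that level of change.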