Table 2 Description and criteria for performed psychometric analyses

From: Psychometric validation and testing of the 10-item pediatric daily chest-related electronic patient reported outcome (ePRO) diary

Property

Description

Criteria for consideration

Quality of completion

Evaluated frequency and percentage of item-level missing data per participant; number and percentage of participants with at least one missing completion; number and percentage of participants with at least one missing item per completion; number and percentage of participants with no missing data; number and percentage of missing data for each item at each morning and afternoon completion

Issues with completion/time points provide information regarding feasibility of completing an ePRO diary by children twice daily without caregiver assistance
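The participant-level completion counts described above can be tallied in a few lines. The following is a minimal sketch only: the data layout (a dictionary of participants, with `None` marking a missing item or a missed completion) and the function name `completion_summary` are assumptions for illustration, not the study's actual data format.

```python
def completion_summary(diaries):
    """Summarise missing data per participant.

    diaries: {participant_id: [completion, ...]} (hypothetical layout).
    Each completion is a list of item responses (None = missing item),
    or None if the whole completion was missed.
    Returns {category: (count, percentage of participants)}.
    """
    n = len(diaries)
    counts = {"missing_completion": 0, "missing_item": 0, "no_missing": 0}
    for completions in diaries.values():
        # At least one diary completion skipped entirely
        missed_completion = any(c is None for c in completions)
        # At least one item left blank within a completion that was started
        missed_item = any(c is not None and None in c for c in completions)
        counts["missing_completion"] += missed_completion
        counts["missing_item"] += missed_item
        counts["no_missing"] += not (missed_completion or missed_item)
    return {key: (value, 100.0 * value / n) for key, value in counts.items()}


# Illustrative data: four participants, two completions each, two items
example = {
    "p1": [[1, 2], [0, 1]],
    "p2": [[1, None], [2, 2]],
    "p3": [None, [1, 1]],
    "p4": [[0, 0], [0, 0]],
}
summary = completion_summary(example)
```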

Item response distributions and item performance

Item response distributions were examined, and percentages of minimum and maximum responses were calculated to examine floor and ceiling effects for all of the chest ePRO items. Mean scores for each individual chest-related ePRO diary item were evaluated for each time point collected (morning and evening, Day 1-Day 10) and between age groups

Inter-item correlations were examined to gauge the strength of the relationships among items and the appropriateness of scoring them together

For inter-item correlations, moderate to high correlations (0.40–0.80) were expected, and item pairs with very high correlations (Pearson r > 0.80) were flagged for further consideration
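As an illustration of the floor/ceiling percentages and the r > 0.80 flagging rule, the sketch below computes both from raw item responses. The item names, the 0-4 response range, and the function names are hypothetical and chosen only for the example.

```python
from math import sqrt


def floor_ceiling(responses, minimum, maximum):
    """Return (% at the minimum response, % at the maximum response)."""
    n = len(responses)
    at_floor = sum(r == minimum for r in responses)
    at_ceiling = sum(r == maximum for r in responses)
    return (100 * at_floor / n, 100 * at_ceiling / n)


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_high_correlations(items, threshold=0.80):
    """List item pairs whose correlation exceeds the threshold."""
    names = list(items)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(items[first], items[second])
            if r > threshold:
                flagged.append((first, second, round(r, 2)))
    return flagged


# Hypothetical items scored 0-4 for five participants
items = {
    "cough": [0, 1, 2, 3, 4],
    "chest_pain": [0, 1, 2, 3, 4],
    "mucus": [2, 0, 3, 1, 2],
}
flagged = flag_high_correlations(items)
```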

Differential item functioning (DIF)

DIF analyses were conducted using ordinal logistic regression models, to investigate whether the different chest-related ePRO diary items performed in the same manner across age groups

If the coefficient for age group was significant (p < 0.01), this was considered evidence of uniform DIF. If the interaction coefficient between age group and CGI-S was significant (p < 0.01), this was considered evidence of non-uniform DIF

Exploratory factor analysis (EFA)

EFA was conducted to explore the potential item-scale structure for the chest-related ePRO diary before item deletions. EFA was used to explore factor solutions separately on both morning and afternoon items, in the total sample and also within age subgroups, comparing results between age groups to inform any differences in item-scale groupings in different age populations

Eigenvalues (and accompanying scree plots) were produced to assess the suitable number of dimensions, and factor loadings (> 0.40) for each item were examined

Modification indices (MIs) were also examined to assess the extent to which items and/or domains were co-dependent and thus potentially redundant. Items and/or domains with MIs greater than or equal to 15 were considered further [15]

Model fit was assessed using Bentler’s CFI, RMSEA and SRMR. According to the a priori specification, model fit was considered ‘good’ if CFI > 0.95, RMSEA < 0.08 and SRMR < 0.08 [16]
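The fit and modification-index criteria above amount to simple threshold checks, sketched below. The function names and the example values are hypothetical; the fit indices themselves would come from whatever structural equation modelling software fitted the model.

```python
def model_fit_is_good(cfi, rmsea, srmr):
    """Apply the a priori criteria: CFI > 0.95, RMSEA < 0.08, SRMR < 0.08."""
    return cfi > 0.95 and rmsea < 0.08 and srmr < 0.08


def modification_indices_to_review(modification_indices, cutoff=15.0):
    """Return the item pairs whose MI is at or above the cutoff."""
    return sorted(
        pair for pair, mi in modification_indices.items() if mi >= cutoff
    )


# Hypothetical output from a fitted model
good = model_fit_is_good(cfi=0.97, rmsea=0.05, srmr=0.04)
poor = model_fit_is_good(cfi=0.97, rmsea=0.09, srmr=0.04)
to_review = modification_indices_to_review(
    {("item1", "item2"): 22.4, ("item3", "item4"): 15.0, ("item5", "item6"): 3.1}
)
```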

Item response theory (IRT)

IRT was conducted to provide insight into the appropriateness of the response scale and the adequacy of item discrimination using the Graded Response Model (GRM)

Item information curves (IICs) were examined graphically to evaluate how reliably the individual items and the measure as a whole estimated the construct over the entire scale range

Using the GRM, the S-X² fit statistic was examined to assess the differences between observed and expected response proportions for each test score value, with statistically significant values (p < 0.01) indicating items with potential misfit. Chi-square statistics were used to evaluate local dependence (values > 10 indicating likely local dependence)
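To make the GRM and the item information curves concrete, the sketch below evaluates category probabilities and item information for a single item under the standard logistic form of Samejima's model. The discrimination and threshold values are invented for illustration; in practice they would be estimated from the diary data by IRT software.

```python
from math import exp


def _cumulative(theta, a, thresholds):
    """P(X >= k | theta) for k = 0..K, padded with 1.0 and 0.0 at the ends."""
    inner = [1.0 / (1.0 + exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def grm_category_probs(theta, a, thresholds):
    """Probability of each response category at trait level theta."""
    cum = _cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def grm_item_information(theta, a, thresholds):
    """Item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = _cumulative(theta, a, thresholds)
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]
        # Derivative of the category probability with respect to theta
        dp = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        info += dp * dp / p
    return info


# Hypothetical 5-category item: discrimination 1.8, ordered thresholds
probs = grm_category_probs(0.0, 1.8, [-1.5, -0.5, 0.5, 1.5])
# Points on an item information curve across the trait range
curve = [(t / 2, grm_item_information(t / 2, 1.8, [-1.5, -0.5, 0.5, 1.5]))
         for t in range(-6, 7)]
```

Plotting the `curve` values against theta gives an item information curve; summing the curves over items gives the information for the measure as a whole.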

Internal consistency reliability

Internal consistency reliability, which concerns the homogeneity of items belonging to the same domain, was evaluated in all age subgroups as well as in the total sample

The alpha-if-item-deleted method was also used to assess whether the internal consistency of the summary score would improve with the removal of each item in turn

Internal consistency reliability was evaluated using Cronbach’s alpha coefficient (> 0.70 for good internal consistency) [17]
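Cronbach's alpha and the alpha-if-item-deleted check can be sketched directly from the usual formula, alpha = k/(k-1) × (1 − sum of item variances / variance of the total score). The data below are made up for illustration, and the function names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of item columns (one score per participant)."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]  # summary score per participant
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item removed in turn."""
    return [
        cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))
    ]


# Made-up data: three consistent items and one that does not track the others
items = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [4, 1, 3, 2],
]
alpha = cronbach_alpha(items)
alpha_without = alpha_if_item_deleted(items)
```

In this toy example alpha falls below 0.70 with all four items and rises when the fourth item is dropped, which is the pattern the alpha-if-item-deleted method is designed to reveal.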

Test-retest reliability (TRT)

TRT was evaluated by examining the stability of scores between two daily consecutive assessments. Given the acute nature of the common cold, participants were defined as stable if they had ‘no change’ in their overall cold severity based on their CGI-S scores between Visit 1 and Day 1 and between Visit 1 and Day 2

Intraclass correlation coefficients (ICCs) were calculated and an ICC of 0.70 or greater for the stable group was considered evidence of good TRT [18]
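The table does not state which ICC form was used. As an assumption for illustration only, the sketch below computes a two-way random-effects, absolute-agreement, single-measure ICC, often written ICC(2,1), for stable participants scored on two occasions.

```python
def icc_2_1(scores):
    """ICC(2,1) from a list of per-participant score lists (one per occasion).

    Assumed form: two-way random effects, absolute agreement, single measure.
    """
    n = len(scores)       # participants
    k = len(scores[0])    # occasions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Made-up scores for four stable participants on two consecutive days
identical = icc_2_1([[1, 1], [2, 2], [3, 3], [4, 4]])
shifted = icc_2_1([[1, 2], [2, 3], [3, 4], [4, 5]])
```

Because this form measures absolute agreement, a constant one-point shift between occasions lowers the ICC even though the rank order of participants is unchanged.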

Known groups validity

Known groups validity was assessed by comparing differences in the chest-related ePRO diary scores among participants who differed on health/disease related variables based on the CGI-S and CCSQ measures

A mixed model repeated measures analysis was used to examine the relationship between the CGI-S and the total chest-related ePRO diary summary scores using all ten days of diary data available, with the CGI-S treated as a continuous predictive variable and the chest ePRO as the outcome variable. A sensitivity analysis was also performed in which the CGI-S was treated as a categorical variable. In addition to the above analyses where known groups were defined using the CGI-S, groups were additionally defined using the CCSQ summary score made up of the mean of all CCSQ items. For each day that the CCSQ was collected (Days 2, 5 and 8), a mixed model repeated measures analysis was used to examine the relationship between the CCSQ and the total chest-related ePRO diary scores. The CCSQ was used as the predictive variable and treated as a continuous variable with the chest-related ePRO diary score as the outcome variable

T-tests were used for comparisons of pairs of groups to evaluate statistically significant differences (p < 0.05) in chest-related ePRO diary scores between the subgroups

Concurrent validity

Concurrent validity was evaluated by examining the correlation between the scores of the chest-related ePRO diary and the CCSQ on Days 2, 5 and 8

Domains assessing similar or related concepts were expected to correlate at 0.40 or higher

Ability to detect change

The analysis described and compared changes in the chest-related ePRO diary scores between participants considered ‘improved’, ‘no change’ or ‘worsened’ as assessed by ratings on the CGI-C and CGI-S, to demonstrate that any observed changes in the chest-related ePRO diary scores corresponded with changes in external criteria

The mean changes in the symptom scores were computed between the change groups using paired t-tests, as the ‘no change’ and ‘worsened’ groups were collapsed due to low sample size

Effect sizes (ES), standardised response means (SRM) and Guyatt’s statistic were calculated to evaluate the magnitude of changes over time in each group. ES were interpreted in line with Cohen’s guidance (0.20: small changes, 0.50: moderate changes, and 0.80: large changes) [19]
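The table names the three responsiveness statistics without giving formulas. The sketch below uses their conventional definitions, which are assumed here: ES is the mean change divided by the baseline standard deviation, SRM is the mean change divided by the standard deviation of change, and Guyatt's statistic is the mean change divided by the standard deviation of change in the stable group. The data and the "negligible" label for values below 0.20 are illustrative additions.

```python
from statistics import mean, stdev


def _changes(baseline, follow_up):
    return [after - before for before, after in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of baseline scores."""
    return mean(_changes(baseline, follow_up)) / stdev(baseline)


def standardised_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    change = _changes(baseline, follow_up)
    return mean(change) / stdev(change)


def guyatt_statistic(baseline, follow_up, stable_baseline, stable_follow_up):
    """Mean change divided by the SD of change in the stable group."""
    stable_change = _changes(stable_baseline, stable_follow_up)
    return mean(_changes(baseline, follow_up)) / stdev(stable_change)


def cohen_label(es):
    """Interpret the magnitude of an effect size using Cohen's thresholds."""
    magnitude = abs(es)
    if magnitude >= 0.80:
        return "large"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.20:
        return "small"
    return "negligible"  # label for values under 0.20 is an assumption


# Made-up scores: an 'improved' group and a stable group
es = effect_size([4, 5, 6, 7], [2, 4, 3, 5])
srm = standardised_response_mean([4, 5, 6, 7], [2, 4, 3, 5])
guyatt = guyatt_statistic([4, 5, 6, 7], [2, 4, 3, 5], [3, 4, 5, 6], [3, 5, 4, 6])
```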