Cultural adaptation of existing items selected from PROMIS® item banks
After obtaining permission from the PROMIS® Health Organization and in consultation with an advisory committee (AC) of field experts, a total of 40 existing items were drawn from PROMIS® item banks as candidates for the PROMIS®-Deaf Profile. Because existing PROMIS® items were written for and validated in the US general population, we followed a specific cultural adaptation process to ensure that these items were relevant to and could be answered by DHH individuals. The AC first reviewed the items and identified those that required adaptation for DHH individuals. A core cultural adaptation leadership team then replaced specific language or colloquial wording (e.g., “listen” was changed to “pay attention”) and presented all revised items to the full AC for review and approval prior to linguistic translation.
Development of new communication health and early life communication experience items
Using a grounded theory approach to item generation [11], communication health items were developed for the DHH population to assess perceived quality of life specific to being DHH. We used transcripts from in-depth interviews conducted in a previous qualitative study [10] to draft 84 items and then selected additional items from the existing literature to generate an item pool. Each item was evaluated for inclusion against two criteria: 1) the item should measure the domain construct of “Communication Health,” and 2) the item should be relevant to the DHH experience, regardless of hearing level or other DHH-specific characteristics. Nineteen items were final candidates for the Communication Health domain of the PROMIS®-Deaf Profile. With permission, four items were modified from the Youth Quality of Life-DHH Module [11] to allow for retrospective reporting of early life communication experiences (ELCE) by DHH adults. The significance of early life communication experiences is highlighted in several studies that have reported their connection with quality-of-life outcomes in DHH adults [12,13,14].
Linguistic validation of PROMIS®-Deaf Profile items in ASL
The ASL-English bilingual translation team consisted of two forward translation consultants and a backward translation consultant. All were bilingual in ASL and English, and all had previous experience translating test items. Near-final PROMIS®-Deaf Profile items in ASL were then carefully evaluated by the project director to ensure their conceptual equivalence to the original English items. The ASL version of the PROMIS®-Deaf Profile was digitally recorded by a native ASL signer with over 10 years of experience acting and teaching ASL linguistics. In addition, two deaf research team members with clinical psychology and assessment backgrounds coached the ASL signer during videotaping to ensure the highest linguistic fidelity to item content.
Using methods from the US National Center for Health Statistics Cognitive Survey Laboratory [15, 16], cognitive debriefing sessions were used to assess 1) whether DHH participants found the signed items easy to complete; 2) whether they found any problems with the signed translations; and 3) what their overall reaction was to the ASL version of the PROMIS®-Deaf Profile. We also assessed comprehension by observing consistency in responses to the questions. Cognitive interviews were conducted in two waves of face-to-face sessions, with four to five ASL signers per wave; most participants had a high school education or less. In all cognitive interview sessions, “no English” versions of the items (i.e., ASL only, without any supporting English text) were shown, so as to focus on the clarity of the ASL content delivery. If a participant had difficulty understanding any signing, the interview team tested alternative ASL translations or asked the participant to propose ideas to improve the clarity of the items, including replacing signs or phrases. These participant-engagement procedures helped improve the face validity of the items in ASL.
Psychometric testing
Participants
After the Gallaudet University Institutional Review Board approved the study protocol, we recruited participants from the DHH community across the USA, including Hawaii and Alaska. Multiple recruitment methods were employed: snowball sampling through personal networks, distributing flyers, and advertising on DHH-centered organization websites and e-newsletters. A total of 1717 individuals who signed up met the eligibility criteria: early deafness (born DHH or became DHH before 13 years of age), bilateral hearing loss, and use of ASL in daily communication. We then enrolled those who provided consent (N = 1612). Of these, 258 consented participants did not complete the demographics and survey items. Thus, the final psychometric sample included 1354 adult participants.
Study measures
Existing PROMIS® domains and new PROMIS®-Deaf Profile domains
Existing domains in the PROMIS®-Deaf Profile included Anger (2 items), Anxiety (4 items), Depression (4 items), Fatigue (10 items), Global Health (10 items), Social Isolation (2 items), and Social Support (6 items). Two new domains were also added to the PROMIS®-Deaf Profile: Communication Health (19 items) and Early Life Communication Experiences (ELCE; 4 items).
Demographic and clinical variables
DHH study participants were asked to answer questions about their demographic background and medical conditions. Medical conditions were used to assign clinical and non-clinical status in order to assess the clinical sensitivity of the PROMIS®-Deaf Profile. Clinical status was assigned if the respondent confirmed one or more medical conditions and reported symptom severity associated with the medical condition(s).
Measure administration and data collection
The PROMIS®-Deaf Profile measures were configured as fixed-length assessments and administered via the study’s secure website between April 2016 and April 2018. Study participants accessed this protected website using research-study, personal, or publicly available computers with internet access. Participants were typically able to complete their assigned measures within 7 days of initial assignment.
“ASL-only” survey administration
For the web-based survey application, supporting English text was hidden so that all ASL-Only survey participants viewed each item and its response options only in ASL before providing their responses.
“ASL + English” survey administration
Respondents in this subsample could see both the ASL video and supporting English text. The ASL + English survey was employed specifically to investigate each measure’s potential for bias when supporting English text was included along with each ASL video. The inclusion criteria and recruitment procedures for the ASL + English survey administration were identical to those described for the ASL-Only survey administration.
Psychometric analyses
We employed extensive psychometric analyses, following those recommended in the PROMIS® guidelines and standards for measure development and evaluation [17]; these analyses included confirmation of the unidimensionality assumption for existing PROMIS® measures and new DHH-specific measures, as well as support for unidimensional item response theory (IRT) model fitting and item calibration. We determined our study’s required sample size on the basis of it being large enough to conduct (a) robust unidimensional confirmatory factor analysis (CFA), (b) two-levels-per-factor differential item functioning (DIF) studies, and, where required, (c) accurate estimation of new IRT item parameters, from which individual participant scores could be estimated.
First, for the new DHH-specific domain, Communication Health, we conducted exploratory factor analysis (EFA) to better understand its dimensionality and thus have evidence at hand to propose unidimensional factors, and their associated items, to undergo further measure development and evaluation. Then, for all PROMIS-Deaf Profile measures, existing and new, we obtained classical test theory (CTT)-based assessments of item and measure performance, including estimates of item-adjusted total score correlations and analyses of potential total score floor and ceiling effects. Next, for each PROMIS-Deaf Profile measure that was four or more items in length, we fit item response data to a unidimensional CFA model to evaluate item and overall model fit. Finally, for each new PROMIS-Deaf Profile measure that was four or more items in length, we fit item response data to an IRT-based graded response model (GRM) to (a) evaluate item and overall GRM model fit and (b) calibrate the items. Further supporting analyses included reliability studies (internal consistency and test-retest), bias studies (differential item functioning by item presentation method, age, education, and gender), and validity studies (concurrent and known-groups).
CTT-based assessments
To evaluate item performance, we identified items (a) with sparse response cells (i.e., items having one or more response categories used by < 10 respondents) and (b) whose item-adjusted total score correlation (i.e., the item’s correlation with its source measure’s total score, computed with the item itself excluded from the total) was < 0.40. To evaluate measure performance, we (a) studied measure floor and ceiling effects (i.e., whether ≥10% of respondents had the minimum or maximum score possible per measure) and (b) examined measure score distributions for skewness, excess kurtosis, and the extent to which the distributions could be considered normal. Finally, we calculated descriptive statistics (i.e., mean, median, standard deviation, minimum, and maximum score) for each measure. We conducted CTT-based analyses in IBM SPSS (version 25) [18].
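As an illustration, the item-adjusted total score correlation and the floor/ceiling check could be computed as sketched below. This is a hedged Python/NumPy sketch, not the study's actual procedure (the study conducted these analyses in IBM SPSS), and the function names are hypothetical.

```python
import numpy as np

def corrected_item_total(responses):
    """Item-adjusted (corrected) item-total correlations.

    responses: 2-D array, rows = respondents, columns = items.
    Each item is correlated with the total score computed from the
    remaining items, i.e., the item itself is excluded from the total.
    """
    X = np.asarray(responses, dtype=float)
    total = X.sum(axis=1)
    return np.array([
        np.corrcoef(X[:, j], total - X[:, j])[0, 1]
        for j in range(X.shape[1])
    ])

def floor_ceiling(scores, min_possible, max_possible):
    """Proportions of respondents at the measure's floor and ceiling
    (the study's concern threshold was >= 10% at either extreme)."""
    s = np.asarray(scores, dtype=float)
    return np.mean(s == min_possible), np.mean(s == max_possible)
```

Items with a corrected item-total correlation below 0.40, or measures with 10% or more of respondents at either score extreme, would then be flagged for review.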
CFA-based assessments
We assessed each PROMIS-Deaf Profile measure that was four or more items in length (i.e., Anxiety, Depression, Fatigue, Global Health-Mental, Global Health-Physical, Social Support, Communication Health, and ELCE) for essential unidimensionality. To evaluate item performance, we identified items (a) with low factor loadings (i.e., < 0.50) or (b) with evidence suggesting local dependence (i.e., a residual correlation > 0.20 or a correlated error modification index ≥100). To evaluate measure performance, we estimated a single-factor CFA model per measure to obtain evidence supportive of measure essential unidimensionality. We assessed overall model fit to each measure’s item response data using the following recommended fit criteria: Comparative Fit Index (CFI) ≥ 0.95, Tucker-Lewis index (TLI) ≥ 0.95, root mean square error of approximation (RMSEA) < 0.10, and standardized root mean residual (SRMR) < 0.08 [19,20,21,22,23,24]. We conducted CFA-based analyses in Mplus (version 7.4).
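For reference, the chi-square-based fit indices above have the following standard definitions, where $\chi^2_M$ and $df_M$ are the fitted model's chi-square and degrees of freedom, $\chi^2_0$ and $df_0$ are those of the baseline (independence) model, and $N$ is the sample size (SRMR, a standardized summary of the residual correlations, is omitted here for brevity):

```latex
\mathrm{CFI} = 1 - \frac{\max(\chi^2_M - df_M,\ 0)}
                        {\max(\chi^2_0 - df_0,\ \chi^2_M - df_M,\ 0)}
\qquad
\mathrm{TLI} = \frac{\chi^2_0/df_0 - \chi^2_M/df_M}{\chi^2_0/df_0 - 1}
\qquad
\mathrm{RMSEA} = \sqrt{\frac{\max(\chi^2_M - df_M,\ 0)}{df_M\,(N - 1)}}
```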
Measure reliability studies
We estimated two types of reliability for all PROMIS-Deaf Profile measures: internal consistency reliability and test-retest reliability. For internal consistency reliability, we estimated Cronbach’s alpha and reported its raw value for each measure. For test-retest reliability, we estimated the intra-class correlation (ICC) coefficient using a subset of n = 100 ASL-Only Subsample study participants who had completed a second set of assigned measures within seven to 10 days of completing their first assigned measures. For all PROMIS-Deaf Profile measures, we obtained ICC values based on (a) systematic error plus random error and (b) random error only. For both internal consistency and test-retest reliability estimates, we used reliability ≥0.70 as the criterion for using a measure to make group-level comparisons and reliability ≥0.90 as the criterion for using a measure to make individual-level comparisons. We conducted reliability analyses in IBM SPSS (version 25) [18].
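The two reliability estimates above can be sketched as follows. This is an illustrative Python/NumPy sketch (the study used IBM SPSS); it assumes the two ICC variants correspond to the conventional two-way Shrout-Fleiss forms, ICC(2,1) for systematic plus random error and ICC(3,1) for random error only.

```python
import numpy as np

def cronbach_alpha(responses):
    """Cronbach's alpha: (k/(k-1)) * (1 - sum of item variances / variance of total)."""
    X = np.asarray(responses, dtype=float)
    k = X.shape[1]
    return (k / (k - 1)) * (1 - X.var(axis=0, ddof=1).sum() / X.sum(axis=1).var(ddof=1))

def icc_test_retest(scores):
    """Two-way ICCs for an n-subjects x k-occasions score matrix.

    Returns (agreement, consistency): ICC(2,1), which is sensitive to
    systematic plus random error, and ICC(3,1), which is sensitive to
    random error only.
    """
    X = np.asarray(scores, dtype=float)
    n, k = X.shape
    grand = X.mean()
    ssr = k * ((X.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ssc = n * ((X.mean(axis=0) - grand) ** 2).sum()   # between occasions
    sst = ((X - grand) ** 2).sum()
    msr = ssr / (n - 1)
    msc = ssc / (k - 1)
    mse = (sst - ssr - ssc) / ((n - 1) * (k - 1))     # residual
    agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    consistency = (msr - mse) / (msr + (k - 1) * mse)
    return agreement, consistency
```

A uniform shift between test and retest (e.g., every score one point higher at retest) leaves consistency at 1.0 but lowers agreement, which is why both variants are reported.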
DIF studies
We assessed each PROMIS-Deaf Profile measure that was four or more items in length (i.e., Anxiety, Depression, Fatigue, Global Health-Mental, Global Health-Physical, Social Support, Communication Health, and ELCE) for differential item functioning (DIF), a potentially biasing item performance effect. Items exhibiting DIF measure the investigated population subgroups differently (i.e., unequally). For example, males at measured-trait level Y might consistently receive higher Item X scores than females who are at the same measured-trait level Y and who respond to the same Item X. In Phase One of our DIF studies, to identify items potentially exhibiting DIF, we employed a hybrid IRT ability score and ordinal logistic regression framework, as implemented in the R package “lordif” (version 0.3-3) [25]. For these analyses we used a Nagelkerke pseudo-R2 change ≥0.02 as our criterion for flagging items with potential DIF. In Phase Two, to determine whether flagged DIF items meaningfully impacted measure scores across population subgroups, our criterion for DIF impact was that more than 2% of the differences between DIF-corrected and DIF-uncorrected scores exceeded the uncorrected scores’ standard errors. We investigated the following factors for potential DIF: item presentation method, age, education, and gender. We conducted DIF analyses in R (version 3.23) [26].
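The Phase One flagging criterion compares nested ordinal logistic regression models (trait only vs. trait plus group terms) on their Nagelkerke pseudo-R2. A minimal Python sketch of that criterion is given below; it takes model log-likelihoods as inputs rather than fitting the models, and the function names are hypothetical (the study itself used the R package lordif).

```python
import math

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke pseudo-R2: the Cox-Snell R2 rescaled so its maximum is 1.

    ll_null: log-likelihood of the intercept-only model;
    ll_model: log-likelihood of the fitted model; n: sample size.
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell

def flag_dif(ll_null, ll_trait_only, ll_trait_plus_group, n, threshold=0.02):
    """Flag an item when adding group (subgroup membership) terms to the
    trait-only model raises the Nagelkerke pseudo-R2 by >= threshold."""
    gain = (nagelkerke_r2(ll_null, ll_trait_plus_group, n)
            - nagelkerke_r2(ll_null, ll_trait_only, n))
    return gain >= threshold
```

An item is flagged only when subgroup membership explains item responses beyond what the measured trait already explains, which is exactly the DIF concern described above.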
Preliminary validation of PROMIS-Deaf Profile measures
To establish initial evidence supporting the validity of the PROMIS-Deaf Profile measures, we obtained estimates of convergent and discriminant validity and known-groups validity.
Convergent and discriminant validity
To evaluate the convergent and discriminant validity of the PROMIS-Deaf Profile measures, we estimated Pearson’s r and Spearman’s rho correlations (to account for potentially non-normal measure score distributions) (a) among all PROMIS-Deaf Profile domain-specific measures (i.e., Anger, Anxiety, Depression, Fatigue, Social Isolation, Social Support, Communication Health, ELCE) and (b) between domain-specific vs. general health status measures (i.e., Global Health-Mental, Global Health-Physical). We interpreted the magnitude of the absolute value of a Pearson’s r or Spearman’s rho between measure scores as follows: From 0.60 to 1.00 is a “strong” correlation; from 0.30 to < 0.60 is a “moderate” correlation; and from 0.00 to < 0.30 is a “weak” correlation [27].
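The rank-based correlation and the interpretation cut-points above can be sketched as follows; this is an illustrative Python/NumPy sketch with hypothetical function names, not the study's actual code.

```python
import numpy as np

def rankdata(v):
    """Ranks 1..n, with average ranks assigned to tied values."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v)
    ranks = np.empty(len(v))
    ranks[order] = np.arange(1, len(v) + 1)
    for val in np.unique(v):          # average the ranks within each tie group
        tie = v == val
        ranks[tie] = ranks[tie].mean()
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y,
    making it robust to non-normal (but monotonic) score relationships."""
    return np.corrcoef(rankdata(x), rankdata(y))[0, 1]

def interpret(r):
    """Label the absolute correlation using the study's cut-points."""
    a = abs(r)
    return "strong" if a >= 0.60 else "moderate" if a >= 0.30 else "weak"
```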
Known-groups validity
To evaluate known-groups validity of the PROMIS-Deaf Profile measures, we divided our study participants into approximate quartiles, based first on their Global Health-Mental scores and based second on their Global Health-Physical scores. Via analysis of variance (ANOVA), we then compared the PROMIS-Deaf Profile domain-specific measure scores (i.e., Anger, Anxiety, Depression, Fatigue, Social Isolation, Social Support, Communication Health, ELCE) of participants with the lowest quartile Global Health-Mental scores (i.e., raw scores from 4 to 12) vs. those with the highest quartile Global Health-Mental scores (i.e., raw scores from 16 to 20). We conducted a similar set of ANOVA-based PROMIS-Deaf Profile measure score comparisons between participants with the lowest quartile Global Health-Physical scores (raw scores 4 to 13) vs. those with the highest quartile Global Health-Physical scores (raw scores 17 to 20). For all of these comparisons, we hypothesized that study participants with high quartile Global Health status (i.e., better overall mental or physical health) would have, on average, better PROMIS-Deaf Profile domain-specific health status than those with low quartile Global Health status (i.e., worse overall mental or physical health).
Finally, we compared all PROMIS-Deaf Profile measure scores of our study participants by select demographic (i.e., age, gender, level of education, race/ethnicity, preferred language) and clinical variables (current hearing level, functional hearing status, when hearing loss occurred, whether hearing loss was clinical/non-clinical, and whether the family background includes DHH members). For all ANOVAs, we used an F test p value < 0.05 to indicate statistically significant group differences, and we calculated Cohen’s d values to report group mean difference-based effect sizes. We conducted these ANOVA validity analyses in IBM SPSS (version 25) [18].
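The effect size used throughout these comparisons, Cohen's d, is the standardized mean difference between two groups. A minimal Python sketch (illustrative only; the study computed these values in IBM SPSS):

```python
import math

def cohens_d(group1, group2):
    """Cohen's d: difference in group means divided by the pooled
    standard deviation (pooled across both groups' sample variances)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd
```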
Sample size requirements
We determined our study’s required sample size on the basis of it being large enough to conduct (a) robust unidimensional CFA analyses, (b) two-levels-per-factor DIF studies, and, as required, (c) accurate estimation of new IRT item parameters, from which individual participant scores could be estimated. For CFA analyses and GRM-based IRT parameter estimation, sample size recommendations range from N = 200 to 1000 (ref) or a minimum of 5 to 10 individuals per item. For DIF studies employing an IRT-based ability score estimate, a minimum sample size of N = 500, with a minimum of n = 200 participants per factor level within each DIF factor tested, is required [28, 29]. Therefore, we required a minimum n = 500 ASL-Only Subsample study participants and a minimum n = 200 ASL + English Subsample study participants to appropriately conduct our planned item presentation method DIF analyses. If this initial DIF study (i.e., ASL-Only vs. ASL + English) were to indicate meaningful item presentation method DIF impact, we would need to be prepared to proceed with PROMIS-Deaf Profile measure development using distinct ASL-Only- and ASL + English-specific samples. For our additional DIF studies, we anticipated our obtained sample would allow us to compare study participants by age (≤40 years vs. > 40 years), education (≤ high school degree vs. > high school degree), and gender (male vs. female).