Sample and procedure
Children aged 8 and older and adults (aged 18 and over) who were able to complete an online questionnaire were eligible to participate. This study recruited DMD participants via their parents/caregivers, who had participated in an earlier study of DMD caregivers (see  for details). Comparison-group participants were recruited via IPSOS-Insight, a market research company (www.ipsos.com). The comparison-group sampling frame was constructed to be nationally representative of the 2020 general United States (US) population aged 8–45 and balanced in age, race/ethnicity, gender, and US region.
A web-based survey was administered from October through December 2020 using Alchemer, a HIPAA-compliant, secure web-based survey platform (www.alchemer.com). Recruitment was stratified by age group: 8–12, 13–17, and ≥ 18, reflecting the above-described DMD disability trajectory [5, 13,14,15]. Although DMD can progress at varying rates, these age strata reflected the common phases of DMD progression: transitional-to-nonambulatory phase (up to age 12) and non-ambulatory phase (age ≥ 13), with increasing dependence and involvement of other systems into adulthood (age ≥ 18). Participants received honoraria to compensate them for their time completing the survey. Those with motor, visual, and/or other problems that made it difficult for them to complete the web-based survey instrument enlisted the assistance of a household member to enter their answers. The protocol was reviewed and approved by the New England Independent Review Board (NEIRB #20,203,038), and all participants provided informed consent before beginning the survey.
Aspirations were measured using qualitative (open-ended) and quantitative (closed-ended) questions. The open-ended questions included: (1) Three Wishes, in which participants were asked, “If you could make three wishes, any three wishes in the whole world, what would they be?”; (2) Goals: “What are the main things you want to accomplish?”; (3) Quality of Life (QOL) Definition: “In a sentence, what does the phrase "Quality of Life" mean to you at this time?” The latter two are part of the QOL Appraisal Profile v1. The Three Wishes question has been used in previous research in 6–12-year-olds comparing people with DMD to a comparison group.
In addition to the open-ended questions, 29 closed-ended goal-delineation items from the QOL Appraisal Profile v2 (QOLAPv2) were included. These rating-scale items queried a broad range of life domains (e.g., living situation, work/school, social relationships, health-related, spiritual, etc.), and asked about specific goals in each life domain. For example, a work/school item asked about whether the individual was concerned about keeping up at work or school. Response options enabled the respondent to indicate how much each goal statement was like them (1 = “not at all like me” through 5 = “very much like me”). Participants were given the option of not responding (Not applicable/Decline/I don’t know), which was coded as missing (−99). These items are often analyzed individually to describe the person’s context [19, 20]. [The interested reader can contact the corresponding author for the QOLAPv2 Goal-Delineation items.]
Participants of all ages answered the Three Wishes open-ended question, whereas only participants aged 18 and older who did not opt for the Alternate survey answered the open-ended goals, QOL-definition questions, and the closed-ended goal-delineation items. This decision was based on the fact that these questions had not been validated in people under age 18.
Demographic characteristics included year of birth, gender, year of diagnosis, and whether anyone in the household was or had been infected with the novel coronavirus-2019 (all participants). Adult participants were asked about race, ethnicity, education, marital status, weight, height, with whom the person lives, employment status, and financial strain (difficulty paying bills).
Accommodating age and disability
Because of the broad age range of study participants and because DMD impacts multiple functional domains, the study design tailored the measures collected by age and/or participant preference. Teen or adult DMD participants were offered the option to choose the simpler child form of the survey if they felt they had trouble reading or concentrating. This “Alternate” survey contained fewer questions than the “Adult” survey. Additional file 1: Table S1 shows the questions administered by survey type.
Qualitative and statistical analysis
Coding open-text data
The open-ended data were coded into goal themes by six trained raters (EB, RBB, AD, JBL, EK, MCF) according to an existing framework, which was then iteratively refined based on emergent themes in the open-text data described above. This existing framework provided a standardized protocol and comprehensive codebook originally derived using both deductive and inductive approaches in an extensive sorting procedure. Themes in the current data were coded as “1” or “0” depending on whether they were reflected in the individual’s written text. As the goal-delineation themes were originally developed with a Human Immunodeficiency Virus (HIV) sample, which generally has different sociodemographic characteristics than the current study sample, some themes were not as prevalent here. Themes were added as needed for the current study, resulting in a set of 40 themes used for both the wishes and goals prompts and 17 for the QOL definition prompt. For each prompt, a theme of “No Direct Answer” was used if the respondent did not provide an answer or answered a different question than the one that was asked. This is distinct from leaving the question blank (i.e., skipping the question). For example, in response to the question “What are the main things you want to accomplish?” examples of No-Direct-Answer responses were “seems rather great” or “nothing idk lol.”
Each text entry could be coded for as many themes as were reflected in the set for the corresponding prompt. Therefore, one entry could elicit one theme or more than one depending on its wording. For example, if one individual had written for their goal “My bills paid, my family healthy and happy, and family go to church,” it would have been coded as reflecting family welfare, financial concerns, health issues, mental health/mood state, and religious/spiritual concerns. In contrast, another individual’s goal was “Move to a different state,” which was coded with the single theme of living situation. Thus, we are assuming that the relevant factor here is the themes, not the individual wishes, goals, or QOL definitions themselves.
Training took place in two multi-hour sessions devoted to understanding the protocol and making full use of the codebook, in which the themes were fully described and exemplified. Raters coded an initial set of ten participants’ data (from all three prompts), followed by a discussion of differences across raters. Incorporating the feedback exchanged, they then coded the next ten participants’ data (again all prompts); comparison and discussion at this stage revealed almost no differences across raters. Raters then coded data from 40 more responses (all three prompts), from which inter-rater reliability per prompt was computed in two ways on the 240 test responses (6 raters * 40 participant entries).
Two methods were used to assess aspects of inter-rater reliability. The first, Fleiss’s kappa, assessed degree of agreement over and above what would be expected by chance. This variant on the more familiar Cohen’s kappa is used in cases of more than two raters. While there are no generally accepted rules of thumb for a desirable level of either form of kappa, some healthcare researchers have proposed values from 0.41–0.60 as “moderate,” 0.61–0.80 as “good,” and 0.81–1.00 as “very good” [24, 25].
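As an illustration of the first method, the following is a minimal sketch of Fleiss's kappa for a single binary theme. It is not the authors' code; the function names and the toy data are hypothetical, and the layout assumed here is one row per participant entry and one 0/1 code per rater.

```python
def counts_from_codes(codes):
    """codes[i][r] is the 0/1 code rater r gave to entry i.

    Returns one [n_zero, n_one] row per entry, the layout Fleiss's kappa needs.
    """
    return [[row.count(0), row.count(1)] for row in codes]


def fleiss_kappa(ratings):
    """Fleiss's kappa from per-entry category counts.

    ratings[i][j] is the number of raters who assigned entry i to category j.
    Every entry must have been coded by the same number of raters.
    """
    n_entries = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])

    # Share of all assignments that fell into each category.
    p_cat = [
        sum(row[j] for row in ratings) / (n_entries * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement: proportion of agreeing rater pairs, averaged over entries.
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_entries
    # Agreement expected by chance.
    p_exp = sum(p * p for p in p_cat)

    if p_exp == 1.0:
        # Every rater used the same category for every entry: kappa is undefined.
        return float("nan")
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical toy data: 4 entries coded by 6 raters for one theme.
toy_codes = [
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
]
kappa = fleiss_kappa(counts_from_codes(toy_codes))
```

For this toy input kappa works out to 0.52, which would fall in the “moderate” band quoted above.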
The second method assessed what proportion of the variance could be explained by the Rater effect. A low number is preferable as it reflects that the scores relate to the individual’s data being coded rather than reflecting a response style of the rater. This method used logistic regression to assess level of agreement among raters, with each of 240 “0” or “1” values regressed on the Rater variable, with its six rater-categories. High inter-rater reliability (IRR) for any given theme would be indicated by a nonsignificant Rater effect, and one that explained a low fraction of the variance in ratings (i.e., a pseudo-R-squared in the low single digits).
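The second method can be sketched as follows. Because Rater is the only predictor and is categorical, the fitted probability for each rater is simply that rater's proportion of “1” codes, so the log-likelihoods can be written in closed form without an iterative fit. The text does not say which pseudo-R-squared was used; McFadden's version is assumed here, and all names are hypothetical.

```python
import math


def bernoulli_loglik(n_ones, n_total):
    """Maximised Bernoulli log-likelihood for n_ones '1' codes out of n_total."""
    loglik = 0.0
    for k in (n_ones, n_total - n_ones):
        if k > 0:  # treat 0 * log(0) as 0
            loglik += k * math.log(k / n_total)
    return loglik


def rater_effect(codes_by_rater):
    """Return (McFadden pseudo-R-squared, likelihood-ratio statistic) for Rater.

    codes_by_rater maps a rater label to that rater's list of 0/1 codes
    for one theme. The likelihood-ratio statistic has (number of raters - 1)
    degrees of freedom.
    """
    all_codes = [c for codes in codes_by_rater.values() for c in codes]
    # Null model: one common probability for all raters.
    ll_null = bernoulli_loglik(sum(all_codes), len(all_codes))
    # Rater model: one probability per rater.
    ll_model = sum(
        bernoulli_loglik(sum(codes), len(codes))
        for codes in codes_by_rater.values()
    )
    if ll_null == 0.0:
        # Theme never (or always) coded: nothing for Rater to explain.
        return float("nan"), 0.0
    return 1.0 - ll_model / ll_null, 2.0 * (ll_model - ll_null)
```

With six raters who each coded the same 40 entries identically, the pseudo-R-squared is 0; it approaches 1 only if the raters' codes are almost entirely determined by who did the rating.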
Comparing length, number of themes, and inter-method associations
Analysis of Variance (ANOVA) models compared the length of open-text responses and the number of themes (dependent variables) by role (patient or comparison; independent variable) to assess the complexity of the responses by group.
Differences in age distributions by role
The child, teen, and adult data sets revealed age differences between patients and comparisons: there were differences in mean age, the frequency of certain age ranges, and the shape of the age distributions. We decided that, in addition to adjusting for age in our models, we would apply weights so as to simulate more comparable age distributions. We developed a weighting variable that reduced the disproportionate impact of comparison-group participants of specific ages so that the age distributions of patients and comparisons would be much more similar. For example, 24–34-year-olds were far over-represented among comparisons; these were thus down-weighted (i.e., treated as 0.4 of a participant instead of 1.0 participant), not excluded, for multivariate analysis. While the weighting might not completely eliminate the age differences between the two groups, it was conducted to make those distributions comparable enough to render the planned analyses tractable.
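The exact weighting formula is not given in the text. The sketch below shows one plausible scheme, offered only as an assumption: each comparison participant is weighted by the ratio of the patients' share of an age band to the comparisons' share of that band, capped at 1.0 so that participants are only ever down-weighted. The age bands and function names are hypothetical.

```python
from collections import Counter


def comparison_weights(patient_ages, comparison_ages, band_of):
    """Return a weight per age band for comparison-group participants.

    band_of maps an age to a band label. A band that is over-represented
    among comparisons relative to patients gets a weight below 1.0;
    other bands keep a weight of 1.0.
    Note: a band containing no patients gets weight 0.0 under this scheme,
    so a floor would be needed if no one is to be excluded.
    """
    patient_counts = Counter(band_of(a) for a in patient_ages)
    comparison_counts = Counter(band_of(a) for a in comparison_ages)
    n_patients = len(patient_ages)
    n_comparisons = len(comparison_ages)

    weights = {}
    for band, count in comparison_counts.items():
        patient_share = patient_counts[band] / n_patients
        comparison_share = count / n_comparisons
        weights[band] = min(1.0, patient_share / comparison_share)
    return weights


def two_bands(age):
    """Hypothetical banding used only for this illustration."""
    return "under_18" if age < 18 else "18_plus"


# Hypothetical ages: adults are 20% of patients but 50% of comparisons.
weights = comparison_weights([10] * 8 + [30] * 2, [10] * 5 + [30] * 5, two_bands)
```

In this toy case the adult comparisons receive a weight of 0.4, mirroring the kind of down-weighting described above, while the under-18 comparisons keep a weight of 1.0.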
Analyzing the aspirations data
Descriptive statistics summarized either the proportion of each group coded as reflecting a given theme for the open-text data or the mean and standard deviation for the closed-ended goal-delineation items. Effect size was summarized by phi for comparison of proportions, or Cohen’s d for comparison of means, for the open- and closed-ended data, respectively.
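The two effect sizes named above can be computed as in the following minimal, unweighted sketch. The function names are hypothetical, and the pooled-standard-deviation form of Cohen's d is assumed because the text does not specify a variant.

```python
import math
from statistics import mean, variance


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table of role by theme.

    a: patients with the theme      b: patients without it
    c: comparisons with the theme   d: comparisons without it
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        # An empty row or column: phi is undefined.
        return float("nan")
    return (a * d - b * c) / denom


def cohens_d(x, y):
    """Cohen's d for two samples using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)
```

A positive phi here means the theme was more common among patients; a positive d means the first sample had the higher mean rating.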
Propensity scores were used to control for demographic differences other than age between DMD participants and comparison participants in the below-described multivariate models. The goal of the propensity-score modeling was to create a score for covariate adjustment across all age groups, thereby allowing us to compare aspirations across the age span. This was the central contribution of the present work. We thus used the following pragmatic approach for dealing with the fact that some covariates were simply not asked and thus not available (see Additional file 1: Table S1). Accordingly, our propensity-score model adjusted for those covariates that differed between patient and comparison groups in bivariate analyses described below. Separately for adults and for teens/children, a logistic regression model was computed predicting the dependent variable Role (DMD patient vs. comparison participant) from the applicable covariates. For adults who completed the adult survey, the covariates included the following: ethnicity, White race, Black race, region, marital status, difficulty paying bills, whether currently working, education, whether received help completing survey, and whether someone in household had contracted COVID-19. Only male comparison participants were included, since all DMD participants were male. For children, teens, and those who completed the Alternate survey, the covariates included the following: ethnicity, White race, Black race, region, whether received help completing survey, and whether someone in household had contracted COVID-19. For a small proportion of participants (2%), propensity scores were based on the mean propensity score among the individual’s age group.
Multivariate models were then computed to hone the contrasts, comparing the patient and comparison groups on binary themes for the coded open-text data (wishes, QOL definition, goals), or on the rating-scale closed-ended goal-delineation items from the QOLAPv2. Because of our particular focus on age-related differences in aspirations, the models evaluated how patients vs. comparison participants (i.e., ‘role’) differed in aspiration outcomes after adjusting for their propensity scores, age, and the role-by-age interaction. Logistic regression was used to analyze the individual themes for the coded open-ended data, while Analysis of Covariance (ANCOVA) was used to analyze the closed-ended goal-delineation data.
For logistic regressions, 13 of the 95 theme variables showed no variation and thus were excluded from analyses. Further, some of those analyzed showed complete or quasi-complete separation in logistic regression, and for these we reported only descriptive results.
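The screening described above can be approximated with a simple check, sketched below under stated assumptions. The text does not say how separation was detected; here a theme is flagged when it is constant within one role, which produces an empty cell in the role-by-theme table and is one common sign of complete or quasi-complete separation. Separation involving age or the propensity score would not be caught by this check. Names and labels are hypothetical.

```python
def screen_theme(codes, roles):
    """Classify a 0/1 theme variable before fitting a logistic regression.

    codes: list of 0/1 theme codes, one per participant.
    roles: list of role labels (e.g. "patient", "comparison"), same order.
    Returns "no_variation", "possible_separation", or "ok".
    """
    if len(set(codes)) < 2:
        # Theme coded identically for everyone: cannot be modelled.
        return "no_variation"
    for role in set(roles):
        codes_in_role = {c for c, r in zip(codes, roles) if r == role}
        if len(codes_in_role) < 2:
            # Theme is constant within this role (an empty 2x2 cell).
            return "possible_separation"
    return "ok"
```

Themes labeled "possible_separation" would then be reported descriptively rather than modeled, in line with the approach described in the text.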
Interpretation of main effects in the context of interactions
The above-mentioned multivariate models aimed to investigate how patients differed from comparisons in their aspirations and at different ages, after adjusting for demographic variables that might have confounded the raw descriptive comparisons. Interpreting main effects can be challenging when the model contains interaction terms, because the latter are collinear with the former. To address this challenge, plots of substantial interaction effects were used to facilitate their interpretation. To display an interaction effect (Role*Age), we created scatterplots that graphed predicted values from the model (Y-axis) against age (X-axis). Any theme variable with a group mean that was < 0.01, or with a |Beta| or |Estimated Beta| out of the usual range (> 1.3), was excluded from interaction graphs.
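To make the plotted quantity concrete, the sketch below computes predicted probabilities from a logistic model with role, age, role-by-age, and propensity-score terms, and applies the two exclusion rules stated above. The coefficient names, the 0/1 coding of role, and the default propensity value are assumptions for illustration; no fitted values from the study are reproduced.

```python
import math


def predicted_probability(coef, role, age, propensity):
    """Predicted probability of a theme from a fitted logistic model.

    coef: dict with keys "intercept", "role", "age", "role_x_age", "propensity".
    role: 1 for patient, 0 for comparison (coding assumed for this sketch).
    """
    linear = (
        coef["intercept"]
        + coef["role"] * role
        + coef["age"] * age
        + coef["role_x_age"] * role * age
        + coef["propensity"] * propensity
    )
    return 1.0 / (1.0 + math.exp(-linear))


def interaction_curve(coef, ages, propensity=0.5):
    """Predicted probabilities by age for each role, ready to plot against age."""
    return {
        role: [predicted_probability(coef, role, age, propensity) for age in ages]
        for role in (0, 1)
    }


def eligible_for_plot(group_means, beta, min_mean=0.01, max_abs_beta=1.3):
    """Apply the exclusion rules: no group mean below 0.01, |beta| at most 1.3."""
    return all(m >= min_mean for m in group_means) and abs(beta) <= max_abs_beta
```

Plotting the two lists returned by `interaction_curve` against age shows whether the patient and comparison lines diverge, converge, or cross, which is what the Role*Age term encodes.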
Interpretation in the context of many contrasts
The present study involves a large number of statistical contrasts primarily because it is investigating research questions that have not been addressed to date and which involve translating nuanced qualitative data into quantitative metrics. In demographic comparisons, we relied on p-values to identify group differences, but in other analyses we focused our interpretation on effect sizes (ES). Cohen’s criteria were used to facilitate interpretation of differences for medium and large effect sizes, respectively: in proportions (Phi of 0.3 and 0.5); in mean differences (Cohen’s d of 0.5 and 0.8); in model explained variance (R2 of 0.06 and 0.14); and in model parameter estimates (standardized beta coefficients of 0.3 and 0.5). While we report ES regardless of magnitude, we interpret only medium or large ES because these are generally considered clinically important. Tables are conditionally formatted using data bars in unadjusted comparisons, and using different colors and saturation levels in adjusted comparisons to highlight effects’ direction and magnitude.
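The thresholds listed above translate directly into a lookup, sketched here. The function and key names are hypothetical, and treating a value exactly at a threshold as meeting it is an assumption, since the text does not say how boundary values were handled.

```python
# Medium and large thresholds as stated in the text (Cohen's criteria).
THRESHOLDS = {
    "phi": (0.3, 0.5),    # difference in proportions
    "d": (0.5, 0.8),      # difference in means
    "r2": (0.06, 0.14),   # model explained variance
    "beta": (0.3, 0.5),   # standardized regression coefficient
}


def label_effect(kind, value):
    """Label an effect size as "large", "medium", or "below medium".

    The sign is ignored, since direction is reported separately.
    """
    medium, large = THRESHOLDS[kind]
    size = abs(value)
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    return "below medium"
```

Under the approach described in the text, only effects labeled "medium" or "large" would be interpreted, although all effect sizes are reported.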
IBM SPSS version 27 and the R software were used for all analyses.