Literature review to characterize the empirical basis for response scale selection in pediatric populations

Background: Despite the importance of response option selection for patient-reported outcome measures, there seems to be little empirical evidence for the selected scale type. This article provides an overview of the published research on response scale types and empirical support within pediatric populations.
Methods: A comprehensive review of the scientific literature was conducted to identify response scale option types appropriate for use in pediatric populations and to review and summarize the available empirical evidence for each scale type.
Results: Eleven review/consensus guideline/expert opinion articles and 20 empirical articles that provided guidance or evidence regarding pediatric response scale selection were identified. There was general consensus that 5-point verbal rating scales, including Likert scales, were appropriate for children aged 7 or 8 and older, while graphical or faces scales are often used in pediatric studies with children of younger ages.
Conclusion: In general, the verbal rating scale, numeric rating scale, visual analogue scale, and graphical scales have each been demonstrated to be reliable and valid response option formats in specific contexts among pediatric populations; however, their appropriateness is dependent upon sample age. When selecting response scales, it is important to consider target population and context of use during the development of patient-reported outcome measures, especially with respect to tense, recall period, attribution, number of options, etc. In addition to age, cognitive development is an important aspect to consider for optimizing pediatric self-reported measures. More research is needed to determine clinically relevant changes and differences within pediatric research, which includes different response scale options.


Background
The development of patient-reported outcome (PRO) measures involves the identification of the relevant concepts, which are measured through one or more items (questions or statements) that can be evaluated by utilizing a response option set. The response options must be consistent with each item's purpose and intended usage. The selection of response options is an important component of item construction and characterizes how the concept is measured. When determining the type of response options to be used, many factors must be taken into account, most importantly the target population and intended use of the item. For instance, Lukas et al. [1] were able to demonstrate reliable reports of pain through appropriate selection of response types based upon the cognitive ability of the target population.
Historical use of response option types, the use of qualitative research and the assessment of measurement properties contribute to the identification of response categories that will perform most reliably within the intended population. Special consideration should be given to different populations, particularly the pediatric population. Important characteristics of pediatric populations influencing response set choice include age, literacy skills, ability to verbally communicate, cognitive ability to quantify feelings or symptoms, and motivational desire to please or select the 'right' answer [2,3]. Therefore the development of a reliable and valid PRO measure for the pediatric patient population presents unique challenges.
The thoughtful selection of response options for new pediatric PRO measures is important; however, there is very little empirical basis for the type of response scale selected or for attributes of the response scale, including number of response options, visual orientation of the scale (for example, vertical vs. horizontal), and response scale anchor wording. The United States Food and Drug Administration emphasizes the importance of self-report in pediatric populations rather than proxy-report, but does not recommend any specific response options for inclusion in a PRO measure. However, several response option types commonly seen in PRO instruments are listed for consideration in its 2009 Guidance for Industry, titled Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims [4]. Typical response options employed in PRO instruments include verbal rating scales (VRS), visual analogue scales (VAS), numeric rating scales (NRS), and various graphical scales such as a Faces scale. In the pediatric literature, it has been reported that children can reliably distinguish and understand fewer response options than adults; for example, in testing the Childhood Asthma Control Test, Liu et al. [5] found that a 4-point response scale was optimal. Further, graphical rather than numeric or verbal response scales may enhance comprehension of response scales in children [2].
The purpose of this article is to provide an overview of the published research on response scale option types used within pediatric populations. Evidence identified through this comprehensive literature review is intended to inform and enhance response scale selection for newly developed PRO instruments designed for use in pediatric populations.

Search procedure
A comprehensive review of the scientific literature was conducted to identify appropriate response scale option types for the pediatric population, and to review and summarize the available empirical evidence for each type of scale. The literature review was part of a larger study, funded by the Critical Path Institute's Patient-Reported Outcome (PRO) Consortium, to summarize the available empirical evidence to support response option selection for PRO measures, by context of use. Published articles, limited to English-language articles from the preceding 10 years (2004-2014), were retrieved and reviewed to provide information on optimal response options for PRO measures in the pediatric population.
The search databases included EMBASE, MEDLINE, and PsycINFO. In addition to formal searches, a number of supplementary sources were utilized to identify additional relevant articles for inclusion in the review. Further, the reference lists of articles identified from the formal and supplementary searches were also reviewed to identify additional articles to be included in the review.
Lastly, a search was conducted for presentation abstracts accepted during the past two years of the meetings/conferences of the International Society for Pharmacoeconomics and Outcomes Research (2013 and 2014) and the International Society for Quality of Life Research (2012 and 2013) to identify any scientific disclosures prior to publication in the peer-reviewed literature.

Search strategy
Search terms used to identify articles in EMBASE, MEDLINE, and PsycINFO that met the search objective and were applicable to the pediatric population are presented in Table 1. Articles that provided both direct and indirect evidence were included. Direct evidence was defined as evidence that provided a direct answer to the research question of interest; for example, direct evidence articles empirically compared the relative robustness or merits of two different response scale types within the same study/population. Indirect evidence was defined as relevant evidence that should be considered in the review and overall conclusions, but that did not directly answer the research question or hypothesis. Articles were excluded if they provided no direct or indirect evidence relevant to the search objectives, were not applicable to PRO development, or addressed an area not pre-specified for inclusion.

Data extraction
During the review process, the eligibility of both abstracts and full-text articles was assessed by two independent reviewers. In the case of non-agreement, a third senior reviewer made the final judgment. Relevant data were extracted from articles that were identified based on the inclusion/exclusion criteria and summarized in tables. For each article included in the review, the quality of the data presented, and therefore the strength of the results and recommendations, was assessed. Each article was assigned a grade based on the type of article and strength of the data, as outlined by the following criteria:
A. Primary research; compares different response scales within the study.
B. Review or expert opinion; based on empirical evidence.
C. Primary research; evaluates a single response scale type within the study.
D. Review or expert opinion; based on expert consensus, convention or historical experience.
The letter grade A reflects the strongest empirical evidence for response scale recommendation and a letter grade D reflects the weakest empirically-based evidence.
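The four grading criteria amount to a two-question decision rule: what type of article is it, and how strong is its basis? The following Python sketch is one possible encoding for illustration only; the names `ArticleType` and `evidence_grade` and the two boolean flags are assumptions introduced here, not part of the review's actual procedure, in which reviewers assigned grades by judgment.

```python
from enum import Enum


class ArticleType(Enum):
    """Hypothetical labels for the two article types named in the criteria."""
    PRIMARY_RESEARCH = "primary research"
    REVIEW_OR_OPINION = "review or expert opinion"


def evidence_grade(article_type, compares_scales=False, empirically_based=False):
    """Return the letter grade (A to D) implied by the criteria above.

    compares_scales:   only meaningful for primary research; True when the
                       study compares different response scales.
    empirically_based: only meaningful for reviews/expert opinion; True when
                       the recommendations rest on empirical evidence.
    """
    if article_type is ArticleType.PRIMARY_RESEARCH:
        # A: compares scales within the study; C: evaluates a single scale type.
        return "A" if compares_scales else "C"
    # B: based on empirical evidence; D: consensus, convention, or experience.
    return "B" if empirically_based else "D"
```

For example, a primary study comparing a VAS with an NRS in the same sample would be graded A under this rule, while a consensus guideline with no empirical basis would be graded D.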

Results
The initial search yielded 1083 abstracts; a manual review of additional articles yielded 11 articles; lastly, a review of the potential 439 conference abstracts with the term "scale" yielded three potential abstracts. After abstract and full-text screening and screening of references from additional sources (full literature review results), we identified 6 review/consensus guideline/expert opinion articles and 16 empirical articles that provided guidance or evidence regarding pediatric response scale selection (Fig. 1). Three age groups (4 to 8 years, 6 to 18 years, and 10 to 18 years) emerged as most commonly described in the literature.
Across the review and expert opinion articles, there was general consensus that the 5-point VRS, including Likert scales, was appropriate for children 7 or 8 years of age and older (Table 2) [2,6]. While graphical or faces scales are often used in pediatric studies [7,8], some of the review articles noted that additional empirical evidence is needed to support the use of these scales and the specific ages for which they are appropriate [2]. There is some evidence that supports using a facial-graphics enhanced response scale in younger children (between the ages of 4 and 7) [9]. For children ages 7 or 8 and older, some review articles advocated use of NRS or VAS [9], whereas another article suggested that children prefer a VRS to an NRS or VAS [2]. Cohen et al. [10] noted that visual orientation of scales, emotions expressed in graphical scales, and word choice for verbal anchors can produce unexpected biases due to immaturity in abstract thinking skills of respondents; for example, a child might choose a numerical response based on favorite number rather than representation of experience.
Children between the ages of 4 and 8
Table 3 presents the empirical studies evaluating optimal response scale choice among the youngest of respondents (ranging in age from 4 up to 7 to 8). Liu et al. [5] found that a 4-point VRS with graphical faces to aid in response enhanced comprehension of the scale in children between 4 and 11 years of age. In a study evaluating various pain response scales for use in children between the ages of 6 and 8, results were inconclusive as to whether a Faces, NRS, VAS, or a color scale (e.g., children select a color associated with level of intensity) was superior [11]. These response scales did produce different estimates of pain and were not considered interchangeable in young children [11].
Children between the ages of 6 and 18
Table 4 includes the studies evaluating multiple response scales within the same study among children 6 years of  [13] found that younger children (ages 6 to 11 years) in their sample (overall range between 6 and 16 years of age) did not fully understand all descriptive words in a 5-point VRS, and that the VRS was not highly correlated with a pain VAS. Bailey et al. [14] found the VAS, NRS, and VRS to be reliable and valid in evaluating pain in children 8 to 17 years of age, but the scores produced were not interchangeable. Connelly and Neville [15] found that the VAS, NRS, and Faces scales tended to be highly inter-related, but scores were generally higher on the NRS, and that VAS and Faces were more responsive to decreasing trends in pain scores in children between the ages of 9 and 18. Bailey et al. [16] evaluated the correspondence between NRS, VAS, VAS with color, and Faces scales in children ages 8 to 18 who had abdominal pain, and found that, while there was high correspondence between the VAS and VAS with color, the NRS did not correspond with the other scale types. Benini et al. [17] found VAS to be easiest for children 7 to 18 years of age to understand and use as compared to Faces and other
Children between the ages of 10 and 18
Table 5 presents studies evaluating response scale selection in older children (10 to 18 years). The NRS was found to produce higher scores than either the VAS or Faces scale, but the NRS had higher correspondence with the VAS than it had with Faces [18]. A 5-point VRS was found to be well accepted and understood [19]. Finally, a 5-point VRS was found to be stable regardless of recall period [20].

Discussion
The aim of this review was to provide an overview of the published research on response scale option types used within pediatric populations. Results showed that there was empirical evidence supporting the use of VRS, NRS, and VAS response options in children and adolescents aged 8 to 18 years with age-appropriate literacy skills and cognitive development. There was also evidence that self-report instruments can be used for children as young as 4 years of age when graphical scales are used as the response scale option. Meanwhile, there was little support in the published literature for a preferred response scale option type in the age group 8 to 18 years. Our findings indicate the importance of evaluating different response options in cognitive interviews when developing a new or modifying an existing PRO measure. In a 2007 review, Grange et al. stated that, for children younger than 5 years old, there was no clear empirical support for the use of self-report instruments [21]. Instead it was recommended that assessments regarding these children should rely upon clinical measures and observational reports. When they evaluated the psychometric properties of different health-related quality of life instruments that measure physical and/or emotional impact of symptoms, Grange et al. noted that these aspects may be just too abstract for children younger than 5 years old [21]. However, several studies have shown that children from the age of 4 years often can provide information on their health status, especially when it concerns concrete aspects, such as pain and use of medication [5,11]. Well-defined, tangible concepts are important when considering self-report for this population.
When choosing response options to be used for 4 to 7 year olds, limitations in reading skills, vocabulary, and conceptual understanding of numbers must also be considered [2,3]. Hence, the VRS, VAS, and NRS are likely not appropriate response options [2,3,9,22]; whereas several instruments that use different graphical scales have shown acceptable validity and reliability (e.g., [5,23]). However, there were aspects found among the acceptable graphical response options that were problematic. These included the expression of gender neutrality, the depiction of images resembling a target population or a stereotype of that population, and the recognition that the emotional cues expressed, such as smiling or frowning, in the faces in a graphical response option may be culturally dependent [5,10].
As for the age group of 10 to 17 years, there was limited support in the published literature for any user-preferred response option. However, some studies have shown that the VAS appears to have shortcomings in this age group. Shields et al. [24] found that only one-third of study participants aged 5 to 14 were able to understand the VAS, and that those who were able to understand the VAS were significantly older (mean = 9.8 years) than those who did not understand it (mean = 8.2 years). More studies are needed to evaluate the robustness of these findings.
Development of PRO measures should be conducted so that the target population informs the selection of the  [25]. The selection of responses should consider psychometric properties, represent the full continuum of the potential respondent experience, and should be ordered, equally spaced, and distinct from one another [4,26,27]. Ease of translatability and cultural adaptation should be considered and assessed early in development of PRO measures. While expert opinion and experience suggest the NRS may not pose problems in translation as the response numbers are not changed, the choice of word(s) used for NRS and VAS anchors may have implications for translations.
Typically, 5- or 7-point scales are easier to translate than scales with more than seven response choices, which can pose problems in other languages where more granular verbal distinctions do not exist. In a review of the translatability of various commonly used verbal anchors, agreement anchors (e.g., strongly agree, agree, disagree, strongly disagree) were found to be easy to translate and to have a high degree of equivalence across languages. In contrast, terms such as "fair" have multiple translation options and connotations across languages. Anchors such as "a little bit" are difficult to translate because of the lack of equivalence for the term "bit" in some languages. Further, item stems and response anchors corresponding to "…of the time" were found to be difficult to translate comparably across languages [28]. The authors recommended that the NRS or VAS should be used where possible, as those need only minimal translation [28].
Furthermore, the intended mode of data collection (e.g., paper/pen versus electronic) should be considered depending on the age of the person completing the instrument [29,30]. Based on expert opinion and experience, an NRS response set can be easily implemented via electronic modality in an interactive voice response system, handheld (smartphone), tablet, and Web modes. However, formatting on handheld, tablet, and Web-based systems needs to be carefully considered so that the anchors of the NRS are associated with their intended number, and no ambiguity is caused by anchors that extend beyond one numerical category. It is impossible to implement a VAS using an interactive voice response system because the participant cannot place a mark on the line. Further, a VAS may be challenging to implement in other electronic modes due to screen size limitations and space constraints that cannot accommodate the 100-mm length presented on paper. Modifications such as a shorter line length that provides 101 data points can accomplish the same goal, as the electronic version scores the response automatically, eliminating the need for manual measurement using a ruler. For a VRS, the number of response options and length of verbal descriptors should be carefully selected so as to lighten the cognitive load (for an interactive voice response system) and to allow for equidistant formatting on one screen for handheld, tablet, and Web-based implementations. Faces scales, though less widely utilized in the PRO measurement literature reviewed, cannot be administered orally (via an interactive voice response system), but are easily administered via screen-based electronic modes.
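The point about automatic scoring of an electronic VAS can be illustrated with a short Python sketch. It maps a touch or click position on a line of arbitrary on-screen length to one of the 101 integer scores (0 to 100), so the score does not depend on the line being 100 mm long. The function name `vas_score`, the use of pixel coordinates, and the clamping of positions beyond the line ends are assumptions made for this illustration; they are not taken from any reviewed instrument or software.

```python
def vas_score(touch_x, line_start_x, line_length):
    """Convert a horizontal position on an on-screen VAS line to a 0-100 score.

    touch_x:      x-coordinate of the respondent's mark (assumed in pixels)
    line_start_x: x-coordinate of the left anchor of the line
    line_length:  on-screen length of the line, in the same units
    """
    if line_length <= 0:
        raise ValueError("line_length must be positive")
    # Fraction of the line covered by the mark, independent of physical size.
    fraction = (touch_x - line_start_x) / line_length
    # Assumption: marks placed beyond either anchor are clamped to the ends.
    fraction = min(1.0, max(0.0, fraction))
    # Round to the nearest of the 101 possible integer scores (0..100).
    return round(fraction * 100)
```

Under these assumptions, a mark at the midpoint of a 300-pixel line and a mark at the midpoint of a 200-pixel line both score 50, which is the sense in which a shorter line can still provide the same 101 data points.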
This literature review was conducted in early 2015 and was limited to articles published in English during a 10-year timespan from 2004 through 2014, from which the key direct and indirect evidence was identified. Each article was graded based on the type of article and strength of the data. The search strategy was based on pre-specified criteria that may not have been inclusive of global research utilizing different terminology for PRO instruments designed for use in specific pediatric populations (e.g. "response format", "response option", "response set", "item format", "PROM", or "patient-reported outcome(s)"), thus introducing risk of omitting relevant studies in the literature. Due to the scope of our literature review and the paucity of literature identified, more differentiated presentation of the findings was not pragmatic.

Conclusion
The VRS, NRS, VAS, and graphical scales can all be reliable and valid response options in pediatric populations. However, the current empirical basis is insufficient to draw firm conclusions and to make differentiated recommendations. Therefore, when choosing a response format, it is important to consider the context of use during the development/modification of PRO measures and the study design. Apart from age, important aspects to consider are cultural background and cognitive development. More global studies on children's preferences for response formats are needed to optimize pediatric self-reported measures. Additionally, more research is needed to assess the psychometric properties of items and their response options, to determine clinically meaningful changes and differences within pediatric clinical trials, which are impacted by the response scale chosen and the scoring function applied.

Availability of data and materials
This article is entirely based on data and materials that have been published, are publicly available (thus, accessible to any interested researcher), and appear in the References list.

Authors' contributions
Concentrating on the study concept and design were AN, KG, and SS; JH and SS acquired the data; MV focused on the analysis and data interpretation, as did KG and SS. All the authors have agreed to be accountable for all aspects of the work, particularly for ensuring that any questions of the work's accuracy or integrity are promptly investigated and resolved. All authors have given their approval of the final version of the manuscript. Each author participated in creating drafts of the manuscript or in critical revisions.

Ethics approval and consent to participate
This article does not contain any studies with human participants or animals performed by any of the authors.

Consent for publication
Not applicable.

Competing interests
Mabel Crescioni and Mira Patel report no employment in a pharmaceutical company nor do they hold stocks, shares, or stock options in a pharmaceutical company. Katherine Gries reports she is a current employee at Janssen but reports no stocks, shares, nor options. Anna Rydén is an employee and shareholder of AstraZeneca. Jennifer T. Hanlon is a salaried employee who owns stocks, shares, and stock options at Ironwood Pharmaceuticals. April N. Naegeli is a salaried employee who owns stocks, shares, and stock options at Eli Lilly and Company. Shima Safikhani and Margaret Vernon are employees of Evidera, a research and consulting firm to the biopharma industry and, as such, are not allowed to accept remuneration from any Evidera clients. None of these authors report any other arrangements that could be perceived as conflicts of interest.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.