Literature review to assemble the evidence for response scales used in patient-reported outcome measures



In the development of patient-reported outcome (PRO) instruments, little documentation is provided on the justification of response scale selection. The selection of response scales is often based on the developers’ preferences or therapeutic area conventions. The purpose of this literature review was to assemble evidence on the selection of response scale types in PRO instruments. The literature search was conducted in the EMBASE, MEDLINE, and PsycINFO databases. A secondary search was conducted on supplementary sources including reference lists of key articles, websites for major PRO-related working groups and consortia, and conference abstracts. Evidence on the selection of verbal rating scale (VRS), numeric rating scale (NRS), and visual analogue scale (VAS) was collated based on pre-determined categories pertinent to the development of PRO instruments: reliability, validity, and responsiveness of PRO instruments, select therapeutic areas, and optimal number of response scale options.


A total of 6713 abstracts were reviewed; 186 full-text references were included. There was a lack of consensus in the literature on the justification for response scale type based on the reliability, validity, and responsiveness of a PRO instrument. The type of response scale varied within the following therapeutic areas: asthma, cognition, depression, fatigue in rheumatoid arthritis, and oncology. The optimal number of response options depends on the construct, but quantitative evidence suggests that a 5-point or 6-point VRS was more informative and discriminative than fewer response options.


The VRS, NRS, and VAS are acceptable response scale types in the development of PRO instruments. The empirical evidence on selection of response scales was inconsistent and, therefore, more empirical evidence needs to be generated. In the development of PRO instruments, it is important to consider the measurement properties and therapeutic area and provide justification for the selection of response scale type.


Response scale selection is a critical aspect in the development of patient-reported outcome (PRO) instruments and has implications for the usability of the measure, the level of precision with which the construct of interest is measured, and the quantitative properties of the outcome score including range, standard deviation, scoring, score interpretation guidelines, and ability of the measure to detect change. Additional complicating factors such as placement of response anchors and exact wording of anchors, cultural comparability/translatability of the format and anchor wording, and ability to migrate the scale to various modes of data collection (paper/pencil, electronic) should be examined when selecting the optimal response scale option for a PRO measure.

Despite the importance of response scale selection for PRO instruments, there is little empirical evidence for the optimal type of response scale and number of response options. For PRO measures with multiple items, 5-point and 7-point verbal rating scales (VRS) are commonly used for adult assessments; examples include the Patient-Reported Outcomes Measurement Information System (PROMIS) item banks and the EXAcerbations of Chronic Pulmonary Disease Tool (EXACT®). Eleven-point numeric rating scales (NRS) (particularly recommended for use in pain measurement but used in various other areas as well [1]) and 10-centimeter (cm)/100-millimeter (mm) visual analogue scales (VAS) are commonly used for single-item adult assessments. In the pediatric literature, there is some evidence that children can reliably distinguish and understand fewer response options than adults. For example, in testing the Childhood Asthma Control Test (cACT), Liu et al. [2] found that a 4-point response scale with no neutral center value was optimal. Furthermore, a graphical scale rather than an NRS or VRS may enhance comprehension of response scales in children [3].

The objective of this literature review was to assemble the evidence on the selection of response scale types to guide the development of PRO instruments. This paper focuses on the overall methodology and results of the literature review. A large body of the available evidence was specific to PRO instruments that were developed to measure pain or that were selected based on the age of the respondent. Because of this, the results of those searches were provided in separate publications [4, 5].


A comprehensive review of the scientific literature was conducted to identify response scale types in the development of PRO instruments and the empirical evidence used to justify the appropriate scale type by context of use. The targeted search strategy included formal guidelines or review articles on the selection of response scales and response scale methodology (not specific to PRO instruments) and evidence on the selection of response scales for use in PRO instruments [Table 1]. Evidence was assembled and collated based on pre-determined categories: reliability, validity, and responsiveness of a PRO instrument; select therapeutic areas: asthma, cognition, depression, fatigue in rheumatoid arthritis, and oncology; and the optimal number of response scale options.

Table 1 Literature review search terms

Searches were conducted in the EMBASE, MEDLINE, and PsycINFO databases. Limits were applied to include only articles published in English in the preceding 10 years (2004–2014). The duplicates across individual searches were removed prior to abstract/article review. During the full text article review and data extraction, several supplementary sources were used to identify additional relevant articles for inclusion in the review. These supplementary sources were not limited by publication date, and included the reference lists of key articles, publications not included in the search databases, and websites for major PRO-related working groups and consortia (e.g., PROMIS, NIH Toolbox, Medical Outcomes Study, Neuro-QoL, ASEQ-ME, EORTC, EuroQol Group, and FACIT Measurement System). In addition, conference abstracts were identified and reviewed from annual meetings within the preceding 2 years for Joint Statistical Meetings, Psychometric Society Meetings, International Society for Pharmacoeconomics and Outcomes Research, and International Society for Quality of Life Research. An outline of the review procedure is included in Fig. 1.

Fig. 1 Outline of search procedures and data extraction. PRO: patient-reported outcome

Study selection

During the review process, both abstracts and then full text publications were evaluated for eligibility by two independent reviewers. In the case of non-agreement, a third senior reviewer determined the final judgment. Articles were excluded if they provided no direct or indirect evidence relevant to the search objectives, were not applicable to PRO development, or addressed a therapeutic area not pre-specified for inclusion.

Synthesis of results

Once articles fitting the search criteria were identified, the relevant data were extracted and summarized. The extraction tables included data on the study objective, study design, study population, therapeutic area, name of PRO instrument, type of response scale, and empirical evidence for response scale selection.

Each article deemed relevant to the review and included in the extraction tables was categorized as including either direct evidence or indirect evidence. Direct evidence was defined as evidence that provided an answer specific to a research question of interest; for example, direct evidence articles compared empirically the relative robustness or merits of two different response scale types within the same study/population. Indirect evidence was defined as evidence that, while relevant to the review and the overall conclusions, did not directly answer a research question or hypothesis. For example, review articles and articles that evaluated a single response scale type within the study/population (e.g., a study evaluating comprehension of VAS in cognitively impaired patients) were considered to contain indirect evidence.

Response scale types

The most common types of response scales identified in the literature were the VAS, the VRS with or without numerical anchors, the NRS, and, to a lesser extent, graphical scales such as the Faces scale. Several less commonly used scales were also identified, such as Likert scales and binary scales.

Visual analogue scale

The VAS is a scale composed of a horizontal or vertical line, usually 10 cm (100 mm) in length, anchored at both ends by verbal descriptors [6]. The respondent places a line perpendicular to the VAS line at the point that represents the intensity of the effect in question (e.g., pain). On paper, the exact length of the VAS line is critical, because the score is determined by using a ruler to measure the distance in millimeters between the lower anchor and the mark made by the respondent (range: 0–100). A variation of the VAS includes either numbers or adjectives indicating intensity along the scale, though this is not encouraged because the numbers and adjectives add components to the scale that may alter interpretation and bias the results.
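The paper-based scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than a validated scoring procedure: the function name `score_vas` and the proportional rescaling for a line that did not print at exactly 100 mm are assumptions introduced here, not something the reviewed literature prescribes.

```python
def score_vas(mark_mm: float, line_length_mm: float = 100.0) -> float:
    """Convert a ruler measurement on a paper VAS into a 0-100 score.

    mark_mm        -- distance from the lower anchor to the respondent's mark
    line_length_mm -- measured length of the printed line (nominally 100 mm)

    Assumption (hypothetical): if the printed line is not exactly 100 mm,
    the score is rescaled proportionally to the measured line length.
    """
    if line_length_mm <= 0:
        raise ValueError("line_length_mm must be positive")
    if not 0 <= mark_mm <= line_length_mm:
        raise ValueError("mark must lie between the two anchors")
    return 100.0 * mark_mm / line_length_mm


# A mark 37 mm from the lower anchor on a standard 100 mm line
standard_score = score_vas(37.0)

# The same relative position on a line that printed at 95 mm
rescaled_score = score_vas(47.5, line_length_mm=95.0)
```

The range check is included because a mark outside the anchors cannot be scored; how such responses are handled in practice is a study-specific decision.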

Verbal rating scale

A VRS is a scale that consists of a list of words or phrases describing different levels of the main effect (e.g., pain), in order from least to most intense. The respondent reads the list of verbal descriptors and chooses the one that best describes the intensity of his/her experience [6]. Traditionally a VRS does not contain numbers, but the review identified many examples of VRS with numbers assigned to all or some of the verbal anchors. The study team considered VRS with numbers to be a subcategory of the VRS, with the numbers present for scoring purposes and/or to indicate to the respondent that the verbal anchors are meant to have equidistant intervals. In the reviewed literature, the VRS was also referred to as a verbal category scale, verbal graphic rating scale, and verbal descriptor scale; for the purposes of this report, all of these were classified as VRS.

Numeric rating scale

The NRS is a scale that represents an intensity continuum for respondents to rate the effect (e.g., pain) using a range of integers [6]. The most common NRS is an 11-point scale ranging from 0 (no effect) to 10 (maximal effect). The respondent selects one number that best represents the intensity being experienced. Variations of the NRS included the use of verbal anchors at various points at the middle or ends of a scale; this is common in the context of PRO instrument development.

Faces scale

A Faces scale is a type of graphical scale that uses photographs or pictures to show a continuum of facial expressions. Line drawings of faces are the most common graphic representation, as their lack of gender or ethnicity indicators makes them applicable to a wider range of respondents [6]. The respondent then selects the face that best describes how he or she is feeling. Verbal labels are usually very simple or non-existent for use in children. The Faces scale does not require reading ability or specific language, thereby facilitating pediatric and multi-cultural comprehension.

Likert (Likert-type) response scale

The Likert scale is a type of ordinal scale characterized by several features: the scale contains more than one item; response levels are arranged horizontally; response levels are anchored with consecutive integers; response levels are also anchored with verbal labels, which connote more-or-less evenly-spaced gradations; verbal labels are bivalent and symmetrical around a neutral middle; and the scale often measures attitude in terms of level of agreement/disagreement with a target statement [7]. Likert-type scales are most often used to assess agreement, attitude, and probability; while common in social psychology or health psychology scales, they have less use in health outcomes assessments [6]. One exception is a Global Impression of Change scale, where change in health is evaluated since the start of a new treatment or over a specific time frame. The provision of an odd number of response categories allows respondents to choose a middle, or neutral, response. An even number of response categories forces respondents to commit themselves to one side of the scale or the other. The choice between odd and even response categories depends on the desirability of allowing a neutral position. One of the main differences between Likert or Likert-type scales and the VRS is the presence of the neutral middle anchor in the Likert-type scale but not in the VRS, which orders descriptors from least to most of the attribute(s) being measured [6].

In this literature review, response scales were frequently referred to as Likert or Likert-type; however, most of these scales did not strictly meet the requirements for a Likert scale. Thus, while many scales were referred to as Likert or Likert-type in the original publication, they were more appropriately classified as VRS, and in the literature review will be referred to as VRS.


Study selection

The literature search for evidence on types of response scales in formal guidelines or review articles identified 1315 abstracts, plus 13 additional articles selected through secondary sources and 5 conference abstracts. The literature search on the selection of response scale types specific to the development of PRO instruments resulted in 5299 abstracts, 35 abstracts from secondary sources, and 46 conference abstracts, for a total of 6713 records. During abstract screening, 6199 irrelevant references were excluded, leaving 463 full-text articles and 51 conference abstracts for review. Reasons for exclusion after full-text review were: no discussion or available evidence on response scale selection (n = 233), duplicate (n = 36), clinician- or observer-rated instrument (n = 5), and full-text publication not available (n = 3); in addition, 48 conference abstracts were excluded for not containing enough detail for data extraction. After review, 186 full-text articles were included. Results are presented on the selection of response scale types based on reliability, validity, responsiveness, therapeutic areas, and optimal number of response scale options. Over 40% of the included literature (77 references) discussed the selection of response scale type for the measurement of pain or based on study population; therefore, these conclusions were published separately for a comprehensive discussion of the unique issues pertaining to single-item pain scales and the differences between pediatric and adult PRO instruments [4, 5].
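The counts reported in this paragraph can be cross-checked with simple arithmetic. The short Python sketch below only restates the numbers given in the text; the variable names are introduced here for illustration.

```python
# Records identified by each search (database abstracts, secondary sources,
# conference abstracts), as reported in the text.
guideline_search = {"database": 1315, "secondary": 13, "conference": 5}
pro_search = {"database": 5299, "secondary": 35, "conference": 46}

total_records = sum(guideline_search.values()) + sum(pro_search.values())

# Abstract screening
excluded_at_screening = 6199
remaining_after_screening = total_records - excluded_at_screening

# Records that went on to detailed review
full_text_reviewed = 463
conference_reviewed = 51

# Full-text exclusions by reason
full_text_exclusions = {
    "no evidence on response scale selection": 233,
    "duplicate": 36,
    "clinician- or observer-rated instrument": 5,
    "full text not available": 3,
}
included_full_text = full_text_reviewed - sum(full_text_exclusions.values())

# Conference abstracts with enough detail for data extraction
conference_retained = conference_reviewed - 48

# Share of included references covered in the separate pain and
# population-based publications
pain_or_population_share = 77 / included_full_text

print(total_records)              # 6713
print(remaining_after_screening)  # 514
print(included_full_text)         # 186
print(conference_retained)        # 3
print(f"{pain_or_population_share:.1%}")
```

The totals agree with the 6713 abstracts and 186 full-text references reported in the abstract, and 463 + 51 matches the 514 records remaining after screening.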

Synthesis of results


Results for the selection of response scale type based on reliability of a PRO instrument were variable. A study in a pediatric population (non-specific therapeutic area) found no difference in test-retest reliability among the VRS, VAS, and a numeric VAS response scale [8]. A study in adults with rheumatoid arthritis found the NRS to be more reliable than the VAS or a 5-point VRS, with greater test-retest reliability in a subset of participants who were illiterate [9]. Phan and colleagues [10] also found the NRS to have superior test-retest reliability compared to the VAS or a 4-point VRS when assessed in adults with chronic pruritus. In healthy adults, test-retest reliability was greater for the VAS than for the other two scale types evaluated [11]. Two studies (one in adult geriatric patients with neurological disorders; another in adults with pain) compared a 5-point VRS to the VAS; the VAS was found to have slightly greater test-retest reliability in both studies [12, 13]. A study in adults with angina compared a 5-point VRS to an NRS and found no difference in the test-retest reliability of the measure [14]. In another comparison of the NRS and VAS, a study of perceptual voice evaluation in adults for an interactive voice response (IVR) system, there was no difference in intra-rater agreement [15]. Overall, however, the NRS and VAS tended to demonstrate better test-retest reliability than the VRS.


Many studies reported concurrence between the response scale types being evaluated within each study. The majority reported large correlations between different items/scales that evaluated the same concept; this is an important consideration in the validity of results compared between response scale types. Only one study in adults with angina reported on the magnitude of correlations using external criterion variables for the response scales under consideration; there was no difference between an NRS and 5-point VRS in concurrent validity [16].


Results for the evaluation of these scale types based on responsiveness, or the ability of the scale to detect change in the underlying condition of a patient with treatment in a naturalistic setting, are provided in Table 2. Results for responsiveness were found only in the pain literature and, as such, may not be generalizable to other therapeutic areas. The comparative responsiveness of VRS and NRS to measure the intensity of pain in patients with chronic pain was assessed directly using two 6-point VRS (current pain) items and four 11-point NRS items from the Brief Pain Inventory (BPI; worst pain, least pain, average pain, and current pain) [17]. The 6-point VRS items were the Present Pain Index (PPI) (0 = no pain, 1 = mild, 2 = discomforting, 3 = distressing, 4 = horrible, and 5 = excruciating) and the 6-point Oswestry Disability Index (ODI) (0 = no pain, 1 = very mild, 2 = moderate, 3 = fairly severe, 4 = very severe, and 5 = worst imaginable). For all participants, the standardized response mean (SRM) was small for each measure, and the SRMs for the VRS-PPI (0.29; 95% CI: 0.17, 0.41) and VRS-ODI (0.27; 95% CI: 0.15, 0.38) were smaller than that for the BPI NRS current pain item (0.36; 95% CI: 0.23, 0.48) [17]. For participants classified as responders, the BPI NRS current pain item (0.89; 95% CI: 0.70, 1.07) exhibited large responsiveness, whereas the VRS-PPI (0.58; 95% CI: 0.40, 0.77) and VRS-ODI (0.52; 95% CI: 0.34, 0.70) achieved moderate responsiveness [17].
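For readers unfamiliar with the SRM, the statistic is the mean change score divided by the standard deviation of the change scores. The Python sketch below illustrates the calculation under stated assumptions: the data are invented for illustration, change is taken as baseline minus follow-up so that a reduction in pain is positive, and the 0.2/0.5/0.8 cut-offs are the conventional effect-size thresholds, which are consistent with the small, moderate, and large labels above but are not stated explicitly in the cited study as summarized here.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """SRM = mean(change) / SD(change), using the sample standard deviation.

    Change is defined as baseline - follow_up (assumption: lower scores at
    follow-up mean improvement, as with pain intensity).
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    if len(baseline) < 2:
        raise ValueError("at least two paired observations are required")
    changes = [b - f for b, f in zip(baseline, follow_up)]
    sd = stdev(changes)
    if sd == 0:
        raise ValueError("SRM is undefined when all change scores are equal")
    return mean(changes) / sd


def label_srm(srm):
    """Label an SRM using conventional 0.2 / 0.5 / 0.8 thresholds (assumed)."""
    magnitude = abs(srm)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.2:
        return "small"
    return "trivial"


# Hypothetical 0-10 NRS pain scores for four patients (not study data)
baseline_scores = [6, 7, 5, 8]
follow_up_scores = [5, 5, 5, 6]
example_srm = standardized_response_mean(baseline_scores, follow_up_scores)

# Labels for the SRM values reported in the text
reported_labels = {value: label_srm(value) for value in (0.29, 0.58, 0.89)}
```

Because the SRM divides by the variability of the change scores, a scale with fewer response options can show a lower SRM simply because changes are recorded more coarsely, which is one reason the number of options matters for responsiveness.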

Table 2 Key studies that support response scale selection for PRO instruments based on responsiveness

Therapeutic area

Results to support the selection of response scale type based on select therapeutic areas are provided in Table 3. Based on cognitive interviews, a 5-point VRS used in a PRO instrument evaluating asthma was well understood and acceptable to adults, and a 4-point VRS with graphics was understood by children (ages 4 through 11) [2, 18]. Patients with cognitive impairment preferred a VRS over a VAS, but test-retest reliability was similar for both formats [13]. For depression, cognitive interviews supported use of an 11-point NRS, and a 4-point VRS was just as precise in measurement as a 5-point VRS [19]. For fatigue in rheumatoid arthritis (RA), the VAS and NRS were correlated but not interchangeable; scores from the NRS were higher than those from the VAS, and patients found the VAS more difficult to understand [20]. Results in oncology studies support use of an 11-point NRS, VAS, VRS, and graphical scales, depending on the context of use and study population.

Table 3 Key studies that support response scale selection used in PRO instruments based on select therapeutic areas

Optimal number of response scale options

Literature on the optimal number of response scale options is presented in Table 4. In the comparison of a 5-point and 3-point VRS, there was evidence across studies that a 5-point scale was more informative and discriminative than a 3-point scale, but additional research was suggested [21]. In another study, a 3-point scale was acceptable when compared to a 5-point scale if a simple scale was preferred based on the study population and construct of interest [22]. In a comparison of a 5-point VRS, 7-point VRS, and 11-point NRS to evaluate self-esteem, academic performance, and socioeconomic status, scores from the 11-point NRS were more normally distributed than those from the shorter scale options and demonstrated adequate validity; the authors therefore recommended selection of an 11-point NRS for self-reported measures used to assess social constructs [23]. An item response theory (IRT) analysis of the PROMIS items concluded that 4 to 6 was the optimal number of response options; when more than 6 points were used, two or more response options were typically collapsed to improve model fit [24].
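Collapsing response options, as mentioned for response sets with more than 6 points, amounts to recoding adjacent categories into a shorter ordered scale before scoring. The Python sketch below shows the mechanics only; the mapping from a 7-point to a 5-point scale and the example responses are hypothetical, and in practice the categories to merge would be chosen from the fitted model rather than fixed in advance.

```python
from collections import Counter


def is_order_preserving(mapping):
    """Check that collapsing never reverses the order of the original options."""
    collapsed_in_order = [mapping[key] for key in sorted(mapping)]
    return all(a <= b for a, b in zip(collapsed_in_order, collapsed_in_order[1:]))


def collapse_responses(responses, mapping):
    """Recode each response using an order-preserving category mapping."""
    if not is_order_preserving(mapping):
        raise ValueError("mapping must keep the original category order")
    unknown = set(responses) - set(mapping)
    if unknown:
        raise ValueError(f"responses outside the original scale: {sorted(unknown)}")
    return [mapping[response] for response in responses]


# Hypothetical recoding: 7 original options (0-6) merged into 5 (0-4)
# by combining options 1 with 2 and options 4 with 5.
SEVEN_TO_FIVE = {0: 0, 1: 1, 2: 1, 3: 2, 4: 3, 5: 3, 6: 4}

original_responses = [0, 1, 2, 2, 3, 4, 5, 6, 6]
collapsed_responses = collapse_responses(original_responses, SEVEN_TO_FIVE)
collapsed_counts = Counter(collapsed_responses)

print(collapsed_responses)  # [0, 1, 1, 1, 2, 3, 3, 4, 4]
```

Only adjacent options are merged here, so the ordinal meaning of the scale is preserved; the order check guards against mappings that would scramble it.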

Table 4 Key studies that support response scale selection used in PRO instruments based on optimal response set number


The aim of this targeted literature review was to provide an overview of the response scale types commonly used in PRO instruments and to collate the empirical evidence for each type of scale. In the development of PRO instruments, the selection of the response scale(s) used should be based on the best available evidence.

Results for therapeutic area were limited by the small number of references available for each disease state, which limits the ability to recommend a type of response scale for a therapeutic area of interest. Empirical evidence suggests that a researcher’s choice of a VAS, NRS, VRS, or Faces scale is not based on the therapeutic area but on other aspects, such as study population (age), format of response option, and the concept being measured in the PRO instrument. The optimal number of response options depends on the construct and the number of items making up the domain of measure. The evidence suggests that a 5-point or 6-point VRS was more informative and discriminative than response scales with fewer response options, and that an 11-point NRS was more normally distributed than shorter scale options [21, 23]. However, while having more response options may be appropriate when assessing symptoms, it is important to consider the size of the instrument and the burden of response for patients, particularly when assessing functioning or daily activity, where such measures typically ask for a large set of responses. If these measures are being used as endpoints in a clinical trial setting, note that scores may vary depending not only on the overall number of items in the measure, but also on the number of response options for each individual item.

The intention of the literature review was to provide recommendations on the selection of response scale options for the development of new PRO instruments. However, because the evidence is equivocal and several factors need to be taken into consideration, broad recommendations cannot easily be made. Instead, we have provided a hypothetical case example to showcase the value of collating the empirical evidence.

In this hypothetical example, a new PRO instrument needs to be developed to assess change in symptoms and change in functioning after patients are treated with a new compound as part of a clinical trial. There will be approximately 20 items and the evidence suggests that the VRS, NRS, and VAS are all appropriate response scale options for consideration.

Selection: 6-point VRS

Justification: Empirical evidence suggests that data from an 11-point NRS were more normally distributed than data from a 5-point or 7-point VRS, but the developers decided to reduce the number of options given the larger number of items being asked of the subjects, and therefore chose a VRS. Once the VRS and anchors were selected, the developers had to decide on the number of options, with evidence supporting anything between 4 and 7 points. The objective was to select a scale that would discriminate between treatment arms; based on the evidence, a 6-point scale showed slightly better discrimination and reliability compared to a 5-point scale, and response sets with more than 6 choices typically required two or more options to be collapsed when scoring to improve model fit.

This literature review was limited in that the key evidence was identified from articles published over the 10-year timespan from 2004 through 2014. Results were limited to a small number of studies that provided direct evidence, and multiple studies were difficult to compare given the variety in study design and diversity of terminology. The search strategy was based on pre-specified criteria that may not have been inclusive of global research using different terminology for PRO instruments. In the development of a PRO measure, the reliability, validity, and responsiveness depend not only on the response option, as examined in this study, but also on the item stem and concept being measured. The results of the literature review are limited to evidence on the response scale alone and do not include investigation into how the psychometric properties are also related to the item stem.

Important considerations for response scale selection in PRO measures that were not addressed in the literature review include item response theory (IRT) and the use of Rasch analysis to support the type and format of response scales. IRT was not included as part of this literature review, since it was most likely not employed in older studies, which would mean there would be insufficient information to reach a valid conclusion. However, these types of analyses are now important in addressing the gaps in the literature to further assess the psychometric properties of items and their response options.

While the literature review identified an abundance of support for the VAS, this was based on historical data and does not take into consideration the preferences of patients or regulatory agencies when PRO instruments are used as primary or key secondary endpoints in clinical trials to support labeling claims. Further, this literature review did not demonstrate that the VAS was superior to other scale types in terms of psychometric properties or responsiveness. With the publication of the FDA Guidance in 2009 [25], PRO instrument development and the selection of appropriate response scales for the context of use need to be well documented, with evidence justifying the selection. Thus, when new instruments are being developed, it is important to elicit patient feedback regarding preferences and ease of use of different response scale types.

In summary, the VRS, NRS, and VAS can all be acceptable response scale options in PRO instruments. However, when choosing a response scale type, it is important to consider the study objective and the context of use (i.e., construct being assessed, type of study population, frequency of assessment) during the development/modification of PRO instruments along with the study design.



FDA: Food and Drug Administration

IMMPACT: Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials

NRS: Numeric rating scale

PRO: Patient-reported outcome

VAS: Visual analogue scale

VRS: Verbal rating scale


  1. Dworkin, R. H., Turk, D. C., Farrar, J. T., Haythornthwaite, J. A., Jensen, M. P., Katz, N. P., Kerns, R. D., Stucki, G., Allen, R. R., Bellamy, N., Carr, D. B., Chandler, J., Cowan, P., Dionne, R., Galer, B. S., Hertz, S., Jadad, A. R., Kramer, L. D., Manning, D. C., Martin, S., McCormick, C. G., McDermott, M. P., McGrath, P., Quessy, S., Rappaport, B. A., Robbins, W., Robinson, J. P., Rothman, M., Royal, M. A., Simon, L., Stauffer, J. W., Stein, W., Tollett, J., Wernicke, J., Witter, J., & IMMPACT. (2005). Core outcome measures for chronic pain clinical trials: IMMPACT recommendations. Pain, 113(1–2), 9–19.

  2. Liu, A. H., Zeiger, R., Sorkness, C., Mahr, T., Ostrom, N., Burgess, S., Rosenzweig, J. C., & Manjunath, R. (2007). Development and cross-sectional validation of the childhood asthma control test. The Journal of Allergy and Clinical Immunology, 119(4), 817–825.

  3. Matza, L. S., Patrick, D. L., Riley, A. W., Alexander, J. J., Rajmil, L., Pleil, A. M., & Bullinger, M. (2013). Pediatric patient-reported outcome instruments for research to support medical product labeling: Report of the ISPOR PRO good research practices for the assessment of children and adolescents task force. Value in Health, 16(4), 461–479.

  4. Safikhani, S., Gries, K. S., Trudeau, J. J., Reasner, D., Rudell, K., Coons, S. J., Bush, E. N., Hanlon, J., Abraham, L., & Vernon, M. (Under review). Response scale selection in adult pain measures: Results from a literature review. Journal of Patient-Reported Outcomes.

  5. Naegeli, A. N., Hanlon, J., Gries, K. S., Safikhani, S., Ryden, A., Patel, M., Crescioni, M., & Vernon, M. (Under review). Literature review to characterize the empirical basis for response scale selection in pediatric populations. Journal of Patient-Reported Outcomes.

  6. Streiner, D. L., & Norman, G. R. (2008). Health measurement scales: A practical guide to their development and use (4th ed.). New York: Oxford University Press.

  7. Likert, R. A. (1952). A technique for the development of attitude scales. Educational and Psychological Measurement, 12, 313–315.

  8. van Laerhoven, H., van der Zaag-Loonen, H. J., & Derkx, B. H. (2004). A comparison of Likert scale and visual analogue scales as response options in children's questionnaires. Acta Paediatrica, 93(6), 830–835.

  9. Ferraz, M. B., Quaresma, M. R., Aquino, L. R., Atra, E., Tugwell, P., & Goldsmith, C. H. (1990). Reliability of pain scales in the assessment of literate and illiterate patients with rheumatoid arthritis. The Journal of Rheumatology, 17(8), 1022–1024.

  10. Phan, N. Q., Blome, C., Fritz, F., Gerss, J., Reich, A., Ebata, T., Augustin, M., Szepietowski, J. C., & Stander, S. (2012). Assessment of pruritus intensity: Prospective study on validity and reliability of the visual analogue scale, numerical rating scale and verbal rating scale in 471 patients with chronic pruritus. Acta Dermato-Venereologica, 92(5), 502–507.

  11. Grant, S., Aitchison, T., Henderson, E., Christie, J., Zare, S., McMurray, J., & Dargie, H. (1999). A comparison of the reproducibility and the sensitivity to change of visual analogue scales, Borg scales, and Likert scales in normal subjects during submaximal exercise. Chest, 116(5), 1208–1217.

  12. Lund, I., Lundeberg, T., Sandberg, L., Budh, C. N., Kowalski, J., & Svensson, E. (2005). Lack of interchangeability between visual analogue and verbal rating pain scales: A cross sectional description of pain etiology groups. BMC Medical Research Methodology, 5, 31.

  13. Hagell P, Kutsson I (2013) Single-item assessment of perceived health in neurological disorders: Verbal response categories vs. visual analog scale. Paper presented at the ISOQOL 20th annual conference, Miami, FL, October 9-12.

  14. Changhe Y, Guanlin Y, Zhihui C, Huiyong Z, Meijuan LV, Zhe Z, Yuan M (2012) Likert or number rate scale? A comparison study on Seattle angina questionnaire. Paper presented at the ISOQOL 19th annual conference, Budapest, Hungary, October 24-27.

  15. Yiu, E. M., & Ng, C. Y. (2004). Equal appearing interval and visual analogue scaling of perceptual roughness and breathiness. Clinical Linguistics & Phonetics, 18(3), 211–229.

  16. Chanques, G., Viel, E., Constantin, J. M., Jung, B., de Lattre, S., Carr, J., Cisse, M., Lefrant, J. Y., & Jaber, S. (2010). The measurement of pain in intensive care unit: Comparison of 5 self-report intensity scales. Pain, 151(3), 711–721.

  17. Chien, C. W., Bagraith, K. S., Khan, A., Deen, M., & Strong, J. (2013). Comparative responsiveness of verbal and numerical rating scales to measure pain intensity in patients with chronic pain. The Journal of Pain, 14(12), 1653–1662.

  18. Sherbourne, C., Eberhart, N. K., Edelen, M. O., Stucky, B. D., Lara-Greenberg, M., & Sin, N. (2012). Development of asthma-specific quality of life items for item banking. Paper presented at the ISOQOL 19th Annual Conference, Budapest, Hungary, October 24–27.

  19. Preston, K., Reise, S., Cai, L., & Hays, R. D. (2011). Using the nominal response model to evaluate response category discrimination in the PROMIS emotional distress item pools. Educational and Psychological Measurement, 71(3), 523–550.

  20. Nicklin, J., Cramp, F., Kirwan, J., Urban, M., & Hewlett, S. (2010). Collaboration with patients in the design of patient-reported outcome measures: Capturing the experience of fatigue in rheumatoid arthritis. Arthritis Care and Research, 62(11), 1552–1558.

  21. Dumas, R., Baumstarck, K., Michel, P., Lancon, C., Auquier, P., & Boyer, L. (2013). Systematic review reveals heterogeneity in the use of the Scale to Assess Unawareness of Mental Disorder (SUMD). Current Psychiatry Reports, 15(6), 361.

  22. Bakshi, A. B., Wee, S. L., Tay, C., Wong, L. M., Leong, I. Y., Merchant, R. A., & Luo, N. (2012). Validation of the care transition measure in multi-ethnic South-East Asia in Singapore. BMC Health Services Research, 12, 256.

  23. Leung, S. O., & Xu, M. L. (2013). Single-item measures for subjective academic performance, self-esteem, and socioeconomic status. Journal of Social Service Research, 39(4), 511–520.

  24. DeWalt, D., Rothrock, N., Yount, S., Stone, A. A., & PROMIS Cooperative Group. (2007). Evaluation of item candidates: The PROMIS qualitative item review. Medical Care, 45(5), S12–S21.

  25. Food and Drug Administration. (2009). Guidance for industry. Patient-reported outcome measures: Use in medical product development to support labeling claims. Federal Register, 74(235), 65132–65133. Available at: https://www.fda.gov/downloads/drugs/guidances/ucm193282.pdf.

  26. Grotle, M., Brox, J. I., & Vollestad, N. K. (2004). Concurrent comparison of responsiveness in pain and functional status measurements used for patients with low back pain. Spine (Phila Pa 1976), 29(21), E492–E501.

  27. Skovlund, E., Bretthauer, M., Grotmol, T., Larsen, I. K., & Hoff, G. (2005). Sensitivity of pain rating scales in an endoscopy trial. The Clinical Journal of Pain, 21(4), 292–296.

  28. Dogan, S. K., Ay, S., Evcik, D., Kurtais, Y., & Gokmen Oztuna, D. (2012). The utility of Faces Pain Scale in a chronic musculoskeletal pain model. Pain Medicine, 13(1), 125–130.

  29. Gonzalez-Fernandez, M., Ghosh, N., Ellison, T., McLeod, J. C., Pelletier, C. A., & Williams, K. (2014). Moving beyond the limitations of the visual analog scale for measuring pain: Novel use of the general labeled magnitude scale in a clinical setting. American Journal of Physical Medicine & Rehabilitation, 93(1), 75–81.

  30. Lasch, K. E., Hassan, M., Endicott, J., Piault-Luis, E. C., Locklear, J., Fitz-Randolph, M., Pathak, S., Hwang, S., & Jernigan, K. (2012). Development and content validity of a patient reported outcomes measure to assess symptoms of major depressive disorder. BMC Psychiatry, 12, 34.

  31. Hewlett, S., Hehir, M., & Kirwan, J. R. (2007). Measuring fatigue in rheumatoid arthritis: A systematic review of scales in use. Arthritis and Rheumatism, 57(3), 429–439.

  32. Nicklin, J., Cramp, F., Kirwan, J., Greenwood, R., Urban, M., & Hewlett, S. (2010). Measuring fatigue in rheumatoid arthritis: A cross-sectional study to evaluate the Bristol Rheumatoid Arthritis Fatigue Multi-Dimensional Questionnaire, visual analog scales, and numerical rating scales. Arthritis Care & Research (Hoboken), 62(11), 1559–1568.

  33. Khanna, D., Pope, J. E., Khanna, P. P., Maloney, M., Samedi, N., Norrie, D., Ouimet, G., & Hays, R. D. (2008). The minimally important difference for the fatigue visual analog scale in patients with rheumatoid arthritis followed in an academic clinical practice. The Journal of Rheumatology, 35(12), 2339–2343.

  34. Koshy, R. C., Kuriakose, R., Mathew, A., & Chandran, N. (2004). Cancer pain intensity measurements in outpatients: Preferences and comparison of pain scales among patients, caregivers, physicians and nurses in southern India. Journal of Pain & Palliative Care Pharmacotherapy, 18(3), 5–13.

  35. Anderson, K. O. (2007). Assessment tools for the evaluation of pain in the oncology patient. Current Pain and Headache Reports, 11(4), 259–264.

  36. Rohan, E. A. (2012). Removing the stress from selecting instruments: Arming social workers to take leadership in routine distress screening implementation. Journal of Psychosocial Oncology, 30(6), 667–678.

  37. Sigurdardottir, K. R., Kaasa, S., Rosland, J. H., Bausewein, C., Radbruch, L., Haugen, D. F., & PRISMA. (2014). The European Association for Palliative Care basic dataset to describe a palliative care cancer population: Results from an international Delphi process. Palliative Medicine, 28(6), 463–473.

  38. King, M. T., Stockler, M. R., Butow, P., O'Connell, R., Voysey, M., Oza, A. M., Gillies, K., Donovan, H. S., Mercieca-Bebber, R., Martyn, J., Sjoquist, K., & Friedlander, M. L. (2014). Development of the measure of ovarian symptoms and treatment concerns: Aiming for optimal measurement of patient-reported symptom benefit with chemotherapy for symptomatic ovarian cancer. International Journal of Gynecological Cancer, 24(5), 865–873.

  39. Jacobs, S., Baggott, C., Agarwal, R., Hesser, T., Schechter, T., Judd, P., Tomlinson, D., Beyene, J., & Sung, L. (2013). Validation of the Children's International Mucositis Evaluation Scale (ChIMES) in paediatric cancer and SCT. British Journal of Cancer, 109(10), 2515–2522.

  40. Ng, A. W. Y. (2012). A cross sectional study of use of different pain assessment tools in Chinese cancer patients. Journal of Pain Management, 5(1), 83–91.

  41. Chordas, C., Manley, P., Merport Modest, A., Chen, B., Liptak, C., & Recklitis, C. J. (2013). Screening for pain in pediatric brain tumor survivors using the pain thermometer. Journal of Pediatric Oncology Nursing, 30(5), 249–259.

  42. Banthia, R., Malcarne, V. L., Roesch, S. C., Ko, C. M., Greenbergs, H. L., Varni, J. W., & Sadler, G. R. (2006). Correspondence between daily and weekly fatigue reports in breast cancer survivors. Journal of Behavioral Medicine, 29(3), 269–279.

  43. Grassi, L., Johansen, C., Annunziata, M. A., Capovilla, E., Costantini, A., Gritti, P., Torta, R., Bellani, M., & Italian Society of Psycho-Oncology Distress Thermometer Study Group. (2013). Screening for distress in cancer patients: A multicenter, nationwide study in Italy. Cancer, 119(9), 1714–1721.

  44. Cleopas, A., Kolly, V., & Perneger, T. V. (2006). Longer response scales improved the acceptability and performance of the Nottingham Health Profile. Journal of Clinical Epidemiology, 59(11), 1183–1190.

  45. Janssen, M. F., Birnie, E., Haagsma, J. A., & Bonsel, G. J. (2008). Comparing the standard EQ-5D three-level system with a five-level version. Value in Health, 11(2), 275–284.

  46. Chomeya, R. (2010). Quality of psychology test between Likert scale 5 and 6 points. Journal of Social Sciences, 6(3), 399–403.

  47. Rhodes, R. E., Matheson, D. H., & Mark, R. (2010). Evaluation of social cognitive scaling response options in the physical activity domain. Measurement in Physical Education and Exercise Science, 14(3), 137–150.

  48. Janssen, M. F., Pickard, A. S., Golicki, D., Gudex, C., Niewada, M., Scalone, L., Swinburn, P., & Busschbach, J. (2013). Measurement properties of the EQ-5D-5L compared to the EQ-5D-3L across eight patient groups: A multi-country study. Quality of Life Research, 22(7), 1717–1727.


The authors gratefully acknowledge the managerial and logistical support provided by Theresa Hall during the completion of the overall project and these manuscripts. They thank Janet Dooley of the Evidera Editorial and Design team for her editorial and preparation assistance. In addition, they thank Sarah Mann of the PRO Consortium for her assistance to the authors with communications, and reporting of disclosures and contributions.


This project was funded by the Patient-Reported Outcome (PRO) Consortium’s Measurement Projects Fund. The Measurement Projects Fund is supported by the members of the PRO Consortium. The Critical Path Institute’s PRO Consortium is funded, in part, by Critical Path Public Private Partnerships Grant number U18 FD005320 from the U.S. Food and Drug Administration.

Availability of data and materials

This article is entirely based on data and materials that have been published, are publicly available (thus, accessible to any interested researcher), and appear in the References list.

Other information

In order to preserve the double-blind peer review, journal-requested information on Authors, Institutions, Funding, Competing Interests, Authors’ Contributions, Authors’ Information, and Acknowledgements are in the cover letter.

Author information

All the authors have agreed to be accountable for all aspects of the work, particularly for ensuring that any questions of the work’s accuracy or integrity are promptly investigated and resolved. All authors have given their approval of the final version of the manuscript. Each author participated in creating drafts of the manuscript or in critical revisions. KG and SS contributed to the study concept and design; MH and SS dealt with the data acquisition; KR and MV concentrated on the analysis and data interpretation. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Margaret Vernon.

Ethics declarations

Ethics approval and consent to participate

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional information

This article is published together with 10.1186/s41687-018-0051-8 and 10.1186/s41687-018-0053-6.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Gries, K., Berry, P., Harrington, M. et al. Literature review to assemble the evidence for response scales used in patient-reported outcome measures. J Patient Rep Outcomes 2, 41 (2018).
