Literature review to assemble the evidence for response scales used in patient-reported outcome measures

Gries, Katharine; Berry, Pamela; Harrington, Magdalena; Crescioni, Mabel; Patel, Mira; Rudell, Katja; Safikhani, Shima; Pease, Sheryl; Vernon, Margaret

doi:10.1186/s41687-018-0056-3

Journal of Patient-Reported Outcomes

Table 4 Key studies that support response scale selection used in PRO instruments based on optimal response set number


Reference	Response Scale Type	Study Type, Evidence Type^a, Grade^b	Study Population	Summary of Results	Conclusion
Cleopas et al. 2006 [44]	Binary 3-point VRS 5-point VRS	Prospective study, Direct, A	1996 adult patients discharged from the hospital in Switzerland	The 5-point version of the Nottingham Health Profile (NHP) showed superior reliability (assessed by Cronbach’s alpha and test-retest) and superior convergent and discriminant validity compared with the binary or 3-point version.	5-point VRS improved patient acceptability, reduced ceiling effects, and improved measurement properties
DeWalt et al. 2007 [24]	4-point VRS 5-point VRS 6-point VRS	Instrument development and/or validation study, Direct, A	Analysis of PROMIS items; pain, fatigue, emotional distress, physical function, and social function	Optimal response set number was somewhat dependent on the item and construct. Four to six response options were typically optimal because this number both reduced cognitive burden for respondents and allowed each option to provide unique information. With response sets of more than six choices, investigators found that two or more options were typically collapsed to reduce step disorder and improve model fit.	Based on IRT analyses, the authors recommend 4-point to 6-point scales, depending on the item and construct
Janssen et al. 2008 [45]	3-level 5-level	Instrument development and/or validation study, Direct, A	81 adult respondents in a panel session	5-level version had higher acceptability and comprehension and demonstrated superior reliability, validity, and discriminatory power.	5-level reduced ceiling effect, increased benefit in the detection of mild problems and in measuring general population health
Chomeya 2010 [46]	5-point Likert 6-point Likert	Instrument development and/or validation study, Direct, A	180 undergraduate students from Mahasarakham University	The 6-point Likert scale had slightly better discrimination and reliability, assessed by Cronbach’s alpha, compared to a 5-point scale.	Both the 5-point and 6-point scales gave discrimination at acceptable level per the standard of psychology tests
Rhodes et al. 2010 [47]	5-point Likert 7-point Likert	Instrument development and/or validation study, Direct, A	412 volunteer students in introductory psychology or physical education courses	The 7-point scale (strongly disagree, moderately disagree, slightly disagree, undecided, slightly agree, moderately agree, strongly agree) had slightly higher reliability overall, assessed by Cronbach’s alpha, but predictive validity was largely comparable to the 5-point scale (strongly disagree, moderately disagree, undecided, agree, strongly agree). The 7-point scale demonstrated larger variability compared to the 5-point scale.	Either the 5-point or the 7-point scale is appropriate for use in scales for physical activity research
Bakshi et al. 2012 [22]	3-point Likert 5-point Likert	Instrument development and/or validation study, Direct, A	Inpatients aged 50 years and above in Singapore (n = 579); caregivers were interviewed as a patient proxy if the patient was not contactable, too weak, or had a language barrier.	The 3-point versions (disagree, neutral, and agree) were comparable to the 5-point versions (strongly disagree, disagree, neutral, agree, and strongly agree); the scores performed similarly. The 3-point versions were not less reliable, assessed by Cronbach’s alpha, or discriminative.	The 3-point scale is acceptable if a simple scale is required
Leung and Xu 2013 [23]	5-point VRS 7-point VRS 11-point NRS	Review, Indirect, B	7147 students (age 12 to 22 years) in Macau. 795 students in China. 844 secondary students in Macau.	Single item measures with an 11-point scale from 0 to 10 are closer to normality and interval scales, and have construct validity with major social constructs.	The 11-point scale was more normally distributed than the shorter scale options and had good validity.
Dumas et al. 2013 [21]	3-point VRS 5-point VRS	Review, Indirect, B	Published literature for the Scale to Assess Unawareness of Mental Disorder (SUMD).	The 5-point scale was more informative and discriminative than a 3-point scale.	Authors state that further research is required to determine if a 3-point or 5-point scale should be used with the SUMD.
Janssen et al. 2013 [48]	3-level 5-level	Instrument development and/or validation study, Direct, A	3919 adults with chronic conditions (cardiovascular disease, respiratory disease, depression, diabetes, liver disease, personality disorders, arthritis, and stroke)	For the 5-level system, the ceiling was reduced from 20.2% (3L) to 16.0% (5L). Absolute discriminatory power (Shannon index) improved considerably with 5L (mean 1.87 for 5L versus 1.24 for 3L), and relative discriminatory power (Shannon Evenness index) improved slightly (mean 0.81 for 5L versus 0.78 for 3L). Convergent validity with WHO-5 was demonstrated and improved slightly with 5L. Known-groups validity was confirmed for both 5L and 3L.	5-level version had higher acceptability and comprehension and demonstrated superior reliability, validity, and discriminatory power.

PRO patient-reported outcome, VRS verbal rating scale, NRS numeric rating scale
^aDirect evidence: Primary research that compares different response scales within study. Indirect evidence: Review or expert opinion based on empirical evidence or primary research that evaluates a single response scale type within the study
^bGrade Key: A) Primary research: compares different response scales within study; B) Review or expert opinion: based on an empirical evidence base; C) Primary research: evaluates a single response scale type within the study; and D) Review or expert opinion, based on expert consensus, convention, or historical evidence
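
Several rows of Table 4 compare response scales using Cronbach's alpha (reliability) and the Shannon and Shannon Evenness indices (discriminatory power). The sketch below shows how these statistics are commonly computed; it is illustrative only and is not taken from any of the cited studies. The function names and the example data are hypothetical. The base-2 logarithm is an assumption, although it is numerically consistent with the values reported for Janssen et al. 2013 (1.87 / log2(5) ≈ 0.81 and 1.24 / log2(3) ≈ 0.78).

```python
import math
from statistics import variance


def shannon_index(counts):
    """Absolute discriminatory power: H' = -sum(p_i * log2(p_i)).

    counts: number of respondents choosing each response level.
    Levels with zero responses contribute nothing to the sum.
    Base-2 logarithm is an assumption of this sketch.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def shannon_evenness(counts):
    """Relative discriminatory power: J' = H' / H'max.

    H'max = log2(number of response levels), reached when all
    levels are used equally often. Needs at least two levels.
    """
    return shannon_index(counts) / math.log2(len(counts))


def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item scale.

    items: one list of scores per item, each holding one score per
    respondent, in the same respondent order.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical response counts for one item on a 3-level and a 5-level scale.
counts_3_level = [50, 30, 20]
counts_5_level = [30, 25, 20, 15, 10]
print(shannon_index(counts_3_level), shannon_evenness(counts_3_level))
print(shannon_index(counts_5_level), shannon_evenness(counts_5_level))

# Hypothetical scores for three items answered by five respondents.
item_scores = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 5], [1, 3, 3, 4, 4]]
print(cronbach_alpha(item_scores))
```

A scale with more levels can reach a higher Shannon index simply because its maximum, log2(number of levels), is larger. The evenness index divides by that maximum, which is one reason both figures are usually read together when comparing scales of different lengths.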
