Exploring content and psychometric validity of newly developed assessment tools for itch and skin pain in atopic dermatitis

Background Atopic dermatitis (AD) is a common skin disorder characterized by chronic inflammation, altered skin barrier function, and inflammatory cell skin infiltration that decreases health-related quality of life (HRQoL). The study objective was to understand the patient perspective of AD burden and determine suitable patient-reported outcome (PRO) measures. Methods This mixed methods study involved the collection of qualitative and quantitative information from adults (≥ 18 years old) and adolescents (12 – 17 years old) with clinician-confirmed AD regarding their experiences of AD symptoms and its impact on HRQoL. The first part of the study included three stages: in-person concept elicitation (CE) interviews, a 2-week daily electronic diary (eDiary) study, and in-person cognitive debriefing (CD) interviews. An Itch numeric rating scale (NRS) (v1.0) and a Skin Pain NRS (v1.0) evaluation during CD interviews required participants to think about their ‘worst’ itch and ‘worst’ skin pain in the past 24 h. Other PRO measures allowed for psychometric testing. The second part of the study involved telephone-depth interviews (TDIs) and qualitative feedback from participants who had not participated in the CD interviews. Qualitative data were thematically analyzed. Psychometric evaluation of NRS measures was performed using eDiary data. Results In the CE interviews, itch and/or itching and skin pain were the most prevalent symptoms consistently discussed by participants. Both NRS measures demonstrated strong psychometric reliability and were applicable across ages with suitable concurrent validity. During the CD interviews, some participants focused their answers on their ‘average’ itch/itching in the past 24 h, rather than their ‘worst’ itch. Some participants answered the Skin Pain NRS thinking about general pain or other types of pain, rather than skin pain specifically. Consequently, modifications to both measures addressed these issues and re-tested as paper-and-pen versions in subsequent TDIs. Itch NRS (v2.0) modifications helped participants focus on their worst itching. Most participants preferred Skin Pain NRS v2.0b, which included skin pain descriptors. Conclusions Itching and skin pain are the most important and relevant AD symptoms. The Itch NRS (v2.0) and Skin Pain NRS (v2.0b) appear to be appropriate endpoints for the assessment of itching and skin pain severity for clinical trials with adults and adolescents with AD.


Background
Atopic dermatitis (AD) is one of the most common, chronic, inflammatory skin diseases [1], with a prevalence of 2% to 10% among adults and 15% to 30% among children in Western geographies [2]. The disease is characterized by widespread skin lesions which can be red, itchy, swollen, cracked, and/or weeping/oozing [3]. Predominant symptoms reported by patients include itch and skin pain [4]. Patients with moderate to severe AD experience a heavy disease burden including physical and psychosocial abnormalities, and a decrease in health-related quality of life (HRQoL), especially in children [5].
The most common measures used to assess AD severity include the Investigator Global Assessments (IGAs) [6], the SCORing Atopic Dermatitis (SCORAD) [7], the Eczema Area and Severity Index (EASI) [8], and the Patient-Oriented Eczema Measure (POEM) [9]. Traditional measures are primarily based on visual assessment of clinical signs and symptoms by physicians. To better understand the burden of AD from the patient perspective, patient-reported outcome (PRO) measures are recommended as an important additional assessment tool [10]. A literature review was conducted which indicated limited evidence of pre-existing PRO tools suitable for use in a clinical trial setting to assess symptoms such as itch and skin pain severity as the main concepts of interest. Of the existing itch measures, the Eppendorf Itch Questionnaire (EIQ), has been reported in three studies in terms of its use and/or measurement properties [11][12][13], however it has not reported as having been used in the context of a clinical trial. The EIQ is likely unsuitable for clinical trial purposes because it includes 127 items, representing a considerable burden on trial participants. A less burdensome measure, which also has precedence in clinical trial work is the POEM. The POEM is a 7-item HRQoL measure that has been recommended by the HarmoniSing Outcome Measures for Eczema (HOME) group and focuses on the symptoms of AD. However, the POEM assesses each symptom in terms of symptom frequency rather than severity and produces a total score combining the patient's response to itch, disturbed sleep and dermatological symptoms such as bleeding and flaking skin. Assessing symptom frequency may not reflect the patients' experience of symptom severity, and a total score consisting of a combination of symptoms may obscure meaningful treatment benefit relating to reduction in specific symptoms. In relation to the measurement of pain in AD, the literature shows that a pain visual analogue scale [14,15] and the McGill Pain Questionnaire [16], have both previously been used as outcome measures of AD. The measures used in these studies were neither AD specific, nor focused on dermatologicalrelated pain.
The existing tools for the assessment of AD-related itch and skin pain in a clinical trial setting are limited in their ability to accurately assess the impact of new therapies on the patient-reported symptom severity in AD. Per guidance from the Food and Drug Administration (FDA) and European Medicines Agency (EMA), the variability, duration, and frequency of the symptoms measured should be factors in choice of the recall period, and shorter recall periods are typically preferable as reliance on a respondent's memory over a longer period of time may undermine content validity. Given the high daily symptom variability present in AD, 24-h recall is preferred to allow for the reliable assessment of symptoms [10]. As such, we sought to develop a simple, symptom-specific measures as part of a tool to be administered in a daily diary. Numeric rating scale measures for the daily evaluation of symptoms are commonly used and a low burden way to assess the change in the patients' condition that have previously been used for the daily assessment of symptomology in clinical trials [17][18][19][20][21].
This study documents the generation of two patientreported symptom specific measures of AD severity that can be used as clinical trial endpoint measures. This study used a patient-centered, mixed-methods approach to collect and assess patient-reported disease symptoms and impact on HRQoL to gain a better understanding of AD from the patients' perspective. Specifically, this study aimed to identify the concepts most relevant to participants through concept elicitation, validate the psychometric capabilities of the arising measures and confirm their content through cognitive debriefing with patients.

Study design and participants
The study was conducted in two components (Fig. 1). Component one involved three stages: in stage one, 43 English-speaking participants from the United States (US) participated in in-person concept elicitation (CE) interviews, including 28 adult participants (≥18 years old) and 15 adolescent participants (12-17 years old) with AD. In stage two of component one, an eDiary study was conducted for 2-weeks involving all 43 participants who had taken part in the CE interviews, as well as another 31 US English speaking participants (22 adults and 9 adolescents) who were newly recruited for this stage. Thus, stage two of component one had a total of 74 participants. In the final stage of component one of the studies, 45 participants (30 adults and 15 adolescents) returned for a cognitive debriefing (CD) interview no more than 2 days after completing stage two. Among the 45 participants, 24 had also taken part in stage one (CE interviews). In component two of the study (which began 16 weeks after component one ended), 20 telephone interviews were conducted to evaluate revisions made to the Itch Numeric Rating Scale (NRS) and Skin Pain NRS. Of the 20 participants (13 adults and 7 adolescents), 13 had also participated in stages one and/ or two of component one of the studies.

Study procedures
Patients aged ≥ 12 years old who were diagnosed with AD for at least 2 years were recruited by a third-party recruitment agency using physician referral from outpatient clinical sites at six cities (Baltimore, Chicago, Los Angeles, St. Louis, New Orleans, and Philadelphia) in the US. Participants were eligible if they had a history of inadequate clinical response to at least one of the following: 1) hydration plus topical steroids and/or antibiotics (eg, tetracycline, trimethoprim, sulfamethoxazole, cephalosporins) and/or topical immune modulators (eg, tacrolimus, pimecrolimus), 2) systemic steroids and/or phototherapy or 3) cyclosporine and/or other immunomodulators (eg, methotrexate, mycophenolate mofetil, azathioprine). Participants were excluded if they had a history of any other clinically relevant skin conditions or a significant psychiatric or physical comorbid condition including abuse of drugs or alcohol in the past 12 months. All participants were able to read, understand and give informed consent and adolescent participants provided assent along with permission/consent from their parent or legal guardian. Additional patientreported demographic and clinician-reported health data were collected for each participant enrolled by the study recruiter at each site.
The CE interviews were conducted in-person by a trained interviewer using a semi-structured discussion guide. The guide was primarily focused around discussion of participants' symptom experience with AD and its impact on their HRQoL. The interviewer initially asked open-ended questions designed to elicit spontaneous discussion, which were then followed by in-depth probing around the concepts of interest. Interviews were conducted at a convenient location for the participant and lasted approximately 1 h.
Following the concept elicitation, two 11-point NRS items were developed, each with a 24-h recall period. For itch/itching severity the item required participants to think about their 'worst itching' and respond on a scale anchored between 0 = 'no itch' and 10 = 'worst itch imaginable'. For skin pain severity, the item required participants to think about their 'worst skin pain' over the previous 24 h and respond on a scale between 0 = 'no pain' to 10 = 'worst pain imaginable' [22]. For the daily eDiary stage, all participants received training on how to use the daily eDiary. The daily eDiary was programmed to be answered every morning between 5:00 am and 12:00 pm for 14 days. As well as the Itch NRS and Skin Pain NRS, the daily eDiary also contained an adapted version of the POEM which was administered daily and had a 24-h recall period along with two global items: a patient global impression of severity (PGIS) and a patient global impression of change (PGIC) . The 7-day recall questionnaires were programmed for completion on Days 0, 7, and 14 of the 2-week period. Participants were sent an email reminder at the start of each day's completion window. The daily eDiary was programmed so that all questions had to be answered; participants could not submit partial daily eDiary data.
After the 2-week daily eDiary stage, CD interviews were conducted to explore the relevance, appropriateness, and interpretability of the daily eDiary. Per the CE interviews, the CD interviews were conducted in-person by a trained interviewer using a semi-structured discussion guide and lasted approximately 1 h. The CD interviews involved a "think aloud" exercise whereby participants first verbalized their thoughts while completing the Itch NRS and Skin Pain NRS in order to explore their understanding of the items' instructions, content, recall period and response scale as well as the relevance of the concepts measured. To explore interpretation of score change, participants described what itch and skin pain was like at different points along the response scale of the NRS measures and where on the scale they would be to consider a change in the severity of their itch/itching and skin pain to be meaningful.
For the subsequent telephone CD interviews, the Itch NRS was reworded to "worst itch" to help participants focus on their worst itching experience over the previous 24-h. The Skin Pain NRS was also reworded to "worst skin pain" to help patients focus on their worst skin pain experience over the previous 24 h. In addition, a second version of the Skin Pain NRS was developed that included descriptors of skin pain in the item anchors to help define the target symptom. Per the in-person CD interviews, the telephone CD interviews focused on the relevance, appropriateness, and interpretability of version 2.0 of the NRS measures and, in addition, participants were asked to select which of the two alternative versions of the Skin Pain NRS (version 2.0a without descriptors or version 2.0b with descriptors) they preferred and why.

Ethics
All study documents were submitted and approved by the US Copernicus Group Independent Review Board®. The study was performed in accordance with the Declaration of Helsinki and US 21 Code of Federal Regulations [23]. Participants and their referring clinicians received remuneration for their participation.

Analytic methods
Demographic and health information data were summarized descriptively using data collected at Day 0 for all 74 participants.
Qualitative analyses of CE data were performed using ATLAS.ti (v8) software. Data were coded using a mixed deductive-inductive approach using a pre-specified codebook along with the creation of new codes derived from the data as needed. While describing their experiences of AD, participants were first asked an open-ended general interview question about their experiences, concepts reported in response to this question were coded as spontaneous. The interviewer then asked probing questions around the concepts of interest, responses to this question were coded as probed. The qualitative analyses of CD data from both the in-person and telephone interviews focused adults' and adolescents' understanding and relevance of the target concepts, response scale and options, and recall periods used in the proposed NRS measures.
Measurement properties of the NRS measures were then evaluated using data collected from the eDiary study. This included examination of response distributions (based on eDiary data from Days 1, 7, and 14), test-retest reliability (using intra-class correlation coefficients (ICCs) calculated between Day 7 and Day 14 and between Week 1 and 2 average), concurrent validity (using correlations with the Dermatology Life Quality Index [DLQI], Children's Dermatology Life Quality Index [CDLQI], Skin Pain NRS, Itch NRS and the PGIS using eDiary data from Days 1, 7, and 14 and the Week 1 and 2 averages (with the exception of the PGIS, no weekly average generated), and responsiveness (based on change categories endorsed on the PGIC scores calculated between Day 7 and Day 14 and between the Week 1 and 2 averages). In terms of guidelines for interpretation, for concurrent validity, the relationship was considered low if the resulting coefficient was < 0.4, moderate if 0.4 -0.7, and large if > 0.7 [24][25][26]. For testretest reliability, ICCs ≥ 0.7 were considered acceptable, ≥ 0.80 were considered acceptable for making judgements on a group basis and ≥ 0.90 for making judgements on an individual participant basis [25]. For responsiveness, significant differences between patients who reported a decline on the PGIC and those who reported stability were supported by effect sizes based on Cohen's recommendations, the values representing the magnitudes of responsiveness used in this study were considered small if the effect size (ES) = 0.20, moderate if ES = 0.50, and large if ES = 0.80 [26].

Study population
The total study population (N = 74) was between 12 and 76 years old with a mean time since AD diagnosis of 11.3 years. The study population included 51.4% female participants and most (41.1%) had moderate AD based on clinician-reported severity (self-reported AD severity was consistent with clinician ratings). Caucasian participants made up 50% of the total study population, followed by African American participants (32.4%). Common comorbidities including asthma, food allergy, and allergic rhinitis were present in approximately 25% of the study population. Conditions such as attention deficit hyperactivity disorder, anxiety, and obesity were present in only a few participants. Table 1 provides a summary of the key characteristics of the study population and by age group.
Component one: findings from the CE interviews (stage 1) In the CE interviews, participants (n = 43) described experiencing a range of signs and symptoms associated with their AD; the two most frequently reported were skin pain (n = 36) and itch (n = 35) ( Table 2). When participants were asked to name their worst symptom, itch/itching was identified most often by participants (n = 17). Itch was described as occurring suddenly and was mainly experienced around the areas of the upper body, including the arms or elbows (n = 8), neck (n = 6) and chest (n = 4). Some locations of itch were more prominent among adults compared with adolescents; for example, adolescents reported itch/itching more frequently in the knee/leg area [33.3% versus 3.6%] and adults reported itch/itching more frequently in the arm/ elbow area [21.4% versus 13.3%] and the neck area [17.9% versus 6.7%]. However, all locations were noted at least once in both groups. Some participants described the severity of itch/itching as dependent on its duration while others described severity as the intensity of itch. Seasonality and climate were mentioned as triggers for their itch/itching (n = 19, while others described using lotions and creams (n = 16), or cold presses and ice packs (n = 7) to help relieve their itch. Itch was reported as impacting participants' lives in multiple ways; n = 12 talked about feeling embarrassed because of scratching when itching occurs, and n = 10 described how their itch/itching impacts their social life, primarily because of scratching in front of others. Seven participants also reported feeling annoyed, frustrated, and sad due to the frequency with which they experience itch.
Skin pain was characterized as a burning sensation (n = 24), soreness or discomfort (n = 24), a stinging sensation (n = 17), a sensitivity (n = 16) and a stabbing feeling (n = 8). Some skin pain descriptions were more prominent among adults compared with adolescents; for example, more adults reported burning pain (64% versus 40%), soreness (71% versus 27%), sensitivity (50% versus 13%) and stabbing pain (21% versus 0%), while more adolescents reported stinging pain (67% versus 25%). Participants described their skin pain in terms of their perceptions of severity, which ranged from very minor to very painful/ very severe. Some skin pain descriptors were used to refer to milder sensations; for example, one adult participant rated skin soreness as a 1-4 on a 10point scale, whereas skin pain she rated as a 5-10. Like itch, the triggers for skin pain included the climate and seasons, contact with the skin, as well as perfumes, scented lotions, and sweat. Applying cold water or a cold towel (n = 8) and using creams or lotions (n = 7) were mentioned by participants as means of coping with skin pain, however several described the impact it has on their lives including limitations doing physical activities (n = 8), at work or school (n = 7), and in their social lives (n = 6).

Component one: findings from the diary study (stage 2)
The Itch NRS response distribution was examined for floor and ceiling effects on Days 1, 7, and 14 (Fig. 2). Overall, the responses were well distributed, with participants using the majority of the response options at all time points. Floor and ceiling responses were low and ranged from 4.8% to 7.3% and 0% to 3.6%, respectively of the total population. Similar response distributions were observed in the adolescent and adult subgroups. Evaluation of test-retest reliability showed that the Itch NRS was consistent over time (Table 3). For participants who demonstrated no change on the PGIS (n = 21), the ICCs between the Day 7 and Day 14 (0.94) and between the Week 1 to Week 2 average values (0.97) were above the ≥0.90 ICC threshold [25]. Psychometric analysis also revealed that the Itch NRS scores showed high correlations with the POEM for both the Week 1 (r = 0.80) and Week 2 (r = 0.84) average score and moderate to high correlations on Day 1 with the CDLQI summary score (0.76) and the DLQI summary score (0.59) ( Table 3). In addition, the Itch NRS had moderate (0.40 ≤ r ≤ 0.70) to high correlations (≥ 0.70) with the PGIS at Week 1 (r = 0.72) and Week 2 (r = 0.74) and on Day 7 (r = 0.66) and on Day 14 (r = 0.73). The correlations within the adolescent and adult subgroups fluctuated across time points, but generally yielded similar values to the total study population. Overall, participants who reported a decline on the PGIC (n = 5) showed a statistically significant greater worsening from the Week 1 to Week 2 average Itch NRS scores than those who did not report a decline (n = 24) (mean change of 0.7 versus − 0.4, p = 0.020) ( Table 3). Participants who showed a decline had a large effect size change (1.21) compared to those who did not show a decline (0.40).
Analysis of the response distribution on Days 1, 7, and 14 for the Skin Pain NRS revealed that the most severe response option did not receive endorsement from any participant at any time point (Fig. 3). Although an increase in participants reporting "0" was observed by Day 14, the variability around these numbers due to the low participant counts suggests that this was general variability rather than an improvement over time. Floor  Rash "It's like a rash. Like if you used to have a rash you used to scratch it and be like red-ish and stuff. Just like a whole bunch of bumps in one area." (Female adolescent) Scaling "Scaliness is like a snake or something like that … but this is just how the bumps on your skin feel. Like normal. A person who don't have the condition wouldn't say a whole bunch of bumps on they skin. They'd say 'oh you feel like … you feel scaly.'" (Female adolescent)  showed statistically significant greater worsening from the Week 1 to Week 2 average Skin Pain NRS scores than those who did not report a decline (n = 39) (mean change of 0.9 versus − 0.3, p = 0.003). This is supported by a large effect size of 1.09 in the decline group compared to a small effect size of 0.34 in the group who did not report decline.

Component one: findings from the CD interviews (stage 3)
Forty-five CD interviews were conducted and analyzed to explore participants' understanding of item content, the relevance of the concept measured, and the appropriateness of the response scale and options as well as the recall period. Overall, there were no differences between adolescent and adult participants in their understanding of the content, relevance, response scale and options, or recall period for the two NRS measures.
A total of 44 participants demonstrated a good understanding of the concept of itch/itching as described in the wording of the Itch NRS. There was clear alignment between the item wording and the language used by participants in the CE interviews to talk about their itch experiences, for example, one adolescent described his experience: "How do you rate the severity the itch caused by your skin condition over the past seven days? I would um, rate this a little bit higher, it's moderate, because when my skin got dry because I took a warm bath or I didn't put on the medicine, then it started getting, then it started itching a little bit, and the longer I didn't put on the medicine it would start itching more". Another Week The NRS scales ranged from 0 to 10 Test-retest calculation assessed score changes from Day 7 and Day 14 and the Week 1 to Week 2 average in patients who showed a 0-or 1-point change on the PGIS scale [c] Correlations performed on Day 1, 7, 14, and the Week 1 to Week 2 average except for PGIS, which was only collected at Day 7 and 14 [d] p-value from test for difference in mean score change of participants who scored above and below 4 on the PGIC between Weeks 1 and 2 and Day 7 and Day 14 adolescent appeared to correctly think about itch/itching as distinct from other symptoms when describing his chosen response to the Itch NRS, however he conflated itch and skin pain when asked to interpret the item using his own words: "Uh, like how much pain, like how much pain did you have from [inaudible], like zero means, like, none, and 10 means, like, the most you've ever felt". In addition, participants in the in-person CD interviews commented that the measurement concept of itch/itching was highly relevant based on their own disease experiences, for example, an adult described his experience: "I itched the day before, and I itched the day before that, and … I've been itching the last 20 years". Five participants reported that they experience itch but responded '0' to this item because they had avoided triggers. The 0 to 10 response scale was adequate and well understood, for example participants who scored '0' experienced no itch in the past 24 h: "I didn't have any itch in the past 24 hours. So, that's why I selected no itch." Indeed, most participants reported thinking back over 24 h when answering the question, supporting the appropriateness of the recall period. When asked, 25 participants appeared to answer the Itch NRS thinking about their itch in general over the past 24 h or how bad their itch/ itching was on average compared to their worst itch/ itching ever: "Interviewer: Okay. And so, when I say, 'worst itching, ' what do you think we mean? Participant: Like, how bad … the intensity." Indeed, two participants suggested that the Itch NRS could be improved by providing more explanation of what is meant by 'worst' itch: "a little more details to what's included in your worst itching, what makes it worse? I was just guessing as far as choosing a number. A six is not as high of a number but higher than average." The Skin Pain NRS was also well-understood by the majority of participants in the in-person CD interviews, noting it as a relevant symptom. Based on qualitative feedback, the response scale and recall period appeared to be suitable for these participants: "Uh, I'm gonna say a four because it did sting a little last night. Like it did hurt, but it wasn't too bad. It wasn't, like, enough, like to make me cry or anything." However, there were some participants whose interpretation of the question was not as intended; for example, three participants reported thinking about skin pain and itch/itching when answering the NRS, two participants reported thinking about all types of pain (ie, not specifically skin pain), one participant was unclear about the type of pain being referred to and finally, one participant reported that he excluded skin pain caused by scratching due to itch. Indeed, four participants suggested describing the type of skin pain in the item wording to clarify its meaning. In addition, three participants reported that they had never experienced skin pain due to AD, and one participant felt that other pain descriptors should be used to capture less severe sensations: "Maybe they could even say, um, irritation, or, like, agitated skin, maybe… It's just hard to imagine your skin being painful. Like, I just wouldn't describe it as painful. This [item] is like, I'm a burn victim or something". In terms of the level of change participants considered meaningful using the Skin Pain NRS, there was no clear preference, but a 1-or 2-point change appeared beneficial to the majority of participants.
The findings from Stage 2 indicated that the Itch NRS and Skin Pain NRS were psychometrically valid measures of the target concepts and were responsive to changes in condition across the age categories. Both the Itch NRS and the Skin Pain NRS demonstrated reliability over time and exhibited concurrent validity (ie, moderate -high correlations (0.65-0.84) with concurrent measures). Despite these results, the in-person CD interviews conducted in Stage 3 revealed some specific measurement issues based on participants' feedback. For the Itch NRS, it was not clear if participants were correctly rating their worst itch/itching or averaging across all itch/itching episodes within the 24-h period. In addition, for the Skin Pain NRS, a few participants felt the term "pain" may be too severe or were unclear if milder sensations should be considered when answering this question. Therefore, we made modifications to address these concerns and subsequently tested the revised versions of the NRS measures in telephone CD interviews. The wording of the Itch NRS was revised to help participants appropriately interpret the concept of 'worst' itch, ie, "worst level of itching". To maintain consistency between the two measures, the wording of the Skin Pain NRS was modified in line with these revisions. An alternative version of the Skin Pain NRS was also created with descriptors of skin pain in the item stem with a view to capture pain experiences that participants may consider "less severe" and help participants to focus on skin pain specifically, rather than general pain or other types of pain. The recall period and response scale and options were retained per the original versions of the measures.

Component two: findings from the telephone CD interviews
In the telephone CD interviews, all 20 participants described their chosen responses to the revised Itch NRS in terms of itch/itching severity or in terms of the intensity of their need to scratch. Of the 13 participants who were asked about their interpretation of the concept of 'worst itch' based on how it was described in the item wording, five reported that they answered in terms of how bad their itch/itching had been at its worst point in the day. Eight participants commented that they answered thinking about the severity of their itch/itching over the whole 24-h period and compared this to their worst itch ever. Consistent with the in-person CD interviews, there were no issues raised in terms of the response scale or recall period. Similarly, when asked what score change would be meaningful using the revised Itch NRS, participants asked stated that a 3-to 4-point (n = 8; 40%) reduction would be ideal, again consistent with the in-person CD findings. Regarding the two revised versions of the Skin Pain NRS, both were wellunderstood, yet more participants (n = 19 vs. n = 14) appeared to understand the target concept when it was presented alongside the skin pain descriptors (v2.0b) compared to the version without the descriptors (v2.0a). For example, two participants were unsure if v2.0a was exclusive to itching/scratching or if it related to skin pain caused by other factors, and four participants interpreted the question as asking about pain and itch. In contrast, just one participant mentioned that he thought about itching as well as skin pain when answering v2.0b of the NRS. Participants also gave feedback on the descriptors used in v2.0b. No issues were reported and when asked which other terms could be used to describe skin pain, n = 11 participants suggested adding 'burning/ hot' , n = 7 suggested 'ache' and n = 3 suggested 'stinging'. These terms were similar to those spontaneously used by participants in the CE interviews to describe their skin pain experiences. When asked what score change would be meaningful on each of the two revised versions of the Skin Pain NRS, there was no clear pattern or discernable best score change among participants.

Discussion
The objective of this research was to incorporate the patient voice in the development of valid and reliable patient outcomes in AD. This study incorporated mixed research methods to ensure that the measurement properties of the two generated measures captured daily symptoms. The first stage of this study was undertaken to gather insights from adults and adolescents regarding the symptoms and impacts of AD from the patient perspective. All participants in the CE interviews commented that itch/itching and skin pain are the worst symptoms associated with AD and described them as priority areas for treatment. Adults and adolescents with AD described very similar disease experiences that negatively affected their quality of life including their ability to do physical tasks, as well as feeling embarrassed or ashamed in front of others. Thus, there was clear overlap between adults' and adolescents' AD symptoms and HRQoL experiences, indicating that a consistent measurement approach can be implemented across these age ranges.
Indeed, to capture the severity of itch/itching and skin pain from the perspective of adult and adolescent AD patients, two patient-reported NRS measures were developed and tested in the subsequent stages of this study. In component one, the Itch NRS performed well both qualitatively and quantitatively and was shown to be a valid, reliable, and sensitive measure of AD-related itch. However, revisions were made in response to the finding that not all participants thought about their worst itch/itching when answering the question. Findings from the telephone CD interviews (component two) demonstrated that the revisions to the Itch NRS improved participants' understanding of the target concept consistent with the intended meaning. Similarly, the Skin Pain NRS was shown to be both valid and reliable based on the in-person CD interviews and daily eDiary study, but revisions were also made to maintain consistency with the updated format of the Itch NRS as well as to test the utility of including skin pain descriptors within the item stem itself. Findings from the telephone CD interviews (component two) revealed that both revised versions of the Skin Pain NRS were fit for purpose. However, participants preferred the version with the descriptors as it was better understood and better represented their overall skin pain experiences.
Findings from the quantitative analysis support that original versions of both the Itch NRS and Skin Pain NRS are reliable at both the group and individual patient level [25]. The modifications to the tools were made to improve comprehension rather than to address psychometric weaknesses. Since the modifications were shown to address the comprehension issue, quantitative evidence supporting the original versions will likely hold true, or perhaps be enhanced, for the revised measures. Nonetheless, future research using this revised tool should aim to confirm the psychometric properties of these measures. Specifically, the magnitude of change, denoted by the ES parameter, should be replicated in a larger clinical population. Effect size (ie, ES) is a random variable and sensitive to sample variability which is related inversely to sample size [27,28]. In this mixed methods study, the observed larger effect size (ES = 0.40) in the stable population can be attributed to the variability in this small sample population.
In the telephone CD interviews, both measures were well understood, but it was noted that over half of participants in component two of the study suggested adding "burning" to the list of descriptors within the Skin Pain NRS (v2.0b). In the CE interviews, a burning sensation was described by some participants as a painful experience but for others, the term was used to describe the redness of their skin or the feeling of heat on their skin that was not seen as a 'painful' sensation. Given the different ways the term 'burning' was used by participants in the CE interviews, it was agreed that it should not be included as a descriptor of skin pain since those who did not consider it a painful experience would likely find this confusing. Furthermore, the overwhelming majority of participants in the telephone CD interviews understood the concept as it was described in v2.0b of the Skin Pain NRS and felt that the existing descriptors were sufficient in helping them to understand the intended meaning of the concept.
Thresholds for interpreting change on both measures were qualitatively explored; in the CD interviews, both in-person and telephone, participants indicated a meaningful change threshold between 3 and 4 points for the Itch NRS. For the Skin Pain NRS, there was no clear pattern based on qualitative feedback, which may be in part due to participants generally scoring at the low end of the response scale for the Skin Pain NRS. Therefore, further evidence is required to determine the most appropriate responder thresholds for both measures; this is currently being explored as part of a longitudinal, interventional, study with adult AD patients in which full psychometric validation of the revised measures is also being undertaken.

Conclusions
This study has addressed the previously identified gap in AD PRO measurement literature by creating two measures that assess symptoms of itching and skin pain. Revisions made to improve the validity of the measures were drawn directly from the recommendations of patients with AD who completed and provided feedback on initial versions of the measures and through supplementary interviews with additional AD patients following revisions to the measure. Participants reported dayto-day variations in AD symptomology, and therefore the measures were revised to a daily, 24-h recall period to better capture experience with these symptoms. Existing measures in the field rely on a retrospective 7-day recall period [15,28], this study suggests that is not adequate for capturing the day-to-day variation experienced by patients with AD and may impact validity of the measure. Further, following the revisions to the measures, this study utilized qualitative and quantitative data from patients with AD show that the new measures are both valid and reliable and adequate for use.