
Validation of a novel patient reported tool to assess the impact of treatment in erythropoietic protoporphyria: the EPP-QoL

A Correction to this article was published on 15 August 2021

This article has been updated



A novel treatment has been developed for erythropoietic protoporphyria (EPP), a rare condition that leaves patients highly sensitive to light. To fully understand the burden of EPP and the benefit of treatment, a novel patient reported outcome (PRO) measure, the EPP-QoL, was developed. This report describes work to support the validation of this measure.


Secondary analysis of trial data was undertaken. These analyses explored the underlying factor structure of the measure. This supported the deletion of some items. Further work then explored the reliability of these factors, their construct validity and estimates of meaningful change.


The factor analyses indicated that the items could be summarised in terms of two factors. One of these was labelled EPP Symptoms and the other EPP Wellbeing, based on the items included in the domain. EPP Symptoms had evidence to support its reliability and validity. EPP Wellbeing had poor psychometric properties.


Based on the analysis it was recommended to drop the EPP Wellbeing domain (and associated items). EPP Symptoms, despite limitations in the development of items, showed evidence of validity. This work is consistent with the recommendations of a task force on the development, modification and use of PROs in rare diseases.


Erythropoietic protoporphyria (EPP) is a rare metabolic disease characterized by abnormally elevated levels of protoporphyrin IX in erythrocytes (red blood cells) and plasma [1, 12]. When exposed to light in the visible spectrum, protoporphyrin IX is activated, resulting in severe phototoxicity [12]. EPP patients are sensitive to visible rather than UV light, and this sensitivity manifests as painful phototoxicity. Symptoms tend to occur within a few minutes of skin exposure to light/sunlight, and can take hours or days to resolve. Repeated episodes of phototoxicity can result in altered skin appearance with permanent changes (e.g. skin thickening with a waxy or leathery appearance) [8]. EPP can greatly impact on patients’ health related quality of life (HRQL), including daily and social activities and personal relationships [5, 7]. Some patients are effectively restricted to spending much of their life indoors and out of the light/sun.

CLINUVEL has developed a photoprotective agent for use in EPP that has been tested in international trials. Because of the nature of the disease and the protective effect of the treatment, a major outcome to be assessed is the impact on health related quality of life and other patient reported outcomes such as daily activities. CLINUVEL has previously used the SF-36 questionnaire [13] in earlier trials to assess the generic impact of treatment on patients’ HRQL. Other studies used the Dermatology Life Quality Index (DLQI) [4]. Neither of these measures was considered specific enough to capture the full impact of EPP on HRQL. Therefore, CLINUVEL worked with clinical experts to develop an assessment of HRQL for use specifically in patients with EPP. The resulting measure is called the EPP-QoL. The EPP-QoL was developed based on clinical expert opinion and was used in clinical trials without formal evidence of its psychometric properties. However, the trial data provide a useful resource for exploring the psychometric properties of the tool retrospectively. These analyses were planned, described in a statistical analysis plan and conducted independently from the team who conducted the trial analysis.

The first task in this analysis was to explore the underlying factor structure of the instrument. This was done to explore which items grouped together, to help identify a scoring system for the measure and to identify poorly functioning items. Further analyses then assessed the reliability and validity of the emerging domains.


Development of the EPP-QoL

The EPP-QoL was developed through a process of expert clinical consultation. Before the development of a modern treatment for EPP there was no standardised form of patient reported assessment. The clinical study team considered that the Dermatology Life Quality Index lacked face validity in the assessment of EPP because certain key features of the condition are not reflected in the questions. In order to develop the EPP-QoL the clinical experts held a series of round table meetings to agree on the concepts to measure and to discuss the development of the item wording. The instrument then went through rounds of review before its content was considered finalised. Informal assessments of the instrument by several groups of EPP patients were undertaken to provide feedback on the items and wording. The measure had not been formally validated prior to the initiation of this trial programme.


The psychometric validation was conducted on data collected from two trials: trial CUV029 (conducted over a 9 month period in Europe and included 6 site visits) and trial CUV030 (conducted over a 6 month period in the US and included 4 site visits).


Erythropoietic protoporphyria-quality of life (EPP-QoL)

The EPP-QoL was designed to assess quality of life in EPP patients. It was originally designed as a unidimensional instrument that included 15 items measuring various aspects of quality of life, including the impact of EPP on well-being, ability to partake in social/leisure/outdoor activities on sunny days, the choice of clothes worn on sunny days, and issues around transportation. The items are scored using a 4-point Likert scale ranging from 0 to 3, −3 to 0, or −2 to 1 depending on the item. The scale is scored additively, resulting in a maximum of 35 and a minimum of −10. The higher the score, the more the quality of life is impaired. The recall period for the EPP-QoL is 2 months. The instrument was provided to patients for completion every 2 months during the clinical trials. Completion rate was 93%.
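The three response ranges and the stated score limits constrain how many items can use each range. The sketch below enumerates the ways of allocating the 15 items across the three ranges that reproduce a maximum of 35 and a minimum of −10. The function name and the enumeration are illustrative assumptions; the source does not state which items use which range.

```python
def consistent_allocations(n_items=15, total_max=35, total_min=-10):
    """List (n_0to3, n_neg3to0, n_neg2to1) item counts matching the stated limits."""
    ranges = [(0, 3), (-3, 0), (-2, 1)]  # (lowest, highest) score per item type
    matches = []
    for a in range(n_items + 1):
        for b in range(n_items + 1 - a):
            c = n_items - a - b
            counts = (a, b, c)
            # Additive scoring: the scale limits are sums of the item limits.
            highest = sum(n * hi for n, (lo, hi) in zip(counts, ranges))
            lowest = sum(n * lo for n, (lo, hi) in zip(counts, ranges))
            if highest == total_max and lowest == total_min:
                matches.append(counts)
    return matches


print(consistent_allocations())  # [(10, 0, 5), (11, 2, 2)]
```

Two allocations satisfy the limits. Only (11, 2, 2) uses all three ranges, which would fit the description of three item types, but this is an inference from the arithmetic rather than a documented scoring key.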


The approach to the psychometric analysis was informed by Fayers & Machin [3] and Revicki et al. [9]. The focus of the psychometric analysis was on exploring the psychometric structure of the instrument through factor analysis. Following this, the performance of the identified domains was examined in terms of reliability and validity. Reliability assessments explored measurement error. The validity assessments were designed to determine what the domains were measuring and how changes in the domain scores can be interpreted. The analyses were limited in their scope because this was secondary analysis of clinical trial data which meant that the analyses were constrained to available data collection points and variables.

Exploratory factor analysis

An exploratory factor analysis (EFA) was used to identify the structure of the EPP-QoL (e.g. whether the unidimensional structure could be supported). Factors with an eigenvalue of 1 or more were extracted. Different rotations were examined, including oblique and orthogonal approaches. The suitability of the data for conducting an EFA was assessed using the Kaiser-Meyer-Olkin Measure of Sampling Adequacy and Bartlett’s Test of Sphericity.
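The extraction rule described here (retain factors with an eigenvalue of 1 or more) can be illustrated on a small correlation matrix. The sketch below is a minimal standard-library example, not the software used in the study: the 4 x 4 matrix is invented for illustration, and the eigenvalues are obtained with a textbook cyclic Jacobi routine so that the block needs no external packages.

```python
import math


def symmetric_eigenvalues(matrix, max_sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the smaller rotation angle that zeroes a[p][q].
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = 1.0 / (tau + math.copysign(math.sqrt(1.0 + tau * tau), tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_retained(eigenvalues, cutoff=1.0):
    """Keep eigenvalues at or above the cutoff (1 in the analysis described)."""
    return [value for value in eigenvalues if value >= cutoff]


# Hypothetical correlation matrix: two pairs of strongly related items.
toy_corr = [
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
]
eigenvalues = symmetric_eigenvalues(toy_corr)
retained = kaiser_retained(eigenvalues)
explained = sum(retained) / len(toy_corr)  # share of total variance
print(len(retained), round(explained, 3))
```

For this toy matrix the eigenvalues are 1.8, 1.4, 0.4 and 0.4, so two factors are retained and together account for 80% of the variance. In practice a numerical library would normally replace the hand-written eigenvalue routine.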

Item performance

The performance of each item of the EPP-QoL was assessed to explore the frequency of responses (to identify heavily skewed items or items with large floor or ceiling effects). A skewed distribution of responses was defined where fewer than 10% of responses occurred in two adjacent scale points. This was used to highlight problematic items [10].
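The skew rule can be written as a simple check on an item's response counts. The sketch below assumes one reading of the definition: an item is flagged when any two adjacent scale points together hold fewer than 10% of responses. The function name and the example counts are hypothetical.

```python
def is_skewed(counts, threshold=0.10):
    """Flag an item when two adjacent scale points hold under `threshold` of responses.

    `counts` lists the number of responses at each scale point, in scale order.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("item has no responses")
    return any(
        (counts[i] + counts[i + 1]) / total < threshold
        for i in range(len(counts) - 1)
    )


# Hypothetical 4-point items
print(is_skewed([70, 25, 3, 2]))    # top two points hold 5% of responses
print(is_skewed([30, 30, 20, 20]))  # responses spread across the scale
```

The first item is flagged and the second is not. Floor and ceiling effects would need a separate threshold, which the source does not specify.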

Instrument review

Each item was reviewed in terms of poor functioning based on the findings from the EFA and the item analysis. Evidence of poor functioning items was based on whether the item did not fit the emerging factor structure; or there was evidence of skew, floor or ceiling effects or high rates of missing responses. If there was substantial evidence that an item was functioning poorly then the item could be removed from the measure. The factor analysis suggested that the items grouped into two broad domains that were labelled EPP Symptoms and EPP Wellbeing (see below). Analyses of reliability and validity therefore explored the performance of these two domain scores.


Internal consistency (Cronbach’s alpha) was estimated for EPP Symptoms & EPP Wellbeing using data from visit 3 only. To explore test-retest reliability of the scale, data were analysed from the middle period of the trial (visits 3 and 4). Participants were considered to be stable if they had experienced no phototoxicity prior to visit 3 and visit 4. The intra-class correlation coefficient (ICC) and Pearson’s correlation were estimated for the EPP Symptoms and EPP Wellbeing domain scores.
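As a reminder of what the internal consistency statistic captures, the following is a minimal sketch of Cronbach's alpha using the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The response data are invented for illustration, and the form of ICC used in the study is not stated, so it is not sketched here.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 4 respondents x 3 items
responses = [
    [1, 2, 2],
    [2, 3, 3],
    [3, 3, 4],
    [0, 1, 1],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Items that rise and fall together across respondents, as in this toy data set, give an alpha close to 1.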

Construct validity

The performance of the EPP Symptoms and Wellbeing domains was tested in terms of their relationship to other markers of disease severity or outcomes as well as other measures of HRQL. This analysis focused on the Dermatology Life Quality Index (DLQI), which is the most widely used patient reported measure of health status or HRQL in dermatology. However, it is a generic dermatology tool not specific to EPP and, to the best of our knowledge, has not been specifically validated in EPP patients.

The DLQI data can be used to express an overall impact of the dermatological condition on the patients’ life quality. Cut-off scores have been published (0–1 No effect on patients’ life; 2–5 small effect; 6–10 moderate effect; 11–20 very large effect; 21–30 extremely large effect) [6]. Participants were divided into groups based upon these cut-off scores and the differences in EPP scores were estimated. The EPP-QoL domain scores were also benchmarked against the severity of recent phototoxicity episodes as rated by the patient in a diary (Table 1).
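The grouping step can be sketched as a lookup from DLQI total score to the published band, followed by a mean domain score per band. The band limits are those quoted above; the function names and the patient records are hypothetical.

```python
from statistics import mean

# Upper limit of each published DLQI band, in ascending order
DLQI_BANDS = [
    (1, "no effect"),
    (5, "small effect"),
    (10, "moderate effect"),
    (20, "very large effect"),
    (30, "extremely large effect"),
]


def dlqi_band(score):
    """Map a DLQI total score (0-30) to its descriptive band."""
    if not 0 <= score <= 30:
        raise ValueError("DLQI total score must lie between 0 and 30")
    for upper, label in DLQI_BANDS:
        if score <= upper:
            return label


def mean_by_band(records):
    """Mean domain score per DLQI band for (dlqi_score, domain_score) pairs."""
    grouped = {}
    for dlqi_score, domain_score in records:
        grouped.setdefault(dlqi_band(dlqi_score), []).append(domain_score)
    return {band: mean(scores) for band, scores in grouped.items()}


# Hypothetical (DLQI total, EPP domain score) pairs
example = [(7, 40.0), (9, 50.0), (15, 70.0), (25, 85.0)]
print(mean_by_band(example))
```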

Table 1 Description of the severity of phototoxicity reactions as recorded in the trials


The sensitivity of the EPP scores was estimated in terms of effect sizes. The EPP scores were explored to test the extent to which they changed as a result of a phototoxicity event. Simple effect sizes were estimated for patients who reported moving from experiencing some phototoxicity at Visit 3 to experiencing none at Visit 4.
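The exact effect size formula is not given in the text. A common choice, assumed here purely for illustration, is the mean change divided by the standard deviation at the earlier visit. The scores below are invented.

```python
from statistics import mean, stdev


def simple_effect_size(earlier, later):
    """Mean change between visits divided by the SD at the earlier visit.

    This definition is an assumption; the source does not state its formula.
    """
    return (mean(later) - mean(earlier)) / stdev(earlier)


# Hypothetical domain scores for the same patients at two visits
visit_3 = [60.0, 70.0, 80.0]
visit_4 = [50.0, 55.0, 60.0]
print(simple_effect_size(visit_3, visit_4))  # -1.5
```

Because higher scores indicate greater impairment, a negative value here corresponds to improvement.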

Minimal important difference

The trial data included two subjective markers of health status that were used as anchors to estimate minimal change – the DLQI and peak phototoxicity severity. These variables were used as a proxy for the degree of difference between groups that would be considered important. The MID estimates were calculated as the arithmetic difference between mean values for different groups of patients defined either in terms of peak phototoxicity or DLQI grades.
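The anchor-based calculation amounts to subtracting mean scores of neighbouring anchor groups and then averaging the differences. The sketch below uses invented group means and assumes that adjacent severity groups are compared; the source does not specify which pairs of groups entered the estimate.

```python
from statistics import mean


def adjacent_differences(group_means):
    """Differences in mean score between neighbouring anchor groups."""
    return [high - low for low, high in zip(group_means, group_means[1:])]


# Hypothetical mean domain scores for groups ordered by increasing severity
anchor_group_means = [30.0, 45.0, 55.0]
differences = adjacent_differences(anchor_group_means)
print(differences, mean(differences))  # [15.0, 10.0] 12.5
```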


Factor structure

A varimax exploratory factor analysis was conducted to identify the structure of the EPP-QoL. The Kaiser-Meyer-Olkin Measure of Sampling Adequacy was 0.95 and Bartlett’s Test of Sphericity was significant (p ≤ 0.001), suggesting the data were suitable for EFA. Two factors were consistently identified in different iterations of the analysis and explained a total of 69.4% of the variance (Table 2). The factor analysis identified that ten items loaded on Factor 1, three items on Factor 2; and two items did not load on either dimension. Item 2 showed a weak loading on Factor 1, and items 3 and 9 showed weak loadings on Factor 2. A promax EFA, conducted to confirm the findings of the varimax analysis, gave very similar results.

Table 2 Mean (and standard deviation, 95% confidence intervals) of EPP-QoL domain scores separated by degree of impact of EPP (determined by DLQI score)

Reviewing the items that load on each domain suggests that Factor 1 items could be described in terms of EPP severity and the impact of disease (and so this domain was labelled EPP Symptoms). Factor 2 includes items relating to the broader impact on quality of life and well-being (and was termed EPP Wellbeing). Items 2 and 9 did not load on either factor. Items 2, 10 and 13 showed a skewed pattern of responses. Seven items showed evidence of floor or ceiling effects (2, 8, 10, 11, 12, 14 & 15). Item 3 cross-loaded on both Factors 1 & 2, and was not conceptually coherent with the other items in the Factor 2 domain. Based on this analysis, items 2, 3 and 9 were dropped from the instrument. With these items removed a repeated EFA explained 77% of the variance.

Data distribution

Figure 1a & b show the distribution of EPP Symptoms and Wellbeing domain scores over time from the trial data. These analyses are collapsed across trial arms. The Symptoms domain shows an improvement in scores over the course of the trial. The Wellbeing domain shows no evidence of a change in scores over time, and a clustering of data below 20 (at baseline).

Fig. 1 a and b EPP Symptoms and EPP Wellbeing over time. Plots show median (broad bar), interquartile range or IQR (brown box), 1.5 times IQR (whiskers); circles and stars are outliers

Scale reliability

Both subscales show high (and acceptable) internal consistency [11]. Table 3 also shows estimates of the test-retest reliability of the two domain scores over time in the total sample and a defined stable patient group. Table 4 shows the influence of each individual item on internal consistency. The EPP-Symptoms domain shows some evidence of stability; the ICC is at the lower end of what would be considered acceptable. The EPP-Wellbeing domain shows poor test-retest reliability when only stable patients are included.

Table 3 Internal consistency and test-retest reliability for the EPP-QoL subscales
Table 4 Item-total correlations and Cronbach’s alpha with item deletion for the EPP Symptoms score (Cronbach’s alpha for scale = 0.954) from Visit 3 only

Construct validity

Benchmarking EPP-QoL against DLQI

The construct validity of the EPP-QoL domain scores was tested by benchmarking against the DLQI. The DLQI data showed that all patients (bar one) were classed as having at least a moderate effect of EPP on their lives. There is no difference in the EPP Wellbeing domain scores for patients in the different severity level groups. In contrast, there is a clear linear change in the EPP Symptoms score. The mean EPP Symptom score is 46.6 for moderate effect, 73.1 for very large effect and 82.9 for extremely large effect (Table 2; F = 9.23; P < 0.0001). DLQI total score is significantly correlated with the EPP Symptom domain (r = 0.52; P < 0.0001), but not the EPP Wellbeing domain (r = −0.10; n.s.).
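An F statistic of this kind compares the variation between group means with the variation within groups. The sketch below assumes a one-way analysis of variance, which the text implies but does not name, and uses invented scores rather than trial data.

```python
from statistics import mean


def one_way_f(groups):
    """F statistic for a one-way ANOVA across lists of scores."""
    values = [x for group in groups for x in group]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical domain scores in three severity groups
severity_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(round(one_way_f(severity_groups), 2))  # 13.0
```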

Benchmarking EPP-QoL against the experience of recent phototoxicity

The trial participants were grouped according to peak level of phototoxicity in the previous 60 days. There is no significant difference between groups defined by peak photosensitivity severity for the EPP Wellbeing score (Table 5). In contrast the EPP Symptom score shows a linear change in scores with increasing severity of groups defined in terms of the phototoxicity variable (F = 51.5, P < 0.0001).

Table 5 EPP-QoL domain scores with respect to peak toxicity severity from previous 60 days

Minimal important difference (MID)

MID estimates were calculated for differences in EPP-Symptoms score based on the differences between sub-groups defined by the DLQI and the 60 day peak phototoxicity severity (Table 6). An average MID value was estimated, which was a change of 13 points. It was not possible to estimate an MID for differences on the EPP-Wellbeing because none of the differences between sub-groups were statistically significant.

Table 6 Estimates of important difference on EPP-Symptoms scale based on benchmark criteria from DLQI and 60 day peak phototoxicity severity


Many people advocate the use of condition specific outcome measures to really understand the nature of the impact of a disease [2]. In a condition like EPP there is evidence from patient testimonies that patients simply stay inside to avoid light exposure and resulting phototoxicity. This may have a very significant effect on patients’ psychological state and social wellbeing. Phototoxicity may be avoided, but clearly people are unable to live an otherwise normal life. Therefore, measuring the impact of EPP using a dermatology specific instrument may capture the impact of skin lesions but will likely miss the wider social and psychological impact of EPP and so will underestimate the disease burden. This led the team to try to develop a novel instrument to assess the burden of EPP.

Rare diseases present some significant challenges for researchers concerned with the development and use of patient reported outcomes (PROs). In a prevalent disease, a new PRO can be developed using in depth qualitative research with people affected by the disease followed by large scale psychometric research to understand the measurement properties of the new instrument. In rare diseases, this approach to development is much harder because of the difficulty in recruiting sufficient numbers of patients. In the present study the team relied upon expert opinion to guide the content of the instrument. Anecdotally the EPP-QoL was well received by patients, but no formal cognitive interviewing was conducted to support this, which is a limitation. The initial trial work has been used to explore the measurement properties of the instrument. Based upon these analyses the team has made substantial changes to the instrument items and scoring.

The present report describes psychometric analysis of this instrument using available trial data. The team recognises the limitations in the development of the measure, and so where the psychometrics suggested that changes to the instrument could improve it, these have been actioned. The work that has been undertaken suggested that the measure reflected two underlying domains: EPP Symptoms and EPP Wellbeing. However, based on a review of all of the analyses the team has concluded that only the EPP Symptoms domain should be taken forward as a measure. The EPP Wellbeing domain has been dropped. In broad summary, the EPP Symptom score has good internal consistency and test-retest reliability. The EPP Symptom score reflected differences in DLQI scores and peak photosensitivity reactions, supporting its construct validity. An MID was estimated as a 13 point change. More work is needed to explore MID and also to establish what might be considered a meaningful change threshold. In future clinical studies the use of a smaller MID estimate (5–6 points) could be tested if only very mildly affected patients are included. This should be stated a priori and it should be tested.

Rare disease trials also present problems with the use of PRO measures. PRO data are subjective, prone to biases and so the resulting data can have high error variance. Large studies can detect the therapeutic signal against the background noise using statistical analysis. In rare diseases, clinical trials are usually small, and so the interpretation of the PRO data and change from baseline in scores is therefore more complex. A guidance paper from the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) recommends that in rare disease trials it is preferable to use disease specific measures [2]. In addition, they state that evidence of the reliability, construct validity and the ability to detect change should be evaluated and documented. They also recognise how challenging it is to develop new PROs in rare diseases. Where there is interest in developing a measure in a rare disease, they outline the simplified methods that could be used, reflecting the limited availability of patients. The present study is perhaps an example of this pragmatic approach recommended by ISPOR. The development of the tool had some limitations, but the availability of data from the trials allowed the team to heavily edit the instrument content based on evidence.

Other study limitations should also be noted. The psychometric analysis was based on secondary use of trial data. This meant that the analyses could only use available data and available time points. For example, test-retest reliability was assessed over a 2 month time window (in stable patients), which was probably too long a period for the assessment of test-retest reliability. This is partly due to the fact that the measure has a 2 month recall period and afamelanotide is dosed every 2 months. The 2 month recall period may be a limitation of the measure. It allows patients to reflect over a period of time, which is useful considering that phototoxic events are infrequent (although milder reactions can be more common and can disrupt activities of daily living). However, the long recall period introduces a greater risk of measurement bias. Also, the trial data did not allow us to examine the validity of the measure in different cultural contexts. More work is needed to establish criterion validity and an estimate of what would be considered a meaningful change threshold. The exploratory factor analysis that identified the two underlying factors in the EPP-QoL was based only on eigenvalues. Other methods also exist, such as the Hull method, for identifying underlying factors. Quite a high proportion of items showed evidence of floor or ceiling effects. Several of these items had a high frequency of “not at all” responses, often to questions about the limitations on aspects of life that people with EPP experienced. This may be evidence of poorly designed items. It should also be clear that the EPP-QoL was developed to support the trial programmes for afamelanotide, a treatment developed by the study sponsor. It was developed by clinical experts in EPP to capture the impact of the condition. This psychometric analysis was restricted to data from two clinical trials. The EPP-QoL has been used in other clinical trials, notably a US trial, CUV039, the data from which could also be explored in future analyses. Lastly, it is possible that the psychometric properties of this instrument vary with seasonality. This again could be explored in future research. Because of these study limitations we believe that this report should not be considered the definitive psychometric analysis of the EPP-QoL, but rather an analysis that explores aspects of the performance of the measure based on available data from the trials.

In conclusion we report a case study of the development of a new PRO in a very rare condition – EPP. The measure was tested and adapted in order to produce the best possible instrument from the original content. Lessons were learnt in this process regarding the measurement of patient benefit in a rare disease. Despite its limitations we hope that this instrument will be informative regarding the burden of EPP.

Availability of data and materials

Study data are not available for public use.




DLQI: Dermatology Life Quality Index

EFA: Exploratory Factor Analysis

EPP: Erythropoietic protoporphyria

EPP-QoL: Erythropoietic Protoporphyria-Quality of Life

HRQL: Health-related quality of life

ICC: Intra-class correlation coefficient

ISPOR: International Society for Pharmacoeconomics and Outcomes Research

MID: Minimal Important Difference


  1. Anderson, K., Sassa, S., Bishop, D., et al. (2009). Disorders of Heme biosynthesis: X-linked Sideroblastic Anemia and the Porphyrias. In D. Valle, A. L. Beaudet, B. Vogelstein, et al. (Eds.), (2013) The online metabolic and molecular bases of inherited disease.

  2. Benjamin, K., Vernon, M., Patrick, D., et al. (2017). Patient-reported outcome and observer-reported outcome assessment in rare disease clinical trials: An ISPOR COA emerging good practices task force report. Value Health, 20(7), 838–855.

  3. Fayers, P. M., & Machin, D. (2016). Quality of life: The assessment, analysis and reporting of patient-reported outcomes (3rd ed.). Wiley-Blackwell. p. 648. ISBN: 978-1-444-33795-2.

  4. Finlay, A., & Khan, G. (1994). Dermatology life quality index (DLQI)-a simple practical measure for routine clinical use. Clin Exp Dermatol, 19(3), 210–216.

  5. Holme, S., Anstey, A., Finlay, A., et al. (2006). Erythropoietic protoporphyria in the U.K.: Clinical features and effect on quality of life. Br J Dermatol, 155(3), 574–581.

  6. Hongbo, Y., Thomas, C., Harrison, M., et al. (2005). Translating the science of quality of life into practice: What do dermatology life quality index scores mean? J Investig Dermatol, 125(4), 659–664.

  7. Jong, C., Finlay, A., Kerr, A., & Ferguson, J. (2008). The quality of life of 790 patients with photodermatoses. Br J Dermatol, 159(1), 192–197.

  8. Lecha, M., Puy, H., & Deybach, J. (2009). Erythropoietic protoporphyria. Orphanet J Rare Dis, 4(1), 19.

  9. Revicki, D., Osoba, D., Fairclough, D., et al. (2000). Recommendations on health-related quality of life research to support labeling and promotional claims in the United States. Qual Life Res, 9(8), 887–900.

  10. Skevington, S., Lotfy, M., & O'Connell, K. (2004). The World Health Organization's WHOQOL-BREF quality of life assessment: Psychometric properties and results of the international field trial. A Report from the WHOQOL Group. Qual Life Res, 13(2), 299–310.

  11. Streiner, D. L., & Norman, G. R. (2003). Health measurement scales. A practical guide to their development and use (3rd ed.). Oxford: Oxford University Press.

  12. Todd, D. J. (1994). Erythropoietic protoporphyria. Br J Dermatol, 131(6), 751–766.

  13. Ware, J., & Sherbourne, C. (1992). The MOS 36-item short-form health survey (SF-36). Med Care, 30(6), 473–483.


We would like to acknowledge the time and effort of the study participants and their willingness to engage in the development process.


All funding for this work was provided by Clinuvel.

Author information




All authors reviewed and commented on the manuscript. In addition, Lloyd undertook all analyses and wrote the manuscript. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to A. J. Lloyd.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

All participants provided consent to publish their data in an aggregated form.

Competing interests

Authors Wolgren & Wright are both employees of Clinuvel who manufacture Scenesse™.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original version of this article was revised: The family name of G. Biolcati has been corrected.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Biolcati, G., Hanneken, S., Minder, E.I. et al. Validation of a novel patient reported tool to assess the impact of treatment in erythropoietic protoporphyria: the EPP-QoL. J Patient Rep Outcomes 5, 65 (2021).
