Study design and data collection
The HUI3 preference measurement study comprised two complementary surveys: the HUI3 Modeling survey (HUI3-M), which collected the measurements required to fit the HUI3 multi-attribute utility functions, and an associated survey (HUI3-D), which collected direct utility measures for 53 states, including states prevalent in the general population. The HUI3-D provides a commensurate data set for assessing the inter-survey (external) agreement of HUI3 utility scores. Both surveys were administered through face-to-face interviews.
Both surveys were conducted in five cities in Japan (Sapporo, Tokyo, Nagoya, Osaka, and Fukuoka). These cities are geographically dispersed and represent various regions of Japan. Respondents aged 20 to 79 years were recruited through snowball sampling by a research company (ANTERIO Inc.). The HUI3-M preference survey collected value and utility measurements from 774 respondents. Sets of health states were randomly allocated to respondents according to strata. Health state strata were defined as follows: scale anchor states (pits [V6, H6, S5, A6, D6, E5, C6, and P5], dead, and perfect health), methodological marker states (MA [V2, H1, S1, A1, D1, E1, C1, and P3], MB [V2, H1, S1, A3, D1, E2, C1, and P3], and MC [V2, H1, S1, A1, D1, E2, C3, and P5]), single-attribute states, and block states. The HUI3-D preference survey collected value and utility measurements from 263 respondents. As in the HUI3-M survey, sets of health states were randomly allocated to HUI3-D respondents according to strata. Health state strata for the HUI3-D survey were defined as follows: scale anchor states, methodological marker states, most prevalent states, and less prevalent states. In the HUI3-M survey, the number of respondents providing value and utility measures varied by health state stratum; therefore, the precision of the mean preference scores also varied by stratum.
Value scores were measured using the two-sided feeling thermometer developed by Furlong et al. [19], a prop for eliciting preference scores based on the VAS technique. Standard gamble (SG) questions were administered using a modified version of the original chance board prop. Interviewees were presented with two propositions: in the first, they were in the described health state with certainty; in the second, they were in the best possible state with a given probability or in the state they considered the worst possible with the complementary probability. Different probability values were proposed iteratively until the interviewees stated that they were indifferent between the two propositions. These SG data enabled us to establish the function for transforming values into utilities. Interviews were conducted by 50 interviewers, whom we trained in preference elicitation in each region. Interviewers used written specifications that included both instructions for managing the interviews and text to be read aloud to the interviewees.
Statistical analysis
Direct preference measures, both values and utilities, were summarized using the following statistics: the 10% trimmed mean (5% trimmed from each end of the distribution), standard deviation, minimum, and maximum. The trimmed mean was selected, rather than the median or mode, because it retains most of the statistical properties of mean-type estimates while reducing the influence of outlier scores on estimates of central tendency when health state preference scores are skewed. The person-mean score was defined as the trimmed mean for a specific health state.
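To make the trimming rule concrete, the following Python sketch computes a 10% trimmed mean by discarding 5% of the observations from each tail. The function name, the example scores, and the choice to round the number of trimmed observations down are illustrative assumptions; statistical packages may round differently, and this is not the code used in the study.

```python
import math


def trimmed_mean(scores, proportion_each_tail=0.05):
    """Mean of the scores after dropping a fixed share from each tail.

    Assumption: the number of observations cut from each tail is
    floor(n * proportion_each_tail); other software may round up.
    """
    ordered = sorted(scores)
    n = len(ordered)
    k = math.floor(n * proportion_each_tail)  # observations cut per tail
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


# Hypothetical preference scores for one health state (20 respondents),
# with one low and one high outlier.
example_scores = [0.10, 0.55, 0.60, 0.60, 0.62, 0.65, 0.65, 0.66, 0.68, 0.70,
                  0.70, 0.71, 0.72, 0.74, 0.75, 0.75, 0.78, 0.80, 0.82, 1.00]

print("untrimmed mean:", sum(example_scores) / len(example_scores))
print("10% trimmed mean:", trimmed_mean(example_scores))
```

With 20 scores, one observation is removed from each tail, so the two outlying scores (0.10 and 1.00) no longer affect the estimate.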
The underlying theory of the multiplicative, multi-attribute utility function was described previously by Keeney and Raiffa [20]. The general form for an eight-attribute multiplicative function is as follows:
$$ 1 + c = \prod_{j=1}^{8} \left( 1 + c\, c_j \right) $$
(1)
where the subscript \( j \) indicates the attribute and \( \prod_{j=1}^{8} \) denotes the product of the terms \( (1 + c\, c_j) \) from \( c_1 \) to \( c_8 \).
Respondents were classified into two groups according to the state that each selected as the lowest anchor state when using the feeling thermometer: group A respondents rated pits as equally or less preferable than dead, and group B respondents rated dead as less preferable than pits. Person-mean value scores were calculated separately for group A (person-mean (A)) and group B (person-mean (B)). Overall person-mean disutility scores were used to fit a multi-attribute disutility function (MADUF) on a scale where perfect health (PH) = 0.00 and pits = 1.00 (Eq. 2). The MADUF was then converted into a multi-attribute utility function (MAUF) on a scale where pits = 0.00 and PH = 1.00 (Eq. 3). Finally, the MAUF on the pits/PH scale was converted to a MAUF on the conventional scale where dead = 0.00 and PH = 1.00 (Eq. 4).
Formula (pits/PH scale)
MADUF:
$$ \overline{u} = \frac{1}{c} \left[ \prod_{j=1}^{8} \left( 1 + c\, c_j\, \overline{u}_j \right) - 1 \right] $$
(2)
MAUF:
$$ u=1-\overline{u} $$
(3)
Conversion to the dead/PH scale
$$ \begin{array}{l} \overline{u}^{\ast} = \overline{u} / \overline{u}_{Dead} \\ u^{\ast} = 1 - \overline{u}^{\ast} \end{array} $$
(4)
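As a minimal illustration of Eqs. (2) to (4), the Python sketch below evaluates the MADUF for a health state and rescales the result from the pits/PH scale to the dead/PH scale. All numerical inputs are hypothetical and are not estimates from this study: the corner disutilities are set equal across attributes and chosen so that c = -0.5 satisfies Eq. (1), and the disutility of dead (0.8) is an arbitrary placeholder.

```python
def maduf(c, corner_disutilities, level_disutilities):
    """Eq. (2): disutility on the pits/PH scale (PH = 0.00, pits = 1.00)."""
    product = 1.0
    for c_j, u_j in zip(corner_disutilities, level_disutilities):
        product *= 1.0 + c * c_j * u_j
    return (product - 1.0) / c


def to_utilities(disutility, disutility_dead):
    """Eqs. (3) and (4): utility on the pits/PH and dead/PH scales."""
    u_pits_ph = 1.0 - disutility                    # Eq. (3)
    u_dead_ph = 1.0 - disutility / disutility_dead  # Eq. (4)
    return u_pits_ph, u_dead_ph


# Hypothetical parameters: eight equal corner disutilities chosen so that
# c = -0.5 satisfies 1 + c = prod(1 + c * c_j).
C = -0.5
CORNERS = [2.0 * (1.0 - 0.5 ** (1.0 / 8.0))] * 8
DISUTILITY_DEAD = 0.8  # hypothetical disutility of dead on the pits/PH scale

perfect_health = maduf(C, CORNERS, [0.0] * 8)  # every attribute at its best level
pits = maduf(C, CORNERS, [1.0] * 8)            # every attribute at its worst level

print("disutility of PH:", perfect_health)
print("disutility of pits:", pits)
print("pits on (pits/PH, dead/PH) scales:", to_utilities(pits, DISUTILITY_DEAD))
```

Because the hypothetical disutility of dead is below 1.00, pits receives a negative utility on the dead/PH scale, which corresponds to a state regarded as worse than dead.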
Each respondent's value scores for the single-attribute (including corner) states were normalized so that the least desirable health state was assigned a value score of 0.00 and the most desirable health state a value score of 1.00. Next, respondent preference measures (i.e., value and utility scores) were classified into one of two groups: person-mean (A) or person-mean (B). The person-mean single-attribute disutility scores provide the \( \overline{u}_j \) terms. The \( c_j \) terms are the disutility scores for each of the lowest attribute-level states (i.e., the corner states) on the pits/PH scale, and \( c \) was calculated by iteratively solving Eq. (1).
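The iterative solution for c can be sketched as follows. The text does not state which numerical method was used, so bisection is an assumption made here for illustration, as are the function name and the example corner disutilities. The sketch also assumes that every c_j lies strictly between 0 and 1, in which case the non-trivial root of Eq. (1) lies in (-1, 0) when the c_j sum to more than 1 and is positive when they sum to less than 1.

```python
def solve_c(corner_disutilities, tol=1e-12, max_iter=200):
    """Non-trivial root c of 1 + c = prod(1 + c * c_j), found by bisection.

    Assumption: each c_j satisfies 0 < c_j < 1.
    """
    def f(c):
        product = 1.0
        for c_j in corner_disutilities:
            product *= 1.0 + c * c_j
        return product - (1.0 + c)

    total = sum(corner_disutilities)
    if abs(total - 1.0) < 1e-12:
        return 0.0  # the function reduces to the additive form
    if total > 1.0:
        lo, hi = -1.0, -1e-9  # f(lo) > 0 and f(hi) < 0
    else:
        lo, hi = 1e-9, 1.0    # f(lo) < 0; expand hi until f(hi) > 0
        while f(hi) < 0.0:
            hi *= 2.0

    f_lo = f(lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid              # sign change lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # sign change lies in [mid, hi]
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical corner disutilities whose sum exceeds 1, so c falls in (-1, 0).
example_corners = [2.0 * (1.0 - 0.5 ** (1.0 / 8.0))] * 8
print("c =", solve_c(example_corners))
```

The root c = 0 always satisfies Eq. (1) trivially, which is why the search brackets exclude zero.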
Finally, external agreement (i.e., the extent to which each model can predict utility scores for a group of respondents other than the group whose preference scores were used to develop the model) was assessed by comparing the utility scores calculated using the MAUF for each of the 53 health states (the marker states and 50 other states) with the mean of the directly measured utility scores for these states, as reported by respondents in the HUI3-D preference survey. Agreement between the directly measured SG utility scores and the MAUF-calculated scores was assessed using a two-way mixed model intraclass correlation coefficient (ICC), in which the SG and HUI3 MAUF scores were treated as fixed effects and interactions between the participant and instrument were treated as random effects [21]. The ICC estimates the proportion of between-subject variation in relation to total variation, where 1 represents perfect agreement and 0 indicates no agreement. A coefficient < 0.40 was considered to indicate poor agreement [22]. Statistical analyses were performed using SAS 9.4.
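For readers who wish to see how such a coefficient is computed, the Python sketch below implements one common two-way mixed, single-measure ICC, the consistency form usually labelled ICC(3,1), from the two-way ANOVA mean squares. Whether the analysis in SAS used this exact variant is not stated in the text, so the choice of form is an assumption, and the paired scores shown are hypothetical rather than study data.

```python
def icc_3_1(ratings):
    """Two-way mixed, single-measure, consistency ICC.

    ratings: list of rows, one per health state; each row holds the scores
    from the k methods (here k = 2: directly measured SG and MAUF).
    """
    n = len(ratings)     # number of health states (rows)
    k = len(ratings[0])  # number of methods (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical (directly measured SG, MAUF) utility pairs for five states.
example_pairs = [
    [0.92, 0.95],
    [0.80, 0.78],
    [0.61, 0.66],
    [0.35, 0.30],
    [0.10, 0.18],
]
print("ICC(3,1):", icc_3_1(example_pairs))
```

In the consistency form, a constant offset between the two methods does not lower the coefficient; an absolute-agreement form would penalize such an offset.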