Design and setting
The Mid-Swed Health Survey [20] was conducted in Region Örebro County, a central region of Sweden with approximately 290,000 inhabitants. The region contains a larger city, several small towns, and rural areas, and about 250,000 people live in a city or small-town area and 40,000 in rural areas.
A random sample, stratified by sex and age, was selected from the general population. Equal numbers of men and women were randomly selected. In each of the age groups 20–29 and 30–39 years, 800 persons were recruited, as a lower response rate was expected in these age groups. In each of the other age groups, 340–480 persons were recruited.
The sample size was calculated with power set at 80% (α = 0.05) to detect a between-group difference of 10 scale points, which can be considered a medium difference for scale scores ranging from 0 to 100 [12, 21, 22]. The estimate was based on the RAND-36 subscale (role-functioning/physical) that requires the largest sample to detect a 10-point difference between two groups. In September 2015, 4040 persons received an invitation to participate, along with an information letter, via regular mail. The survey comprised a 9-page questionnaire including the Swedish RAND-36 and questions about gender, year of birth, occupation, and level of education [20]. The questionnaire was distributed along with a prepaid return envelope via regular mail. After 2 weeks, a thank-you/reminder card was delivered. If the questionnaire had not been returned after 5 weeks, a reminder letter and a new questionnaire were sent via regular mail. Because of a low response rate, an additional sample of 4100 persons was invited in March 2016, according to the same stratification principles.
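As an illustration of the power calculation described above, the following minimal sketch uses the standard normal-approximation formula for comparing two group means. The standard deviation of 35 scale points is a hypothetical value chosen only for the example; the source does not report the value used, nor the software or exact formula behind the original calculation.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided comparison
    of two means, using the normal approximation:
        n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta) ** 2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Hypothetical input: sd=35 is an assumption, not a value from the study.
required = n_per_group(delta=10, sd=35)
```

Scales with larger variability require more respondents to detect the same 10-point difference, which is why the calculation is driven by the most demanding subscale.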
Classification of subgroups
Age was grouped into 10-year intervals (20–29, 30–39, etc.) with people 80 years and older in one group.
Education was classified into three categories: mandatory (grade 0–9), high school (grade 10–12), and university education (> 12).
The occupation variable included 11 categories: employed, own company, parental leave, student, in labour market program, job seeker, old age pension, activity or sickness compensation, long-term sickness, and other. The following subgroups were created from the occupation variable: 1. Employed and self-employed (including employed, own company, and parental leave); 2. Unemployed (including participants in labour market programs and job seekers); 3. On sick leave (including activity or sickness compensation, and long-term sickness). Persons with old age pension and students were not included in the analysis of occupation.
RAND-36
The RAND-36 consists of 36 items grouped into eight multi-item scales: physical functioning (PF), role-functioning/physical (RP), pain (P), general health (GH), energy/fatigue (EF), social functioning (SF), role-functioning/emotional (RE), and emotional well-being (EW). An additional item asks about health change (HC) in the past year. Item responses are summed and transformed into scale scores ranging from 0 (worst possible health state) to 100 (best possible state). A scale score was calculated if at least half of the items in a scale were answered by the respondent (half-scale method), and missing item values were imputed using a person-specific mean value based on the non-missing items [1]. The half-scale rule is used in the scoring of the SF-36; however, this criterion is not part of the standard scoring algorithm for the RAND-36.
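The half-scale method can be sketched as follows. This is a minimal illustration, assuming that item responses have already been recoded to the 0–100 range and that the scale score is the average of the recoded items; the function name and data layout are hypothetical.

```python
def score_scale(item_scores):
    """Score one scale for one respondent with the half-scale method.

    item_scores: recoded 0-100 item values, with None for missing items
    (assumption: recoding has already been done).
    Returns None if fewer than half of the items were answered.
    """
    answered = [v for v in item_scores if v is not None]
    if 2 * len(answered) < len(item_scores):
        return None  # scale score not computable
    # Impute each missing item with the person-specific mean
    # of that respondent's non-missing items in the same scale.
    person_mean = sum(answered) / len(answered)
    imputed = [person_mean if v is None else v for v in item_scores]
    return sum(imputed) / len(imputed)
```

With person-mean imputation, the resulting score equals the mean of the answered items, so the rule mainly determines whether a score is computed at all.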
Psychometric methods
Testing of data quality, scaling assumptions, and reliability followed methods recommended for the International Quality of Life Assessment (IQOLA) Project [23], previously used for psychometric testing of SF-36 and RAND-36 [24]. Psychometric tests were performed in the total sample and in subgroups by gender, age, education, and occupation.
Data quality
Completeness of data was evaluated by calculating the percentage of missing data for each of the 36 items. At the scale level, the percentage of computable scale scores was calculated using the half-scale method.
Floor and ceiling effects
Floor and ceiling effects were analyzed by calculating the proportion of participants scoring at the lowest and highest possible levels. At the item level, a floor or ceiling effect was considered if at least 50% of the respondents scored at the minimum or maximum level [25]. At the scale level, an effect was indicated if at least 15% of the respondents scored at the lowest or highest level [26].
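The floor and ceiling criteria can be expressed in a few lines. The sketch below is illustrative only; the function names are hypothetical, and the thresholds are passed as arguments (0.15 at the scale level, 0.50 at the item level).

```python
def floor_ceiling(scores, lowest=0, highest=100):
    """Return the proportions of respondents at the lowest and highest
    possible score. Missing scores (None) are excluded."""
    valid = [s for s in scores if s is not None]
    n = len(valid)
    floor = sum(s == lowest for s in valid) / n
    ceiling = sum(s == highest for s in valid) / n
    return floor, ceiling


def has_effect(proportion, threshold):
    """True if the proportion reaches the threshold
    (0.15 for scales, 0.50 for items)."""
    return proportion >= threshold
```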
Reliability
Cronbach’s alpha coefficients were computed to estimate the internal consistency reliability of scale scores. A coefficient of at least 0.70 is considered appropriate for group data, although 0.80 is desirable. A coefficient of 0.90 or better is recommended for individual assessment [27].
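For reference, Cronbach’s alpha for a scale with k items is k/(k − 1) × (1 − Σ item variances / variance of the total score). The sketch below implements this textbook formula for complete cases; it is not the code used in the study, and the data layout is an assumption.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete cases.

    rows: one list of k item scores per respondent (no missing values).
    """
    k = len(rows[0])
    # Sample variance of each item across respondents
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```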
Test of scaling assumptions
Item–scale correlations, that is, the correlation between each item and its own subscale (corrected for overlap), were calculated. A correlation of 0.40 or greater is considered satisfactory [26]. The correlation between items and other subscales was also assessed and was considered adequate if an item correlated more highly with its own scale than with the other scales. The significance of a difference between two item–scale correlations was determined using the standard error of the correlation matrix (1/√n). The recommended significance criterion of two standard errors was used [26]. Pearson correlation analysis was used to assess the correlations between item and scale scores.
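The two checks above can be sketched as follows. The correction for overlap is implemented by correlating each item with the sum of the remaining items in its scale, which is one common way to apply it; whether a difference of exactly two standard errors counts as significant is an assumption here (the sketch requires a strictly larger difference). All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_scale_r(rows, j):
    """Correlate item j with the sum of the other items in the scale,
    so the item does not contribute to its own scale score."""
    item = [r[j] for r in rows]
    rest = [sum(r) - r[j] for r in rows]
    return pearson(item, rest)


def significantly_higher(r_own, r_other, n):
    """True if r_own exceeds r_other by more than two standard errors,
    with the standard error approximated as 1 / sqrt(n)."""
    return (r_own - r_other) > 2 / math.sqrt(n)
```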
Inter-scale correlations
The correlations (Pearson correlation) among subscales were tested and interpreted as low (< 0.30), medium (0.30–0.49), or strong (≥ 0.50) [22]. Hypotheses about the magnitude of inter-scale correlations were based on results of the validation of the Swedish SF-36v1 [6]. The strongest correlation was expected between energy/fatigue and emotional well-being, and the weakest between physical functioning and emotional well-being. According to factor analysis, physical functioning, role-functioning/physical, pain, and general health are strongly related to physical health, while energy/fatigue, social functioning, role-functioning/emotional, and emotional well-being are strongly associated with mental health. It was hypothesized that the correlations among the four scales that primarily measure physical health would be strong, as would the correlations among the four scales that primarily measure mental health.
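The interpretation bands can be written as a small helper. This sketch assumes that the absolute value of the correlation is classified and that values between 0.49 and 0.50 fall in the medium band; both are assumptions made for the example.

```python
def classify_correlation(r):
    """Label a correlation as low (< 0.30), medium (0.30 to < 0.50),
    or strong (>= 0.50), based on its absolute value."""
    size = abs(r)
    if size < 0.30:
        return "low"
    if size < 0.50:
        return "medium"
    return "strong"
```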
Test of group differences
Known-groups analysis was performed to test the sensitivity of the scales and their ability to capture expected differences between subgroups based on gender, age, education, and occupation [12]. Based on the results of the validation of the Swedish SF-36v1 [5, 6], we assumed: a) that men report better health than women on all eight scales; b) that the four physical health scales gradually deteriorate with age; c) that the differences based on age are smaller for the four mental health scales; d) that those with a low level of education, the unemployed, or those on sick leave report poorer health.
Weighted mean RAND-36 0–100 scale scores were calculated for the total sample and for subgroups based on gender, age, education, and occupation. Weighted mean T-scores were also calculated to improve comparability across subgroups. T-scores have a mean of 50 and a standard deviation of 10 in the total sample, and a T-score above 50 indicates better HRQoL than in the total norm population. A sampling weight was derived to reflect the age and gender distribution of the Swedish population in 2015, and non-response. Differences in weighted means between two groups were tested using the nonparametric Somers’ D test [28], and differences among three or more groups were tested using an F-test, taking the sampling design into account. P-values for post hoc pairwise comparisons of the weighted means were adjusted for multiple comparisons using Šidák’s method [29], and an adjusted p-value of 0.05 or lower was considered statistically significant. Linear regression was used to examine whether there was a linear or quadratic trend with increasing age. To test for a linear trend, the seven-level age variable was entered as a continuous variable; to test for a quadratic trend, the square of the age variable was added to the model. The survey design and the sampling weight were accounted for in the linear regression models. SAS 9.4 (SAS Institute, Cary, NC, USA), IBM SPSS Statistics for Windows Version 22 (IBM, Armonk, NY, USA), and Stata SE Version 15 (StataCorp, College Station, Texas, USA) were used for statistical analysis.
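Two of the computations above, the T-score transformation and the Šidák adjustment, can be sketched briefly. The T-score is 50 + 10 × (score − weighted mean) / weighted SD, and the Šidák-adjusted p-value for m comparisons is 1 − (1 − p)^m. The sketch assumes a weighted standard deviation with the sum of the weights as denominator; the exact estimator used in the study is not stated, and the function names are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Mean of values, weighted by the sampling weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_sd(values, weights):
    """Weighted standard deviation (assumption: divides by sum of weights)."""
    m = weighted_mean(values, weights)
    var = sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / sum(weights)
    return math.sqrt(var)


def t_scores(values, weights):
    """Rescale 0-100 scores so the weighted total sample has
    mean 50 and standard deviation 10."""
    m = weighted_mean(values, weights)
    s = weighted_sd(values, weights)
    return [50 + 10 * (v - m) / s for v in values]


def sidak_adjust(p_values):
    """Sidak-adjusted p-values for m = len(p_values) comparisons."""
    m = len(p_values)
    return [1 - (1 - p) ** m for p in p_values]
```

An adjusted p-value from `sidak_adjust` would then be compared with the 0.05 threshold.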