Consumer Assessment of Healthcare Providers and Systems (CAHPS®) survey of experiences with ambulatory healthcare for Asians and non-Hispanic Whites in the United States

Background Differences in experiences of care reported by Asian Americans (Asians) compared to non-Hispanic Whites (Whites) may be due to lack of measurement invariance. Methods We evaluated the three-factor structure and the equivalence of responses to the Consumer Assessment of Healthcare Providers and Systems (CAHPS®) Clinician and Group (CG-CAHPS) Adult Visit Survey 1.0 and compared care experiences of Asians and Whites. Thirteen questions were used to elicit reports about specific aspects of care and two questions assessed overall care perceptions. This analysis of the CAHPS database included 769 providers and 266,327 respondents. Most surveys (98%) were administered by mail and the rest (2%) by phone. Only 0.5% of the surveys were administered in Spanish. The sample was 64% female, 89% White and 2% Asian; 39% were 65 years or older, and 32% were high school graduates or less. Results A three-factor model was supported by categorical confirmatory factor analysis using weighted least squares with mean and variance adjustment (comparative fit index (CFI) = 0.99 and root mean squared error of approximation (RMSEA) = 0.03). A multi-group configural invariance model also fit the data well (CFI = 0.993, RMSEA = 0.031). Regression models indicated that Asians reported worse access, lower scores on office staff courtesy and helpfulness, and lower ratings of their doctors, and were less likely to recommend their doctors to family/friends, than Whites. Conclusions Use of the CG-CAHPS Adult Visit Survey 1.0 to assess perceptions of care by Asians and Whites is supported. Quality improvement efforts are needed to address worse experiences of care among Asians in the United States. Supplementary Information The online version contains supplementary material available at 10.1186/s41687-021-00303-3.


Background
Patient evaluations of care are used by health plans, physician groups, hospitals, and other health care providers to inform patients about their options and improve quality of care [1][2][3][4]. Several studies have found racial/ethnic disparities in patient experiences with care. For example, Snyder and colleagues [5] found that Asian Americans reported worse access to care (wait times, reaching the doctor's office via phone) than all other racial/ethnic groups. Phillips et al. [6] analyzed the 1996 Medical Expenditure Panel Survey data and found that African Americans, Hispanics, Asian Americans and Pacific Islanders were less satisfied with care than non-Hispanic Whites, and Asian Americans were the most dissatisfied. Asian Americans were also less satisfied with care in the 1998 National Research Corporation Healthcare Market Guide® survey [7]. Morales et al. [8] analyzed Consumer Assessment of Healthcare Providers and Systems (CAHPS®) Health Plan 1.0 survey data collected from 54 commercial and 34 Medicaid health plans and found that most minorities reported experiences similar to those of non-Hispanic Whites, except for Asian Americans, who expressed worse perceptions of care. Weech-Maldonado and colleagues [9] found that racial/ethnic minorities fared worse than non-Hispanic Whites on the CAHPS Health Plan 2.0 survey. Several other studies have confirmed worse experiences with care for Asian Americans [10,11].
Comparisons between Asian Americans and non-Hispanic Whites require measurement equivalence between the two racial/ethnic groups. Dyer and colleagues [12] found support for three factors for the CG-CAHPS Adult Visit 1.0 Survey: access to care, physician communication, and office staff courtesy and helpfulness. However, the authors used a CFA model that assumes continuous measures, whereas the appropriate model for the analysis would be categorical factor analysis. The authors also included in the analysis only respondents who answered all CG-CAHPS core questions (21,318 out of 103,442 respondents). Factor analyses of the CAHPS Medicare survey provided support for measurement equivalence between non-Hispanic Whites (n = 1,326,410) and Asians (n = 40,672) for the composites (communication, access, customer service) and global ratings [13].
This paper evaluates measurement equivalence of the CG-CAHPS Adult Visit Survey 1.0 composites (access to care, physician communication, and office staff courtesy and helpfulness) between non-Hispanic Whites and Asians. Prior to evaluating measurement equivalence, we assess whether the three-factor structure of the CG-CAHPS items observed by Dyer et al. [12] is replicated using categorical confirmatory factor analysis in this dataset. This must be established before evaluating measurement equivalence of the CAHPS survey for non-Hispanic Whites and Asians. Finally, we compare reports and ratings of care between these two racial/ethnic subgroups.

Sample
The dataset used in this analysis includes 769 providers and 266,327 respondents. Most surveys (98%) were administered by mail and the rest (2%) by phone. Only 0.5% of the surveys were administered in Spanish. Sample characteristics and data missingness are provided in Table 1. Respondents were predominantly non-Hispanic White and educated.

Instrument
The CG-CAHPS Adult Visit Survey 1.0 asks patients about their most recent visit to a doctor (a primary care doctor or a specialist). Thirteen questions are used to create three reporting composites: access to care (6 questions), physician communication (5 questions) and office staff courtesy and helpfulness (2 questions). There are also two global rating questions. Table 2 shows the 15 questions used to elicit reports and ratings of care and their response options.
The access to care composite items ask about timeliness of care, timeliness of responses to medical questions over the phone and waiting times for appointments during the last 12 months using four-category response options (Always, Usually, Sometimes, Never). The physician communication composite questions refer to the most recent visit and ask whether the doctor listened carefully, gave clear instructions, showed respect, spent enough time with the respondent and seemed knowledgeable about the respondent's medical history, using a three-category response scale (Yes, definitely; Yes, somewhat; No). The office staff courtesy and helpfulness composite questions refer to the most recent visit and ask whether clerks and receptionists were helpful and treated the respondent with courtesy and respect, using the same three-category response scale used for physician communication. The global rating items ask respondents (1) to rate their doctor on a 0-10 rating scale where 0 is the worst possible doctor and 10 is the best possible doctor and (2) whether they would recommend their doctor to family and friends, using the three-category response scale (Yes, definitely; Yes, somewhat; No).
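The composites and ratings are reported on a 0-100 range. As a minimal sketch of how categorical responses could be mapped onto that range and averaged into a composite, the following assumes a simple linear rescaling with higher values meaning better experiences. The numeric codings and the function names are illustrative assumptions, not the scoring rules used by the authors.

```python
from statistics import mean

# Hypothetical numeric codings (assumption): higher = better experience.
FOUR_CATEGORY = {"Never": 1, "Sometimes": 2, "Usually": 3, "Always": 4}
THREE_CATEGORY = {"No": 1, "Yes, somewhat": 2, "Yes, definitely": 3}


def rescale_0_100(value, minimum, maximum):
    """Linearly map a raw item score onto the 0-100 range."""
    return 100.0 * (value - minimum) / (maximum - minimum)


def composite_score(responses, coding):
    """Average the rescaled items a respondent answered.

    `responses` holds response labels, with None for items that were
    skipped or did not apply. Returns None if no item was answered.
    """
    lo, hi = min(coding.values()), max(coding.values())
    scored = [rescale_0_100(coding[r], lo, hi) for r in responses if r is not None]
    return mean(scored) if scored else None
```

For example, `composite_score(["Always", "Never"], FOUR_CATEGORY)` averages a 100 and a 0. Averaging only the answered items is one possible way to handle questions that did not apply to a respondent.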
Two questions in the CAHPS surveys ask about race/ethnicity. The first question asks if the respondent is of Hispanic or Latino origin. The next question asks whether the respondent is: 1) White, 2) Black or African American, 3) Asian, 4) Native Hawaiian or Pacific Islander, 5) American Indian or Alaska Native, or 6) other. As in the US Decennial Census, respondents confirming Hispanic or Latino origin were coded as Hispanic. Those who reported no Hispanic origin or had a missing value were coded in accordance with their self-reported race. The resulting race/ethnic categories were Hispanic, White, Black/African American, Asian and other; Native Hawaiians or Pacific Islanders, American Indians, Alaska Natives and other races were coded into the other group. The questionnaire also assessed patient gender, education, age, and health status (Table 1).
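The coding rule described above can be sketched as a small function. The category labels follow the text; the function name, the argument types and the handling of a missing race answer (returned as None) are assumptions made for illustration.

```python
# Mapping from the race question's response options to analysis categories.
RACE_TO_CATEGORY = {
    "White": "White",
    "Black or African American": "Black/African American",
    "Asian": "Asian",
    "Native Hawaiian or Pacific Islander": "Other",
    "American Indian or Alaska Native": "Other",
    "Other": "Other",
}


def code_race_ethnicity(hispanic, race):
    """Return the analysis category for one respondent.

    hispanic: True, False, or None (missing answer).
    race: one of the keys of RACE_TO_CATEGORY, or None (missing answer).
    """
    # Hispanic or Latino origin takes precedence over self-reported race.
    if hispanic is True:
        return "Hispanic"
    # No Hispanic origin, or a missing value: fall back to self-reported race.
    # Assumption: a missing race answer stays missing (None).
    return RACE_TO_CATEGORY.get(race)
```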

Analysis plan
The evaluation of factor structure and measurement invariance was limited to the White and Asian subgroups. The assessment of differences in care experiences for these two subgroups was performed in a related rather than isolated model environment using the full sample to improve model fitting.

Evaluation of factor structure
We assessed a three-factor structure (access to care, physician communication and office staff courtesy and helpfulness), shown in the online supplemental material (Figure S1). We evaluated whether a multi-level model was necessary using an intraclass correlation threshold of 0.10 or higher [14]. We conducted categorical confirmatory factor analysis with Mplus 7 using a robust weighted least squares estimation procedure, the weighted least squares mean and variance adjusted (WLSMV) estimator. We used theta parameterization and full information maximum likelihood estimation. Theta parameterization is a model specification method that considers variance differences across groups [15]. We evaluated model fit using a parsimony correction index, the root mean squared error of approximation (RMSEA), and the comparative fit index (CFI). RMSEA values of 0.05 or less are considered a close fit. CFI values of 0.95 or greater are considered an acceptable fit [16]. We used a standardized factor loading of 0.40 or greater as the threshold for the appropriateness of an item for the proposed factor [17]. We estimated correlations among factors, with very high correlations (0.80 or above) implying that factors are redundant.
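To make the fit criteria concrete, the sketch below computes RMSEA and CFI from chi-square values using their common textbook definitions and then applies the thresholds stated above. The formulas are the standard single-group forms; software such as Mplus may apply variants (for example for multiple groups or for WLSMV), so this is an illustration under that assumption and not a reproduction of the authors' output. All function names are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """Textbook single-group RMSEA from a model chi-square, df and sample size."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_baseline - df_baseline, d_model, 0.0)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def meets_criteria(rmsea_value, cfi_value, loadings, factor_correlations):
    """Apply the thresholds named in the text to one fitted model."""
    return {
        "close_rmsea": rmsea_value <= 0.05,      # close fit
        "acceptable_cfi": cfi_value >= 0.95,     # acceptable fit
        "loadings_ok": all(abs(l) >= 0.40 for l in loadings),
        "factors_distinct": all(abs(r) < 0.80 for r in factor_correlations),
    }
```

A call such as `meets_criteria(0.03, 0.99, [0.57, 0.90], [0.46, 0.60])` returns a dictionary in which every check passes.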

Evaluation of measurement invariance
We fit multiple group confirmatory factor analysis (MG CFA) models to assess measurement invariance between Asians and Whites. Configural invariance confirms that the number of factors and the pattern of indicator-factor loadings are identical across the groups. Metric invariance confirms equality of factor loadings, and scalar invariance confirms equality of both factor loadings and intercepts [18].
In the first step, we performed single group CFA separately in the White and Asian subgroups to confirm that the measurement model had an acceptable fit in the two groups. Model fit indices and parameter estimates for the two groups were evaluated. In the second step, we conducted a test of configural invariance to explore whether the pattern of free factor loadings and thresholds was similar across groups. Lack of configural invariance implies that different latent variables may have been measured in the two groups. In the third step, we first constrained factor loadings to equality in the two groups to conduct a metric invariance test. Evidence of metric invariance implies that latent variables are related to survey items in the same way across groups. That is, a one-unit change in item score translates to the same unit change in estimated latent variable score in both groups. Once the metric invariance test results showed an acceptable model fit, a new constraint (equality of thresholds across groups) was added to test for scalar invariance. Evidence of scalar invariance implies that individuals with the same score on a latent variable are likely to have the same score on the survey items corresponding to the latent variable. Thus, score means across groups can be compared validly.
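The three levels of invariance can be written compactly for a categorical factor model. The notation below is a standard textbook formulation introduced here for illustration (it does not appear in the paper): $y^{*}_{ij}$ is the latent response underlying ordinal item $j$ for respondent $i$, $k(j)$ is the factor on which item $j$ loads, $\lambda_j$ is the loading, $\tau_{j,c}$ are the thresholds, and the superscript $g \in \{A, W\}$ indexes the Asian and White groups.

```latex
% Latent response model for item j in group g
y^{*(g)}_{ij} = \lambda^{(g)}_{j}\,\eta^{(g)}_{i,k(j)} + \varepsilon^{(g)}_{ij},
\qquad
y^{(g)}_{ij} = c \iff \tau^{(g)}_{j,c} < y^{*(g)}_{ij} \le \tau^{(g)}_{j,c+1}

% Configural: same assignment k(j) of items to factors in both groups;
%             loadings and thresholds free in each group.

% Metric: equal loadings across groups
\lambda^{(A)}_{j} = \lambda^{(W)}_{j} \quad \text{for all items } j

% Scalar: equal loadings and equal thresholds across groups
\lambda^{(A)}_{j} = \lambda^{(W)}_{j}, \qquad
\tau^{(A)}_{j,c} = \tau^{(W)}_{j,c} \quad \text{for all } j, c
```

Each level adds constraints to the previous one, so the models are nested and the number of added constraints determines the difference in degrees of freedom between them.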
We performed two versions of a metric invariance test. In the first version, we followed an approach suggested by Brown [18], where we fixed factor means to 0 in both groups and factor variances to 1 in one group and placed no constraints in the other. This model produced a difference of 12 constraints (difference in degrees of freedom) between the configural and metric models, while we placed a constraint on 15 parameters (15 factor loadings). In order to address this issue, we fixed factor means to 0 and factor variance to 1 in both groups and obtained a difference in degrees of freedom corresponding to the number of constrained parameters.
We also performed two versions of scalar invariance testing. In the first approach (proposed in [18]), we fixed factor means to 0 and factor variances to 1 in one group and set both parameters free in the second group. The test results showed a difference in degrees of freedom between the two models to be 33, while we placed constraints on 36 parameters. In the second approach, we fixed factor means to 0 and factor variances to 1 in both groups, which yielded a difference in degrees of freedom of 36.
In nested models, chi square values are generally expected to be higher for the more constrained models than for the less constrained models. In our analysis, chi square values in the metric models were consistently lower than those in the configural models. Chi square values from nested models are considered not to be directly comparable when WLSMV estimation is used. This limits the evaluation of model fit, as many practical fit indices are derived from chi square values. To confirm the model findings, we also estimated models with MLR estimation. The MLR-based models also produced lower chi square values in the metric models than in the configural models. Therefore, in the final step, we ran the models with ML estimation. The chi square values in more constrained models were higher than those in less constrained models when using ML estimation, thus permitting comparison of model fit indices across models. CFI values in nested models using the WLSMV and MLR estimations may not be directly comparable [19].
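When fit indices are comparable across nested models, successive models can be compared by the change in an approximate fit index. The sketch below flags a step as supported when CFI drops by no more than 0.01. That cutoff is a commonly cited rule of thumb and an assumption here; the paper does not state the cutoff it applied. The CFI values in the example are hypothetical, not the study's results.

```python
def compare_nested(fits, max_cfi_drop=0.01):
    """Compare successive nested models by their change in CFI.

    fits: list of (label, cfi) tuples ordered from least to most constrained.
    Returns (label, cfi_drop, supported) for every model after the first.
    The 0.01 cutoff is an assumed rule of thumb, not taken from the paper.
    """
    results = []
    for (_, prev_cfi), (label, cfi_value) in zip(fits, fits[1:]):
        drop = prev_cfi - cfi_value  # positive when fit worsens
        results.append((label, round(drop, 3), drop <= max_cfi_drop))
    return results


# Hypothetical CFI values for illustration only.
example = [("configural", 0.990), ("metric", 0.988), ("scalar", 0.970)]
steps = compare_nested(example)
```

In this made-up example the metric step would be supported and the scalar step would not, because CFI falls by more than the assumed cutoff between the metric and scalar models.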

Evaluation of racial-ethnic differences in patient experiences with care
First, an ordinary least squares (OLS) model was run with the main effects in the model, adjusting for practice site using the cluster option in STATA 14. Adjusted scores (means) for Asians and Whites for composites and global ratings were estimated using recycled predictions in STATA 14. Recycled predictions are obtained from regression models and are used to understand the marginal effect of an independent variable on a dependent variable while the independent variables other than the one of primary interest are held fixed. We fixed the covariates (age, gender, levels of education and self-reported health) at their means.
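The idea behind adjusted scores can be sketched in a few lines: take the coefficients of a fitted linear model, hold the covariates at their sample means, and switch only the group indicator. The coefficients, the intercept and the variable names below are hypothetical and chosen for illustration; they are not estimates from this study, and the sketch does not reproduce STATA's implementation.

```python
def predict(coefs, intercept, row):
    """Linear prediction: intercept plus the sum of coefficient * value."""
    return intercept + sum(coefs[name] * row[name] for name in coefs)


def adjusted_means(coefs, intercept, covariate_means, group_var="asian"):
    """Predicted score for each group with all other covariates at their means."""
    scores = {}
    for label, value in (("White", 0), ("Asian", 1)):
        row = dict(covariate_means)  # copy so the input is not modified
        row[group_var] = value       # only the group indicator changes
        scores[label] = predict(coefs, intercept, row)
    return scores


# Hypothetical fitted model (illustration only).
INTERCEPT = 70.0
COEFS = {"asian": -7.0, "age_65_plus": 2.0, "female": 1.0}
MEANS = {"age_65_plus": 0.39, "female": 0.64}

scores = adjusted_means(COEFS, INTERCEPT, MEANS)
```

In a linear model without interactions, the difference between the two adjusted scores equals the coefficient on the group indicator, whichever values the covariates are fixed at.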
Second, we evaluated possible two-way interactions between non-Hispanic Asian race/ethnicity and each independent variable in our model. Interaction terms were included in the model to explore whether care experiences reported by Asians differed from those of others in the sample by age, gender, level of education and reported health status. Non-significant interaction terms (p > 0.05) were excluded from the final model.
For the analyses of the 0-10 rating of the doctor and would recommend doctor to friend/family items, we initially considered estimating ordinal logistic regression models as a sensitivity analysis in addition to the OLS models. The Brant test was used to evaluate the proportional odds assumption. The assumption was not met for 16 out of 20 variables in the 0-10 rating of the doctor model and 13 out of 20 variables in the would recommend doctor to friend/family model. Therefore, generalized ordinal logistic models were used.
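The difference between the two model families can be stated in one line each. The notation is a standard textbook form introduced here for illustration: $Y$ is an ordinal outcome with categories $1, \dots, J$, $x$ is the covariate vector, and $\alpha_j$ are cut-point intercepts.

```latex
% Proportional odds model: one coefficient vector for all cut points
\operatorname{logit} P(Y > j \mid x) = \alpha_j + x^{\top}\beta,
\qquad j = 1, \dots, J-1

% Generalized ordinal logistic model: coefficients may vary by cut point
\operatorname{logit} P(Y > j \mid x) = \alpha_j + x^{\top}\beta_j,
\qquad j = 1, \dots, J-1

% The Brant test examines the restriction
H_0 : \beta_1 = \beta_2 = \dots = \beta_{J-1}
```

When the restriction is rejected for a variable, as reported above for most variables, the generalized form lets that variable's effect differ across cut points.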
Positive response tendencies can be corrected by standard case-mix adjustments (e.g. for age, education) in a regression analysis. Extreme response tendencies, on the other hand, are not addressed by case-mix adjustments in the presence of skewed data. Weech-Maldonado et al. [20] recommended pooling responses at the lower end (0-6) and the top end (9-10) when examining racial/ethnic differences in CAHPS ratings of care. In this paper, we used this categorization approach (0-6, 7-8, 9-10) in a generalized logistic regression analysis.
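The three-category pooling of the 0-10 rating described above can be sketched as follows. The function name and the category labels are illustrative choices.

```python
def collapse_rating(rating):
    """Pool a 0-10 doctor rating into the categories 0-6, 7-8 and 9-10."""
    if not 0 <= rating <= 10:
        raise ValueError("rating must be between 0 and 10")
    if rating <= 6:
        return "0-6"
    if rating <= 8:
        return "7-8"
    return "9-10"
```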

Results
Response rates for the survey were reported by 470 of the 769 physician groups. The lowest reported response rate was 6%, the highest 97% and the median 35%. Because these are self-reported by sponsors and could be biased, appropriate caution is warranted.
Substantive CAHPS survey questions are only asked of those to whom they apply. Inappropriately missing values ranged from 1% to 6%. Means and standard deviations for the CAHPS items and composites are provided in Tables 3 and 4. Reports about care and global rating items are negatively skewed, with skewness ranging from −0.93 to −4.52. Intraclass correlations for the three factors ranged from 0.03 to 0.10 (access to care: 0.10; physician communication and global ratings: 0.04; office staff helpfulness and courtesy: 0.03), supporting individual-level factor analyses.
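An intraclass correlation expresses the share of score variance that lies between providers. As a rough illustration, the sketch below computes the one-way ANOVA estimator, ICC(1), for continuous scores grouped by provider, allowing unequal group sizes. This is an assumption about one common way to obtain such a quantity; the values reported above come from the authors' software and may have been computed differently for categorical items.

```python
def icc1(groups):
    """One-way ANOVA intraclass correlation, ICC(1), for unbalanced groups.

    groups: list of lists of scores, one inner list per provider.
    """
    groups = [g for g in groups if g]           # drop empty providers
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Adjusted average group size for unequal group sizes.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)
```

When all variation lies between providers the function returns 1, and values below the 0.10 threshold named in the methods would be read as not requiring a multi-level model.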

Factor structure
The practical fit indices for the three-factor model were acceptable: CFI was 0.99 and RMSEA was 0.03. All factor loadings were statistically significant (p < 0.01) and above the 0.40 cut point (Table 4). The smallest loading was found for the item asking how often a respondent was seen within 15 min of an appointment (0.57). None of the estimated correlations among factors were above the 0.80 threshold; they ranged from 0.46 to 0.60. Single group CFA model outputs supported an acceptable model fit for both Asians and non-Hispanic Whites (Table 5). The model for Asians produced RMSEA (upper and lower limits of the 90% confidence interval) = 0.037 (0.040; 0.034), CFI = 0.991 and TLI = 0.989, and the model for Whites produced RMSEA = 0.031 (0.031; 0.030) and CFI = 0.994. All factor loadings were statistically significant at p < 0.001 and greater than 0.50. While the loadings for Asians were uniformly higher than for Whites for the access to care and office staff courtesy and helpfulness composites, the loadings for physician communication and global ratings did not differ between the two subgroups. The magnitude of difference in the standardized loadings between the two groups was the greatest for the item on timely response to a medical question after office hours (0.075; p < 0.01).

Table 2 (excerpt): CG-CAHPS Adult Visit Survey 1.0 items
Access to care
1. In the last 12 months, when you phoned this doctor's office to get an appointment for care you needed right away, how often did you get an appointment as soon as you thought you needed?
2. In the last 12 months, when you made an appointment for a check-up or routine care with this doctor, how often did you get an appointment as soon as you thought you needed?
3. In the last 12 months, when you phoned this doctor's office during regular office hours, how often did you get an answer to your medical question that same day?
4. In the last 12 months, when you phoned this doctor's office after regular office hours, how often did you get an answer to your medical question as soon as you needed?
5. Wait time includes time spent in the waiting room and exam room. In the last 12 months, how often did you see this doctor within 15 min of your appointment time?
Office staff courtesy and helpfulness
1. During your most recent visit, were clerks and receptionists at this doctor's office as helpful as you thought they should be?
2. During your most recent visit, did clerks and receptionists at this doctor's office treat you with courtesy and respect?

Measurement invariance
In the MG CFA analysis using WLSMV estimation, the fit indices for the configural invariance model were in the acceptable range: RMSEA (upper and lower limits of the 90% confidence interval) = 0.031 (0.031; 0.030), CFI = 0.993. We conducted two versions of metric and scalar invariance testing using WLSMV estimation. Both versions of the tests showed an acceptable model fit (Table 5). The metric invariance model with factor means fixed to 0 and factor variances fixed to 1 in both groups produced a slightly better fit than the model with factor means fixed to 0 in both groups and factor variances fixed to 1 in one group and no constraints placed in the other group (RMSEA = 0.024 vs. 0.027, CFI = 0.996 vs. 0.994). Neither of the scalar invariance models produced consistently better fit indices (RMSEA = 0.024 vs. 0.025, CFI = 0.995 vs. 0.994). Both MLR and ML models supported metric invariance when using approximate fit indices.
Table S1 presents recycled predictions (main effects and interaction terms models). Of all five racial/ethnic groups, Asian Americans reported the worst access (predicted scores: Asian Americans 72.10, non-Hispanic Whites 79.10, African Americans 79.01, Hispanics 77.78 and Other 77.97) and the lowest scores, indicating the worst experiences, on the office staff courtesy and helpfulness measure (predicted scores: Asian Americans 92.74, non-Hispanic Whites 94.85, African Americans 95.26, Hispanics 94.55 and Other 93.60). Asian Americans also reported the worst scores on rating their doctors and were the least likely to recommend their doctors to family and friends of all five racial/ethnic groups. There were no significant differences between Asian Americans and non-Hispanic Whites on physician communication. The "other" racial/ethnic group reported the worst physician communication.

Online Supplementary Material
We were unable to explore regional variations in reports and ratings among Asian Americans in our main model due to collinearity between the practice site and region. We ran a secondary model where we replaced the practice site with the region and found that Asian Americans from the Northeast report better experience than Asian Americans from the West. Asian Americans in the South rate care worse on most measures than Asian Americans in the West, Midwest and Northeast.
Several interactions between Asian American race/ethnicity and gender, age, education, and health were significant. For example, Asian Americans who rate their health as excellent reported better experience than Asian Americans with other self-reported health states. Asian Americans in the 45-54 age group reported worse access to care compared to Asian Americans of other ages. Asian Americans with less than high school education had the worst access among Asian Americans of various education levels. However, the interactions were not in a consistent direction.
The findings from the generalized ordinal logistic models for the 0-10 rating of the doctor and would recommend doctor to friend/family items were in general consistent with the OLS model results and are presented in the Online Supplemental Material, Table S2.

Discussion
The categorical confirmatory factor analysis showed that the three-factor structure fits well in our dataset, in line with the findings from the continuous factor analysis conducted by Dyer et al. [12]. All the items loaded significantly onto their respective factors. The lowest loading was observed for the item on wait times at the doctor's office. This item has been shown to load weakly on access-related factors in other CAHPS surveys as well [21,22]. Our study provides support for measurement invariance between Asians and Whites in the CG-CAHPS Adult Visit Survey 1.0 measures of access to care, physician communication and global ratings, and office staff courtesy and helpfulness. The criteria for both metric and scalar invariance were met, suggesting that the mean differences reported in the CG-CAHPS Adult Visit Survey 1.0 between these two groups were likely to be due to differences in care experiences. Asian Americans reported the worst scores on access to care, office staff courtesy and helpfulness, and rating of their doctors, and were the least likely to recommend their doctors to family and friends, of all five racial/ethnic groups. Given support for measurement invariance in our analyses, we did not explore differential item functioning further [23,24].
Our study makes several important contributions to the literature. Earlier studies analyzed Asian Americans and Pacific Islanders together; a sufficiently large sample of Asian Americans in our dataset allowed us to analyze Asian Americans separately from Pacific Islanders. Our findings also show that regional variations in patient experiences among Asian Americans exist. The underlying reasons for regional variations among Asian Americans in CAHPS surveys are little studied and require further research.
Racial-ethnic disparities can be driven by differential access or selection into plans or providers of differing quality. However, several studies report that "within provider" differences account for a significant share of disparities between Asian Americans and non-Hispanic Whites. In our study, we control for the "between providers" effects by including provider identifiers in our model.
Table note: All measures were scaled on a 0-100 possible range and the observed minimum and maximum were 0 and 100 for each variable.
While several studies have evaluated the factor structure of CAHPS surveys, only Dyer et al. [12] have evaluated the CG-CAHPS Adult Visit survey 1.0 and this was done using factor analysis that assumed continuous items and within only a subset of the sample. We conducted MG CFA using three estimation approaches: theta parametrization and WLSMV estimation (used for categorical variables), MLR and ML estimation. Our analysis also found that chi square values may not be directly comparable across models in MG CFA analyses when MLR estimation is used with categorical data. Further research will be needed to explore whether this is supported in other datasets.
Our study has several limitations. Although the sample analyzed represented respondents with various racial/ethnic and demographic backgrounds, the respondents were predominantly White and educated. In addition, response rates were not available in the dataset. CAHPS adult surveys in samples other than Medicare tend to yield response rates below 40%. If non-responders differ in how they interpret and respond to survey questions, the study results may not generalize to patients in care more generally. Moreover, the dataset did not include information about insurance and that could explain some of the observed differences. Furthermore, we did not have information about Asian subgroups, English [...]
In conclusion, the validity of racial/ethnic comparisons of reports of patient experiences is critical to informing quality improvement initiatives and policy decisions. The findings of this study support the use of the CG-CAHPS Adult Visit Survey 1.0 measures in comparisons of patient perceptions of care across Asians and Whites in the US. The differences in reports and ratings of care reported between Asians and Whites in the CG-CAHPS Adult Visit Survey 1.0 data are likely due to differences in care experiences. Future studies are required to explore the underlying reasons for the racial-ethnic differences in physician communication, access to care and office staff support. These studies should aim to inform care providers and payers about underlying reasons for differences and help to tailor quality improvement initiatives to address racial-ethnic disparities in care.
a Factor means fixed to 0 in both groups and factor variances fixed to 1 in one group and freed in the second group
b Factor means fixed to 0 and factor variances fixed to 1 in both groups