The Arabic DASH was subjected to the requirements of the Rasch measurement model in the current study. Initially, the Arabic DASH did not satisfy these requirements, with several areas of deviation including misfitting items, dysfunctional response categories, significant violation of local item independence, and lack of unidimensionality. After the local dependency was accommodated by creating 2 super items reflecting upper extremity activity limitation and impairment, the Arabic DASH satisfied the requirements of the Rasch measurement model, indicating that the scale is a fairly unidimensional measure of upper extremity activity limitation and impairment.
Item 21 (sexual activities) was the only item that had missing responses (no response). In the Arabic DASH, item 21 was also the only item labeled as optional, which might explain why only this item had missing responses. Given that item 21 covers a sensitive issue, cultural factors might also have contributed to some of the missing responses. The optional nature of the item and its high missing rate led to the removal of item 21 before conducting the Rasch analysis in the current study.
After removing misfitting patients, items 30 (confidence), 26 (tingling), and 29 (sleep) were misfitting based on their fit residuals and chi-square statistics. This indicates that the behavior of these items did not follow the expectations of the Rasch measurement model. The non-conformity of these items to the Rasch measurement model suggests that they do not belong to the major underlying trait measured by the majority of the scale items, upper extremity activity limitation. The violation of local item independence observed in the current study might have driven these items to deviate from the measurement model [29, 33]. Although these items were individually misfitting, the activity limitation and impairment testlets that were created to accommodate the local dependency showed good fit to the Rasch measurement model, allowing the retention of these items. In patients with rheumatoid arthritis, Prodinger et al. reported good fit of the same 2 testlets (activity limitation and impairment testlets) used in our study, providing support for the findings of the current study. Studies that examined the fit of the DASH to the Rasch measurement model reported a number of misfitting items ranging from 1 to 16 [15,16,17,18,19, 35,36,37,38,39,40]. All of the misfitting items in our study have been reported in the literature to deviate from the Rasch measurement model in different upper extremity musculoskeletal populations, including patients with elbow disorders, shoulder disorders, hand disorders, Dupuytren’s contracture, humeral shaft fracture, and various upper extremity disorders [15,16,17,18,19, 35,36,37]. Item 26 (tingling) was the most consistently reported misfitting item across these studies, followed by item 30 (confidence). Items 26 (tingling) and 30 (confidence) have also been reported to misfit the Rasch model in patients with neurological disorders affecting upper extremity function [38, 39].
Proper functioning of the scale response categories manifests as ordered coverage of the underlying continuum, where each response category becomes the most probable option in part of the continuum (Fig. 1). Under this ordered coverage, the response option “no difficulty” should be the most probable option for individuals with a high level of upper extremity function, followed by “mild difficulty”, “moderate difficulty”, “severe difficulty”, and then “unable” as the level of upper extremity function decreases. Improper functioning of the response categories, indicated by disordered thresholds, was detected in 11 items in the Arabic DASH (Table 4). This indicates that the response categories in these items were not used in the expected manner, either because the wording of the response categories was not clear or because patients were not able to discriminate between them. Another potential reason for the disordered thresholds in the Arabic DASH is the significant violation of local item independence, which could cause item thresholds to be disordered. All items with disordered thresholds in the Arabic DASH showed violation of local item independence, as indicated by high residual correlations. Prior studies that examined the internal structure of the DASH also pointed to disordered thresholds, suggesting improper functioning of the scale response categories [15,16,17,18, 35, 37,38,39]. The number of items with disordered thresholds reported in the literature ranged from 2 items in patients with hand and elbow disorders [15, 35] to 19 items in patients with various upper extremity disorders.
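The idea of ordered coverage can be illustrated with a minimal sketch. It assumes a partial credit-style parameterization for a single five-category item. The function names, the threshold values, and the grid of trait levels are hypothetical and are not taken from the study or from the software used in the analysis. Here theta runs in the direction of the item score, so higher values correspond to more difficulty. The sketch checks whether thresholds are in increasing order and which categories are ever the most probable response.

```python
import math


def category_probabilities(theta, item_location, thresholds):
    """Category probabilities for one polytomous item (partial credit form).

    thresholds holds m values (relative to item_location) for m + 1 categories.
    """
    # Log-numerator of category k is the running sum of (theta - location - tau_j);
    # category 0 has log-numerator 0.
    log_num = [0.0]
    for tau in thresholds:
        log_num.append(log_num[-1] + (theta - item_location - tau))
    peak = max(log_num)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_are_ordered(thresholds):
    """True when every threshold is larger than the one before it."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


def modal_categories(item_location, thresholds, lo=-6.0, hi=6.0, steps=241):
    """Categories that are the most probable response somewhere on a theta grid."""
    modal = set()
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        probs = category_probabilities(theta, item_location, thresholds)
        modal.add(probs.index(max(probs)))
    return modal


# Hypothetical threshold sets, for illustration only.
ordered = [-2.0, -0.7, 0.7, 2.0]
disordered = [-2.0, 0.7, -0.7, 2.0]

print(thresholds_are_ordered(ordered), sorted(modal_categories(0.0, ordered)))
print(thresholds_are_ordered(disordered), sorted(modal_categories(0.0, disordered)))
```

With the hypothetical disordered set, one middle category is never the most probable response at any trait level, which is the pattern described above as improper functioning of the response categories.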
The Arabic DASH showed a major violation of local item independence, indicating response dependency and multidimensionality [28, 33]. Local dependency was observed between 37 item pairs. This indicates that these items have something in common other than the measured underlying trait, upper extremity function, which violates the requirement of local item independence [20, 21, 27, 29]. This violation caused the scale to deviate from the Rasch measurement model. The accommodation of local dependency within the scale by creating testlets (activity limitation and impairment testlets) led to satisfactory fit of the Arabic DASH to the Rasch measurement model, supporting the validity of the scale. Examining the item pairs with residual correlations above the predetermined threshold indicates that most of these items inquire about similar functional activities or symptoms. Items 18 (recreational: force) and 19 (recreational: free arm) had the highest bivariate residual correlation after the removal of the underlying trait “upper extremity function”. Both items inquire about the difficulty encountered in recreational activities, either while taking some force through the arm (item 18) or in free arm movement (item 19). The similar content of the 2 items and their possible redundancy might explain the high residual correlation observed. The items with the second highest residual correlation were items 10 (carry shopping bag) and 11 (carry heavy object). Both items are related to the same functional activity, “carrying”, and the response to one item is likely to be linked to or dependent on the response to the other. For example, if a patient was able to carry a heavy object over 4.5 kg (item 11), then that patient would be able to carry a shopping bag (item 10). This problem of response dependency detected in the Arabic DASH is of concern given that it artificially inflates reliability and influences person estimates [28, 29].
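The screening of item pairs for local dependency can be sketched as follows. This is a minimal illustration, assuming that a standardized residual is available for every person on every item. The function names, the toy residuals, and the cutoff of 0.3 are hypothetical. They are not the study's data, the study's predetermined threshold, or the procedure of any particular Rasch software.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists of residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant column; treat as 0
    return sxy / math.sqrt(sxx * syy)


def flag_dependent_pairs(residuals, cutoff):
    """Return (item_a, item_b, r) for pairs whose residual correlation exceeds cutoff.

    residuals maps an item label to that item's residuals, one per person,
    with persons in the same order for every item.
    """
    flagged = []
    for a, b in combinations(sorted(residuals), 2):
        r = pearson(residuals[a], residuals[b])
        if r > cutoff:
            flagged.append((a, b, r))
    # Strongest dependency first
    return sorted(flagged, key=lambda pair: -pair[2])


# Toy residuals for three hypothetical items and five persons.
toy_residuals = {
    "A": [1.0, -0.5, 0.8, -1.2, 0.3],
    "B": [0.9, -0.4, 0.7, -1.0, 0.2],
    "C": [-0.2, 0.6, -0.9, 0.1, 0.5],
}

for item_a, item_b, r in flag_dependent_pairs(toy_residuals, cutoff=0.3):
    print(item_a, item_b, round(r, 3))
```

In this toy example only the pair A and B is flagged, mirroring the situation in which two items with similar content remain correlated after the underlying trait has been removed.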
Similar to the residual correlation observed between activity-related items, impairment-related items also exhibited high residual correlation. This pattern of residual correlation might be an indicator of multidimensionality, where these items represent an impairment-related dimension. Additionally, similar content might also explain some of the residual correlations among impairment-related items. Items 24 (pain), 25 (pain during activity), and 29 (sleep), for example, inquire about pain severity and pain-related sleep difficulty. This enquiry about the same impairment might constitute the shared concept that leads to the high residual correlation even after the removal of the major underlying factor. Consistent with our findings, the DASH has been reported to violate the requirement of local item independence. A pattern of high residual correlation similar to ours, where activity-related items group together while impairment-related items group together, has been reported in patients with various upper extremity musculoskeletal disorders [15, 16, 18, 34, 36, 37]. Similar to the approach used in the current study, Prodinger et al. reported the use of two testlets (activity limitation and impairment testlets) to accommodate the issue of local dependency within the scale, and this method yielded satisfactory fit to the Rasch model, in line with the findings of the current study.
The Rasch measurement model is a unidimensional model in which the probability of being able to perform an activity is governed by a single person factor, the person's ability (the level of upper extremity function possessed by the patient). We believe that the major breach of local item independence was the main reason why the Arabic DASH did not initially satisfy the requirement of unidimensionality. After the accommodation of the local dependency by the creation of activity limitation and impairment testlets, the Arabic DASH satisfied the requirement of unidimensionality, as suggested by the principal component analysis of residuals followed by the t-test [29, 33]. Although items were grouped into two groups, similar to having two subscales, the majority of the common non-error variance was retained by the two testlets, suggesting that the scale has one general factor. These results support the unidimensionality of the Arabic DASH and the validity of providing one single summary score for the Arabic DASH. Similar to the findings of the current study, Prodinger et al. reported that the DASH was sufficiently unidimensional in patients with rheumatoid arthritis after addressing the issue of local dependency between items using testlets. The accommodation of local dependency through the use of testlets also led to unidimensionality of the Finnish DASH in patients with hand and wrist disorders. Lack of unidimensionality of the DASH has been reported in the literature in patients with hand disorders, Dupuytren’s contracture, various upper extremity musculoskeletal disorders [18, 19], and shoulder disorders, and also in patients with stroke. A number of these studies did not examine violation of local item independence [17, 19, 40].
On the other hand, the studies that did examine this violation used a high threshold for detecting local dependency, and thus may have underestimated the degree of dependency. They also did not examine the effect of accommodating the dependency using testlets on scale unidimensionality [16, 18, 36].
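For reference, the unidimensional model discussed in this paragraph can be written in its simplest, dichotomous form. This form is given for illustration only: the DASH items have five response categories, so a polytomous extension of this expression applies to the scale itself. The symbols are notational assumptions introduced here, with \theta_n denoting the location of person n on the latent trait and b_i the location (difficulty) of item i.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

Under this expression, the response probability depends on the person only through \theta_n, which is why a shared source of variance other than the measured trait, such as the local dependency described above, can appear as a departure from unidimensionality.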
The Arabic DASH items seem to have no uniform or non-uniform DIF by age, sex, surgical status, or affected side. This suggests that the scale items behave in a similar manner regardless of patient characteristics and that items were not biased toward any level of these characteristics (for example, toward males versus females). The lack of uniform and non-uniform DIF was also observed in the current study at the level of testlets, suggesting that the testlets are also invariant to patients’ characteristics. These measurement invariance results of the Arabic DASH at the item level and at the testlet level should be interpreted with caution given the limited number of participants in each subgroup, and a follow-up study might be needed to confirm our findings. To the best of our knowledge, the current study is the first to suggest that the DASH is invariant to the treatment received (surgical or non-surgical) and to whether the affected side was the dominant or the non-dominant side. Similar to the findings of the current study, the activity limitation and impairment testlets were reported to have no DIF by age in patients with rheumatoid arthritis, and the reported DIF by sex in the testlets was considered trivial and required no modifications. Individual DASH items have also been reported previously to have no DIF by sex [36, 37] or age. In contrast, a number of studies reported DIF by sex [15,16,17, 35, 41] and age [15, 36, 37, 41] in the DASH items, but the results of these studies were inconsistent regarding the number of items and the specific items exhibiting DIF.
This study represents the first attempt to examine the structural validity (internal construct validity) of the Arabic DASH. The Rasch measurement model pointed to areas of dysfunction in the behavior of the Arabic DASH items, mainly local dependency between items, that could not have been determined using classical test theory methods. The accommodation of the local dependency between items improved the internal structure of the Arabic DASH, resulting in an interval-level unidimensional measure of upper extremity function. The results of this study would help in guiding future modifications of the scale aimed at improving its validity. Although the sample size used in the current study is adequate for conducting Rasch analysis, it is at the lower end of what is considered adequate; thus, further testing of the measure with a larger number of participants might be needed to confirm the findings of the current study. Additionally, when the whole sample was split into subsamples for DIF analysis (e.g. male and female), the number of participants within each group was below the recommended number for examining scale invariance. Thus, caution should be exercised when interpreting the findings of the DIF analysis reported in the current study. The majority of participants in the current study had shoulder and arm disorders, followed by wrist and hand disorders, with few participants who had elbow and forearm disorders; thus, the results of the current study should be interpreted with caution, especially for patients with elbow and forearm disorders.