I have 400 responses to a 20-item questionnaire which purports to measure an attitudinal construct in medical students. The instrument was validated in the US on a single year of medical students, and the published data are very "clean": all corrected item-total correlation (ritc) values >0.3, alpha 0.84, PCA with a stable four-factor structure, etc. In my sample, 5 of the 20 items have ritc <0.2, and in a cultural subpopulation (n=70) these ritc values are zero or negative. If I retain all items, those with poor ritc either do not load on any factor or sort together into a 2-item factor (factor 4). I hypothesize (and would like to investigate) that this is due to either (i) a small cultural subpopulation for which the construct may be poorly captured, or (ii) the fact that I have responses from students across all stages of a programme and there is a developmental aspect to the construct that is poorly captured by the scale items. Is there a statistical test which will allow me to investigate this?
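To make the quantity being compared concrete, here is a minimal sketch of computing ritc (each item correlated with the sum of the remaining items) separately for each subgroup. It uses only NumPy and simulated data, not the real responses; the function names, the group labels "major"/"minor", and the 330/70 split are assumptions made purely for illustration.

```python
import numpy as np

def corrected_item_total(scores):
    """ritc: correlation of each item with the sum of all *other* items."""
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    ritc = np.empty(scores.shape[1])
    for j in range(scores.shape[1]):
        rest = total - scores[:, j]  # total score excluding item j
        ritc[j] = np.corrcoef(scores[:, j], rest)[0, 1]
    return ritc

def ritc_by_group(scores, groups):
    """Compute ritc separately within each level of a grouping variable."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    return {str(g): corrected_item_total(scores[groups == g])
            for g in np.unique(groups)}

# Simulated data (assumption): 400 respondents, 20 items, one common trait.
rng = np.random.default_rng(0)
n, k = 400, 20
trait = rng.normal(size=(n, 1))
scores = trait + rng.normal(size=(n, k))  # every item reflects the trait...
scores[:, -1] = rng.normal(size=n)        # ...except the last, which is pure noise
groups = np.where(np.arange(n) < 330, "major", "minor")  # hypothetical 330/70 split

full = corrected_item_total(scores)
result = ritc_by_group(scores, groups)
for name, values in result.items():
    print(name, np.round(values, 2))
```

With only 70 respondents in the smaller group, each ritc estimate there is noisy, so differences between groups in a table like this are descriptive rather than a formal test.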
Should items with low ritc be deleted from the scale, and if so, do I do this sequentially, starting with the lowest? At what point should I stop deleting items, and how do I know whether I have lost something from the questionnaire? If I want to compare the scale's factor structure between the major and minor subpopulations, how do I attempt this, or is the minor subsample too small to draw conclusions? Any references would be greatly appreciated.
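As an illustration of what deleting an item does to reliability, the sketch below computes Cronbach's alpha and "alpha if item deleted" with NumPy. The data are simulated (19 items driven by one trait plus one noise item), and the function names are hypothetical; it shows the arithmetic only and is not a rule for when to stop deleting items.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha from a respondents x items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    sum_item_var = scores.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)       # variance of the total score
    return k / (k - 1) * (1 - sum_item_var / total_var)

def alpha_if_deleted(scores):
    """Alpha recomputed with each item removed in turn."""
    scores = np.asarray(scores, dtype=float)
    return np.array([cronbach_alpha(np.delete(scores, j, axis=1))
                     for j in range(scores.shape[1])])

# Simulated data (assumption): 400 respondents, 20 items, last item is noise.
rng = np.random.default_rng(0)
n, k = 400, 20
trait = rng.normal(size=(n, 1))
scores = trait + rng.normal(size=(n, k))
scores[:, -1] = rng.normal(size=n)

alpha_full = cronbach_alpha(scores)
deleted = alpha_if_deleted(scores)
print("alpha, all items:", round(alpha_full, 3))
print("alpha if item deleted:", np.round(deleted, 3))
```

In this simulated case, removing the noise item raises alpha while removing any trait-driven item lowers it slightly. Whether an item with low ritc should actually be dropped also depends on content coverage, which this calculation cannot judge.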
Finally, the purpose of validating the scale is to use it to determine the effectiveness of an intervention using pre- and post-intervention scores. If an item has a low ritc, I presume it may affect the reliability of the scale in an experimental setting, or am I incorrect? Is there any statistical way to determine the utility of a scale designed to measure constructs which have a developmental aspect, i.e. do all items function appropriately as the student develops "more" of the attitudinal construct?