
I have open-source data on K-12 educational performance in North Carolina, at the individual school and subject level. It conforms to FERPA regulations, which is the source of my problem. Among other measures, it reports pct_glp (percent grade-level proficient). Under FERPA this value must be "masked": if it is ≤5% it is reported as "<5", and if it is ≥95% it is reported as ">95". So the data for this item are mostly numeric, apart from instances of those two encodings. These are what are properly called censored data. Is there a best practice for dealing with this in analysis? Can I use a stand-in such as 2.5% for "<5" and 97.5% for ">95"? I cannot simply drop the affected school/subject rows, because that would in essence conflate bad performance (<5%) with excellent performance (>95%).
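For concreteness, here is a minimal sketch of the stand-in substitution I have in mind (the values are made up; only the pct_glp column name and the two masking codes match my file):

```python
import pandas as pd

# Toy data in the shape described above; the masked strings are the FERPA encodings.
df = pd.DataFrame({"pct_glp": ["12.3", "<5", "44.0", ">95", "67.8"]})

# Midpoint stand-ins: 2.5 for "<5" and 97.5 for ">95"; everything else parses as a number.
df["pct_glp_num"] = pd.to_numeric(df["pct_glp"].replace({"<5": 2.5, ">95": 97.5}),
                                  errors="coerce")
print(df)
```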

user1320487
  • See this question on coarse/censored data: http://stats.stackexchange.com/questions/202348/statistical-methods-for-data-where-only-a-minimum-maximum-value-is-known Using 2.5% or 97.5% may be an okay approximation, and that kind of thing used to be a standard approach for, e.g., blood biomarker levels below a level of detection, but nowadays censored-data methods are more frequently used. – Björn Mar 24 '16 at 06:41
  • That sort of thing is known as [censoring](https://en.wikipedia.org/wiki/Censoring_%28statistics%29) -- the large and small values are said to be censored rather than missing. I have fixed your tags, but you should probably also amend your question and title. – Glen_b Mar 26 '16 at 06:19

0 Answers
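No answer was posted, but as an illustration of the censored-data methods the comments point to, here is a minimal sketch of fitting a normal model to the masked pct_glp values by interval-censored maximum likelihood. The data are made up and the normal model is only an assumption; in practice an established censored-regression tool (for example R's survival::survreg with interval censoring, or a Tobit-style model) would be the more standard route.

```python
import numpy as np
import pandas as pd
from scipy import optimize, stats

def parse_interval(value):
    """Map a reported pct_glp string to a (low, high) interval in percent."""
    s = str(value).strip()
    if s == "<5":
        return 0.0, 5.0      # censored low: true value lies in [0, 5]
    if s == ">95":
        return 95.0, 100.0   # censored high: true value lies in [95, 100]
    x = float(s)
    return x, x              # exact observation: degenerate interval

def neg_log_lik(params, lo, hi):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # optimize on the log scale so sigma stays positive
    exact = lo == hi
    # Exact rows contribute a density; censored rows contribute the
    # probability mass the model places on their interval.
    ll = stats.norm.logpdf(lo[exact], mu, sigma).sum()
    ll += np.log(stats.norm.cdf(hi[~exact], mu, sigma)
                 - stats.norm.cdf(lo[~exact], mu, sigma)).sum()
    return -ll

df = pd.DataFrame({"pct_glp": ["12.3", "<5", "44.0", ">95", "67.8"]})  # toy data
lo, hi = np.array([parse_interval(v) for v in df["pct_glp"]]).T
res = optimize.minimize(neg_log_lik, x0=np.array([50.0, np.log(20.0)]), args=(lo, hi))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(f"estimated mean {mu_hat:.1f}%, sd {sigma_hat:.1f}%")
```

Unlike midpoint substitution, this uses exactly the information the masking preserves: a censored row says only that the true value lies in [0, 5] or [95, 100], and its likelihood contribution is the probability mass of that interval.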