I have data that I want to run an ANOVA on, but first I need to check it for normality. Should I test the whole dataset, or each subset of the data corresponding to a single group level? For example, I have fish counts for one species at different sites, in different seasons and years. Would I test all of the counts together, or test the counts for each season separately? The full dataset covers 47 sites, 2 seasons (dry/wet), and 15 years.
> head(df)
  site season year  species_name num
1    1    dry 2019 Sailfin molly  11
2    2    dry 2019 Sailfin molly   7
3    3    dry 2019 Sailfin molly   9
4    4    dry 2019 Sailfin molly   7
5    5    dry 2019 Sailfin molly  12
6    1    wet 2019 Sailfin molly   0
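
To make the options concrete, here is a minimal sketch in base R of the checks being compared. It uses simulated Poisson counts as a stand-in for the real data, so the year range (2005 to 2019), the Poisson means, and the model formula are all assumptions chosen for illustration, not values from the actual dataset. A third check on the model residuals is included for comparison, since the ANOVA normality assumption is usually stated for the residuals rather than for the raw response.

```r
# Simulated stand-in for df: 47 sites x 2 seasons x 15 years.
# The year range and Poisson means are hypothetical.
set.seed(1)
df <- expand.grid(site = 1:47, season = c("dry", "wet"), year = 2005:2019)
df$num <- rpois(nrow(df), lambda = ifelse(df$season == "dry", 9, 3))

# Option A: Shapiro-Wilk test on all counts pooled, ignoring groups
pooled_p <- shapiro.test(df$num)$p.value

# Option B: Shapiro-Wilk test within each season separately
by_season_p <- tapply(df$num, df$season,
                      function(x) shapiro.test(x)$p.value)

# Option C (for comparison): fit the ANOVA, then test its residuals.
# The formula is an assumed example, not a recommended model.
fit <- aov(num ~ season + factor(year), data = df)
resid_p <- shapiro.test(residuals(fit))$p.value

print(pooled_p)
print(by_season_p)
print(resid_p)
```

With two seasons that differ in mean, the pooled counts in Option A form a mixture of the two groups, which is one reason the pooled and per-group results can disagree.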