I am running LOESS regression models in R, and I want to compare the outputs of 12 different models with varying sample sizes. I can describe the actual models in more detail if that helps with answering the question.
Here are the sample sizes:
Fastballs vs RHH 2008-09: 2002
Fastballs vs LHH 2008-09: 2209
Fastballs vs RHH 2010: 527
Fastballs vs LHH 2010: 449
Changeups vs RHH 2008-09: 365
Changeups vs LHH 2008-09: 824
Changeups vs RHH 2010: 201
Changeups vs LHH 2010: 330
Curveballs vs RHH 2008-09: 488
Curveballs vs LHH 2008-09: 483
Curveballs vs RHH 2010: 213
Curveballs vs LHH 2010: 162
The LOESS regression model is a surface fit, where the X and Y location of each pitch are used to predict sw, the swinging-strike probability. I'd like to compare all 12 of these models, but using the same span for each (e.g. span = 0.5) will not give comparable results, since the sample sizes vary so widely.
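For concreteness, here is a minimal sketch of the kind of fit I mean. The data are simulated stand-ins rather than my real pitches, and the data frame name `pitches` and the column names `x`, `y` and `sw` are placeholders I made up for this example.

```r
# Simulated stand-in data: NOT real pitch data, just the same shape.
set.seed(1)
n <- 500
pitches <- data.frame(
  x = rnorm(n, mean = 0,   sd = 0.8),  # horizontal pitch location
  y = rnorm(n, mean = 2.5, sd = 0.8)   # vertical pitch location
)
# Hypothetical 0/1 indicator: 1 = swinging strike, 0 = anything else
pitches$sw <- rbinom(n, size = 1, prob = plogis(-2 + 0.5 * pitches$y))

# Surface fit: two predictors sharing a single span
fit <- loess(sw ~ x * y, data = pitches, span = 0.5)

# Fitted swinging-strike probability on a small grid of locations
grid <- expand.grid(x = seq(-1, 1, length.out = 5),
                    y = seq(1.5, 3.5, length.out = 5))
grid$p_hat <- predict(fit, newdata = grid)
```

Since `sw` is a 0/1 outcome and this is a plain least-squares local fit, the fitted values are not constrained to lie between 0 and 1.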
My basic question is: how do you determine the span of a model? A higher span smooths the fit more, while a lower span captures more local detail but picks up statistical noise when there is too little data. So far I have been using a higher span for the smaller samples and a lower span for the larger ones.
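To show why a single span bothers me, here is a rough sketch of how many points each local fit would draw on. It assumes that for a span below 1 the neighbourhood is roughly span times n points, which is how I read the `span` argument in `?loess`. The short labels and the target of 80 neighbours are arbitrary choices for illustration, not a recommendation.

```r
# Sample sizes from the list above (labels are my own shorthand)
n <- c(FB_RHH_0809 = 2002, FB_LHH_0809 = 2209, FB_RHH_2010 = 527,
       FB_LHH_2010 = 449,  CH_RHH_0809 = 365,  CH_LHH_0809 = 824,
       CH_RHH_2010 = 201,  CH_LHH_2010 = 330,  CU_RHH_0809 = 488,
       CU_LHH_0809 = 483,  CU_RHH_2010 = 213,  CU_LHH_2010 = 162)

# Approximate number of points in each local neighbourhood at span = 0.5
neighbours <- floor(0.5 * n)

# Span each model would need to use about k points per local fit
# (k = 80 is a hypothetical target, capped so the span never exceeds 1)
k <- 80
span_for_k <- pmin(1, k / n)

print(data.frame(n = n, neighbours = neighbours,
                 span_for_k = round(span_for_k, 3)))
```

With the same span, the largest model averages over more than ten times as many pitches per local fit as the smallest one, which is the comparability problem I am trying to solve.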
What should I do? What's a good rule of thumb when setting span for LOESS regression models in R? Thanks in advance!