I am conducting a logistic regression in order to predict the service point win percentage of a tennis player.
In terms of data, I have approximately 300 matches for each player A. For each match I have the total number of player A's service points (points where he is the server), the number of those points he won, and the number he lost.
I use service point win percentage as the DV, and my independent variables are:
+Average service win percentage of last 3 matches
+ln(player's ranking points)
+ln(opposition's ranking points)
+surface the match was played on
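Written out, and assuming the standard binomial GLM with a logit link, the model above would be roughly the following. The symbols are my own notation for illustration: $n_i$ is the number of service points in match $i$, $y_i$ the observed win proportion, $\bar{s}_i$ the average service win percentage over the previous 3 matches, $r^{A}_i$ and $r^{B}_i$ the ranking points of the player and the opponent, and $\gamma$ a per-surface effect.

$$
n_i y_i \sim \mathrm{Binomial}(n_i, p_i), \qquad
\ln\frac{p_i}{1 - p_i} = \beta_0 + \beta_1 \bar{s}_i + \beta_2 \ln r^{A}_i + \beta_3 \ln r^{B}_i + \gamma_{\mathrm{surface}(i)}
$$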
My dependent variable, service win percentage, usually lies in the range 0.4-0.8. Very few values are greater than 0.8 (about 2.8% of values, dropping to < 1% at around 0.84), and there are no values less than 0.22. In addition, my data is much more concentrated above 0.5 than below it.
Thus, I worry that since my data has no points close to 0 or 1, and is not symmetrical around 0.5 (as the sigmoidal curve of logistic regression is), I am wasting my time with this model type. The results from the preliminary model outlined above are, although not shocking, pretty volatile.
I am conducting this in R and using the weights argument so that I can use a proportion as the DV, with the total number of trials given as the weights. I use ln(points) because ranking points are exponential in nature.
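A minimal sketch of this kind of fit, assuming it is done with glm() and family = binomial. The data below are simulated purely so the example runs, and every column name (serve_pts, serve_won, prev3_pct, log_pts_self, log_pts_opp, surface) is hypothetical rather than taken from the real data set.

```r
# Simulated stand-in data: one row per match (all names are hypothetical).
set.seed(1)
n <- 300
matches <- data.frame(
  serve_pts    = sample(40:120, n, replace = TRUE),  # total service points
  prev3_pct    = runif(n, 0.50, 0.75),               # avg win % over last 3 matches
  log_pts_self = log(runif(n, 500, 8000)),           # ln(player's ranking points)
  log_pts_opp  = log(runif(n, 500, 8000)),           # ln(opponent's ranking points)
  surface      = factor(sample(c("clay", "grass", "hard"), n, replace = TRUE))
)
# Simulated service points won, and the resulting proportion (the DV).
matches$serve_won <- rbinom(n, size = matches$serve_pts, prob = 0.62)
matches$win_pct   <- matches$serve_won / matches$serve_pts

# Proportion as the response, number of trials as the weights.
fit <- glm(win_pct ~ prev3_pct + log_pts_self + log_pts_opp + surface,
           family = binomial, weights = serve_pts, data = matches)

# Same model written with a (successes, failures) response instead of weights.
fit2 <- glm(cbind(serve_won, serve_pts - serve_won) ~
              prev3_pct + log_pts_self + log_pts_opp + surface,
            family = binomial, data = matches)

# Predicted service point win probability for each match.
pred <- predict(fit, type = "response")
summary(fit)
```

The two specifications should give the same coefficients; the cbind() form just avoids computing the proportion by hand.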
The goal is to predict / forecast the service point win percentage of the player based on the IVs. Considering my data distribution and my goal, does logistic regression make sense? If not, is there any other type of model that makes more sense?