It was suggested that my questions were too broad. As I commented below, I have nearly a million data points and perhaps a hundred variables. This may be a very basic modeling question: how should I start fitting a GAM to a large dataset? I tried the 'bam' function on a much smaller dataset, and it did not work as I expected. I have access to supercomputers, but tuning a GAM on a dataset this large still seems impractical. It was also suggested that I pick 8 to 10 variables and fit a GAM, but even then fitting on the complete dataset is slow. So my guess is that I need to reduce both the number of variables and the sample size to fit a GAM.
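To make the "fewer variables, fewer rows" idea concrete, here is a minimal sketch of what such a reduced fit might look like. It is not my actual model: the data are simulated with `gamSim` from 'mgcv', the names `x0` to `x3` and `y` are placeholders for real covariates and the response, and the subsample size of 5000 is an arbitrary assumption. It uses `bam` with `method = "fREML"` and `discrete = TRUE`, which are the options 'mgcv' provides for large datasets.

```r
library(mgcv)

set.seed(1)

# Simulated stand-in for the real data (placeholder names y, x0..x3).
dat <- gamSim(1, n = 20000, dist = "normal", scale = 2, verbose = FALSE)

# Draw a simple random subsample of rows; 5000 is an arbitrary choice.
idx <- sample(nrow(dat), 5000)
sub <- dat[idx, ]

# Fit with bam(), using fast REML and discretized covariates,
# which is intended to reduce memory use and run time on large data.
fit <- bam(y ~ s(x0) + s(x1) + s(x2) + s(x3),
           data = sub, method = "fREML", discrete = TRUE)

summary(fit)
```

For spatio-temporal data like mine, a simple random subsample may not be appropriate (for example, it ignores spatial and temporal structure), so the sampling step above is only an assumption for illustration.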
My original questions: I have 61 bioclimatic variables that describe different, and sometimes similar, aspects of insect life cycles; some of them are highly correlated. My study extent covers the North American continent at a spatial resolution of 10 km. The temporal resolution is yearly and the temporal range is 20 years. This makes my dataset huge for GAMs. I have instead built GLMs for prediction purposes. However, those models are complicated (e.g., 777265 × 263) and not easy to interpret. So I am trying to use GAMs to build small models, with fewer variables and only a fraction of the samples, for interpretation purposes. I read through some questions about the 'mgcv' package and found that most of the examples use a very small number of variables. Does that mean I need to handpick the variables? I used the 'gam.selection' function with a smaller dataset (828 × 54) and could see that some variables are not significant as smooth terms. I also used the 'concurvity' function to check for potential multicollinearity. Now I need some suggestions on variable selection: What is an appropriate number of variables for an explanatory GAM? Should I select the variables based on my own knowledge, on the 'gam.selection' results (where significant nonlinearity is detected), and on the 'concurvity' results? Or what would be the most efficient way to carry out this variable selection? I appreciate your thoughts and timely help.
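For reference, the two checks mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not my real analysis: the data are simulated with `gamSim`, and `x4` (built to be nearly a function of `x2`) and `x5` (pure noise) are hypothetical covariates added only to show what the output looks like. The selection step uses `select = TRUE` in `gam`, which is the extra-penalty approach described in `?gam.selection`.

```r
library(mgcv)

set.seed(2)

# Simulated stand-in data (placeholder names y, x0..x3).
dat <- gamSim(1, n = 1000, dist = "normal", scale = 2, verbose = FALSE)

# Hypothetical extra covariates for illustration:
# x4 is almost a smooth function of x2 (should show high concurvity),
# x5 is unrelated noise (a candidate to be shrunk out of the model).
dat$x4 <- dat$x2^2 + rnorm(nrow(dat), sd = 0.05)
dat$x5 <- runif(nrow(dat))

# select = TRUE adds a penalty that lets whole smooth terms shrink to zero.
fit <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3) + s(x4) + s(x5),
           data = dat, method = "REML", select = TRUE)

# Effective degrees of freedom per smooth; values near zero suggest
# the term has effectively been removed.
edf <- summary(fit)$s.table[, "edf"]
print(round(edf, 2))

# Concurvity of each term with the rest of the model (values in [0, 1];
# values near 1 indicate a term is well approximated by the other terms).
conc <- concurvity(fit, full = TRUE)
print(round(conc, 2))
```

With 61 candidate variables, a single model containing every smooth may still be too slow or too unstable to fit this way, which is part of why I am asking how to narrow the candidate set first.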