Permutation-based variable importance (VI) can be implemented for any bagged ensemble. The neat thing about an RF model, though, is that you get the out-of-bag error estimate and the VI for almost no extra computational cost, because the model is already bagged.
In fact, you can compute permutation VI for any model you care to bootstrap, say, 50 times and aggregate over; that includes gradient boosting models. If needed, I could write you a short VImyModel wrapper for R, [here].
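To make the idea concrete, here is a minimal, library-free sketch of permutation importance in Python (the function name `permutation_importance` and the `predict`/`metric` callables are my own illustration, not a particular package's API). It works for any fitted model, RF, GBT, or otherwise, as long as you can call it on data:

```python
import random
import statistics

def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Rise in the error metric after shuffling each column of X.

    predict: callable mapping a list of rows to predictions
    metric:  callable(y_true, y_pred) -> error (lower is better)
    Returns {feature_index: mean increase in error over n_repeats}.
    """
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = {}
    for j in range(len(X[0])):
        deltas = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and y
            X_perm = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, col)]
            deltas.append(metric(y, predict(X_perm)) - baseline)
        importances[j] = statistics.mean(deltas)
    return importances
```

A feature the model ignores gets an importance near zero, since shuffling it cannot change the predictions; a feature the model relies on makes the error jump.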
You should look into the structure of your trained models. The surfaces of non-linear, non-additive models cannot be summarized with coefficients the way their GLM hyperplane counterparts can. These answers should bring you up to speed on a general black-box method (partial dependence) and some tree-specific methods.
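For reference, partial dependence itself is only a few lines: force one feature to each value on a grid, average the model's predictions, and plot the resulting curve. A minimal sketch (the name `partial_dependence` and the `predict` callable are again my own, not any package's API):

```python
def partial_dependence(predict, X, j, grid):
    """Average prediction as feature j is forced to each value in grid.

    predict: callable mapping a list of rows to predictions
    X:       list of feature rows (the observed data)
    j:       index of the feature to vary
    Returns one averaged prediction per grid value.
    """
    pd_values = []
    for v in grid:
        # Overwrite column j with v for every row, leave the rest untouched.
        X_mod = [row[:j] + [v] + row[j + 1:] for row in X]
        preds = predict(X_mod)
        pd_values.append(sum(preds) / len(preds))
    return pd_values
```

Plotting `grid` against the returned values shows the marginal effect of feature `j` on the model surface, averaged over the other features' observed distribution.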
If the simple linear model or single tree seems, by some measure, just as useful as the RF or GBT, why use the advanced model in the first place? And if the simplified model performs much worse, why settle for a good description of the model NOT to pick? Therefore, I prefer never to use a stand-in dummy model to explain the more advanced one. Spend your time learning the techniques for understanding the advanced models themselves.
Inspecting your model structure is important. Sometimes you will realize your advanced model is exploiting a data leak to make accurate predictions, and such a model may be useless in practice. Maybe the model is using patient IDs to predict cancer; if patient IDs are assigned after diagnosis, according to where the patient is hospitalized, then a model that in effect predicts that patients in palliative hospices are most likely to have cancer is not useful in any way. Other times you learn that the model exploits some relationship you had not thought of yourself, and there seems to be a plausible causal explanation for the link. Hurray! Go investigate further.