Random forests handle it rather well; see Would a Random Forest with multiple outputs be possible/practical? or scikit-learn's documentation. I suspect GBM or any tree-based method can be adapted in a similar fashion.
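For instance, scikit-learn's `RandomForestRegressor` accepts a two-dimensional target directly (each leaf then stores a vector of per-output means). A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: two correlated continuous targets
rng = np.random.RandomState(0)
X = rng.uniform(size=(500, 4))
y = np.column_stack([
    X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=500),
    X[:, 0] - X[:, 2] + rng.normal(scale=0.1, size=500),
])

# A single forest fit on both outputs at once
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.predict(X[:3]).shape)  # (3, 2): one prediction per output
```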
More generally, when you run any learning algorithm, you usually minimize a one-dimensional score such as $\sum_i(p_i-y_i)^2$. But you can specify any target function. If you were working on (two-dimensional) position prediction, $\sum_i\left[(\hat{x}_i-x_i)^2+(\hat{y}_i-y_i)^2\right]$ would be a natural metric.
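To make the position example concrete, here is a minimal sketch of that metric (the name `position_loss` is mine, not a library function):

```python
import numpy as np

def position_loss(pred, true):
    """Sum over samples of (x_hat - x)^2 + (y_hat - y)^2.

    pred, true: arrays of shape (n_samples, 2) holding (x, y) pairs.
    """
    return np.sum((pred - true) ** 2)

# Two predicted positions vs. ground truth
pred = np.array([[1.0, 2.0], [3.0, 4.5]])
true = np.array([[1.0, 2.5], [2.0, 4.0]])
print(position_loss(pred, true))  # 0.25 + 1.0 + 0.25 = 1.5
```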
If you have mixed-type output (classification and regression), then the target function forces you to decide how much weight each target gets: which scaling do you apply to the continuous responses? Which loss do you apply to misclassifications? The sketch below makes those choices explicit.
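One way to do that is to write the combined objective down, for instance a weighted sum of squared error on the continuous output and log-loss on the class output. This is an illustrative sketch, not a standard recipe; the weight `alpha` is a hypothetical knob you would have to tune:

```python
import numpy as np

def mixed_loss(y_reg_pred, y_reg_true, p_clf_pred, y_clf_true, alpha=1.0):
    """Weighted sum of squared error (regression) and log-loss (classification).

    alpha trades off the two parts; since squared error depends on the
    scale of the continuous response, standardizing that response first
    (or tuning alpha) is usually necessary.
    """
    reg_term = np.mean((y_reg_pred - y_reg_true) ** 2)
    eps = 1e-12  # avoid log(0)
    p = np.clip(p_clf_pred, eps, 1 - eps)
    clf_term = -np.mean(y_clf_true * np.log(p)
                        + (1 - y_clf_true) * np.log(1 - p))
    return reg_term + alpha * clf_term
```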
As for further academic reading:

- SVM Structured Learning's Wikipedia page
- Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression
- The Landmark Selection Method for Multiple Output Prediction (deals with high-dimensional dependent variables)