To help answer your questions, let me quote a nice explanation of the training process (taken from here), followed by a short code sketch:
Sample N cases at random with replacement to create a subset of the data. The subset should be about 66% of the total set.
At each node:
For some number m (see below), m predictor variables are selected at random from all the predictor variables
The predictor variable that provides the best split, according to some objective function, is used to do a binary split on that node.
At the next node, choose another m variables at random from all predictor variables and do the same.
Depending upon the value of m, there are three slightly different systems:
- Random splitter selection: m =1
- Breiman’s bagger: m = total number of predictor variables
- Random forest: m << number of predictor variables. Breiman suggests three possible values for m: ½√p, √p, and 2√p, where p is the total number of predictor variables
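To make the quoted procedure concrete, here is a minimal sketch in Python of the tree-growing part only (no prediction step). This is not a reference implementation: the function names (`grow_tree`, `best_split`, `random_forest`) are mine, and I use a simple variance-reduction criterion as the "objective function" for the splits.

```python
import numpy as np

def bootstrap_sample(X, y, rng):
    # Sample N cases at random with replacement (roughly 2/3 of the cases end up unique).
    idx = rng.integers(0, len(X), size=len(X))
    return X[idx], y[idx]

def best_split(X, y, feature_ids):
    # Among the m candidate features, find the split with the largest variance reduction.
    best = None
    parent_var = y.var() * len(y)
    for f in feature_ids:
        for t in np.unique(X[:, f])[:-1]:
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            gain = parent_var - (left.var() * len(left) + right.var() * len(right))
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best  # (gain, feature, threshold) or None

def grow_tree(X, y, m, rng, min_leaf=5):
    # Stop at small or pure nodes; otherwise split on the best of m randomly chosen predictors.
    if len(y) < min_leaf or np.all(y == y[0]):
        return {"leaf": y.mean()}
    feature_ids = rng.choice(X.shape[1], size=m, replace=False)  # m predictors at random
    split = best_split(X, y, feature_ids)
    if split is None:
        return {"leaf": y.mean()}
    _, f, t = split
    mask = X[:, f] <= t
    return {"feature": f, "threshold": t,
            "left": grow_tree(X[mask], y[mask], m, rng, min_leaf),
            "right": grow_tree(X[~mask], y[~mask], m, rng, min_leaf)}

def random_forest(X, y, n_trees=100, m=None, seed=0):
    rng = np.random.default_rng(seed)
    m = m or max(1, int(np.sqrt(X.shape[1])))  # the common default, m ≈ √p
    return [grow_tree(*bootstrap_sample(X, y, rng), m, rng) for _ in range(n_trees)]
```

Note how the random feature subset is drawn again at every node, which is exactly the "at the next node, choose another m variables" step above.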
So going back to your questions:
1. The correlation won't really matter much because, depending on the chosen system, the algorithm looks at either one variable at a time or picks the best 'splitter' from a random subset of the variables at each node.
2. Not sure what you mean by performance here... If you mean the 'speed' of the algorithm, then you can work out the effect on performance from the process described above (how, and how many, variables are considered at each node).
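For the speed angle, a rough comparison is easy to run with scikit-learn, where the `max_features` argument plays the role of m (this is just a sketch; the actual timings will vary with your data size and hardware):

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=5000, n_features=100, random_state=0)

# Smaller m (max_features) means fewer candidate splits evaluated per node, so faster training.
for m in (1, "sqrt", None):  # None = all features, i.e. Breiman's bagger
    start = time.perf_counter()
    RandomForestClassifier(n_estimators=100, max_features=m, random_state=0).fit(X, y)
    print(f"max_features={m}: {time.perf_counter() - start:.2f}s")
```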
Now, if by performance you meant the accuracy of the model, then in general the more predictors you have, and the more independent they are, the better; that said, 'artificially' adding correlated (derived) predictors may still lead to better results if the initial results are not satisfactory.
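If you want to check the effect of correlated predictors on accuracy empirically, a quick experiment along these lines can help (again a sketch, not a definitive benchmark; `n_redundant` in `make_classification` generates features that are linear combinations of, and hence correlated with, the informative ones):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Same total number of features in both runs; only the amount of redundancy (correlation) differs.
for n_redundant in (0, 15):
    X, y = make_classification(n_samples=2000, n_features=20,
                               n_informative=5, n_redundant=n_redundant,
                               random_state=0)
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    print(f"n_redundant={n_redundant}: "
          f"CV accuracy = {cross_val_score(rf, X, y, cv=5).mean():.3f}")
```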