I would like to expand on the original question posted here, "First steps learning to predict financial timeseries using machine learning", with another question of my own:

Let's fit a deliberately basic random forest (RF) in R, following the example there:

getSymbols("GOOG")

fit <- RF(lag(GOOG.Close,1), GOOG.Close, data=GOOG[1:(NROW(GOOG)-20)])

prediction <- predict(fit,GOOG[(NROW(GOOG)-19):NROW(GOOG)])

sig = ifelse(prediction >0, 1,-1)

ret = diff(log(GOOG$GOOG.Close))

pnl = ret * sig

I'm not lagging the signal sig because I already lagged the explanatory variable.

Is that correct, or should I lag the signal again, like this:

sig1 = lag(sig,1)

pnl = ret * sig1

What is the correct procedure?

Thank you!

StatArb

1 Answer

For these problems it helps to think about how you plan to score your model in its real application.

For predicting a stock price, your input will be all the close prices up to and including yesterday's (or, in your example, just yesterday's). You will try to predict today's close. Your measure of success will be the prediction for today versus today's true close price.

Fitting today's price as a function of yesterday's price means lagging the explanatory variable by one.

Comparing today's predicted price to today's actual price means no further lag: the prediction is already aligned with the day it refers to.
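The alignment can be checked on a small example. This is only a sketch under stated assumptions: the closes are simulated rather than downloaded, lm() stands in for the random forest, and the signal compares the forecast with yesterday's close (the question uses prediction > 0 instead). The names d, train and test are made up for illustration.

```r
# Simulated closes stand in for GOOG.Close (assumption, not real data).
set.seed(1)
n <- 200
close <- 100 * exp(cumsum(rnorm(n, sd = 0.01)))

# Each row pairs today's close (y) with yesterday's close (x).
# This is the one and only lag in the whole pipeline.
d <- data.frame(y = close[-1], x = close[-n])

# Hold out the last 20 rows, as in the question.
train <- d[1:(nrow(d) - 20), ]
test  <- d[(nrow(d) - 19):nrow(d), ]

# lm() is a stand-in for the random forest; only the alignment matters here.
fit  <- lm(y ~ x, data = train)
pred <- predict(fit, newdata = test)   # forecast of today's close

# Assumed signal rule: long if the forecast is above yesterday's close.
sig <- ifelse(pred > test$x, 1, -1)

# Log return from yesterday's close to today's close, on the same rows as sig.
ret <- log(test$y / test$x)

# No extra lag: each signal was built only from information up to yesterday.
pnl <- ret * sig
```

Lagging sig once more here would pair each position with the following day's return, so the position would be one day staler than the information it was built from.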

Dex Groves
  • Hi Dex Groves, when you say "Your measure of success will be the prediction made today versus today's true close price", do you mean that the measure of success is the prediction made yesterday for today's price versus today's true close price? I'm stressing this because I'm making my prediction yesterday for today's price. – StatArb Sep 26 '16 at 02:33
  • You know the prior day's close price when you make your prediction, right? Then it's the same logic: you make a decision with the most up-to-date record, one record into the future. If you have to predict two records out, you should lag your variable by two. – Dex Groves Sep 26 '16 at 02:37
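
The rule in the last comment (lag by as many records as you predict ahead) can be sketched with a small helper. make_lagged is a hypothetical name introduced only for illustration, and the five closes are made-up numbers.

```r
# Hypothetical helper: pair each close with the close h records earlier.
make_lagged <- function(close, h) {
  n <- length(close)
  stopifnot(h >= 1, h < n)
  data.frame(
    y = close[(h + 1):n],  # value to predict
    x = close[1:(n - h)]   # latest value known h records before y
  )
}

close <- c(10, 11, 12, 13, 14)  # made-up closes

one_ahead <- make_lagged(close, 1)  # x is the previous record
two_ahead <- make_lagged(close, 2)  # x is two records back
```

With h = 2 the first usable row pairs the third close with the first, so the model never sees a value that would not have been available when the prediction was made.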