Some pointers:
1)
Regarding your data set: you say that you model the returns, i.e. the changes in the index data. Why do you want to do this with an ARMA model? An ARMA model might be needed if you decided to model the actual index values, and even then it would not account for the heteroscedasticity that is likely present in your data.
Stock and index returns typically exhibit volatility clustering (heteroscedasticity), heavy tails (high kurtosis) and a leverage effect (i.e. negative shocks have a larger influence on volatility than positive shocks of the same size). To account for this, I would rather suggest using an EGARCH, GJR-GARCH or TGARCH model (all available in the "rugarch" package in R) with Student's t-distributed errors. If you want to model the actual index data, you can add an ARMA component to the mean equation in the same "ugarchspec" specification. If you model the returns, I don't think this will be necessary.
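To make the leverage effect concrete, here is a minimal Python sketch (not rugarch code) of the GJR-GARCH(1,1) variance recursion $\sigma_t^2 = \omega + (\alpha + \gamma I_{t-1})\varepsilon_{t-1}^2 + \beta\sigma_{t-1}^2$, where $I_{t-1} = 1$ if $\varepsilon_{t-1} < 0$ and 0 otherwise, driven by Student's t innovations rescaled to unit variance. The function names and parameter values are hypothetical and only for illustration; in rugarch you would instead estimate such a model with something like `ugarchspec(variance.model = list(model = "gjrGARCH"), distribution.model = "std")` followed by `ugarchfit`.

```python
import math
import random


def gjr_next_variance(omega, alpha, gamma, beta, eps_prev, sigma2_prev):
    """One step of the GJR-GARCH(1,1) conditional variance recursion."""
    # The extra gamma term only applies after a negative shock (leverage effect).
    leverage = gamma if eps_prev < 0 else 0.0
    return omega + (alpha + leverage) * eps_prev ** 2 + beta * sigma2_prev


def standardized_t(nu, rng):
    """Student's t draw rescaled to unit variance (requires nu > 2)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-squared with nu degrees of freedom
    return z / math.sqrt(chi2 / nu) * math.sqrt((nu - 2.0) / nu)


def simulate_gjr_garch(n, omega=0.02, alpha=0.05, gamma=0.10, beta=0.85, nu=5.0, seed=1):
    """Simulate n returns from a GJR-GARCH(1,1) with standardized t errors.

    Parameter values are illustrative only, not estimates for any real index.
    """
    rng = random.Random(seed)
    # Start at the unconditional variance; with symmetric innovations the
    # leverage indicator is active half of the time, hence gamma / 2.
    sigma2 = omega / (1.0 - alpha - gamma / 2.0 - beta)
    returns, variances = [], []
    for _ in range(n):
        eps = math.sqrt(sigma2) * standardized_t(nu, rng)
        returns.append(eps)
        variances.append(sigma2)
        sigma2 = gjr_next_variance(omega, alpha, gamma, beta, eps, sigma2)
    return returns, variances
```

With these hypothetical parameters and a current variance of 0.4, `gjr_next_variance(0.02, 0.05, 0.10, 0.85, -1.0, 0.4)` gives about 0.51, while the same call with a shock of `+1.0` gives about 0.41: same magnitude, different impact.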
An alternative to these models would be the Beta-t-EGARCH model (the "tegarch" function in the "betategarch" package in R), which is designed specifically to cope with heavy tails.
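Why the Beta-t-EGARCH model copes well with heavy tails can be seen from its update. A minimal sketch, assuming Harvey's formulation $y_t = \varepsilon_t e^{\lambda_t}$ with $\lambda_t = \delta + \phi\lambda_{t-1} + \kappa u_{t-1}$ (the "betategarch" package may parametrize this differently): the log-scale is driven by the score $u_t = (\nu+1)\,y_t^2/(\nu e^{2\lambda_t} + y_t^2) - 1$, which is bounded in $[-1, \nu)$, so a single extreme return cannot blow up the volatility the way $\varepsilon_{t-1}^2$ does in a GARCH recursion. Function names and parameter values are hypothetical.

```python
import math


def beta_t_score(y, lam, nu):
    """Score-driven innovation u_t of a Beta-t-EGARCH model (Harvey's form).

    y   : return y_t
    lam : log-scale lambda_t, i.e. the scale is exp(lam)
    nu  : Student's t degrees of freedom
    The result lies in [-1, nu), so one extreme return has a bounded effect.
    """
    y2 = y * y
    return (nu + 1.0) * y2 / (nu * math.exp(2.0 * lam) + y2) - 1.0


def beta_t_egarch_filter(returns, delta, phi, kappa, nu, lam0=0.0):
    """Log-scale path lambda_t = delta + phi * lambda_{t-1} + kappa * u_{t-1}."""
    lams = [lam0]
    for y in returns[:-1]:
        # u_{t-1} uses the previous return and the previous log-scale.
        u = beta_t_score(y, lams[-1], nu)
        lams.append(delta + phi * lams[-1] + kappa * u)
    return lams
```

However large a return is, with $\kappa > 0$ it moves $\lambda_t$ by less than $\kappa\nu$ beyond its autoregressive part.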
2)
Regarding the test: did you choose the lag order according to your sample size? According to Tsay (2005), an appropriate lag order would be about $\ln(n)$, where $n$ is the number of observations.
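A minimal, dependency-free Python sketch of the Ljung-Box statistic $Q(m) = n(n+2)\sum_{k=1}^{m}\hat\rho_k^2/(n-k)$ with the default lag $m \approx \ln(n)$. Under the null of no autocorrelation, $Q(m)$ is approximately $\chi^2_m$; when you apply it to the residuals of a fitted ARMA($p$,$q$) model, the degrees of freedom are usually reduced to $m - p - q$ (this is what the `fitdf` argument of R's `Box.test` does). The helper names are my own, and `chi2_sf` is a small hand-rolled chi-squared tail probability for integer degrees of freedom, not a library call.

```python
import math


def acf(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return sum(dev[t] * dev[t - k] for t in range(k, n)) / denom


def tsay_lag(n):
    """Rule-of-thumb lag order m = round(ln(n)), at least 1."""
    return max(1, round(math.log(n)))


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared with integer df >= 1 (closed form)."""
    y = x / 2.0
    # Start the regularized upper incomplete gamma Q(a, y) at a = 1 or a = 1/2 ...
    if df % 2 == 0:
        a, total = 1.0, math.exp(-y)
    else:
        a, total = 0.5, math.erfc(math.sqrt(y))
    # ... then step up with Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        if y > 0:
            total += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return total


def ljung_box(x, m=None, fitdf=0):
    """Ljung-Box Q(m) = n(n+2) * sum_{k=1..m} rho_k^2 / (n-k).

    fitdf: number of estimated ARMA parameters (p + q) when x are model residuals.
    Returns (Q, degrees of freedom, p-value).
    """
    n = len(x)
    if m is None:
        m = tsay_lag(n)
    df = m - fitdf
    if df < 1:
        raise ValueError("lag order must exceed fitdf; choose a larger m")
    q = n * (n + 2) * sum(acf(x, k) ** 2 / (n - k) for k in range(1, m + 1))
    return q, df, chi2_sf(q, df)
```

For $n = 1000$ the rule gives $m = 7$, since $\ln(1000) \approx 6.91$. Note that with a short lag like this and a fitted ARMA model, $m - p - q$ can get very small, which is one more reason to check the lag choice.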
Also, some argue that the Ljung-Box test is inappropriate for autoregressive models, i.e. when lagged values of the series enter as regressors. It might be sensible to use another test instead, such as the Breusch-Godfrey test.
See also: Testing for autocorrelation: Ljung-Box versus Breusch-Godfrey
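A sketch of the Breusch-Godfrey LM test for a model estimated by OLS, for example an AR(1) with regressors $(1, y_{t-1})$: regress the residuals $\hat u_t$ on the original regressors plus $\hat u_{t-1}, \dots, \hat u_{t-p}$ (presample residuals set to zero) and compare $LM = nR^2$ with a $\chi^2_p$ distribution. Because the original regressors are included in the auxiliary regression, the test remains asymptotically valid when lagged values of the series are among them. This is plain Python with hypothetical helper names; in practice you would use `lmtest::bgtest` in R or `acorr_breusch_godfrey` in statsmodels.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, solved by
    Gaussian elimination with partial pivoting. X is a list of row lists."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system.
    coef = [0.0] * k
    for i in reversed(range(k)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, k))) / A[i][i]
    return coef


def residuals(X, y, coef):
    """y minus the fitted values X @ coef."""
    return [yi - sum(c * v for c, v in zip(coef, row)) for row, yi in zip(X, y)]


def breusch_godfrey(X, y, p):
    """Breusch-Godfrey LM statistic for serial correlation up to order p.

    Auxiliary regression: u_t on X_t and u_{t-1}, ..., u_{t-p}, with
    presample residuals set to zero. Returns LM = n * R^2, which is
    asymptotically chi-squared with p degrees of freedom under H0.
    """
    u = residuals(X, y, ols(X, y))
    n = len(u)
    aux_X = [list(row) + [u[t - j] if t - j >= 0 else 0.0 for j in range(1, p + 1)]
             for t, row in enumerate(X)]
    e = residuals(aux_X, u, ols(aux_X, u))
    u_mean = sum(u) / n
    r2 = 1.0 - sum(v * v for v in e) / sum((v - u_mean) ** 2 for v in u)
    return n * r2
```

For an AR(1), build the rows as `X = [[1.0, y[t - 1]] for t in range(1, len(y))]` with response `y[1:]`, then compare `breusch_godfrey(X, y[1:], p)` with the $\chi^2_p$ critical value (about 3.84 at the 5% level for $p = 1$), or pass it to `scipy.stats.chi2.sf(lm, p)` if SciPy is available.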