With slightly different notation the problem becomes easier to interpret. It seems you are looking to maximize the correlation between
$$\begin{array}{rclcrcl}
P_j &=& \sum_{i=1}^n P_{ij} V_i & \quad \text{and} \quad & T_j &=& \sum_{i=1}^m T_{ij} W_i \end{array}$$
The correlation-maximizing solution is unchanged when the columns of $P_{ij}$ and $T_{ij}$ are translated to have zero mean, so let's assume without loss of generality that the columns have zero means.
The problem is similar to the question Fitting of a sum of vectors $Y \alpha = X \beta + \epsilon$. The difference is in the dimensions of $\alpha$ and $\beta$, or your $V_i$ and $W_i$: in the other question these dimensions are equal, but the principle remains the same.
The answer to that question argues that $W_i$ equals the eigenvector belonging to the largest eigenvalue of the following matrix
$$\begin{array}{rcl}
M &=& (T^tT)^{-1}(QT)^t(QT) \\
\end{array}$$
with $Q$ the projection matrix
$$\begin{array}{rcl}
Q &=& P(P^tP)^{-1}P^t
\end{array}$$
The correlation of the solution $\hat{P}_j,\hat{T}_j$ is related to this largest eigenvalue $\lambda$
$$\text{cor}(\hat{P}_j,\hat{T}_j) = \sqrt{\lambda}$$
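To see where this matrix comes from, here is a short sketch of the argument (my paraphrase, using the zero-mean assumption from above). For a fixed $W$, the choice of $V$ that maximizes the correlation projects $TW$ onto the column space of $P$, and because $Q$ is symmetric and idempotent ($Q^tQ = Q$),
$$\max_V \text{cor}^2(PV, TW) = \frac{(TW)^t Q (TW)}{(TW)^t(TW)} = \frac{W^t (QT)^t(QT) W}{W^t T^tT W}$$
Maximizing this Rayleigh quotient over $W$ leads to the generalized eigenvalue problem $(QT)^t(QT)W = \lambda T^tT W$, which is the same as $MW = \lambda W$; the maximum equals the largest eigenvalue, which gives the $\sqrt{\lambda}$ above.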
Code example
Below we demonstrate the algorithm with some code and compare the result with a numerical optimization.
### create exam data; let's have
### 3 theoretical exams
### 6 practical exams
### 20 students
### the data are correlated, normally distributed variables
library(MASS) ### for mvrnorm
set.seed(1)
ns = 20
nt = 3
np = 6
rho = 0.5
data = t(mvrnorm(nt+np,mu = rep(0,ns), Sigma = diag(rep(1-rho,ns))+rho ))
P = data[,1:np]
T = data[,(np+1):(np+nt)]
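### (added check, not in the original) the orientation of the data
### matters here: students are in the rows, exams in the columns
dim(P) ### 20 x 6
dim(T) ### 20 x 3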
vector_solution = function(P,T) {
  ### subtract the column means (without loss of generality)
  Pm = P - rep(1,length(P[,1])) %*% t(colMeans(P))
  Tm = T - rep(1,length(T[,1])) %*% t(colMeans(T))
  ### compute matrices
  Q = Pm %*% solve(t(Pm) %*% Pm) %*% t(Pm) ### projection matrix
  M = solve(t(Tm) %*% Tm) %*% t(Q %*% Tm) %*% (Q %*% Tm)
  ### solve with eigenvectors
  eig = eigen(M)
  eVec = eig$vectors
  eVal = eig$values
  ### the eigenvalues are sorted in decreasing order,
  ### so the first eigenvector is the solution
  W = eVec[,1]
  corr = eVal[1]^0.5
  Tsol = T %*% W
  V = as.numeric(lm(Tsol ~ 0 + Pm)$coef) ### find V by OLS
  return(list(W = W, V = V, correlation = corr))
}
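### cross-check (an addition, not part of the original answer):
### this problem is canonical correlation analysis, so base R's
### cancor() should reproduce the same maximal correlation
ccor_check = cancor(P, T)$cor[1] ### should equal vector_solution(P,T)$correlation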
### optimization of correlation
f <- function(par,P,T){
  n = length(P[1,])
  m = length(T[1,])
  v = par[1:n]
  w = par[(n+1):(n+m)]
  ### correlation
  rho = cor((P %*% v),(T %*% w))
  ### extra penalty term to ensure a unique solution
  ### (multiples of the vectors give the same correlation)
  d = abs(sum(v^2)-1)^2
  return(-rho+d)
}
numerical_solution = function(P,T) {
  n = length(P[1,])
  m = length(T[1,])
  mod <- optim(par = rep(1, n+m),
               fn = f,
               method = "BFGS",
               control = list(maxit = 1*10^5, trace = 0, reltol = 10^-16),
               P = P, T = T)
  v = mod$par[1:n]
  w = mod$par[(n+1):(n+m)]
  rho = cor((P %*% v),(T %*% w))
  return(list(W = w, V = v, correlation = rho))
}
Output
The output below shows that the method with the eigenvectors gives the same result as the numerical optimization. The difference is only in the scale of the solutions for $W_i$ and $V_i$: you can multiply these by an arbitrary constant and get the same result, because the correlation $\text{cor}(aT,bP)$ is the same as the correlation $\text{cor}(T,P)$.
> numerical_solution(P,T)
$W
[1] -3.621469 5.383181 -2.667191
$V
[1] 0.2700293 -0.2930244 0.4962644 0.3950038 0.2474477
[6] 0.6145594
$correlation
[,1]
[1,] 0.7103213
> vector_solution(P,T)
$W
[1] 0.5162597 -0.7674021 0.3802235
$V
[1] -0.09827352 0.17602598 -0.29498806 -0.41464957
[5] -0.23167924 -0.43388866
$correlation
[1] 0.7103213
> vector_solution(P,T)$W/numerical_solution(P,T)$W
[1] -0.1425553 -0.1425555 -0.1425558
> vector_solution(P,T)$V/numerical_solution(P,T)$V
[1] -0.7723890 -0.7723903 -0.7723889 -0.7723876 -0.7723897
[6] -0.7723882
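As an extra check (an addition, not part of the original output), you can normalize both $W$ solutions to unit length; they should then agree up to a common sign flip:

w1 = vector_solution(P,T)$W
w2 = numerical_solution(P,T)$W
w1 / sqrt(sum(w1^2)) ### same vector, up to sign, as the line below
w2 / sqrt(sum(w2^2))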