
While searching for a way to initialize matrix weights as orthogonal (i.e. W*W^T = I and, as I wrongly believed at the time, all eigenvalues equal to either 1 or -1), I found this post with code from Lasagne:

https://stats.stackexchange.com/questions/228704

However, this code does not produce an orthogonal matrix but a unitary one with complex eigenvalues, which still satisfies the equality W*W^T = I quite well:

import numpy as np


np.random.seed(1000)


def floatX(arr):
    """Convert data to a numpy array of dtype ``float32`` (standing in
    for ``theano.config.floatX`` in the original Lasagne helper)."""
    return np.asarray(arr, dtype=np.float32)

def sample(shape, gain=1.0):
    flat_shape = (shape[0], np.prod(shape[1:]))
    a = np.random.normal(0.0, 1.0, flat_shape).astype(np.float32)
    u, _, v = np.linalg.svd(a, full_matrices=False)
    # pick the factor with the correct shape
    q = u if u.shape == flat_shape else v
    q = q.reshape(shape)
    res = floatX(gain * q)

    # inspect the eigenvalues of u: the smallest, one from the middle, the largest
    eigenvalues, _ = np.linalg.eig(u)
    print(str(np.min(eigenvalues)) + " " + str(eigenvalues[1024]) + " " + str(np.max(eigenvalues)))

    return res


W = sample([2048, 2048])


print(np.linalg.norm(W.dot(np.transpose(W))-np.eye(2048)))

The output is

(-1+0j) (0.376566+0.92639j) (1+0j)

for the smallest, a middle, and the largest eigenvalue of u, and

1.00117376019e-05

for the distance between W*W^T and I.

1) Is this a bug in Lasagne, given that numpy.linalg.svd indeed produces a unitary matrix, as stated in its manual, and

2) How can I initialize a (non-square) orthogonal matrix myself in Python, other than the identity?

Thank you!

EDIT: Thanks everyone! From all the explanations, I've managed to put together a simple Python solution:

import numpy as np

## Using a subset of vectors
np.random.seed(1000)

p = 4096
A = np.random.normal(0,1,(p,p))

q, _ = np.linalg.qr(A)  # q is orthogonal

print(q.shape)  # (4096, 4096)

q = q[:,::2]  # keep every second column of q

print(q.shape)  # (4096, 2048); m > n  ==>  q^T * q = I_n

print(np.linalg.norm(np.transpose(q).dot(q)-np.eye(2048)))  # 4.26e-14
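
An equivalent shortcut (a sketch; it relies on NumPy's default `mode='reduced'` QR): factorize a tall rectangular Gaussian matrix directly, which yields a semi-orthogonal matrix without forming the full square `q` first:

import numpy as np

## QR of a rectangular matrix directly (a sketch; NumPy's default
## mode='reduced' returns q of shape (m, n) when m >= n)
np.random.seed(1000)

m, n = 4096, 2048
A = np.random.normal(0, 1, (m, n))
q, _ = np.linalg.qr(A)   # q has orthonormal columns, shape (m, n)

print(q.shape)  # (4096, 2048)
print(np.linalg.norm(np.transpose(q).dot(q) - np.eye(n)))  # ~1e-13 (float64 round-off)

Note that the deviation from the identity here is around 1e-13 rather than 1e-5 because everything stays in 64-bit floats, consistent with itdxer's comment below.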
Slowpoke
  • (2) is answered at https://stats.stackexchange.com/questions/215497/how-to-create-an-arbitrary-covariance-matrix/215647#215647. The positive/negative determinant issue is addressed at https://stats.stackexchange.com/questions/143903/how-to-generate-uniformly-random-orthogonal-matrices-of-positive-determinant/143970#143970 – whuber Nov 15 '17 at 18:43
  • Incidentally, very few orthogonal matrices have just $\pm 1$ as eigenvalues. Most have complex eigenvalues of unit norm. The simplest example is the rotation matrix $$\pmatrix{\cos(t)&\sin(t)\\-\sin(t)&\cos(t)}$$ (a quick numerical check follows these comments). This is all explained in the post you reference, which you appear to misinterpret--so please study it more. – whuber Nov 15 '17 at 19:44
  • @whuber Oh, I am so sorry. Somehow I also had this wrong fact in mind and just took the question from math.stackexchange as a reference without checking the answers. Please excuse the confusion. – Slowpoke Nov 15 '17 at 19:49
  • Regarding your second question: you can generate a square matrix and just remove a few columns; all the remaining vectors will still be perfectly orthogonal. – itdxer Nov 15 '17 at 20:11
  • @whuber Thanks very much! I've tried to replicate your code in Python: R's procedure `qr` is equivalent to `np.linalg.qr`, which produces a square orthonormal matrix `q` with high precision. I want to add that in my neural network I wish to experiment with non-square weight matrices as well, so I would be very glad to know ways to generate good-quality semi-orthogonal ones. – Slowpoke Nov 15 '17 at 20:12
  • Regarding the first question: the method is valid, and the reason you get the norm `1.00117376019e-05` is that you use 32-bit floats. If you used 64-bit floats instead, the norm would be even smaller; on my machine I got `2.0298795662912376e-13`. – itdxer Nov 15 '17 at 20:15
  • If by "semi-orthogonal" you mean with mutually orthogonal columns (or rows), then the SVD will do that. An extended analysis of its properties appears at https://stats.stackexchange.com/questions/139047. – whuber Nov 15 '17 at 20:15
  • btw, why do you need orthogonal weights? Usually random ones are fine to start with. – Aksakal Nov 15 '17 at 20:34
  • Thanks @whuber! I've managed to create the code based on your point and the explanations from Aksakal's answer! – Slowpoke Nov 15 '17 at 20:34
  • @Aksakal I'm experimenting with a special activation function ([OPLU](https://arxiv.org/abs/1604.02313)) which preserves orthogonality. – Slowpoke Nov 15 '17 at 20:40
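
A quick numerical check of whuber's rotation-matrix example above (a minimal sketch; the angle `t = 0.3` is arbitrary):

import numpy as np

# a minimal sketch: the 2x2 rotation matrix from whuber's comment;
# t = 0.3 is an arbitrary angle chosen for illustration
t = 0.3
R = np.array([[ np.cos(t), np.sin(t)],
              [-np.sin(t), np.cos(t)]])

eigenvalues = np.linalg.eigvals(R)
print(eigenvalues)          # cos(t) +/- i*sin(t), i.e. exp(+/- i*t)
print(np.abs(eigenvalues))  # both have unit modulus: [1. 1.]
print(np.linalg.norm(R.dot(np.transpose(R)) - np.eye(2)))  # ~0: R is orthogonal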

1 Answer


How can I initialize a (non-square) orthogonal matrix myself in Python, other than the identity?

One way to make an orthogonal matrix is to take a subset of the eigenvectors of a symmetric positive definite matrix; the symmetry is what guarantees the eigenvectors are mutually orthogonal.

Here's Python code:

import numpy as np
from numpy import linalg as LA

# create a symmetric positive definite matrix P
p = 4
A = np.random.rand(p,p)
P = (A + np.transpose(A))/2 + p*np.eye(p)

# take a subset of its eigenvectors
vals, vecs = LA.eig(P)
w = vecs[:,0:p-1]

# check that the columns of w are orthonormal
print("matrix shape: ",w.shape)
print("check orthogonality: ",np.matmul(np.transpose(w),w))

matrix shape:  (4, 3)
check orthogonality:  [[  1.00000000e+00   3.88578059e-16  -6.21031004e-16]
 [  3.88578059e-16   1.00000000e+00   5.55111512e-16]
 [ -6.21031004e-16   5.55111512e-16   1.00000000e+00]]

UPDATE

This matrix does not (and cannot) have complex elements, of course. Here's a demonstration:

Python code:

import numpy as np
from numpy import linalg as LA

# create a symmetric positive definite matrix P
p = 400
A = np.random.rand(p,p)
P = (A + np.transpose(A))/2 + p*np.eye(p)

# take the full set of its eigenvectors
vals, vecs = LA.eig(P)
w = vecs[:,0:p]

# check that the columns are orthonormal and that everything is real
print("matrix shape: ",w.shape)
print("check orthogonality: ",np.matmul(np.transpose(w),w))
print("has complex elements: ",np.max(np.iscomplex(w)))

Output:

matrix shape:  (400, 400)
check orthogonality:  [[  1.00000000e+00   1.33573708e-16   1.67509236e-16 ...,  -5.20417043e-18
    7.50267903e-17  -8.84708973e-17]
 [  1.33573708e-16   1.00000000e+00  -7.39425882e-15 ...,  -1.08940634e-15
    1.03172679e-15  -6.50521303e-16]
 [  1.67509236e-16  -7.39425882e-15   1.00000000e+00 ...,   3.15318518e-15
    1.58553726e-15   1.27155231e-15]
 ..., 
 [ -5.20417043e-18  -1.08940634e-15   3.15318518e-15 ...,   1.00000000e+00
    4.93181884e-15  -1.13573213e-13]
 [  7.50267903e-17   1.03172679e-15   1.58553726e-15 ...,   4.93181884e-15
    1.00000000e+00  -9.91155075e-13]
 [ -8.84708973e-17  -6.50521303e-16   1.27155231e-15 ...,  -1.13573213e-13
   -9.91155075e-13   1.00000000e+00]]
has complex elements:  False
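
Equivalently, since P is symmetric by construction, one can use `np.linalg.eigh`, NumPy's dedicated routine for symmetric/Hermitian matrices; it returns real eigenvalues and orthonormal eigenvectors by construction. A minimal sketch:

import numpy as np

# a minimal sketch: eigh is specialized for symmetric matrices, so its
# output is guaranteed real and the eigenvectors come back orthonormal
p = 400
A = np.random.rand(p,p)
P = (A + np.transpose(A))/2 + p*np.eye(p)

vals, vecs = np.linalg.eigh(P)
print("dtype: ", vecs.dtype)  # float64, never complex
print("deviation from I: ",
      np.linalg.norm(np.matmul(np.transpose(vecs), vecs) - np.eye(p)))  # tiny (float64 round-off)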
Aksakal
  • Hmm, I took a bigger `p=40` and took a square matrix to begin with (`w = vecs[:,0:p]`). Then I printed the eigenvalues in the same way: the smallest, one from the middle, and the largest. The result I got is `(-1+0j) (0.987257065263-0.159133551111j) (1+0j)`. Shouldn't an orthogonal matrix have only real eigenvalues, all equal to `-1` or `1`? Or am I doing something wrong? – Slowpoke Nov 15 '17 at 19:34
  • With `p=40`, the square matrix `w` has eigenvalues like `0.60580531-0.79561292j 0.46181022+0.88697876j 0.46181022-0.88697876j`, which cannot be attributed to floating-point error. – Slowpoke Nov 15 '17 at 19:42
  • Please excuse me for the confusion; somehow I was mistaken about the eigenvalues of an orthogonal matrix and was quite sure of it. – Slowpoke Nov 15 '17 at 19:52
  • @Slowpoke, an orthogonal matrix can have complex eigenvalues. A simple example is the matrix `[[0, 1], [-1, 0]]`, whose eigenvalues are `[0.+1.j, 0.-1.j]`. This 2x2 matrix rotates any real-valued vector by 90 degrees, so no real-valued vector is mapped to a multiple of itself; such eigenvectors exist only in complex space. – itdxer Nov 15 '17 at 20:03
  • @Slowpoke, I'm not sure what's going on with your test, but this matrix should not have complex elements by construction. I put the test code in the updated answer. – Aksakal Nov 15 '17 at 20:12
  • No, everything is fine - I was talking only about the eigenvalues, not the matrix elements. My test was for a square `w`: `eigenvalues, _ = np.linalg.eig(w)` and then `print(str(np.min(eigenvalues))+" "+ str(eigenvalues[2])+ " "+ str(np.max(eigenvalues)))` – Slowpoke Nov 15 '17 at 20:18
  • @Slowpoke, OK. My matrix P should not have complex eigenvalues by construction. – Aksakal Nov 15 '17 at 20:32