Here is what Lasagne does; it should answer both of your questions:
    class Orthogonal(Initializer):
        """Initialize weights as Orthogonal matrix.

        Orthogonal matrix initialization [1]_. For n-dimensional shapes where
        n > 2, the n-1 trailing axes are flattened. For convolutional layers, this
        corresponds to the fan-in, so this makes the initialization usable for
        both dense and convolutional layers.

        Parameters
        ----------
        gain : float or 'relu'
            Scaling factor for the weights. Set this to ``1.0`` for linear and
            sigmoid units, to 'relu' or ``sqrt(2)`` for rectified linear units, and
            to ``sqrt(2/(1+alpha**2))`` for leaky rectified linear units with
            leakiness ``alpha``. Other transfer functions may need different
            factors.

        References
        ----------
        .. [1] Saxe, Andrew M., James L. McClelland, and Surya Ganguli.
               "Exact solutions to the nonlinear dynamics of learning in deep
               linear neural networks." arXiv preprint arXiv:1312.6120 (2013).
        """
        def __init__(self, gain=1.0):
            if gain == 'relu':
                gain = np.sqrt(2)

            self.gain = gain

        def sample(self, shape):
            if len(shape) < 2:
                raise RuntimeError("Only shapes of length 2 or more are "
                                   "supported.")

            flat_shape = (shape[0], np.prod(shape[1:]))
            a = get_rng().normal(0.0, 1.0, flat_shape)
            u, _, v = np.linalg.svd(a, full_matrices=False)
            # pick the one with the correct shape
            q = u if u.shape == flat_shape else v
            q = q.reshape(shape)
            return floatX(self.gain * q)
This RNN tutorial does the same thing (minus the gain):
    import numpy

    # orthogonal initialization for weights
    # see Saxe et al. ICLR'14
    def ortho_weight(ndim):
        W = numpy.random.randn(ndim, ndim)
        # u from the SVD of a square Gaussian matrix is orthogonal
        u, s, v = numpy.linalg.svd(W)
        return u.astype('float32')
So I assume it's correct (I hope so since this is the code I use).
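If you would rather check than assume, one way is to test numerically that the result times its transpose is the identity. Below is a minimal NumPy-only sketch of that check. The names `orthogonal_sample` and `is_orthonormal` are made up for illustration and are not part of Lasagne or the tutorial; the sampling just mirrors the SVD recipe above with a fixed seed.

```python
import numpy as np

def orthogonal_sample(shape, gain=1.0, seed=0):
    # Hypothetical helper: same SVD recipe as the snippets above, NumPy only.
    rng = np.random.RandomState(seed)
    flat_shape = (shape[0], int(np.prod(shape[1:])))
    a = rng.normal(0.0, 1.0, flat_shape)
    u, _, v = np.linalg.svd(a, full_matrices=False)
    # u has the right shape for square/tall matrices, v for wide ones
    q = u if u.shape == flat_shape else v
    return gain * q.reshape(shape)

def is_orthonormal(q2d, tol=1e-6):
    # Hypothetical check: tall matrices have orthonormal columns,
    # wide matrices have orthonormal rows.
    rows, cols = q2d.shape
    gram = q2d.T @ q2d if rows >= cols else q2d @ q2d.T
    return np.allclose(gram, np.eye(min(rows, cols)), atol=tol)

square = orthogonal_sample((64, 64))
wide = orthogonal_sample((32, 128))
conv = orthogonal_sample((16, 3, 5, 5))  # conv-style shape, flattened to (16, 75)

print(is_orthonormal(square))
print(is_orthonormal(wide))
print(is_orthonormal(conv.reshape(16, -1)))
```

Note that for non-square shapes only the smaller dimension can be orthonormal, which is why the check looks at rows for wide matrices and columns for tall ones. With a gain other than 1.0 the product is the identity scaled by `gain**2`, so divide the gain out before checking.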
I probably should have mentioned this, but I am planning to use it with Python/TensorFlow if possible.
In TensorFlow:
    import numpy as np
    import tensorflow as tf

    def orthogonal_initializer(scale=1.1):
        '''From Lasagne and Keras. Reference: Saxe et al., http://arxiv.org/abs/1312.6120
        '''
        print('Warning -- You have opted to use the orthogonal_initializer function')

        def _initializer(shape, dtype=tf.float32):
            # flatten all trailing axes so the SVD sees a 2-D matrix
            flat_shape = (shape[0], np.prod(shape[1:]))
            a = np.random.normal(0.0, 1.0, flat_shape)
            u, _, v = np.linalg.svd(a, full_matrices=False)
            # pick the one with the correct shape
            q = u if u.shape == flat_shape else v
            q = q.reshape(shape)  # still float64 here; tf.constant below casts it
            print('you have initialized one orthogonal matrix.')
            # honor the requested dtype (float32 by default)
            return tf.constant(scale * q[:shape[0], :shape[1]], dtype=dtype)
        return _initializer