LSTM in numpy

Let's calculate an LSTMCell manually in numpy.

Posted by Chris Hoyean Song on October 10, 2017

References

1. Understanding LSTM Networks, Chris Olah’s blog: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

2. TensorFlow example code: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/rnn/python/kernel_tests/rnn_cell_test.py

All the images and equations are from Chris Olah’s blog. I appreciate his crystal-clear articles.

The purpose of this article is to understand the internal computations of a basic LSTMCell.

(Image: LSTM diagram, from Chris Olah’s blog)

LSTMCell in TensorFlow

To get a TensorFlow result to compare the manual computation against, run a session with LayerNormBasicLSTMCell, passing layer_norm=False so it behaves like a plain LSTM cell.

# TensorFlow 1.x; the contrib API used below was removed in TF 2.x
import tensorflow as tf
import numpy as np
from tensorflow.python.ops import variable_scope
from tensorflow.python.ops import init_ops
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import variables
from tensorflow.contrib.rnn.python.ops import rnn_cell

res = []

with tf.Session() as sess:
  with variable_scope.variable_scope(
      "other", initializer=init_ops.constant_initializer(0.5)) as vs:
    x = array_ops.zeros([1, 3])  # input_size (3) != num_units (2)
    c = array_ops.zeros([1, 2])
    h = array_ops.zeros([1, 2])
    state = (c, h)
    cell = rnn_cell.LayerNormBasicLSTMCell(2, layer_norm=False)
    g, out_m = cell(x, state)
    sess.run([variables.global_variables_initializer()])
    res = sess.run([g, out_m], {
      x.name: np.array([[1., 1., 1.]]),
      c.name: 0.1 * np.asarray([[0, 1]]),
      h.name: 0.1 * np.asarray([[2, 3]]),
    })

    print(res[1].h)
    print(res[1].c)

    expected_h = np.array([[ 0.64121795, 0.68166804]])
    expected_c = np.array([[ 0.88477188, 0.98103917]])
[[ 0.64121795  0.68166804]]
[[ 0.88477188  0.98103917]]
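
For reference, this is the full recurrence a basic (non-layer-norm) LSTM cell computes, written out from Chris Olah’s equations and the BasicLSTMCell behavior; the rest of the article evaluates it term by term. Here W and b are the single fused weight matrix and bias, σ is the logistic sigmoid, and ⊙ is element-wise multiplication:

$$
\begin{aligned}
[i, j, f, o] &= \mathrm{split}([x, h]\,W + b) \\
c_{new} &= c \odot \sigma(f + \mathrm{forget\_bias}) + \sigma(i) \odot \tanh(j) \\
h_{new} &= \tanh(c_{new}) \odot \sigma(o)
\end{aligned}
$$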

Manual calculation in numpy

Now I’m going to reproduce the same result using only numpy.

import numpy as np

# same inputs as in the TensorFlow run above
x = np.array([[1., 1., 1.]])
c = 0.1 * np.asarray([[0, 1]])
h = 0.1 * np.asarray([[2, 3]])
num_units = 2

# the cell concatenates the input and the previous hidden state
args = np.concatenate((x, h), axis=1)
print(args)

[[ 1.   1.   1.   0.2  0.3]]
out_size = 4 * num_units    # one block each for the i, j, f, o gates
proj_size = args.shape[-1]  # input size + num_units = 3 + 2 = 5
print(out_size)
print(proj_size)
8
5
weights = np.ones([proj_size, out_size]) * 0.5  # constant_initializer(0.5), matching the TF run
print(weights)
[[ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]
 [ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]
 [ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]
 [ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]
 [ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]]
out = np.matmul(args, weights)
print(out)
[[ 1.75  1.75  1.75  1.75  1.75  1.75  1.75  1.75]]
bias = np.ones([out_size]) * 0.5
print(bias)
[ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]
concat = out + bias
print(concat)
[[ 2.25  2.25  2.25  2.25  2.25  2.25  2.25  2.25]]
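
Since every weight and every bias entry is 0.5, each pre-activation is just 0.5 times the sum of the entries of args, plus 0.5. A quick arithmetic check, reusing the arrays defined above:

# 0.5 * (1 + 1 + 1 + 0.2 + 0.3) + 0.5 = 0.5 * 3.5 + 0.5 = 2.25
assert np.allclose(concat, 0.5 * args.sum() + 0.5)
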
i, j, f, o = np.split(concat, 4, 1)  # input gate, new input, forget gate, output gate
print(i)
print(j)
print(f)
print(o)
[[ 2.25  2.25]]
[[ 2.25  2.25]]
[[ 2.25  2.25]]
[[ 2.25  2.25]]
g = np.tanh(j)  # candidate cell-state values
print(g)
[[ 0.97802611  0.97802611]]
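
Check by hand: tanh(2.25) = (e^4.5 − 1) / (e^4.5 + 1) ≈ 0.97803, matching the output above.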

(Image: LSTM diagram, from Chris Olah’s blog)

Step 1 - Calculate the forget gate

def sigmoid_array(x):
    return 1 / (1 + np.exp(-x))
forget_bias = 1.0  # BasicLSTMCell adds forget_bias to the forget gate by default
sigmoid_f = sigmoid_array(f + forget_bias)
print(sigmoid_f)
[[ 0.96267311  0.96267311]]
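
Check by hand: σ(2.25 + 1.0) = 1 / (1 + e^(−3.25)) ≈ 0.96267, matching the output above.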

(Image: LSTM diagram, from Chris Olah’s blog)

Step 2 - Calculate the new cell state C

sigmoid_array(i) * g
array([[ 0.88477185,  0.88477185]])
new_c = c * sigmoid_f + sigmoid_array(i) * g
print(new_c)
[[ 0.88477185  0.98103916]]
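
Check by hand: σ(2.25) ≈ 0.90465, so σ(i) * g ≈ 0.90465 * 0.97803 ≈ 0.88477 in both units, while c * sigmoid_f = [0, 0.1] * 0.96267 ≈ [0, 0.09627]; their sum is [0.88477, 0.98104], matching the output.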

(Image: LSTM diagram, from Chris Olah’s blog)

Step 3 - Calculate the new hidden state h

new_h = np.tanh(new_c) * sigmoid_array(o)
print(new_h)
[[ 0.64121796  0.68166811]]
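
Check by hand: tanh(0.88477) ≈ 0.70880 and tanh(0.98104) ≈ 0.75352; multiplying each by σ(2.25) ≈ 0.90465 gives [0.64122, 0.68167], matching the output.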

We can see that the manual computation matches the TensorFlow result, up to float32 precision.

print(new_h)
print(new_c)
[[ 0.64121796  0.68166811]]
[[ 0.88477185  0.98103916]]
print(res[1].h)
print(res[1].c)
[[ 0.64121795  0.68166804]]
[[ 0.88477188  0.98103917]]
np.testing.assert_almost_equal(res[1].h, np.array(new_h, dtype=np.float32))
np.testing.assert_almost_equal(res[1].c, np.array(new_c, dtype=np.float32))
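
To tie the three steps together, here is the whole manual computation wrapped in a single function. This is a minimal sketch of the same math; the function name and signature are mine, not TensorFlow’s:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_cell_step(x, c, h, weights, bias, forget_bias=1.0):
    # one step of a basic (non-layer-norm) LSTM cell
    args = np.concatenate((x, h), axis=1)     # [x, h]
    concat = np.matmul(args, weights) + bias  # fused gate pre-activations
    i, j, f, o = np.split(concat, 4, 1)       # input gate, new input, forget gate, output gate
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)
    return new_c, new_h

# reproduce the result above with the constant-0.5 weights and bias
x = np.array([[1., 1., 1.]])
c = 0.1 * np.asarray([[0, 1]])
h = 0.1 * np.asarray([[2, 3]])
num_units = 2
weights = np.ones([x.shape[1] + num_units, 4 * num_units]) * 0.5
bias = np.ones([4 * num_units]) * 0.5
new_c, new_h = lstm_cell_step(x, c, h, weights, bias)
print(new_c)  # [[ 0.88477185  0.98103916]]
print(new_h)  # [[ 0.64121796  0.68166811]]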

The full source code is available on my gist.