Reference
1. Understanding LSTM Networks - Chris Olah’s blog
All the images and the equations are from Chris Olah’s blog. I appreciate his crystal-clear articles.
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
2. TensorFlow example code reference:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/rnn/python/kernel_tests/rnn_cell_test.py
LSTM in numpy
The purpose of this article is to understand the internal calculations of TensorFlow's BasicLSTMCell.
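As a quick reminder, here is my own summary of the per-step equations (not one of Olah's figures), written in the form BasicLSTMCell uses, where i, j, f, o are the four slices of one combined matmul output:

$$
\begin{aligned}
[i,\, j,\, f,\, o] &= [x_t,\, h_{t-1}]\, W + b \\
c_t &= c_{t-1} \odot \sigma(f + \text{forget\_bias}) + \sigma(i) \odot \tanh(j) \\
h_t &= \tanh(c_t) \odot \sigma(o)
\end{aligned}
$$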
LSTMCell in TensorFlow
To compare the TensorFlow result with the manual computation, run a TensorFlow session with LayerNormBasicLSTMCell (with layer_norm=False it computes the same thing as BasicLSTMCell).
import tensorflow as tf
import numpy as np
from tensorflow.python.ops import variable_scope
from tensorflow.python.ops import init_ops
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import variables
from tensorflow.contrib.rnn.python.ops import rnn_cell

res = []
with tf.Session() as sess:
    with variable_scope.variable_scope(
            "other", initializer=init_ops.constant_initializer(0.5)) as vs:
        # Test BasicLSTMCell with input_size != num_units.
        x = array_ops.zeros([1, 3])
        c = array_ops.zeros([1, 2])
        h = array_ops.zeros([1, 2])
        state = (c, h)
        # layer_norm=False makes this cell compute the same thing as BasicLSTMCell.
        cell = rnn_cell.LayerNormBasicLSTMCell(2, layer_norm=False)
        # g is the cell output, out_m is the new state tuple (c, h).
        g, out_m = cell(x, state)
        sess.run([variables.global_variables_initializer()])
        res = sess.run([g, out_m], {
            x.name: np.array([[1., 1., 1.]]),
            c.name: 0.1 * np.asarray([[0, 1]]),
            h.name: 0.1 * np.asarray([[2, 3]]),
        })

print(res[1].h)
print(res[1].c)
expected_h = np.array([[ 0.64121795, 0.68166804]])
expected_c = np.array([[ 0.88477188, 0.98103917]])
[[ 0.64121795  0.68166804]]
[[ 0.88477188  0.98103917]]
In numpy manually
Now, I’m going to calculate the LSTM result manually, using only numpy.
import numpy as np
x = np.array([[1., 1., 1.]])
c = 0.1 * np.asarray([[0, 1]])
h = 0.1 * np.asarray([[2, 3]])
num_units = 2
args = np.concatenate((x, h), axis=1)  # [x, h]: one row fed to a single matmul for all gates
print(args)
[[ 1. 1. 1. 0.2 0.3]]
out_size = 4 * num_units
proj_size = args.shape[-1]
print(out_size)
print(proj_size)
8
5
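x and h were concatenated above because the four gate pre-activations (i, j, f, o) come out of a single matmul: the weight matrix maps proj_size = input_size + num_units columns to out_size = 4 * num_units columns. A quick sanity check on those numbers, assuming the variables defined above:
assert proj_size == x.shape[-1] + num_units  # 3 + 2 = 5
assert out_size == 4 * num_units             # 4 * 2 = 8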
weights = np.ones([proj_size, out_size]) * 0.5
print(weights)
[[ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]
 [ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]
 [ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]
 [ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]
 [ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]]
out = np.matmul(args, weights)
print(out)
[[ 1.75 1.75 1.75 1.75 1.75 1.75 1.75 1.75]]
bias = np.ones([out_size]) * 0.5
print(bias)
[ 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
concat = out + bias
print(concat)
[[ 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25]]
i, j, f, o = np.split(concat, 4, 1)
print(i)
print(j)
print(f)
print(o)
[[ 2.25  2.25]]
[[ 2.25  2.25]]
[[ 2.25  2.25]]
[[ 2.25  2.25]]
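BasicLSTMCell splits the combined pre-activation in the order i (input gate), j (candidate input), f (forget gate), o (output gate); with these uniform toy weights and inputs all four slices happen to be identical. A quick shape check, assuming the variables above:
for gate in (i, j, f, o):
    assert gate.shape == (1, num_units)  # one value per unit for each gate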
g = np.tanh(j)
print(g)
[[ 0.97802611 0.97802611]]
Step 1 - Calculate the forget gate
def sigmoid_array(x):
    return 1 / (1 + np.exp(-x))
forget_bias = 1.0
sigmoid_f = sigmoid_array(f + forget_bias)
print(sigmoid_f)
[[ 0.96267311 0.96267311]]
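Note that forget_bias=1.0 (TensorFlow's default) is added to the forget-gate pre-activation inside the sigmoid; it is separate from the learned bias added earlier. Numerically, with the values above:
# sigmoid(2.25 + 1.0) = sigmoid(3.25) ≈ 0.96267311
print(sigmoid_array(np.array([2.25 + 1.0])))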
Step 2 - Calculate the new cell state C
sigmoid_array(i) * g
array([[ 0.88477185, 0.88477185]])
new_c = c * sigmoid_f + sigmoid_array(i) * g
print(new_c)
[[ 0.88477185 0.98103916]]
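Element by element this is the cell-state update: the old state is scaled by the forget gate, and the input-gated candidate from the previous step is added. A small arithmetic check using the numbers printed above:
np.testing.assert_allclose(
    new_c,
    [[0.0 * 0.96267311 + 0.88477185,   # c[0] * forget gate + input gate * candidate
      0.1 * 0.96267311 + 0.88477185]], # c[1] * forget gate + input gate * candidate
    rtol=1e-5)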
Step 3 - Calculate the new hidden state h
new_h = np.tanh(new_c) * sigmoid_array(o)
print(new_h)
[[ 0.64121796 0.68166811]]
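As a last numeric check, using the values from above: tanh(new_c) ≈ [0.7088, 0.7535] and sigmoid(2.25) ≈ 0.90465, and their element-wise product reproduces new_h:
np.testing.assert_allclose(new_h, np.tanh(new_c) * 0.90465, rtol=1e-4)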
We can see that the manual computation result is the same as the one from TensorFlow.
print(new_h)
print(new_c)
[[ 0.64121796  0.68166811]]
[[ 0.88477185  0.98103916]]
print(res[1].h)
print(res[1].c)
[[ 0.64121795  0.68166804]]
[[ 0.88477188  0.98103917]]
np.testing.assert_almost_equal(res[1].h, np.array(new_h, dtype=np.float32))
np.testing.assert_almost_equal(res[1].c, np.array(new_c, dtype=np.float32))
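The first value returned by the cell, res[0], is the cell output, which for an LSTM is the same tensor as the new hidden state, so it should match new_h too (assuming res from the TensorFlow session above is still in scope):
np.testing.assert_almost_equal(res[0], np.array(new_h, dtype=np.float32))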
Here is the full source code on my gist.