A Programmer Walks You Through How AI Plays FlappyBird, Step by Step - yhthu (5)_H5之家

x_t, r_0, terminal = game_state.frame_step(do_nothing)
# First resize the image to 80*80, then convert it to grayscale
x_t = cv2.cvtColor(cv2.resize(x_t, (80, 80)), cv2.COLOR_BGR2GRAY)
# Binarize the grayscale image
ret, x_t = cv2.threshold(x_t, 1, 255, cv2.THRESH_BINARY)
# Four-channel input image
s_t = np.stack((x_t, x_t, x_t, x_t), axis=2)
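To make the shape of the initial state concrete, here is a minimal NumPy-only sketch of the last two steps above. The random `gray` array is a hypothetical stand-in for a frame that has already been resized and converted to grayscale, and `np.where` is used here as an equivalent of `cv2.threshold(..., 1, 255, cv2.THRESH_BINARY)`; it is an illustration, not the author's code.

```python
import numpy as np

# Hypothetical stand-in for an 80x80 grayscale frame (values 0..255).
rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(80, 80), dtype=np.uint8)

# Same rule as THRESH_BINARY with thresh=1, maxval=255:
# pixels strictly greater than 1 become 255, all others become 0.
binary = np.where(gray > 1, 255, 0).astype(np.uint8)

# The first state repeats the same frame four times along a new last axis.
s_t = np.stack((binary, binary, binary, binary), axis=2)

print(s_t.shape)
```

The stacked array has one 80x80 plane per channel, which is the "four-channel input image" the comment refers to.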

4. DQN training process

This is the core of the code walkthrough: it is the Q-learning algorithm described above, turned into code.

i. Before training starts, first create some variables:

# define the cost function
a = tf.placeholder("float", [None, ACTIONS])
y = tf.placeholder("float", [None])
readout_action = tf.reduce_sum(tf.multiply(readout, a), axis=1)
cost = tf.reduce_mean(tf.square(y - readout_action))
train_step = tf.train.AdamOptimizer(1e-6).minimize(cost)

# open up a game state to communicate with emulator
game_state = game.GameState()

# store the previous observations in replay memory
D = deque()

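TensorFlow generally offers three ways to read data: Feeding, Reading from files, and Preloaded data. Feeding is the simplest to use and is the approach taken here. Before the model (Graph) is built, a placeholder reserves a slot for each input; no training data exists at that point, and the data is passed in through feed_dict when the training step is run.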

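Here a is the action that was taken, i.e. the Action in the reinforcement learning model, and y is the label (target) value. readout_action multiplies the model output readout by a and sums the result along axis 1. The loss function is the mean of the squared difference between the label value and this output value, and train_step minimizes the loss with the Adam optimizer at a learning rate of 1e-6.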
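The multiply-then-sum step is easier to see with numbers. The sketch below reproduces the same arithmetic in plain NumPy on a made-up batch of three samples. The Q-values, the one-hot actions, the targets, and ACTIONS = 2 are all assumptions chosen for illustration, not values from the original code.

```python
import numpy as np

ACTIONS = 2  # assumed here: "do nothing" and "flap"

# Hypothetical network output (readout): one row of Q-values per state.
readout = np.array([[0.5, 1.5],
                    [2.0, 0.1],
                    [0.3, 0.7]])

# One-hot encoding of the action taken in each state.
a = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])

# Hypothetical target values, one per sample.
y = np.array([1.0, 0.0, 1.0])

# Element-wise product zeroes out the actions not taken; summing over
# axis 1 leaves only the Q-value of the chosen action for each sample.
readout_action = np.sum(readout * a, axis=1)

# Mean squared error between target and chosen-action Q-value.
cost = np.mean(np.square(y - readout_action))

print(readout_action, cost)
```

Because a is one-hot, the loss only touches the Q-value of the action that was actually played; the other outputs of the network receive no gradient from that sample.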

The values are fed in like this:

# perform gradient step
train_step.run(feed_dict={
    y: y_batch,
    a: a_batch,
    s: s_j_batch}
)

ii. Create the game and the replay memory D

# open up a game state to communicate with emulator
game_state = game.GameState()

# store the previous observations in replay memory
D = deque()

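The replay memory D is a double-ended queue (Python's collections.deque, rather than a TensorFlow queue): items are pushed with append() and the oldest item is removed with popleft(). D stores the data collected while the game is played, and the training step later draws a random batch of a fixed size from it.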
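A minimal sketch of how such a replay memory can be filled, capped, and sampled, using only the standard library. The capacity REPLAY_MEMORY, the batch size BATCH, and the integer placeholders used in place of real frames are hypothetical values picked to keep the demo small.

```python
import random
from collections import deque

REPLAY_MEMORY = 5  # hypothetical capacity, tiny for demonstration
BATCH = 3          # hypothetical minibatch size

D = deque()

for t in range(8):
    # A transition is (state, action, reward, next_state, terminal);
    # integers stand in for the stacked frames here.
    transition = (t, 0, 0.1, t + 1, False)
    D.append(transition)
    # Once the memory is over capacity, drop the oldest transition.
    if len(D) > REPLAY_MEMORY:
        D.popleft()

# Draw a random minibatch without replacement for one training step.
minibatch = random.sample(list(D), BATCH)

print(len(D), [m[0] for m in minibatch])
```

Sampling at random, rather than training on consecutive frames, breaks the strong correlation between neighbouring transitions.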

Once the variables have been created, call the TensorFlow function tf.global_variables_initializer() to add an op that initializes them. This op is run after the model has been built, right at the start of the Session. For example:

# Create two variables.
weights = tf.Variable(tf.random_normal([784, 200], stddev=0.35),
                      name="weights")
biases = tf.Variable(tf.zeros([200]), name="biases")
...
# Add an op to initialize the variables.
init_op = tf.global_variables_initializer()

# Later, when launching the model
with tf.Session() as sess:
    # Run the init operation.
    sess.run(init_op)
    ...
    # Use the model
    ...

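iii. Saving and loading parameters

When training a model with TensorFlow, the learned parameters need to be saved; otherwise a single shutdown sends you right back to square one. TensorFlow uses a Saver for this. Typically the Saver instance is obtained with tf.train.Saver() before the Session() is created.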

saver = tf.train.Saver()

Variables are restored with the saver's restore method:

# Create some variables.
v1 = tf.Variable(..., name="v1")
v2 = tf.Variable(..., name="v2")
...
# Add ops to save and restore all the variables.
saver = tf.train.Saver()

with tf.Session() as sess:
    # Restore variables from disk.
    saver.restore(sess, "/tmp/model.ckpt")
    print("Model restored.")
    # Do some work with the model
    ...