    x_t, r_0, terminal = game_state.frame_step(do_nothing)
    # First resize the image to 80x80, then convert it to grayscale
    x_t = cv2.cvtColor(cv2.resize(x_t, (80, 80)), cv2.COLOR_BGR2GRAY)
    # Binarize the grayscale image
    ret, x_t = cv2.threshold(x_t, 1, 255, cv2.THRESH_BINARY)
    # Build the four-channel input image
    s_t = np.stack((x_t, x_t, x_t, x_t), axis=2)
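To make the binarization and stacking steps concrete, here is a minimal pure-Python sketch on a tiny 2x3 "image". The helper names `binarize` and `stack_frames` are hypothetical and only imitate what `cv2.threshold(..., cv2.THRESH_BINARY)` and `np.stack(..., axis=2)` do in the snippet above; the real code works on 80x80 NumPy arrays.

```python
def binarize(gray, thresh=1, maxval=255):
    """Imitate cv2.THRESH_BINARY: pixels above thresh become maxval, the rest 0."""
    return [[maxval if px > thresh else 0 for px in row] for row in gray]


def stack_frames(frame, copies=4):
    """Imitate np.stack((f, f, f, f), axis=2): each pixel gets `copies` channels."""
    return [[[px] * copies for px in row] for row in frame]


# A made-up 2x3 grayscale image (values are illustrative only).
gray = [[0, 1, 2],
        [37, 200, 0]]

x_t = binarize(gray)     # [[0, 0, 255], [255, 255, 0]]
s_t = stack_frames(x_t)  # shape: 2 rows x 3 columns x 4 channels
```

Because the first state has no history yet, all four channels hold the same frame.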
4. DQN training process
This is the focus of the code walkthrough, and it is the Q-learning algorithm described above turned into code.
i. Before training starts, a few variables are created:

    # define the cost function
    a = tf.placeholder("float", [None, ACTIONS])
    y = tf.placeholder("float", [None])
    readout_action = tf.reduce_sum(tf.multiply(readout, a), axis=1)
    cost = tf.reduce_mean(tf.square(y - readout_action))
    train_step = tf.train.AdamOptimizer(1e-6).minimize(cost)

    # open up a game state to communicate with emulator
    game_state = game.GameState()

    # store the previous observations in replay memory
    D = deque()

TensorFlow typically offers three ways to read data: Feeding, Reading from files, and Preloaded data. Feeding is the most direct of the three and is the one used here. While the model (Graph) is being built, a placeholder reserves a slot for each input. No training data exists at that point; the data are passed in through feed_dict when training runs.
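Here a is the action that was taken, i.e. the Action in the reinforcement learning model, given as a one-hot vector, and y is the label (target) value. readout_action multiplies the model output by a and sums along axis 1, which leaves only the Q-value of the chosen action. The loss function takes the mean of the squared difference between the label and this output, and train_step minimizes that loss with the Adam optimizer.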
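The arithmetic behind readout_action and cost can be checked by hand. The sketch below redoes it in plain Python for a batch of two samples with two actions; the function name `masked_q_loss` and all numbers are made up for illustration, and the real computation happens inside the TensorFlow graph.

```python
def masked_q_loss(readout, actions, targets):
    """Return the chosen-action Q-values and their mean squared error to targets."""
    # Multiply each row of Q-values by its one-hot action and sum the row,
    # mirroring tf.reduce_sum(tf.multiply(readout, a), axis=1).
    readout_action = [
        sum(q * a for q, a in zip(q_row, a_row))
        for q_row, a_row in zip(readout, actions)
    ]
    # Mirror tf.reduce_mean(tf.square(y - readout_action)).
    squared_errors = [(y - q) ** 2 for y, q in zip(targets, readout_action)]
    return readout_action, sum(squared_errors) / len(squared_errors)


readout = [[1.0, 3.0], [2.0, 0.5]]  # network output: one Q-value per action
actions = [[0, 1], [1, 0]]          # one-hot encoding of the action taken
targets = [2.0, 4.0]                # label values y

readout_action, cost = masked_q_loss(readout, actions, targets)
# readout_action is [3.0, 2.0]; cost is ((2 - 3)**2 + (4 - 2)**2) / 2 = 2.5
```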
The values are fed in as follows:

    # perform gradient step
    train_step.run(feed_dict={
        y: y_batch,
        a: a_batch,
        s: s_j_batch}
    )
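The y_batch fed above holds the Q-learning targets. How they are built is not shown in this excerpt, so the following is only a sketch under the standard Q-learning rule: the target is the reward alone for a terminal transition, and the reward plus the discounted maximum next-state Q-value otherwise. The names `q_targets` and `GAMMA` and all values are assumptions for illustration.

```python
GAMMA = 0.5  # assumed discount factor, chosen so the arithmetic is easy to follow


def q_targets(rewards, terminals, next_q_values, gamma=GAMMA):
    """Build one Q-learning target per transition in the minibatch."""
    targets = []
    for reward, terminal, next_q in zip(rewards, terminals, next_q_values):
        if terminal:
            # The episode ended: there is no future reward to bootstrap from.
            targets.append(reward)
        else:
            # Bootstrap from the best Q-value predicted for the next state.
            targets.append(reward + gamma * max(next_q))
    return targets


rewards = [1.0, -1.0]
terminals = [False, True]
next_q_values = [[0.5, 2.0], [3.0, 1.0]]  # made-up network outputs for s_{t+1}

y_batch = q_targets(rewards, terminals, next_q_values)
# First sample: 1.0 + 0.5 * 2.0 = 2.0; second sample is terminal, so -1.0
```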
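ii. Create the game and the replay memory D

    # open up a game state to communicate with emulator
    game_state = game.GameState()

    # store the previous observations in replay memory
    D = deque()

The replay memory D is a queue. Note that deque() here is the double-ended queue from Python's standard collections module, not a TensorFlow data structure: items are pushed with append() and removed from the front with popleft(). D stores the data collected while the game is played, and the training step later draws a random batch from it.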
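A minimal sketch of how such a replay memory is typically used: append each transition, drop the oldest one once a capacity is exceeded, and sample a random minibatch for training. The constants `REPLAY_MEMORY` and `BATCH`, the tuple layout, and the placeholder state names are assumptions chosen for illustration.

```python
import random
from collections import deque

REPLAY_MEMORY = 5  # assumed capacity, kept tiny for the demonstration
BATCH = 3          # assumed minibatch size

D = deque()
for step in range(8):
    # Hypothetical transition: (state, action, reward, next_state, terminal)
    transition = (f"s{step}", step % 2, 0.1, f"s{step + 1}", False)
    D.append(transition)
    if len(D) > REPLAY_MEMORY:
        D.popleft()  # discard the oldest transition

# Draw BATCH distinct transitions uniformly at random for one training step.
minibatch = random.sample(D, BATCH)
```

After eight steps only the five most recent transitions (s3 through s7) remain in D. Sampling at random rather than taking consecutive transitions reduces the correlation between the samples in a batch.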
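Once the variables have been created, call the TensorFlow method tf.global_variables_initializer() to add an op that initializes them. This op is run after the model has been built, right at the start of the Session. For example: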
    # Create two variables.
    weights = tf.Variable(tf.random_normal([784, 200], stddev=0.35),
                          name="weights")
    biases = tf.Variable(tf.zeros([200]), name="biases")
    ...
    # Add an op to initialize the variables.
    init_op = tf.global_variables_initializer()

    # Later, when launching the model
    with tf.Session() as sess:
        # Run the init operation.
        sess.run(init_op)
        ...
        # Use the model
        ...
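iii. Saving and loading parameters
When training a model with TensorFlow, the learned parameters need to be saved; otherwise everything is lost as soon as the machine shuts down and you are back to square one. TensorFlow uses a Saver for this. The Saver instance is usually obtained with tf.train.Saver() before the Session() is created.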
    saver = tf.train.Saver()

Variables are restored with the saver's restore method:
    # Create some variables.
    v1 = tf.Variable(..., name="v1")
    v2 = tf.Variable(..., name="v2")
    ...
    # Add ops to save and restore all the variables.
    saver = tf.train.Saver()

    # Later, launch the model and restore the variables from the checkpoint.
    with tf.Session() as sess:
        # Restore variables from disk.
        saver.restore(sess, "/tmp/model.ckpt")
        print("Model restored.")
        # Do some work with the model
        ...