Each step:
1. Generate an independent set of K-dimensional latent variables z_t (this is the stochastic part of each step).
2. A deterministic transition function f_h captures the dependencies between the latent variables of adjacent steps (similar in role to an LSTM); f_h is implemented as an LSTM network.
3. The hidden canvas is updated by the canvas function f_c, which takes the LSTM hidden state as input; f_c allows for many different transformations, and it is here that generative (writing) attention is used.
4. The conditional likelihood is computed from the final canvas using the observation function f_o(c_T; θ_o).
All parameters of this generative model are collected as θ = {θ_h, θ_c, θ_o}.
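As a rough illustration, here is a minimal sketch of one generative step, assuming PyTorch; the sizes, the additive canvas update, and the fully connected stand-in for the attention writer are simplifying assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

K, HIDDEN, C, H, W = 20, 256, 1, 28, 28        # assumed sizes

f_h = nn.LSTMCell(K, HIDDEN)                   # deterministic transition function (LSTM)
f_w = nn.Linear(HIDDEN, C * H * W)             # writing function (fully connected stand-in for attention)
f_o = nn.Conv2d(C, 1, kernel_size=1)           # observation function (1x1 convolution)

def generative_step(c_prev, h_prev, cell_prev):
    z_t = torch.randn(c_prev.size(0), K)                   # 1. sample latent z_t ~ N(0, I)
    h_t, cell_t = f_h(z_t, (h_prev, cell_prev))            # 2. f_h: LSTM transition
    c_t = c_prev + f_w(h_t).view_as(c_prev)                # 3. f_c: additive canvas update
    return c_t, h_t, cell_t

# Run T steps from zero-initialised states, then read the likelihood parameters
# (e.g. Bernoulli logits) off the final canvas with f_o:
batch, T = 8, 10
c = torch.zeros(batch, C, H, W)
h, cell = torch.zeros(batch, HIDDEN), torch.zeros(batch, HIDDEN)
for _ in range(T):
    c, h, cell = generative_step(c, h, cell)
logits = f_o(c)                                            # 4. parameters of p(x | z_{1:T})
```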
3.2.2. FREE ENERGY OBJECTIVE
The free energy is the objective function for both inference and parameter learning.
Optimize this objective function with respect to the variational parameters φ and the model parameters θ by stochastic gradient descent using mini-batches of data.
As with other VAEs, a single sample of the latent variables drawn from q_φ(z|x) is used when computing the Monte Carlo gradient.
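For reference, the free energy has the usual VAE form with one KL term per step; this is a reconstruction from the description above rather than a quotation of the paper's equation:

```latex
\mathcal{F}(x) \;=\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid c_T)\right]
\;-\; \sum_{t=1}^{T} \mathrm{KL}\!\left[\, q_\phi(z_t \mid z_{<t}, x) \,\middle\|\, p(z_t) \,\right]
```

This quantity is maximized over φ and θ; the single sample z ~ q_φ(z|x) gives the Monte Carlo estimate of the expectation and its reparameterized gradient.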
3.2.3. HIDDEN CANVAS FUNCTIONS
The canvas transition function f_c(c_{t-1}, h_t; θ_c) updates the hidden canvas state:
the current hidden state h_t is transformed by a nonlinear (writing) function f_w and then fused with the existing canvas c_{t-1}.
The hidden canvas has the same spatial size as the original image, but can have multiple channels.
Two ways to update the hidden canvas (both are sketched after this list):
1. Additive Canvas
The transformed hidden state is simply added onto the existing canvas: f_c(c_{t-1}, h_t; θ_c) = c_{t-1} + f_w(h_t; θ_w).
2. Gated Recurrent Canvas
A convolutional gated recurrent unit (CGRU) provides a nonlinear, recurrent update mechanism, similar to convolutional LSTMs.
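Below is a minimal sketch of both canvas updates, assuming PyTorch; the convolutional GRU cell is hand-rolled here, so its exact gating may differ from the paper's CGRU.

```python
import torch
import torch.nn as nn

class AdditiveCanvas(nn.Module):
    """c_t = c_{t-1} + f_w(h_t)"""
    def forward(self, c_prev, w_t):          # w_t = f_w(h_t), already in canvas shape
        return c_prev + w_t

class GatedRecurrentCanvas(nn.Module):
    """Convolutional GRU update: the write w_t acts as the input to the cell."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.gates = nn.Conv2d(2 * channels, 2 * channels, kernel_size, padding=pad)
        self.candidate = nn.Conv2d(2 * channels, channels, kernel_size, padding=pad)

    def forward(self, c_prev, w_t):
        zr = torch.sigmoid(self.gates(torch.cat([c_prev, w_t], dim=1)))
        z, r = zr.chunk(2, dim=1)                          # update / reset gates
        c_tilde = torch.tanh(self.candidate(torch.cat([r * c_prev, w_t], dim=1)))
        return (1 - z) * c_prev + z * c_tilde              # gated blend of old and new canvas
```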
The function f_w(h_t; θ_w) is a writing function used by the canvas function to transform the LSTM hidden state into the coordinate system of the hidden canvas.
This mapping can be implemented as a fully or locally connected network; the paper uses a writing (generative) attention function.
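As an illustration of attention-based writing, here is a sketch that uses a spatial transformer (torch's affine_grid / grid_sample) to place a small learned patch onto a canvas-sized output. The patch size, the three-parameter affine parameterization, and the reduction of the paper's attention to a single warp are all simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionWriter(nn.Module):
    """f_w(h_t): map the LSTM hidden state into the canvas coordinate system by
    writing a small patch at a predicted location/scale (spatial-transformer style)."""
    def __init__(self, hidden_size, channels, canvas_h, canvas_w, patch=12):
        super().__init__()
        self.channels, self.h, self.w, self.k = channels, canvas_h, canvas_w, patch
        self.to_patch = nn.Linear(hidden_size, channels * patch * patch)
        self.to_params = nn.Linear(hidden_size, 3)     # log-scale, shift_x, shift_y

    def forward(self, h_t):
        n = h_t.size(0)
        patch = self.to_patch(h_t).view(n, self.channels, self.k, self.k)
        log_s, tx, ty = self.to_params(h_t).unbind(dim=1)
        s, zero = torch.exp(log_s), torch.zeros_like(tx)
        theta = torch.stack([torch.stack([s, zero, tx], dim=1),
                             torch.stack([zero, s, ty], dim=1)], dim=1)   # (n, 2, 3) affine params
        grid = F.affine_grid(theta, (n, self.channels, self.h, self.w), align_corners=False)
        return F.grid_sample(patch, grid, align_corners=False)            # canvas-sized write
```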
The final phase of the generative process transforms the hidden canvas c_T into the parameters of the likelihood function via the output function f_o(c_T; θ_o).
The output function f_o(c; θ_o) is implemented as a 1×1 convolution; when the hidden canvas has a different size from the image, a convolutional network (CNN) is used instead.
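A small sketch of both cases (PyTorch assumed); the channel counts and the upsampling stack for the different-size case are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Canvas with the same spatial size as the image: a 1x1 convolution maps the
# canvas channels to per-pixel likelihood parameters (e.g. Bernoulli logits).
f_o_same = nn.Conv2d(4, 1, kernel_size=1)

# Canvas with a different (here: half) spatial size: a small CNN upsamples it
# to the image resolution before producing the likelihood parameters.
f_o_diff = nn.Sequential(
    nn.ConvTranspose2d(4, 8, kernel_size=4, stride=2, padding=1),   # 14x14 -> 28x28
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=1),
)

logits_same = f_o_same(torch.randn(8, 4, 28, 28))    # (8, 1, 28, 28)
logits_diff = f_o_diff(torch.randn(8, 4, 14, 14))    # (8, 1, 28, 28)
```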
3.2.4. DEPENDENT POSTERIOR INFERENCE
A structured posterior approximation with an auto-regressive form is used, i.e. q(z_t | z_{<t}, x).
This distribution is realized by an inference network.
Each step:
1. A nonlinear transformation f_r produces a low-dimensional representation r_t of the input image and the hidden state h_{t-1}. This is the reading function, the counterpart of the writing attention function.
The reading function transforms the input image into a new coordinate space that allows for easier inference computations. It can be implemented as a fully or locally connected network, but better inference is obtained using a reading or recognition attention.
2. A further nonlinear function fuses the output of f_r with the previous state h_{t-1} to produce the mean μ and variance σ of a K-dimensional diagonal Gaussian.
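Here is a minimal sketch of one inference step, assuming PyTorch; the fully connected reader stands in for the paper's reading attention, and the dependence on z_{<t} enters through the previous state h_{t-1}.

```python
import torch
import torch.nn as nn

K, HIDDEN, R, D = 20, 256, 64, 28 * 28     # assumed sizes (D = flattened image)

f_r = nn.Linear(D + HIDDEN, R)             # reading function (fully connected stand-in for attention)
to_mu = nn.Linear(R + HIDDEN, K)           # mean of the K-dim diagonal Gaussian
to_logvar = nn.Linear(R + HIDDEN, K)       # log-variance of the K-dim diagonal Gaussian

def inference_step(x, h_prev):
    r_t = torch.tanh(f_r(torch.cat([x, h_prev], dim=1)))          # 1. low-dimensional representation r_t
    joint = torch.cat([r_t, h_prev], dim=1)                       # 2. fuse r_t with the previous state
    mu, logvar = to_mu(joint), to_logvar(joint)
    z_t = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)     # reparameterized sample from q(z_t | z_<t, x)
    return z_t, mu, logvar
```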