In Appendix F, the authors say "The generator and critic are residual networks; we use pre-activation residual blocks with two 3 × 3 convolutional layers each and ReLU nonlinearity."
Based on this, I'm thinking something like:
def resblock(x, training=False):
    y = conv1(x)
    y = bn1(y, training=training)
    y = relu(y)
    y = conv2(y)
    y = bn2(y, training=training)
    # shortcut: project the input with a trainable W so the shapes match
    return relu(y + matmul(W, x))
where W is some trainable weight matrix sized so the shapes match, accounting for whatever padding the conv layers use. I'm not really sure what's meant by "pre-activation residual blocks", though, and more generally I'm not sure I've got the implementation quite right.
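My best guess is that "pre-activation" refers to the ordering from He et al.'s "Identity Mappings in Deep Residual Networks" (arXiv:1603.05027), where the normalization and nonlinearity come before each convolution and the shortcut is a plain identity. If that's what the authors mean, then (assuming TensorFlow/Keras, which my pseudocode above loosely follows, and assuming batch norm, which the quoted sentence doesn't actually mention) I'd expect something more like:

import tensorflow as tf

def make_preact_resblock(channels):
    # layers for one block; 'same' padding keeps spatial dims, so the shortcut
    # can be a plain identity as long as `channels` matches the input depth
    bn1 = tf.keras.layers.BatchNormalization()
    conv1 = tf.keras.layers.Conv2D(channels, 3, padding="same")
    bn2 = tf.keras.layers.BatchNormalization()
    conv2 = tf.keras.layers.Conv2D(channels, 3, padding="same")

    def block(x, training=False):
        # pre-activation ordering: (BN -> ReLU -> conv) twice, then add the input
        y = bn1(x, training=training)
        y = tf.nn.relu(y)
        y = conv1(y)
        y = bn2(y, training=training)
        y = tf.nn.relu(y)
        y = conv2(y)
        return y + x  # identity shortcut, no ReLU after the sum
    return block

Is that the right reading? And if so, does the shortcut really reduce to a plain addition here, with a 1 × 1 convolution only needed when the channel count changes?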
This is also related to the spectral normalization paper, arXiv:1802.05957.