β-VAE: LEARNING BASIC VISUAL CONCEPTS WITH A CONSTRAINED VARIATIONAL FRAMEWORK


Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., … Deepmind, G. (n.d.). β-VAE: LEARNING BASIC VISUAL CONCEPTS WITH A CONSTRAINED VARIATIONAL FRAMEWORK.

Learning an interpretable factorised representation of the independent data generative factors of the world without supervision is an important precursor for the development of artificial intelligence that is able to learn and reason in the same way that humans do. We introduce β-VAE, a new state-of-the-art framework for automated discovery of interpretable factorised latent representations from raw image data in a completely unsupervised manner. Our approach is a modification of the variational autoencoder (VAE) framework. We introduce an adjustable hyperparameter β that balances latent channel capacity and independence constraints with reconstruction accuracy. We demonstrate that β-VAE with appropriately tuned β > 1 qualitatively outperforms VAE (β = 1), as well as state of the art unsupervised (InfoGAN) and semi-supervised (DC-IGN) approaches to disentangled factor learning on a variety of datasets (celebA, faces and chairs). Furthermore, we devise a protocol to quantitatively compare the degree of disentanglement learnt by different models, and show that our approach also significantly outperforms all baselines quantitatively. Unlike InfoGAN, β-VAE is stable to train, makes few assumptions about the data and relies on tuning a single hyperparameter β, which can be directly optimised through a hyperparameter search using weakly labelled data or through heuristic visual inspection for purely unsupervised data.
Community: paperstack

1 Answer


Motivation
Discover disentangled latent representations in a purely unsupervised manner
Related work
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. (2016). InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. Retrieved from http://arxiv.org/abs/1606.03657
Kulkarni, T. D., Whitney, W., Kohli, P., & Tenenbaum, J. B. (2015). Deep Convolutional Inverse Graphics Network. Retrieved from http://arxiv.org/abs/1503.03167
Shortcomings of related work
Prior approaches require a priori knowledge about the number and nature of the generative factors
DC-IGN: semi-supervised
InfoGAN: training instability, reduced sample diversity, sensitivity to the choice of prior distribution
VAE: limited disentanglement performance
Contributions
Extension of the variational autoencoder (VAE)
Single hyperparameter $$\beta$$
Trades off reconstruction fidelity against quality of disentanglement
Controls the capacity of the latent information channel and the independence pressure
$$\beta = 1$$ recovers the standard VAE
$$\beta > 1$$ puts higher pressure on disentanglement
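To make the role of $$\beta$$ concrete: the objective maximised is $$\mathbb{E}_{q(z|x)}[\log p(x|z)] - \beta \, D_{KL}(q(z|x) \,\|\, p(z))$$, so the loss to minimise is a reconstruction term plus a $$\beta$$-weighted KL term. The sketch below is a minimal per-example version in plain Python. It assumes a diagonal Gaussian posterior, a standard normal prior and a Bernoulli likelihood over binary pixels. The function names and the default `beta=4.0` are illustrative choices, not taken from the paper.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def bernoulli_nll(x, x_hat, eps=1e-7):
    """Negative log-likelihood of pixels x under Bernoulli means x_hat.

    Assumption: pixels are in [0, 1] and the decoder outputs probabilities.
    """
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total


def beta_vae_loss(x, x_hat, mu, log_var, beta=4.0):
    """Reconstruction term plus beta-weighted KL term (beta=1 is the VAE loss)."""
    return bernoulli_nll(x, x_hat) + beta * gaussian_kl(mu, log_var)


# Usage: the same encoder/decoder outputs scored with beta = 1 and beta = 4.
x, x_hat = [1.0, 0.0], [0.9, 0.2]
mu, log_var = [1.0], [0.0]
vae_loss = beta_vae_loss(x, x_hat, mu, log_var, beta=1.0)
beta_loss = beta_vae_loss(x, x_hat, mu, log_var, beta=4.0)
```

The two losses differ only by `(beta - 1)` times the KL term, which is the extra pressure towards the factorised prior that the notes above describe.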
Shortcomings
$$\beta$$ still has to be tuned, either by a hyperparameter search using weakly labelled data or by heuristic visual inspection
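One common way to carry out the visual inspection is a latent traversal: take the latent code of one image, sweep a single coordinate over a range while holding the others fixed, and decode each result to see whether only one visual factor changes. The sketch below only builds the grid of latent codes. The helper names and the range of -3 to 3 are assumptions for illustration, and the decoder that would turn each row into an image is left out.

```python
def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi inclusive (n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def latent_traversal(z, dim, values):
    """Return copies of z with coordinate `dim` replaced by each value."""
    if not 0 <= dim < len(z):
        raise IndexError("dim out of range")
    rows = []
    for v in values:
        row = list(z)  # copy so the original code is left untouched
        row[dim] = v
        rows.append(row)
    return rows


# Usage: sweep latent dimension 1 of a 3-dimensional code in 5 steps.
z = [0.0, 0.5, -1.0]
grid = latent_traversal(z, dim=1, values=linspace(-3.0, 3.0, 5))
```

Each row of `grid` would be passed through the trained decoder. Repeating this for every latent dimension and for several values of $$\beta$$ gives the image grids that are compared by eye.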
Application to Reinforcement Learning
Higgins, I., Pal, A., Rusu, A., Matthey, L., Burgess, C., Pritzel, A., … Lerchner, A. (n.d.). DARLA: Improving Zero-Shot Transfer in Reinforcement Learning.
I think that this explanation is missing a lot of information. What is a disentangled latent representation? Why do we care about learning such a representation? Why is this learning being done in a purely unsupervised manner? I also think that the criticisms of DC-IGN and InfoGAN and the explanation of the paper's key contributions should be more fleshed out.
written 8 months ago by Rylan Schaeffer  