Disentangled Representation Learning in Real-World Image Datasets via Image Segmentation Prior
Disentangled Representation Learning in Real-World Image Datasets via Image Segmentation Prior
Blog Article
We propose a novel method that can learn easy-to-interpret latent representations in real-world image datasets using a VAE-based model by splitting an image into several disjoint regions.Our method performs object-wise disentanglement by exploiting image segmentation and alpha compositing.With remarkable results obtained by unsupervised disentanglement methods for Full Bookcase Storage Bed toy datasets, recent studies have tackled challenging disentanglement for real-world image datasets.However, these methods involve deviations from the standard VAE architecture, which has favorable disentanglement properties.Thus, for disentanglement in images of real-world image datasets with preservation of the VAE backbone, we designed an encoder and a decoder that embed an image into disjoint sets of latent variables corresponding to objects.
The encoder includes a pre-trained image segmentation network, which allows our model to focus only on representation learning while adopting image segmentation as an inductive bias.Evaluations using real-world image datasets, CelebA and Stanford Cars, showed that our method achieves Base Layers improved disentanglement and transferability.