9th International Conference on Image Processing Theory, Tools and Applications (IPTA), İstanbul, Turkey, 6 - 09 November 2019
In this paper, we address the problem of learning to summarize personal photo albums. That is, given a photo album, we aim to select a small set of representative images from the album so that the extracted summary captures most of the story that is being told through the images. More specifically, we extend a recently proposed recurrent neural network based framework by employing a more effective way to represent images and, more importantly, adding a diversity term to the main objective. Our diversity term is based on the idea of jointly training a discriminator network to evaluate the diversity of the selected images. This alleviates the issue of selecting near-duplicate or semantically similar images, which is the primary shortcoming of the base approach. The experimental results show that our improved model produces better or comparable summaries, providing a good balance between quality and diversity.