Zero-Shot Temporal Super-Resolution

In recent years, video has become one of the most important forms of content, from YouTube and Instagram to sports broadcasts and surveillance cameras. Unlike a single image, a video has a third sampling dimension: time, measured as frame rate (FPS). This temporal resolution heavily impacts the perceived quality of the video and the user experience, as well as the video's usefulness.
Just as it is useful to enhance spatial resolution in order to "gain" finer spatial details such as textures, it is useful to enhance temporal resolution to "gain" finer temporal details, such as smoother motion.
It is therefore natural to extend work on image super-resolution to video. While there are deep-learning methods that enhance the temporal resolution of natural videos, they usually involve very deep networks trained on vast collections of natural videos; after an immensely costly training process, they perform well only on videos similar enough to those they were trained on.
In this paper we extend Zero-Shot Super-Resolution (ZSSR, Shocher et al., 2018) [1] to video, using a small, lightweight network that is trained per given video. Our results are comparable to the state of the art (SOTA) while requiring only consumer-level hardware and relatively short training time.
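To make the idea concrete, below is a minimal sketch of the zero-shot training scheme, assuming the ZSSR recipe carried over from space to time: the only training data is the input video itself, temporally downsampled by dropping frames, and a small network learns to reconstruct each dropped frame from its two neighbors. The architecture, names (ZSTSRNet, train_zero_shot, double_fps), and hyper-parameters are illustrative assumptions, not a definitive implementation.

```python
import torch
import torch.nn as nn


class ZSTSRNet(nn.Module):
    """Small CNN that predicts the frame lying between two given frames."""

    def __init__(self, channels=3, width=64, depth=6):
        super().__init__()
        layers = [nn.Conv2d(2 * channels, width, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU()]
        layers.append(nn.Conv2d(width, channels, 3, padding=1))
        self.net = nn.Sequential(*layers)

    def forward(self, frame_a, frame_b):
        # Stack the two neighbouring frames along the channel axis.
        return self.net(torch.cat([frame_a, frame_b], dim=1))


def train_zero_shot(video, steps=2000, lr=1e-3):
    """Train on the input clip itself; video has shape (T, C, H, W).

    Self-supervision by temporal downsampling: frames (t-1, t+1) form the
    input pair, and the dropped middle frame t is the reconstruction target.
    """
    model = ZSTSRNet(channels=video.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()
    for _ in range(steps):
        t = torch.randint(1, video.shape[0] - 1, (1,)).item()
        pred = model(video[t - 1 : t], video[t + 1 : t + 2])
        loss = loss_fn(pred, video[t : t + 1])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model


@torch.no_grad()
def double_fps(video, model):
    """Insert a predicted frame between every consecutive pair (2x FPS)."""
    frames = [video[0]]
    for t in range(video.shape[0] - 1):
        mid = model(video[t : t + 1], video[t + 1 : t + 2])[0]
        frames += [mid, video[t + 1]]
    return torch.stack(frames)
```

Doubling the frame rate this way can be applied recursively for larger temporal factors, mirroring the gradual upscaling used in ZSSR.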
