Advances in display technology have greatly enhanced our viewing experience, making it more immersive and enjoyable. Watching content in 4K at 60 FPS is far more satisfying than the older standard of 1080p at 30 FPS. However, not everyone can access such high-quality content because of its data requirements: a 4K 60 FPS video can consume up to 6 times more data than a 1080p 30 FPS video, which puts it out of reach for many users.
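As a rough back-of-the-envelope check on where a figure like this comes from, the sketch below compares raw pixel throughput for the two formats. It assumes the common 3840x2160 and 1920x1080 frame sizes, and the helper name `pixels_per_second` is made up for illustration. Delivered bitrate depends on the codec and generally grows more slowly than the raw pixel count, so the raw ratio is only an upper-level intuition, not the data cost itself.

```python
def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Raw (uncompressed) pixel throughput of a video format."""
    return width * height * fps


# Assumed frame sizes: 4K UHD (3840x2160) and Full HD (1920x1080).
uhd_60 = pixels_per_second(3840, 2160, 60)
fhd_30 = pixels_per_second(1920, 1080, 30)

# 4x the pixels per frame times 2x the frames per second.
raw_ratio = uhd_60 / fhd_30
print(raw_ratio)  # 8.0
```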
To address this issue, researchers have developed methods that increase the resolution and/or frame rate of delivered videos. One such method is video frame interpolation, which adds new frames to a video sequence by estimating the motion between existing frames. The technique is widely used for slow-motion video, frame rate conversion, and video compression, and it can produce visually pleasing results.
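To make the idea of inserting frames concrete, here is a deliberately naive sketch that doubles the frame rate by averaging neighboring frames. It does not estimate motion, so it is not the kind of interpolation method the article discusses. It only illustrates where the new frames go in the sequence. The `Frame` type and both function names are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # grayscale frame as rows of pixel intensities


def blend_frames(a: Frame, b: Frame, t: float = 0.5) -> Frame:
    """Linearly blend two equally sized frames at time t in [0, 1]."""
    return [
        [(1.0 - t) * pa + t * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def double_frame_rate(frames: List[Frame]) -> List[Frame]:
    """Insert one blended frame between each consecutive pair of frames."""
    if not frames:
        return []
    out: List[Frame] = [frames[0]]
    for prev, nxt in zip(frames, frames[1:]):
        out.append(blend_frames(prev, nxt))  # synthesized in-between frame
        out.append(nxt)                      # original frame, kept as-is
    return out
```

A sequence of N frames becomes 2N - 1 frames. Plain blending produces ghosting on moving objects, which is one reason practical methods estimate motion instead.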
However, measuring the quality of video frame interpolation results has long been a challenge. Existing methods often use standard metrics that may not align with human perception due to the unique artifacts present in interpolation results. Some methods resort to subjective tests, which can be time-consuming. So, how can we accurately measure the quality of video interpolation?
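For context, one widely used standard metric is PSNR, which is derived from the mean squared error between a reference frame and a test frame. The article does not name the specific metrics it has in mind, so PSNR is used here as an assumed example. The sketch shows why such a metric can miss what viewers notice: it averages pixel errors over the whole frame and ignores where and how the errors appear.

```python
import math
from typing import List

Frame = List[List[float]]  # grayscale frame as rows of pixel intensities


def mean_squared_error(ref: Frame, test: Frame) -> float:
    """Average squared pixel difference between two equally sized frames."""
    diffs = [
        (p - q) ** 2
        for row_r, row_t in zip(ref, test)
        for p, q in zip(row_r, row_t)
    ]
    return sum(diffs) / len(diffs)


def psnr(ref: Frame, test: Frame, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in decibels; higher means closer to ref."""
    err = mean_squared_error(ref, test)
    if err == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / err)
```

Two interpolated frames with the same PSNR can look very different to a viewer, for example when one spreads small errors evenly and the other concentrates them in a visible ghosting artifact.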
A group of researchers has developed a dedicated perceptual quality metric for video frame interpolation results. They designed a novel neural network architecture based on Swin Transformers for video perceptual quality assessment. The network takes a pair of frames as input, one from the original video sequence and one interpolated frame, and outputs a score that represents the perceptual similarity between the two.
To train the network, the researchers built a large dataset of frame pairs drawn from a variety of videos, together with human judgments of their perceptual similarity. The network is trained using a combination of L1 and SSIM objective metrics, which measure the absolute difference and the structural similarity between two images, respectively. This approach allows the network to predict scores that are accurate and consistent with human perception. Notably, this method does not require reference frames, making it suitable for client devices that typically lack such information.
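The two objective metrics mentioned above can be sketched in a few lines. The article does not give the exact formulation, so everything below is an assumption made for illustration and not the authors' training objective: the equal weighting `alpha`, the normalization of the L1 term, and the simplified SSIM computed once over the whole frame rather than over sliding windows. The function names are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # grayscale frame as rows of pixel intensities


def _flatten(frame: Frame) -> List[float]:
    return [p for row in frame for p in row]


def l1_distance(a: Frame, b: Frame) -> float:
    """Mean absolute difference between two equally sized frames."""
    xs, ys = _flatten(a), _flatten(b)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def global_ssim(a: Frame, b: Frame, max_value: float = 255.0) -> float:
    """Simplified SSIM over the whole frame (no sliding window)."""
    xs, ys = _flatten(a), _flatten(b)
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mu_x) ** 2 for x in xs) / n
    var_y = sum((y - mu_y) ** 2 for y in ys) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    # Standard stabilizing constants from the SSIM definition.
    c1, c2 = (0.01 * max_value) ** 2, (0.03 * max_value) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def combined_loss(
    a: Frame, b: Frame, alpha: float = 0.5, max_value: float = 255.0
) -> float:
    """Weighted sum of a normalized L1 term and a (1 - SSIM) term.

    alpha is an assumed weight; the article does not specify one.
    """
    l1_term = l1_distance(a, b) / max_value
    ssim_term = 1.0 - global_ssim(a, b, max_value)
    return alpha * l1_term + (1.0 - alpha) * ssim_term
```

The value is close to zero for identical frames and grows as the frames diverge in either absolute intensity or structure.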
To learn more about this research, you can check out the paper.
[Photo Credit: MarkTechPost | Ekrem Çetinkaya]