Can Language Models Laugh at YouTube Short-form Videos?

Published in EMNLP 2023

We present a video humor explanation benchmark, built with a multimodal-filtering pipeline, to evaluate how well LLMs handle complex multimodal understanding tasks such as humor. To give LLMs access to visual content, we generate several captions per frame and filter them based on video segments.

Recommended citation: Dayoon Ko, Sangho Lee, Gunhee Kim. (2023). "Can Language Models Laugh at YouTube Short-form Videos?" EMNLP 2023.