Blog

Long Context Transfer from Language to Vision

Our paper explores the long context transfer phenomenon and validates it on both image and video benchmarks. We propose the Long Video Assistant (LongVA) model, which can process up to 2000 frames, or over 200K visual tokens, without additional complexities.

6 min read
Embracing Video Evaluations with LMMs-Eval

We introduce a video evaluation feature to lmms-eval, supporting video model evaluation across the most popular datasets.

10 min read
Accelerating the Development of Large Multimodal Models with LMMs-Eval

A one-command evaluation API for fast and thorough evaluation of LMMs, providing multi-faceted insights into model performance across more than 40 datasets.

Tags: Lmms-Eval

5 min read