Zyphra Releases Zamba2-VL: Hybrid Mamba2–Transformer Vision-Language Models
Zyphra has released Zamba2-VL, a family of open vision-language models (VLMs) in 1.2B, 2.7B, and 7B parameter sizes. The models use a hybrid backbone that combines Mamba2 state-space layers with Transformer layers, which cuts time-to-first-token latency by about an order of magnitude compared with conventional Transformer-based VLMs. Zamba2-VL performs strongly on visual counting and document understanding, but leaves room for improvement on knowledge-heavy reasoning tasks.
Zamba2-VL prioritizes low latency while keeping accuracy competitive, which makes it a promising option for applications that need fast processing of both visual and textual input, especially in edge and mid-range deployments.
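To give a feel for why a hybrid backbone can shorten time-to-first-token, here is a toy cost model, not Zyphra's method or measurements. It counts only sequence-mixing work during prefill: attention layers scale with the square of the prompt length, while state-space layers scale linearly. The layer counts, model width, and state size are illustrative assumptions, and the model ignores MLP blocks, the vision encoder, and hardware effects, so it will not reproduce the reported speedup.

```python
def mixing_cost(seq_len, d_model, n_attn_layers, n_ssm_layers, state_size=64):
    """Rough multiply-accumulate count for sequence mixing during prefill.

    All constants are hypothetical; this is a scaling sketch, not a profiler.
    """
    # Attention: every token is compared with every other token.
    attn = n_attn_layers * seq_len * seq_len * d_model
    # State-space layer: one fixed-size state update per token.
    ssm = n_ssm_layers * seq_len * d_model * state_size
    return attn + ssm


def prefill_ratio(seq_len, d_model=2048, n_layers=32, hybrid_attn_layers=4):
    """Estimated mixing cost of an all-attention stack divided by a hybrid stack."""
    transformer = mixing_cost(seq_len, d_model, n_layers, 0)
    hybrid = mixing_cost(
        seq_len, d_model, hybrid_attn_layers, n_layers - hybrid_attn_layers
    )
    return transformer / hybrid


if __name__ == "__main__":
    # Image inputs expand into many tokens, so long prompts are the common case.
    for seq_len in (256, 1024, 8192):
        print(seq_len, round(prefill_ratio(seq_len), 2))
```

Under these assumptions the advantage of the hybrid stack grows with prompt length and levels off near the ratio of total layers to attention layers, which is why long multimodal prompts benefit most.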