Zyphra Releases Zamba2-VL: Hybrid Mamba2–Transformer Vision-Language Models
Zyphra has launched Zamba2-VL, a new family of open vision-language models (VLMs) available in 1.2B, 2.7B, and 7B parameter sizes. The models are built on a hybrid State Space Model (SSM) and Transformer architecture, which aims to improve accuracy while significantly reducing latency. Zamba2-VL is particularly strong at visual counting and document understanding, where it outperforms larger baselines on Zyphra's internal benchmarks.
The hybrid architecture lets Zamba2-VL cut time-to-first-token latency by an order of magnitude compared with traditional Transformer-based VLMs, without significant accuracy loss, which makes it more efficient for certain applications.