
Zyphra Releases Zamba2-VL Vision-Language Models With Hybrid SSM-Transformer Architecture

Zamba2-VL offers a practical alternative to pure-Transformer VLMs: it accepts a more complex hybrid architecture in exchange for speed, reportedly without sacrificing benchmark scores. Releasing every size openly also lowers the barrier for developers who need fast image recognition on limited hardware.

Reporting from 1 source: GIGAZINE.


Zyphra released the Zamba2-VL family of vision-language models on June 11, 2026. The models use a hybrid SSM-Transformer architecture that combines the Transformer with Mamba2, aiming for faster image recognition at quality comparable to similarly sized Transformer models. Three variants are available, with 1.2B, 2.7B, and 7B parameters, all released under the Apache License 2.0.

Zyphra announced the Zamba2-VL family on June 11. The models are built on a hybrid architecture the company calls SSM-Transformer, which combines the dominant Transformer architecture with Mamba2, a state-space model (SSM) introduced in 2024. Zyphra claims the hybrid processes images faster than Transformer-only models of similar scale while matching their quality on benchmarks.
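
The announcement as reported does not describe how Zamba2-VL arranges its layers, so the following is only a toy sketch of the general idea behind a hybrid SSM-Transformer stack, not Zyphra's implementation. It assumes scalar token values, a fixed (non-selective) state-space recurrence in place of Mamba2's input-dependent one, and a hypothetical layer schedule string such as `"MMMA"` (`M` for an SSM layer, `A` for attention). It illustrates one general reason such hybrids can be faster: an SSM layer carries a fixed-size state and does constant work per token, while causal attention compares each token with every earlier one.

```python
import math
from typing import List

# Toy sketch only: scalar "tokens" and hypothetical layers, NOT Zyphra's code.


def ssm_layer(xs: List[float], a: float = 0.9, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Fixed linear state-space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    The state h stays one number however long the input is, so each token
    costs one constant-time update. (Mamba2 makes the parameters depend on
    the input and uses larger states; both are omitted here.)
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def causal_attention_layer(xs: List[float]) -> List[float]:
    """Causal softmax attention with query = key = value = the token itself.

    Token t scores itself against tokens 0..t, so total work grows quadratically.
    """
    ys = []
    for t, q in enumerate(xs):
        context = xs[: t + 1]
        scores = [q * k for k in context]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        ys.append(sum(w * v for w, v in zip(weights, context)) / sum(weights))
    return ys


def hybrid_stack(xs: List[float], pattern: str) -> List[float]:
    """Run layers in order: 'M' = SSM layer, 'A' = attention layer."""
    for kind in pattern:
        if kind == "M":
            xs = ssm_layer(xs)
        elif kind == "A":
            xs = causal_attention_layer(xs)
        else:
            raise ValueError(f"unknown layer type: {kind!r}")
    return xs


def interaction_count(pattern: str, n: int) -> int:
    """Rough work estimate for n tokens: n state updates per SSM layer,
    n*(n+1)//2 query-key scores per causal attention layer."""
    return sum(n if kind == "M" else n * (n + 1) // 2 for kind in pattern)


if __name__ == "__main__":
    print(hybrid_stack([1.0, 0.5, -0.25, 2.0], "MMMA"))
    print(interaction_count("MMMA", 1000), interaction_count("AAAA", 1000))
```

Under these assumptions, `interaction_count("MMMA", 1000)` returns 503500 versus 2002000 for `"AAAA"`: replacing three of four attention layers with SSM layers removes most of the quadratic work, while the remaining attention layer still lets every token look back over the full context. The real speedup of any model depends on its layer mix, hidden sizes, and kernels.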

Three model sizes are released: Zamba2-VL-1.2B (1.2 billion parameters), Zamba2-VL-2.7B (2.7 billion), and Zamba2-VL-7B (7 billion). All three are open models distributed under the Apache License 2.0 and available for download from Hugging Face. The company published a graph plotting time to first token against average benchmark score, positioning the Zamba2-VL series as competitive with peers on both speed and accuracy.

  • Zamba2-VL-1.2B: 1.2-billion-parameter variant
  • Zamba2-VL-2.7B: 2.7-billion-parameter variant
  • Zamba2-VL-7B: 7-billion-parameter variant
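
Zyphra's comparison graph uses time to first token (TTFT): the delay between submitting a request and receiving the first generated token, which for a vision-language model typically includes encoding the image and processing the prompt. The sketch below shows one simple way to measure it from any token stream. `fake_model_stream` is a hypothetical stand-in that just sleeps, not a real model and not Zyphra's benchmarking setup.

```python
import time
from typing import Iterable, Iterator, List, Optional, Tuple


def time_to_first_token(stream: Iterable[str]) -> Tuple[Optional[str], float]:
    """Return (first_token, seconds until it arrived).

    The clock starts before iteration begins; because a generator's body only
    runs on the first next() call, any work done before the first yield
    (prefill, in a real model) is counted.
    """
    start = time.perf_counter()
    for token in stream:
        return token, time.perf_counter() - start
    return None, time.perf_counter() - start  # stream produced nothing


def fake_model_stream(prefill_seconds: float, tokens: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model: waits, then yields tokens."""
    time.sleep(prefill_seconds)  # simulate image encoding + prompt processing
    yield from tokens


if __name__ == "__main__":
    token, ttft = time_to_first_token(fake_model_stream(0.05, ["A", "cat", "on", "a", "mat"]))
    print(f"first token {token!r} after {ttft * 1000:.1f} ms")
```

Averaging TTFT over several runs and pairing it with each model's benchmark score gives the kind of speed-versus-accuracy comparison the graph presents; real measurements also depend on hardware, batch size, and image resolution.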

Synthesized by Yomimono from the 1 cited source below, including Japanese-language reporting where cited, then editorially reviewed before publishing.

Sources