June 11, 2026

Zyphra Releases Zamba2-VL Vision-Language Models With Hybrid SSM-Transformer Architecture

Zamba2-VL offers a practical alternative to pure Transformer VLMs: Zyphra says the hybrid design runs faster at comparable benchmark scores, and the open release of all three sizes lowers the barrier for developers who need fast image recognition on limited hardware.

Key Facts

Zyphra released the Zamba2-VL family of vision-language models on June 11, 2026.
The models use a hybrid SSM-Transformer architecture that combines Transformer components with Mamba2, a state-space model (SSM).
Three variants are available: 1.2B, 2.7B, and 7B parameters.
All three models are open under Apache License 2.0 and available on Hugging Face.
Zyphra claims the hybrid architecture achieves faster image recognition than Transformer-only models of similar scale.

Reporting from 1 source: GIGAZINE.

Zyphra released the Zamba2-VL family of vision-language models on June 11, 2026. The models use a hybrid SSM-Transformer architecture that combines the Transformer with Mamba2, aiming for faster image recognition at quality comparable to Transformer models of similar size. Three variants are available, with 1.2B, 2.7B, and 7B parameters, all open under Apache License 2.0.

The Zamba2-VL family is built on a hybrid architecture that Zyphra calls SSM-Transformer. The design pairs the dominant Transformer architecture with Mamba2, a state-space model (SSM) introduced in 2024. Zyphra claims the hybrid processes images faster than Transformer-only models of similar scale while matching their quality on benchmarks.
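
To give a rough sense of why a state-space layer can be cheaper than attention, here is a toy sketch in Python. It is not Zyphra's implementation and not Mamba2 itself: it assumes a scalar state with fixed coefficients a, b, and c (all hypothetical), whereas Mamba2 uses larger states and input-dependent parameters. The point is only that the recurrence does a constant amount of work per token, while causal attention scores a number of token pairs that grows quadratically with sequence length.

```python
from typing import List


def ssm_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Run a toy scalar state-space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    The coefficients a, b, c are illustrative assumptions, not model values.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # fixed-size state update: constant work per token
        ys.append(c * h)
    return ys


def causal_attention_pairs(n: int) -> int:
    """Count the query-key pairs a causal attention layer scores for n tokens."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))  # [2.0, 1.0, 0.5]
    print(causal_attention_pairs(1024))  # 524800
```

An image is typically fed to a vision-language model as a long run of tokens, so the gap between linear and quadratic cost is a plausible source of the speedup Zyphra describes, though the article does not break down where the gains come from.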

Zyphra released three model sizes: Zamba2-VL-1.2B (1.2 billion parameters), Zamba2-VL-2.7B (2.7 billion), and Zamba2-VL-7B (7 billion). All three are open models distributed under the Apache License 2.0 and available for download via Hugging Face. The company published a graph plotting time to first token against average benchmark score, positioning the Zamba2-VL series as competitive with its peers on both speed and accuracy.
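
Time to first token, the speed metric in Zyphra's graph, is the delay between submitting a request and receiving the first generated token. The sketch below shows one way a developer might measure it for any streaming generator. The `fake_model_stream` function is a hypothetical stand-in that simply sleeps before yielding text; it is not a Zamba2-VL API, and the article does not describe how Zyphra took its own measurements.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_token(stream: Iterable[str]) -> Tuple[float, str]:
    """Return (seconds until the first item arrives, that item)."""
    start = time.perf_counter()
    iterator = iter(stream)
    first = next(iterator)  # blocks until the stream yields its first token
    return time.perf_counter() - start, first


def fake_model_stream(prefill_seconds: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model: sleeps, then yields tokens."""
    time.sleep(prefill_seconds)  # simulates image encoding and prompt prefill
    yield "A"
    yield " cat"


if __name__ == "__main__":
    seconds, token = time_to_first_token(fake_model_stream(0.05))
    print(f"first token {token!r} after {seconds:.3f} s")
```

To compare real models, the same helper could wrap whatever token stream an inference library exposes, with the same image and prompt used for every model under test.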

Zamba2-VL-1.2B: 1.2-billion-parameter variant
Zamba2-VL-2.7B: 2.7-billion-parameter variant
Zamba2-VL-7B: 7-billion-parameter variant
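
For readers sizing these variants against limited hardware, a back-of-the-envelope estimate of weight storage can help. The sketch below assumes 16-bit weights (2 bytes per parameter) and takes the nominal parameter counts from the model names. Both are assumptions: actual checkpoint sizes may differ, and the figures exclude activations and any runtime state.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives gigabytes directly.
    The 2-byte default assumes 16-bit weights, which is an assumption here.
    """
    return params_billions * bytes_per_param


if __name__ == "__main__":
    # Nominal sizes taken from the model names, not from measured checkpoints.
    for name, size in [
        ("Zamba2-VL-1.2B", 1.2),
        ("Zamba2-VL-2.7B", 2.7),
        ("Zamba2-VL-7B", 7.0),
    ]:
        print(f"{name}: ~{weight_memory_gb(size):.1f} GB of weights at 16-bit")
```

Under these assumptions the three variants need roughly 2.4 GB, 5.4 GB, and 14 GB for weights alone.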

Synthesized by Yomimono from the 1 cited source below, including Japanese-language reporting where cited, then editorially reviewed before publishing.

Sources

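GIGAZINE: 高速かつ高精度な視覚言語モデル「Zamba2-VL」が登場、Transformerより高速なアーキテクチャで開発 ("Fast, accurate vision-language model Zamba2-VL debuts, built on an architecture faster than Transformer")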
