GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B total parameters (12B activated), it achieves state-of-the-art results in video understanding, image Q&A, OCR, and document parsing, with strong gains in front-end web coding, grounding, and spatial reasoning. It offers a hybrid inference mode: a "thinking mode" for deep reasoning and a "non-thinking mode" for fast responses. Reasoning behavior can be toggled via the `reasoning.enabled` boolean. Learn more in our docs.
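As a minimal sketch of the toggle, the request below sends the same prompt with thinking mode on and off through OpenRouter's OpenAI-compatible chat completions endpoint. The model slug `z-ai/glm-4.5v` is assumed; check the model page for the exact identifier before using it.

```python
import os

import requests

API_URL = "https://openrouter.ai/api/v1/chat/completions"


def ask(prompt: str, thinking: bool) -> str:
    """Send one chat turn with reasoning on (thinking mode) or off."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "z-ai/glm-4.5v",  # assumed OpenRouter slug for GLM-4.5V
            "messages": [{"role": "user", "content": prompt}],
            "reasoning": {"enabled": thinking},  # True = thinking mode
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]


# Deep reasoning for a hard question; fast mode for a quick lookup.
print(ask("Explain the spatial layout of this scene step by step.", thinking=True))
print(ask("Summarize this in one line.", thinking=False))
```

In practice, thinking mode trades latency and extra reasoning tokens for deeper multi-step answers, so it is worth enabling only when the task needs it.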
Recent activity on GLM 4.5V (total usage per day on OpenRouter): Prompt 6.64M · Reasoning 644K · Completion 397K tokens.
Prompt tokens measure input size, reasoning tokens count the model's internal thinking before it answers, and completion tokens reflect total output length.
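These counts can be read back from the `usage` object of a parsed chat-completion response. A hedged sketch follows: the `prompt_tokens` and `completion_tokens` fields are standard in OpenAI-compatible responses, while `completion_tokens_details.reasoning_tokens` is an assumption that may be absent depending on the provider and request settings.

```python
from typing import Any, Dict


def token_counts(completion: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the three token counts from a parsed chat-completion response."""
    usage = completion.get("usage", {})
    details = usage.get("completion_tokens_details") or {}
    return {
        "prompt": usage.get("prompt_tokens"),          # input size
        "reasoning": details.get("reasoning_tokens"),  # internal thinking, if reported
        "completion": usage.get("completion_tokens"),  # total output length
    }
```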