GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It uses a Mixture-of-Experts (MoE) architecture and supports a context length of up to 128k tokens, delivering significantly enhanced capabilities in reasoning, code generation, and agent alignment. It supports a hybrid inference mode with two options: a "thinking mode" designed for complex reasoning and tool use, and a "non-thinking mode" optimized for instant responses. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. Learn more in our docs.
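As a minimal sketch of toggling between the two modes, assuming OpenRouter's OpenAI-compatible chat completions payload and its unified `reasoning` parameter (the model slug `z-ai/glm-4.5` and exact field names are assumptions to verify against the docs):

```python
import json

def build_request(prompt: str, thinking: bool) -> dict:
    """Build a chat completions payload with reasoning toggled on or off.

    Assumes OpenRouter's unified reasoning parameter; the model slug is an
    assumption, not confirmed by this page.
    """
    return {
        "model": "z-ai/glm-4.5",  # assumed OpenRouter model slug
        "messages": [{"role": "user", "content": prompt}],
        # enabled=True -> "thinking mode"; enabled=False -> "non-thinking mode"
        "reasoning": {"enabled": thinking},
    }

# Thinking mode for a multi-step tool-use task:
payload = build_request("Plan the API calls needed to book a flight.", thinking=True)
print(json.dumps(payload, indent=2))
```

Sending this payload to the chat completions endpoint (with your API key) would then select the corresponding inference mode per request.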
Recent activity on GLM 4.5 (total usage per day on OpenRouter):
- Prompt: 332M tokens
- Completion: 21.8M tokens
- Reasoning: 11M tokens
Prompt tokens measure input size. Reasoning tokens show internal thinking before a response. Completion tokens reflect total output length.
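As a sketch of how these three counters surface in practice, assuming an OpenAI-style `usage` object in the response (the `completion_tokens_details.reasoning_tokens` field name is an assumption to verify against the actual response schema):

```python
def summarize_usage(usage: dict) -> str:
    """Read the three token counters from a chat completion response.

    Field names mirror OpenAI-style usage reporting; treat them as an
    assumption to check against OpenRouter's response schema.
    """
    prompt = usage.get("prompt_tokens", 0)          # input size
    completion = usage.get("completion_tokens", 0)  # total output length
    reasoning = (usage.get("completion_tokens_details") or {}).get(
        "reasoning_tokens", 0                       # internal thinking
    )
    return f"prompt={prompt} completion={completion} reasoning={reasoning}"

# Example usage object (illustrative numbers, not real billing data):
usage = {
    "prompt_tokens": 1200,
    "completion_tokens": 350,
    "completion_tokens_details": {"reasoning_tokens": 120},
}
print(summarize_usage(usage))  # prompt=1200 completion=350 reasoning=120
```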