Back to Models
z-ai/glm-4-6v
Not Available

GLM 4.6V

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also enables interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing.

12/8/2025
131,072 tokens
Specifications

Modalities

Input
image
text
video
Output
text

Supported Parameters

frequency_penalty
include_reasoning
max_tokens
min_p
presence_penalty
reasoning
repetition_penalty
response_format
seed
stop
structured_outputs
temperature
tool_choice
tools
top_k
top_p

Max Output Tokens

131,072