Vision Models
Vision-language models can process both text and images, enabling tasks like visual question answering, image captioning, document analysis, and visual reasoning.
3 open-source models available
Qwen2-VL 7B
7B | Alibaba's compact vision-language model with strong image and video understanding.
Tags: qwen, open-source, multimodal
2.3M | Apache 2.0
LLaMA 3.2 11B Vision
11B | Meta's multimodal model with vision capabilities. 11B parameters.
Tags: llama, open-source, multimodal
2.3M | Llama 3.2 Community License
LLaVA 1.6 34B
34B | Large vision-language model combining LLaMA with visual understanding.
Tags: llava, open-source, multimodal
1.2M | Apache 2.0