LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning

¹Renmin University of China, ²Ant Group

TL;DR: We introduce LLaDA-V, a competitive diffusion-based vision-language model that outperforms other diffusion-based MLLMs.
