# evaluation-llms

Here are 2 public repositories matching this topic...


CompBench evaluates the comparative reasoning of multimodal large language models (MLLMs) with 40K image pairs and questions across 8 dimensions of relative comparison: visual attribute, existence, state, emotion, temporality, spatiality, quantity, and quality. CompBench covers diverse visual domains, including animals, fashion, sports, and scenes.

  • Updated Aug 6, 2024
  • Jupyter Notebook
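The CompBench description above outlines an evaluation built from image pairs, each labeled with one of eight comparison dimensions and a relative question. A minimal sketch of how such a benchmark item and its accuracy metric might be structured is shown below; the field names, schema, and predictor here are illustrative assumptions, not the actual CompBench API.

```python
from dataclasses import dataclass

# The eight comparison dimensions named in the CompBench description.
DIMENSIONS = {
    "visual attribute", "existence", "state", "emotion",
    "temporality", "spatiality", "quantity", "quality",
}

@dataclass
class ComparisonItem:
    """Hypothetical schema for one image-pair question."""
    image_a: str    # path or URL of the first image
    image_b: str    # path or URL of the second image
    question: str   # e.g. "Which shelf holds more books?"
    dimension: str  # one of the eight comparison dimensions
    answer: str     # ground-truth choice: "A" or "B"

def accuracy(items, predict):
    """Fraction of items where the model's choice matches the label."""
    if not items:
        return 0.0
    correct = sum(predict(it) == it.answer for it in items)
    return correct / len(items)

# Toy usage with a trivial baseline that always answers "A".
items = [
    ComparisonItem("cat1.jpg", "cat2.jpg",
                   "Which cat looks happier?", "emotion", "A"),
    ComparisonItem("shelf1.jpg", "shelf2.jpg",
                   "Which shelf holds more books?", "quantity", "B"),
]
print(accuracy(items, lambda it: "A"))  # 0.5
```

In practice the `predict` callable would wrap an MLLM that receives both images and the question; the always-"A" baseline here only illustrates the evaluation loop.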
