A Multilingual & Multimodal Benchmark for GitHub Issue Resolution
Method | Model | %Resolved | Org | Site | Date
📝 Notes
OmniGIRL is a multilingual & multimodal GitHub-issue-resolution benchmark with 959 tasks spanning four programming languages. Inputs may include text, screenshots, rendered web pages and other modalities.
For realistic evaluation, we recommend that methods automatically examine each task’s raw input to detect the available modalities (e.g., embedded web pages, images), retrieve the relevant content themselves, and invoke the appropriate tools instead of relying on manual hints. This better assesses a solver’s general-purpose issue-resolution ability in real-world scenarios.
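As a rough illustration of the recommendation above, the sketch below scans an issue's raw text for embedded images and web links so that a solver could decide which tools to call. It is not part of the OmniGIRL baseline; the function name `detect_modalities`, the regular expressions, and the example URLs are all assumptions made for this example, and real issues may need more robust parsing.

```python
import re

# Hypothetical patterns for this sketch: Markdown images, HTML <img> tags, bare URLs.
IMAGE_MD = re.compile(r"!\[[^\]]*\]\((\S+?)\)")
IMAGE_HTML = re.compile(r"<img[^>]+src=[\"']([^\"']+)[\"']", re.IGNORECASE)
URL = re.compile(r"https?://[^\s)<>\"']+")


def detect_modalities(issue_text):
    """Return image URLs and other links found in the raw issue text."""
    images = IMAGE_MD.findall(issue_text) + IMAGE_HTML.findall(issue_text)
    # Any remaining URL is treated as a candidate web page to retrieve.
    links = [u for u in URL.findall(issue_text) if u not in images]
    return {"images": images, "links": links}


issue = (
    "The button overlaps the header. "
    "![screenshot](https://example.com/a.png) "
    "Repro page: https://example.com/demo"
)
found = detect_modalities(issue)
```

A solver could then route `found["images"]` to a vision-capable model and fetch `found["links"]` with a browser tool, without relying on a manual hint about which modalities are present.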
Our baseline system is released for research purposes only; please cite OmniGIRL if you use it.
📨 How to Submit
Prepare a .json or .jsonl file. Each record must contain at least the keys instance_id, model_name_or_path, and model_patch.
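A minimal sketch of building such a file, assuming one JSON object per line for the .jsonl variant. Only the three key names come from the requirements above; the helper names `missing_keys` and `to_jsonl`, the output filename, and the example values (including the instance ID) are hypothetical placeholders.

```python
import json

REQUIRED_KEYS = ("instance_id", "model_name_or_path", "model_patch")


def missing_keys(record):
    """List the required keys that are absent from one prediction record."""
    return [k for k in REQUIRED_KEYS if k not in record]


def to_jsonl(records):
    """Serialize records as JSON Lines, refusing records with missing keys."""
    lines = []
    for i, record in enumerate(records):
        missing = missing_keys(record)
        if missing:
            raise ValueError(f"record {i} is missing keys: {missing}")
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


example = {
    "instance_id": "example-org__example-repo-1",  # hypothetical ID
    "model_name_or_path": "my-model",              # hypothetical model name
    "model_patch": "diff --git a/file.py b/file.py\n",
}

payload = to_jsonl([example])
# To save it: open("predictions.jsonl", "w", encoding="utf-8").write(payload)
```

Checking the keys before writing catches incomplete records locally rather than at submission time.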