🏆 OmniGIRL Leaderboard 🏆

A Multilingual & Multimodal Benchmark for GitHub Issue Resolution

Method Model %Resolved Org Site Date

📝 Notes

  1. OmniGIRL is a multilingual & multimodal GitHub-issue-resolution benchmark with 959 tasks spanning four programming languages. Inputs may include text, screenshots, rendered web pages and other modalities.
  2. For realistic evaluation, we recommend that methods automatically examine each task's raw input to detect the available modalities (e.g., embedded web pages, images), retrieve the relevant content themselves, and invoke the appropriate tools rather than rely on manual hints. This better assesses a solver's general-purpose issue-resolution ability in real-world scenarios.
  3. Our baseline system is released for research purposes only; please cite OmniGIRL if you use it.
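
As a rough illustration of note 2, the sketch below scans an issue's raw text for image links and other web links so that a solver can decide which tools to invoke. It is only a minimal example: the function name detect_modalities, the regular expressions, and the returned dictionary layout are assumptions made for illustration and are not part of OmniGIRL or its baseline system.

```python
import re

# Hypothetical helper: OmniGIRL does not prescribe this interface.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg")
URL_PATTERN = re.compile(r"https?://[^\s)>\"']+")
MARKDOWN_IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\((https?://[^\s)]+)\)")
HTML_IMAGE_PATTERN = re.compile(
    r"<img[^>]+src=[\"'](https?://[^\"']+)[\"']", re.IGNORECASE
)


def detect_modalities(issue_text: str) -> dict:
    """Split the links found in an issue body into images and web pages."""
    image_urls = []
    # Links embedded with Markdown or HTML image syntax count as images.
    embedded = MARKDOWN_IMAGE_PATTERN.findall(issue_text)
    embedded += HTML_IMAGE_PATTERN.findall(issue_text)
    for url in embedded:
        if url not in image_urls:
            image_urls.append(url)

    web_urls = []
    for url in URL_PATTERN.findall(issue_text):
        url = url.rstrip(".,;")  # drop trailing sentence punctuation
        if url in image_urls or url in web_urls:
            continue
        # Bare links that end in an image extension are also treated as images.
        if url.lower().split("?")[0].endswith(IMAGE_EXTENSIONS):
            image_urls.append(url)
        else:
            web_urls.append(url)

    return {"images": image_urls, "web_pages": web_urls}


if __name__ == "__main__":
    sample = (
        "The button is misaligned, see ![shot](https://example.com/a.png) "
        "and the demo at https://example.com/demo."
    )
    print(detect_modalities(sample))
```

A real solver would still need to download and interpret the detected content (for example with a vision model or a headless browser); this sketch covers only the detection step.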

📨 How to Submit

  1. Prepare a .json or .jsonl file. Each record must contain at least the keys instance_id, model_name_or_path, and model_patch.
  2. Email the file to guolh8@mail2.sysu.edu.cn.
  3. We will evaluate your submission locally and update the leaderboard once the results are verified.
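
The file described in step 1 can be produced and sanity-checked before emailing with a short script such as the one below. This is a minimal sketch: only the three key names come from the instructions above, while the helper names, the file name, and the example values (instance ID, model name, patch text) are placeholders invented for illustration.

```python
import json

# The three keys named in the submission instructions.
REQUIRED_KEYS = {"instance_id", "model_name_or_path", "model_patch"}


def write_predictions(records, path):
    """Write one JSON object per line (.jsonl)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def validate_predictions(path):
    """Return a list of (line_number, problem) pairs; empty means no issues found."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_number, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((line_number, "record is not a JSON object"))
                continue
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                problems.append(
                    (line_number, "missing keys: " + ", ".join(sorted(missing)))
                )
    return problems


if __name__ == "__main__":
    # Placeholder values; replace with your own predictions.
    predictions = [
        {
            "instance_id": "example__repo-123",
            "model_name_or_path": "my-model",
            "model_patch": "diff --git a/file.py b/file.py\n...",
        }
    ]
    write_predictions(predictions, "predictions.jsonl")
    print(validate_predictions("predictions.jsonl") or "no problems found")
```

The check only confirms that each record parses and carries the required keys; it does not verify that instance_id values exist in the benchmark or that a patch applies.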

🙏 Acknowledgements

  1. We build on prior work, namely SWE-bench, Agentless, and AutoCodeRover, which laid the groundwork for this study.
  2. We thank the EvalPlus leaderboard team for releasing the elegant page template that inspired this site.
  3. Finally, we are grateful to the open-source developer community for their invaluable contributions.