Tables & Resources

This page contains statistical tables and resources from our comprehensive survey on Issue Resolution in Software Engineering.


📊 Evaluation Datasets Overview

A statistical overview of issue resolution datasets. We categorize these datasets by programming language, modality support, source repositories, data scale (Amount), and the availability of reproducible execution environments.

| Dataset | Language | Multimodal | Repos | Amount | Environment | Link |
|---|---|---|---|---|---|---|
| SWE-bench-train | Python | | 37 | 19k | | GitHub, HuggingFace |
| SWE-bench | Python | | 12 | 2294 | | GitHub, HuggingFace |
| SWE-bench Lite | Python | | 12 | 300 | | GitHub, HuggingFace |
| SWE-bench Verified | Python | | / | 500 | | GitHub, HuggingFace |
| SWE-bench-java | Java | | 19 | 1797 | | GitHub, HuggingFace |
| SWE-bench Multimodal | JS, TS, HTML, CSS | | 17 | 619 | | GitHub, HuggingFace |
| SWE-bench-extra | Python | | 2k | 6.38k | | HuggingFace |
| Visual SWE-bench | Python | | 11 | 133 | | GitHub, HuggingFace |
| SWE-Lancer | JS, TS | | / | 1488 | | GitHub |
| Multi-SWE-bench | Java, JS, TS, Go, Rust, C, C++ | | 76 | 4,723 | | GitHub, HuggingFace |
| R2E-Gym | Python | | 10 | 8,135 | | GitHub, HuggingFace |
| SWE-PolyBench | Python, Java, JS, TS | | 21 | 2110 | | GitHub, HuggingFace, HuggingFace |
| Loc-Bench | Python | | / | 560 | | GitHub, HuggingFace |
| SWE-smith | Python | | 128 | 50k | | GitHub, HuggingFace |
| SWE-bench Multilingual | C, C++, Go, Java, JS, TS, Rust, Python, Ruby, PHP | | 42 | 300 | | GitHub, HuggingFace |
| SWE-Fixer | Python | | 856 | 115406 | | GitHub, HuggingFace, HuggingFace |
| OmniGIRL | Python, TS, Java, JS | | 15 | 959 | | GitHub, HuggingFace |
| SWE-rebench | Python | | 30,000 | 21,336 | | HuggingFace |
| SWE-bench-Live | Python | | 93 | 1319 | | GitHub, HuggingFace |
| SWE-Gym | Python | | 11 | 2,438 | | GitHub, HuggingFace |
| SWE-Flow | Python | | 74 | 18081 | | GitHub |
| SWE-Factory | Python, Java, JS, TS | | 12 | 430 | | GitHub, HuggingFace |
| SWE-Bench-CL | Python | | 8 | 273 | | GitHub |
| Skywork-SWE | Python | | 2531 | 10169 | | / |
| SWE-MERA | Python | | 200 | 300 | | GitHub, HuggingFace |
| SWE-Perf | Python | | 12 | 140 | | GitHub, HuggingFace |
| RepoForge | Python | | / | 7.3k | | / |
| SWE-Mirror | Python, Rust, Go | | 40 | 60k | | / |
| SWE-Bench Pro | Go, TS, Python | | 41 | 1865 | | GitHub, HuggingFace |
| SWE-InfraBench | Python, TS | | / | 100 | | / |
| SWE-Sharp-Bench | C# | | 17 | 150 | | GitHub, HuggingFace |
| SWE-fficiency | Python, Cython | | 9 | 498 | | GitHub |
| SWE-Compass | Python, JS, TS, Java, C, C++, Go, Rust, Kotlin, C# | | / | 2000 | | GitHub, HuggingFace |
| SWE-bench++ | Python, Go, TS, JS, Ruby, PHP, Java, Rust, C++, C#, C | | 3,971 | 1,782 | | GitHub, HuggingFace |
| SWE-EVO | Python | | 7 | 48 | | GitHub |
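
The Amount column mixes notations ("2294", "4,723", "19k", "6.38k"), which makes the table awkward to filter programmatically. The following is a minimal sketch of how a handful of rows could be normalized and then selected by language and scale. The helper names `parse_amount` and `select` are hypothetical, and the "k" suffix is assumed to mean thousands.

```python
from typing import List, Tuple

Row = Tuple[str, List[str], str]  # (dataset, languages, amount as printed)


def parse_amount(text: str) -> int:
    """Convert an amount such as '19k', '6.38k', or '4,723' to an integer.

    Assumption: a trailing 'k' means thousands.
    """
    cleaned = text.strip().replace(",", "").lower()
    if cleaned.endswith("k"):
        return round(float(cleaned[:-1]) * 1000)
    return int(cleaned)


# A few rows copied from the table above.
ROWS: List[Row] = [
    ("SWE-bench", ["Python"], "2294"),
    ("SWE-bench-java", ["Java"], "1797"),
    ("Multi-SWE-bench", ["Java", "JS", "TS", "Go", "Rust", "C", "C++"], "4,723"),
    ("SWE-smith", ["Python"], "50k"),
    ("SWE-bench-extra", ["Python"], "6.38k"),
    ("SWE-Sharp-Bench", ["C#"], "150"),
]


def select(rows: List[Row], language: str, min_amount: int = 0) -> List[str]:
    """Return dataset names that cover `language` with at least `min_amount` instances."""
    return [
        name
        for name, languages, amount in rows
        if language in languages and parse_amount(amount) >= min_amount
    ]


if __name__ == "__main__":
    print(select(ROWS, "Python", min_amount=5000))
    print(select(ROWS, "Java"))
```

Rounded values such as "19k" only give an approximate count, so any threshold applied to them should be treated as approximate too.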

🎯 Training Trajectory Datasets

A survey of trajectory datasets used for agent training or analysis. We list the programming language, number of source repositories, and total trajectories for each dataset.

| Dataset | Language | Repos | Amount | Link |
|---|---|---|---|---|
| R2E-Gym | Python | 10 | 3,321 | GitHub, HuggingFace |
| SWE-Gym | Python | 11 | 491 | GitHub, HuggingFace |
| SWE-Synth | Python | 11 | 3,018 | GitHub, HuggingFace |
| SWE-Fixer | Python | 856 | 69,752 | GitHub, HuggingFace |
| SWE-Factory | Python | 10 | 2,809 | GitHub, HuggingFace |

🔧 Supervised Fine-Tuning (SFT) Models

Overview of SFT-based methods for issue resolution. This table categorizes models by their base architecture and training scaffold, sorted by performance.

| Model Name | Base Model | Size | Arch. | Training Scaffold | Res. (%) | Code | Data | Model |
|---|---|---|---|---|---|---|---|---|
| Devstral | Mistral Small 3 | 22B | Dense | OpenHands | 46.8 | / | Website | HuggingFace |
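
The Res. (%) column is read here as a resolve rate, that is, the share of benchmark instances a model resolves. The sketch below shows that arithmetic under this assumption. The function name `resolve_rate` and the instance counts are illustrative only; the table does not state which benchmark or how many instances underlie each figure.

```python
def resolve_rate(resolved: int, total: int) -> float:
    """Percentage of benchmark instances resolved, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= resolved <= total:
        raise ValueError("resolved must be between 0 and total")
    return round(100.0 * resolved / total, 1)


if __name__ == "__main__":
    # Hypothetical counts: 234 resolved out of a 500-instance benchmark.
    print(resolve_rate(234, 500))
```

Because the rate depends on the benchmark size, figures in this column are only comparable when they were measured on the same instance set.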

🤖 Reinforcement Learning (RL) Models

An overview of specialized models for issue resolution, grouped by parameter size. The table details each model's base architecture, the training scaffold used for rollout, the type of reward signal employed (Outcome vs. Process), and the performance results (Res. %) on issue resolution benchmarks.

| Model Name | Base Model | Size | Arch. | Train. Scaffold | Reward | Res. (%) | Code | Data | Model |
|---|---|---|---|---|---|---|---|---|---|
| 560B Models (MoE) | | | | | | | | | |
| LongCat-Flash-Think | LongCatFlash-Base | 560B-A27B | MoE | R2E-Gym | Outcome | 60.4 | GitHub | / | HuggingFace |
| 72B Models | | | | | | | | | |
| Kimi-Dev | Qwen2.5-72B-Base | 72B | Dense | BugFixer + TestWriter | Outcome | 60.4 | GitHub | / | HuggingFace |
| Multi-turn RL (Nebius) | Qwen2.5-72B-Instruct | 72B | Dense | SWE-agent | Outcome | 39.0 | / | / | / |
| Agent-RLVR-RM-72B | Qwen2.5-Coder-72B | 72B | Dense | Localization + Repair | Outcome | 27.8 | / | / | / |
| Agent-RLVR-72B | Qwen2.5-Coder-72B | 72B | Dense | Localization + Repair | Outcome | 22.4 | / | / | / |
| 70B Models | | | | | | | | | |
| SWE-RL | Llama-3.3-70B-Instruct | 70B | Dense | Agentless-mini | Outcome | 41.0 | GitHub | / | / |
| 36B Models | | | | | | | | | |
| FoldAgent | Seed-OSS-36B-Instruct | 36B | Dense | FoldAgent | Process | 58.0 | GitHub | Website | / |
| 32B Models | | | | | | | | | |
| OpenHands Critic | Qwen2.5-Coder-32B | 32B | Dense | SWE-Gym | / | 66.4 | GitHub | / | HuggingFace |
| KAT-Dev-32B | Qwen3-32B | 32B | Dense | / | / | 62.4 | / | / | HuggingFace |
| SWE-Swiss-32B | Qwen2.5-32B-Instruct | 32B | Dense | / | Outcome | 60.2 | GitHub | HuggingFace | HuggingFace |
| SeamlessFlow-32B | Qwen3-32B | 32B | Dense | SWE-agent | Outcome | 45.8 | GitHub | / | / |
| DeepSWE | Qwen3-32B | 32B | Dense | R2E-Gym | Outcome | 42.2 | GitHub | HuggingFace | HuggingFace |
| SA-SWE-32B | / | 32B | Dense | SkyRL-Agent | / | 39.4 | / | / | / |
| OpenHands LM v0.1 | Qwen2.5-Coder-32B | 32B | Dense | SWE-Gym | / | 37.2 | GitHub | / | HuggingFace |
| SWE-Dev-32B | Qwen2.5-Coder-32B | 32B | Dense | OpenHands | Outcome | 36.6 | GitHub | / | HuggingFace |
| Satori-SWE | Qwen2.5-Coder-32B | 32B | Dense | Retriever + Code editor | Outcome | 35.8 | GitHub | HuggingFace | HuggingFace |
| SoRFT-32B | Qwen2.5-Coder-32B | 32B | Dense | Agentless | Outcome | 30.8 | / | / | / |
| Agent-RLVR-32B | Qwen2.5-Coder-32B | 32B | Dense | Localization + Repair | Outcome | 21.6 | / | / | / |
| 14B Models | | | | | | | | | |
| Agent-RLVR-14B | Qwen2.5-Coder-14B | 14B | Dense | Localization + Repair | Outcome | 18.0 | / | / | / |
| SEAlign-14B | Qwen2.5-Coder-14B | 14B | Dense | OpenHands | Process | 17.7 | / | / | / |
| 9B Models | | | | | | | | | |
| SWE-Dev-9B | GLM-4-9B | 9B | Dense | OpenHands | Outcome | 13.6 | GitHub | / | HuggingFace |
| 8B Models | | | | | | | | | |
| SeamlessFlow-8B | Qwen3-8B | 8B | Dense | SWE-agent | Outcome | 27.4 | GitHub | / | / |
| SWE-Dev-8B | Llama-3.1-8B | 8B | Dense | OpenHands | Outcome | 18.0 | GitHub | / | HuggingFace |
| 7B Models | | | | | | | | | |
| SWE-Dev-7B | Qwen2.5-Coder-7B | 7B | Dense | OpenHands | Outcome | 23.4 | GitHub | / | HuggingFace |
| SoRFT-7B | Qwen2.5-Coder-7B | 7B | Dense | Agentless | Outcome | 21.4 | / | / | / |
| SEAlign-7B | Qwen2.5-Coder-7B | 7B | Dense | OpenHands | Process | 15.0 | / | / | / |
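
The Reward column distinguishes outcome rewards from process rewards. As a rough illustration of the difference, and not the formulation used by any model listed above, an outcome reward can be pictured as a single signal tied to the final result of a rollout, while a process reward assigns credit to intermediate steps. Everything in the sketch below is hypothetical: the `Step` type, the `step_score` field, and the binary pass/fail outcome.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    action: str        # e.g. "open file", "edit", "run tests"
    step_score: float  # hypothetical per-step quality score in [0, 1]


def outcome_reward(steps: List[Step], tests_passed: bool) -> List[float]:
    """Sparse signal: only the final step is rewarded, based on the end result."""
    rewards = [0.0] * len(steps)
    if steps:
        rewards[-1] = 1.0 if tests_passed else 0.0
    return rewards


def process_reward(steps: List[Step]) -> List[float]:
    """Dense signal: every intermediate step receives its own score."""
    return [step.step_score for step in steps]


if __name__ == "__main__":
    trajectory = [
        Step("locate buggy function", 0.8),
        Step("apply patch", 0.6),
        Step("run tests", 0.9),
    ]
    print(outcome_reward(trajectory, tests_passed=True))
    print(process_reward(trajectory))
```

In this picture an outcome reward needs only a final check, whereas a process reward needs some scorer for every step, which is one plausible reason most rows in the table report Outcome.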

🌟 General Foundation Models

Overview of general foundation models evaluated on issue resolution. The table details the specific inference scaffolds (e.g., OpenHands, Agentless) employed during the evaluation process to achieve the reported results.

| Model Name | Size | Arch. | Inf. Scaffold | Reward | Res. (%) | Code | Model |
|---|---|---|---|---|---|---|---|
| KAT-Coder | / | / | Claude Code | Outcome | 73.4 | / | Website |
| DeepSeek V3.2 | 671B-A37B | MoE | Claude Code, RooCode | / | 73.1 | GitHub | HuggingFace |
| Kimi-K2-Instruct | 1T | MoE | Agentless | Outcome | 71.6 | / | HuggingFace |
| Qwen3-Coder | 480B-A35B | MoE | OpenHands | Outcome | 69.6 | GitHub | HuggingFace |
| gpt-oss-120b | 116.8B-A5.1B | MoE | Internal tool | Outcome | 62.0 | GitHub | HuggingFace |
| MiniMax M2 | 230B-A10B | MoE | R2E-Gym | Outcome | 61.0 | GitHub | HuggingFace |
| GLM-4.5-Air | 106B-A12B | MoE | OpenHands | Outcome | 57.6 | / | / |
| MiniMax M1-80k | 456B-A45.9B | MoE | Agentless | Outcome | 56.0 | GitHub | Website |
| MiniMax M1-40k | 456B-A45.9B | MoE | Agentless | Outcome | 55.6 | GitHub | Website |
| Llama 4 Maverick | 400B-A17B | MoE | mini-SWE-agent | Outcome | 21.0 | GitHub | HuggingFace |
| Llama 4 Scout | 109B-A17B | MoE | mini-SWE-agent | Outcome | 9.1 | GitHub | HuggingFace |