☕︎ JavaBench Leaderboard ☕️
A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models
| # | Model | Completion | Compilation | Pass |
|---|-------|------------|-------------|------|

| # | Model | Compilation | Pass |
|---|-------|-------------|------|
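The Completion, Compilation, and Pass columns can be read as percentages of benchmark tasks whose generated code is syntactically complete, compiles, and passes the test suite, respectively. Below is a minimal sketch of how such per-task outcomes could be aggregated into the three columns; the `TaskResult` record and its field names are illustrative assumptions, not JavaBench's actual evaluation harness.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical per-task outcome; field names are illustrative, not JavaBench's actual schema.
record TaskResult(boolean completed, boolean compiled, boolean passed) {}

public class LeaderboardMetrics {
    // Fraction of tasks (as a percentage) satisfying a given outcome predicate.
    static double percent(List<TaskResult> results, Predicate<TaskResult> p) {
        if (results.isEmpty()) return 0.0;
        return 100.0 * results.stream().filter(p).count() / results.size();
    }

    public static void main(String[] args) {
        // Toy results for one model: four tasks with progressively weaker outcomes.
        List<TaskResult> results = List.of(
                new TaskResult(true, true, true),
                new TaskResult(true, true, false),
                new TaskResult(true, false, false),
                new TaskResult(false, false, false));
        System.out.printf("Completion: %.1f%%  Compilation: %.1f%%  Pass: %.1f%%%n",
                percent(results, TaskResult::completed),
                percent(results, TaskResult::compiled),
                percent(results, TaskResult::passed));
    }
}
```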
📝 Submission
Thank you for your interest in JavaBench. We warmly welcome researchers to submit additional benchmarking results, as collaborative efforts can significantly advance the study of Large Language Models and software engineering. For submission guidelines, please refer to our GitHub repo.
🤗 Acknowledgement
Thanks to EvalPlus for sharing the leaderboard template. In addition to the JavaBench leaderboard, we recommend building a comprehensive picture of LLM coding ability through a diverse set of benchmarks and leaderboards, such as:
- EvalPlus Leaderboard
- Big Code Models Leaderboard
- Chatbot Arena Leaderboard
- CrossCodeEval
- ClassEval
- CRUXEval
- Code Lingua
- Evo-Eval
- HumanEval.jl - Julia version of HumanEval with EvalPlus test cases
- InfiCoder-Eval
- LiveCodeBench
- NaturalCodeBench
- RepoBench
- SWE-bench
- TabbyML Leaderboard
- CruxEval-X
- DomainEval