☕︎ JavaBench Leaderboard ☕️
A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models
| # | Model | Completion | Compilation | Pass |
|---|-------|------------|-------------|------|

| # | Model | Compilation | Pass |
|---|-------|-------------|------|
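The Completion, Compilation, and Pass columns can be read as percentages of benchmark tasks whose generated code is syntactically complete, compiles, and passes the test suite, respectively. Below is a minimal sketch of how such per-task outcomes could be aggregated into the three columns; the `TaskResult` record and its field names are illustrative assumptions, not JavaBench's actual evaluation harness.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical per-task outcome; field names are illustrative, not JavaBench's actual schema.
record TaskResult(boolean completed, boolean compiled, boolean passed) {}

public class LeaderboardMetrics {
    // Fraction of tasks (as a percentage) satisfying a given outcome predicate.
    static double percent(List<TaskResult> results, Predicate<TaskResult> p) {
        if (results.isEmpty()) return 0.0;
        return 100.0 * results.stream().filter(p).count() / results.size();
    }

    public static void main(String[] args) {
        // Toy results for one model: four tasks with progressively weaker outcomes.
        List<TaskResult> results = List.of(
                new TaskResult(true, true, true),
                new TaskResult(true, true, false),
                new TaskResult(true, false, false),
                new TaskResult(false, false, false));
        System.out.printf("Completion: %.1f%%  Compilation: %.1f%%  Pass: %.1f%%%n",
                percent(results, TaskResult::completed),
                percent(results, TaskResult::compiled),
                percent(results, TaskResult::passed));
    }
}
```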
📝 Submission
Thank you for your interest in JavaBench. We warmly welcome researchers to submit additional benchmarking results, as collaborative efforts can significantly advance the study of Large Language Models and software engineering. For submission guidelines, please refer to our GitHub repo.
🤗 Acknowledgement
Thanks to EvalPlus for sharing the leaderboard template. In addition to the JavaBench leaderboard, we recommend building a comprehensive picture of LLM coding ability through a diverse set of benchmarks and leaderboards, such as:
- EvalPlus Leaderboard
- Big Code Models Leaderboard
- Chatbot Arena Leaderboard
- CrossCodeEval
- ClassEval
- CRUXEval
- Code Lingua
- Evo-Eval
- HumanEval.jl - Julia version of HumanEval with EvalPlus test cases
- InfiCoder-Eval
- LiveCodeBench
- NaturalCodeBench
- RepoBench
- SWE-bench
- TabbyML Leaderboard
- CruxEval-X
- DomainEval