AliciaBench is a benchmark created by Juan Echeverria to measure LLMs' ability to solve a very specific problem: escaping from a maze.

To learn why and how I created it, visit the About page.

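| Ranking | LLM | Total score | Cost | n=5 | n=7 | n=9 | n=11 | n=13 | n=15 | n=17 | n=19 | n=21 | n=23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| #1 |  | 997.29 | $3.55 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 99.88 | 100.00 | 100.00 | 97.42 |
| #2 |  | 988.48 | $17.67 | 100.00 | 100.00 | 100.00 | 96.67 | 97.67 | 99.31 | 99.38 | 98.93 | 100.00 | 96.53 |
| #3 |  | 967.00 | $0.70 | 100.00 | 100.00 | 100.00 | 96.67 | 99.00 | 100.00 | 99.50 | 93.64 | 99.88 | 78.31 |
| #4 |  | 965.46 | $19.48 | 100.00 | 100.00 | 100.00 | 97.77 | 100.00 | 94.60 | 99.25 | 98.52 | 98.24 | 77.07 |
| #5 |  | 918.40 | $12.93 | 100.00 | 100.00 | 100.00 | 95.23 | 97.67 | 92.61 | 88.38 | 94.76 | 85.24 | 64.52 |
| #6 |  | 858.28 | $18.35 | 100.00 | 100.00 | 94.70 | 95.49 | 97.67 | 93.70 | 77.51 | 58.64 | 94.09 | 46.49 |
| #7 |  | 650.71 | $4.96 | 100.00 | 100.00 | 100.00 | 96.44 | 98.67 | 81.21 | 49.00 | 5.65 | 19.75 | 0.00 |
| #8 |  | 637.32 | $0.54 | 100.00 | 100.00 | 96.36 | 98.00 | 94.33 | 67.50 | 45.27 | 19.33 | 16.52 | 0.00 |
| #9 |  | 625.57 | $5.46 | 90.00 | 70.00 | 90.02 | 84.36 | 79.67 | 76.57 | 38.26 | 28.16 | 58.93 | 9.60 |
| #10 |  | 442.01 | $0.32 | 100.00 | 100.00 | 88.33 | 57.44 | 57.67 | 18.57 | 20.00 | 0.00 | - | - |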
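
Going by the numbers alone, the total score in each row appears to be the sum of the ten per-size scores (n=5 through n=23), with a "-" counted as 0. This is inferred from the table, not something stated here, so treat it as an assumption. A minimal Python sketch that checks it against the ranking above; the names `ROWS`, `total_from_sizes` and `matches_reported` are hypothetical, and the 0.02 tolerance is there only to absorb rounding in the displayed values:

```python
# Reported total and per-size scores (n=5 ... n=23), copied from the ranking.
# "-" marks a size with no reported score.
ROWS = {
    "#1": (997.29, "100.00 100.00 100.00 100.00 100.00 100.00 99.88 100.00 100.00 97.42"),
    "#2": (988.48, "100.00 100.00 100.00 96.67 97.67 99.31 99.38 98.93 100.00 96.53"),
    "#3": (967.00, "100.00 100.00 100.00 96.67 99.00 100.00 99.50 93.64 99.88 78.31"),
    "#4": (965.46, "100.00 100.00 100.00 97.77 100.00 94.60 99.25 98.52 98.24 77.07"),
    "#5": (918.40, "100.00 100.00 100.00 95.23 97.67 92.61 88.38 94.76 85.24 64.52"),
    "#6": (858.28, "100.00 100.00 94.70 95.49 97.67 93.70 77.51 58.64 94.09 46.49"),
    "#7": (650.71, "100.00 100.00 100.00 96.44 98.67 81.21 49.00 5.65 19.75 0.00"),
    "#8": (637.32, "100.00 100.00 96.36 98.00 94.33 67.50 45.27 19.33 16.52 0.00"),
    "#9": (625.57, "90.00 70.00 90.02 84.36 79.67 76.57 38.26 28.16 58.93 9.60"),
    "#10": (442.01, "100.00 100.00 88.33 57.44 57.67 18.57 20.00 0.00 - -"),
}


def total_from_sizes(scores: str) -> float:
    """Sum the per-size scores, counting '-' as 0 (an assumption)."""
    return sum(float(s) for s in scores.split() if s != "-")


def matches_reported(reported: float, scores: str, tol: float = 0.02) -> bool:
    """True if the summed scores agree with the reported total within tol."""
    return abs(total_from_sizes(scores) - reported) <= tol


for rank, (reported, scores) in ROWS.items():
    computed = total_from_sizes(scores)
    print(f"{rank}: reported {reported:.2f}, summed {computed:.2f}")
```

Under this reading, the maximum possible total is 1000 (ten sizes at 100 each), and a model with no result at the largest sizes simply earns nothing for them.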