KoCEM Leaderboard

Source: en_mcqa_2025-10-04.json · Updated: 2025-10-04 14:44:07

Scoring: the score columns appear in the order Overall, Val, Test. In each column, the best score is shown in bold and the second-best score is underlined.

Name                      Size   Modality    Overall  Val    Test   License      Date
Gemini 2.5 Pro            -      multimodal  85.8%    85.3%  86.4%  proprietary  2025-09-02
GPT-5                     -      multimodal  79.8%    76.2%  83.3%  proprietary  2025-08-26
Claude Sonnet 4           -      multimodal  75.7%    75%    76.4%  proprietary  2025-08-31
Claude Opus 4.1           -      multimodal  76.6%    74.5%  78.8%  proprietary  2025-08-29
GPT-4.1                   -      multimodal  71.1%    69.5%  72.7%  proprietary  2025-08-24
Gemini 2.5 Flash          -      multimodal  68%      68.4%  67.7%  proprietary  2025-09-01
gpt-oss-20b               21B    text-only   63.5%    60.7%  66.2%  open-source  2025-08-27
gpt-oss-120b              117B   text-only   64.8%    60.3%  69.2%  open-source  2025-08-26
llama4:scout              109B   multimodal  51.2%    47%    55.4%  open-source  2025-08-28
llava-v1.6:13b            13B    multimodal  46.5%    44.4%  48.6%  open-source  2025-08-28
llava-v1.6:7b             7B     multimodal  45.9%    42.6%  49.1%  open-source  2025-08-28
Frequent Choice           1      None        34.2%    39.1%  29.2%  -            2025-09-28
Stratified Random Choice  2      None        31.7%    37.5%  25.9%  -            2025-09-28
deepseek-vl2              27.5B  multimodal  34.9%    31.4%  38.3%  open-source  2025-09-08
Random Choice             0      None        25.1%    26.3%  23.8%  -            2025-09-28
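
The highlighting rule stated under Scoring (1st and 2nd place per score column) can be reproduced from the table with a few lines of Python. This is only a minimal sketch, not the leaderboard's actual code: the names `ROWS`, `top_two`, and `highlights` are hypothetical, the rows are a hand-copied subset of the table above, and ties are not handled specially.

```python
# Hypothetical subset of the leaderboard: (name, overall, val, test), in percent.
ROWS = [
    ("Gemini 2.5 Pro", 85.8, 85.3, 86.4),
    ("GPT-5", 79.8, 76.2, 83.3),
    ("Claude Sonnet 4", 75.7, 75.0, 76.4),
    ("Claude Opus 4.1", 76.6, 74.5, 78.8),
    ("GPT-4.1", 71.1, 69.5, 72.7),
]

# Score columns in the order used by the leaderboard.
COLUMNS = ("Overall", "Val", "Test")


def top_two(rows, col_index):
    """Return the names of the best and second-best rows for one column."""
    ranked = sorted(rows, key=lambda row: row[col_index], reverse=True)
    return ranked[0][0], ranked[1][0]


def highlights(rows):
    """Map each score column to its (1st, 2nd) model names."""
    # Column i of COLUMNS lives at tuple index i + 1 (index 0 is the name).
    return {col: top_two(rows, i + 1) for i, col in enumerate(COLUMNS)}


if __name__ == "__main__":
    for col, (first, second) in highlights(ROWS).items():
        print(f"{col}: 1st = {first}, 2nd = {second}")
```

Because each column is ranked independently, a model can rank differently on Val than on Test. For example, Claude Sonnet 4 is ahead of Claude Opus 4.1 on Val but behind it on Overall and Test.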