In reasoning benchmarks, DeepSeek-V3.1 showed improvements: its MMLU-Pro score rose from 84.8 to 85.0, while its GPQA-Diamond score rose from 80.1 to 80 ...