Incorrect Answer Sudoku #22

Open
sebastianherreramonterrosa opened this issue May 22, 2025 · 0 comments

A few days ago, I created a request for a Sudoku image.

Image

The evaluation is here:
https://visioncheckup.com/assessments/sudoku-puzzle-extraction/

In addition to the image not being displayed on the website, I think the evaluation is incorrect. Here is my evaluation:

| Model | Exact board? | Correct cells | Accuracy | Extracted grid (rows separated by a blank) |
| --- | --- | --- | --- | --- |
| ChatGPT-4o | No | 46 / 81 | 56.8 % | ...4.8... .6.7..1.. 72.9.5.4. .97.4..3. ..7.5.8.. .896.5... 941.7.8.. 7..6..4.. ..2.7.... |
| Claude 3.7 Sonnet | No | 63 / 81 | 77.8 % | ....48... .6...7..1 7.2.9.5.4 .9.74.3.. ..7.5.8.. .8.96.5.. 9.4.1.7.8 .7...6.4. ....27... |
| Claude 4 Opus | Yes | 81 / 81 | 100 % | ...4.8... .6..7..1. 7.2.9.5.4 .9.7.4.3. ..7.5.8.. .8.9.6.5. 9.4.1.7.8 .7..6..4. ...2.7... |
| Claude 4 Sonnet | No | 58 / 81 | 71.6 % | ...4.8... .6..7...1 7.2.9.5.4 .9.7.4..3 ..7.5.8.. ..8.9.6.5 9..4.1.7.8 .7...6..4 ...2.7... |
| GPT-4.1 | No | 38 / 81 | 46.9 % | ..4.8.... .6.7..1.. 72.9.5.4. 9.7.4..3. ..7.5.8.. 8.9.6..5. 941.7.8.. 7..6..4.. ..2.7.... |
| GPT-4.1 Mini | No | 53 / 81 | 65.4 % | ..4.8.... .6..7..1. 7.29.5.4. .9.7.4.3. .7.5.8... 8.9.6.5.. 9.4.1.7.8 .7..6..4. ..2.7.... |
| Gemini 2.0 Flash | No | 80 / 81 | 98.76 % | ...4.8.. .6..7..1. 7.2.9.5.4 .9.7.4.3. ..7.5.8.. .8.9.6.5. 9.4.1.7.8 .7..6..4. ...2.7... |
| Gemini 2.0 Flash Lite | No | 43 / 81 | 53.1 % | ....4.8.. ..6..7.1. 7.2.9.5.4 ..9.7.4.3 ...7.5.8. ..8.9.6.5 9.4.1.7.8 ..7..6.4. ....2.7.. |
| Gemini 2.5 Pro | Yes | 81 / 81 | 100 % | ...4.8... .6..7..1. 7.2.9.5.4 .9.7.4.3. ..7.5.8.. .8.9.6.5. 9.4.1.7.8 .7..6..4. ...2.7... |
| OpenAI O1 | No | 40 / 81 | 49.4 % | ...4..8.. 6..7....1 72.9.5..4 97..4...3 7...5.8.. 89...6.5. 94178.... 764...... .......27 |
| OpenAI O4 Mini | No | 57 / 81 | 70.4 % | ...4.8... .6..7..1. 72..9.5.4 .9.7.4.3. .7..5..8. ..8.9.6.5. 941.7..8. ..7.6..4. ....27... |
| Qwen 2.5 VL 7B | No | 12 / 81 | 14.8 % | . . 4 8 . . . 6 7 . . 1 . 7 2 9 5 4 . 9 7 4 3 . . 7 5 8 . 8 9 6 5 . 9 4 1 7 8 . 7 . 6 4 . 2 . 7 . |

Correct answer: ...4.8... .6..7..1. 7.2.9.5.4 .9.7.4.3. ..7.5.8.. .8.9.6.5. 9.4.1.7.8 .7..6..4. ...2.7...

Only Claude 4 Opus and Gemini 2.5 Pro completed the task correctly.
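
For anyone who wants to check the numbers, here is a minimal sketch of one way to score an extracted grid against the correct answer. It assumes a plain cell-by-cell positional comparison, with rows separated by whitespace and any missing cell counted as wrong. The names `count_correct_cells` and `accuracy` are hypothetical helpers for illustration, not part of the Vision Checkup code. Under these assumptions the ChatGPT-4o grid scores 46 / 81 and the Gemini 2.0 Flash grid scores 80 / 81.

```python
# Correct answer from this issue: 9 rows of 9 cells, '.' marks an empty cell.
EXPECTED = (
    "...4.8... .6..7..1. 7.2.9.5.4 .9.7.4.3. ..7.5.8.. "
    ".8.9.6.5. 9.4.1.7.8 .7..6..4. ...2.7..."
)


def count_correct_cells(extracted: str, expected: str = EXPECTED) -> int:
    """Count cells that match the expected grid at the same row and column.

    Assumption: rows are whitespace-separated; a missing row or a cell
    beyond the end of a short row counts as incorrect.
    """
    expected_rows = expected.split()
    extracted_rows = extracted.split()
    correct = 0
    for i, expected_row in enumerate(expected_rows):
        row = extracted_rows[i] if i < len(extracted_rows) else ""
        for j, expected_cell in enumerate(expected_row):
            if j < len(row) and row[j] == expected_cell:
                correct += 1
    return correct


def accuracy(extracted: str, expected: str = EXPECTED) -> float:
    """Return the percentage of expected cells that were extracted correctly."""
    total_cells = sum(len(row) for row in expected.split())
    return 100 * count_correct_cells(extracted, expected) / total_cells


if __name__ == "__main__":
    chatgpt_4o = (
        "...4.8... .6.7..1.. 72.9.5.4. .97.4..3. ..7.5.8.. "
        ".896.5... 941.7.8.. 7..6..4.. ..2.7...."
    )
    print(count_correct_cells(chatgpt_4o), "/ 81")
    print(f"{accuracy(chatgpt_4o):.1f} %")
```

Grids that are not laid out as nine 9-character rows, such as the Qwen 2.5 VL 7B output, would need a different alignment rule than this sketch uses.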
