bench-debug

maintained by opendataloader-project

Stars: 831 | Forks: 45 | License: MIT

name: bench-debug
description: Debug specific document parsing failures

/bench-debug <doc_id>

Compares the parser's output against the ground truth for a single document and analyzes the causes of any failures.

Usage

/bench-debug 01030000000189

Execution Steps

  1. Run benchmark for the specific document

    ./scripts/bench.sh --doc-id <doc_id>
    
  2. Compare files

    • Ground-truth: tests/benchmark/ground-truth/markdown/<doc_id>.md
    • Prediction: tests/benchmark/prediction/opendataloader/markdown/<doc_id>.md
    • Original PDF: tests/benchmark/pdfs/<doc_id>.pdf
  3. Analyze differences

    • Missing/extra text locations
    • Table structure differences (TEDS score causes)
    • Heading level mismatches (MHS score causes)
    • Reading order errors (NID score causes)
  4. Identify root causes

    • Which PDF elements caused the issue
    • Which Java core components are involved
  5. Suggest improvements

    • Java classes/methods that need modification
    • Expected impact scope
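
The comparison in steps 2 and 3 can be partly scripted. The following is a minimal sketch, not part of the skill itself: the directory layout comes from the paths listed in step 2, while the helper names (`doc_paths`, `text_diff`, `heading_mismatches`) are hypothetical and the heading check assumes ATX-style Markdown headings (`#`, `##`, ...). It does not compute TEDS, MHS, or NID; it only surfaces the raw text and heading-level differences that those scores react to.

```python
from __future__ import annotations

import difflib
import re
from pathlib import Path
from typing import Optional

# Directory layout taken from the "Compare files" step.
BENCH_ROOT = Path("tests/benchmark")

# ATX heading: 1-6 '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def doc_paths(doc_id: str) -> dict[str, Path]:
    """Return the three files to compare for one document."""
    return {
        "ground_truth": BENCH_ROOT / "ground-truth" / "markdown" / f"{doc_id}.md",
        "prediction": BENCH_ROOT / "prediction" / "opendataloader" / "markdown" / f"{doc_id}.md",
        "pdf": BENCH_ROOT / "pdfs" / f"{doc_id}.pdf",
    }


def text_diff(truth: str, pred: str) -> list[str]:
    """Unified diff: '-' lines are missing from the prediction, '+' lines are extra."""
    return list(
        difflib.unified_diff(
            truth.splitlines(),
            pred.splitlines(),
            fromfile="ground-truth",
            tofile="prediction",
            lineterm="",
        )
    )


def heading_levels(markdown: str) -> dict[str, int]:
    """Map each heading's text to its level (number of leading '#')."""
    levels: dict[str, int] = {}
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            levels[match.group(2)] = len(match.group(1))
    return levels


def heading_mismatches(truth: str, pred: str) -> dict[str, tuple[int, Optional[int]]]:
    """Ground-truth headings whose level differs in, or is absent (None) from, the prediction."""
    expected = heading_levels(truth)
    actual = heading_levels(pred)
    return {
        text: (level, actual.get(text))
        for text, level in expected.items()
        if actual.get(text) != level
    }
```

To use it on a real document, read the `ground_truth` and `prediction` files returned by `doc_paths("<doc_id>")` with `Path.read_text(encoding="utf-8")` and pass both strings to `text_diff` and `heading_mismatches`. Table structure and reading-order problems still need to be checked against the original PDF.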

Reference Files

  • ground-truth/reference.json: Per-document element info (categories, coordinates, etc.)
  • java/opendataloader-pdf-core/: Core parsing logic
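
When tracing which PDF elements caused a failure, a quick tally of element categories for the document can help narrow the search. The sketch below is illustrative only. The schema of `reference.json` is not described here, so the layout it assumes (a top-level object keyed by document ID, each holding an `"elements"` list whose items carry a `"category"` field) is a guess and must be adjusted to the real file. The function names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_elements(reference_path: Path, doc_id: str) -> list:
    """Load one document's elements.

    ASSUMPTION: reference.json looks like {"<doc_id>": {"elements": [...]}}.
    Change this lookup to match the actual schema.
    """
    data = json.loads(reference_path.read_text(encoding="utf-8"))
    return data.get(doc_id, {}).get("elements", [])


def category_counts(elements: list) -> Counter:
    """Count elements per category; items without the (assumed) key count as 'unknown'."""
    return Counter(element.get("category", "unknown") for element in elements)
```

A document whose ground truth contains several table elements but whose prediction shows few or no tables points toward table detection rather than text extraction.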

Example Output

Document 01030000000189 Analysis:

Overall: 0.2763 (one of the worst performing documents)

Issues:
1. 2 of 3 tables not detected (TEDS: 0.15)
   - Table boundary detection failed
   - Related code: TableDetector.java

2. Reading order errors (NID: 0.45)
   - Multi-column layout handling failed
   - Related code: ColumnDetector.java

Recommended Actions:
- Adjust clustering threshold in TableDetector
- Improve multi-column detection logic

Skill Details

Created: Jan 2026
Last Updated: 4 months ago
Tags: tools, debugging