Overview
Kokokor provides extensive configuration options to handle different document types, languages, and layouts. This guide covers all configurable parameters and their effects.
Configuration Structure
The reconstructParagraphs function accepts an optional second parameter for configuration.
Line Detection Options
Control how OCR observations are grouped into text lines.
Pixel Tolerance
Additional vertical tolerance for grouping observations into the same line.
Vertical tolerance in pixels at 72 DPI. This value is automatically scaled based on the document’s actual DPI.
- Higher values: More permissive line grouping (more text on same line)
- Lower values: Stricter line grouping (more separate lines)
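The scaling described above can be sketched as follows. This is a minimal illustration that assumes the tolerance scales linearly with DPI relative to the 72 DPI baseline; `scaleTolerance` and `BASE_DPI` are hypothetical names, not part of Kokokor's API, and the library's actual formula may differ.

```typescript
// Hypothetical helper for illustration only; not Kokokor's API.
// Assumption: the tolerance scales linearly from the 72 DPI baseline.
const BASE_DPI = 72;

function scaleTolerance(toleranceAt72Dpi: number, documentDpi: number): number {
    return toleranceAt72Dpi * (documentDpi / BASE_DPI);
}

console.log(scaleTolerance(5, 72));  // 5
console.log(scaleTolerance(5, 144)); // 10
```

Under this assumption, a tolerance tuned on a 72 DPI scan keeps the same physical meaning when the same page is scanned at a higher resolution.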
Line Height Factor
Fixed line height factor for grouping observations.
If not provided, the system computes an adaptive factor based on document analysis.
Typical values:
- 0.15 - Very tight line grouping (small gaps)
- 0.3 - Standard line height
- 0.5 - Generous spacing tolerance
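To give a feel for how the factor changes grouping, here is a rough sketch. It assumes an observation joins a line when its vertical offset from the line's first observation is at most `lineHeightFactor` times that observation's height. The `Observation` type and `groupIntoLines` function are hypothetical and only approximate the idea; they are not Kokokor's internal algorithm.

```typescript
// Hypothetical types and helper for illustration; not Kokokor's API.
type Observation = { text: string; y: number; height: number };

function groupIntoLines(observations: Observation[], lineHeightFactor: number): Observation[][] {
    // Sort top-to-bottom so each observation is compared with the most recent line.
    const sorted = [...observations].sort((a, b) => a.y - b.y);
    const lines: Observation[][] = [];

    for (const obs of sorted) {
        const current = lines[lines.length - 1];
        const anchor = current ? current[0] : undefined;
        // Assumed rule: same line if the vertical offset from the line's first
        // observation is within lineHeightFactor * that observation's height.
        if (current && anchor && Math.abs(obs.y - anchor.y) <= anchor.height * lineHeightFactor) {
            current.push(obs);
        } else {
            lines.push([obs]);
        }
    }
    return lines;
}

const sample: Observation[] = [
    { text: "a", y: 100, height: 20 },
    { text: "b", y: 104, height: 20 },
    { text: "c", y: 130, height: 20 },
];

console.log(groupIntoLines(sample, 0.3).length);  // 2: "a" and "b" share a line
console.log(groupIntoLines(sample, 0.15).length); // 3: the 4 px offset now splits them
```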
When lineHeightFactor is not specified, Kokokor analyzes the document’s spacing patterns using an internal adaptive algorithm to determine the optimal value automatically.
RTL Text Support
Enable right-to-left text processing.
When enabled, coordinates are flipped for proper RTL text alignment. The default is true, as Kokokor was originally designed for Arabic text processing.
Centering Detection
Control how centered text (titles, headings, poetry) is identified.
Tolerance for center point alignment as a ratio of page width.
- 0.02 - Stricter centering (within 2%)
- 0.05 - Standard centering (within 5%)
- 0.1 - Looser centering (within 10%)
Minimum margin required on each side as a ratio of page width.
- 0.1 - At least 10% whitespace on each side
- 0.2 - At least 20% whitespace (default)
- 0.3 - At least 30% whitespace (very strict)
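The two ratios above can be combined into a single check, sketched below. The `Span` type, `isCentered`, and the parameter names `centerTolerance` and `minMargin` are hypothetical stand-ins for the two options described in this section; the library's real option names and exact comparison may differ.

```typescript
// Hypothetical helper for illustration; not Kokokor's API.
type Span = { x: number; width: number };

function isCentered(span: Span, pageWidth: number, centerTolerance: number, minMargin: number): boolean {
    const leftMargin = span.x;
    const rightMargin = pageWidth - (span.x + span.width);
    // Distance between the span's midpoint and the page's midpoint.
    const centerOffset = Math.abs(span.x + span.width / 2 - pageWidth / 2);

    return (
        centerOffset <= pageWidth * centerTolerance &&
        leftMargin >= pageWidth * minMargin &&
        rightMargin >= pageWidth * minMargin
    );
}

// On a 1000 px page with a 5% center tolerance and 20% minimum margins:
console.log(isCentered({ x: 300, width: 400 }, 1000, 0.05, 0.2)); // true
console.log(isCentered({ x: 100, width: 800 }, 1000, 0.05, 0.2)); // false: margins too small
console.log(isCentered({ x: 200, width: 400 }, 1000, 0.05, 0.2)); // false: midpoint is off-center
```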
Poetry Detection Options
Fine-tune the poetry detection algorithm.
Hemistich Pair Detection
Maximum vertical gap between two observations to be considered a poetry pair (hemistichs).
Measured as a ratio of average line height:
- 1.5 - Closer spacing required
- 2.0 - Standard spacing
- 3.0 - Wider spacing allowed
How similar in width two hemistichs must be.
The check: |width1 - width2| / average < ratio
- 0.2 - Very similar widths required
- 0.4 - Moderate similarity
- 0.6 - More variation allowed
How similar in word count two hemistichs must be.
The check: |count1 - count2| / max < ratio
- 0.3 - Very similar counts
- 0.5 - Moderate similarity
- 0.7 - More variation allowed
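The three pair checks above (vertical gap, width similarity, word count similarity) can be sketched together. The two similarity formulas follow the text directly. The `Hemistich` and `PairOptions` types and their field names are hypothetical, and measuring the gap between top edges is an assumption.

```typescript
// Hypothetical types for illustration; the field names are not Kokokor's option names.
type Hemistich = { y: number; width: number; wordCount: number };
type PairOptions = {
    maxVerticalGapRatio: number;      // ratio of average line height
    widthSimilarityRatio: number;     // |width1 - width2| / average < ratio
    wordCountSimilarityRatio: number; // |count1 - count2| / max < ratio
};

function looksLikePair(a: Hemistich, b: Hemistich, avgLineHeight: number, opts: PairOptions): boolean {
    // Assumption: the vertical gap is measured between the two top edges.
    const verticalGap = Math.abs(a.y - b.y);
    const avgWidth = (a.width + b.width) / 2;
    const widthDiff = Math.abs(a.width - b.width) / avgWidth;
    const maxCount = Math.max(a.wordCount, b.wordCount);
    const countDiff = Math.abs(a.wordCount - b.wordCount) / maxCount;

    return (
        verticalGap <= avgLineHeight * opts.maxVerticalGapRatio &&
        widthDiff < opts.widthSimilarityRatio &&
        countDiff < opts.wordCountSimilarityRatio
    );
}

const pairOptions: PairOptions = {
    maxVerticalGapRatio: 2.0,
    widthSimilarityRatio: 0.4,
    wordCountSimilarityRatio: 0.5,
};
const first: Hemistich = { y: 100, width: 300, wordCount: 5 };
const second: Hemistich = { y: 105, width: 320, wordCount: 4 };
const fragment: Hemistich = { y: 100, width: 100, wordCount: 2 };

console.log(looksLikePair(first, second, 20, pairOptions));   // true
console.log(looksLikePair(first, fragment, 20, pairOptions)); // false: widths differ too much
```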
Wide Poetic Line Detection
Minimum width a single line must have to be analyzed for poetry.
As a ratio of page width:
- 0.4 - Shorter lines included
- 0.6 - Standard threshold
- 0.8 - Only very wide lines
Word density threshold for identifying poetry. Poetry typically has lower word density than prose.
A line is poetic if its density ≤ ratio * avgProseDensity:
- 0.7 - Very sparse text required
- 0.9 - Moderate spacing required
- 0.95 - Close to prose density allowed
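A small sketch of the two wide-line checks follows. It assumes that "word density" means words per unit of line width, which the text does not define, so treat that as an illustrative guess. `Line`, `wordDensity`, and `isWidePoeticLine` are hypothetical names, not Kokokor's API.

```typescript
// Hypothetical helpers for illustration; not Kokokor's API.
type Line = { text: string; width: number };

// Assumption: density is the number of words divided by the line's width.
function wordDensity(line: Line): number {
    const words = line.text.trim().split(/\s+/).filter(Boolean).length;
    return words / line.width;
}

function isWidePoeticLine(
    line: Line,
    pageWidth: number,
    avgProseDensity: number,
    minWidthRatio: number,
    densityRatio: number,
): boolean {
    // Lines narrower than the minimum width are not analyzed at all.
    if (line.width < pageWidth * minWidthRatio) return false;
    return wordDensity(line) <= densityRatio * avgProseDensity;
}

const sparse: Line = { text: "one two three four five six", width: 800 };
const dense: Line = { text: "a b c d e f g h i j k", width: 900 };
const narrow: Line = { text: "one two three", width: 300 };

// Page width 1000 px, prose averaging 12 words per 1000 px.
console.log(isWidePoeticLine(sparse, 1000, 0.012, 0.6, 0.9)); // true
console.log(isWidePoeticLine(dense, 1000, 0.012, 0.6, 0.9));  // false: as dense as prose
console.log(isWidePoeticLine(narrow, 1000, 0.012, 0.6, 0.9)); // false: too narrow to analyze
```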
General Poetry Options
Minimum words required for a line to be considered poetry. Filters out noise like page numbers.
Delimiter used when merging detected poetry pairs (hemistichs).
Examples:
- " " - Simple space
- " ... " - Visual separator: صدر ... عجز
- " – " - Em dash separator
Layout Elements
Provide structural hints for better text classification.
Array of horizontal line elements detected in the document. Used to identify footnote boundaries: text appearing below the last horizontal line is classified as footnotes.
Array of rectangle elements detected in the document. Text within rectangles is classified as headings.
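The two classification rules above can be sketched as a single function. The `Box` and `HorizontalLine` shapes, the `classify` function, and the precedence (rectangles checked before horizontal lines) are assumptions made for illustration. The sketch also assumes y grows downward, as is common for OCR coordinates.

```typescript
// Hypothetical shapes for illustration; Kokokor's real element types may differ.
type Box = { x: number; y: number; width: number; height: number };
type HorizontalLine = { y: number };
type Region = "heading" | "footnote" | "body";

function contains(outer: Box, inner: Box): boolean {
    return (
        inner.x >= outer.x &&
        inner.y >= outer.y &&
        inner.x + inner.width <= outer.x + outer.width &&
        inner.y + inner.height <= outer.y + outer.height
    );
}

function classify(text: Box, horizontalLines: HorizontalLine[], rectangles: Box[]): Region {
    // Text fully inside any rectangle is treated as a heading.
    if (rectangles.some((rect) => contains(rect, text))) return "heading";

    // Text below the last (lowest) horizontal line is treated as a footnote.
    if (horizontalLines.length > 0) {
        const lastLineY = Math.max(...horizontalLines.map((line) => line.y));
        if (text.y > lastLineY) return "footnote";
    }
    return "body";
}

const rules: HorizontalLine[] = [{ y: 700 }];
const frames: Box[] = [{ x: 100, y: 50, width: 400, height: 60 }];

console.log(classify({ x: 120, y: 60, width: 200, height: 30 }, rules, frames)); // "heading"
console.log(classify({ x: 50, y: 720, width: 300, height: 20 }, rules, frames)); // "footnote"
console.log(classify({ x: 50, y: 300, width: 300, height: 20 }, rules, frames)); // "body"
```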
Paragraph Grouping Options
Control how text lines are grouped into paragraphs.
Factor for detecting paragraph breaks based on vertical spacing.
A new paragraph starts when gap > previousGap * verticalJumpFactor:
- 1.5 - More sensitive to spacing changes
- 2.0 - Standard sensitivity
- 3.0 - Less sensitive (fewer paragraph breaks)
Threshold for identifying “short” lines that indicate paragraph endings.
As a ratio of reference width:
- 0.75 - Mark more lines as short
- 0.85 - Standard threshold
- 0.95 - Only very short lines marked
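The vertical jump rule can be sketched on its own as follows. This illustrates only the gap > previousGap * verticalJumpFactor comparison; it leaves out the short-line threshold and any other signals the library uses. `findParagraphStarts` is a hypothetical helper that takes the top y-coordinate of each line, in reading order.

```typescript
// Hypothetical helper for illustration; not Kokokor's API.
// Returns the indices of lines that start a new paragraph.
function findParagraphStarts(lineTops: number[], verticalJumpFactor: number): number[] {
    const starts: number[] = lineTops.length > 0 ? [0] : [];

    // A previous gap only exists from the third line onward.
    for (let i = 2; i < lineTops.length; i++) {
        const previousGap = lineTops[i - 1] - lineTops[i - 2];
        const gap = lineTops[i] - lineTops[i - 1];
        if (gap > previousGap * verticalJumpFactor) {
            starts.push(i);
        }
    }
    return starts;
}

const tops = [0, 20, 40, 90, 110];

console.log(findParagraphStarts(tops, 2.0)); // [0, 3]: the 50 px gap exceeds 20 * 2.0
console.log(findParagraphStarts(tops, 3.0)); // [0]: 50 px does not exceed 20 * 3.0
```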
See the Paragraph Options guide for detailed explanations of how these options affect paragraph breaks.
Text Formatting Options
Control the final text output format.
Optional symbol to insert before the first footnote in the formatted text output.
Examples:
- "---" - Horizontal line separator
- "\n***\n" - Decorative separator
- "Footnotes:" - Text label
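As a rough illustration of where the symbol ends up, the sketch below inserts it once, immediately before the first footnote. The `Paragraph` shape, the `isFootnote` flag, and `formatText` are hypothetical; joining paragraphs with a newline is also an assumption about the output format.

```typescript
// Hypothetical shape and helper for illustration; not Kokokor's API.
type Paragraph = { text: string; isFootnote?: boolean };

function formatText(paragraphs: Paragraph[], footnoteSymbol?: string): string {
    const output: string[] = [];
    let seenFootnote = false;

    for (const paragraph of paragraphs) {
        // Emit the symbol once, right before the first footnote.
        if (paragraph.isFootnote && !seenFootnote) {
            seenFootnote = true;
            if (footnoteSymbol) output.push(footnoteSymbol);
        }
        output.push(paragraph.text);
    }
    return output.join("\n");
}

const page: Paragraph[] = [
    { text: "Body text" },
    { text: "1. First note", isFootnote: true },
    { text: "2. Second note", isFootnote: true },
];

console.log(formatText(page, "---"));
```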
Debug Logging
Enable detailed logging for troubleshooting.
Optional logging function for debugging. Receives detailed information about processing decisions and intermediate steps.
Complete Configuration Example
Here’s a complete example with all major options configured:
Next Steps
Paragraph Options
Deep dive into paragraph grouping behavior
Layout Elements
Work with horizontal lines and rectangles