Overview
Kokokor provides built-in support for right-to-left (RTL) text processing, essential for languages like Arabic, Hebrew, Farsi, and Urdu. The library handles coordinate transformation and normalization to ensure accurate text reconstruction.
RTL support is enabled by default (isRTL: true). For left-to-right languages, set isRTL: false in options.
RTL Coordinate System
The Challenge
OCR engines typically return coordinates in a left-to-right (LTR) coordinate system where:
Origin (0, 0) is at the top-left corner
X-axis increases rightward
Y-axis increases downward
For RTL text, this creates problems:
Text flows right-to-left, but coordinates are left-to-right
Logical reading order doesn’t match spatial order
Text alignment and centering calculations are inverted
The Solution: Coordinate Flipping
Kokokor transforms RTL coordinates by flipping the x-axis:
export const mapOcrResultToRTLObservations = (
    observations: Observation[],
    imageWidth: number
) => {
    return observations.map((o) => ({
        ...o,
        bbox: {
            ...o.bbox,
            x: imageWidth - o.bbox.x - o.bbox.width
        }
    }));
};
Reference: src/utils/normalization.ts:27
newX = imageWidth - originalX - textWidth
Before transformation (LTR coordinates):
0                                 imageWidth (800px)
|---------------------|------------------|
Arabic text at x=100, width=50

After transformation (RTL coordinates):
0                                 imageWidth (800px)
|---------------------|------------------|
Arabic text at x=650

Calculation: 650 = 800 - 100 - 50
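As a quick check of the formula, the following self-contained sketch restates the flip for a single bounding box and confirms that applying it twice restores the original coordinate. The `BBox` type and the `flipX` helper are illustrative names assumed for this example; they are not part of the Kokokor API.

```typescript
// Minimal bounding-box shape assumed for this sketch.
type BBox = { x: number; y: number; width: number; height: number };

// Mirror a box horizontally: the new x is the distance from the box's
// right edge to the right edge of the image.
const flipX = (bbox: BBox, imageWidth: number): BBox => ({
    ...bbox,
    x: imageWidth - bbox.x - bbox.width,
});

const original: BBox = { x: 100, y: 50, width: 50, height: 20 };
const flipped = flipX(original, 800);  // x becomes 800 - 100 - 50 = 650
const restored = flipX(flipped, 800);  // x becomes 800 - 650 - 50 = 100

console.log(flipped.x, restored.x); // 650 100
```

Because the flip is its own inverse, running it a second time with the same image width is one way to get back to the coordinates the OCR engine reported.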
Processing Pipeline
Stage 1: Preprocessing
RTL transformation happens early in the pipeline during the flipAndAlignObservations step:
export const flipAndAlignObservations = (
    observations: Observation[],
    imageWidth: number,
    dpiX: number,
    options: Partial<Pick<MapObservationsToTextLinesOptions, 'isRTL' | 'log'>> = {}
) => {
    // 1. Filter noise
    observations = observations.filter(filterNoisyObservations);
    if (observations.length === 0) {
        return [];
    }
    // 2. Apply RTL coordinate flip
    if (options.isRTL) {
        observations = mapOcrResultToRTLObservations(observations, imageWidth);
    }
    // 3. Normalize x-coordinates for alignment
    return normalizeObservationsX(observations, dpiX);
};
Reference: src/utils/paragraphs.ts:43
Preprocessing Steps
Noise Filtering
Remove invalid or noisy observations:
const filterNoisyObservations = (o: Observation) =>
    o.text?.replace(/[،,؛;؟?۔.:\-()]/g, '').length > 1;
Reference: src/utils/normalization.ts:54
RTL Coordinate Flip
Transform x-coordinates for RTL text flow:
if (options.isRTL) {
    observations = mapOcrResultToRTLObservations(observations, imageWidth);
}
X-Coordinate Normalization
Align observations to a clean left edge:
return normalizeObservationsX(observations, dpiX);
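To make the noise-filtering step more concrete, here is a small sketch that applies the same rule to a few sample texts. It restates the filter locally on a minimal `{ text?: string }` shape; the `isMeaningful` name and the sample inputs are assumptions for illustration, and the character class is assumed to contain only the punctuation marks listed in the step above.

```typescript
// Punctuation stripped before measuring the text length
// (assumed to match the character class shown in the noise-filtering step).
const PUNCTUATION = /[،,؛;؟?۔.:\-()]/g;

// Keep an observation only if more than one character remains
// once punctuation has been removed.
const isMeaningful = (o: { text?: string }): boolean =>
    (o.text?.replace(PUNCTUATION, '').length ?? 0) > 1;

const samples = [
    { text: 'السلام' }, // kept: six letters remain
    { text: '،' },      // dropped: nothing remains after stripping
    { text: '(1)' },    // dropped: only "1" remains
    { text: 'a.' },     // dropped: only "a" remains
    {},                 // dropped: no text at all
];

const kept = samples.filter(isMeaningful).map((o) => o.text);
console.log(kept); // [ 'السلام' ]
```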
Coordinate Normalization
After RTL flipping, coordinates are normalized to create clean alignment:
export const normalizeObservationsX = (
    observations: Observation[],
    dpi: number,
    standardDPI: number = 300
) => {
    const thresholdPx = (standardDPI / dpi) * 5;
    const minX = Math.min(...observations.map((o) => o.bbox.x));
    return observations.map((o) => {
        if (Math.abs(o.bbox.x - minX) <= thresholdPx) {
            return { ...o, bbox: { ...o.bbox, x: minX } };
        }
        return o;
    });
};
Reference: src/utils/normalization.ts:84
Why Normalize?
OCR engines may produce slightly inconsistent x-coordinates for aligned text:
Line 1: x=50.2
Line 2: x=50.8
Line 3: x=49.5
Normalization snaps these to the smallest x-coordinate, giving them a common left edge:
Line 1: x=49.5
Line 2: x=49.5
Line 3: x=49.5
Benefits:
Cleaner paragraph detection
Better indent recognition
Improved poetry centering
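The snapping behaviour can be tried out in isolation. The sketch below restates the rule from `normalizeObservationsX` under a different name (`snapToLeftEdge`, an illustrative name) so that it runs on its own, and adds a hypothetical indented line to show that observations outside the threshold are left alone. The `Box` and `Obs` types are minimal stand-ins assumed for the example.

```typescript
type Box = { x: number; y: number; width: number; height: number };
type Obs = { bbox: Box; text: string };

// Same snapping rule as normalizeObservationsX, restated so the
// example is self-contained.
const snapToLeftEdge = (observations: Obs[], dpi: number, standardDPI = 300): Obs[] => {
    const thresholdPx = (standardDPI / dpi) * 5;
    const minX = Math.min(...observations.map((o) => o.bbox.x));
    return observations.map((o) =>
        Math.abs(o.bbox.x - minX) <= thresholdPx ? { ...o, bbox: { ...o.bbox, x: minX } } : o
    );
};

const lines: Obs[] = [
    { bbox: { x: 50.2, y: 10, width: 400, height: 20 }, text: 'line 1' },
    { bbox: { x: 50.8, y: 40, width: 400, height: 20 }, text: 'line 2' },
    { bbox: { x: 49.5, y: 70, width: 400, height: 20 }, text: 'line 3' },
    { bbox: { x: 80, y: 100, width: 370, height: 20 }, text: 'indented line' },
];

// At 300 DPI the threshold is (300 / 300) * 5 = 5px, so the first three
// lines snap to 49.5 while the indented line (30.5px away) keeps its x.
const xs = snapToLeftEdge(lines, 300).map((o) => o.bbox.x);
console.log(xs); // [ 49.5, 49.5, 49.5, 80 ]
```

Keeping the indented line untouched is what lets later stages still recognise it as an indent rather than OCR jitter.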
Arabic Text Example
const observations = [
    {
        bbox: { x: 100, y: 50, width: 150, height: 20 },
        text: "السلام"
    },
    {
        bbox: { x: 260, y: 50, width: 100, height: 20 },
        text: "عليكم"
    }
];
const pageWidth = 800;
After RTL Flip
[
    {
        bbox: { x: 550, y: 50, width: 150, height: 20 },
        text: "السلام" // 800 - 100 - 150 = 550
    },
    {
        bbox: { x: 440, y: 50, width: 100, height: 20 },
        text: "عليكم" // 800 - 260 - 100 = 440
    }
]
Logical Order
Now the observations are in correct RTL reading order:
First word (rightmost): “السلام” at x=550
Second word (leftmost): “عليكم” at x=440
RTL Poetry Detection
RTL transformation ensures poetry detection works correctly:
Hemistich Example
// Arabic poetry hemistichs (before RTL flip)
const observations = [
    {
        bbox: { x: 100, y: 200, width: 220, height: 18 },
        text: "في البدء كانت الكلمة"
    },
    {
        bbox: { x: 480, y: 200, width: 210, height: 18 },
        text: "والكلمة عند الله"
    }
];
After RTL flip (imageWidth = 800):
[
    {
        bbox: { x: 480, y: 200, width: 220, height: 18 },
        text: "في البدء كانت الكلمة" // 800 - 100 - 220 = 480
    },
    {
        bbox: { x: 110, y: 200, width: 210, height: 18 },
        text: "والكلمة عند الله" // 800 - 480 - 210 = 110
    }
]
Combined bounding box:
Left edge: min(480, 110) = 110
Right edge: max(480+220, 110+210) = max(700, 320) = 700
Width: 700 - 110 = 590
Center: 110 + 590/2 = 405
Page center: 800/2 = 400
Difference: 5px (within tolerance) ✓
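The centering arithmetic above can be reproduced with a few lines of code. This is only a sketch of the calculation: `combinedCenterX` is a hypothetical helper, not a Kokokor function, and the tolerance the library actually applies is not stated here, so the example stops at computing the offset from the page center.

```typescript
type Box = { x: number; y: number; width: number; height: number };

// Horizontal center of the smallest span that covers every box.
const combinedCenterX = (boxes: Box[]): number => {
    const left = Math.min(...boxes.map((b) => b.x));
    const right = Math.max(...boxes.map((b) => b.x + b.width));
    return left + (right - left) / 2;
};

// Flipped hemistich boxes from the example above.
const hemistichs: Box[] = [
    { x: 480, y: 200, width: 220, height: 18 },
    { x: 110, y: 200, width: 210, height: 18 },
];

const pageWidth = 800;
const center = combinedCenterX(hemistichs);       // 110 + 590 / 2 = 405
const offset = Math.abs(center - pageWidth / 2);  // |405 - 400| = 5

console.log(center, offset); // 405 5
```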
Configuration
Enable RTL Processing
import { reconstructParagraphs } from 'kokokor';
const result = reconstructParagraphs(
    { observations, page, layout },
    {
        line: {
            isRTL: true, // Enable RTL coordinate flipping
        },
    }
);
Disable for LTR Languages
const result = reconstructParagraphs(
    { observations, page, layout },
    {
        line: {
            isRTL: false, // Disable RTL flipping for English, etc.
        },
    }
);
RTL is enabled by default (isRTL: true). Explicitly set isRTL: false for left-to-right languages like English, French, or Spanish.
Mixed Text Handling
RTL Text with LTR Numbers
Arabic text often contains embedded Western (Latin-script) numerals. Kokokor handles this correctly because:
OCR engines typically return observations in visual order (left to right on page)
RTL coordinate flip maintains relative positions
Text content remains unchanged (only coordinates flip)
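The last two points can be illustrated with a short sketch: flipping mirrors each box but preserves the gap between neighbours, and the text itself is never touched. The `flipOne` helper restates the flip from earlier on this page so the example is self-contained; the helper name, the types, and the sample coordinates are assumptions made for illustration.

```typescript
type Box = { x: number; y: number; width: number; height: number };
type Obs = { bbox: Box; text: string };

// Restates the RTL flip for a single observation.
const flipOne = (o: Obs, imageWidth: number): Obs => ({
    ...o,
    bbox: { ...o.bbox, x: imageWidth - o.bbox.x - o.bbox.width },
});

// Two neighbouring observations on one line; the second holds Western digits.
const word: Obs = { bbox: { x: 100, y: 50, width: 150, height: 20 }, text: 'صفحة' };
const year: Obs = { bbox: { x: 260, y: 50, width: 100, height: 20 }, text: '2024' };

const flippedWord = flipOne(word, 800); // x = 800 - 100 - 150 = 550
const flippedYear = flipOne(year, 800); // x = 800 - 260 - 100 = 440

// Gap = left edge of the right-hand box minus right edge of the left-hand box.
const gapBefore = year.bbox.x - (word.bbox.x + word.bbox.width);                      // 260 - 250
const gapAfter = flippedWord.bbox.x - (flippedYear.bbox.x + flippedYear.bbox.width);  // 550 - 540

console.log(gapBefore, gapAfter);                // 10 10
console.log(flippedWord.text, flippedYear.text); // صفحة 2024
```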
Bidirectional Text
For documents with both RTL and LTR sections:
Predominantly RTL
{
    line: {
        isRTL: true, // Flip coordinates
    }
}
Predominantly LTR
{
    line: {
        isRTL: false, // Don't flip coordinates
    }
}
Mixed Document
Process RTL and LTR pages separately:
// RTL pages
const rtlResult = reconstructParagraphs(
    { observations: rtlObservations, page, layout },
    { line: { isRTL: true } }
);
// LTR pages
const ltrResult = reconstructParagraphs(
    { observations: ltrObservations, page, layout },
    { line: { isRTL: false } }
);
Supported RTL Languages
Arabic: Full support for Arabic script, poetry hemistichs, and diacritics
Hebrew: Hebrew text with proper coordinate transformation
Farsi/Persian: Persian poetry and prose with RTL layout
Urdu: Urdu text with Nastaliq script support
Common Patterns
Pattern 1: Arabic OCR Processing
import { reconstructParagraphs } from 'kokokor';
const arabicResult = reconstructParagraphs(
    {
        observations: ocrObservations,
        page: {
            width: 1700,
            height: 2200,
            dpiX: 300,
            dpiY: 300,
        },
        layout: {
            horizontalLines: [], // Footnote separators
            rectangles: [], // Heading boxes
        },
    },
    {
        line: {
            isRTL: true,
            poetryDetectionOptions: {
                // Arabic poetry often uses hemistichs
                pairWidthSimilarityRatio: 0.4,
                pairWordCountSimilarityRatio: 0.5,
            },
            poetryPairDelimiter: ' ', // Join hemistichs with space
        },
    }
);
Pattern 2: Hebrew Processing
const hebrewResult = reconstructParagraphs(
    { observations, page, layout },
    {
        line: {
            isRTL: true,
            // Hebrew typically has less poetry formatting
            poetryDetectionOptions: {
                minWordCount: 3, // Stricter poetry detection
            },
        },
    }
);
Pattern 3: Bilingual Documents
// Detect language per page and process accordingly
function processPage(observations, page, layout, isRTLPage) {
    return reconstructParagraphs(
        { observations, page, layout },
        {
            line: {
                isRTL: isRTLPage,
            },
        }
    );
}
// Process Arabic page
const arabicPage = processPage(arabicObs, page, layout, true);
// Process English page
const englishPage = processPage(englishObs, page, layout, false);
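Pattern 3 leaves open how the `isRTLPage` flag is decided. One possible heuristic, sketched below, counts characters from right-to-left Unicode blocks against Latin letters in a page's observation texts. The `isPredominantlyRTL` helper is hypothetical and not something Kokokor provides; the block ranges are a rough approximation, and a dedicated language-detection step may be more appropriate for real documents.

```typescript
// True for UTF-16 code units in blocks used by right-to-left scripts.
const isRTLCodeUnit = (code: number): boolean =>
    (code >= 0x0590 && code <= 0x08ff) || // Hebrew through Arabic Extended-A
    (code >= 0xfb1d && code <= 0xfdff) || // Hebrew and Arabic presentation forms
    (code >= 0xfe70 && code <= 0xfefc);   // Arabic Presentation Forms-B

// True for basic Latin letters A-Z and a-z.
const isLatinLetter = (code: number): boolean =>
    (code >= 0x41 && code <= 0x5a) || (code >= 0x61 && code <= 0x7a);

// Hypothetical helper: a page counts as RTL when RTL characters
// outnumber Latin letters. Digits and punctuation are ignored.
const isPredominantlyRTL = (texts: string[]): boolean => {
    let rtl = 0;
    let ltr = 0;
    for (const text of texts) {
        for (let i = 0; i < text.length; i++) {
            const code = text.charCodeAt(i);
            if (isRTLCodeUnit(code)) rtl++;
            else if (isLatinLetter(code)) ltr++;
        }
    }
    return rtl > ltr;
};

console.log(isPredominantlyRTL(['السلام', 'عليكم', '2024'])); // true
console.log(isPredominantlyRTL(['Hello', 'world']));           // false
```

The result could then be passed as the `isRTLPage` argument of `processPage`, for example `processPage(obs, page, layout, isPredominantlyRTL(obs.map((o) => o.text)))`.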
Debugging RTL Processing
Enable Logging
const result = reconstructParagraphs(
    { observations, page, layout },
    {
        line: {
            isRTL: true,
            log: (message, ...args) => {
                console.log(`[RTL] ${message}:`, args);
            },
        },
    }
);
Output:
[RTL] mapOcrResultToRTLObservations: [Array of observations]
[RTL] normalizeObservationsX: [Array after normalization]
[RTL] indexObservationsAsLines: [Grouping info]
Visual Inspection
Compare before and after coordinates:
function inspectRTLTransform(observations, imageWidth) {
    console.log('Before RTL flip:');
    observations.forEach((o) => {
        console.log(`  x=${o.bbox.x}, text="${o.text}"`);
    });
    const transformed = mapOcrResultToRTLObservations(observations, imageWidth);
    console.log('\nAfter RTL flip:');
    transformed.forEach((o) => {
        console.log(`  x=${o.bbox.x}, text="${o.text}"`);
    });
}
In terms of performance, RTL coordinate transformation is O(n), where n is the number of observations, so the overhead is minimal.
Operations:
Coordinate flip: Simple arithmetic per observation (one pass)
Normalization: Single pass to find the minimum, single pass to adjust
Total: ~3n operations
Edge Cases
Empty Documents
if (observations.length === 0) {
    return []; // Early exit, no RTL processing needed
}
Reference: src/utils/paragraphs.ts:51
Single Observation
RTL flip still applies:
const single = [{ bbox: { x: 100, width: 50 }, text: "مرحبا" }];
// After flip (imageWidth=800): x = 650
Zero-Width Observations
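The noise filter checks text length rather than box width, so zero-width observations are removed only when their text is empty or a single character. In simplified form (the full filter shown earlier also strips punctuation first):
const filterNoisyObservations = (o) => o.text?.length > 1;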
Next Steps
Poetry Detection: How RTL affects poetry detection
Processing Pipeline: Where RTL transformation fits in the pipeline
Configuration: Complete RTL configuration options
Examples: Full Arabic OCR processing examples