Documentation Index Fetch the complete documentation index at: https://mintlify.com/ragaeeb/kokokor/llms.txt
Use this file to discover all available pages before exploring further.
Function Signature
export const reconstructParagraphs = (
input : ReconstructInput ,
options : ReconstructOptions = {}
): ReconstructResult
Description
The recommended high-level API for OCR paragraph reconstruction. This function provides a complete one-shot solution that converts raw OCR observations into structured text lines, groups them into coherent paragraphs, and formats the final output.
Internally, it orchestrates the full reconstruction pipeline:
Converts observations into lines using mapObservationsToTextLines
Groups lines into paragraphs using mapTextLinesToParagraphs
Formats text blocks using formatTextBlocks
Parameters
Input payload containing observations, page context, and optional layout elements. Show ReconstructInput fields
Array of OCR text observations to reconstruct. Each observation contains:
text: The recognized text string
bbox: Bounding box with position (x, y) and dimensions (width, height)
Page geometry and DPI context:
width: Page width in pixels
height: Page height in pixels
dpiX: Horizontal DPI
dpiY: Vertical DPI
Optional structural layout hints:
horizontalLines: Array of horizontal line bounding boxes for footnote detection
rectangles: Array of rectangle bounding boxes for heading detection
options
ReconstructOptions
default: "{}"
Configuration options for the reconstruction pipeline. Show ReconstructOptions fields
line
Partial<MapObservationsToTextLinesOptions>
Line-detection options:
pixelTolerance: Vertical tolerance for line grouping (default: 5 pixels at 72 DPI)
lineHeightFactor: Fixed line height factor (default: computed adaptively)
isRTL: Whether text is right-to-left (default: false)
poetryDetectionOptions: Fine-tune poetry detection heuristics
poetryPairDelimiter: Delimiter for merging poetry pairs (default: ” ”)
centerToleranceRatio: Tolerance for center alignment (default: 0.05)
minMarginRatio: Minimum margin ratio for centering (default: 0.1)
log: Optional logging function for debugging
Paragraph-detection options:
verticalJumpFactor: Factor for detecting paragraph breaks (default: 2)
widthTolerance: Threshold for short line detection (default: 0.85)
Text formatting options:
footerSymbol: Optional symbol to insert before first footnote
Returns
Intermediate line-level text blocks with metadata (centering, headings, footnotes, poetry).
Final paragraph-level text blocks. Prose lines are merged into paragraphs while poetry lines are preserved.
Formatted plain text output with proper line breaks and spacing.
Example
import { reconstructParagraphs } from 'kokokor' ;
const result = reconstructParagraphs (
{
observations: [
{ text: 'Chapter' , bbox: { x: 100 , y: 50 , width: 80 , height: 20 } },
{ text: 'One' , bbox: { x: 190 , y: 50 , width: 40 , height: 20 } },
{ text: 'This' , bbox: { x: 50 , y: 100 , width: 40 , height: 15 } },
{ text: 'is' , bbox: { x: 95 , y: 100 , width: 20 , height: 15 } },
{ text: 'the' , bbox: { x: 120 , y: 100 , width: 30 , height: 15 } },
{ text: 'first' , bbox: { x: 155 , y: 100 , width: 35 , height: 15 } },
{ text: 'paragraph.' , bbox: { x: 195 , y: 100 , width: 80 , height: 15 } }
],
page: {
width: 612 ,
height: 792 ,
dpiX: 72 ,
dpiY: 72
},
layout: {
rectangles: [
{ x: 95 , y: 45 , width: 140 , height: 30 } // Box around "Chapter One"
]
}
},
{
paragraph: {
verticalJumpFactor: 2.5 ,
widthTolerance: 0.8
},
format: {
footerSymbol: '---'
}
}
);
console . log ( result . text );
// Output:
// Chapter One
//
// This is the first paragraph.
Notes
This is the recommended high-level API for most use cases
Handles the complete reconstruction pipeline in a single function call
Returns both intermediate results (lines, paragraphs) and final formatted text
Poetry detection preserves line breaks for poetic content
Headings and footnotes are automatically detected using layout hints
All options have sensible defaults and are optional