Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/ragaeeb/kokokor/llms.txt

Use this file to discover all available pages before exploring further.

Function Signature

export const reconstructParagraphs = (
  input: ReconstructInput,
  options: ReconstructOptions = {}
): ReconstructResult

Description

The recommended high-level API for OCR paragraph reconstruction. This function provides a complete one-shot solution that converts raw OCR observations into structured text lines, groups them into coherent paragraphs, and formats the final output. Internally, it orchestrates the full reconstruction pipeline:
  1. Converts observations into lines using mapObservationsToTextLines
  2. Groups lines into paragraphs using mapTextLinesToParagraphs
  3. Formats text blocks using formatTextBlocks

Parameters

input
ReconstructInput
required
Input payload containing observations, page context, and optional layout elements.
options
ReconstructOptions
default:"{}"
Configuration options for the reconstruction pipeline.

Returns

lines
TextBlock[]
Intermediate line-level text blocks with metadata (centering, headings, footnotes, poetry).
paragraphs
TextBlock[]
Final paragraph-level text blocks. Prose lines are merged into paragraphs while poetry lines are preserved.
text
string
Formatted plain text output with proper line breaks and spacing.

Example

import { reconstructParagraphs } from 'kokokor';

const result = reconstructParagraphs(
  {
    observations: [
      { text: 'Chapter', bbox: { x: 100, y: 50, width: 80, height: 20 } },
      { text: 'One', bbox: { x: 190, y: 50, width: 40, height: 20 } },
      { text: 'This', bbox: { x: 50, y: 100, width: 40, height: 15 } },
      { text: 'is', bbox: { x: 95, y: 100, width: 20, height: 15 } },
      { text: 'the', bbox: { x: 120, y: 100, width: 30, height: 15 } },
      { text: 'first', bbox: { x: 155, y: 100, width: 35, height: 15 } },
      { text: 'paragraph.', bbox: { x: 195, y: 100, width: 80, height: 15 } }
    ],
    page: {
      width: 612,
      height: 792,
      dpiX: 72,
      dpiY: 72
    },
    layout: {
      rectangles: [
        { x: 95, y: 45, width: 140, height: 30 } // Box around "Chapter One"
      ]
    }
  },
  {
    paragraph: {
      verticalJumpFactor: 2.5,
      widthTolerance: 0.8
    },
    format: {
      footerSymbol: '---'
    }
  }
);

console.log(result.text);
// Output:
// Chapter One
//
// This is the first paragraph.

Notes

  • This is the recommended high-level API for most use cases
  • Handles the complete reconstruction pipeline in a single function call
  • Returns both intermediate results (lines, paragraphs) and final formatted text
  • Poetry detection preserves line breaks for poetic content
  • Headings and footnotes are automatically detected using layout hints
  • All options have sensible defaults and are optional