reconstructParagraphs

Function Signature

export const reconstructParagraphs = (
  input: ReconstructInput,
  options: ReconstructOptions = {}
): ReconstructResult

Description

reconstructParagraphs is the recommended high-level API for OCR paragraph reconstruction. In a single call it converts raw OCR observations into structured text lines, groups those lines into coherent paragraphs, and formats the final output. Internally, it runs the full reconstruction pipeline in three steps:

Converts observations into lines using mapObservationsToTextLines
Groups lines into paragraphs using mapTextLinesToParagraphs
Formats text blocks using formatTextBlocks
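The three steps above can be pictured as a simple function composition. The sketch below is illustrative only: runPipeline, the Stages type, and the toy stage implementations are hypothetical stand-ins, not kokokor's internals, and the real stage functions may take different arguments.

```typescript
// Hypothetical shape of a three-stage pipeline; not the library's actual types.
type Stages<O, B> = {
  toLines: (observations: O[]) => B[];
  toParagraphs: (lines: B[]) => B[];
  format: (blocks: B[]) => string;
};

// Run the stages in order and keep every intermediate result,
// mirroring the { lines, paragraphs, text } shape described on this page.
const runPipeline = <O, B>(observations: O[], stages: Stages<O, B>) => {
  const lines = stages.toLines(observations);
  const paragraphs = stages.toParagraphs(lines);
  const text = stages.format(paragraphs);
  return { lines, paragraphs, text };
};

// Toy data and toy stages, for illustration only.
type Word = { text: string; y: number };
type Block = { text: string };

const demo = runPipeline<Word, Block>(
  [
    { text: 'Chapter', y: 50 },
    { text: 'One', y: 50 },
    { text: 'This', y: 100 },
    { text: 'is', y: 100 }
  ],
  {
    // Words that share the same y value become one line.
    toLines: (words) => {
      const byY = new Map<number, string[]>();
      for (const word of words) {
        byY.set(word.y, [...(byY.get(word.y) ?? []), word.text]);
      }
      return Array.from(byY.values()).map((parts) => ({ text: parts.join(' ') }));
    },
    // The toy version performs no merging.
    toParagraphs: (lines) => lines,
    format: (blocks) => blocks.map((block) => block.text).join('\n')
  }
);

console.log(demo.text);
```

Keeping the intermediate arrays alongside the final string is what lets a caller inspect line-level metadata without rerunning the earlier stages.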

Parameters

input

ReconstructInput

required

Input payload containing observations, page context, and optional layout elements.

observations

Observation[]

required

Array of OCR text observations to reconstruct. Each observation contains:

text: The recognized text string
bbox: Bounding box with position (x, y) and dimensions (width, height)

page

PageContext

required

Page geometry and DPI context:

width: Page width in pixels
height: Page height in pixels
dpiX: Horizontal DPI
dpiY: Vertical DPI

layout

LayoutElements

Optional structural layout hints:

horizontalLines: Array of horizontal line bounding boxes for footnote detection
rectangles: Array of rectangle bounding boxes for heading detection
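One plausible way such layout hints could be applied is a pair of geometric tests: a line fully enclosed by a rectangle is a heading candidate, and a line that starts below a horizontal rule is a footnote candidate. This is a sketch under those assumptions. The helpers contains, isHeadingCandidate, and isFootnoteCandidate are hypothetical names, and the library's actual heuristics may differ.

```typescript
type BBox = { x: number; y: number; width: number; height: number };

// True when `inner` lies entirely within `outer`.
const contains = (outer: BBox, inner: BBox): boolean =>
  inner.x >= outer.x &&
  inner.y >= outer.y &&
  inner.x + inner.width <= outer.x + outer.width &&
  inner.y + inner.height <= outer.y + outer.height;

// Assumption: a line enclosed by any rectangle is treated as a heading candidate.
const isHeadingCandidate = (line: BBox, rectangles: BBox[]): boolean =>
  rectangles.some((rect) => contains(rect, line));

// Assumption: a line that starts below any horizontal rule is a footnote candidate.
const isFootnoteCandidate = (line: BBox, horizontalLines: BBox[]): boolean =>
  horizontalLines.some((rule) => line.y >= rule.y + rule.height);

// Bounding boxes loosely based on the example on this page.
const titleLine: BBox = { x: 100, y: 50, width: 130, height: 20 };
const bodyLine: BBox = { x: 50, y: 100, width: 225, height: 15 };
const noteLine: BBox = { x: 50, y: 710, width: 300, height: 12 };

const rectangles: BBox[] = [{ x: 95, y: 45, width: 140, height: 30 }];
const horizontalLines: BBox[] = [{ x: 50, y: 700, width: 200, height: 1 }];

console.log(isHeadingCandidate(titleLine, rectangles));
console.log(isFootnoteCandidate(noteLine, horizontalLines));
```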

options

ReconstructOptions

default: {}

Configuration options for the reconstruction pipeline.

line

Partial<MapObservationsToTextLinesOptions>

Line-detection options:

pixelTolerance: Vertical tolerance for line grouping (default: 5 pixels at 72 DPI)
lineHeightFactor: Fixed line height factor (default: computed adaptively)
isRTL: Whether text is right-to-left (default: false)
poetryDetectionOptions: Fine-tune poetry detection heuristics
poetryPairDelimiter: Delimiter for merging poetry pairs (default: " ")
centerToleranceRatio: Tolerance for center alignment (default: 0.05)
minMarginRatio: Minimum margin ratio for centering (default: 0.1)
log: Optional logging function for debugging
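To make a few of these options concrete, the sketch below groups observations into lines with a vertical tolerance and tests whether a line is centered on the page. It rests on stated assumptions: that pixelTolerance scales linearly with dpiY from its 72 DPI baseline, and that centering is judged by the line midpoint and by both side margins. The functions scaledTolerance, groupIntoLines, and isCentered are hypothetical, not library exports.

```typescript
type BBox = { x: number; y: number; width: number; height: number };
type Observation = { text: string; bbox: BBox };
type Row = { anchorY: number; items: Observation[] };

// Assumption: the tolerance is stated at 72 DPI and scales linearly with dpiY.
const scaledTolerance = (pixelTolerance: number, dpiY: number): number =>
  pixelTolerance * (dpiY / 72);

const groupIntoLines = (
  observations: Observation[],
  dpiY: number,
  pixelTolerance = 5,
  isRTL = false
): string[] => {
  const tolerance = scaledTolerance(pixelTolerance, dpiY);
  const rows: Row[] = [];
  const topToBottom = [...observations].sort((a, b) => a.bbox.y - b.bbox.y);
  for (const obs of topToBottom) {
    const current = rows[rows.length - 1];
    // Join the current row when the top edge is within tolerance of the row's first item.
    if (current !== undefined && Math.abs(obs.bbox.y - current.anchorY) <= tolerance) {
      current.items.push(obs);
    } else {
      rows.push({ anchorY: obs.bbox.y, items: [obs] });
    }
  }
  // Order each row in reading direction, then join the words.
  return rows.map((row) =>
    row.items
      .sort((a, b) => (isRTL ? b.bbox.x - a.bbox.x : a.bbox.x - b.bbox.x))
      .map((obs) => obs.text)
      .join(' ')
  );
};

// Assumption: a line is centered when its midpoint is close to the page midpoint
// and both side margins are at least minMarginRatio of the page width.
const isCentered = (
  line: BBox,
  pageWidth: number,
  centerToleranceRatio = 0.05,
  minMarginRatio = 0.1
): boolean => {
  const leftMargin = line.x;
  const rightMargin = pageWidth - (line.x + line.width);
  const offset = Math.abs(line.x + line.width / 2 - pageWidth / 2);
  return (
    offset <= pageWidth * centerToleranceRatio &&
    Math.min(leftMargin, rightMargin) >= pageWidth * minMarginRatio
  );
};
```

Under the linear-scaling assumption, a 300 DPI scan would use a tolerance of roughly 20.8 pixels instead of 5, so small baseline jitter in high-resolution scans would not split one printed line into two.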

paragraph

ParagraphOptions

Paragraph-detection options:

verticalJumpFactor: Factor for detecting paragraph breaks (default: 2)
widthTolerance: Threshold for short line detection (default: 0.85)
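As a rough illustration of how these two thresholds could interact, the sketch below ends a paragraph when the vertical jump to the next line exceeds verticalJumpFactor times the current line height, or when the current line is narrower than widthTolerance times the widest line. Both rules are assumptions made for this example, and groupIntoParagraphs and the Line type are hypothetical. The library's actual logic may weigh these signals differently.

```typescript
// Simplified line record for illustration; real text blocks carry more metadata.
type Line = { text: string; y: number; width: number; height: number };

const groupIntoParagraphs = (
  lines: Line[],
  verticalJumpFactor = 2,
  widthTolerance = 0.85
): string[] => {
  if (lines.length === 0) return [];
  const maxWidth = Math.max(...lines.map((line) => line.width));
  const paragraphs: string[] = [];
  let current: string[] = [];

  lines.forEach((line, index) => {
    current.push(line.text);
    const next = lines[index + 1];
    // Assumed rule 1: a large vertical jump to the next line ends the paragraph.
    const bigJump = next !== undefined && next.y - line.y > line.height * verticalJumpFactor;
    // Assumed rule 2: a line much narrower than the widest line ends the paragraph.
    const shortLine = line.width < maxWidth * widthTolerance;
    if (next === undefined || bigJump || shortLine) {
      paragraphs.push(current.join(' '));
      current = [];
    }
  });

  return paragraphs;
};

const sampleLines: Line[] = [
  { text: 'First line of a paragraph', y: 100, width: 500, height: 15 },
  { text: 'that ends here.', y: 118, width: 200, height: 15 },
  { text: 'Second paragraph starts', y: 136, width: 500, height: 15 },
  { text: 'After a gap.', y: 200, width: 490, height: 15 }
];

console.log(groupIntoParagraphs(sampleLines));
```

In this sketch, raising verticalJumpFactor makes the grouping more tolerant of loose line spacing, while lowering widthTolerance requires a line to be shorter before it counts as a paragraph ending.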

format

object

Text formatting options:

footerSymbol: Optional symbol to insert before first footnote
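The effect of footerSymbol can be sketched as a formatter that emits the symbol once, on its own line, immediately before the first footnote block. The isFootnote flag, the Block type, and the formatBlocks function below are assumptions made for this example, since the exact metadata field names are not listed on this page.

```typescript
// Hypothetical block shape; the real text-block metadata fields may be named differently.
type Block = { text: string; isFootnote?: boolean };

const formatBlocks = (blocks: Block[], footerSymbol?: string): string => {
  const output: string[] = [];
  let symbolInserted = false;
  for (const block of blocks) {
    // Emit the symbol only once, right before the first footnote block.
    if (block.isFootnote === true && footerSymbol !== undefined && !symbolInserted) {
      output.push(footerSymbol);
      symbolInserted = true;
    }
    output.push(block.text);
  }
  return output.join('\n');
};

const formatted = formatBlocks(
  [
    { text: 'Body text.' },
    { text: '1. First note', isFootnote: true },
    { text: '2. Second note', isFootnote: true }
  ],
  '---'
);

console.log(formatted);
```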

Returns

lines

TextBlock[]

Intermediate line-level text blocks with metadata (centering, headings, footnotes, poetry).

paragraphs

TextBlock[]

Final paragraph-level text blocks. Prose lines are merged into paragraphs while poetry lines are preserved.

text

string

Formatted plain text output with proper line breaks and spacing.

Example

import { reconstructParagraphs } from 'kokokor';

const result = reconstructParagraphs(
  {
    observations: [
      { text: 'Chapter', bbox: { x: 100, y: 50, width: 80, height: 20 } },
      { text: 'One', bbox: { x: 190, y: 50, width: 40, height: 20 } },
      { text: 'This', bbox: { x: 50, y: 100, width: 40, height: 15 } },
      { text: 'is', bbox: { x: 95, y: 100, width: 20, height: 15 } },
      { text: 'the', bbox: { x: 120, y: 100, width: 30, height: 15 } },
      { text: 'first', bbox: { x: 155, y: 100, width: 35, height: 15 } },
      { text: 'paragraph.', bbox: { x: 195, y: 100, width: 80, height: 15 } }
    ],
    page: {
      width: 612,
      height: 792,
      dpiX: 72,
      dpiY: 72
    },
    layout: {
      rectangles: [
        { x: 95, y: 45, width: 140, height: 30 } // Box around "Chapter One"
      ]
    }
  },
  {
    paragraph: {
      verticalJumpFactor: 2.5,
      widthTolerance: 0.8
    },
    format: {
      footerSymbol: '---'
    }
  }
);

console.log(result.text);
// Output:
// Chapter One
//
// This is the first paragraph.

Notes

This is the recommended high-level API for most use cases
Handles the complete reconstruction pipeline in a single function call
Returns both intermediate results (lines, paragraphs) and final formatted text
Poetry detection preserves line breaks for poetic content
Headings and footnotes are automatically detected using layout hints
All options have sensible defaults and are optional
