Overview

Observation is the fundamental unit of OCR output, containing both the recognized text content and its precise location within the document. Observations typically represent individual words or short phrases as identified by the OCR engine.

Type Definition

type Observation = {
  bbox: BoundingBox;
  text: string;
};

Fields

bbox
BoundingBox
required
The bounding box defining the exact position and dimensions of the text within the document coordinate system. This includes:
  • Position (x, y) of the top-left corner
  • Dimensions (width, height) of the text area
See BoundingBox for details on the coordinate system.
text
string
required
The recognized text content of the observation. This is the text string extracted by the OCR engine, which may include punctuation, numbers, or special characters.
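
OCR output often arrives as untyped JSON, so a runtime check before further processing can be useful. The following is a minimal sketch rather than part of kokokor: the local `BoundingBox` shape (`x`, `y`, `width`, `height`) is assumed from the field descriptions above, and `isObservation` is a hypothetical helper name.

```typescript
// Local copies of the types so the sketch is self-contained.
// The BoundingBox shape is an assumption based on the field descriptions above.
type BoundingBox = { x: number; y: number; width: number; height: number };
type Observation = { bbox: BoundingBox; text: string };

// Hypothetical helper (not a kokokor export): narrows unknown input to Observation.
function isObservation(value: unknown): value is Observation {
  if (typeof value !== 'object' || value === null) return false;
  const { bbox, text } = value as { bbox?: unknown; text?: unknown };
  if (typeof text !== 'string') return false;
  if (typeof bbox !== 'object' || bbox === null) return false;
  const box = bbox as Record<string, unknown>;
  // Number.isFinite rejects non-numbers as well as NaN and Infinity.
  return ['x', 'y', 'width', 'height'].every(key => Number.isFinite(box[key]));
}

const raw: unknown = JSON.parse(
  '{"bbox":{"x":100,"y":200,"width":150,"height":30},"text":"Hello"}'
);
console.log(isObservation(raw)); // true
```

A guard like this only checks the shape of the data. Whether the coordinates are meaningful for a given page is a separate question.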

Usage Example

import { reconstructParagraphs, type Observation } from 'kokokor';

// Observations from your OCR engine
const observations: Observation[] = [
  {
    bbox: { x: 100, y: 200, width: 150, height: 30 },
    text: 'Hello'
  },
  {
    bbox: { x: 260, y: 200, width: 120, height: 30 },
    text: 'world'
  }
];

const result = await reconstructParagraphs({
  observations,
  page: {
    width: 2550,
    height: 3300,
    dpiX: 300,
    dpiY: 300
  }
});

console.log(result.text);
// Output: "Hello world"
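
To make the coordinate fields concrete, the sketch below computes the single box that encloses both observations from the example. It assumes the `x`, `y`, `width`, `height` shape used in the example, with `x` and `y` at the top-left corner. `unionBBox` is a hypothetical helper, not part of kokokor.

```typescript
type BoundingBox = { x: number; y: number; width: number; height: number };

// Hypothetical helper: smallest axis-aligned box that contains every input box.
function unionBBox(boxes: BoundingBox[]): BoundingBox {
  if (boxes.length === 0) throw new Error('unionBBox requires at least one box');
  const left = Math.min(...boxes.map(b => b.x));
  const top = Math.min(...boxes.map(b => b.y));
  const right = Math.max(...boxes.map(b => b.x + b.width));
  const bottom = Math.max(...boxes.map(b => b.y + b.height));
  return { x: left, y: top, width: right - left, height: bottom - top };
}

const lineBox = unionBBox([
  { x: 100, y: 200, width: 150, height: 30 }, // 'Hello'
  { x: 260, y: 200, width: 120, height: 30 }  // 'world'
]);
console.log(lineBox); // { x: 100, y: 200, width: 280, height: 30 }
```

The right edge of 'world' sits at 260 + 120 = 380, so the enclosing width is 380 - 100 = 280.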

Integration with OCR Engines

You can adapt output from various OCR engines to match the Observation type:
// Example: Tesseract.js
const tesseractResult = await Tesseract.recognize(image);
const observations: Observation[] = tesseractResult.data.words.map(word => ({
  bbox: {
    x: word.bbox.x0,
    y: word.bbox.y0,
    width: word.bbox.x1 - word.bbox.x0,
    height: word.bbox.y1 - word.bbox.y0
  },
  text: word.text
}));

// Example: Google Cloud Vision
const [visionResult] = await client.textDetection(image);
const visionObservations: Observation[] = (visionResult.textAnnotations ?? [])
  .slice(1) // Skip the first annotation, which contains the full text
  .map(annotation => {
    // For upright text, vertices are ordered top-left, top-right, bottom-right, bottom-left.
    // A coordinate may be absent, so default missing values to 0.
    const vertices = annotation.boundingPoly?.vertices ?? [];
    const x = vertices[0]?.x ?? 0;
    const y = vertices[0]?.y ?? 0;
    return {
      bbox: {
        x,
        y,
        width: (vertices[1]?.x ?? 0) - x,
        height: (vertices[2]?.y ?? 0) - y
      },
      text: annotation.description ?? ''
    };
  });
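
Both adapters reduce to the same step: turning corner points into a position and a size. A minimal sketch of that step follows, assuming the `x`, `y`, `width`, `height` shape shown earlier. `bboxFromPoints` and the `Point` type are hypothetical, not part of kokokor. Taking the minimum and maximum over all points also gives an enclosing axis-aligned box when an engine returns a slightly rotated polygon.

```typescript
type BoundingBox = { x: number; y: number; width: number; height: number };
type Point = { x?: number | null; y?: number | null };

// Hypothetical helper: axis-aligned box enclosing a set of corner points.
// Missing coordinates are treated as 0, which is an assumption about the engine's output.
function bboxFromPoints(points: Point[]): BoundingBox {
  if (points.length === 0) throw new Error('bboxFromPoints requires at least one point');
  const xs = points.map(p => p.x ?? 0);
  const ys = points.map(p => p.y ?? 0);
  const x = Math.min(...xs);
  const y = Math.min(...ys);
  return { x, y, width: Math.max(...xs) - x, height: Math.max(...ys) - y };
}

// Two opposite corners, as in the Tesseract.js example (x0, y0, x1, y1)
const fromCorners = bboxFromPoints([{ x: 10, y: 20 }, { x: 110, y: 50 }]);
console.log(fromCorners); // { x: 10, y: 20, width: 100, height: 30 }

// Four polygon vertices, as in the Google Cloud Vision example
const fromPolygon = bboxFromPoints([
  { x: 10, y: 20 },
  { x: 110, y: 22 },
  { x: 110, y: 52 },
  { x: 10, y: 50 }
]);
console.log(fromPolygon); // { x: 10, y: 20, width: 100, height: 32 }
```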

See Also