Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/ragaeeb/kokokor/llms.txt

Use this file to discover all available pages before exploring further.

Overview

PageContext provides essential information about the page geometry and resolution (DPI) needed for accurate text reconstruction. This metadata helps Kokokor normalize measurements, detect layout patterns, and make informed decisions about text structure.

Type Definition

type PageContext = {
  width: number;
  height: number;
  dpiX: number;
  dpiY: number;
};

Fields

width
number
required
Page width in pixels.This represents the total horizontal dimension of the document image. Used for:
  • Centering detection
  • Margin calculation
  • Layout analysis
height
number
required
Page height in pixels.This represents the total vertical dimension of the document image. Used for:
  • Footnote detection
  • Overall layout analysis
dpiX
number
required
Horizontal DPI (dots per inch).The horizontal resolution of the document image. Used to:
  • Convert pixel measurements to physical units
  • Scale tolerance values appropriately
  • Normalize measurements across different scan resolutions
Common values: 72, 150, 200, 300, 600
dpiY
number
required
Vertical DPI (dots per inch).The vertical resolution of the document image. Used to:
  • Convert pixel measurements to physical units
  • Scale vertical spacing calculations
  • Normalize measurements across different scan resolutions
Common values: 72, 150, 200, 300, 600

Usage Example

import { reconstructParagraphs } from 'kokokor';

// Standard 300 DPI scan
const result = await reconstructParagraphs({
  observations: [...],
  page: {
    width: 2550,   // 8.5 inches × 300 DPI
    height: 3300,  // 11 inches × 300 DPI
    dpiX: 300,
    dpiY: 300
  }
});

// Lower resolution scan
const lowResResult = await reconstructParagraphs({
  observations: [...],
  page: {
    width: 1275,   // 8.5 inches × 150 DPI
    height: 1650,  // 11 inches × 150 DPI
    dpiX: 150,
    dpiY: 150
  }
});

// High resolution scan
const highResResult = await reconstructParagraphs({
  observations: [...],
  page: {
    width: 5100,   // 8.5 inches × 600 DPI
    height: 6600,  // 11 inches × 600 DPI
    dpiX: 600,
    dpiY: 600
  }
});

DPI Importance

The DPI values are critical for accurate text reconstruction:
// The pixelTolerance option is automatically scaled by DPI
const result = await reconstructParagraphs(
  {
    observations: [...],
    page: { width: 2550, height: 3300, dpiX: 300, dpiY: 300 }
  },
  {
    line: {
      pixelTolerance: 5  // 5 pixels at 72 DPI baseline
                         // Automatically scaled to ~21 pixels at 300 DPI
    }
  }
);

Getting Page Context

Extract page context from various sources:
// From image metadata
import sharp from 'sharp';

const metadata = await sharp('document.png').metadata();
const page: PageContext = {
  width: metadata.width!,
  height: metadata.height!,
  dpiX: metadata.density || 72,
  dpiY: metadata.density || 72
};

// From PDF
import { getDocument } from 'pdfjs-dist';

const pdf = await getDocument(pdfPath).promise;
const firstPage = await pdf.getPage(1);
const viewport = firstPage.getViewport({ scale: 1.0 });

const page: PageContext = {
  width: viewport.width,
  height: viewport.height,
  dpiX: 72,  // PDF uses 72 DPI by default
  dpiY: 72
};

See Also