Database Schema

DocShark stores its index in SQLite and searches it with the FTS5 full-text extension. There are no external dependencies: it uses bun:sqlite, which bundles SQLite natively.

Data layer

Schema strategy

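The schema prioritizes deterministic crawl updates and fast retrieval over relational complexity. Most joins stay shallow and are keyed on library and page identifiers; chunks, for example, carries its own library_id, so chunks can be filtered by library without joining through pages.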

Tables Overview

libraries ──< pages ──< chunks ──< chunks_fts (FTS5)
                                    crawl_jobs

libraries

Stores metadata about each indexed documentation source.

Column        Type     Description
id            TEXT     Primary key (ULID)
name          TEXT     Unique slug identifier
display_name  TEXT     Human-readable name
url           TEXT     Base documentation URL
version       TEXT     Optional version string
page_count    INTEGER  Number of indexed pages
chunk_count   INTEGER  Number of search chunks
status        TEXT     indexed, crawling, error
config        TEXT     JSON config blob
created_at    TEXT     ISO timestamp
updated_at    TEXT     ISO timestamp
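
The column list above can be read as table DDL. The following is a minimal sketch using Python's standard sqlite3 module rather than bun:sqlite, so that it runs anywhere. The NOT NULL constraints, the defaults, and the CHECK on status are assumptions inferred from the descriptions, not DocShark's actual schema, and the inserted row is placeholder data.

```python
import sqlite3

# Hypothetical DDL inferred from the column table; constraints and defaults
# are assumptions, not DocShark's actual schema.
LIBRARIES_DDL = """
CREATE TABLE IF NOT EXISTS libraries (
  id           TEXT PRIMARY KEY,
  name         TEXT NOT NULL UNIQUE,
  display_name TEXT NOT NULL,
  url          TEXT NOT NULL,
  version      TEXT,
  page_count   INTEGER NOT NULL DEFAULT 0,
  chunk_count  INTEGER NOT NULL DEFAULT 0,
  status       TEXT NOT NULL DEFAULT 'crawling'
               CHECK (status IN ('indexed', 'crawling', 'error')),
  config       TEXT NOT NULL DEFAULT '{}',
  created_at   TEXT NOT NULL,
  updated_at   TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(LIBRARIES_DDL)

# Placeholder values; a real id would be a ULID.
now = "2026-01-01T00:00:00Z"
conn.execute(
    "INSERT INTO libraries (id, name, display_name, url, created_at, updated_at) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("placeholder-id", "example-docs", "Example Docs",
     "https://example.com/docs", now, now),
)

# Columns that were not supplied fall back to the assumed defaults.
row = conn.execute(
    "SELECT name, status, page_count, config FROM libraries"
).fetchone()
```

With the CHECK constraint in place, writing a status outside the three documented values fails with an integrity error instead of silently storing it.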

pages

Individual documentation pages belonging to a library.

Column              Type  Description
id                  TEXT  Primary key (ULID)
library_id          TEXT  Foreign key → libraries
url                 TEXT  Full page URL
path                TEXT  Relative path within library
title               TEXT  Page title
content_markdown    TEXT  Full Markdown content
content_hash        TEXT  For incremental updates
http_etag           TEXT  HTTP ETag header
http_last_modified  TEXT  HTTP Last-Modified header
created_at          TEXT  ISO timestamp
updated_at          TEXT  ISO timestamp
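
One way the content_hash column could drive incremental updates is sketched below: a page is re-indexed only when it is new or its hash differs from the stored one. The hash function (SHA-256), the UNIQUE constraint on url, and the helper names needs_reindex and store_page are all assumptions for illustration; the source does not say how DocShark computes or compares hashes. The http_etag and http_last_modified columns would typically feed HTTP conditional requests (If-None-Match, If-Modified-Since), which this sketch stores but does not exercise.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical pages table: only the columns this sketch needs.
conn.execute("""
CREATE TABLE pages (
  id               TEXT PRIMARY KEY,
  library_id       TEXT NOT NULL,
  url              TEXT NOT NULL UNIQUE,
  content_markdown TEXT,
  content_hash     TEXT,
  http_etag        TEXT,
  updated_at       TEXT NOT NULL
)""")


def content_hash(markdown: str) -> str:
    # SHA-256 is an assumption; the source does not name the hash function.
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def needs_reindex(conn, url, markdown):
    """Return True when the page is new or its content hash has changed."""
    row = conn.execute(
        "SELECT content_hash FROM pages WHERE url = ?", (url,)
    ).fetchone()
    return row is None or row[0] != content_hash(markdown)


def store_page(conn, page_id, library_id, url, markdown, etag, now):
    """Insert the page, or refresh its content and hash if the URL exists."""
    conn.execute(
        "INSERT INTO pages (id, library_id, url, content_markdown, "
        "content_hash, http_etag, updated_at) VALUES (?, ?, ?, ?, ?, ?, ?) "
        "ON CONFLICT(url) DO UPDATE SET "
        "content_markdown = excluded.content_markdown, "
        "content_hash = excluded.content_hash, "
        "http_etag = excluded.http_etag, "
        "updated_at = excluded.updated_at",
        (page_id, library_id, url, markdown, content_hash(markdown), etag, now),
    )


url = "https://example.com/docs/intro"
first_seen = needs_reindex(conn, url, "# Intro")
store_page(conn, "page-1", "lib-1", url, "# Intro", '"abc123"',
           "2026-01-01T00:00:00Z")
unchanged = needs_reindex(conn, url, "# Intro")
changed = needs_reindex(conn, url, "# Intro\n\nNew paragraph.")
```

Skipping unchanged pages avoids re-chunking and re-indexing content that is already in the database.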

chunks

Semantic chunks of documentation content, linked to their source page.

Column             Type     Description
id                 TEXT     Primary key (ULID)
page_id            TEXT     Foreign key → pages
library_id         TEXT     Foreign key → libraries
heading_hierarchy  TEXT     Breadcrumb context
content            TEXT     Chunk text content
token_count        INTEGER  Approximate token count
chunk_index        INTEGER  Position within page
created_at         TEXT     ISO timestamp
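
The sketch below shows how these columns might be used together: chunk_index restores reading order within a page, and the denormalized library_id allows per-library aggregates without a join. The index names, the "A > B" breadcrumb format for heading_hierarchy, and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical DDL and indexes; foreign key clauses are omitted for brevity.
conn.executescript("""
CREATE TABLE chunks (
  id                TEXT PRIMARY KEY,
  page_id           TEXT NOT NULL,
  library_id        TEXT NOT NULL,
  heading_hierarchy TEXT,
  content           TEXT NOT NULL,
  token_count       INTEGER,
  chunk_index       INTEGER NOT NULL,
  created_at        TEXT NOT NULL
);
CREATE INDEX idx_chunks_page ON chunks (page_id, chunk_index);
CREATE INDEX idx_chunks_library ON chunks (library_id);
""")

now = "2026-01-01T00:00:00Z"
# Placeholder rows, deliberately inserted out of order.
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("c2", "p1", "lib1", "Guide > Install", "Run the installer.", 4, 1, now),
        ("c1", "p1", "lib1", "Guide", "Welcome.", 2, 0, now),
        ("c3", "p2", "lib1", "API", "Reference text.", 3, 0, now),
    ],
)

# Reassemble one page in reading order using chunk_index.
page_text = "\n\n".join(
    content
    for (content,) in conn.execute(
        "SELECT content FROM chunks WHERE page_id = ? ORDER BY chunk_index",
        ("p1",),
    )
)

# Aggregate per library directly on chunks, with no join through pages.
library_tokens = conn.execute(
    "SELECT SUM(token_count) FROM chunks WHERE library_id = ?", ("lib1",)
).fetchone()[0]
```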

chunks_fts (FTS5)

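Virtual FTS5 table for full-text search across all chunks. It is declared as an external-content table (content='chunks'), so the index does not keep its own copy of the text and reads it from chunks by rowid instead.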

CREATE VIRTUAL TABLE chunks_fts USING fts5(
  content,
  heading_hierarchy,
  content='chunks',
  content_rowid='rowid',
  tokenize='porter unicode61'
);

The FTS5 index is automatically kept in sync via database triggers on the chunks table (INSERT, DELETE, UPDATE).
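
The trigger-based sync described above can be sketched with the standard pattern for FTS5 external-content tables. This is an illustration under assumptions, not DocShark's actual trigger code: the trigger names, the reduced chunks table, and the search helper are hypothetical, and it runs on Python's sqlite3 module (which needs an SQLite build with FTS5 enabled) rather than bun:sqlite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced chunks table: only the indexed columns plus the id.
CREATE TABLE chunks (
  id                TEXT PRIMARY KEY,
  heading_hierarchy TEXT,
  content           TEXT NOT NULL
);

CREATE VIRTUAL TABLE chunks_fts USING fts5(
  content,
  heading_hierarchy,
  content='chunks',
  content_rowid='rowid',
  tokenize='porter unicode61'
);

-- Trigger names are hypothetical.
CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
  INSERT INTO chunks_fts(rowid, content, heading_hierarchy)
  VALUES (new.rowid, new.content, new.heading_hierarchy);
END;

-- External-content tables remove entries via the special 'delete' command,
-- which must be given the old column values.
CREATE TRIGGER chunks_ad AFTER DELETE ON chunks BEGIN
  INSERT INTO chunks_fts(chunks_fts, rowid, content, heading_hierarchy)
  VALUES ('delete', old.rowid, old.content, old.heading_hierarchy);
END;

-- An update is a delete of the old entry followed by an insert of the new.
CREATE TRIGGER chunks_au AFTER UPDATE ON chunks BEGIN
  INSERT INTO chunks_fts(chunks_fts, rowid, content, heading_hierarchy)
  VALUES ('delete', old.rowid, old.content, old.heading_hierarchy);
  INSERT INTO chunks_fts(rowid, content, heading_hierarchy)
  VALUES (new.rowid, new.content, new.heading_hierarchy);
END;
""")


def search(conn, query):
    """Return ids of chunks matching an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT c.id FROM chunks_fts "
        "JOIN chunks AS c ON c.rowid = chunks_fts.rowid "
        "WHERE chunks_fts MATCH ? ORDER BY chunks_fts.rank",
        (query,),
    ).fetchall()
    return [r[0] for r in rows]


conn.execute(
    "INSERT INTO chunks (id, heading_hierarchy, content) VALUES (?, ?, ?)",
    ("chunk-1", "Guide > Setup", "Installing the CLI"),
)
# Porter stemming lets "installed" match "Installing".
after_insert = search(conn, "installed")

conn.execute(
    "UPDATE chunks SET content = ? WHERE id = ?",
    ("Configuring the crawler", "chunk-1"),
)
after_update_old = search(conn, "installed")
after_update_new = search(conn, "configure")

conn.execute("DELETE FROM chunks WHERE id = ?", ("chunk-1",))
after_delete = search(conn, "configure")
```

Because the application only ever writes to chunks, the index cannot drift from the table as long as the triggers exist.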

Design Decisions

  • WAL mode: write-ahead logging, so readers are not blocked while a crawl is writing
  • ON CONFLICT: upsert semantics, so crawls running in separate processes can write the same row without failing
  • CASCADE deletes: removing a library cleans up all of its pages, chunks, and FTS entries
  • Porter stemming: tokenize='porter unicode61' for English-language documentation search
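
The first three decisions can be seen together in a small sketch. It uses Python's sqlite3 module and a reduced, hypothetical schema; the upsert_library helper and the choice of name as the conflict target are assumptions. Two SQLite behaviors are worth noting: WAL mode only applies to file-backed databases, and ON DELETE CASCADE only takes effect when foreign key enforcement is switched on for the connection with PRAGMA foreign_keys = ON.

```python
import os
import sqlite3
import tempfile

# WAL needs a real file; an in-memory database cannot use it.
db_path = os.path.join(tempfile.mkdtemp(), "docshark-sketch.db")
conn = sqlite3.connect(db_path)
journal_mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
# Foreign keys are off by default in SQLite and must be enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

# Reduced, hypothetical schema: just enough columns to show the behavior.
conn.executescript("""
CREATE TABLE libraries (
  id   TEXT PRIMARY KEY,
  name TEXT NOT NULL UNIQUE,
  url  TEXT NOT NULL
);
CREATE TABLE pages (
  id         TEXT PRIMARY KEY,
  library_id TEXT NOT NULL REFERENCES libraries(id) ON DELETE CASCADE,
  url        TEXT NOT NULL UNIQUE
);
CREATE TABLE chunks (
  id      TEXT PRIMARY KEY,
  page_id TEXT NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  content TEXT NOT NULL
);
""")


def upsert_library(conn, lib_id, name, url):
    """Insert a library, or update its URL if the slug already exists."""
    conn.execute(
        "INSERT INTO libraries (id, name, url) VALUES (?, ?, ?) "
        "ON CONFLICT(name) DO UPDATE SET url = excluded.url",
        (lib_id, name, url),
    )


# Two writers racing on the same slug end up with one row, not an error.
upsert_library(conn, "lib-a", "example-docs", "https://example.com/docs")
upsert_library(conn, "lib-b", "example-docs", "https://example.com/docs/v2")
library_rows = conn.execute("SELECT id, url FROM libraries").fetchall()

conn.execute("INSERT INTO pages VALUES (?, ?, ?)",
             ("page-1", "lib-a", "https://example.com/docs/v2/intro"))
conn.execute("INSERT INTO chunks VALUES (?, ?, ?)",
             ("chunk-1", "page-1", "Welcome."))

# Deleting the library cascades to its pages, then to their chunks.
conn.execute("DELETE FROM libraries WHERE id = ?", ("lib-a",))
remaining = (
    conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0],
    conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0],
)
conn.commit()
```

The sketch leaves out the FTS table; in the full schema, the FTS entries would be removed by the delete trigger on chunks as the cascade reaches each chunk row.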

© 2026 DocShark