
Akoma Ntoso

Corpus engineering for Zurich cantonal law

Jurisdiction

Switzerland (Canton of Zurich)

Primary Rules

All acts and ordinances

Target Audience

Judicial authorities, general public

Project Hash ID

#I723

Project Category Name

Akoma

The Canton of Zurich's legal portal, "ZH-Lex", publishes the complete systematic collection of cantonal legislation as PDF files organised into subject-area folders. A partial plain-XML version of the corpus is available from a separate endpoint, but it is incomplete, inconsistently structured and riddled with encoding errors, which makes it an unreliable foundation for automated processing. This project therefore works directly from the PDFs and treats them as the authoritative source.

The result is a corpus of Zurich cantonal law in Akoma Ntoso 3.0 XML, a form that supports semantic search, cross-reference analysis, machine translation and integration with legal intelligence toolchains, without any manual re-keying of legislative text.

Work Steps

Reverse-engineer the portal

Browser developer tools were used to intercept the network requests made by the JavaScript-rendered frontend of zhlex.zh.ch, revealing three usable API endpoints and establishing the folder and identifier structure of the corpus.

Design the discovery strategy

Because no single endpoint proved reliable for every act, discovery was designed as a three-strategy fallback, so that a later strategy can take over when an earlier one fails.
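The fallback idea can be illustrated with a minimal Python sketch. The function name discover_with_fallback and the shape of a strategy (a callable returning a list of act identifiers) are assumptions made for illustration; the project description does not say what the three strategies are or how they are ordered.

```python
from typing import Callable, Iterable, List

# Hypothetical type: a discovery strategy returns a list of act identifiers.
Strategy = Callable[[], List[str]]


def discover_with_fallback(strategies: Iterable[Strategy]) -> List[str]:
    """Try each strategy in order and return the first non-empty result."""
    errors = []
    for strategy in strategies:
        try:
            result = strategy()
        except Exception as exc:  # a real crawler would catch narrower errors
            errors.append((getattr(strategy, "__name__", "?"), repr(exc)))
            continue
        if result:
            return result
        errors.append((getattr(strategy, "__name__", "?"), "empty result"))
    raise RuntimeError(f"all discovery strategies failed: {errors}")
```

Treating an empty result as a failure is a design choice in this sketch: an endpoint that answers but lists nothing is handled the same way as one that raises an error.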

Write the downloader

A resumable downloader was implemented with rate-limiting, exponential backoff, and a metadata-enrichment step that extracts PDF URLs from detail pages or reconstructs them from the portal's filename convention.
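As a rough sketch of how resumability, rate limiting and exponential backoff can fit together, consider the following Python. All names and defaults here (fetch_with_backoff, download_all, the one-second base delay, the ".part" suffix) are assumptions for illustration and are not taken from the project. The fetch callable is injected so the sketch does not depend on any particular HTTP library, and the metadata-enrichment step is left out.

```python
import time
from pathlib import Path


def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), doubling the wait after each failed attempt."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except OSError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...


def download_all(urls, dest_dir, fetch, min_interval=1.0, sleep=time.sleep):
    """Download each URL once; files already on disk are skipped (resume)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for url in urls:
        target = dest / url.rsplit("/", 1)[-1]
        if target.exists():
            continue  # finished in an earlier run
        data = fetch_with_backoff(fetch, url, sleep=sleep)
        tmp = target.with_name(target.name + ".part")
        tmp.write_bytes(data)
        tmp.replace(target)  # rename last, so partial files never look complete
        written.append(target.name)
        sleep(min_interval)  # simple rate limit between requests
    return written
```

Writing to a temporary name and renaming afterwards means an interrupted run leaves no file that a later run would mistake for a complete download.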

Develop the parser

The PDF text extraction and the heuristic parser were developed iteratively and tested against a sample of acts from different folders to calibrate the parser's four regular expressions and its state-machine logic.
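To make the combination of regular expressions and a state machine concrete, here is a small Python sketch. The four patterns below (lettered section heading, "§" article marker, numbered paragraph, lettered list item) are guesses at what such a parser might need and are not the project's actual expressions; the function name parse_act and the output structure are likewise hypothetical. It operates on text lines that have already been extracted from the PDF.

```python
import re

# Four illustrative patterns (assumed, not the project's own).
HEADING_RE = re.compile(r"^([A-Z])\.\s+(\S.*)$")          # "A. Allgemeines"
ARTICLE_RE = re.compile(r"^§\s*(\d+[a-z]?)\.?\s*(.*)$")   # "§ 1. Zweck"
PARAGRAPH_RE = re.compile(r"^(\d+)\s+(\S.*)$")            # "1 Dieses Gesetz ..."
ITEM_RE = re.compile(r"^([a-z])\.\s+(\S.*)$")             # "a. Gemeinden"


def parse_act(lines):
    """Walk the lines once; the current section/article/paragraph is the state."""
    sections = []
    section = article = paragraph = None

    def open_paragraph(num, text):
        nonlocal paragraph
        paragraph = {"num": num, "text": text, "items": []}
        article["paragraphs"].append(paragraph)

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if m := ARTICLE_RE.match(line):
            if section is None:  # act without section headings
                section = {"num": None, "heading": None, "articles": []}
                sections.append(section)
            article = {"num": m.group(1), "heading": m.group(2), "paragraphs": []}
            section["articles"].append(article)
            paragraph = None
        elif m := HEADING_RE.match(line):
            section = {"num": m.group(1), "heading": m.group(2), "articles": []}
            sections.append(section)
            article = paragraph = None
        elif article is None:
            continue  # title or front matter before the first article
        elif m := PARAGRAPH_RE.match(line):
            open_paragraph(m.group(1), m.group(2))
        elif m := ITEM_RE.match(line):
            if paragraph is None:
                open_paragraph(None, "")
            paragraph["items"].append((m.group(1), m.group(2)))
        elif paragraph is None:
            open_paragraph(None, line)  # unnumbered single-paragraph article
        else:
            paragraph["text"] = f"{paragraph['text']} {line}".strip()  # wrapped line
    return sections
```

The order of the checks matters in a heuristic like this: structural markers are tested before the catch-all continuation branch, which is one reason such a parser needs calibrating against real acts.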

Implement the AKN serialiser

The Akoma Ntoso 3.0 output layer was built to conform to the OASIS schema.
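A serialiser of this kind might look roughly like the Python sketch below, which uses only the standard library. The input structure, the function name to_akn and the eId pattern are assumptions for illustration. The output is deliberately incomplete: it omits the meta block (identification, FRBR elements) that the schema requires, so it shows the body hierarchy only and would not validate as it stands.

```python
import xml.etree.ElementTree as ET

# Namespace of the OASIS Akoma Ntoso 3.0 vocabulary.
AKN_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"


def _el(parent, tag, text=None, **attrs):
    """Create a namespaced child element with optional text and attributes."""
    el = ET.SubElement(parent, f"{{{AKN_NS}}}{tag}", attrs)
    if text is not None:
        el.text = text
    return el


def to_akn(articles):
    """Serialise [{'num', 'heading', 'paragraphs': [str, ...]}, ...] to AKN XML.

    Sketch only: the mandatory <meta> block is not generated here.
    """
    ET.register_namespace("", AKN_NS)  # emit AKN as the default namespace
    root = ET.Element(f"{{{AKN_NS}}}akomaNtoso")
    body = _el(_el(root, "act"), "body")
    for art in articles:
        sec_id = f"sec_{art['num']}"
        sec = _el(body, "section", eId=sec_id)
        _el(sec, "num", f"§ {art['num']}.")
        if art.get("heading"):
            _el(sec, "heading", art["heading"])
        for i, text in enumerate(art["paragraphs"], start=1):
            para = _el(sec, "paragraph", eId=f"{sec_id}__para_{i}")
            _el(_el(para, "content"), "p", text)
    return ET.tostring(root, encoding="unicode")
```

Mapping each "§" provision to a section element is one possible choice in this sketch; the vocabulary also offers an article element, and which mapping the project uses is not stated.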