IA Radar

Insurance Intermediaries Register

How this register is built & protected

Data sources and protection mechanisms

A plain-language reference for the scraping, storage, and backup mechanisms behind IA Radar.

Why this exists

So consumers can check who they're dealing with. Before you buy insurance from an agent, broker or agency in Hong Kong, IA Radar lets you confirm — free, in seconds — that their Insurance Authority licence is valid, see who they represent and their full past record, and surface any licence conditions or public disciplinary actions. It puts the official register's consumer-protection information one search away.

Technically, it is an independent, read-only mirror of the Hong Kong Insurance Authority's public Register of Licensed Insurance Intermediaries (iir.ia.org.hk), covering both firms (agencies FA/GA, broker companies FB/GB) and individuals (technical representatives, IA–IZ / JA–JZ), with their status, licence type, appointments, conditions and any disciplinary actions.

Scraping mechanism

The official site is an AngularJS single-page app backed by a REST API (/IISPublicRegisterRestfulAPI) sitting behind F5 BIG-IP bot protection. Every call needs a token that is only issued after solving an image captcha.

Work is driven every minute by a single orchestrated tick, run under a single-flight lock so overlapping cron runs can't collide:

  1. Retry queue — re-probe numbers that previously hit a soft block/401, with exponential backoff. This is what turns a transient block into a short delay instead of a permanent gap.
  2. Detail refresh — re-fetch the oldest-updated stored records (older than DETAIL_MAX_AGE_DAYS, default 7) to catch status flips, renewals, new/ended appointments and disciplinary actions. Existence is never re-checked — a licence number, once issued, is permanent.
  3. Discovery — walk each prefix's number range to find newly-issued licences, skipping numbers already stored (cheap) and enqueueing any soft-blocked ones to the retry queue. The cursor advances only after a slice is fully processed, so a run killed mid-slice is simply re-done rather than leaving a hole.
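
The retry backoff in step 1 and the cursor rule in step 3 can be sketched roughly as follows. This is an illustration only, not IA Radar's actual code: the base delay, the cap, the `probe` callback and every name here are assumptions.

```typescript
// Hypothetical sketch: names, base delay and cap are assumptions,
// not values taken from IA Radar.
type ProbeResult = "found" | "empty" | "soft-blocked";

// Exponential backoff: the delay doubles with each failed attempt, up to a cap.
function retryDelayMs(attempts: number, baseMs = 60_000, capMs = 6 * 60 * 60_000): number {
  return Math.min(capMs, baseMs * 2 ** Math.max(0, attempts));
}

interface SliceOutcome {
  cursor: number;   // where the next run should start
  found: number[];  // numbers that returned a record
  retry: number[];  // soft-blocked numbers owed another probe
}

// Process one slice [cursor, cursor + size). The advanced cursor is returned
// only after every number in the slice has been handled, so a run that dies
// mid-slice never persists a new cursor and the slice is simply redone.
async function discoverSlice(
  cursor: number,
  size: number,
  known: Set<number>,
  probe: (n: number) => Promise<ProbeResult>,
): Promise<SliceOutcome> {
  const found: number[] = [];
  const retry: number[] = [];
  for (let n = cursor; n < cursor + size; n++) {
    if (known.has(n)) continue; // already stored: skip cheaply
    const result = await probe(n);
    if (result === "found") found.push(n);
    else if (result === "soft-blocked") retry.push(n);
  }
  return { cursor: cursor + size, found, retry };
}
```

In this sketch the caller would write `cursor`, `found` and `retry` to storage together once `discoverSlice` resolves; that is what makes a killed run harmless.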

A weekly cron rewinds any prefix whose last discovery sweep is older than the discovery cadence (DISCOVERY_INTERVAL_DAYS, default 30), so that its range is searched for new licences again. Because numbers are not issued chronologically and appear scattered across the range, the whole empty space is re-checked, but only the empty slots, never the ~115k known records.
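
A minimal sketch of that rewind, assuming a per-prefix row shaped loosely like the ranges table. The field names, the `done` flag semantics and both function names are hypothetical, chosen only to illustrate the idea.

```typescript
// Hypothetical row shape; field names are assumptions, not the real schema.
interface PrefixRange {
  prefix: string;
  start: number;
  end: number;           // inclusive
  cursor: number;
  done: boolean;
  lastDiscovery: number; // epoch ms of the last completed sweep
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Rewind every finished prefix whose last sweep is older than the cadence.
function rewindDuePrefixes(ranges: PrefixRange[], now: number, intervalDays = 30): PrefixRange[] {
  return ranges.map((r) =>
    r.done && now - r.lastDiscovery > intervalDays * DAY_MS
      ? { ...r, cursor: r.start, done: false }
      : r,
  );
}

// Only numbers with no stored record need a probe on the next sweep.
function emptySlots(range: PrefixRange, known: Set<number>): number[] {
  const slots: number[] = [];
  for (let n = range.start; n <= range.end; n++) {
    if (!known.has(n)) slots.push(n);
  }
  return slots;
}
```

Keeping the known-record check as a set lookup is what makes the re-sweep cheap: the cost scales with the number of empty slots, not with the size of the register.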

Completeness. Gaps from past block waves show up as long contiguous runs of missing numbers aligned to the old batch size — a statistical signature of a skipped batch, detectable without any external list. A heal pass enqueues exactly those runs for re-probing; firms are additionally reconciled against the IA's official published lists.
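
One way to picture the gap signature is the sketch below. It is a simplified assumption of how such a check could work, not the actual heal pass: it treats a run of missing numbers as suspicious when it covers at least one whole batch window counted from the range start. The names and the alignment rule are illustrative.

```typescript
interface Run { from: number; to: number } // inclusive bounds

// Collect maximal runs of consecutive numbers in [start, end] with no stored record.
function missingRuns(start: number, end: number, known: Set<number>): Run[] {
  const runs: Run[] = [];
  let from: number | null = null;
  for (let n = start; n <= end + 1; n++) {
    const missing = n <= end && !known.has(n);
    if (missing && from === null) from = n;
    if (!missing && from !== null) {
      runs.push({ from, to: n - 1 });
      from = null;
    }
  }
  return runs;
}

// Assumed rule: a run is suspect when it fully covers at least one batch
// window [start + k*batch, start + (k+1)*batch - 1].
function suspectRuns(runs: Run[], start: number, batch: number): Run[] {
  return runs.filter(({ from, to }) => {
    const firstBoundary = start + Math.ceil((from - start) / batch) * batch;
    return firstBoundary + batch - 1 <= to;
  });
}
```

Because licence numbers are sparse, a long empty run is only a hint and not proof of a skipped batch, which is why such runs are re-probed rather than assumed to hold records.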

Data model (Cloudflare D1)

Table            Holds
records          One row per licence (firm or individual): status, names, LOB, contacts, address, dates, plus officers/appointments/conditions/actions and the raw detail JSON. The source of truth.
appointments     Flattened "who is appointed by whom", current + history.
ranges           Sweep plan + resumable cursor per prefix (checked / found / errors / done / last discovery).
retry_queue      Numbers still owed a probe (soft-blocked or heal targets), with attempts + next-due time.
official_firms   Authoritative IA firm lists for cross-checking firm completeness.
meta             Misc state (tick lock, last refresh/discovery, dropped count).
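
The tick lock kept in meta can be thought of as a lease with an expiry. The sketch below models that idea in memory; the `LockState` shape, the function name and the lease length are assumptions, and a real D1 version would need to perform the check and the write as a single conditional statement so that two ticks cannot both win.

```typescript
// Hypothetical in-memory model of a single-flight lease; not the real schema.
interface LockState {
  holder: string | null;
  expiresAt: number; // epoch ms after which the lease may be taken over
}

// Returns the new lock state if acquired, or null if another tick still holds it.
// An expired lease is treated as free, so a crashed tick cannot block forever.
function tryAcquire(state: LockState, holder: string, now: number, leaseMs: number): LockState | null {
  if (state.holder !== null && state.expiresAt > now) return null;
  return { holder, expiresAt: now + leaseMs };
}
```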

KV is used only as a cache (captcha token ~4 min; per-record results 1 day) — it is regenerable and is not backed up. The Worker source code is the backup of "the site" itself.
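
A read-through cache with the two lifetimes above might look like the sketch below. `KVLike` is a deliberately minimal stand-in for the Workers KV binding (its `get` and `put` with `expirationTtl` mirror the real binding), while `cached`, the constant names and the key layout are assumptions.

```typescript
// Minimal subset of a Workers KV binding, declared locally so the sketch
// is self-contained.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

const TOKEN_TTL_S = 4 * 60;        // captcha token: about 4 minutes
const RECORD_TTL_S = 24 * 60 * 60; // per-record results: 1 day

// Read-through cache: return the cached value, or load, store and return it.
async function cached(
  kv: KVLike,
  key: string,
  ttlSeconds: number,
  load: () => Promise<string>,
): Promise<string> {
  const hit = await kv.get(key);
  if (hit !== null) return hit;
  const fresh = await load();
  await kv.put(key, fresh, { expirationTtl: ttlSeconds });
  return fresh;
}
```

A caller would pass `TOKEN_TTL_S` for the captcha token and `RECORD_TTL_S` for a record lookup. Since every entry can be recomputed by `load`, losing the whole namespace costs time but no data.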

Backup mechanism

Only D1 is backed up (it is the sole source of truth). Three independent layers:

Note: the database is hundreds of MB, so backups are produced by wrangler d1 export (which streams the dump out of D1's service), not by an in-Worker job — a Worker cannot hold a full export in memory.

Regular backup schedule

Restore: for recent mistakes, run wrangler d1 time-travel restore hkiaradar_db --timestamp <ISO>; for a full rebuild, decompress the .sql.gz backup and load the resulting .sql file with wrangler d1 execute hkiaradar_db --remote --file <backup.sql>.

Recent backups in R2

Object                          Size    Uploaded (UTC)
d1/d1-20260617T000000Z.sql.gz   29 MB   Wed Jun 17 2026 03:42:49