# Bulk Data Downloads - Finradar API

> Version: 3.61.0 | Generated: 2026-06-20 | Content Hash: b242f38f
> Fetch this file at: https://uat.finradarapi.com/llms/bulk-data-downloads.txt

## Authentication

All endpoints require an API key. Pass it via query parameter `?apiKey=YOUR_KEY` or header `X-API-Key: YOUR_KEY`. WebSocket endpoints accept the key in the `token` auth payload or query parameter.

---

## Bulk Data Downloads

Download historical insider trading data in bulk. Monthly JSONL.gz files for Form 3, 4, and 5 filings. Each file contains complete filing records (same structure as individual filing detail endpoint).

### GET /api/insiders/bulk/form-{form_type}/index.json

Get index of available monthly bulk download files for a form type. Returns list of available months with file size, record count, generation date, and download URL. Status is 'available' (pre-generated) or 'pending' (will be generated on first download).

**Token cost:** 25 tokens per call

**Response fields:**
- `form_type` (string): Form type echoed back (`3`, `4`, or `5`). Useful for asserting your client's request resolved correctly.
- `files` (array): Array of monthly bulk-file metadata rows, sorted by `(year, month) DESC` (most-recent month first). Coverage starts in 2003 (the SEC's electronic Form 4 mandate). Empty array on form types with no bulk coverage (won't happen in practice, since all 3 form types are covered).
- `files[].year` (integer): Calendar year of the file's coverage period (e.g. `2026`). Combined with `month` forms the file's coverage window.
- `files[].month` (integer): Calendar month 1-12 of the file's coverage period. Files cover the calendar month exactly (no fiscal-period offset).
- `files[].filename` (string): Bulk file name in `form-{type}-{YYYY}-{MM}.jsonl.gz` format (e.g. `form-4-2026-04.jsonl.gz`). Pass to `/bulk/form-{form_type}/{year}/{filename}` to retrieve the signed download URL.
- `files[].row_count` (integer): Number of JSONL lines (= filings) in the gzipped file. Form 4 monthly files: typically 25K-35K; Form 3: 1K-3K; Form 5: 0.5K-2K. Use for client-side cost-budgeting before download.
- `files[].size_bytes` (integer): Compressed file size in bytes (post-gzip). Form 4 monthly files: typically 4-8 MB compressed (~50-80 MB uncompressed). Form 3 and Form 5 files: < 1 MB. Plan downstream consumer pipelines accordingly.
- `files[].sha256` (string): SHA-256 hex digest of the gzipped file contents. Use for download-integrity verification: the bulk-fetch endpoint returns the same SHA, so client code can assert the downloaded blob matches before processing.
- `files[].generated_at` (string): ISO-8601 UTC timestamp when this file was last regenerated. Files are regenerated nightly to incorporate late amendments; values within the last 24 hours indicate a fresh generation, and older values indicate the cold-storage path is being served.

**Since:** v3.14.0

**Utility:** Discover the bulk-data inventory before fetching individual files: returns the full list of monthly Form 3/4/5 bulk export files with file sizes, row counts, generation timestamps, and SHA-256 digests for integrity verification. The right entry point for any bulk-export workflow: enumerate this first to find the months you need, then drill into `/bulk/form-{form_type}/{year}/{filename}` to retrieve signed download URLs. Files cover one calendar month each, regenerated nightly to incorporate late amendments. Files older than 12 months are kept in cold storage with ~1 minute extra latency on first surface; once warmed, subsequent index calls return them instantly.

**Use case:** Building a bulk data download page that shows available months, file sizes, and record counts, like SEC EDGAR's full-text search bulk download index.

**Parameters:**
- `form_type` (path, required): SEC insider-trading form type to enumerate.
Accepts `3` (initial statement of beneficial ownership, filed when an insider first joins), `4` (statement of changes, covering every transaction; the highest-volume form), or `5` (annual statement, which captures missed Form 4 filings). Form 4 is the most-requested for trading analytics; Forms 3 and 5 are typically only relevant for compliance / governance research.

**Sample response:**
```json
{
  "form_type": "4",
  "files": [
    {
      "year": 2026,
      "month": 4,
      "filename": "form-4-2026-04.jsonl.gz",
      "row_count": 28432,
      "size_bytes": 4128000,
      "sha256": "a1b2c3d4...",
      "generated_at": "2026-05-01T00:30:12.000Z"
    }
  ]
}
```

### GET /api/insiders/bulk/form-{form_type}/{year}/{filename}

Download a monthly bulk data file as gzip-compressed JSONL (one JSON object per line). Each line is a complete filing record with issuer, owners, transactions, holdings, footnotes, and signatures. Files are pre-generated when available, or streamed on-the-fly if not yet cached.

**Token cost:** 25 tokens per call

**Response fields:**
- `download_url` (string): Signed S3 download URL valid for 24 hours from response time. Direct download: no auth is required for the URL itself, because the signature handles authorization. Stream the response with `gzip` decompression; do NOT load the full uncompressed payload into memory for large months (Form 4 monthly files uncompress to 50-80 MB).
- `filename` (string): Filename echoed back, useful for asserting your client's path-construction matched what was retrieved (especially when constructing from an enumeration loop).
- `row_count` (integer): Number of JSONL lines (= filings) in the file. Use for download-completeness verification: after gunzip, your client should see exactly `row_count` lines (`wc -l`). Any mismatch indicates a partial download, so retry.
- `size_bytes` (integer): Compressed file size in bytes. Use for `Content-Length` validation against the S3 download; mismatches indicate a corrupted or partial download.
- `sha256` (string): SHA-256 hex digest of the gzipped file contents. Compute the SHA-256 of the downloaded blob locally and compare to assert integrity before processing; this is critical for any pipeline that depends on byte-exact reproducibility.
- `expires_at` (string): ISO-8601 UTC timestamp when the signed URL expires (always `now + 24h`). After expiry the URL returns 403; re-call this endpoint to mint a fresh signed URL (cheap; cache hit on the underlying file for up to 7 days).

**Since:** v3.14.0

**Utility:** Retrieve a signed S3 download URL for one month's worth of insider filings as gzip-compressed JSONL: one filing per line, fully nested with issuer + reporters + non-derivative/derivative transactions + holdings + footnotes + signatures (matches the per-filing `/insiders/filings/{accession_number}` shape). The bulk-export workflow for quantitative research: backtest insider-signal strategies on 20+ years of Form 4 history, build a local database mirror, or train models on the full event-time stream. Pair with `/bulk/form-{form_type}/index.json` to enumerate available months first, then fetch each file via this endpoint. The 24-hour signed-URL TTL gives you ample window for download retries; for fresh URLs after expiry, just re-call (the underlying file is cached for 7 days, so re-call is fast).

**Use case:** Bulk downloading all Form 4 filings from 2024 for quantitative research, backtesting insider trading strategies, or building a local database mirror.

**Parameters:**
- `form_type` (path, required): SEC insider-trading form type: `3` (initial statement), `4` (changes, highest volume), or `5` (annual statement). Same set as the index endpoint's `form_type` parameter.
- `year` (path, required): 4-digit calendar year of the file (e.g. `2025`). Coverage starts 2003 (SEC's electronic Form 4 mandate). Returns 404 for years outside coverage.
- `filename` (path, required): Bulk file name in `form-{type}-{YYYY}-{MM}.jsonl.gz` format.
Get exact filenames from `/bulk/form-{form_type}/index.json`; DO NOT hand-construct them (the filename schema may evolve). Returns 404 if the filename doesn't match any indexed file.

**Sample response:**
```json
{
  "download_url": "https://finradar-bulk.s3.amazonaws.com/form-4/2026/form-4-2026-04.jsonl.gz?X-Amz-Signature=...",
  "filename": "form-4-2026-04.jsonl.gz",
  "row_count": 28432,
  "size_bytes": 4128000,
  "sha256": "a1b2c3d4...",
  "expires_at": "2026-05-02T20:55:12.000Z"
}
```

### POST /api/insiders/bulk/form-{form_type}/generate/{year}/{month}

Generate a monthly bulk data file on demand. Use this when `/api/insiders/bulk/form-{form_type}/index.json` reports the desired month as 'pending'. Triggers a server-side packing job; the download URL becomes available once the file is ready. Idempotent: a second call against an already-generated month returns the cached file.

**Token cost:** 25 tokens per call

**Response fields:**
- `job_id` (string): Celery task UUID assigned to the regeneration job. Surface in support tickets when a regeneration appears stuck; operations can grep this directly out of `flower` / Celery logs to trace the exact worker, queue, and per-step duration.
- `queued_at` (string): ISO-8601 UTC timestamp at which the job entered the bulk-export queue. For end-to-end latency measurement, compare this against the new `generated_at` timestamp on the index endpoint after polling.
- `estimated_seconds` (integer): Server-side ETA in whole seconds for the regeneration to complete, computed from the requested month's row count + worker concurrency. Form 4 recent months: typically 60-120 seconds; cold-storage months: 90-300 seconds. Treat as a hint: schedule a re-poll at `queued_at + 1.5 × estimated_seconds`.
- `status` (string): Always the literal string `"queued"` on a successful enqueue. The regeneration itself is asynchronous, so this endpoint never returns `"completed"`.
Detect completion by polling `/bulk/form-{form_type}/index.json` and watching for the new `generated_at` timestamp + updated `sha256`.

**Since:** v3.14.0

**Utility:** Force on-demand regeneration of a monthly bulk file, useful when amendments have been filed AFTER the nightly regeneration window and you want a fresh file that incorporates them. Also the right endpoint for cold-storage warming (months older than 12 months may need this call to surface). The job is enqueued on the bulk-export Celery worker and runs asynchronously: it returns a job descriptor immediately, then you poll `/bulk/form-{form_type}/index.json` for the new `generated_at` timestamp and updated `sha256`. Idempotent: calling against an already-generated month returns the cached file with the same SHA. Heavy operation: 25-token cost reflects the full month re-pack; budget accordingly for backfill workflows that hit many months.

**Parameters:**
- `form_type` (path, required): SEC insider-trading form type: `3`, `4`, or `5`. Same set as the read endpoints. Form 4 has the highest regeneration latency due to volume; Forms 3 and 5 typically complete in < 30 seconds.
- `year` (path, required): 4-digit calendar year of the month to regenerate. Coverage starts 2003. Returns 400 for years outside coverage. Old months (>12 months) may take an extra ~60 seconds for the cold-storage hydration step.
- `month` (path, required): Calendar month 1-12 to regenerate. Single-month scope only; for multi-month backfills, loop the call client-side. The job's `estimated_seconds` reflects the volume of the requested month; recent months are faster due to warm cache, archival months are slower.

**Sample response:**
```json
{
  "job_id": "8f1a2b30-c4d5-46e7-9f01-23456789abcd",
  "queued_at": "2026-05-01T20:55:12.000Z",
  "estimated_seconds": 90,
  "status": "queued"
}
```
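
The client-side checks described in the field notes above (comparing `size_bytes` and `sha256`, counting lines after gunzip against `row_count`, and scheduling a re-poll at `queued_at + 1.5 × estimated_seconds`) could be sketched in Python roughly as follows. This is a minimal offline sketch, not an official client: the function names are hypothetical, the metadata dicts are assumed to have the shape of the sample responses, and the compressed blob is assumed to have already been fetched from `download_url`.

```python
import gzip
import hashlib
import io
import json
from datetime import datetime, timedelta


def verify_bulk_blob(blob: bytes, meta: dict) -> None:
    """Raise ValueError if the compressed blob does not match its metadata.

    `meta` is assumed to carry `size_bytes` and `sha256` as in the sample
    responses; both describe the gzipped bytes, not the uncompressed JSONL.
    """
    if len(blob) != meta["size_bytes"]:
        raise ValueError("size mismatch: partial or corrupted download")
    if hashlib.sha256(blob).hexdigest() != meta["sha256"]:
        raise ValueError("sha256 mismatch: corrupted download")


def iter_filings(blob: bytes):
    """Yield one parsed filing per JSONL line, decompressing incrementally."""
    with gzip.open(io.BytesIO(blob), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if line.strip():
                yield json.loads(line)


def count_rows(blob: bytes) -> int:
    """Count filings without holding the uncompressed payload in memory."""
    return sum(1 for _ in iter_filings(blob))


def repoll_time(job: dict) -> datetime:
    """Return queued_at + 1.5 x estimated_seconds for a generate-job descriptor."""
    queued = datetime.fromisoformat(job["queued_at"].replace("Z", "+00:00"))
    return queued + timedelta(seconds=1.5 * job["estimated_seconds"])


# Demo on a locally built stand-in file; the record contents are made up.
demo_records = [{"demo_id": 1}, {"demo_id": 2}]
demo_blob = gzip.compress(
    "".join(json.dumps(r) + "\n" for r in demo_records).encode("utf-8")
)
demo_meta = {
    "size_bytes": len(demo_blob),
    "sha256": hashlib.sha256(demo_blob).hexdigest(),
    "row_count": len(demo_records),
}
verify_bulk_blob(demo_blob, demo_meta)
print(count_rows(demo_blob) == demo_meta["row_count"])
print(repoll_time({"queued_at": "2026-05-01T20:55:12.000Z", "estimated_seconds": 90}))
```

Only the compressed bytes (a few MB for a Form 4 month, per the figures above) are held in memory here; the JSONL is decompressed line by line. For very large files, the same checks could instead be applied while streaming the download to disk.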