/api/v1/scrapping/pdf

Download filing as PDF, HTML, or plain text.

Auto-detects the CIK from the SEC URL, or accepts an explicit `cik` body field as a fallback. iXBRL viewer links (those containing `/ix?doc=`) are normalized server-side.
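
The server-side CIK detection is not documented here. As a rough client-side illustration, the sketch below assumes the CIK is read from the standard EDGAR archive path (`/Archives/edgar/data/<CIK>/...`) and falls back to an explicit value otherwise. The helper names `detect_cik` and `resolve_cik` and the example URLs are hypothetical.

```python
import re
from typing import Optional

# Assumption: the CIK is the numeric path segment after /Archives/edgar/data/.
_CIK_IN_PATH = re.compile(r"/Archives/edgar/data/(\d{1,10})/")


def detect_cik(sec_url: str) -> Optional[str]:
    """Return the CIK embedded in an EDGAR archive URL, or None if absent."""
    match = _CIK_IN_PATH.search(sec_url)
    return match.group(1) if match else None


def resolve_cik(sec_url: str, fallback_cik: Optional[str] = None) -> Optional[str]:
    """Prefer the CIK found in the URL; otherwise use the explicit fallback."""
    return detect_cik(sec_url) or fallback_cik
```

If `resolve_cik` returns `None`, a client would probably need to supply the `cik` body field itself.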

25 tokens · Since v1.0.0

Why use this

Render any SEC filing or exhibit URL to a PDF (or HTML / plain text) for archival, investor memos, regulatory submissions, or human-readable distribution. The endpoint runs a headless Chrome render and supports the full range of SEC filings: 10-K and 10-Q reports (often 100-400 pages), 8-Ks with embedded press-release exhibits, proxy statements with vote tabulations, and S-1 prospectuses with financial-statement tables. iXBRL viewer URLs (`https://www.sec.gov/ix?doc=/...`) are automatically rewritten to point at the underlying document. The response contains a signed S3 URL valid for 24 hours; fetch the actual file from there. The 25-token cost reflects the heavy headless-Chrome render. For raw cached HTML/PDF without re-rendering, use `GET /api/v1/scrapping/public/{filepath}`; for item-level extraction, use `/scrapping/extractor`.
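
To make the iXBRL clean-up concrete, here is a minimal sketch of what such a normalization might look like. The real server-side logic is not shown in this page; the function name `normalize_ixbrl_url` is hypothetical, and the sketch assumes the `doc` query parameter holds the absolute path of the underlying document.

```python
from urllib.parse import parse_qs, urlsplit


def normalize_ixbrl_url(url: str) -> str:
    """Rewrite an iXBRL viewer link (/ix?doc=/...) to the underlying document URL.

    URLs that are not viewer links are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.path.rstrip("/") == "/ix":
        doc_path = parse_qs(parts.query).get("doc", [""])[0]
        if doc_path.startswith("/"):
            return f"{parts.scheme}://{parts.netloc}{doc_path}"
    return url
```

Because the endpoint normalizes these links itself, a client does not need this step; it is mainly useful for de-duplicating URLs before spending tokens on a render.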

Common use case

Generating a PDF for an investor presentation.

Renders any filing or exhibit URL to a PDF and returns a signed download link valid for 24 hours. Use it when you need archival-quality copies of filings, for example saving the MD&A from a 10-K as a PDF for an investor memo. The render runs headless Chrome, which is why the call costs 25 tokens. Prefer `GET /api/v1/scrapping/public/{filepath}` when you need the raw HTML/PDF without re-rendering.

Parameters

| Name | In | Required | Default | Allowed | Description | Example |
| --- | --- | --- | --- | --- | --- | --- |
| type | body | optional | pdf | | Output format. `pdf` (default): full headless-Chrome render with embedded fonts, tables, and images preserved. `html`: cleaned HTML with embedded styles inlined (lighter weight, smaller files). `txt`: plain-text strip-down for NLP pipelines and LLM grounding (loses table structure, but the files are tiny). | pdf |
| fileName | body | optional | | | Custom filename for the download (the extension is appended automatically based on `type`). Useful for archival workflows where you want predictable filenames (e.g. `{ticker}-{form_type}-{fiscal_year}`). Defaults to a hash-based filename derived from the source URL when omitted. | AAPL-10K-2025 |
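
The parameters above could be assembled into a request body along the following lines. This is a sketch under assumptions: the source-URL field is taken to be `link` (the name used in the cache key described under `rendered_at`), `cik` is the fallback field mentioned in the summary, and the helpers `build_body` and `archival_name` are hypothetical.

```python
from typing import Dict, Optional

# Output formats named in the `type` description.
ALLOWED_TYPES = ("pdf", "html", "txt")


def archival_name(ticker: str, form_type: str, fiscal_year: int) -> str:
    """Build a predictable filename following {ticker}-{form_type}-{fiscal_year}."""
    return f"{ticker}-{form_type}-{fiscal_year}"


def build_body(
    link: str,
    output_type: str = "pdf",
    file_name: Optional[str] = None,
    cik: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble the JSON body; optional fields are omitted when not provided."""
    if output_type not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {ALLOWED_TYPES}, got {output_type!r}")
    body = {"link": link, "type": output_type}  # "link" is an assumed field name
    if file_name:
        body["fileName"] = file_name  # the extension is appended by the server
    if cik:
        body["cik"] = cik
    return body
```

Validating `type` on the client avoids spending a round trip on a request that would likely come back as a 400.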

Response schema

| Field | Type | Nullable | Description |
| --- | --- | --- | --- |
| pdf_url | string | no | Signed S3 URL for the rendered output (PDF, HTML, or TXT; the field is named `pdf_url` for legacy reasons regardless of `type`). Valid for 24 hours from `rendered_at`; fetch the file before `expires_at`. Direct download: the signed URL itself requires no auth because the signature handles authorization. |
| page_count | integer | no | Number of pages in the rendered output. For PDF: actual page count (10-K outputs are typically 100-400 pages, proxy statements 30-150, 8-Ks with exhibits 5-50). For HTML/TXT: synthetic page count assuming ~3000 chars per page. Useful for client-side cost estimation when feeding the document into LLMs. |
| rendered_at | string | no | ISO-8601 UTC timestamp at which the headless-Chrome render completed. Renders are cached server-side for 7 days keyed by `(link, type)`; values older than 7 days indicate a fresh re-render was triggered (cold path). |
| expires_at | string | no | ISO-8601 UTC timestamp at which the signed S3 URL expires (always `rendered_at + 24h`). After expiry the URL returns 403; re-call this endpoint to mint a new signed URL (cached render path: the 25-token cost still applies, but the render itself is reused for 7 days). |
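
As a worked example of the synthetic `page_count` for HTML/TXT output, the sketch below applies the "~3000 chars per page" rule from the description. The rounding behavior is not documented, so ceiling division with a minimum of one page is an assumption, and both helper names are hypothetical.

```python
import math

# From the page_count description: HTML/TXT output assumes ~3000 chars per page.
CHARS_PER_PAGE = 3000


def synthetic_page_count(text: str) -> int:
    """Approximate page_count for HTML/TXT output (assumed: round up, at least 1)."""
    return max(1, math.ceil(len(text) / CHARS_PER_PAGE))


def estimated_chars(page_count: int) -> int:
    """Rough upper bound on document size implied by a synthetic page_count."""
    return page_count * CHARS_PER_PAGE
```

Going the other way, `estimated_chars(page_count)` gives a quick size estimate before downloading a `txt` render and feeding it to an LLM.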

Sample response

{
  "pdf_url": "https://finradar-pdf.s3.amazonaws.com/AAPL-10K-2025.pdf?X-Amz-Signature=...",
  "page_count": 124,
  "rendered_at": "2026-05-01T20:55:12.000Z",
  "expires_at": "2026-05-02T20:55:12.000Z"
}
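
A client might parse this response and check the documented `expires_at = rendered_at + 24h` relationship before trusting the link. The following is a minimal sketch; the `RenderResult` class and `parse_render_result` function are hypothetical names, and the embedded payload is a copy of the sample response.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RenderResult:
    pdf_url: str
    page_count: int
    rendered_at: datetime
    expires_at: datetime

    def is_expired(self, now: datetime) -> bool:
        """True once the signed URL is past its expiry (it would then return 403)."""
        return now >= self.expires_at


def _parse_ts(value: str) -> datetime:
    # Replace the trailing "Z" so older Python versions can parse the timestamp.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_render_result(raw: str) -> RenderResult:
    data = json.loads(raw)
    result = RenderResult(
        pdf_url=data["pdf_url"],
        page_count=int(data["page_count"]),
        rendered_at=_parse_ts(data["rendered_at"]),
        expires_at=_parse_ts(data["expires_at"]),
    )
    # Documented invariant: the URL is valid for exactly 24 hours after rendering.
    if result.expires_at - result.rendered_at != timedelta(hours=24):
        raise ValueError("expires_at is not rendered_at + 24h")
    return result


SAMPLE = """{
  "pdf_url": "https://finradar-pdf.s3.amazonaws.com/AAPL-10K-2025.pdf?X-Amz-Signature=...",
  "page_count": 124,
  "rendered_at": "2026-05-01T20:55:12.000Z",
  "expires_at": "2026-05-02T20:55:12.000Z"
}"""
```

Calling `parse_render_result(SAMPLE).is_expired(now)` with a timezone-aware `now` tells the caller whether to download directly or re-call the endpoint for a fresh signed URL.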

Errors

| Status | Label | Description |
| --- | --- | --- |
| 200 | OK | Request succeeded. |
| 400 | Bad Request | Invalid query, body, or path parameter. |
| 401 | Unauthorized | Missing or invalid Authorization header / api_Token. |
| 402 | Payment Required | Insufficient token balance for this call. Top up to continue. |
| 429 | Too Many Requests | Rate limit exceeded for your tier (see /pricing for tier limits). |
| 500 | Server Error | Unexpected server-side failure. Retry with backoff; report if persistent. |
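
The "retry with backoff" advice can be sketched as below. This is one possible policy and not a documented client: it assumes only 429 and 500 are worth retrying, uses exponential delays, and takes a hypothetical `send` callable that performs one request and returns `(status, body)`. Whether retried calls are billed again is not stated here.

```python
import time
from typing import Callable, Tuple

# Assumption: only rate limiting and server errors are transient.
RETRYABLE_STATUSES = {429, 500}


def call_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

Statuses such as 400, 401, and 402 are returned immediately, since repeating the same request would not change the outcome.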

Code samples

curl -X POST "https://api.finradar.ai/api/v1/scrapping/pdf?api_Token=YOUR_API_KEY" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
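
For comparison, a rough Python equivalent of the curl call is sketched below using only the standard library. It mirrors the curl sample by sending both the `api_Token` query parameter and the Bearer header, and it adds a JSON body, which the curl sample omits. The `link` field name and the helpers `build_request` and `render` are assumptions, not part of the documented API.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

API_URL = "https://api.finradar.ai/api/v1/scrapping/pdf"


def build_request(
    api_key: str,
    jwt_token: str,
    link: str,
    output_type: str = "pdf",
    file_name: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) the POST request."""
    body = {"link": link, "type": output_type}  # "link" is an assumed field name
    if file_name:
        body["fileName"] = file_name
    query = urllib.parse.urlencode({"api_Token": api_key})
    return urllib.request.Request(
        f"{API_URL}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {jwt_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def render(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the request easy to inspect or log before any tokens are spent.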

Generate an API key in /account/credentials to run live queries (literal YOUR_API_KEY placeholder shown until then).
