/api/v1/scrapping/extractor
Extract a specific item section from a SEC filing — a sec-api.io Extractor replica. Supports 10-K, 10-Q and 8-K form types (20-F / 40-F are rejected with 400). The COMPLETE item section is returned in a single response — no length cap or truncation (a large 10-K Item 1A can run tens of thousands of characters).
Why use this
Common use case
Extracts a single item section from a 10-K, 10-Q or 8-K filing as plain text (type=text) or style-preserving HTML (type=html) — a clean sec-api.io Extractor replica. The item grammar is form-specific: 10-K bare codes (1A, 7, …), 10-Q Part-prefixed codes (part1item2, part2item1a), 8-K dotted event codes (2.02, 9.01, …; hyphen 2-2 also accepted, requires form_type=8-K). An invalid item-for-form or an unsupported form (20-F / 40-F) is rejected with a refunded 400; a valid-but-absent item returns a refunded 404. Use POST /api/v1/scrapping/search/ first to locate the right filing + item. (Foreign 20-F / 40-F annual reports are intentionally out of scope here; the first-party stock-research card uses the OOM-safe /business-description endpoint for those.)
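The form-specific item grammar can be checked client-side before spending a request. The following is a minimal sketch, not the server's validator: `normalize_item` is a hypothetical helper, the 10-K set is copied from the `item` parameter description, and the 10-Q and 8-K patterns are assumptions generalized from the documented examples (`part1item2`, `part2item1a`, `2.02`, `2-2`). Case-insensitivity is documented only for 10-K; treating 10-Q codes the same way is also an assumption.

```python
import re

# 10-K codes as listed in the `item` parameter description.
TEN_K_ITEMS = {"1", "1A", "1B", "1C", "2", "3", "4", "5", "6", "7", "7A",
               "8", "9", "9A", "9B", "9C"} | {str(n) for n in range(10, 17)}

# Assumed 10-Q shape: part<1|2>item<number><optional letter>.
TEN_Q_PATTERN = re.compile(r"^part[12]item\d{1,2}[a-z]?$")

# Assumed 8-K shapes: dotted ("2.02") or hyphenated ("2-2").
EIGHT_K_DOTTED = re.compile(r"^\d\.\d{2}$")
EIGHT_K_HYPHEN = re.compile(r"^(\d)-(\d{1,2})$")


def normalize_item(form_type: str, item: str) -> str:
    """Return a canonical item code, or raise ValueError (the local analogue of a 400)."""
    code = item.strip()
    if form_type == "10-K":
        if code.upper() in TEN_K_ITEMS:
            return code.upper()
    elif form_type == "10-Q":
        if TEN_Q_PATTERN.match(code.lower()):
            return code.lower()
    elif form_type == "8-K":
        if code.lower() == "signature":
            return "signature"
        if EIGHT_K_DOTTED.match(code):
            return code
        hyphen = EIGHT_K_HYPHEN.match(code)
        if hyphen:
            # "2-2" becomes "2.02": zero-pad the part after the hyphen.
            return f"{hyphen.group(1)}.{int(hyphen.group(2)):02d}"
    else:
        raise ValueError(f"unsupported form_type: {form_type!r}")
    raise ValueError(f"item {item!r} is not valid for {form_type}")
```

For example, `normalize_item("8-K", "2-2")` yields `"2.02"`, while a bare `normalize_item("10-Q", "1")` raises, matching the documented rule that 10-Q codes need the Part prefix.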
Parameters
| Name | In | Required | Default | Allowed | Description | Example |
|---|---|---|---|---|---|---|
| url | query | required | — | — | Direct URL of the filing's primary HTML document on SEC EDGAR (NOT the index page; it must be the actual HTML body of the 10-K, 10-Q or 8-K). Get this from `filings[].linkToFilingDetails`, then drill into the document URL, or from the `filing_url` field on `/api/v1/sec/filings`. The server rejects the request if the URL is not on the `sec.gov` domain or returns non-HTML content. | https://www.sec.gov/Archives/edgar/data/320193/000032019324000123/aapl-20240928.htm |
| item | query | required | — | — | Item code to extract — the FORMAT depends on `form_type`. **10-K** bare canonical codes: `1`, `1A`, `1B`, `1C`, `2`, `3`, `4`, `5`, `6`, `7`, `7A`, `8`, `9`, `9A`, `9B`, `9C`, `10`–`16` (case-insensitive, `1a` = `1A`). **10-Q** PART-PREFIXED codes: `part1item1`…`part2item6` (e.g. `part1item2` = MD&A, `part2item1a` = Risk Factors) — the Part prefix is REQUIRED so a Part I vs Part II Item number can never resolve to the wrong section; a bare `item=1` on a 10-Q is rejected with 400. **8-K** dotted event codes: `2.02`, `5.02`, `9.01`, … (the hyphen form `2-2` is also accepted and normalized to `2.02`) plus `signature` — 8-K codes require `form_type=8-K`. An item code that is not valid for the given form returns **400** (refunded, not billed). A valid item that is simply absent from that particular filing returns **404** (also refunded). | 1A |
| type | query | optional | text | — | Output format — ONE located section, two renderings (the in-house engine runs the SAME detection for both). `text` (default) returns clean plain-text: tags stripped, HTML entities kept literal (e.g. ` `), tables wrapped between `##TABLE_START` / `##TABLE_END` markers — best for NLP pipelines and LLM grounding. `html` returns STYLE-PRESERVING HTML: the filing's native heading styling (inline `font-weight` / `font-size` / `text-decoration`) is KEPT, while `color` / background / positioning and ALL XSS vectors (`<script>` tags, event-handler attributes, `javascript:` URLs) are stripped via an allowlist sanitizer (nh3) — safe to render in a browser. The response is a RAW string body (Content-Type `text/plain` for `text`, `text/html` for `html`), NOT a JSON envelope. Read the `X-Extractor-Mode` response header to see which path produced the body. On a cold-cache first fetch of a large filing the server may return HTTP 503 with `X-Extractor-Mode: pending` and a `Retry-After` header while it prepares the section in the background — this response is NOT billed; retry after a few seconds. Cost is identical across formats. | text |
| form_type | query | optional | 10-K | — | Form type of the filing — selects the item-code grammar and detection rules. One of `10-K` (default), `10-Q`, or `8-K`. Any other value (e.g. `20-F`, `40-F`) returns **400** ('not supported'), refunded. Because the `item` grammar is form-specific (10-Q needs Part-prefixed codes like `part1item2`, 8-K needs dotted codes like `2.02`), you MUST set `form_type` correctly for 10-Q and 8-K — leaving the `10-K` default with a 10-Q/8-K item yields a 400. | 10-K |
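With `type=text`, tables arrive wrapped between `##TABLE_START` and `##TABLE_END` markers. The sketch below shows one way to separate prose from tables on the client. `split_tables` is a hypothetical helper, and it assumes the markers are balanced and never nested, which the parameter description does not state explicitly.

```python
import re

# Non-greedy match so each START pairs with the nearest END.
TABLE_BLOCK = re.compile(r"##TABLE_START(.*?)##TABLE_END", re.DOTALL)


def split_tables(section_text: str) -> tuple[str, list[str]]:
    """Return (prose without tables, list of raw table bodies)."""
    tables = [body.strip() for body in TABLE_BLOCK.findall(section_text)]
    prose = TABLE_BLOCK.sub("", section_text)
    # Collapse the blank runs left behind where tables were removed.
    prose = re.sub(r"\n{3,}", "\n\n", prose).strip()
    return prose, tables
```

Keeping the tables as raw strings leaves the choice of downstream parser open, since the cell layout inside the markers is not specified here.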
Response schema
| Field | Type | Nullable | Description |
|---|---|---|---|
| (response body) | string | no | This endpoint returns the extracted section as a RAW STRING body, NOT a JSON envelope. Content-Type is `text/plain` for `type=text` and `text/html` for `type=html`. The `text/html` body is allowlist-sanitized (nh3) before return (no `<script>`, no event-handler attributes, no non-http(s) URLs) and PRESERVES inline heading style (font-weight / font-size / text-decoration) while dropping color/background/positioning. The `X-Extractor-Mode` response header reports which path produced the body: one of `text`, `structured`, `legacy_html`, `notfound`, `pending`. An invalid item-for-form, or an unsupported form (20-F / 40-F), returns HTTP 400 (and is NOT billed). A valid item that is simply absent from the filing returns HTTP 404 (also NOT billed). On a cold-cache first fetch of a large filing the server may return HTTP 503 + `Retry-After` with `X-Extractor-Mode: pending` (also NOT billed) while it prepares the section in the background; retry to get the body. Body length ranges from ~1K to 100K+ characters (Item 1A on a large 10-K can exceed 100K). |
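The 503 `pending` behaviour described above lends itself to a small retry loop. This is a sketch under stated assumptions: `fetch_section` and its `fetch` callback (returning `(status, headers, body)`) are hypothetical names, `Retry-After` is assumed to carry a number of seconds, and the headers mapping is assumed to use the exact casing shown in this document.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def fetch_section(fetch: Callable[[], Response],
                  max_attempts: int = 5,
                  default_wait: float = 3.0,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Call `fetch` until it returns 200, waiting out 503 'pending' responses."""
    for _ in range(max_attempts):
        status, headers, body = fetch()
        if status == 200:
            return body
        if status == 503 and headers.get("X-Extractor-Mode") == "pending":
            try:
                wait = float(headers.get("Retry-After", default_wait))
            except ValueError:
                # Retry-After may also be an HTTP date; fall back to a fixed wait.
                wait = default_wait
            sleep(wait)
            continue
        # 400 / 404 and other statuses are not retried here.
        raise RuntimeError(f"extractor returned HTTP {status}")
    raise TimeoutError("section still pending after retries")
```

Passing `fetch` and `sleep` in as callables keeps the loop independent of any particular HTTP client and makes it easy to exercise without network access.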
Sample response
Errors
| Status | Label | Description |
|---|---|---|
| 200 | OK | Request succeeded. |
| 400 | Bad Request | Invalid query, body, or path parameter. |
| 401 | Unauthorized | Missing or invalid Authorization header / api_Token. |
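| 402 | Payment Required | Insufficient token balance for this call. Top up your balance and retry. |
| 404 | Not Found | The item code is valid for the form but absent from this particular filing. Refunded, not billed. |
| 429 | Too Many Requests | Rate limit exceeded for your tier (see /pricing for tier limits). |
| 500 | Server Error | Unexpected server-side failure. Retry with backoff; report if persistent. |
| 503 | Service Unavailable | Cold-cache first fetch of a large filing; the section is being prepared in the background (`X-Extractor-Mode: pending`). Not billed. Retry after the interval given in the `Retry-After` header. |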
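A client may want to decide up front which statuses are worth retrying. The sketch below is one possible policy, not part of the API: `is_retryable` and `backoff_delays` are hypothetical helpers, treating 429 as retryable is an assumption, and the not-billed set reflects the 400, 404 and 503 cases documented in the response schema.

```python
# Statuses a client might retry (429 is an assumption; 500 and 503 are documented).
RETRYABLE = {429, 500, 503}

# Statuses the response schema describes as refunded or not billed.
NOT_BILLED = {400, 404, 503}


def is_retryable(status: int) -> bool:
    """True if repeating the same request could plausibly succeed."""
    return status in RETRYABLE


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

A 400 or 404 is deliberately excluded from the retry set: the same request would fail the same way until the `item`, `form_type` or `url` changes.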
Code samples
curl "https://api.finradar.ai/api/v1/scrapping/extractor?api_Token=YOUR_API_KEY&item=1A&url=https://www.sec.gov/Archives/edgar/data/320193/000032019324000123/aapl-20240928.htm" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Generate an API key in /account/credentials to run live queries (the literal YOUR_API_KEY placeholder is shown until then).
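A rough Python equivalent of the curl call, using only the standard library. `build_request` is a hypothetical helper; the parameter names come from the Parameters table, and sending the optional Bearer header alongside `api_Token` simply mirrors the curl sample rather than a documented requirement.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.finradar.ai/api/v1/scrapping/extractor"


def build_request(api_key: str, filing_url: str, item: str,
                  form_type: str = "10-K", output: str = "text",
                  jwt: Optional[str] = None) -> Request:
    """Build a GET request with all query parameters percent-encoded."""
    query = urlencode({
        "api_Token": api_key,
        "url": filing_url,      # the filing's primary HTML document on sec.gov
        "item": item,
        "form_type": form_type,
        "type": output,
    })
    headers = {"Authorization": f"Bearer {jwt}"} if jwt else {}
    return Request(f"{BASE_URL}?{query}", headers=headers)


# Usage (performs a live, billable call, so it is left commented out):
# from urllib.request import urlopen
# req = build_request("YOUR_API_KEY", "https://www.sec.gov/Archives/edgar/data/"
#                     "320193/000032019324000123/aapl-20240928.htm", "1A")
# with urlopen(req, timeout=60) as resp:
#     print(resp.headers.get("X-Extractor-Mode"))
#     print(resp.read().decode("utf-8")[:500])
```

Because the body is a raw string rather than JSON, the response is decoded directly instead of being passed through a JSON parser.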