/api/v1/scrapping/extractor

Extract a specific item section from a SEC filing — a sec-api.io Extractor replica.

Supports 10-K, 10-Q and 8-K form types (20-F / 40-F are rejected with 400). The complete item section is returned in a single response, with no length cap or truncation (a large 10-K Item 1A can run tens of thousands of characters).

25 tokens · Since v3.3.1

Why use this

Surgical item-level extraction from 10-K, 10-Q and 8-K filings: return just the Risk Factors (10-K Item 1A), just the MD&A (10-K Item 7 / 10-Q Part I Item 2), or a single 8-K event item (e.g. 2.02 Results of Operations), without parsing the full filing. Powers sentiment-analysis pipelines, NLP topic models, year-over-year diff tools, and LLM-grounded research workflows that need clean input text. One in-house BeautifulSoup engine locates the section and renders it two ways (`type=text` plain text, `type=html` style-preserving HTML).

10-K items supported: 1, 1A, 1B, 1C, 2, 3, 4, 5, 6, 7, 7A, 8, 9, 9A, 9B, 9C, 10–16 (bare codes).

10-Q items supported: Part-prefixed codes `part1item1`…`part2item6` (e.g. `part1item2` = MD&A, `part2item1a` = Risk Factors). The Part prefix disambiguates the Part I / Part II Item-number collision.

8-K items supported: dotted event codes 1.01–9.01 (e.g. 2.02, 5.02, 8.01) plus `signature`.

Foreign annual reports (20-F / 40-F) are not supported here; use the stock-research business-description endpoint for those. Use `POST /api/v1/scrapping/search/` to locate the right filing first.

Common use case

Extracting 'Risk Factors' (10-K Item 1A) for sentiment analysis, the MD&A from a 10-Q (Part I Item 2), or an 8-K's 'Results of Operations' (Item 2.02) for earnings-event monitoring.

Extracts a single item section from a 10-K, 10-Q or 8-K filing as plain text (type=text) or style-preserving HTML (type=html) — a clean sec-api.io Extractor replica. The item grammar is form-specific: 10-K bare codes (1A, 7, …), 10-Q Part-prefixed codes (part1item2, part2item1a), 8-K dotted event codes (2.02, 9.01, …; hyphen 2-2 also accepted, requires form_type=8-K). An invalid item-for-form or an unsupported form (20-F / 40-F) is rejected with a refunded 400; a valid-but-absent item returns a refunded 404. Use POST /api/v1/scrapping/search/ first to locate the right filing + item. (Foreign 20-F / 40-F annual reports are intentionally out of scope here; the first-party stock-research card uses the OOM-safe /business-description endpoint for those.)
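The form-specific item grammar described above can be pre-checked on the client before spending a request. The following is a minimal sketch, not the server's validator: `normalize_item` is a hypothetical helper, the 10-Q pattern only checks the general `partNitemM` shape (the full list of accepted 10-Q codes is not spelled out on this page), and treating `2-02` like `2-2` is an assumption.

```python
import re

# Assumption: these sets mirror the codes listed on this page; the server
# remains the source of truth and may accept or reject differently.
TEN_K_ITEMS = {"1", "1A", "1B", "1C", "2", "3", "4", "5", "6", "7", "7A",
               "8", "9", "9A", "9B", "9C"} | {str(n) for n in range(10, 17)}
TEN_Q_PATTERN = re.compile(r"part[12]item\d{1,2}[a-z]?")  # loose shape check only
EIGHT_K_PATTERN = re.compile(r"([1-9])[.-](\d{1,2})")     # dotted or hyphen form


def normalize_item(form_type: str, item: str) -> str:
    """Return a normalized item code or raise ValueError (hypothetical helper)."""
    code = item.strip()
    if form_type == "10-K":
        # 10-K codes are case-insensitive: "1a" is treated as "1A".
        if code.upper() in TEN_K_ITEMS:
            return code.upper()
    elif form_type == "10-Q":
        # A bare "1" fails here because the Part prefix is required.
        if TEN_Q_PATTERN.fullmatch(code.lower()):
            return code.lower()
    elif form_type == "8-K":
        if code.lower() == "signature":
            return "signature"
        match = EIGHT_K_PATTERN.fullmatch(code)
        if match:
            # "2-2" and "2.02" both become "2.02".
            return f"{match.group(1)}.{int(match.group(2)):02d}"
    else:
        raise ValueError(f"unsupported form_type: {form_type!r}")
    raise ValueError(f"item {item!r} is not valid for {form_type}")
```

A check like this only avoids obviously malformed calls; a code that passes can still come back as 404 when the filing does not contain that item.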

Parameters

Name	In	Required	Default	Allowed	Description	Example
url	query	required	—	—	Direct URL of the filing's primary HTML document on SEC EDGAR (not the index page; it must be the actual HTML body of the 10-K, 10-Q or 8-K). Get this from `filings[].linkToFilingDetails` then drill into the document URL, or from the `filing_url` field on `/api/v1/sec/filings`. The server rejects the request if the URL is not on the `sec.gov` domain or returns non-HTML content.	https://www.sec.gov/Archives/edgar/data/320193/000032019324000123/aapl-20240928.htm
item	query	required	—	—	Item code to extract — the FORMAT depends on `form_type`. 10-K bare canonical codes: `1`, `1A`, `1B`, `1C`, `2`, `3`, `4`, `5`, `6`, `7`, `7A`, `8`, `9`, `9A`, `9B`, `9C`, `10`–`16` (case-insensitive, `1a` = `1A`). 10-Q PART-PREFIXED codes: `part1item1`…`part2item6` (e.g. `part1item2` = MD&A, `part2item1a` = Risk Factors) — the Part prefix is REQUIRED so a Part I vs Part II Item number can never resolve to the wrong section; a bare `item=1` on a 10-Q is rejected with 400. 8-K dotted event codes: `2.02`, `5.02`, `9.01`, … (the hyphen form `2-2` is also accepted and normalized to `2.02`) plus `signature` — 8-K codes require `form_type=8-K`. An item code that is not valid for the given form returns 400 (refunded, not billed). A valid item that is simply absent from that particular filing returns 404 (also refunded).	1A
type	query	optional	text	—	Output format — ONE located section, two renderings (the in-house engine runs the SAME detection for both). `text` (default) returns clean plain-text: tags stripped, HTML entities kept literal (e.g. ` `), tables wrapped between `##TABLE_START` / `##TABLE_END` markers — best for NLP pipelines and LLM grounding. `html` returns STYLE-PRESERVING HTML: the filing's native heading styling (inline `font-weight` / `font-size` / `text-decoration`) is KEPT, while `color` / background / positioning and ALL XSS vectors (`<script>` tags, event-handler attributes, `javascript:` URLs) are stripped via an allowlist sanitizer (nh3) — safe to render in a browser. The response is a RAW string body (Content-Type `text/plain` for `text`, `text/html` for `html`), NOT a JSON envelope. Read the `X-Extractor-Mode` response header to see which path produced the body. On a cold-cache first fetch of a large filing the server may return HTTP 503 with `X-Extractor-Mode: pending` and a `Retry-After` header while it prepares the section in the background — this response is NOT billed; retry after a few seconds. Cost is identical across formats.	text
form_type	query	optional	10-K	—	Form type of the filing — selects the item-code grammar and detection rules. One of `10-K` (default), `10-Q`, or `8-K`. Any other value (e.g. `20-F`, `40-F`) returns 400 ('not supported'), refunded. Because the `item` grammar is form-specific (10-Q needs Part-prefixed codes like `part1item2`, 8-K needs dotted codes like `2.02`), you MUST set `form_type` correctly for 10-Q and 8-K — leaving the `10-K` default with a 10-Q/8-K item yields a 400.	10-K
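The four query parameters in the table above can be assembled as follows. This is a sketch under stated assumptions: `build_extractor_url` is a hypothetical helper, the host is taken from the curl sample on this page, the `sec.gov` check only approximates the server-side rule, and credentials are deliberately left out.

```python
from urllib.parse import urlencode, urlsplit

# Host taken from the curl sample on this page.
BASE_URL = "https://api.finradar.ai/api/v1/scrapping/extractor"


def build_extractor_url(filing_url: str, item: str, *, output: str = "text",
                        form_type: str = "10-K") -> str:
    """Build the request URL for one section (hypothetical helper, no auth)."""
    host = urlsplit(filing_url).hostname or ""
    # Approximation of the documented server-side rule: sec.gov URLs only.
    if host != "sec.gov" and not host.endswith(".sec.gov"):
        raise ValueError("filing URL must be on the sec.gov domain")
    if output not in ("text", "html"):
        raise ValueError("type must be 'text' or 'html'")
    # urlencode percent-encodes the nested filing URL so it survives as one value.
    query = urlencode({"url": filing_url, "item": item,
                       "type": output, "form_type": form_type})
    return f"{BASE_URL}?{query}"
```

For a 10-Q or 8-K, pass `form_type` explicitly, e.g. `build_extractor_url(filing_url, "part1item2", form_type="10-Q")`, since leaving the `10-K` default with such an item yields a 400.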

Response schema

Field	Type	Nullable	Description
(response body)	string	no	This endpoint returns the extracted section as a raw string body, not a JSON envelope. Content-Type is `text/plain` for `type=text` and `text/html` for `type=html`. The `text/html` body is allowlist-sanitized (nh3) before return (no `<script>`, no event-handler attributes, no non-http(s) URLs) and preserves inline heading style (font-weight / font-size / text-decoration) while dropping color/background/positioning. The `X-Extractor-Mode` response header reports which path produced the body (`text` | `structured` | `legacy_html` | `notfound` | `pending`). An invalid item-for-form, or an unsupported form (20-F / 40-F), returns HTTP 400 and is not billed. A valid item that is simply absent from the filing returns HTTP 404, also not billed. On a cold-cache first fetch of a large filing the server may return HTTP 503 + `Retry-After` with `X-Extractor-Mode: pending` (also not billed) while it prepares the section in the background; retry to get the body. Body length ranges from ~1K to 100K+ characters (Item 1A on a large 10-K can exceed 100K).
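For `type=text` bodies, the `##TABLE_START` / `##TABLE_END` markers described under the `type` parameter make it possible to separate narrative prose from tabular data before feeding an NLP pipeline. A minimal sketch, assuming the markers appear as literal, non-nested pairs (the exact surrounding whitespace is not specified on this page, and `split_tables` is a hypothetical helper):

```python
import re

# Assumption: each table is wrapped in exactly one START/END pair, never nested.
TABLE_BLOCK = re.compile(r"##TABLE_START(.*?)##TABLE_END", re.DOTALL)


def split_tables(body: str) -> tuple[str, list[str]]:
    """Return (prose with table blocks removed, list of table block contents)."""
    tables = [block.strip() for block in TABLE_BLOCK.findall(body)]
    prose = TABLE_BLOCK.sub("", body)
    # Collapse the blank-line gaps left behind where tables were cut out.
    prose = re.sub(r"\n{3,}", "\n\n", prose).strip()
    return prose, tables
```

Keeping the tables in a separate list rather than discarding them lets a pipeline score the prose for sentiment while still passing the figures to a numeric parser.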

Sample response

"<p style=\"font-weight:bold;font-size:10pt\">ITEM 1A. RISK FACTORS</p>\n<p style=\"font-size:10pt;font-weight:normal\">You should carefully consider the risks described below together with the other information set forth in this report, which could materially affect our business, financial condition and operating results...</p>"

Errors

Status	Label	Description
200	OK	Request succeeded.
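400	Bad Request	Invalid query, body, or path parameter, including an item code that is not valid for the given form type or an unsupported form type (20-F / 40-F). Refunded, not billed.
401	Unauthorized	Missing or invalid Authorization header / api_Token.
402	Payment Required	Insufficient token balance for this call.
404	Not Found	The item is valid for the form type but absent from this particular filing. Refunded, not billed.
429	Too Many Requests	Rate limit exceeded for your tier (see /pricing for tier limits).
500	Server Error	Unexpected server-side failure. Retry with backoff; report if persistent.
503	Service Unavailable	Cold-cache first fetch of a large filing: the section is being prepared in the background (`X-Extractor-Mode: pending`). Not billed; retry after the `Retry-After` interval.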
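The status codes documented on this page suggest a small client-side dispatch: return the body on 200, wait and retry on the unbilled 503 `pending` response, and surface 400 and 404 as distinct errors. The sketch below is one possible shape, not an official client. The transport is injected as a caller-supplied `fetch` callable returning `(status, headers, body)`, which is a hypothetical interface chosen so the logic stays independent of any HTTP library; header lookup assumes the exact `Retry-After` casing and a value in seconds.

```python
import time
from typing import Callable, Mapping, Tuple

# (status, headers, body) as returned by a caller-supplied transport (hypothetical shape).
Response = Tuple[int, Mapping[str, str], str]


def fetch_section(fetch: Callable[[], Response], *, max_attempts: int = 5,
                  default_wait: float = 3.0,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch() until the section body arrives or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = fetch()
        if status == 200:
            return body
        if status == 404:
            raise LookupError("item is valid but absent from this filing")
        if status == 400:
            raise ValueError(f"request rejected: {body[:200]}")
        if status != 503:
            raise RuntimeError(f"unexpected HTTP {status}")
        # 503 pending: honor Retry-After when it parses as seconds.
        try:
            wait = float(headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait
        if attempt + 1 < max_attempts:
            sleep(wait)
    raise TimeoutError(f"section still pending after {max_attempts} attempts")
```

Injecting `sleep` keeps the function testable without real delays; in production the default `time.sleep` applies.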

Code samples

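curl -G "https://api.finradar.ai/api/v1/scrapping/extractor" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  --data-urlencode "url=https://www.sec.gov/Archives/edgar/data/320193/000032019324000123/aapl-20240928.htm" \
  -d "api_Token=YOUR_API_KEY" -d "item=1A" -d "type=text" -d "form_type=10-K"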

Generate an API key in /account/credentials to run live queries (literal YOUR_API_KEY placeholder shown until then).

Related endpoints

/api/v1/scrapping/search/	25 tokens
/api/v1/scrapping/query/	25 tokens
/api/v1/sec/filings	5 tokens
/api/v1/scrapping/pdf	25 tokens