Extract Files
API that converts PDFs, images, and audio files into Markdown text, preserving formatting such as tables and charts.
The proj_id parameter is now deprecated. Please use asset_ids to specify which assets to extract.
Authorizations
Headers
Pass in an API version to guarantee a consistent response format. Use the latest version for all new API calls, and update existing API calls to the latest version when possible.
Valid versions:
- Latest API version (recommended): 2025-03
- Previous API version (maintenance mode): 2025-02
If no API version header is included, the response format is considered unstable and could change without notice (not recommended).
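As a minimal sketch of pinning the version on every call, the helper below builds a header dictionary and rejects any version other than the two listed above. The header name `API-Version` and the `Bearer` authorization scheme are assumptions made for illustration; this page does not state either, so check them against the actual header reference before use.

```python
# Versions taken from the "Valid versions" list above.
SUPPORTED_VERSIONS = ("2025-03", "2025-02")
LATEST_VERSION = "2025-03"


def build_headers(api_key, version=LATEST_VERSION):
    """Return request headers with an explicit API version.

    ASSUMPTIONS: the header name "API-Version" and the "Bearer" scheme
    are hypothetical placeholders, not confirmed by this page.
    """
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(
            f"Unsupported API version {version!r}; expected one of {SUPPORTED_VERSIONS}"
        )
    return {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "API-Version": version,                # hypothetical header name
        "Content-Type": "application/json",
    }
```

Defaulting to the latest version keeps new callers on the recommended format, while an explicit `version="2025-02"` remains available for integrations still in maintenance mode.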
Body
Optional. Asset IDs to run extraction on. Overrides transform_id.
Optional. Transformation ID of the transform housing the assets to extract.
Enum representing different parsing strategies. Note that ocr and xml will be deprecated soon.
Values: optimized, ocr, xml, markdown, advanced_markdown
Optional, defaults to False. If True, a response will come back after at most 120 seconds with the extracted values.
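The body fields described above could be assembled roughly as follows. Only `asset_ids` and `transform_id` are names that appear on this page; the field names `strategy` and `sync` are hypothetical stand-ins for the parsing-strategy enum and the boolean flag, whose real names are not shown here. The sketch only builds the JSON payload and does not send a request.

```python
import json
import warnings

# Enum values as listed above; ocr and xml are flagged as soon to be deprecated.
STRATEGIES = ("optimized", "ocr", "xml", "markdown", "advanced_markdown")
SOON_DEPRECATED = ("ocr", "xml")


def build_extract_body(strategy, asset_ids=None, transform_id=None, sync=False):
    """Build the JSON body for an extraction request.

    ASSUMPTIONS: the keys "strategy" and "sync" are hypothetical names;
    "asset_ids" and "transform_id" come from this page.
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"Unknown strategy {strategy!r}; expected one of {STRATEGIES}")
    if strategy in SOON_DEPRECATED:
        warnings.warn(f"Strategy {strategy!r} will be deprecated soon", DeprecationWarning)

    body = {"strategy": strategy, "sync": bool(sync)}
    if asset_ids:
        # asset_ids overrides transform_id, so only asset_ids is sent when both are given.
        body["asset_ids"] = list(asset_ids)
    elif transform_id is not None:
        body["transform_id"] = transform_id
    return body


# Example payload: extract two assets and wait for the result.
payload = json.dumps(build_extract_body("markdown", asset_ids=["a1", "a2"], sync=True))
```

Dropping `transform_id` on the client when `asset_ids` is present simply mirrors the documented override, so the payload says what the server will actually act on.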