File Conversion API Reference
Learn about the file conversion endpoints and how to convert various file formats to markdown.
File conversion transforms documents from other formats into markdown, making them easier to process and display. SourceSync supports conversion for many document types, including PDF, DOCX, and PPTX; the full list appears under Supported File Types.
The /v1/convert/file endpoint is a synchronous operation that returns results immediately. It's designed for smaller files only and has size limitations. For larger files or production workloads, we recommend using the /v1/ingest/file endpoint, which processes files asynchronously in the background and can handle much larger documents.
Convert File
Convert a file to markdown format.
Authorization
- Name: Authorization (required)
- Type: string
- Description: Bearer token authentication. Include your API key as Bearer your_api_key.
- Name: Accept (required)
- Type: string
- Description: application/json
Request Form Data
- Name: file (required)
- Type: file
- Description: File to convert.
- Name: ocrConfig (optional)
- Type: object (stringified)
- Description: Configuration for OCR, sent as a JSON string.
- Name: strategy (required; field of ocrConfig)
- Type: enum<string>
- Description: OCR strategy. Defaults to BASIC_PARSER. Available options: BASIC_PARSER, STANDARD_OCR.
Request
curl -X POST https://api.sourcesync.ai/v1/convert/file \
-H "Authorization: Bearer $SOURCE_SYNC_API_KEY" \
-H "Accept: application/json" \
-F 'file=@"/Users/Downloads/sample.pdf"' \
-F 'ocrConfig="{\"strategy\": \"STANDARD_OCR\"}"'
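The same request can be sketched in Python using only the standard library. This is an illustrative client, not an official SDK: the helper names build_multipart and convert_file are hypothetical, and the 30-second client timeout simply mirrors the request timeout listed under File Size Limitations. The per-part Content-Type is guessed from the file extension, which is an assumption about what the server accepts.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

API_URL = "https://api.sourcesync.ai/v1/convert/file"


def build_multipart(file_name, file_bytes, ocr_strategy=None):
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    # Assumption: a MIME type guessed from the extension is acceptable to the server.
    mime = mimetypes.guess_type(file_name)[0] or "application/octet-stream"
    parts = [
        # The "file" form field carries the raw document bytes.
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    ]
    if ocr_strategy is not None:
        # ocrConfig is sent as a stringified JSON object, as in the curl example.
        ocr_config = json.dumps({"strategy": ocr_strategy})
        parts.append(
            (
                f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="ocrConfig"\r\n\r\n'
                f"{ocr_config}\r\n"
            ).encode("utf-8")
        )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def convert_file(path, api_key, ocr_strategy=None, timeout=30):
    """POST a local file to the convert endpoint and return the decoded JSON."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(
            os.path.basename(path), fh.read(), ocr_strategy
        )
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
            "Content-Type": content_type,
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (requires a real file and API key):
# result = convert_file("sample.pdf", os.environ["SOURCE_SYNC_API_KEY"], "STANDARD_OCR")
```

Keeping the body construction in its own function makes it possible to inspect the exact bytes sent without making a network call.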
Response Body
- Name: success (required)
- Type: boolean
- Description: Indicates whether the request succeeded. Always true for successful responses.
- Name: message (required)
- Type: string
- Description: Human-readable message describing the result of the request.
- Name: data (required)
- Type: object
- Description: Data returned from the API.
- Name: documents (required; field of data)
- Type: array<object>
- Description: Details of the converted documents.
- Name: filename (required; field of each documents entry)
- Type: string
- Description: Name of the converted file.
- Name: markdown (required; field of each documents entry)
- Type: string
- Description: Converted content in markdown format.
Response
{
  "success": true,
  "message": "File converted successfully",
  "data": {
    "documents": [
      {
        "filename": "sample.pdf",
        "markdown": "# Sample Document\n\nThis is the converted content of the document in markdown format.\n\n## Section 1\n\nContent of section 1.\n\n## Section 2\n\nContent of section 2."
      }
    ]
  }
}
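A response of this shape can be unpacked as in the following sketch. The ConvertedDocument class and parse_convert_response function are hypothetical names introduced for illustration, and the sample payload is a shortened version of the response shown here. The structure of failure responses is not documented in this reference, so the sketch only surfaces the message field when success is not true.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConvertedDocument:
    filename: str
    markdown: str


def parse_convert_response(payload: dict) -> List[ConvertedDocument]:
    """Extract the converted documents from a decoded response body."""
    if not payload.get("success"):
        # Assumption: a failure payload still carries a "message" field.
        raise ValueError(payload.get("message", "conversion failed"))
    documents = payload["data"]["documents"]
    return [ConvertedDocument(doc["filename"], doc["markdown"]) for doc in documents]


# Shortened sample payload with the same structure as the documented response.
sample = {
    "success": True,
    "message": "File converted successfully",
    "data": {
        "documents": [
            {
                "filename": "sample.pdf",
                "markdown": "# Sample Document\n\nThis is the converted content.",
            }
        ]
    },
}

for document in parse_convert_response(sample):
    print(document.filename, len(document.markdown))
```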
Usage Guidelines
Synchronous Processing
The /v1/convert/file endpoint processes files synchronously and returns results immediately, which makes it ideal for:
- Quick document previews
- Small files
- Testing and development environments
- User-facing applications where immediate results are needed
File Size Limitations
Due to its synchronous nature, this endpoint has the following limitations:
- Request timeout: 30 seconds
- For larger files, the request may time out before processing completes
For Larger Files
Recommended approach: use the /v1/ingest/file endpoint, which:
- Processes files asynchronously
- Can handle much larger documents
- Provides better scaling for production workloads
- Includes document tracking and status updates
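The guidance above can be turned into a simple client-side routing rule, sketched below. This reference does not publish the actual size limit, so SYNC_SIZE_CUTOFF_BYTES is a purely hypothetical value chosen for illustration, and pick_endpoint is a hypothetical helper name; tune the cutoff against the timeouts you observe.

```python
CONVERT_PATH = "/v1/convert/file"
INGEST_PATH = "/v1/ingest/file"

# Hypothetical cutoff: the real size limit is not stated in this reference.
SYNC_SIZE_CUTOFF_BYTES = 5 * 1024 * 1024


def pick_endpoint(
    size_bytes: int,
    needs_immediate_result: bool,
    cutoff: int = SYNC_SIZE_CUTOFF_BYTES,
) -> str:
    """Route small, interactive conversions to the synchronous endpoint."""
    if needs_immediate_result and size_bytes <= cutoff:
        return CONVERT_PATH
    # Large files and background workloads go to the asynchronous endpoint.
    return INGEST_PATH


print(pick_endpoint(200_000, needs_immediate_result=True))
print(pick_endpoint(80_000_000, needs_immediate_result=True))
```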
OCR Configuration
The /v1/convert/file endpoint supports Optical Character Recognition (OCR) for extracting text from images or documents that contain images.
- Name: OCR Strategy
- Description: Choose between basic parsing and OCR. BASIC_PARSER is the default: it is faster but less accurate for image-heavy documents. STANDARD_OCR is more accurate for documents with images or scanned content.
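Because ocrConfig travels as a stringified JSON object, it can help to build and validate it in one place before sending the request. The sketch below uses the two documented strategy values; the function names build_ocr_config and strategy_for are hypothetical, and the rule of switching to STANDARD_OCR for scanned input is a client-side heuristic rather than documented server behavior.

```python
import json

OCR_STRATEGIES = ("BASIC_PARSER", "STANDARD_OCR")


def build_ocr_config(strategy: str = "BASIC_PARSER") -> str:
    """Return the stringified value for the ocrConfig form field."""
    if strategy not in OCR_STRATEGIES:
        raise ValueError(f"unknown OCR strategy: {strategy!r}")
    return json.dumps({"strategy": strategy})


def strategy_for(is_scanned_or_image_heavy: bool) -> str:
    """Heuristic: pay the OCR cost only when the document likely needs it."""
    return "STANDARD_OCR" if is_scanned_or_image_heavy else "BASIC_PARSER"


print(build_ocr_config(strategy_for(True)))
```

Validating locally avoids a round trip that would otherwise end in INVALID_OCR_CONFIG.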
Supported File Types
- Name: Document Files
- Description: PDF, DOCX, DOC, TXT, RTF, ODT
- Name: Presentation Files
- Description: PPTX, PPT, ODP
- Name: Spreadsheet Files
- Description: XLSX, XLS, CSV, ODS
- Name: Image Files
- Description: PNG, JPG, JPEG, TIFF, GIF, BMP (OCR is applied automatically)
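A client can check a file against this list before uploading, as in the sketch below. It assumes the file extension reflects the actual format; whether the server checks extensions or file contents is not stated here. The file_category helper and the category keys are hypothetical names.

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the Supported File Types list, lowercased.
SUPPORTED_EXTENSIONS = {
    "document": {"pdf", "docx", "doc", "txt", "rtf", "odt"},
    "presentation": {"pptx", "ppt", "odp"},
    "spreadsheet": {"xlsx", "xls", "csv", "ods"},
    "image": {"png", "jpg", "jpeg", "tiff", "gif", "bmp"},
}


def file_category(path: str) -> Optional[str]:
    """Return the category for a supported file, or None if unsupported."""
    extension = Path(path).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if extension in extensions:
            return category
    return None


for name in ("report.PDF", "slides.pptx", "scan.jpeg", "archive.zip"):
    print(name, file_category(name))
```

Rejecting unsupported extensions locally saves a request that would otherwise fail with UNSUPPORTED_FILE_TYPE.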
Error Codes
- Name: FILE_TOO_LARGE
- Description: The uploaded file exceeds the size limit.
- Name: UNSUPPORTED_FILE_TYPE
- Description: The file format is not supported for conversion.
- Name: CONVERT_FILE_FAILED
- Description: Internal error during the file conversion process.
- Name: INVALID_OCR_CONFIG
- Description: Invalid OCR configuration provided.
- Name: REQUEST_TIMEOUT
- Description: The request timed out due to file size or processing complexity.
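One way to act on these codes is to map each to a follow-up action, as sketched below. The mapping is a suggested client-side policy, not documented behavior, and the Action enum and action_for function are hypothetical names. This reference does not show where the code appears in an error response, so the sketch takes the code as a plain string and leaves its extraction to the caller.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    USE_INGEST = "send the file to /v1/ingest/file instead"
    FIX_REQUEST = "correct the request before resending"
    RETRY_OR_REPORT = "retry later, and report the failure if it persists"


# Suggested policy (an assumption, not part of the API contract).
ERROR_ACTIONS = {
    "FILE_TOO_LARGE": Action.USE_INGEST,
    "REQUEST_TIMEOUT": Action.USE_INGEST,
    "UNSUPPORTED_FILE_TYPE": Action.FIX_REQUEST,
    "INVALID_OCR_CONFIG": Action.FIX_REQUEST,
    "CONVERT_FILE_FAILED": Action.RETRY_OR_REPORT,
}


def action_for(error_code: str) -> Optional[Action]:
    """Return the suggested action, or None for codes not listed here."""
    return ERROR_ACTIONS.get(error_code)


for code in ("FILE_TOO_LARGE", "INVALID_OCR_CONFIG", "SOMETHING_ELSE"):
    print(code, action_for(code))
```

Size and timeout errors both point to the asynchronous endpoint because this reference recommends /v1/ingest/file for larger files.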