Documents API Reference
Learn about the document management endpoints and how to work with your ingested content.
Documents are the core content units in SourceSync. Each document can carry metadata that helps you organize and filter content. For detailed examples and best practices on pagination, see our Document Management Guide.
Fetch Documents
Fetch documents with optional filters and pagination.
Authorization
- Name
 Authorization*- Type
 - string
 - Description
 - Bearer token authentication. Include your API key as 
Bearer your_api_key 
- Name
 Accept*- Type
 - string
 - Description
 application/json
Request Body
- Name
 namespaceId*- Type
 - string
 - Description
 - Unique identifier of the namespace containing the documents
 
- Name
 filterConfig*- Type
 - object
 - Description
 - Configuration for filtering documents
 
- Name
 documentIds- Type
 - array<string>(optional)
 - Description
 - List of document IDs to filter
 
- Name
 documentExternalIds- Type
 - array<string>(optional)
 - Description
 - List of external document IDs to filter
 
- Name
 documentConnectionIds- Type
 - array<string>(optional)
 - Description
 - List of connection IDs to filter
 
- Name
 documentTypes- Type
 - array<enum<string>>(optional)
 - Description
 - List of document types to filter. Available options:
TEXT,URL,SITEMAP,WEBSITE 
- Name
 documentIngestionSources- Type
 - array<enum<string>>(optional)
 - Description
 - List of ingestion sources to filter. Available options:
TEXT,LOCAL_FILE,URLS_LIST,SITEMAP,WEBSITE,NOTION,GOOGLE_DRIVE,DROPBOX,ONEDRIVE,BOX,CONFLUENCE 
- Name
 documentIngestionStatuses- Type
 - array<enum<string>>(optional)
 - Description
 - List of ingestion statuses to filter. Available options:
BACKLOG,QUEUED,QUEUED_FOR_RESYNC,PROCESSING,SUCCESS,FAILED,CANCELLED 
- Name
 metadata- Type
 - object(optional)
 - Description
 - Metadata filters to apply
 
- Name
 includeConfig- Type
 - object(optional)
 - Description
 - Include options
 
- Name
 documents- Type
 - boolean(optional)
 - Description
 - Whether to include the documents in the response. Defaults to true
 
- Name
 stats- Type
 - boolean(optional)
 - Description
 - Whether to include the stats in the response. Defaults to false. This is a legacy field and will be deprecated. Use statsBySource instead.
 
- Name
 statsBySource- Type
 - boolean(optional)
 - Description
 - Whether to include the stats by source in the response. Defaults to false
 
- Name
 statsByStatus- Type
 - boolean(optional)
 - Description
 - Whether to include the stats by status in the response. Defaults to false
 
- Name
 statsByDocumentType- Type
 - boolean(optional)
 - Description
 - Whether to include the stats by document type in the response. Defaults to false
 
- Name
 rawFileUrl- Type
 - boolean(optional)
 - Description
 - Whether to include the raw file URL in the response. Defaults to false
 
- Name
 parsedTextFileUrl- Type
 - boolean(optional)
 - Description
 - Whether to include the parsed text file URL in the response. Defaults to false
 
- Name
 pagination- Type
 - object(optional)
 - Description
 - Pagination options
 
- Name
 pageSize- Type
 - number(optional)
 - Description
 - Number of documents per page (1-100, default: 20)
 
- Name
 cursor- Type
 - string(optional)
 - Description
 - Opaque cursor for fetching the next page
 
Request
curl -X POST https://api.sourcesync.ai/v1/documents \
  -H "Authorization: Bearer $SOURCE_SYNC_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "filterConfig": {
      "documentIds": ["doc_123", "doc_124"],
      "documentExternalIds": ["external_123"],
      "documentConnectionIds": ["conn_123"],
      "documentTypes": ["URL", "GOOGLE_DRIVE_DOCUMENT"],
      "documentIngestionSources": ["WEBSITE", "GOOGLE_DRIVE"],
      "documentIngestionStatuses": ["SUCCESS", "FAILED"],
      "metadata": {
        "category": "security",
        "status": "published"
      }
    },
    "includeConfig": {
      "documents": true,
      "statsBySource": true,
      "statsByStatus": true,
      "statsByDocumentType": true,
      "rawFileUrl": true,
      "parsedTextFileUrl": true
    },
    "pagination": {
      "pageSize": 10,
      "cursor": "eyJjcmVhdGVkQXQiOi..."
    }
  }'
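The cursor-based pagination described above can be sketched in Python as follows. This is a minimal sketch, not an official client: `iter_documents` and the `post` callable are hypothetical names introduced here, and `post` is assumed to send the JSON body to the Fetch Documents endpoint and return the parsed JSON response. Only the field names (`pagination.pageSize`, `pagination.cursor`, `data.hasNextPage`, `data.nextCursor`, `data.documents`) come from this reference.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: takes the request body, returns the parsed JSON response.
PostFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_documents(
    post: PostFn,
    namespace_id: str,
    filter_config: Optional[Dict[str, Any]] = None,
    page_size: int = 20,
) -> Iterator[Dict[str, Any]]:
    """Yield every matching document, following nextCursor page by page."""
    # The reference documents pageSize as 1-100 with a default of 20.
    if not 1 <= page_size <= 100:
        raise ValueError("pageSize must be between 1 and 100")
    cursor: Optional[str] = None
    while True:
        pagination: Dict[str, Any] = {"pageSize": page_size}
        if cursor is not None:
            # The cursor is opaque: pass it back unchanged.
            pagination["cursor"] = cursor
        body = {
            "namespaceId": namespace_id,
            "filterConfig": filter_config or {},
            "pagination": pagination,
        }
        data = post(body)["data"]
        yield from data.get("documents", [])
        cursor = data.get("nextCursor")
        # Stop when the API reports no further pages or returns no cursor.
        if not data.get("hasNextPage") or cursor is None:
            return
```

Injecting the transport keeps the loop independent of any particular HTTP library; in practice `post` would wrap a POST request with the Authorization and Accept headers listed above.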
Response Body
- Name
 success*- Type
 - boolean
 - Description
 - Indicates whether the request succeeded. This is always true for successful responses.
 
- Name
 message*- Type
 - string
 - Description
 - Human-readable message describing the result of the request
 
- Name
 data*- Type
 - object
 - Description
 - Data returned from the API.
 
- Name
 itemsReturned*- Type
 - number
 - Description
 - Number of documents returned in the current page
 
- Name
 hasNextPage*- Type
 - boolean
 - Description
 - Whether more documents are available
 
- Name
 nextCursor- Type
 - string(optional)
 - Description
 - Cursor for fetching the next page, or undefined if there are no more pages
 
- Name
 statsBySource- Type
 - array<object>(optional)
 - Description
 - Stats of the documents by ingestion source. This will be present when includeConfig.statsBySource is set to true in the request
 
- Name
 source*- Type
 - enum<string>
 - Description
 - Ingestion source of the document. Available options:
TEXT,LOCAL_FILE,URLS_LIST,SITEMAP,WEBSITE,NOTION,GOOGLE_DRIVE,DROPBOX,ONEDRIVE,BOX,CONFLUENCE 
- Name
 totalCount*- Type
 - number
 - Description
 - Total number of documents ingested via this source
 
- Name
 statsByStatus- Type
 - array<object>(optional)
 - Description
 - Stats of the documents by status. This will be present when includeConfig.statsByStatus is set to true in the request
 
- Name
 status*- Type
 - enum<string>
 - Description
 - Status of the document. Available options:
BACKLOG,QUEUED,QUEUED_FOR_RESYNC,QUEUED_FOR_UPDATE,QUEUED_FOR_DELETION,PROCESSING,SUCCESS,FAILED,CANCELLED 
- Name
 totalCount*- Type
 - number
 - Description
 - Total number of documents with this status
 
- Name
 statsByDocumentType- Type
 - array<object>(optional)
 - Description
 - Stats of the documents by document type. This will be present when includeConfig.statsByDocumentType is set to true in the request
 
- Name
 documentType*- Type
 - enum<string>
 - Description
 - Document type of the document. Available options:
TEXT,FILE,URL,NOTION_DOCUMENT,GOOGLE_DRIVE_DOCUMENT,DROPBOX_DOCUMENT,ONEDRIVE_DOCUMENT,BOX_DOCUMENT,CONFLUENCE_DOCUMENT 
- Name
 totalCount*- Type
 - number
 - Description
 - Total number of documents of this document type
 
- Name
 documents*- Type
 - array<object>
 - Description
 - List of documents
 
- Name
 id*- Type
 - string
 - Description
 - Unique identifier of the document
 
- Name
 name- Type
 - string | null(optional)
 - Description
 - Name of the document
 
- Name
 externalId*- Type
 - string
 - Description
 - External identifier of the document
 
- Name
 documentType*- Type
 - enum<string>
 - Description
 - Type of the document. Available options:
TEXT,URL,FILE,NOTION_DOCUMENT,GOOGLE_DRIVE_DOCUMENT,DROPBOX_DOCUMENT,ONEDRIVE_DOCUMENT,BOX_DOCUMENT,CONFLUENCE_DOCUMENT 
- Name
 ingestionSource*- Type
 - enum<string>
 - Description
 - Source from which the document was ingested. Available options:
TEXT,URLS_LIST,SITEMAP,WEBSITE,LOCAL_FILE,NOTION,GOOGLE_DRIVE,DROPBOX,ONEDRIVE,BOX,CONFLUENCE 
- Name
 ingestionStatus*- Type
 - enum<string>
 - Description
 - Ingestion status of the document. Available options:
BACKLOG,QUEUED,PROCESSING,SUCCESS,FAILED,CANCELLED 
- Name
 ingestionError- Type
 - string | null(optional)
 - Description
 - Error message if the document ingestion failed
 
- Name
 ingestJob*- Type
 - object
 - Description
 - Details of the ingest job
 
- Name
 id*- Type
 - string
 - Description
 - ID of the ingest job
 
- Name
 ingestJobRun*- Type
 - object
 - Description
 - Details of the ingest job run
 
- Name
 id*- Type
 - string
 - Description
 - ID of the ingest job run
 
- Name
 connection- Type
 - object(optional)
 - Description
 - Details of the connection
 
- Name
 id*- Type
 - string
 - Description
 - ID of the connection from which the document was ingested
 
- Name
 documentProperties- Type
 - object(optional)
 - Description
 - Properties of the document
 
- Name
 mimeType- Type
 - string(optional)
 - Description
 - MIME type of the document
 
- Name
 fileSize- Type
 - number(optional)
 - Description
 - Size of the document file in bytes
 
- Name
 characterCount- Type
 - number(optional)
 - Description
 - Number of characters in the document
 
- Name
 tokenCount- Type
 - number(optional)
 - Description
 - Number of tokens in the document
 
- Name
 embeddingCount- Type
 - number(optional)
 - Description
 - Number of embeddings in the document
 
- Name
 ocrPagesCount- Type
 - number(optional)
 - Description
 - Number of pages processed by OCR in the document
 
- Name
 embeddingConfig- Type
 - object(optional)
 - Description
 - Configuration of the embedding model
 
- Name
 provider- Type
 - enum<string>(optional)
 - Description
 - Provider of the embedding model used. Available options:
OPENAI,COHERE,JINA 
- Name
 model- Type
 - enum<string>(optional)
 - Description
 - Embedding model used to create the embeddings. Available options:
text-embedding-3-small,text-embedding-3-large,text-embedding-ada-002,embed-english-v3.0,embed-multilingual-v3.0,embed-english-light-v3.0,embed-multilingual-light-v3.0,embed-english-v2.0,embed-english-light-v2.0,embed-multilingual-v2.0,jina-embeddings-v3 
- Name
 dimensions- Type
 - number(optional)
 - Description
 - Dimensions of the embedding model used
 
- Name
 chunkSize- Type
 - number(optional)
 - Description
 - Number of tokens in each chunk
 
- Name
 chunkOverlap- Type
 - number(optional)
 - Description
 - Number of tokens to overlap between chunks
 
- Name
 providers- Type
 - object(optional)
 - Description
 - Providers used to ingest the document
 
- Name
 fileStorage- Type
 - enum<string>(optional)
 - Description
 - Type of the file storage used. Available options:
S3_COMPATIBLE 
- Name
 vectorStorage- Type
 - enum<string>(optional)
 - Description
 - Provider of the vector storage used. Available options:
PINECONE 
- Name
 embeddingModel- Type
 - enum<string>(optional)
 - Description
 - Provider of the embedding model used. Available options:
OPENAI,COHERE,JINA 
- Name
 webScraper- Type
 - enum<string>(optional)
 - Description
 - Provider of the web scraper used if the document is from a web source. Available options:
FIRECRAWL,JINA,SCRAPINGBEE 
- Name
 metadata*- Type
 - object
 - Description
 - Metadata associated with the document
 
- Name
 namespace*- Type
 - object
 - Description
 - Details of the namespace containing the document
 
- Name
 identifier*- Type
 - string
 - Description
 - Unique identifier of the namespace
 
- Name
 organization*- Type
 - object
 - Description
 - Details of the organization
 
- Name
 id*- Type
 - string
 - Description
 - ID of the organization containing the document
 
- Name
 createdAt*- Type
 - object
 - Description
 - Timestamp when the document was created
 
- Name
 isoString*- Type
 - string
 - Description
 - ISO 8601 formatted timestamp
 
- Name
 updatedAt*- Type
 - object
 - Description
 - Timestamp when the document was last updated
 
- Name
 isoString*- Type
 - string
 - Description
 - ISO 8601 formatted timestamp
 
Response
{
  "success": true,
  "message": "Documents retrieved successfully",
  "data": {
    "itemsReturned": 10,
    "hasNextPage": true,
    "nextCursor": "eyJjcmVhdGVkQXQiOi...",
    "statsBySource": [
      {
        "source": "WEBSITE",
        "totalCount": 5
      },
      {
        "source": "LOCAL_FILE",
        "totalCount": 2
      },
      {
        "source": "GOOGLE_DRIVE",
        "totalCount": 3
      }
    ],
    "statsByStatus": [
      {
        "status": "BACKLOG",
        "totalCount": 1
      },
      {
        "status": "QUEUED",
        "totalCount": 1
      },
      {
        "status": "QUEUED_FOR_RESYNC",
        "totalCount": 1
      },
      {
        "status": "QUEUED_FOR_UPDATE",
        "totalCount": 1
      },
      {
        "status": "PROCESSING",
        "totalCount": 1
      },
      {
        "status": "SUCCESS",
        "totalCount": 5
      },
      {
        "status": "FAILED",
        "totalCount": 2
      },
      {
        "status": "CANCELLED",
        "totalCount": 1
      }
    ],
    "statsByDocumentType": [      
      {
        "documentType": "TEXT",
        "totalCount": 2
      },
      {
        "documentType": "URL",
        "totalCount": 5
      },
      {
        "documentType": "FILE",
        "totalCount": 3
      },
      {
        "documentType": "GOOGLE_DRIVE_DOCUMENT",
        "totalCount": 1
      }
    ],
    "documents": [
      {
        "id": "doc_123",
        "name": "https://example.com",
        "externalId": "external_123",
        "documentType": "URL",
        "ingestionSource": "WEBSITE",
        "ingestionStatus": "SUCCESS",
        "ingestionError": null,
        "ingestJob": {
          "id": "job_123"
        },
        "ingestJobRun": {
          "id": "job_run_123"
        },
        "connection": {
          "id": "conn_123"
        },
        "documentProperties": {
          "mimeType": "text/html",
          "fileSize": 1347,
          "characterCount": 1335,
          "tokenCount": 340,
          "embeddingCount": 1,
          "ocrPagesCount": 0
        },
        "embeddingConfig": {
          "provider": "OPENAI",
          "model": "text-embedding-3-small",
          "dimensions": 1536,
          "chunkSize": 1024,
          "chunkOverlap": 256
        },
        "providers": {
          "fileStorage": "S3_COMPATIBLE",
          "vectorStorage": "PINECONE",
          "embeddingModel": "OPENAI",
          "webScraper": "FIRECRAWL"
        },
        "metadata": {
          "category": "security",
          "status": "published"
        },
        "namespace": {
          "identifier": "ns_123"
        },
        "organization": {
          "id": "org_123"
        },
        "createdAt": {
          "isoString": "2024-01-01T00:00:00Z"
        },
        "updatedAt": {
          "isoString": "2024-01-01T00:00:00Z"
        },
        "rawFileUrl": "https://example.com/raw.html",
        "parsedTextFileUrl": "https://example.com/parsed.txt"
      },
      {
        "id": "doc_124",
        "name": "doc.pdf",
        "externalId": "external_124",
        "documentType": "GOOGLE_DRIVE_DOCUMENT",
        "ingestionSource": "GOOGLE_DRIVE",
        "ingestionStatus": "SUCCESS",
        "ingestionError": null,
        "ingestJob": {
          "id": "job_123"
        },
        "ingestJobRun": {
          "id": "job_run_123"
        },
        "connection": {
          "id": "conn_123"
        },
        "documentProperties": {
          "mimeType": "application/pdf",
          "fileSize": 1347,
          "characterCount": 1335,
          "tokenCount": 340,
          "embeddingCount": 1,
          "ocrPagesCount": 1
        },
        "embeddingConfig": {
          "provider": "OPENAI",
          "model": "text-embedding-3-small",
          "dimensions": 1536,
          "chunkSize": 1024,
          "chunkOverlap": 256
        },
        "providers": {
          "fileStorage": "S3_COMPATIBLE",
          "vectorStorage": "PINECONE",
          "embeddingModel": "OPENAI",
          "webScraper": "FIRECRAWL"
        },
        "metadata": {
          "category": "security",
          "status": "published"
        },
        "namespace": {
          "identifier": "ns_123"
        },
        "organization": {
          "id": "org_123"
        },
        "createdAt": {
          "isoString": "2024-01-01T00:00:00Z"
        },
        "updatedAt": {
          "isoString": "2024-01-01T00:00:00Z"
        },
        "rawFileUrl": "https://example.com/raw.pdf",
        "parsedTextFileUrl": "https://example.com/parsed.txt"
      }
    ]
  }
}
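The three stats arrays in the response share one shape: a list of objects pairing a key (`source`, `status`, or `documentType`) with a `totalCount`. As a rough illustration of consuming them, the sketch below collapses each array into a plain dictionary. The helper name `stats_to_counts` is hypothetical; it takes the `data` object of an already parsed response and relies only on the field names documented above.

```python
from typing import Any, Dict


def stats_to_counts(data: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    """Collapse the optional stats arrays into {key: totalCount} dictionaries."""
    # Maps each stats field to the name of the key inside its entries.
    key_by_field = {
        "statsBySource": "source",
        "statsByStatus": "status",
        "statsByDocumentType": "documentType",
    }
    counts: Dict[str, Dict[str, int]] = {}
    for field, key in key_by_field.items():
        # Each array is present only when requested through includeConfig.
        if field in data:
            counts[field] = {row[key]: row["totalCount"] for row in data[field]}
    return counts
```

Fields that were not requested are simply left out of the result, so callers can check for a key before reading it.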
Update Documents
Update metadata of documents based on filters.
Authorization
- Name
 Authorization*- Type
 - string
 - Description
 - Bearer token authentication. Include your API key as 
Bearer your_api_key 
- Name
 Accept*- Type
 - string
 - Description
 application/json
Request Body
- Name
 namespaceId*- Type
 - string
 - Description
 - Unique identifier of the namespace containing the documents
 
- Name
 filterConfig*- Type
 - object
 - Description
 - Configuration for filtering documents
 
- Name
 documentIds- Type
 - array<string>(optional)
 - Description
 - List of document IDs to filter
 
- Name
 documentExternalIds- Type
 - array<string>(optional)
 - Description
 - List of external document IDs to filter
 
- Name
 documentConnectionIds- Type
 - array<string>(optional)
 - Description
 - List of connection IDs to filter
 
- Name
 documentTypes- Type
 - array<enum<string>>(optional)
 - Description
 - List of document types to filter. Available options:
TEXT,URL,SITEMAP,WEBSITE 
- Name
 documentIngestionSources- Type
 - array<enum<string>>(optional)
 - Description
 - List of ingestion sources to filter. Available options:
TEXT,LOCAL_FILE,URLS_LIST,SITEMAP,WEBSITE,NOTION,GOOGLE_DRIVE,DROPBOX,ONEDRIVE,BOX,CONFLUENCE 
- Name
 documentIngestionStatuses- Type
 - array<enum<string>>(optional)
 - Description
 - List of ingestion statuses to filter. Available options:
BACKLOG,QUEUED,QUEUED_FOR_RESYNC,PROCESSING,SUCCESS,FAILED,CANCELLED 
- Name
 metadata- Type
 - object(optional)
 - Description
 - Metadata filters to apply
 
- Name
 includeConfig- Type
 - object(optional)
 - Description
 - Include options
 
- Name
 documents- Type
 - boolean(optional)
 - Description
 - Whether to include the documents in the response. Defaults to true
 
- Name
 stats- Type
 - boolean(optional)
 - Description
 - Whether to include the stats in the response. Defaults to false. This is a legacy field and will be deprecated. Use statsBySource instead.
 
- Name
 statsBySource- Type
 - boolean(optional)
 - Description
 - Whether to include the stats by source in the response. Defaults to false
 
- Name
 statsByStatus- Type
 - boolean(optional)
 - Description
 - Whether to include the stats by status in the response. Defaults to false
 
- Name
 statsByDocumentType- Type
 - boolean(optional)
 - Description
 - Whether to include the stats by document type in the response. Defaults to false
 
- Name
 rawFileUrl- Type
 - boolean(optional)
 - Description
 - Whether to include the raw file URL in the response. Defaults to false
 
- Name
 parsedTextFileUrl- Type
 - boolean(optional)
 - Description
 - Whether to include the parsed text file URL in the response. Defaults to false
 
- Name
 pagination- Type
 - object(optional)
 - Description
 - Pagination options
 
- Name
 pageSize- Type
 - number(optional)
 - Description
 - Number of documents per page (1-100, default: 20)
 
- Name
 cursor- Type
 - string(optional)
 - Description
 - Opaque cursor for fetching the next page
 
- Name
 data*- Type
 - object
 - Description
 - Data to update in the documents
 
- Name
 metadata- Type
 - object(optional)
 - Description
 - Metadata to update in the documents. This is a legacy field and will be deprecated. Use $metadata instead.
 
- Name
 $metadata- Type
 - object(optional)
 - Description
 - Advanced metadata to update in the documents
 
- Name
 $set- Type
 - object(optional)
 - Description
 - Set or replace metadata fields with the given values. Applicable to both string and array values.
 
- Name
 $append- Type
 - object(optional)
 - Description
 - Append the given values to existing metadata arrays. Applicable only to array values.
 
- Name
 $remove- Type
 - object(optional)
 - Description
 - Remove the given values from existing metadata arrays. Applicable only to array values.
 
Request
curl -X PATCH https://api.sourcesync.ai/v1/documents \
  -H "Authorization: Bearer $SOURCE_SYNC_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "filterConfig": {
      "documentIds": ["doc_123"],
      "documentExternalIds": [],
      "documentConnectionIds": [],
      "documentTypes": ["URL"],
      "documentIngestionSources": ["WEBSITE"],
      "documentIngestionStatuses": ["SUCCESS"],
      "metadata": {
        "category": "security"
      }
    },
    "pagination": {
      "pageSize": 10,
      "cursor": "eyJjcmVhdGVkQXQiOi..."
    },
    "data": {
      "metadata": {
        "status": "archived",
        "archivedAt": "2024-01-15T00:00:00Z"
      },
      "$metadata": {
        "$set": {
          "status": "archived",
          "category": ["security", "networking"]
        },
        "$append": {
          "complexity": ["advanced"]
        },
        "$remove": {
          "apiVersion": ["v0"]
        }
      }
    }
  }'
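To make the `$set`, `$append` and `$remove` operators concrete, the sketch below models their documented effect on a single metadata object in plain Python. It is a local illustration only, not the server's implementation. The function name `apply_metadata_update` is hypothetical, and several behaviors are assumptions this reference does not confirm: the order in which the operators are applied, that `$append` creates the array when the key is missing, and that duplicates are kept.

```python
import copy
from typing import Any, Dict


def apply_metadata_update(
    metadata: Dict[str, Any], update: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a new metadata dict with $set, $append and $remove applied."""
    result = copy.deepcopy(metadata)

    # $set: replace the value outright (strings and arrays alike).
    for key, value in update.get("$set", {}).items():
        result[key] = copy.deepcopy(value)

    # $append: extend an existing array (assumed to create it when missing).
    for key, values in update.get("$append", {}).items():
        existing = result.get(key, [])
        if not isinstance(existing, list):
            raise TypeError(f"$append needs an array value for {key!r}")
        result[key] = existing + list(values)

    # $remove: drop the listed values from an existing array.
    for key, values in update.get("$remove", {}).items():
        existing = result.get(key)
        if existing is None:
            continue  # Assumption: removing from a missing key is a no-op.
        if not isinstance(existing, list):
            raise TypeError(f"$remove needs an array value for {key!r}")
        result[key] = [item for item in existing if item not in values]

    return result
```

Applied to a document whose metadata starts with `apiVersion` set to `["v0", "v1"]`, the `$metadata` object from the request above would leave `apiVersion` as `["v1"]` under these assumptions.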
Response Body
- Name
 success*- Type
 - boolean
 - Description
 - Indicates whether the request succeeded. This is always true for successful responses.
 
- Name
 message*- Type
 - string
 - Description
 - Human-readable message describing the result of the request
 
- Name
 data*- Type
 - object
 - Description
 - Data returned from the API.
 
- Name
 itemsUpdated*- Type
 - number
 - Description
 - Number of documents updated
 
- Name
 documents*- Type
 - array<object>
 - Description
 - List of documents
 
- Name
 id*- Type
 - string
 - Description
 - Unique identifier of the document
 
- Name
 name- Type
 - string | null(optional)
 - Description
 - Name of the document
 
- Name
 externalId*- Type
 - string
 - Description
 - External identifier of the document
 
- Name
 documentType*- Type
 - enum<string>
 - Description
 - Type of the document. Available options:
TEXT,URL,FILE,NOTION_DOCUMENT,GOOGLE_DRIVE_DOCUMENT,DROPBOX_DOCUMENT,ONEDRIVE_DOCUMENT,BOX_DOCUMENT,CONFLUENCE_DOCUMENT 
- Name
 ingestionSource*- Type
 - enum<string>
 - Description
 - Source from which the document was ingested. Available options:
TEXT,URLS_LIST,SITEMAP,WEBSITE,LOCAL_FILE,NOTION,GOOGLE_DRIVE,DROPBOX,ONEDRIVE,BOX,CONFLUENCE 
- Name
 ingestionStatus*- Type
 - enum<string>
 - Description
 - Ingestion status of the document. Available options:
BACKLOG,QUEUED,PROCESSING,SUCCESS,FAILED,CANCELLED 
- Name
 ingestionError- Type
 - string | null(optional)
 - Description
 - Error message if the document ingestion failed
 
- Name
 ingestJob*- Type
 - object
 - Description
 - Details of the ingest job
 
- Name
 id*- Type
 - string
 - Description
 - ID of the ingest job
 
- Name
 ingestJobRun*- Type
 - object
 - Description
 - Details of the ingest job run
 
- Name
 id*- Type
 - string
 - Description
 - ID of the ingest job run
 
- Name
 connection- Type
 - object(optional)
 - Description
 - Details of the connection
 
- Name
 id*- Type
 - string
 - Description
 - ID of the connection from which the document was ingested
 
- Name
 documentProperties- Type
 - object(optional)
 - Description
 - Properties of the document
 
- Name
 mimeType- Type
 - string(optional)
 - Description
 - MIME type of the document
 
- Name
 fileSize- Type
 - number(optional)
 - Description
 - Size of the document file in bytes
 
- Name
 characterCount- Type
 - number(optional)
 - Description
 - Number of characters in the document
 
- Name
 tokenCount- Type
 - number(optional)
 - Description
 - Number of tokens in the document
 
- Name
 embeddingCount- Type
 - number(optional)
 - Description
 - Number of embeddings in the document
 
- Name
 ocrPagesCount- Type
 - number(optional)
 - Description
 - Number of pages processed by OCR in the document
 
- Name
 embeddingConfig- Type
 - object(optional)
 - Description
 - Configuration of the embedding model
 
- Name
 provider- Type
 - enum<string>(optional)
 - Description
 - Provider of the embedding model used. Available options:
OPENAI,COHERE,JINA 
- Name
 model- Type
 - enum<string>(optional)
 - Description
 - Embedding model used to create the embeddings. Available options:
text-embedding-3-small,text-embedding-3-large,text-embedding-ada-002,embed-english-v3.0,embed-multilingual-v3.0,embed-english-light-v3.0,embed-multilingual-light-v3.0,embed-english-v2.0,embed-english-light-v2.0,embed-multilingual-v2.0,jina-embeddings-v3 
- Name
 dimensions- Type
 - number(optional)
 - Description
 - Dimensions of the embedding model used
 
- Name
 chunkSize- Type
 - number(optional)
 - Description
 - Number of tokens in each chunk
 
- Name
 chunkOverlap- Type
 - number(optional)
 - Description
 - Number of tokens to overlap between chunks
 
- Name
 providers- Type
 - object(optional)
 - Description
 - Providers used to ingest the document
 
- Name
 fileStorage- Type
 - enum<string>(optional)
 - Description
 - Type of the file storage used. Available options:
S3_COMPATIBLE 
- Name
 vectorStorage- Type
 - enum<string>(optional)
 - Description
 - Provider of the vector storage used. Available options:
PINECONE 
- Name
 embeddingModel- Type
 - enum<string>(optional)
 - Description
 - Provider of the embedding model used. Available options:
OPENAI,COHERE,JINA 
- Name
 webScraper- Type
 - enum<string>(optional)
 - Description
 - Provider of the web scraper used if the document is from a web source. Available options:
FIRECRAWL,JINA,SCRAPINGBEE 
- Name
 metadata*- Type
 - object
 - Description
 - Metadata associated with the document
 
- Name
 namespace*- Type
 - object
 - Description
 - Details of the namespace containing the document
 
- Name
 identifier*- Type
 - string
 - Description
 - Unique identifier of the namespace
 
- Name
 organization*- Type
 - object
 - Description
 - Details of the organization
 
- Name
 id*- Type
 - string
 - Description
 - ID of the organization containing the document
 
- Name
 createdAt*- Type
 - object
 - Description
 - Timestamp when the document was created
 
- Name
 isoString*- Type
 - string
 - Description
 - ISO 8601 formatted timestamp
 
- Name
 updatedAt*- Type
 - object
 - Description
 - Timestamp when the document was last updated
 
- Name
 isoString*- Type
 - string
 - Description
 - ISO 8601 formatted timestamp
 
Response
{
  "success": true,
  "message": "Documents updated successfully",
  "data": {
    "itemsUpdated": 10,    
    "documents": [
      {
        "id": "doc_123",
        "name": "https://example.com",
        "externalId": "external_123",
        "documentType": "URL",
        "ingestionSource": "WEBSITE",
        "ingestionStatus": "SUCCESS",
        "ingestionError": null,
        "ingestJob": {
          "id": "job_123"
        },
        "ingestJobRun": {
          "id": "job_run_123"
        },
        "connection": {
          "id": "conn_123"
        },
        "documentProperties": {
          "mimeType": "text/html",
          "fileSize": 1347,
          "characterCount": 1335,
          "tokenCount": 340,
          "embeddingCount": 1,
          "ocrPagesCount": 0
        },
        "embeddingConfig": {
          "provider": "OPENAI",
          "model": "text-embedding-3-small",
          "dimensions": 1536,
          "chunkSize": 1024,
          "chunkOverlap": 256
        },
        "providers": {
          "fileStorage": "S3_COMPATIBLE",
          "vectorStorage": "PINECONE",
          "embeddingModel": "OPENAI",
          "webScraper": "FIRECRAWL"
        },
        "metadata": {
          "status": "archived",
          "archivedAt": "2024-01-15T00:00:00Z",
          "category": ["security", "networking"],
          "complexity": ["advanced"],
          "apiVersion": ["v1"]
        },
        "namespace": {
          "identifier": "ns_123"
        },
        "organization": {
          "id": "org_123"
        },
        "createdAt": {
          "isoString": "2024-01-01T00:00:00Z"
        },
        "updatedAt": {
          "isoString": "2024-01-01T00:00:00Z"
        }
      }
    ]
  }
}
Delete Documents
Delete documents based on filters.
Authorization
- Name
 Authorization*- Type
 - string
 - Description
 - Bearer token authentication. Include your API key as 
Bearer your_api_key 
- Name
 Accept*- Type
 - string
 - Description
 application/json
Request Body
- Name
 namespaceId*- Type
 - string
 - Description
 - Unique identifier of the namespace containing the documents
 
- Name
 filterConfig*- Type
 - object
 - Description
 - Configuration for filtering documents
 
- Name
 documentIds- Type
 - array<string>(optional)
 - Description
 - List of document IDs to filter
 
- Name
 documentExternalIds- Type
 - array<string>(optional)
 - Description
 - List of external document IDs to filter
 
- Name
 documentConnectionIds- Type
 - array<string>(optional)
 - Description
 - List of connection IDs to filter
 
- Name
 documentTypes- Type
 - array<enum<string>>(optional)
 - Description
 - List of document types to filter. Available options:
TEXT,URL,SITEMAP,WEBSITE 
- Name
 documentIngestionSources- Type
 - array<enum<string>>(optional)
 - Description
 - List of ingestion sources to filter. Available options:
TEXT,LOCAL_FILE,URLS_LIST,SITEMAP,WEBSITE,NOTION,GOOGLE_DRIVE,DROPBOX,ONEDRIVE,BOX,CONFLUENCE 
- Name
 documentIngestionStatuses- Type
 - array<enum<string>>(optional)
 - Description
 - List of ingestion statuses to filter. Available options:
BACKLOG,QUEUED,QUEUED_FOR_RESYNC,PROCESSING,SUCCESS,FAILED,CANCELLED 
- Name
 metadata- Type
 - object(optional)
 - Description
 - Metadata filters to apply
 
- Name
 includeConfig- Type
 - object(optional)
 - Description
 - Include options
 
- Name
 documents- Type
 - boolean(optional)
 - Description
 - Whether to include the documents in the response. Defaults to true
 
- Name
 stats- Type
 - boolean(optional)
 - Description
 - Whether to include the stats in the response. Defaults to false. This is a legacy field and will be deprecated. Use statsBySource instead.
 
- Name
 statsBySource- Type
 - boolean(optional)
 - Description
 - Whether to include the stats by source in the response. Defaults to false
 
- Name
 statsByStatus- Type
 - boolean(optional)
 - Description
 - Whether to include the stats by status in the response. Defaults to false
 
- Name
 statsByDocumentType- Type
 - boolean(optional)
 - Description
 - Whether to include the stats by document type in the response. Defaults to false
 
- Name
 rawFileUrl- Type
 - boolean(optional)
 - Description
 - Whether to include the raw file URL in the response. Defaults to false
 
- Name
 parsedTextFileUrl- Type
 - boolean(optional)
 - Description
 - Whether to include the parsed text file URL in the response. Defaults to false
 
- Name
 pagination- Type
 - object(optional)
 - Description
 - Pagination options
 
- Name
 pageSize- Type
 - number(optional)
 - Description
 - Number of documents per page (1-100, default: 20)
 
- Name
 cursor- Type
 - string(optional)
 - Description
 - Opaque cursor for fetching the next page
 
Request
curl -X DELETE https://api.sourcesync.ai/v1/documents \
  -H "Authorization: Bearer $SOURCE_SYNC_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "filterConfig": {
      "documentIds": ["doc_123"],
      "documentExternalIds": [],
      "documentTypes": ["TEXT"],
      "documentIngestionSources": ["TEXT"],
      "documentIngestionStatuses": ["SUCCESS"],
      "metadata": {
        "status": "archived"
      }
    },
    "pagination": {
      "pageSize": 10,
      "cursor": "eyJjcmVhdGVkQXQiOi..."
    }
  }'
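Because deletion is driven entirely by filters, it can be worth validating the filter before sending the request. The sketch below is one cautious way to build the request body in Python. The helper name `build_delete_body` is hypothetical. This reference does not state how an empty `filterConfig` is treated, so the guard assumes conservatively that it could match every document in the namespace. It also assumes that an empty filter list means "no constraint", so empty entries are dropped.

```python
from typing import Any, Dict


def build_delete_body(namespace_id: str, filter_config: Dict[str, Any]) -> Dict[str, Any]:
    """Build a Delete Documents body, refusing filters that constrain nothing."""
    # Drop entries that carry no constraint (assumption: empty means "no filter").
    active = {
        key: value
        for key, value in filter_config.items()
        if value not in ([], {}, None)
    }
    if not active:
        # Assumption: an empty filterConfig might match all documents.
        raise ValueError("refusing to build a delete request with an empty filterConfig")
    return {"namespaceId": namespace_id, "filterConfig": active}
```

A caller that really intends a namespace-wide delete would need to bypass this guard explicitly, which keeps that choice visible in code review.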
Response Body
- Name
 success*- Type
 - boolean
 - Description
 - Indicates whether the request succeeded. This is always true for successful responses.
 
- Name
 message*- Type
 - string
 - Description
 - Human-readable message describing the result of the request
 
- Name
 data*- Type
 - object
 - Description
 - Data returned from the API.
 
- Name
 itemsDeleted*- Type
 - number
 - Description
 - Number of documents deleted
 
- Name
 documents*- Type
 - array<object>
 - Description
 - List of documents
 
- Name
 id*- Type
 - string
 - Description
 - Unique identifier of the document
 
- Name
 status*- Type
 - enum<string>
 - Description
 - Status of the document. Available options:
QUEUED_FOR_DELETION
Response
{
  "success": true,
  "message": "Documents deleted successfully",
  "data": {
    "itemsDeleted": 2,
    "documents": [
      {
        "id": "doc_123",
        "status": "QUEUED_FOR_DELETION"
      },
      {
        "id": "doc_124",
        "status": "QUEUED_FOR_DELETION"
      }
    ]
  }
}
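A minimal Python sketch of how a client might assemble the request body for this endpoint, assuming the body shape shown in the curl example. The helper name `build_delete_payload` is hypothetical, and the empty-filter guard is a client-side precaution rather than documented API behaviour.

```python
def build_delete_payload(namespace_id, document_ids=None, metadata=None):
    """Build the JSON body for a delete request (hypothetical helper)."""
    filter_config = {}
    if document_ids:
        filter_config["documentIds"] = list(document_ids)
    if metadata:
        filter_config["metadata"] = dict(metadata)
    if not filter_config:
        # Assumption: how the API treats an empty filterConfig is not documented
        # here, so this sketch refuses to build a delete with no filter at all.
        raise ValueError("refusing to build a delete request with an empty filter")
    return {"namespaceId": namespace_id, "filterConfig": filter_config}
```

The returned dictionary can be serialized with `json.dumps` and sent as the body of the DELETE request shown above.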
Update Document Content
Update content of a document.
Authorization
- Name
 Authorization*- Type
 - string
 - Description
 - Bearer token authentication. Include your API key as 
Bearer your_api_key 
- Name
 Accept*- Type
 - string
 - Description
 application/json
Request Body
- Name
 namespaceId*- Type
 - string
 - Description
 - Unique identifier of the namespace containing the document
 
- Name
 content*- Type
 - string
 - Description
 - Content of the document to update
 
Request
curl -X PATCH https://api.sourcesync.ai/v1/documents/doc_123 \
  -H "Authorization: Bearer $SOURCE_SYNC_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "content": "This is the updated content of the document"
  }'
Response Body
- Name
 success*- Type
 - boolean
 - Description
 - Indicates whether the request is successful or not. This is always true for success responses.
 
- Name
 message*- Type
 - string
 - Description
 - Human-readable message describing the result of the request
 
- Name
 data*- Type
 - object
 - Description
 - Data returned from the API.
 
- Name
 document*- Type
 - object
 - Description
 - Details of the document updated
 
- Name
 id*- Type
 - string
 - Description
 - Unique identifier of the document
 
- Name
 status*- Type
 - enum<string>
 - Description
 - Status of the document. Available options:
QUEUED_FOR_UPDATE
Response
{
  "success": true,
  "message": "Added the document to the document content update queue successfully",
  "data": {
    "document": {
      "id": "doc_123",
      "status": "QUEUED_FOR_UPDATE"
    }
  }
}
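The same request can be sketched in Python with only the standard library. The function name `build_update_content_request` is hypothetical; the URL, headers, and body fields are taken from the curl example. The request object is built but not sent.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.sourcesync.ai/v1/documents"  # from the curl example


def build_update_content_request(api_key, namespace_id, document_id, content):
    """Build (but do not send) the PATCH request for one document."""
    # Percent-encode the ID so unusual characters cannot alter the URL path.
    url = f"{BASE_URL}/{urllib.parse.quote(document_id, safe='')}"
    body = json.dumps({"namespaceId": namespace_id, "content": content})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
```

Passing the result to `urllib.request.urlopen` would send it. Because the response status is QUEUED_FOR_UPDATE, the update appears to be applied asynchronously, so a caller should not assume the new content is in place as soon as the call returns.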
Resync Documents
Resync documents that match the filter. Resync is not supported for TEXT and LOCAL_FILE documents, or for documents with status QUEUED, QUEUED_FOR_RESYNC, or PROCESSING.
Authorization
- Name
 Authorization*- Type
 - string
 - Description
 - Bearer token authentication. Include your API key as 
Bearer your_api_key 
- Name
 Accept*- Type
 - string
 - Description
 application/json
Request Body
- Name
 namespaceId*- Type
 - string
 - Description
 - Unique identifier of the namespace containing the documents
 
- Name
 filterConfig*- Type
 - object
 - Description
 - Configuration for filtering documents
 
- Name
 documentIds- Type
 - array<string>(optional)
 - Description
 - List of document IDs to filter
 
- Name
 documentExternalIds- Type
 - array<string>(optional)
 - Description
 - List of external document IDs to filter
 
- Name
 documentConnectionIds- Type
 - array<string>(optional)
 - Description
 - List of connection IDs to filter
 
- Name
 documentTypes- Type
 - array<enum<string>>(optional)
 - Description
 - List of document types to filter. Available options:
TEXT, URL, SITEMAP, WEBSITE
- Name
 documentIngestionSources- Type
 - array<enum<string>>(optional)
 - Description
 - List of ingestion sources to filter. Available options:
TEXT, LOCAL_FILE, URLS_LIST, SITEMAP, WEBSITE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX, CONFLUENCE
- Name
 documentIngestionStatuses- Type
 - array<enum<string>>(optional)
 - Description
 - List of ingestion statuses to filter. Available options:
BACKLOG, QUEUED, QUEUED_FOR_RESYNC, PROCESSING, SUCCESS, FAILED, CANCELLED
- Name
 metadata- Type
 - object(optional)
 - Description
 - Metadata filters to apply
 
Request
curl -X POST https://api.sourcesync.ai/v1/documents/resync \
  -H "Authorization: Bearer $SOURCE_SYNC_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "filterConfig": {
      "documentIds": ["doc_123", "doc_124", "doc_125", "doc_126", "doc_127"],
      "documentExternalIds": [],
      "documentConnectionIds": [],
      "documentTypes": ["TEXT", "URL"],
      "documentIngestionSources": ["TEXT", "WEBSITE"],
      "documentIngestionStatuses": ["SUCCESS"],
      "metadata": {
        "status": "archived"
      }
    }
  }'
Response Body
- Name
 success*- Type
 - boolean
 - Description
 - Indicates whether the request is successful or not. This is always true for success responses.
 
- Name
 message*- Type
 - string
 - Description
 - Human-readable message describing the result of the request
 
- Name
 data*- Type
 - object
 - Description
 - Data returned from the API.
 
- Name
 itemsQueued*- Type
 - number
 - Description
 - Number of documents queued for resync
 
- Name
 itemsSkipped*- Type
 - number
 - Description
 - Number of documents skipped for resync
 
- Name
 documents*- Type
 - array<object>
 - Description
 - List of documents
 
- Name
 id*- Type
 - string
 - Description
 - Unique identifier of the document
 
- Name
 status*- Type
 - enum<string>
 - Description
 - Status of the document. Available options:
QUEUED_FOR_RESYNC, NOT_ELIGIBLE_FOR_RESYNC
- Name
 error- Type
 - string | null(optional)
 - Description
 - Error message if the document is not eligible for resync or resync failed
 
Response
{
  "success": true,
  "message": "Added the eligible documents to the resync queue successfully",
  "data": {
    "itemsQueued": 3,
    "itemsSkipped": 2,
    "documents": [
      {
        "id": "doc_123",
        "status": "QUEUED_FOR_RESYNC",
        "error": null
      },
      {
        "id": "doc_124",
        "status": "QUEUED_FOR_RESYNC",
        "error": null
      },
      {
        "id": "doc_125",
        "status": "NOT_ELIGIBLE_FOR_RESYNC",
        "error": "Documents with LOCAL_FILE and TEXT as documentType are not eligible for resync"
      },
      {
        "id": "doc_126",
        "status": "QUEUED_FOR_RESYNC",
        "error": null
      },
      {
        "id": "doc_127",
        "status": "NOT_ELIGIBLE_FOR_RESYNC",
        "error": "Documents with status QUEUED, QUEUED_FOR_RESYNC and PROCESSING are not eligible for resync"
      }
    ]
  }
}
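Because a single resync call can queue some documents and skip others, a caller will usually want to separate the two groups. A minimal Python sketch, assuming the response shape shown above; the function name `split_resync_results` is hypothetical.

```python
def split_resync_results(response_body):
    """Return (queued_ids, skipped) where skipped maps document ID to its error."""
    queued, skipped = [], {}
    for doc in response_body["data"]["documents"]:
        if doc["status"] == "QUEUED_FOR_RESYNC":
            queued.append(doc["id"])
        else:
            # Any other status (NOT_ELIGIBLE_FOR_RESYNC in the list above)
            # is treated as skipped; keep the explanatory error if present.
            skipped[doc["id"]] = doc.get("error")
    return queued, skipped
```

Applied to the example response, this would put doc_123, doc_124, and doc_126 in the queued list and map doc_125 and doc_127 to their error messages.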
Schedule Documents
Set a sync schedule for documents that match the filter. Scheduling is not supported for TEXT and LOCAL_FILE documents.
Authorization
- Name
 Authorization*- Type
 - string
 - Description
 - Bearer token authentication. Include your API key as 
Bearer your_api_key 
- Name
 Accept*- Type
 - string
 - Description
 application/json
Request Body
- Name
 namespaceId*- Type
 - string
 - Description
 - Unique identifier of the namespace containing the documents
 
- Name
 filterConfig*- Type
 - object
 - Description
 - Configuration for filtering documents
 
- Name
 documentIds- Type
 - array<string>(optional)
 - Description
 - List of document IDs to filter
 
- Name
 documentExternalIds- Type
 - array<string>(optional)
 - Description
 - List of external document IDs to filter
 
- Name
 documentConnectionIds- Type
 - array<string>(optional)
 - Description
 - List of connection IDs to filter
 
- Name
 documentTypes- Type
 - array<enum<string>>(optional)
 - Description
 - List of document types to filter. Available options:
TEXT, URL, SITEMAP, WEBSITE
- Name
 documentIngestionSources- Type
 - array<enum<string>>(optional)
 - Description
 - List of ingestion sources to filter. Available options:
TEXT, LOCAL_FILE, URLS_LIST, SITEMAP, WEBSITE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX, CONFLUENCE
- Name
 documentIngestionStatuses- Type
 - array<enum<string>>(optional)
 - Description
 - List of ingestion statuses to filter. Available options:
BACKLOG, QUEUED, QUEUED_FOR_RESYNC, PROCESSING, SUCCESS, FAILED, CANCELLED
- Name
 metadata- Type
 - object(optional)
 - Description
 - Metadata filters to apply
 
- Name
 includeConfig- Type
 - object(optional)
 - Description
 - Include options
 
- Name
 documents- Type
 - boolean(optional)
 - Description
 - Option to include the documents or not in the response. Defaults to true
 
- Name
 stats- Type
 - boolean(optional)
 - Description
 - Option to include the stats or not in the response. Defaults to false. This is a legacy field and will be deprecated. Use statsBySource instead.
 
- Name
 statsBySource- Type
 - boolean(optional)
 - Description
 - Option to include the stats by source or not in the response. Defaults to false
 
- Name
 statsByStatus- Type
 - boolean(optional)
 - Description
 - Option to include the stats by status or not in the response. Defaults to false
 
- Name
 statsByDocumentType- Type
 - boolean(optional)
 - Description
 - Option to include the stats by document type or not in the response. Defaults to false
 
- Name
 rawFileUrl- Type
 - boolean(optional)
 - Description
 - Option to include the raw file URL or not in the response. Defaults to false
 
- Name
 parsedTextFileUrl- Type
 - boolean(optional)
 - Description
 - Option to include the parsed text file URL or not in the response. Defaults to false
 
- Name
 pagination- Type
 - object(optional)
 - Description
 - Pagination options
 
- Name
 pageSize- Type
 - number(optional)
 - Description
 - Number of documents per page (1-100, default: 20)
 
- Name
 cursor- Type
 - string(optional)
 - Description
 - Opaque cursor for fetching the next page
 
- Name
 data*- Type
 - object
 - Description
 - Details of the schedule
 
- Name
 syncFrequency*- Type
 - enum<string>
 - Description
 - Sync frequency of the documents. Defaults to NEVER. If set to DAILY, WEEKLY, or MONTHLY, the documents are synced daily, weekly, or monthly, respectively. Available options:
DAILY, WEEKLY, MONTHLY, NEVER
Request
curl -X POST https://api.sourcesync.ai/v1/documents/schedule \
  -H "Authorization: Bearer $SOURCE_SYNC_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "filterConfig": {
      "documentIds": ["doc_123", "doc_124"],
      "documentExternalIds": [],
      "documentConnectionIds": [],
      "documentTypes": ["URL"],
      "documentIngestionSources": ["WEBSITE"],
      "documentIngestionStatuses": ["SUCCESS"],
      "metadata": {
        "status": "archived"
      }
    },
    "data": {
      "syncFrequency": "DAILY"
    }
  }'
Response Body
- Name
 success*- Type
 - boolean
 - Description
 - Indicates whether the request is successful or not. This is always true for success responses.
 
- Name
 message*- Type
 - string
 - Description
 - Human-readable message describing the result of the request
 
- Name
 data*- Type
 - object
 - Description
 - Data returned from the API.
 
- Name
 itemsScheduled*- Type
 - number
 - Description
 - Number of documents scheduled for sync
 
- Name
 itemsSkipped*- Type
 - number
 - Description
 - Number of documents skipped for scheduling
 
- Name
 documents*- Type
 - array<object>
 - Description
 - List of documents
 
- Name
 id*- Type
 - string
 - Description
 - Unique identifier of the document
 
- Name
 status*- Type
 - enum<string>
 - Description
 - Status of the document. Available options:
SYNC_SCHEDULED, NOT_ELIGIBLE_FOR_SYNC_SCHEDULE
- Name
 error- Type
 - string | null(optional)
 - Description
 - Error message if the document is not eligible for scheduling
 
Response
{
  "success": true,
  "message": "Added schedule to the eligible documents successfully",
  "data": {
    "itemsScheduled": 1,
    "itemsSkipped": 1,
    "documents": [
      {
        "id": "doc_123",
        "status": "SYNC_SCHEDULED",
        "syncFrequency": "DAILY",
        "error": null
      },
      {
        "id": "doc_124",
        "status": "NOT_ELIGIBLE_FOR_SYNC_SCHEDULE",
        "syncFrequency": null,
        "error": "Documents with ingestionSource as LOCAL_FILE and TEXT are not eligible for scheduling"
      }
    ]
  }
}
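Since `syncFrequency` accepts only a fixed set of values, a client can catch a typo before the request is sent. A minimal Python sketch, assuming the request body shape from the curl example; the helper name `build_schedule_payload` is hypothetical.

```python
SYNC_FREQUENCIES = {"DAILY", "WEEKLY", "MONTHLY", "NEVER"}  # options listed above


def build_schedule_payload(namespace_id, document_ids, sync_frequency):
    """Build the JSON body for a schedule request, validating the frequency."""
    if sync_frequency not in SYNC_FREQUENCIES:
        raise ValueError(f"unsupported syncFrequency: {sync_frequency!r}")
    return {
        "namespaceId": namespace_id,
        "filterConfig": {"documentIds": list(document_ids)},
        "data": {"syncFrequency": sync_frequency},
    }
```

The check is purely client-side; the API presumably performs its own validation as well.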
Error Codes
- Name
 NAMESPACE_NOT_FOUND- Description
 The specified namespace does not exist
- Name
 DOCUMENTS_NOT_FOUND- Description
 No documents match the filter criteria
- Name
 INVALID_FILTER_CONFIG- Description
 Invalid filter configuration provided
- Name
 UPDATE_DOCUMENTS_FAILED- Description
 Internal server error while updating documents
- Name
 DELETE_DOCUMENTS_FAILED- Description
 Internal server error while deleting documents
- Name
 UPDATE_DOCUMENT_CONTENT_FAILED- Description
 Internal server error while updating the document content
- Name
 RESYNC_DOCUMENTS_FAILED- Description
 Internal server error while resyncing documents
- Name
 SCHEDULE_DOCUMENTS_FAILED- Description
 Internal server error while scheduling documents
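The codes above fall into two groups: problems with the request itself and internal server errors. A minimal Python sketch of that grouping follows. How the error code is delivered in an error response is not shown in this reference, so the function simply takes the code as a string, and the name `classify_error` is hypothetical.

```python
# Codes described above as internal server errors.
SERVER_SIDE_CODES = {
    "UPDATE_DOCUMENTS_FAILED",
    "DELETE_DOCUMENTS_FAILED",
    "UPDATE_DOCUMENT_CONTENT_FAILED",
    "RESYNC_DOCUMENTS_FAILED",
    "SCHEDULE_DOCUMENTS_FAILED",
}

# Codes that point at the request: wrong namespace, no matches, bad filter.
REQUEST_SIDE_CODES = {
    "NAMESPACE_NOT_FOUND",
    "DOCUMENTS_NOT_FOUND",
    "INVALID_FILTER_CONFIG",
}


def classify_error(code):
    """Return 'server', 'request', or 'unknown' for an error code string."""
    if code in SERVER_SIDE_CODES:
        return "server"
    if code in REQUEST_SIDE_CODES:
        return "request"
    return "unknown"
```

A caller might retry "server" errors with a delay and surface "request" errors to the user immediately; that policy is a suggestion, not something the API prescribes.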
Filter Configuration
- Name
 documentIds- Description
 Filter by specific document IDs. When multiple documentIds are provided, they are combined with OR: a document matches if it has any of the IDs.
- Name
 documentExternalIds- Description
 Filter by external IDs. When multiple documentExternalIds are provided, they are combined with OR.
- Name
 documentConnectionIds- Description
 Filter by connection IDs. When multiple documentConnectionIds are provided, they are combined with OR.
- Name
 documentTypes- Description
 Filter by document types. When multiple documentTypes are provided, they are combined with OR.
- Name
 documentIngestionSources- Description
 Filter by ingestion sources. When multiple documentIngestionSources are provided, they are combined with OR.
- Name
 documentIngestionStatuses- Description
 Filter by ingestion statuses. When multiple documentIngestionStatuses are provided, they are combined with OR.
- Name
 metadata- Description
 Filter by metadata key-value pairs. When multiple pairs are provided, they are combined with AND: a document matches only if every pair matches.
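The OR and AND rules above can be illustrated with a small local matcher in Python. This is a sketch of the semantics, not the server's implementation. Three things are assumptions: the document field names (`externalId`, `connectionId`, `documentType`, `ingestionSource`, `ingestionStatus`), that an empty or missing list means no restriction, and that different filter fields are combined with AND.

```python
# Maps each list-valued filter to the (assumed) document field it checks.
LIST_FILTERS = {
    "documentIds": "id",
    "documentExternalIds": "externalId",
    "documentConnectionIds": "connectionId",
    "documentTypes": "documentType",
    "documentIngestionSources": "ingestionSource",
    "documentIngestionStatuses": "ingestionStatus",
}


def matches_filter(doc, filter_config):
    """Return True if doc satisfies filter_config under the rules above."""
    for filter_name, doc_field in LIST_FILTERS.items():
        allowed = filter_config.get(filter_name)
        # OR within a list: the document's value must be one of the entries.
        # Assumption: an empty or missing list imposes no restriction.
        if allowed and doc.get(doc_field) not in allowed:
            return False
    # AND across metadata: every key-value pair must match.
    doc_metadata = doc.get("metadata", {})
    for key, value in filter_config.get("metadata", {}).items():
        if doc_metadata.get(key) != value:
            return False
    return True
```

For example, a filter with `"documentTypes": ["TEXT", "URL"]` and `"metadata": {"status": "archived"}` would match an archived TEXT document and an archived URL document, but not a TEXT document whose metadata status differs.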