Documents API Reference

Learn about the document management endpoints and how to work with your ingested content.

Documents are the core content units in SourceSync. Each document can carry metadata that helps you organize and filter content. For detailed examples and best practices on pagination, see our Document Management Guide.

POST /v1/documents

Fetch Documents

Fetch documents with optional filters and pagination.

You can send an optional X-Tenant-ID header for multitenancy within a namespace (docs).

Authorization

Name
Authorization*
Type
string
Description
Bearer token authentication. Include your API key as Bearer your_api_key
Name
Accept*
Type
string
Description
application/json
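Every endpoint on this page uses the same headers. Below is a minimal Python sketch of assembling them. The `build_headers` helper is hypothetical (not part of an official SDK), and reading the key from `SOURCE_SYNC_API_KEY` simply mirrors the curl examples. `Content-Type` is included because every request body here is JSON, and `X-Tenant-ID` is added only when a tenant is given.

```python
import os
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None,
                  tenant_id: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: headers shared by the Documents API endpoints."""
    # Fall back to the same environment variable the curl examples use.
    key = api_key or os.environ.get("SOURCE_SYNC_API_KEY")
    if not key:
        raise ValueError("an API key is required")
    headers = {
        "Authorization": f"Bearer {key}",
        "Accept": "application/json",
        # Request bodies on this page are JSON.
        "Content-Type": "application/json",
    }
    # X-Tenant-ID is optional; send it only for multitenancy within a namespace.
    if tenant_id is not None:
        headers["X-Tenant-ID"] = tenant_id
    return headers
```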

Request Body

Name
namespaceId*
Type
string
Description
Unique identifier of the namespace containing the documents
Name
filterConfig*
Type
object
Description
Configuration for filtering documents
- Name
  documentIds
  Type
  array<string>(optional)
  Description
  List of document IDs to filter
- Name
  documentExternalIds
  Type
  array<string>(optional)
  Description
  List of external document IDs to filter
- Name
  documentConnectionIds
  Type
  array<string>(optional)
  Description
  List of connection IDs to filter
- Name
  documentTypes
  Type
  array<enum<string>>(optional)
  Description
  List of document types to filter
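  Available options: TEXT, FILE, URL, NOTION_DOCUMENT, GOOGLE_DRIVE_DOCUMENT, DROPBOX_DOCUMENT, ONEDRIVE_DOCUMENT, BOX_DOCUMENT, CONFLUENCE_DOCUMENT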
- Name
  documentIngestionSources
  Type
  array<enum<string>>(optional)
  Description
  List of ingestion sources to filter
  Available options: TEXT, LOCAL_FILE, URLS_LIST, SITEMAP, WEBSITE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX, CONFLUENCE
- Name
  documentIngestionStatuses
  Type
  array<enum<string>>(optional)
  Description
  List of ingestion statuses to filter
  Available options: BACKLOG, QUEUED, QUEUED_FOR_RESYNC, PROCESSING, SUCCESS, FAILED, CANCELLED
- Name
  metadata
  Type
  object(optional)
  Description
  Metadata filters to apply
Name
includeConfig
Type
object(optional)
Description
Include options
- Name
  documents
  Type
  boolean(optional)
  Description
  Whether to include the documents in the response. Defaults to true
- Name
  stats
  Type
  boolean(optional)
  Description
  Whether to include stats in the response. Defaults to false. This is a legacy field and will be deprecated; use statsBySource instead.
- Name
  statsBySource
  Type
  boolean(optional)
  Description
  Whether to include stats grouped by ingestion source in the response. Defaults to false
- Name
  statsByStatus
  Type
  boolean(optional)
  Description
  Whether to include stats grouped by status in the response. Defaults to false
- Name
  statsByDocumentType
  Type
  boolean(optional)
  Description
  Whether to include stats grouped by document type in the response. Defaults to false
- Name
  rawFileUrl
  Type
  boolean(optional)
  Description
  Whether to include each document's raw file URL in the response. Defaults to false
- Name
  parsedTextFileUrl
  Type
  boolean(optional)
  Description
  Whether to include each document's parsed text file URL in the response. Defaults to false
Name
pagination
Type
object(optional)
Description
Pagination options
- Name
  pageSize
  Type
  number(optional)
  Description
  Number of documents per page (1-100, default: 20)
- Name
  cursor
  Type
  string(optional)
  Description
  Opaque cursor for fetching the next page. Pass the nextCursor value from the previous response; omit it to fetch the first page
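Below is a sketch of assembling this request body in Python with a few client-side checks. The enum sets are copied from the ingestion source and status options listed above, and the 1-100 bound comes from the pageSize description. `build_fetch_body` is a hypothetical helper, and the API remains the authority on what it accepts.

```python
from typing import Any, Dict, Optional

# Enum values copied from the filterConfig options listed above.
INGESTION_SOURCES = {
    "TEXT", "LOCAL_FILE", "URLS_LIST", "SITEMAP", "WEBSITE", "NOTION",
    "GOOGLE_DRIVE", "DROPBOX", "ONEDRIVE", "BOX", "CONFLUENCE",
}
INGESTION_STATUSES = {
    "BACKLOG", "QUEUED", "QUEUED_FOR_RESYNC", "PROCESSING",
    "SUCCESS", "FAILED", "CANCELLED",
}


def build_fetch_body(namespace_id: str,
                     filter_config: Optional[Dict[str, Any]] = None,
                     include_config: Optional[Dict[str, bool]] = None,
                     page_size: Optional[int] = None,
                     cursor: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: assemble a Fetch Documents body with light checks."""
    filters = dict(filter_config or {})  # filterConfig is required, even if empty
    for source in filters.get("documentIngestionSources", []):
        if source not in INGESTION_SOURCES:
            raise ValueError(f"unknown ingestion source: {source}")
    for status in filters.get("documentIngestionStatuses", []):
        if status not in INGESTION_STATUSES:
            raise ValueError(f"unknown ingestion status: {status}")

    body: Dict[str, Any] = {"namespaceId": namespace_id, "filterConfig": filters}
    if include_config:
        body["includeConfig"] = dict(include_config)

    pagination: Dict[str, Any] = {}
    if page_size is not None:
        # Documented range is 1-100 (the API defaults to 20 when omitted).
        if not 1 <= page_size <= 100:
            raise ValueError("pageSize must be between 1 and 100")
        pagination["pageSize"] = page_size
    if cursor is not None:
        pagination["cursor"] = cursor
    if pagination:
        body["pagination"] = pagination
    return body
```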

Request

POST

/v1/documents

curl -X POST https://api.sourcesync.ai/v1/documents \
  -H "Authorization: Bearer $SOURCE_SYNC_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "filterConfig": {
      "documentIds": ["doc_123", "doc_124"],
      "documentExternalIds": ["external_123"],
      "documentConnectionIds": ["conn_123"],
      "documentTypes": ["URL", "GOOGLE_DRIVE_DOCUMENT"],
      "documentIngestionSources": ["WEBSITE", "GOOGLE_DRIVE"],
      "documentIngestionStatuses": ["SUCCESS", "FAILED"],
      "metadata": {
        "category": "security",
        "status": "published"
      }
    },
    "includeConfig": {
      "documents": true,
      "statsBySource": true,
      "statsByStatus": true,
      "statsByDocumentType": true,
      "rawFileUrl": true,
      "parsedTextFileUrl": true
    },
    "pagination": {
      "pageSize": 10,
      "cursor": "eyJjcmVhdGVkQXQiOi..."
    }
  }'
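The same request in Python, as a minimal sketch using only the standard library. `fetch_documents_request` and `fetch_documents` are hypothetical names. Building the `Request` separately from sending it makes the request easy to inspect or log. Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for non-2xx responses.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "https://api.sourcesync.ai/v1/documents"


def fetch_documents_request(body: Dict[str, Any], api_key: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) the POST shown above."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_documents(body: Dict[str, Any], api_key: str) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    request = fetch_documents_request(body, api_key)
    # urlopen raises urllib.error.HTTPError for non-2xx status codes.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, `fetch_documents({"namespaceId": "ns_123", "filterConfig": {}}, os.environ["SOURCE_SYNC_API_KEY"])` returns the decoded JSON response as a dict.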

Response Body

Name
success*
Type
boolean
Description
Indicates whether the request succeeded. Always true for successful responses
Name
message*
Type
string
Description
Human-readable message describing the result of the request
Name
data*
Type
object
Description
Data returned from the API.
- Name
  itemsReturned*
  Type
  number
  Description
  Number of documents returned in the current page
- Name
  hasNextPage*
  Type
  boolean
  Description
  Whether more documents are available
- Name
  nextCursor
  Type
  string(optional)
  Description
  Cursor for fetching the next page; absent when there are no more pages
- Name
  statsBySource
  Type
  array<object>(optional)
  Description
  Document counts grouped by ingestion source. Present only when includeConfig.statsBySource is true in the request
  Name
  source*
  Type
  enum<string>
  Description
  Ingestion source of the document
  Available options: TEXT, LOCAL_FILE, URLS_LIST, SITEMAP, WEBSITE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX, CONFLUENCE
  Name
  totalCount*
  Type
  number
  Description
  Total number of documents ingested via this source
- Name
  statsByStatus
  Type
  array<object>(optional)
  Description
  Document counts grouped by status. Present only when includeConfig.statsByStatus is true in the request
  Name
  status*
  Type
  enum<string>
  Description
  Status of the document
  Available options: BACKLOG, QUEUED, QUEUED_FOR_RESYNC, QUEUED_FOR_UPDATE, QUEUED_FOR_DELETION, PROCESSING, SUCCESS, FAILED, CANCELLED
  Name
  totalCount*
  Type
  number
  Description
  Total number of documents with this status
- Name
  statsByDocumentType
  Type
  array<object>(optional)
  Description
  Document counts grouped by document type. Present only when includeConfig.statsByDocumentType is true in the request
  Name
  documentType*
  Type
  enum<string>
  Description
  Document type of the document
  Available options: TEXT, FILE, URL, NOTION_DOCUMENT, GOOGLE_DRIVE_DOCUMENT, DROPBOX_DOCUMENT, ONEDRIVE_DOCUMENT, BOX_DOCUMENT, CONFLUENCE_DOCUMENT
  Name
  totalCount*
  Type
  number
  Description
  Total number of documents of this type
- Name
  documents*
  Type
  array<object>
  Description
  List of documents
  Name
  id*
  Type
  string
  Description
  Unique identifier of the document
  Name
  name
  Type
  string | null(optional)
  Description
  Name of the document
  Name
  externalId*
  Type
  string
  Description
  External identifier of the document
  Name
  documentType*
  Type
  enum<string>
  Description
  Type of the document
  Available options: TEXT, URL, FILE, NOTION_DOCUMENT, GOOGLE_DRIVE_DOCUMENT, DROPBOX_DOCUMENT, ONEDRIVE_DOCUMENT, BOX_DOCUMENT, CONFLUENCE_DOCUMENT
  Name
  ingestionSource*
  Type
  enum<string>
  Description
  Source from where the document was ingested
  Available options: TEXT, URLS_LIST, SITEMAP, WEBSITE, LOCAL_FILE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX, CONFLUENCE
  Name
  ingestionStatus*
  Type
  enum<string>
  Description
  Ingestion status of the document
  Available options: BACKLOG, QUEUED, PROCESSING, SUCCESS, FAILED, CANCELLED
  Name
  ingestionError
  Type
  string | null(optional)
  Description
  Error message if the document ingestion failed
  Name
  ingestJob*
  Type
  object
  Description
  Details of the ingest job
  Name
  id*
  Type
  string
  Description
  ID of the ingest job
  Name
  ingestJobRun*
  Type
  object
  Description
  Details of the ingest job run
  Name
  id*
  Type
  string
  Description
  ID of the ingest job run
  Name
  connection
  Type
  object(optional)
  Description
  Details of the connection
  Name
  id*
  Type
  string
  Description
  ID of the connection from which the document was ingested
  Name
  documentProperties
  Type
  object(optional)
  Description
  Properties of the document
  Name
  mimeType
  Type
  string(optional)
  Description
  MIME type of the document
  Name
  fileSize
  Type
  number(optional)
  Description
  Size of the document file in bytes
  Name
  characterCount
  Type
  number(optional)
  Description
  Number of characters in the document
  Name
  tokenCount
  Type
  number(optional)
  Description
  Number of tokens in the document
  Name
  embeddingCount
  Type
  number(optional)
  Description
  Number of embeddings in the document
  Name
  ocrPagesCount
  Type
  number(optional)
  Description
  Number of pages processed by OCR in the document
  Name
  embeddingConfig
  Type
  object(optional)
  Description
  Configuration of the embedding model
  Name
  provider
  Type
  enum<string>(optional)
  Description
  Provider of the embedding model used
  Available options: OPENAI, COHERE, JINA
  Name
  model
  Type
  enum<string>(optional)
  Description
  Embedding model used to create the embeddings
  Available options: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002, embed-english-v3.0, embed-multilingual-v3.0, embed-english-light-v3.0, embed-multilingual-light-v3.0, embed-english-v2.0, embed-english-light-v2.0, embed-multilingual-v2.0, jina-embeddings-v3
  Name
  dimensions
  Type
  number(optional)
  Description
  Dimensions of the embedding model used
  Name
  chunkSize
  Type
  number(optional)
  Description
  Number of tokens in each chunk
  Name
  chunkOverlap
  Type
  number(optional)
  Description
  Number of tokens to overlap between chunks
  Name
  providers
  Type
  object(optional)
  Description
  Providers used to ingest the document
  Name
  fileStorage
  Type
  enum<string>(optional)
  Description
  Type of the file storage used
  Available options: S3_COMPATIBLE
  Name
  vectorStorage
  Type
  enum<string>(optional)
  Description
  Provider of the vector storage used
  Available options: PINECONE
  Name
  embeddingModel
  Type
  enum<string>(optional)
  Description
  Provider of the embedding model used
  Available options: OPENAI, COHERE, JINA
  Name
  webScraper
  Type
  enum<string>(optional)
  Description
  Provider of the web scraper used if the document is from a web source
  Available options: FIRECRAWL, JINA, SCRAPINGBEE
  Name
  metadata*
  Type
  object
  Description
  Metadata associated with the document
  Name
  namespace*
  Type
  object
  Description
  Details of the namespace containing the document
  Name
  identifier*
  Type
  string
  Description
  Unique identifier of the namespace
  Name
  organization*
  Type
  object
  Description
  Details of the organization
  Name
  id*
  Type
  string
  Description
  ID of the organization containing the document
  Name
  createdAt*
  Type
  object
  Description
  Timestamp when the document was created
  Name
  isoString*
  Type
  string
  Description
  ISO 8601 formatted timestamp
  Name
  updatedAt*
  Type
  object
  Description
  Timestamp when the document was last updated
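  Name
  isoString*
  Type
  string
  Description
  ISO 8601 formatted timestamp
  Name
  rawFileUrl
  Type
  string(optional)
  Description
  URL of the document's raw file. Present when includeConfig.rawFileUrl is true in the request
  Name
  parsedTextFileUrl
  Type
  string(optional)
  Description
  URL of the document's parsed text file. Present when includeConfig.parsedTextFileUrl is true in the request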
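The hasNextPage and nextCursor fields are enough to walk every page of results. This Python sketch keeps the HTTP call behind a caller-supplied `send` function. `send` is a hypothetical parameter: any callable that posts a body to /v1/documents and returns the decoded JSON works, which keeps the paging loop easy to test. The sketch assumes includeConfig.documents keeps its default of true.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# A transport takes a request body and returns the decoded JSON response,
# e.g. a wrapper around urllib.request that POSTs to /v1/documents.
Send = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_documents(send: Send, namespace_id: str, filter_config: Dict[str, Any],
                   page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every matching document, following nextCursor page by page."""
    cursor: Optional[str] = None
    while True:
        pagination: Dict[str, Any] = {"pageSize": page_size}
        if cursor is not None:
            pagination["cursor"] = cursor
        response = send({
            "namespaceId": namespace_id,
            "filterConfig": filter_config,
            "pagination": pagination,
        })
        data = response["data"]
        yield from data.get("documents", [])
        # Stop when the API reports no further page or omits the cursor.
        cursor = data.get("nextCursor")
        if not data["hasNextPage"] or not cursor:
            break
```

The loop stops when hasNextPage is false or nextCursor is missing. This guards against looping forever if the two fields ever disagree.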

Response

POST

/v1/documents

{
  "success": true,
  "message": "Documents retrieved successfully",
  "data": {
    "itemsReturned": 10,
    "hasNextPage": true,
    "nextCursor": "eyJjcmVhdGVkQXQiOi...",
    "statsBySource": [
      {
        "source": "WEBSITE",
        "totalCount": 5
      },
      {
        "source": "LOCAL_FILE",
        "totalCount": 2
      },
      {
        "source": "GOOGLE_DRIVE",
        "totalCount": 3
      }
    ],
    "statsByStatus": [
      {
        "status": "BACKLOG",
        "totalCount": 1
      },
      {
        "status": "QUEUED",
        "totalCount": 1
      },
      {
        "status": "QUEUED_FOR_RESYNC",
        "totalCount": 1
      },
      {
        "status": "QUEUED_FOR_UPDATE",
        "totalCount": 1
      },
      {
        "status": "PROCESSING",
        "totalCount": 1
      },
      {
        "status": "SUCCESS",
        "totalCount": 5
      },
      {
        "status": "FAILED",
        "totalCount": 2
      },
      {
        "status": "CANCELLED",
        "totalCount": 1
      }
    ],
    "statsByDocumentType": [      
      {
        "documentType": "TEXT",
        "totalCount": 2
      },
      {
        "documentType": "URL",
        "totalCount": 5
      },
      {
        "documentType": "FILE",
        "totalCount": 3
      },
      {
        "documentType": "GOOGLE_DRIVE_DOCUMENT",
        "totalCount": 1
      }
    ],
    "documents": [
      {
        "id": "doc_123",
        "name": "https://example.com",
        "externalId": "external_123",
        "documentType": "URL",
        "ingestionSource": "WEBSITE",
        "ingestionStatus": "SUCCESS",
        "ingestionError": null,
        "ingestJob": {
          "id": "job_123"
        },
        "ingestJobRun": {
          "id": "job_run_123"
        },
        "connection": {
          "id": "conn_123"
        },
        "documentProperties": {
          "mimeType": "text/html",
          "fileSize": 1347,
          "characterCount": 1335,
          "tokenCount": 340,
          "embeddingCount": 1,
          "ocrPagesCount": 0
        },
        "embeddingConfig": {
          "provider": "OPENAI",
          "model": "text-embedding-3-small",
          "dimensions": 1536,
          "chunkSize": 1024,
          "chunkOverlap": 256
        },
        "providers": {
          "fileStorage": "S3_COMPATIBLE",
          "vectorStorage": "PINECONE",
          "embeddingModel": "OPENAI",
          "webScraper": "FIRECRAWL"
        },
        "metadata": {
          "category": "security",
          "status": "published"
        },
        "namespace": {
          "identifier": "ns_123"
        },
        "organization": {
          "id": "org_123"
        },
        "createdAt": {
          "isoString": "2024-01-01T00:00:00Z"
        },
        "updatedAt": {
          "isoString": "2024-01-01T00:00:00Z"
        },
        "rawFileUrl": "https://example.com/raw.html",
        "parsedTextFileUrl": "https://example.com/parsed.txt"
      },
      {
        "id": "doc_124",
        "name": "doc.pdf",
        "externalId": "external_124",
        "documentType": "GOOGLE_DRIVE_DOCUMENT",
        "ingestionSource": "GOOGLE_DRIVE",
        "ingestionStatus": "SUCCESS",
        "ingestionError": null,
        "ingestJob": {
          "id": "job_123"
        },
        "ingestJobRun": {
          "id": "job_run_123"
        },
        "connection": {
          "id": "conn_123"
        },
        "documentProperties": {
          "mimeType": "application/pdf",
          "fileSize": 1347,
          "characterCount": 1335,
          "tokenCount": 340,
          "embeddingCount": 1,
          "ocrPagesCount": 1
        },
        "embeddingConfig": {
          "provider": "OPENAI",
          "model": "text-embedding-3-small",
          "dimensions": 1536,
          "chunkSize": 1024,
          "chunkOverlap": 256
        },
        "providers": {
          "fileStorage": "S3_COMPATIBLE",
          "vectorStorage": "PINECONE",
          "embeddingModel": "OPENAI",
          "webScraper": "FIRECRAWL"
        },
        "metadata": {
          "category": "security",
          "status": "published"
        },
        "namespace": {
          "identifier": "ns_123"
        },
        "organization": {
          "id": "org_123"
        },
        "createdAt": {
          "isoString": "2024-01-01T00:00:00Z"
        },
        "updatedAt": {
          "isoString": "2024-01-01T00:00:00Z"
        },
        "rawFileUrl": "https://example.com/raw.pdf",
        "parsedTextFileUrl": "https://example.com/parsed.txt"
      }
    ]
  }
}
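Each stats field above is an array of objects that pair a value with a totalCount, which is awkward for lookups. Below is a small Python sketch that turns one into a dictionary, using a trimmed copy of the example response. `stats_to_dict` is a hypothetical helper. The stats arrays are present only when requested through includeConfig, so the code reads them with `.get(..., [])`.

```python
from collections import Counter
from typing import Any, Dict, List


def stats_to_dict(entries: List[Dict[str, Any]], key: str) -> Dict[str, int]:
    """Hypothetical helper: turn [{key: value, "totalCount": n}, ...] into {value: n}."""
    totals: Counter = Counter()
    for entry in entries:
        # Summing (rather than overwriting) tolerates a value listed twice.
        totals[entry[key]] += entry["totalCount"]
    return dict(totals)


# Trimmed copy of the "data" object from the example response above.
data = {
    "statsBySource": [
        {"source": "WEBSITE", "totalCount": 5},
        {"source": "LOCAL_FILE", "totalCount": 2},
        {"source": "GOOGLE_DRIVE", "totalCount": 3},
    ],
    "statsByStatus": [
        {"status": "SUCCESS", "totalCount": 5},
        {"status": "FAILED", "totalCount": 2},
    ],
}

# Stats arrays are absent unless requested, so default to an empty list.
by_source = stats_to_dict(data.get("statsBySource", []), "source")
by_status = stats_to_dict(data.get("statsByStatus", []), "status")
by_type = stats_to_dict(data.get("statsByDocumentType", []), "documentType")
print(by_source)  # {'WEBSITE': 5, 'LOCAL_FILE': 2, 'GOOGLE_DRIVE': 3}
```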

PATCH /v1/documents

Update Documents

Update the metadata of all documents that match the filters.

You can send an optional X-Tenant-ID header for multitenancy within a namespace (docs).

Authorization

Name
Authorization*
Type
string
Description
Bearer token authentication. Include your API key as Bearer your_api_key
Name
Accept*
Type
string
Description
application/json

Request Body

Name
namespaceId*
Type
string
Description
Unique identifier of the namespace containing the documents
Name
filterConfig*
Type
object
Description
Configuration for filtering documents
- Name
  documentIds
  Type
  array<string>(optional)
  Description
  List of document IDs to filter
- Name
  documentExternalIds
  Type
  array<string>(optional)
  Description
  List of external document IDs to filter
- Name
  documentConnectionIds
  Type
  array<string>(optional)
  Description
  List of connection IDs to filter
- Name
  documentTypes
  Type
  array<enum<string>>(optional)
  Description
  List of document types to filter
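  Available options: TEXT, FILE, URL, NOTION_DOCUMENT, GOOGLE_DRIVE_DOCUMENT, DROPBOX_DOCUMENT, ONEDRIVE_DOCUMENT, BOX_DOCUMENT, CONFLUENCE_DOCUMENT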
- Name
  documentIngestionSources
  Type
  array<enum<string>>(optional)
  Description
  List of ingestion sources to filter
  Available options: TEXT, LOCAL_FILE, URLS_LIST, SITEMAP, WEBSITE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX, CONFLUENCE
- Name
  documentIngestionStatuses
  Type
  array<enum<string>>(optional)
  Description
  List of ingestion statuses to filter
  Available options: BACKLOG, QUEUED, QUEUED_FOR_RESYNC, PROCESSING, SUCCESS, FAILED, CANCELLED
- Name
  metadata
  Type
  object(optional)
  Description
  Metadata filters to apply
Name
includeConfig
Type
object(optional)
Description
Include options
- Name
  documents
  Type
  boolean(optional)
  Description
  Whether to include the documents in the response. Defaults to true
- Name
  stats
  Type
  boolean(optional)
  Description
  Whether to include stats in the response. Defaults to false. This is a legacy field and will be deprecated; use statsBySource instead.
- Name
  statsBySource
  Type
  boolean(optional)
  Description
  Whether to include stats grouped by ingestion source in the response. Defaults to false
- Name
  statsByStatus
  Type
  boolean(optional)
  Description
  Whether to include stats grouped by status in the response. Defaults to false
- Name
  statsByDocumentType
  Type
  boolean(optional)
  Description
  Whether to include stats grouped by document type in the response. Defaults to false
- Name
  rawFileUrl
  Type
  boolean(optional)
  Description
  Whether to include each document's raw file URL in the response. Defaults to false
- Name
  parsedTextFileUrl
  Type
  boolean(optional)
  Description
  Whether to include each document's parsed text file URL in the response. Defaults to false
Name
pagination
Type
object(optional)
Description
Pagination options
- Name
  pageSize
  Type
  number(optional)
  Description
  Number of documents per page (1-100, default: 20)
- Name
  cursor
  Type
  string(optional)
  Description
  Opaque cursor for fetching the next page
Name
data*
Type
object
Description
Data to update in the documents
- Name
  metadata
  Type
  object(optional)
  Description
  Metadata to update in the documents. This is a legacy field and will be deprecated. Use $metadata instead.
- Name
  $metadata
  Type
  object(optional)
  Description
  Operator-based metadata update to apply to the documents
  Name
  $set
  Type
  object(optional)
  Description
  Set each given key to the given value, replacing any existing value. Applies to both string and array values.
  Name
  $append
  Type
  object(optional)
  Description
  Append the given values to the existing array under each key. Applies only to array values.
  Name
  $remove
  Type
  object(optional)
  Description
  Remove the given values from the existing array under each key. Applies only to array values.
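To reason about what a $metadata update will do before sending it, the documented operators can be modelled locally. This Python sketch is an approximation, not the server's implementation. Two of its behaviours are assumptions: it applies $set, then $append, then $remove, and it treats a missing key as an empty array for $append. The starting metadata (including apiVersion) is hypothetical.

```python
import copy
from typing import Any, Dict


def apply_metadata_ops(metadata: Dict[str, Any], ops: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Local model of $set / $append / $remove (an approximation, not the server)."""
    result = copy.deepcopy(metadata)  # never mutate the caller's metadata
    # $set: replace the value for each key; strings and arrays are both allowed.
    for key, value in ops.get("$set", {}).items():
        result[key] = copy.deepcopy(value)
    # $append: extend an array. Assumption: a missing key starts as [].
    for key, values in ops.get("$append", {}).items():
        current = result.get(key, [])
        if not isinstance(current, list):
            raise TypeError(f"$append only applies to array values: {key}")
        result[key] = current + list(values)
    # $remove: drop the given values from an existing array; missing keys are skipped.
    for key, values in ops.get("$remove", {}).items():
        current = result.get(key)
        if current is None:
            continue
        if not isinstance(current, list):
            raise TypeError(f"$remove only applies to array values: {key}")
        result[key] = [item for item in current if item not in values]
    return result


# Hypothetical starting metadata; the operators match the request example below.
before = {"status": "published", "category": "security", "apiVersion": ["v0", "v1"]}
after = apply_metadata_ops(before, {
    "$set": {"status": "archived", "category": ["security", "networking"]},
    "$append": {"complexity": ["advanced"]},
    "$remove": {"apiVersion": ["v0"]},
})
print(after)
```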

Request

PATCH

/v1/documents

curl -X PATCH https://api.sourcesync.ai/v1/documents \
  -H "Authorization: Bearer $SOURCE_SYNC_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "filterConfig": {
      "documentIds": ["doc_123"],
      "documentExternalIds": [],
      "documentConnectionIds": [],
      "documentTypes": ["URL"],
      "documentIngestionSources": ["WEBSITE"],
      "documentIngestionStatuses": ["SUCCESS"],
      "metadata": {
        "category": "security"
      }
    },
    "pagination": {
      "pageSize": 10,
      "cursor": "eyJjcmVhdGVkQXQiOi..."
    },
    "data": {
      "metadata": {
        "status": "archived",
        "archivedAt": "2024-01-15T00:00:00Z"
      },
      "$metadata": {
        "$set": {
          "status": "archived",
          "category": ["security", "networking"]
        },
        "$append": {
          "complexity": ["advanced"]
        },
        "$remove": {
          "apiVersion": ["v0"]
        }
      }
    }
  }'
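The request above sends both the legacy metadata field and $metadata. Below is a Python sketch of a hypothetical `build_metadata_update` helper. It emits only the $metadata form and rejects non-array values for $append and $remove, because this reference says those operators apply only to arrays. Requiring at least one operator is a convention of this sketch, not a documented API rule.

```python
from typing import Any, Dict, Optional


def build_metadata_update(set_: Optional[Dict[str, Any]] = None,
                          append: Optional[Dict[str, list]] = None,
                          remove: Optional[Dict[str, list]] = None) -> Dict[str, Any]:
    """Hypothetical helper: the `data` object for PATCH /v1/documents, $metadata only."""
    for op_name, op in (("$append", append), ("$remove", remove)):
        for key, value in (op or {}).items():
            # Per this reference, $append and $remove apply only to array values.
            if not isinstance(value, list):
                raise TypeError(f"{op_name}.{key} must be a list, got {type(value).__name__}")
    operators = {
        name: op
        for name, op in (("$set", set_), ("$append", append), ("$remove", remove))
        if op  # skip operators that were not given or are empty
    }
    if not operators:
        # Sketch convention: refuse to send an update that changes nothing.
        raise ValueError("at least one of $set, $append or $remove is required")
    return {"$metadata": operators}


request_body = {
    "namespaceId": "ns_123",
    "filterConfig": {"documentIds": ["doc_123"]},
    "data": build_metadata_update(
        set_={"status": "archived"},
        append={"complexity": ["advanced"]},
        remove={"apiVersion": ["v0"]},
    ),
}
```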

Response Body

Name
success*
Type
boolean
Description
Indicates whether the request succeeded. Always true for successful responses
Name
message*
Type
string
Description
Human-readable message describing the result of the request
Name
data*
Type
object
Description
Data returned from the API.
- Name
  itemsUpdated*
  Type
  number
  Description
  Number of documents updated
- Name
  documents*
  Type
  array<object>
  Description
  List of documents
  Name
  id*
  Type
  string
  Description
  Unique identifier of the document
  Name
  name
  Type
  string | null(optional)
  Description
  Name of the document
  Name
  externalId*
  Type
  string
  Description
  External identifier of the document
  Name
  documentType*
  Type
  enum<string>
  Description
  Type of the document
  Available options: TEXT, URL, FILE, NOTION_DOCUMENT, GOOGLE_DRIVE_DOCUMENT, DROPBOX_DOCUMENT, ONEDRIVE_DOCUMENT, BOX_DOCUMENT, CONFLUENCE_DOCUMENT
  Name
  ingestionSource*
  Type
  enum<string>
  Description
  Source from where the document was ingested
  Available options: TEXT, URLS_LIST, SITEMAP, WEBSITE, LOCAL_FILE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX, CONFLUENCE
  Name
  ingestionStatus*
  Type
  enum<string>
  Description
  Ingestion status of the document
  Available options: BACKLOG, QUEUED, PROCESSING, SUCCESS, FAILED, CANCELLED
  Name
  ingestionError
  Type
  string | null(optional)
  Description
  Error message if the document ingestion failed
  Name
  ingestJob*
  Type
  object
  Description
  Details of the ingest job
  Name
  id*
  Type
  string
  Description
  ID of the ingest job
  Name
  ingestJobRun*
  Type
  object
  Description
  Details of the ingest job run
  Name
  id*
  Type
  string
  Description
  ID of the ingest job run
  Name
  connection
  Type
  object(optional)
  Description
  Details of the connection
  Name
  id*
  Type
  string
  Description
  ID of the connection from which the document was ingested
  Name
  documentProperties
  Type
  object(optional)
  Description
  Properties of the document
  Name
  mimeType
  Type
  string(optional)
  Description
  MIME type of the document
  Name
  fileSize
  Type
  number(optional)
  Description
  Size of the document file in bytes
  Name
  characterCount
  Type
  number(optional)
  Description
  Number of characters in the document
  Name
  tokenCount
  Type
  number(optional)
  Description
  Number of tokens in the document
  Name
  embeddingCount
  Type
  number(optional)
  Description
  Number of embeddings in the document
  Name
  ocrPagesCount
  Type
  number(optional)
  Description
  Number of pages processed by OCR in the document
  Name
  embeddingConfig
  Type
  object(optional)
  Description
  Configuration of the embedding model
  Name
  provider
  Type
  enum<string>(optional)
  Description
  Provider of the embedding model used
  Available options: OPENAI, COHERE, JINA
  Name
  model
  Type
  enum<string>(optional)
  Description
  Embedding model used to create the embeddings
  Available options: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002, embed-english-v3.0, embed-multilingual-v3.0, embed-english-light-v3.0, embed-multilingual-light-v3.0, embed-english-v2.0, embed-english-light-v2.0, embed-multilingual-v2.0, jina-embeddings-v3
  Name
  dimensions
  Type
  number(optional)
  Description
  Dimensions of the embedding model used
  Name
  chunkSize
  Type
  number(optional)
  Description
  Number of tokens in each chunk
  Name
  chunkOverlap
  Type
  number(optional)
  Description
  Number of tokens to overlap between chunks
  Name
  providers
  Type
  object(optional)
  Description
  Providers used to ingest the document
  Name
  fileStorage
  Type
  enum<string>(optional)
  Description
  Type of the file storage used
  Available options: S3_COMPATIBLE
  Name
  vectorStorage
  Type
  enum<string>(optional)
  Description
  Provider of the vector storage used
  Available options: PINECONE
  Name
  embeddingModel
  Type
  enum<string>(optional)
  Description
  Provider of the embedding model used
  Available options: OPENAI, COHERE, JINA
  Name
  webScraper
  Type
  enum<string>(optional)
  Description
  Provider of the web scraper used if the document is from a web source
  Available options: FIRECRAWL, JINA, SCRAPINGBEE
  Name
  metadata*
  Type
  object
  Description
  Metadata associated with the document
  Name
  namespace*
  Type
  object
  Description
  Details of the namespace containing the document
  Name
  identifier*
  Type
  string
  Description
  Unique identifier of the namespace
  Name
  organization*
  Type
  object
  Description
  Details of the organization
  Name
  id*
  Type
  string
  Description
  ID of the organization containing the document
  Name
  createdAt*
  Type
  object
  Description
  Timestamp when the document was created
  Name
  isoString*
  Type
  string
  Description
  ISO 8601 formatted timestamp
  Name
  updatedAt*
  Type
  object
  Description
  Timestamp when the document was last updated
  Name
  isoString*
  Type
  string
  Description
  ISO 8601 formatted timestamp

Response

PATCH

/v1/documents

{
  "success": true,
  "message": "Documents updated successfully",
  "data": {
    "itemsUpdated": 10,    
    "documents": [
      {
        "id": "doc_123",
        "name": "https://example.com",
        "externalId": "external_123",
        "documentType": "URL",
        "ingestionSource": "WEBSITE",
        "ingestionStatus": "SUCCESS",
        "ingestionError": null,
        "ingestJob": {
          "id": "job_123"
        },
        "ingestJobRun": {
          "id": "job_run_123"
        },
        "connection": {
          "id": "conn_123"
        },
        "documentProperties": {
          "mimeType": "text/html",
          "fileSize": 1347,
          "characterCount": 1335,
          "tokenCount": 340,
          "embeddingCount": 1,
          "ocrPagesCount": 0
        },
        "embeddingConfig": {
          "provider": "OPENAI",
          "model": "text-embedding-3-small",
          "dimensions": 1536,
          "chunkSize": 1024,
          "chunkOverlap": 256
        },
        "providers": {
          "fileStorage": "S3_COMPATIBLE",
          "vectorStorage": "PINECONE",
          "embeddingModel": "OPENAI",
          "webScraper": "FIRECRAWL"
        },
        "metadata": {
          "status": "archived",
          "archivedAt": "2024-01-15T00:00:00Z",
          "category": ["security", "networking"],
          "complexity": ["advanced"],
          "apiVersion": ["v1"]
        },
        "namespace": {
          "identifier": "ns_123"
        },
        "organization": {
          "id": "org_123"
        },
        "createdAt": {
          "isoString": "2024-01-01T00:00:00Z"
        },
        "updatedAt": {
          "isoString": "2024-01-01T00:00:00Z"
        }
      }
    ]
  }
}

DELETE /v1/documents

Delete Documents

Delete documents based on filters.

You can send an optional X-Tenant-ID header for multitenancy within a namespace (docs).

To delete every document in the namespace, set filterConfig to an empty object:

"filterConfig": {}
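An empty filterConfig targets the whole namespace, so it can be worth guarding against sending one by accident. Below is a Python sketch with a hypothetical `build_delete_body` helper and an explicit `allow_delete_all` opt-in. The reference does not say how the API treats empty arrays such as "documentIds": [], so the sketch conservatively treats a filter made only of empty values as no filter at all.

```python
from typing import Any, Dict


def build_delete_body(namespace_id: str, filter_config: Dict[str, Any],
                      allow_delete_all: bool = False) -> Dict[str, Any]:
    """Hypothetical helper: build a DELETE /v1/documents body, refusing an accidental delete-all."""
    # An empty filterConfig deletes every document in the namespace. Whether the
    # API ignores empty arrays/objects is not documented here, so this sketch
    # conservatively treats a filter made only of empty values as "no filter".
    has_filter = any(value for value in filter_config.values())
    if not has_filter and not allow_delete_all:
        raise ValueError(
            "filterConfig would match the whole namespace; "
            "pass allow_delete_all=True to confirm"
        )
    return {"namespaceId": namespace_id, "filterConfig": filter_config}
```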

Authorization

Name
Authorization*
Type
string
Description
Bearer token authentication. Include your API key as Bearer your_api_key
Name
Accept*
Type
string
Description
application/json

Request Body

Name
namespaceId*
Type
string
Description
Unique identifier of the namespace containing the documents
Name
filterConfig*
Type
object
Description
Configuration for filtering documents
- Name
  documentIds
  Type
  array<string>(optional)
  Description
  List of document IDs to filter
- Name
  documentExternalIds
  Type
  array<string>(optional)
  Description
  List of external document IDs to filter
- Name
  documentConnectionIds
  Type
  array<string>(optional)
  Description
  List of connection IDs to filter
- Name
  documentTypes
  Type
  array<enum<string>>(optional)
  Description
  List of document types to filter
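  Available options: TEXT, FILE, URL, NOTION_DOCUMENT, GOOGLE_DRIVE_DOCUMENT, DROPBOX_DOCUMENT, ONEDRIVE_DOCUMENT, BOX_DOCUMENT, CONFLUENCE_DOCUMENT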
- Name
  documentIngestionSources
  Type
  array<enum<string>>(optional)
  Description
  List of ingestion sources to filter
  Available options: TEXT, LOCAL_FILE, URLS_LIST, SITEMAP, WEBSITE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX, CONFLUENCE
- Name
  documentIngestionStatuses
  Type
  array<enum<string>>(optional)
  Description
  List of ingestion statuses to filter
  Available options: BACKLOG, QUEUED, QUEUED_FOR_RESYNC, PROCESSING, SUCCESS, FAILED, CANCELLED
- Name
  metadata
  Type
  object(optional)
  Description
  Metadata filters to apply
Name
includeConfig
Type
object(optional)
Description
Include options
- Name
  documents
  Type
  boolean(optional)
  Description
  Whether to include the documents in the response. Defaults to true
- Name
  stats
  Type
  boolean(optional)
  Description
  Whether to include stats in the response. Defaults to false. This is a legacy field and will be deprecated; use statsBySource instead.
- Name
  statsBySource
  Type
  boolean(optional)
  Description
  Whether to include stats grouped by ingestion source in the response. Defaults to false
- Name
  statsByStatus
  Type
  boolean(optional)
  Description
  Whether to include stats grouped by status in the response. Defaults to false
- Name
  statsByDocumentType
  Type
  boolean(optional)
  Description
  Whether to include stats grouped by document type in the response. Defaults to false
- Name
  rawFileUrl
  Type
  boolean(optional)
  Description
  Whether to include each document's raw file URL in the response. Defaults to false
- Name
  parsedTextFileUrl
  Type
  boolean(optional)
  Description
  Whether to include each document's parsed text file URL in the response. Defaults to false
Name
pagination
Type
object(optional)
Description
Pagination options
- Name
  pageSize
  Type
  number(optional)
  Description
  Number of documents per page (1-100, default: 20)
- Name
  cursor
  Type
  string(optional)
  Description
  Opaque cursor for fetching the next page

Request

DELETE

/v1/documents

curl -X DELETE https://api.sourcesync.ai/v1/documents \
  -H "Authorization: Bearer $SOURCE_SYNC_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "filterConfig": {
      "documentIds": ["doc_123"],
      "documentExternalIds": [],
      "documentTypes": ["TEXT"],
      "documentIngestionSources": ["TEXT"],
      "documentIngestionStatuses": ["SUCCESS"],
      "metadata": {
        "status": "archived"
      }
    },
    "pagination": {
      "pageSize": 10,
      "cursor": "eyJjcmVhdGVkQXQiOi..."
    }
  }'
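A standard-library Python sketch of the same call. With `urllib.request`, a `Request` that carries `data` defaults to POST. You must therefore set the method to DELETE explicitly for the JSON body to go out on a DELETE. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict


def delete_documents_request(body: Dict[str, Any], api_key: str) -> urllib.request.Request:
    """Hypothetical helper: a DELETE request that carries a JSON body."""
    return urllib.request.Request(
        "https://api.sourcesync.ai/v1/documents",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        # Without an explicit method, urllib uses POST whenever data is set.
        method="DELETE",
    )


def delete_documents(body: Dict[str, Any], api_key: str) -> Dict[str, Any]:
    """Send the DELETE request and decode the JSON response."""
    with urllib.request.urlopen(delete_documents_request(body, api_key), timeout=30) as response:
        return json.load(response)
```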

Response Body

Name
success*
Type
boolean
Description
Indicates whether the request succeeded. Always true for successful responses
Name
message*
Type
string
Description
Human-readable message describing the result of the request
Name
data*
Type
object
Description
Data returned from the API.
- Name
  itemsDeleted*
  Type
  number
  Description
  Number of documents deleted
- Name
  documents*
  Type
  array<object>
  Description
  List of documents
  Name
  id*
  Type
  string
  Description
  Unique identifier of the document
  Name
  status*
  Type
  enum<string>
  Description
  Status of the document
  Available options: QUEUED_FOR_DELETION

Response

DELETE

/v1/documents

{
  "success": true,
  "message": "Documents deleted successfully",
  "data": {
    "itemsDeleted": 2,
    "documents": [
      {
        "id": "doc_123",
        "status": "QUEUED_FOR_DELETION"
      },
      {
        "id": "doc_124",
        "status": "QUEUED_FOR_DELETION"
      }
    ]
  }
}
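Deletion is queued rather than immediate: each returned document carries the status QUEUED_FOR_DELETION. A small Python sketch that checks `success` and collects the queued IDs from a parsed response; the function name `queued_for_deletion` is hypothetical, and the sample is the response shown above.

```python
import json

SAMPLE = json.loads("""
{
  "success": true,
  "message": "Documents deleted successfully",
  "data": {
    "itemsDeleted": 2,
    "documents": [
      {"id": "doc_123", "status": "QUEUED_FOR_DELETION"},
      {"id": "doc_124", "status": "QUEUED_FOR_DELETION"}
    ]
  }
}
""")


def queued_for_deletion(response: dict) -> list[str]:
    """Return the IDs of documents the API queued for deletion."""
    if not response.get("success"):
        # The message field carries the human-readable reason.
        raise RuntimeError(response.get("message", "delete request failed"))
    return [
        doc["id"]
        for doc in response["data"]["documents"]
        if doc["status"] == "QUEUED_FOR_DELETION"
    ]


ids = queued_for_deletion(SAMPLE)
```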

PATCH/v1/documents/{documentId}

Update Document Content

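Update the content of a document. The update is queued rather than applied immediately, and the response reports the document's status as QUEUED_FOR_UPDATE.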

You can send an optional X-Tenant-ID header for multitenancy within a namespace (docs).

Authorization

Name
Authorization*
Type
string
Description
Bearer token authentication. Include your API key as Bearer your_api_key
Name
Accept*
Type
string
Description
application/json

Request Body

Name
namespaceId*
Type
string
Description
Unique identifier of the namespace containing the document
Name
content*
Type
string
Description
New content for the document

Request

PATCH

/v1/documents/{documentId}

curl -X PATCH https://api.sourcesync.ai/v1/documents/doc_123 \
  -H "Authorization: Bearer $SOURCE_SYNC_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "content": "This is the updated content of the document"
  }'
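Because `documentId` is part of the URL path, a client should percent-encode it before building the URL. A minimal Python sketch of the same PATCH request using the standard library; `build_update_request` is a hypothetical helper, and this section does not say whether document IDs can contain reserved characters, so the encoding is purely defensive.

```python
import json
import urllib.parse
import urllib.request


def build_update_request(
    document_id: str, namespace_id: str, content: str, api_key: str
) -> urllib.request.Request:
    """Prepare PATCH /v1/documents/{documentId} with a JSON body."""
    # safe="" also encodes "/", so an ID can never change the path structure.
    path = "/v1/documents/" + urllib.parse.quote(document_id, safe="")
    body = {"namespaceId": namespace_id, "content": content}
    return urllib.request.Request(
        "https://api.sourcesync.ai" + path,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
    )


req = build_update_request(
    "doc_123", "ns_123", "This is the updated content of the document", "your_api_key"
)
```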

Response Body

Name
success*
Type
boolean
Description
Indicates whether the request succeeded. This is always true for successful responses.
Name
message*
Type
string
Description
Human-readable message describing the result of the request
Name
data*
Type
object
Description
Data returned from the API.
- Name
  document*
  Type
  object
  Description
  Details of the updated document
  Name
  id*
  Type
  string
  Description
  Unique identifier of the document
  Name
  status*
  Type
  enum<string>
  Description
  Status of the document
  Available options: QUEUED_FOR_UPDATE

Response

PATCH

/v1/documents/{documentId}

{
  "success": true,
  "message": "Added the document to the document content update queue successfully",
  "data": {
    "document": {
      "id": "doc_123",
      "status": "QUEUED_FOR_UPDATE"
    }
  }
}

POST/v1/documents/resync

Resync Documents

Resync the documents that match the filters. TEXT and LOCAL_FILE documents cannot be resynced, nor can documents whose ingestion status is QUEUED, QUEUED_FOR_RESYNC, or PROCESSING.

You can send an optional X-Tenant-ID header for multitenancy within a namespace (docs).
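The eligibility rules can be checked client-side before calling the endpoint, or folded into the filter so that ineligible documents are never selected. A sketch in Python under two assumptions: TEXT and LOCAL_FILE are matched against the ingestion source (the wording above does not say which field), and different filter fields are combined with AND. The helper names are hypothetical; the server stays authoritative and reports anything it skips as NOT_ELIGIBLE_FOR_RESYNC.

```python
# Option lists copied from the filterConfig reference on this page.
ALL_SOURCES = [
    "TEXT", "LOCAL_FILE", "URLS_LIST", "SITEMAP", "WEBSITE", "NOTION",
    "GOOGLE_DRIVE", "DROPBOX", "ONEDRIVE", "BOX", "CONFLUENCE",
]
ALL_STATUSES = [
    "BACKLOG", "QUEUED", "QUEUED_FOR_RESYNC", "PROCESSING",
    "SUCCESS", "FAILED", "CANCELLED",
]

# Assumption: TEXT and LOCAL_FILE are matched against the ingestion source.
INELIGIBLE_SOURCES = {"TEXT", "LOCAL_FILE"}
INELIGIBLE_STATUSES = {"QUEUED", "QUEUED_FOR_RESYNC", "PROCESSING"}


def can_resync(ingestion_source: str, ingestion_status: str) -> bool:
    """Apply the documented resync restrictions to a single document."""
    return (
        ingestion_source not in INELIGIBLE_SOURCES
        and ingestion_status not in INELIGIBLE_STATUSES
    )


def resyncable_filter() -> dict[str, list[str]]:
    """filterConfig fragment that selects only resync-eligible documents."""
    return {
        "documentIngestionSources": [
            s for s in ALL_SOURCES if s not in INELIGIBLE_SOURCES
        ],
        "documentIngestionStatuses": [
            s for s in ALL_STATUSES if s not in INELIGIBLE_STATUSES
        ],
    }
```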

Authorization

Name
Authorization*
Type
string
Description
Bearer token authentication. Include your API key as Bearer your_api_key
Name
Accept*
Type
string
Description
application/json

Request Body

Name
namespaceId*
Type
string
Description
Unique identifier of the namespace containing the documents
Name
filterConfig*
Type
object
Description
Configuration for filtering documents
- Name
  documentIds
  Type
  array<string>(optional)
  Description
  List of document IDs to filter
- Name
  documentExternalIds
  Type
  array<string>(optional)
  Description
  List of external document IDs to filter
- Name
  documentConnectionIds
  Type
  array<string>(optional)
  Description
  List of connection IDs to filter
- Name
  documentTypes
  Type
  array<enum<string>>(optional)
  Description
  List of document types to filter
  Available options: TEXT, URL, SITEMAP, WEBSITE
- Name
  documentIngestionSources
  Type
  array<enum<string>>(optional)
  Description
  List of ingestion sources to filter
  Available options: TEXT, LOCAL_FILE, URLS_LIST, SITEMAP, WEBSITE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX, CONFLUENCE
- Name
  documentIngestionStatuses
  Type
  array<enum<string>>(optional)
  Description
  List of ingestion statuses to filter
  Available options: BACKLOG, QUEUED, QUEUED_FOR_RESYNC, PROCESSING, SUCCESS, FAILED, CANCELLED
- Name
  metadata
  Type
  object(optional)
  Description
  Metadata filters to apply

Request

POST

/v1/documents/resync

curl -X POST https://api.sourcesync.ai/v1/documents/resync \
  -H "Authorization: Bearer $SOURCE_SYNC_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "filterConfig": {
      "documentIds": ["doc_123", "doc_124", "doc_125", "doc_126", "doc_127"],
      "documentExternalIds": [],
      "documentConnectionIds": [],
      "documentTypes": ["TEXT", "URL"],
      "documentIngestionSources": ["TEXT", "WEBSITE"],
      "documentIngestionStatuses": ["SUCCESS"],
      "metadata": {
        "status": "archived"
      }
    }
  }'

Response Body

Name
success*
Type
boolean
Description
Indicates whether the request succeeded. This is always true for successful responses.
Name
message*
Type
string
Description
Human-readable message describing the result of the request
Name
data*
Type
object
Description
Data returned from the API.
- Name
  itemsQueued*
  Type
  number
  Description
  Number of documents queued for resync
- Name
  itemsSkipped*
  Type
  number
  Description
  Number of documents skipped because they are not eligible for resync
- Name
  documents*
  Type
  array<object>
  Description
  List of documents
  Name
  id*
  Type
  string
  Description
  Unique identifier of the document
  Name
  status*
  Type
  enum<string>
  Description
  Status of the document
  Available options: QUEUED_FOR_RESYNC, NOT_ELIGIBLE_FOR_RESYNC
  Name
  error
  Type
  string | null(optional)
  Description
  Error message if the document is not eligible for resync or resync failed

Response

POST

/v1/documents/resync

{
  "success": true,
  "message": "Added the eligible documents to the resync queue successfully",
  "data": {
    "itemsQueued": 3,
    "itemsSkipped": 2,
    "documents": [
      {
        "id": "doc_123",
        "status": "QUEUED_FOR_RESYNC",
        "error": null
      },
      {
        "id": "doc_124",
        "status": "QUEUED_FOR_RESYNC",
        "error": null
      },
      {
        "id": "doc_125",
        "status": "NOT_ELIGIBLE_FOR_RESYNC",
        "error": "Documents with LOCAL_FILE and TEXT as documentType are not eligible for resync"
      },
      {
        "id": "doc_126",
        "status": "QUEUED_FOR_RESYNC",
        "error": null
      },
      {
        "id": "doc_127",
        "status": "NOT_ELIGIBLE_FOR_RESYNC",
        "error": "Documents with status QUEUED, QUEUED_FOR_RESYNC and PROCESSING are not eligible for resync"
      }
    ]
  }
}
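Because one resync call can mix eligible and ineligible documents, a caller typically separates the two groups in the response. A Python sketch; the function name `split_resync_results` is hypothetical, it treats any status other than QUEUED_FOR_RESYNC as skipped, and the sample is a subset of the response above.

```python
def split_resync_results(response: dict) -> tuple[list[str], dict[str, str]]:
    """Return (queued IDs, {skipped ID: reason}) from a resync response."""
    queued: list[str] = []
    skipped: dict[str, str] = {}
    for doc in response["data"]["documents"]:
        if doc["status"] == "QUEUED_FOR_RESYNC":
            queued.append(doc["id"])
        else:
            # error is null for queued documents and a reason string otherwise.
            skipped[doc["id"]] = doc.get("error") or doc["status"]
    return queued, skipped


example = {
    "success": True,
    "data": {
        "itemsQueued": 2,
        "itemsSkipped": 1,
        "documents": [
            {"id": "doc_123", "status": "QUEUED_FOR_RESYNC", "error": None},
            {"id": "doc_124", "status": "QUEUED_FOR_RESYNC", "error": None},
            {
                "id": "doc_127",
                "status": "NOT_ELIGIBLE_FOR_RESYNC",
                "error": "Documents with status QUEUED, QUEUED_FOR_RESYNC and PROCESSING are not eligible for resync",
            },
        ],
    },
}

queued, skipped = split_resync_results(example)
for doc_id, reason in skipped.items():
    print(f"skipped {doc_id}: {reason}")
```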

POST/v1/documents/schedule

Schedule Documents

Schedule recurring syncs for the documents that match the filters. TEXT and LOCAL_FILE documents cannot be scheduled.

You can send an optional X-Tenant-ID header for multitenancy within a namespace (docs).

Authorization

Name
Authorization*
Type
string
Description
Bearer token authentication. Include your API key as Bearer your_api_key
Name
Accept*
Type
string
Description
application/json

Request Body

Name
namespaceId*
Type
string
Description
Unique identifier of the namespace containing the documents
Name
filterConfig*
Type
object
Description
Configuration for filtering documents
- Name
  documentIds
  Type
  array<string>(optional)
  Description
  List of document IDs to filter
- Name
  documentExternalIds
  Type
  array<string>(optional)
  Description
  List of external document IDs to filter
- Name
  documentConnectionIds
  Type
  array<string>(optional)
  Description
  List of connection IDs to filter
- Name
  documentTypes
  Type
  array<enum<string>>(optional)
  Description
  List of document types to filter
  Available options: TEXT, URL, SITEMAP, WEBSITE
- Name
  documentIngestionSources
  Type
  array<enum<string>>(optional)
  Description
  List of ingestion sources to filter
  Available options: TEXT, LOCAL_FILE, URLS_LIST, SITEMAP, WEBSITE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX, CONFLUENCE
- Name
  documentIngestionStatuses
  Type
  array<enum<string>>(optional)
  Description
  List of ingestion statuses to filter
  Available options: BACKLOG, QUEUED, QUEUED_FOR_RESYNC, PROCESSING, SUCCESS, FAILED, CANCELLED
- Name
  metadata
  Type
  object(optional)
  Description
  Metadata filters to apply
Name
includeConfig
Type
object(optional)
Description
Include options
- Name
  documents
  Type
  boolean(optional)
  Description
  Whether to include the documents in the response. Defaults to true.
- Name
  stats
  Type
  boolean(optional)
  Description
  Whether to include the stats in the response. Defaults to false. This is a legacy field that will be deprecated; use statsBySource instead.
- Name
  statsBySource
  Type
  boolean(optional)
  Description
  Whether to include stats grouped by source in the response. Defaults to false.
- Name
  statsByStatus
  Type
  boolean(optional)
  Description
  Whether to include stats grouped by status in the response. Defaults to false.
- Name
  statsByDocumentType
  Type
  boolean(optional)
  Description
  Whether to include stats grouped by document type in the response. Defaults to false.
- Name
  rawFileUrl
  Type
  boolean(optional)
  Description
  Whether to include the raw file URL in the response. Defaults to false.
- Name
  parsedTextFileUrl
  Type
  boolean(optional)
  Description
  Whether to include the parsed text file URL in the response. Defaults to false.
Name
pagination
Type
object(optional)
Description
Pagination options
- Name
  pageSize
  Type
  number(optional)
  Description
  Number of documents per page (1-100, default: 20)
- Name
  cursor
  Type
  string(optional)
  Description
  Opaque cursor for fetching the next page
Name
data*
Type
object
Description
Details of the schedule
- Name
  syncFrequency*
  Type
  enum<string>
  Description
  How often the matching documents are synced. Defaults to NEVER. With DAILY, WEEKLY, or MONTHLY, the documents are synced once a day, once a week, or once a month, respectively.
  Available options: DAILY, WEEKLY, MONTHLY, NEVER
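Since `syncFrequency` accepts only four values, a client can reject typos before sending the request. A Python sketch with a hypothetical helper name that normalizes case client-side; the `filterConfig` passed in is whatever filter you would otherwise send. Sending NEVER is assumed to leave the matching documents without a recurring sync, consistent with its role as the default.

```python
from typing import Any

SYNC_FREQUENCIES = ("DAILY", "WEEKLY", "MONTHLY", "NEVER")


def build_schedule_body(
    namespace_id: str, filter_config: dict[str, Any], sync_frequency: str
) -> dict[str, Any]:
    """Assemble a POST /v1/documents/schedule body with a validated frequency."""
    frequency = sync_frequency.upper()
    if frequency not in SYNC_FREQUENCIES:
        raise ValueError(
            f"syncFrequency must be one of {', '.join(SYNC_FREQUENCIES)}, "
            f"got {sync_frequency!r}"
        )
    return {
        "namespaceId": namespace_id,
        "filterConfig": filter_config,
        "data": {"syncFrequency": frequency},
    }


body = build_schedule_body("ns_123", {"documentIds": ["doc_123", "doc_124"]}, "daily")
```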

Request

POST

/v1/documents/schedule

curl -X POST https://api.sourcesync.ai/v1/documents/schedule \
  -H "Authorization: Bearer $SOURCE_SYNC_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "filterConfig": {
      "documentIds": ["doc_123", "doc_124"],
      "documentExternalIds": [],
      "documentConnectionIds": [],
      "documentTypes": ["URL"],
      "documentIngestionSources": ["WEBSITE"],
      "documentIngestionStatuses": ["SUCCESS"],
      "metadata": {
        "status": "archived"
      }
    },
    "data": {
      "syncFrequency": "DAILY"
    }
  }'

Response Body

Name
success*
Type
boolean
Description
Indicates whether the request succeeded. This is always true for successful responses.
Name
message*
Type
string
Description
Human-readable message describing the result of the request
Name
data*
Type
object
Description
Data returned from the API.
- Name
  itemsScheduled*
  Type
  number
  Description
  Number of documents scheduled for sync
- Name
  itemsSkipped*
  Type
  number
  Description
  Number of documents skipped because they are not eligible for scheduling
- Name
  documents*
  Type
  array<object>
  Description
  List of documents
  Name
  id*
  Type
  string
  Description
  Unique identifier of the document
  Name
  status*
  Type
  enum<string>
  Description
  Status of the document
  Available options: SYNC_SCHEDULED, NOT_ELIGIBLE_FOR_SYNC_SCHEDULE
  Name
  syncFrequency
  Type
  enum<string> | null(optional)
  Description
  Sync frequency applied to the document, or null if it was not scheduled
  Name
  error
  Type
  string | null(optional)
  Description
  Error message if the document is not eligible for scheduling

Response

POST

/v1/documents/schedule

{
  "success": true,
  "message": "Added schedule to the eligible documents successfully",
  "data": {
    "itemsScheduled": 1,
    "itemsSkipped": 1,
    "documents": [
      {
        "id": "doc_123",
        "status": "SYNC_SCHEDULED",
        "syncFrequency": "DAILY",
        "error": null
      },
      {
        "id": "doc_124",
        "status": "NOT_ELIGIBLE_FOR_SYNC_SCHEDULE",
        "syncFrequency": null,
        "error": "Documents with ingestionSource as LOCAL_FILE and TEXT are not eligible for scheduling"
      }
    ]
  }
}

Error Codes

Name
NAMESPACE_NOT_FOUND
Description
The specified namespace does not exist
Name
DOCUMENTS_NOT_FOUND
Description
No documents match the filter criteria
Name
INVALID_FILTER_CONFIG
Description
Invalid filter configuration provided
Name
UPDATE_DOCUMENTS_FAILED
Description
Internal server error while updating documents
Name
DELETE_DOCUMENTS_FAILED
Description
Internal server error while deleting documents
Name
UPDATE_DOCUMENT_CONTENT_FAILED
Description
Internal server error while updating the document content
Name
RESYNC_DOCUMENTS_FAILED
Description
Internal server error while resyncing documents
Name
SCHEDULE_DOCUMENTS_FAILED
Description
Internal server error while scheduling documents
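The table splits into request problems (unknown namespace, no matching documents, invalid filter), which fail the same way if repeated unchanged, and internal server errors, which may be transient. One reasonable client policy, sketched in Python, is to retry only the latter with exponential backoff, assuming the operation is safe to repeat. This section does not describe how the code appears in an error response, so `SourceSyncError` is a hypothetical exception your HTTP layer would raise with the code attached.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Codes the table above describes as internal server errors.
SERVER_ERROR_CODES = frozenset({
    "UPDATE_DOCUMENTS_FAILED",
    "DELETE_DOCUMENTS_FAILED",
    "UPDATE_DOCUMENT_CONTENT_FAILED",
    "RESYNC_DOCUMENTS_FAILED",
    "SCHEDULE_DOCUMENTS_FAILED",
})


class SourceSyncError(Exception):
    """Hypothetical exception carrying one of the error codes above."""

    def __init__(self, code: str, message: str = "") -> None:
        super().__init__(f"{code}: {message}" if message else code)
        self.code = code


def call_with_retry(
    fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5
) -> T:
    """Call fn, retrying only server-side failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except SourceSyncError as err:
            # Request problems would fail the same way again, so they are
            # raised immediately, as is the last server-side failure.
            if err.code not in SERVER_ERROR_CODES or attempt == attempts - 1:
                raise
            # With the defaults this waits 0.5 s, then 1 s.
            time.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```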

Filter Configuration

Name
documentIds
Description
Filter by specific document IDs. Multiple IDs are combined with OR: a document matches if it has any of the listed IDs.
Name
documentExternalIds
Description
Filter by external IDs. Multiple external IDs are combined with OR: a document matches if it has any of the listed external IDs.
Name
documentConnectionIds
Description
Filter by connection IDs. Multiple connection IDs are combined with OR: a document matches if it belongs to any of the listed connections.
Name
documentTypes
Description
Filter by document types. Multiple types are combined with OR: a document matches if it has any of the listed types.
Name
documentIngestionSources
Description
Filter by ingestion sources. Multiple sources are combined with OR: a document matches if it came from any of the listed sources.
Name
documentIngestionStatuses
Description
Filter by ingestion statuses. Multiple statuses are combined with OR: a document matches if it has any of the listed statuses.
Name
metadata
Description
Filter by metadata key-value pairs. Multiple pairs are combined with AND: a document matches only if it has every listed pair.
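To make the combination rules concrete, here is a Python sketch that evaluates a `filterConfig` against a single in-memory document. It is illustrative only: the document field names (`externalId`, `ingestionSource`, and so on) are hypothetical, different filter fields are assumed to combine with AND, and an empty list is treated as no constraint, which is consistent with the empty arrays in the request examples.

```python
from typing import Any

# Maps each list filter to a hypothetical field on the document record.
LIST_FILTERS = {
    "documentIds": "id",
    "documentExternalIds": "externalId",
    "documentConnectionIds": "connectionId",
    "documentTypes": "documentType",
    "documentIngestionSources": "ingestionSource",
    "documentIngestionStatuses": "ingestionStatus",
}


def matches(doc: dict[str, Any], filter_config: dict[str, Any]) -> bool:
    """Evaluate filterConfig against one document using the rules above."""
    # Across fields: every provided filter must pass (assumed AND).
    for filter_key, field in LIST_FILTERS.items():
        allowed = filter_config.get(filter_key)
        # Within a field: OR, so the value only has to appear in the list.
        # A missing or empty list is treated as "no constraint".
        if allowed and doc.get(field) not in allowed:
            return False
    # Metadata: AND, so every key-value pair must match exactly.
    doc_metadata = doc.get("metadata") or {}
    for key, expected in (filter_config.get("metadata") or {}).items():
        if doc_metadata.get(key) != expected:
            return False
    return True
```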