Documents API Reference

Learn about the document management endpoints and how to work with your ingested content.

POST /v1/documents

Fetch Documents

Fetch documents with optional filters and pagination.

Authorization

  • Name
    Authorization*
    Type
    string
    Description
    Bearer token authentication. Include your API key as Bearer your_api_key
  • Name
    Accept*
    Type
    string
    Description
    application/json

Request Body

  • Name
    namespaceId*
    Type
    string
    Description
    Unique identifier of the namespace containing the documents
  • Name
    filterConfig*
    Type
    object
    Description
    Configuration for filtering documents
    • Name
      documentIds
      Type
      array<string>(optional)
      Description
      List of document IDs to filter
    • Name
      documentExternalIds
      Type
      array<string>(optional)
      Description
      List of external document IDs to filter
    • Name
      documentConnectionIds
      Type
      array<string>(optional)
      Description
      List of connection IDs to filter
    • Name
      documentTypes
      Type
      array<enum<string>>(optional)
      Description
      List of document types to filter
      Available options: TEXT, URL, SITEMAP, WEBSITE
    • Name
      documentIngestionSources
      Type
      array<enum<string>>(optional)
      Description
      List of ingestion sources to filter
      Available options: TEXT, LOCAL_FILE, URLS_LIST, SITEMAP, WEBSITE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX
    • Name
      documentIngestionStatuses
      Type
      array<enum<string>>(optional)
      Description
      List of ingestion statuses to filter
      Available options: BACKLOG, QUEUED, QUEUED_FOR_RESYNC, PROCESSING, SUCCESS, FAILED, CANCELLED
    • Name
      metadata
      Type
      object(optional)
      Description
      Metadata filters to apply
  • Name
    includeConfig
    Type
    object(optional)
    Description
    Include options
    • Name
      documents
      Type
      boolean(optional)
      Description
      Option to include the documents or not in the response. Defaults to true
    • Name
      stats
      Type
      boolean(optional)
      Description
      Option to include the stats or not in the response. Defaults to false. This is a legacy field and will be deprecated. Use statsBySource instead.
    • Name
      statsBySource
      Type
      boolean(optional)
      Description
      Option to include the stats by source or not in the response. Defaults to false
    • Name
      statsByStatus
      Type
      boolean(optional)
      Description
      Option to include the stats by status or not in the response. Defaults to false
    • Name
      rawFileUrl
      Type
      boolean(optional)
      Description
      Option to include the raw file URL or not in the response. Defaults to false
    • Name
      parsedTextFileUrl
      Type
      boolean(optional)
      Description
      Option to include the parsed text file URL or not in the response. Defaults to false
  • Name
    pagination
    Type
    object(optional)
    Description
    Pagination options
    • Name
      pageSize
      Type
      number(optional)
      Description
      Number of documents per page (1-100, default: 20)
    • Name
      cursor
      Type
      string(optional)
      Description
      Opaque cursor for fetching the next page
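
The fields above can be assembled and checked on the client before a request is sent. The sketch below is one hypothetical way to do that in Python: `build_fetch_body` and its parameter names are illustrative and not part of the API, while the enum values and the 1-100 page-size range are copied from the tables above.

```python
from typing import Any, Dict, Iterable, Optional

# Enum values copied from the request-body tables above.
INGESTION_SOURCES = {
    "TEXT", "LOCAL_FILE", "URLS_LIST", "SITEMAP", "WEBSITE",
    "NOTION", "GOOGLE_DRIVE", "DROPBOX", "ONEDRIVE", "BOX",
}
INGESTION_STATUSES = {
    "BACKLOG", "QUEUED", "QUEUED_FOR_RESYNC", "PROCESSING",
    "SUCCESS", "FAILED", "CANCELLED",
}


def build_fetch_body(
    namespace_id: str,
    *,
    sources: Iterable[str] = (),
    statuses: Iterable[str] = (),
    metadata: Optional[Dict[str, Any]] = None,
    page_size: int = 20,
    cursor: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a request body and reject bad values early."""
    sources, statuses = list(sources), list(statuses)
    bad = [s for s in sources if s not in INGESTION_SOURCES]
    bad += [s for s in statuses if s not in INGESTION_STATUSES]
    if bad:
        raise ValueError(f"unknown enum value(s): {bad}")
    if not 1 <= page_size <= 100:
        raise ValueError("pageSize must be between 1 and 100")

    # Only include the filters that were actually supplied.
    filter_config: Dict[str, Any] = {}
    if sources:
        filter_config["documentIngestionSources"] = sources
    if statuses:
        filter_config["documentIngestionStatuses"] = statuses
    if metadata:
        filter_config["metadata"] = metadata

    pagination: Dict[str, Any] = {"pageSize": page_size}
    if cursor is not None:
        pagination["cursor"] = cursor

    return {
        "namespaceId": namespace_id,
        "filterConfig": filter_config,
        "pagination": pagination,
    }
```

Validating locally is a design choice, not a requirement: the server remains the authority on which values it accepts.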

Request

POST
/v1/documents
curl -X POST https://api.sourcesync.ai/v1/documents \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "filterConfig": {
      "documentIds": ["doc_123", "doc_124"],
      "documentExternalIds": ["external_123"],
      "documentConnectionIds": ["conn_123"],
      "documentTypes": ["URL", "GOOGLE_DRIVE_DOCUMENT"],
      "documentIngestionSources": ["WEBSITE", "GOOGLE_DRIVE"],
      "documentIngestionStatuses": ["SUCCESS", "FAILED"],
      "metadata": {
        "category": "security",
        "status": "published"
      }
    },
    "includeConfig": {
      "documents": true,
      "statsBySource": true,
      "statsByStatus": true,
      "rawFileUrl": true,
      "parsedTextFileUrl": true
    },
    "pagination": {
      "pageSize": 10,
      "cursor": "eyJjcmVhdGVkQXQiOi..."
    }
  }'
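
To read every matching document rather than one page, a client can resend the same body with the cursor from the previous response. The following is a minimal sketch of that loop, assuming the response fields `hasNextPage` and `nextCursor` documented under Response Body. The `post` callable is a hypothetical stand-in for your HTTP client: it takes the JSON request body as a dict and returns the decoded JSON response.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical transport type: request body in, decoded JSON response out.
PostFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_documents(post: PostFn, body: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield documents page by page, following data.nextCursor."""
    pagination = dict(body.get("pagination", {}))
    while True:
        # Build a fresh request each time so the caller's dict is never mutated.
        request = {**body, "pagination": pagination}
        data = post(request)["data"]
        yield from data.get("documents", [])
        # Stop when the API reports no further pages or omits the cursor.
        if not data.get("hasNextPage") or not data.get("nextCursor"):
            return
        pagination = {**pagination, "cursor": data["nextCursor"]}
```

Because the cursor is opaque, the loop only passes it back unchanged and never inspects it.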


Response Body

  • Name
    success*
    Type
    boolean
    Description
    Indicates whether the request succeeded. Always true for successful responses.
  • Name
    message*
    Type
    string
    Description
    Human-readable message describing the result of the request
  • Name
    data*
    Type
    object
    Description
    Data returned from the API.
    • Name
      itemsReturned*
      Type
      number
      Description
      Number of documents returned in current page
    • Name
      hasNextPage*
      Type
      boolean
      Description
      Whether more documents are available
    • Name
      nextCursor
      Type
      string(optional)
      Description
      Cursor for fetching next page, or undefined if no more pages
    • Name
      stats*
      Type
      array<object>
      Description
      Stats of the documents
      • Name
        source*
        Type
        enum<string>
        Description
        Ingestion source of the document
        Available options: TEXT, LOCAL_FILE, URLS_LIST, SITEMAP, WEBSITE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX
      • Name
        totalCount*
        Type
        number
        Description
        Total number of documents ingested via this source
    • Name
      documents*
      Type
      array<object>
      Description
      List of documents
      • Name
        id*
        Type
        string
        Description
        Unique identifier of the document
      • Name
        name
        Type
        string | null(optional)
        Description
        Name of the document
      • Name
        externalId*
        Type
        string
        Description
        External identifier of the document
      • Name
        documentType*
        Type
        enum<string>
        Description
        Type of the document
        Available options: TEXT, URL, FILE, NOTION_DOCUMENT, GOOGLE_DRIVE_DOCUMENT, DROPBOX_DOCUMENT, ONEDRIVE_DOCUMENT, BOX_DOCUMENT
      • Name
        ingestionSource*
        Type
        enum<string>
        Description
        Source from where the document was ingested
        Available options: TEXT, URLS_LIST, SITEMAP, WEBSITE, LOCAL_FILE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX
      • Name
        ingestionStatus*
        Type
        enum<string>
        Description
        Current ingestion status of the document
        Available options: BACKLOG, QUEUED, PROCESSING, SUCCESS, FAILED, CANCELLED
      • Name
        ingestionError
        Type
        string | null(optional)
        Description
        Error message if the document ingestion failed
      • Name
        ingestJob*
        Type
        object
        Description
        Details of the ingest job
        • Name
          id*
          Type
          string
          Description
          ID of the ingest job
      • Name
        ingestJobRun*
        Type
        object
        Description
        Details of the ingest job run
        • Name
          id*
          Type
          string
          Description
          ID of the ingest job run
      • Name
        connection
        Type
        object(optional)
        Description
        Details of the connection
        • Name
          id*
          Type
          string
          Description
          ID of the connection from which the document was ingested
      • Name
        documentProperties
        Type
        object(optional)
        Description
        Properties of the document
        • Name
          mimeType
          Type
          string(optional)
          Description
          MIME type of the document
        • Name
          fileSize
          Type
          number(optional)
          Description
          Size of the document file in bytes
        • Name
          characterCount
          Type
          number(optional)
          Description
          Number of characters in the document
        • Name
          tokenCount
          Type
          number(optional)
          Description
          Number of tokens in the document
        • Name
          embeddingCount
          Type
          number(optional)
          Description
          Number of embeddings in the document
      • Name
        embeddingConfig
        Type
        object(optional)
        Description
        Configuration of the embedding model
        • Name
          provider
          Type
          enum<string>(optional)
          Description
          Provider of the embedding model used
          Available options: OPENAI, COHERE, JINA
        • Name
          model
          Type
          enum<string>(optional)
          Description
          Embedding model used to create the embeddings
          Available options: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002, embed-english-v3.0, embed-multilingual-v3.0, embed-english-light-v3.0, embed-multilingual-light-v3.0, embed-english-v2.0, embed-english-light-v2.0, embed-multilingual-v2.0, jina-embeddings-v3
        • Name
          dimensions
          Type
          number(optional)
          Description
          Dimensions of the embedding model used
        • Name
          chunkSize
          Type
          number(optional)
          Description
          Number of tokens in each chunk
        • Name
          chunkOverlap
          Type
          number(optional)
          Description
          Number of tokens to overlap between chunks
      • Name
        providers
        Type
        object(optional)
        Description
        Providers used to ingest the document
        • Name
          fileStorage
          Type
          enum<string>(optional)
          Description
          Type of the file storage used
          Available options: S3_COMPATIBLE
        • Name
          vectorStorage
          Type
          enum<string>(optional)
          Description
          Provider of the vector storage used
          Available options: PINECONE
        • Name
          embeddingModel
          Type
          enum<string>(optional)
          Description
          Provider of the embedding model used
          Available options: OPENAI, COHERE, JINA
        • Name
          webScraper
          Type
          enum<string>(optional)
          Description
          Provider of the web scraper used if the document is from a web source
          Available options: FIRECRAWL, JINA, SCRAPINGBEE
      • Name
        metadata*
        Type
        object
        Description
        Metadata associated with the document
      • Name
        namespace*
        Type
        object
        Description
        Details of the namespace containing the document
        • Name
          identifier*
          Type
          string
          Description
          Unique identifier of the namespace
      • Name
        organization*
        Type
        object
        Description
        Details of the organization
        • Name
          id*
          Type
          string
          Description
          ID of the organization containing the document
      • Name
        createdAt*
        Type
        object
        Description
        Timestamp when the document was created
        • Name
          isoString*
          Type
          string
          Description
          ISO 8601 formatted timestamp
      • Name
        updatedAt*
        Type
        object
        Description
        Timestamp when the document was last updated
        • Name
          isoString*
          Type
          string
          Description
          ISO 8601 formatted timestamp

Response

POST
/v1/documents
{
  "success": true,
  "message": "Documents retrieved successfully",
  "data": {
    "itemsReturned": 10,
    "hasNextPage": true,
    "nextCursor": "eyJjcmVhdGVkQXQiOi...",
    "statsBySource": [
      {
        "source": "WEBSITE",
        "totalCount": 5
      },
      {
        "source": "LOCAL_FILE",
        "totalCount": 2
      },
      {
        "source": "GOOGLE_DRIVE",
        "totalCount": 3
      }
    ],
    "statsByStatus": [
      {
        "status": "QUEUED",
        "totalCount": 1
      },
      {
        "status": "SUCCESS",
        "totalCount": 5
      },
      {
        "status": "FAILED",
        "totalCount": 2
      }
    ],
    "documents": [
      {
        "id": "doc_123",
        "name": "https://example.com",
        "externalId": "external_123",
        "documentType": "URL",
        "ingestionSource": "WEBSITE",
        "ingestionStatus": "SUCCESS",
        "ingestionError": null,
        "ingestJob": {
          "id": "job_123"
        },
        "ingestJobRun": {
          "id": "job_run_123"
        },
        "connection": {
          "id": "conn_123"
        },
        "documentProperties": {
          "mimeType": "text/html",
          "fileSize": 1347,
          "characterCount": 1335,
          "tokenCount": 340,
          "embeddingCount": 1
        },
        "embeddingConfig": {
          "provider": "OPENAI",
          "model": "text-embedding-3-small",
          "dimensions": 1536,
          "chunkSize": 1024,
          "chunkOverlap": 256
        },
        "providers": {
          "fileStorage": "S3_COMPATIBLE",
          "vectorStorage": "PINECONE",
          "embeddingModel": "OPENAI",
          "webScraper": "FIRECRAWL"
        },
        "metadata": {
          "category": "security",
          "status": "published"
        },
        "namespace": {
          "identifier": "ns_123"
        },
        "organization": {
          "id": "org_123"
        },
        "createdAt": {
          "isoString": "2024-01-01T00:00:00Z"
        },
        "updatedAt": {
          "isoString": "2024-01-01T00:00:00Z"
        },
        "rawFileUrl": "https://example.com/raw.html",
        "parsedTextFileUrl": "https://example.com/parsed.txt"
      },
      {
        "id": "doc_124",
        "name": "doc.pdf",
        "externalId": "external_124",
        "documentType": "GOOGLE_DRIVE_DOCUMENT",
        "ingestionSource": "GOOGLE_DRIVE",
        "ingestionStatus": "SUCCESS",
        "ingestionError": null,
        "ingestJob": {
          "id": "job_123"
        },
        "ingestJobRun": {
          "id": "job_run_123"
        },
        "connection": {
          "id": "conn_123"
        },
        "documentProperties": {
          "mimeType": "application/pdf",
          "fileSize": 1347,
          "characterCount": 1335,
          "tokenCount": 340,
          "embeddingCount": 1
        },
        "embeddingConfig": {
          "provider": "OPENAI",
          "model": "text-embedding-3-small",
          "dimensions": 1536,
          "chunkSize": 1024,
          "chunkOverlap": 256
        },
        "providers": {
          "fileStorage": "S3_COMPATIBLE",
          "vectorStorage": "PINECONE",
          "embeddingModel": "OPENAI",
          "webScraper": "FIRECRAWL"
        },
        "metadata": {
          "category": "security",
          "status": "published"
        },
        "namespace": {
          "identifier": "ns_123"
        },
        "organization": {
          "id": "org_123"
        },
        "createdAt": {
          "isoString": "2024-01-01T00:00:00Z"
        },
        "updatedAt": {
          "isoString": "2024-01-01T00:00:00Z"
        },
        "rawFileUrl": "https://example.com/raw.pdf",
        "parsedTextFileUrl": "https://example.com/parsed.txt"
      }
    ]
  }
}
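
A response like the one above can be condensed into a few lookups once it has been decoded. The sketch below assumes the field names shown in the example response (`statsBySource`, `statsByStatus`, `documents`, `ingestionStatus`, `ingestionError`). The function name `summarize` and the shape of its return value are illustrative only.

```python
from typing import Any, Dict


def summarize(response: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: condense a decoded fetch response."""
    data = response.get("data", {})
    # Turn the stats arrays into {name: totalCount} lookups.
    by_source = {row["source"]: row["totalCount"] for row in data.get("statsBySource", [])}
    by_status = {row["status"]: row["totalCount"] for row in data.get("statsByStatus", [])}
    # Collect the documents on this page whose ingestion failed.
    failed = [
        {"id": doc["id"], "error": doc.get("ingestionError")}
        for doc in data.get("documents", [])
        if doc.get("ingestionStatus") == "FAILED"
    ]
    return {
        "bySource": by_source,
        "byStatus": by_status,
        "failed": failed,
        "nextCursor": data.get("nextCursor"),
    }
```

Every stats field is read with a default because each one appears only when the matching includeConfig flag was set in the request.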

PATCH /v1/documents

Update Documents

Update metadata of documents based on filters.

Authorization

  • Name
    Authorization*
    Type
    string
    Description
    Bearer token authentication. Include your API key as Bearer your_api_key
  • Name
    Accept*
    Type
    string
    Description
    application/json

Request Body

  • Name
    namespaceId*
    Type
    string
    Description
    Unique identifier of the namespace containing the documents
  • Name
    filterConfig*
    Type
    object
    Description
    Configuration for filtering documents
    • Name
      documentIds
      Type
      array<string>(optional)
      Description
      List of document IDs to filter
    • Name
      documentExternalIds
      Type
      array<string>(optional)
      Description
      List of external document IDs to filter
    • Name
      documentConnectionIds
      Type
      array<string>(optional)
      Description
      List of connection IDs to filter
    • Name
      documentTypes
      Type
      array<enum<string>>(optional)
      Description
      List of document types to filter
      Available options: TEXT, URL, SITEMAP, WEBSITE
    • Name
      documentIngestionSources
      Type
      array<enum<string>>(optional)
      Description
      List of ingestion sources to filter
      Available options: TEXT, LOCAL_FILE, URLS_LIST, SITEMAP, WEBSITE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX
    • Name
      documentIngestionStatuses
      Type
      array<enum<string>>(optional)
      Description
      List of ingestion statuses to filter
      Available options: BACKLOG, QUEUED, QUEUED_FOR_RESYNC, PROCESSING, SUCCESS, FAILED, CANCELLED
    • Name
      metadata
      Type
      object(optional)
      Description
      Metadata filters to apply
  • Name
    includeConfig
    Type
    object(optional)
    Description
    Include options
    • Name
      documents
      Type
      boolean(optional)
      Description
      Option to include the documents or not in the response. Defaults to true
    • Name
      stats
      Type
      boolean(optional)
      Description
      Option to include the stats or not in the response. Defaults to false. This is a legacy field and will be deprecated. Use statsBySource instead.
    • Name
      statsBySource
      Type
      boolean(optional)
      Description
      Option to include the stats by source or not in the response. Defaults to false
    • Name
      statsByStatus
      Type
      boolean(optional)
      Description
      Option to include the stats by status or not in the response. Defaults to false
    • Name
      rawFileUrl
      Type
      boolean(optional)
      Description
      Option to include the raw file URL or not in the response. Defaults to false
    • Name
      parsedTextFileUrl
      Type
      boolean(optional)
      Description
      Option to include the parsed text file URL or not in the response. Defaults to false
  • Name
    pagination
    Type
    object(optional)
    Description
    Pagination options
    • Name
      pageSize
      Type
      number(optional)
      Description
      Number of documents per page (1-100, default: 20)
    • Name
      cursor
      Type
      string(optional)
      Description
      Opaque cursor for fetching the next page
  • Name
    data*
    Type
    object
    Description
    Data to update in the documents
    • Name
      metadata
      Type
      object(optional)
      Description
      Metadata to update in the documents. This is a legacy field and will be deprecated. Use $metadata instead.
    • Name
      $metadata
      Type
      object(optional)
      Description
      Advanced metadata to update in the documents
      • Name
        $set
        Type
        object(optional)
        Description
        Set or replace each listed metadata key with the given value. Applies to both string and array values.
      • Name
        $append
        Type
        object(optional)
        Description
        Append the given values to the existing array under each listed key. Applies only to array values.
      • Name
        $remove
        Type
        object(optional)
        Description
        Remove the given values from the existing array under each listed key. Applies only to array values.
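
To make the three operators concrete, here is a local Python sketch that mimics the behavior described above on a plain dict. It is an illustration of the documented semantics, not the server's implementation. Several details are assumptions this reference does not state: the operators are applied in the order $set, $append, $remove; $append creates the array if the key is missing; and $remove on a missing key does nothing.

```python
import copy
from typing import Any, Dict


def apply_metadata_ops(metadata: Dict[str, Any], ops: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return a new metadata dict with $set, $append and $remove applied locally."""
    result = copy.deepcopy(metadata)

    # $set: replace the value outright, whether it is a string or an array.
    for key, value in ops.get("$set", {}).items():
        result[key] = copy.deepcopy(value)

    # $append: extend an existing array (assumed to start empty if the key is missing).
    for key, values in ops.get("$append", {}).items():
        existing = result.get(key, [])
        if not isinstance(existing, list):
            raise TypeError(f"$append applies only to array values: {key!r}")
        result[key] = existing + list(values)

    # $remove: drop the listed values from an existing array.
    for key, values in ops.get("$remove", {}).items():
        if key not in result:
            continue  # assumption: removing from a missing key is a no-op
        if not isinstance(result[key], list):
            raise TypeError(f"$remove applies only to array values: {key!r}")
        result[key] = [v for v in result[key] if v not in values]

    return result
```

For example, starting from `{"status": "published", "apiVersion": ["v0", "v1"]}`, a `$set` of `status` to `"archived"` combined with a `$remove` of `"v0"` from `apiVersion` leaves `{"status": "archived", "apiVersion": ["v1"]}`.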

Request

PATCH
/v1/documents
curl -X PATCH https://api.sourcesync.ai/v1/documents \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "filterConfig": {
      "documentIds": ["doc_123"],
      "documentExternalIds": [],
      "documentConnectionIds": [],
      "documentTypes": ["URL"],
      "documentIngestionSources": ["WEBSITE"],
      "documentIngestionStatuses": ["SUCCESS"],
      "metadata": {
        "category": "security"
      }
    },
    "pagination": {
      "pageSize": 10,
      "cursor": "eyJjcmVhdGVkQXQiOi..."
    },
    "data": {
      "metadata": {
        "status": "archived",
        "archivedAt": "2024-01-15T00:00:00Z"
      },
      "$metadata": {
        "$set": {
          "status": "archived",
          "category": ["security", "networking"]
        },
        "$append": {
          "complexity": ["advanced"]
        },
        "$remove": {
          "apiVersion": ["v0"]
        }
      }
    }
  }'
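
The same request can be prepared from Python using only the standard library. This is a sketch rather than an official client: the URL and headers are taken from the curl example above, and `build_request` is a hypothetical helper name. It builds the request object without sending it, so the method, headers and body can be inspected first.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "https://api.sourcesync.ai/v1/documents"  # taken from the curl example


def build_request(method: str, api_key: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) a JSON request for the documents endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method=method,  # "POST", "PATCH" or "DELETE"
    )
```

To send it, pass the returned object to `urllib.request.urlopen` inside a `with` block and decode the result with `json.load`. A non-2xx status raises `urllib.error.HTTPError`.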


Response Body

  • Name
    success*
    Type
    boolean
    Description
    Indicates whether the request succeeded. Always true for successful responses.
  • Name
    message*
    Type
    string
    Description
    Human-readable message describing the result of the request
  • Name
    data*
    Type
    object
    Description
    Data returned from the API.
    • Name
      itemsUpdated*
      Type
      number
      Description
      Number of documents updated
    • Name
      documents*
      Type
      array<object>
      Description
      List of documents
      • Name
        id*
        Type
        string
        Description
        Unique identifier of the document
      • Name
        name
        Type
        string | null(optional)
        Description
        Name of the document
      • Name
        externalId*
        Type
        string
        Description
        External identifier of the document
      • Name
        documentType*
        Type
        enum<string>
        Description
        Type of the document
        Available options: TEXT, URL, FILE, NOTION_DOCUMENT, GOOGLE_DRIVE_DOCUMENT, DROPBOX_DOCUMENT, ONEDRIVE_DOCUMENT, BOX_DOCUMENT
      • Name
        ingestionSource*
        Type
        enum<string>
        Description
        Source from where the document was ingested
        Available options: TEXT, URLS_LIST, SITEMAP, WEBSITE, LOCAL_FILE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX
      • Name
        ingestionStatus*
        Type
        enum<string>
        Description
        Current ingestion status of the document
        Available options: BACKLOG, QUEUED, PROCESSING, SUCCESS, FAILED, CANCELLED
      • Name
        ingestionError
        Type
        string | null(optional)
        Description
        Error message if the document ingestion failed
      • Name
        ingestJob*
        Type
        object
        Description
        Details of the ingest job
        • Name
          id*
          Type
          string
          Description
          ID of the ingest job
      • Name
        ingestJobRun*
        Type
        object
        Description
        Details of the ingest job run
        • Name
          id*
          Type
          string
          Description
          ID of the ingest job run
      • Name
        connection
        Type
        object(optional)
        Description
        Details of the connection
        • Name
          id*
          Type
          string
          Description
          ID of the connection from which the document was ingested
      • Name
        documentProperties
        Type
        object(optional)
        Description
        Properties of the document
        • Name
          mimeType
          Type
          string(optional)
          Description
          MIME type of the document
        • Name
          fileSize
          Type
          number(optional)
          Description
          Size of the document file in bytes
        • Name
          characterCount
          Type
          number(optional)
          Description
          Number of characters in the document
        • Name
          tokenCount
          Type
          number(optional)
          Description
          Number of tokens in the document
        • Name
          embeddingCount
          Type
          number(optional)
          Description
          Number of embeddings in the document
      • Name
        embeddingConfig
        Type
        object(optional)
        Description
        Configuration of the embedding model
        • Name
          provider
          Type
          enum<string>(optional)
          Description
          Provider of the embedding model used
          Available options: OPENAI, COHERE, JINA
        • Name
          model
          Type
          enum<string>(optional)
          Description
          Embedding model used to create the embeddings
          Available options: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002, embed-english-v3.0, embed-multilingual-v3.0, embed-english-light-v3.0, embed-multilingual-light-v3.0, embed-english-v2.0, embed-english-light-v2.0, embed-multilingual-v2.0, jina-embeddings-v3
        • Name
          dimensions
          Type
          number(optional)
          Description
          Dimensions of the embedding model used
        • Name
          chunkSize
          Type
          number(optional)
          Description
          Number of tokens in each chunk
        • Name
          chunkOverlap
          Type
          number(optional)
          Description
          Number of tokens to overlap between chunks
      • Name
        providers
        Type
        object(optional)
        Description
        Providers used to ingest the document
        • Name
          fileStorage
          Type
          enum<string>(optional)
          Description
          Type of the file storage used
          Available options: S3_COMPATIBLE
        • Name
          vectorStorage
          Type
          enum<string>(optional)
          Description
          Provider of the vector storage used
          Available options: PINECONE
        • Name
          embeddingModel
          Type
          enum<string>(optional)
          Description
          Provider of the embedding model used
          Available options: OPENAI, COHERE, JINA
        • Name
          webScraper
          Type
          enum<string>(optional)
          Description
          Provider of the web scraper used if the document is from a web source
          Available options: FIRECRAWL, JINA, SCRAPINGBEE
      • Name
        metadata*
        Type
        object
        Description
        Metadata associated with the document
      • Name
        namespace*
        Type
        object
        Description
        Details of the namespace containing the document
        • Name
          identifier*
          Type
          string
          Description
          Unique identifier of the namespace
      • Name
        organization*
        Type
        object
        Description
        Details of the organization
        • Name
          id*
          Type
          string
          Description
          ID of the organization containing the document
      • Name
        createdAt*
        Type
        object
        Description
        Timestamp when the document was created
        • Name
          isoString*
          Type
          string
          Description
          ISO 8601 formatted timestamp
      • Name
        updatedAt*
        Type
        object
        Description
        Timestamp when the document was last updated
        • Name
          isoString*
          Type
          string
          Description
          ISO 8601 formatted timestamp

Response

PATCH
/v1/documents
{
  "success": true,
  "message": "Documents updated successfully",
  "data": {
    "itemsUpdated": 10,    
    "documents": [
      {
        "id": "doc_123",
        "name": "https://example.com",
        "externalId": "external_123",
        "documentType": "URL",
        "ingestionSource": "WEBSITE",
        "ingestionStatus": "SUCCESS",
        "ingestionError": null,
        "ingestJob": {
          "id": "job_123"
        },
        "ingestJobRun": {
          "id": "job_run_123"
        },
        "connection": {
          "id": "conn_123"
        },
        "documentProperties": {
          "mimeType": "text/html",
          "fileSize": 1347,
          "characterCount": 1335,
          "tokenCount": 340,
          "embeddingCount": 1
        },
        "embeddingConfig": {
          "provider": "OPENAI",
          "model": "text-embedding-3-small",
          "dimensions": 1536,
          "chunkSize": 1024,
          "chunkOverlap": 256
        },
        "providers": {
          "fileStorage": "S3_COMPATIBLE",
          "vectorStorage": "PINECONE",
          "embeddingModel": "OPENAI",
          "webScraper": "FIRECRAWL"
        },
        "metadata": {
          "status": "archived",
          "archivedAt": "2024-01-15T00:00:00Z",
          "category": ["security", "networking"],
          "complexity": ["advanced"],
          "apiVersion": ["v1"]
        },
        "namespace": {
          "identifier": "ns_123"
        },
        "organization": {
          "id": "org_123"
        },
        "createdAt": {
          "isoString": "2024-01-01T00:00:00Z"
        },
        "updatedAt": {
          "isoString": "2024-01-01T00:00:00Z"
        }
      }
    ]
  }
}

DELETE /v1/documents

Delete Documents

Delete documents based on filters.

Authorization

  • Name
    Authorization*
    Type
    string
    Description
    Bearer token authentication. Include your API key as Bearer your_api_key
  • Name
    Accept*
    Type
    string
    Description
    application/json

Request Body

  • Name
    namespaceId*
    Type
    string
    Description
    Unique identifier of the namespace containing the documents
  • Name
    filterConfig*
    Type
    object
    Description
    Configuration for filtering documents
    • Name
      documentIds
      Type
      array<string>(optional)
      Description
      List of document IDs to filter
    • Name
      documentExternalIds
      Type
      array<string>(optional)
      Description
      List of external document IDs to filter
    • Name
      documentConnectionIds
      Type
      array<string>(optional)
      Description
      List of connection IDs to filter
    • Name
      documentTypes
      Type
      array<enum<string>>(optional)
      Description
      List of document types to filter
      Available options: TEXT, URL, SITEMAP, WEBSITE
    • Name
      documentIngestionSources
      Type
      array<enum<string>>(optional)
      Description
      List of ingestion sources to filter
      Available options: TEXT, LOCAL_FILE, URLS_LIST, SITEMAP, WEBSITE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX
    • Name
      documentIngestionStatuses
      Type
      array<enum<string>>(optional)
      Description
      List of ingestion statuses to filter
      Available options: BACKLOG, QUEUED, QUEUED_FOR_RESYNC, PROCESSING, SUCCESS, FAILED, CANCELLED
    • Name
      metadata
      Type
      object(optional)
      Description
      Metadata filters to apply
  • Name
    includeConfig
    Type
    object(optional)
    Description
    Include options
    • Name
      documents
      Type
      boolean(optional)
      Description
      Option to include the documents or not in the response. Defaults to true
    • Name
      stats
      Type
      boolean(optional)
      Description
      Option to include the stats or not in the response. Defaults to false. This is a legacy field and will be deprecated. Use statsBySource instead.
    • Name
      statsBySource
      Type
      boolean(optional)
      Description
      Whether to include the stats by source in the response. Defaults to false
    • Name
      statsByStatus
      Type
      boolean(optional)
      Description
      Whether to include the stats by status in the response. Defaults to false
    • Name
      rawFileUrl
      Type
      boolean(optional)
      Description
      Whether to include the raw file URL in the response. Defaults to false
    • Name
      parsedTextFileUrl
      Type
      boolean(optional)
      Description
      Whether to include the parsed text file URL in the response. Defaults to false
  • Name
    pagination
    Type
    object(optional)
    Description
    Pagination options
    • Name
      pageSize
      Type
      number(optional)
      Description
      Number of documents per page (1-100, default: 20)
    • Name
      cursor
      Type
      string(optional)
      Description
      Opaque cursor for fetching the next page

Request

DELETE
/v1/documents
curl -X DELETE https://api.sourcesync.ai/v1/documents \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "filterConfig": {
      "documentIds": ["doc_123"],
      "documentExternalIds": [],
      "documentTypes": ["TEXT"],
      "documentIngestionSources": ["TEXT"],
      "documentIngestionStatuses": ["SUCCESS"],
      "metadata": {
        "status": "archived"
      }
    },
    "pagination": {
      "pageSize": 10,
      "cursor": "eyJjcmVhdGVkQXQiOi..."
    }
  }'
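
For readers calling the endpoint from Python, the same kind of request can be sketched with only the standard library. This is a minimal sketch, not an official client: `build_delete_request` is a hypothetical helper name, and the API key is read from the `RAGAAS_API_KEY` environment variable used in the curl example. The request is only constructed here; nothing is sent.

```python
import json
import os
import urllib.request

# Endpoint URL taken from the curl example above.
API_URL = "https://api.sourcesync.ai/v1/documents"


def build_delete_request(namespace_id, filter_config, api_key):
    """Build (but do not send) a DELETE /v1/documents request.

    Hypothetical helper for illustration; not part of any official SDK.
    """
    body = {"namespaceId": namespace_id, "filterConfig": filter_config}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        method="DELETE",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
    )


request = build_delete_request(
    "ns_123",
    {"documentIds": ["doc_123"], "documentIngestionStatuses": ["SUCCESS"]},
    os.environ.get("RAGAAS_API_KEY", "your_api_key"),
)
```

Passing `request` to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes it easy to inspect the method, headers, and body before deleting anything.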


Response Body

  • Name
    success*
    Type
    boolean
    Description
    Indicates whether the request succeeded. This is always true for success responses.
  • Name
    message*
    Type
    string
    Description
    Human-readable message describing the result of the request
  • Name
    data*
    Type
    object
    Description
    Data returned from the API.
    • Name
      itemsDeleted*
      Type
      number
      Description
      Number of documents deleted
    • Name
      documents*
      Type
      array<object>
      Description
      List of deleted documents
      • Name
        id*
        Type
        string
        Description
        Unique identifier of the document

Response

DELETE
/v1/documents
{
  "success": true,
  "message": "Documents deleted successfully",
  "data": {
    "itemsDeleted": 1,
    "documents": [
      {
        "id": "doc_123"
      }
    ]
  }
}
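
A response of this shape can be unpacked as in the following sketch. It relies only on the fields listed under Response Body (`success`, `message`, `data.itemsDeleted`, `data.documents[].id`); the function name `deleted_document_ids` is hypothetical, and raising `ValueError` when `success` is not true is an illustrative choice rather than documented behavior.

```python
import json


def deleted_document_ids(response_text):
    """Return (itemsDeleted, [document ids]) from a delete response body."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        # Assumption: treat anything other than success=true as a failure.
        raise ValueError(payload.get("message", "request failed"))
    data = payload["data"]
    ids = [doc["id"] for doc in data["documents"]]
    return data["itemsDeleted"], ids


example_body = json.dumps(
    {
        "success": True,
        "message": "Documents deleted successfully",
        "data": {"itemsDeleted": 1, "documents": [{"id": "doc_123"}]},
    }
)
count, ids = deleted_document_ids(example_body)
```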

POST/v1/documents/resync

Resync Documents

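Resync the documents that match the given filters. TEXT and LOCAL_FILE documents cannot be resynced, and neither can documents whose status is QUEUED, QUEUED_FOR_RESYNC, or PROCESSING. Ineligible documents are skipped and reported with the status NOT_ELIGIBLE_FOR_RESYNC in the response.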
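
The eligibility rule can be mirrored on the client to avoid submitting documents that will only be skipped. The sketch below is an assumption-laden illustration: `can_resync` is a hypothetical helper, and it assumes the TEXT and LOCAL_FILE restriction applies to a document's ingestion source value. The server remains the authority on what is actually eligible.

```python
# Values copied from the endpoint description above.
NON_RESYNCABLE_SOURCES = {"TEXT", "LOCAL_FILE"}
IN_FLIGHT_STATUSES = {"QUEUED", "QUEUED_FOR_RESYNC", "PROCESSING"}


def can_resync(ingestion_source, ingestion_status):
    """Client-side guess at whether a document is eligible for resync.

    Assumption: the TEXT / LOCAL_FILE rule is checked against the
    document's ingestion source.
    """
    return (
        ingestion_source not in NON_RESYNCABLE_SOURCES
        and ingestion_status not in IN_FLIGHT_STATUSES
    )
```

For example, `can_resync("WEBSITE", "SUCCESS")` is true under these assumptions, while `can_resync("LOCAL_FILE", "SUCCESS")` and `can_resync("WEBSITE", "PROCESSING")` are both false.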

Authorization

  • Name
    Authorization*
    Type
    string
    Description
    Bearer token authentication. Include your API key as Bearer your_api_key
  • Name
    Accept*
    Type
    string
    Description
    application/json

Request Body

  • Name
    namespaceId*
    Type
    string
    Description
    Unique identifier of the namespace containing the documents
  • Name
    filterConfig*
    Type
    object
    Description
    Configuration for filtering documents
    • Name
      documentIds
      Type
      array<string>(optional)
      Description
      List of document IDs to filter
    • Name
      documentExternalIds
      Type
      array<string>(optional)
      Description
      List of external document IDs to filter
    • Name
      documentConnectionIds
      Type
      array<string>(optional)
      Description
      List of connection IDs to filter
    • Name
      documentTypes
      Type
      array<enum<string>>(optional)
      Description
      List of document types to filter
      Available options: TEXT, URL, SITEMAP, WEBSITE
    • Name
      documentIngestionSources
      Type
      array<enum<string>>(optional)
      Description
      List of ingestion sources to filter
      Available options: TEXT, LOCAL_FILE, URLS_LIST, SITEMAP, WEBSITE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX
    • Name
      documentIngestionStatuses
      Type
      array<enum<string>>(optional)
      Description
      List of ingestion statuses to filter
      Available options: BACKLOG, QUEUED, QUEUED_FOR_RESYNC, PROCESSING, SUCCESS, FAILED, CANCELLED
    • Name
      metadata
      Type
      object(optional)
      Description
      Metadata filters to apply

Request

POST
/v1/documents/resync
curl -X POST https://api.sourcesync.ai/v1/documents/resync \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "filterConfig": {
      "documentIds": ["doc_123"],
      "documentExternalIds": [],
      "documentTypes": ["URL"],
      "metadata": {
        "status": "archived"
      }
    }
  }'


Response Body

  • Name
    success*
    Type
    boolean
    Description
    Indicates whether the request succeeded. This is always true for success responses.
  • Name
    message*
    Type
    string
    Description
    Human-readable message describing the result of the request
  • Name
    data*
    Type
    object
    Description
    Data returned from the API.
    • Name
      itemsQueued*
      Type
      number
      Description
      Number of documents queued for resync
    • Name
      itemsSkipped*
      Type
      number
      Description
      Number of documents skipped for resync
    • Name
      documents*
      Type
      array<object>
      Description
      List of documents
      • Name
        id*
        Type
        string
        Description
        Unique identifier of the document
      • Name
        status*
        Type
        enum<string>
        Description
        Resync status of the document
        Available options: QUEUED_FOR_RESYNC, NOT_ELIGIBLE_FOR_RESYNC
      • Name
        error
        Type
        string | null(optional)
        Description
        Error message if the document is not eligible for resync or resync failed

Response

POST
/v1/documents/resync
{
  "success": true,
  "message": "Added the eligible documents to the resync queue successfully",
  "data": {
    "itemsQueued": 3,
    "itemsSkipped": 2,
    "documents": [
      {
        "id": "doc_123",
        "status": "QUEUED_FOR_RESYNC",
        "error": null
      },
      {
        "id": "doc_124",
        "status": "QUEUED_FOR_RESYNC",
        "error": null
      },
      {
        "id": "doc_125",
        "status": "NOT_ELIGIBLE_FOR_RESYNC",
        "error": "Documents with LOCAL_FILE and TEXT as documentType are not eligible for resync"
      },
      {
        "id": "doc_126",
        "status": "QUEUED_FOR_RESYNC",
        "error": null
      },
      {
        "id": "doc_127",
        "status": "NOT_ELIGIBLE_FOR_RESYNC",
        "error": "Documents with status QUEUED, QUEUED_FOR_RESYNC and PROCESSING are not eligible for resync"
      }
    ]
  }
}
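
Because a resync response mixes queued and skipped documents, it is often useful to split them. The following is a minimal sketch that uses only the fields listed under Response Body; `summarize_resync` is a hypothetical name, and the sample payload is a shortened version of the response above.

```python
def summarize_resync(payload):
    """Split a resync response into queued ids and skipped ids with reasons."""
    documents = payload["data"]["documents"]
    queued = [d["id"] for d in documents if d["status"] == "QUEUED_FOR_RESYNC"]
    skipped = {
        d["id"]: d.get("error")
        for d in documents
        if d["status"] == "NOT_ELIGIBLE_FOR_RESYNC"
    }
    return queued, skipped


sample = {
    "success": True,
    "message": "Added the eligible documents to the resync queue successfully",
    "data": {
        "itemsQueued": 1,
        "itemsSkipped": 1,
        "documents": [
            {"id": "doc_123", "status": "QUEUED_FOR_RESYNC", "error": None},
            {
                "id": "doc_125",
                "status": "NOT_ELIGIBLE_FOR_RESYNC",
                "error": "Documents with LOCAL_FILE and TEXT as documentType "
                "are not eligible for resync",
            },
        ],
    },
}
queued, skipped = summarize_resync(sample)
```

Comparing `len(queued)` with `itemsQueued` and `len(skipped)` with `itemsSkipped` gives a cheap consistency check before acting on the result.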

Error Codes

  • Name
    NAMESPACE_NOT_FOUND
    Description

    The specified namespace does not exist

  • Name
    DOCUMENTS_NOT_FOUND
    Description

    No documents match the filter criteria

  • Name
    INVALID_FILTER_CONFIG
    Description

    Invalid filter configuration provided

  • Name
    UPDATE_DOCUMENTS_FAILED
    Description

    Internal error while updating documents

  • Name
    DELETE_DOCUMENTS_FAILED
    Description

    Internal error while deleting documents
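
Client code can keep these codes in a small lookup table. The sketch below is illustrative only: this reference does not show how the code is carried in an error response, so the helpers take the code string directly. The names `describe_error` and `is_internal_error` are hypothetical, and grouping the two `*_FAILED` codes as internal errors follows their descriptions above.

```python
# Codes and descriptions copied from the Error Codes list.
ERROR_DESCRIPTIONS = {
    "NAMESPACE_NOT_FOUND": "The specified namespace does not exist",
    "DOCUMENTS_NOT_FOUND": "No documents match the filter criteria",
    "INVALID_FILTER_CONFIG": "Invalid filter configuration provided",
    "UPDATE_DOCUMENTS_FAILED": "Internal error while updating documents",
    "DELETE_DOCUMENTS_FAILED": "Internal error while deleting documents",
}

# Codes whose descriptions say "Internal error".
INTERNAL_ERROR_CODES = {"UPDATE_DOCUMENTS_FAILED", "DELETE_DOCUMENTS_FAILED"}


def describe_error(code):
    """Return the documented description, or a fallback for unknown codes."""
    return ERROR_DESCRIPTIONS.get(code, f"Unknown error code: {code}")


def is_internal_error(code):
    """True for codes documented as internal errors rather than bad input."""
    return code in INTERNAL_ERROR_CODES
```

Separating internal errors from request errors lets a caller decide, for instance, to fix the filter on `INVALID_FILTER_CONFIG` but surface `DELETE_DOCUMENTS_FAILED` to an operator.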

Filter Configuration

  • Name
    Document IDs
    Description

    Filter by specific document IDs using documentIds

  • Name
    External IDs
    Description

    Filter by external IDs using documentExternalIds

  • Name
    Document Types
    Description

    Filter by document types using documentTypes

  • Name
    Metadata
    Description

    Filter by metadata key-value pairs using metadata
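
The four filters above can be combined into one `filterConfig` object. This is a minimal sketch under stated assumptions: `build_filter_config` is a hypothetical helper, omitting unused filters is a design choice made here rather than an API requirement, and the document type check uses the options listed in the Request Body sections.

```python
# Allowed values copied from the documentTypes field description.
DOCUMENT_TYPES = {"TEXT", "URL", "SITEMAP", "WEBSITE"}


def build_filter_config(
    document_ids=None, external_ids=None, document_types=None, metadata=None
):
    """Assemble a filterConfig dict, leaving out filters that were not given."""
    config = {}
    if document_ids:
        config["documentIds"] = list(document_ids)
    if external_ids:
        config["documentExternalIds"] = list(external_ids)
    if document_types:
        unknown = set(document_types) - DOCUMENT_TYPES
        if unknown:
            raise ValueError(f"Unknown document types: {sorted(unknown)}")
        config["documentTypes"] = list(document_types)
    if metadata:
        config["metadata"] = dict(metadata)
    return config


filter_config = build_filter_config(
    document_ids=["doc_123"],
    document_types=["URL"],
    metadata={"status": "archived"},
)
```

Validating enum values before sending catches typos locally instead of waiting for an `INVALID_FILTER_CONFIG` response.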