Documents API Reference
Learn about the document management endpoints and how to work with your ingested content.
Documents are the core content units in SourceSync. Each document can have metadata that helps in organizing and filtering content. For detailed examples and best practices on pagination, see our Document Management Guide.
Fetch Documents
Fetch documents with optional filters and pagination.
Authorization
- Name
Authorization
*- Type
- string
- Description
- Bearer token authentication. Include your API key as Bearer your_api_key
- Name
Accept
*- Type
- string
- Description
application/json
Request Body
- Name
namespaceId
*- Type
- string
- Description
- Unique identifier of the namespace containing the documents
- Name
filterConfig
*- Type
- object
- Description
- Configuration for filtering documents
- Name
documentIds
- Type
- array<string>(optional)
- Description
- List of document IDs to filter
- Name
documentExternalIds
- Type
- array<string>(optional)
- Description
- List of external document IDs to filter
- Name
documentConnectionIds
- Type
- array<string>(optional)
- Description
- List of connection IDs to filter
- Name
documentTypes
- Type
- array<enum<string>>(optional)
- Description
- List of document types to filter. Available options: TEXT, URL, SITEMAP, WEBSITE
- Name
documentIngestionSources
- Type
- array<enum<string>>(optional)
- Description
- List of ingestion sources to filter. Available options: TEXT, LOCAL_FILE, URLS_LIST, SITEMAP, WEBSITE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX
- Name
documentIngestionStatuses
- Type
- array<enum<string>>(optional)
- Description
- List of ingestion statuses to filter. Available options: BACKLOG, QUEUED, QUEUED_FOR_RESYNC, PROCESSING, SUCCESS, FAILED, CANCELLED
- Name
metadata
- Type
- object(optional)
- Description
- Metadata filters to apply
- Name
includeConfig
- Type
- object(optional)
- Description
- Include options
- Name
documents
- Type
- boolean(optional)
- Description
- Option to include the documents or not in the response. Defaults to true
- Name
stats
- Type
- boolean(optional)
- Description
- Option to include the stats or not in the response. Defaults to false. This is a legacy field and will be deprecated. Use statsBySource instead.
- Name
statsBySource
- Type
- boolean(optional)
- Description
- Option to include the stats by source or not in the response. Defaults to false
- Name
statsByStatus
- Type
- boolean(optional)
- Description
- Option to include the stats by status or not in the response. Defaults to false
- Name
rawFileUrl
- Type
- boolean(optional)
- Description
- Option to include the raw file URL or not in the response. Defaults to false
- Name
parsedTextFileUrl
- Type
- boolean(optional)
- Description
- Option to include the parsed text file URL or not in the response. Defaults to false
- Name
pagination
- Type
- object(optional)
- Description
- Pagination options
- Name
pageSize
- Type
- number(optional)
- Description
- Number of documents per page (1-100, default: 20)
- Name
cursor
- Type
- string(optional)
- Description
- Opaque cursor for fetching the next page
Request
curl -X POST https://api.sourcesync.ai/v1/documents \
-H "Authorization: Bearer $RAGAAS_API_KEY" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"namespaceId": "ns_123",
"filterConfig": {
"documentIds": ["doc_123", "doc_124"],
"documentExternalIds": ["external_123"],
"documentConnectionIds": ["conn_123"],
"documentTypes": ["URL", "GOOGLE_DRIVE_DOCUMENT"],
"documentIngestionSources": ["WEBSITE", "GOOGLE_DRIVE"],
"documentIngestionStatuses": ["SUCCESS", "FAILED"],
"metadata": {
"category": "security",
"status": "published"
}
},
"includeConfig": {
"documents": true,
"statsBySource": true,
"statsByStatus": true,
"rawFileUrl": true,
"parsedTextFileUrl": true
},
"pagination": {
"pageSize": 10,
"cursor": "eyJjcmVhdGVkQXQiOi..."
}
}'
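The pagination fields (pagination.cursor in the request, hasNextPage and nextCursor in the response) suggest a simple loop for reading every matching document. The following is a minimal Python sketch of that loop, not an official client. The `post` callable is a hypothetical transport that sends the JSON body to the endpoint and returns the parsed response, and `iter_documents` is an illustrative name.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: takes the JSON request body, returns the parsed JSON response.
PostFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_documents(
    post: PostFn,
    namespace_id: str,
    filter_config: Optional[Dict[str, Any]] = None,
    page_size: int = 20,
) -> Iterator[Dict[str, Any]]:
    """Yield every document matching the filter by following nextCursor."""
    if not 1 <= page_size <= 100:
        raise ValueError("pageSize must be between 1 and 100")
    cursor: Optional[str] = None
    while True:
        pagination: Dict[str, Any] = {"pageSize": page_size}
        if cursor is not None:
            pagination["cursor"] = cursor
        body = {
            "namespaceId": namespace_id,
            "filterConfig": filter_config or {},
            "pagination": pagination,
        }
        data = post(body)["data"]
        yield from data.get("documents", [])
        cursor = data.get("nextCursor")
        # Stop when the API reports no further pages or omits the cursor.
        if not data.get("hasNextPage") or cursor is None:
            break
```

Passing the transport in as a parameter keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned pages.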
Response Body
- Name
success
*- Type
- boolean
- Description
- Indicates whether the request is successful or not. This is always true for success responses.
- Name
message
*- Type
- string
- Description
- Human readable message mentioning the result of the request
- Name
data
*- Type
- object
- Description
- Data returned from the API.
- Name
itemsReturned
*- Type
- number
- Description
- Number of documents returned in current page
- Name
hasNextPage
*- Type
- boolean
- Description
- Whether more documents are available
- Name
nextCursor
- Type
- string(optional)
- Description
- Cursor for fetching next page, or undefined if no more pages
- Name
stats
*- Type
- array<object>
- Description
- Stats of the documents
- Name
source
*- Type
- enum<string>
- Description
- Ingestion source of the document. Available options: TEXT, LOCAL_FILE, URLS_LIST, SITEMAP, WEBSITE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX
- Name
totalCount
*- Type
- number
- Description
- Total number of documents ingested via this source
- Name
documents
*- Type
- array<object>
- Description
- List of documents
- Name
id
*- Type
- string
- Description
- Unique identifier of the document
- Name
name
- Type
- string | null(optional)
- Description
- Name of the document
- Name
externalId
*- Type
- string
- Description
- External identifier of the document
- Name
documentType
*- Type
- enum<string>
- Description
- Type of the document. Available options: TEXT, URL, FILE, NOTION_DOCUMENT, GOOGLE_DRIVE_DOCUMENT, DROPBOX_DOCUMENT, ONEDRIVE_DOCUMENT, BOX_DOCUMENT
- Name
ingestionSource
*- Type
- enum<string>
- Description
- Source from which the document was ingested. Available options: TEXT, URLS_LIST, SITEMAP, WEBSITE, LOCAL_FILE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX
- Name
ingestionStatus
*- Type
- enum<string>
- Description
- Ingestion status of the document. Available options: BACKLOG, QUEUED, PROCESSING, SUCCESS, FAILED, CANCELLED
- Name
ingestionError
- Type
- string | null(optional)
- Description
- Error message if the document ingestion failed
- Name
ingestJob
*- Type
- object
- Description
- Details of the ingest job
- Name
id
*- Type
- string
- Description
- ID of the ingest job
- Name
ingestJobRun
*- Type
- object
- Description
- Details of the ingest job run
- Name
id
*- Type
- string
- Description
- ID of the ingest job run
- Name
connection
- Type
- object(optional)
- Description
- Details of the connection
- Name
id
*- Type
- string
- Description
- ID of the connection from which the document was ingested
- Name
documentProperties
- Type
- object(optional)
- Description
- Properties of the document
- Name
mimeType
- Type
- string(optional)
- Description
- MIME type of the document
- Name
fileSize
- Type
- number(optional)
- Description
- Size of the document file in bytes
- Name
characterCount
- Type
- number(optional)
- Description
- Number of characters in the document
- Name
tokenCount
- Type
- number(optional)
- Description
- Number of tokens in the document
- Name
embeddingCount
- Type
- number(optional)
- Description
- Number of embeddings in the document
- Name
embeddingConfig
- Type
- object(optional)
- Description
- Configuration of the embedding model
- Name
provider
- Type
- enum<string>(optional)
- Description
- Provider of the embedding model used. Available options: OPENAI, COHERE, JINA
- Name
model
- Type
- enum<string>(optional)
- Description
- Embedding model used to create the embeddings. Available options: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002, embed-english-v3.0, embed-multilingual-v3.0, embed-english-light-v3.0, embed-multilingual-light-v3.0, embed-english-v2.0, embed-english-light-v2.0, embed-multilingual-v2.0, jina-embeddings-v3
- Name
dimensions
- Type
- number(optional)
- Description
- Dimensions of the embedding model used
- Name
chunkSize
- Type
- number(optional)
- Description
- Number of tokens in each chunk
- Name
chunkOverlap
- Type
- number(optional)
- Description
- Number of tokens to overlap between chunks
- Name
providers
- Type
- object(optional)
- Description
- Providers used to ingest the document
- Name
fileStorage
- Type
- enum<string>(optional)
- Description
- Type of the file storage used. Available options: S3_COMPATIBLE
- Name
vectorStorage
- Type
- enum<string>(optional)
- Description
- Provider of the vector storage used. Available options: PINECONE
- Name
embeddingModel
- Type
- enum<string>(optional)
- Description
- Provider of the embedding model used. Available options: OPENAI, COHERE, JINA
- Name
webScraper
- Type
- enum<string>(optional)
- Description
- Provider of the web scraper used if the document is from a web source. Available options: FIRECRAWL, JINA, SCRAPINGBEE
- Name
metadata
*- Type
- object
- Description
- Metadata associated with the document
- Name
namespace
*- Type
- object
- Description
- Details of the namespace containing the document
- Name
identifier
*- Type
- string
- Description
- Unique identifier of the namespace
- Name
organization
*- Type
- object
- Description
- Details of the organization
- Name
id
*- Type
- string
- Description
- ID of the organization containing the document
- Name
createdAt
*- Type
- object
- Description
- Timestamp when the document was created
- Name
isoString
*- Type
- string
- Description
- ISO 8601 formatted timestamp
- Name
updatedAt
*- Type
- object
- Description
- Timestamp when the document was last updated
- Name
isoString
*- Type
- string
- Description
- ISO 8601 formatted timestamp
Response
{
"success": true,
"message": "Documents retrieved successfully",
"data": {
"itemsReturned": 10,
"hasNextPage": true,
"nextCursor": "eyJjcmVhdGVkQXQiOi...",
"statsBySource": [
{
"source": "WEBSITE",
"totalCount": 5
},
{
"source": "LOCAL_FILE",
"totalCount": 2
},
{
"source": "GOOGLE_DRIVE",
"totalCount": 3
}
],
"statsByStatus": [
{
"status": "QUEUED",
"totalCount": 1
},
{
"status": "SUCCESS",
"totalCount": 5
},
{
"status": "FAILED",
"totalCount": 2
}
],
"documents": [
{
"id": "doc_123",
"name": "https://example.com",
"externalId": "external_123",
"documentType": "URL",
"ingestionSource": "WEBSITE",
"ingestionStatus": "SUCCESS",
"ingestionError": null,
"ingestJob": {
"id": "job_123"
},
"ingestJobRun": {
"id": "job_run_123"
},
"connection": {
"id": "conn_123"
},
"documentProperties": {
"mimeType": "text/html",
"fileSize": 1347,
"characterCount": 1335,
"tokenCount": 340,
"embeddingCount": 1
},
"embeddingConfig": {
"provider": "OPENAI",
"model": "text-embedding-3-small",
"dimensions": 1536,
"chunkSize": 1024,
"chunkOverlap": 256
},
"providers": {
"fileStorage": "S3_COMPATIBLE",
"vectorStorage": "PINECONE",
"embeddingModel": "OPENAI",
"webScraper": "FIRECRAWL"
},
"metadata": {
"category": "security",
"status": "published"
},
"namespace": {
"identifier": "ns_123"
},
"organization": {
"id": "org_123"
},
"createdAt": {
"isoString": "2024-01-01T00:00:00Z"
},
"updatedAt": {
"isoString": "2024-01-01T00:00:00Z"
},
"rawFileUrl": "https://example.com/raw.html",
"parsedTextFileUrl": "https://example.com/parsed.txt"
},
{
"id": "doc_124",
"name": "doc.pdf",
"externalId": "external_124",
"documentType": "GOOGLE_DRIVE_DOCUMENT",
"ingestionSource": "GOOGLE_DRIVE",
"ingestionStatus": "SUCCESS",
"ingestionError": null,
"ingestJob": {
"id": "job_123"
},
"ingestJobRun": {
"id": "job_run_123"
},
"connection": {
"id": "conn_123"
},
"documentProperties": {
"mimeType": "application/pdf",
"fileSize": 1347,
"characterCount": 1335,
"tokenCount": 340,
"embeddingCount": 1
},
"embeddingConfig": {
"provider": "OPENAI",
"model": "text-embedding-3-small",
"dimensions": 1536,
"chunkSize": 1024,
"chunkOverlap": 256
},
"providers": {
"fileStorage": "S3_COMPATIBLE",
"vectorStorage": "PINECONE",
"embeddingModel": "OPENAI",
"webScraper": "FIRECRAWL"
},
"metadata": {
"category": "security",
"status": "published"
},
"namespace": {
"identifier": "ns_123"
},
"organization": {
"id": "org_123"
},
"createdAt": {
"isoString": "2024-01-01T00:00:00Z"
},
"updatedAt": {
"isoString": "2024-01-01T00:00:00Z"
},
"rawFileUrl": "https://example.com/raw.pdf",
"parsedTextFileUrl": "https://example.com/parsed.txt"
}
]
}
}
Update Documents
Update metadata of documents based on filters.
Authorization
- Name
Authorization
*- Type
- string
- Description
- Bearer token authentication. Include your API key as Bearer your_api_key
- Name
Accept
*- Type
- string
- Description
application/json
Request Body
- Name
namespaceId
*- Type
- string
- Description
- Unique identifier of the namespace containing the documents
- Name
filterConfig
*- Type
- object
- Description
- Configuration for filtering documents
- Name
documentIds
- Type
- array<string>(optional)
- Description
- List of document IDs to filter
- Name
documentExternalIds
- Type
- array<string>(optional)
- Description
- List of external document IDs to filter
- Name
documentConnectionIds
- Type
- array<string>(optional)
- Description
- List of connection IDs to filter
- Name
documentTypes
- Type
- array<enum<string>>(optional)
- Description
- List of document types to filter. Available options: TEXT, URL, SITEMAP, WEBSITE
- Name
documentIngestionSources
- Type
- array<enum<string>>(optional)
- Description
- List of ingestion sources to filter. Available options: TEXT, LOCAL_FILE, URLS_LIST, SITEMAP, WEBSITE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX
- Name
documentIngestionStatuses
- Type
- array<enum<string>>(optional)
- Description
- List of ingestion statuses to filter. Available options: BACKLOG, QUEUED, QUEUED_FOR_RESYNC, PROCESSING, SUCCESS, FAILED, CANCELLED
- Name
metadata
- Type
- object(optional)
- Description
- Metadata filters to apply
- Name
includeConfig
- Type
- object(optional)
- Description
- Include options
- Name
documents
- Type
- boolean(optional)
- Description
- Option to include the documents or not in the response. Defaults to true
- Name
stats
- Type
- boolean(optional)
- Description
- Option to include the stats or not in the response. Defaults to false. This is a legacy field and will be deprecated. Use statsBySource instead.
- Name
statsBySource
- Type
- boolean(optional)
- Description
- Option to include the stats by source or not in the response. Defaults to false
- Name
statsByStatus
- Type
- boolean(optional)
- Description
- Option to include the stats by status or not in the response. Defaults to false
- Name
rawFileUrl
- Type
- boolean(optional)
- Description
- Option to include the raw file URL or not in the response. Defaults to false
- Name
parsedTextFileUrl
- Type
- boolean(optional)
- Description
- Option to include the parsed text file URL or not in the response. Defaults to false
- Name
pagination
- Type
- object(optional)
- Description
- Pagination options
- Name
pageSize
- Type
- number(optional)
- Description
- Number of documents per page (1-100, default: 20)
- Name
cursor
- Type
- string(optional)
- Description
- Opaque cursor for fetching the next page
- Name
data
*- Type
- object
- Description
- Data to update in the documents
- Name
metadata
- Type
- object(optional)
- Description
- Metadata to update in the documents. This is a legacy field and will be deprecated. Use $metadata instead.
- Name
$metadata
- Type
- object(optional)
- Description
- Advanced metadata to update in the documents
- Name
$set
- Type
- object(optional)
- Description
- Set/replace the metadata to the given value. Applicable to both string and array values. The values will be set/replaced.
- Name
$append
- Type
- object(optional)
- Description
- Append the metadata with the given value. Applicable only to array values. The values will be appended to the existing array.
- Name
$remove
- Type
- object(optional)
- Description
- Remove the metadata with the given value. Applicable only to array values. The values will be removed from the existing array.
Request
curl -X PATCH https://api.sourcesync.ai/v1/documents \
-H "Authorization: Bearer $RAGAAS_API_KEY" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"namespaceId": "ns_123",
"filterConfig": {
"documentIds": ["doc_123"],
"documentExternalIds": [],
"documentConnectionIds": [],
"documentTypes": ["URL"],
"documentIngestionSources": ["WEBSITE"],
"documentIngestionStatuses": ["SUCCESS"],
"metadata": {
"category": "security"
}
},
"pagination": {
"pageSize": 10,
"cursor": "eyJjcmVhdGVkQXQiOi..."
},
"data": {
"metadata": {
"status": "archived",
"archivedAt": "2024-01-15T00:00:00Z"
},
"$metadata": {
"$set": {
"status": "archived",
"category": ["security", "networking"]
},
"$append": {
"complexity": ["advanced"]
},
"$remove": {
"apiVersion": ["v0"]
}
}
}
}'
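To make the $set, $append and $remove semantics concrete, here is a small local model of them in Python. It is only a sketch of the behaviour described in the request body above, not the server's implementation: the function name `apply_metadata_update`, the order in which the operators are applied, and the handling of missing keys are all assumptions, and the legacy data.metadata field is not modelled.

```python
import copy
from typing import Any, Dict


def apply_metadata_update(
    current: Dict[str, Any], ops: Dict[str, Dict[str, Any]]
) -> Dict[str, Any]:
    """Return a copy of `current` with $set, $append and $remove applied in that order."""
    result = copy.deepcopy(current)
    # $set: replace the value outright (string or array values).
    for key, value in ops.get("$set", {}).items():
        result[key] = copy.deepcopy(value)
    # $append: extend an array; a missing key starts from an empty array (assumption).
    for key, values in ops.get("$append", {}).items():
        existing = result.get(key, [])
        if not isinstance(existing, list):
            raise TypeError(f"$append needs an array at {key!r}")
        result[key] = existing + list(values)
    # $remove: drop the listed values from an array; a missing key is ignored (assumption).
    for key, values in ops.get("$remove", {}).items():
        if key not in result:
            continue
        existing = result[key]
        if not isinstance(existing, list):
            raise TypeError(f"$remove needs an array at {key!r}")
        result[key] = [v for v in existing if v not in values]
    return result


# Hypothetical prior metadata for doc_123, followed by the $metadata object from the request.
before = {"category": "security", "apiVersion": ["v0", "v1"]}
ops = {
    "$set": {"status": "archived", "category": ["security", "networking"]},
    "$append": {"complexity": ["advanced"]},
    "$remove": {"apiVersion": ["v0"]},
}
after = apply_metadata_update(before, ops)
```

Under these assumptions, `after` holds status, category, complexity and apiVersion values that line up with the corresponding keys in the response example for this endpoint.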
Response Body
- Name
success
*- Type
- boolean
- Description
- Indicates whether the request is successful or not. This is always true for success responses.
- Name
message
*- Type
- string
- Description
- Human readable message mentioning the result of the request
- Name
data
*- Type
- object
- Description
- Data returned from the API.
- Name
itemsUpdated
*- Type
- number
- Description
- Number of documents updated
- Name
documents
*- Type
- array<object>
- Description
- List of documents
- Name
id
*- Type
- string
- Description
- Unique identifier of the document
- Name
name
- Type
- string | null(optional)
- Description
- Name of the document
- Name
externalId
*- Type
- string
- Description
- External identifier of the document
- Name
documentType
*- Type
- enum<string>
- Description
- Type of the document. Available options: TEXT, URL, FILE, NOTION_DOCUMENT, GOOGLE_DRIVE_DOCUMENT, DROPBOX_DOCUMENT, ONEDRIVE_DOCUMENT, BOX_DOCUMENT
- Name
ingestionSource
*- Type
- enum<string>
- Description
- Source from which the document was ingested. Available options: TEXT, URLS_LIST, SITEMAP, WEBSITE, LOCAL_FILE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX
- Name
ingestionStatus
*- Type
- enum<string>
- Description
- Ingestion status of the document. Available options: BACKLOG, QUEUED, PROCESSING, SUCCESS, FAILED, CANCELLED
- Name
ingestionError
- Type
- string | null(optional)
- Description
- Error message if the document ingestion failed
- Name
ingestJob
*- Type
- object
- Description
- Details of the ingest job
- Name
id
*- Type
- string
- Description
- ID of the ingest job
- Name
ingestJobRun
*- Type
- object
- Description
- Details of the ingest job run
- Name
id
*- Type
- string
- Description
- ID of the ingest job run
- Name
connection
- Type
- object(optional)
- Description
- Details of the connection
- Name
id
*- Type
- string
- Description
- ID of the connection from which the document was ingested
- Name
documentProperties
- Type
- object(optional)
- Description
- Properties of the document
- Name
mimeType
- Type
- string(optional)
- Description
- MIME type of the document
- Name
fileSize
- Type
- number(optional)
- Description
- Size of the document file in bytes
- Name
characterCount
- Type
- number(optional)
- Description
- Number of characters in the document
- Name
tokenCount
- Type
- number(optional)
- Description
- Number of tokens in the document
- Name
embeddingCount
- Type
- number(optional)
- Description
- Number of embeddings in the document
- Name
embeddingConfig
- Type
- object(optional)
- Description
- Configuration of the embedding model
- Name
provider
- Type
- enum<string>(optional)
- Description
- Provider of the embedding model used. Available options: OPENAI, COHERE, JINA
- Name
model
- Type
- enum<string>(optional)
- Description
- Embedding model used to create the embeddings. Available options: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002, embed-english-v3.0, embed-multilingual-v3.0, embed-english-light-v3.0, embed-multilingual-light-v3.0, embed-english-v2.0, embed-english-light-v2.0, embed-multilingual-v2.0, jina-embeddings-v3
- Name
dimensions
- Type
- number(optional)
- Description
- Dimensions of the embedding model used
- Name
chunkSize
- Type
- number(optional)
- Description
- Number of tokens in each chunk
- Name
chunkOverlap
- Type
- number(optional)
- Description
- Number of tokens to overlap between chunks
- Name
providers
- Type
- object(optional)
- Description
- Providers used to ingest the document
- Name
fileStorage
- Type
- enum<string>(optional)
- Description
- Type of the file storage used. Available options: S3_COMPATIBLE
- Name
vectorStorage
- Type
- enum<string>(optional)
- Description
- Provider of the vector storage used. Available options: PINECONE
- Name
embeddingModel
- Type
- enum<string>(optional)
- Description
- Provider of the embedding model used. Available options: OPENAI, COHERE, JINA
- Name
webScraper
- Type
- enum<string>(optional)
- Description
- Provider of the web scraper used if the document is from a web source. Available options: FIRECRAWL, JINA, SCRAPINGBEE
- Name
metadata
*- Type
- object
- Description
- Metadata associated with the document
- Name
namespace
*- Type
- object
- Description
- Details of the namespace containing the document
- Name
identifier
*- Type
- string
- Description
- Unique identifier of the namespace
- Name
organization
*- Type
- object
- Description
- Details of the organization
- Name
id
*- Type
- string
- Description
- ID of the organization containing the document
- Name
createdAt
*- Type
- object
- Description
- Timestamp when the document was created
- Name
isoString
*- Type
- string
- Description
- ISO 8601 formatted timestamp
- Name
updatedAt
*- Type
- object
- Description
- Timestamp when the document was last updated
- Name
isoString
*- Type
- string
- Description
- ISO 8601 formatted timestamp
Response
{
"success": true,
"message": "Documents updated successfully",
"data": {
"itemsUpdated": 1,
"documents": [
{
"id": "doc_123",
"name": "https://example.com",
"externalId": "external_123",
"documentType": "URL",
"ingestionSource": "WEBSITE",
"ingestionStatus": "SUCCESS",
"ingestionError": null,
"ingestJob": {
"id": "job_123"
},
"ingestJobRun": {
"id": "job_run_123"
},
"connection": {
"id": "conn_123"
},
"documentProperties": {
"mimeType": "text/html",
"fileSize": 1347,
"characterCount": 1335,
"tokenCount": 340,
"embeddingCount": 1
},
"embeddingConfig": {
"provider": "OPENAI",
"model": "text-embedding-3-small",
"dimensions": 1536,
"chunkSize": 1024,
"chunkOverlap": 256
},
"providers": {
"fileStorage": "S3_COMPATIBLE",
"vectorStorage": "PINECONE",
"embeddingModel": "OPENAI",
"webScraper": "FIRECRAWL"
},
"metadata": {
"status": "archived",
"archivedAt": "2024-01-15T00:00:00Z",
"category": ["security", "networking"],
"complexity": ["advanced"],
"apiVersion": ["v1"]
},
"namespace": {
"identifier": "ns_123"
},
"organization": {
"id": "org_123"
},
"createdAt": {
"isoString": "2024-01-01T00:00:00Z"
},
"updatedAt": {
"isoString": "2024-01-01T00:00:00Z"
}
}
]
}
}
Delete Documents
Delete documents based on filters.
filterConfig: {}
Authorization
- Name
Authorization
*- Type
- string
- Description
- Bearer token authentication. Include your API key as Bearer your_api_key
- Name
Accept
*- Type
- string
- Description
application/json
Request Body
- Name
namespaceId
*- Type
- string
- Description
- Unique identifier of the namespace containing the documents
- Name
filterConfig
*- Type
- object
- Description
- Configuration for filtering documents
- Name
documentIds
- Type
- array<string>(optional)
- Description
- List of document IDs to filter
- Name
documentExternalIds
- Type
- array<string>(optional)
- Description
- List of external document IDs to filter
- Name
documentConnectionIds
- Type
- array<string>(optional)
- Description
- List of connection IDs to filter
- Name
documentTypes
- Type
- array<enum<string>>(optional)
- Description
- List of document types to filter. Available options: TEXT, URL, SITEMAP, WEBSITE
- Name
documentIngestionSources
- Type
- array<enum<string>>(optional)
- Description
- List of ingestion sources to filter. Available options: TEXT, LOCAL_FILE, URLS_LIST, SITEMAP, WEBSITE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX
- Name
documentIngestionStatuses
- Type
- array<enum<string>>(optional)
- Description
- List of ingestion statuses to filter. Available options: BACKLOG, QUEUED, QUEUED_FOR_RESYNC, PROCESSING, SUCCESS, FAILED, CANCELLED
- Name
metadata
- Type
- object(optional)
- Description
- Metadata filters to apply
- Name
includeConfig
- Type
- object(optional)
- Description
- Include options
- Name
documents
- Type
- boolean(optional)
- Description
- Option to include the documents or not in the response. Defaults to true
- Name
stats
- Type
- boolean(optional)
- Description
- Option to include the stats or not in the response. Defaults to false. This is a legacy field and will be deprecated. Use statsBySource instead.
- Name
statsBySource
- Type
- boolean(optional)
- Description
- Option to include the stats by source or not in the response. Defaults to false
- Name
statsByStatus
- Type
- boolean(optional)
- Description
- Option to include the stats by status or not in the response. Defaults to false
- Name
rawFileUrl
- Type
- boolean(optional)
- Description
- Option to include the raw file URL or not in the response. Defaults to false
- Name
parsedTextFileUrl
- Type
- boolean(optional)
- Description
- Option to include the parsed text file URL or not in the response. Defaults to false
- Name
pagination
- Type
- object(optional)
- Description
- Pagination options
- Name
pageSize
- Type
- number(optional)
- Description
- Number of documents per page (1-100, default: 20)
- Name
cursor
- Type
- string(optional)
- Description
- Opaque cursor for fetching the next page
Request
curl -X DELETE https://api.sourcesync.ai/v1/documents \
-H "Authorization: Bearer $RAGAAS_API_KEY" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"namespaceId": "ns_123",
"filterConfig": {
"documentIds": ["doc_123"],
"documentExternalIds": [],
"documentTypes": ["TEXT"],
"documentIngestionSources": ["TEXT"],
"documentIngestionStatuses": ["SUCCESS"],
"metadata": {
"status": "archived"
}
},
"pagination": {
"pageSize": 10,
"cursor": "eyJjcmVhdGVkQXQiOi..."
}
}'
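Because a delete acts on whatever the filter matches, it can be worth validating the filter on the client before sending the request. This reference does not spell out what an empty filterConfig does for a delete. The Python sketch below assumes it could match far more than intended and simply refuses to build such a request. The helper name `build_delete_body` is illustrative, and treating empty arrays and objects as "no filter" is also an assumption.

```python
from typing import Any, Dict


def build_delete_body(namespace_id: str, filter_config: Dict[str, Any]) -> Dict[str, Any]:
    """Build a delete request body, rejecting filters that narrow nothing."""
    # Drop empty lists, empty objects and None so they cannot hide an empty filter.
    effective = {
        key: value
        for key, value in filter_config.items()
        if value is not None and value != [] and value != {}
    }
    if not effective:
        raise ValueError("refusing to build a delete request with an empty filterConfig")
    return {"namespaceId": namespace_id, "filterConfig": effective}
```

The returned dictionary can be serialized to JSON and sent as the body of the DELETE request shown above.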
Response Body
- Name
success
*- Type
- boolean
- Description
- Indicates whether the request is successful or not. This is always true for success responses.
- Name
message
*- Type
- string
- Description
- Human readable message mentioning the result of the request
- Name
data
*- Type
- object
- Description
- Data returned from the API.
- Name
itemsDeleted
*- Type
- number
- Description
- Number of documents deleted
- Name
documents
*- Type
- array<object>
- Description
- List of documents
- Name
id
*- Type
- string
- Description
- Unique identifier of the document
Response
{
"success": true,
"message": "Documents deleted successfully",
"data": {
"itemsDeleted": 1,
"documents": [
{
"id": "doc_123"
}
]
}
}
Resync Documents
Queue documents for resync based on filters. TEXT and LOCAL_FILE documents, and documents with status QUEUED, QUEUED_FOR_RESYNC, or PROCESSING, are not eligible for resync.
Authorization
- Name
Authorization
*- Type
- string
- Description
- Bearer token authentication. Include your API key as Bearer your_api_key
- Name
Accept
*- Type
- string
- Description
application/json
Request Body
- Name
namespaceId
*- Type
- string
- Description
- Unique identifier of the namespace containing the documents
- Name
filterConfig
*- Type
- object
- Description
- Configuration for filtering documents
- Name
documentIds
- Type
- array<string>(optional)
- Description
- List of document IDs to filter
- Name
documentExternalIds
- Type
- array<string>(optional)
- Description
- List of external document IDs to filter
- Name
documentConnectionIds
- Type
- array<string>(optional)
- Description
- List of connection IDs to filter
- Name
documentTypes
- Type
- array<enum<string>>(optional)
- Description
- List of document types to filter. Available options: TEXT, URL, SITEMAP, WEBSITE
- Name
documentIngestionSources
- Type
- array<enum<string>>(optional)
- Description
- List of ingestion sources to filter. Available options: TEXT, LOCAL_FILE, URLS_LIST, SITEMAP, WEBSITE, NOTION, GOOGLE_DRIVE, DROPBOX, ONEDRIVE, BOX
- Name
documentIngestionStatuses
- Type
- array<enum<string>>(optional)
- Description
- List of ingestion statuses to filter. Available options: BACKLOG, QUEUED, QUEUED_FOR_RESYNC, PROCESSING, SUCCESS, FAILED, CANCELLED
- Name
metadata
- Type
- object(optional)
- Description
- Metadata filters to apply
Request
curl -X POST https://api.sourcesync.ai/v1/documents/resync \
-H "Authorization: Bearer $RAGAAS_API_KEY" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"namespaceId": "ns_123",
"filterConfig": {
"documentIds": ["doc_123"],
"documentExternalIds": [],
"documentTypes": ["TEXT"],
"metadata": {
"status": "archived"
}
}
}'
Response Body
- Name
success
*- Type
- boolean
- Description
- Indicates whether the request is successful or not. This is always true for success responses.
- Name
message
*- Type
- string
- Description
- Human readable message mentioning the result of the request
- Name
data
*- Type
- object
- Description
- Data returned from the API.
- Name
itemsQueued
*- Type
- number
- Description
- Number of documents queued for resync
- Name
itemsSkipped
*- Type
- number
- Description
- Number of documents skipped for resync
- Name
documents
*- Type
- array<object>
- Description
- List of documents
- Name
id
*- Type
- string
- Description
- Unique identifier of the document
- Name
status
*- Type
- enum<string>
- Description
- Status of the document. Available options: QUEUED_FOR_RESYNC, NOT_ELIGIBLE_FOR_RESYNC
- Name
error
- Type
- string | null(optional)
- Description
- Error message if the document is not eligible for resync or resync failed
Response
{
"success": true,
"message": "Added the eligible documents to the resync queue successfully",
"data": {
"itemsQueued": 3,
"itemsSkipped": 2,
"documents": [
{
"id": "doc_123",
"status": "QUEUED_FOR_RESYNC",
"error": null
},
{
"id": "doc_124",
"status": "QUEUED_FOR_RESYNC",
"error": null
},
{
"id": "doc_125",
"status": "NOT_ELIGIBLE_FOR_RESYNC",
"error": "Documents with LOCAL_FILE and TEXT as documentType are not eligible for resync"
},
{
"id": "doc_126",
"status": "QUEUED_FOR_RESYNC",
"error": null
},
{
"id": "doc_127",
"status": "NOT_ELIGIBLE_FOR_RESYNC",
"error": "Documents with status QUEUED, QUEUED_FOR_RESYNC and PROCESSING are not eligible for resync"
}
]
}
}
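A resync response mixes queued and skipped documents in one list, so callers will usually want to separate them. The sketch below, in Python, splits the documents array by status and keeps the error message for each skipped document. The function name `split_resync_result` is illustrative, and the fallback text for a missing error is an assumption.

```python
from typing import Any, Dict, List, Tuple


def split_resync_result(response: Dict[str, Any]) -> Tuple[List[str], Dict[str, str]]:
    """Return (queued document IDs, {skipped document ID: error message})."""
    queued: List[str] = []
    skipped: Dict[str, str] = {}
    for doc in response["data"]["documents"]:
        if doc["status"] == "QUEUED_FOR_RESYNC":
            queued.append(doc["id"])
        else:
            # NOT_ELIGIBLE_FOR_RESYNC entries carry the reason in `error`.
            skipped[doc["id"]] = doc.get("error") or "unknown reason"
    return queued, skipped
```

Comparing the lengths of the two results with itemsQueued and itemsSkipped gives a quick consistency check on the response.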
Error Codes
- Name
NAMESPACE_NOT_FOUND
- Description
The specified namespace does not exist
- Name
DOCUMENTS_NOT_FOUND
- Description
No documents match the filter criteria
- Name
INVALID_FILTER_CONFIG
- Description
Invalid filter configuration provided
- Name
UPDATE_DOCUMENTS_FAILED
- Description
Internal error while updating documents
- Name
DELETE_DOCUMENTS_FAILED
- Description
Internal error while deleting documents
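The error codes above fall into two groups: problems with the request itself and internal failures. A client might branch on that distinction as sketched below in Python. Treating the internal errors as worth retrying is an assumption, since this reference does not state a retry policy, and `classify_error` is an illustrative name. The shape of the error response is not documented here, so the sketch takes only the code string.

```python
# Codes described as internal errors; retrying them may help (assumption).
RETRYABLE_CODES = frozenset({"UPDATE_DOCUMENTS_FAILED", "DELETE_DOCUMENTS_FAILED"})
# Codes that point at the request; resending it unchanged is unlikely to help.
REQUEST_CODES = frozenset(
    {"NAMESPACE_NOT_FOUND", "DOCUMENTS_NOT_FOUND", "INVALID_FILTER_CONFIG"}
)


def classify_error(code: str) -> str:
    """Return 'retry', 'fix_request', or 'unknown' for an error code."""
    if code in RETRYABLE_CODES:
        return "retry"
    if code in REQUEST_CODES:
        return "fix_request"
    return "unknown"
```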
Filter Configuration
- Name
Document IDs
- Description
Filter by specific document IDs using documentIds
- Name
External IDs
- Description
Filter by external IDs using documentExternalIds
- Name
Document Types
- Description
Filter by document types using documentTypes
- Name
Metadata
- Description
Filter by metadata key-value pairs using metadata
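The filter fields listed above can be assembled with a small helper that includes only the filters actually supplied. This is a Python sketch under stated assumptions: `build_filter_config` and its parameter names are illustrative, omitting a field is assumed to be equivalent to not filtering on it, and only ingestion statuses are validated, against the options listed in the request body sections of this page.

```python
from typing import Any, Dict, Iterable, Mapping, Optional

# Ingestion statuses as listed for documentIngestionStatuses in this reference.
INGESTION_STATUSES = frozenset(
    {"BACKLOG", "QUEUED", "QUEUED_FOR_RESYNC", "PROCESSING", "SUCCESS", "FAILED", "CANCELLED"}
)


def build_filter_config(
    document_ids: Optional[Iterable[str]] = None,
    external_ids: Optional[Iterable[str]] = None,
    document_types: Optional[Iterable[str]] = None,
    statuses: Optional[Iterable[str]] = None,
    metadata: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a filterConfig object containing only the filters that were supplied."""
    config: Dict[str, Any] = {}
    ids = list(document_ids or [])
    ext = list(external_ids or [])
    types = list(document_types or [])
    status_list = list(statuses or [])
    unknown = sorted(set(status_list) - INGESTION_STATUSES)
    if unknown:
        raise ValueError(f"unknown ingestion statuses: {unknown}")
    if ids:
        config["documentIds"] = ids
    if ext:
        config["documentExternalIds"] = ext
    if types:
        config["documentTypes"] = types
    if status_list:
        config["documentIngestionStatuses"] = status_list
    if metadata:
        config["metadata"] = dict(metadata)
    return config
```

The result can be used as the filterConfig value in any of the request bodies on this page.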