Changelog
Stay up to date with all the latest changes and improvements to SourceSync.
January 23, 2025
Enhanced document management with improved metadata handling and resync capabilities.
🚀 Features
- Added array metadata support:
- Store arrays of values in document metadata
- Flexible array operations:
- Replace arrays using `$set`
- Add values using `$append`
- Remove values using `$remove`
- Smart array search:
- OR condition within array values
- AND condition between different metadata keys
- Automatic deduplication of values
- Enhanced document model:
- Added `name` field for better document identification
- Added `mimeType` tracking in document properties
- Added document source statistics in fetch response
- Added document resync support:
- New endpoint to trigger document resyncs
- Automatic reprocessing of existing documents
- Support for all document types except text and local files
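The array operations listed above can be pictured with a small local simulation. This is only a sketch of the described semantics, not SourceSync's implementation: the function name `apply_metadata_ops`, the order in which the operators are applied, and the choice to keep first occurrences when deduplicating are all assumptions made for illustration.

```python
from typing import Any


def _as_list(value: Any) -> list:
    """Treat a missing or scalar value as a list so array operations can apply."""
    if value is None:
        return []
    return list(value) if isinstance(value, list) else [value]


def _dedupe(values: list) -> list:
    """Drop repeated values, keeping the first occurrence of each (assumed behaviour)."""
    seen: list = []
    for value in values:
        if value not in seen:
            seen.append(value)
    return seen


def apply_metadata_ops(metadata: dict, ops: dict) -> dict:
    """Return a new metadata dict with $set, $append and $remove applied.

    Hypothetical helper: the operator names come from the changelog,
    the application order ($set, then $append, then $remove) is assumed.
    """
    result = {key: _as_list(value) if isinstance(value, list) else value
              for key, value in metadata.items()}
    for key, values in ops.get("$set", {}).items():
        result[key] = _dedupe(_as_list(values))          # replace the whole array
    for key, values in ops.get("$append", {}).items():
        result[key] = _dedupe(_as_list(result.get(key)) + _as_list(values))
    for key, values in ops.get("$remove", {}).items():
        to_remove = _as_list(values)
        result[key] = [v for v in _as_list(result.get(key)) if v not in to_remove]
    return result


current = {"tags": ["draft"], "categories": ["tech"], "labels": ["outdated", "q1"]}
ops = {
    "$set": {"tags": ["important", "urgent"]},
    "$append": {"categories": ["new-category", "tech"]},
    "$remove": {"labels": ["outdated"]},
}
updated = apply_metadata_ops(current, ops)
print(updated)
```

In this sketch the original `current` dict is left untouched, and appending a value that is already present (`"tech"`) does not create a duplicate.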
🔄 API Changes
- Added `/v1/documents/resync` endpoint (docs)
- Enhanced `/v1/documents` response with source statistics:
    {
      "stats": [
        { "source": "TEXT", "totalCount": 50 },
        { "source": "LOCAL_FILE", "totalCount": 25 },
        { "source": "URLS_LIST", "totalCount": 100 },
        { "source": "SITEMAP", "totalCount": 75 }
      ]
    }
- Updated metadata operations in document updates:
    {
      "$metadata": {
        "$set": { "tags": ["important", "urgent"] },
        "$append": { "categories": ["new-category"] },
        "$remove": { "labels": ["outdated"] }
      }
    }
- Enhanced search endpoint to support array metadata filtering:
    {
      "query": "search query",
      "filter": {
        "metadata": {
          "tags": ["important", "urgent"], // OR condition: matches if document has either tag
          "categories": ["tech"] // AND condition: must match with above filter
        }
      }
    }
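The filtering rule described above (OR within the values of one key, AND between different keys) can be sketched as a local predicate. This is an illustration of the stated semantics only; `matches_filter` is a hypothetical helper and says nothing about how the search endpoint evaluates filters internally.

```python
def matches_filter(doc_metadata: dict, metadata_filter: dict) -> bool:
    """Return True if the document metadata satisfies the array filter.

    Within one key, any matching value is enough (OR).
    Across keys, every key must match (AND).
    """
    for key, wanted in metadata_filter.items():
        wanted_values = wanted if isinstance(wanted, list) else [wanted]
        actual = doc_metadata.get(key, [])
        actual_values = actual if isinstance(actual, list) else [actual]
        if not any(value in actual_values for value in wanted_values):
            return False  # this key has no overlap, so the AND fails
    return True


metadata_filter = {"tags": ["important", "urgent"], "categories": ["tech"]}

# Has one of the two tags and the required category.
print(matches_filter({"tags": ["urgent"], "categories": ["tech", "ai"]}, metadata_filter))
# Has a matching tag but the wrong category.
print(matches_filter({"tags": ["urgent"], "categories": ["finance"]}, metadata_filter))
```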
📝 Documentation
- Updated document management guide with resync examples (docs)
- Added array metadata handling guide (docs)
- Enhanced search documentation with array metadata examples (docs)
January 22, 2025
Added native multitenancy support for enhanced data isolation.
🚀 Features
- Added native multitenancy support:
- Virtual tenant separation within namespaces
- Automatic tenant-based data isolation
- Support for `X-Tenant-ID` header across all endpoints
- Tenant-scoped document management and search
🔄 API Changes
- Added `X-Tenant-ID` header support for all namespace-related endpoints:
- Document ingestion endpoints
- Search endpoints
- Document management endpoints
- Connection management endpoints
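As a rough sketch of how a client might attach the tenant header, the helper below builds a header dict for a request. Only the `X-Tenant-ID` header name comes from this changelog; the `build_headers` function, the bearer-token `Authorization` scheme, and the example values are assumptions for illustration.

```python
from typing import Optional


def build_headers(api_key: str, tenant_id: Optional[str] = None) -> dict:
    """Build request headers, adding X-Tenant-ID only when a tenant is given.

    The Authorization format is an assumption; check the API reference
    for the scheme SourceSync actually expects.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    if tenant_id:
        headers["X-Tenant-ID"] = tenant_id  # header name from the changelog
    return headers


print(build_headers("example-key", tenant_id="tenant-42"))
```

Keeping header construction in one place makes it harder to forget the tenant header on one of the endpoint families listed above.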
📝 Documentation
- Updated API reference to include tenant header usage (docs)
- Added multitenancy guide with best practices (docs)
January 21, 2025
Enhanced connection management and ingestion capabilities.
🚀 Features
- Added connection revocation support:
- New endpoint to revoke connections (`/connections/:connectionId/revoke`)
- Preserves existing ingested documents
- Automatic token refresh during ingestion
- Improved ingestion response:
- Added document IDs in ingestion responses for immediate tracking
- Enhanced error handling and token refresh logic
🔄 API Changes
- Added `/v1/connections/:connectionId/revoke` endpoint (docs)
- Enhanced ingestion response format:
- Added `documentIds` in response for text, file, and URL ingestion
- Updated background job status tracking for sitemap and website ingestion
January 20, 2025
Major improvements to document management and ingestion capabilities.
🚀 Features
- Added cursor-based pagination for document retrieval:
- Configurable page size (default: 20, max: 100)
- Consistent ordering by creation date
- Efficient navigation with cursor support
- Enhanced document management:
- Automatic bulk operations for updates and deletes
- Synchronized storage and vector database operations
- Improved error handling for S3-compatible storage
- Enhanced sitemap ingestion with path filtering and a limit on the number of links to ingest
- Added connection tracking for documents
🔄 API Changes
- Updated `/v1/documents` endpoint:
- Added `pagination` parameter with `pageSize` and `cursor`
- Enhanced response with `returnedCount`, `hasNextPage`, and `nextCursor`
- Added `maxLinks`, `includePaths`, and `excludePaths` for sitemap ingestion
- Added `connectionId` to document responses
- Added `clientRedirectUrl` to connection endpoints
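The cursor-based pagination described in this release could be consumed with a loop along these lines. It is a sketch under assumptions: the field names `pagination`, `pageSize`, `cursor`, `hasNextPage`, `nextCursor`, and `returnedCount` come from the changelog, while the `documents` response key, the `iter_documents` helper, the minimum page size of 1, and the in-memory `fake_fetch` stand-in for the real HTTP call are hypothetical.

```python
from typing import Callable, Iterator, Optional


def iter_documents(fetch_page: Callable[[dict], dict], page_size: int = 20) -> Iterator[dict]:
    """Yield documents page by page until the API reports no next page.

    `fetch_page` receives the request body and returns the parsed response.
    """
    if not 1 <= page_size <= 100:  # default 20 and max 100 per the changelog
        raise ValueError("page_size must be between 1 and 100")
    cursor: Optional[str] = None
    while True:
        pagination = {"pageSize": page_size}
        if cursor is not None:
            pagination["cursor"] = cursor
        page = fetch_page({"pagination": pagination})
        yield from page.get("documents", [])  # "documents" key is an assumption
        cursor = page.get("nextCursor")
        if not page.get("hasNextPage") or cursor is None:
            break


def fake_fetch(request: dict) -> dict:
    """In-memory stand-in for the documents endpoint, for demonstration only."""
    docs = [{"id": f"doc-{i}"} for i in range(5)]
    size = request["pagination"]["pageSize"]
    start = int(request["pagination"].get("cursor", 0))
    chunk = docs[start:start + size]
    end = start + len(chunk)
    has_next = end < len(docs)
    return {
        "documents": chunk,
        "returnedCount": len(chunk),
        "hasNextPage": has_next,
        "nextCursor": str(end) if has_next else None,
    }


ids = [doc["id"] for doc in iter_documents(fake_fetch, page_size=2)]
print(ids)
```

Passing the fetch function in as a parameter keeps the loop testable without network access; a real client would replace `fake_fetch` with a function that performs the HTTP request.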
📝 Documentation
- Updated document management guide with pagination examples (docs)
- Enhanced API reference for document operations (docs)
January 19, 2025
Improved web content processing for better search results.
🚀 Features
- Enhanced HTML content processing:
- Automatic removal of non-text elements
- Cleaner markdown output for LLM consumption
- Improved content relevance for search
🔄 API Changes
- Enhanced web ingestion endpoints to remove:
- Script tags
- Style elements
- Head section
- Meta tags
- iframes
- Other non-content elements
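To make the effect of this cleanup concrete, here is a minimal sketch that drops the element types listed above using Python's standard `html.parser`. It is not the pipeline SourceSync runs: it emits plain text rather than markdown, and the `TextExtractor` class and `extract_text` function are illustrative names.

```python
from html.parser import HTMLParser

# Container elements whose entire contents are discarded.
# <meta> is a void element with no text of its own, so simply not
# emitting anything for it is enough to "remove" it.
SKIPPED_TAGS = {"script", "style", "head", "iframe"}


class TextExtractor(HTMLParser):
    """Collect visible text while skipping non-content elements."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return "\n".join(self._chunks)


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    '<html><head><title>Ignored</title><meta charset="utf-8">'
    "<style>p{color:red}</style></head>"
    "<body><h1>Hello</h1><script>var x = 1;</script><p>World</p>"
    '<iframe src="https://example.com">frame</iframe></body></html>'
)
print(extract_text(sample))
```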
January 18, 2025
API optimization and security improvements.
🚀 Features
- Enhanced GET request handling:
- Automatic ignoring of request body
- Improved security and performance
- Better adherence to HTTP standards
January 17, 2025
Performance improvements and simplified namespace handling.
🚀 Features
- Improved search performance:
- Faster `/search` endpoint response
- Enhanced `/search/hybrid` endpoint speed
- Simplified namespace management:
- Using user-provided namespace identifiers
- Eliminated need to store SourceSync-generated IDs
- Made ingestion parameters optional:
- Optional metadata
- Optional chunk configuration
- Optional OCR configuration with sensible defaults
🔄 API Changes
- Updated namespace handling in all endpoints
- Made `metadata`, `chunkConfig`, `ocrConfig` optional in `/ingest/file`
- Set `BASIC_PARSER` as default OCR strategy
January 16, 2025
Added OCR support for enhanced document processing.
🚀 Features
- Added OCR support for document processing:
- Support for scanned text documents
- Image text extraction
- Configurable OCR strategy
- Integration with existing document processing
🔄 API Changes
- Added `ocrConfig` parameter to file ingestion:
- Optional `strategy` field
- Support for `STANDARD_OCR` and `BASIC_PARSER`
- Default to `BASIC_PARSER` when not specified
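The defaulting rule for `ocrConfig` can be sketched client-side as follows. The strategy names and the `BASIC_PARSER` default come from the changelog; the `resolve_ocr_config` helper and its validation error are hypothetical and only illustrate the rule.

```python
from typing import Optional

VALID_STRATEGIES = {"STANDARD_OCR", "BASIC_PARSER"}
DEFAULT_STRATEGY = "BASIC_PARSER"


def resolve_ocr_config(ocr_config: Optional[dict] = None) -> dict:
    """Return an ocrConfig with the strategy filled in.

    A missing config or a missing `strategy` field falls back to the default.
    """
    strategy = (ocr_config or {}).get("strategy", DEFAULT_STRATEGY)
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"Unsupported OCR strategy: {strategy!r}")
    return {"strategy": strategy}


print(resolve_ocr_config())                              # falls back to the default
print(resolve_ocr_config({"strategy": "STANDARD_OCR"}))  # explicit choice is kept
```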
📝 Documentation
- Updated data ingestion guide with OCR configuration (docs)
- Added OCR processing examples and best practices (docs)
January 13, 2025
Bug fixes for text file processing and improvements to document management and statistics.
🐛 Bug Fixes
- Fixed plain text and markdown file processing:
- Correctly preserves MIME types during file uploads
- Fixed content type detection for `.txt` and `.md` files
- Improved error handling for text-based files
- Enhanced document metadata updates:
- Update endpoint now syncs metadata changes to vector storage
- Delete endpoint removes files from file storage and vectors from vector storage
- Ensures data consistency across all storage layers
🚀 Features
- Added document statistics tracking:
- File size tracking
- Character and token count metrics
- Embedding count tracking
- Enhanced document configuration storage
- Added API Logs Dashboard (Alpha):
- Real-time monitoring of API requests
- Filter by status, method, and endpoint
- Detailed request/response inspection
- Performance metrics and error tracking
- Coming soon to all customers
🔄 API Changes
- Enhanced document model with new properties:
- Added `documentProperties` for tracking statistics:
- `fileSize`: Size of the original document
- `characterCount`: Total character count
- `tokenCount`: Number of tokens processed
- `embeddingCount`: Number of embeddings generated
- Added `embeddingConfig` for configuration tracking:
- `provider`: Embedding provider used (OPENAI, COHERE, JINA)
- `model`: Specific model used
- `dimensions`: Embedding dimensions
- `chunkSize`: Document chunk size
- `chunkOverlap`: Chunk overlap settings
- Added `providers` configuration tracking:
- `fileStorage`: Storage provider (S3_COMPATIBLE)
- `vectorStorage`: Vector store provider (PINECONE)
- `embeddingModel`: Embedding model provider
- `webScraper`: Web scraping provider
📝 Documentation
- Updated file upload guide with improved MIME type handling (docs)
- Enhanced documents API reference to reflect metadata syncing (docs)
January 12, 2025
Added Box integration to connect and search your Box enterprise content.
🚀 Features
- Added Box connector to connect your Box with SourceSync (docs):
- New endpoint to add a new Box connection
- Secure OAuth2 flow for Box access and file selection
- Support for PDF, CSV, DOCX, TXT, MD, and PPTX files
- Integration with enterprise content management
- Updated namespace:
- Added Box configuration to namespace
🔄 API Changes
- Updated `/v1/connections` endpoint for creating and managing Box connections (docs)
- Added `/v1/ingest/box` endpoint for ingesting selected Box files (docs)
- Updated namespace configuration to support Box settings (docs)
📝 Documentation
- New guide on how to use the Box connector (docs)
- Updated the namespace reference to include Box configuration (docs)
- Added new pages to sitemap.xml for improved SEO
January 11, 2025
Added Jina Reader API and ScrapingBee as new web scraping providers for enhanced content ingestion capabilities.
🚀 Features
- Added Jina Reader API as a free web scraping provider (docs)
- Added ScrapingBee as a web scraping provider with JavaScript rendering support (docs)
🔄 API Changes
- Added Jina and ScrapingBee web scraper configurations in namespace settings (docs)
📝 Documentation
- Updated web scraping guide with Jina and ScrapingBee configurations (docs)
January 10, 2025
Enhanced pricing transparency and added API logging capabilities.
🚀 Features
- Added API request logging:
- Basic request metadata (timestamp, endpoint, method)
- Response status and timing
- Error tracking
- Preparing for upcoming API dashboard feature
🔄 API Changes
- Added request logging to all API endpoints:
- No changes to request/response format
- No performance impact
- Tiered log retention (7-90 days based on plan)
- No sensitive data or payload contents stored
📝 Documentation
- Enhanced pricing page clarity (docs):
- Added detailed processing limits with overage costs
- Clarified retrieval call limits and pricing
- Improved rate limit explanations
- Added log retention periods by plan
- Enhanced enterprise plan feature list
- Added security documentation (docs):
- Data privacy commitments
- Infrastructure details
- Compliance roadmap
January 9, 2025
Added OneDrive integration and Jina AI embedding models for enhanced content search capabilities.
🚀 Features
- Added OneDrive connector to connect your OneDrive with SourceSync (docs):
- New endpoint to add a new OneDrive connection
- Secure OAuth2 flow for OneDrive access and file selection
- Support for PDF, CSV, DOCX, TXT, MD, and PPTX files
- Integration with both personal and business accounts
- Required Microsoft Graph permissions: Files.Read.All, offline_access, openid, User.Read
- Updated namespace:
- Added OneDrive configuration to namespace
- Added Jina AI as a new embedding model provider (docs):
- High-performance `jina-embeddings-v3` model with 1024 dimensions
🔄 API Changes
- Updated `/v1/connections` endpoint for creating and managing OneDrive connections (docs)
- Added `/v1/ingest/onedrive` endpoint for ingesting selected OneDrive files (docs)
- Updated namespace configuration to support:
📝 Documentation
- New guide on how to use the OneDrive connector (docs)
- Added Jina embedding models to supported models list with configuration examples (docs)
- Updated the namespace reference to include new configurations (docs)
January 8, 2025
Added Dropbox integration to connect and search your Dropbox content.
🚀 Features
- Added Dropbox connector to connect your Dropbox with SourceSync (docs):
- New endpoint to add a new Dropbox connection
- Secure OAuth2 flow for Dropbox access and file selection
- Support for PDF, CSV, DOCX, TXT, MD, and PPTX files
- Updated namespace:
- Added Dropbox configuration to namespace
🔄 API Changes
- Updated `/v1/connections` endpoint for creating and managing Dropbox connections (docs)
- Added `/v1/ingest/dropbox` endpoint for ingesting selected Dropbox files (docs)
- Updated namespace configuration to support Dropbox settings (docs)
📝 Documentation
- New guide on how to use the Dropbox connector (docs)
- Updated the namespace reference to include Dropbox configuration (docs)
- Added new pages to sitemap.xml for improved SEO
January 7, 2025
Added Google Drive integration to connect and search your Drive content.
🚀 Features
- Added Google Drive connector to connect your Drive with SourceSync (docs):
- New endpoint to add a new Drive connection
- Secure OAuth2 flow for Drive access and file selection
- Support for Google Docs, Sheets, and native files
- Updated namespace:
- Added Google Drive configuration to namespace
🔄 API Changes
- Updated `/v1/connections` endpoint for creating and managing Drive connections (docs)
- Added `/v1/ingest/google-drive` endpoint for ingesting selected Drive files (docs)
- Updated namespace configuration to support Drive settings (docs)
📝 Documentation
- New guide on how to use the Google Drive connector (docs)
- Updated the namespace reference to include Drive configuration (docs)
- Added new pages to sitemap.xml for improved SEO
January 6, 2025
Major improvements to document connectivity with the addition of Notion connector.
🚀 Features
- Added Notion connector to connect your Notion workspace with SourceSync (docs):
- New endpoint to add a new Notion connection
- Secure OAuth2 flow for Notion workspace access and content ingestion
- Updated namespace:
- Added Notion configuration to namespace
🔄 API Changes
- Added `/v1/connections` endpoint for creating and fetching connections (docs)
- Added `/v1/connections/:connectionId` endpoint for getting and managing a particular connection (docs)
- Added `/v1/ingest/notion` endpoint for ingesting Notion content from all the pages you select during the OAuth flow (docs)
- Updated namespace configuration to support Notion settings (docs)
📝 Documentation
- New guide on how to use the Notion connector (docs)
- New endpoint reference for connectors (docs)
- Updated the namespace reference to include Notion configuration (docs)
- Added new pages to sitemap.xml for improved SEO and discoverability
January 5, 2025
Major improvements to document ingestion capabilities, introducing direct file uploads and API standardization.
🚀 Features
- Added direct file upload support with multiple formats (docs):
- Documents: `.pdf`, `.docx`, `.pptx`, `.xlsx`
- Text-based formats: `.csv`, `.json`, `.xml`, `.html`
- Upload files directly through the API without needing public URLs
- Enhanced document model with ingestion tracking (docs):
- Added `ingestJob` field
- Added `ingestJobRun` field
🔄 API Changes
- Added new `/ingest/file` endpoint for direct file uploads (docs)
- Added web scraper configuration support in namespaces
- Standardized document filter parameters to `camelCase` (docs):
- `document_ids` → `documentIds`
- `document_external_ids` → `documentExternalIds`
- `document_types` → `documentTypes`
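Clients that still build filters with the old snake_case keys could migrate them with a small mapping like the one below. The three renames are taken from the list above; the `migrate_filter` helper is a hypothetical convenience, and keys not in the mapping are passed through unchanged.

```python
# Old snake_case filter keys and their camelCase replacements (from the changelog).
RENAMED_FILTER_KEYS = {
    "document_ids": "documentIds",
    "document_external_ids": "documentExternalIds",
    "document_types": "documentTypes",
}


def migrate_filter(old_filter: dict) -> dict:
    """Return a copy of the filter with renamed keys; other keys are untouched."""
    return {RENAMED_FILTER_KEYS.get(key, key): value for key, value in old_filter.items()}


legacy = {"document_ids": ["a1", "b2"], "document_types": ["TEXT"], "metadata": {"team": "docs"}}
print(migrate_filter(legacy))
```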
📝 Documentation
- Added `sitemap.xml` for improved SEO and discoverability
- Updated RAG flow diagrams in "What is RAG?" guide:
- Added light mode versions
- Improved contrast and readability
- Enhanced visual consistency across themes
January 4, 2025 🚀
We're excited to announce the official launch of SourceSync - a privacy-focused, self-serve platform for building AI-powered search and Q&A applications!
🚀 Core Features
- Privacy-First Architecture:
- Document Processing:
- Search & Retrieval:
- Developer Experience:
📈 Launch Plans
- Pilot ($99/month):
- 5,000 monthly ingestion pages
- 25,000 monthly retrieval calls
- ~50 requests/minute
- Email support
- Pro ($299/month):
- 25,000 monthly ingestion pages
- 100,000 monthly retrieval calls
- ~200 requests/minute
- Standard support
- Team ($999/month):
- 100,000 monthly ingestion pages
- 500,000 monthly retrieval calls
- ~500 requests/minute
- Priority support
- Enterprise (Custom pricing):
- Custom usage limits
- Custom rate limits
- Dedicated SLA
- White-glove support
🗺️ Roadmap
- Enhanced Processing:
- Direct file uploads (coming tomorrow!)
- OCR for tables & images in PDFs
- Advanced metadata filtering
- Additional embedding models (Voyage, Mistral)
- Additional vector databases (Qdrant, Weaviate)
- Advanced Features:
- Advanced RAG pipeline with reranking
- Multi-step retrieval
- Real-time content updates
- Custom embeddings & fine-tuning
- Integrations:
- Google Drive
- Notion
- SharePoint
- Dropbox
- And more connectors
- Developer Tools:
- Webhooks for real-time notifications
- Advanced analytics
- Usage pattern monitoring
- Search quality metrics