All the features you need
A complete platform for managing your company's data lifecycle
Ingestion
Intelligent Ingestion
Import documents from any source. We handle extraction, cleaning, and preparation.
- Multi-format support: PDF, DOCX, TXT, MD, HTML
- Native connectors: Notion, Google Drive, SharePoint
- Built-in OCR for scanned documents
- Automatic language detection
- Bulk import with queuing
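As a rough illustration of bulk import with queuing, the sketch below accepts only the formats listed above and places them on an in-memory queue. The function name `enqueue_documents` and the use of file extensions to detect format are assumptions made for this example; the platform's actual ingestion pipeline is not described here.

```python
from collections import deque
from pathlib import Path

# Extensions taken from the supported-format list above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".html"}


def enqueue_documents(paths):
    """Split paths into a FIFO queue of supported files and a list of rejects.

    Hypothetical helper: format detection by extension is an assumption.
    """
    queue = deque()
    rejected = []
    for raw in paths:
        path = Path(raw)
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            queue.append(path)
        else:
            rejected.append(path)
    return queue, rejected
```

A worker would then pop items from the left of the queue and process them one at a time, so a large import does not have to be handled in a single request.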
Raw Document: contract_2024.pdf
Chunk 1 (~500 tokens) • Chunk 2 (~500 tokens) • Chunk 3 (~500 tokens)
Vectorized: 1536 dimensions • Ready for search
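The split of a document into chunks of roughly 500 tokens can be sketched as a sliding window. The overlap between neighbouring chunks is an assumption here (one common way to preserve context across chunk boundaries), and the function name `chunk_tokens` is hypothetical; the tokenizer is left to the caller.

```python
def chunk_tokens(tokens, size=500, overlap=50):
    """Split a token list into windows of `size` tokens.

    Consecutive windows share `overlap` tokens (an assumed strategy,
    not a documented one). The final chunk may be shorter than `size`.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the window already reached the end of the document
    return chunks
```

Each chunk would then be passed to an embedding model to produce the vector that is stored for search.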
Processing
Advanced Processing
Intelligent chunking, cleaning, and metadata extraction for optimal search.
- Smart chunking with context preservation
- Automatic cleaning and normalization
- Metadata extraction (author, date, tags)
- Content deduplication
- Queue with automatic retry
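To make normalization and deduplication concrete, here is a minimal sketch that hashes a normalized form of each chunk and keeps only the first occurrence. The normalization rules (Unicode NFKC, whitespace collapsing, lowercasing) and the helper names are assumptions chosen for illustration; a production system may instead use near-duplicate detection.

```python
import hashlib
import unicodedata


def normalize(text):
    """Assumed normalization: NFKC, collapsed whitespace, lowercase."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split()).lower()


def deduplicate(chunks):
    """Keep the first chunk for each distinct normalized content hash."""
    seen = set()
    unique = []
    for chunk in chunks:
        digest = hashlib.sha256(normalize(chunk).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique
```

Exact hashing only catches chunks that are identical after normalization; it will not merge paraphrased or lightly edited passages.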
Search API
Powerful Search
Semantic search with filters, reranking, and a complete RESTful API.
- Vector similarity with pgvector
- Metadata filters (date, author, source)
- Hybrid search (vector + full-text)
- Result reranking
- RESTful API with key authentication
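Hybrid search has to merge two ranked lists, one from vector similarity and one from full-text matching. How this platform combines them is not stated; the sketch below uses reciprocal rank fusion (RRF), a widely used technique, purely as an assumed example. The function name and the constant `k=60` are illustrative choices.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Returns IDs sorted by descending fused score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
```

For example, with a vector ranking of `["a", "b", "c"]` and a full-text ranking of `["b", "d", "a"]`, documents found by both searches (`b` and `a`) rise above those found by only one.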
API Request
POST /api/v1/search
// Request
{
  "query": "What is our refund policy?",
  "limit": 5,
  "threshold": 0.7
}
// Response
{
  "results": [
    {
      "content": "Our refund policy allows...",
      "score": 0.92,
      "document": "policies.pdf"
    }
  ]
}
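A client for this endpoint might look like the following sketch, which uses only the Python standard library. The base URL and the `Authorization: Bearer` header are assumptions (the page says only that the API uses key authentication), and the helper names are hypothetical. The request and response fields mirror the example above.

```python
import json
import urllib.request


def build_search_request(base_url, api_key, query, limit=5, threshold=0.7):
    """Build (but do not send) a POST request for /api/v1/search."""
    payload = {"query": query, "limit": limit, "threshold": threshold}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed header scheme; check the API docs for the real one.
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


def top_documents(response_body):
    """Return (document, score) pairs from a response shaped like the example."""
    data = json.loads(response_body)
    return [(r["document"], r["score"]) for r in data.get("results", [])]
```

To send the request, pass the result of `build_search_request` to `urllib.request.urlopen` and feed the response body to `top_documents`.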