Ingest PDFs, DOCX, CSVs, and S3 buckets. Define map() and reduce() to distribute work across AI agents in parallel, then query everything through an aggregator that embeds results for fast, accurate answers.
We let you process and reason over massive unstructured datasets using LLMs, automatically distributing tasks across agents that map, summarize, and aggregate results.
A world where questions over large datasets are answered accurately with LLMs, and big data is processed semantically and efficiently, so anyone can analyze it with a single prompt.
PDFs, DOCX, CSVs, URLs, and S3 buckets. Use automatic chunking or bring your own map().
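As a rough illustration of what a custom map() might do, here is a minimal plain-Python sketch of page-based chunking, mirroring the assumed semantics of the built-in chunk_by_pages(pages=5). The document and page representation is a hypothetical simplification, not the library's actual types.

```python
# Minimal sketch of a custom map(): group a document's pages into
# fixed-size chunks, each of which becomes one unit of agent work.
def chunk_by_pages(pages, size=5):
    """Group a list of page texts into chunks of at most `size` pages."""
    return [pages[i:i + size] for i in range(0, len(pages), size)]

doc_pages = [f"page {n}" for n in range(1, 13)]  # a 12-page document
chunks = chunk_by_pages(doc_pages, size=5)
print(len(chunks))  # 3 chunks: pages 1-5, 6-10, 11-12
```

Any callable with this shape (document in, list of chunks out) could serve as the map() step.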
Batch your data and assign processing logic to specialized agents to map, summarize, and transform content. Bring your own functions or use built-ins, and attach custom tools via MCP.
We schedule and scale agent work across batches. Get email or ping notifications when long-running jobs complete.
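Conceptually, scheduling agent work across batches resembles a fan-out/fan-in pattern. The sketch below uses plain Python with stand-in functions (summarize_batch, merge are illustrative names, not part of the product API) to show the shape: batches are processed in parallel, then partial results are combined.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for per-batch agent work: summarize one batch.
def summarize_batch(batch):
    return {"items": len(batch), "total": sum(batch)}

# Combine the partial summaries into one result (the reduce/fan-in step).
def merge(partials):
    return {
        "items": sum(p["items"] for p in partials),
        "total": sum(p["total"] for p in partials),
    }

batches = [[1, 2, 3], [4, 5], [6]]
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(summarize_batch, batches))  # fan-out in parallel
print(merge(partials))  # fan-in: {'items': 6, 'total': 21}
```

In the real system the workers are LLM agents rather than local threads, but the fan-out/fan-in flow is the same.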
Agent outputs are embedded with metadata. The aggregator retrieves top‑K by cosine similarity and composes precise answers with source grounding. Attach MCP tools for reranking, retrieval, or policy filters; scale with multiple aggregators for complex tasks.
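To make the retrieval step concrete, here is a toy sketch of top-K selection by cosine similarity over embedded outputs. The 3-dimensional vectors are hand-written stand-ins for real embeddings; everything here is illustrative, not the aggregator's actual implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Embedded agent outputs, keyed by a document id (toy 3-d vectors).
corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.0]

# Rank documents by similarity to the query and keep the top K.
top_k = sorted(corpus, key=lambda d: cosine(query, corpus[d]), reverse=True)[:2]
print(top_k)  # ['doc_a', 'doc_b'], most similar first
```

The retrieved top-K context is what the aggregator then feeds to the LLM to compose a grounded answer.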
Extend both reduce and aggregate phases by plugging in custom tools over the Model Context Protocol. Compose domain parsers, vector stores, rerankers, or policy filters into your pipeline.
OCR, table extraction, domain‑specific parsers, SQL/NoSQL connectors, and enrichment services for structured outputs.
Vector DBs, rerankers, knowledge routers, and policy filters to tune retrieval quality and governance.
Per‑tool permissions, auditable calls, and sandboxing. Tailor access and observability to your data.
import clonejutsu as cj

pipeline = cj.Pipeline(
    inputs=[
        cj.Input.from_s3("s3://my-bucket/reports/"),
        cj.Input.from_files(["contracts/*.pdf", "ledgers/*.csv"]),
    ],
    map=cj.map.chunk_by_pages(pages=5),  # or bring your own function
    reduce=cj.reduce.agents(
        role="Analyst",
        instructions="Extract KPIs and anomalies with citations",
        tools=[cj.mcp.tool("ocr"), cj.mcp.tool("pdf_tables")],
    ),
    aggregators=[
        cj.Aggregator(
            top_k=8,
            tools=[cj.mcp.tool("reranker"), cj.mcp.tool("vector_db")],
        )
    ],
)

run = pipeline.run_async()
run.on_complete(lambda r: cj.notify.email("ops@company.com"))

answer = pipeline.ask("Which vendors exceeded budget in Q2?")
print(answer.text)
Process at scale, then analyze everything with a single prompt. Outputs are embedded with metadata so the aggregator retrieves the most relevant context and composes accurate answers with citations.
Aggregate product reviews across marketplaces, detect themes, sentiment shifts, and nuanced intent with citations.
Group noisy logs semantically, summarize incidents, and surface probable root causes and remediation steps.
Ask one question across thousands of PDFs, CSVs, and pages—get precise answers grounded in sources.
Extract entities and link relationships across documents to build knowledge graphs and summaries.
Detect sensitive data, policies, and obligations; provide explainable results and document‑level citations.
Analyze mixed‑language corpora and retrieve the best answers regardless of the source language.
Roll up quarterly reports, memos, and notes into concise briefings tailored to your role.
Score vendors and flag anomalies across invoices, contracts, and support tickets with reasoned justifications.
Start free with 1,000 agent-credits. Scale as your missions grow.
Connect your data, define map() and reduce(), and start asking questions through an aggregator, all from a single prompt.