Ingest PDFs, DOCX, CSVs, and S3 buckets. Define map() and reduce() to distribute work across AI agents in parallel, then query everything through an aggregator that embeds results for fast, accurate answers.
We let you process and reason over massive unstructured datasets using LLMs, automatically distributing tasks across agents that map, summarize, and aggregate results.
A world where questions over large datasets are answered accurately with LLMs, and big data is processed semantically and efficiently, so anyone can analyze it with a single prompt.
PDFs, DOCX, CSVs, URLs, and S3 buckets. Use automatic chunking or bring your own map().
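As a rough illustration of what a custom map() might do, here is a minimal plain-Python sketch of page-based chunking, mirroring the assumed semantics of the built-in chunk_by_pages(pages=5). The document and page representation is a hypothetical simplification, not the library's actual types.

```python
# Minimal sketch of a custom map(): group a document's pages into
# fixed-size chunks, each of which becomes one unit of agent work.
def chunk_by_pages(pages, size=5):
    """Group a list of page texts into chunks of at most `size` pages."""
    return [pages[i:i + size] for i in range(0, len(pages), size)]

doc_pages = [f"page {n}" for n in range(1, 13)]  # a 12-page document
chunks = chunk_by_pages(doc_pages, size=5)
print(len(chunks))  # 3 chunks: pages 1-5, 6-10, 11-12
```

Any callable with this shape (document in, list of chunks out) could serve as the map() step.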
Batch your data and assign processing logic to specialized agents to map, summarize, and transform content. Bring your own functions or use built-ins, and attach custom tools via MCP.
We schedule and scale agent work across batches. Get email or ping notifications when long-running jobs complete.
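Conceptually, scheduling agent work across batches resembles a fan-out/fan-in pattern. The sketch below uses plain Python with stand-in functions (summarize_batch, merge are illustrative names, not part of the product API) to show the shape: batches are processed in parallel, then partial results are combined.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for per-batch agent work: summarize one batch.
def summarize_batch(batch):
    return {"items": len(batch), "total": sum(batch)}

# Combine the partial summaries into one result (the reduce/fan-in step).
def merge(partials):
    return {
        "items": sum(p["items"] for p in partials),
        "total": sum(p["total"] for p in partials),
    }

batches = [[1, 2, 3], [4, 5], [6]]
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(summarize_batch, batches))  # fan-out in parallel
print(merge(partials))  # fan-in: {'items': 6, 'total': 21}
```

In the real system the workers are LLM agents rather than local threads, but the fan-out/fan-in flow is the same.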
Agent outputs are embedded with metadata. The aggregator retrieves top‑K by cosine similarity and composes precise answers with source grounding. Attach MCP tools for reranking, retrieval, or policy filters; scale with multiple aggregators for complex tasks.
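To make the retrieval step concrete, here is a toy sketch of top-K selection by cosine similarity over embedded outputs. The 3-dimensional vectors are hand-written stand-ins for real embeddings; everything here is illustrative, not the aggregator's actual implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Embedded agent outputs, keyed by a document id (toy 3-d vectors).
corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.0]

# Rank documents by similarity to the query and keep the top K.
top_k = sorted(corpus, key=lambda d: cosine(query, corpus[d]), reverse=True)[:2]
print(top_k)  # ['doc_a', 'doc_b'], most similar first
```

The retrieved top-K context is what the aggregator then feeds to the LLM to compose a grounded answer.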
Extend both reduce and aggregate phases by plugging in custom tools over the Model Context Protocol. Compose domain parsers, vector stores, rerankers, or policy filters into your pipeline.
OCR, table extraction, domain‑specific parsers, SQL/NoSQL connectors, and enrichment services for structured outputs.
Vector DBs, rerankers, knowledge routers, and policy filters to tune retrieval quality and governance.
Per‑tool permissions, auditable calls, and sandboxing. Tailor access and observability to your data.
import clonejutsu as cj

pipeline = cj.Pipeline(
    inputs=[
        cj.Input.from_s3("s3://my-bucket/reports/"),
        cj.Input.from_files(["contracts/*.pdf", "ledgers/*.csv"]),
    ],
    map=cj.map.chunk_by_pages(pages=5),  # or bring your own function
    reduce=cj.reduce.agents(
        role="Analyst",
        instructions="Extract KPIs and anomalies with citations",
        tools=[cj.mcp.tool("ocr"), cj.mcp.tool("pdf_tables")],
    ),
    aggregators=[
        cj.Aggregator(
            top_k=8,
            tools=[cj.mcp.tool("reranker"), cj.mcp.tool("vector_db")],
        )
    ],
)

run = pipeline.run_async()
run.on_complete(lambda r: cj.notify.email("ops@company.com"))

answer = pipeline.ask("Which vendors exceeded budget in Q2?")
print(answer.text)
Process at scale, then analyze everything with a single prompt. Outputs are embedded with metadata so the aggregator retrieves the most relevant context and composes accurate answers with citations.
Aggregate product reviews across marketplaces, detect themes, sentiment shifts, and nuanced intent with citations.
Group noisy logs semantically, summarize incidents, and surface probable root causes and remediation steps.
Ask one question across thousands of PDFs, CSVs, and pages—get precise answers grounded in sources.
Extract entities and link relationships across documents to build knowledge graphs and summaries.
Detect sensitive data, policies, and obligations; provide explainable results and document‑level citations.
Analyze mixed‑language corpora and retrieve the best answers regardless of the source language.
Roll up quarterly reports, memos, and notes into concise briefings tailored to your role.
Score vendors and flag anomalies across invoices, contracts, and support tickets with reasoned justifications.
Start free with 1,000 agent-credits. Scale as your missions grow.
Connect your data, define map() and reduce(), and start asking questions through an aggregator, all from a single prompt.