A RAG-powered application that lets organizations upload internal documents and gives employees an AI chat interface to get instant, cited answers. Built to solve the two most common knowledge-access problems: onboarding ramp-up time and cross-department information silos.
New employees spend their first weeks digging through scattered PDFs, Word docs, and wiki pages. Answers are buried. The result: slow ramp-up, repetitive questions to managers, and friction across departments trying to understand each other's processes.
Upload your company documents. Ask questions in plain English. Get instant, accurate answers with citations pointing back to the exact source. No more Slack-pinging another department to ask how their API works.
Retrieval-Augmented Generation keeps the AI grounded in your actual documents. No hallucinated policies. Every answer cites its source, so users can verify and dig deeper. Documents can be added or removed without retraining anything.
React frontend on Vercel, FastAPI backend on Render, ChromaDB for vector storage. Total infrastructure cost for ~500 queries/month is under a dollar. Scales to persistent storage for $7/month when needed.
OnboardAI was designed around two specific, high-impact scenarios that almost every mid-size company faces. Both share the same underlying problem: people need answers from documents they don't know how to navigate.
New hires upload (or are given access to) the employee handbook, SOPs, and process docs. Instead of reading 200 pages or pinging their manager for every question, they ask the chat interface directly.
Reduces ramp-up time, eliminates repetitive questions to managers, and ensures new employees get accurate, policy-compliant answers rather than hallway hearsay.
"What visas and work permits does the company not sponsor?"
"How can I access the shared drives and recorded videos?"
"What's the PTO policy for my first year?"
Engineering needs to understand a limitation in the billing system. Marketing wants to know what the API supports before writing copy. Instead of Slack-pinging another team and waiting hours, they query the docs directly.
Breaks down information silos without creating meeting overhead. Teams interact with each other's documentation on their own schedule.
"What rate limits does the payments API have?"
"What's the SLA for the data pipeline?"
"How does the returns process work for international orders?"
Why these two? Onboarding is where companies feel the pain most acutely — every new hire is a repeated cost. Cross-department search is where the compounding value lives — it scales with team size and document volume. Together they cover the "new employee" and "existing employee" halves of the knowledge-access problem.
Every layer was chosen to balance capability against cost and complexity. The goal: a system that runs in production for under a dollar a month while being genuinely useful, not a demo.
Component-based UI with fast HMR. Chat interface, upload panel, document sidebar, and collapsible source citations.
Python's rich AI/ML ecosystem, async support, auto-generated API docs. Three core routes: /ingest, /query, /documents.
text-embedding-3-small: $0.02 per 1M tokens, 1536-dimension vectors. High quality at the lowest cost tier in the OpenAI embedding lineup.
GPT-4o-mini: $0.15/$0.60 per 1M input/output tokens. The cheapest capable model — strong enough for grounded Q&A over retrieved context.
Zero infrastructure, free, persists to disk. Good enough for demo and small-to-mid scale. Swap to Pinecone or Weaviate when needed.
React on Vercel (free), FastAPI on Render (free tier). Total cost: ~$0.50/month for OpenAI API usage at moderate volume.
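The monthly API figure can be sanity-checked with back-of-envelope arithmetic. The per-query token counts below are assumptions for illustration, not measurements from the app: roughly 2,000 input tokens per query (five retrieved chunks plus history and instructions) and 300 output tokens per answer, priced at the published GPT-4o-mini and text-embedding-3-small rates quoted above.

```python
# Back-of-envelope monthly cost at ~500 queries/month.
# Per-query token counts are assumptions, not measured values.
QUERIES_PER_MONTH = 500
INPUT_TOKENS_PER_QUERY = 2_000   # retrieved chunks + history + system prompt
OUTPUT_TOKENS_PER_QUERY = 300

# Published per-token prices ($ per 1M tokens).
INPUT_PRICE = 0.15 / 1_000_000    # gpt-4o-mini input
OUTPUT_PRICE = 0.60 / 1_000_000   # gpt-4o-mini output
EMBED_PRICE = 0.02 / 1_000_000    # text-embedding-3-small

chat_cost = QUERIES_PER_MONTH * (
    INPUT_TOKENS_PER_QUERY * INPUT_PRICE
    + OUTPUT_TOKENS_PER_QUERY * OUTPUT_PRICE
)

# Embedding 500 short queries plus ~1M tokens of uploaded documents
# is nearly free at $0.02 per 1M tokens.
embed_cost = (QUERIES_PER_MONTH * 20 + 1_000_000) * EMBED_PRICE

total = chat_cost + embed_cost
print(f"~${total:.2f}/month")
```

At these assumed volumes the total lands well under a dollar, consistent with the ~$0.50/month estimate; the chat model, not embeddings, dominates the bill.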
When a user uploads a document (PDF, DOCX, MD, or TXT), it goes through a four-step pipeline before it's queryable.
Extract — The document processor pulls raw text based on file type. PDFs get page-by-page extraction, Word docs get paragraph parsing, markdown and text pass through directly.
Chunk — Text is split into overlapping 500-character chunks with 50-character overlap. Each chunk carries metadata: filename, file type, section header (if detected), and character positions. The overlap ensures no context is lost at boundaries.
Embed — Each chunk is sent to OpenAI's embedding API, returning a 1536-dimension vector that captures semantic meaning. "What's the PTO policy?" and "How many vacation days do I get?" produce similar vectors even though they share no keywords.
Store — Vectors and their associated text/metadata are written to ChromaDB. The document is now searchable. A GitLab Employee Handbook produces ~3,069 chunks; a smaller policy doc might be 25.
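The four steps above can be sketched in a few functions. The chunker is plain Python; the embed and store calls are illustrative uses of the OpenAI and ChromaDB client libraries, and the function names (`chunk_text`, `embed_chunks`, `store_chunks`) are ours, not taken from the codebase.

```python
def chunk_text(text: str, filename: str, chunk_size: int = 500, overlap: int = 50):
    """Split text into overlapping fixed-size chunks with positional metadata."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        chunks.append({
            "text": piece,
            "metadata": {"filename": filename, "start": start, "end": start + len(piece)},
        })
        if start + chunk_size >= len(text):
            break  # last chunk already reached the end of the document
    return chunks


def embed_chunks(chunks):
    """Embed chunk texts with OpenAI (sketch; needs OPENAI_API_KEY set)."""
    from openai import OpenAI  # imported lazily so the chunker runs standalone
    client = OpenAI()
    resp = client.embeddings.create(
        model="text-embedding-3-small",  # 1536-dimension vectors
        input=[c["text"] for c in chunks],
    )
    return [d.embedding for d in resp.data]


def store_chunks(chunks, embeddings, collection):
    """Write vectors, text, and metadata to a ChromaDB collection (sketch)."""
    collection.add(
        ids=[f"{c['metadata']['filename']}-{i}" for i, c in enumerate(chunks)],
        embeddings=embeddings,
        documents=[c["text"] for c in chunks],
        metadatas=[c["metadata"] for c in chunks],
    )
```

Note how the 50-character overlap falls out of the step size: each chunk starts 450 characters after the previous one, so the last 50 characters of one chunk reappear at the start of the next.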
When a user asks a question, retrieval and generation happen in a single fast pass.
Embed the question — The user's query is converted to a vector using the same embedding model.
Retrieve — ChromaDB performs a cosine similarity search, returning the top 5 most relevant chunks. Results below a 0.3 relevance threshold are filtered out — if nothing is relevant, the system says so rather than guessing.
Generate — The retrieved chunks, conversation history (last 5 turns), and system instructions are assembled into a prompt. GPT-4o-mini generates the answer, grounded in the retrieved context.
Cite — The response includes clickable source references pointing back to the original document chunks, so users can verify every claim.
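The retrieve-and-generate path can be condensed to two pure functions. The names (`filter_relevant`, `build_prompt`) are illustrative, and we assume the collection is configured for cosine distance, where similarity = 1 − distance:

```python
RELEVANCE_THRESHOLD = 0.3
MAX_HISTORY_TURNS = 5

def filter_relevant(documents, distances, threshold=RELEVANCE_THRESHOLD):
    """Keep chunks whose cosine similarity (1 - distance) clears the floor.

    An empty result signals the caller to answer "not found" instead of guessing.
    """
    return [(doc, 1 - dist) for doc, dist in zip(documents, distances)
            if 1 - dist >= threshold]

def build_prompt(question, chunks, history):
    """Assemble system instructions, retrieved context, and recent history."""
    context = "\n\n".join(doc for doc, _score in chunks)
    messages = [{
        "role": "system",
        "content": "Answer only from the provided context. If the context is "
                   "insufficient, say you cannot find the answer.\n\n"
                   f"Context:\n{context}",
    }]
    messages.extend(history[-2 * MAX_HISTORY_TURNS:])  # 5 turns = 10 messages
    messages.append({"role": "user", "content": question})
    return messages
```

The resulting message list goes straight to the chat completion call; citations come from carrying each kept chunk's metadata through to the response.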
Three-layer architecture with clean separation between the client, API, and processing/storage concerns.
```
┌─────────────────────────────────────────────────────────┐
│                      CLIENT LAYER                       │
│                                                         │
│             React Frontend (Vite + Tailwind)            │
│   ┌──────────┐   ┌──────────┐   ┌──────────────┐        │
│   │  Upload  │   │   Chat   │   │    Source    │        │
│   │  Panel   │   │ Interface│   │    Viewer    │        │
│   └──────────┘   └──────────┘   └──────────────┘        │
└──────────────────────┬──────────────────────────────────┘
                       │ HTTP/REST (JSON)
┌──────────────────────▼──────────────────────────────────┐
│               API LAYER — FastAPI Server                │
│   ┌──────────┐   ┌──────────┐   ┌──────────────┐        │
│   │ /ingest  │   │  /query  │   │  /documents  │        │
│   └──────────┘   └──────────┘   └──────────────┘        │
└──────┬──────────────┬──────────────┬────────────────────┘
       │              │              │
┌──────▼──────┐  ┌────▼─────┐  ┌────▼─────────────────────┐
│ PROCESSING  │  │ RETRIEVAL│  │      EXTERNAL APIs       │
│             │  │          │  │                          │
│ Doc Process │  │ ChromaDB │  │ OpenAI Embeddings API    │
│ Text Chunker│  │ (local)  │  │ OpenAI Chat API          │
└─────────────┘  └──────────┘  └──────────────────────────┘
```
Services Layer: The backend is organized into discrete services — document_processor, chunker, embeddings, vector_store, retriever, and generator. Each handles one concern. This means swapping ChromaDB for Pinecone, or GPT-4o-mini for Claude, is a single-file change rather than a rewrite.
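One way to read "single-file change": each service depends only on a narrow interface, not on a concrete backend. A minimal sketch under that assumption (the `Protocol` and the toy store below are ours, not the project's actual code):

```python
from typing import Protocol, Sequence

class VectorStore(Protocol):
    """The only surface the retriever needs; any backend can implement it."""
    def add(self, ids: Sequence[str], embeddings, documents, metadatas) -> None: ...
    def query(self, embedding, top_k: int) -> list: ...

class InMemoryStore:
    """Toy stand-in showing that swapping backends touches one class."""
    def __init__(self):
        self._rows = []

    def add(self, ids, embeddings, documents, metadatas):
        self._rows.extend(
            {"id": i, "embedding": e, "document": d, "metadata": m}
            for i, e, d, m in zip(ids, embeddings, documents, metadatas)
        )

    def query(self, embedding, top_k):
        return self._rows[:top_k]  # a real backend ranks by similarity here
```

Replacing ChromaDB with Pinecone then means writing one new class against `VectorStore`; the retriever, generator, and API routes never change.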
Why 500-character chunks with 50-char overlap? Too small and you lose context — a sentence about PTO policy gets separated from the details. Too large and retrieval gets noisy — irrelevant content rides along with the good match. 500 characters (~80–100 words) is the sweet spot for Q&A retrieval. The overlap ensures sentences that span chunk boundaries aren't lost.
Why a relevance threshold of 0.3? Without a floor, ChromaDB always returns its top-k results — even if nothing is relevant. A cosine similarity score of 0.3 is generous enough to catch fuzzy matches but strong enough to prevent the LLM from fabricating answers from tangentially related content. When nothing clears the threshold, the system tells the user it can't find the answer rather than guessing.
Why conversation history (last 5 turns)? Follow-up questions like "What's the vesting schedule?" only make sense if the system remembers you were just asking about 401k matching. Sending the last 5 turns gives enough context for multi-step exploration without bloating the prompt or running up token costs.
Why RAG over fine-tuning? Documents change. People get promoted, policies update, new SOPs get written. RAG lets you add or remove documents instantly — no retraining, no waiting, no versioning headaches. The knowledge base is always current because it's always reading from the source docs.
Designed to run on free tiers with minimal infrastructure. The frontend deploys to Vercel, the backend to Render, and ChromaDB persists on Render's filesystem.
| Component | Service | Monthly Cost |
|---|---|---|
| Frontend (React) | Vercel | Free |
| Backend (FastAPI) | Render | Free |
| Vector Store | ChromaDB on Render | Free |
| OpenAI API (~500 queries/mo) | OpenAI | ~$0.50 |
| Persistent disk (optional) | Render | $7.00 |
| Total | | ~$0.50 – $7.50 |
For businesses evaluating this: The architecture is deliberately simple and cheap to start, but every layer has a clear upgrade path. ChromaDB → Pinecone for managed vector search. Render → AWS/GCP for autoscaling. GPT-4o-mini → GPT-4o or Claude for higher-quality answers. You start at $0.50/month and scale individual components as usage demands — no big-bang migration required.