Traditional RAG systems rely on vectorization: documents are split into chunks, converted into embeddings, and stored in a vector database. This has three drawbacks:
- Chunking destroys context, tables, and cross-references.
- Vectorization typically means your text is sent to an embedding API and stored (often in plaintext or as vectors) in third-party infrastructure, so you lose control and privacy.
- Similarity search is not semantic truth: a chunk can score as similar to the query and still mislead the model.
PrivateRAG avoids vectors entirely. We use a hierarchical table of contents (PageIndex-style) and keep encryption in your hands.
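
To make the idea concrete, here is a minimal Python sketch of a hierarchical TOC and of how a retriever could narrow down to a section without any embeddings. The `TocNode` fields and the `outline` and `locate` helpers are illustrative assumptions, not the PageIndex schema or PrivateRAG's actual types.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TocNode:
    """One TOC entry: a titled page range with nested subsections (hypothetical shape)."""
    title: str
    start_page: int
    end_page: int
    children: list[TocNode] = field(default_factory=list)


def outline(node: TocNode, depth: int = 0) -> list[str]:
    """Render the tree as indented lines, e.g. to show a model the document structure."""
    lines = [f"{'  ' * depth}{node.title} (pp. {node.start_page}-{node.end_page})"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


def locate(node: TocNode, page: int) -> list[str]:
    """Return the title path from the root to the deepest section containing `page`."""
    if not node.start_page <= page <= node.end_page:
        return []
    for child in node.children:
        path = locate(child, page)
        if path:
            return [node.title, *path]
    return [node.title]


# Made-up document structure for the demo.
toc = TocNode("Annual Report", 1, 40, [
    TocNode("Overview", 1, 5),
    TocNode("Financials", 6, 30, [
        TocNode("Balance Sheet", 6, 15),
        TocNode("Cash Flow", 16, 30),
    ]),
    TocNode("Appendix", 31, 40),
])

print("\n".join(outline(toc)))
print(locate(toc, 18))  # ['Annual Report', 'Financials', 'Cash Flow']
```

Because every node keeps its page range, a section chosen from the outline maps straight back to whole pages of source text rather than to arbitrary chunks.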
Aligned with the in-app docs /docs:
- Client-side PDF processing: Your PDF never leaves the device. Text is extracted in the browser with Pyodide (Python in WebAssembly) and pypdf. Only the extracted text is used for the next step.
- NEAR AI Trusted Execution Environment (TEE): To build a rich PageIndex (hierarchical TOC), only the extracted text is sent to NEAR AI (cloud-api.near.ai). Processing runs inside a Trusted Execution Environment, i.e. confidential computing, so the operator cannot see your data. You use your own NEAR AI API key (e.g. stored in the browser); the backend does not proxy your PDF or key.
- Vectorless RAG: We follow a vectorless approach inspired by PageIndex (vectorless RAG cookbook): no embeddings, no vector DB. A TypeScript implementation of the PageIndex logic runs in the frontend and talks to NEAR AI’s TEE so structure extraction stays client-side.
- Encryption and storage: The resulting TOC is encrypted in the browser with AES-256-GCM using a key derived from your wallet. Only the encrypted blob is sent to the server and stored in the `vaults` table. The server cannot decrypt it.
- Decryption: The client fetches the vault by `owner_wallet` and `doc_hash`, re-derives the decryption key from your key-derivation signature, and decrypts `encrypted_toc` with AES-256-GCM (IV and auth tag are in the blob). What the hash and signature do: `doc_hash` identifies which document the vault belongs to. `toc_signature` is the wallet signature of that hash; the client verifies (ECDSA recovery) that the signer equals `owner_wallet` before decrypting, so you know the vault was created by that wallet for that document and the blob was not swapped.
- Nova Integration (Encrypted IPFS): For document files, we use Nova (Encrypted IPFS). The PDF is encrypted locally with a unique key, and the encrypted blob is stored on IPFS. The hash of the file is anchored on the defillama.testnet contract (using the `record_transaction` method) to prove existence and ownership on-chain. This ensures that your documents are stored in a decentralized, verifiable, and encrypted manner, accessible only by you.
The server stores only what is needed to persist and list your encrypted TOC. The vaults table has the following schema:
| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Auto-increment primary key |
| owner_wallet | VARCHAR(255) | Wallet address that owns this vault (lowercase) |
| doc_hash | VARCHAR(64) | SHA-256 of the original PDF (unique per document) |
| title | VARCHAR(255) | Document title (e.g. filename) |
| num_pages | INTEGER | Page count (optional) |
| encrypted_toc | TEXT | AES-256-GCM encrypted TOC blob. Server cannot decrypt. |
| toc_signature | VARCHAR(200) | Wallet signature of doc_hash (ownership proof) |
| created_at | TIMESTAMP | Creation time |
| updated_at | TIMESTAMP | Last update time |
Unique constraint: (owner_wallet, doc_hash) - one vault per document per wallet.
The server never sees the raw PDF, the decrypted TOC, or your encryption key.
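
As a rough illustration of the two identifying columns, the sketch below computes `doc_hash` as the hex-encoded SHA-256 of the PDF bytes and normalizes the wallet address to lowercase; together they form the unique key. The `compute_doc_hash` and `vault_key` helpers and the sample address are hypothetical.

```python
import hashlib


def compute_doc_hash(pdf_bytes: bytes) -> str:
    """Hex-encoded SHA-256 of the original PDF; always 64 characters, matching VARCHAR(64)."""
    return hashlib.sha256(pdf_bytes).hexdigest()


def vault_key(owner_wallet: str, pdf_bytes: bytes) -> tuple[str, str]:
    """(owner_wallet, doc_hash) pair corresponding to the table's unique constraint."""
    return owner_wallet.lower(), compute_doc_hash(pdf_bytes)


# Sample values are made up for the demo.
wallet, doc_hash = vault_key("0xAbC0000000000000000000000000000000000001", b"%PDF-1.7 demo bytes")
print(wallet, len(doc_hash))
```

Since the hash depends only on the file bytes, uploading the same PDF twice from the same wallet resolves to the same vault row rather than creating a duplicate.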
We use two wallet signatures with different roles.

Key-derivation signature:
- What: You sign a fixed message (e.g. a deterministic string) with your wallet.
- Used for: Deriving the encryption key (e.g. SHA-256 of the signature or of a key-derivation payload). Same wallet + same message ⇒ same key every time.
- Stored: No. The key is derived on demand in the browser and never sent or stored. It is used only to encrypt before upload and decrypt after fetch.
So: this signature is the secret material that yields the only key able to decrypt your vault. Because neither the signature nor the key ever reaches the server, the server cannot decrypt `encrypted_toc`.
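
A minimal sketch of this derivation, assuming the simplest variant mentioned above (SHA-256 of the raw signature bytes). The message string and the `fake_sign` stand-in are hypothetical; in the app a real wallet produces the signature, and the exact message and derivation payload may differ.

```python
import hashlib
import hmac

KEY_DERIVATION_MESSAGE = "PrivateRAG key derivation v1"  # hypothetical fixed message


def derive_key(signature: bytes) -> bytes:
    """Map a wallet signature to a 32-byte AES-256 key via SHA-256."""
    return hashlib.sha256(signature).digest()


def fake_sign(secret: bytes, message: str) -> bytes:
    """Stand-in for a wallet signature so the demo runs: deterministic per (secret, message)."""
    return hmac.new(secret, message.encode(), hashlib.sha256).digest()


sig_a = fake_sign(b"wallet-A-secret", KEY_DERIVATION_MESSAGE)
sig_b = fake_sign(b"wallet-B-secret", KEY_DERIVATION_MESSAGE)

key_a = derive_key(sig_a)
print(len(key_a))                                                               # 32
print(key_a == derive_key(fake_sign(b"wallet-A-secret", KEY_DERIVATION_MESSAGE)))  # True
print(key_a == derive_key(sig_b))                                               # False
```

The scheme only works if the wallet returns the same signature for the same message every time, so that property is worth confirming for the wallet in use.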
Ownership signature:
- What: You sign the document hash (`doc_hash`), e.g. `"PrivateRAG-TOC-Ownership:{doc_hash}"`, with your wallet.
- Used for: Proof of ownership. Anyone can verify that the signer of this message is the wallet that claims to own the vault.
- Stored: Yes. Stored as `toc_signature` next to the vault. It does not reveal the TOC contents; it only attests “this wallet created this vault for this doc_hash.”
So: the first signature is for confidentiality (key derivation); the second is for attestation (ownership and integrity of the binding to the document).
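
The ownership check can be sketched as follows. ECDSA public-key recovery needs a wallet or crypto library, so this sketch takes the recovery step as an injected function. `Vault`, `verify_ownership`, `recover_signer`, and the fake recoverer in the demo are assumptions for illustration, not the app's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Vault:
    owner_wallet: str
    doc_hash: str
    toc_signature: str


def ownership_message(doc_hash: str) -> str:
    """Message the owner signs; format taken from the example in the text."""
    return f"PrivateRAG-TOC-Ownership:{doc_hash}"


def verify_ownership(vault: Vault, recover_signer: Callable[[str, str], str]) -> bool:
    """True if the address recovered from toc_signature matches owner_wallet.

    recover_signer(message, signature) -> address is supplied by the caller,
    e.g. backed by a real ECDSA recovery routine.
    """
    signer = recover_signer(ownership_message(vault.doc_hash), vault.toc_signature)
    return signer.lower() == vault.owner_wallet.lower()


# Demo only: a fake recoverer that "recovers" the address embedded in the signature.
# Like real recovery, it yields a non-matching address when the message differs.
def fake_recover(message: str, signature: str) -> str:
    signed_message, _, address = signature.rpartition("|")
    return address if signed_message == message else "0x0"


doc_hash = "ab" * 32
good = Vault("0xowner", doc_hash, f"{ownership_message(doc_hash)}|0xOwner")
swapped = Vault("0xowner", "cd" * 32, good.toc_signature)  # signature from another document

print(verify_ownership(good, fake_recover))     # True
print(verify_ownership(swapped, fake_recover))  # False
```

Binding the signed message to `doc_hash` is what makes the second case fail: a valid signature lifted from one vault does not verify against another document.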
- Encryption (client): Key = f(wallet key-derivation signature). Encrypt the TOC with AES-256-GCM; store the IV and auth tag with the ciphertext in the `encrypted_toc` payload. Send the encrypted blob + metadata (including `toc_signature`) to the server.
- Decryption (client): Fetch the vault; verify ownership (recover the signer from `toc_signature` and check it equals `owner_wallet`). Re-derive the key from the key-derivation signature; decrypt `encrypted_toc` with AES-256-GCM (IV and tag in blob). The GCM tag verifies integrity. The hash and signature ensure you are decrypting the right vault and that it was created by the claimed wallet.
The server only persists and returns opaque blobs and metadata; it never has the key or the plaintext TOC.
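
The round trip can be sketched in Python with the `cryptography` package's `AESGCM` class (an assumption made for this example; in the browser the equivalent would be the Web Crypto API). The blob layout used here, a 12-byte IV followed by ciphertext and tag, all base64-encoded, is one plausible packaging and not necessarily the app's format.

```python
import base64
import hashlib
import json
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

IV_LEN = 12  # 96-bit nonce, the size recommended for GCM


def encrypt_toc(toc: dict, key: bytes) -> str:
    """Encrypt the TOC as JSON; returns base64(iv || ciphertext || tag)."""
    iv = os.urandom(IV_LEN)  # fresh random IV for every encryption
    sealed = AESGCM(key).encrypt(iv, json.dumps(toc).encode(), None)
    return base64.b64encode(iv + sealed).decode()


def decrypt_toc(blob: str, key: bytes) -> dict:
    """Split off the IV, then decrypt; raises InvalidTag if the key or blob is wrong."""
    raw = base64.b64decode(blob)
    iv, sealed = raw[:IV_LEN], raw[IV_LEN:]
    return json.loads(AESGCM(key).decrypt(iv, sealed, None))


# Stand-in key: in the flow above it comes from the key-derivation signature.
key = hashlib.sha256(b"stand-in for the key-derivation signature").digest()
toc = {"title": "Report", "nodes": [{"title": "Intro", "start_page": 1}]}

blob = encrypt_toc(toc, key)          # this opaque string is all the server would store
print(decrypt_toc(blob, key) == toc)  # True

try:
    decrypt_toc(blob, hashlib.sha256(b"some other signature").digest())
except InvalidTag:
    print("wrong key rejected")
```

Because GCM is authenticated, a modified blob or a wrong key fails loudly at decryption instead of yielding garbage plaintext.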
Make sure PostgreSQL is running and your .env is configured as specified in the backend's .env.example.

Install the backend dependencies:

    cd backend
    pip install -r requirements.txt

Activate the backend environment and run:

    alembic upgrade head
    uvicorn main:app --reload --host 0.0.0.0 --port 8000

Start the frontend:

    cd frontend
    npm install
    npm run dev

Hosted instances:
- Frontend: https://private-rag.vercel.app/app
- Backend: https://privaterag.onrender.com/

MIT. See LICENSE.
Credits to Vectify AI for PageIndex.

