195 changes: 194 additions & 1 deletion README.md
@@ -12,7 +12,194 @@ npm run build
npx playwright install chromium
```

## Quick Start
## Quick Start: Choose Your Abstraction Level

The Sentience SDK offers **4 levels of abstraction**. Choose based on your needs:

### 💬 Level 4: Conversational Agent (Highest Abstraction) - **NEW in v0.3.0**

Complete automation with natural conversation. Just describe what you want, and the agent plans and executes everything:

```typescript
import { SentienceBrowser, ConversationalAgent, OpenAIProvider } from 'sentience-ts';

const browser = await SentienceBrowser.create({ apiKey: process.env.SENTIENCE_API_KEY });
const llm = new OpenAIProvider(process.env.OPENAI_API_KEY!, 'gpt-4o');
const agent = new ConversationalAgent({ llmProvider: llm, browser });

// Navigate to starting page
await browser.getPage().goto('https://amazon.com');

// ONE command does it all - automatic planning and execution!
const response = await agent.execute(
  "Search for 'wireless mouse' and tell me the price of the top result"
);
console.log(response); // "I found the top result for wireless mouse on Amazon. It's priced at $24.99..."

// Follow-up questions maintain context
const followUp = await agent.chat("Add it to cart");
console.log(followUp);

await browser.close();
```

**When to use:** Complex multi-step tasks, conversational interfaces, maximum convenience
**Code reduction:** 99% less code - describe goals in natural language
**Requirements:** OpenAI or Anthropic API key
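
A single `execute` call can cover many planned steps, so callers may want an upper bound on how long they wait for it. The sketch below is a generic wrapper and not part of the SDK: `withTimeout` is a hypothetical helper name. It only stops waiting; it does not cancel work that is already running in the browser.

```typescript
// Hypothetical helper (not part of sentience-ts): reject if a promise takes too long.
async function withTimeout<T>(work: Promise<T>, ms: number, label = 'operation'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

With the example above, this could be used as `await withTimeout(agent.execute('...'), 120000, 'agent.execute')`. The two-minute limit is an arbitrary illustration, not a recommended value.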

### 🤖 Level 3: Agent (Natural Language Commands) - **Recommended for Most Users**

Minimal coding needed: write each step as a plain-English command and the agent handles element lookup.

```typescript
import { SentienceBrowser, SentienceAgent, OpenAIProvider } from 'sentience-ts';

const browser = await SentienceBrowser.create({ apiKey: process.env.SENTIENCE_API_KEY });
const llm = new OpenAIProvider(process.env.OPENAI_API_KEY!, 'gpt-4o-mini');
const agent = new SentienceAgent(browser, llm);

await browser.getPage().goto('https://www.amazon.com');

// Just natural language commands - agent handles everything!
await agent.act('Click the search box');
await agent.act("Type 'wireless mouse' into the search field");
await agent.act('Press Enter key');
await agent.act('Click the first product result');

// Automatic token tracking
console.log(`Tokens used: ${agent.getTokenStats().totalTokens}`);
await browser.close();
```

**When to use:** Quick automation, non-technical users, rapid prototyping
**Code reduction:** 95-98% less code vs manual approach
**Requirements:** OpenAI API key (or Anthropic for Claude)
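
Because each `act` call depends on an LLM interpreting the current page, an individual step may occasionally fail. The following is a minimal retry sketch, not an SDK feature: `actWithRetry` and `ActCapable` are hypothetical names, and the code assumes `act()` rejects on failure. If `act()` instead resolves with a status object, the check would need to inspect that result.

```typescript
// Structural type: anything exposing act(goal), for example a SentienceAgent.
interface ActCapable {
  act(goal: string): Promise<unknown>;
}

// Hypothetical helper: retry one natural-language step with a growing delay.
// Assumption: act() rejects (throws) when the step fails.
async function actWithRetry(
  agent: ActCapable,
  goal: string,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<unknown> {
  const attempts = Math.max(1, maxAttempts);
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await agent.act(goal);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Linear backoff: baseDelayMs, then 2 * baseDelayMs, and so on.
        await new Promise<void>(resolve => setTimeout(resolve, baseDelayMs * attempt));
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

A call such as `await actWithRetry(agent, 'Click the first product result')` would then replace the bare `agent.act(...)`. Note that each retry triggers another LLM call and therefore uses more tokens.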

### 🔧 Level 2: Direct SDK (Technical Control)

Full control with semantic selectors. For technical users who want precision:

```typescript
import { SentienceBrowser, snapshot, find, click, typeText, press } from 'sentience-ts';

const browser = await SentienceBrowser.create({ apiKey: process.env.SENTIENCE_API_KEY });
await browser.getPage().goto('https://www.amazon.com');

// Get semantic snapshot
const snap = await snapshot(browser);

// Find elements using query DSL
const searchBox = find(snap, 'role=textbox text~"search"');
await click(browser, searchBox!.id);

// Type and submit
await typeText(browser, searchBox!.id, 'wireless mouse');
await press(browser, 'Enter');

await browser.close();
```

**When to use:** Need precise control, debugging, custom workflows
**Code reduction:** Still 80% less code vs raw Playwright
**Requirements:** Only Sentience API key
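
The snippet above uses the non-null assertion (`searchBox!.id`), which fails with an unhelpful `TypeError` if the query matches nothing. A small guard can make that case explicit. This is a sketch under the assumption that `find` returns `null` or `undefined` when there is no match; `requireElement` is a hypothetical helper, not an SDK export.

```typescript
// Hypothetical helper: turn a "not found" result into a descriptive error
// instead of relying on the non-null assertion operator (`!`).
function requireElement<T>(element: T | null | undefined, query: string): T {
  if (element === null || element === undefined) {
    throw new Error(`No element matched query: ${query}`);
  }
  return element;
}
```

Usage would look like `const searchBox = requireElement(find(snap, query), query);`, after which `searchBox.id` can be used without `!`.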

### ⚙️ Level 1: Raw Playwright (Maximum Control)

For when you need complete low-level control (rare):

```typescript
import { chromium } from 'playwright';

const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto('https://www.amazon.com');
await page.fill('#twotabsearchtextbox', 'wireless mouse');
await page.press('#twotabsearchtextbox', 'Enter');
await browser.close();
```

**When to use:** Very specific edge cases, custom browser configs
**Tradeoffs:** No semantic intelligence, brittle selectors, more code

---

## Agent Layer Examples

### Google Search (6 lines of code)

```typescript
import { SentienceBrowser, SentienceAgent, OpenAIProvider } from 'sentience-ts';

const browser = await SentienceBrowser.create({ apiKey: process.env.SENTIENCE_API_KEY });
const llm = new OpenAIProvider(process.env.OPENAI_API_KEY!, 'gpt-4o-mini');
const agent = new SentienceAgent(browser, llm);

await browser.getPage().goto('https://www.google.com');
await agent.act('Click the search box');
await agent.act("Type 'mechanical keyboards' into the search field");
await agent.act('Press Enter key');
await agent.act('Click the first non-ad search result');

await browser.close();
```

**See full example:** [examples/agent-google-search.ts](examples/agent-google-search.ts)
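
The full example pauses for fixed intervals between steps (for example, 3 seconds after pressing Enter). Fixed delays can be too short on a slow connection and wasteful on a fast one. One alternative is to poll for a condition. This is a sketch only: `waitUntil` is a hypothetical helper and the condition is supplied by the caller.

```typescript
// Hypothetical helper: poll a condition until it holds or a deadline passes.
// Resolves to true if the condition held, false if the timeout elapsed first.
async function waitUntil(
  condition: () => boolean | Promise<boolean>,
  timeoutMs = 10000,
  intervalMs = 250
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await condition()) return true;
    if (Date.now() >= deadline) return false;
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
  }
}
```

Assuming `browser.getPage()` returns a Playwright `Page`, a call like `await waitUntil(() => browser.getPage().url().includes('/search'))` could stand in for the fixed delay after submitting the query.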

### Using Anthropic Claude Instead of GPT

```typescript
import { SentienceBrowser, SentienceAgent, AnthropicProvider } from 'sentience-ts';

const browser = await SentienceBrowser.create({ apiKey: process.env.SENTIENCE_API_KEY });

// Swap OpenAI for Anthropic - same API!
const llm = new AnthropicProvider(
  process.env.ANTHROPIC_API_KEY!,
  'claude-3-5-sonnet-20241022'
);

const agent = new SentienceAgent(browser, llm);
await agent.act('Click the search button'); // Works exactly the same

await browser.close();
```

**BYOB (Bring Your Own Brain):** OpenAI, Anthropic, or implement `LLMProvider` for any model.

**See full example:** [examples/agent-with-anthropic.ts](examples/agent-with-anthropic.ts)
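
The exact `LLMProvider` contract is not shown in this README, so the sketch below uses a hypothetical stand-in (`SketchLLMProvider`, with an assumed `generate` method and `modelName` field) purely to illustrate the idea of plugging in another model. Check the SDK's actual `LLMProvider` definition before adapting it. The example provider replays canned answers, which may be handy for offline tests.

```typescript
// HYPOTHETICAL shapes: the real LLMProvider contract in sentience-ts may differ.
interface SketchLLMResponse {
  content: string;
  totalTokens: number;
}

interface SketchLLMProvider {
  readonly modelName: string;
  generate(systemPrompt: string, userPrompt: string): Promise<SketchLLMResponse>;
}

// A provider that replays scripted answers instead of calling a remote model.
class ScriptedProvider implements SketchLLMProvider {
  readonly modelName = 'scripted';
  private readonly answers: string[];
  private index = 0;

  constructor(answers: string[]) {
    this.answers = answers;
  }

  async generate(systemPrompt: string, userPrompt: string): Promise<SketchLLMResponse> {
    if (this.index >= this.answers.length) {
      throw new Error('ScriptedProvider ran out of answers');
    }
    const content = this.answers[this.index] as string;
    this.index += 1;
    // Rough token estimate: whitespace-separated words in prompts plus answer.
    const words = `${systemPrompt} ${userPrompt} ${content}`.split(/\s+/).filter(Boolean);
    return { content, totalTokens: words.length };
  }
}
```

The word-count token estimate is a placeholder; a real provider would report the usage figures returned by its model API.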

### Amazon Shopping (98% code reduction)

**Before (manual approach):** 350 lines
**After (agent layer):** 6 lines

```typescript
await agent.act('Click the search box');
await agent.act("Type 'wireless mouse' into the search field");
await agent.act('Press Enter key');
await agent.act('Click the first visible product in the search results');
await agent.act("Click the 'Add to Cart' button");
```

**See full example:** [examples/agent-amazon-shopping.ts](examples/agent-amazon-shopping.ts)
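
In a flow like this, each step depends on the previous one, so it can help to run the commands from a list and stop at the first failure. The following is a sketch rather than an SDK feature: `runSteps` and `StepResult` are hypothetical names, and the code assumes `act()` rejects when a step fails.

```typescript
interface StepResult {
  goal: string;
  ok: boolean;
  durationMs: number;
  error?: string;
}

// Hypothetical helper: run goals in order and stop at the first failure.
// Assumption: act() rejects (throws) when a step fails.
async function runSteps(
  agent: { act(goal: string): Promise<unknown> },
  goals: string[]
): Promise<StepResult[]> {
  const results: StepResult[] = [];
  for (const goal of goals) {
    const started = Date.now();
    try {
      await agent.act(goal);
      results.push({ goal, ok: true, durationMs: Date.now() - started });
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      results.push({ goal, ok: false, durationMs: Date.now() - started, error: message });
      break; // Later steps depend on earlier ones, so do not continue.
    }
  }
  return results;
}
```

The returned array records which step failed and how long each one took, which makes a partially completed run easier to diagnose.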

---

## Installation for Agent Layer

```bash
# Install core SDK
npm install sentience-ts

# Install LLM provider (choose one or both)
npm install openai # For GPT-4, GPT-4o, GPT-4o-mini
npm install @anthropic-ai/sdk # For Claude 3.5 Sonnet

# Set API keys
export SENTIENCE_API_KEY="your-sentience-key"
export OPENAI_API_KEY="your-openai-key" # OR
export ANTHROPIC_API_KEY="your-anthropic-key"
```
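
A script can check for these variables up front instead of failing partway through a run. This is a minimal sketch: `missingEnvVars` is a hypothetical helper, and which names are actually required depends on the provider you chose.

```typescript
// Hypothetical helper: list which required variables are missing or blank.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[]
): string[] {
  return required.filter(name => (env[name] ?? '').trim() === '');
}
```

In a Node script this might be called as `missingEnvVars(process.env, ['SENTIENCE_API_KEY', 'OPENAI_API_KEY'])`, exiting with a clear message if the returned list is not empty.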

---

## Direct SDK Quick Start

```typescript
import { SentienceBrowser, snapshot, find, click } from './src';
@@ -349,6 +536,12 @@ element.z_index // CSS stacking order

See the `examples/` directory for complete working examples:

### Agent Layer (Level 3 - Natural Language)
- **`agent-google-search.ts`** - Google search automation with natural language commands
- **`agent-amazon-shopping.ts`** - Amazon shopping bot (6 lines vs 350 lines manual code)
- **`agent-with-anthropic.ts`** - Using Anthropic Claude instead of OpenAI GPT

### Direct SDK (Level 2 - Technical Control)
- **`hello.ts`** - Extension bridge verification
- **`basic-agent.ts`** - Basic snapshot and element inspection
- **`query-demo.ts`** - Query engine demonstrations
96 changes: 96 additions & 0 deletions examples/agent-amazon-shopping.ts
@@ -0,0 +1,96 @@
/**
 * Example: Amazon Shopping using SentienceAgent
 *
 * Demonstrates complex multi-step automation with the agent layer.
 * Replaces ~350 lines of manual code with a handful of natural language commands.
 *
 * Run with:
 *   npx ts-node examples/agent-amazon-shopping.ts
 */

import { SentienceBrowser, SentienceAgent, OpenAIProvider } from '../src';

async function main() {
  // Set up environment
  const sentienceKey = process.env.SENTIENCE_API_KEY;
  const openaiKey = process.env.OPENAI_API_KEY;

  if (!openaiKey) {
    console.error('❌ Error: OPENAI_API_KEY environment variable not set');
    console.log('Set it with: export OPENAI_API_KEY="your-key-here"');
    process.exit(1);
  }

  // Initialize browser and agent
  const browser = await SentienceBrowser.create({
    apiKey: sentienceKey,
    headless: false
  });

  const llm = new OpenAIProvider(openaiKey, 'gpt-4o-mini');
  const agent = new SentienceAgent(browser, llm, 50, true);

  try {
    console.log('🛒 Amazon Shopping Demo with SentienceAgent\n');

    // Navigate to Amazon
    await browser.getPage().goto('https://www.amazon.com');
    await browser.getPage().waitForLoadState('networkidle');
    await new Promise(resolve => setTimeout(resolve, 2000));

    // Search for product
    console.log('Step 1: Searching for wireless mouse...\n');
    await agent.act('Click the search box');
    await agent.act("Type 'wireless mouse' into the search field");
    await agent.act('Press Enter key');

    // Wait for search results
    await new Promise(resolve => setTimeout(resolve, 4000));

    // Select a product
    console.log('Step 2: Selecting a product...\n');
    await agent.act('Click the first visible product in the search results');

    // Wait for product page to load
    await new Promise(resolve => setTimeout(resolve, 5000));

    // Add to cart
    console.log('Step 3: Adding to cart...\n');
    await agent.act("Click the 'Add to Cart' button");

    // Wait for cart confirmation
    await new Promise(resolve => setTimeout(resolve, 3000));

    console.log('\n✅ Shopping automation completed!\n');

    // Print execution summary
    const stats = agent.getTokenStats();
    const history = agent.getHistory();

    // Guard against an empty history so the average never prints NaN
    const avgTokens = history.length > 0 ? Math.round(stats.totalTokens / history.length) : 0;

    console.log('📊 Execution Summary:');
    console.log(` Actions executed: ${history.length}`);
    console.log(` Total tokens: ${stats.totalTokens}`);
    console.log(` Avg tokens per action: ${avgTokens}`);

    console.log('\n📜 Action History:');
    history.forEach((entry, i) => {
      const status = entry.success ? '✅' : '❌';
      console.log(` ${i + 1}. ${status} ${entry.goal} (${entry.durationMs}ms)`);
    });

    console.log('\n💡 Code Comparison:');
    console.log(' Old approach: ~350 lines (manual snapshots, prompts, filtering)');
    console.log(' Agent approach: ~6 lines (natural language commands)');
    console.log(' Reduction: 98%');

  } catch (error: unknown) {
    // Thrown values are not guaranteed to be Error instances
    const message = error instanceof Error ? error.message : String(error);
    console.error('❌ Error:', message);
  } finally {
    await browser.close();
  }
}

// Run if executed directly
if (require.main === module) {
  main().catch(console.error);
}
71 changes: 71 additions & 0 deletions examples/agent-google-search.ts
@@ -0,0 +1,71 @@
/**
 * Example: Google Search using SentienceAgent
 *
 * Demonstrates high-level agent abstraction with natural language commands.
 * No manual snapshot filtering or prompt engineering required.
 *
 * Run with:
 *   npx ts-node examples/agent-google-search.ts
 */

import { SentienceBrowser, SentienceAgent, OpenAIProvider } from '../src';

async function main() {
  // Initialize browser
  const browser = await SentienceBrowser.create({
    apiKey: process.env.SENTIENCE_API_KEY,
    headless: false
  });

  // Initialize LLM provider (OpenAI GPT-4o-mini for cost efficiency)
  const llm = new OpenAIProvider(
    process.env.OPENAI_API_KEY!,
    'gpt-4o-mini'
  );

  // Create agent
  const agent = new SentienceAgent(browser, llm, 50, true);

  try {
    console.log('🔍 Google Search Demo with SentienceAgent\n');

    // Navigate to Google
    await browser.getPage().goto('https://www.google.com');
    await browser.getPage().waitForLoadState('networkidle');

    // Use agent to perform search - just natural language commands!
    await agent.act('Click the search box');
    await agent.act("Type 'best mechanical keyboards 2024' into the search field");
    await agent.act('Press Enter key');

    // Wait for results
    await new Promise(resolve => setTimeout(resolve, 3000));

    // Click first result
    await agent.act('Click the first non-ad search result');

    // Wait for page load
    await new Promise(resolve => setTimeout(resolve, 2000));

    console.log('\n✅ Search completed successfully!\n');

    // Print token usage stats
    const stats = agent.getTokenStats();
    console.log('📊 Token Usage:');
    console.log(` Total tokens: ${stats.totalTokens}`);
    console.log(` Prompt tokens: ${stats.totalPromptTokens}`);
    console.log(` Completion tokens: ${stats.totalCompletionTokens}`);
    console.log('\n📜 Action Breakdown:');
    stats.byAction.forEach((action, i) => {
      console.log(` ${i + 1}. ${action.goal}: ${action.totalTokens} tokens`);
    });

  } finally {
    await browser.close();
  }
}

// Run if executed directly
if (require.main === module) {
  main().catch(console.error);
}