This document describes all the stateful business tools available in the benchmark.
The tools maintain a shared state that includes:
- Accounts: checking, savings, business_credit (with balances)
- Transactions: All income, expense, and transfer records
- Invoices: Client invoices with status tracking
Initial state:
- Checking account: $10,000
- Savings account: $50,000
- Business credit: $0
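One way to picture the shared state is as a few plain Python containers. The sketch below is hypothetical: the `INITIAL_STATE` layout and the `initial_state` helper are illustrative assumptions, not the benchmark's internal representation.

```python
import copy

# Hypothetical layout of the shared state; the real tool implementation may differ.
INITIAL_STATE = {
    "accounts": {
        "checking": 10_000.0,
        "savings": 50_000.0,
        "business_credit": 0.0,
    },
    "transactions": [],  # income, expense, and transfer records
    "invoices": [],      # client invoices with status tracking
}


def initial_state() -> dict:
    """Hypothetical helper: return a fresh, independent copy of the starting state."""
    # deepcopy so that mutating one state never leaks into the template or another copy
    return copy.deepcopy(INITIAL_STATE)
```

Returning a deep copy is what lets a reset start cleanly: balances changed in one run cannot carry over into the next.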
Create a new financial transaction (income, expense, or transfer).
Parameters:
- transaction_type (str): "income", "expense", or "transfer"
- category (str): Category such as "salary", "rent", "utilities", "consulting", etc.
- amount (float): Transaction amount (positive number)
- description (str): Description of the transaction
- account (str, optional): "checking", "savings", or "business_credit" (default: "checking")
- date (str, optional): ISO format date (default: today)
- tags (list, optional): List of tags
Returns: Transaction details and new account balance
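Because the tool returns the new account balance, each transaction must adjust a single account. The following is a minimal sketch of that rule. It assumes income credits the chosen account and expense debits it. How the real tool treats `business_credit`, which behaves like a liability, is not specified here. `apply_transaction` and the record layout are hypothetical, and transfers are left out because they touch two accounts.

```python
import datetime
from typing import Optional

# Hypothetical sketch: only "income" and "expense" are modelled here.
SUPPORTED_TYPES = {"income", "expense"}


def apply_transaction(balances: dict, transaction_type: str, category: str,
                      amount: float, description: str,
                      account: str = "checking",
                      date: Optional[str] = None,
                      tags: Optional[list] = None) -> dict:
    """Hypothetical helper: record one transaction and update one balance."""
    if transaction_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported transaction_type: {transaction_type!r}")
    if amount <= 0:
        raise ValueError("amount must be a positive number")
    if account not in balances:
        raise ValueError(f"unknown account: {account!r}")
    # Assumption: income credits the account and expense debits it.
    delta = amount if transaction_type == "income" else -amount
    balances[account] += delta
    record = {
        "type": transaction_type,
        "category": category,
        "amount": amount,
        "description": description,
        "account": account,
        "date": date or datetime.date.today().isoformat(),
        "tags": list(tags or []),
    }
    return {"transaction": record, "new_balance": balances[account]}
```

Under these assumptions, the consulting income in the example would take checking from 10,000 to 15,000.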
Example:
result = tools.create_transaction(
transaction_type="income",
category="consulting",
amount=5000.0,
description="Website development project"
)
Create a new invoice for a client.
Parameters:
- client_name (str): Name of the client
- items (list): List of dicts with 'description', 'quantity', and 'price'
- due_days (int, optional): Days until payment is due (default: 30)
- issue_date (str, optional): ISO format date (default: today)
Returns: Invoice details including calculated total
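The "calculated total" is presumably the sum of quantity × price over the line items, with the due date falling `due_days` after the issue date. The sketch below illustrates that assumed arithmetic. `invoice_totals` is a hypothetical helper, not one of the tools.

```python
import datetime
from typing import Optional


def invoice_totals(items: list, due_days: int = 30,
                   issue_date: Optional[str] = None) -> dict:
    """Hypothetical helper: total and due date for an invoice."""
    issued = (datetime.date.fromisoformat(issue_date)
              if issue_date else datetime.date.today())
    # Assumed formula: each line item contributes quantity * price.
    total = sum(item["quantity"] * item["price"] for item in items)
    due = issued + datetime.timedelta(days=due_days)
    return {"total": total, "issue_date": issued.isoformat(),
            "due_date": due.isoformat()}
```

For the Acme Corp example, the assumed formula gives 50 × 200 + 20 × 150 = 13,000.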
Example:
result = tools.create_invoice(
client_name="Acme Corp",
items=[
{"description": "Consulting", "quantity": 50, "price": 200},
{"description": "Design", "quantity": 20, "price": 150}
]
)
Update the status of an invoice.
Parameters:
- invoice_id (str): Invoice ID (e.g., "INV00001")
- new_status (str): "draft", "sent", "paid", "overdue", or "cancelled"
Returns: Update result
Note: Marking as "paid" automatically creates an income transaction.
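The status rule and its side effect can be sketched as follows. This assumes the five listed statuses are the only valid ones and that the income booked on "paid" covers only the amount not already received, so earlier partial payments are not counted twice. The document does not say how the real tool handles that case. `set_invoice_status`, the `amount_paid` field, and the `"invoice_payment"` category are hypothetical.

```python
VALID_STATUSES = {"draft", "sent", "paid", "overdue", "cancelled"}


def set_invoice_status(invoice: dict, new_status: str, transactions: list) -> dict:
    """Hypothetical status update mirroring the documented "paid" side effect."""
    if new_status not in VALID_STATUSES:
        raise ValueError(f"invalid status: {new_status!r}")
    old_status = invoice["status"]
    invoice["status"] = new_status
    if new_status == "paid" and old_status != "paid":
        # Assumption: book only what has not already been received.
        outstanding = invoice["total"] - invoice.get("amount_paid", 0.0)
        if outstanding > 0:
            transactions.append({
                "type": "income",
                "category": "invoice_payment",  # hypothetical category name
                "amount": outstanding,
                "description": f"Payment for {invoice['id']}",
            })
        invoice["amount_paid"] = invoice["total"]
    return {"id": invoice["id"], "old_status": old_status, "new_status": new_status}
```

Guarding on `old_status != "paid"` keeps a repeated "paid" update from creating a second income transaction.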
Record a partial payment for an invoice.
Parameters:
- invoice_id (str): Invoice ID
- amount (float): Payment amount
Returns: Payment result and remaining balance
Note: Automatically creates an income transaction for the payment and, if the invoice is then fully paid, updates its status to "paid".
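The remaining-balance bookkeeping can be sketched like this. Several assumptions apply here. Payments accumulate in a hypothetical `amount_paid` field. Amounts are rounded to cents to avoid floating-point drift. Overpayments are rejected. The document does not state how the real tool treats an overpayment. `record_payment` is a hypothetical name.

```python
def record_payment(invoice: dict, amount: float, transactions: list) -> dict:
    """Hypothetical partial payment: book income and track the remaining balance."""
    if amount <= 0:
        raise ValueError("payment amount must be positive")
    paid = invoice.get("amount_paid", 0.0) + amount
    # Round to cents so repeated float additions still reach exactly zero.
    remaining = round(invoice["total"] - paid, 2)
    if remaining < 0:
        # Assumption: overpayments are rejected rather than credited.
        raise ValueError("payment exceeds the outstanding balance")
    invoice["amount_paid"] = paid
    transactions.append({
        "type": "income",
        "category": "invoice_payment",  # hypothetical category name
        "amount": amount,
        "description": f"Partial payment for {invoice['id']}",
    })
    if remaining == 0:
        invoice["status"] = "paid"
    return {"amount_paid": paid, "remaining": remaining, "status": invoice["status"]}
```

For example, paying 5,000 against a 12,000 invoice would leave 7,000 outstanding. A second payment of 7,000 would settle it and flip the status to "paid".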
Query transactions with optional filters.
Parameters:
- account (str, optional): Filter by account
- transaction_type (str, optional): Filter by type
- category (str, optional): Filter by category
- start_date (str, optional): ISO format start date
- end_date (str, optional): ISO format end date
- tags (list, optional): Filter by tags
Returns: List of matching transactions
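The document does not say how the filters combine. The following sketch assumes they are joined with logical AND and that the date bounds are inclusive. It also assumes a transaction matches `tags` if it carries at least one requested tag. Dates are taken to be date-only ISO strings (YYYY-MM-DD). `filter_transactions` and the record fields are hypothetical.

```python
from typing import Optional


def filter_transactions(transactions: list, account: Optional[str] = None,
                        transaction_type: Optional[str] = None,
                        category: Optional[str] = None,
                        start_date: Optional[str] = None,
                        end_date: Optional[str] = None,
                        tags: Optional[list] = None) -> list:
    """Hypothetical AND-combined filter over transaction records."""
    result = []
    for t in transactions:
        if account is not None and t["account"] != account:
            continue
        if transaction_type is not None and t["type"] != transaction_type:
            continue
        if category is not None and t["category"] != category:
            continue
        # Date-only ISO strings (YYYY-MM-DD) sort correctly as plain strings.
        if start_date is not None and t["date"] < start_date:
            continue
        if end_date is not None and t["date"] > end_date:
            continue
        # Assumption: any overlap between requested and stored tags is a match.
        if tags and not set(tags) & set(t.get("tags", [])):
            continue
        result.append(t)
    return result
```

String comparison works here only because every date has the same fixed-width YYYY-MM-DD shape. Mixing in timestamps would need real `datetime` parsing.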
Query invoices with optional filters.
Parameters:
- status (str, optional): Filter by status
- client_name (str, optional): Filter by client name (partial match)
Returns: List of matching invoices
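"Partial match" most likely means a substring test on the client name. The sketch below assumes the test is case-insensitive, which the document does not specify. `filter_invoices` is a hypothetical name.

```python
from typing import Optional


def filter_invoices(invoices: list, status: Optional[str] = None,
                    client_name: Optional[str] = None) -> list:
    """Hypothetical invoice filter: exact status, partial client-name match."""
    # casefold() gives a robust case-insensitive comparison (an assumption here).
    needle = client_name.casefold() if client_name else None
    return [
        inv for inv in invoices
        if (status is None or inv["status"] == status)
        and (needle is None or needle in inv["client_name"].casefold())
    ]
```

Under this assumption, `client_name="acme"` would match both "Acme Corp" and "Acme Labs".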
Get a financial summary with income/expense breakdown.
Parameters:
- start_date (str, optional): ISO format start date
- end_date (str, optional): ISO format end date
Returns: Complete financial summary including:
- Total income and breakdown by category
- Total expenses and breakdown by category
- Net income
- Account balances
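The category breakdown and net income can be sketched as below. The sketch assumes transfers are excluded from both totals, since they only move money between the business's own accounts. Net income is total income minus total expenses. Date-range filtering is omitted for brevity. `summarize` is a hypothetical name.

```python
from collections import defaultdict


def summarize(transactions: list) -> dict:
    """Hypothetical income/expense breakdown; transfers are left out of both totals."""
    income = defaultdict(float)
    expenses = defaultdict(float)
    for t in transactions:
        if t["type"] == "income":
            income[t["category"]] += t["amount"]
        elif t["type"] == "expense":
            expenses[t["category"]] += t["amount"]
    total_income = sum(income.values())
    total_expenses = sum(expenses.values())
    return {
        "total_income": total_income,
        "income_by_category": dict(income),
        "total_expenses": total_expenses,
        "expenses_by_category": dict(expenses),
        "net_income": total_income - total_expenses,
    }
```

For example, 5,000 of consulting income against 2,500 of rent and 200 of utilities gives total expenses of 2,700 and a net income of 2,300.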
Transfer money between accounts.
Parameters:
- from_account (str): Source account
- to_account (str): Destination account
- amount (float): Transfer amount
- description (str, optional): Transfer description
Returns: Transfer result and new balances
Note: Creates two transactions (one debit, one credit).
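The two-record transfer can be sketched as follows. Representing the debit and the credit as signed amounts is an assumption made for illustration. The document does not say whether the real tool checks for sufficient funds, so the sketch does not either. `transfer` is a hypothetical name.

```python
def transfer(balances: dict, from_account: str, to_account: str,
             amount: float, description: str = "Transfer") -> list:
    """Hypothetical transfer: returns one debit record and one credit record."""
    if from_account == to_account:
        raise ValueError("source and destination must differ")
    if from_account not in balances or to_account not in balances:
        raise ValueError("unknown account")
    if amount <= 0:
        raise ValueError("amount must be positive")
    # No overdraft check: the real tool's behaviour here is unspecified.
    balances[from_account] -= amount
    balances[to_account] += amount
    return [
        {"type": "transfer", "account": from_account, "amount": -amount,
         "description": description},  # debit
        {"type": "transfer", "account": to_account, "amount": amount,
         "description": description},  # credit
    ]
```

The two signed amounts always sum to zero, which reflects that a transfer changes where money sits but not the business's total.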
Get the current balance of an account.
Parameters:
- account (str): Account name
Returns: Account balance and type
Get a complete summary of the current accounting state.
Parameters: None
Returns: Complete state including:
- All account balances
- Total transaction count
- Total income and expenses
- Net income
- Invoice counts by status
- Outstanding receivables
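The last two items can be sketched under stated assumptions. An invoice is treated as still collectable when its status is "sent" or "overdue". The amount outstanding is its total minus any partial payments, recorded in a hypothetical `amount_paid` field. The document does not define "outstanding receivables" precisely. `receivables_summary` is a hypothetical name.

```python
from collections import Counter


def receivables_summary(invoices: list) -> dict:
    """Hypothetical invoice-status counts and outstanding receivables."""
    counts = Counter(inv["status"] for inv in invoices)
    # Assumption: only sent and overdue invoices are still collectable;
    # drafts, paid, and cancelled invoices contribute nothing.
    outstanding = sum(
        inv["total"] - inv.get("amount_paid", 0.0)
        for inv in invoices
        if inv["status"] in ("sent", "overdue")
    )
    return {"invoice_counts": dict(counts), "outstanding_receivables": outstanding}
```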
Reset all accounting state to initial values (for testing).
Parameters: None
Returns: Confirmation message
Example scenario: record two expenses and read back the financial summary.
import json
# Record rent
rent = tools.create_transaction(
transaction_type="expense",
category="rent",
amount=2500,
description="Office rent - January"
)
# Record utilities
utilities = tools.create_transaction(
transaction_type="expense",
category="utilities",
amount=200,
description="Electric and water"
)
# Get summary
summary_json = tools.get_financial_summary()
summary = json.loads(summary_json)
result = {
"total_expenses": summary["summary"]["total_expenses"],
"checking_balance": summary["summary"]["accounts"]["checking"]
}
Example scenario: invoice a client, send the invoice, and record its payment.
import json
# Create invoice
invoice_json = tools.create_invoice(
client_name="TechStart Inc",
items=[
{"description": "Development", "quantity": 80, "price": 150}
]
)
invoice = json.loads(invoice_json)
invoice_id = invoice["invoice"]["id"]
# Send invoice
tools.update_invoice_status(invoice_id, "sent")
# Record payment: 80 × 150 = 12,000, so this settles the invoice in full and marks it "paid"
payment = tools.record_partial_payment(invoice_id, 12000)
# Check state
state = tools.get_state_summary()
result = json.loads(state)
Example scenario: transfer money to savings and check both balances.
import json
# Transfer to savings
transfer = tools.transfer_between_accounts(
from_account="checking",
to_account="savings",
amount=5000,
description="Monthly savings"
)
# Check balances
checking = tools.get_account_balance("checking")
savings = tools.get_account_balance("savings")
result = {
"checking": json.loads(checking)["balance"],
"savings": json.loads(savings)["balance"]
}
Each test scenario validates the final state by checking:
- Transaction counts and types
- Invoice counts and statuses
- Account balances
- Total income/expenses
- Outstanding receivables
This ensures that both agents actually perform the correct operations rather than merely producing reasonable-looking output.
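A final-state check of this kind can be sketched as a comparison against expected values with a small tolerance for currency amounts. Take the rent and utilities scenario as an example, assuming it starts from the initial state and both expenses are booked against the default checking account. Total expenses would then be 2,500 + 200 = 2,700, and the checking balance would be 10,000 − 2,700 = 7,300. These figures are derived here and not taken from the benchmark. `check_final_state` and `EXPECTED_SCENARIO_1` are hypothetical names.

```python
import math


def check_final_state(actual: dict, expected: dict, tol: float = 0.01) -> list:
    """Hypothetical validator: return mismatch messages (an empty list means pass)."""
    problems = []
    for key, want in expected.items():
        if key not in actual:
            problems.append(f"missing {key}")
            continue
        got = actual[key]
        if isinstance(want, (int, float)) and not isinstance(want, bool):
            # Numeric fields (balances, totals) are compared within a cent.
            if not isinstance(got, (int, float)) or not math.isclose(got, want, abs_tol=tol):
                problems.append(f"{key}: expected {want}, got {got}")
        elif got != want:
            # Everything else (counts, statuses) must match exactly.
            problems.append(f"{key}: expected {want!r}, got {got!r}")
    return problems


# Expected values for the rent + utilities scenario (derived under the assumptions above).
EXPECTED_SCENARIO_1 = {"total_expenses": 2700.0, "checking_balance": 7300.0}
```

Reporting every mismatch rather than stopping at the first one makes it easier to see whether an agent skipped a whole operation or only got one amount wrong.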