ofbread/Income-Tax-RAG

Income Tax Act RAG System

A Retrieval-Augmented Generation (RAG) system for querying the Canadian Income Tax Act using natural language. Users can ask questions about tax law and receive answers that cite the relevant sections of the legislation.

Overview

This RAG system provides an interface to the Canadian Income Tax Act, allowing users to:

  • Ask natural language questions about tax law
  • Receive accurate answers with source citations
  • Navigate complex legal documents through semantic search
  • Access information from Subdivisions A & B of the Income Tax Act

Features

  • Intelligent Document Processing: Structure-aware chunking that preserves legal document hierarchy
  • Deployment Options:
    • Local: Powered by Ollama for complete privacy and offline operation
    • AWS Cloud: Serverless deployment with AWS Bedrock for scalable production use
  • Semantic Search: Vector embeddings using Ollama (local) or AWS Bedrock Titan (cloud)
  • LLM Integration: Local Ollama models or AWS Bedrock Claude for answer generation
  • Source Attribution: Every answer includes citations to specific sections
  • Section-Based Chunking: Preserves complete legal sections for context
  • Metadata: Tracks PART, DIVISION, SUBDIVISION, and marginal notes
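
As a rough illustration of the metadata and source attribution features, the sketch below models one chunk's metadata and renders a citation in the same style as the demo output. The `ChunkMetadata` class, its field names, and `format_citation` are hypothetical names chosen for this example, not identifiers taken from the repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChunkMetadata:
    """Hypothetical metadata record for one chunk of the Act."""
    section: str                        # e.g. "18(12)"
    source_file: str                    # e.g. "page-11.md"
    part: Optional[str] = None          # e.g. "PART I"
    division: Optional[str] = None
    subdivision: Optional[str] = None
    marginal_note: Optional[str] = None


def format_citation(meta: ChunkMetadata) -> str:
    """Render a citation such as 'Section 18(12) (page-11.md)'."""
    return f"Section {meta.section} ({meta.source_file})"


meta = ChunkMetadata(section="18(12)", source_file="page-11.md", part="PART I")
print(format_citation(meta))  # Section 18(12) (page-11.md)
```

Keeping the hierarchy fields optional lets a chunk be stored even when a page does not state its PART, DIVISION, or SUBDIVISION.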

Architecture

The system consists of three main pipelines:

1. Data Ingestion Pipeline

  • Web Scraping: Downloads HTML from the Justice Laws website
  • Text Extraction: Cleans HTML and extracts structured markdown
  • Intelligent Chunking: Section-based chunking preserving legal structure
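
The text extraction step could look roughly like the sketch below, which uses only the standard library. It drops `<script>` and `<style>` content and turns HTML headings into markdown headings. The `TextExtractor` class and `extract_text` function are hypothetical, and the real Justice Laws markup is likely to need more specific handling than this.

```python
from html.parser import HTMLParser


def heading_level(tag):
    """Return 1-6 for h1-h6 tags, otherwise 0."""
    if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
        return int(tag[1])
    return 0


class TextExtractor(HTMLParser):
    """Collect visible text, emitting markdown '#' prefixes for headings."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self._prefix = ""      # pending markdown heading prefix
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif heading_level(tag):
            self._prefix = "#" * heading_level(tag) + " "

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif heading_level(tag):
            self._prefix = ""  # do not leak the prefix past an empty heading

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._parts.append(self._prefix + text)
            self._prefix = ""

    def text(self):
        return "\n".join(self._parts)


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p{}</style></head><body>"
    "<h2>Heading</h2><p>Body text.</p><script>x()</script>"
    "</body></html>"
)
print(extract_text(sample))
```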

2. Vector Store Pipeline

  • Embedding Generation: Creates embeddings using Ollama (local) or AWS Bedrock Titan (cloud)
  • ChromaDB Storage: Persistent vector database for fast retrieval
  • Metadata Indexing: Enables filtering by section, part, division, etc.
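
To make the retrieval idea concrete, here is a minimal in-memory stand-in for a vector store that ranks by cosine similarity and filters on metadata. It does not use ChromaDB. The record layout, the `search` function, and the three-dimensional toy vectors are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(store, query_vector, top_k=5, where=None):
    """Return the top_k records most similar to query_vector.

    `store` is a list of dicts with "id", "vector", and "metadata" keys.
    `where` is an optional dict of metadata values that must match exactly.
    """
    where = where or {}
    candidates = [
        rec for rec in store
        if all(rec["metadata"].get(key) == value for key, value in where.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda rec: cosine_similarity(query_vector, rec["vector"]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy vectors; real embeddings have hundreds of dimensions.
store = [
    {"id": "s8", "vector": [1.0, 0.0, 0.0], "metadata": {"section": "8", "subdivision": "a"}},
    {"id": "s9", "vector": [0.0, 1.0, 0.0], "metadata": {"section": "9", "subdivision": "b"}},
    {"id": "s18", "vector": [0.1, 0.9, 0.0], "metadata": {"section": "18", "subdivision": "b"}},
]
hits = search(store, [0.0, 1.0, 0.0], top_k=2, where={"subdivision": "b"})
print([hit["id"] for hit in hits])  # ['s9', 's18']
```

Filtering before ranking keeps records outside the requested subdivision from ever competing for the top-k slots.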

3. Query Pipeline

  • Semantic Retrieval: Finds relevant sections using vector similarity
  • Context Assembly: Builds prompts with retrieved context
  • LLM Generation: Generates answers using Ollama LLM (local) or AWS Bedrock Claude (cloud)
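
The context assembly step might be sketched as follows. The prompt wording, the `build_prompt` and `list_sources` functions, and the chunk field names are all assumptions; the repository's actual prompt template is not shown in this README.

```python
def build_prompt(question, chunks):
    """Assemble an LLM prompt from retrieved chunks.

    Each chunk is a dict with "section", "source_file", and "text" keys
    (hypothetical field names for this sketch).
    """
    context_blocks = []
    for chunk in chunks:
        header = f"[Section {chunk['section']} ({chunk['source_file']})]"
        context_blocks.append(f"{header}\n{chunk['text']}")
    context = "\n\n".join(context_blocks)
    return (
        "Answer the question using only the context below. "
        "Cite the section for each claim. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def list_sources(chunks):
    """Return de-duplicated citations in retrieval order."""
    seen = []
    for chunk in chunks:
        citation = f"Section {chunk['section']} ({chunk['source_file']})"
        if citation not in seen:
            seen.append(citation)
    return seen


# Placeholder text stands in for the real section bodies.
chunks = [
    {"section": "9", "source_file": "page-4.md", "text": "<text of section 9>"},
    {"section": "18(12)", "source_file": "page-11.md", "text": "<text of subsection 18(12)>"},
    {"section": "9", "source_file": "page-4.md", "text": "<text of section 9>"},
]
prompt = build_prompt("What is taxable income?", chunks)
print(list_sources(chunks))
```

Labelling each context block with its section makes it easier for the model to cite the right provision, and asking the model to say when the context is insufficient helps limit unsupported answers.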

AWS Deployment

AWS Architecture

  • Serverless Infrastructure: AWS Lambda functions for API and RAG processing
  • Vector Database: ChromaDB embedded in the Lambda container, with Bedrock Titan embeddings
  • LLM: AWS Bedrock Claude 3 Haiku for answer generation
  • Storage: DynamoDB for query tracking and results
  • Async Processing: Separate API and worker Lambda functions for scalable query processing
  • Authentication: IAM-based API security with signed requests
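
The split between an API function and a worker function can be illustrated with the sketch below. A plain dictionary stands in for the DynamoDB table and ordinary functions stand in for the two Lambda handlers. The function names, status values, and record fields are hypothetical and only show the submit, process, and poll flow.

```python
import uuid

# In-memory stand-in for the DynamoDB table (assumption for this sketch).
QUERY_TABLE = {}


def api_submit(question):
    """API side: record the query and return an id immediately."""
    query_id = str(uuid.uuid4())
    QUERY_TABLE[query_id] = {"question": question, "status": "PENDING", "answer": None}
    return query_id


def worker_process(query_id, answer_fn):
    """Worker side: run the RAG step and store the outcome."""
    record = QUERY_TABLE[query_id]
    record["status"] = "PROCESSING"
    try:
        record["answer"] = answer_fn(record["question"])
        record["status"] = "COMPLETED"
    except Exception as exc:
        record["status"] = "FAILED"
        record["error"] = str(exc)


def api_get_result(query_id):
    """API side: report the current status and answer, if any."""
    record = QUERY_TABLE.get(query_id)
    if record is None:
        return {"status": "NOT_FOUND"}
    return {"status": record["status"], "answer": record["answer"]}


query_id = api_submit("What is taxable income?")
status_before = api_get_result(query_id)["status"]
worker_process(query_id, lambda question: f"(answer to: {question})")
print(status_before, "->", api_get_result(query_id)["status"])
```

Returning an id right away keeps the API call short, while the slower retrieval and generation work happens in the worker.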

Demo Results

Here are example queries and responses from the system:

Example 1: Business Expense Deductions

Question: What types of expenses are generally deductible when earning business income?

Answer:

Based on the provided context from the Income Tax Act, the types of expenses that are generally deductible when earning business income include:

1. Expenses related to a work space that is either the individual's principal place of business (Section 18(12)(a)(i)) or used exclusively for the purpose of earning income from business and used on a regular and continuous basis for meeting clients, customers, or patients in respect of the business (Section 18(12)(a)(ii)). However, the deductible amount cannot exceed the individual's income from the business.

2. For personal services businesses, there is an additional tax added to the corporation's taxable income for each taxation year (Section 123.5).

3. In computing a taxpayer's income for a taxation year from an office or employment, certain deductions are permitted, but not all expenses related to a work space within a self-contained domestic establishment where the individual resides can be deducted unless the work space is either the place where the individual principally performs the duties of the office or employment (Section 8(13)(a)(i)) or used exclusively during the period in respect of which the amount relates for the purpose of earning income from the office or employment and used on a regular and continuous basis for meeting customers or other persons in the ordinary course of performing the duties of the office or employment (Section 8(13)(a)(ii)). Again, the deductible amount cannot exceed the individual's income from the office or employment.

Sources:

  • Section 18(12) (page-11.md)
  • Section 123.5 (page-100.md)
  • Section 8(2) (page-3.md)
  • Section 5 (page-1.md)
  • Section 8(13) (page-3.md)

Example 2: Rental Income Classification

Question: Are rental payments from tenants considered business income or passive income?

Answer:

Based on the provided context from the Income Tax Act, rental payments from tenants are considered income from a property as per Section 9(1) of the Act. However, it's important to note that this income does not include any capital gain from the disposition of the property, as mentioned in Section 9(3). For more specific information regarding deductions or other aspects related to rental income, additional sections and context may be needed.

Sources:

  • Section 6(7) (page-1.md)
  • Section 18(12) (page-11.md)
  • Section 6(19) (page-1.md)
  • Section 1 (page-93.md)
  • Section 9 (page-4.md)

Example 3: Employment Benefits

Question: What qualifies as "employment benefits" that must be included in income?

Answer:

Employment benefits that must be included in income, as per the provided context from the Income Tax Act, include:

1. Amounts deemed to be a benefit received in a taxation year by an individual due to a loan or debt, as stated in Section 6(9).
2. Portions of distributions made in a year that are described in certain subparagraphs or specified by the Minister as part of an agreement to waive or cancel a liability for tax under this Part, as mentioned in Section 207.061.
3. Amounts paid at any time in respect of a housing loss (other than an eligible housing loss) to or on behalf of a taxpayer or a person who does not deal at arm's length with the taxpayer in respect of, in the course of, or because of, an office or employment, as stated in Section 6(19).
4. Amounts paid or the value of assistance provided by any person in respect of, in the course of, or because of, an individual's office or employment in respect of the cost of, the financing of, the use of, or the right to use, a residence, as stated in Section 6(23).

Sources:

  • Section 6(9) (page-1.md)
  • Section 5 (page-1.md)
  • Section 207.061 (page-187.md)
  • Section 6(19) (page-1.md)
  • Section 6(23) (page-1.md)

Example 4: Scholarships and Research Grants

Question: Are scholarships or research grants always taxable?

Answer:

Based on the provided context from the Income Tax Act, scholarships or research grants are not explicitly mentioned as always being taxable. However, if a scholarship or research grant is received by a trust and it carries on one or more businesses or holds one or more properties that are not qualified investments for the trust, then tax may be payable under Part I of the Income Tax Act (Section 146.4(5)(b)).

It's also worth noting that registered charities and certain other organizations (like registered Canadian amateur athletic associations, registered journalism organizations) are required to file information returns with the Minister of National Revenue (Sections 149.1(14), 149.1(14.1)). However, this does not necessarily mean that scholarships or research grants received by these organizations are taxable.

In summary, while the provided context does not definitively answer whether scholarships or research grants are always taxable, it suggests that they may be taxable under certain circumstances if a trust holds them and carries on businesses or properties that are not qualified investments for the trust.

Sources:

  • Section 146.4(5) (page-144.md)
  • Section 212(9) (page-194.md)
  • Section 149.1(14) (page-152.md)
  • Section 146(10) (page-139.md)
  • Section 149.1(20) (page-152.md)

Prerequisites

For Local Development

  • Python 3.8+
  • Ollama installed and running
  • Required Ollama models:
    • nomic-embed-text (for embeddings)
    • mistral:latest or compatible LLM (for answer generation)

Installation

  1. Clone the repository

    git clone <repository-url>
    cd Income-Tax-RAG
  2. Create a virtual environment

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Install and start Ollama

    # Install Ollama from https://ollama.ai/
    # Then pull required models:
    ollama pull nomic-embed-text
    ollama pull mistral:latest
    
    # Start Ollama server (if not already running)
    ollama serve
  5. Set up configuration: create or edit config/config.py if you need to customize any settings.
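
Once Ollama is running, an embedding request for the nomic-embed-text model could be built as in the sketch below. It assumes Ollama's default local address (`http://localhost:11434`) and its `/api/embeddings` endpoint, which takes a `prompt` field. Newer Ollama versions also offer `/api/embed`, so check the API documentation for your installed version. The function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default local address


def build_embedding_request(text, model="nomic-embed-text"):
    """Build (but do not send) a POST request for one embedding."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, model="nomic-embed-text", timeout=30):
    """Send the request; this needs a running Ollama server."""
    request = build_embedding_request(text, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["embedding"]


request = build_embedding_request("What is taxable income?")
print(request.get_method(), request.full_url)
```

Calling `embed("some text")` with the server running should return a list of floats. Building the request separately makes it possible to inspect the payload without a server.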

Usage

Quick Start

  1. Download the Income Tax Act pages

    python main.py --download
  2. Extract and clean text

    python main.py --extract
  3. Build the vector store

    python main.py --ingest
  4. Query the system

    python main.py --query "What is taxable income?"

Interactive Mode

Start an interactive session:

    python main.py --interactive
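
For readers curious how the flags above could be wired together, here is a minimal argparse sketch. The flag names come from the commands in this README. The help strings, the `build_parser` and `selected_steps` helpers, and the step ordering are assumptions and may not match the actual main.py.

```python
import argparse


def build_parser():
    """Define the command-line flags used in the Usage section."""
    parser = argparse.ArgumentParser(description="Income Tax Act RAG System")
    parser.add_argument("--download", action="store_true", help="Download the Income Tax Act pages")
    parser.add_argument("--extract", action="store_true", help="Extract and clean text")
    parser.add_argument("--ingest", action="store_true", help="Build the vector store")
    parser.add_argument("--query", metavar="QUESTION", help="Ask a single question")
    parser.add_argument("--interactive", action="store_true", help="Start an interactive session")
    return parser


def selected_steps(args):
    """List the requested steps in pipeline order."""
    steps = [name for name in ("download", "extract", "ingest") if getattr(args, name)]
    if args.query:
        steps.append("query")
    if args.interactive:
        steps.append("interactive")
    return steps


args = build_parser().parse_args(["--query", "What is taxable income?"])
print(selected_steps(args))  # ['query']
```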

Advanced Usage

Preview Chunking Strategy

Preview how documents are chunked:

    python preview_chunker.py --file data_clean/page-1.md

Chunking Strategy

The system uses section-based chunking that:

  • Preserves complete legal sections (one chunk per section)
  • Maintains hierarchical context (PART, DIVISION, SUBDIVISION)
  • Includes marginal notes as metadata
  • Splits very large sections (>35,000 chars) at subsection boundaries
  • Preserves formulas, definitions, and nested lists

This approach keeps each retrieved chunk semantically complete and tied to its place in the Act's hierarchy.
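
The size rule above can be sketched as follows. The example assumes a section's text is already isolated and that each subsection starts on its own line with a marker such as `(1)` or `(1.1)`. That assumption, the `chunk_section` function, and the chunk layout are hypothetical; only the 35,000-character threshold comes from this README.

```python
import re

MAX_CHARS = 35_000  # threshold stated in the list above

# Zero-width match at the start of any line that begins a subsection.
SUBSECTION_START = re.compile(r"^(?=\(\d+(?:\.\d+)?\)\s)", re.MULTILINE)


def chunk_section(section_id, text, max_chars=MAX_CHARS):
    """Return one chunk for a section, or several if it exceeds max_chars."""
    if len(text) <= max_chars:
        return [{"section": section_id, "text": text}]

    pieces = [piece for piece in SUBSECTION_START.split(text) if piece]
    chunks, current = [], ""
    for piece in pieces:
        # Start a new chunk when adding this subsection would overflow.
        if current and len(current) + len(piece) > max_chars:
            chunks.append({"section": section_id, "text": current})
            current = ""
        current += piece
    if current:
        chunks.append({"section": section_id, "text": current})
    return chunks


# A tiny limit is used here only to show the splitting behaviour.
text = "(1) aaaa\n(2) bbbb\n(3) cccc\n"
chunks = chunk_section("9", text, max_chars=20)
print(len(chunks))  # 2
```

In this sketch a single subsection longer than the limit is kept whole instead of being cut mid-provision, which trades strict size limits for intact legal text.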

⚠️ Disclaimer

This system is provided for informational and educational purposes only. It should not be considered legal or tax advice. Always consult a qualified tax professional about specific tax situations.
