
feat(data-ingestor): Implement historical data ingestion module#4

Merged
impatient0 merged 12 commits into develop from feat/data-ingestor on Oct 27, 2025



@impatient0

Summary

This pull request introduces the a0-data-ingestor, the first core functional module for the A-Zero trading system. This is a robust, standalone command-line application designed to fetch historical K-line (candle) data from the public Binance API and save it to a standardized CSV format. The data produced by this tool is the essential prerequisite for all future backtesting and strategy analysis.

Key Features

  • Command-Line Interface: A standalone Java application built with picocli that accepts --symbol, --timeframe, and --start-date arguments for precise data fetching.
  • Binance API Integration: Uses the official binance-connector-java library to fetch K-line data. Implements full pagination to retrieve all data since the start date and includes a delay between requests to respect API rate limits.
  • Atomic & Standardized Output: Writes data to a temporary file (.tmp) and only renames it to the final <symbol>-<timeframe>.csv upon successful completion. This atomic write pattern prevents corrupt or incomplete files in case of an error.
  • Comprehensive Unit Testing: A full suite of JUnit 5 and Mockito tests has been developed to verify the module's logic without relying on live API calls. Test cases cover successful pagination, API failures, and file system error handling and cleanup.
  • Clean Build & Documentation: Includes a refined Maven Shade Plugin configuration for a clean uber-jar build and a complete README.md file to guide future developers.
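
The pagination-with-delay behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: `KlineSource`, `Candle`, and `fetchAll` are hypothetical names standing in for the real client call, and the cursor rule (resume at last open time + 1) is an assumption about how paging is advanced.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of time-based pagination; none of these names come from the PR. */
class KlinePaginationSketch {

    /** Minimal candle holding only the field the paging loop needs. */
    static final class Candle {
        final long openTime;

        Candle(long openTime) {
            this.openTime = openTime;
        }
    }

    /** Stand-in for the API client: returns up to `limit` candles with openTime >= startTime. */
    interface KlineSource {
        List<Candle> fetch(long startTime, int limit) throws Exception;
    }

    /** Fetches pages until the source runs dry, pausing between requests. */
    static List<Candle> fetchAll(KlineSource source, long startTime, int limit, long delayMillis)
            throws Exception {
        List<Candle> all = new ArrayList<>();
        long cursor = startTime;
        while (true) {
            List<Candle> page = source.fetch(cursor, limit);
            if (page.isEmpty()) {
                break; // nothing at or after the cursor
            }
            all.addAll(page);
            if (page.size() < limit) {
                break; // short page: no newer data available
            }
            // Assumed cursor rule: resume just after the last candle received.
            cursor = page.get(page.size() - 1).openTime + 1;
            Thread.sleep(delayMillis); // fixed pause to stay under rate limits
        }
        return all;
    }

    public static void main(String[] args) throws Exception {
        // Fake source serving candles with open times 0..4, two per page.
        KlineSource fake = (start, limit) -> {
            List<Candle> page = new ArrayList<>();
            for (long t = Math.max(start, 0L); t < 5 && page.size() < limit; t++) {
                page.add(new Candle(t));
            }
            return page;
        };
        System.out.println(fetchAll(fake, 0L, 2, 10L).size());
    }
}
```

Hiding the client behind a small interface like this is also what makes the loop testable with a fake or a Mockito mock instead of live API calls.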

This module establishes the foundational data layer for the entire system. By providing a reliable tool for acquiring historical data, this PR directly enables the development of the next core component, the backtester, and moves the project closer to its goal of systematic, data-driven strategy evaluation.
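
For reviewers unfamiliar with the write-and-rename pattern mentioned under Key Features, here is a minimal sketch using only `java.nio.file`. The class and method names are hypothetical, the temp-file naming (`<final name>.tmp`) is an assumption, and the row contents are placeholders rather than the module's real CSV columns.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Hypothetical sketch of "write to .tmp, then rename"; not the module's actual code. */
class AtomicCsvWriteSketch {

    /**
     * Writes all lines to a sibling temp file and renames it to the target only on success.
     * On any failure the temp file is deleted, so no partial target is left behind.
     */
    static Path writeAtomically(Path target, Iterable<String> lines) throws IOException {
        // Assumed naming: "<final file name>.tmp" in the same directory as the target.
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try {
            try (BufferedWriter writer = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                for (String line : lines) {
                    writer.write(line);
                    writer.newLine();
                }
            }
            // Same-directory move; StandardCopyOption.ATOMIC_MOVE could be used instead
            // where the file system supports it.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(tmp); // clean up the partial temp file
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("a0-sketch");
        Path target = dir.resolve("BTCUSDT-1h.csv");
        // Placeholder rows; the real header and columns are defined by the module.
        writeAtomically(target, List.of("col_a,col_b", "1,2"));
        System.out.println(Files.readAllLines(target));
    }
}
```

Because the final file name only ever appears after a complete write, a consumer such as a backtester either sees a full CSV or no file at all.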

impatient0 merged commit 13db046 into develop on Oct 27, 2025
1 check passed
impatient0 added a commit that referenced this pull request on Oct 27, 2025:
* build: bootstrap the project with initial config

- Set up basic root POM
- Configure .gitignore

* feat(data-ingestor): set up data ingestor module

- Create a0-data-ingestor module
- Set up dependencies & build

* feat(data-ingestor): create initial CLI tool implementation

- Implement the DataIngestor class
- Add Lombok dependency for more concise logging

* test(data-ingestor): add basic unit tests for data ingestor

- Create a constructor that is accessible from the outside for mock injection
- Implement a basic unit test verifying a success scenario

* fix(data-ingestor): replace deprecated methods

- Define CSV header through CSVFormat instead of the deprecated .withHeader() method

* fix(data-ingestor): fix file cleanup on API failures

- Implement "write and rename" pattern to avoid race condition with file creation and API call

* test(data-ingestor): add tests for failure scenarios

- Implement tests covering unsuccessful scenarios: API errors, file write errors, empty API response

* test(data-ingestor): add test verifying temp file cleanup

- Create test verifying that temporary file is cleaned up on API failure
- Add readable names to tests via `@DisplayName`

* build(data-ingestor): configure transformers to handle overlapping resources

- Exclude redundant resources via filters
- Process specialized resources like manifests and licenses with dedicated transformers

* chore: update .gitignore

- Now ignoring dependency-reduced-pom.xml generated by Shade plugin

* docs(data-ingestor): create README.md

- Add a README providing detailed module overview and usage instructions

* docs: update README.md and ARCHITECTURE.md

- Reflect the data-ingestor module status change in ARCHITECTURE.md
- Update project roadmap and "Getting Started" in README.md

---------

Co-authored-by: Pepe Ronin <ivanpetrovskiy98@gmail.com>