Releases: Extralit/extralit
v0.6.1: Integrated PDF processing workflow with extralit-hf-space and incremental Dataset building with imports
This release delivers major upgrades to document processing and import workflows, and exposes additional dataset-building functionality in the UI. Highlights include OCRmyPDF-powered PDF processing via Redis jobs, a workspace selector in the breadcrumb, and incremental import with dataset mapping.
What's Changed
- [FEAT] integrate OCRmyPDF and on document upload in Redis Queue jobs by @priyankeshh and @JonnyTran in #115
- [FIX] Import Files Flow by @JonnyTran in #120
- [FEAT] Workspace Pinia Store and Dataset Breadcrumb Selector in AppHeader by @JonnyTran in #121
- [FIX] Import File Parsing and Matching Flow and Refactoring by @JonnyTran in #122
- [FIX] DocumentAPI to query by params and return multiple documents & fix PDF file fetching by @JonnyTran in #123
- [FEAT] minio presigned url for pdf by @JonnyTran in #124
- [FIX] Import Analysis and Batch Refactoring, File Matching algorithm, Document Panel by @JonnyTran in #130
- [FIX] Consolidating linting configuration by @JonnyTran in #133
- [FEAT] Document workflows with rq jobs by @JonnyTran in #136
- [FEAT] Import dataset mapping by @JonnyTran in #140
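PR #124 above adds MinIO presigned URLs for PDFs. As a rough illustration of the general idea behind a presigned URL (a link that carries its own expiry time and a signature, so a file can be fetched without sending credentials), here is a minimal standard-library sketch. It is not the AWS Signature V4 scheme that MinIO uses for its presigned URLs, and `SECRET_KEY`, `presign`, `verify`, and the example path are all hypothetical names chosen for this illustration.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical signing key; a real deployment would load this from configuration.
SECRET_KEY = b"demo-secret"


def presign(path, expires_in, now=None):
    """Return `path` with an expiry timestamp and an HMAC signature appended."""
    now = int(time.time()) if now is None else now
    expires = now + expires_in
    message = f"{path}\n{expires}".encode()
    signature = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires, 'signature': signature})}"


def verify(url, now=None):
    """Accept the URL only if the signature matches and it has not expired."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    message = f"{parsed.path}\n{expires}".encode()
    expected = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(signature, expected) and now <= expires


# Example: a link to a (hypothetical) PDF object that is valid for 60 seconds.
url = presign("/documents/paper.pdf", expires_in=60)
print(url, verify(url))
```

Because the expiry is part of the signed message, a client cannot extend the lifetime of a link by editing the `expires` parameter.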
Contributors
Many thanks to @priyankeshh for work on the https://github.com/Extralit/extralit-hf-space repo for PyMuPDF integration.
Welcome @Mr-Youssef-Sherif!
Full Changelog: v0.6.0...v0.6.1
v0.6.0: PDF Importer feature with BibTeX support and namespace refactoring
What's Changed
- [FEATURE] Papers Library BibTeX Importer by @JonnyTran in #107
- Update Frontend Dependencies: Migrate deprecated Babel plugins and refresh Vue 2 tooling by @Copilot in #113
- feat(import-history): sidebar integration by @JonnyTran in #116
- [FEAT] Rename question in Dataset Configuration by @JonnyTran in #117
- Complete Python namespace refactor: argilla → extralit with directory restructure by @Copilot in #118
- [RELEASE] v0.6.0 by @JonnyTran in #119
Full Changelog: v0.5.0...v0.6.0
v0.5.0: Latest Argilla Upgrade and Repo Restructuring
This release focuses on synchronizing with the latest changes from the upstream Argilla project, improving CI pipelines, and restructuring the project from Argilla to Extralit. It also introduces support for legacy migrations.
What's New
- Upstream Argilla Sync (v2.6.0-v2.8.0):
  We've merged the latest changes from Argilla, bringing in new features and bug fixes. Key highlights include:
  - Similarity Search with Scores: The API now returns similarity scores when performing similarity searches, providing more context for your results.
  - Predefined IDs for Users & Workspaces: You can now create users and workspaces with predefined IDs, simplifying integration and migration workflows.
- Legacy Migration Support:
  - To support users migrating from older versions, the `extralit_v1` package has been added under `argilla-v1/src/extralit_v1`.
- Project Refinements:
  - We have completed the project-wide refactoring from `argilla` to `extralit`, ensuring consistency across module paths and configurations like the cache directory (~/.extralit/).
  - The `upload_file` function and document listing process have been streamlined for a better user experience.
Upgrade Notes
- To upgrade to the latest version, run:
pip install --upgrade extralit
Contributors
A big thank you to our community for the continuous support, contributions, and feedback that made this release possible!
Full Changelog
v0.4.0: Argilla v2 API and CLI Rebuild
New Features
- Major CLI Overhaul (PR #57):
  The `extralit` CLI has been rebuilt and now includes comprehensive commands for workspace, file, document, and schema management. This makes it easier than ever to interact with Extralit from the command line, automate workflows, and integrate with other tools.
- Workspace API Improvements:
  The Workspace API now supports more robust operations, improved error handling, and better logging for easier debugging and development.
- Enhanced CLI error messages and user feedback.
- Improved file and schema management commands.
- Refactored codebase for better maintainability and developer experience.
- Updated developer documentation and issue templates.
Upgrade Notes
- Python 3.9+ required.
- Upgrade with:
pip install --upgrade extralit
- To get started with the CLI:
extralit --help
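After upgrading, it can be useful to confirm that the interpreter meets the Python 3.9+ requirement and to see which `extralit` version is installed. The following is a minimal sketch using only the standard library; the helper name `check_environment` is hypothetical and not part of Extralit.

```python
import sys
from importlib import metadata

# Minimum interpreter version stated in the upgrade notes.
MIN_PYTHON = (3, 9)


def check_environment(package="extralit"):
    """Return (python_ok, installed_version); the version is None if not installed."""
    python_ok = sys.version_info >= MIN_PYTHON
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        installed = None
    return python_ok, installed


python_ok, installed = check_environment()
print(f"Python {sys.version_info.major}.{sys.version_info.minor} supported: {python_ok}")
print(f"extralit version: {installed or 'not installed'}")
```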
Contributors
Special thanks to the contributors of PR #57 - you've reached a major milestone!
And thanks to everyone else (@ArthrowAbstract, @SanjayUG, @Nakshatra05) who contributed to this release through code, reviews, and feedback!
Full Changelog
v0.3.0: TableField and TableQuestion types
This release focuses on introducing table support for fields and questions in feedback datasets, along with infrastructure improvements.
New Features
- Schema & Fields:
  - Added support for `TableField` and `es_field_for_record_field` for table fields
  - Added `TableQuestion` and `TableQuestionSetting` to support table questions
Infrastructure Improvements
- DevOps:
  - Added redis service to the Tilt k8s deployment for argilla-server
  - Improved argilla-server and extralit-server dockerfile multi-stage build
  - Changed envvars in Tilt k8s deployment at `argilla-server-deployment.yaml`
Bug Fixes
- Fixed elasticsearch reindexing errors with dynamic schema
- Fixed certain extralit-specific changes when loading Dataset
Full Changelog
v0.2.1: devcontainers and unit test for files and documents
This release focuses on enhancing the continuous integration, testing, and DevOps setup, ensuring a more robust and efficient development workflow.
New Features
- Development Environment:
  - Added singleton schema support in SchemaStructure.
  - Added docs site for the Extralit project at `argilla/docs/`.
  - Added pytest-xdist for parallel testing.
  - Pytest and Python environment setup in the "PostgreSQL & Elasticsearch for Docker-Compose" GitHub Codespaces devcontainer.
  - Added .devcontainer for "Docker, Tilt, and K8s" local development on GitHub Codespaces.
- Testing:
  - Added tests for:
    - Response: update duration.
    - Files: get, put, list, delete.
    - Models: get, post, put, delete.
    - Records: include response_suggestions.
Changes
- Dependencies:
  - Updated Elasticsearch to 8.15.0.
- Database:
  - Reverted Suggestion table's unique constraint to only "record_id" and "question_id", fixing the test suites.
- API:
  - Disabled adding `LIST_DATASET_RECORDS_DEFAULT_SORT_BY` when there's no sort-by on GET records.
  - Changed the `/api/v1/documents` POST endpoint to use `UploadFile`.
- DevOps:
  - Changed K8s Elasticsearch deployment from Helm to `docker.elastic.co/elasticsearch/elasticsearch` to fix PVC restarting issues.
  - Refactored Extralit Dockerfile and Docker Hub images to `extralit/argilla-server` and `extralit/argilla-quickstart`.
  - Changed `develop` branch changes in argilla/docs to `https://docs.extralit.ai/latest` instead of `dev`.
  - Changed `examples/deployments/k8s/extralit-configs.yaml` for configuring the Extralit service and secrets in a K8s cluster.
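To illustrate what the Database change above means in practice, here is a minimal sketch of a unique constraint over only `record_id` and `question_id`. It uses an in-memory SQLite database as a stand-in; the table layout, the `value` column, and the helper names `make_db` and `add_suggestion` are assumptions made for this example, not the project's actual schema or code.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical suggestions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE suggestions (
            id INTEGER PRIMARY KEY,
            record_id INTEGER NOT NULL,
            question_id INTEGER NOT NULL,
            value TEXT,
            UNIQUE (record_id, question_id)
        )
        """
    )
    return conn


def add_suggestion(conn, record_id, question_id, value):
    """Insert a suggestion; return False if the (record, question) pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO suggestions (record_id, question_id, value) VALUES (?, ?, ?)",
                (record_id, question_id, value),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
print(add_suggestion(conn, 1, 1, "first"))      # new pair
print(add_suggestion(conn, 1, 1, "duplicate"))  # same record and question
print(add_suggestion(conn, 1, 2, "other"))      # same record, different question
```

With this constraint, each record can hold at most one suggestion per question, regardless of any other column values.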
Bug Fixes
- Fixed Tiltfile and k8s manifests for mono-repo setup.
- Fixed creating a new Weaviate collection with Weaviate client v4.
- Fixed an error with checking Weaviate collection existence when one doesn't exist.
- Fixed an issue with reindexing Elasticsearch by handling exceptions on failed datasets.
- Added a Workspace relationship to Document to enable cascade delete.
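The cascade delete mentioned in the last fix above means that removing a workspace also removes the documents that belong to it. The sketch below shows that behaviour with a database-level `ON DELETE CASCADE` foreign key in an in-memory SQLite database. This is a stand-in chosen for illustration: the release note does not say whether the project implements the cascade at the ORM or database level, and the table names, columns, and `make_db` helper are hypothetical.

```python
import sqlite3


def make_db():
    """Create hypothetical workspaces and documents tables linked by a cascading FK."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled on the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE workspaces (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        """
        CREATE TABLE documents (
            id INTEGER PRIMARY KEY,
            file_name TEXT NOT NULL,
            workspace_id INTEGER NOT NULL
                REFERENCES workspaces (id) ON DELETE CASCADE
        )
        """
    )
    return conn


def count_documents(conn):
    return conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]


conn = make_db()
with conn:
    conn.execute("INSERT INTO workspaces (id, name) VALUES (1, 'demo')")
    conn.executemany(
        "INSERT INTO documents (file_name, workspace_id) VALUES (?, ?)",
        [("a.pdf", 1), ("b.pdf", 1)],
    )
print(count_documents(conn))  # documents before deleting the workspace

with conn:
    conn.execute("DELETE FROM workspaces WHERE id = 1")
print(count_documents(conn))  # documents after the cascade
```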
Security
- Allow admin role for workspace creation.
Full Changelog: v0.2.0...v0.2.1
v0.2.0: Extralit CLI workspace management and GitHub Actions CI workflows
This release, which follows Argilla v1.29.1, brings significant improvements to the Extralit CLI and workspace management, along with various bug fixes and enhancements for a smoother user experience.
New Features
- Workspace Management:
  - Added workspace schema and file management to the Extralit CLI.
  - Refined workspace schema and file management in the Extralit CLI.
  - Updated `rg.Workspace` with `update_schemas` and `get_schemas` methods.
  - Enabled `_ID` reference IDs in schemas.
  - Added `inserted_at` and `updated_at` fields to `Suggestion`.
- CLI Enhancements:
  - Introduced the Extralit CLI for improved command-line interactions.
- User Interface:
  - Updated status filter options in `StatusFilter.vue` and `RecordRepository.ts`.
  - Added tooltip in `LabelSelection`.
- Translation and Localization:
  - Updated translation for "Use Table" option.
  - Added use_table option to `QuestionSetting`.
Bug Fixes
- Fixed import statements in `SchemaStructure` and `Workspace`.
- Ensured `.mjs` files are properly transpiled with `babel-loader`.
- Fixed validation errors in `FeedbackRecord` suggestions to server payload.
- Fixed `RecordRepository.ts` to remove fetching "All data".
Continuous Integration and Deployment
- Updated GitHub Actions and updated Docker Hub image name deployments.
- Added GitHub Codespaces in `.devcontainer`.
- Updated package names and build configurations for Extralit.
- Set up mono repo to merge `extralit-server`.
Documentation
- Updated README.md with new information.
Miscellaneous
- Updated pip dependencies for Python tests.
- Updated community links.
Full Changelog: v1.27.0a...v0.2.0