Conversation
| - ...
| To easily enable search and files integration, we believe these two objects should extend from records, like many other modules do (collections, requests, etc.).
|
| The other two objects involved will be the *Resources* and the *Serializers*.
We might want to change this name to avoid confusion with resources and services. Any suggestions?
What do you think about *record type* instead of *resource*? I think we used this name elsewhere too.
| - Determine if a group of records get new DOIs (minted during the process)
| - Update many records at once.
| - Delete many records at once.
| - See the status of past and current uploads.
How long would the retention period be for keeping previous imports?
We are still discussing that internally. Our initial approach would be to keep the import tasks indefinitely and "just" delete the attached files of successfully created/updated records after 3 months (probably configurable).
We also considered an "archive" option to help clean up the interface.
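As a rough illustration of what that could look like, here is a hedged configuration sketch; the setting names below are assumptions for illustration, not existing InvenioRDM configuration:

```python
# Hypothetical settings for the retention behaviour described above;
# the variable names are illustrative, not real InvenioRDM config.
from datetime import timedelta

# Import tasks themselves are kept indefinitely.
BULK_IMPORTER_KEEP_TASKS = True

# Attached files of successfully created/updated records are pruned
# after roughly three months; instances could override this value.
BULK_IMPORTER_FILE_RETENTION = timedelta(days=90)
```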
| GUI-based bulk importing and editing of records and files is a widely desired, highly useful feature, which will help to make the platform appealing to a much broader base of institutional users.
|
| The proposed feature is a beta version of a bulk importer for metadata (in CSV format) and associated files.
Are any other formats also planned?
Initially, we are working only with CSV, but we are designing the tool so anyone can add their preferred format.
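To make the intent concrete, here is a minimal sketch of what a pluggable reader layer could look like; the class names and registry are assumptions for illustration, not the importer's actual API.

```python
import csv
import json


class CSVReader:
    """Default reader: yields one metadata dict per CSV row."""

    def iter_records(self, stream):
        yield from csv.DictReader(stream)


class JSONLinesReader:
    """Example of a contributed format: one JSON object per line."""

    def iter_records(self, stream):
        for line in stream:
            if line.strip():
                yield json.loads(line)


# A registry keyed by format name would let an instance plug in its own reader.
READERS = {"csv": CSVReader(), "jsonl": JSONLinesReader()}
```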
| ## Unresolved questions
|
| - Metadata file re-upload to correct errors
| - How do we set a file for preview?
Is there any limit on the import size or the maximum number of files? It might be quite troublesome to handle big imports.
Since I wrote this, we have shown it to potential users, and this has come up on several occasions.
I think we will have the same "limitations" as the current deposit form. To mitigate this, we are considering allowing users to enter known URIs in the files column: imagine a shared location the service has access to, such as a bucket on AWS/GCP, or a URL that can be fetched directly.
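For illustration, a files column holding remote URIs could look something like this (the column names and URIs are hypothetical, not a final schema):

```csv
title,creators,files
"Sea surface temperatures 2023","Doe, Jane","s3://shared-bucket/sst-2023.nc"
"Survey responses","Roe, Richard","https://example.org/data/survey.zip"
```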
The problem might be both with the CSV upload and the associated file uploads. The CSV/MARCXML itself will be problematic if it contains a lot of records, for two reasons: the size of the CSV itself, and the time the underlying task will take to process it, depending on how much memory the Celery worker has available.
You're definitely right. We did consider the problem with the Celery workers' memory. Unfortunately, I don't have a solution other than making the process as memory-efficient as possible.
For example, we plan to start a task for each record (row) inside the input file and let that task do the transformation and validation, so we process one record at a time.
I think adding an "artificial" limit on the number of records at this point might not make a lot of sense. Once we have the process in place and know where it can "break", we can load test it and set an informative limit.
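A minimal sketch of that per-row fan-out, assuming hypothetical task and helper names rather than the importer's real API:

```python
import csv

from celery import shared_task


@shared_task
def start_bulk_import(csv_path):
    """Stream the CSV and enqueue one task per row, keeping memory usage flat."""
    with open(csv_path, newline="") as fp:
        for line_no, row in enumerate(csv.DictReader(fp), start=1):
            import_single_record.delay(row, line_no)


@shared_task
def import_single_record(row, line_no):
    """Transform and validate a single record; errors can be reported per row."""
    # The transformation and validation would live in the importer's
    # serializer layer; they are omitted here.
    ...
```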
| As an administrator user I want to ...
| - Upload many records (with their files) at once into my instance.
| - Select in which communities the records are published.
| - Determine if a group of records get new DOIs (minted during the process)
Is this use case already reflected in the mockup?
Yes, on the third screen, there is a checkbox section. For now, it has two options: mint DOIs and publish.
Currently, InvenioRDM's only affordance for the bulk creation, import, and/or editing of records and files requires direct CLI-command-driven engagement with records and files APIs.
GUI-based bulk importing and editing of records and files is a widely desired, highly useful feature, which will help to make the platform appealing to a much broader base of institutional users.
The proposed feature is a beta version of a bulk importer for metadata (in CSV format) and associated files.