Conversation
| - ...
| To easily enable search and files integration, we believe these two objects should extend from records, like many other modules do (collections, requests, etc.).
|
| The other two objects involved will be the *Resources* and the *Serializers*.
We might want to change this name to avoid confusion with resources and services. Any suggestions?
What do you think about *record type* instead of *resource*? I think we used this name elsewhere too.
| - Determine if a group of records get new DOIs (minted during the process)
| - Update many records at once.
| - Delete many records at once.
| - See the status of past and current uploads.
How long would the retention period be for keeping previous imports?
We are still discussing that internally. Our initial approach would be to keep the import tasks indefinitely and "just" delete the attached files of successfully created/updated records after 3 months (probably configurable).
We also considered an "archive" option to help clean up the interface.
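As a rough illustration of what that could look like, here is a hedged configuration sketch; the setting names below are assumptions for illustration, not existing InvenioRDM configuration:

```python
# Hypothetical settings for the retention behaviour described above;
# the variable names are illustrative, not real InvenioRDM config.
from datetime import timedelta

# Import tasks themselves are kept indefinitely.
BULK_IMPORTER_KEEP_TASKS = True

# Attached files of successfully created/updated records are pruned
# after roughly three months; instances could override this value.
BULK_IMPORTER_FILE_RETENTION = timedelta(days=90)
```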
| GUI-based bulk importing and editing of records and files is a widely desired, highly useful feature, which will help to make the platform appealing to a much broader base of institutional users.
|
| The proposed feature is a beta version of a bulk importer for metadata (in CSV format) and associated files.
Are any other formats also planned?
Initially, we are working only with CSV, but we are designing the tool so anyone can add their preferred format.
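To make the intent concrete, here is a minimal sketch of what a pluggable reader layer could look like; the class names and registry are assumptions for illustration, not the importer's actual API.

```python
import csv
import json


class CSVReader:
    """Default reader: yields one metadata dict per CSV row."""

    def iter_records(self, stream):
        yield from csv.DictReader(stream)


class JSONLinesReader:
    """Example of a contributed format: one JSON object per line."""

    def iter_records(self, stream):
        for line in stream:
            if line.strip():
                yield json.loads(line)


# A registry keyed by format name would let an instance plug in its own reader.
READERS = {"csv": CSVReader(), "jsonl": JSONLinesReader()}
```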
| ## Unresolved questions
|
| - Metadata file re-upload to correct errors
| - How do we set a file for preview?
Is there any limit on the import size or the maximum number of files? It might be quite troublesome to handle big imports.
Since I wrote this, we have shown it to potential users, and this has come up on several occasions.
I think we will have the same "limitations" as the current deposit form. To mitigate this, we are considering allowing users to enter known URIs in the files column: imagine a shared location the service has access to, such as a bucket on AWS/GCP, or a URL that can be fetched directly.
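For illustration, a files column holding remote URIs could look something like this (the column names and URIs are hypothetical, not a final schema):

```csv
title,creators,files
"Sea surface temperatures 2023","Doe, Jane","s3://shared-bucket/sst-2023.nc"
"Survey responses","Roe, Richard","https://example.org/data/survey.zip"
```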
The problem might be both with the CSV upload and the associated file uploads. The CSV/MARCXML itself will be problematic if it contains a lot of records, for two reasons: the size of the CSV itself, and the time the underlying task will take to process it, depending on how much memory the Celery worker has available.
You're definitely right. We did consider the problem with the Celery workers' memory. Unfortunately, I don't have a solution other than making the process as memory-efficient as possible.
For example, we plan to start a task for each record (row) inside the input file and let that task do the transformation and validation, so we process one record at a time.
I think adding an "artificial" limit on the number of records at this point might not make a lot of sense. Once we have the process in place and know where it can "break", we can load test it and set an informative limit.
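A minimal sketch of that per-row fan-out, assuming hypothetical task and helper names rather than the importer's real API:

```python
import csv

from celery import shared_task


@shared_task
def start_bulk_import(csv_path):
    """Stream the CSV and enqueue one task per row, keeping memory usage flat."""
    with open(csv_path, newline="") as fp:
        for line_no, row in enumerate(csv.DictReader(fp), start=1):
            import_single_record.delay(row, line_no)


@shared_task
def import_single_record(row, line_no):
    """Transform and validate a single record; errors can be reported per row."""
    # The transformation and validation would live in the importer's
    # serializer layer; they are omitted here.
    ...
```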
| As an administrator user I want to ...
| - Upload many records (with their files) at once into my instance.
| - Select in which communities the records are published.
| - Determine if a group of records get new DOIs (minted during the process)
Is this use case already reflected in the mockup?
Yes, on the third screen, there is a checkbox section. For now, it has two options: mint DOIs and publish.
Currently, InvenioRDM's only affordance for the bulk creation, import, and/or editing of records and files requires direct CLI-command-driven engagement with records and files APIs.
GUI-based bulk importing and editing of records and files is a widely desired, highly useful feature, which will help to make the platform appealing to a much broader base of institutional users.
The proposed feature is a beta version of a bulk importer for metadata (in CSV format) and associated files.