Conversation

@arrowd (Contributor) commented Jul 29, 2025

As it turned out in #111, running appstream-generator on an already built repository is too time-consuming: it requires unpacking each package to scan its contents, which can be very slow for some packages.

This change allows for a different workflow. During the package build, appstream-generator process-file is called for each package, right before the archive creation step. This command is actually fed not a file, but a directory representing the to-be-created package contents. This allows asgen to process the data blazingly fast.

At the end of the package build, I run appstream-generator publish to actually create the AppStream metadata files.
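In shell terms, the flow looks roughly like the sketch below. The staging path and the loop are placeholders for illustration only; just the two asgen invocations come from the actual workflow:

```sh
# Hedged sketch of the workflow described above. The staging path and the
# loop are placeholder assumptions, not the real build-system integration.
for pkgdir in /wrkdirs/*/stage; do
    # Each $pkgdir holds the to-be-created contents of one package, so
    # asgen can scan the plain filesystem instead of unpacking an archive.
    appstream-generator process-file "$pkgdir"
done

# After all packages are built, write out the AppStream metadata files.
appstream-generator publish
```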

@ximion (Owner) commented Jul 29, 2025

This will cause issues, as asgen will not be able to clean up data properly - it will likely just delete absolutely everything that was handled by process-file, as that is more of a debug tool for quick testing than something to be used in production. Using it for all packages will likely also break icon search.

Why is processing the entire archive slow? The contents of packages have to be scanned regardless, and once the archive has been scanned initially, subsequent scans will only happen for new packages, as both the contents of existing packages and their metadata are cached.
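To illustrate the caching behavior ("stable" here is a placeholder suite name from a local asgen configuration, not something defined in this PR):

```sh
# Illustrative only: "stable" stands in for whatever suite the
# asgen configuration defines.
appstream-generator process stable   # initial run: unpacks and scans every package
appstream-generator process stable   # subsequent runs: only new packages are
                                     # scanned; contents and metadata of existing
                                     # packages come from the cache
```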

@arrowd (Contributor, Author) commented Jul 29, 2025

> This will cause issues, as asgen will not be able to clean up data properly - it will likely just delete absolutely everything that was handled by process-file

Yes, I haven't looked into cleaning yet. I'll study the code that decides what should be cleaned and what should stay.

> Using it for all packages will likely also break icon search.

Can you please elaborate on that?

> Why is processing the entire archive slow?

That is discussed in #111.

> The contents of packages have to be scanned regardless, and once the archive has been scanned initially, subsequent scans will only happen for new packages, as both the contents of existing packages and their metadata are cached.

This is still unsatisfactory at a large scale. The FreeBSD Ports collection contains almost 35,000 ports at the moment, which maps roughly to 35k packages. The initial scan takes a very long time for me, and the duration of subsequent runs would be totally unpredictable: a small version bump of some dependency might trigger a rebuild of a heavyweight package (like Stellarium from the referenced issue). Finally, it just feels wasteful to operate on archives when we have the possibility to operate on the plain filesystem.
