This repository was archived by the owner on Jan 27, 2019. It is now read-only.
POC: running python functions asynchronously #191
Villemoes wants to merge 6 commits into oe-lite:master from
Conversation
For now this is just a copy of the generic implementations in OEliteFunction. When we implement async python functions in terms of fork(), we could just do the umask and chdir in the child, but since we only do some python functions asynchronously, we'd have to duplicate the try..finally stuff in the synchronous case, and that's not worth it for saving four system calls in the parent.
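The fork()-based approach described above can be sketched roughly as follows. This is a minimal, hypothetical illustration: `run_async` and its parameters are invented for the sketch and are not the actual OEliteFunction API. The point is that the umask and chdir happen only in the child, so the parent never needs the try..finally save/restore dance.

```python
import os

def run_async(func, cwd=None, umask=None):
    """Run func in a forked child, applying umask/chdir only there.

    The child gets its own copy of the process state, so these
    environment changes never leak back into the parent process.
    """
    pid = os.fork()
    if pid == 0:
        # Child: no need to restore anything afterwards.
        try:
            if umask is not None:
                os.umask(umask)
            if cwd is not None:
                os.chdir(cwd)
            ok = func()
        except Exception:
            ok = False
        os._exit(0 if ok else 1)
    # Parent: collect the child later with os.waitpid(pid, 0).
    return pid
```

The synchronous path, by contrast, must wrap the function call in try..finally to restore the parent's cwd and umask, which is why doing both would mean duplicating that logic.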
Well, sort of. For now we just implement retrieving the flag, e.g. do_fetch[__async], but it has to be False.
I do get_flag with expand=CLEAN_EXPANSION to allow me to do something like

do_fetch[__async] = "${__ASYNC_FETCH}"

with the latter set in local.conf, or not set at all. The bool(int(... or 0)) dance is so that all of unset, "", "0", "1", False, True etc. do the expected thing. The double underscores ensure that these variables and flags do not affect metadata hashes.
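The bool(int(... or 0)) dance mentioned above can be shown in isolation (the helper name here is made up for illustration; the real code applies the expression to the result of get_flag):

```python
def flag_is_true(value):
    """Normalize a flag value: unset (None), "", "0", 0 and False
    all come out False; "1", 1 and True come out True."""
    # `value or 0` turns None/""/0/False into 0; int() parses "0"/"1".
    return bool(int(value or 0))
```

So an unset variable, an empty expansion, and an explicit "0" are all treated as "not async", while "1" or True opt in.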
The list of PythonFunction tasks includes at least: do_fetch, do_unpack, do_stage, do_split, do_package. These are all rather I/O bound, and do_fetch in particular is prone to stall the entire build if it is trying to fetch from an unresponsive or just slow server. Worst case, with our default timeout settings, we can end up waiting 10 minutes, during which we may completely fail to start other tasks.

This implements support for making a particular python task function run asynchronously; simply add e.g.

do_fetch[__async] = True

above the do_fetch definition in fetch.oeclass.

In order for a task function implemented as a PythonFunction to safely run asynchronously, it must not rely on mutating state in the OE-lite process. Checking that is a rather tedious and error-prone job, so this is mostly an experimental feature for now.
For now, this allows controlling and experimenting with which of the tasks fetch, unpack, stage, split and package are run asynchronously, by setting variables __ASYNC_FOO in local.conf.
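As a hypothetical local.conf fragment (variable names follow the __ASYNC_FOO pattern above; values are parsed with the bool(int(... or 0)) expression, so "1" enables and "0" or unset disables):

```
# local.conf -- opt in to asynchronous fetch only
__ASYNC_FETCH = "1"
__ASYNC_UNPACK = "0"
```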
This is mostly a proof-of-concept people can play with. The reason I'm not proposing this for immediate inclusion is that I'm not entirely convinced that we don't have some python function relying on being able to modify the task's metadata, with those changes then being visible to some postfunc or some entirely different task. So I wrote this in a way that doing each of the five tasks asynchronously is entirely opt-in and controlled by setting e.g. __ASYNC_SPLIT = True in local.conf.
I've tried an "oe bake world -t fetch" with empty ingredients. Without this, it takes 65 min, while with __ASYNC_FETCH = True it finishes in 15 min (it takes that long partly because some ingredients currently fail to fetch, but the wget PR should hopefully fix that). But of course usually one has most ingredients, and even if not, the fetch time will be partly hidden behind the tasks that already run asynchronously.