-
Notifications
You must be signed in to change notification settings - Fork 57
Description
Lineapy is really impressive which covers the flow end2end. I notice there is to_pipeline feature which is very convenient to productionize the DF.
Ray is a distributed computation framework that can be used as a backend of python to do the scale-up (https://docs.ray.io/en/master/)
Recently, Ray is introducing Ray DAG, which is a ray style to define the computation graph: the data communicated between functions are the object refs instead of the actual python data. With this, we can either execute it in normal ray mode or in a workflow mode. The former one is just executing the pipeline in a distributed way and the latter one will do checkpointing for each step which offers durability and fault tolerance.
I'm thinking about the possibility to add it as a backend of lineapy. One benefit of this integration is that we'll be able to scale up the workloads easily by utilizing Ray's functionality.
I haven't dug deep into the implementation, but it seems to_pipeline is implemented as a plugin and the actual work is to convert a Graph defined internally to the code of the target backend. I feel the implementation should be straightforward.