Skip to content

[integration] Support ray DAG as a backend for lineapy pipeline #665

@fishbone

Description

@fishbone

Lineapy is really impressive which covers the flow end2end. I notice there is to_pipeline feature which is very convenient to productionize the DF.

Ray is a distributed computation framework that can be used as a backend of python to do the scale-up (https://docs.ray.io/en/master/)

Recently, Ray is introducing Ray DAG, which is a ray style to define the computation graph: the data communicated between functions are the object refs instead of the actual python data. With this, we can either execute it in normal ray mode or in a workflow mode. The former one is just executing the pipeline in a distributed way and the latter one will do checkpointing for each step which offers durability and fault tolerance.

I'm thinking about the possibility to add it as a backend of lineapy. One benefit of this integration is that we'll be able to scale up the workloads easily by utilizing Ray's functionality.

I haven't dug deep into the implementation, but it seems to_pipeline is implemented as a plugin and the actual work is to convert a Graph defined internally to the code of the target backend. I feel the implementation should be straightforward.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions