-
Notifications
You must be signed in to change notification settings - Fork 274
Open
Labels
Description
Self Checks
To make sure we get to you in time, please check the following :)
- I have searched for existing issues search for existing issues, including closed ones.
- I confirm that I am using English to submit this report (我已阅读并同意 Language Policy).
- [FOR CHINESE USERS] 请务必使用英文提交 Issue,否则会被关闭。谢谢!:)
- "Please do not modify this template :) and fill in all the required fields."
Versions
- dify-plugin-daemon 0.5.2
- dify-api 1.11.2
Describe the bug
Environment:
- Deployment: Kubernetes (K8s) cluster
- Network: Plugin service port forwarded via SLB (Server Load Balancer)
- Replicas: 1 pod (works) / multiple pods (issue occurs)
- Dify-plugin-daemon Version: Version after v0.5.2 (problematic); v0.4.2 (works fine)
- Code Change: No code modifications made between versions
Description
I have identified a potential bug related to plugin debugging in a Kubernetes cluster deployment with SLB port forwarding.
The issue manifests as follows:
- Single-pod scenario: The plugin service runs with only one pod, and there is no "plugin not found" error; all requests work as expected.
- Cluster scenario (multiple pods): When the plugin service is scaled to multiple pods in K8s (with SLB forwarding the plugin installation port), most requests return a "plugin not found" error. However, a small number of requests will occasionally succeed.
Suspected Root Cause
- My hypothesis is that during the plugin registration process, the plugin establishes a TCP connection with a specific pod in the K8s cluster. The connection information (or related registration metadata) is not synchronized across other pods in the cluster. Since SLB distributes requests across all pods, only the pod that holds the TCP connection can handle the plugin request successfully, while other pods fail to recognize the plugin and throw errors.
Key Observation
- This issue does NOT exist in version v0.4.2. When I roll back the plugin service to v0.4.2 (with exactly the same K8s/SLB configuration and no code changes), the "plugin not found" error disappears completely in both single-pod and multi-pod cluster modes.
- Expected Behavior
- Plugin registration and connection information should be synchronized across all pods in the K8s cluster, ensuring consistent plugin availability for all requests regardless of which pod the SLB routes the traffic to.
Steps to Reproduce
- Deploy dify-plugin-daemon (version > v0.4.2) to a K8s cluster with multiple replicas (≥2 pods).
- Configure SLB to forward the plugin installation port to the K8s service of the plugin daemon.
- Send consecutive requests to invoke/install the plugin via the SLB endpoint.
- Observe that most requests fail with "plugin not found", while a few requests succeed randomly (likely routed to the pod with the established TCP connection).
- Roll back the deployment to v0.4.2 (same K8s/SLB configuration, no code changes).
- Verify that all requests succeed consistently in both single-pod and multi-pod modes.
Additional Context
- The plugin service is exposed via a K8s Service (ClusterIP/NodePort) and fronted by SLB for external access.
- No network policies are blocking pod-to-pod communication within the cluster.
- SLB uses standard round-robin load balancing (no sticky session/configured affinity).