
Feature/462 computer use browser use rpa nodes workflow recording generated flow #519

Open
felix-schultz wants to merge 5 commits into dev from
feature/462-computer-use-browser-use-rpa-nodes-workflow-recording-generated-flow

Conversation

@felix-schultz (Member)

This pull request introduces foundational support for RPA (Robotic Process Automation) and computer automation workflows in the desktop app. It adds new UI components for requesting and checking system permissions (Accessibility and Screen Recording), ensures these permissions are checked before running relevant workflows, and updates project dependencies and platform configuration to support automation features across macOS, Windows, and Linux.

Key changes include:

RPA/Automation Permission Handling:

  • Added RpaPermissionDialog and useRpaPermissions React components for checking and requesting Accessibility and Screen Recording permissions from users, and integrated these into the workflow UI (apps/desktop/components/rpa/rpa-permission-dialog.tsx, apps/desktop/components/rpa/index.ts, apps/desktop/app/flow/page.tsx).
  • Updated workflow execution logic to check for required permissions before running any board that needs local execution (i.e., computer automation), throwing a descriptive error if permissions are missing (apps/desktop/components/tauri-provider/board-state.ts).
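The actual pre-run check lives in TypeScript (board-state.ts); as a rough illustration of the gate it describes, here is a minimal Rust sketch. The function name, the permission names, and the error wording are assumptions for illustration, not the PR's API.

```rust
/// Illustrative sketch: given a list of (permission, granted) pairs, fail
/// with a descriptive error before any local/RPA execution starts.
fn check_rpa_permissions(perms: &[(&str, bool)]) -> Result<(), String> {
    // Keep only the permissions that are still denied.
    let missing: Vec<&str> = perms
        .iter()
        .filter(|p| !p.1)
        .map(|p| p.0)
        .collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "Missing required RPA permissions: {}. Grant them in system settings and retry.",
            missing.join(", ")
        ))
    }
}
```

The real implementation additionally attaches an `isRpaPermissionError` flag and the permission map to the thrown error so the UI can open the permission dialog directly.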

Platform and Dependency Updates for Automation:

  • Added new dependencies for automation (e.g., rdev, enigo, arboard, xcap, core-graphics, windows, atspi, zbus) to the Tauri backend, scoped by platform as needed (apps/desktop/src-tauri/Cargo.toml).
  • Declared new macOS permissions in Info.plist for Accessibility and Screen Recording, with clear user-facing descriptions (apps/desktop/src-tauri/Info.plist).

Project Structure and Capability Updates:

  • Added the new catalog/automation package to the workspace and dependencies (Cargo.toml).
  • Updated desktop app capabilities to allow additional window management actions (minimize, unminimize, set focus) and regenerated the corresponding schema (apps/desktop/src-tauri/capabilities/desktop.json, apps/desktop/src-tauri/gen/schemas/capabilities.json).
  • Added new Rust modules for permissions and recording functionality in the Tauri backend (apps/desktop/src-tauri/src/functions.rs).

These changes lay the groundwork for robust, cross-platform automation within the app, with secure and user-transparent permission handling.

- Re-export automation modules in the catalog package for easier access.
- Update CatalogPackage enum to include Automation.
- Add support for extra dock items and overlay rendering in FlowBoard and FlowWrapper components.
- Implement context menu support for dock items.
- Improve node handling in parseBoard function to prevent duplicate entries.
- Introduce Gource visualization scripts for development insights, including fetching GitHub avatars and generating videos in multiple resolutions.
- Add captions for release events in Gource visualizations.
…lidating conditionals and enhancing default implementations
- Updated screenshot storage path to use the new structure under `apps/{app_id}/upload/rpa/{board_id}/screenshots/{artifact_id}.png`.
- Adjusted coordinate scaling for HiDPI displays to ensure accurate cropping.
- Replaced `println!` with `tracing::debug!` for better logging consistency.
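The HiDPI adjustment amounts to scaling logical (point) coordinates by the display's scale factor before cropping the captured image. A minimal sketch, where the function name and the rounding choice are assumptions rather than the PR's code:

```rust
/// Convert a logical crop rectangle to physical pixels for a HiDPI display.
/// `scale_factor` is 2.0 on a typical Retina display.
fn to_physical(x: f64, y: f64, w: f64, h: f64, scale_factor: f64) -> (u32, u32, u32, u32) {
    (
        (x * scale_factor).round() as u32,
        (y * scale_factor).round() as u32,
        (w * scale_factor).round() as u32,
        (h * scale_factor).round() as u32,
    )
}
```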

Enhance recording state logging

- Changed logging of keystroke buffer flushing from `println!` to `tracing::debug!`.

Improve tray icon handling during recording

- Added a boolean field to `TrayRuntimeState` to track recording state.
- Implemented functions to set and restore the tray icon based on recording status.
- Introduced a new stop icon for the tray when recording is active.
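The tray behavior above can be sketched as plain state bookkeeping; the real implementation swaps icon bytes through Tauri's tray API, which is elided here. Struct and method names mirror the description but are otherwise illustrative.

```rust
/// Minimal sketch of the tray recording-state bookkeeping.
struct TrayRuntimeState {
    recording: bool,
}

impl TrayRuntimeState {
    fn new() -> Self {
        Self { recording: false }
    }

    /// Which icon the tray should currently show.
    fn icon(&self) -> &'static str {
        if self.recording { "stop-icon" } else { "default-icon" }
    }

    fn set_recording(&mut self, on: bool) {
        self.recording = on;
        // In the real app: swap in the generated stop icon while recording,
        // and restore the default tray icon afterwards.
    }
}
```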

Fix timeout handling in automation nodes

- Ensured that timeout values are non-negative by using `max(0)` in various automation nodes to prevent potential issues with negative durations.
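The `max(0)` guard matters because a negative signed value cast straight to an unsigned millisecond count would wrap to a huge duration. A sketch of the pattern (helper name is illustrative):

```rust
/// Clamp a possibly-negative timeout to zero before building a Duration.
/// Without the clamp, `-250 as u64` would wrap to a near-u64::MAX value.
fn safe_timeout_ms(ms: i64) -> std::time::Duration {
    std::time::Duration::from_millis(ms.max(0) as u64)
}
```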

Refactor mouse movement logic

- Simplified mouse movement code by removing unnecessary complexity and ensuring natural movement is handled correctly.
- Added template matching capabilities for mouse click nodes to improve accuracy.
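"Natural" movement here means interpolating between start and end with an easing curve instead of jumping instantly. One common choice is smoothstep easing; the sketch below is an assumption about the approach, not the node's actual code, which drives enigo with the intermediate points.

```rust
/// Smoothstep easing: slow start, fast middle, slow finish.
fn ease_in_out(t: f64) -> f64 {
    t * t * (3.0 - 2.0 * t)
}

/// Generate intermediate cursor positions from `start` to `end`.
fn path(start: (f64, f64), end: (f64, f64), steps: usize) -> Vec<(f64, f64)> {
    (0..=steps)
        .map(|i| {
            let t = ease_in_out(i as f64 / steps as f64);
            (
                start.0 + (end.0 - start.0) * t,
                start.1 + (end.1 - start.1) * t,
            )
        })
        .collect()
}
```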

Update error handling in RPA nodes

- Enhanced documentation for `TryCatchNode` and `RetryLoopNode` to clarify limitations regarding error handling.

General code cleanup and improvements

- Various minor refactors and improvements across multiple files for better readability and maintainability.
@gemini-code-assist

Summary of Changes

Hello @felix-schultz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the application's capabilities by integrating robust Robotic Process Automation (RPA) features. It establishes the foundational framework for computer and browser automation, ensuring secure operation through explicit permission handling and providing a user-friendly interface for recording and generating automation workflows. The introduction of a dedicated automation catalog with diverse nodes empowers users to create sophisticated, cross-platform automation solutions.

Highlights

  • RPA/Automation Permission Handling: Introduced UI components (RpaPermissionDialog, useRpaPermissions) for checking and requesting Accessibility and Screen Recording permissions, integrated into the workflow UI. Workflow execution logic was updated to verify these permissions before running computer automation boards, providing descriptive errors if permissions are missing.
  • Platform and Dependency Updates for Automation: Added numerous new Rust dependencies (rdev, enigo, arboard, xcap, core-graphics, windows, atspi, zbus) to the Tauri backend, configured for cross-platform support (macOS, Windows, Linux). macOS Info.plist was updated with necessary Accessibility and Screen Recording usage descriptions.
  • RPA Workflow Recording: Implemented a new RecordingDock UI component that allows users to record desktop actions (clicks, key presses, scrolls, window focus changes, copy/paste) and generate a corresponding workflow. This includes options for capturing screenshots, element fingerprinting, and natural mouse movements.
  • New Automation Catalog Package: A new flow-like-catalog-automation package was added to the workspace, containing a comprehensive set of nodes for browser automation (e.g., navigation, input, extraction, storage, file handling), computer automation (e.g., mouse, keyboard, window management, display info, clipboard, screenshots), element fingerprinting, and LLM-assisted healing and planning.
  • Dynamic Tray Icon for Recording: The application's system tray icon now dynamically changes to indicate when an RPA recording is active, providing a visual cue and a quick way to stop recording.
Changelog
  • Cargo.lock
    • Added new Rust packages: annotate-snippets, bindgen, cexpr, clang-sys, const_format, const_format_proc_macros, convert_case, cookie-factory, core-graphics, drm, drm-ffi, drm-fourcc, drm-sys, enigo, gbm, gbm-sys, gl, gl_generator, khronos-egl, khronos_api, lazycell, libspa, libspa-sys, libwayshot-xcap, linux-raw-sys, nix (0.27.1), objc2-av-foundation, objc2-avf-audio, objc2-core-audio, objc2-core-audio-types, objc2-core-media, objc2-image-io, objc2-media-toolbox, objc2-metal (0.3.2), pipewire, pipewire-sys, quick-xml (0.30.0), rdev, rustautogui, stringmatch, thirtyfour, thirtyfour-macros, unicode-width (0.1.14), xcap, xcb, xkbcommon, xml-rs, yansi-term.
    • Updated unicode-width dependency version to 0.2.2 in several packages.
    • Updated nix dependency version to 0.30.1 for rdev.
  • Cargo.toml
    • Added packages/catalog/automation to the workspace members.
    • Added flow-like-catalog-automation as a workspace dependency.
    • Added rdev as a patch dependency from a git repository.
  • apps/desktop/app/flow/page.tsx
    • Imported Video icon, useState, useAuth, and RecordingDock.
    • Implemented state management for showRecording.
    • Integrated RecordingDock as an overlay, conditionally rendered based on showRecording state.
    • Added a 'Record Actions' button to the FlowWrapper's extra dock items, which toggles the recording dock visibility.
  • apps/desktop/components/rpa/index.ts
    • Added a new file to export RpaPermissionDialog, useRpaPermissions, and RecordingDock components.
  • apps/desktop/components/rpa/recording-dock.tsx
    • Added a new component RecordingDock for RPA workflow recording functionality.
    • Implemented UI for recording status (Idle, Recording, Paused), elapsed time, and recorded actions list.
    • Included settings for screenshot capture, fingerprinting, keystroke aggregation, and pattern matching.
    • Added functionality to start, pause, resume, and stop recording, including window minimization during recording.
    • Integrated keyboard shortcuts for stopping recording.
    • Provided an 'Insert Actions to Board' button to generate workflow nodes from recorded actions.
  • apps/desktop/components/rpa/rpa-permission-dialog.tsx
    • Added a new component RpaPermissionDialog to display and manage Accessibility and Screen Recording permissions.
    • Implemented functions to check and request these permissions using Tauri commands.
    • Provided a useRpaPermissions hook for checking permission status.
  • apps/desktop/components/tauri-provider/board-state.ts
    • Added a pre-execution check for RPA permissions (accessibility and screen_recording) if a board requires local execution.
    • Introduced an isRpaPermissionError flag and permissions object to the error thrown if permissions are missing.
  • apps/desktop/public/flow/icons/bot-fix.svg
    • Added a new SVG icon for bot-fix.
  • apps/desktop/public/flow/icons/bot-plan.svg
    • Added a new SVG icon for bot-plan.
  • apps/desktop/public/flow/icons/browser.svg
    • Added a new SVG icon for browser.
  • apps/desktop/public/flow/icons/computer.svg
    • Added a new SVG icon for computer.
  • apps/desktop/public/flow/icons/fingerprint.svg
    • Added a new SVG icon for fingerprint.
  • apps/desktop/public/flow/icons/recording.svg
    • Added a new SVG icon for recording.
  • apps/desktop/public/flow/icons/rpa.svg
    • Added a new SVG icon for rpa.
  • apps/desktop/public/flow/icons/selector.svg
    • Added a new SVG icon for selector.
  • apps/desktop/public/flow/icons/vision.svg
    • Added a new SVG icon for vision.
  • apps/desktop/src-tauri/Cargo.toml
    • Added arboard dependency for clipboard access.
    • Added rdev and enigo for input simulation.
    • Added xcap for screenshot capabilities.
    • Added platform-specific dependencies: core-graphics for macOS, windows with specific features for Windows, and atspi, zbus for Linux.
  • apps/desktop/src-tauri/Info.plist
    • Added NSAccessibilityUsageDescription and NSScreenCaptureUsageDescription keys with user-facing descriptions for macOS permissions.
  • apps/desktop/src-tauri/capabilities/desktop.json
    • Added core:window:allow-minimize, core:window:allow-unminimize, and core:window:allow-set-focus capabilities to allow window management actions.
  • apps/desktop/src-tauri/gen/schemas/capabilities.json
    • Regenerated the capabilities schema to include new window management permissions.
  • apps/desktop/src-tauri/src/functions.rs
    • Added new modules permissions and recording to organize RPA-related functions.
  • apps/desktop/src-tauri/src/functions/permissions.rs
    • Added new Rust module permissions with functions to check and request macOS Accessibility and Screen Recording permissions via FFI.
  • apps/desktop/src-tauri/src/functions/recording/capture.rs
    • Added new Rust module capture implementing event capture logic for mouse, keyboard, and window focus changes using rdev and xcap.
    • Included logic for double-click detection, drag detection, and keystroke aggregation.
    • Integrated clipboard text capture for copy/paste actions.
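Double-click detection typically means two presses of the same button within a short time window and close in position. The thresholds below (500 ms, 5 px) are assumed values for illustration, not the ones used in capture.rs.

```rust
/// Decide whether a new click at (x, y) at time `now_ms` forms a double
/// click with the previous click `(timestamp_ms, x, y)`, if any.
fn is_double_click(prev: Option<(u128, f64, f64)>, now_ms: u128, x: f64, y: f64) -> bool {
    match prev {
        Some((t, px, py)) => {
            let close_in_time = now_ms.saturating_sub(t) <= 500;
            let close_in_space = (x - px).abs() <= 5.0 && (y - py).abs() <= 5.0;
            close_in_time && close_in_space
        }
        None => false,
    }
}
```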
  • apps/desktop/src-tauri/src/functions/recording/fingerprint.rs
    • Added new Rust module fingerprint with platform-specific functions to extract UI element fingerprints (role, name, text, bounding box) at given coordinates for macOS (AXUIElement), Windows (UI Automation), and Linux (AT-SPI2).
  • apps/desktop/src-tauri/src/functions/recording/generator.rs
    • Added new Rust module generator responsible for converting recorded actions into Flow-Like workflow nodes.
    • Implemented logic to generate events_simple and automation_start_session nodes, and then sequence action-specific nodes (clicks, key presses, scrolls, etc.).
    • Included support for inserting delay nodes based on recorded pauses and handling screenshot references for template matching.
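Inserting delay nodes from recorded pauses boils down to scanning consecutive action timestamps and emitting a delay for any gap above a threshold. A sketch under those assumptions (millisecond units and the threshold parameter are illustrative):

```rust
/// Return the delays (in ms) to insert between consecutive actions:
/// one delay per gap at or above `min_gap_ms`.
fn delays_between(timestamps_ms: &[u64], min_gap_ms: u64) -> Vec<u64> {
    timestamps_ms
        .windows(2)
        .filter_map(|w| {
            let gap = w[1].saturating_sub(w[0]);
            if gap >= min_gap_ms { Some(gap) } else { None }
        })
        .collect()
}
```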
  • apps/desktop/src-tauri/src/functions/recording/mod.rs
    • Added new Rust module recording to manage the overall RPA recording process.
    • Provided Tauri commands for start_recording, pause_recording, resume_recording, stop_recording, get_recording_status, get_recorded_actions, and insert_recording_to_board.
    • Implemented logic to determine the appropriate storage backend for screenshots (online or local).
  • apps/desktop/src-tauri/src/functions/recording/screenshot.rs
    • Added new Rust module screenshot with functions to capture screen regions using xcap and store them in the configured FlowLikeStore.
  • apps/desktop/src-tauri/src/functions/recording/state.rs
    • Added new Rust module state defining the data structures for RPA recording, including RecordingStatus, ActionType, RecordedAction, RecordingSettings, and RecordingSession.
    • Implemented state management for recording sessions, including buffering and flushing keystrokes.
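Keystroke aggregation means buffering individual key events and flushing them as a single "type text" action when something else happens (a click, a pause, a shortcut). A minimal sketch of that buffer, with illustrative names:

```rust
/// Buffer printable keystrokes so the session records one aggregated
/// "type text" action instead of one action per key press.
struct KeystrokeBuffer {
    buf: String,
}

impl KeystrokeBuffer {
    fn new() -> Self {
        Self { buf: String::new() }
    }

    fn push(&mut self, ch: char) {
        self.buf.push(ch);
    }

    /// Drain the buffer, returning the aggregated text if any was pending.
    fn flush(&mut self) -> Option<String> {
        if self.buf.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.buf))
        }
    }
}
```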
  • apps/desktop/src-tauri/src/lib.rs
    • Registered new Tauri commands for RPA permissions (check_rpa_permissions, request_rpa_permission) and recording (start_recording, pause_recording, resume_recording, stop_recording, get_recording_status, get_recorded_actions, insert_recording_to_board).
    • Managed the TauriRecordingState in the Tauri application builder.
  • apps/desktop/src-tauri/src/state.rs
    • Exported TauriRecordingState for use within the Tauri application.
  • apps/desktop/src-tauri/src/tray.rs
    • Added recording field to TrayRuntimeState to track recording status.
    • Modified tray icon click handler to stop recording and restore the main window if recording is active.
    • Implemented generate_stop_icon, set_recording_tray_icon, and restore_tray_icon functions to dynamically change the tray icon based on recording status.
  • apps/embedded/public/flow/icons/bot-fix.svg
    • Added a new SVG icon for bot-fix.
  • apps/embedded/public/flow/icons/bot-plan.svg
    • Added a new SVG icon for bot-plan.
  • apps/embedded/public/flow/icons/browser.svg
    • Added a new SVG icon for browser.
  • apps/embedded/public/flow/icons/computer.svg
    • Added a new SVG icon for computer.
  • apps/embedded/public/flow/icons/fingerprint.svg
    • Added a new SVG icon for fingerprint.
  • apps/embedded/public/flow/icons/recording.svg
    • Added a new SVG icon for recording.
  • apps/embedded/public/flow/icons/rpa.svg
    • Added a new SVG icon for rpa.
  • apps/embedded/public/flow/icons/selector.svg
    • Added a new SVG icon for selector.
  • apps/embedded/public/flow/icons/vision.svg
    • Added a new SVG icon for vision.
  • packages/catalog/Cargo.toml
    • Added flow-like-catalog-automation to the members list in the workspace Cargo.toml.
    • Added flow-like-catalog-automation to the execute feature list.
  • packages/catalog/automation/Cargo.toml
    • Added a new Cargo.toml file defining the flow-like-catalog-automation package.
    • Configured execute feature with dependencies for browser automation (thirtyfour), desktop input (enigo, rdev), screenshot (xcap), RPA (rustautogui), clipboard (arboard), and LLM integration (rig-core, jsonschema).
    • Declared core dependencies like flow-like, flow-like-types, serde, tokio, uuid, async-trait, chrono, image.
  • packages/catalog/automation/src/browser/auth.rs
    • Added new nodes for browser authentication: BrowserSetBasicAuthNode, BrowserSaveCookiesNode, BrowserLoadCookiesNode, BrowserClearCookiesNode.
  • packages/catalog/automation/src/browser/capture.rs
    • Added new nodes for browser capture: BrowserScreenshotNode (full page or viewport) and BrowserScreenshotElementNode.
  • packages/catalog/automation/src/browser/context.rs
    • Added new nodes for browser context management: BrowserOpenNode (connects to WebDriver, configures browser type, headless mode, viewport, user agent, timeouts) and BrowserCloseNode.
  • packages/catalog/automation/src/browser/extract.rs
    • Added new nodes for browser data extraction: BrowserGetTextNode, BrowserGetAttributeNode, BrowserGetHtmlNode, BrowserExecuteJsNode.
  • packages/catalog/automation/src/browser/files.rs
    • Added new nodes for browser file operations: BrowserUploadFileNode, BrowserUploadMultipleFilesNode, BrowserSetDownloadDirNode, BrowserWaitForDownloadNode, BrowserTriggerDownloadNode.
  • packages/catalog/automation/src/browser/input.rs
    • Added new nodes for browser input: BrowserTypeTextNode, BrowserPressKeyNode, BrowserSelectOptionNode.
  • packages/catalog/automation/src/browser/interact.rs
    • Added new nodes for browser interaction: BrowserClickNode, BrowserDoubleClickNode, BrowserHoverNode, BrowserScrollIntoViewNode.
  • packages/catalog/automation/src/browser/mod.rs
    • Added a new module browser to organize browser automation nodes.
  • packages/catalog/automation/src/browser/navigation.rs
    • Added new nodes for browser navigation: BrowserGotoNode, BrowserBackNode, BrowserForwardNode, BrowserReloadNode.
  • packages/catalog/automation/src/browser/observe.rs
    • Added new nodes for browser observation: BrowserGetConsoleLogsNode, BrowserClearConsoleLogsNode, BrowserStartNetworkObserverNode, BrowserGetNetworkRequestsNode, BrowserWaitForNetworkIdleNode.
  • packages/catalog/automation/src/browser/page.rs
    • Added new nodes for browser page management: BrowserNewPageNode, BrowserClosePageNode.
  • packages/catalog/automation/src/browser/snapshot.rs
    • Added new nodes for browser state snapshotting: BrowserGetDomSnapshotNode, BrowserGetAccessibilitySnapshotNode, BrowserGetElementSnapshotNode.
  • packages/catalog/automation/src/browser/storage.rs
    • Added new nodes for browser storage management: BrowserGetLocalStorageNode, BrowserSetLocalStorageNode, BrowserGetSessionStorageNode, BrowserSetSessionStorageNode, BrowserClearStorageNode, BrowserGetAllStorageNode.
  • packages/catalog/automation/src/browser/wait.rs
    • Added new nodes for browser waiting: BrowserWaitForNode (for selectors), BrowserWaitForDelayNode.
  • packages/catalog/automation/src/computer/accessibility.rs
    • Added new nodes for computer accessibility: ComputerGetAccessibilityTreeNode, ComputerFindAccessibilityElementNode.
  • packages/catalog/automation/src/computer/capture.rs
    • Added a new node ComputerScreenshotNode for capturing full screen, display, or region screenshots.
  • packages/catalog/automation/src/computer/clipboard.rs
    • Added new nodes for computer clipboard operations: ClipboardGetTextNode, ClipboardSetTextNode, ClipboardGetImageNode, ClipboardSetImageNode.
  • packages/catalog/automation/src/computer/display.rs
    • Added new nodes for computer display information: ComputerListDisplaysNode, ComputerGetDisplayNode, ComputerGetPrimaryDisplayNode.
  • packages/catalog/automation/src/computer/keyboard.rs
    • Added new nodes for computer keyboard input: ComputerKeyPressNode, ComputerKeyTypeNode.
  • packages/catalog/automation/src/computer/mod.rs
    • Added a new module computer to organize computer automation nodes.
  • packages/catalog/automation/src/computer/mouse.rs
    • Added new nodes for computer mouse control: ComputerMouseMoveNode, ComputerNaturalMouseMoveNode, ComputerMouseClickNode, ComputerMouseDoubleClickNode, ComputerMouseDragNode, ComputerScrollNode.
  • packages/catalog/automation/src/computer/session.rs
    • Added a new module session for computer session management (deprecated, moved to root session.rs).
  • packages/catalog/automation/src/computer/wait.rs
    • Added a new node ComputerWaitNode for pausing execution.
  • packages/catalog/automation/src/computer/window.rs
    • Added new nodes for computer window management: ListWindowsNode, GetActiveWindowNode, FindWindowByTitleNode, LaunchAppNode, CaptureWindowNode, FocusWindowNode.
  • packages/catalog/automation/src/fingerprint/compute.rs
    • Added new nodes for fingerprint computation: ComputeFingerprintHashNode, ExtractFingerprintDataNode.
  • packages/catalog/automation/src/fingerprint/create.rs
    • Added new nodes for fingerprint creation: CreateFingerprintNode (from attributes), CreateFingerprintFromJsonNode.
  • packages/catalog/automation/src/fingerprint/match_.rs
    • Added new nodes for fingerprint matching: CreateMatchOptionsNode, CompareFingerprintsNode.
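Fingerprint comparison can be pictured as a weighted score over matching fields. The field set and weights below are illustrative only; the crate's actual `ElementFingerprint` schema carries more attributes (role, name, text, bounding box).

```rust
/// Simplified fingerprint: a subset of the attributes the real type tracks.
struct Fingerprint<'a> {
    role: &'a str,
    name: &'a str,
    text: &'a str,
}

/// Weighted similarity in [0.0, 1.0]; weights are assumed, not the crate's.
fn similarity(a: &Fingerprint, b: &Fingerprint) -> f64 {
    let mut score = 0.0;
    if a.role == b.role { score += 0.4; } // role is treated as the strongest signal
    if a.name == b.name { score += 0.4; }
    if a.text == b.text { score += 0.2; }
    score
}
```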
  • packages/catalog/automation/src/fingerprint/match_node.rs
    • Added a new node MatchFingerprintNode to find elements based on fingerprints using various strategies (DOM, Accessibility, Hybrid).
  • packages/catalog/automation/src/fingerprint/mod.rs
    • Added a new module fingerprint to organize element fingerprinting nodes.
  • packages/catalog/automation/src/fingerprint/update.rs
    • Added new nodes for fingerprint updates: UpdateFingerprintNode, RecordFingerprintMatchNode, FingerprintToJsonNode.
  • packages/catalog/automation/src/lib.rs
    • Added the main library file for flow-like-catalog-automation, defining its modules (browser, computer, fingerprint, llm, rpa, selector, session, vision) and exporting get_catalog.
  • packages/catalog/automation/src/llm/classify_screen.rs
    • Added a new node LLMClassifyScreenNode that uses a vision LLM to classify screen types and states from screenshots.
  • packages/catalog/automation/src/llm/extract_structured.rs
    • Added a new node LLMExtractFromScreenNode that uses a vision LLM to extract structured data from screenshots based on a provided JSON schema.
  • packages/catalog/automation/src/llm/find_element.rs
    • Added a new node LLMFindElementNode that uses a vision LLM to locate UI elements based on natural language descriptions from screenshots.
  • packages/catalog/automation/src/llm/heal.rs
    • Added a new node LLMDiagnoseAndHealNode that uses an LLM to diagnose automation failures and suggest/apply healing actions based on error messages and screenshots.
  • packages/catalog/automation/src/llm/heal_selector.rs
    • Added a new node LLMHealSelectorNode that uses an LLM to fix broken CSS/XPath selectors based on page HTML and element descriptions.
  • packages/catalog/automation/src/llm/heal_template.rs
    • Added a new node LLMHealTemplateNode that uses a vision LLM to find visually similar elements when template matching fails, based on current screenshots and failed templates.
  • packages/catalog/automation/src/llm/mod.rs
    • Added a new module llm to organize LLM-assisted automation nodes.
  • packages/catalog/automation/src/llm/observe.rs
    • Added new nodes for LLM-assisted observation: LLMObserveScreenNode (comprehensive screen description) and LLMDescribeElementNode (specific element description).
  • packages/catalog/automation/src/llm/plan.rs
    • Added a new node LLMSuggestNextStepNode that uses an LLM to suggest the next best action given a screen, goal, and past actions.
  • packages/catalog/automation/src/llm/plan_actions.rs
    • Added a new node LLMPlanActionsNode that uses an LLM to generate a sequence of actions to achieve a goal based on screen observations.
  • packages/catalog/automation/src/llm/rank_candidates.rs
    • Added a new node LLMRankCandidatesNode that uses an LLM to rank potential UI element candidates based on a description and their attributes.
  • packages/catalog/automation/src/llm/resolve_element.rs
    • Added a new node LLMResolveElementNode that uses an LLM to resolve a natural language element description into a concrete selector or coordinates.
  • packages/catalog/automation/src/rpa/mod.rs
    • Added a new module rpa to organize RPA utility nodes.
  • packages/catalog/automation/src/rpa/reliability.rs
    • Added new nodes for RPA reliability: RpaRetryNode, RpaFallbackNode, RpaErrorHandlingNode.
  • packages/catalog/automation/src/selector/mod.rs
    • Added a new module selector to organize selector-related nodes.
  • packages/catalog/automation/src/selector/selector_set.rs
    • Added new nodes for selector set management: CreateSelectorNode, AddSelectorToSetNode, RemoveSelectorFromSetNode, FilterSelectorSetNode, SelectorSetToJsonNode, SelectorSetFromJsonNode.
  • packages/catalog/automation/src/session/mod.rs
    • Added a new module session to manage automation sessions.
  • packages/catalog/automation/src/session/session_management.rs
    • Added new nodes for automation session management: AutomationStartSessionNode, AutomationEndSessionNode.
  • packages/catalog/automation/src/types/artifacts.rs
    • Added a new module artifacts defining ArtifactRef and ArtifactType for referencing captured data.
  • packages/catalog/automation/src/types/fingerprints.rs
    • Added a new module fingerprints defining ElementFingerprint, FingerprintMatchOptions, and MatchStrategy for UI element identification.
  • packages/catalog/automation/src/types/handles.rs
    • Added a new module handles defining AutomationSession, BrowserContextOptions, BrowserType, and ComputerContext for managing automation contexts.
  • packages/catalog/automation/src/types/mod.rs
    • Added a new module types to organize common data structures for automation.
  • packages/catalog/automation/src/types/selectors.rs
    • Added a new module selectors defining Selector, SelectorKind, and SelectorSet for describing UI element locators.
  • packages/catalog/automation/src/vision/mod.rs
    • Added a new module vision to organize vision-based automation nodes.
  • packages/catalog/automation/src/vision/template_matching.rs
    • Added new nodes for vision-based template matching: VisionFindTemplateNode, VisionClickTemplateNode, VisionWaitForTemplateNode.


Copilot AI (Contributor) left a comment

Pull request overview

Introduces foundational RPA/computer-automation support across the desktop app and catalog, including permission prompting/checking, workflow recording controls, and a new automation node catalog used for browser + computer + LLM-assisted automation.

Changes:

  • Added a new flow-like-catalog-automation crate with browser/computer/fingerprint/LLM automation nodes (feature-gated execution deps).
  • Added Tauri commands, tray behavior, and storage helpers for workflow recording + RPA permission checks (macOS focused).
  • Added desktop UI components and pre-run permission checks to guide users through required system permissions.

Reviewed changes

Copilot reviewed 63 out of 144 changed files in this pull request and generated 6 comments.

Summary per file:
  • packages/catalog/automation/src/llm/plan.rs — Adds an LLM “next step suggestion” planning node.
  • packages/catalog/automation/src/llm/mod.rs — Exposes new LLM node modules.
  • packages/catalog/automation/src/llm/heal_template.rs — Adds an LLM node to visually heal template matches.
  • packages/catalog/automation/src/llm/heal_selector.rs — Adds an LLM node to heal broken selectors.
  • packages/catalog/automation/src/llm/heal.rs — Adds an LLM node for diagnosis + healing suggestions.
  • packages/catalog/automation/src/llm/find_element.rs — Adds an LLM vision node to locate an element by description.
  • packages/catalog/automation/src/llm/extract_structured.rs — Adds an LLM vision node to extract structured data with schema support.
  • packages/catalog/automation/src/llm/classify_screen.rs — Adds an LLM vision node to classify screen state.
  • packages/catalog/automation/src/lib.rs — Introduces the automation catalog crate entrypoint + module layout.
  • packages/catalog/automation/src/fingerprint/update.rs — Adds nodes for fingerprint update, match recording, JSON serialization.
  • packages/catalog/automation/src/fingerprint/mod.rs — Exposes fingerprint submodules.
  • packages/catalog/automation/src/fingerprint/match_node.rs — Adds WebDriver-based fingerprint matching node.
  • packages/catalog/automation/src/fingerprint/match_.rs — Adds nodes for match options + fingerprint similarity.
  • packages/catalog/automation/src/fingerprint/create.rs — Adds nodes to create fingerprints (incl. from JSON).
  • packages/catalog/automation/src/fingerprint/compute.rs — Adds nodes to compute fingerprint hash + extract fields.
  • packages/catalog/automation/src/computer/wait.rs — Adds a computer “wait” node (feature-gated).
  • packages/catalog/automation/src/computer/session.rs — Adds a deprecated stub module note for session management.
  • packages/catalog/automation/src/computer/mod.rs — Exposes computer automation submodules.
  • packages/catalog/automation/src/computer/keyboard.rs — Adds computer keyboard nodes (press/type) via Enigo.
  • packages/catalog/automation/src/computer/display.rs — Adds nodes to enumerate/get displays via xcap.
  • packages/catalog/automation/src/computer/capture.rs — Adds computer screenshot capture node and artifact storage.
  • packages/catalog/automation/src/computer/accessibility.rs — Adds placeholder accessibility-tree nodes for future platform bindings.
  • packages/catalog/automation/src/browser/wait.rs — Adds browser wait nodes via Thirtyfour.
  • packages/catalog/automation/src/browser/page.rs — Adds browser page open/close nodes.
  • packages/catalog/automation/src/browser/navigation.rs — Adds browser navigation nodes.
  • packages/catalog/automation/src/browser/mod.rs — Exposes browser automation submodules.
  • packages/catalog/automation/src/browser/interact.rs — Adds browser interaction nodes (click/hover/etc.).
  • packages/catalog/automation/src/browser/input.rs — Adds browser input nodes (type/press/select).
  • packages/catalog/automation/src/browser/context.rs — Adds browser context open/close node (WebDriver connect + caps).
  • packages/catalog/automation/src/browser/capture.rs — Adds browser screenshot nodes producing base64 + NodeImage.
  • packages/catalog/automation/Cargo.toml — Defines automation crate deps and execute feature gating.
  • packages/catalog/Cargo.toml — Wires automation catalog into the umbrella catalog + execute feature.
  • Cargo.toml — Adds automation crate to workspace + patches rdev source.
  • apps/desktop/src-tauri/src/tray.rs — Adds recording state + tray-click stop behavior + dynamic tray icon.
  • apps/desktop/src-tauri/src/state.rs — Re-exports recording state type for app management.
  • apps/desktop/src-tauri/src/lib.rs — Registers recording state and new Tauri commands (permissions/recording).
  • apps/desktop/src-tauri/src/functions/recording/screenshot.rs — Adds screenshot capture helpers for recording (region + full screen).
  • apps/desktop/src-tauri/src/functions/recording/mod.rs — Adds recording lifecycle commands + store selection logic.
  • apps/desktop/src-tauri/src/functions/permissions.rs — Adds macOS permission checks/requests for Accessibility + Screen Recording.
  • apps/desktop/src-tauri/src/functions.rs — Registers new functions modules.
  • apps/desktop/src-tauri/gen/schemas/capabilities.json — Regenerates capabilities schema (window focus/minimize permissions).
apps/desktop/src-tauri/capabilities/desktop.json Adds window permissions needed for tray-driven focus/restore.
apps/desktop/src-tauri/Info.plist Adds macOS usage strings for Accessibility + Screen Recording.
apps/desktop/src-tauri/Cargo.toml Adds automation-related deps (arboard/enigo/rdev/xcap/etc.).
apps/desktop/components/tauri-provider/board-state.ts Adds pre-run RPA permission check for local-execution workflows.
apps/desktop/components/rpa/rpa-permission-dialog.tsx Adds permission dialog + hook to request/check RPA permissions.
apps/desktop/components/rpa/index.ts Re-exports RPA UI components/hooks.
apps/desktop/app/flow/page.tsx Adds a “Record Actions” dock item + overlay mount for recording UI.

Comment on lines 175 to 181
let cropped = image::imageops::crop_imm(
    &full_image,
    region_x as u32,
    region_y as u32,
    region_width as u32,
    region_height as u32,
);
Copilot AI Feb 15, 2026

region_x/region_y/region_width/region_height are i64 and are cast directly to u32 without clamping/bounds checks. Negative values will wrap to huge u32 values, and out-of-bounds rectangles can cause image::imageops::crop_imm to panic. Clamp x/y to >= 0, clamp width/height to >= 1, and clamp the crop rectangle to the captured image bounds (or return a descriptive error when the requested region is invalid).

Suggested change
let cropped = image::imageops::crop_imm(
    &full_image,
    region_x as u32,
    region_y as u32,
    region_width as u32,
    region_height as u32,
);
let img_width = full_image.width();
let img_height = full_image.height();
let x = if region_x < 0 { 0 } else { region_x as u32 };
let y = if region_y < 0 { 0 } else { region_y as u32 };
if x >= img_width {
    return Err(flow_like_types::anyhow!(
        "Requested region_x {} is outside the captured image width {}",
        region_x,
        img_width
    ));
}
if y >= img_height {
    return Err(flow_like_types::anyhow!(
        "Requested region_y {} is outside the captured image height {}",
        region_y,
        img_height
    ));
}
let requested_width = if region_width < 1 { 1 } else { region_width as u32 };
let requested_height = if region_height < 1 { 1 } else { region_height as u32 };
let max_width = img_width.saturating_sub(x);
let max_height = img_height.saturating_sub(y);
if max_width == 0 || max_height == 0 {
    return Err(flow_like_types::anyhow!(
        "Requested region is outside the captured image bounds"
    ));
}
let crop_width = requested_width.min(max_width);
let crop_height = requested_height.min(max_height);
if crop_width == 0 || crop_height == 0 {
    return Err(flow_like_types::anyhow!(
        "Requested region produces an empty crop area"
    ));
}
let cropped =
    image::imageops::crop_imm(&full_image, x, y, crop_width, crop_height);
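The suggested change can also be factored into a small pure helper so the bounds logic is unit-testable in isolation. A minimal sketch; the name `clamp_region` and the `Result<_, String>` signature are chosen for illustration and are not taken from the PR:

```rust
/// Clamp a possibly-negative, possibly-oversized region request to the
/// bounds of a captured image. Returns (x, y, width, height) in image
/// coordinates, or an error when no visible pixels remain.
fn clamp_region(
    img_width: u32,
    img_height: u32,
    region_x: i64,
    region_y: i64,
    region_width: i64,
    region_height: i64,
) -> Result<(u32, u32, u32, u32), String> {
    // Reject origins past the image edge before casting, so huge i64
    // values cannot wrap into small u32 offsets.
    if region_x >= i64::from(img_width) || region_y >= i64::from(img_height) {
        return Err(format!(
            "region origin ({region_x}, {region_y}) lies outside the {img_width}x{img_height} image"
        ));
    }
    // Negative origins clamp to the top-left corner.
    let x = region_x.max(0) as u32;
    let y = region_y.max(0) as u32;
    // Degenerate sizes are bumped to one pixel, then clipped to the image.
    let w = (region_width.max(1).min(i64::from(img_width)) as u32).min(img_width - x);
    let h = (region_height.max(1).min(i64::from(img_height)) as u32).min(img_height - y);
    Ok((x, y, w, h))
}
```

With the clamped values in hand, `image::imageops::crop_imm` can be called without risking the wrap/panic behavior described above.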

}
} catch (e) {
    if ((e as any).isRpaPermissionError) throw e;
    console.warn("Failed to check RPA permissions:", e);
Copilot AI Feb 15, 2026

For requires_local_execution workflows, failures in check_rpa_permissions are currently swallowed (only logged), allowing execution to proceed even though the PR description says permissions are “checked before running.” Consider treating any non-RPA-permission error from check_rpa_permissions as a hard failure (or convert it into an isRpaPermissionError) so local automation workflows don’t run without a definitive permission verdict.

Suggested change
console.warn("Failed to check RPA permissions:", e);
console.warn("Failed to check RPA permissions:", e);
const error = new Error(
    "Failed to verify RPA permissions. This workflow cannot run without a successful permission check.",
);
(error as any).isRpaPermissionError = true;
(error as any).cause = e;
throw error;

Comment on lines 156 to 157
let _ = permission_type;
Ok(true)
Copilot AI Feb 15, 2026

On non-macOS platforms, request_rpa_permission returns Ok(true) for any permission_type value (including typos/unknown values), while macOS correctly errors on unknown strings. For consistent API behavior (and better frontend debugging), validate permission_type and return an error for unknown values on non-macOS as well.

Suggested change
let _ = permission_type;
Ok(true)
match permission_type.as_str() {
    "accessibility" | "screen_recording" => Ok(true),
    _ => Err(TauriFunctionError::new(&format!(
        "Unknown permission type: {}",
        permission_type
    ))),
}

Comment on lines +168 to +175
<button
    type="button"
    onClick={onRequest}
    disabled={checking}
    className="rounded-md bg-primary px-3 py-1.5 text-xs font-medium text-primary-foreground hover:bg-primary/90 disabled:opacity-50"
>
    {checking ? "Checking..." : "Grant"}
</button>
Copilot AI Feb 15, 2026

This introduces a custom-styled <button> instead of reusing the existing shadcn/button component from @tm9657/flow-like-ui, which can lead to inconsistent focus states, accessibility, and theming across the app. Prefer using the shared Button component (and its variants/sizes) here so styling and keyboard/focus behavior remain consistent.

Comment on lines 60 to 87
pub fn request_accessibility() -> bool {
    unsafe {
        let key_str = b"AXTrustedCheckOptionPrompt\0";
        let key = CFStringCreateWithCString(
            ptr::null(),
            key_str.as_ptr() as *const i8,
            K_CF_STRING_ENCODING_UTF8,
        );

        if key.is_null() {
            return AXIsProcessTrustedWithOptions(ptr::null());
        }

        let keys = [key];
        let values = [kCFBooleanTrue];

        let options = CFDictionaryCreate(
            ptr::null(),
            keys.as_ptr(),
            values.as_ptr(),
            1,
            &kCFTypeDictionaryKeyCallBacks as *const _ as *const c_void,
            &kCFTypeDictionaryValueCallBacks as *const _ as *const c_void,
        );

        AXIsProcessTrustedWithOptions(options)
    }
}
Copilot AI Feb 15, 2026

CFStringCreateWithCString and CFDictionaryCreate return retained CoreFoundation objects, but they are never released. If request_accessibility() is called repeatedly (e.g., polling from UI), this will leak CF objects. Consider adding CFRelease calls for key and options (or using a safe CoreFoundation wrapper crate) to ensure objects are released.

Comment on lines +215 to +243
let tool_params = json::json!({
    "type": "object",
    "properties": {
        "goal_reached": { "type": "boolean", "description": "Whether the goal appears to be reached" },
        "action_type": { "type": "string", "description": "Type of action (click, type, scroll, wait, verify)" },
        "target_description": { "type": "string", "description": "What to interact with" },
        "target_coordinates": {
            "type": "array",
            "items": { "type": "integer" },
            "description": "Approximate [x, y] coordinates if applicable"
        },
        "parameters": { "type": "object", "description": "Action-specific parameters" },
        "reasoning": { "type": "string", "description": "Why this action is suggested" },
        "confidence": { "type": "number", "description": "Confidence in this suggestion 0-1" },
        "alternatives": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "action_type": { "type": "string" },
                    "description": { "type": "string" },
                    "confidence": { "type": "number" }
                }
            },
            "description": "Alternative actions to consider"
        }
    },
    "required": ["goal_reached", "action_type", "target_description", "reasoning", "confidence"]
});
Copilot AI Feb 15, 2026

Several newly added LLM nodes embed large inline JSON schemas/tools that are structurally very similar. This will be hard to edit safely over time (schema drift, inconsistent descriptions/required lists, etc.). Consider extracting shared schema-building helpers (e.g., a small module that returns Value schemas for common fields like confidence, reasoning, coordinates, etc.) to reduce duplication and keep tooling consistent across nodes.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is a substantial and impressive pull request that introduces a comprehensive RPA and computer automation framework. The changes are well-structured, with new capabilities added to the Tauri backend, a dedicated automation catalog for nodes, and new UI components for recording and permissions. The use of platform-specific dependencies and FFI for macOS permissions shows great attention to detail. The recording logic, including complex event processing for double-clicks, drags, and copy-paste detection, is robust.

My review focuses on a few key areas to further improve the code:

  • Maintainability: Some functions in both the Rust backend and React frontend have become quite large and could benefit from refactoring to improve readability.
  • Dependencies: The use of a forked dependency for rdev introduces a potential maintenance risk.
  • Resource Management: I've identified a memory leak in the macOS permission handling code.

Overall, this is a fantastic addition that lays a strong foundation for powerful automation features.

&kCFTypeDictionaryValueCallBacks as *const _ as *const c_void,
);

AXIsProcessTrustedWithOptions(options)


high

This unsafe block has a memory leak. The CFStringCreateWithCString and CFDictionaryCreate functions follow Core Foundation's "Create Rule", meaning the caller owns the returned objects and is responsible for releasing them. The key and options variables are not being released, which will leak memory on each call to this function. You should call CFRelease on these objects before the function returns to prevent this.

            let trusted = AXIsProcessTrustedWithOptions(options);

            if !options.is_null() {
                CFRelease(options);
            }
            // `key` is guaranteed not to be null here due to the check on line 69
            CFRelease(key);

            trusted

if ("Scroll" in type) return `Scroll ${type.Scroll.direction} (${type.Scroll.amount})`;
if ("KeyType" in type) {
    const text = type.KeyType.text;
    return text.length > 15 ? `"${text.slice(0, 15)}..."` : `"${text}"`;


medium

The value 15 is a magic number used for truncating text. This makes the code harder to read and maintain. It's also used on line 346. Consider defining a constant at the top of the component, like const MAX_LABEL_LENGTH = 15;, and using it in both places.

>
<ScrollArea className="max-h-36 overflow-y-auto">
    <div className="px-4 pb-3 space-y-1.5">
        {actions.slice(-8).map((action, index) => (


medium

The value 8 is a magic number used to determine how many recent actions to display. This should be extracted into a named constant, such as MAX_VISIBLE_ACTIONS, to improve readability and make it easier to adjust in the future.

Comment on lines +730 to +1287
async fn process_events(
mut rx: mpsc::Receiver<CapturedEvent>,
state: Arc<RwLock<RecordingStateInner>>,
active: Arc<std::sync::atomic::AtomicBool>,
app_handle: tauri::AppHandle,
store: Option<Arc<FlowLikeStore>>,
) {
let mut last_mouse_down: Option<(i32, i32, MouseButton, Vec<KeyModifier>, std::time::Instant)> = None;
let mut drag_start: Option<(i32, i32)> = None;
let mut last_focused_window: Option<FocusedWindow> = None;

// Double-click detection - track completed clicks (not mouse downs)
let mut last_completed_click: Option<(i32, i32, MouseButton, std::time::Instant)> = None;
const DOUBLE_CLICK_THRESHOLD_MS: u128 = 400; // Standard OS double-click threshold

// Pending copy detection - copy clipboard content on KeyUp after delay
let mut pending_copy_key: Option<String> = None;
const DOUBLE_CLICK_DISTANCE: i32 = 10; // Pixels

tracing::debug!(
" process_events: store available: {}",
store.is_some()
);
tracing::debug!(" process_events: waiting for events...");

// Check session info
{
let state_guard = state.read().await;
if let Some(session) = &state_guard.session {
tracing::debug!(" Session ID: {}", session.id);
tracing::debug!(
" Target board ID: {:?}",
session.target_board_id
);
} else {
tracing::warn!(" No session in state!");
}
}

let mut processed_count = 0u32;
let mut action_count = 0u32;
let mut last_event_time = std::time::Instant::now();
// Reduce dedup interval - only skip very rapid duplicate non-click events
let min_event_interval = std::time::Duration::from_millis(5);

tracing::debug!(" About to enter event loop...");
while let Some(event) = rx.recv().await {
processed_count += 1;

// Deduplicate rapid events EXCEPT mouse clicks and key events (to preserve timing)
let now = std::time::Instant::now();
let is_important_event = matches!(
event,
CapturedEvent::MouseDown { .. }
| CapturedEvent::MouseUp { .. }
| CapturedEvent::KeyDown { .. }
| CapturedEvent::Character { .. }
);
if !is_important_event && now.duration_since(last_event_time) < min_event_interval {
continue;
}
last_event_time = now;

if processed_count % 10 == 1 {
tracing::debug!(
" Received event #{}: {:?}",
processed_count, event
);
}

if !active.load(std::sync::atomic::Ordering::SeqCst) {
tracing::debug!(
" Skipping event #{} - not active",
processed_count
);
continue;
}

{
let state_guard = state.read().await;
if state_guard.status != RecordingStatus::Recording {
tracing::debug!(
" Skipping event #{} - status is {:?}",
processed_count, state_guard.status
);
continue;
}
}

// Check for window focus changes on any mouse event (user is interacting with something)
if matches!(
event,
CapturedEvent::MouseDown { .. } | CapturedEvent::MouseUp { .. }
) && let Some(current_window) = Self::get_focused_window()
{
let focus_changed = match &last_focused_window {
Some(last) => {
last.title != current_window.title || last.process != current_window.process
}
None => true, // First focus detection
};

if focus_changed
&& (!current_window.title.is_empty() || !current_window.process.is_empty())
{
tracing::debug!(
" Window focus changed to: {} ({})",
current_window.title, current_window.process
);

// Flush any pending keystrokes before focus change
{
let mut state_guard = state.write().await;
if let Some(typed_action) = state_guard.flush_keystroke_buffer() {
let _ = app_handle.emit("recording:action", &typed_action);
}
}

// Create and emit WindowFocus action
let action = RecordedAction::new(
flow_like_types::create_id(),
ActionType::WindowFocus {
window_title: current_window.title.clone(),
process: current_window.process.clone(),
},
);

{
let mut state_guard = state.write().await;
state_guard.add_action(action.clone());
}
let _ = app_handle.emit("recording:action", &action);

last_focused_window = Some(current_window);
}
}

match &event {
CapturedEvent::MouseDown { x, y, button, modifiers } => {
last_mouse_down = Some((*x, *y, button.clone(), modifiers.clone(), std::time::Instant::now()));
drag_start = Some((*x, *y));
}
CapturedEvent::MouseUp { x, y, button } => {
// Get fresh coordinates from enigo for accuracy (aligns with screenshot capture)
let (fresh_x, fresh_y) = Self::get_mouse_location().unwrap_or((*x, *y));
tracing::debug!(
" MouseUp: rdev coords=({}, {}), fresh coords=({}, {})",
x, y, fresh_x, fresh_y
);
let (x, y) = (fresh_x, fresh_y);
let button = button.clone();

{
let mut state_guard = state.write().await;
if let Some(typed_action) = state_guard.flush_keystroke_buffer() {
let _ = app_handle.emit("recording:action", &typed_action);
}
}

// Get drag start position, or use current position if MouseDown was missed
let (start_x, start_y) = drag_start.take().unwrap_or((x, y));
let dx = (x - start_x).abs();
let dy = (y - start_y).abs();

// Only record as drag if significant movement, otherwise it's a click
if dx > 10 || dy > 10 {
let action = RecordedAction::new(
flow_like_types::create_id(),
ActionType::Drag {
start: (start_x, start_y),
end: (x, y),
},
)
.with_coordinates(start_x, start_y);

let mut state_guard = state.write().await;
state_guard.add_action(action.clone());
action_count += 1;
tracing::debug!(
" Drag action #{} added from ({}, {}) to ({}, {})",
action_count, start_x, start_y, x, y
);
let _ = app_handle.emit("recording:action", &action);
} else {
// This is a click (not a drag)
let click_time = std::time::Instant::now();

// Check for double-click against the last completed click
let is_double_click = if let Some((lx, ly, lb, lt)) = &last_completed_click
{
let distance = (x - lx).abs().max((y - ly).abs());
let time_diff = click_time.duration_since(*lt).as_millis();
// Double-click: same button, close position, within time threshold
*lb == button
&& distance <= DOUBLE_CLICK_DISTANCE
&& time_diff <= DOUBLE_CLICK_THRESHOLD_MS
} else {
false
};

let (capture_screenshots, region_size, app_id, board_id) = {
let state_guard = state.read().await;
state_guard
.session
.as_ref()
.map(|s| {
(
s.settings.capture_screenshots,
s.settings.capture_region_size,
s.app_id.clone(),
s.target_board_id.clone(),
)
})
.unwrap_or((false, 150, None, None))
};

let screenshot_ref = if capture_screenshots {
if let Some(ref store) = store {
capture_region(
x,
y,
region_size,
store,
app_id.as_deref(),
board_id.as_deref(),
)
.await
.ok()
} else {
None
}
} else {
None
};

// Extract UI element fingerprint at click location
let fingerprint = extract_fingerprint_at(x, y);

if is_double_click {
// Remove the previous single click and replace with double-click
{
let mut state_guard = state.write().await;
if let Some(session) = &mut state_guard.session
&& let Some(last_action) = session.actions.last()
&& matches!(last_action.action_type, ActionType::Click { .. })
{
session.actions.pop();
}
}

let mut action = RecordedAction::new(
flow_like_types::create_id(),
ActionType::DoubleClick {
button: button.clone(),
},
)
.with_coordinates(x, y);

if let Some(ref screenshot_id) = screenshot_ref {
action = action.with_screenshot_ref(screenshot_id);
}

if let Some(fp) = fingerprint {
action = action.with_fingerprint(fp);
}

let mut state_guard = state.write().await;
state_guard.add_action(action.clone());
action_count += 1;
let _ = app_handle.emit("recording:action", &action);

// Clear to prevent triple-click
last_completed_click = None;
} else {
let click_modifiers = last_mouse_down
.as_ref()
.map(|(_, _, _, mods, _)| mods.clone())
.unwrap_or_default();
let mut action = RecordedAction::new(
flow_like_types::create_id(),
ActionType::Click {
button: button.clone(),
modifiers: click_modifiers,
},
)
.with_coordinates(x, y);

if let Some(ref screenshot_id) = screenshot_ref {
action = action.with_screenshot_ref(screenshot_id);
}

if let Some(fp) = fingerprint {
action = action.with_fingerprint(fp);
}

let mut state_guard = state.write().await;
state_guard.add_action(action.clone());
action_count += 1;
let _ = app_handle.emit("recording:action", &action);

// Record for double-click detection
last_completed_click = Some((x, y, button.clone(), click_time));
}
}

last_mouse_down = None;
}
CapturedEvent::Scroll { x, y, dx, dy } => {
// Skip scroll events with no actual movement
if *dx == 0 && *dy == 0 {
continue;
}

// Get fresh coordinates for scroll position
let (x, y) = Self::get_mouse_location().unwrap_or((*x, *y));

let mut state_guard = state.write().await;
state_guard.flush_keystroke_buffer();

// Determine scroll direction and amount.
// rdev convention: positive dy = scroll down, negative dy = scroll up
// (matches macOS "natural" scrolling inverted at driver level).
// Positive dx = scroll right, negative dx = scroll left.
let (direction, amount) = if dy.abs() >= dx.abs() && *dy != 0 {
if *dy > 0 {
(ScrollDirection::Down, *dy)
} else {
(ScrollDirection::Up, -dy)
}
} else if *dx != 0 {
if *dx > 0 {
(ScrollDirection::Right, *dx)
} else {
(ScrollDirection::Left, -dx)
}
} else {
continue; // Both are 0, skip
};

let action = RecordedAction::new(
flow_like_types::create_id(),
ActionType::Scroll { direction, amount },
)
.with_coordinates(x, y);

state_guard.add_action(action.clone());
let _ = app_handle.emit("recording:action", &action);
}
CapturedEvent::KeyDown { key, modifiers } => {
tracing::debug!(
" KeyDown: key='{}', modifiers={:?}",
key, modifiers
);

let is_modifier = matches!(
key.as_str(),
"Shift"
| "Ctrl"
| "Alt"
| "Meta"
| "ShiftLeft"
| "ShiftRight"
| "ControlLeft"
| "ControlRight"
| "AltLeft"
| "AltRight"
| "MetaLeft"
| "MetaRight"
);

let is_special = matches!(
key.as_str(),
"Return"
| "Enter"
| "Tab"
| "Escape"
| "Backspace"
| "Delete"
| "Up"
| "Down"
| "Left"
| "Right"
| "Home"
| "End"
| "PageUp"
| "PageDown"
| "F1"
| "F2"
| "F3"
| "F4"
| "F5"
| "F6"
| "F7"
| "F8"
| "F9"
| "F10"
| "F11"
| "F12"
);

// Check for Copy (Ctrl+C / Cmd+C) or Paste (Ctrl+V / Cmd+V)
let has_cmd_or_ctrl = modifiers.contains(&KeyModifier::Control)
|| modifiers.contains(&KeyModifier::Meta);
let is_copy = has_cmd_or_ctrl && key.to_lowercase() == "c";
let is_paste = has_cmd_or_ctrl && key.to_lowercase() == "v";

tracing::debug!(
" KeyDown analysis: has_cmd_or_ctrl={}, is_copy={}, is_paste={}",
has_cmd_or_ctrl, is_copy, is_paste
);

// For Copy, defer clipboard reading until KeyUp (system processes copy after KeyDown)
if is_copy {
tracing::debug!(" Setting pending_copy_key to '{}'", key);
pending_copy_key = Some(key.clone());
continue;
}

// Record special keys (Enter, Tab, etc.) OR any key with modifiers (Ctrl+C, etc.)
// Skip pure modifier keys
if !is_modifier && (is_special || !modifiers.is_empty()) {
let mut state_guard = state.write().await;
// Flush any buffered keystrokes before adding the special key
if let Some(typed_action) = state_guard.flush_keystroke_buffer() {
let _ = app_handle.emit("recording:action", &typed_action);
}

let action = if is_paste {
// For paste, clipboard already has content - read immediately
let clipboard_content = Self::get_clipboard_text();
tracing::debug!(
" Paste detected, clipboard: {:?}",
clipboard_content.as_ref().map(|s| if s.len() > 50 {
format!("{}...", &s[..50])
} else {
s.clone()
})
);
RecordedAction::new(
flow_like_types::create_id(),
ActionType::Paste { clipboard_content },
)
} else {
// Normalize key name for the workflow
let normalized_key = match key.as_str() {
"Return" => "Enter".to_string(),
other => other.to_string(),
};

RecordedAction::new(
flow_like_types::create_id(),
ActionType::KeyPress {
key: normalized_key.clone(),
modifiers: modifiers.clone(),
},
)
};

state_guard.add_action(action.clone());
action_count += 1;
tracing::debug!(
" KeyPress action #{} added: {:?}",
action_count, action.action_type
);
let _ = app_handle.emit("recording:action", &action);
}
}
CapturedEvent::KeyUp { key } => {
tracing::debug!(
" KeyUp: key='{}', pending_copy_key={:?}",
key, pending_copy_key
);

// Handle deferred Copy detection - clipboard is now populated
let pending_matches = pending_copy_key.as_ref().map(|k| k.to_lowercase())
== Some(key.to_lowercase());
tracing::debug!(" KeyUp: pending_matches={}", pending_matches);

if pending_matches {
pending_copy_key = None;

// Retry clipboard read with increasing delay to handle OS clipboard latency
let mut clipboard_content = None;
for delay in [50, 100, 200] {
flow_like_types::tokio::time::sleep(
std::time::Duration::from_millis(delay),
)
.await;
clipboard_content = Self::get_clipboard_text();
if clipboard_content.is_some() {
break;
}
}
tracing::debug!(
" Copy detected (on KeyUp), clipboard: {:?}",
clipboard_content.as_ref().map(|s| if s.len() > 50 {
format!("{}...", &s[..50])
} else {
s.clone()
})
);

let mut state_guard = state.write().await;
if let Some(typed_action) = state_guard.flush_keystroke_buffer() {
let _ = app_handle.emit("recording:action", &typed_action);
}

let action = RecordedAction::new(
flow_like_types::create_id(),
ActionType::Copy { clipboard_content },
);

state_guard.add_action(action.clone());
action_count += 1;
tracing::debug!(" Copy action #{} added", action_count);
let _ = app_handle.emit("recording:action", &action);
}
}
CapturedEvent::Character { ch } => {
if ch.is_control() {
continue;
}
let mut state_guard = state.write().await;
state_guard.buffer_keystroke(*ch);
// Log every 10th character for debugging without spam
if state_guard.keystroke_buffer_len() % 10 == 1 {
tracing::debug!(
" Buffered char '{}', buffer len: {}",
ch,
state_guard.keystroke_buffer_len()
);
}
}
_ => {}
}

{
let mut state_guard = state.write().await;
if state_guard.should_flush_keystrokes()
&& let Some(typed_action) = state_guard.flush_keystroke_buffer()
{
let _ = app_handle.emit("recording:action", &typed_action);
}
}
}
tracing::debug!(" ========== PROCESSOR LOOP EXITED ==========");
tracing::debug!(" Total events processed: {}", processed_count);
tracing::debug!(" Total actions created: {}", action_count);

let state_guard = state.read().await;
if let Some(session) = &state_guard.session {
tracing::debug!(
" Session has {} actions at processor exit",
session.actions.len()
);
}
}
}


medium

The process_events function is very long and handles a lot of complex logic, making it difficult to read and maintain. To improve readability, consider refactoring the large match &event block. Each event type could be handled by its own helper function, for example:

async fn handle_mouse_up(state: &Arc<RwLock<RecordingStateInner>>, x: i32, y: i32, button: MouseButton) { /* ... */ }
async fn handle_key_down(state: &Arc<RwLock<RecordingStateInner>>, key: String, modifiers: Vec<KeyModifier>) { /* ... */ }

This would make the main event loop much cleaner and easier to follow.
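Pulling decision logic out into pure functions would also make it testable without a live event stream. For instance, the double-click check buried in the `MouseUp` arm could be extracted roughly as follows; this is a sketch where the tuple shape mirrors `last_completed_click` but the function name and the local `MouseButton` enum are invented for illustration:

```rust
use std::time::{Duration, Instant};

const DOUBLE_CLICK_THRESHOLD: Duration = Duration::from_millis(400);
const DOUBLE_CLICK_DISTANCE: i32 = 10; // pixels (Chebyshev distance)

#[derive(Clone, Copy, PartialEq, Eq)]
enum MouseButton { Left, Right, Middle }

/// Decide whether the click at (x, y) completes a double-click against
/// the previously completed click, using the same button / distance /
/// time rules as the recorder's event loop.
fn is_double_click(
    last: Option<&(i32, i32, MouseButton, Instant)>,
    x: i32,
    y: i32,
    button: MouseButton,
    now: Instant,
) -> bool {
    match last {
        Some((lx, ly, lb, lt)) => {
            let distance = (x - lx).abs().max((y - ly).abs());
            *lb == button
                && distance <= DOUBLE_CLICK_DISTANCE
                && now.duration_since(*lt) <= DOUBLE_CLICK_THRESHOLD
        }
        None => false,
    }
}
```

The event loop would then reduce to `if is_double_click(last_completed_click.as_ref(), x, y, button, click_time) { … }`, and the threshold/distance rules get direct unit tests instead of being exercised only through recorded sessions.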

Comment on lines +63 to +906
pub async fn generate_add_node_commands(
actions: &[RecordedAction],
start_position: (f64, f64),
state: &FlowLikeState,
options: Option<GeneratorOptions>,
) -> Result<Vec<GenericCommand>, TauriFunctionError> {
let opts = options.unwrap_or_default();
let registry = state.node_registry.read().await;
let mut commands = Vec::new();
let mut x_offset = start_position.0 as f32;
let mut y_offset = start_position.1 as f32;
let node_spacing = 300.0_f32;
let row_spacing = 400.0_f32;
let max_nodes_per_row: usize = 8;
let mut nodes_in_row: usize = 0;
let mut direction: f32 = 1.0; // 1.0 = right, -1.0 = left

let mut prev_exec_pin: Option<(String, String)> = None;
let mut session_node_id: Option<String> = None;
let mut session_out_pin_id: Option<String> = None;

// First, add a simple_event node as the trigger
let mut event_node = registry
.get_node("events_simple")
.map_err(|e| TauriFunctionError::new(&format!("events_simple node not found: {}", e)))?;
event_node.coordinates = Some((x_offset, y_offset, 0.0));

let add_event_cmd = AddNodeCommand::new(event_node);
let event_node_id = add_event_cmd.node.id.clone();
let event_exec_out = add_event_cmd
.node
.pins
.iter()
.find(|(_, p)| p.name == "exec_out" && p.pin_type == flow_like::flow::pin::PinType::Output)
.map(|(id, _)| id.clone());

commands.push(GenericCommand::AddNode(add_event_cmd));
prev_exec_pin = event_exec_out.map(|pin| (event_node_id.clone(), pin));

advance_layout(&mut x_offset, &mut y_offset, &mut nodes_in_row, &mut direction, node_spacing, row_spacing, max_nodes_per_row);

// Use the unified automation session that supports browser, desktop, and RPA
let mut session = registry.get_node("automation_start_session").map_err(|e| {
TauriFunctionError::new(&format!("automation_start_session node not found: {}", e))
})?;
session.coordinates = Some((x_offset, y_offset, 0.0));

// Create the AddNodeCommand which will generate new IDs for the node and pins
let add_session_cmd = AddNodeCommand::new(session.clone());

// Use the ACTUAL pin IDs from the created command, not the template
let actual_session_id = add_session_cmd.node.id.clone();
let actual_session_exec_in = add_session_cmd
.node
.pins
.iter()
.find(|(_, p)| p.name == "exec_in" && p.pin_type == flow_like::flow::pin::PinType::Input)
.map(|(id, _)| id.clone());
let actual_session_exec_out = add_session_cmd
.node
.pins
.iter()
.find(|(_, p)| p.name == "exec_out" && p.pin_type == flow_like::flow::pin::PinType::Output)
.map(|(id, _)| id.clone());
let actual_session_handle_out = add_session_cmd
.node
.pins
.iter()
.find(|(_, p)| {
p.friendly_name == "Session" && p.pin_type == flow_like::flow::pin::PinType::Output
})
.map(|(id, _)| id.clone());

commands.push(GenericCommand::AddNode(add_session_cmd));

// Connect event to session
if let (Some((prev_node, prev_pin)), Some(session_exec_in)) =
(&prev_exec_pin, &actual_session_exec_in)
{
commands.push(GenericCommand::ConnectPin(ConnectPinsCommand::new(
prev_node.clone(),
actual_session_id.clone(),
prev_pin.clone(),
session_exec_in.clone(),
)));
}

session_node_id = Some(actual_session_id.clone());
session_out_pin_id = actual_session_handle_out.clone();
prev_exec_pin = actual_session_exec_out.map(|pin| (actual_session_id.clone(), pin));

advance_layout(&mut x_offset, &mut y_offset, &mut nodes_in_row, &mut direction, node_spacing, row_spacing, max_nodes_per_row);

// Minimum delay threshold to insert a delay node (milliseconds)
const MIN_DELAY_THRESHOLD_MS: i64 = 500;
// Minimum delay to insert after Enter key (for page navigation)
const MIN_DELAY_AFTER_ENTER_MS: i64 = 300;
let mut last_timestamp: Option<chrono::DateTime<chrono::Utc>> = None;

// Track last Copy node's text output for connecting to subsequent Paste nodes
let mut last_copy_text_output: Option<(String, String)> = None; // (node_id, pin_id)

// Track if last action was an Enter key press (for adding delay before clicks)
let mut last_was_enter = false;

for action in actions {
// Calculate delay from previous action
let delay_ms = if let Some(prev_ts) = last_timestamp {
let diff = action.timestamp.signed_duration_since(prev_ts);
diff.num_milliseconds()
} else {
0
};
last_timestamp = Some(action.timestamp);

// Insert delay node if there was a significant pause
if delay_ms > MIN_DELAY_THRESHOLD_MS {
tracing::debug!(" Adding delay node: {}ms", delay_ms);

if let Ok(mut delay_node) = registry.get_node("delay") {
delay_node.coordinates = Some((x_offset, y_offset, 0.0));

// Set the delay duration (Float type, in milliseconds)
if let Some((_, pin)) = delay_node.pins.iter_mut().find(|(_, p)| p.name == "time")
&& let Ok(bytes) = to_vec(&json!(delay_ms as f64))
{
pin.default_value = Some(bytes);
}

let add_delay_cmd = AddNodeCommand::new(delay_node);
let delay_node_id = add_delay_cmd.node.id.clone();

let delay_exec_in = add_delay_cmd
.node
.pins
.iter()
.find(|(_, p)| {
p.name == "exec_in" && p.pin_type == flow_like::flow::pin::PinType::Input
})
.map(|(id, _)| id.clone());

let delay_exec_out = add_delay_cmd
.node
.pins
.iter()
.find(|(_, p)| {
p.name == "exec_out" && p.pin_type == flow_like::flow::pin::PinType::Output
})
.map(|(id, _)| id.clone());

commands.push(GenericCommand::AddNode(add_delay_cmd));

// Connect previous node to delay
if let (Some((prev_node, prev_pin)), Some(delay_in)) =
(&prev_exec_pin, &delay_exec_in)
{
commands.push(GenericCommand::ConnectPin(ConnectPinsCommand::new(
prev_node.clone(),
delay_node_id.clone(),
prev_pin.clone(),
delay_in.clone(),
)));
}

// Update prev_exec_pin to delay's output
if let Some(delay_out) = delay_exec_out {
prev_exec_pin = Some((delay_node_id, delay_out));
}

advance_layout(&mut x_offset, &mut y_offset, &mut nodes_in_row, &mut direction, node_spacing, row_spacing, max_nodes_per_row);
}
}
// If last action was Enter and this is a click, insert a minimum delay for page navigation
else if last_was_enter
&& matches!(
action.action_type,
ActionType::Click { .. } | ActionType::DoubleClick { .. }
)
{
tracing::debug!(
"Adding delay after Enter before click: {}ms",
MIN_DELAY_AFTER_ENTER_MS
);

if let Ok(mut delay_node) = registry.get_node("delay") {
delay_node.coordinates = Some((x_offset, y_offset, 0.0));

if let Some((_, pin)) = delay_node.pins.iter_mut().find(|(_, p)| p.name == "time")
&& let Ok(bytes) = to_vec(&json!(MIN_DELAY_AFTER_ENTER_MS as f64))
{
pin.default_value = Some(bytes);
}

let add_delay_cmd = AddNodeCommand::new(delay_node);
let delay_node_id = add_delay_cmd.node.id.clone();

let delay_exec_in = add_delay_cmd
.node
.pins
.iter()
.find(|(_, p)| {
p.name == "exec_in" && p.pin_type == flow_like::flow::pin::PinType::Input
})
.map(|(id, _)| id.clone());

let delay_exec_out = add_delay_cmd
.node
.pins
.iter()
.find(|(_, p)| {
p.name == "exec_out" && p.pin_type == flow_like::flow::pin::PinType::Output
})
.map(|(id, _)| id.clone());

commands.push(GenericCommand::AddNode(add_delay_cmd));

if let (Some((prev_node, prev_pin)), Some(delay_in)) =
(&prev_exec_pin, &delay_exec_in)
{
commands.push(GenericCommand::ConnectPin(ConnectPinsCommand::new(
prev_node.clone(),
delay_node_id.clone(),
prev_pin.clone(),
delay_in.clone(),
)));
}

if let Some(delay_out) = delay_exec_out {
prev_exec_pin = Some((delay_node_id, delay_out));
}

advance_layout(&mut x_offset, &mut y_offset, &mut nodes_in_row, &mut direction, node_spacing, row_spacing, max_nodes_per_row);
}
}

tracing::debug!(" Processing action: {:?}", action.action_type);

// Track helper nodes needed for pattern matching (path_from_upload_dir, child)
let mut helper_commands: Vec<GenericCommand> = Vec::new();
let mut template_path_node_id: Option<String> = None;
let mut template_path_out_pin_id: Option<String> = None;
// Track fingerprint node for connecting to click nodes
let mut fingerprint_node_id: Option<String> = None;
let mut fingerprint_out_pin_id: Option<String> = None;
let mut fingerprint_exec_in_pin_id: Option<String> = None;
let mut fingerprint_exec_out_pin_id: Option<String> = None;

// Generate fingerprint_create node before clicks if fingerprint data is available
let is_click = matches!(
&action.action_type,
ActionType::Click { .. } | ActionType::DoubleClick { .. }
);
if opts.use_fingerprints && is_click {
if let Some(fp) = &action.fingerprint {
if let Some(fp_cmds) = generate_fingerprint_node(
fp,
&registry,
x_offset,
y_offset - 180.0,
) {
fingerprint_node_id = Some(fp_cmds.node_id.clone());
fingerprint_out_pin_id = Some(fp_cmds.fingerprint_out_pin_id.clone());
fingerprint_exec_in_pin_id = fp_cmds.exec_in_pin_id;
fingerprint_exec_out_pin_id = fp_cmds.exec_out_pin_id;
helper_commands.extend(fp_cmds.commands);
}
}
}

let (node_name, extra_pins, _uses_rpa_session) = match &action.action_type {
ActionType::Click {
button,
modifiers: _,
} => {
let (x, y) = action.coordinates.unwrap_or((0, 0));
let button_str = match button {
MouseButton::Left => "Left",
MouseButton::Right => "Right",
MouseButton::Middle => "Middle",
};

// Use pattern matching if enabled and a screenshot is available
if opts.use_pattern_matching && action.screenshot_ref.is_some() {
let screenshot_id = action.screenshot_ref.as_ref().unwrap();

// Path is relative to upload_dir (from_upload_dir returns board_dir/upload)
let screenshot_path = match &opts.board_id {
Some(bid) => format!("rpa/{}/screenshots/{}.png", bid, screenshot_id),
None => format!("rpa/screenshots/{}.png", screenshot_id),
};

// Create path_from_upload_dir node (pure data node, no execution pins)
if let Ok(mut upload_dir_node) = registry.get_node("path_from_upload_dir") {
upload_dir_node.coordinates = Some((x_offset, y_offset - 150.0, 0.0));
let upload_dir_cmd = AddNodeCommand::new(upload_dir_node);
let upload_dir_id = upload_dir_cmd.node.id.clone();

let upload_path_out = upload_dir_cmd
.node
.pins
.iter()
.find(|(_, p)| {
p.name == "path"
&& p.pin_type == flow_like::flow::pin::PinType::Output
})
.map(|(id, _)| id.clone());

helper_commands.push(GenericCommand::AddNode(upload_dir_cmd));

// Create child node to append the screenshot path
if let (Ok(mut child_node), Some(upload_out)) =
(registry.get_node("child"), upload_path_out)
{
child_node.coordinates =
Some((x_offset + 180.0, y_offset - 150.0, 0.0));

// Set the child_name (screenshot relative path)
if let Some((_, pin)) = child_node
.pins
.iter_mut()
.find(|(_, p)| p.name == "child_name")
&& let Ok(bytes) = to_vec(&json!(screenshot_path))
{
pin.default_value = Some(bytes);
}

let child_cmd = AddNodeCommand::new(child_node);
let child_node_id = child_cmd.node.id.clone();

let child_path_in = child_cmd
.node
.pins
.iter()
.find(|(_, p)| {
p.name == "parent_path"
&& p.pin_type == flow_like::flow::pin::PinType::Input
})
.map(|(id, _)| id.clone());

let child_path_out = child_cmd
.node
.pins
.iter()
.find(|(_, p)| {
p.name == "path"
&& p.pin_type == flow_like::flow::pin::PinType::Output
})
.map(|(id, _)| id.clone());

helper_commands.push(GenericCommand::AddNode(child_cmd));

// Connect upload_dir output to child input
if let Some(child_in) = child_path_in {
helper_commands.push(GenericCommand::ConnectPin(
ConnectPinsCommand::new(
upload_dir_id,
child_node_id.clone(),
upload_out,
child_in,
),
));
}

// Store child node output for connecting to vision_click_template
template_path_node_id = Some(child_node_id);
template_path_out_pin_id = child_path_out;
}
}

(
"vision_click_template",
vec![
("confidence", json!(opts.template_confidence)),
("click_type", json!(button_str)),
("fallback_x", json!(x)),
("fallback_y", json!(y)),
],
false,
)
} else {
// Even without full pattern matching mode, pass the screenshot ref
// so users can enable template matching later if needed
let mut pins = vec![
("x", json!(x)),
("y", json!(y)),
("button", json!(button_str)),
];

if let Some(ref screenshot_id) = action.screenshot_ref {
let screenshot_path = match &opts.board_id {
Some(bid) => format!("rpa/{}/screenshots/{}.png", bid, screenshot_id),
None => format!("rpa/screenshots/{}.png", screenshot_id),
};
pins.push(("screenshot_ref", json!(screenshot_path)));
}

// Add natural movement for bot detection evasion
if opts.bot_detection_evasion {
let mut rng = flow_like_types::rand::rng();
pins.push(("natural_move", json!(true)));
pins.push(("move_duration_ms", json!(rng.random_range(150..350))));
}

("computer_mouse_click", pins, false)
}
}
ActionType::DoubleClick { button: _ } => {
let (x, y) = action.coordinates.unwrap_or((0, 0));
let mut pins = vec![("x", json!(x)), ("y", json!(y))];

if let Some(ref screenshot_id) = action.screenshot_ref {
let screenshot_path = match &opts.board_id {
Some(bid) => format!("rpa/{}/screenshots/{}.png", bid, screenshot_id),
None => format!("rpa/screenshots/{}.png", screenshot_id),
};
pins.push(("screenshot_ref", json!(screenshot_path)));
}

if opts.bot_detection_evasion {
let mut rng = flow_like_types::rand::rng();
pins.push(("natural_move", json!(true)));
pins.push(("move_duration_ms", json!(rng.random_range(150..350))));
}

("computer_mouse_double_click", pins, false)
}
ActionType::Drag { start, end } => (
"computer_mouse_drag",
vec![
("start_x", json!(start.0)),
("start_y", json!(start.1)),
("end_x", json!(end.0)),
("end_y", json!(end.1)),
],
false,
),
ActionType::Scroll { direction, amount } => {
// Skip scroll events with 0 amount
if *amount == 0 {
continue;
}

// Convert pixel delta to scroll lines (rdev gives pixels, enigo expects lines)
// Typical scroll line = ~40 pixels, but cap at reasonable values
let lines = ((*amount as f32) / 40.0).round().clamp(1.0, 20.0) as i32;
let (dx, dy) = match direction {
ScrollDirection::Down => (0, -lines),
ScrollDirection::Up => (0, lines),
ScrollDirection::Left => (-lines, 0),
ScrollDirection::Right => (lines, 0),
};
(
"computer_scroll",
vec![("dx", json!(dx)), ("dy", json!(dy))],
false,
)
}
ActionType::KeyType { text } => {
("computer_key_type", vec![("text", json!(text))], false)
}
ActionType::KeyPress { key, modifiers } => {
let modifier_str = modifiers
.iter()
.map(|m| match m {
KeyModifier::Shift => "shift",
KeyModifier::Control => "ctrl",
KeyModifier::Alt => "alt",
KeyModifier::Meta => "meta",
})
.collect::<Vec<_>>()
.join(",");
(
"computer_key_press",
vec![("key", json!(key)), ("modifiers", json!(modifier_str))],
false,
)
}
ActionType::AppLaunch {
app_name: _,
app_path,
} => (
"computer_launch_app",
vec![("path", json!(app_path))],
false,
),
ActionType::WindowFocus {
window_title: _,
process,
} => (
"computer_focus_window",
// Use process name (app name) for more reliable matching
// Window titles change with tab/page, but app names stay stable
vec![("window_title", json!(process))],
false,
),
ActionType::Copy {
clipboard_content: _,
} => {
// Copy reads from clipboard - we'll track its output to connect to Paste
("computer_clipboard_get_text", vec![], false)
}
ActionType::Paste { clipboard_content } => {
// For Paste, we write to clipboard
// If we have a previous Copy, we'll connect them; otherwise use captured content
let text = clipboard_content.clone().unwrap_or_default();
(
"computer_clipboard_set_text",
vec![("text", json!(text))],
false,
)
}
};

tracing::debug!(" Mapped to node: {}", node_name);
let mut node = match registry.get_node(node_name) {
Ok(n) => n,
Err(e) => {
tracing::warn!("Node {} not found ({}), skipping action", node_name, e);
continue;
}
};
node.coordinates = Some((x_offset, y_offset, 0.0));

// Annotate click nodes with fingerprint context for debugging
if is_click {
if let Some(fp) = &action.fingerprint {
let parts: Vec<String> = [
fp.role.as_ref().map(|r| format!("Role: {}", r)),
fp.name.as_ref().map(|n| format!("Name: {}", n)),
fp.text.as_ref().map(|t| format!("Text: {}", t)),
]
.into_iter()
.flatten()
.collect();
if !parts.is_empty() {
node.description = format!(
"{} | Target: [{}]",
node.description,
parts.join(", ")
);
}
}
}

for (pin_name, value) in &extra_pins {
if let Some((_, pin)) = node.pins.iter_mut().find(|(_, p)| p.name == *pin_name)
&& let Ok(bytes) = to_vec(value)
{
pin.default_value = Some(bytes);
}
}

// Create the AddNodeCommand which generates new IDs
let add_cmd = AddNodeCommand::new(node);
let new_node_id = add_cmd.node.id.clone();

// Extract pin IDs from the CREATED node with new IDs, not the template
let exec_in_pin = add_cmd
.node
.pins
.iter()
.find(|(_, p)| {
p.name == "exec_in" && p.pin_type == flow_like::flow::pin::PinType::Input
})
.map(|(id, _)| id.clone());

let exec_out_pin = add_cmd
.node
.pins
.iter()
.find(|(_, p)| {
p.name == "exec_out" && p.pin_type == flow_like::flow::pin::PinType::Output
})
.map(|(id, _)| id.clone());

let session_in_pin = add_cmd
.node
.pins
.iter()
.find(|(_, p)| {
(p.friendly_name == "Session" || p.friendly_name == "RPA Session")
&& p.pin_type == flow_like::flow::pin::PinType::Input
})
.map(|(id, _)| id.clone());

let new_session_out_pin = add_cmd
.node
.pins
.iter()
.find(|(_, p)| {
(p.friendly_name == "Session" || p.friendly_name == "RPA Session")
&& p.pin_type == flow_like::flow::pin::PinType::Output
})
.map(|(id, _)| id.clone());

// For vision_click_template, find the template pin to connect FlowPath
let template_in_pin = add_cmd
.node
.pins
.iter()
.find(|(_, p)| {
p.name == "template" && p.pin_type == flow_like::flow::pin::PinType::Input
})
.map(|(id, _)| id.clone());

// Extract text pins for Copy/Paste connection before add_cmd is moved
let text_output_pin = add_cmd
.node
.pins
.iter()
.find(|(_, p)| p.name == "text" && p.pin_type == flow_like::flow::pin::PinType::Output)
.map(|(id, _)| id.clone());
let text_input_pin = add_cmd
.node
.pins
.iter()
.find(|(_, p)| p.name == "text" && p.pin_type == flow_like::flow::pin::PinType::Input)
.map(|(id, _)| id.clone());

let fingerprint_in_pin = add_cmd
.node
.pins
.iter()
.find(|(_, p)| p.name == "fingerprint" && p.pin_type == flow_like::flow::pin::PinType::Input)
.map(|(id, _)| id.clone());

// Add helper nodes first (path_from_upload_dir, child) for pattern matching
commands.extend(helper_commands);

// Add the node command BEFORE trying to connect its pins
commands.push(GenericCommand::AddNode(add_cmd));

// Connect template path to vision_click_template if pattern matching
if let (Some(path_node), Some(path_out), Some(template_in)) = (
&template_path_node_id,
&template_path_out_pin_id,
&template_in_pin,
) {
commands.push(GenericCommand::ConnectPin(ConnectPinsCommand::new(
path_node.clone(),
new_node_id.clone(),
path_out.clone(),
template_in.clone(),
)));
}

// Wire fingerprint node into execution chain: prev → fingerprint → action node
if let (Some(fp_id), Some(fp_exec_in), Some(fp_exec_out)) =
(&fingerprint_node_id, &fingerprint_exec_in_pin_id, &fingerprint_exec_out_pin_id)
{
// Connect prev_exec → fingerprint.exec_in
if let Some((prev_node, prev_pin)) = &prev_exec_pin {
commands.push(GenericCommand::ConnectPin(ConnectPinsCommand::new(
prev_node.clone(),
fp_id.clone(),
prev_pin.clone(),
fp_exec_in.clone(),
)));
}
// Connect fingerprint.exec_out → action_node.exec_in
if let Some(curr_pin) = &exec_in_pin {
commands.push(GenericCommand::ConnectPin(ConnectPinsCommand::new(
fp_id.clone(),
new_node_id.clone(),
fp_exec_out.clone(),
curr_pin.clone(),
)));
}
// Connect fingerprint.fingerprint_out → action_node.fingerprint_in
if let (Some(fp_out), Some(fp_in)) = (&fingerprint_out_pin_id, &fingerprint_in_pin) {
commands.push(GenericCommand::ConnectPin(ConnectPinsCommand::new(
fp_id.clone(),
new_node_id.clone(),
fp_out.clone(),
fp_in.clone(),
)));
}
} else if let (Some((prev_node, prev_pin)), Some(curr_pin)) = (&prev_exec_pin, &exec_in_pin) {
// No fingerprint node — connect directly as before
commands.push(GenericCommand::ConnectPin(ConnectPinsCommand::new(
prev_node.clone(),
new_node_id.clone(),
prev_pin.clone(),
curr_pin.clone(),
)));
}

if let (Some(session_node), Some(session_pin), Some(curr_session_pin)) =
(&session_node_id, &session_out_pin_id, &session_in_pin)
{
commands.push(GenericCommand::ConnectPin(ConnectPinsCommand::new(
session_node.clone(),
new_node_id.clone(),
session_pin.clone(),
curr_session_pin.clone(),
)));
}

// Handle Copy/Paste node connections
if matches!(&action.action_type, ActionType::Copy { .. }) {
// Track Copy node's text output for later Paste connection
if let Some(pin_id) = text_output_pin {
last_copy_text_output = Some((new_node_id.clone(), pin_id));
}
}

if matches!(&action.action_type, ActionType::Paste { .. }) {
// Connect previous Copy's text output to this Paste's text input
if let Some((copy_node_id, copy_text_pin)) = &last_copy_text_output
&& let Some(paste_text_pin) = text_input_pin
{
commands.push(GenericCommand::ConnectPin(ConnectPinsCommand::new(
copy_node_id.clone(),
new_node_id.clone(),
copy_text_pin.clone(),
paste_text_pin,
)));
}
}

if let Some(exec_out) = exec_out_pin {
prev_exec_pin = Some((new_node_id.clone(), exec_out));
}

if let Some(new_session_out) = new_session_out_pin {
session_node_id = Some(new_node_id.clone());
session_out_pin_id = Some(new_session_out);
}

// Track if this was an Enter key press for next iteration
last_was_enter = matches!(
&action.action_type,
ActionType::KeyPress { key, .. } if key == "Enter" || key == "Return"
);

advance_layout(&mut x_offset, &mut y_offset, &mut nodes_in_row, &mut direction, node_spacing, row_spacing, max_nodes_per_row);
}

Ok(commands)
}

pub fn action_to_description(action: &RecordedAction) -> String {
match &action.action_type {
ActionType::Click { button, modifiers } => {
let coords = action
.coordinates
.map(|(x, y)| format!(" at ({}, {})", x, y))
.unwrap_or_default();
let mods = if modifiers.is_empty() {
String::new()
} else {
format!(" with {:?}", modifiers)
};
format!("{:?} click{}{}", button, coords, mods)
}
ActionType::DoubleClick { button } => {
let coords = action
.coordinates
.map(|(x, y)| format!(" at ({}, {})", x, y))
.unwrap_or_default();
format!("{:?} double-click{}", button, coords)
}
ActionType::Drag { start, end } => {
format!(
"Drag from ({}, {}) to ({}, {})",
start.0, start.1, end.0, end.1
)
}
ActionType::Scroll { direction, amount } => {
format!("Scroll {:?} by {}", direction, amount)
}
ActionType::KeyType { text } => {
// Truncate on a char boundary so multi-byte text cannot panic the slice
let preview = if text.chars().count() > 20 {
let truncated: String = text.chars().take(20).collect();
format!("{}...", truncated)
} else {
text.clone()
};
format!("Type \"{}\"", preview)
}
ActionType::KeyPress { key, modifiers } => {
if modifiers.is_empty() {
format!("Press {}", key)
} else {
format!("Press {:?}+{}", modifiers, key)
}
}
ActionType::AppLaunch { app_name, .. } => {
format!("Launch {}", app_name)
}
ActionType::WindowFocus { window_title, .. } => {
format!("Focus window \"{}\"", window_title)
}
ActionType::Copy { clipboard_content } => {
let preview = clipboard_content
.as_ref()
.map(|s| {
// Char-boundary-safe truncation for multi-byte clipboard content
if s.chars().count() > 20 {
let truncated: String = s.chars().take(20).collect();
format!("\"{}...\"", truncated)
} else {
format!("\"{}\"", s)
}
})
.unwrap_or_else(|| "(empty)".to_string());
format!("Copy {}", preview)
}
ActionType::Paste { clipboard_content } => {
let preview = clipboard_content
.as_ref()
.map(|s| {
// Char-boundary-safe truncation for multi-byte clipboard content
if s.chars().count() > 20 {
let truncated: String = s.chars().take(20).collect();
format!("\"{}...\"", truncated)
} else {
format!("\"{}\"", s)
}
})
.unwrap_or_else(|| "(empty)".to_string());
format!("Paste {}", preview)
}
}
}

struct FingerprintNodeResult {
node_id: String,
fingerprint_out_pin_id: String,
exec_in_pin_id: Option<String>,
exec_out_pin_id: Option<String>,
commands: Vec<GenericCommand>,
}

fn generate_fingerprint_node(
fp: &RecordedFingerprint,
registry: &flow_like::state::FlowNodeRegistry,
x: f32,
y: f32,
) -> Option<FingerprintNodeResult> {
let mut node = registry.get_node("fingerprint_create").ok()?;
node.coordinates = Some((x, y, 0.0));

// Set the fingerprint ID
if let Some((_, pin)) = node.pins.iter_mut().find(|(_, p)| p.name == "id")

Review comment (severity: medium):

The `generate_add_node_commands` function is very large and complex, which makes it difficult to read and maintain. Consider refactoring the main `for action in actions` loop: introduce a helper function per action type, such as `generate_nodes_for_click_action`, that encapsulates creating the main node and any helper nodes (e.g., for template paths or fingerprints). The main function would then iterate over the actions and delegate to these helpers, simplifying the overall structure.
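A minimal sketch of what such a delegation could look like, using simplified stand-in types. The helper names (`generate_nodes_for_click_action`, `NodeSpec`, `spec_for`) and the two-variant `ActionType` are illustrative assumptions, not code from this PR:

```rust
// Illustrative sketch: each ActionType variant gets a small helper that
// returns the node name and pin defaults, so the main loop only dispatches
// and wires execution pins. All names here are hypothetical.

#[derive(Debug)]
enum ActionType {
    Click { x: i32, y: i32 },
    KeyType { text: String },
}

#[derive(Debug)]
struct NodeSpec {
    node_name: &'static str,
    pins: Vec<(&'static str, String)>,
}

// One focused helper per action type, instead of one giant match arm.
fn generate_nodes_for_click_action(x: i32, y: i32) -> NodeSpec {
    NodeSpec {
        node_name: "computer_mouse_click",
        pins: vec![("x", x.to_string()), ("y", y.to_string())],
    }
}

fn generate_nodes_for_key_type_action(text: &str) -> NodeSpec {
    NodeSpec {
        node_name: "computer_key_type",
        pins: vec![("text", text.to_string())],
    }
}

// The main loop shrinks to a dispatch over the action type.
fn spec_for(action: &ActionType) -> NodeSpec {
    match action {
        ActionType::Click { x, y } => generate_nodes_for_click_action(*x, *y),
        ActionType::KeyType { text } => generate_nodes_for_key_type_action(text),
    }
}

fn main() {
    let actions = vec![
        ActionType::Click { x: 10, y: 20 },
        ActionType::KeyType { text: "hello".into() },
    ];
    let specs: Vec<NodeSpec> = actions.iter().map(spec_for).collect();
    assert_eq!(specs[0].node_name, "computer_mouse_click");
    assert_eq!(specs[1].pins[0].1, "hello");
}
```

In the real function each helper would also return the `GenericCommand`s and pin IDs it produced, keeping the fingerprint and template-path wiring local to the click helper.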


Successfully merging this pull request may close these issues.

Computer Use + Browser Use + RPA Nodes (Workflow Recording → Generated Flow)
