Merged
4 changes: 2 additions & 2 deletions .github/copilot-instructions.md
@@ -9,8 +9,8 @@ The architecture establishes clear layers with controlled dependencies:
│ CLI Commands │
│ (validate, run, etc.) │
├─────────────────────────────────────────────────────────────┤
│ Application Facade
(QTypeFacade + Services) │
Application
(Services)
├─────────────────────────────────────────────────────────────┤
│ Interpreter │
│ (execution engine) │
4 changes: 0 additions & 4 deletions common/aws.bedrock.models.qtype.yaml
@@ -142,10 +142,6 @@
provider: aws-bedrock
- id: amazon.titan-embed-g1-text-02
provider: aws-bedrock
- id: amazon.titan-text-express-v1:0:8k
provider: aws-bedrock
- id: amazon.titan-text-express-v1
provider: aws-bedrock
- id: amazon.titan-embed-text-v1:2:8k
provider: aws-bedrock
- id: amazon.titan-embed-text-v1
112 changes: 45 additions & 67 deletions common/tools.qtype.yaml
@@ -11,124 +11,102 @@ tools:
function_name: base64_decode
id: qtype.application.commons.tools.base64_decode
inputs:
data:
optional: false
type: text
- id: data
type: text
module_path: qtype.application.commons.tools
name: base64_decode
outputs:
result:
optional: false
type: file
- id: base64_decode_result
type: file
type: PythonFunctionTool
- description: Encode bytes to a Base64 string.
function_name: base64_encode
id: qtype.application.commons.tools.base64_encode
inputs:
data:
optional: false
type: file
- id: data
type: file
module_path: qtype.application.commons.tools
name: base64_encode
outputs:
result:
optional: false
type: text
- id: base64_encode_result
type: text
type: PythonFunctionTool
- description: Calculate the difference between two timestamps.
function_name: calculate_time_difference
id: qtype.application.commons.tools.calculate_time_difference
inputs:
end_time:
optional: false
type: datetime
start_time:
optional: false
type: datetime
- id: start_time
type: datetime
- id: end_time
type: datetime
module_path: qtype.application.commons.tools
name: calculate_time_difference
outputs:
result:
optional: false
type: TimeDifferenceResultType
- id: calculate_time_difference_result
type: TimeDifferenceResultType
type: PythonFunctionTool
- description: Format a timestamp using a custom format string that can be
passed to strftime.
function_name: format_datetime
id: qtype.application.commons.tools.format_datetime
inputs:
format_string:
optional: false
type: text
timestamp:
optional: false
type: datetime
- id: timestamp
type: datetime
- id: format_string
type: text
module_path: qtype.application.commons.tools
name: format_datetime
outputs:
result:
optional: false
type: text
- id: format_datetime_result
type: text
type: PythonFunctionTool
- description: Get the current UTC timestamp.
function_name: get_current_timestamp
id: qtype.application.commons.tools.get_current_timestamp
inputs: {}
inputs: []
module_path: qtype.application.commons.tools
name: get_current_timestamp
outputs:
result:
optional: false
type: datetime
- id: get_current_timestamp_result
type: datetime
type: PythonFunctionTool
- description: Parse a human-readable duration string into seconds.
function_name: parse_duration_string
id: qtype.application.commons.tools.parse_duration_string
inputs:
duration:
optional: false
type: text
- id: duration
type: text
module_path: qtype.application.commons.tools
name: parse_duration_string
outputs:
result:
optional: false
type: int
- id: parse_duration_string_result
type: int
type: PythonFunctionTool
- description: Add a specified amount of time to a given timestamp.
function_name: timedelta
id: qtype.application.commons.tools.timedelta
inputs:
days:
optional: true
type: int
hours:
optional: true
type: int
microseconds:
optional: true
type: int
milliseconds:
optional: true
type: int
minutes:
optional: true
type: int
seconds:
optional: true
type: int
timestamp:
optional: false
type: datetime
weeks:
optional: true
type: int
- id: timestamp
type: datetime
- id: days
type: int?
- id: seconds
type: int?
- id: microseconds
type: int?
- id: milliseconds
type: int?
- id: minutes
type: int?
- id: hours
type: int?
- id: weeks
type: int?
module_path: qtype.application.commons.tools
name: timedelta
outputs:
result:
optional: false
type: datetime
- id: timedelta_result
type: datetime
type: PythonFunctionTool
types:
- description: Custom type for TimeDifferenceResultType
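The `tools.qtype.yaml` changes above move each tool's `inputs` and `outputs` from a name-keyed mapping with an explicit `optional` flag to a list of `{id, type}` entries, where optionality is carried by a `?` suffix on the type and the single `result` output is renamed `<function_name>_result`. The following is a minimal Python sketch of that mapping, assuming the YAML has already been loaded into plain dicts; `migrate_params` and `migrate_outputs` are hypothetical helpers, not part of QType. The sketch keeps the mapping's iteration order, whereas the committed file appears to list inputs in the function's parameter order (for example, `timestamp` first for `timedelta`).

```python
from typing import Any


def migrate_params(params: dict[str, dict[str, Any]]) -> list[dict[str, str]]:
    """Convert a legacy name -> {type, optional} mapping into a list of {id, type} entries."""
    migrated = []
    for name, spec in params.items():
        type_name = spec["type"]
        # Optional parameters are expressed with the '?' shorthand in the new format.
        if spec.get("optional", False):
            type_name += "?"
        migrated.append({"id": name, "type": type_name})
    return migrated


def migrate_outputs(function_name: str, outputs: dict[str, dict[str, Any]]) -> list[dict[str, str]]:
    """Migrate outputs and rename the legacy 'result' output to '<function_name>_result'."""
    migrated = migrate_params(outputs)
    for entry in migrated:
        if entry["id"] == "result":
            entry["id"] = f"{function_name}_result"
    return migrated


old_inputs = {
    "timestamp": {"optional": False, "type": "datetime"},
    "days": {"optional": True, "type": "int"},
}
print(migrate_params(old_inputs))
# [{'id': 'timestamp', 'type': 'datetime'}, {'id': 'days', 'type': 'int?'}]
print(migrate_outputs("timedelta", {"result": {"optional": False, "type": "datetime"}}))
# [{'id': 'timedelta_result', 'type': 'datetime'}]
```

An empty legacy mapping (`inputs: {}`) becomes an empty list, which matches the `inputs: []` now used for `get_current_timestamp`.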
@@ -10,19 +10,34 @@ qtype run app.qtype.yaml --input-file inputs.csv

### Supported File Formats

- **CSV**: Columns map to input variable names
- **JSON**: Array of objects or records format
- **CSV**: Columns map to input variable names (best for primitive types)
- **JSON**: Array of objects or records format (best for nested/complex types)
- **Parquet**: Efficient columnar format for large datasets
- **Excel**: `.xlsx` or `.xls` files

### How It Works

When you provide `--input-file`, QType:
1. Reads the file into a pandas DataFrame
2. Each row becomes one execution of the flow
3. Column names must match flow input variable IDs
4. Processes rows with configured concurrency
5. Returns results as a DataFrame (can be saved with `--output`)
2. Automatically converts data to match input variable types
3. Each row becomes one execution of the flow
4. Column names must match flow input variable IDs
5. Processes rows with configured concurrency
6. Returns results as a DataFrame (can be saved with `--output`)
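The row-per-execution model described in the steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not QType's implementation: rows are plain dicts (the shape pandas' `DataFrame.to_dict("records")` produces), `run_flow` is a hypothetical stand-in for executing the flow, and bounded concurrency is modeled with a thread pool. Raising on missing columns is likewise an assumed behavior.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_batch(
    rows: list[dict[str, Any]],
    input_ids: list[str],
    run_flow: Callable[[dict[str, Any]], dict[str, Any]],
    max_workers: int = 4,
) -> list[dict[str, Any]]:
    """Execute `run_flow` once per row, with at most `max_workers` rows in flight."""
    # Every flow input ID must be present as a column in every row.
    for index, row in enumerate(rows):
        missing = [name for name in input_ids if name not in row]
        if missing:
            raise ValueError(f"row {index} is missing flow inputs: {missing}")
    # Pass only the columns that correspond to flow inputs.
    flow_inputs = [{name: row[name] for name in input_ids} for row in rows]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so result i belongs to row i.
        return list(pool.map(run_flow, flow_inputs))


def greet_flow(inputs: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical stand-in for a real flow execution.
    return {"greeting": f"Hello, {inputs['name']}!"}


results = run_batch([{"name": "Alice"}, {"name": "Bob"}], ["name"], greet_flow, max_workers=2)
print(results)  # [{'greeting': 'Hello, Alice!'}, {'greeting': 'Hello, Bob!'}]
```

Collecting results in input order keeps each output row aligned with its input row, which is what makes it straightforward to save the combined results as a single table.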

### Type Conversion

QType automatically converts file data to match your flow's input types:

- **Primitive types** (`int`, `float`, `bool`, `text`): Converted from file values
- **Custom types**: Validated and instantiated from dict/object columns (use JSON format)
- **Domain types**: Built-in types like `ChatMessage` or `SearchResult` (use JSON format)

**Format Selection Guide:**

- Use **CSV** for simple data with primitive types (strings, numbers, booleans)
- Use **JSON** for complex data with custom types, nested objects, or domain types
- Use **Parquet** for large datasets with mixed types and efficient storage
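Because CSV delivers every value as a string, primitive conversion has to happen per declared type. A minimal Python sketch, assuming a hypothetical `CONVERTERS` table keyed by QType's primitive type names; the accepted boolean spellings are an assumption, not documented QType behavior.

```python
import csv
import io
from typing import Any, Callable


def to_bool(value: str) -> bool:
    """Parse common textual booleans; plain bool("false") would be True in Python."""
    normalized = value.strip().lower()
    if normalized in {"true", "1", "yes"}:
        return True
    if normalized in {"false", "0", "no"}:
        return False
    raise ValueError(f"cannot interpret {value!r} as bool")


# Hypothetical converters from primitive type names to Python values.
CONVERTERS: dict[str, Callable[[str], Any]] = {
    "text": str,
    "int": int,
    "float": float,
    "bool": to_bool,
}


def convert_row(row: dict[str, str], types: dict[str, str]) -> dict[str, Any]:
    """Convert the CSV string values named in `types` to their declared primitive types."""
    return {name: CONVERTERS[type_name](row[name]) for name, type_name in types.items()}


csv_text = "name,age,score,active\nAlice,30,95.5,true\nBob,25,87.0,false\n"
input_types = {"name": "text", "age": "int", "score": "float", "active": "bool"}
rows = [convert_row(row, input_types) for row in csv.DictReader(io.StringIO(csv_text))]
print(rows)
# [{'name': 'Alice', 'age': 30, 'score': 95.5, 'active': True}, {'name': 'Bob', 'age': 25, 'score': 87.0, 'active': False}]
```

The explicit `to_bool` is the main reason a lookup table is used instead of calling the type directly: Python treats any non-empty string as truthy.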

## Complete Example

28 changes: 27 additions & 1 deletion docs/How To/Data Processing/read_data_from_files.md
@@ -1,6 +1,6 @@
# Read Data from Files

Load structured data from files using FileSource, which supports CSV, JSON, JSONL, and Parquet formats with automatic format detection based on file extension.
Load structured data from files using FileSource, which supports CSV, JSON, JSONL, and Parquet formats with automatic format detection and type conversion.

### QType YAML

@@ -20,8 +20,34 @@ steps:
- **path**: File path (relative to YAML file or absolute), supports local files and cloud storage (s3://, gs://, etc.)
- **outputs**: Column names from the file to extract as variables (must match actual column names)
- **Format detection**: Automatically determined by file extension (.csv, .json, .jsonl, .parquet)
- **Type conversion**: Automatically converts data to match variable types (primitives, domain types, custom types)
- **Streaming**: Emits one FlowMessage per row, enabling downstream steps to process data in parallel

### Automatic Type Conversion

FileSource automatically converts data from files to match your variable types:

- **Primitive types** (`int`, `float`, `bool`, `text`): Direct conversion from file data
- **Domain types** (`ChatMessage`, `SearchResult`, etc.): Validated from dict/object columns
- **Custom types**: Your defined types are validated and instantiated from dict/object columns

**Format Recommendations:**

- **CSV**: Best for simple primitive types (strings, numbers, booleans)
- **JSON/JSONL**: Recommended for nested objects, custom types, and domain types
- **Parquet**: Best for large datasets with mixed types and efficient storage

**Example with Custom Types (JSON format):**

```json
[
{"person": {"name": "Alice", "age": 30}, "score": 95},
{"person": {"name": "Bob", "age": 25}, "score": 87}
]
```

JSON preserves nested objects, making it ideal for complex types. CSV stores every value as a string, so nested objects must be serialized as JSON strings within a CSV cell.
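To make the JSON-versus-CSV difference concrete, here is a minimal Python sketch that builds the same custom type from both representations of the example data. `Person` and `person_from_value` are hypothetical, and a stdlib dataclass with explicit `str()`/`int()` coercion stands in for whatever validation FileSource performs internally.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical custom type matching the "person" objects in the example.
    name: str
    age: int


def person_from_value(value: object) -> Person:
    """Build a Person from a dict (JSON input) or a JSON-encoded string (CSV input)."""
    if isinstance(value, str):
        value = json.loads(value)  # CSV cells hold nested objects as JSON text
    if not isinstance(value, dict):
        raise TypeError(f"expected an object, got {type(value).__name__}")
    return Person(name=str(value["name"]), age=int(value["age"]))


json_records = json.loads(
    '[{"person": {"name": "Alice", "age": 30}, "score": 95},'
    ' {"person": {"name": "Bob", "age": 25}, "score": 87}]'
)
from_json = [person_from_value(r["person"]) for r in json_records]

# The same first record as CSV: the nested object is quoted JSON inside one cell.
csv_text = 'person,score\n"{""name"": ""Alice"", ""age"": 30}",95\n'
csv_records = list(csv.DictReader(io.StringIO(csv_text)))
from_csv = [person_from_value(r["person"]) for r in csv_records]

print(from_json[0] == from_csv[0])  # True
print(json_records[0]["score"], repr(csv_records[0]["score"]))  # 95 '95'
```

Note that `score` arrives as an `int` from JSON but as the string `'95'` from CSV, which is why CSV input depends on type conversion even for primitives.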

## Complete Example

```yaml
41 changes: 41 additions & 0 deletions docs/How To/Language Features/use_optional_variables.md
@@ -0,0 +1,41 @@
# Use Optional Variables

Mark variables as optional to handle cases where data may be missing or unset, allowing your flow to continue gracefully instead of failing.

### QType YAML

```yaml
variables:
- id: email
type: text? # Optional text variable
```

### Explanation

- **`?` suffix**: Shorthand syntax to mark a variable as optional
- **Optional variables**: Can be `None` or set to a value
- **FieldExtractor**: Returns `None` for an optional output variable when the JSONPath expression finds no match, instead of raising an error. If the variable is not optional, FieldExtractor raises an error in that case.
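The `?` suffix and the FieldExtractor behavior described above can be illustrated with a small Python sketch. It substitutes a simple key-path lookup for real JSONPath evaluation, and `parse_type` and `extract_field` are hypothetical helpers rather than QType functions; the two calls mirror the `qtype run` examples in this guide.

```python
from typing import Any


def parse_type(type_spec: str) -> tuple[str, bool]:
    """Split a type string such as 'text?' into ('text', True)."""
    if type_spec.endswith("?"):
        return type_spec[:-1], True
    return type_spec, False


def extract_field(data: Any, path: list[str], type_spec: str) -> Any:
    """Follow a key path through nested dicts.

    Returns None when the path does not resolve and the type is optional;
    raises KeyError when it does not resolve and the type is required.
    """
    _, optional = parse_type(type_spec)
    current = data
    for key in path:
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif optional:
            return None
        else:
            raise KeyError(f"no match for path {'.'.join(path)!r}")
    return current


path = ["user_profile", "email"]
print(extract_field({"user_profile": {"email": "hello@domain.com"}}, path, "text?"))  # hello@domain.com
print(extract_field({"user_profile": "just text"}, path, "text?"))  # None
```

Calling `extract_field({"user_profile": "just text"}, path, "text")` with the non-optional type raises `KeyError`, corresponding to the error a required variable produces.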

## Complete Example

```yaml
--8<-- "../examples/language_features/optional_variables.qtype.yaml"
```

**Run it:**
```bash
# When email field exists
qtype run examples/language_features/optional_variables.qtype.yaml -i '{"user_profile": {"email":"hello@domain.com"}}'
# Results:
# email: hello@domain.com

# When email field is missing
qtype run examples/language_features/optional_variables.qtype.yaml -i '{"user_profile": "just text"}'
# Results:
# email: None
```

## See Also

- [Variable Reference](../../components/Variable.md)
- [FieldExtractor Reference](../../components/FieldExtractor.md)
@@ -31,20 +31,17 @@ tools:
method: GET
endpoint: /api/v3/pet/{petId}
auth: swagger-petstore---openapi-30_api_key_api_key
parameters:
petId:
inputs:
- id: petId
type: int
optional: false
outputs:
id:
type: int
optional: true
name:
type: text
optional: false
status:
- id: id
type: int?
- id: name
type: text
optional: true
- id: status
type: text?
parameters: []
```

### Explanation
@@ -42,16 +42,13 @@ tools:
module_path: myapp.utils
name: calculate_age
inputs:
birth_date:
- id: birth_date
type: datetime
optional: false
reference_date:
- id: reference_date
type: datetime
optional: false
outputs:
result:
- id: calculate_age_result
type: int
optional: false
```

### Explanation
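The tool definition in this hunk points at `myapp.utils.calculate_age` with two `datetime` inputs and one `int` output, now named `calculate_age_result` following the same `<function_name>_result` pattern as `tools.qtype.yaml`. The diff shows only the YAML, so the function body below is a hypothetical illustration; it assumes that input ids map to Python parameters by name.

```python
from datetime import datetime


def calculate_age(birth_date: datetime, reference_date: datetime) -> int:
    """Return the number of full years between birth_date and reference_date."""
    years = reference_date.year - birth_date.year
    # Subtract one if the birthday has not yet occurred in the reference year.
    if (reference_date.month, reference_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


print(calculate_age(datetime(1990, 6, 15), datetime(2024, 6, 14)))  # 33
print(calculate_age(datetime(1990, 6, 15), datetime(2024, 6, 15)))  # 34
```

Comparing `(month, day)` tuples avoids computing an age from a day count, which would drift because of leap years.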
2 changes: 1 addition & 1 deletion docs/Tutorials/03-structured-data.md
@@ -478,4 +478,4 @@ A: Not directly. Decoder maps JSON fields to individual outputs. If you need the
A: Use **Decoder** when you have a JSON/XML string to parse. Use **FieldExtractor** when you already have structured data and need to extract specific fields using JSONPath (covered in advanced tutorials).

**Q: Can I make properties optional?**
A: Currently all properties are required. For optional fields, you can define them in your flow logic but not include them in the Construct step.
A: Yes! Mark variables as optional using the `?` suffix (e.g., `type: text?`). Optional variables can be unset, `None`, or have a value. This is useful when extracting fields that may not always be present. See [Use Optional Variables](../How%20To/Language%20Features/use_optional_variables.md).