39 changes: 39 additions & 0 deletions docs/how/kafka-config.md
@@ -70,6 +70,45 @@ How Metadata Events relate to these topics is discussed at more length in [Metad

We've included environment variables to customize the name of each of these topics, for cases where your organization has naming rules for topics.

### Handling Large Kafka Messages

When ingesting large metadata records, such as those from Snowflake, you may encounter `org.apache.kafka.common.errors.RecordTooLargeException`. This error occurs when the message size exceeds the configured limits in Kafka.

#### Explanation of the Error

The `RecordTooLargeException` indicates that a message sent to Kafka is larger than the configured maximum size; the producer-side limit (`max.request.size`) defaults to 1 MB (1048576 bytes). Large metadata records commonly exceed this default.
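As a rough illustration (the 1 MB figure is Kafka's producer default for `max.request.size`; the payload is synthetic, and per-record protocol overhead is ignored), the failure condition can be sketched as:

```python
# Sketch: why a large serialized record trips the producer limit.
# 1 MB (1048576 bytes) is the Kafka producer default for max.request.size.
DEFAULT_MAX_REQUEST_SIZE = 1_048_576

def would_be_rejected(serialized_record: bytes, limit: int = DEFAULT_MAX_REQUEST_SIZE) -> bool:
    """True if a producer with this limit would raise RecordTooLargeException."""
    return len(serialized_record) > limit

# A synthetic 2 MB metadata record exceeds the default limit...
big_record = b"x" * 2_000_000
print(would_be_rejected(big_record))                      # True
# ...but passes once the limit is raised to 10 MB.
print(would_be_rejected(big_record, limit=10_485_760))    # False
```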

#### Step-by-Step Resolution Guide

1. **Increase Kafka Configuration Limits**:
- Update the `max.request.size` for the Kafka producer by setting the `SPRING_KAFKA_PRODUCER_PROPERTIES_MAX_REQUEST_SIZE` environment variable.
- Update the `max.partition.fetch.bytes` for the Kafka consumer by setting the `SPRING_KAFKA_CONSUMER_PROPERTIES_MAX_PARTITION_FETCH_BYTES` environment variable.

2. **Update Kafka Topic Configuration**:
- Set the topic-level `max.message.bytes` configuration (a broker-side, per-topic setting) so the broker accepts larger messages.

3. **Helm Chart Configuration**:
- For Helm deployments, set these configurations in the `values.yaml` file:
     ```yaml
     kafka:
       maxMessageBytes: "10485760" # 10MB
       producer:
         maxRequestSize: "10485760" # 10MB
       consumer:
         maxPartitionFetchBytes: "10485760" # 10MB
     ```

4. **Compression**:
- Enable producer compression by setting `KAFKA_PRODUCER_COMPRESSION_TYPE` to `snappy` or another codec Kafka supports (`gzip`, `lz4`, `zstd`), reducing message size on the wire.

5. **Check for Updates**:
- Ensure you are using the latest version of DataHub for potential improvements or fixes.
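For docker-compose deployments, the environment variables from steps 1 and 4 can be set on the container directly; a minimal sketch, assuming the service is named `datahub-gms` and using the same illustrative 10 MB value as the Helm example above:

```yaml
# docker-compose override sketch -- service name and values are illustrative
services:
  datahub-gms:
    environment:
      - SPRING_KAFKA_PRODUCER_PROPERTIES_MAX_REQUEST_SIZE=10485760          # producer max.request.size
      - SPRING_KAFKA_CONSUMER_PROPERTIES_MAX_PARTITION_FETCH_BYTES=10485760 # consumer max.partition.fetch.bytes
      - KAFKA_PRODUCER_COMPRESSION_TYPE=snappy                              # step 4: enable compression
```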

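The topic-level change from step 2 can also be applied with the standard Kafka CLI tools; a sketch, assuming a reachable broker at `localhost:9092` and using `MetadataChangeProposal_v1` as an illustrative topic name:

```shell
# Raise the per-topic message limit to 10 MB (topic name is illustrative)
kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name MetadataChangeProposal_v1 \
  --add-config max.message.bytes=10485760

# Verify the override took effect
kafka-configs.sh --bootstrap-server localhost:9092 \
  --describe --entity-type topics --entity-name MetadataChangeProposal_v1
```

Repeat for each DataHub topic that carries large records.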
#### Sources and Further Reading

- [Slack discussion on Kafka configuration](https://datahubspace.slack.com/archives/C029A3M079U/p1701877527.416859)
- [DataHub environment variables documentation](https://github.com/datahub-project/datahub/blob/master/docs/deploy/environment-vars.md)

### Metadata Service (datahub-gms)

The following are environment variables you can use to configure topic names used in the Metadata Service container: