Pulsar Functions Leak Database Credentials to Logs #9462

@alexanderursu99

Is your enhancement request related to a problem? Please describe.
The function worker logs reveal database passwords in plain text.

21:51:55.889 [main] INFO org.apache.pulsar.functions.worker.FunctionAssignmentTailer - Received assignment update: instance {
  functionMetaData {
    functionDetails {
      tenant: "market_data"
      namespace: "deribit"
      name: "skew_iv_v2"
      className: "org.apache.pulsar.functions.api.utils.IdentityFunction"
      processingGuarantees: EFFECTIVELY_ONCE
      autoAck: true
      parallelism: 1
      source {
        subscriptionType: FAILOVER
        typeClassName: "org.apache.pulsar.client.api.schema.GenericRecord"
        subscriptionName: "clickhouse-sink"
        inputSpecs {
          key: "market_data/deribit/skew_iv_v2"
          value {
          }
        }
        cleanupSubscription: true
      }
      sink {
        className: "org.apache.pulsar.io.jdbc.ClickHouseJdbcAutoSchemaSink"
        configs: "{\"userName\":\"USERNAME\",\"password\":\"PASSWORD\",\"jdbcUrl\":\"jdbc:clickhouse://ADDRESS:PORT/DATABASE\",\"tableName\":\"TABLE_NAME\",\"timeoutMs\":60000.0,\"batchSize\":100000.0}"
        typeClassName: "org.apache.pulsar.client.api.schema.GenericRecord"
        builtin: "jdbc-clickhouse"
      }
      resources {
        cpu: 1.0
        ram: 1073741824
        disk: 10737418240
      }
      componentType: SINK
    }
    packageLocation {
      packagePath: "market_data/deribit/skew_iv_v2/08b8f87d-6db5-4907-a85a-4f1fac9c5d5d-pulsar-io-jdbc-clickhouse-2.6.1.nar"
      originalFileName: "pulsar-io-jdbc-clickhouse-2.6.1.nar"
    }
    version: 7
    createTime: 1612193701790
    instanceStates {
      key: 0
      value: RUNNING
    }
    functionAuthSpec {
      data: "viv6c"
    }
  }
  instanceId: -1
}
workerId: "pulsar-function-0"

This is an example snippet of the logs I see. I've replaced some values with all-caps placeholders, such as USERNAME and PASSWORD. Most of the sensitive information appears under configs in the sink block.

Note that this example comes from the ClickHouse JDBC sink.

Describe the solution you'd like
Database credentials (at least) should not be exposed in the logs, or there should be an option to disable this logging. I can already restrict who can administer the cluster, but I cannot as easily restrict which logs someone can see in our log monitoring solution.
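
To illustrate the kind of masking I have in mind, here is a minimal sketch that redacts sensitive values in a configs JSON string before it is logged. It is not Pulsar code: the class name ConfigRedactor and the list of key names treated as sensitive (password, secret, token) are assumptions for illustration, and a real fix would probably need to work on the assignment message itself rather than on a plain string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pulsar: masks the values of sensitive
// keys in a flat JSON config string before the string is written to a log.
final class ConfigRedactor {

    // Assumed list of sensitive key names; extend as needed.
    // Group 1: the quoted key, the colon, and the opening quote of the value.
    // Group 2: the string value, allowing backslash-escaped characters.
    // Group 3: the closing quote of the value.
    private static final Pattern SENSITIVE = Pattern.compile(
            "(\"(?:password|secret|token)\"\\s*:\\s*\")((?:[^\"\\\\]|\\\\.)*)(\")",
            Pattern.CASE_INSENSITIVE);

    private ConfigRedactor() {
    }

    // Returns a copy of the JSON text with sensitive string values replaced by ***.
    static String redact(String json) {
        if (json == null) {
            return null;
        }
        Matcher matcher = SENSITIVE.matcher(json);
        return matcher.replaceAll("$1***$3");
    }

    public static void main(String[] args) {
        String configs = "{\"userName\":\"USERNAME\",\"password\":\"PASSWORD\","
                + "\"tableName\":\"TABLE_NAME\"}";
        // Only the value of "password" is masked; other fields are left as-is.
        System.out.println(redact(configs));
    }
}
```

This only masks string values under the listed keys. Credentials embedded elsewhere, for example inside a jdbcUrl, would still be logged unless those keys are added to the list.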

Labels: help wanted, type/enhancement