From 733975bd6d4551e1f5a2c41fce18fe02d85ef53f Mon Sep 17 00:00:00 2001 From: Oleksandr Tkachenko Date: Fri, 5 Sep 2025 17:35:34 +0200 Subject: [PATCH 1/9] ... --- examples/encrypted_chat/SPEC.md | 914 ++++++++++++++++++++++++++++++++ 1 file changed, 914 insertions(+) create mode 100644 examples/encrypted_chat/SPEC.md diff --git a/examples/encrypted_chat/SPEC.md b/examples/encrypted_chat/SPEC.md new file mode 100644 index 00000000..50723928 --- /dev/null +++ b/examples/encrypted_chat/SPEC.md @@ -0,0 +1,914 @@ +# vetKey Encrypted Chat + +vetKey Encrypted Chat has two main components: the canister backend and user frontend. + +## Features + +* Messaging protected with end-to-end encryption. + +* High security through symmetric ratchet and key rotation. + +* Disappearing messages guaranteed by the canister (ICP smart contract) logic. + +## Design + +TODO: add information about the symmetric ratchet/vetKey rotation. + +### State Recovery + +TODO: explain what this it about and that it works via uploading encrypted user cache and disallowing to obtain the vetKey that was used to initialize the state. + +## Components + +vetKey Encrypted Chat consists of a backend canister on the ICP and a user frontend. + +The backend's responsibilities are: + +* Providing APIs for chat interactions and key retrieval from vetKeys. + +* Storing chat metadata and users' encrypted messages. + +* Ordering of incoming messages. + +* Validation of encryption key metadata correctness for incoming messages. + +* Access control for user requests to both encryption keys and chat data. + +* Cleanup of expired messages if message expiration is turned on in a chat. + +* Storing user's encrypted key cache that allows the user to restore a former symmetric ratchet state in case of a state loss, e.g., upon browser change. + +The frontend's responsibilities are: + +* Providing a chat UI similar to Signal, WhatsApp, etc. + +* Synchronizing metadata for accessible chats. + +* Obtaining keys required for message encryption/decryption. + +* Encrypting and sending outgoing messages. + +* Fetching and decrypting incoming messages. + +## Backend Canister Component + +### Backend State + +* Chat data + + * Chat IDs - each chat has a chat ID + + * vetKey epochs - each chat has one or more vetKey epochs + + * vetKey epoch ID for each vetKey epoch in the chat + + * Participants who have access to the chat at the vetKey epoch + + * Creation time of the vetKey epoch + + * Symmetric key ratchet rotation duration at the vetKey epoch + + * Message ID that the vetKey epoch starts with in the chat + + * Messages + + * Chat Message ID that is assigned by the canister + + * Nonce use for message encryption that is assigned by the user + + * Consensus time at message receival + + * vetKey epoch ID when the message was received + + * Encrypted bytes of the message content. + + * Message expiry + + * Number of expired messages in the chat + + * Message expiry setting - how long does it take for a message to expire + +* User data per chat and vetKey epoch + + * [User-uploaded optional encrypted symmetric ratchet state cache](#state-cache) + + * Optional optimization: [IBE-encrypted vetKey reshared by another user](#ibe-encrypted-vetkey-resharing) + +### Chat Creation + +Upon receiving a call from the frontend to create a chat via one of the following APIs + +``` +type OtherParticipant = principal; +type TimeNanos = nat64; +type SymmetricKeyRotationMins = nat64; +type GroupChatId = nat64; +type GroupChatMetadata = record { creation_timestamp : TimeNanos; chat_id : GroupChatId }; + +create_direct_chat : (OtherParticipant, SymmetricKeyRotationMins) -> variant { Ok : TimeNanos; Err : text }; +create_group_chat : (vec OtherParticipant, SymmetricKeyRotationMins) -> (variant { Ok : GroupChatMetadata; Err : text }); +``` + +the backend does the following: + +* Checks that a direct chat does not exist yet if `create_direct_chat` was called and returns an error if the check fails. + +* Checks that `SymmetricKeyRotationMins` do not cause overflows in `nat64` types if converted to nanoseconds and returns an error if the check fails. + +* Deduplicates group chat participants if `create_group_chat` was called. + +* If all checks pass, adds the chat ID and users who have access to it to the state. The return value of `create_direct_chat` is the current consensus time indicating the chat creation time (which is required to correctly compute the symmetric key epoch that the frontend needs to encrypt messages with). The return value of `create_group_chat` is `GroupChatMetadata`, which contains the chat creation time as well as the group chat ID. The group chat ID does not depend on the caller's inputs (in contrast to direct chat IDs), and thus must be returned explicitly. + +### Group Changes + +Group changes in a group chat such as addition or removal of users can be triggered using the following backend canister API: +``` +type GroupChatId = nat64; +type VetKeyEpochId = nat64; +type KeyRotationResult = variant { Ok : VetKeyEpochId; Err : text }; +type GroupModification = record { + remove_participants : vec principal; + add_participants : vec principal; +}; + +modify_group_chat_participants : (GroupChatId, GroupModification) -> (KeyRotationResult); +``` + +The API takes in a group chat ID and a set of group changes. +When this API is triggered, the canister checks that: + +* The group chat exists. + +* The user has access to the group chat at the latest vetKey epoch. + +* The user is authorized to make group changes. This is an implementation detail and is out of scope of this document. Authorizing users to make group changes can be performed via a separate API and can be implemented with different rules, e.g., admins can make changes, or more fine-grained access can be implemented such as admins, moderators, etc., or even every user can perform group changes. + +* The passed `GroupModification` is valid: + + * `remove_participants` or `add_participants` is non-empty. + + * Every `principal` in `remove_participants` has access to the chat. + + * No `principal` in `add_participants` has access to the chat. + +Note that the latter two points guarantee that there is no intersection between `remove_participants` and `add_participants`. + +A group change triggers a vetKey epoch rotation that updates the set of group participants according to the passed `GroupModification` and stores it in the next vetKey epoch for the chat. The effects of vetKey epoch rotation are further discussed in a [separate section](#vetkey-epoch-rotation). + +> [!NOTE] +> One call to `modify_group_chat_participants` triggers one vetKey epoch rotation even if multiple `principals` are added or removed. Further potential optimizations for reducing the number of vetKey epoch rotations or the number of vetKey retrievals are discussed in [Optimizations](#optimizations). + +### Incoming Message Validation + +Upon receival of a user message via one of the following APIs + +``` +type GroupChatId = nat64; +type ChatId = variant { + Group : GroupChatId; + Direct : record { principal; principal }; +}; +type VetKeyEpochId = nat64; +type EncryptedBytes = blob; +type SymmetricKeyEpochId = nat64; +type Nonce = blob; +type UserMessage = record { + vetkey_epoch_id : VetKeyEpochId; + content : EncryptedBytes; + symmetric_key_epoch_id : SymmetricKeyEpochId; + nonce : Nonce; +}; +type MessagingError = variant { WrongVetKeyEpoch; WrongSymmetricKeyEpoch; Custom: text }; + +send_message : (ChatId, UserMessage) -> (variant { Ok; Err : MessageSendingError }); +``` + +the canister validates the message metadata and ensures that the caller has access. + +More specifically, the canister checks that: +* The caller has access to the chat at the passed `vetkey_epoch_id` or returns a `Custom` variant of `MessageSendingError` if the check fails. +* `vetkey_epoch_id` attached to the message is the latest for the chat ID or returns the `WrongVetKeyEpoch` variant of `MessageSendingError` if the check fails. +* `symmetric_key_epoch_id` attached to the message is equal to the [current symmetric key epoch ID](#calculating-current-symmetric-ratchet-epoch-id) corresponding the current consensus time. To check that, the canister calculates the current symmetric ratchet epoch ID for the chat and `vetkey_epoch_id`. If the check fails, the canister returns the `WrongSymmetricKeyEpoch` variant of `MessageSendingError` if the check fails. + +TODO: can we use timestamps instead of `symmetric_key_epoch_id`s? Would that improve anything? + +> [!NOTE] +> This API assumes that the frontend's clock is reasonably synchronized with the ICP to encrypt the messages with the key from the right symmetric ratchet state. This does not pose a significant limitation, since 1) it must already be the case for facilitating reliable communication with the ICP in general and 2) re-encrypting and re-sending in case of failures can be done automatically by the frontend. + +If the checks pass, the canister accepts the message, assigns to it: + +* The current consensus time as its timestamp, which is needed for computing the message expiry but also to display the message arrival time in the chat UI. + +* A chat message ID, which is unique and assigned from an incrementing counter starting from zero. Note that the current number of messages in the chat is different than the value of the counter if some messages have expired. + +Finally, the canister adds the message to the state and returns an `Ok`. + +### Exposing Metadata about Chats and New Messages + +The backend canister exposees the following APIs for fetching metadata: + +``` +type GroupChatId = nat64; +type ChatId = variant { + Group : GroupChatId; + Direct : record { principal; principal }; +}; +type NumberOfMessages = nat64; +type ChatMetadata = record { + chat_id : ChatId; + number_of_messages : NumberOfMessages; + vetkey_epoch_id : VetKeyEpochId; +}; + +type SymmetricKeyRotationMins = nat64; +type ChatMessageId = nat64; +type TimeNanos = nat64; +type VetKeyEpochId = nat64; +type VetKeyEpochMetadata = record { + symmetric_key_rotation_duration : SymmetricKeyRotationMins; + participants : vec principal; + messages_start_with_id : ChatMessageId; + creation_timestamp : TimeNanos; + epoch_id : VetKeyEpochId; +}; + +get_my_chats_and_time : () -> (record { chats : vec ChatMetadata; consensus_time_now : TimeNanos }) query; +get_vetkey_epoch_metadata : (ChatId, VetKeyEpochId) -> (variant { Ok : VetKeyEpochMetadata; Err : text }) query; +``` + +* The `get_my_chats_and_time` API returns a vector of all chat IDs accessible to the user as well as their their current total number of messages and vetKey epoch ID. The frontend can detect new chats and new messages in existing chats by periodically querying `get_my_chats_and_time`. Also, this API returns the current consensus time, which is e.g. useful to compute the message expiry and to determine if symmetric ratchet states need to be evolved. The current total number of messages (`NumberOfMessages`) includes the messages in the accessible chat ID that are not accessible to the user. This can happen if some messages have expired or in group chats, where the user joined at a later point. If a user is [removed from a chat](#group-changes), the result of `get_my_chats_and_time` called by the user will not include that chat anymore. Note that the latter only leaks to the user how many messages were in the chat before the user joined. + +* The `get_vetkey_epoch_metadata` API checks that the user has access to `ChatId` at `VetKeyEpochId` and if the test passes, the API returns the corresponding `VetKeyEpochMetadata` or an error otherwise. + +> [!NOTE] +> This API is exposed as `query` and, therefore, requires handling of cases where a replica would return incorrect data. This is further discussed [Ensuring Correctness of Query Calls](#ensuring-correctness-of-query-calls). + +### Encrypted Message Retrieval + +To allow the frontend to retrieve encrypted messages, the backend canister exposes the following backend canister API: + +``` +type GroupChatId = nat64; +type ChatId = variant { + Group : GroupChatId; + Direct : record { principal; principal }; +}; +type ChatMessageId = nat64; +type Limit = nat32; +type EncryptedBytes = blob; +type EncryptedMessage = record { + content : EncryptedBytes; + metadata : EncryptedMessageMetadata; +}; +type VetKeyEpochId = nat64; +type SymmetricKeyEpochId = nat64; +type TimeNanos = nat64; +type Nonce = blob; +type EncryptedMessageMetadata = record { + vetkey_epoch : VetKeyEpochId; + sender : principal; + symmetric_key_epoch_id : SymmetricKeyEpochId; + chat_message_id : ChatMessageId; + timestamp : TimeNanos; + nonce : Nonce; +}; + +get_messages : (ChatId, ChatMessageId, opt Limit) -> ( + vec EncryptedMessage, + ) query; +``` + +The `get_messages` API takes in a chat ID, the first message ID to retrieve, and an optional limit value for the maximum number of messages to retrieve in this call. +The API returns a vector of `EncryptedMessage`s. +If the user does not have access to the chat or the chat does not exist, an empty vector is returned. +If the user does not have access to particular messages, e.g., if the user was [added to a group chat](#group-changes) after some activity, or if some of the messages [expired](#disappearing-messages), then those messages are skipped. + +If a user is removed from a chat and afterwards the user is added to the chat again (with or without some messages being added in-between), the user will not have access to the messages that were visible before the user was removed from the chat. +This applies to any number of repetitions of this process. +That is, only the last range of messages without gaps is accessible to the user. +This also applies to other backend canister APIs that require the user to have access to a particular vetKey epoch such as vetKey epoch and encrypted user cache retrieval, and vetKey derivation. + +> [!NOTE] +> This API is exposed as `query` and, therefore, requires handling of cases where a replica would return incorrect data. This is further discussed [Ensuring Correctness of Query Calls](#ensuring-correctness-of-query-calls). + +### Providing vetKeys for Symmetric Ratchet Initialization + +Symmetric ratchet state is initialized from a vetKey that is the same for all chat participants. +To fetch a vetKey, the user calls the following backend canister API: +``` +type PublicTransportKey = blob; +type GroupChatId = nat64; +type VetKeyEpochId = nat64; +type ChatId = variant { + Group : GroupChatId; + Direct : record { principal; principal }; +}; +type EncryptedVetKey = blob; + +derive_chat_vetkey : (ChatId, VetKeyEpochId, PublicTransportKey) -> (variant { Ok : EncryptedVetKey; Err : text }); +``` + +Then, the canister checks that: + +* The chat corresponding to the passed `ChatId` exists. + +* The user has access to the chat at the passed `VetKeyEpochId`. + +* The user did not upload an encrypted cache for his symmetric ratchet state for the vetKey epoch in question (see [State Recovery](#state-recovery)). + +If the checks pass, the canister calls the [`vetkd_derive_key`](https://internetcomputer.org/docs/building-apps/network-features/vetkeys/api) API of the management canister with: + +* `context` being computed by invoking the `ratchet_context` function defined below. + +* `input` being the big-endian encoding of `VetKeyEpochId`. + +* `transport_public_key` being the `PublicTransportKey` input argument. + +* `key_id` being an implementation detail. + +```rust +pub fn ratchet_context(chat_id_bytes: &[u8]) -> Vec { + pub static DOMAIN_SEPARATOR_VETKEY_ROTATION: &str = "vetkeys-example-encrypted-chat-rotation"; + + let mut context = vec![]; + context.extend_from_slice(&[DOMAIN_SEPARATOR_VETKEY_ROTATION.len() as u8]); + context.extend_from_slice(DOMAIN_SEPARATOR_VETKEY_ROTATION.as_bytes()); + context.extend_from_slice(chat_id_bytes); + context +} +``` + +> [!NOTE] +> `chat_id_bytes` is a serialization of the chat ID and is an implementation detail. + +The actual initialization of the symmetric ratchet state is performed in the frontend and is, therefore, specified in [Ratchet Initialization](#ratchet-initialization) in the frontend. + +### User's Encrypted Symmetric Ratchet State Cache + +The canister backend provides the following APIs for storing users' encrypted symmetric ratchet states in the canister: +``` +type PublicTransportKey = blob; +type EncryptedVetKey = blob; +type GroupChatId = nat64; +type ChatId = variant { + Group : GroupChatId; + Direct : record { principal; principal }; +}; +type VetKeyEpochId = nat64; +type EncryptedSymmetricRatchetCache = blob; +type DerivedVetKeyPublicKey = blob; + +get_vetkey_for_my_cache_encryption : (PublicTransportKey) -> (EncryptedVetKey); +get_my_symmetric_key_cache : (ChatId, VetKeyEpochId) -> (variant { Ok : opt EncryptedSymmetricRatchetCache; Err : text }); +update_my_symmetric_key_cache : (ChatId, VetKeyEpochId, EncryptedSymmetricRatchetCache) -> (variant { Ok; Err : text }); +``` + +To facilitate those APIs, we make use of [Encrypted Maps](https://docs.rs/ic-vetkeys/latest/ic_vetkeys/encrypted_maps/struct.EncryptedMaps.html), which, as the name says, allow users to upload encrypted data into a map structure inside the canister. +The advantage in using Encrypted Maps is that 1) we do not need to design and implement such a scheme ourselves, and 2) Encrypted Maps allow to encrypt data efficiently in terms of both the number of fetched vetKeys and the efficiency of the used cryptography. + +On a high level, the canister creates exactly one encrypted map for the user that stores _all_ their key caches. +The cache is stored in the map as a `(key, value)`, where `key` is a serialization of tuple `(ChatId, VetKeyEpochId)` and `value` is `EncryptedSymmetricRatchetCache`. + +The user calls `get_vetkey_for_my_cache_encryption` to obtain the vetKey used for data encryption for their storage, which is called once upon initialization of the state in the frontend. +It can be called multiple times in general if the client loses their local state in the browser, e.g., when the client wants to use it on another device or in a different browser. + +The user calls `update_my_symmetric_key_cache` and `get_my_symmetric_key_cache` to update or fetch their cache. +In `update_my_symmetric_key_cache`, the canister checks that: + +* The user has access to the passed `ChatId` and `VetKeyEpochId`. + +* The passed `EncryptedSymmetricRatchetCache` has a reasonable size. The purpose of this check is to prevent misuse e.g. for cycles draining attacks where an attacker would store huge amounts of data as `EncryptedSymmetricRatchetCache`. What a reasonable size is depends on how the symmetric ratchet state is serialized in the frontend, which is an implementation detail, but generally the limit can be quite generous, e.g., 100 bytes. In most cases the size will be fixed though, since `EncryptedSymmetricRatchetCache` contains an encryption of 1) a symmetric epoch key and 2) a `VetKeyEpochId`, which both have a fixed size. For example, if using AES-GCM with a 16-byte authentication tag and a 12-byte nonce, then `EncryptedSymmetricRatchetCache` will have the ciphertext overhead of 16 + 12 = 28 bytes in terms of size and the total ciphertext size will be 28 + 32 (symmetric epoch key) + 8 (`VetKeyEpochId`) = 68 bytes. + +If the checks pass, the canister accepts the call and stores the cache, or overwrites if it exists, in the state. + +The `get_my_symmetric_key_cache` call retrieves the response of Encrypted Maps for getting their cache corresponding to the input arguments, which is the encrypted bytes if the entry exists or `null` if it does not. + +The expired caches are removed transparently to the user by the canister, i.e., the removal does not require explicit calls for doing that. +Expired cache is defined as a cache that neither has any messages associated with it in the canister nor does it correspond to the latest vetKey epoch for the chat ID. +See also [Disappearing Messages](#disappearing-messages). + +TODO: give more details about the Encrypted Maps APIs and how they are called. + +TODO: describe that as a potential optimization we could fetch all available cached ratchet states to reduce the number of calls. + +TODO: how frequently? + +### vetKey Epoch Rotation + +The vetKey epoch rotation can happen because of two different reasons: + +* Group change: this prevents a newly added user to be able to decrypt old messages or a deleted user to be able to decrypt new encrypted messages in the chat. + +* User request: this prevents an attacker that obtained a symmetric ratchet state to be able do decrypt messages that will be encrypted after the vetKey epoch rotation takes place. + +To facilitate this functionality, the canister provides two APIs: + +``` +type GroupChatId = nat64; +type ChatId = variant { + Group : GroupChatId; + Direct : record { principal; principal }; +}; +type VetKeyEpochId = nat64; +type GroupModification = record { + remove_participants : vec principal; + add_participants : vec principal; +}; +type KeyRotationResult = variant { Ok : VetKeyEpochId; Err : text }; + +// Group change in a group chat such as user addition (without access to the past chat history) or user removal. +// Takes in a batch of such changes to potentially reduce the number of required rotations. +modify_group_chat_participants : (GroupChatId, GroupModification) -> (KeyRotationResult); + +// User-initiated key rotation, e.g., periodic key rotation. +rotate_chat_vetkey : (ChatId) -> (KeyRotationResult); +``` + +Both have the same effect of rotation the vetKey epoch, but they follow different input validation rules. The rules for `modify_group_chat_participants` are discussed in [Group Changes](#group-changes). + +To validate the inputs in a `rotate_chat_vetkey` call, the canister checks that the user has access to the passed chat ID and eventually if the user is authorized to perform a key rotation (see [Group Changes](#group-changes)). + +Further validation rules can be added here and are an implementation detail. For example, the canister can rate-limit calls to `rotate_chat_vetkey` or make such calls dependent on further conditions such as user's subscription type. + +### Disappearing Messages + +Disappearing messages for a chat are defined by a non-negative integer identifying how many messages are expired, i.e., `e` expired messages mean that any message ID in the chat smaller than `e` has expired. +Expired messages cannot be [retrieved](#encrypted-message-retrieval) from the canister backend anymore and the canister backend will delete them eventually. + +The deletion algorithm is an implementation detail of the canister backend, but in general we see two options: + +* Delete expired messages in a timer job. This allows to delete messages periodically but running a timer job too often may be too expensive, so messages will be deleted with a delay. + +* Delete expired messages while sending new messages, i.e., whenever the API for sending messages is invoked, it internally calls the message deletion routine. This works well if there is a lot of activity in the chat but will leave messages undeleted or deleted after a long delay if there is no or very little activity. + +The chat allows to assign or update a disappearing messages duration for messages in a chat via the following backend canister API: + +``` +type GroupChatId = nat64; +type ChatId = variant { + Group : GroupChatId; + Direct : record { principal; principal }; +}; +type MessageExpiryMins = nat64; + +set_message_expiry : (ChatId, MessageExpiryMins) -> (variant { Ok; Err : text }); +``` + +Upon receival of a `set_message_expiry` call, the backend canister checks that the user has access to `ChatId` and eventually that the user is authorized to call `set_message_expiry` for `ChatId` and sets `MessageExpiryMins` in the state for `ChatId`. + +The semantics of `MessageExpiryMins` is as follows: + +* If `MessageExpiryMins` is equal to zero, then this means no expiry is set and all messages are always returned. + +* Otherwise the value of `MessageExpiryMins` is used as the message expiry. + +Every chat is [created](#chat-creation) with the expiry of 0, which holds a special meaning that messages do not expire. +Once the state is updated, it affects the behavior of [the APIs returning the metadata about message IDs](#exposing-metadata-about-chats-and-new-messages) and [the APIs returning the actual messages](#encrypted-message-retrieval). + +* `get_messages : (ChatId, ChatMessageId, opt Limit) -> (vec EncryptedMessage) query;` does not add expired messages to the output. + +## User Frontend Component + +### Frontend State + +* Chat metadata for each chat. + + * Chat ID. + + * Number of received and decrypted messages so far. + + * Latest vetKey epoch ID. + + * VetKey epoch metadata for all vetKey epochs that were required for encryption or decryption. Note that the metadata of the vetKey epochs that are not the last vetKey epoch and whose messages have expired are removed from the state. + + * Message expiry data. + +* Symmetric ratchet states for all vetKey epochs that were required for encryption or decryption. Similarly to vetKey epoch metadata, expired symmetric ratchet states are deleted from the state. + +* Decrypted non-expired messages for each chat. + +* Optional optimization: All messages and chats stored in browser storage. + +### Chat UI + +The chat UI is a UI that displays chats and their decrypted messages, and allows the user to send their own messages and change settings such as group membership and message expiration via UI elements. +It may also display the metadata about the encrypted chat such as the current epoch information if that fits the application. + +The chat UI calls a few canister backend APIs directly, normally via dedicated UI buttons: [chat creation](#chat-creation), [vetKey rotation](#vetkey-epoch-rotation), [group changes](#group-changes), [updating the message expiry](#disappearing-messages). + +Since the chat UI is mostly an implementation detail, only its interaction with [Encrypted Messaging Service (EMS)](#encrypted-messaging-service) is described here, whose purpose is it to take care of encryption and decryption of messages. + +The chat UI uses the EMS in a black-box way to: + +* Dispatch user messages to be encrypted and sent via the `enqueueSendMessage` API of the EMS. + +* Fetch received and decrypted messages via periodically quering the `takeReceivedMessages` API of the EMS. + +* Find out which chats are accessible at the moment via the `getCurrentChatIds` API of the EMS. New chats need to be added to the UI and chats that the user has lost access to need to be removed from the UI. + +### Encrypted Messaging Service + +Encrypted Messaging Service (EMS) is a component that gives the developer a transparent way to interact with the encrypted chat by reading from a stream of received and decrypted messages and putting user messages to be encrypted and sent into a stream that the EMS will take care of encrypting and sending. + +The EMS exposes the following APIs: + +TODO: add types `ChatId`, `ChatIdAsString`, `Message` + +* `enqueueSendMessage(chatId: ChatId, content: Uint8Array)`: adds the message `content` to be encrypted for and sent for adding to the chat with ID `chatId`. This API does not give any guarantees that the message will actually be added to the chat but it makes attempts to recover from recoverable errors (see [Encrypting and Sending Messages in the EMS](#encrypting-and-sending-messages)). + +* `takeReceivedMessages(): Map`: returns latest chat messages that were received and decrypted by the EMS and were not yet taken by the user from the EMS (see [Fetching and Decrypting Messages in the EMS](#fetching-and-decrypting-messages)). + +* `start()`: starts the EMS service. + +* `skipMessagesAvailableLocally(chatId: ChatId, lastKnownChatMessageId: bigint)`: tells the EMS what chat message ID should be the first one to fetch the messages. This is relevant if some of the messages are available from another source such as [browser storage](#local-cache-in-indexeddb). + +* `getCurrentChatIds(): ChatId[]`: returns the chat IDs that are currently accessible to the user. This particular API is mostly an efficiency optimization, since the message retrieval in the EMS anyways requires fetching the information about the currently accessible chats. + +#### Encrypting and Sending Messages + +For message [encryption](#ratchet-message-encryption) and [sending](#incoming-message-validation), the EMS makes use of the following backend canister APIs: [`get_my_chats_and_time` and `get_vetkey_epoch_metadata`](#exposing-metadata-about-chats-and-new-messages). + +The EMS periodically takes a message from the sending stream that was added via the `enqueueSendMessage` API. If the stream is empty, the EMS retries with a timeout. If the stream is non-empty, the EMS takes it from the stream and performs the following steps: + +1. If there is no symmetric ratchet state for the chat, the EMS [initializes](#ratchet-initialization) the symmetric ratchet for the latest vetKey epoch id in the output of [`get_my_chats_and_time`](#exposing-metadata-about-chats-and-new-messages) and calls `get_vetkey_epoch_metadata` to obtain its metadata. + +2. The EMS encrypts the message using the symmetric ratchet state that corresponds to the latest known vetKey epoch ID (see [Symmetric Ratchet](#symmetric-ratchet)) at the current time. It may happen that the symmetric ratchet epoch of the symmetric ratchet state is smaller than needed for the encryption at the current time. In that case, the state is copied to a temporary state and the temporary state is evolved to encrypt the message. After encryption, the temporary state may be deleted. Note that the encryption does not evolve the symmetric ratchet state because the canister ultimately decides when a symmetric epoch ends. The indication that a symmetric ratchet epoch has ended and the previous epoch is not needed anymore is that the frontend fetches a message that was encrypted with the next symmetric ratchet epoch. The evolution of the state then happens in the process of [decryption](#fetching-and-decrypting-messages) of that message. + +3. The EMS [sends](#incoming-message-validation) the message. + +4. If the canister returns the `WrongSymmetricKeyEpoch` variant of `MessageSendingError`, then the EMS was unlucky and the sent message arrived at the next symmetric key epoch. In this case the EMS goes to step 2. + +5. If the canister returns the `WrongVetKeyEpoch` variant of `MessageSendingError`, then either the [manual vetKey epoch rotation](#vetkey-epoch-rotation) or a [group change](#group-changes) took place. In this case, the EMS + +To avoid infinite loops in case of too strict parameters, bad network connectivity, etc., the maximum number of retries should be capped. + +TODO: maybe we should expose `getLatestVetKeyEpochMetadata()` in the EMS to have a central place where this API is queried because in the chat UI we use it to find out the current chat participants. Alternatively, just expose a function that returns the participants. + +#### Fetching and Decrypting Messages + +For message [retrieval](#encrypted-message-retrieval) and [decryption](#ratchet-message-decryption), the EMS makes use of the following backend canister APIs: [`get_my_chats_and_time` and `get_vetkey_epoch_metadata`](#exposing-metadata-about-chats-and-new-messages). + +Also, the `get_messages` backend canister API is used to retrieve the encrypted messages for the chat. Its behavior is closer described in [Encrypted Message Retrieval](#encrypted-message-retrieval). + +The frontend stores the following in its state: the chat IDs, the first accessible message ID for the user, the last fetched message, and the total number of messages in the chat. +Let us call this information frontend chat metadata. + +Periodically, the EMS queries the `get_my_chats_and_time` backend canister API. +Its result is compared to the frontend chat metadata in the state. +If there is a new chat in the result that is not yet in the state, the EMS adds it to the state along with the information that no messages were obtained for this chat yet. +If one of the chat in the state does not appear in the result of `get_my_chats_and_time` anymore, than this chat is deleted from the state. + +Also periodically, two separate routines run. + +1. Check if there are new messages to be fetched from the canister: if the largest received message ID for the chat plus one is smaller than the total number of messages in the chat. If it is, then `get_messages` is invoked with the first message ID to be fetched that is equal to the largest received message ID for the chat plus one. If an error occurs due to too large messages that don't fit into the response, the query to `get_messages` is retried with a limit of one. A successful result is stored in the received messages queue. + +2. Try to take a message from the received messages queue and decrypt it. + + a. The EMS checks if it already has the symmetric ratchet state in its state that is required to decrypt the message, whose metadata specifies the required vetKey epoch and symmetric ratchet epoch. If the symmetric ratchet state is not yet initialized, the EMS [initializes](#ratchet-initialization) it. An error to do so is unrecoverable. + + b. The EMS [decrypts](#ratchet-message-decryption) the message using the symmetric ratchet state and the vetKey epoch ID stored in the message metadata. A successfully decrypted message is put into the decrypted message queue that is exposed to the chat UI component via the [`takeReceivedMessages` API](#encrypted-messaging-service) of the EMS. If the decryption returns an error, such an error is unrecoverable and instead of a decrypted message, a message of special form is put into the decrypted message queue that indicated that this message could not be decrypted. Note that user-side errors cannot be avoided, since the canister cannot check if the encryption is valid. + +TODO: in the above point it may happen that the calculation is incorrect because if some messages expire, the first accessible message ID moves. We should probably just use the last received message ID instead. + +TODO: change this to not evolve on decryption. We want to evolve this whenever messages disappear. The ratchet specifies how granular we can make messages disappear and provides forward security for symmetric epochs we already "forgot". + +#### vetKey Epoch Rotation + +#### Symmetric Ratchet + +A symmetric ratchet state consists of a symmetric ratchet epoch key and a symmetric ratchet epoch id that it corresponds to. + +##### Ratchet Initialization + +The ratchet state is initialized from a vetKey as follows: + +1. Fetch, decrypt, and verify the vetKey + + a. [Generate](https://dfinity.github.io/vetkeys/classes/_dfinity_vetkeys.TransportSecretKey.html#random) a transport key pair. + + b. [Fetch](#providing-vetkeys-for-symmetric-ratchet-initialization) the encrypted vetKey for the chat and vetKey epoch from the backend canister. + + c. Compute the verification public key either [locally](https://dfinity.github.io/vetkeys/classes/_dfinity_vetkeys.MasterPublicKey.html) or via querying the `chat_public_key` backend canister API. + + d. [Decrypt and verify](https://dfinity.github.io/vetkeys/classes/_dfinity_vetkeys.EncryptedVetKey.html#decryptandverify) the vetKey. + +2. Compute and save the ratchet state + + a. Compute [`let rootKey = deriveSymmetricKey(vetKeyBytes, DOMAIN_RATCHET_INIT, 32)`](https://github.com/dfinity/vetkeys/blob/83b887f220a2c1c40713a3512ce5a9994d5ec4c6/frontend/ic_vetkeys/src/utils/utils.ts#L352), where `DOMAIN_RATCHET_INIT` is a unique domain separator for ratchet initialization (TODO: this function is currently internal - we should make it public). + + b. Initialize the symmetric ratchet state as `rootKey` and symmetric ratchet epoch that is equal to zero. + +More details about the retrieval and decryption of vetKeys can be found in the [developer docs](https://internetcomputer.org/docs/building-apps/network-features/vetkeys/api) of the ICP. + +TODO: add initialization from cache + +##### Ratchet Evolution + +```ts +import { deriveSymmetricKey } from '@dfinity/vetkeys'; + +type RawSymmetricRatchetState = { epochKey: Uint8Array, epochId: bigint }; + +function evolve(symmetricRatchetState: RawSymmetricRatchetState) : RawSymmetricRatchetState { + const domainSeparator = new Uint8Array([ + ...DOMAIN_RATCHET_STEP, + ...uBigIntTo8ByteUint8ArrayBigEndian(symmetricRatchetState.epochId) + ]); + const newEpochkey = deriveSymmetricKey(symmetricRatchetState.epochKey, domainSeparator, 32); + + return { epochKey: newEpochkey, epochId: symmetricRatchetState.epochId + 1n} +} +``` +where `DOMAIN_RATCHET_STEP` is a unique domain separator. + +Alternatively, this can be implemented using Web Crypto API to make the current key non-extractable. Note though that Web Crypto API's `deriveKey` cannot derive an HKDF key and, therefore, to derive the next epoch key, it first needs to be derive via `deriveBits`, which returns the next epoch key in form of a byte vector, which needs to be imported as `CryptoKey`. + +```ts +type SymmetricRatchetState = { epochKey: CryptoKey, epochId: bigint }; + +async function deriveNextSymmetricRatchetEpochCryptoKey(symmetricRatchetState: RawSymmetricRatchetState) : Promise { + const exportable = false; + const domainSeparator = ratchetStepDomainSeparator() + const algorithm = { + name: 'HKDF', + hash: 'SHA-256', + length: 32 * 8, + info: domainSeparator, + salt: new Uint8Array() + }; + + const rawKey = await globalThis.crypto.subtle.deriveBits(algorithm, epochKey, 8 * 32); + + return await globalThis.crypto.subtle.importKey('raw', rawKey, algorithm, exportable, [ + 'deriveKey', + 'deriveBits' + ]); +} +``` + +It would be quite natural and similar to [Signal's symmetric ratchet](https://signal.org/docs/specifications/doubleratchet/) if the decryption would trigger the ratchet evolution. However, that would force the frontned to decrypt all messages belonging to one vetKey epoch in cases where a chat has many messages and we only want to display the latest ones. This incurs a big and unnecessary overhead in terms of both communication and computation. + +Therefore, neither encryption nor decryption directly evolve `SymmetricRatchetState` but instead, the state is evolved whenever the current consensus time obtained via `get_my_chats_and_time` minus the message expiry is larger than the timestamp of the current symmetric key epoch id + 1, i.e., whenever there can be no non-expired message that we would need the current symmetric ratchet epoch to decrypt. The state evolution can be triggered by a background job that periodically checks if state evolution should be performed. + +##### Ratchet Message Encryption +```ts +import { DerivedKeyMaterial } from '@dfinity/vetkeys'; + +async encrypt( + epochKey: CryptoKey, + sender: Principal, + nonce: Uint8Array, + message: Uint8Array +): Promise { + const domainSeparator = messageEncryptionDomainSeparator(sender, nonce); + const derivedKeyMaterial = DerivedKeyMaterial.fromCryptoKey(epochKey); + return await derivedKeyMaterial.encryptMessage(message, domainSeparator); +} +``` +where [`messageEncryptionDomainSeparator`](#typescript-domain-separators) is a unique [size-prefixed](#typescript-size-prefix) domain separator. + +##### Ratchet Message Decryption + +```ts + +import { DerivedKeyMaterial } from '@dfinity/vetkeys'; + +async decrypt( + epochKey: CryptoKey, + sender: Principal, + nonce: Uint8Array, + message: Uint8Array +): Promise { + const domainSeparator = messageEncryptionDomainSeparator(sender, nonce); + const derivedKeyMaterial = DerivedKeyMaterial.fromCryptoKey(epochKey); + return await derivedKeyMaterial.encryptMessage(message, domainSeparator); +} +``` +where [`messageEncryptionDomainSeparator`](#typescript-domain-separators) is a unique [size-prefixed](#typescript-size-prefix) domain separator. + +#### State Cache + +##### Encrypted Maps + +## Ensuring Correctness of Query Calls + +TODO + +Idea 1: use query calls and start work, while in the background an update call is invoked to compare the result. + +Idea 2: use certified variables. + +Comment by Andrea regarding Idea 2: +In a multi-user canister scenario then it may be difficult to add certificates for this endpoint. However, for individual user/chat canisters, then it should be easy to provide certificates, e.g. of chat ID, time and latest message ID. + +Is something like mixed hash tree possible in a single canister scenario to reduce hashing times if hashing a huge state? Probably not extremely important, but may be useful to discuss. + +## Optimizations + +### Local Cache in indexedDB + +### IBE-Encrypted vetKey Resharing + +### Group Changes Allowing to See Messsage History + +## Appendix + +### Constructing `ChatId` + +``` +type GroupChatId = nat64; + +type ChatId = variant { + Group : GroupChatId; + Direct : record { principal; principal }; +}; +``` + +**Direct**: The `ChatId` type is constructed from a pair of _sorted_ principals. It is valid if the principals are equal and corresponds to a user's private chat (similar to e.g. Signal's "Note to Self" chat). + +**Group**: The `ChatId` type is constructed from a _unique_ group chat ID, which is defined by an unsigned 64-bit number. The backend is responsible of issuing those and ensuring their uniqueness. + +### Calculating Current Symmetric Ratchet Epoch ID + +```rust +fn current_symmetric_ratchet_epoch( + vetkey_epoch_creation_time_nanos: u64, + symmetric_ratchet_rotation_duration_nanos: u64 +) { + let now = ic_cdk::api::time(); + let elapsed = vetkey_epoch_creation_time_nanos - now; + return elapsed / symmetric_ratchet_rotation_duration_nanos; +} +``` + +### TypeScript Conversion of Unsigned BigInt to Uint8Array + +```ts +function uBigIntTo8ByteUint8ArrayBigEndian(value: bigint): Uint8Array { + if (value < 0n) throw new RangeError('Accepts only bigint n >= 0'); + if ((value >> 128) > 0n) throw new RangeError('Accepts only bigint fitting into an 8-byte array'); + + const bytes = new Uint8Array(8); + for (let i = 0; i < 8; i++) { + bytes[i] = Number((value >> BigInt(i * 8)) & 0xffn); + } + return bytes; +} +``` + +### TypeScript CryptoKey Import +```ts +let keyBytes = /* */; +let exportable = false; +await globalThis.crypto.subtle.importKey( + 'raw', + keyBytes, + 'HKDF', + exportable, + ['deriveKey', 'deriveBits'] + ); +``` + +### TypeScript Size Prefix + +```ts +export function sizePrefixedBytesFromString(text: string): Uint8Array { + const bytes = new TextEncoder().encode(text); + if (bytes.length > 255) { + throw new Error('Text is too long'); + } + const size = new Uint8Array(1); + size[0] = bytes.length & 0xff; + return new Uint8Array([...size, ...bytes]); +} +``` + +### TypeScript Domain Separators + +```ts + +// Example definition of the domain separators +const DOMAIN_RATCHET_INIT = sizePrefixedBytesFromString('ic-vetkeys-chat-ratchet-init'); +const DOMAIN_RATCHET_STEP = sizePrefixedBytesFromString('ic-vetkeys-chat-ratchet-step'); +const DOMAIN_MESSAGE_ENCRYPTION = sizePrefixedBytesFromString( + 'ic-vetkeys-chat-message-encryption' +); + +export function messageEncryptionDomainSeparator( + sender: Principal, + nonce: Uint8Array +): Uint8Array { + if (nonce.length !== 16) { throw RangeError("Expected nonce of size 16 but got " + nonce.length); } + return new Uint8Array([ + ...DOMAIN_MESSAGE_ENCRYPTION, + ...sender.toUint8Array(), + ...uBigIntTo8ByteUint8ArrayBigEndian(nonce) + ]); +} + +export function ratchetStepDomainSeparator(currentSymmetricKeyEpoch: bigint){ + new Uint8Array([ + ...DOMAIN_RATCHET_STEP, + ...uBigIntTo8ByteUint8ArrayBigEndian(currentSymmetricKeyEpoch) + ]); +} +``` + +### Candid Interface of the Backend + +```candid +type GroupChatId = nat64; +type ChatId = variant { + Group : GroupChatId; + Direct : record { principal; principal }; +}; +type VetKeyEpochId = nat64; +type SymmetricKeyEpochId = nat64; +type ChatMessageId = nat64; +type Nonce = blob; +type Limit = nat32; +type TimeNanos = nat64; +type NumberOfMessages = nat64; +type SymmetricKeyRotationMins = nat64; +type MessageExpiryMins = nat64; + +type KeyRotationResult = variant { Ok : VetKeyEpochId; Err : text }; +type EncryptedMessage = record { + content : EncryptedBytes; + metadata : EncryptedMessageMetadata; +}; + +// vetKeys +type PublicTransportKey = blob; +type DerivedVetKeyPublicKey = blob; +type EncryptedVetKey = blob; +type IbeEncryptedVetKey = blob; + +type EncryptedMessageMetadata = record { + vetkey_epoch : VetKeyEpochId; + sender : principal; + symmetric_key_epoch_id : SymmetricKeyEpochId; + chat_message_id : ChatMessageId; + timestamp : TimeNanos; + nonce : Nonce; +}; +type GroupChatMetadata = record { creation_timestamp : TimeNanos; chat_id : GroupChatId }; +type GroupModification = record { + remove_participants : vec principal; + add_participants : vec principal; +}; +type EncryptedBytes = blob; +type UserMessage = record { + vetkey_epoch_id : VetKeyEpochId; + content : EncryptedBytes; + symmetric_key_epoch_id : SymmetricKeyEpochId; + nonce : Nonce; +}; +type NumberOfMessages = nat64; +type Receiver = principal; +type OtherParticipant = principal; +type VetKeyEpochMetadata = record { + symmetric_key_rotation_duration : SymmetricKeyRotationMins; + participants : vec principal; + messages_start_with_id : ChatMessageId; + creation_timestamp : TimeNanos; + epoch_id : VetKeyEpochId; +}; +type KeyRotationResult = variant { Ok : VetKeyEpochId; Err : text }; +type MessageSendingError = variant { WrongVetKeyEpoch; WrongSymmetricKeyEpoch; Custom: text }; +type ChatMetadata = record { + chat_id : ChatId; + number_of_messages : NumberOfMessages; + vetkey_epoch_id : VetKeyEpochId; + symmetric_epoch_id : SymmetricEpochId; +}; + +service : (text) -> { + chat_public_key : (ChatId, VetKeyEpochId) -> (DerivedVetKeyPublicKey); + create_direct_chat : (OtherParticipant, SymmetricKeyRotationMins) -> variant { Ok : TimeNanos; Err : text }; + create_group_chat : (vec OtherParticipant, SymmetricKeyRotationMins) -> (variant { Ok : GroupChatMetadata; Err : text }); + derive_chat_vetkey : (ChatId, VetKeyEpochId, PublicTransportKey) -> (variant { Ok : EncryptedVetKey; Err : text }); + get_vetkey_for_my_cache_encryption : (PublicTransportKey) -> (EncryptedVetKey); + get_latest_chat_vetkey_epoch_metadata : (ChatId) -> (variant { Ok : VetKeyEpochMetadata; Err : text }) query; + get_my_chats_and_time : () -> (vec ChatMetadata) query; + get_my_reshared_ibe_encrypted_vetkey : (ChatId, VetKeyEpochId) -> (variant { Ok : opt IbeEncryptedVetKey; Err : text }); + get_my_symmetric_key_cache : (ChatId, VetKeyEpochId) -> (variant { Ok : opt EncryptedSymmetricRatchetCache; Err : text }); + // Returns messages for a chat starting from a given message id. + get_messages : (ChatId, ChatMessageId, opt Limit) -> ( + vec EncryptedMessage, + ) query; + get_vetkey_epoch_metadata : (ChatId, VetKeyEpochId) -> (variant { Ok : VetKeyEpochMetadata; Err : text }) query; + get_vetkey_resharing_ibe_decryption_key : (PublicTransportKey) -> (EncryptedVetKey); + get_vetkey_resharing_ibe_encryption_key : (Receiver) -> (DerivedVetKeyPublicKey); + modify_group_chat_participants : (ChatGroupId, GroupModification) -> (KeyRotationResult); + reshare_ibe_encrypted_vetkeys : ( + ChatId, + VetKeyEpochId, + vec record { Receiver; IbeEncryptedVetKey }, + ) -> (variant { Ok; Err : text }); + rotate_chat_vetkey : (ChatId) -> (KeyRotationResult); + send_message : (ChatId, UserMessage) -> (variant { Ok; Err : MessageSendingError }); + set_message_expiry : (ChatId, MessageExpiryMins) -> (variant { Ok; Err : text }); + update_my_symmetric_key_cache : (ChatId, VetKeyEpochId, EncryptedSymmetricRatchetCache) -> (variant { Ok; Err : text }); +} +``` \ No newline at end of file From a8f2540ec737d51a29208dfff556ea2538f321f0 Mon Sep 17 00:00:00 2001 From: Oleksandr Tkachenko Date: Fri, 26 Sep 2025 09:27:05 +0200 Subject: [PATCH 2/9] latest changes --- examples/encrypted_chat/SPEC.md | 125 ++++++++++++++++++++++++-------- 1 file changed, 95 insertions(+), 30 deletions(-) diff --git a/examples/encrypted_chat/SPEC.md b/examples/encrypted_chat/SPEC.md index 50723928..55feb55e 100644 --- a/examples/encrypted_chat/SPEC.md +++ b/examples/encrypted_chat/SPEC.md @@ -1,8 +1,6 @@ # vetKey Encrypted Chat -vetKey Encrypted Chat has two main components: the canister backend and user frontend. - -## Features +vetKey Encrypted Chat has two main components: the canister backend and user frontend. It provides the following features: * Messaging protected with end-to-end encryption. @@ -10,13 +8,50 @@ vetKey Encrypted Chat has two main components: the canister backend and user fro * Disappearing messages guaranteed by the canister (ICP smart contract) logic. -## Design +## Encryption Keys and State Recovery + +TODO: explain what keys we have. + +vetKey Rotation: + +* Independent key material - provides both forward and backward security, i.e., an adversary obtaining the key material for a vetKey epoch gains no knowledge about the key material in other vetKey epochs. + +* More costly than symmetric key ratchet because requires interaction with the backend canister and a [vetKey derivation](https://internetcomputer.org/docs/building-apps/network-features/vetkeys/api). + +For each vetKey epoch, we instantiate a Symmetric Key Ratchet State that can be evolved. + +Symmetric Key Ratchet: + +* Creates a chain of symmetric keys known by the chat participants from an initial key material derived from a vetKey. + +* Very efficient local ratchet evolution based on consensus time frames. + +* Provides forward securiry but not backward security, i.e., an adversary obtaining the key material for the symmetric ratchet ID `i` can derive any ratchet state for an epoch ID that is larger than `i`. -TODO: add information about the symmetric ratchet/vetKey rotation. +While vetKey rotation can be performed at arbitry time points, the symmetric ratchet is performed at time frame boundaries. +The duration of the time frame defines the granularity of the time frame that the user must keep the key material around for in order to be able to decrypt messages. +This differs from the Signal protocol, where we discard the old key as soon as we evolved it in most cases. +Keeping some keys around is important if we want to decrypt messages in arbitrary order. +Consider for example a scenario where we only want to decrypt the newest messages that we want to display to the user and not the whole history, which might be much longer and contain media material that requires a lot of potentially unnecessary communication. +This requires us to keep around the ratchet state, with which we are able to decrypt the oldest non-expired message. -### State Recovery +The duration of the symmetric key ratchet time frame defines the duration of the time frame that we would be able to decrypt messages for that goes beyond what's necessary. +Consider e.g. a message expiry of duration `t` ns and the symmetric ratchet duration of `t` ns as well. +In that case, if no vetKey rotations happen in between, the frontend would need to keep around states `i` and `i + 1` at all times, meaning that in the worst case two time windows (`2t` ns) worth of messages would be decryptable instead of one. -TODO: explain what this it about and that it works via uploading encrypted user cache and disallowing to obtain the vetKey that was used to initialize the state. +If we set the message expiry to `t` ns and symmetric ratchet duration to `r * t`, then we can reduce the amount of time we unnecessarily keep the key material for by a factor of `r`. +For example, if we set the expiry to one month and the symmetric ratchet duration to one hour, then we would need to keep the key material (and the ability to decrypt all message encrypted with it) for almost one additional hour. + +TODO: what if vetKey rotation happens in between? + +The downside of this approach is that we need to keep symmetric ratchet states for more ratchet epochs around or compute them on the fly s.t. the messages can be decrypted in any order. +However, with realistic time frames, keeping around hunderds of keys or just a key that we can generate the other keys from is not a showstopper. + +Realistic time frames for symmetric ratchet evolution: between a minute and a few hours/days. Being on a lower side (minutes) makes less sense, since updating the state's keys involves updating the encrypted cache, where every update costs cycles, which would make mass adoption of the chat costly. + +TODO: updates to the encrypted cache. + +TODO: explain what this is about and that it works via uploading encrypted user cache and disallowing to obtain the vetKey that was used to initialize the state. ## Components @@ -481,12 +516,14 @@ Once the state is updated, it affects the behavior of [the APIs returning the me * VetKey epoch metadata for all vetKey epochs that were required for encryption or decryption. Note that the metadata of the vetKey epochs that are not the last vetKey epoch and whose messages have expired are removed from the state. - * Message expiry data. + * Message expiry for each decrypted message. * Symmetric ratchet states for all vetKey epochs that were required for encryption or decryption. Similarly to vetKey epoch metadata, expired symmetric ratchet states are deleted from the state. * Decrypted non-expired messages for each chat. +* Latest consensus time. + * Optional optimization: All messages and chats stored in browser storage. ### Chat UI @@ -510,10 +547,10 @@ The chat UI uses the EMS in a black-box way to: Encrypted Messaging Service (EMS) is a component that gives the developer a transparent way to interact with the encrypted chat by reading from a stream of received and decrypted messages and putting user messages to be encrypted and sent into a stream that the EMS will take care of encrypting and sending. -The EMS exposes the following APIs: - TODO: add types `ChatId`, `ChatIdAsString`, `Message` +The EMS exposes the following APIs: + * `enqueueSendMessage(chatId: ChatId, content: Uint8Array)`: adds the message `content` to be encrypted for and sent for adding to the chat with ID `chatId`. This API does not give any guarantees that the message will actually be added to the chat but it makes attempts to recover from recoverable errors (see [Encrypting and Sending Messages in the EMS](#encrypting-and-sending-messages)). * `takeReceivedMessages(): Map`: returns latest chat messages that were received and decrypted by the EMS and were not yet taken by the user from the EMS (see [Fetching and Decrypting Messages in the EMS](#fetching-and-decrypting-messages)). @@ -524,43 +561,59 @@ TODO: add types `ChatId`, `ChatIdAsString`, `Message` * `getCurrentChatIds(): ChatId[]`: returns the chat IDs that are currently accessible to the user. This particular API is mostly an efficiency optimization, since the message retrieval in the EMS anyways requires fetching the information about the currently accessible chats. +* `getCurrentUsersInChat(chatId: ChatId): Principal[]`: takes in a chat ID and returns the current users in the chat. If the EMS does not have data about the chat with `chatId` because either the user does not have access to the chat, or if `chatId` does not exist, or the frontend did not yet manage to synchronize with the backend, this function throws an error. + #### Encrypting and Sending Messages -For message [encryption](#ratchet-message-encryption) and [sending](#incoming-message-validation), the EMS makes use of the following backend canister APIs: [`get_my_chats_and_time` and `get_vetkey_epoch_metadata`](#exposing-metadata-about-chats-and-new-messages). +For message [encryption](#ratchet-message-encryption) and [sending](#incoming-message-validation), the EMS makes use of the [`send_message`](#incoming-message-validation) backend canister API. -The EMS periodically takes a message from the sending stream that was added via the `enqueueSendMessage` API. If the stream is empty, the EMS retries with a timeout. If the stream is non-empty, the EMS takes it from the stream and performs the following steps: +The EMS periodically takes a message from the sending stream that was added via the `enqueueSendMessage` API. If the stream is empty, the EMS retries with a timeout. If the stream is non-empty, the EMS takes the oldest message from the stream and performs the following steps: -1. If there is no symmetric ratchet state for the chat, the EMS [initializes](#ratchet-initialization) the symmetric ratchet for the latest vetKey epoch id in the output of [`get_my_chats_and_time`](#exposing-metadata-about-chats-and-new-messages) and calls `get_vetkey_epoch_metadata` to obtain its metadata. +1. If there is no symmetric ratchet state for the latest vetKey epoch of the chat, the EMS [initializes](#ratchet-initialization) it and calls `get_vetkey_epoch_metadata` to obtain its metadata, which the frontend stores in its state. -2. The EMS encrypts the message using the symmetric ratchet state that corresponds to the latest known vetKey epoch ID (see [Symmetric Ratchet](#symmetric-ratchet)) at the current time. It may happen that the symmetric ratchet epoch of the symmetric ratchet state is smaller than needed for the encryption at the current time. In that case, the state is copied to a temporary state and the temporary state is evolved to encrypt the message. After encryption, the temporary state may be deleted. Note that the encryption does not evolve the symmetric ratchet state because the canister ultimately decides when a symmetric epoch ends. The indication that a symmetric ratchet epoch has ended and the previous epoch is not needed anymore is that the frontend fetches a message that was encrypted with the next symmetric ratchet epoch. The evolution of the state then happens in the process of [decryption](#fetching-and-decrypting-messages) of that message. +2. The EMS encrypts the message using the symmetric ratchet state that corresponds to the latest known vetKey epoch ID (see [Symmetric Ratchet](#symmetric-ratchet)) for the chat at the current time. It may happen that the symmetric ratchet epoch of the symmetric ratchet state is smaller than needed for the encryption at the current time. In that case, the state is copied to a temporary state and the temporary state is evolved to encrypt the message. After encryption, the temporary state may be deleted. Note that neither encryption nor decryption evolves the symmetric ratchet state and instead the ratchet evolution is performed in a background task (see [Ratchet Evolution](#ratchet-evolution)). -3. The EMS [sends](#incoming-message-validation) the message. +3. The EMS [sends](#incoming-message-validation) the encrypted message via the `send_message` canister backend API (see [Incoming Message Validation](#incoming-message-validation)). -4. If the canister returns the `WrongSymmetricKeyEpoch` variant of `MessageSendingError`, then the EMS was unlucky and the sent message arrived at the next symmetric key epoch. In this case the EMS goes to step 2. +4. If the canister returns the `WrongSymmetricKeyEpoch` variant of `MessageSendingError`, then the EMS was unlucky and the sent message arrived at a wrong (normally the next) symmetric key epoch. In this case the EMS goes to step 2. -5. If the canister returns the `WrongVetKeyEpoch` variant of `MessageSendingError`, then either the [manual vetKey epoch rotation](#vetkey-epoch-rotation) or a [group change](#group-changes) took place. In this case, the EMS +5. If the canister returns the `WrongVetKeyEpoch` variant of `MessageSendingError`, then either the [manual vetKey epoch rotation](#vetkey-epoch-rotation) or a [group change](#group-changes) took place. In this case, the EMS makes a query call to `get_latest_vetkey_epoch` and updates the latest vetKey epoch ID of the chat in the state, and then goes to step 1. To avoid infinite loops in case of too strict parameters, bad network connectivity, etc., the maximum number of retries should be capped. -TODO: maybe we should expose `getLatestVetKeyEpochMetadata()` in the EMS to have a central place where this API is queried because in the chat UI we use it to find out the current chat participants. Alternatively, just expose a function that returns the participants. +> [!NOTE] +> A useful feature of chat applications is displaying when a user joined or left the chat directly in the chat history. The current spec only makes use of such information in the vetKey epoch metadata, which returns the full list of participants for each vetKey epoch, which is a bit redundant for the purpose. More succinct data can be exposed by proving an additional backend API that returns a vector of `GroupModification`s (see [Group Changes](#group-changes)). #### Fetching and Decrypting Messages -For message [retrieval](#encrypted-message-retrieval) and [decryption](#ratchet-message-decryption), the EMS makes use of the following backend canister APIs: [`get_my_chats_and_time` and `get_vetkey_epoch_metadata`](#exposing-metadata-about-chats-and-new-messages). +For message [retrieval](#encrypted-message-retrieval) and [decryption](#ratchet-message-decryption), the EMS makes use of the following backend canister APIs: + +* [`get_my_chats_and_time`](#exposing-metadata-about-chats-and-new-messages) to retrieve the chat IDs and the number of messages to retrieve. + +* [`get_messages`](#encrypted-message-retrieval) to retrieve the encrypted messages for the chats accessible to the user. + +* Further canister APIs required for [Ratchet Initialization](#ratchet-initialization). -Also, the `get_messages` backend canister API is used to retrieve the encrypted messages for the chat. Its behavior is closer described in [Encrypted Message Retrieval](#encrypted-message-retrieval). +The frontend stores the following data related to this section in its state: -The frontend stores the following in its state: the chat IDs, the first accessible message ID for the user, the last fetched message, and the total number of messages in the chat. -Let us call this information frontend chat metadata. +* The chat IDs accessible to the user. + +* The first accessible message ID for the user for each chat + +* The last fetched message for the chat. + +* The total number of messages in the chat. + +Let's call this information frontend chat metadata. Periodically, the EMS queries the `get_my_chats_and_time` backend canister API. Its result is compared to the frontend chat metadata in the state. If there is a new chat in the result that is not yet in the state, the EMS adds it to the state along with the information that no messages were obtained for this chat yet. -If one of the chat in the state does not appear in the result of `get_my_chats_and_time` anymore, than this chat is deleted from the state. +If one of the chats in the state does not appear in the result of `get_my_chats_and_time` anymore, then this chat is deleted from the state. Also periodically, two separate routines run. -1. Check if there are new messages to be fetched from the canister: if the largest received message ID for the chat plus one is smaller than the total number of messages in the chat. If it is, then `get_messages` is invoked with the first message ID to be fetched that is equal to the largest received message ID for the chat plus one. If an error occurs due to too large messages that don't fit into the response, the query to `get_messages` is retried with a limit of one. A successful result is stored in the received messages queue. +1. Check if there are new messages to be fetched from the canister: if the largest received message ID for the chat plus one is smaller than the total number of messages in the chat. If it is, then `get_messages` is invoked with the first message ID to be fetched that is equal to the largest received message ID for the chat plus one, or if no messages were received so far for a chat, message ID 0 is used. If an error occurs due to too large messages that don't fit into the canister's response due to the query response limits on the ICP, the query to `get_messages` is retried with a limit of e.g. one. A successful result is stored in the received messages queue. 2. Try to take a message from the received messages queue and decrypt it. @@ -568,21 +621,33 @@ Also periodically, two separate routines run. b. The EMS [decrypts](#ratchet-message-decryption) the message using the symmetric ratchet state and the vetKey epoch ID stored in the message metadata. A successfully decrypted message is put into the decrypted message queue that is exposed to the chat UI component via the [`takeReceivedMessages` API](#encrypted-messaging-service) of the EMS. If the decryption returns an error, such an error is unrecoverable and instead of a decrypted message, a message of special form is put into the decrypted message queue that indicated that this message could not be decrypted. Note that user-side errors cannot be avoided, since the canister cannot check if the encryption is valid. -TODO: in the above point it may happen that the calculation is incorrect because if some messages expire, the first accessible message ID moves. We should probably just use the last received message ID instead. - -TODO: change this to not evolve on decryption. We want to evolve this whenever messages disappear. The ratchet specifies how granular we can make messages disappear and provides forward security for symmetric epochs we already "forgot". - #### vetKey Epoch Rotation +TODO: how often + #### Symmetric Ratchet -A symmetric ratchet state consists of a symmetric ratchet epoch key and a symmetric ratchet epoch id that it corresponds to. +A symmetric ratchet state consists of a symmetric ratchet epoch key and a symmetric ratchet epoch ID that it corresponds to. + +The backend canister APIs required for ratchet initialization are described in a [separate section](#providing-vetkeys-for-symmetric-ratchet-initialization). + +A symmetric ratchet state consists of: + +* Key material that is used to: + + 1. Derive the next ratchet state. + + 2. Derive chat participants' message encryption keys. + +* Symmetric ratchet epoch ID, which is a non-negative number. ##### Ratchet Initialization +TODO: describe the args for fetching + The ratchet state is initialized from a vetKey as follows: -1. Fetch, decrypt, and verify the vetKey +1. Obtain the vetKey for the vetKey epoch of the chat a. [Generate](https://dfinity.github.io/vetkeys/classes/_dfinity_vetkeys.TransportSecretKey.html#random) a transport key pair. From 9da87d3960224a8171c4e46cb3a87a6b14b03bc5 Mon Sep 17 00:00:00 2001 From: Oleksandr Tkachenko Date: Mon, 29 Sep 2025 10:54:20 +0200 Subject: [PATCH 3/9] minor improvements --- examples/encrypted_chat/SPEC.md | 84 +++++++++++++++++++++++++-------- 1 file changed, 64 insertions(+), 20 deletions(-) diff --git a/examples/encrypted_chat/SPEC.md b/examples/encrypted_chat/SPEC.md index 55feb55e..4f1d9389 100644 --- a/examples/encrypted_chat/SPEC.md +++ b/examples/encrypted_chat/SPEC.md @@ -4,7 +4,7 @@ vetKey Encrypted Chat has two main components: the canister backend and user fro * Messaging protected with end-to-end encryption. -* High security through symmetric ratchet and key rotation. +* High security through symmetric ratchet and key rotation via [vetKeys](https://internetcomputer.org/docs/building-apps/network-features/vetkeys/introduction). * Disappearing messages guaranteed by the canister (ICP smart contract) logic. @@ -18,6 +18,8 @@ vetKey Rotation: * More costly than symmetric key ratchet because requires interaction with the backend canister and a [vetKey derivation](https://internetcomputer.org/docs/building-apps/network-features/vetkeys/api). +* Can be performed on group changes in a chat or periodically. + For each vetKey epoch, we instantiate a Symmetric Key Ratchet State that can be evolved. Symmetric Key Ratchet: @@ -30,32 +32,47 @@ Symmetric Key Ratchet: While vetKey rotation can be performed at arbitry time points, the symmetric ratchet is performed at time frame boundaries. The duration of the time frame defines the granularity of the time frame that the user must keep the key material around for in order to be able to decrypt messages. -This differs from the Signal protocol, where we discard the old key as soon as we evolved it in most cases. +This differs from e.g. the Signal protocol, where we discard the old key as soon as we evolved it in most cases. Keeping some keys around is important if we want to decrypt messages in arbitrary order. Consider for example a scenario where we only want to decrypt the newest messages that we want to display to the user and not the whole history, which might be much longer and contain media material that requires a lot of potentially unnecessary communication. This requires us to keep around the ratchet state, with which we are able to decrypt the oldest non-expired message. The duration of the symmetric key ratchet time frame defines the duration of the time frame that we would be able to decrypt messages for that goes beyond what's necessary. -Consider e.g. a message expiry of duration `t` ns and the symmetric ratchet duration of `t` ns as well. -In that case, if no vetKey rotations happen in between, the frontend would need to keep around states `i` and `i + 1` at all times, meaning that in the worst case two time windows (`2t` ns) worth of messages would be decryptable instead of one. - -If we set the message expiry to `t` ns and symmetric ratchet duration to `r * t`, then we can reduce the amount of time we unnecessarily keep the key material for by a factor of `r`. -For example, if we set the expiry to one month and the symmetric ratchet duration to one hour, then we would need to keep the key material (and the ability to decrypt all message encrypted with it) for almost one additional hour. +Consider e.g. a state recovery limit `l` ns and the symmetric ratchet duration of `r` ns as well. +In that case, if no vetKey rotations happen in between, at a time point `p` that is `l < p < 2r` the frontend would need to keep around both symmetric ratchet states and the ability to decrypt messages received by the canister from `0` ns and `p` ns instead of from `p - l` to `p`. +If a vetKey rotation happens in-between, this redundant encryption time frame can only become shorter. -TODO: what if vetKey rotation happens in between? +If we set the state recovery limit to `l` ns and symmetric ratchet duration to `r * l`, then we can reduce the amount of time we unnecessarily keep the key material for by a factor of `r`. +For example, if we set the state recovery limit to one month and the symmetric ratchet duration to one hour, then we would need to keep the key material (and the ability to decrypt all message encrypted with it) for almost one additional hour in the worst case. The downside of this approach is that we need to keep symmetric ratchet states for more ratchet epochs around or compute them on the fly s.t. the messages can be decrypted in any order. However, with realistic time frames, keeping around hunderds of keys or just a key that we can generate the other keys from is not a showstopper. Realistic time frames for symmetric ratchet evolution: between a minute and a few hours/days. Being on a lower side (minutes) makes less sense, since updating the state's keys involves updating the encrypted cache, where every update costs cycles, which would make mass adoption of the chat costly. -TODO: updates to the encrypted cache. +Since the symmetric ratchet is instantiated from a vetKey, the ability to obtain the vetKey allows to decrypt all messages encrypted using any derived ratchet states. +Therefore, to allow for a limited state recorve it is important to disallow the frontend to obtain the initial vetKey once a encrypted cache for that vetKey epoch was uploaded by the user. +Instead, the frontend encrypts its symmetric ratchet states that are required for state recovery and stores them in the canister. +Then, once a state recovery is required, the frontend fetches the required encrypted ratchet states and decrypts them. + +Note that the end of life of a symmetric ratchet state triggers ratchet evolution, which evolves the state and forgets about the previous one, meaning that also the frontend's encrypted cache must be updated. +This poses another tradeoff in terms of the symmetric ratchet duration, i.e., the shorter the time frame is, the more often the cached state must be updated. +Therefore, symmetric ratchet state evolution duration in the order of minutes might be too short to be practical in chats with many users, and duration in the order of hours or days could be better in that case. + +Note also that only active users can update their encrypted cache in the canister. +Therefore, forward security can only be guaranteed to active users or users who disable the state recovery. + +## Notation -TODO: explain what this is about and that it works via uploading encrypted user cache and disallowing to obtain the vetKey that was used to initialize the state. +TODO + +* Disappearing messages + +* State recovery limit ## Components -vetKey Encrypted Chat consists of a backend canister on the ICP and a user frontend. +vetKey Encrypted Chat consists of a backend canister on the ICP and a user frontend. The backend's responsibilities are: @@ -225,8 +242,6 @@ More specifically, the canister checks that: * `vetkey_epoch_id` attached to the message is the latest for the chat ID or returns the `WrongVetKeyEpoch` variant of `MessageSendingError` if the check fails. * `symmetric_key_epoch_id` attached to the message is equal to the [current symmetric key epoch ID](#calculating-current-symmetric-ratchet-epoch-id) corresponding the current consensus time. To check that, the canister calculates the current symmetric ratchet epoch ID for the chat and `vetkey_epoch_id`. If the check fails, the canister returns the `WrongSymmetricKeyEpoch` variant of `MessageSendingError` if the check fails. -TODO: can we use timestamps instead of `symmetric_key_epoch_id`s? Would that improve anything? - > [!NOTE] > This API assumes that the frontend's clock is reasonably synchronized with the ICP to encrypt the messages with the key from the right symmetric ratchet state. This does not pose a significant limitation, since 1) it must already be the case for facilitating reliable communication with the ICP in general and 2) re-encrypting and re-sending in case of failures can be done automatically by the frontend. @@ -424,9 +439,25 @@ See also [Disappearing Messages](#disappearing-messages). TODO: give more details about the Encrypted Maps APIs and how they are called. -TODO: describe that as a potential optimization we could fetch all available cached ratchet states to reduce the number of calls. +As a potential further optimization, it is possible to fetch the available cached ratchet states for all chats at once to reduce the number of calls. +However, this is mostly helpful for the relatively rare cases of complete state recovery but not in the other cases such as if part of the state is already available in the local state (i.e., opening the chat in the browser after some time when the chat was closed). + +In the best case, user's ratchet state cache should be synchronized with the frontend's state, i.e., whenever a ratchet state is evolved in the frontend, it should also be updated in the backend. +However, this only works if the user is online and active. +If the user is offline, the canister will delete the ratchet states that don't have any messages in canister storage that can be decrypted using those states, except for the very last state in order to be able to evolve it to a later when the user is online. -TODO: how frequently? +If keeping an older ratchet state is not desirable by some users, this can be mitigated by setting up a periodic key rotation, where a frontend will not encrypt new messages with a ratchet state from an old vetKey epoch and will instead request a vetKey rotation before the next encryption. +In that case, the users' encrypted caches associated with the older vetKey epoch will be garbage-collected automatically by the canister once they are not required for state recovery anymore. +Note though that this happens by a timer job, so despite that the canister APIs will returns correct results for the state recovery, the actual cache deletion from a canister might happen at a later point, e.g., during an hour or a day, depending on the configuration. + +Also, if the symmetric ratchet evolution duration is too short or the number of users in a chat is very high, updating the cache may be costly because that would involve many canister calls (one per user per chat). +To allow for mass adoption of a chat app, one could think of the following strategies to reduce costs: + +* Allow only larger time frames for the symmetric ratchet evolution, e.g., in the order of hours or days. + +* Instead of updating the caches separately, introduce a batch API for batch updates, which is perfored only once per time frame for all chats. + +* Let the privacy-focused users pay for the cost of the more frequent updates in the chats where they want to have the maximally fined-grained privacy gurantees. ### vetKey Epoch Rotation @@ -547,7 +578,22 @@ The chat UI uses the EMS in a black-box way to: Encrypted Messaging Service (EMS) is a component that gives the developer a transparent way to interact with the encrypted chat by reading from a stream of received and decrypted messages and putting user messages to be encrypted and sent into a stream that the EMS will take care of encrypting and sending. -TODO: add types `ChatId`, `ChatIdAsString`, `Message` +Types: + +* `type ChatId = variant { Group : nat64; Direct : record { principal; principal }; }` + +* `type ChatIdAsString = string` + +* `interface Message { + nonce: bigint; + chatId: ChatId; + senderId: Principal; + content: string; + timestamp: Date; + vetkeyEpoch: bigint; + symmetricRatchetEpoch: bigint; + } + ` The EMS exposes the following APIs: @@ -581,7 +627,7 @@ The EMS periodically takes a message from the sending stream that was added via To avoid infinite loops in case of too strict parameters, bad network connectivity, etc., the maximum number of retries should be capped. -> [!NOTE] +> [!TIP] > A useful feature of chat applications is displaying when a user joined or left the chat directly in the chat history. The current spec only makes use of such information in the vetKey epoch metadata, which returns the full list of participants for each vetKey epoch, which is a bit redundant for the purpose. More succinct data can be exposed by proving an additional backend API that returns a vector of `GroupModification`s (see [Group Changes](#group-changes)). #### Fetching and Decrypting Messages @@ -643,8 +689,6 @@ A symmetric ratchet state consists of: ##### Ratchet Initialization -TODO: describe the args for fetching - The ratchet state is initialized from a vetKey as follows: 1. Obtain the vetKey for the vetKey epoch of the chat @@ -774,7 +818,7 @@ Is something like mixed hash tree possible in a single canister scenario to redu ### IBE-Encrypted vetKey Resharing -### Group Changes Allowing to See Messsage History +### Allowing New Users to See Old Messsage History ## Appendix From 15cebbe604854fd16cf840b179c5f3b66bdcd71a Mon Sep 17 00:00:00 2001 From: Oleksandr Tkachenko Date: Mon, 13 Oct 2025 09:09:54 +0200 Subject: [PATCH 4/9] wip --- examples/encrypted_chat/SPEC.md | 199 ++++++++++++++++++++++++-------- 1 file changed, 154 insertions(+), 45 deletions(-) diff --git a/examples/encrypted_chat/SPEC.md b/examples/encrypted_chat/SPEC.md index 4f1d9389..5d3029fa 100644 --- a/examples/encrypted_chat/SPEC.md +++ b/examples/encrypted_chat/SPEC.md @@ -14,7 +14,7 @@ TODO: explain what keys we have. vetKey Rotation: -* Independent key material - provides both forward and backward security, i.e., an adversary obtaining the key material for a vetKey epoch gains no knowledge about the key material in other vetKey epochs. +* Independent key material - provides both secrecy and post-compromise security, i.e., an adversary obtaining the key material for a vetKey epoch gains no knowledge about the key material in other vetKey epochs. * More costly than symmetric key ratchet because requires interaction with the backend canister and a [vetKey derivation](https://internetcomputer.org/docs/building-apps/network-features/vetkeys/api). @@ -28,11 +28,11 @@ Symmetric Key Ratchet: * Very efficient local ratchet evolution based on consensus time frames. -* Provides forward securiry but not backward security, i.e., an adversary obtaining the key material for the symmetric ratchet ID `i` can derive any ratchet state for an epoch ID that is larger than `i`. +* Provides forward secrecy but not post-compromise security, i.e., an adversary obtaining the key material for the symmetric ratchet ID `i` can derive any ratchet state for an epoch ID that is larger than `i`. -While vetKey rotation can be performed at arbitry time points, the symmetric ratchet is performed at time frame boundaries. +While vetKey rotation can be performed at arbitrary time points, the symmetric ratchet is performed at time frame boundaries. The duration of the time frame defines the granularity of the time frame that the user must keep the key material around for in order to be able to decrypt messages. -This differs from e.g. the Signal protocol, where we discard the old key as soon as we evolved it in most cases. +This differs from e.g. the Signal protocol, where the old key is discarded as soon as it is evolved in most cases. Keeping some keys around is important if we want to decrypt messages in arbitrary order. Consider for example a scenario where we only want to decrypt the newest messages that we want to display to the user and not the whole history, which might be much longer and contain media material that requires a lot of potentially unnecessary communication. This requires us to keep around the ratchet state, with which we are able to decrypt the oldest non-expired message. @@ -46,12 +46,12 @@ If we set the state recovery limit to `l` ns and symmetric ratchet duration to ` For example, if we set the state recovery limit to one month and the symmetric ratchet duration to one hour, then we would need to keep the key material (and the ability to decrypt all message encrypted with it) for almost one additional hour in the worst case. The downside of this approach is that we need to keep symmetric ratchet states for more ratchet epochs around or compute them on the fly s.t. the messages can be decrypted in any order. -However, with realistic time frames, keeping around hunderds of keys or just a key that we can generate the other keys from is not a showstopper. +However, with realistic time frames, keeping around hundreds of keys or just a single key that we can generate the other keys from is not a showstopper. Realistic time frames for symmetric ratchet evolution: between a minute and a few hours/days. Being on a lower side (minutes) makes less sense, since updating the state's keys involves updating the encrypted cache, where every update costs cycles, which would make mass adoption of the chat costly. Since the symmetric ratchet is instantiated from a vetKey, the ability to obtain the vetKey allows to decrypt all messages encrypted using any derived ratchet states. -Therefore, to allow for a limited state recorve it is important to disallow the frontend to obtain the initial vetKey once a encrypted cache for that vetKey epoch was uploaded by the user. +Therefore, to allow for a limited state recovery it is important to disallow the frontend to obtain the initial vetKey again once an encrypted cache for that vetKey epoch was uploaded by the user. Instead, the frontend encrypts its symmetric ratchet states that are required for state recovery and stores them in the canister. Then, once a state recovery is required, the frontend fetches the required encrypted ratchet states and decrypts them. @@ -60,15 +60,15 @@ This poses another tradeoff in terms of the symmetric ratchet duration, i.e., th Therefore, symmetric ratchet state evolution duration in the order of minutes might be too short to be practical in chats with many users, and duration in the order of hours or days could be better in that case. Note also that only active users can update their encrypted cache in the canister. -Therefore, forward security can only be guaranteed to active users or users who disable the state recovery. +Therefore, forward secrecy can only be guaranteed to active users or users who disable the state recovery by uploading invalid encrypted cache. ## Notation -TODO +* Disappearing messages - automatically remove messages in the frontend and encrypted messages in the backend canister from the memory/storage once they expire. -* Disappearing messages +* State recovery limit - store an encrypted backup of the encryption keys in the canister that allows to decrypt messages and recover from a state loss in the frontend. -* State recovery limit +* Time - in this document we use [Unix time](https://en.wikipedia.org/wiki/Unix_time) and standard abbreviations for time units such as ns for nanoseconds. ## Components @@ -205,14 +205,17 @@ When this API is triggered, the canister checks that: Note that the latter two points guarantee that there is no intersection between `remove_participants` and `add_participants`. -A group change triggers a vetKey epoch rotation that updates the set of group participants according to the passed `GroupModification` and stores it in the next vetKey epoch for the chat. The effects of vetKey epoch rotation are further discussed in a [separate section](#vetkey-epoch-rotation). +A group change triggers a vetKey epoch rotation that updates the set of group participants according to the passed `GroupModification` and stores it in the next vetKey epoch for the chat. The effects of vetKey epoch rotation are further discussed in the [vetKet Epoch Rotation](#vetkey-epoch-rotation) section. + +The user removed from a group chat loses access to the messages and vetKeys (as well as key cache) in that chat and does not regain access if added to that group chat later. +Instead, if two users are added to the chat in the same call, while one of the users has previously had access to the chat but was removed and the other user never had access to that chat, they would be able to access only the same messages and vetKeys. > [!NOTE] > One call to `modify_group_chat_participants` triggers one vetKey epoch rotation even if multiple `principals` are added or removed. Further potential optimizations for reducing the number of vetKey epoch rotations or the number of vetKey retrievals are discussed in [Optimizations](#optimizations). ### Incoming Message Validation -Upon receival of a user message via one of the following APIs +Upon receival of a user message via the following API ``` type GroupChatId = nat64; @@ -255,7 +258,7 @@ Finally, the canister adds the message to the state and returns an `Ok`. ### Exposing Metadata about Chats and New Messages -The backend canister exposees the following APIs for fetching metadata: +The backend canister exposes the following APIs for fetching metadata: ``` type GroupChatId = nat64; @@ -437,7 +440,61 @@ The expired caches are removed transparently to the user by the canister, i.e., Expired cache is defined as a cache that neither has any messages associated with it in the canister nor does it correspond to the latest vetKey epoch for the chat ID. See also [Disappearing Messages](#disappearing-messages). -TODO: give more details about the Encrypted Maps APIs and how they are called. +Encrypted maps handles the encryption of the map values transparently to the developer, i.e., the developer does not need to know how encryption exactly works in encrypted maps and can use encrypted maps as a black box. +For the purpose of using encrypted maps for the encryption of user's cached keys, the encrypted maps object is instantiated with the domain separator `"vetkeys-example-encrypted-chat-user-cache"` in the backend. +Then, to handle the cache, the frontend relies on the following: + +* Store to encrypted maps - when the frontend obtained a new ratchet state or wants to update a ratchet state cache, the frontend calls. + +* Load from encrypted maps - when the frontend requires to recover its previous ratchet states, it retrieves the encrypted cache and decrypts it locally. + +```ts + import { EncryptedMaps } from '@dfinity/vetkeys/encrypted_maps'; + + function store(encryptedMaps: EncryptedMaps, myPrincipal: Principal, chatId: ChatId, vetKeyEpochId: bigint, cache: Uint8Array) { + const epochBytes = uBigIntTo8ByteUint8ArrayBigEndian(vetKeyEpochId); + const chatIdBytes = chatIdToBytes(chatId); + const mapKey = new Uint8Array([...chatIdBytes, ...epochBytes]); + encryptedMaps.setValue(myPrincipal, mapName(), mapKey, cache); + } + + function load(encryptedMaps: EncryptedMaps, myPrincipal: Principal, chatId: ChatId, vetKeyEpochId: bigint) : Uint8Array { + const epochBytes = uBigIntTo8ByteUint8ArrayBigEndian(vetKeyEpochId); + const chatIdBytes = chatIdToBytes(chatId); + const mapKey = new Uint8Array([...chatIdBytes, ...epochBytes]); + return encryptedMaps.getValue(myPrincipal, mapName(), mapKey); + } + + function chatIdToBytes(chatId: ChatId) : Uint8Array { + if ('Direct' in chatId) { + return new Uint8Array([ + 0, + ...chatId.Direct[0].toUint8Array(), + ...chatId.Direct[1].toUint8Array() + ]); + } else { + return new Uint8Array([1, ...uBigIntTo8ByteUint8ArrayBigEndian(chatId.Group)]); + } + } + + function uBigIntTo8ByteUint8ArrayBigEndian(value: bigint): Uint8Array { + if (value < 0n) throw new RangeError('Accepts only bigint n >= 0'); + + const bytes = new Uint8Array(8); + for (let i = 0; i < 8; i++) { + bytes[i] = Number((value >> BigInt(i * 8)) & 0xffn); + } + return bytes; + } + + function mapName(): Uint8Array { + return new TextEncoder().encode('encrypted_chat_cache'); + } +``` + +The garbage collection of expired caches happens when there can be no messages that need a particular ratchet state cache for decryption. +That ratchet state cache is then removed by the canister. +This can either happen during user's calls e.g. to add a new message to a chat, or by a timer job, or, actually, both in parallel to reduce the latency of cache deletion. As a potential further optimization, it is possible to fetch the available cached ratchet states for all chats at once to reduce the number of calls. However, this is mostly helpful for the relatively rare cases of complete state recovery but not in the other cases such as if part of the state is already available in the local state (i.e., opening the chat in the browser after some time when the chat was closed). @@ -455,9 +512,9 @@ To allow for mass adoption of a chat app, one could think of the following strat * Allow only larger time frames for the symmetric ratchet evolution, e.g., in the order of hours or days. -* Instead of updating the caches separately, introduce a batch API for batch updates, which is perfored only once per time frame for all chats. +* Instead of updating the caches separately, introduce a batch API for batch updates, which is performed only once per time frame for all chats. -* Let the privacy-focused users pay for the cost of the more frequent updates in the chats where they want to have the maximally fined-grained privacy gurantees. +* Let the privacy-focused users pay for the cost of the more frequent updates in the chats where they want to have the maximally fined-grained privacy guarantees. ### vetKey Epoch Rotation @@ -498,7 +555,8 @@ Further validation rules can be added here and are an implementation detail. For ### Disappearing Messages -Disappearing messages for a chat are defined by a non-negative integer identifying how many messages are expired, i.e., `e` expired messages mean that any message ID in the chat smaller than `e` has expired. +The prefix of messages to be removed from a chat is defined by a non-negative integer, which identifies the size of the prefix of expired messages in the chat history, i.e., `e` expired messages means that any message ID in the chat smaller than `e` has expired. +The value of `e` is calculated from the consensus time and the messages in the chat and does not necessarily need to be stored in memory. Expired messages cannot be [retrieved](#encrypted-message-retrieval) from the canister backend anymore and the canister backend will delete them eventually. The deletion algorithm is an implementation detail of the canister backend, but in general we see two options: @@ -570,7 +628,7 @@ The chat UI uses the EMS in a black-box way to: * Dispatch user messages to be encrypted and sent via the `enqueueSendMessage` API of the EMS. -* Fetch received and decrypted messages via periodically quering the `takeReceivedMessages` API of the EMS. +* Fetch received and decrypted messages via periodically querying the `takeReceivedMessages` API of the EMS. * Find out which chats are accessible at the moment via the `getCurrentChatIds` API of the EMS. New chats need to be added to the UI and chats that the user has lost access to need to be removed from the UI. @@ -580,7 +638,7 @@ Encrypted Messaging Service (EMS) is a component that gives the developer a tran Types: -* `type ChatId = variant { Group : nat64; Direct : record { principal; principal }; }` +* `type ChatId = { 'Group' : bigint } | { 'Direct' : [Principal, Principal] };` * `type ChatIdAsString = string` @@ -593,17 +651,17 @@ Types: vetkeyEpoch: bigint; symmetricRatchetEpoch: bigint; } - ` + ` The EMS exposes the following APIs: -* `enqueueSendMessage(chatId: ChatId, content: Uint8Array)`: adds the message `content` to be encrypted for and sent for adding to the chat with ID `chatId`. This API does not give any guarantees that the message will actually be added to the chat but it makes attempts to recover from recoverable errors (see [Encrypting and Sending Messages in the EMS](#encrypting-and-sending-messages)). +* `enqueueSendMessage(chatId: ChatId, content: Uint8Array)`: adds the message `content` to be encrypted for and sent to the chat with ID `chatId`. This API does not give any guarantees that the message will actually be added to the chat but it makes attempts to recover from recoverable errors (see [Encrypting and Sending Messages in the EMS](#encrypting-and-sending-messages)). * `takeReceivedMessages(): Map`: returns latest chat messages that were received and decrypted by the EMS and were not yet taken by the user from the EMS (see [Fetching and Decrypting Messages in the EMS](#fetching-and-decrypting-messages)). -* `start()`: starts the EMS service. +* `start()`: starts the EMS service. Before the service is started, calling any other APIs should throw an error. Once it is started, the APIs start to return their intended values, and the EMS starts background tasks to continuously update the relevant chat information from the canister. -* `skipMessagesAvailableLocally(chatId: ChatId, lastKnownChatMessageId: bigint)`: tells the EMS what chat message ID should be the first one to fetch the messages. This is relevant if some of the messages are available from another source such as [browser storage](#local-cache-in-indexeddb). +* `skipMessagesAvailableLocally(chatId: ChatId, lastKnownChatMessageId: bigint)`: tells the EMS what chat message ID should be the first one to be fetched. This is relevant if some of the messages are available from another source such as [browser storage](#local-cache-in-indexeddb). * `getCurrentChatIds(): ChatId[]`: returns the chat IDs that are currently accessible to the user. This particular API is mostly an efficiency optimization, since the message retrieval in the EMS anyways requires fetching the information about the currently accessible chats. @@ -628,7 +686,7 @@ The EMS periodically takes a message from the sending stream that was added via To avoid infinite loops in case of too strict parameters, bad network connectivity, etc., the maximum number of retries should be capped. > [!TIP] -> A useful feature of chat applications is displaying when a user joined or left the chat directly in the chat history. The current spec only makes use of such information in the vetKey epoch metadata, which returns the full list of participants for each vetKey epoch, which is a bit redundant for the purpose. More succinct data can be exposed by proving an additional backend API that returns a vector of `GroupModification`s (see [Group Changes](#group-changes)). +> A useful feature of chat applications is displaying when a user joined or left the chat directly in the chat history. The current spec only makes use of such information in the vetKey epoch metadata, which returns the full list of participants for each vetKey epoch, which is a bit redundant for the purpose. More succinct data can be exposed by providing an additional backend API that returns a vector of `GroupModification`s (see [Group Changes](#group-changes)). #### Fetching and Decrypting Messages @@ -640,7 +698,7 @@ For message [retrieval](#encrypted-message-retrieval) and [decryption](#ratchet- * Further canister APIs required for [Ratchet Initialization](#ratchet-initialization). -The frontend stores the following data related to this section in its state: +The frontend stores the following related data related in its state: * The chat IDs accessible to the user. @@ -654,12 +712,13 @@ Let's call this information frontend chat metadata. Periodically, the EMS queries the `get_my_chats_and_time` backend canister API. Its result is compared to the frontend chat metadata in the state. +The existing chat metadata is updated if required. If there is a new chat in the result that is not yet in the state, the EMS adds it to the state along with the information that no messages were obtained for this chat yet. -If one of the chats in the state does not appear in the result of `get_my_chats_and_time` anymore, then this chat is deleted from the state. +If one of the chats in the state does not appear in the result of `get_my_chats_and_time` anymore, then this chat is deleted from the state including from the queues containing received and decrypted messages. Also periodically, two separate routines run. -1. Check if there are new messages to be fetched from the canister: if the largest received message ID for the chat plus one is smaller than the total number of messages in the chat. If it is, then `get_messages` is invoked with the first message ID to be fetched that is equal to the largest received message ID for the chat plus one, or if no messages were received so far for a chat, message ID 0 is used. If an error occurs due to too large messages that don't fit into the canister's response due to the query response limits on the ICP, the query to `get_messages` is retried with a limit of e.g. one. A successful result is stored in the received messages queue. +1. Check if there are new messages to be fetched from the canister: if the largest received message ID for the chat plus one is smaller than the total number of messages in the chat. If it is, then `get_messages` is invoked with the first message ID to be fetched that is equal to the largest received message ID for the chat plus one, or if no messages were received so far for a chat, message ID 0 is used. If an error occurs due to too large messages that don't fit into the canister's response due to the query response limits on the ICP, the query to `get_messages` is retried with a limit of e.g. one. A successful result appended to the received messages queue. 2. Try to take a message from the received messages queue and decrypt it. @@ -667,25 +726,19 @@ Also periodically, two separate routines run. b. The EMS [decrypts](#ratchet-message-decryption) the message using the symmetric ratchet state and the vetKey epoch ID stored in the message metadata. A successfully decrypted message is put into the decrypted message queue that is exposed to the chat UI component via the [`takeReceivedMessages` API](#encrypted-messaging-service) of the EMS. If the decryption returns an error, such an error is unrecoverable and instead of a decrypted message, a message of special form is put into the decrypted message queue that indicated that this message could not be decrypted. Note that user-side errors cannot be avoided, since the canister cannot check if the encryption is valid. -#### vetKey Epoch Rotation - -TODO: how often - #### Symmetric Ratchet -A symmetric ratchet state consists of a symmetric ratchet epoch key and a symmetric ratchet epoch ID that it corresponds to. - -The backend canister APIs required for ratchet initialization are described in a [separate section](#providing-vetkeys-for-symmetric-ratchet-initialization). +The backend canister APIs required for ratchet initialization are described in the [Providing vetKeys for Symmetric Ratchet Initialization](#providing-vetkeys-for-symmetric-ratchet-initialization) section. A symmetric ratchet state consists of: -* Key material that is used to: +* Epoch key that is used to: 1. Derive the next ratchet state. 2. Derive chat participants' message encryption keys. -* Symmetric ratchet epoch ID, which is a non-negative number. +* Symmetric ratchet epoch ID that the key corresponds to, which is a non-negative number. ##### Ratchet Initialization @@ -718,11 +771,15 @@ import { deriveSymmetricKey } from '@dfinity/vetkeys'; type RawSymmetricRatchetState = { epochKey: Uint8Array, epochId: bigint }; -function evolve(symmetricRatchetState: RawSymmetricRatchetState) : RawSymmetricRatchetState { - const domainSeparator = new Uint8Array([ +function ratchetStepDomainSeparator(epoch: bigInt) { + return new Uint8Array([ ...DOMAIN_RATCHET_STEP, ...uBigIntTo8ByteUint8ArrayBigEndian(symmetricRatchetState.epochId) ]); +} + +function evolve(symmetricRatchetState: RawSymmetricRatchetState) : RawSymmetricRatchetState { + const domainSeparator = ratchetStepDomainSeparator(symmetricRatchetState.epochKey); const newEpochkey = deriveSymmetricKey(symmetricRatchetState.epochKey, domainSeparator, 32); return { epochKey: newEpochkey, epochId: symmetricRatchetState.epochId + 1n} @@ -730,14 +787,16 @@ function evolve(symmetricRatchetState: RawSymmetricRatchetState) : RawSymmetricR ``` where `DOMAIN_RATCHET_STEP` is a unique domain separator. -Alternatively, this can be implemented using Web Crypto API to make the current key non-extractable. Note though that Web Crypto API's `deriveKey` cannot derive an HKDF key and, therefore, to derive the next epoch key, it first needs to be derive via `deriveBits`, which returns the next epoch key in form of a byte vector, which needs to be imported as `CryptoKey`. +Alternatively, this can be implemented using Web Crypto API to make the current key non-extractable. Note though that Web Crypto API's `deriveKey` cannot derive an HKDF key and, therefore, derivation of the next epoch key is a two-step process: +1. Derive the next epoch key in form of a byte vector via `deriveBits`. +2. Import the derived byte vector as `CryptoKey`. ```ts type SymmetricRatchetState = { epochKey: CryptoKey, epochId: bigint }; async function deriveNextSymmetricRatchetEpochCryptoKey(symmetricRatchetState: RawSymmetricRatchetState) : Promise { const exportable = false; - const domainSeparator = ratchetStepDomainSeparator() + const domainSeparator = ratchetStepDomainSeparator(symmetricRatchetState.epochKey) const algorithm = { name: 'HKDF', hash: 'SHA-256', @@ -755,7 +814,7 @@ async function deriveNextSymmetricRatchetEpochCryptoKey(symmetricRatchetState: R } ``` -It would be quite natural and similar to [Signal's symmetric ratchet](https://signal.org/docs/specifications/doubleratchet/) if the decryption would trigger the ratchet evolution. However, that would force the frontned to decrypt all messages belonging to one vetKey epoch in cases where a chat has many messages and we only want to display the latest ones. This incurs a big and unnecessary overhead in terms of both communication and computation. +It would be quite natural and similar to [Signal's symmetric ratchet](https://signal.org/docs/specifications/doubleratchet/) if the decryption would trigger the ratchet evolution. However, that would force the frontend to decrypt all messages belonging to one vetKey epoch in cases where a chat has many messages and we only want to display the latest ones. This incurs a big and unnecessary overhead in terms of both communication and computation. Therefore, neither encryption nor decryption directly evolve `SymmetricRatchetState` but instead, the state is evolved whenever the current consensus time obtained via `get_my_chats_and_time` minus the message expiry is larger than the timestamp of the current symmetric key epoch id + 1, i.e., whenever there can be no non-expired message that we would need the current symmetric ratchet epoch to decrypt. The state evolution can be triggered by a background job that periodically checks if state evolution should be performed. @@ -774,7 +833,7 @@ async encrypt( return await derivedKeyMaterial.encryptMessage(message, domainSeparator); } ``` -where [`messageEncryptionDomainSeparator`](#typescript-domain-separators) is a unique [size-prefixed](#typescript-size-prefix) domain separator. +where [`messageEncryptionDomainSeparator`](#typescript-domain-separators) is a unique [size-prefixed](#typescript-size-prefix) domain separator and the `nonce` argument is a user-assigned nonce associated with the message that must be unique for `epochKey` (i.e., the symmetric ratchet epoch) und MUST NOT be reused. ##### Ratchet Message Decryption @@ -786,17 +845,26 @@ async decrypt( epochKey: CryptoKey, sender: Principal, nonce: Uint8Array, - message: Uint8Array + encryptedMessage: Uint8Array ): Promise { const domainSeparator = messageEncryptionDomainSeparator(sender, nonce); const derivedKeyMaterial = DerivedKeyMaterial.fromCryptoKey(epochKey); - return await derivedKeyMaterial.encryptMessage(message, domainSeparator); + return await derivedKeyMaterial.decryptMessage(encryptedMessage, domainSeparator); } ``` where [`messageEncryptionDomainSeparator`](#typescript-domain-separators) is a unique [size-prefixed](#typescript-size-prefix) domain separator. #### State Cache +TODO + +The state cache is intended to allow the user to upload encrypted symmetric ratchet epoch keys to the canister. + +Relation to disappearing messages. + +While disappearing messages duration is a chat-level setting, the state recovery duration is a user-level setting in a chat. +The state recovery duration is always smaller or equal to the disappearing messages duration because because keys for messages that don't exist anymore are not useful. + ##### Encrypted Maps ## Ensuring Correctness of Query Calls @@ -814,11 +882,52 @@ Is something like mixed hash tree possible in a single canister scenario to redu ## Optimizations +Here, we discuss potential further optimizations that are not an essential part of the spec. + ### Local Cache in indexedDB +The local cache in indexedDB is essentially a copy of the frontend state stored in a persistent storage. + +* Keys - whenever a new ratchet state is created or evolved, it is stored in indexedDB at an index containing its chat ID and the vetKey epoch ID. Whenever a ratchet state is deleted locally, it is also deleted from indexedDB. + +* Messages - in a similar fashion to keys, the decrypted messages are also added to and removed from indexedDB to keep it in sync with the frontend. + +An important difference between keys and messages in indexedDB is the component that is responsible for caching. +For keys, the [EMS](#encrypted-messaging-service) is responsible. +Namely, whenever a ratchet state is required, the EMS first checks if it is available from IndexedDB. +For messages, the UI is responsible. +More specifically, upon initialization of the app, it first loads all messages available in indexedDB into the local state and updates the count of the messages available locally via the EMS API s.t. they are not loaded and decrypted again from the canister. + ### IBE-Encrypted vetKey Resharing -### Allowing New Users to See Old Messsage History +To reduce the number of vetKeys required for group changes and periodic key rotations, IBE with long-term keys can be used: + +1. Each user fetches a long-term IBE key for their principal. + +2. Whenever a user fetches a vetKey for a chat, the user decrypts it and reshares with all other users by encrypting the vetKey with their public IBE keys, resulting in a vector containing a separate encryption for each user. Then, the user sends this vector to a special canister API (not described further in this spec). + +3. Upon receival of a call to that API, the canister checks which users already have either an existing reshared vetKey or have a cached ratchet state. Such reshared vetKeys are filtered out and the rest is added to the canister state. + +Then, the user frontend would check if there is a reshared vetKey stored for them in the canister state before trying to obtain the vetKey in the normal way. +If that is the case, the user would fetch and IBE-decrypt it, and then verify that the vetKey is valid (recall that the vetKey is a BLS signature that can efficiently be verified). +If the check fails, the reshared vetKey is ignored and the frontend proceeds as if there was no resharing, i.e., it obtains the vetKey via the normal API. + +Note that this adds the overhead of resharing keys with potentially many users in the chat which incurs additional runtime overhead. +However, this can be done in the background and doesn't have to block the app. +For that, most of the more costly vetKey derivation calls by the canister can be replaced by cheaper calls to store small encrypted vetKeys. +Also, this optimization works well only if most of the users provide valid reshared vetKeys, but, on the other hand, the penalty to the honest users (some additional latency due to the higher number of steps in the vetKey retrieval logic) is rather small. + +An additional step useful to save a small amount of storage in the resharing routine is to remove the reshared vetKey after an encrypted cache has been added for a user to the canister state. + +### Allowing New Users to See Old Message History + +It is not necessarily always the case that a user added to a chat should not see the previous history, which is the default setting in this spec. +This not only allows specific users to see the chat history but also reduces the number of calls to derive vetKeys. + +In the chat UI, this functionality can be realized e.g. via a flag set in the chat UI while adding a new user. + +The current spec does not allow to integrate this optimization in a completely non-invasive way, since the chat participants of a vetKey epoch cannot be changed. +To facilitate this, the list of chat participants could have versioning, where for each version the list change would be stored in the canister and `get_my_chats_and_time` would return no only the last message in the chat but also all vetKey epoch IDs for each chat and their participant list versions, or only for those where actual changes happened. ## Appendix From 9de5a2a48b7258ff58a54a71c79acf092ef308d5 Mon Sep 17 00:00:00 2001 From: Oleksandr Tkachenko Date: Mon, 13 Oct 2025 11:11:58 +0200 Subject: [PATCH 5/9] state updates --- examples/encrypted_chat/SPEC.md | 103 +++++++++++++++++++++++++++----- 1 file changed, 89 insertions(+), 14 deletions(-) diff --git a/examples/encrypted_chat/SPEC.md b/examples/encrypted_chat/SPEC.md index 5d3029fa..50dbcd36 100644 --- a/examples/encrypted_chat/SPEC.md +++ b/examples/encrypted_chat/SPEC.md @@ -591,6 +591,84 @@ Once the state is updated, it affects the behavior of [the APIs returning the me * `get_messages : (ChatId, ChatMessageId, opt Limit) -> (vec EncryptedMessage) query;` does not add expired messages to the output. +### Encrypted Ratchet State Cache + +The state encrypted ratchet state cache is intended to allow the user to upload encrypted symmetric ratchet epoch keys to the canister and then recover the local state whenever needed, e.g., for disaster recovery or after browser change. + +There is a relation between state recovery and the disappearing message duration. +The state recovery duration is always smaller or equal to the disappearing messages duration because because keys for messages that don't exist anymore are not useful. + +While disappearing messages duration is a chat-level setting, the state recovery duration is a user-level setting in a chat. +In general, the state recovery limit can be both chat-level and user-level setting, but it makes more sense to make state recovery a user setting because the user frontend is in full control of state recovery for a non-expired vetKey epoch. + +In the canister backend, the state recovery limit is used in three cases - all related to expired vetKey epochs: +* Removal of expired caches. +* Acceptance/rejection of cache uploads. +* Acceptance/rejection of cache downloads (because removal may not happen instantly). + +Let's denote the state recovery limit as `limit`, current consensus time as `consensus_time` and next vetKey epoch creation time as `next_epoch_creation`, which equals `None` it current epoch is the latest and there is no next epoch. + +``` +fn has_expired(limit: u64, consensus_time: u64, next_epoch_creation: Option) : bool { + if next_epoch.is_none() { + return false; + } else { + next_epoch.map(|next_epoch| consensus_time.saturating_sub(limit) > next_epoch) + } +} +let expired = now - l +``` + +The canister backend API looks as follows: +```candid +set_state_recovery_limit : (ChatId, MessageExpiryMins) -> (variant { Ok; Err : text }); +``` +and must remove expired caches before setting a _higher_ limit. In general, it could make sense to remove expired caches on each update, since the latency of this operation is not critical. + +The canister backend API `get_state_recovery_limit` can be used by a frontend that has lost its state to recover the information about user's state recovery limit for a chat with ID `ChatId`. +```candid +get_state_recovery_limit : (ChatId) -> (variant { Ok : ?MessageExpiryMins; Err : text }) query; +``` + +Since cache update is usually a relatively rare operation compared to e.g. potential message updates, adding the cache metadata to the `get_my_chats_and_time` API of the backend canister seems like overkill. Instead, we could add a separate API that would return this metadata s.t. the frontend can find out which caches should be updated. +```candid +type StateRecoveryCacheMetadata = record { + vetkey_epoch_id: VetKeyEpochId; + symmetric_epoch_id : SymmetricEpochId; +}; + +get_metadata_for_my_state_recovery_caches : () -> (vec StateRecoveryCacheMetadata) query; +``` + +The frontend's role in enabling state recovery is twofold: +* Initialization of caches. +* Updates of caches. + +Independent of how the local ratchet state was initialized, once initialized the frontend checks if the ratchet state cache in the canister needs to be updated. If no cache state exists in the canister or it is outdated, the frontend encrypts and uploads the state via the `update_my_symmetric_key_cache` backend canister API. + +```candid +type GroupChatId = nat64; +type ChatId = variant { + Group : GroupChatId; + Direct : record { principal; principal }; +}; +type VetKeyEpochId = nat64; +type EncryptedSymmetricRatchetCache = blob; + +update_my_symmetric_key_cache : (ChatId, VetKeyEpochId, EncryptedSymmetricRatchetCache) -> (variant { Ok; Err : text }); +``` + +The `update_my_symmetric_key_cache` backend canister API works as follows: +1. If the user doesn't have access to the chat with ID `ChatId` or the chat does not exist, return an error. +2. If the user tries to upload cache for an invalid `VetKeyEpochId`, return an error. +3. Store the EncryptedSymmetricRatchetCache + +In addition to cache updates upon initialization, the frontend should periodically check that the cache is up-to-date. How often this is done depends on the symmetric ratchet duration window. +However, note that only the client that is online can update their cache, which normally doesn't work perfectly practice, e.g., if we rotate the symmetric ratchet state every hour, there will be very few clients that will be online for actually performing cache updates every hour. +Also, cache updates cost cycles in the ICP, and hence excessive use undesirably costs additional money. +Therefore, cache updates every (half a) day, and therefore the symmetric ratchet duration of a similar length, seem to be most practical in the general use case. +If another use case requires more strict parameters, they can be set appropriately to enable that use case. + ## User Frontend Component ### Frontend State @@ -762,7 +840,7 @@ The ratchet state is initialized from a vetKey as follows: More details about the retrieval and decryption of vetKeys can be found in the [developer docs](https://internetcomputer.org/docs/building-apps/network-features/vetkeys/api) of the ICP. -TODO: add initialization from cache +After initializing the ratchet state, the user uploads encrypted cache of the state to the backend canister which is further described in [Encrypted Ratchet State Cache](#encrypted-ratchet-state-cache). ##### Ratchet Evolution @@ -818,6 +896,8 @@ It would be quite natural and similar to [Signal's symmetric ratchet](https://si Therefore, neither encryption nor decryption directly evolve `SymmetricRatchetState` but instead, the state is evolved whenever the current consensus time obtained via `get_my_chats_and_time` minus the message expiry is larger than the timestamp of the current symmetric key epoch id + 1, i.e., whenever there can be no non-expired message that we would need the current symmetric ratchet epoch to decrypt. The state evolution can be triggered by a background job that periodically checks if state evolution should be performed. +After evolving the ratchet state, the user uploads encrypted updated state cache to the canister backend which is further described in [Encrypted Ratchet State Cache](#encrypted-ratchet-state-cache). + ##### Ratchet Message Encryption ```ts import { DerivedKeyMaterial } from '@dfinity/vetkeys'; @@ -854,19 +934,6 @@ async decrypt( ``` where [`messageEncryptionDomainSeparator`](#typescript-domain-separators) is a unique [size-prefixed](#typescript-size-prefix) domain separator. -#### State Cache - -TODO - -The state cache is intended to allow the user to upload encrypted symmetric ratchet epoch keys to the canister. - -Relation to disappearing messages. - -While disappearing messages duration is a chat-level setting, the state recovery duration is a user-level setting in a chat. -The state recovery duration is always smaller or equal to the disappearing messages duration because because keys for messages that don't exist anymore are not useful. - -##### Encrypted Maps - ## Ensuring Correctness of Query Calls TODO @@ -1061,6 +1128,7 @@ type PublicTransportKey = blob; type DerivedVetKeyPublicKey = blob; type EncryptedVetKey = blob; type IbeEncryptedVetKey = blob; +type EncryptedSymmetricRatchetCache = blob; type EncryptedMessageMetadata = record { vetkey_epoch : VetKeyEpochId; @@ -1100,6 +1168,10 @@ type ChatMetadata = record { vetkey_epoch_id : VetKeyEpochId; symmetric_epoch_id : SymmetricEpochId; }; +type StateRecoveryCacheMetadata = record { + vetkey_epoch_id: VetKeyEpochId; + symmetric_epoch_id : SymmetricEpochId; +}; service : (text) -> { chat_public_key : (ChatId, VetKeyEpochId) -> (DerivedVetKeyPublicKey); @@ -1115,6 +1187,7 @@ service : (text) -> { get_messages : (ChatId, ChatMessageId, opt Limit) -> ( vec EncryptedMessage, ) query; + get_metadata_for_my_state_recovery_caches : () -> (vec StateRecoveryCacheMetadata) query; get_vetkey_epoch_metadata : (ChatId, VetKeyEpochId) -> (variant { Ok : VetKeyEpochMetadata; Err : text }) query; get_vetkey_resharing_ibe_decryption_key : (PublicTransportKey) -> (EncryptedVetKey); get_vetkey_resharing_ibe_encryption_key : (Receiver) -> (DerivedVetKeyPublicKey); @@ -1127,6 +1200,8 @@ service : (text) -> { rotate_chat_vetkey : (ChatId) -> (KeyRotationResult); send_message : (ChatId, UserMessage) -> (variant { Ok; Err : MessageSendingError }); set_message_expiry : (ChatId, MessageExpiryMins) -> (variant { Ok; Err : text }); + set_state_recovery_limit : (ChatId, MessageExpiryMins) -> (variant { Ok; Err : text }); + get_state_recovery_limit : (ChatId) -> (variant { Ok : ?MessageExpiryMins; Err : text }) query; update_my_symmetric_key_cache : (ChatId, VetKeyEpochId, EncryptedSymmetricRatchetCache) -> (variant { Ok; Err : text }); } ``` \ No newline at end of file From 3baebb5888432690c83db96c1660360b0ed27fa5 Mon Sep 17 00:00:00 2001 From: Oleksandr Tkachenko Date: Mon, 13 Oct 2025 11:51:16 +0200 Subject: [PATCH 6/9] state updates --- examples/encrypted_chat/SPEC.md | 250 +++++++++++++++----------------- 1 file changed, 117 insertions(+), 133 deletions(-) diff --git a/examples/encrypted_chat/SPEC.md b/examples/encrypted_chat/SPEC.md index 50dbcd36..27170a01 100644 --- a/examples/encrypted_chat/SPEC.md +++ b/examples/encrypted_chat/SPEC.md @@ -396,9 +396,22 @@ pub fn ratchet_context(chat_id_bytes: &[u8]) -> Vec { The actual initialization of the symmetric ratchet state is performed in the frontend and is, therefore, specified in [Ratchet Initialization](#ratchet-initialization) in the frontend. -### User's Encrypted Symmetric Ratchet State Cache +### Encrypted Symmetric Ratchet State Cache -The canister backend provides the following APIs for storing users' encrypted symmetric ratchet states in the canister: +The encrypted ratchet state cache is intended to allow the user to upload encrypted symmetric ratchet epoch keys to the canister and then [recover](#state-recovery) the local state in the frontend whenever needed, e.g., for disaster recovery or after a browser change. + +There is a relation between state recovery and the [disappearing messages](#disappearing-messages) duration. +The state recovery duration is always smaller or equal to the disappearing messages duration because because keys for messages that don't exist anymore are not useful. + +While disappearing messages duration is a chat-level setting, the state recovery duration is a user-level setting in a chat. +In general, the state recovery limit can be both chat-level and user-level setting, but it makes more sense to make state recovery a user setting because the user frontend is in full control of state recovery for a non-expired vetKey epoch. + +In the canister backend, the state recovery limit is used in three cases - all related to expired vetKey epochs: +* Removal of expired caches. +* Acceptance/rejection of cache uploads. +* Acceptance/rejection of cache downloads (because removal may not happen instantly). + +The canister backend provides the following APIs for storing and obtaining users' encrypted symmetric ratchet states in the canister: ``` type PublicTransportKey = blob; type EncryptedVetKey = blob; @@ -436,60 +449,43 @@ If the checks pass, the canister accepts the call and stores the cache, or overw The `get_my_symmetric_key_cache` call retrieves the response of Encrypted Maps for getting their cache corresponding to the input arguments, which is the encrypted bytes if the entry exists or `null` if it does not. -The expired caches are removed transparently to the user by the canister, i.e., the removal does not require explicit calls for doing that. -Expired cache is defined as a cache that neither has any messages associated with it in the canister nor does it correspond to the latest vetKey epoch for the chat ID. -See also [Disappearing Messages](#disappearing-messages). +The expired caches are removed transparently to the frontend by the canister, i.e., the removal does not require explicit calls for doing that. +Expired cache is a cache, where both of the following is true: +* The cache neither has any messages associated with its vetKey epoch (see [Disappearing Messages](#disappearing-messages)) nor does it correspond to the _latest_ vetKey epoch for the chat ID. +* The vetKey epoch has not expired with regards to the state recovery limit (see below). -Encrypted maps handles the encryption of the map values transparently to the developer, i.e., the developer does not need to know how encryption exactly works in encrypted maps and can use encrypted maps as a black box. -For the purpose of using encrypted maps for the encryption of user's cached keys, the encrypted maps object is instantiated with the domain separator `"vetkeys-example-encrypted-chat-user-cache"` in the backend. -Then, to handle the cache, the frontend relies on the following: - -* Store to encrypted maps - when the frontend obtained a new ratchet state or wants to update a ratchet state cache, the frontend calls. - -* Load from encrypted maps - when the frontend requires to recover its previous ratchet states, it retrieves the encrypted cache and decrypts it locally. - -```ts - import { EncryptedMaps } from '@dfinity/vetkeys/encrypted_maps'; - - function store(encryptedMaps: EncryptedMaps, myPrincipal: Principal, chatId: ChatId, vetKeyEpochId: bigint, cache: Uint8Array) { - const epochBytes = uBigIntTo8ByteUint8ArrayBigEndian(vetKeyEpochId); - const chatIdBytes = chatIdToBytes(chatId); - const mapKey = new Uint8Array([...chatIdBytes, ...epochBytes]); - encryptedMaps.setValue(myPrincipal, mapName(), mapKey, cache); - } +Let's denote the state recovery limit as `limit`, current consensus time as `consensus_time` and next vetKey epoch creation time as `next_epoch_creation`, which equals `None` it current epoch is the latest and there is no next epoch. +To determine if a vetKey epoch has expired with regards to the state recovery limit, the following function is used. - function load(encryptedMaps: EncryptedMaps, myPrincipal: Principal, chatId: ChatId, vetKeyEpochId: bigint) : Uint8Array { - const epochBytes = uBigIntTo8ByteUint8ArrayBigEndian(vetKeyEpochId); - const chatIdBytes = chatIdToBytes(chatId); - const mapKey = new Uint8Array([...chatIdBytes, ...epochBytes]); - return encryptedMaps.getValue(myPrincipal, mapName(), mapKey); - } +```rust +fn has_expired(limit: u64, consensus_time: u64, next_epoch_creation: Option) : bool { + if next_epoch.is_none() { + return false; + } else { + next_epoch.map(|next_epoch| consensus_time.saturating_sub(next_epoch) > limit) + } +} +``` - function chatIdToBytes(chatId: ChatId) : Uint8Array { - if ('Direct' in chatId) { - return new Uint8Array([ - 0, - ...chatId.Direct[0].toUint8Array(), - ...chatId.Direct[1].toUint8Array() - ]); - } else { - return new Uint8Array([1, ...uBigIntTo8ByteUint8ArrayBigEndian(chatId.Group)]); - } - } +The canister backend API for setting the state recovery limit is defined as follows: +```candid +set_state_recovery_limit : (ChatId, MessageExpiryMins) -> (variant { Ok; Err : text }); +``` +and must remove expired caches before setting a _higher_ limit. In general, it could make sense to remove expired caches on each update, since the latency of this operation is usually not critical. - function uBigIntTo8ByteUint8ArrayBigEndian(value: bigint): Uint8Array { - if (value < 0n) throw new RangeError('Accepts only bigint n >= 0'); +The canister backend API `get_state_recovery_limit` can be used by a frontend that has lost its state to recover the information about user's state recovery limit for a chat with ID `ChatId`. +```candid +get_state_recovery_limit : (ChatId) -> (variant { Ok : ?MessageExpiryMins; Err : text }) query; +``` - const bytes = new Uint8Array(8); - for (let i = 0; i < 8; i++) { - bytes[i] = Number((value >> BigInt(i * 8)) & 0xffn); - } - return bytes; - } +Since cache update is usually a relatively rare operation compared to e.g. potential message updates, adding the cache metadata to the `get_my_chats_and_time` API of the backend canister seems like overkill. Instead, we could add a separate API that would return this metadata s.t. the frontend can find out which caches should be updated. +```candid +type StateRecoveryCacheMetadata = record { + vetkey_epoch_id: VetKeyEpochId; + symmetric_epoch_id : SymmetricEpochId; +}; - function mapName(): Uint8Array { - return new TextEncoder().encode('encrypted_chat_cache'); - } +get_metadata_for_my_state_recovery_caches : () -> (vec StateRecoveryCacheMetadata) query; ``` The garbage collection of expired caches happens when there can be no messages that need a particular ratchet state cache for decryption. @@ -497,15 +493,15 @@ That ratchet state cache is then removed by the canister. This can either happen during user's calls e.g. to add a new message to a chat, or by a timer job, or, actually, both in parallel to reduce the latency of cache deletion. As a potential further optimization, it is possible to fetch the available cached ratchet states for all chats at once to reduce the number of calls. -However, this is mostly helpful for the relatively rare cases of complete state recovery but not in the other cases such as if part of the state is already available in the local state (i.e., opening the chat in the browser after some time when the chat was closed). In the best case, user's ratchet state cache should be synchronized with the frontend's state, i.e., whenever a ratchet state is evolved in the frontend, it should also be updated in the backend. However, this only works if the user is online and active. If the user is offline, the canister will delete the ratchet states that don't have any messages in canister storage that can be decrypted using those states, except for the very last state in order to be able to evolve it to a later when the user is online. -If keeping an older ratchet state is not desirable by some users, this can be mitigated by setting up a periodic key rotation, where a frontend will not encrypt new messages with a ratchet state from an old vetKey epoch and will instead request a vetKey rotation before the next encryption. +As an idea for further improvement in cases if keeping an older ratchet state is not desirable by some users even if the users may potentially go offline for longer time frames, this can be mitigated by setting up a periodic key rotation (not defined in this spec), where a frontend will not encrypt new messages with a ratchet state from an old vetKey epoch and will instead request a vetKey rotation before the next encryption. In that case, the users' encrypted caches associated with the older vetKey epoch will be garbage-collected automatically by the canister once they are not required for state recovery anymore. -Note though that this happens by a timer job, so despite that the canister APIs will returns correct results for the state recovery, the actual cache deletion from a canister might happen at a later point, e.g., during an hour or a day, depending on the configuration. +In that case, the caches for the latest vetKey epoch that should have been rotated could also be garbage-collected. +Note though that this is normally performed by a timer job, so despite that the canister APIs will return correct results for the state recovery, the actual cache deletion from a canister might happen at a later point, e.g., during an hour or a day, depending on the configuration. Also, if the symmetric ratchet evolution duration is too short or the number of users in a chat is very high, updating the cache may be costly because that would involve many canister calls (one per user per chat). To allow for mass adoption of a chat app, one could think of the following strategies to reduce costs: @@ -591,84 +587,6 @@ Once the state is updated, it affects the behavior of [the APIs returning the me * `get_messages : (ChatId, ChatMessageId, opt Limit) -> (vec EncryptedMessage) query;` does not add expired messages to the output. -### Encrypted Ratchet State Cache - -The state encrypted ratchet state cache is intended to allow the user to upload encrypted symmetric ratchet epoch keys to the canister and then recover the local state whenever needed, e.g., for disaster recovery or after browser change. - -There is a relation between state recovery and the disappearing message duration. -The state recovery duration is always smaller or equal to the disappearing messages duration because because keys for messages that don't exist anymore are not useful. - -While disappearing messages duration is a chat-level setting, the state recovery duration is a user-level setting in a chat. -In general, the state recovery limit can be both chat-level and user-level setting, but it makes more sense to make state recovery a user setting because the user frontend is in full control of state recovery for a non-expired vetKey epoch. - -In the canister backend, the state recovery limit is used in three cases - all related to expired vetKey epochs: -* Removal of expired caches. -* Acceptance/rejection of cache uploads. -* Acceptance/rejection of cache downloads (because removal may not happen instantly). - -Let's denote the state recovery limit as `limit`, current consensus time as `consensus_time` and next vetKey epoch creation time as `next_epoch_creation`, which equals `None` it current epoch is the latest and there is no next epoch. - -``` -fn has_expired(limit: u64, consensus_time: u64, next_epoch_creation: Option) : bool { - if next_epoch.is_none() { - return false; - } else { - next_epoch.map(|next_epoch| consensus_time.saturating_sub(limit) > next_epoch) - } -} -let expired = now - l -``` - -The canister backend API looks as follows: -```candid -set_state_recovery_limit : (ChatId, MessageExpiryMins) -> (variant { Ok; Err : text }); -``` -and must remove expired caches before setting a _higher_ limit. In general, it could make sense to remove expired caches on each update, since the latency of this operation is not critical. - -The canister backend API `get_state_recovery_limit` can be used by a frontend that has lost its state to recover the information about user's state recovery limit for a chat with ID `ChatId`. -```candid -get_state_recovery_limit : (ChatId) -> (variant { Ok : ?MessageExpiryMins; Err : text }) query; -``` - -Since cache update is usually a relatively rare operation compared to e.g. potential message updates, adding the cache metadata to the `get_my_chats_and_time` API of the backend canister seems like overkill. Instead, we could add a separate API that would return this metadata s.t. the frontend can find out which caches should be updated. -```candid -type StateRecoveryCacheMetadata = record { - vetkey_epoch_id: VetKeyEpochId; - symmetric_epoch_id : SymmetricEpochId; -}; - -get_metadata_for_my_state_recovery_caches : () -> (vec StateRecoveryCacheMetadata) query; -``` - -The frontend's role in enabling state recovery is twofold: -* Initialization of caches. -* Updates of caches. - -Independent of how the local ratchet state was initialized, once initialized the frontend checks if the ratchet state cache in the canister needs to be updated. If no cache state exists in the canister or it is outdated, the frontend encrypts and uploads the state via the `update_my_symmetric_key_cache` backend canister API. - -```candid -type GroupChatId = nat64; -type ChatId = variant { - Group : GroupChatId; - Direct : record { principal; principal }; -}; -type VetKeyEpochId = nat64; -type EncryptedSymmetricRatchetCache = blob; - -update_my_symmetric_key_cache : (ChatId, VetKeyEpochId, EncryptedSymmetricRatchetCache) -> (variant { Ok; Err : text }); -``` - -The `update_my_symmetric_key_cache` backend canister API works as follows: -1. If the user doesn't have access to the chat with ID `ChatId` or the chat does not exist, return an error. -2. If the user tries to upload cache for an invalid `VetKeyEpochId`, return an error. -3. Store the EncryptedSymmetricRatchetCache - -In addition to cache updates upon initialization, the frontend should periodically check that the cache is up-to-date. How often this is done depends on the symmetric ratchet duration window. -However, note that only the client that is online can update their cache, which normally doesn't work perfectly practice, e.g., if we rotate the symmetric ratchet state every hour, there will be very few clients that will be online for actually performing cache updates every hour. -Also, cache updates cost cycles in the ICP, and hence excessive use undesirably costs additional money. -Therefore, cache updates every (half a) day, and therefore the symmetric ratchet duration of a similar length, seem to be most practical in the general use case. -If another use case requires more strict parameters, they can be set appropriately to enable that use case. - ## User Frontend Component ### Frontend State @@ -840,7 +758,7 @@ The ratchet state is initialized from a vetKey as follows: More details about the retrieval and decryption of vetKeys can be found in the [developer docs](https://internetcomputer.org/docs/building-apps/network-features/vetkeys/api) of the ICP. -After initializing the ratchet state, the user uploads encrypted cache of the state to the backend canister which is further described in [Encrypted Ratchet State Cache](#encrypted-ratchet-state-cache). +After initializing the ratchet state, the user uploads encrypted cache of the state to the backend canister which is further described in [Encrypted Ratchet State Cache](#encrypted-symmetric-ratchet-state-cache). ##### Ratchet Evolution @@ -896,7 +814,7 @@ It would be quite natural and similar to [Signal's symmetric ratchet](https://si Therefore, neither encryption nor decryption directly evolve `SymmetricRatchetState` but instead, the state is evolved whenever the current consensus time obtained via `get_my_chats_and_time` minus the message expiry is larger than the timestamp of the current symmetric key epoch id + 1, i.e., whenever there can be no non-expired message that we would need the current symmetric ratchet epoch to decrypt. The state evolution can be triggered by a background job that periodically checks if state evolution should be performed. -After evolving the ratchet state, the user uploads encrypted updated state cache to the canister backend which is further described in [Encrypted Ratchet State Cache](#encrypted-ratchet-state-cache). +After evolving the ratchet state, the user uploads encrypted updated state cache to the canister backend which is further described in [Encrypted Ratchet State Cache](#encrypted-symmetric-ratchet-state-cache). ##### Ratchet Message Encryption ```ts @@ -996,6 +914,72 @@ In the chat UI, this functionality can be realized e.g. via a flag set in the ch The current spec does not allow to integrate this optimization in a completely non-invasive way, since the chat participants of a vetKey epoch cannot be changed. To facilitate this, the list of chat participants could have versioning, where for each version the list change would be stored in the canister and `get_my_chats_and_time` would return no only the last message in the chat but also all vetKey epoch IDs for each chat and their participant list versions, or only for those where actual changes happened. +### State Recovery + +The frontend's role in enabling state recovery (see also [Encrypted Ratchet State Cache](#encrypted-symmetric-ratchet-state-cache) in the backend) is twofold: +* Initialization of caches. +* Updates of caches. + +Independent of how the local ratchet state was initialized, once initialized the frontend checks if the ratchet state cache in the canister needs to be updated. If no cache state exists in the canister or it is outdated, the frontend encrypts and uploads the state via the `update_my_symmetric_key_cache` backend canister API. + +In addition to cache updates upon initialization, the frontend should periodically check that the cache is up-to-date. How often this is done depends on the symmetric ratchet duration window. +However, note that only the client that is online can update their cache, which normally doesn't work perfectly practice, e.g., if we rotate the symmetric ratchet state every hour, there will be very few clients that will be online for actually performing cache updates every hour. +Also, cache updates cost cycles in the ICP, and hence excessive use undesirably costs additional money. +Therefore, cache updates every (half a) day, and therefore the symmetric ratchet duration of a similar length, seem to be most practical in the general use case. +If another use case requires more strict parameters, they can be set appropriately to enable that use case. + +Encrypted maps handles the encryption of the map values transparently to the developer, i.e., the developer does not need to know how encryption exactly works in encrypted maps and can use encrypted maps as a black box. +For the purpose of using encrypted maps for the encryption of user's cached keys, the encrypted maps object is instantiated with the domain separator `"vetkeys-example-encrypted-chat-user-cache"` in the backend. +Then, to handle the cache, the frontend relies on the following: + +* Store to encrypted maps - when the frontend obtained a new ratchet state or wants to update a ratchet state cache, the frontend calls. + +* Load from encrypted maps - when the frontend requires to recover its previous ratchet states, it retrieves the encrypted cache and decrypts it locally. + +```ts + import { EncryptedMaps } from '@dfinity/vetkeys/encrypted_maps'; + + function store(encryptedMaps: EncryptedMaps, myPrincipal: Principal, chatId: ChatId, vetKeyEpochId: bigint, cache: Uint8Array) { + const epochBytes = uBigIntTo8ByteUint8ArrayBigEndian(vetKeyEpochId); + const chatIdBytes = chatIdToBytes(chatId); + const mapKey = new Uint8Array([...chatIdBytes, ...epochBytes]); + encryptedMaps.setValue(myPrincipal, mapName(), mapKey, cache); + } + + function load(encryptedMaps: EncryptedMaps, myPrincipal: Principal, chatId: ChatId, vetKeyEpochId: bigint) : Uint8Array { + const epochBytes = uBigIntTo8ByteUint8ArrayBigEndian(vetKeyEpochId); + const chatIdBytes = chatIdToBytes(chatId); + const mapKey = new Uint8Array([...chatIdBytes, ...epochBytes]); + return encryptedMaps.getValue(myPrincipal, mapName(), mapKey); + } + + function chatIdToBytes(chatId: ChatId) : Uint8Array { + if ('Direct' in chatId) { + return new Uint8Array([ + 0, + ...chatId.Direct[0].toUint8Array(), + ...chatId.Direct[1].toUint8Array() + ]); + } else { + return new Uint8Array([1, ...uBigIntTo8ByteUint8ArrayBigEndian(chatId.Group)]); + } + } + + function uBigIntTo8ByteUint8ArrayBigEndian(value: bigint): Uint8Array { + if (value < 0n) throw new RangeError('Accepts only bigint n >= 0'); + + const bytes = new Uint8Array(8); + for (let i = 0; i < 8; i++) { + bytes[i] = Number((value >> BigInt(i * 8)) & 0xffn); + } + return bytes; + } + + function mapName(): Uint8Array { + return new TextEncoder().encode('encrypted_chat_cache'); + } +``` + ## Appendix ### Constructing `ChatId` From 4478b4136bcb8baa930fb8e90cab6ce7f095fcdf Mon Sep 17 00:00:00 2001 From: Andrea Cerulli Date: Fri, 14 Nov 2025 16:57:22 +0100 Subject: [PATCH 7/9] Reviewed first section --- examples/encrypted_chat/SPEC.md | 83 ++++++++++++++------------------- 1 file changed, 34 insertions(+), 49 deletions(-) diff --git a/examples/encrypted_chat/SPEC.md b/examples/encrypted_chat/SPEC.md index 27170a01..291d52cf 100644 --- a/examples/encrypted_chat/SPEC.md +++ b/examples/encrypted_chat/SPEC.md @@ -2,73 +2,58 @@ vetKey Encrypted Chat has two main components: the canister backend and user frontend. It provides the following features: -* Messaging protected with end-to-end encryption. +* End-to-end encrypted messaging. * High security through symmetric ratchet and key rotation via [vetKeys](https://internetcomputer.org/docs/building-apps/network-features/vetkeys/introduction). -* Disappearing messages guaranteed by the canister (ICP smart contract) logic. +* Disappearing messages, enforced by the canister (ICP smart contract) logic. Messages are automatically removed from the frontend and encrypted messages are purged from the backend once they expire. -## Encryption Keys and State Recovery - -TODO: explain what keys we have. - -vetKey Rotation: - -* Independent key material - provides both secrecy and post-compromise security, i.e., an adversary obtaining the key material for a vetKey epoch gains no knowledge about the key material in other vetKey epochs. - -* More costly than symmetric key ratchet because requires interaction with the backend canister and a [vetKey derivation](https://internetcomputer.org/docs/building-apps/network-features/vetkeys/api). - -* Can be performed on group changes in a chat or periodically. +* Encrypted state recovery, enabling users to securely restore their message-decryption capability across different devices. -For each vetKey epoch, we instantiate a Symmetric Key Ratchet State that can be evolved. +## Encryption Keys and State Recovery -Symmetric Key Ratchet: +### Key Hierarchy -* Creates a chain of symmetric keys known by the chat participants from an initial key material derived from a vetKey. +vetKey encrypted chat uses three layers of cryptographic keys: +* **vetKeys**: shared keys established thorough the [vetKD protocol](https://internetcomputer.org/docs/references/vetkeys-overview). +They rotate periodically, e.g. upon group configuration changes, to ensure both forward security and post-compromise security. +This means that an adversary who obtains the key material for one vetKey epoch gains no information about past or future epochs. +Deriving new vetKeys incurs some cost, as it requires interaction with the backend canister, which triggers a vetKey-derivation protocol on the ICP. +* **Symmetric key ratchet**: a continuously evolving chain of symmetric keys derived from the current vetKey. +It provides forward security but not post-compromise security: an adversary who obtains the ratchet key for step i can derive all future ratchet states for that same vetKey epoch. +Ratchet advancement is efficient and can be performed locally by each participant without interacting with the backend canister. +* **Message encryption keys**: per-message keys derived from the current symmetric ratchet state. -* Very efficient local ratchet evolution based on consensus time frames. +### Key Rotation -* Provides forward secrecy but not post-compromise security, i.e., an adversary obtaining the key material for the symmetric ratchet ID `i` can derive any ratchet state for an epoch ID that is larger than `i`. +While vetKey rotations can occur at arbitrary times, the symmetric ratchet progresses strictly at fixed time-frame boundaries. +The length of this time frame determines how long a user must retain ratchet key material to decrypt messages. +This contrasts with protocols such as Signal, where old keys are usually discarded immediately after advancing the ratchet. -While vetKey rotation can be performed at arbitrary time points, the symmetric ratchet is performed at time frame boundaries. -The duration of the time frame defines the granularity of the time frame that the user must keep the key material around for in order to be able to decrypt messages. -This differs from e.g. the Signal protocol, where the old key is discarded as soon as it is evolved in most cases. -Keeping some keys around is important if we want to decrypt messages in arbitrary order. -Consider for example a scenario where we only want to decrypt the newest messages that we want to display to the user and not the whole history, which might be much longer and contain media material that requires a lot of potentially unnecessary communication. -This requires us to keep around the ratchet state, with which we are able to decrypt the oldest non-expired message. +Retaining some ratchet states is necessary to support out-of-order decryption. +For example, a client may only want to decrypt recent messages shown in the UI—not the entire history, which may include large media requiring unnecessary downloads. +To decrypt the oldest non-expired message, the client must keep the appropriate ratchet state. -The duration of the symmetric key ratchet time frame defines the duration of the time frame that we would be able to decrypt messages for that goes beyond what's necessary. -Consider e.g. a state recovery limit `l` ns and the symmetric ratchet duration of `r` ns as well. -In that case, if no vetKey rotations happen in between, at a time point `p` that is `l < p < 2r` the frontend would need to keep around both symmetric ratchet states and the ability to decrypt messages received by the canister from `0` ns and `p` ns instead of from `p - l` to `p`. -If a vetKey rotation happens in-between, this redundant encryption time frame can only become shorter. +The duration of the symmetric ratchet time frame directly affects how long clients must retain decryption capability beyond what is strictly necessary. +For example, if each symmetric ratchet epoch has length `r` and the encrypted chat maintains a message history of length `h ≤ k·r`, then the frontend may need to keep the ratchet state for up to `k + 1` epochs in order to decrypt any message within that history window. -If we set the state recovery limit to `l` ns and symmetric ratchet duration to `r * l`, then we can reduce the amount of time we unnecessarily keep the key material for by a factor of `r`. -For example, if we set the state recovery limit to one month and the symmetric ratchet duration to one hour, then we would need to keep the key material (and the ability to decrypt all message encrypted with it) for almost one additional hour in the worst case. -The downside of this approach is that we need to keep symmetric ratchet states for more ratchet epochs around or compute them on the fly s.t. the messages can be decrypted in any order. -However, with realistic time frames, keeping around hundreds of keys or just a single key that we can generate the other keys from is not a showstopper. +### Encrypted State Recovery -Realistic time frames for symmetric ratchet evolution: between a minute and a few hours/days. Being on a lower side (minutes) makes less sense, since updating the state's keys involves updating the encrypted cache, where every update costs cycles, which would make mass adoption of the chat costly. +Encrypted state recovery allows a user to restore their ability to decrypt messages on a new device by securely caching encrypted symmetric ratchet states in the backend. +Each ratchet state is encrypted client-side using an individual key, ensuring that the backend never learns any cryptographic material that would enable message decryption. Since the symmetric ratchet is instantiated from a vetKey, the ability to obtain the vetKey allows to decrypt all messages encrypted using any derived ratchet states. -Therefore, to allow for a limited state recovery it is important to disallow the frontend to obtain the initial vetKey again once an encrypted cache for that vetKey epoch was uploaded by the user. -Instead, the frontend encrypts its symmetric ratchet states that are required for state recovery and stores them in the canister. -Then, once a state recovery is required, the frontend fetches the required encrypted ratchet states and decrypts them. - -Note that the end of life of a symmetric ratchet state triggers ratchet evolution, which evolves the state and forgets about the previous one, meaning that also the frontend's encrypted cache must be updated. -This poses another tradeoff in terms of the symmetric ratchet duration, i.e., the shorter the time frame is, the more often the cached state must be updated. -Therefore, symmetric ratchet state evolution duration in the order of minutes might be too short to be practical in chats with many users, and duration in the order of hours or days could be better in that case. - -Note also that only active users can update their encrypted cache in the canister. -Therefore, forward secrecy can only be guaranteed to active users or users who disable the state recovery by uploading invalid encrypted cache. - -## Notation - -* Disappearing messages - automatically remove messages in the frontend and encrypted messages in the backend canister from the memory/storage once they expire. +To provide a _limited_ state recovery it is crucial to: +* Prevent the frontend to obtain the initial vetKey once the user has uploaded the encrypted cache for its epoch. +* Encrypt and upload the necessary symmetric ratchet states using user-specific encryption keys. +* Retrieve and decrypt the stored ratchet states during recovery, restoring the user's ability to decrypt messages without exposing the initial vetKey. -* State recovery limit - store an encrypted backup of the encryption keys in the canister that allows to decrypt messages and recover from a state loss in the frontend. +Users who do not wish to enable state recovery may still want to signal to the backend that they have retrieved the vetKey, thereby preventing any future retrievals. This can be achieved, for example, by uploading a deliberately invalid encrypted cache once, which marks the vetKey as unrecoverable for that epoch. -* Time - in this document we use [Unix time](https://en.wikipedia.org/wiki/Unix_time) and standard abbreviations for time units such as ns for nanoseconds. +Practical symmetric-ratchet epoch durations range from a few minutes to several hours or even days. +When state recovery is enabled, shorter epochs increase the frequency of encrypted-cache updates, which in turn raises cycle consumption on the backend. +As a result, very short time frames may be impractical in large or busy chats. ## Components From 3d49672484cc20f5fddcee19a354516a8c360207 Mon Sep 17 00:00:00 2001 From: Andrea Cerulli Date: Fri, 14 Nov 2025 17:03:06 +0100 Subject: [PATCH 8/9] Other editorial changes --- examples/encrypted_chat/SPEC.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/examples/encrypted_chat/SPEC.md b/examples/encrypted_chat/SPEC.md index 291d52cf..b8872689 100644 --- a/examples/encrypted_chat/SPEC.md +++ b/examples/encrypted_chat/SPEC.md @@ -87,6 +87,8 @@ The frontend's responsibilities are: * Fetching and decrypting incoming messages. +* Upload user's encrypted key cache in the backend. + ## Backend Canister Component ### Backend State @@ -386,7 +388,7 @@ The actual initialization of the symmetric ratchet state is performed in the fro The encrypted ratchet state cache is intended to allow the user to upload encrypted symmetric ratchet epoch keys to the canister and then [recover](#state-recovery) the local state in the frontend whenever needed, e.g., for disaster recovery or after a browser change. There is a relation between state recovery and the [disappearing messages](#disappearing-messages) duration. -The state recovery duration is always smaller or equal to the disappearing messages duration because because keys for messages that don't exist anymore are not useful. +The state recovery duration is always smaller or equal to the disappearing messages duration because keys for messages that don't exist anymore are not useful. While disappearing messages duration is a chat-level setting, the state recovery duration is a user-level setting in a chat. In general, the state recovery limit can be both chat-level and user-level setting, but it makes more sense to make state recovery a user setting because the user frontend is in full control of state recovery for a non-expired vetKey epoch. From c351473add0d685e7631e63646f3a30a96d344c0 Mon Sep 17 00:00:00 2001 From: Jack Lloyd Date: Wed, 7 Jan 2026 10:34:02 -0500 Subject: [PATCH 9/9] Delete incorrect comment, note unimplemented feature --- examples/encrypted_chat/SPEC.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/examples/encrypted_chat/SPEC.md b/examples/encrypted_chat/SPEC.md index b8872689..b8f356ce 100644 --- a/examples/encrypted_chat/SPEC.md +++ b/examples/encrypted_chat/SPEC.md @@ -28,7 +28,6 @@ Ratchet advancement is efficient and can be performed locally by each participan While vetKey rotations can occur at arbitrary times, the symmetric ratchet progresses strictly at fixed time-frame boundaries. The length of this time frame determines how long a user must retain ratchet key material to decrypt messages. -This contrasts with protocols such as Signal, where old keys are usually discarded immediately after advancing the ratchet. Retaining some ratchet states is necessary to support out-of-order decryption. For example, a client may only want to decrypt recent messages shown in the UI—not the entire history, which may include large media requiring unnecessary downloads. @@ -391,7 +390,7 @@ There is a relation between state recovery and the [disappearing messages](#disa The state recovery duration is always smaller or equal to the disappearing messages duration because keys for messages that don't exist anymore are not useful. While disappearing messages duration is a chat-level setting, the state recovery duration is a user-level setting in a chat. -In general, the state recovery limit can be both chat-level and user-level setting, but it makes more sense to make state recovery a user setting because the user frontend is in full control of state recovery for a non-expired vetKey epoch. +In general, the state recovery limit can be both chat-level and user-level setting, but it makes more sense to make state recovery a user setting because the user frontend is in full control of state recovery for a non-expired vetKey epoch. [Note that the current MVP of encrypted-chat does not support this setting.] In the canister backend, the state recovery limit is used in three cases - all related to expired vetKey epochs: * Removal of expired caches. @@ -1175,4 +1174,4 @@ service : (text) -> { get_state_recovery_limit : (ChatId) -> (variant { Ok : ?MessageExpiryMins; Err : text }) query; update_my_symmetric_key_cache : (ChatId, VetKeyEpochId, EncryptedSymmetricRatchetCache) -> (variant { Ok; Err : text }); } -``` \ No newline at end of file +```