Browser Support POC #145

Draft: langleyd wants to merge 6 commits into main from langleyd/wasm

@langleyd langleyd commented Feb 7, 2025

Why?

The purpose of this PR is to explore the feasibility of running seshat on the web, to help unlock encrypted message search for element-web. The PR is probably a bit incomprehensible in its current state, and obviously a lot more would have to be done to make it real, but it wasn't terribly hard work (the design of seshat didn't really change). It will hopefully at least start the conversation, so that other engineers with more experience in this area than I have can help plot a path forward.

How does it work roughly?

Currently seshat is used by element desktop to index and store events (live and historical) and to perform full-text search, returning the event metadata associated with the matching messages.

For desktop to interface with seshat, node bindings are provided.

Incoming events to be saved are indexed with tantivy (a search engine written in Rust). This index is encrypted and persisted to the local file system as a collection of files.

Event metadata is saved to an encrypted SQL database (sqlcipher).

The 2 main flows are:

Save messages

  1. Historical or live message events are passed to seshat
  2. They are indexed by tantivy and persisted to the file system (encrypted by our custom adapter)
  3. Event metadata is saved to an SQL database

Search messages

  1. The user types in a search query which is passed to seshat
  2. Seshat looks up the event_id in the index using tantivy
  3. The message is fetched from the SQL database (along with some search metadata)
  4. The search results are returned and displayed to the user
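
The two flows above can be sketched roughly as follows. This is a toy model only, not seshat's actual code: an in-memory inverted index stands in for tantivy, a HashMap stands in for the SQL database, and `ToyStore`, `Event`, `add_event` and `search` are hypothetical names made up for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the event metadata seshat stores in SQL.
#[derive(Clone, Debug, PartialEq)]
pub struct Event {
    pub event_id: String,
    pub room_id: String,
    pub body: String,
}

/// Toy model: the inverted index plays the role of tantivy and the
/// `events` map plays the role of the SQL database.
#[derive(Default)]
pub struct ToyStore {
    index: HashMap<String, HashSet<String>>, // term -> event ids
    events: HashMap<String, Event>,          // event id -> event
}

impl ToyStore {
    /// Save flow: index the message body, then persist the event metadata.
    pub fn add_event(&mut self, event: Event) {
        for term in tokenize(&event.body) {
            self.index
                .entry(term)
                .or_default()
                .insert(event.event_id.clone());
        }
        self.events.insert(event.event_id.clone(), event);
    }

    /// Search flow: resolve event ids through the index, then fetch the
    /// events. Only events containing every query term are returned.
    pub fn search(&self, query: &str) -> Vec<Event> {
        let mut terms = tokenize(query).into_iter();
        let Some(first) = terms.next() else {
            return Vec::new();
        };
        let mut ids: HashSet<String> = self.index.get(&first).cloned().unwrap_or_default();
        for term in terms {
            match self.index.get(&term) {
                Some(matching) => ids.retain(|id| matching.contains(id)),
                None => return Vec::new(),
            }
        }
        let mut results: Vec<Event> = ids
            .iter()
            .filter_map(|id| self.events.get(id).cloned())
            .collect();
        // Sort by event id so the order of results is deterministic.
        results.sort_by(|a, b| a.event_id.cmp(&b.event_id));
        results
    }
}

/// Lowercases the text and splits it on non-alphanumeric characters.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}
```

The point of the split is that the index only needs to hand back event ids; everything shown to the user comes from the event store.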

What are the hurdles to overcome?

🚧 🏃‍♂️1. Create wasm bindings to provide a portable component to be used by web and desktop clients

This would replace the need for the node bindings. I didn't encounter any major problems here other than some bugs in the packaging libraries that had workarounds and fixes coming.

Feasible: ✅

🚧 🏃‍♂️2. Seshat and tantivy are multithreaded and wasm doesn't natively support threads.

I sidestepped the threading code in seshat and made a small number of code changes in tantivy (one or two hundred lines) to allow it to work with web worker based threads (as outlined in wasm-bindgen, this has various caveats). To do this I replaced uses of std::thread with a rayon equivalent. I probably haven't done this in an entirely appropriate way but it was enough to get things running (I'll push up the fork shortly).
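
To illustrate the kind of seam this implies (this is not the actual change made in the tantivy fork), here is a minimal std-only sketch in which code asks a spawner for a task instead of calling std::thread::spawn directly. `TaskSpawner`, `StdThreadSpawner`, `InlineSpawner` and `parallel_sum` are hypothetical names; a web build would presumably plug in a backend that runs tasks on a worker-backed pool instead.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical seam: callers ask a spawner to run a task instead of
/// calling std::thread::spawn themselves, so the backend can vary by target.
pub trait TaskSpawner {
    fn spawn(&self, task: Box<dyn FnOnce() + Send + 'static>);
}

/// Native backend: one OS thread per task.
pub struct StdThreadSpawner;

impl TaskSpawner for StdThreadSpawner {
    fn spawn(&self, task: Box<dyn FnOnce() + Send + 'static>) {
        thread::spawn(task);
    }
}

/// Fallback backend: runs the task immediately on the caller's thread.
pub struct InlineSpawner;

impl TaskSpawner for InlineSpawner {
    fn spawn(&self, task: Box<dyn FnOnce() + Send + 'static>) {
        task();
    }
}

/// Example caller: sums each chunk as a separate task and collects the
/// partial sums over a channel.
pub fn parallel_sum(spawner: &dyn TaskSpawner, chunks: Vec<Vec<u64>>) -> u64 {
    let (tx, rx) = mpsc::channel();
    for chunk in chunks {
        let tx = tx.clone();
        spawner.spawn(Box::new(move || {
            let _ = tx.send(chunk.iter().sum::<u64>());
        }));
    }
    // Drop the original sender so the receiver stops once all tasks finish.
    drop(tx);
    rx.iter().sum()
}
```

The caller behaves the same with either backend, which is the property that matters when one target cannot create OS threads.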

Another option here would be something like https://docs.rs/wasm-bindgen-spawn/latest/wasm_bindgen_spawn/ which:

uses the WebAssembly threads proposal and shared memory to communicate between workers (once they are started), instead of postMessage. The threads proposal is currently in phase 4 and available in Chrome, Firefox, Safari and Node.js

Feasible: 🟠 Possible, maybe even likely soon.

🚧 🏃‍♂️3. Seshat relies on rusqlite, which doesn't support wasm

To work around this I migrated the SQL library in this PR from rusqlite to diesel so that wasm support could be added with sqlite-web-rs. This seems to work, except that it doesn't support connection pooling (I'm not sure how much of a problem this is in practice, as I've read that tantivy indexing is CPU-bound). sqlite-web-rs relies on OPFS.

I've also just noticed that progress has been made on adding wasm support to rusqlite, which is both encouraging and unfortunate timing 😅 as most of the code in this PR is switching SQL library.

wasm support for rusqlite is looking promising. Just single threaded for the moment so I've moved all the database activity to a dedicated thread.
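
As a rough illustration of the "dedicated thread" pattern (not the code in this PR), the sketch below has one thread own the connection while everything else talks to it over a channel. A HashMap stands in for the real single-threaded connection, and `DbRequest`, `DbHandle` and `spawn_db_thread` are hypothetical names. It uses std::thread, so it shows the native shape of the idea; on the web the owning thread would have to be a worker.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Requests understood by the database thread (hypothetical names).
enum DbRequest {
    Save { event_id: String, body: String },
    Load { event_id: String, reply: Sender<Option<String>> },
    Shutdown,
}

/// Cloneable handle; every call is forwarded to the single owning thread.
#[derive(Clone)]
pub struct DbHandle {
    tx: Sender<DbRequest>,
}

impl DbHandle {
    /// Fire-and-forget write.
    pub fn save(&self, event_id: &str, body: &str) {
        let _ = self.tx.send(DbRequest::Save {
            event_id: event_id.to_owned(),
            body: body.to_owned(),
        });
    }

    /// Blocking read: sends a one-off reply channel along with the request.
    pub fn load(&self, event_id: &str) -> Option<String> {
        let (reply, rx) = mpsc::channel();
        self.tx
            .send(DbRequest::Load { event_id: event_id.to_owned(), reply })
            .ok()?;
        rx.recv().ok().flatten()
    }

    pub fn shutdown(&self) {
        let _ = self.tx.send(DbRequest::Shutdown);
    }
}

/// Spawns the thread that owns the "connection" (a HashMap stand-in here).
pub fn spawn_db_thread() -> (DbHandle, JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<DbRequest>();
    let worker = thread::spawn(move || {
        let mut connection: HashMap<String, String> = HashMap::new();
        for request in rx {
            match request {
                DbRequest::Save { event_id, body } => {
                    connection.insert(event_id, body);
                }
                DbRequest::Load { event_id, reply } => {
                    let _ = reply.send(connection.get(&event_id).cloned());
                }
                DbRequest::Shutdown => break,
            }
        }
    });
    (DbHandle { tx }, worker)
}
```

Because requests from one handle are processed in the order they were sent, a `load` issued after a `save` on the same handle sees the saved value.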

An alternative to using sql is to abstract the persistence and use something like IndexedDB.

Feasible: ✅ With varying levels of investment/risk depending on the route taken.

🚧 🏃‍♂️4. We need the SQL database file to be encrypted

This is more a sub-item of the previous hurdle, in that it only applies if SQL is selected.
I've made some progress getting SQLite3MultipleCiphers to work with sqlite-wasm-rs (which, as mentioned above, brings wasm support to rusqlite) to have db encryption.

Alternatively there is sqlcipher, but it's unclear to me at this point whether it is wasm compatible.

There was at least some partial success by others in hacking a solution but I don't know if there has been progress since then.

Feasible: ✅ Likely

🚧 🏃‍♂️5. We need the tantivy index to be encrypted

We already provide a custom implementation of the index persistence (to add encryption), and I don't see why we couldn't change this to use OPFS.
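
A minimal sketch of why swapping the backend should be feasible, assuming the encryption layer only needs byte-level reads and writes from whatever sits underneath it. None of this is seshat's or tantivy's real API: `Storage`, `Cipher`, `MemoryStorage`, `EncryptedStorage` and `XorPlaceholder` are hypothetical, and the XOR "cipher" is a placeholder for testing the plumbing, not real encryption.

```rust
use std::collections::HashMap;

/// Hypothetical byte-level backend (the file system today, OPFS on web).
pub trait Storage {
    fn write(&mut self, path: &str, data: Vec<u8>);
    fn read(&self, path: &str) -> Option<Vec<u8>>;
}

/// Hypothetical cipher interface; a real one would be an authenticated cipher.
pub trait Cipher {
    fn encrypt(&self, plain: &[u8]) -> Vec<u8>;
    fn decrypt(&self, encrypted: &[u8]) -> Vec<u8>;
}

/// In-memory backend used only for illustration.
#[derive(Default)]
pub struct MemoryStorage {
    files: HashMap<String, Vec<u8>>,
}

impl Storage for MemoryStorage {
    fn write(&mut self, path: &str, data: Vec<u8>) {
        self.files.insert(path.to_owned(), data);
    }

    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.files.get(path).cloned()
    }
}

/// PLACEHOLDER ONLY: XOR with a fixed byte is not encryption.
pub struct XorPlaceholder(pub u8);

impl Cipher for XorPlaceholder {
    fn encrypt(&self, plain: &[u8]) -> Vec<u8> {
        plain.iter().map(|b| b ^ self.0).collect()
    }

    fn decrypt(&self, encrypted: &[u8]) -> Vec<u8> {
        encrypted.iter().map(|b| b ^ self.0).collect()
    }
}

/// Wraps any backend so that bytes are encrypted before they reach it.
pub struct EncryptedStorage<S: Storage, C: Cipher> {
    inner: S,
    cipher: C,
}

impl<S: Storage, C: Cipher> EncryptedStorage<S, C> {
    pub fn new(inner: S, cipher: C) -> Self {
        Self { inner, cipher }
    }

    /// Exposes the wrapped backend, e.g. to inspect what was actually stored.
    pub fn inner(&self) -> &S {
        &self.inner
    }
}

impl<S: Storage, C: Cipher> Storage for EncryptedStorage<S, C> {
    fn write(&mut self, path: &str, data: Vec<u8>) {
        let encrypted = self.cipher.encrypt(&data);
        self.inner.write(path, encrypted);
    }

    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.inner.read(path).map(|bytes| self.cipher.decrypt(&bytes))
    }
}
```

With this shape, moving from the file system to OPFS would mean adding another `Storage` implementation while the encrypting wrapper stays untouched.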

Feasible: ✅ Likely

What does it look like?

I haven't yet integrated this with element-web for an interactive demo, but the worker in the example project shows how to add events and perform a search.

Here's a quick video of a result being returned from the search

seshat_wasm.mov

Other thoughts/considerations

  • Depending on how much investment it takes to get this working for real, might it make more sense to invest the time in adding full text search to matrix-rust-sdk? E.g. it now has an event cache; can that play the role of the event db here?

- Remove rusqlite
- Add diesel with wasm backend
- import tantivy wasm fork from summa
- Create some wasm-bindgen bindings and test that can at least init tantivy and create some sql tables(no sqlcipher yet)

bennofs commented Apr 10, 2025

Is there any way to contribute to this effort? I'd really like to see support for search in the PWA

@langleyd
Author

Thanks for your interest @bennofs. This PR isn't my primary focus at the moment, so it will probably continue at the pace it has so far. I did make some progress on hurdle 4 yesterday, as updated in the description:

I've made some progress getting SQLite3MultipleCiphers to work with sqlite-wasm-rs (which as mentioned above brings wasm support to rusqlite) to have db encryption.

Contributions are welcome, thanks.

The PR itself is a bit of a mess. I'll probably restart from scratch at some point via a feature branch (I think it will be pretty hard, if not impossible, to keep the existing seshat and this wasm seshat working concurrently through a series of small PRs to main). Until that is done it might be hard to contribute to the PR itself.

There will however need to be at least 2 upstream PRs to enable this work for real, notably from the description of hurdles above:

  • Hurdle 2: A PR to tantivy to add multithreaded support for wasm by making my hacks into a real solution:

made a small number of code changes in tantivy (one or two hundred lines) to allow it to work with web worker based threads (as outlined in wasm-bindgen, this has various caveats). To do this I replaced uses of std::thread with a rayon equivalent. I probably haven't done this in an entirely appropriate way but it was enough to get things running (I'll push up the fork shortly).

  • Hurdle 4: A PR to sqlite-wasm-rs to add cipher support using SQLite3MultipleCiphers.

Contributions of either of those would be much appreciated.
