- Remove rusqlite
- Add diesel with wasm backend
- Import tantivy wasm fork from summa
- Create some wasm-bindgen bindings and a test that can at least init tantivy and create some SQL tables (no sqlcipher yet)
…the db thread not running.
Is there any way to contribute to this effort? I'd really like to see support for search in the PWA.
Thanks for your interest @bennofs. This PR isn't my primary focus at the moment, so it will probably continue at the pace it has so far. I did make some progress on hurdle 4 yesterday, as updated in the description:
Contributions are welcome, thanks. The PR itself is a bit of a mess; I'll probably restart from scratch at some point via a feature branch (I think it will be pretty hard, or not possible, to have a series of small PRs to main that keep the existing seshat and this wasm seshat working concurrently). Until that is done it might be hard to contribute to the PR itself. There will, however, need to be at least 2 upstream PRs to enable this work for real, notably, from the description of hurdles above:
Contributions of either of those would be much appreciated.
Why?
The purpose of this PR is to explore the feasibility of seshat on the web, to help unlock encrypted message search for element-web. The PR is probably a bit incomprehensible in its current state, and obviously a lot more would have to be done to make it real, but it wasn't terribly hard work (the design of seshat didn't really change). It will hopefully at least start the conversation so that other engineers with more experience in this area than me can help plot a path forward.
How does it work roughly?
Currently seshat is used by Element Desktop to index and store events (live and historical) and to perform full-text search, recalling the event metadata associated with the message results.
Node bindings are provided so that Element Desktop can interface with seshat.
Incoming events to be saved are indexed with tantivy (a search engine written in Rust). This index is encrypted and persisted to the local file system as a collection of files.
Event metadata is saved to an encrypted SQL database (sqlcipher).
The 2 main flows are:
- Save messages
- Search messages
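To make the two flows concrete, here is a toy, in-memory model written for this description. It is not seshat's API: `ToySearchStore`, `EventMeta` and `tokenize` are hypothetical names, a `HashMap` inverted index stands in for tantivy, and a plain map stands in for the encrypted sqlcipher table. Saving tokenizes the message body into the index and stores the metadata; searching keeps the events that contain every query term and then recalls the metadata for each hit.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical metadata kept per event (stand-in for seshat's SQL rows).
#[derive(Clone, Debug, PartialEq)]
pub struct EventMeta {
    pub event_id: String,
    pub room_id: String,
    pub sender: String,
    pub body: String,
}

/// Toy model: an inverted index (stand-in for tantivy) plus a metadata
/// table keyed by event ID (stand-in for the encrypted sqlcipher database).
#[derive(Default)]
pub struct ToySearchStore {
    index: HashMap<String, HashSet<String>>, // term -> event IDs containing it
    metadata: HashMap<String, EventMeta>,    // event ID -> metadata
}

impl ToySearchStore {
    /// Save flow: index every term of the body, then store the metadata.
    pub fn add_event(&mut self, meta: EventMeta) {
        for term in tokenize(&meta.body) {
            self.index
                .entry(term)
                .or_default()
                .insert(meta.event_id.clone());
        }
        self.metadata.insert(meta.event_id.clone(), meta);
    }

    /// Search flow: keep events containing every query term, then recall
    /// their metadata. Results are sorted by event ID for a stable order.
    pub fn search(&self, query: &str) -> Vec<&EventMeta> {
        let terms = tokenize(query);
        let first = match terms.first() {
            Some(term) => term,
            None => return Vec::new(),
        };
        let mut hits: HashSet<String> = self.index.get(first).cloned().unwrap_or_default();
        for term in &terms[1..] {
            let ids = self.index.get(term);
            hits.retain(|id| ids.map_or(false, |set| set.contains(id)));
        }
        let mut results: Vec<&EventMeta> =
            hits.iter().filter_map(|id| self.metadata.get(id)).collect();
        results.sort_by(|a, b| a.event_id.cmp(&b.event_id));
        results
    }
}

/// Lowercase and split on anything that isn't alphanumeric.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}
```

Real tantivy adds ranking, stemming and on-disk segments, all of which this sketch deliberately leaves out.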
What are the hurdles to overcome?
🚧 🏃‍♂️ 1. Create wasm bindings to provide a portable component for use by web and desktop clients
This would replace the need for the node bindings. I didn't encounter any major problems here other than some bugs in the packaging libraries, which had workarounds and fixes on the way.
Feasible: ✅
🚧 🏃‍♂️ 2. Seshat and tantivy are multithreaded, and wasm doesn't natively support threads.
I sidestepped the threading code in seshat and made a small number of code changes (one or two hundred lines) in tantivy to allow it to work with web-worker-based threads (as outlined in wasm-bindgen, this has various caveats). To do this I replaced uses of std::thread with a rayon equivalent. I probably haven't done this in an entirely appropriate way, but it was enough to get things running (I'll push up the fork shortly).
Another option here would be something like https://docs.rs/wasm-bindgen-spawn/latest/wasm_bindgen_spawn/ which:
Feasible: 🟠 Possible, maybe even likely soon.
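One way to keep thread creation swappable between native and wasm builds is to route it through a single trait. The sketch below is an assumption about how that could look, not what the tantivy fork does: `Spawner`, `StdThreadSpawner`, `InlineSpawner` and `parallel_sum` are hypothetical. `std::thread::spawn` isn't usable on plain `wasm32-unknown-unknown`, so a wasm build would supply a different `Spawner` (for example one backed by a rayon pool on web workers, or by wasm-bindgen-spawn); that backend is not shown, and `InlineSpawner` simply runs each job on the calling thread.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical seam: everything that needs background work goes through this.
pub trait Spawner {
    fn spawn(&self, job: Box<dyn FnOnce() + Send + 'static>);
}

/// Native backend: one OS thread per job.
pub struct StdThreadSpawner;

impl Spawner for StdThreadSpawner {
    fn spawn(&self, job: Box<dyn FnOnce() + Send + 'static>) {
        // Detached; completion is reported by the job itself (here via a channel).
        let _ = thread::spawn(job);
    }
}

/// Thread-free fallback: run the job immediately on the calling thread.
/// A wasm build would instead plug in a web-worker-backed implementation.
pub struct InlineSpawner;

impl Spawner for InlineSpawner {
    fn spawn(&self, job: Box<dyn FnOnce() + Send + 'static>) {
        job();
    }
}

/// Example caller: sum `data` in chunks, one job per chunk, collecting
/// partial sums over a channel so it works with any `Spawner`.
pub fn parallel_sum(spawner: &dyn Spawner, data: &[u64], chunk_size: usize) -> u64 {
    let (tx, rx) = mpsc::channel();
    for chunk in data.chunks(chunk_size.max(1)) {
        let chunk = chunk.to_vec();
        let tx = tx.clone();
        spawner.spawn(Box::new(move || {
            // Ignore send errors: the receiver only disappears if the caller gave up.
            let _ = tx.send(chunk.iter().sum::<u64>());
        }));
    }
    // Drop the original sender so the receiver ends once every job has reported.
    drop(tx);
    rx.iter().sum()
}
```

Because results come back over a channel, the caller does not care whether a job ran on another thread or inline, which is the property a wasm backend would need to preserve.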
🚧 🏃‍♂️ 3. Seshat relies on rusqlite, which doesn't support wasm
To work around this I migrated the SQL library in this PR from rusqlite to diesel, so that wasm support could be added with sqlite-web-rs. This seems to work, except that it doesn't support connection pooling (I'm not sure how much of a problem this is in practice, as I've read that tantivy indexing is CPU-bound). sqlite-web-rs relies on OPFS. I've also just noticed that progress has been made on adding wasm support to rusqlite, which is both encouraging and unfortunate timing 😅, as most of the code in this PR is the switch of SQL library. Wasm support for rusqlite is looking promising; it's just single-threaded for the moment, so I've moved all the database activity to a dedicated thread.
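Moving all database activity to a dedicated thread is essentially the actor pattern: one thread owns the single-threaded connection and everything else sends it messages. Below is a minimal sketch using `std::thread` and `std::sync::mpsc`; `DbHandle`, `DbCommand` and the `HashMap` standing in for the SQLite connection are all hypothetical. In the browser the owning thread would have to be a worker-backed thread rather than `std::thread`, which this sketch does not cover.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical commands; a real version would carry seshat's event types.
enum DbCommand {
    Insert { event_id: String, json: String },
    Get { event_id: String, reply: Sender<Option<String>> },
    Shutdown,
}

/// Handle used by the rest of the app; all "SQL" work happens on one thread.
pub struct DbHandle {
    tx: Sender<DbCommand>,
    worker: Option<JoinHandle<()>>,
}

impl DbHandle {
    pub fn start() -> Self {
        let (tx, rx) = mpsc::channel::<DbCommand>();
        let worker = thread::spawn(move || {
            // Stand-in for a single-threaded SQLite connection: it lives only
            // on this thread and is never shared.
            let mut table: HashMap<String, String> = HashMap::new();
            for cmd in rx {
                match cmd {
                    DbCommand::Insert { event_id, json } => {
                        table.insert(event_id, json);
                    }
                    DbCommand::Get { event_id, reply } => {
                        let _ = reply.send(table.get(&event_id).cloned());
                    }
                    DbCommand::Shutdown => break,
                }
            }
        });
        DbHandle { tx, worker: Some(worker) }
    }

    /// Fire-and-forget write; commands are processed in the order sent.
    pub fn insert(&self, event_id: &str, json: &str) {
        let _ = self.tx.send(DbCommand::Insert {
            event_id: event_id.to_string(),
            json: json.to_string(),
        });
    }

    /// Request/response read via a one-shot reply channel.
    pub fn get(&self, event_id: &str) -> Option<String> {
        let (reply, rx) = mpsc::channel();
        self.tx
            .send(DbCommand::Get { event_id: event_id.to_string(), reply })
            .ok()?;
        rx.recv().ok().flatten()
    }
}

impl Drop for DbHandle {
    fn drop(&mut self) {
        // Ask the worker to stop, then wait for it so the connection closes cleanly.
        let _ = self.tx.send(DbCommand::Shutdown);
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}
```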
An alternative to using SQL is to abstract the persistence layer and use something like IndexedDB.
Feasible: ✅ With varying levels of investment/risk depending on the route taken.
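Abstracting the persistence could mean putting a trait between seshat's logic and its storage, so that SQL and IndexedDB become interchangeable backends. The sketch is hypothetical (`EventStore`, `MemoryStore` and `count_room_events` are not seshat types) and uses an in-memory `BTreeMap` so it runs with std only. One caveat: IndexedDB is asynchronous in the browser, so a real version of this trait would probably need async methods.

```rust
use std::collections::BTreeMap;

/// Hypothetical persistence seam. A SQL backend (rusqlite/diesel) and an
/// IndexedDB backend would each implement this.
pub trait EventStore {
    fn save_event(&mut self, event_id: &str, room_id: &str, json: &str);
    fn load_event(&self, event_id: &str) -> Option<String>;
    /// Event IDs stored for `room_id`, in ascending event-ID order.
    fn events_in_room(&self, room_id: &str) -> Vec<String>;
}

/// In-memory backend so the sketch runs with std only.
#[derive(Default)]
pub struct MemoryStore {
    events: BTreeMap<String, (String, String)>, // event ID -> (room ID, JSON)
}

impl EventStore for MemoryStore {
    fn save_event(&mut self, event_id: &str, room_id: &str, json: &str) {
        self.events
            .insert(event_id.to_string(), (room_id.to_string(), json.to_string()));
    }

    fn load_event(&self, event_id: &str) -> Option<String> {
        self.events.get(event_id).map(|(_, json)| json.clone())
    }

    fn events_in_room(&self, room_id: &str) -> Vec<String> {
        self.events
            .iter()
            .filter(|(_, (room, _))| room.as_str() == room_id)
            .map(|(id, _)| id.clone())
            .collect()
    }
}

/// Code above the storage layer depends only on the trait.
pub fn count_room_events(store: &dyn EventStore, room_id: &str) -> usize {
    store.events_in_room(room_id).len()
}
```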
🚧 🏃‍♂️ 4. We need the SQL database file to be encrypted
This is more a sub-item of the previous hurdle, in that it only applies if SQL is selected.
I've made some progress getting SQLite3MultipleCiphers to work with sqlite-wasm-rs (which, as mentioned above, brings wasm support to rusqlite) to provide db encryption.
As for the alternative, it's unclear to me at this point whether sqlcipher itself is wasm compatible.
There was at least some partial success by others in hacking a solution but I don't know if there has been progress since then.
Feasible: ✅ Likely
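For the SQLite3MultipleCiphers route, encryption is typically configured with pragmas issued right after the connection is opened, before any other statement touches the database. The helper below only builds those statements and does not talk to SQLite; `CipherScheme`, `sql_quote` and `encryption_pragmas` are hypothetical, and the `cipher`/`key` pragma names and scheme values are assumptions to check against the SQLite3MultipleCiphers documentation (sqlcipher-compatible files may need additional settings). Quoting doubles any embedded single quote so a passphrase cannot break out of the SQL string literal.

```rust
/// Hypothetical subset of cipher schemes; SQLite3MultipleCiphers supports more.
#[derive(Clone, Copy, Debug)]
pub enum CipherScheme {
    ChaCha20,
    SqlCipher,
}

impl CipherScheme {
    /// Value passed to `PRAGMA cipher` (assumed names; check the docs).
    fn pragma_value(self) -> &'static str {
        match self {
            CipherScheme::ChaCha20 => "chacha20",
            CipherScheme::SqlCipher => "sqlcipher",
        }
    }
}

/// Render `value` as an SQL string literal, doubling embedded single quotes.
fn sql_quote(value: &str) -> String {
    format!("'{}'", value.replace('\'', "''"))
}

/// Statements to run, in order, right after opening the connection.
pub fn encryption_pragmas(scheme: CipherScheme, passphrase: &str) -> Vec<String> {
    vec![
        format!("PRAGMA cipher = {};", sql_quote(scheme.pragma_value())),
        format!("PRAGMA key = {};", sql_quote(passphrase)),
    ]
}
```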
🚧 🏃‍♂️ 5. We need the tantivy index to be encrypted.
We already provide a custom implementation of the index persistence (to add encryption), so I don't see why we couldn't change it to use OPFS.
Feasible: ✅ Likely
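Since the index persistence is already a custom layer, one way to picture an OPFS port is a storage trait plus an encrypting decorator, so that swapping the file system for OPFS only replaces the innermost backend. Everything here is hypothetical (`BlobStore`, `Cipher`, `EncryptedStore`, `MemoryBlobStore`); it does not use tantivy's `Directory` trait, and `XorPlaceholder` is an insecure stand-in so the sketch runs with std only. A real implementation would use an authenticated cipher.

```rust
use std::collections::HashMap;

/// Hypothetical byte store for index files; a filesystem or OPFS backend would implement it.
pub trait BlobStore {
    fn write(&mut self, path: &str, bytes: Vec<u8>);
    fn read(&self, path: &str) -> Option<Vec<u8>>;
}

/// Hypothetical cipher interface; decryption can fail (e.g. wrong key or tampering).
pub trait Cipher {
    fn encrypt(&self, plaintext: &[u8]) -> Vec<u8>;
    fn decrypt(&self, ciphertext: &[u8]) -> Option<Vec<u8>>;
}

/// Decorator: encrypts on write and decrypts on read, whatever the backend.
pub struct EncryptedStore<S, C> {
    inner: S,
    cipher: C,
}

impl<S: BlobStore, C: Cipher> EncryptedStore<S, C> {
    pub fn new(inner: S, cipher: C) -> Self {
        EncryptedStore { inner, cipher }
    }

    /// Raw access to the backend, useful to check nothing is stored in plaintext.
    pub fn inner(&self) -> &S {
        &self.inner
    }
}

impl<S: BlobStore, C: Cipher> BlobStore for EncryptedStore<S, C> {
    fn write(&mut self, path: &str, bytes: Vec<u8>) {
        let ciphertext = self.cipher.encrypt(&bytes);
        self.inner.write(path, ciphertext);
    }

    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.inner.read(path).and_then(|ct| self.cipher.decrypt(&ct))
    }
}

/// In-memory backend standing in for the local file system or OPFS.
#[derive(Default)]
pub struct MemoryBlobStore {
    files: HashMap<String, Vec<u8>>,
}

impl BlobStore for MemoryBlobStore {
    fn write(&mut self, path: &str, bytes: Vec<u8>) {
        self.files.insert(path.to_string(), bytes);
    }

    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.files.get(path).cloned()
    }
}

/// INSECURE placeholder so the sketch runs with std only. Never use it for real data.
pub struct XorPlaceholder(pub u8);

impl Cipher for XorPlaceholder {
    fn encrypt(&self, plaintext: &[u8]) -> Vec<u8> {
        plaintext.iter().map(|b| b ^ self.0).collect()
    }

    fn decrypt(&self, ciphertext: &[u8]) -> Option<Vec<u8>> {
        Some(self.encrypt(ciphertext)) // XOR is its own inverse
    }
}
```

The decorator never sees where the bytes end up, which is the property that should let the encryption layer survive a move from the local file system to OPFS.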
What does it look like?
I haven't integrated with element-web for an interactive demo just yet, but the worker in the example project shows how to add events and perform a search.
Here's a quick video of a result being returned from the search
seshat_wasm.mov
Other thoughts/considerations