Refactor for UX #3

@vyeevani

The biggest usability problem right now is latency. On an M1 Pro MacBook Pro, a single inference can take as long as a second, which makes for a very poor user experience.

It appears that the vast majority of the latency comes from the transformer's token-by-token generation stage rather than from a single forward pass, as I previously thought.

This leads me to a potential upgrade to the user experience. If I can see that the model's suggestions are really bad, I should be able to start typing immediately and prevent the model from wasting time rolling out bad suggestions. If I think it's doing fairly well, I should be able to let it run until I no longer like the output, then press Tab, which would stop it from generating more. By giving the user visibility into what the model is thinking and doing, we hide the latency in the time the user spends processing the model's output, i.e. we reduce the time the user sits around getting angry at the extension.
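To make the idea concrete, here is a minimal sketch of interruptible, token-by-token generation. It is not the extension's actual code: `stream_suggestion`, `fake_next_token`, and `demo` are hypothetical names, and `fake_next_token` only stands in for one model forward pass. The point is that the cancel flag is checked between tokens, so a keypress wastes at most one more step.

```python
import threading
from typing import Callable, Iterator, List


def stream_suggestion(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    cancel: threading.Event,
    max_tokens: int = 32,
) -> Iterator[str]:
    """Yield tokens one at a time, checking for cancellation before each step."""
    context = list(prompt)
    for _ in range(max_tokens):
        if cancel.is_set():
            # The user typed or pressed Tab: stop rolling out the suggestion.
            return
        token = next_token(context)  # one forward pass (assumed interface)
        context.append(token)
        yield token  # show the token to the user right away


def fake_next_token(context: List[str]) -> str:
    # Hypothetical stand-in for the model: names the token after the context length.
    return f"tok{len(context)}"


def demo() -> List[str]:
    cancel = threading.Event()
    shown: List[str] = []
    for token in stream_suggestion(fake_next_token, ["def"], cancel):
        shown.append(token)
        if len(shown) == 3:
            cancel.set()  # simulate the user interrupting after three tokens
    return shown
```

In a real editor integration the flag would presumably be set from the keypress handler rather than from inside the consuming loop; the sketch only shows where the check sits relative to each generation step.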

Architecture pending
