
Conversation


@Zetaphor Zetaphor commented Apr 6, 2023

This PR adds the llama-cpp-python package, which adds support for locally hosted language models.

This includes LLaMA and GPT4All.
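
For reference, a minimal usage sketch of the binding (model path, prompt, and stop strings are placeholders, not values from this PR):

```python
# Minimal sketch of local inference with llama-cpp-python (placeholder paths/prompt).
from llama_cpp import Llama

# Load a local GGML model file from disk.
llm = Llama(model_path="./models/ggml-model-q4_0.bin")

# Run a completion; generation stops at max_tokens or at any of the stop strings.
output = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=64,
    stop=["Q:", "\n"],
)
print(output["choices"][0]["text"])
```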


mpaepper commented Apr 9, 2023

Awesome, thanks for adding this, @Zetaphor!

I tested out the llama agent, but noticed that it was much slower with the Python binding than when I ran llama.cpp directly.

There is also an issue about this here: abetlen/llama-cpp-python#49

Maybe there is a faster binding available?

Also, I think we should have the same interface for all the LLMs; for example, the Llama one currently doesn't support the stop token - could you add that in?

The parameters which are defined in the model are not passed on to the Llama class, so they have no effect.
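
Something along these lines could work as a sketch (the class and method names here are hypothetical, not this repo's actual interface):

```python
# Hypothetical sketch of a wrapper that forwards model parameters and the stop
# token to llama-cpp-python; names are illustrative, not this repo's actual API.
from llama_cpp import Llama

class LlamaCppLLM:
    def __init__(self, model_path: str, temperature: float = 0.7, max_tokens: int = 256):
        self.llm = Llama(model_path=model_path)
        self.temperature = temperature
        self.max_tokens = max_tokens

    def generate(self, prompt: str, stop=None) -> str:
        # Forward the configured parameters and the caller's stop tokens to the
        # binding, so they actually take effect instead of being ignored.
        result = self.llm(
            prompt,
            temperature=self.temperature,
            max_tokens=self.max_tokens,
            stop=stop or [],
        )
        return result["choices"][0]["text"]
```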

@chteau

chteau commented Apr 9, 2023

> Awesome, thanks for adding this, @Zetaphor!
>
> I tested out the llama agent, but noticed that it was much slower with the Python binding than when I ran llama.cpp directly.
>
> There is also an issue about this here: abetlen/llama-cpp-python#49
>
> Maybe there is a faster binding available?
>
> Also, I think we should have the same interface for all the LLMs; for example, the Llama one currently doesn't support the stop token - could you add that in?
>
> The parameters which are defined in the model are not passed on to the Llama class, so they have no effect.

@mpaepper I am not sure if the Python binding is entirely responsible for the high response time.
I tried using llama.cpp directly on my computer, and after two hours it still hadn't finished replying to my prompt (screenshot attached).

The llama-cpp-python binding, on the other hand, completed my prompt in only 18 minutes (which is still a long time). Maybe the binding is a bit slow, but I am sure my computer has something to do with it as well.


mpaepper commented Apr 9, 2023

For me, llama.cpp runs in a few seconds, while with the binding it's more like 30 seconds.

@chteau

chteau commented Apr 9, 2023

I see, it's a lot faster for you; I'm not sure why it's taking so much longer on my side. The owner of the llama-cpp-python repository told me it was built with the correct optimisations.

For you, it might take longer because Python and C++ differ in execution speed: Python is an interpreted language, unlike C++, which may be why it takes around 30 seconds. However, I am not sure that's the reason, and you know this field better than I do.
