
Conversation


@Zetaphor Zetaphor commented Apr 6, 2023

This PR adds the llama-cpp-python package, which adds support for locally hosted language models.

This includes LLaMA and GPT4All.
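
For reference, a minimal usage sketch of the binding (model path, prompt, and stop strings are placeholders, not values from this PR):

```python
# Minimal sketch of local inference with llama-cpp-python (placeholder paths/prompt).
from llama_cpp import Llama

# Load a local GGML model file from disk.
llm = Llama(model_path="./models/ggml-model-q4_0.bin")

# Run a completion; generation stops at max_tokens or at any of the stop strings.
output = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=64,
    stop=["Q:", "\n"],
)
print(output["choices"][0]["text"])
```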


mpaepper commented Apr 9, 2023

Awesome, thanks for adding this, @Zetaphor!

I tested out the llama agent, but noticed that it was much slower with the Python binding than when I ran llama.cpp directly.

There is also an issue about this here: abetlen/llama-cpp-python#49

Maybe there is a faster binding available?

Also, I think we should have the same interface for all the LLMs; for example, the Llama one currently doesn't support the stop token - could you add that in?

The parameters which are defined in the model are not passed on to the Llama class, so they have no effect.
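
Something along these lines could work as a sketch (the class and method names here are hypothetical, not this repo's actual interface):

```python
# Hypothetical sketch of a wrapper that forwards model parameters and the stop
# token to llama-cpp-python; names are illustrative, not this repo's actual API.
from llama_cpp import Llama

class LlamaCppLLM:
    def __init__(self, model_path: str, temperature: float = 0.7, max_tokens: int = 256):
        self.llm = Llama(model_path=model_path)
        self.temperature = temperature
        self.max_tokens = max_tokens

    def generate(self, prompt: str, stop=None) -> str:
        # Forward the configured parameters and the caller's stop tokens to the
        # binding, so they actually take effect instead of being ignored.
        result = self.llm(
            prompt,
            temperature=self.temperature,
            max_tokens=self.max_tokens,
            stop=stop or [],
        )
        return result["choices"][0]["text"]
```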

@chteau

chteau commented Apr 9, 2023

> Awesome, thanks for adding this, @Zetaphor!
>
> I tested out the llama agent, but noticed that it was much slower with the Python binding than when I ran llama.cpp directly.
>
> There is also an issue about this here: abetlen/llama-cpp-python#49
>
> Maybe there is a faster binding available?
>
> Also, I think we should have the same interface for all the LLMs; for example, the Llama one currently doesn't support the stop token - could you add that in?
>
> The parameters which are defined in the model are not passed on to the Llama class, so they have no effect.

@mpaepper I am not sure if the Python binding is entirely responsible for the high response time.
I tried using llama.cpp directly on my computer, and after two hours it still hadn't finished replying to my prompt (screenshot attached).

The llama-cpp-python binding, on the other hand, completed my prompt in only 18 minutes (which is still a long time). Maybe the binding is a bit slow, but I am sure my computer has something to do with it as well.


mpaepper commented Apr 9, 2023

For me, llama.cpp runs in a few seconds, while with the binding it's more like 30 seconds.

@chteau

chteau commented Apr 9, 2023

I see, it's a lot faster for you; I'm not sure why it's taking so much longer on my side. The owner of the llama-cpp-python repository told me it was built with the correct optimisations.

For you, it might take longer because Python and C++ differ in execution speed: Python is an interpreted language, unlike C++, which may be why it takes around 30 seconds. However, I am not sure that's the reason, and you know this field better than I do.
