ML/LlamaIndex: Add software tests and CI configuration#707
ML/LlamaIndex: Add software tests and CI configuration#707simonprickett merged 2 commits intomainfrom
Conversation
| CRATEDB_URL="crate://<Database user name>:<Database password>@<Database host>:4200/?ssl=true" | ||
| CRATEDB_SQLALCHEMY_URL="crate://<Database user name>:<Database password>@<Database host>:4200/?ssl=true" |
There was a problem hiding this comment.
CRATEDB_SQLALCHEMY_URL is the designated environment variable name for the SQLAlchemy connection string, contrary to CRATEDB_HTTP_URL, which is suitable for the Python DB API or crash, for example.
In order to not use different environment variables on our tutorials and educational material, let's adjust this to adhere to that convention.
| CREATE TABLE IF NOT EXISTS time_series_data ( | ||
| timestamp TIMESTAMP, | ||
| value DOUBLE, | ||
| location STRING, | ||
| sensor_id INT | ||
| ); | ||
|
|
||
| INSERT INTO time_series_data (timestamp, value, location, sensor_id) | ||
| VALUES | ||
| ('2023-09-14T00:00:00', 10.5, 'Sensor A', 1), | ||
| ('2023-09-14T01:00:00', 15.2, 'Sensor A', 1), | ||
| ('2023-09-14T02:00:00', 18.9, 'Sensor A', 1), | ||
| ('2023-09-14T03:00:00', 12.7, 'Sensor B', 2), | ||
| ('2023-09-14T04:00:00', 17.3, 'Sensor B', 2), | ||
| ('2023-09-14T05:00:00', 20.1, 'Sensor B', 2), | ||
| ('2023-09-14T06:00:00', 22.5, 'Sensor A', 1), | ||
| ('2023-09-14T07:00:00', 18.3, 'Sensor A', 1), | ||
| ('2023-09-14T08:00:00', 16.8, 'Sensor A', 1), | ||
| ('2023-09-14T09:00:00', 14.6, 'Sensor B', 2), | ||
| ('2023-09-14T10:00:00', 13.2, 'Sensor B', 2), | ||
| ('2023-09-14T11:00:00', 11.7, 'Sensor B', 2); | ||
|
|
||
| REFRESH TABLE time_series_data; |
There was a problem hiding this comment.
The new init.sql file could also be used within the README to be provisioned by using an incantation to crash.
There was a problem hiding this comment.
Let's keep this here as it lines up better with the format of the tutorial on Discourse.
| langchain-openai<0.3 | ||
| llama-index<0.12 | ||
| llama-index-embeddings-langchain<0.3 | ||
| llama-index-embeddings-openai<0.3 | ||
| llama-index-llms-azure-openai<0.3 | ||
| llama-index-llms-openai<0.3 | ||
| sqlalchemy-cratedb |
There was a problem hiding this comment.
After cleaning up dependencies, those are apparently all we need to define.
| from cratedb_toolkit.io.sql import DatabaseAdapter | ||
|
|
||
| HERE = Path(__file__).parent | ||
|
|
||
|
|
||
| @pytest.fixture() | ||
| def cratedb() -> DatabaseAdapter: | ||
| return DatabaseAdapter(dburi="crate://crate@localhost:4200") | ||
|
|
||
|
|
||
| @pytest.fixture(scope="function", autouse=True) | ||
| def init_database(cratedb): | ||
| """ | ||
| Initialize database. | ||
| """ | ||
| cratedb.run_sql("DROP TABLE IF EXISTS time_series_data;") | ||
| cratedb.run_sql((HERE / "init.sql").read_text()) |
There was a problem hiding this comment.
@simonprickett: You have been looking at more high-level database interfaces to CrateDB, beyond what standard interfaces (DB API, xDBC, CLI) provide, right?
At least on the Python side, there are a few SDK-like layers in CTK, to be used in occasions like those, or others. They are intended to minimize lines of code needed to conduct certain repetitive procedures, by adding a few bells and whistles here and there. Up until now, the procedures are both originating from, and focusing on support for writing notebooks, demo snippets, and software tests.
Do you like that style of providing value-added details on top of the driver layer, i.e. does it resonate with your ideas in any way?
There was a problem hiding this comment.
from cratedb_toolkit.io.sql import DatabaseAdapterCTK's DatabaseAdapter is effectively a little wrapper around a few fundamentals, also using SQLAlchemy, in this spirit very similar to the SQLDatabase wrappers of LangChain and LlamaIndex.
There was a problem hiding this comment.
Note that LangChains also bears another layer on top of its SQLDatabase wrapper, the SQLDatabaseLoader.
That one is also pretty generic, so it can be counted in on the enumeration of re-usable high-level components to interact with CrateDB, mostly powered by SQLAlchemy in one way or another.
It has been coming from those contributions, broken out of crate-workbench/langchain#1:
9a17ef1 to
34e3a4d
Compare
|
Do you agree with the changes here, @simonprickett, so we can toggle the badge on the Build Status page into green? |
simonprickett
left a comment
There was a problem hiding this comment.
I'm OK with this - the tutorial on Discourse will need updating once this has been merged. I'll take that on.
About
A followup on GH-691.
References