136 changes: 45 additions & 91 deletions ModelOps Training.ipynb
@@ -2,29 +2,30 @@
"cells": [
{
"cell_type": "markdown",
"id": "f6008b6e",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"\n",
"Ensure you have the following packages and python libraries installed \n",
"\n",
"```code\n",
"pip install teradataml==17.0.0.4 aoa==6.1.0 pandas==1.1.5\n",
"```\n",
"\n",
"The remainder of the notebook runs through the following steps\n",
"\n",
"- Connect to Vantage\n",
"- Create DDLs\n",
"- Import Data\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"!{sys.executable} -m pip install \"aoa>=7.0.0rc3\" \"pandas>=1.1.5\""
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "0528bd6a",
"metadata": {},
"outputs": [
{
@@ -50,28 +51,35 @@
"host = input(\"Host:\")\n",
"username = input(\"Username:\")\n",
"password = getpass.getpass(\"Password:\")\n",
"database = input(\"Database (defaults to user):\")\n",
"\n",
"if not database:\n",
" database = username\n",
"\n",
"\n",
"engine = create_context(host=host, username=username, password=urllib.parse.quote(password), logmech=\"TDNEGO\")"
"engine = create_context(host=host, \n",
" username=username, \n",
" password=urllib.parse.quote(password), \n",
" logmech=\"TDNEGO\",\n",
" database=database)"
]
},
{
"cell_type": "markdown",
"id": "4eed19e0",
"metadata": {},
"source": [
"### Create DDLs\n",
"\n",
"Create the following tables \n",
"\n",
"- aoa_feature_metadata \n",
"- aoa_statistics_metadata \n",
"- aoa_byom_models\n",
"- pima_patient_predictions\n",
"\n",
"`aoa_feature_metadata` is used to store the profiling metadata for the features so that we can consistently compute the data drift and model drift statistics. This table can also be created via the CLI by executing \n",
"`aoa_statistics_metadata` is used to store the profiling metadata for the features so that we can consistently compute the data drift and model drift statistics. This table can also be created via the CLI by executing \n",
"\n",
"```bash\n",
"aoa feature create-stats-table -m <features-db>.<features-table>\n",
"aoa feature create-stats-table -e -m <statistics-metadata-db>.<statistics-metadata-table>\n",
"```\n",
"\n",
"`pima_patient_predictions` is used for storing the predictions of the model scoring for the demo use case"
@@ -80,7 +88,6 @@
{
"cell_type": "code",
"execution_count": 2,
"id": "9875d156",
"metadata": {},
"outputs": [
{
@@ -102,7 +109,7 @@
"# Also note, if a shared datalab is being used, only one user should execute the following DDL/DML commands\n",
"database = username\n",
"\n",
"create_features_stats_table(f\"{database}.aoa_feature_metadata\")\n",
"create_features_stats_table(f\"{database}.aoa_statistics_metadata\")\n",
"\n",
"get_context().execute(f\"\"\"\n",
"CREATE MULTISET TABLE {database}.aoa_byom_models\n",
@@ -131,7 +138,6 @@
},
{
"cell_type": "markdown",
"id": "b237d537",
"metadata": {},
"source": [
"### Import Data\n",
@@ -146,19 +152,30 @@
"\n",
"`pima_patient_diagnoses` contains the diabetes diagnostic results for the patients.\n",
"\n",
"`aoa_feature_metadata` contains the feature statistics data for the `pima_patient_features` and `pima_patient_diagnoses`\n",
"`aoa_statistics_metadata` contains the feature statistics metadata for the `pima_patient_features` and `pima_patient_diagnoses`\n",
"\n",
"Note the `pima_patient_feature` can be populated via the CLI by executing \n",
"\n",
"Compute the statistics metadata for the continuous variables\n",
"```bash\n",
"aoa feature compute-stats -s <data-db>.PIMA -m <features-db>.<features-table> -t continuous -c numtimesprg,plglcconc,bloodp,skinthick,twohourserins,bmi,dipedfunc,age \n",
"aoa feature compute-stats \\\n",
" -s <feature-db>.<feature-data> \\\n",
" -m <statistics-metadata-db>.<statistics-metadata-table> \\\n",
" -t continuous -c numtimesprg,plglcconc,bloodp,skinthick,twohourserins,bmi,dipedfunc,age\n",
"```\n",
"\n",
"Compute the statistics metadata for the categorical variables\n",
"```bash\n",
"aoa feature compute-stats \\\n",
" -s <feature-db>.<feature-data> \\\n",
" -m <statistics-metadata-db>.<statistics-metadata-table> \\\n",
" -t categorical -c hasdiabetes\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "07461699",
"metadata": {},
"outputs": [],
"source": [
@@ -196,16 +213,15 @@
" })\n",
"\n",
"# we can also compute this from the CLI - but let's import the pre-computed values for now.\n",
"df = pd.read_csv(\"data/aoa_feature_metadata.csv\")\n",
"df = pd.read_csv(\"data/aoa_statistics_metadata.csv\")\n",
"copy_to_sql(df=df, \n",
" table_name=\"aoa_feature_metadata\", \n",
" table_name=\"aoa_statistics_metadata\", \n",
" schema_name=database,\n",
" if_exists=\"append\")\n"
]
},
{
"cell_type": "markdown",
"id": "2b0cdd53",
"metadata": {},
"source": [
"## ModelOps UI\n",
@@ -242,7 +258,7 @@
" - Description: PIMA Diabetes\n",
" - Feature Catalog: Vantage\n",
" - Database: {your-db}\n",
" - Table: aoa_feature_metadata\n",
" - Table: aoa_statistics_metadata\n",
" - Features\n",
" - Query: `SELECT * FROM {your-db}.pima_patient_features`\n",
" - Entity Key: PatientId\n",
@@ -255,7 +271,6 @@
" - Database: {your-db}\n",
" - Table: pima_patient_predictions\n",
" - Entity Selection: `SELECT * FROM pima_patient_features WHERE patientid MOD 5 = 0`\n",
" - BYOM Target Column: `CAST(CAST(json_report AS JSON).JSONExtractValue('$.predicted_HasDiabetes') AS INT)`\n",
" \n",
" \n",
"- create training dataset\n",
@@ -307,84 +322,24 @@
},
{
"cell_type": "markdown",
"id": "17a64068",
"metadata": {},
"source": [
"#### View Predictions\n",
"\n",
"In the next version of ModelOps, you will be able to view the predictions that follow the standard pattern directly via the UI. However, for now, we can view it here. As the same predictions table contains the predictions for all the jobs, we filter by the `airflow_job_id`. You can find this id in the UI under deployment executions."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "904b2fb9",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>job_id</th>\n",
" <th>PatientId</th>\n",
" <th>HasDiabetes</th>\n",
" <th>json_report</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Empty DataFrame\n",
"Columns: [job_id, PatientId, HasDiabetes, json_report]\n",
"Index: []"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"from teradataml import get_connection\n",
"\n",
"pd.options.display.max_colwidth = 250\n",
"\n",
"airflow_job_id = \"5761d5c1-bf57-456b-8076-c3062be0b544-scheduled__2022-07-11T00:00:00+00:00\"\n",
"In the UI, select a deployment from the deployments left-hand navigation. Go to the Jobs tab; on the right-hand side of each job execution, you can select \"View Predictions\". This shows a sample of the predictions for that particular job execution.\n",
"\n",
"pd.read_sql(f\"SELECT TOP 5 * FROM pima_patient_predictions WHERE job_id='{airflow_job_id}'\", get_connection())"
"Note that your predictions table must have a `job_id` column that matches the execution job id. If using BYOM, this is done automatically. For your own `scoring.py`, check out the demo models."
]
},
{
"cell_type": "markdown",
"id": "d479c9cb",
"metadata": {},
"source": [
"## CLI \n",
"\n",
"\n",
"```bash\n",
"pip install aoa==6.1.0\n",
"pip install \"aoa>=7.0.0rc3\"\n",
"```\n",
"\n",
"##### Copy CLI Config\n",
@@ -437,15 +392,14 @@
{
"cell_type": "code",
"execution_count": null,
"id": "b63bd4d5",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -459,7 +413,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
"version": "3.9.12"
}
},
"nbformat": 4,
9 changes: 0 additions & 9 deletions data/aoa_feature_metadata.csv

This file was deleted.

10 changes: 10 additions & 0 deletions data/aoa_statistics_metadata.csv
@@ -0,0 +1,10 @@
"column_name","column_type","stats","update_ts"
twohourserins,continuous,"{""edges"": [0.0, 84.6, 169.2, 253.79999999999998, 338.4, 423.0, 507.59999999999997, 592.1999999999999, 676.8, 761.4, 846.0]}","2022-11-16 18:09:48.220000"
skinthick,continuous,"{""edges"": [0.0, 9.9, 19.8, 29.700000000000003, 39.6, 49.5, 59.400000000000006, 69.3, 79.2, 89.10000000000001, 99.0]}","2022-11-16 18:09:48.220000"
age,continuous,"{""edges"": [21.0, 27.0, 33.0, 39.0, 45.0, 51.0, 57.0, 63.0, 69.0, 75.0, 81.0]}","2022-11-16 18:09:48.220000"
hasdiabetes,categorical,"{""categories"": [""0"", ""1""]}","2022-11-16 20:01:01.790000"
plglcconc,continuous,"{""edges"": [0.0, 19.9, 39.8, 59.699999999999996, 79.6, 99.5, 119.39999999999999, 139.29999999999998, 159.2, 179.1, 199.0]}","2022-11-16 18:09:48.220000"
bmi,continuous,"{""edges"": [0.0, 6.71, 13.42, 20.13, 26.84, 33.55, 40.26, 46.97, 53.68, 60.39, 67.1]}","2022-11-16 18:09:48.220000"
numtimesprg,continuous,"{""edges"": [0.0, 1.7, 3.4, 5.1, 6.8, 8.5, 10.2, 11.9, 13.6, 15.299999999999999, 17.0]}","2022-11-16 18:09:48.220000"
dipedfunc,continuous,"{""edges"": [0.07, 0.31, 0.55, 0.78, 1.01, 1.25, 1.48, 1.72, 1.95, 2.19, 2.42]}","2022-11-16 18:09:48.220000"
bloodp,continuous,"{""edges"": [0.0, 12.2, 24.4, 36.599999999999994, 48.8, 61.0, 73.19999999999999, 85.39999999999999, 97.6, 109.8, 122.0]}","2022-11-16 18:09:48.220000"
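
For readers unfamiliar with the layout of `data/aoa_statistics_metadata.csv`, the following is a minimal sketch of how its rows could be read and how the `edges` of a continuous feature could be used to bin values. It uses only the Python standard library. The helper names `load_stats` and `bin_counts`, the sample ages, and the binning convention (half-open bins, last bin closed on the right) are assumptions made for illustration; they are not taken from the `aoa` package, which may compute its statistics differently.

```python
import csv
import io
import json
from bisect import bisect_right

# Two rows copied verbatim from data/aoa_statistics_metadata.csv.
CSV_SAMPLE = '''"column_name","column_type","stats","update_ts"
age,continuous,"{""edges"": [21.0, 27.0, 33.0, 39.0, 45.0, 51.0, 57.0, 63.0, 69.0, 75.0, 81.0]}","2022-11-16 18:09:48.220000"
hasdiabetes,categorical,"{""categories"": [""0"", ""1""]}","2022-11-16 20:01:01.790000"
'''


def load_stats(text):
    """Parse the CSV text and decode the JSON stored in the `stats` column.

    Hypothetical helper: returns {column_name: {"type": ..., "stats": ...}}.
    """
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["column_name"]: {
            "type": row["column_type"],
            "stats": json.loads(row["stats"]),
        }
        for row in reader
    }


def bin_counts(values, edges):
    """Count values per bin defined by consecutive edges.

    Assumed convention: bins are [edge_i, edge_{i+1}), the last bin also
    includes its right edge, and out-of-range values are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for value in values:
        if value < edges[0] or value > edges[-1]:
            continue
        index = min(bisect_right(edges, value) - 1, len(counts) - 1)
        counts[index] += 1
    return counts


stats = load_stats(CSV_SAMPLE)
age_edges = stats["age"]["stats"]["edges"]

# Hypothetical ages, not taken from the PIMA dataset.
sample_ages = [22, 25, 31, 47, 52, 80]
age_counts = bin_counts(sample_ages, age_edges)
print(age_counts)
```

Storing the bin edges once and reusing them for later data is what makes histograms from different runs comparable, which matches the stated purpose of the table: computing data drift and model drift statistics consistently.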