This application is the backend server for the PhotonRanch Datalab. It is a Django application with a REST API for communicating with the Datalab UI.
- Create a virtualenv for this project and activate it:
python -m venv ./venv
source ./venv/bin/activate
- Install the dependencies defined in pyproject.toml:
poetry install
- If the previous step fails to install, check that your Python version is not >= 3.13; if so, switch to 3.11:
brew install python@3.11
poetry env use <<install/path/to/python3.11>>
poetry shell
pip install -e .
- Run the migrations command to set up the local SQLite database
./manage.py migrate
- Create a Django superuser; for convenience you can use your LCO credentials
python manage.py createsuperuser
- Start the Django app and navigate to the /admin panel (e.g. http://127.0.0.1:8000/admin)
./manage.py runserver
Django version 4.2.20, using settings 'datalab.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C
- Set up LCO credentials. Navigate to the Auth Profiles tab, create an authuser from your superuser account, and add your LCO archive API token to the token field
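If you want to sanity-check that token before saving it, a quick request against the archive API should succeed; the URL below is an assumption based on LCO's public science archive, so adjust it if your deployment uses a different archive:

import requests

token = '123456789abcdefg'  # placeholder: your LCO archive API token
response = requests.get(
    'https://archive-api.lco.global/frames/',  # assumed archive API endpoint
    headers={'Authorization': f'Token {token}'},
    params={'limit': 1},
)
print(response.status_code)  # 200 means the token was accepted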
- Log into AWS in the browser and navigate to IAM to find your user profile. On your profile, create (or reuse) an access key and secret access key; this pair is your personal token for talking to AWS. Permissions to access the datalab bucket will need to be requested from the Datalab dev team (Jon, Lloyd, Carolina) as of June 2025.
- Once you have your Access Key and Secret Access Key (see the previous step), run the configure command, and then confirm proper configuration with the get-caller-identity command
> aws configure
AWS Access Key ID [****************UVN2]:
AWS Secret Access Key [****************d7X5]:
Default region name [us-west-2]:
Default output format [json]:
> aws sts get-caller-identity
{
"UserId": "****************TIFAX",
"Account": "********0537",
"Arn": "arn:aws:iam::********0537:user/datalab-server"
}
- Finally, restart your machine to update its AWS credentials cache
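As an optional extra check from Python, you can confirm that boto3 picks up the same credentials the CLI just configured (this assumes boto3 is installed in your environment):

import boto3

# Should print the same Arn that aws sts get-caller-identity showed above
sts = boto3.client('sts')
print(sts.get_caller_identity()['Arn'])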
- Start up a Redis server, which facilitates caching as well as the task queue. Make sure you have Redis installed and then start a server on port 6379
# run in shell
redis-server
# run in background
brew services start redis
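To confirm the server is reachable from Python as well, a quick ping works; this assumes the redis-py package is available in your environment:

import redis

# Assumes Redis is running locally on the default port used above
client = redis.Redis(host='127.0.0.1', port=6379)
print(client.ping())  # True if the server is up and reachable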
- Start the dramatiq worker threads
./manage.py rundramatiq --processes 1 --threads 2
- Start the Django server
./manage.py runserver
For this mode of development, you must have nix installed (the commands below also rely on ctlptl and skaffold).
Then to develop, run these commands:
- Start your nix development environment (call this anytime you use a new terminal):
nix develop --impure
- Start up the registry and cluster (should only need to be called one time within the nix environment):
ctlptl apply -f local-registry.yaml -f local-cluster.yaml
- Start the dependencies (run this in a different tab to keep it running during development, or use run instead of dev):
skaffold dev -m deps
- Copy ./k8s/envs/local/secrets.env.changeme to a version without .changeme and fill in values for connecting to the appropriate services.
- Start the servers and worker (this will auto-redeploy as you make changes to the code):
skaffold dev -m app --port-forward
- Follow steps 4-9 in the Bare Metal Development section to set up Django auth, LCO creds, and AWS creds.
You can also run a local datalab-ui to connect to your datalab.
- Change the "datalabApiBaseUrl" in ./public/config/config.json to be http://127.0.0.1:8080/api/ or wherever your backend is deployed (see the snippet below for reference)
- Install the libraries:
npm install
- Start the frontend:
npm run serve
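For reference, the relevant entry in config.json would look something like this (any other keys in that file are left untouched):

{
  "datalabApiBaseUrl": "http://127.0.0.1:8080/api/"
}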
The application has a REST API with the following endpoints you can use. You must pass your user's API token in the request header to access any of the endpoints; the header looks like {'Authorization': 'Token 123456789abcdefg'} if you are using Python's requests library.
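For example, a minimal authenticated request with the requests library might look like this; the base URL assumes the bare metal development server from above, and the token is a placeholder for your own:

import requests

API_BASE = 'http://127.0.0.1:8000/api'
HEADERS = {'Authorization': 'Token 123456789abcdefg'}  # replace with your user's API token

# List your datasessions to confirm the token is accepted
response = requests.get(f'{API_BASE}/datasessions/', headers=HEADERS)
response.raise_for_status()
print(response.json())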
Datasessions can take an input_data parameter, which should contain a list of data objects. The current format is described below, but it will likely evolve as we learn more about how we use it.
session_input_data = [
{
'type': 'fitsfile',
'source': 'archive',
'basename': 'mrc1-sq005mm-20231114-00010332'
},
{
'type': 'fitsfile',
'source': 'archive',
'basename': 'mrc1-sq005mm-20231114-00010333'
},
]
Data operations can have a varying set of named keys within their input_data that are specific to each operation. For example, it would look like this for an operation that just expects a list of files and a threshold value:
operation_input_data = {
'input_files': [
{
'type': 'fitsfile',
'source': 'archive',
'basename': 'mrc1-sq005mm-20231114-00010332'
}
],
'threshold': 255.0
}
POST /api/datasessions/
post_data = {
'name': 'My New Session Name',
'input_data': session_input_data
}
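As a sketch, creating a datasession with the requests library could look like the following; the base URL and token are placeholders for your own setup, and the input data mirrors the session_input_data example above:

import requests

API_BASE = 'http://127.0.0.1:8000/api'
HEADERS = {'Authorization': 'Token 123456789abcdefg'}  # your user's API token

post_data = {
    'name': 'My New Session Name',
    'input_data': [
        {'type': 'fitsfile', 'source': 'archive', 'basename': 'mrc1-sq005mm-20231114-00010332'},
    ],
}
# Create the datasession; the response body contains the new session's fields
response = requests.post(f'{API_BASE}/datasessions/', json=post_data, headers=HEADERS)
response.raise_for_status()
print(response.json())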
GET /api/datasessions/
GET /api/datasessions/datasession_id/
DELETE /api/datasessions/datasession_id/
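Retrieving or deleting a specific datasession follows the same pattern; the id below is a placeholder for one returned by the create or list calls:

import requests

API_BASE = 'http://127.0.0.1:8000/api'
HEADERS = {'Authorization': 'Token 123456789abcdefg'}  # your user's API token
datasession_id = 42  # placeholder id

# Fetch a single datasession
session = requests.get(f'{API_BASE}/datasessions/{datasession_id}/', headers=HEADERS).json()
print(session)

# Delete it when it is no longer needed
requests.delete(f'{API_BASE}/datasessions/{datasession_id}/', headers=HEADERS)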
Available operations are introspected from the data_operations directory and must implement the BaseDataOperation class. We expect to flesh those classes out further once we actually start using them.
GET /api/datasessions/datasession_id/operations/
POST /api/datasessions/datasession_id/operations/
post_data = {
'name': 'Median', # This must match the exact name of an operation
'input_data': operation_input_data
}
DELETE /api/datasessions/datasession_id/operations/operation_id/
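A similar sketch for the operations endpoints; the ids, token, and URL are placeholders, and the exact input_data keys depend on the operation (here the illustrative input_files/threshold shape from above is reused):

import requests

API_BASE = 'http://127.0.0.1:8000/api'
HEADERS = {'Authorization': 'Token 123456789abcdefg'}  # your user's API token
datasession_id = 42  # placeholder id of an existing datasession

# Add an operation to the datasession
post_data = {
    'name': 'Median',  # must match the exact name of an available operation
    'input_data': {
        'input_files': [
            {'type': 'fitsfile', 'source': 'archive', 'basename': 'mrc1-sq005mm-20231114-00010332'},
        ],
        'threshold': 255.0,
    },
}
response = requests.post(f'{API_BASE}/datasessions/{datasession_id}/operations/', json=post_data, headers=HEADERS)
response.raise_for_status()

# List the operations on the session, or delete one by its id
operations = requests.get(f'{API_BASE}/datasessions/{datasession_id}/operations/', headers=HEADERS).json()
print(operations)
operation_id = 7  # placeholder id taken from the list above
requests.delete(f'{API_BASE}/datasessions/{datasession_id}/operations/{operation_id}/', headers=HEADERS)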