An overview of the solution is below.
There are two ways to install and configure the application: automated or manual. Only Ubuntu 20.04 is supported at this time.
- Go to the Azure Portal at https://portal.azure.com/ and click Azure Cloud Shell (the first icon next to the search box at the top). If this is the first time you are using Azure Cloud Shell, you may be asked to configure storage for it; follow the configuration wizard.
- Clone the repository: the next step is to clone this repository using the following command:
git clone https://github.com/djdean/PythonSyntheaFHIRClient.git
- Change directories to the deployment directory:
cd ./PythonSyntheaFHIRClient/IaC/
- The last step is to provision the infrastructure through the Bicep template. Provisioning is initiated with the following commands.
The very first step is to create a service principal (SPN) for the application, following a least-privilege approach. You can assign the 'Contributor' role at the resource group level, but the recommendation is to grant even narrower permissions where possible.
az group create --name MyResourceGroupName --location MyLocation --subscription MySubscriptionId
az ad sp create-for-rbac --name MySPNName --role 'Contributor' --scopes /subscriptions/MySubscriptionId/resourceGroups/MyResourceGroupName --years 1
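Alternatively, to capture the generated secret straight into a shell variable rather than copying it from the console output, you can use the CLI's built-in --query flag (a sketch; the clientSecret variable name is ours):
clientSecret=$(az ad sp create-for-rbac --name MySPNName --role 'Contributor' --scopes /subscriptions/MySubscriptionId/resourceGroups/MyResourceGroupName --query password --output tsv)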
After creation, you need to collect the SPN information and credentials. The secret value is shown in the command output; copy and note it, as it cannot be retrieved later. You also need the ClientId and ObjectId. To find them, search for 'Azure Active Directory' in the Azure Portal search box, select it, click 'App Registrations', then 'Owned Applications', search for the name of the application you just created, open it, and click through 'Managed application in local directory'. Copy and note the values of 'Application ID' (ClientId) and 'Object ID' for later use.
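If you prefer the CLI to the portal, a sketch along these lines retrieves the same identifiers (note that the object id property is named 'id' in recent az versions and 'objectId' in older ones):
clientId=$(az ad sp list --display-name MySPNName --query "[0].appId" --output tsv)
objectId=$(az ad sp list --display-name MySPNName --query "[0].id" --output tsv)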
az deployment group create --resource-group MyResourceGroupName --template-file Synthea.bicep --parameters projectPrefix=specifyPrefix sqlServerLogin=specifySqlLogin sqlServerPassword=specifySqlPwd localAdminUserName='specifyVMLogin' localAdminPassword='specifyVMPwd' clientId='specifyClientId' objectId='specifyObjectId' clientSecret='specifyClientSecret' --subscription MySubscriptionId
Make sure you select a good, unique value for 'projectPrefix' to avoid name collisions on globally named Azure resources. Choose a strong username and password for the SQL and VM credentials, and provide the SPN information and credentials. See the previous section for how to create the SPN.
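Optionally, before creating any resources, you can preview what the template would deploy with the CLI's what-if operation, using the same parameters as the deployment command above:
az deployment group what-if --resource-group MyResourceGroupName --template-file Synthea.bicep --parameters projectPrefix=specifyPrefix sqlServerLogin=specifySqlLogin sqlServerPassword=specifySqlPwd localAdminUserName='specifyVMLogin' localAdminPassword='specifyVMPwd' clientId='specifyClientId' objectId='specifyObjectId' clientSecret='specifyClientSecret' --subscription MySubscriptionId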
If manual configuration is needed, there are several steps to follow in order to deploy the application. This guide assumes the following services have been configured and deployed already (a sample CLI sketch for the first prerequisite follows the list):
1) Azure Storage Account
2) FHIR Importer App
3) FHIR to Synapse Sync Agent
4) Azure API for FHIR
5) Azure Synapse Analytics
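These prerequisites can be provisioned through the portal or the CLI. As an illustrative sketch for the first item only (the account name is a placeholder and must be globally unique):
az storage account create --name mystorageacct12345 --resource-group MyResourceGroupName --location MyLocation --sku Standard_LRS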
Once all services have been deployed in the portal, complete the following steps to finish deploying the solution:
- Clone the repository: The first step is to clone this repository using the following command:
git clone https://github.com/djdean/PythonSyntheaFHIRClient.git
- Run the environment setup script: next, go to the "deployment/scripts" directory and run the following:
./setup_environment.sh
This will update the apt package index and install all the packages needed to run the application.
- Install the Synthea FHIR data generation tool: after environment setup is complete, run the following command (located in the "deployment/scripts" directory):
./install_synthea.sh
This will install and configure Synthea.
- Configure and install the Python FHIR client: this step requires setting several variables (example assignments follow the list):
connection_string=<Some value>: The connection string for the Azure Storage Account created above.
polling_interval=<Rate in seconds>: The rate, in seconds, at which to check for new FHIR bundles.
FHIR_output_path=<Local FHIR path>: The local directory to check for new FHIR bundles.
local_output_path=<Local output path>: The local path where the Python daemon writes errors and uploaded data.
log_path=<Local log path>: The local path to use for log output.
container_name=<Storage Account container>: The name of the container in the storage account to upload FHIR bundles to. This should be the same container the FHIR Importer App, configured above, is monitoring.
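For example, the assignments might look like the following (hypothetical values: replace the angle-bracket placeholders, adjust every path to your environment, and make sure the container name, here the made-up "fhirimport", matches the container the FHIR Importer App watches):
connection_string="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
polling_interval=30
FHIR_output_path=/home/synthea/synthea/output/fhir
local_output_path=/home/synthea/PythonSyntheaFHIRClient/python_client/out
log_path=/home/synthea/PythonSyntheaFHIRClient/python_client/log
container_name=fhirimport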
Once all of the variables above have been set, run the following script (located in the "deployment/scripts" directory) to configure and deploy the Python daemon:
./install_python_client.sh $connection_string $polling_interval $FHIR_output_path $local_output_path $log_path $container_name
Once installation has completed, there are three steps to start ingesting data.
- Log into the Azure portal and navigate to the virtual machine deployed into the resource group created as part of the deployment. Once there, click on the "Bastion" service on the left blade under the "Operations" heading. Deploy Bastion and then connect to the VM after deployment has completed.
- Once connected to the VM, run the following commands to change to the root user and "synthea" directory:
sudo bash
cd /home/synthea
- Go to the Synthea directory and generate a test patient using the following commands:
cd synthea
./run_synthea -p 1
- Change directories to the Python client and start the ingestion process using:
cd ../PythonSyntheaFHIRClient/python_client/
./ingest.py deploy_config.json &
This starts the ingestion daemon as a background process; output messages are written to the "log" file in the "PythonSyntheaFHIRClient/python_client/" directory. Any generated patients are automatically uploaded to the Azure Storage account and moved to either the "uploaded" or "error" directory under "PythonSyntheaFHIRClient/python_client/out", depending on the upload status. After reaching Azure, the data is automatically ingested into the Azure API for FHIR and then extracted to the Azure Data Lake Storage account in Parquet format.
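To confirm the daemon is picking up new bundles, you can watch the log file and the output directories described above (paths assume the VM layout from the previous steps):
cd /home/synthea/PythonSyntheaFHIRClient/python_client
tail -f log
ls out/uploaded out/error
To generate more test data, rerun Synthea with a larger patient count, for example:
cd /home/synthea/synthea
./run_synthea -p 100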