data-coordinator is a set of related services, libraries, and scripts that take data upserts, insert them into the truth store, watch the truth store logs, and write the data to the secondary stores.
- `coordinator` - contains the REST service for the data coordinator, as well as services for secondary watchers and various scripts, including backup
- `coordinatorlib`
  - common utilities and data structures for working with the truth store and secondaries, at a lower level
  - database migrations are in `src/main/resources`
- `dummy-secondary` - a dummy secondary store implementation, for testing only
- `secondarylib` - the `Secondary` store trait
To run the tests, from the SBT shell:

```
project data-coordinator
test:test
it:test
```

To run the data coordinator, from a regular Linux/OS X shell prompt:

```sh
bin/start_dc.sh
```

The above script builds the assembly if it's not present and runs the fat jar from the command line, which is much more memory-efficient than running it from sbt. If you need to force a rebuild, simply run `sbt clean` beforehand.
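For example, to force a rebuild and start in one step (a simple one-liner, assuming you are at the repo root):

```sh
# force a clean rebuild, then launch the coordinator
sbt clean && bin/start_dc.sh
```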
If you haven't already, edit your `/etc/soda2.conf` file to remove the contents of the `com.socrata.coordinator.common.secondary.instances` section. It should look something like this:
```
com.socrata.coordinator.common = {
  database = ${common-database} {
    app-name = "data coordinator"
    database = "datacoordinator"
  }
  instance = ${data-coordinator-instance}
  secondary {
    ...
    instances {}
    ...
  }
}
```
From within the soql-postgres-adapter repository, start the secondary watcher like this:

```sh
sbt clean assembly
java -Djava.net.preferIPv4Stack=true -Dconfig.file=/etc/pg-secondary.conf -jar store-pg/target/scala-2.10/store-pg-assembly-3.1.4-SNAPSHOT.jar
```
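The assembly version in the jar path changes over time; a glob keeps the command working across versions (a convenience sketch, assuming exactly one assembly jar is present so the shell expands the glob to a single path):

```sh
# let the shell resolve the current assembly version
java -Djava.net.preferIPv4Stack=true -Dconfig.file=/etc/pg-secondary.conf \
  -jar store-pg/target/scala-2.10/store-pg-assembly-*.jar
```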
To run migrations in this project from SBT:

```sh
sbt -Dconfig.file=/etc/soda2.conf "coordinator/run-main com.socrata.datacoordinator.primary.MigrateSchema migrate"
```

Alternatively, to build from scratch and run migrations:

```sh
sbt clean
bin/run_migrations.sh
```

To run migrations without building from scratch:

```sh
bin/run_migrations.sh
```
The command is one of `migrate`, `undo`, or `redo`; `undo` takes an optional second parameter giving the number of changes to roll back.
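For example, to roll back the last two migrations (reusing the sbt command form above; the count of 2 is illustrative):

```sh
# undo the two most recent migrations
sbt -Dconfig.file=/etc/soda2.conf "coordinator/run-main com.socrata.datacoordinator.primary.MigrateSchema undo 2"
```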
Running from sbt is recommended in a development environment because it ensures you are running the latest migrations without having to build a new assembly.
Below is a copy of the email distributed to engineering when breaking changes were made to the secondary watcher architecture:
Hi All,

The secondary architecture has been inverted (thank you @robert.macomber). The secondaries (`pg`, `geocoding`) are no longer dynamically loaded as jar files in `secondary-watcher`; instead, they are now their own executables, and `secondary-watcher` is now a library that they use.

The install and start scripts in `docs/onramp` have been updated (pending merge) -- they also now include the geocoding / region coding secondary.
How to update your stack:
- Pull main of `data-coordinator`, `soql-postgres-adapter`, and `geocoding-secondary` (if you wish).
- Fetch `docs` and check out branches `en-7807` and `aerust/en-7807` respectively (the branches aren't quite merged, but they are functioning).
- Run `sbt assembly` for all of the above Scala projects.
- Update your `/etc/soda2.conf` file; `com.socrata.coordinator.common.secondary.instances` should be empty (but still needs to be there :( ). Copy the new config files for the secondaries over to `/etc`:

  ```sh
  sudo cp $DEV_DIR/docs/onramp/services/pg-secondary.conf /etc/
  sudo cp $DEV_DIR/docs/onramp/services/geocoding-secondary.conf /etc/
  ```

- If you want to use the geocoding secondary, you will need to add a MapQuest app-token to the config and add the store to the `secondary_stores_config` table in `datacoordinator` (truth):

  ```sql
  INSERT INTO secondary_stores_config (store_id, next_run_time, interval_in_seconds, is_feedback_secondary)
  VALUES ('geocoding', now(), 5, true);
  ```

- You can now run the secondary stores as their own executables. (See `docs/onramp/start.sh` for specifics.)
- You can delete your `~/secondary-stores` directory 🎉
Thanks, Alexa
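To confirm that the store was registered after the insert above, a quick check against the truth database (a hedged sketch assuming local `psql` access to the `datacoordinator` database; adjust connection flags to your setup):

```sh
# list the configured secondary stores in truth
psql datacoordinator -c "SELECT store_id, next_run_time, interval_in_seconds FROM secondary_stores_config;"
```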
To update the library:
- Make a PR, get it approved, and merge your commits into main.
- From main, run `sbt release`. This will create two commits to bump the version and create a git tag for the release version, and then push them to the remote repo.
- The Jenkins job for the main branch will run the "Check for Version Change" and "Publish Library" stages to publish the library.
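If the release is driven by the standard sbt-release plugin (an assumption consistent with the version-bump and tagging behavior described above), the interactive version prompts can be skipped in a scripted setting:

```sh
# assumes the sbt-release plugin; with-defaults accepts the suggested
# release and next-development versions without prompting
sbt "release with-defaults"
```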