Exercise to get familiar with profiling and performance tuning in Go
In the data directory, there are input files that contain a JSON record on each line:
{"value":"... some string ..."}
{"value":"... some string ..."}
{"value":"... some string ..."}
{"value":"... some string ..."}
...
The objective is for your code to return the number of buffalos found in the record set, that is, the number of records that contain the word "buffalo". You can assume that a record will only ever contain at most one buffalo.
You will be given an existing naive implementation. From there, you should use performance profiling tools and your intuition to make improvements. You can make tweaks and use the benchmark tool to compare your attempt against a given set of 'golden' implementations (source to be uploaded after completion of this exercise).
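For reference, a representative naive approach (a sketch only, not necessarily the exact code in main.go) decodes each line as JSON and searches the decoded value:

package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

type record struct {
	Value string `json:"value"`
}

// countBuffalos decodes every line as JSON and searches the value field.
func countBuffalos(r io.Reader) int {
	count := 0
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		var rec record
		if err := json.Unmarshal(scanner.Bytes(), &rec); err != nil {
			continue // skip malformed records
		}
		if strings.Contains(rec.Value, "buffalo") {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println(countBuffalos(os.Stdin))
}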
At the end of this exercise, I will go over the 'golden' implementations to discuss their performance characteristics, as well as how they tie into 'clear vs. clever', premature-optimization pitfalls, and the importance of benchmarking.
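Go's standard testing package also supports micro-benchmarks, which are handy for comparing tweaks in isolation. A minimal sketch, reusing the hypothetical countBuffalos from the sketch above (placed in a main_test.go):

package main

import (
	"bytes"
	"testing"
)

// BenchmarkCountBuffalos runs countBuffalos over an in-memory input so the
// measurement isn't skewed by disk I/O.
func BenchmarkCountBuffalos(b *testing.B) {
	input := bytes.Repeat([]byte(`{"value":"a buffalo roams here"}`+"\n"), 1000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		countBuffalos(bytes.NewReader(input))
	}
}

Run it with go test -bench=. -benchmem to see allocation counts alongside the timings.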
Before you start, be sure to decompress the test files:
gunzip data/*.gz
main.go contains an implementation as a starting point.
Edit this code to make your performance improvements.
Binaries in the repo are pre-compiled for OSX.
The run command will automatically build your attempt and run it against the 'golden' implementations,
reporting the run times, or a failure if your code is incorrect.
./run
You can also manually test your implementation:
cat data/raw1000 | go run main.go
So how do I make the naive implementation more performant?
https://golang.org/pkg/runtime/pprof/ & https://godoc.org/net/http/pprof
go tool pprof --help
pprof is a tool for analyzing CPU and memory profiles of Go applications.
It consists of two components: a CLI tool for analysis, and packages for generating the profiles.
In a web application, you can use the net/http/pprof package to add endpoints that serve CPU and memory profiles.
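A minimal sketch of that setup (the address is arbitrary); the blank import registers handlers under /debug/pprof/ on the default mux:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
)

func main() {
	// Profiles are then served at http://localhost:6060/debug/pprof/
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}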
In an application without an HTTP interface, you can use the runtime/pprof package to generate and store the profiles directly.
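A minimal sketch of that pattern, roughly what main.go wires up for you; doWork is a placeholder standing in for the code being profiled:

package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	cpu, err := os.Create("cpu.profile")
	if err != nil {
		log.Fatal(err)
	}
	defer cpu.Close()
	if err := pprof.StartCPUProfile(cpu); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	doWork() // the code you want to profile

	mem, err := os.Create("mem.profile")
	if err != nil {
		log.Fatal(err)
	}
	defer mem.Close()
	runtime.GC() // flush recent allocations into the statistics
	if err := pprof.WriteHeapProfile(mem); err != nil {
		log.Fatal(err)
	}
}

func doWork() {} // placeholder for the work being measured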
The prepared implementation in main.go already contains the necessary code to create CPU and memory profiles, which can be generated with the following flag:
cat data/raw10000 | go run main.go -profile
You can use whichever data file you want; depending on the performance characteristics of your code, some trends may only appear on larger datasets.
This will generate files cpu.profile and mem.profile in your local directory.
Caveat -- on OSX, CPU profiling may not function correctly. I've had limited success but have found it easier to simply run your code within a Linux environment and pull profiles from there. Docker makes this super easy for us:
GOOS=linux go build -o attempt-linux . && \
docker run -it -v $(pwd):/code -w /code golang \
sh -c "cat data/raw1000000 | ./attempt-linux -profile"
Now you can feed those profiles into the go tool pprof command to get valuable insights.
Create an SVG displaying memory allocations:
go tool pprof -svg mem.profile
Create an SVG displaying CPU usage:
go tool pprof -svg cpu.profile
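Note that graph output such as -svg requires Graphviz to be installed. You can also explore a profile interactively:

go tool pprof cpu.profile

At the (pprof) prompt, top lists the functions consuming the most CPU, and list <function> breaks the cost down line by line.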
We'll discuss these examples and what insights can be gleaned from the reports.
Commands to build the golden-* and run binaries (upx is used to reduce binary size):
go build -ldflags="-s -w" -o golden-simple ./cmd/golden-simple && \
go build -ldflags="-s -w" -o golden-complicated ./cmd/golden-complicated && \
go build -ldflags="-s -w" -o run ./cmd/run && \
upx --brute run