Improving the Pi Searcher's speed by moving from C++ to Go

From the title, I hope you expect that there's a trick here, and, of course, there is.  I'll get to that in a second.  None of this is new - it's the same observation about the overhead of fork/exec in CGI that things like FastCGI were designed to address.  The reason I'm scribbling about it was the way in which moving to a "slower" language actually made it easier to use the right system architecture for better overall performance.

I created the Pi Searcher as a joke in 1996 or thereabouts.  In those days, we were still finding out about new and cool websites by word-of-mouth, and a guy named Arthur Bebak ran an email list called Netsurfer Digest.  In it, he spread news about cool new things happening on this "web" thing.  In one of his issues, he noted a ridiculously useless website, and asked, "What's next?  A site where you can search for your birthday in Pi?".

It had to happen.

Over the years, the Pi searcher has grown, adopting a series of algorithmic improvements to its searching mechanism:  First, a basic linear search (strstr() style) in an ASCII file of the digits of Pi.  That got too slow, so I stored the digits as binary-coded decimal packed using four bits for each digit.  I wrote a fast searcher that compared multiple digits at a time.  And then, watching my poor server strain under the load of Pi day one year, I finally moved to a suffix array index so that searches would take logarithmic time instead of linear time.  I rewrote the code in C++ a few years ago to clean it up, but it's pretty unchanged other than that.  As you might guess, it's pretty crufty.

I decided it would be fun to do a Google instant search-style version of the Pi searcher (still kinda beta and has some javascript bugs), but I was scared again of the load:  The Pi searcher runs as a CGI program invoked by Apache.  There's one fork/exec program spawn for every search request.  It now runs on an EC2 small instance, which isn't a very hefty machine.  Generating a search request for every character someone types in the search box looked like, on Pi days, it would destroy my machine.  It can only handle about 40 queries/second with the CGI model.

Enter Go.  I rewrote the Pi searcher from scratch using Go's net/http to be a persistent server.  I threw away my very fast, optimized search code and implemented some simpler ones.  I used lots of Go libraries - the json encoder, sort.search to handle the binary search on the suffix array, etc.  Instead of doing my own binary search, the code to find the start of a search string in the suffix array now looks like this, with the code basically taken from the way it's done in the Go suffix array search implementation (I had to write my own because I still store the digits packed two digits per byte and decode them on the fly, in order to save precious memory on EC2):

   i := sort.Search(int(pisearch.numDigits), func(i int) bool {
          return pisearch.Compare(pisearch.idxAt(i), searchkey) >= 0
   })

[Edit:  I've released the source code on github.  Happy pi searching.]

Let's not talk about how many bugs I'd introduced through various errors in my original C++ version of the packed suffix array search.   The results, in total wordcount lines of code, comparing the Go version from the JSON-outputting hacked up C++ version:

FunctionC++ Version LoCGo version LoC
Main
222
147
Searching
410
203
Utils
133
0
Total
765
350

And remember that the new one is a standalone web server, not just a forked program.  Both implement the same core functionality:
  1. mmap the pi file (200MB)
  2. mmap the index file (800MB)
  3. binary search in the index to locate the start and end of the result range
  4. locate the lowest-position entry in the range that satisfies the search criteria
I didn't want to ditch Apache, so I set things up with Apache simply forwarding the new searcher URL (linked with a sample query) on to the persistent Go program.  This cuts its raw speed in half, but the performance is still over an order of magnitude better than the fork/exec version, despite not implementing any of the more serious search optimizations in the Go version (and having written it in Go. :-):

Apache bench results:
C++ fork/exec version:
Requests per second:    45.76 [#/sec] (mean)
Time per request:       21.851 [ms] (mean, across all concurrent requests)

Go persistent version going through Apache reverse proxy:
Requests per second:    607.67 [#/sec] (mean)
Time per request:       1.646 [ms] (mean, across all concurrent requests)

Update May 21, 2013:  Go1.1:  I recompiled the Pi Searcher with the just-released go1.1 code, and things are slightly faster now:
requests per second:    823.89 [#/sec] (mean)

Woo, free speedup!

Is this anything I couldn't have done in C++?  No, but writing persistent, efficient servers in C++ when you really want them to keep running forever takes work.  More work than I wanted to throw at the Pi searcher this year (it's a work of love and humor, not a business).  But doing it in Go was fun, reduced the software engineering burden substantially, and I'm pretty sure it's not going to crash or run out of memory on me, which is more than I can say about the C++ version.  That difference in turn translated to a big picture speed difference of an order of magnitude, despite the actual low-level search code being (substantially) slower.

Popular posts from this blog

  • It's been a year and a half since I migrated my undergrad distributed systems course (15-440) to the Go programming language .  I'm still glad I did it and will keep it there, but we've also learned some important lessons about how to best use it in our context that I hope I'll be able to improve upon next semester.   Hopefully, some of this may be useful for those of you considering moving to or creating a Go-based class. The big points I hope to get across are:  (1)  Go is really  good for distributed systems;  (2)  Students need more intro to the language than I thought;  (3)  The language has matured a lot, but there are some technical glitches remaining that you'll have to work around. Go is good for teaching (and building) distributed systems I believe this for two reasons: Go provides a good mix of high-level and low-level functionality. Channels and goroutines provide a program structure that works well for distributed programming. The importance
    Keep reading
  • Over the past three weeks, I've been running crypto-coin mining software on between 20 and 60 Amazon EC2 GPU instances at a (very small) profit, with the goal of understanding the currencies, exchanges, and algorithms in more detail.  This is post #1 of 2 discussing scrypt coin mining on Amazon.  In the follow-up post, I'll go into the algorithmic and engineering details of building a better CUDA-based miner. A few weeks ago, Gün and colleagues wrote a paper describing a new collusion attack against Bitcoin .  I hadn't thought much about BTC before that, but he got me curious.  As I explored, I discovered Litecoin ,  an alternative to Bitcoin that is designed to reduce the comparative advantage of using custom ASICs (or GPUs) for mining it relative to using a conventional CPU.  Its future is even less certain that BTC, of course, as a later comer, but the technically interesting bits lie in its proof-of-work hash function:   Scrypt .  Scrypt is designed to be "memory
    Keep reading
  • I woke up on May 28th, 2014, on vacation with my family in the middle of the desert, to find a copy of my private source code plastered across the bitcointalk message board.  Announced as a "new optimized version" of the Monero currency miner, it was enthusiastically adopted by cryptocurrency miners across the world.  And in the process of doing so, my daily profit from the Monero Mining Project dropped by over five thousand dollars per day. But let's start at the beginning, when I started getting in to a loose collaboration with three people I've never met---one whose name I'm not even confident I really know---with hundreds of thousands of dollars worth of a nebulous new cryptocurrency at stake. It started with a cryptic note from someone I'd met online, with a link to a bitcointalk.org message board discussion for a new currency called "bitmonero".  His note said only:  "this looks interesting." From prior collaborations with him
    Keep reading