Disclaimer: This article is my experience and opinions and not that of my employer :)
At work, we have a service which holds polygons for countries, cities and even neighbourhoods we operate in, and it is the job of this service, to return the underlying city/country and other internal data like translations and some business data, given a coordinate.
We solved this problem the tried and tested way of indexing the polygons in a postgres server with the very powerful postGIS plugin. And this served us very well. Provided we kept scaling up postgres to meet our traffic, we had really great postgres latency, usually less than 20ms. Unfortunately we kept having to scale up postgres more and more, while the traffic the service handle was expected to grow more and more. As a result, we set out to find alternatives.
Redis — Geospatial.
The first option we explored, was to take advantage of our redis clusters, and use redis to cache this geospatial data. Unfortunately to the best of my knowledge, there was no way to index polygons. At best, we could store the geohashes and query based on geohashes, or take advantage of the redis geospatial feature set, which only seemed to support radius queries and not point in polygon queries. This would not solve our usecase. Exploring more, we found a redis fork https://github.com/tidwall/redis-gis, which had been created to solve this problem. But unfortunately the project was archived and discontinued. Further investigation led us to Tile38.
Tile38 is the improvement to the redis-gis. It seemed to solve our particular usecase. We could have a service similar to redis with great replication policies and a great feature set.
I was also drawn to tile38 because I had used other libraries by the author (github.com/tidwall), most particularly his burntdb project, which I had studied in the past as part of a code review/study club I help organize outside of work. So I knew it would be solid, just based off my experience with burntdb.
We implemented and tested this out with good results locally. Unfortunately, due to some limitation in our infrastructure we couldn’t get this in production as quickly. While waiting to get tile38 in our infra, another team presented on the success they had with preloading and serving polygons in-memory using the s2 system, and we figured maybe caching in-memory would be a good approach for us.
Another reason why caching in-memory was very attractive is that within this services, we had other endpoints which needed to make some calculations based on these polygons, and sometimes for a single request, would perform 2–10 similar database polygon calls. So, even though having tile38 is great, we would pay the (little) latency costs of performing these requests over the network 10x for each request. So having something in-process would remove the latency costs. This lead us to exploring other alternatives.
Before this in-memory exploration, we already had a sort of in-memory polygon lookup system which had been running for months with no problems and no notice. This system handled querying of country data. Basically we have polygons and data for the roughly 250 countries. This data gets pulled from the database when each docker pod spins up, and when a country request comes in, we loop through the list of countries and perform a point in polygon calculation for each polygon in the list of 250 countries, until we find the right country. This means finding a country would have a complexity of O(n), which is not so great, but then considering n is 250, this worked really nicely for us.
Also, because the list is loaded on startup, and the only operations on the list were read operations, there was no need for mutex locking over each operation, and hence a less complex system with less potential issues due to lock contention. We still have this system working for the countries data, and have no need to change it yet.
For the actual point in polygon calculation, the maths involved is really simple and involves subtracting the polygon edges from the point we’re querying for, to calculate whether the point exists within the given polygon.
The point in polygon query can then be optimized by doing a point in bounding box query, and then only doing the full point in polygon query if the point is in the bounding box. Example implementation:
I don’t fully understand this approach and what differentiates it from the regular point in polygon calculations. But from my understanding, you would still need to loop through all the polygons and perform the checks for each polygon at a time. Here is an example of it’s implementation:
There is an interesting article which visualizes how the point in polygon queries work, incase this interests you: https://observablehq.com/@tmcw/understanding-point-in-polygon
Google S2[Quad trees]
Following the work of another team in the company, I was drawn towards implementing our in-memory geospatial caching via the google s2 library(https://github.com/golang/geo), as this was implemented by the teams involved, with good results. Actually, these results from the teams was what gave us the confidence that it would be feasible to cache this data in memory. in their usecase, they preloaded roughly 10,000 polygons into a machine on startup, and then simply queried this data with no need to update the data except for periodic refreshes. What was most interesting was that they were able to fit all this data into a service which was using less than 1.5gb of memory while handling their not so trivial traffic.
For our own usecase, we had <20,000 polygons and were expecting this number to even triple as we are working on support polygons for areas and even neighbourhoods we operate in. So, keeping all the polygons in memory would likely be challenging, but at least this gave us an idea that it would not need an astronomic amount of memory.
Unfortunately, implementing s2 for our usecase was quite challenging for 2 reasons:
Converting our geojson polygon to s2 polygon representation wasn’t straightforward
While exploring the google s2 golang library, we were able to index a polygon with a bit of pain, but in little time. The pain was due to having to convert our geojson polygons which were in the paulmach/orb format, into the google s2 polygon format. Since I couldn’t find articles or documentation on how to do this, I had to do some trial an error to eventually convert our polygons to s2 polygons in a way that it is deemed valid by s2.
S2 — ShapeIndex only returns the shape without the identifier
After getting the polygon, I first tried using an s2 shape index to index this polygon with no issues. The issue was how to get back shape id which I can associate with a city. Instead, I was able to query and get back the shape, but I had no way of knowing what city is associated with that shape. I had some ideas such as hashing the shape and matching the shape to the city based on this hash, but without a lot of testing I can’t tell for sure that this hash would always be the same for a given shape. And it just felt a bit more complicated than necessary. In the end, I couldn’t find any way to get this generated id.
Like you would see in the screenshot, adding a polygon to the shapeIndex returns an `id` for the shape. But when I query for the shape using the
NewContainsPointQuery , I get back the shape but couldn’t find anyway to get the orginal `id`. While exploring the s2 documentation I noticed that the more correct solution to the problem I was solving is the cell index.
S2 — Cell Index
The cell index allows me to index an s2 cell or cell union along with a label. The best part is that we can run queries and actually get back the label, which we can associate with a city. The only issue is that this index doesn’t support shapes like polygons, and we have to convert our polygons into cell unions, which are a combination of cells or squares that cover the polygon.
Basically, s2 cells are gotten when the earth geometry is recursively divided into 4, until a point that is sufficient for covering our polygons. If you’re familiar with geohashes, this is fairly similar to geohashes, except that s2 takes into consideration the curvature of the earth, where geohashes do not. Each cell has a unique 64bit cell id which is an identifier for the given cell
The challenge with this solution is that the indexes are static. After an index is built, it can’t be changed. We can’t add or remove shapes, which makes this a difficult solution to use as a caching layer, which by nature involves adding and deleting items regularly. I can think of workarounds like reindexing the entire index when anything changes, but this would require a lot of locking and it’s hard to tell how slow the indexing is for large workloads, eg 10,000 cities, without any testing. And is it even practical to reindex 10,000 cities each time a city expires and needs to be changed? Probably it would have been possible to reindex the cities in batches, for example every hour. But this limitation pushed us to consider the next option.
Rtree is another popular option geospatial index. It results in slightly slower lookups than s2, but seemed like a simpler tradeoff between complexity and performance for our use case. It is also a popular option which has been the defacto spatial indexing solution in databases before s2 came around. PostGIS and even tile38 are both implemented on top of an RTree. From my investigation so far, Rtrees do not really support polygons, and are only able to index rectangles. They then build a sorted tree structure representing a hierarchy of the polygons indexed, so that we don’t necessarily need to walk the entire list to find a shape. Basically, making searches more like a binary search O(log n)
We decided to use the tidwall/rtree library, since it is the library used behind the scenes by tile38, and hence is actually tried and tested in the real work.
To cache the metadata and also handle TTL expirations, we used DGraph’s ristretto which we’ve used with great success in other places. It has great performace compared to other inmemory caching systems we tried. So, check it out if you need to cache something in memory.
Our approach for implementation was:
- Calculate and index the city’s rectangular bounds in the Rtree index against the city slug (or other identifier). Note, we only index the rectangular bounds and not the actual city polygon.
- For that same city, index the city metadata in a ristretto in-memory cache (https://github.com/dgraph-io/ristretto). This insert and the querying are protected behind a read write mutex, since the tile38 Rtree library doesn’t seem to be thread safe. Looking at the tile38 codebase, their geoindex was also secured behind a RWMutex.
- On query, we get back a list of cities where the point is within the city’s rectangular bounds. But the point might still not be in the actual polygon.
- So when we search and find a city in the RTree, we still go ahead to query the entire city from the ristretto cache. The entire city in the ristretto cache contains data like the name and slug of the city, but also the full city polygon, which we then to do a direct point in polygon query to be sure that the point is actually within the city.
The final implementation is actually not a lot of code:
Now our workflow for handling city by coordinate requests is:
- New request comes with coordinates, we check the rtree for any polygon that contains those coordinates.
- If a polygon is found, we check ristretto for the rest of the data for that city, and do a more precise point in polygon query on the full polygon to be sure the coordinates are within that city polygon. If yes, we return the city to the client (internal or external)
- If data is not found in ristretto or in the rtree, we proceed to call postgres and do the actual point in polygon postgres query as usual.
- When postgres returns a response, we store index this data in ristretto and the city polygon bounds in the rtree to serve future requests.
The service is able to handle 9x more traffic on our staging load tests. In the graph below, you can see a load test run on the service without the geocache, hitting only 20k rpm, but with the geocache was hitting 150k rpm.
You can see that the response times with the geocache+ristretto combo was really good, since 75% of requests were handled in less than 0.9μs which is quite great compared to our previous numbers of >10ms.
Unfortunately there is still a large variance in the latency and some requests took up to 16s. This was probably due to the postgres instance being under load during the load test, and isn’t unexpected. But it’s also possible that there is some mutex pressure around the rtree and ristretto usage, and is an area I am exploring and tweaking.
Here’s a graph of the change in latency before and after the cache was deployed to production. In production the latency changes were not as dramatic, due to our use of third party services which add extra latency outside of our control. But the general improvement in latency was quite obvious and interesting to see, along with the obvious improvement in max server throughput.
Below is a 24hour view of the latency change after the geocache was deployed live. The quick dip was basically due to a load test being run on production to test things. Latency dropped much more, likely due to the load test traffic being already cached.
Below is P90 percentile latency change. From ~35ms to ~11ms, which is atleast 3x, for 90% of the requests. For about 50% of the requests, we still handle requests in less than 1ms.
After a couple weeks of running the geocache in production, the results were good and everything seemed safe.
Looking at latency distribution for the past week, max latency was 315ms (much less than 16s we saw in the staging load tests). The geocache (Rtree+Ristretto combination) implementation handled most (over 75%) of it’s querying in less than 0.5ms even when dealing with up to 150k rpm.
- Ristretto as an inmemory store + tidwall/rtree implementation for bounding box indexing is a great combination for building a geocache.
- As long as mutexes are used precisely and only covering single operations at a time, mutex pressure was not a big problem for our <10k rpm per pod.
- While s2 is much faster than an RTree for general point in shape queries (O(1) for s2 vs O(log n) for RTree), a lot of applications can escape the complexity of manual geo index management by just using an RTree. Especially since s2 indexes are immutable and designed to be read only.
- We limited Ristretto to 500mb of memory, and this has been enough to handle our <20,000 polygons and metadata. Though I need to check how much evications are happening. But Ristretto is really good at managing it’s allocated memory, with support for least frequently used eviction policies.
- Even though postgis is also implemented using an RTree, we get better performance and speed(latency) that what postGIS gave us. This could be due to a combination of the postgres network latency and postgres having to manage the complexity of dealing with the data in the disk and hot swapping the data to memory?
- Storing the data as raw golang structs without any custom serialisation(json or protobuf), was a win as this saved us CPU cycles previously spent on serialisation and deserialisation. It was nice that Ristretto allowed us index arbitrary structs without any JSON or binary marshalling step which we’re used to, with embedded databases/caches.
- We probably don’t need to worry about scaling postgres for a while. If anything, we could investigate scaling it down. In the past week, our CPU usage has averaged around 0.5% (compared to regularly hitting 100% during load tests in the past). We only got to 2.5% during the last load test. This is because most of the requests that would go to postgres are now handled by the geocache.
Possible improvements which we probably will never need:
- Improve geocache concurrency, by using a sharded bucket implementation for our RTree, so we don’t have to block on the entire tree for every write, and would only need to lock on a single bucket. (Ristretto is implemented like this)
- Have a separate write buffer, from the RTree, so we catch writes in a separate tree/cache and batch writes to the tree, to reduce lock contention, etc.
- Instead of performing the Ristretto lookups while searching the tree, we could build a list from the search and then look them up from Ristretto and perform the point in polygon lookup in a different step. This could be a quick win, but I can’t tell without testing it for sure.
But I think we can survive without any of these improvements for a while.