How does Yelp calculate distance

How does Yelp calculate the distance in the database efficiently?


Suppose I have a table:

All columns are, of course, indexed. There are also 1 million records.

For example, let's say I want to find the businesses that are closest to 106.5. How would I do that?

When I do

for example or when I do

In theory, the computer needs to calculate the distance for all stores, while in practice only those with latitude and longitude within a certain area should be calculated.

For example, how can I do what I want in PHP or SQL?

I am grateful for the answers so far. I use MySQL, which has nothing more efficient than the obvious solution. MySQL Spatial also has no function for calculating distance.

Reply:


If I understand the question correctly (and I'm not sure I do), you are worried about doing the distance math for every row in the table on every query?

This can be reduced to some extent by using the indexes, so that the distance only needs to be calculated for a 'box' of points containing the circle we actually want:

Here 96, 116, etc. are chosen to match the units of the value '2000' and the point on the globe from which you are calculating distances.
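The box-then-circle idea can be sketched in Python with an in-memory SQLite database. This is only an illustration under stated assumptions, not the query from this answer: the `business` table, its `latitude` and `longitude` columns, the index name, and the sample coordinates are all hypothetical, and the box calculation ignores the poles and the 180-degree meridian.

```python
import math
import sqlite3

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bounding_box(lat, lon, radius_km):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing the search circle.

    Assumption: the circle does not touch a pole or cross the 180-degree meridian.
    """
    ang = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(ang)
    dlon = math.degrees(math.asin(math.sin(ang) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def nearby(conn, lat, lon, radius_km):
    """Use the indexed box filter in SQL, then the exact distance in Python."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_km)
    rows = conn.execute(
        "SELECT id, name, latitude, longitude FROM business "
        "WHERE latitude BETWEEN ? AND ? AND longitude BETWEEN ? AND ?",
        (min_lat, max_lat, min_lon, max_lon),
    ).fetchall()
    hits = []
    for biz_id, name, blat, blon in rows:
        d = haversine_km(lat, lon, blat, blon)
        if d <= radius_km:  # drop points in the box corners but outside the circle
            hits.append((d, biz_id, name))
    return sorted(hits)  # closest first


def make_demo_db():
    """Build a tiny hypothetical table so the sketch can be run end to end."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT, latitude REAL, longitude REAL)"
    )
    conn.execute("CREATE INDEX idx_business_lat_lon ON business (latitude, longitude)")
    conn.executemany(
        "INSERT INTO business VALUES (?, ?, ?, ?)",
        [
            (1, "near", 5.01, 106.5),       # inside the circle
            (2, "corner", 5.017, 106.517),  # inside the box, outside the circle
            (3, "far", 6.0, 107.5),         # outside the box
        ],
    )
    return conn
```

Calling `nearby(make_demo_db(), 5.0, 106.5, 2.0)` only runs the distance formula on the rows that pass the box filter. Whether the database actually uses the index for that filter still depends on the planner, as noted below.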

How exactly this uses indexes depends on your RDBMS and the decisions of the planner.

In general, this is a primitive way of optimizing a kind of nearest-neighbor search. If your RDBMS supports GiST indexes, as Postgres does, consider using those instead.







(Disclosure: I'm a Microsoft SQL Server person, so that will color my answer.)

To make it really efficient, you want two things: caching and native spatial data support. With the support of geospatial data, you can store geography and geometry data directly in the database without having to perform intensive / expensive calculations on the fly, and you can create indexes to very quickly find the closest point to your current location (or the most efficient route or whatever).

Caching is important when you want to scale. The quickest query is the one you never run. When a user asks about the things closest to them, save their location and result set in a cache like Redis or memcached for a period of hours. Business locations are unlikely to change within 4 hours. When someone edits a business, the change does not necessarily have to show up immediately in all cached result sets.
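As a rough illustration of that caching idea, here is a minimal in-process sketch in Python. It stands in for Redis or memcached with a plain dictionary, and everything in it is an assumption made for the example: the `NearbyCache` class, the `search_fn` callback, rounding coordinates to three decimals so that nearby users share an entry, and the four-hour lifetime.

```python
import time

CACHE_TTL_SECONDS = 4 * 60 * 60  # four hours, the figure used in the text above


class NearbyCache:
    """Cache 'what is near me' results keyed on rounded coordinates."""

    def __init__(self, search_fn, ttl=CACHE_TTL_SECONDS, precision=3, clock=time.monotonic):
        self._search_fn = search_fn  # hypothetical function (lat, lon) -> result set
        self._ttl = ttl
        self._precision = precision  # 3 decimals is roughly a 100 m grid
        self._clock = clock
        self._store = {}

    def _key(self, lat, lon):
        return (round(lat, self._precision), round(lon, self._precision))

    def get(self, lat, lon):
        key = self._key(lat, lon)
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh enough: skip the database entirely
        # Search from the rounded position so every user in the cell sees the same result.
        result = self._search_fn(*key)
        self._store[key] = (now, result)
        return result
```

The trade-off is the one described above: an edited business may keep appearing with stale data until the entry expires, in exchange for far fewer spatial queries.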






PostgreSQL has the reference implementation for GIS with PostGIS. Yelp may be using MySQL, which is inferior in every way. With something like Yelp, they'll almost certainly keep the coordinates for:

  • The user
  • The possible destinations

These coordinates are almost certainly in WGS84 and are stored as a geographic type. In PostgreSQL and PostGIS it would look something like this:

You would populate this table. Yelp would then get the WGS84 coordinates from your phone and generate a query like this with SQLAlchemy (in Yelp's case).
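For readers who want something concrete, here is a hedged Python sketch of how such a query might be assembled. It is not Yelp's code and does not use SQLAlchemy: it only builds a parameterized SQL string with psycopg-style `%s` placeholders. The `business` table and its `geog` geography column are hypothetical names. `ST_MakePoint`, `ST_SetSRID`, `ST_DWithin`, and `ST_Distance` are PostGIS functions, and `ST_MakePoint` takes longitude first.

```python
def nearest_businesses_query(lat, lon, radius_m, limit=20):
    """Return (sql, params) for businesses within radius_m metres, closest first.

    Assumes a hypothetical table business(id, name, geog geography(Point, 4326)).
    """
    sql = (
        "SELECT id, name, "
        "ST_Distance(geog, ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography) AS meters "
        "FROM business "
        "WHERE ST_DWithin(geog, ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography, %s) "
        "ORDER BY meters "
        "LIMIT %s"
    )
    # ST_MakePoint expects (x, y), which is (longitude, latitude).
    params = (lon, lat, lon, lat, radius_m, limit)
    return sql, params
```

With a spatial index on `geog`, `ST_DWithin` can narrow the candidates before any exact distance is computed, so the database does the box-style filtering for you.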

Further information can be found in our spatial area and at Geographic Information Systems @ StackExchange
