geokdtree

GeoKDTree

PyPI version License: MIT PyPI Downloads

GeoKDTree

Ultra-fast nearest-neighbor lookup for latitude/longitude data

GeoKDTree is a lightweight, high-performance spatial indexing library for Python designed to find the nearest geographic coordinate from massive datasets in nanoseconds.

It wraps a highly optimized KD-Tree with a geographic interface, allowing you to work directly with (latitude, longitude) pairs. No projections, no external dependencies, and no heavy GIS stacks.

geokdtree

Documentation

Installation

pip install geokdtree

If you are having trouble building the C++ extension during the pip installation process, you can run:

  • On Mac / Linux / WSL2:

    export SKBUILD_CMAKE_ARGS="-DSKIP_CPP_BUILD=ON"
    pip install geokdtree
    
  • On Windows:

    # POWERHELL:
    $env:SKBUILD_CMAKE_ARGS="-DSKIP_CPP_BUILD=ON"
    # CMD:
    set SKBUILD_CMAKE_ARGS=-DSKIP_CPP_BUILD=ON
    pip install scgraph
    

Getting Started

from geokdtree import GeoKDTree

example_points = [
    (34.0522, -118.2437),  # Los Angeles
    (40.7128, -74.0060),   # New York
    (37.7749, -122.4194),  # San Francisco
    (51.5074, -0.1278),    # London
    (48.8566, 2.3522),     # Paris
]

geo_kd_tree = GeoKDTree(points=example_points)

test_point = (47.6062, -122.3321)  # Seattle
# Find the index of the closest point in the original dataset
closest_idx = geo_kd_tree.closest_idx(test_point) #=> 2
# Find the closest point itself
closest_point = geo_kd_tree.closest_point(test_point) #=> (37.7749, -122.4194)

Why Use GeoKDTree?

GeoKDTree is designed to solve one focused problem extremely well:

Fast nearest-neighbor lookup for latitude/longitude data at scale.

It is worth noting that the closest point found may not be the true closest point, but should be very close for most practical applications. See KD-Tree limitations for more details.

Extremely Fast Lookups

Once constructed, nearest-neighbor queries consistently complete in tens of nanoseconds, even with very large datasets.

Typical benchmark results from the included tests:

Number of Points Build Time Query Time
1,000 ~1.7 ms ~0.02 ms
10,000 ~25 ms ~0.05 ms
100,000 ~350 ms ~0.05 ms
1,000,000 ~6.8 s ~0.07 ms

This makes GeoKDTree well-suited for:

  • Real-time proximity queries
  • Matching incoming coordinates against large reference datasets
  • High-throughput geospatial APIs
  • Pre-filtering before more expensive geospatial calculations

Exact timings depend on hardware, Python version, and data distribution. These values reflect typical results from the repository’s benchmarks.

Built for Geographic Coordinates

GeoKDTree works directly with (latitude, longitude) pairs.

You do not need to:

  • Project coordinates into planar space
  • Use heavyweight GIS libraries
  • Maintain custom spatial indexing code

Just pass geographic coordinates and query.

Simple API, Minimal Overhead

GeoKDTree intentionally keeps the API small and focused.

  • Build once from a list of coordinates
  • Query nearest neighbors with a single method call
  • Retrieve indices or points directly from your original dataset

There are no external C extensions or heavy dependencies, keeping installation and deployment simple.

Deterministic and Predictable Performance

  • Tree construction scales at approximately O(n log n)
  • Query performance scales at approximately O(log n)
  • No probabilistic approximations
  • No background indexing or caching

This predictability is valuable for production systems where latency and reproducibility matter.

Supported Features

See: https://connor-makowski.github.io/geokdtree/geokdtree.html

Contributing

Issues, feature requests, and pull requests are welcome. Please open an issue to discuss changes or enhancements.

Development

Running Tests, Prettifying Code, and Updating Docs

Make sure Docker is installed and running on a Unix system (Linux, MacOS, WSL2).

  • Create a docker container and drop into a shell
    • ./run.sh
  • Run all tests (see ./utils/test.sh)
    • ./run.sh test
  • Prettify the code (see ./utils/prettify.sh)
    • ./run.sh prettify
  • Update the docs (see ./utils/docs.sh)

    • ./run.sh docs
  • Note: You can and should modify the Dockerfile to test different python versions.

  1"""
  2# GeoKDTree
  3[![PyPI version](https://badge.fury.io/py/geokdtree.svg)](https://badge.fury.io/py/geokdtree)
  4[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
  5[![PyPI Downloads](https://pepy.tech/badge/geokdtree)](https://pypi.org/project/geokdtree/)
  6<!-- [![PyPI Downloads](https://img.shields.io/pypi/dm/geokdtree.svg?label=PyPI%20downloads)](https://pypi.org/project/geokdtree/) -->
  7
  8# GeoKDTree
  9
 10## Ultra-fast nearest-neighbor lookup for latitude/longitude data
 11
 12**GeoKDTree** is a lightweight, high-performance spatial indexing library for Python designed to find the *nearest geographic coordinate* from massive datasets in nanoseconds.
 13
 14It wraps a highly optimized KD-Tree with a geographic interface, allowing you to work directly with `(latitude, longitude)` pairs. No projections, no external dependencies, and no heavy GIS stacks.
 15
 16![geokdtree](https://raw.githubusercontent.com/connor-makowski/geokdtree/main/static/geokdtree.png)
 17
 18### Documentation
 19
 20- Docs: https://connor-makowski.github.io/geokdtree/geokdtree.html
 21- Git Repo: https://github.com/connor-makowski/geokdtree
 22
 23## Installation
 24
 25```bash
 26pip install geokdtree
 27```
 28
 29If you are having trouble building the C++ extension during the pip installation process, you can run:
 30
 31- On Mac / Linux / WSL2:
 32    ```bash
 33    export SKBUILD_CMAKE_ARGS="-DSKIP_CPP_BUILD=ON"
 34    pip install geokdtree
 35    ```
 36- On Windows:
 37    ```bash
 38    # POWERHELL:
 39    $env:SKBUILD_CMAKE_ARGS="-DSKIP_CPP_BUILD=ON"
 40    # CMD:
 41    set SKBUILD_CMAKE_ARGS=-DSKIP_CPP_BUILD=ON
 42    pip install scgraph
 43    ```
 44
 45## Getting Started
 46
 47```python
 48from geokdtree import GeoKDTree
 49
 50example_points = [
 51    (34.0522, -118.2437),  # Los Angeles
 52    (40.7128, -74.0060),   # New York
 53    (37.7749, -122.4194),  # San Francisco
 54    (51.5074, -0.1278),    # London
 55    (48.8566, 2.3522),     # Paris
 56]
 57
 58geo_kd_tree = GeoKDTree(points=example_points)
 59
 60test_point = (47.6062, -122.3321)  # Seattle
 61# Find the index of the closest point in the original dataset
 62closest_idx = geo_kd_tree.closest_idx(test_point) #=> 2
 63# Find the closest point itself
 64closest_point = geo_kd_tree.closest_point(test_point) #=> (37.7749, -122.4194)
 65```
 66
 67## Why Use GeoKDTree?
 68
 69GeoKDTree is designed to solve one focused problem extremely well:
 70
 71**Fast nearest-neighbor lookup for latitude/longitude data at scale.**
 72
 73It is worth noting that the closest point found may not be the true closest point, but should be very close for most practical applications. See KD-Tree limitations for more details.
 74
 75### Extremely Fast Lookups
 76
 77Once constructed, nearest-neighbor queries consistently complete in **tens of nanoseconds**, even with very large datasets.
 78
 79Typical benchmark results from the included tests:
 80
 81| Number of Points | Build Time | Query Time |
 82| ---------------: | ---------: | ---------: |
 83|            1,000 |    ~1.7 ms |   ~0.02 ms |
 84|           10,000 |     ~25 ms |   ~0.05 ms |
 85|          100,000 |    ~350 ms |   ~0.05 ms |
 86|        1,000,000 |     ~6.8 s |   ~0.07 ms |
 87
 88This makes GeoKDTree well-suited for:
 89
 90* Real-time proximity queries
 91* Matching incoming coordinates against large reference datasets
 92* High-throughput geospatial APIs
 93* Pre-filtering before more expensive geospatial calculations
 94
 95> Exact timings depend on hardware, Python version, and data distribution. These values reflect typical results from the repository’s benchmarks.
 96
 97### Built for Geographic Coordinates
 98
 99GeoKDTree works directly with `(latitude, longitude)` pairs.
100
101You do **not** need to:
102
103* Project coordinates into planar space
104* Use heavyweight GIS libraries
105* Maintain custom spatial indexing code
106
107Just pass geographic coordinates and query.
108
109### Simple API, Minimal Overhead
110
111GeoKDTree intentionally keeps the API small and focused.
112
113* Build once from a list of coordinates
114* Query nearest neighbors with a single method call
115* Retrieve indices or points directly from your original dataset
116
117There are no external C extensions or heavy dependencies, keeping installation and deployment simple.
118
119### Deterministic and Predictable Performance
120
121* Tree construction scales at approximately `O(n log n)`
122* Query performance scales at approximately `O(log n)`
123* No probabilistic approximations
124* No background indexing or caching
125
126This predictability is valuable for production systems where latency and reproducibility matter.
127
128## Supported Features
129
130See: https://connor-makowski.github.io/geokdtree/geokdtree.html
131
132## Contributing
133
134Issues, feature requests, and pull requests are welcome.
135Please open an issue to discuss changes or enhancements.
136
137# Development
138## Running Tests, Prettifying Code, and Updating Docs
139
140Make sure Docker is installed and running on a Unix system (Linux, MacOS, WSL2).
141
142- Create a docker container and drop into a shell
143    - `./run.sh`
144- Run all tests (see ./utils/test.sh)
145    - `./run.sh test`
146- Prettify the code (see ./utils/prettify.sh)
147    - `./run.sh prettify`
148- Update the docs (see ./utils/docs.sh)
149    - `./run.sh docs`
150
151- Note: You can and should modify the `Dockerfile` to test different python versions.
152
153"""
154
155try:
156    from geokdtree.cpp import GeoKDTree, KDTree
157except ImportError:
158    from geokdtree.geokdtree import GeoKDTree
159    from geokdtree.kdtree import KDTree