Introduction
At Waterplan, our goal is to help manage water resources efficiently and sustainably. To do this effectively, we rely on detailed geospatial data such as aquifers, river basins, and hydrological boundaries. However, handling these datasets, particularly in the form of shapefiles, brings significant technical challenges due to their size and complexity.
In this blog post, we'll explore how we tackled these challenges by integrating shapefiles with DynamoDB, enabling efficient indexing, storage, and rapid retrieval of geospatial data.
Technical Challenges
Working with shapefiles poses several challenges.
Point-in-polygon queries, a common operation in geospatial applications, require specialized spatial indexing techniques and optimized search algorithms to quickly identify the containing polygon; this is especially critical for real-time or near-real-time applications.
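To make that concrete, here is a minimal sketch of a point-in-polygon lookup backed by a spatial index, assuming Shapely 2.x; the polygons and query point are placeholders, not our actual data.

```python
from shapely import STRtree
from shapely.geometry import Point, box

# Stand-in "basins"; in practice these come from a shapefile.
polygons = [box(0, 0, 10, 10), box(10, 0, 20, 10), box(0, 10, 10, 20)]

# An STR-packed R-tree prunes the search to polygons whose bounding
# boxes could plausibly contain the query point.
tree = STRtree(polygons)

point = Point(12.5, 3.7)

# predicate="within" keeps only the polygons the point actually falls inside.
matches = tree.query(point, predicate="within")
containing = [polygons[i] for i in matches]  # -> the second box in this toy example
```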
Storing large polygons directly in databases with item size restrictions, like DynamoDB, can be problematic. Strategies like polygon simplification, compression, or partitioning may be necessary to fit the data within the storage constraints.
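As an illustration of the first of those strategies, here is a small sketch of polygon simplification using Shapely's Douglas-Peucker-based simplify; the geometry and tolerance are illustrative only.

```python
from shapely.geometry import Point

# A dense stand-in geometry: a circle approximated with ~1,000 vertices.
dense = Point(0, 0).buffer(1.0, 256)

# Drop vertices that deviate from the outline by less than `tolerance`
# (in the geometry's units) while keeping the polygon valid.
simplified = dense.simplify(tolerance=0.01, preserve_topology=True)

print(len(dense.wkb), "->", len(simplified.wkb), "bytes of WKB")
```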
Furthermore, shapefiles store spatial and attribute data separately, which can lead to data redundancy and potential inconsistencies. Maintaining data integrity and synchronization between these components can be challenging.
In addition, shapefiles may use different coordinate systems and projections, which can cause misalignment and incorrect spatial relationships when data from multiple sources is combined.
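For example, with GeoPandas it is straightforward to normalize every source to a single CRS on ingest; the file path and target EPSG code below are placeholders.

```python
import geopandas as gpd

# GeoPandas reads the CRS from the shapefile's .prj sidecar when present.
gdf = gpd.read_file("aquifers.shp")  # placeholder path

if gdf.crs is None:
    raise ValueError("Shapefile has no CRS; assign one before reprojecting")

# Reproject everything to WGS84 lon/lat so sources line up before indexing.
gdf = gdf.to_crs(epsg=4326)
```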
Another challenge is that shapefiles do not inherently enforce topological relationships between features. Ensuring data quality and topological consistency often requires additional validation and cleaning steps.
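One possible validation pass, sketched with Shapely (1.8+ for make_valid); the bowtie polygon is a contrived example of a common digitization error.

```python
from shapely.geometry import Polygon
from shapely.validation import explain_validity, make_valid

# A self-intersecting "bowtie" ring, a classic invalid geometry.
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])

if not bowtie.is_valid:
    print(explain_validity(bowtie))  # e.g. "Self-intersection at or near ..."
    bowtie = make_valid(bowtie)      # returns a valid (multi)polygon
```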
Finally, the shapefile format is not always well-suited for modern data processing frameworks and cloud-based environments. Conversion to other formats or the use of specialized geospatial databases may be necessary for optimal performance and scalability.
Our Approach
To address these challenges, we adopted a strategy involving geohashes and DynamoDB.
Step 1: Calculate the geohashes
The first step was to calculate the geohashes that intersected the polygon we wanted to index. If you're new to geohashes, here's a website that can help you understand how they work. The map below shows the result of obtaining the level 3 geohashes for a polygon:

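Here is a minimal sketch of how such a covering can be computed. The post used GeoPandas for its geospatial operations; this sketch instead uses plain Shapely plus the pygeohash package (an assumption on my part) and approximates each geohash cell by its decoded bounding box. The example polygon is illustrative.

```python
import pygeohash as pgh
from shapely.geometry import Polygon, box

def covering_geohashes(polygon, precision=3):
    """Return the set of geohashes at `precision` whose cells intersect `polygon`."""
    minx, miny, maxx, maxy = polygon.bounds  # lon/lat bounds

    # Geohash cells at a fixed precision all have the same angular size;
    # decode one hash to learn that size.
    _, _, lat_err, lon_err = pgh.decode_exactly(pgh.encode(miny, minx, precision))
    cell_h, cell_w = 2 * lat_err, 2 * lon_err

    hashes = set()
    lat = miny
    while lat <= maxy + cell_h:          # overshoot by one row to cover the edge
        lon = minx
        while lon <= maxx + cell_w:      # overshoot by one column as well
            gh = pgh.encode(lat, lon, precision)
            lat_c, lon_c, dlat, dlon = pgh.decode_exactly(gh)
            cell = box(lon_c - dlon, lat_c - dlat, lon_c + dlon, lat_c + dlat)
            if cell.intersects(polygon):
                hashes.add(gh)
            lon += cell_w
        lat += cell_h
    return hashes

# Illustrative polygon (roughly a rectangle in the central US).
poly = Polygon([(-100, 35), (-95, 35), (-95, 40), (-100, 40)])
print(sorted(covering_geohashes(poly)))
```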
Step 2: Indexing in DynamoDB
Once the intersecting geohashes were obtained, we grouped all polygons by geohash, which produced a list of polygons for each geohash.
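Continuing the sketch from Step 1 (same assumptions), the grouping is essentially an inverted index from geohash to polygons:

```python
from collections import defaultdict

def group_by_geohash(polygons, precision=3):
    """Map each level-`precision` geohash to the polygons that intersect it."""
    groups = defaultdict(list)
    for poly in polygons:
        for gh in covering_geohashes(poly, precision):  # helper sketched above
            groups[gh].append(poly)
    return groups
```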
In order to reduce the size of the item to be stored in DynamoDB, we took a few extra steps:
For each polygon that falls into a geohash, we keep only the intersection between the polygon and the geohash cell. For example, in the image below there is no need to store the entire polygon that spans two US states; we only need to keep the intersection:

Once each polygon had been reduced to its intersection with the geohash, we compressed the entire list of polygons to minimize its size and keep it under the 400 KB that DynamoDB allows per item.
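Putting those two steps together, here is a hedged sketch of the write path: clip each polygon to its geohash cell, serialize the clipped list as GeoJSON, gzip it, and only write the item if it fits under the 400 KB limit. The table name, attribute names, and helpers are illustrative, not our production code; a more compact encoding such as WKB would shrink the payload further.

```python
import gzip
import json

import boto3
import pygeohash as pgh
from shapely.geometry import box, mapping

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("geohash-polygons")  # placeholder table name

DYNAMO_ITEM_LIMIT = 400 * 1024  # DynamoDB's 400 KB per-item limit

def geohash_cell(gh):
    """Bounding box of a geohash cell as a Shapely polygon."""
    lat, lon, dlat, dlon = pgh.decode_exactly(gh)
    return box(lon - dlon, lat - dlat, lon + dlon, lat + dlat)

def store_group(gh, polygons):
    cell = geohash_cell(gh)

    # Keep only the part of each polygon that falls inside this cell.
    clipped = [p.intersection(cell) for p in polygons]
    clipped = [p for p in clipped if not p.is_empty]

    # Serialize the clipped geometries and gzip the whole list.
    payload = gzip.compress(json.dumps([mapping(p) for p in clipped]).encode())
    if len(payload) > DYNAMO_ITEM_LIMIT:
        raise ValueError(f"{gh}: compressed payload still exceeds 400 KB")

    table.put_item(Item={"geohash": gh, "polygons": payload})
```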
Step 3: Building a scalable pipeline
In our system architecture, we leveraged the scalability and flexibility of Amazon S3 to dynamically store incoming shapefiles. To manage the processing of new shapefile uploads, we implemented AWS Lambda functions, ensuring serverless and event-driven operation. These Lambda functions were designed to generate geohashes from the spatial data within the shapefiles, employing the GeoPandas library for efficient geospatial operations. Geohashes, serving as a spatial indexing system, were then grouped and organized based on their geographic proximity.
The generated geohashes played a crucial role in optimizing data storage and retrieval within DynamoDB. By using geohashes as partition keys, we achieved significant improvements in query performance, particularly for spatial lookups. This strategy ensured that data with similar spatial characteristics was co-located within the database, minimizing the amount of data that needed to be scanned for each query. The combination of S3 for storage, Lambda for processing, geohashes for spatial indexing, and DynamoDB for fast lookups resulted in a robust and scalable solution for managing and querying geospatial data.
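To make the read path concrete, here is a sketch of a lookup under the same assumptions (table name, attribute names, and precision are illustrative): hash the query point, fetch the single item for that geohash, decompress it, and run the exact point-in-polygon test against the short candidate list.

```python
import gzip
import json

import boto3
import pygeohash as pgh
from shapely.geometry import Point, shape

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("geohash-polygons")  # placeholder table name

def polygons_containing(lat, lon, precision=3):
    gh = pgh.encode(lat, lon, precision)

    # One key-value lookup per query: the geohash is the partition key.
    item = table.get_item(Key={"geohash": gh}).get("Item")
    if item is None:
        return []

    # boto3 wraps binary attributes in a Binary object; .value is the raw bytes.
    candidates = [shape(g) for g in json.loads(gzip.decompress(item["polygons"].value))]

    # Exact containment test on the handful of clipped candidates.
    point = Point(lon, lat)
    return [p for p in candidates if p.contains(point)]
```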
Lessons Learned
An important insight from this approach was realizing DynamoDB's strengths, even though it's not typically considered a geospatial database. For relatively static datasets like ours, DynamoDB's scalability and efficient key-value lookup capabilities were ideal, significantly reducing operational complexity and cost compared to alternatives like Postgres or Elasticsearch.
By compressing data and indexing with geohashes, we achieved rapid geospatial queries at a fraction of the cost and complexity of running a dedicated geospatial database.