Matthieu Gouel, Kevin Vermeulen, Olivier Fourmaux, Timur Friedman, Robert Beverly

20210914TMA

IP geolocation has myriad applications. While a body of prior research has investigated the accuracy of geolocation databases, we take a first look at their stability. Using a large collection of snapshots from a popular geolocation database, we examine the longitudinal evolution of its location mappings and address coverage. Across different classes of IP addresses, we find that significant differences can exist even between two successive weekly snapshots - a previously underappreciated source of potential error. To assess the sensitivity of research results to the geo database instance, we examine a prior study that used geolocation. Using their data and methodology, we generate results for each database instance available during their measurement period, i.e., the hypothetical results had the authors used a different snapshot. We show that the median distance of addresses considered shifted over 100km from ground truth and the coverage differed by 30% - potentially impacting the conclusions of this prior study. Based on our findings, we recommend best practices when using geolocation databases for network research to encourage reproducibility and soundness.


Paper in Open Access: https://dl.ifip.org/db/conf/tma/tma2021/tma2021-paper2.pdf
Technical report: https://arxiv.org/abs/2107.03988

@inproceedings{tma21geostable,
 author = "Matthieu Gouel and Kevin Vermeulen and Olivier Fourmaux and Timur Friedman and Robert Beverly",
 title = "{IP} Geolocation Database Stability and Implications for Network Research",
 booktitle = {Proceedings of the Network Traffic Measurement and Analysis (TMA) Conference},
 month = sep,
 year = 2021,
}