- Reconciling High Speed Probing with Ethical Probing
What if tomorrow, hundreds, thousands, or even more people start sending 100,000 packets per second into the network using high-speed probing? It is reasonable to think that this would not be a desirable scenario for the Internet. However, this might already be taking place, as high-speed probing tools such as ZMap and Yarrp, the high-speed evolutions of Ping and Traceroute, are extensively used by the network research community and beyond. While they can certainly help improve our understanding of the Internet, very little work has been done to examine their potential negative impact, i.e., their potential harm to the Internet’s infrastructure or to the quality of measurement results. In this paper, we quantify the risks of those techniques and provide recommendations on how to put in place ethical high-speed probing methods.
Dec 8, 2023 - 1 min read
- Replication: Towards a Publicly Available Internet scale IP Geolocation Dataset
IP geolocation is one of the most widely used forms of metadata for IP addresses, and despite almost twenty years of effort from the research community, the reality is that there is no accurate, complete, up-to-date, and explainable publicly available dataset for IP geolocation. We argue that a central reason for this state of affairs is the impressive results from prior publications, both in terms of accuracy and coverage: up to street level accuracy and locating millions of IP addresses with a few hundred vantage points in months. We believe the community would substantially benefit from a public baseline dataset and code. To encourage future research in IP geolocation, we replicate two geolocation techniques and evaluate their accuracy and coverage. We show that we can neither use the first technique to obtain the previously claimed street level accuracy, nor the second to geolocate millions of IP addresses on today’s Internet and with publicly available measurement infrastructure. In addition to this reappraisal, we re-evaluate the fundamental insights that led to these prior results, as well as provide new insights and recommendations to help the design of future geolocation techniques. All of our code and data are publicly available to support reproducibility.
Oct 24, 2023 - 2 min read
- Towards a Publicly Available Framework to Process Traceroutes with MetaTrace
The objective of this research is to contribute towards the development of an open-source framework for processing large-scale traceroute datasets. By providing such a framework, we aim to benefit the community by saving time in everyday traceroute analysis and enabling the design of new scalable reactive measurements, where prior traceroute measurements are leveraged to make informed decisions for future ones. It is important to clarify that our goal is not to surpass proprietary solutions like BigQuery, which are utilized by CDNs for processing billions of traceroutes. These proprietary solutions are not freely accessible to the public, whereas our focus is on creating an open and freely available framework for the wider community. Our contributions include (1) sharing the ideas and thinking process behind building MetaTrace, which efficiently utilizes ClickHouse features for traceroute processing; and (2) providing an open-source implementation of MetaTrace.
Oct 24, 2023 - 2 min read
- Zeph & Iris map the internet: A resilient reinforcement learning approach to distributed IP route tracing
We describe a new system for distributed tracing at the IP level of the routes that packets take through the IPv4 internet. Our Zeph algorithm coordinates route tracing efforts across agents at multiple vantage points, assigning to each agent a number of /24 destination prefixes in proportion to its probing budget and chosen according to a reinforcement learning heuristic that aims to maximize the number of multipath links discovered. Zeph runs on top of Iris, our open-source system for orchestrating internet measurements across distributed agents of heterogeneous probing capacities. We show that carefully choosing which destination prefixes to probe from which vantage point matters for optimizing topology discovery and that a system can learn to improve its assignments based on previous measurements. After 10 cycles of probing, Zeph is capable of discovering 3.3M nodes and 19.8M links in a cycle of 15 hours, when deployed on 5 Iris agents. This is 3 times more nodes and 10 times more links than the existing state-of-the-art production system for the same number of prefixes probed.
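The budget-proportional part of the assignment described above can be illustrated with a small sketch. It is not Zeph's implementation: the function name `allocate_prefixes`, the agent names, and the largest-remainder rounding are assumptions made for illustration, and the sketch only decides how many /24 prefixes each agent receives, not which ones (that choice is the job of the reinforcement learning heuristic).

```python
from fractions import Fraction


def allocate_prefixes(budgets: dict[str, int], n_prefixes: int) -> dict[str, int]:
    """Split n_prefixes among agents in proportion to their probing budgets.

    Hypothetical helper: largest-remainder rounding is used so that the
    per-agent counts are integers and sum exactly to n_prefixes.
    """
    if any(b < 0 for b in budgets.values()):
        raise ValueError("budgets must be non-negative")
    total = sum(budgets.values())
    if total <= 0:
        raise ValueError("total budget must be positive")

    # Exact (fractional) share of each agent, kept as a rational number
    # to avoid floating-point rounding surprises.
    exact = {agent: Fraction(b * n_prefixes, total) for agent, b in budgets.items()}

    # Start from the floor of each share.
    counts = {agent: int(share) for agent, share in exact.items()}
    leftover = n_prefixes - sum(counts.values())

    # Hand the remaining prefixes to the agents with the largest fractional
    # parts; ties keep the input order because sorted() is stable.
    by_remainder = sorted(budgets, key=lambda a: exact[a] - counts[a], reverse=True)
    for agent in by_remainder[:leftover]:
        counts[agent] += 1
    return counts
```

For example, `allocate_prefixes({"a": 100, "b": 200, "c": 700}, 10)` gives agent `c` seven of the ten prefixes, matching its 70% share of the total budget.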
Mar 1, 2022 - 3 min read
- IP Geolocation Database Stability and Implications for Network Research
IP geolocation has myriad applications. While a body of prior research has investigated the accuracy of geolocation databases, we take a first look at their stability. Using a large collection of snapshots from a popular geolocation database, we examine the longitudinal evolution of its location mappings and address coverage. Across different classes of IP addresses, we find that significant differences can exist even between two successive weekly snapshots - a previously underappreciated source of potential error. To assess the sensitivity of research results to the geo database instance, we examine a prior study that used geolocation. Using their data and methodology, we generate results for each database instance available during their measurement period, i.e., the hypothetical results had the authors used a different snapshot. We show that the median distance of addresses considered shifted over 100km from ground truth and the coverage differed by 30% - potentially impacting the conclusions of this prior study. Based on our findings, we recommend best practices when using geolocation databases for network research to encourage reproducibility and soundness.
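To make the notion of snapshot stability concrete, here is a minimal sketch of how two snapshots could be compared. It assumes a hypothetical in-memory format, a dict mapping each IP address to a `(latitude, longitude)` pair, which is not the format of any particular geolocation database. It compares snapshots with each other rather than against ground truth, so it illustrates the idea only and does not reproduce the study's methodology.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def compare_snapshots(old, new):
    """Summarize location shifts and coverage changes between two snapshots.

    `old` and `new` are hypothetical dicts: IP address -> (lat, lon).
    """
    common = old.keys() & new.keys()
    shifts = [haversine_km(old[ip], new[ip]) for ip in common]
    return {
        # Median displacement of addresses present in both snapshots.
        "median_shift_km": median(shifts) if shifts else None,
        # Number of common addresses whose location changed at all.
        "moved": sum(d > 0 for d in shifts),
        # Coverage changes: addresses that disappeared or appeared.
        "dropped": len(old.keys() - new.keys()),
        "added": len(new.keys() - old.keys()),
    }
```

Running `compare_snapshots` over each pair of successive weekly snapshots would yield a time series of displacement and coverage changes of the kind the abstract describes.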
Sep 14, 2021 - 2 min read
- Alias Resolution Based on ICMP Rate Limiting
Alias resolution techniques (e.g., Midar) associate, mostly through active measurement, a set of IP addresses as belonging to a common router. These techniques rely on distinct router features that can serve as a signature. Their applicability is affected by router support of the features and the robustness of the signature. This paper presents a new alias resolution tool called Limited Ltd. that exploits ICMP rate limiting, a feature that is increasingly supported by modern routers but has not previously been used for alias resolution. It sends ICMP probes toward target interfaces in order to trigger rate limiting, and extracts features from the probe reply loss traces. It uses a machine learning classifier to designate pairs of interfaces as aliases. We describe the details of the algorithm used by Limited Ltd. and illustrate its feasibility and accuracy. Limited Ltd. is not only the first tool that can perform alias resolution on IPv6 routers that do not generate monotonically increasing fragmentation IDs (e.g., Juniper routers), but it also complements the state-of-the-art techniques for IPv4 alias resolution. All of our code and the collected dataset are publicly available.
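As a rough illustration of what "extracting features from loss traces" might look like, the sketch below computes a few simple statistics for a pair of traces. The abstract does not list the features Limited Ltd. actually uses, so everything here is an assumption: the function name `pair_features`, the trace format (one boolean per probe slot, `True` when a reply was received), and the choice of features. The intuition, offered tentatively, is that interfaces behind a shared rate limiter would tend to lose replies in the same slots.

```python
import math


def pair_features(trace_a, trace_b):
    """Compute illustrative loss features for two equally long reply traces.

    Each trace is a sequence of booleans, one per probe slot, True when a
    reply was received. These features are hypothetical examples, not the
    feature set of Limited Ltd.
    """
    if not trace_a or len(trace_a) != len(trace_b):
        raise ValueError("traces must be non-empty and of equal length")
    n = len(trace_a)

    # Convert to loss indicators: 1 when the reply was lost.
    lost_a = [0 if received else 1 for received in trace_a]
    lost_b = [0 if received else 1 for received in trace_b]

    rate_a = sum(lost_a) / n
    rate_b = sum(lost_b) / n
    # Fraction of slots in which both interfaces lost their reply.
    joint = sum(x & y for x, y in zip(lost_a, lost_b)) / n

    # Pearson correlation of the two loss indicators; defined as 0.0 when
    # either trace has no variation (all replies lost or all received).
    var_a = rate_a * (1 - rate_a)
    var_b = rate_b * (1 - rate_b)
    if var_a > 0 and var_b > 0:
        corr = (joint - rate_a * rate_b) / math.sqrt(var_a * var_b)
    else:
        corr = 0.0

    return {
        "loss_rate_a": rate_a,
        "loss_rate_b": rate_b,
        "joint_loss_rate": joint,
        "loss_correlation": corr,
    }
```

A feature vector like this, computed per candidate pair, is the kind of input a classifier could be trained on to label pairs as aliases or non-aliases.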
Feb 1, 2020 - 2 min read