STARDUST: Sustainable Tools for Analysis and Research on Darknet UnSolicited Traffic
Network telescopes (also known as Internet black holes, Internet sinks, darknets, or darkspace) are passive monitoring systems that capture unsolicited Internet traffic sent to a segment of unused IP address space (i.e., IP addresses owned by an organization but not assigned to any host). Traffic captured at network telescopes (“telescope traffic”) gives researchers valuable data for studying a wide variety of Internet-related phenomena.
STARDUST is a collection of software tools and datasets as well as research infrastructure built to make real-time and historical analysis of telescope traffic efficient and easily accessible to researchers. STARDUST hosts the UCSD Network Telescope, one of the largest known network telescopes on the Internet (≈12 million IPv4 addresses). It also provides data from network telescopes operated by other collaborating organizations.
In addition, STARDUST provides a research compute environment that gives users access to telescope traffic in real time and to historical datasets at various levels of granularity (raw packets, flow-level data, time series), augmented with meta-data (IP geolocation, IP-to-ASN mappings, special tags). Each user is allocated a dedicated virtual machine.
The diagram below provides an overview of the STARDUST infrastructure architecture.
Unsolicited network traffic is captured by the UCSD Network Telescope and sent to the STARDUST Packet Distribution Server, which streams traffic on a dedicated VLAN through a technology specifically conceived for the STARDUST project.
Each user can access this stream of traffic in real time from their own STARDUST virtual machine. In addition, STARDUST internal components process this traffic, augment it with meta-data (e.g., geolocation information or ASN associated with the source IP address of each packet), and re-distribute it on the same VLAN on a separate stream that is also accessible from the users’ VMs.
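As an illustration of the meta-data augmentation step described above, the following is a minimal Python sketch of tagging a packet record with the ASN of its source IP address via longest-prefix matching. It is not the STARDUST implementation: the prefix table, the function names (lookup_asn, tag_packet), and the record fields (src_ip, src_asn) are all hypothetical, and the prefixes and ASNs are taken from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical prefix-to-ASN table (documentation prefixes and ASNs only).
# A real deployment would load this from a routing-table snapshot.
PREFIX_TO_ASN = {
    ipaddress.ip_network("192.0.2.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
    ipaddress.ip_network("198.51.100.128/25"): 64502,
}


def lookup_asn(src_ip, table=PREFIX_TO_ASN):
    """Return the ASN of the longest prefix covering src_ip, or None."""
    addr = ipaddress.ip_address(src_ip)
    best_net, best_asn = None, None
    for net, asn in table.items():
        # Keep the most specific (longest) matching prefix seen so far.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_asn = net, asn
    return best_asn


def tag_packet(packet):
    """Return a copy of a packet record with a 'src_asn' field added."""
    tagged = dict(packet)
    tagged["src_asn"] = lookup_asn(packet["src_ip"])
    return tagged
```

A linear scan over the table keeps the sketch short; a production tagger handling a live packet stream would more plausibly use a radix tree or a similar structure for prefix lookups.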
From the same VM, users have access to the STARDUST cloud-based object storage, where raw traffic traces and flow-level traces are stored. Access to all these data resources, and the ability to process them, is provided through various tools, software libraries, and APIs documented on this web site.
Finally, the traffic is also continuously processed to extract statistics (e.g., per-minute counts of unique source IP addresses per country, ASN, or protocol port number), which are saved as time-series data that can be visualized through browser-accessible Grafana dashboards.
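To make the kind of statistic mentioned above concrete, here is a minimal Python sketch that computes per-minute counts of unique source IP addresses per country. The input format (tuples of Unix timestamp, source IP, and country code) and the function name unique_sources_per_minute are assumptions made for illustration; they do not describe the actual STARDUST processing pipeline or its data schema.

```python
from collections import defaultdict


def unique_sources_per_minute(records):
    """Count unique source IPs per (minute, country) bucket.

    records: iterable of (unix_ts, src_ip, country) tuples (assumed format).
    Returns a dict mapping (minute_start_ts, country) to the unique count.
    """
    seen = defaultdict(set)
    for ts, src_ip, country in records:
        # Round the timestamp down to the start of its minute.
        minute_start = int(ts) // 60 * 60
        seen[(minute_start, country)].add(src_ip)
    return {key: len(ips) for key, ips in seen.items()}
```

Each (minute, country) pair becomes one point in a time series; the same grouping would apply with an ASN or a port number in place of the country code.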