[Artifact] Server-side DNS HTTPS dataset
We provide DNS HTTPS record datasets collected through daily scans of the Tranco top 1 million domains. Specifically, we offer the following resources:
- Dataset: Our parsed dataset includes DNS HTTPS, A, AAAA, NS, SOA records, and RRSIG of HTTPS records (where available) for the Tranco top 1 million domains, collected daily from our measurement server. This dataset will be updated monthly. Details about the dataset can be found in the Dataset section below.
- Code: We provide the code used to generate the graphs in our paper, serving as a starting point for creating graphs using our datasets. We also offer the script used to collect DNS record data for Tranco domains.
Dataset
We provide parsed data that has been pre-processed to include only the necessary information for analysis. Each day, four CSV files are generated — two for apex domains and two for www subdomains.
- DNS records data (apex_https.csv and www_https.csv): This data includes the following DNS record types for each of the Tranco top 1 million domains (if we are unable to retrieve responses for certain DNS records, those entries are left empty). For domains with a CNAME, we follow the CNAME and resolve DNS records for the target.
  - DNS record types
    - HTTPS (and the corresponding RRSIG, if available)
    - NS
    - A
    - AAAA
    - SOA
- DNS flags data (apex_flags.csv and www_flags.csv): This data includes the flags returned in the response to the HTTPS record request. The following flags are provided as boolean values:
  - Flags: AD, QR, RD, RA, CD, AA, TC
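For readers unfamiliar with these flags, the following minimal sketch shows how the seven booleans correspond to bits in the 16-bit flags field of a DNS message header (RFC 1035, with AD and CD defined by DNSSEC). It is only an illustration of the header layout, not the code used to produce the flags files; the function name `decode_flags` is hypothetical.

```python
# Bit masks for the 16-bit flags field of a DNS message header.
# Illustration only; decode_flags is a hypothetical helper, not part of the dataset tooling.
FLAG_MASKS = {
    "QR": 0x8000,  # message is a response (1) rather than a query (0)
    "AA": 0x0400,  # authoritative answer
    "TC": 0x0200,  # response was truncated
    "RD": 0x0100,  # recursion desired
    "RA": 0x0080,  # recursion available
    "AD": 0x0020,  # authentic data (DNSSEC-validated by the resolver)
    "CD": 0x0010,  # checking disabled
}


def decode_flags(flags_field: int) -> dict:
    """Map a raw 16-bit header flags value to named booleans."""
    return {name: bool(flags_field & mask) for name, mask in FLAG_MASKS.items()}


# 0x81A0 is a typical validated recursive response: QR, RD, RA, and AD are set.
print(decode_flags(0x81A0))
```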
Each dataset is provided on a monthly basis, with each monthly dataset further divided into daily data. We have been performing daily scans for HTTPS records since May 2023, with updates available monthly.
Download links
*File format: tar.gz
Date (YYYY-MM) | Download | Misc. |
---|---|---|
2024-10 | link | |
2024-09 | link | |
2024-08 | link | |
2024-07 | link | |
2024-06 | link | |
2024-05 | link | |
2024-04 | link | |
2024-03 | link | |
2024-02 | link | |
2024-01 | link | |
2023-12 | link | |
2023-11 | link | |
2023-10 | link | |
2023-09 | link | |
2023-08 | link | |
2023-07 | link | |
2023-06 | link | |
2023-05 | link | |
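
Since each monthly download is a tar.gz archive, one way to see what it contains before extracting it is to list its CSV members with Python's standard `tarfile` module. This is a minimal sketch: the function name `list_csv_members` and the archive file name in the usage note are hypothetical, and the internal directory layout of the archives is not assumed.

```python
import tarfile


def list_csv_members(archive_path):
    """Return the sorted names of all CSV files inside a .tar.gz archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            # Keep regular files only; skip directories and other entry types.
            if member.isfile() and member.name.endswith(".csv")
        )
```

For example, `list_csv_members("2024-10.tar.gz")` (a placeholder file name) would return the paths of the daily CSV files in that month's archive, whatever directory structure the archive uses.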
Code
We provide code that can be used to generate the graphs in our paper, using the parsed data above as input. These scripts can serve as starting points for your own analysis. For those who only wish to reproduce the graphs from the paper, we also provide processed data that can be directly used by plotting scripts. Additionally, we provide code for querying DNS records for Tranco domains.
- Generating graphs in the paper
Working with the parsed dataset
After downloading the code (e.g., cloning the GitHub repo) and dataset, place the dataset in the data/parsed/ directory. Please refer to the instructions in the README.md.
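As a starting point for your own analysis, the sketch below computes the share of rows in one of the parsed CSV files that have a non-empty value in a given column, relying on the fact that entries without a response are left empty. It assumes the file has a header row; the function name `non_empty_share` is hypothetical, and the column name must be taken from the README, since the column layout is not described here.

```python
import csv


def non_empty_share(csv_path, column):
    """Return the fraction of rows whose `column` value is non-empty."""
    total = 0
    filled = 0
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # Assumption: the first row of the file is a header naming the columns.
        for row in csv.DictReader(handle):
            total += 1
            if (row.get(column) or "").strip():
                filled += 1
    # Avoid dividing by zero when the file contains no data rows.
    return filled / total if total else 0.0
```

Calling this on one day's apex_https.csv with the name of the HTTPS column would give a rough estimate of the share of domains with an HTTPS record on that day.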
Using the plotting data
The data in the data/plotting/ directory is sufficient to reproduce the graphs from the paper.
No additional data downloads are needed: open the Jupyter Notebook files (in the notebooks/ directory) to generate the graphs presented in the paper.
GitHub repository
You can download the code here - Link
- Collecting DNS data from Tranco domains
We provide scripts to send DNS queries to Tranco domains and collect responses. Additionally, we offer code to test TLS connection establishment for domains (e.g., establishing TLS connections to domains with mismatched IP addresses in HTTPS records).
Please refer to README.md for further instructions.
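To illustrate what an HTTPS record query looks like on the wire, here is a minimal sketch that builds a DNS query for record type 65 (HTTPS, RFC 9460) using only the Python standard library. It is not the collection script from the repository: the function name `build_https_query` is hypothetical, and the sketch omits EDNS, so it would not request DNSSEC signatures such as RRSIG.

```python
import struct

HTTPS_RRTYPE = 65  # HTTPS resource record type (RFC 9460)
CLASS_IN = 1       # Internet class


def build_https_query(domain: str, query_id: int = 0) -> bytes:
    """Build a wire-format DNS query for the HTTPS record of `domain`."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    # Label lengths (max 63 bytes) are not validated in this sketch.
    labels = [part.encode("ascii") for part in domain.rstrip(".").split(".")]
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    # Question section ends with QTYPE and QCLASS.
    return header + qname + struct.pack("!HH", HTTPS_RRTYPE, CLASS_IN)
```

The resulting bytes could be sent over UDP port 53 to a resolver of your choice with the standard `socket` module; parsing the response, following CNAMEs, and handling truncation are left out here.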
GitHub repository
You can download the code here - Link