Ethereum censorability monitor
Table of Contents
- Introduction
- Dataset and dashboard
- Building hypothesis
- Censorship analysis (10.02.2023 - 14.02.2023)
- Conclusions
- Future plans
Introduction
This article is our submission to Lido’s Ethereum censorability monitor grant.
Ever since the Ethereum Merge, MEV-boost has become a significant part of the ecosystem. At the same time, the US government, via the Office of Foreign Assets Control (OFAC), has imposed sanctions on certain digital addresses. MEV relays are now divided into those that are OFAC-compliant and those that are not.
The main goal of this article is to show how this censorship contributes to blockchain degradation and to propose a way to monitor censorship on the Ethereum blockchain.
For the purpose of this paper, we refer to the time between when a transaction enters the mempool and when it is included in a block as "delay".
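The delay definition above, together with the block-based counterpart (block_diff) used later, can be written as two small functions. This is a minimal sketch: it assumes both timestamps are timezone-aware datetimes and that we know the chain head height at the moment the transaction was first seen (head_height_when_seen is a hypothetical input, not a column of our dataset).

```python
from datetime import datetime, timezone


def delay_seconds(mempool_timestamp: datetime, block_timestamp: datetime) -> int:
    """Delay: block inclusion time minus first-seen time, truncated to whole seconds."""
    return int((block_timestamp - mempool_timestamp).total_seconds())


def block_diff(head_height_when_seen: int, inclusion_height: int) -> int:
    """Blocks produced between first sighting and inclusion (hypothetical head-height input)."""
    return inclusion_height - head_height_when_seen


# Usage: a transaction seen at 12:00:00 UTC and included at 12:00:12 UTC in the next block
seen = datetime(2023, 2, 10, 12, 0, 0, tzinfo=timezone.utc)
included = datetime(2023, 2, 10, 12, 0, 12, tzinfo=timezone.utc)
print(delay_seconds(seen, included), block_diff(100, 101))
```

Because the mempool timestamp comes from a single observing node, small negative or inflated delays are possible when that node sees a transaction late; the sketch does not try to correct for this.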
Dataset and dashboard
Our sources of data:
- Mempool public data. This data was collected with the web3.py Python package from a single node located in Europe and stored in our data warehouse (DWH). The data was streamed 24/7, and we parsed approximately 1-1.2 million potential transactions per day. Our mempool sample covers about 95% of all transactions in the public Ethereum dataset.
- Public Ethereum Dataset
- Level of censorship applied by relays
- Block information obtained directly from relays (e.g. Flashbots)
- Government-sanctioned list of digital addresses
- Lido validator pubkeys. We used the validator dataset from Lido.
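Mempool collection with web3.py can be sketched as polling a pending-transaction filter and stamping each transaction with the local time it was first seen. This is an illustrative sketch, not our actual collector: provider_url is a placeholder, the node must support pending-transaction filters, and the record layout only loosely mirrors the dataset columns described below.

```python
import time


def to_record(tx, seen_at):
    """Keep the fields needed for the delay analysis plus our local observation time."""
    return {
        "transaction_hash": tx["hash"],
        "from_address": tx["from"],
        "to_address": tx.get("to"),  # None for contract creation
        "nonce": tx["nonce"],
        "gas": tx["gas"],
        "max_fee_per_gas": tx.get("maxFeePerGas"),  # absent for legacy transactions
        "max_priority_fee_per_gas": tx.get("maxPriorityFeePerGas"),
        "mempool_timestamp": seen_at,
    }


def stream_pending(provider_url, poll_interval=1.0):
    """Yield mempool records forever; provider_url is a placeholder for your node."""
    # Imported lazily so that to_record can be used without web3 installed
    from web3 import Web3
    from web3.exceptions import TransactionNotFound

    w3 = Web3(Web3.HTTPProvider(provider_url))
    pending = w3.eth.filter("pending")  # requires node support for pending-tx filters
    while True:
        for tx_hash in pending.get_new_entries():
            seen_at = time.time()
            try:
                tx = w3.eth.get_transaction(tx_hash)
            except TransactionNotFound:
                continue  # dropped or replaced before we could fetch it
            yield to_record(tx, seen_at)
        time.sleep(poll_interval)
```

Stamping seen_at before fetching the full transaction keeps the recorded mempool time as close as possible to the moment the hash arrived.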
After processing, we combined these sources into a single large dataset. The table below describes its main variables:
Column name | Data type (units) | Description |
---|---|---|
block_hash | STRING | Unique block identifier from the public Ethereum dataset |
transaction_hash | STRING | Unique transaction identifier from the public Ethereum dataset |
to_address | STRING | Transaction receiver |
from_address | STRING | Transaction sender |
block_timestamp | TIMESTAMP | Timestamp at which the block was created |
mempool_timestamp | TIMESTAMP | Timestamp at which we parsed the transaction from the mempool |
time_diff | BIGINT (seconds) | Time between when a transaction enters the mempool and when it is included in a block |
block_diff | INT (blocks) | Number of blocks produced between when a transaction enters the mempool and when it is included in a block |
gas | BIGINT | Gas limit allocated to the transaction |
gas_price | BIGINT | Gas price |
gas_fact | BIGINT | Gas actually used |
max_fee_per_gas | BIGINT | Maximum total fee per gas the sender is willing to pay (base fee + priority fee) |
max_priority_fee_per_gas | BIGINT | Maximum priority fee (tip) per gas, paid to speed up inclusion |
relay | STRING | Name of the relay |
num_transaction | INT | Number of transactions in the block |
height | BIGINT | Block number |
builder_pubkey | STRING | Public key of the MEV builder |
lido_validator | STRING | Lido node operator (company) running the validator (null if it is not a Lido validator) |
transaction_censured_from | BOOLEAN | True if the sending address is on the sanctions list |
transaction_censured_to | BOOLEAN | True if the receiving address is on the sanctions list |
error_dummy | BOOLEAN | True if the transaction failed |
censured_relay | BOOLEAN | True if the relay censors transactions |
lido_validator_dummy | BOOLEAN | True if the block was produced by a Lido validator |
mev_dummy | BOOLEAN | True if the block was produced by an MEV builder |
This dataset is available on Google BigQuery in the table `p2p-data-warehouse.p2p_public.eth_mev_censored`.
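As an example of querying the table, the sketch below compares delays per relay for MEV-boost blocks over a date range. The column names come from the table above; the aggregation itself is only an illustration, and running it assumes the google-cloud-bigquery package and default Google Cloud credentials are available.

```python
TABLE = "p2p-data-warehouse.p2p_public.eth_mev_censored"


def relay_delay_query(start_date: str, end_date: str) -> str:
    """Illustrative query: transaction count, mean and median delay per relay (MEV blocks only)."""
    return f"""
SELECT
  relay,
  censured_relay,
  COUNT(*) AS n_transactions,
  AVG(time_diff) AS avg_delay_s,
  APPROX_QUANTILES(time_diff, 100)[OFFSET(50)] AS median_delay_s
FROM `{TABLE}`
WHERE mev_dummy
  AND DATE(block_timestamp) BETWEEN '{start_date}' AND '{end_date}'
GROUP BY relay, censured_relay
ORDER BY n_transactions DESC
"""


def run(sql: str):
    """Execute the query and return rows as plain dicts."""
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client()  # uses your default credentials and billing project
    return [dict(row.items()) for row in client.query(sql).result()]


# Usage (requires credentials): rows = run(relay_delay_query("2023-02-10", "2023-02-14"))
```

The dates are interpolated directly into the SQL for brevity; for untrusted input, BigQuery query parameters would be the safer choice.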
Based on the data described above, we’ve created a dashboard with the main characteristics of Ethereum transactions.
Dashboard description
Our dashboard has 6 parts:
- General data info. Transactions and blocks on the Ethereum blockchain, delay, and transaction cost.
- Censorship between relays. The share of blocks for each MEV relay and the average delay for censoring and non-censoring MEV relays.
- Censorship addresses. All available information about addresses on the OFAC sanctions list.
- Lido vs Other validators. Transactions split between those included by Lido validators and those included by other validators, for comparison.
- Censorship between MEV-builders. Several metrics for every MEV builder in the Ethereum ecosystem.
Building hypothesis
Full sample dataset
Our main goal is to estimate the level of blockchain degradation, i.e. a longer time to include transactions and higher transaction costs, that may be caused by censorship. We hypothesize that longer delays and higher transaction costs could be a sign of censorship.
In particular, our main metrics (delay and transaction cost) may differ significantly between the following subgroups:
- MEV-boost / non-MEV-boost
- Relays (OFAC-compliant/non-compliant)
- Lido validators versus others
We also want to measure the level of censorship applied by Lido validators, so we test the following hypotheses:
- Transactions in MEV-boost blocks may be subject to censorship, which can lead to longer delays than in non-MEV blocks.
- Delay and transaction cost could differ between Lido validators and other validators.
- Relays that censor transactions may have a longer delay than other relays.
- OFAC-compliant relays could take longer to include transactions than other relays.
- The probability that a transaction is included in the Nth block after entering the mempool could differ between OFAC-compliant and non-compliant blocks.
- The probability that a block comes from an OFAC-compliant relay could differ between Lido and non-Lido validators.
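Each of these hypotheses compares a metric between two groups. Since delay distributions tend to be heavily skewed, a distribution-free test is a safer choice than a t-test. The sketch below is a two-sided permutation test on the difference in median delay; it is one possible choice of test, not necessarily the one we use, and the group inputs are plain lists of delays in seconds.

```python
import random
from statistics import median


def permutation_test_median(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in medians between groups a and b.

    Returns (observed absolute difference, p-value).
    """
    rng = random.Random(seed)  # fixed seed for reproducible p-values
    observed = abs(median(a) - median(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(median(pooled[:n_a]) - median(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the p-value is never exactly zero
    return observed, (extreme + 1) / (n_permutations + 1)


# Usage with toy delays: e.g. censoring-relay vs non-censoring-relay blocks
obs, p = permutation_test_median([1, 2, 3, 4, 5] * 4, [20, 21, 22, 23, 24] * 4, n_permutations=2000)
```

For millions of transactions, one would typically subsample or switch to a rank-based test with a normal approximation, since each permutation recomputes two medians.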
Truncated dataset for potentially censored transactions
We want to isolate transactions whose high delay cannot be explained by normal network conditions. Such transactions are suspected of being censored. To identify them, we filter on the following transaction properties.
High delay
We start by selecting all transactions whose delay exceeds a chosen threshold in seconds.
Successful transactions
Next, we keep only successful transactions, since a failure could itself explain the delay.
Low transaction fees
Another reason for a high delay could be a low fee. We therefore check the transaction fee and exclude underpriced transactions.
Previous transaction pending
Sometimes a transaction is delayed simply because a previous transaction from the same sender has not yet been included. We use the nonce to exclude such transactions from our analysis.
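The four filters above can be sketched as a single pass over the transactions. This is a minimal illustration, not our production code: the nonce field is hypothetical (it is not among the dataset columns listed earlier), timestamps are simplified to Unix seconds, and min_gas_price stands in for whatever low-fee criterion is applied. A transaction counts as waiting on a previous one when the same sender's preceding nonce was included only after this transaction was seen in the mempool.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    transaction_hash: str
    from_address: str
    nonce: int               # hypothetical: not among the dataset columns listed above
    mempool_timestamp: int   # Unix seconds, simplified
    block_timestamp: int     # Unix seconds, simplified
    time_diff: int           # delay in seconds
    gas_price: int
    error_dummy: bool        # True if the transaction failed


def suspicious_transactions(txs, delay_threshold, min_gas_price):
    """Keep high-delay, successful, adequately priced transactions not blocked by a prior nonce."""
    # Inclusion time of every (sender, nonce) pair we know about
    included_at = {(t.from_address, t.nonce): t.block_timestamp for t in txs}
    result = []
    for t in txs:
        if t.time_diff <= delay_threshold:   # 1. high delay only
            continue
        if t.error_dummy:                    # 2. successful transactions only
            continue
        if t.gas_price < min_gas_price:      # 3. drop underpriced transactions
            continue
        prev = included_at.get((t.from_address, t.nonce - 1))
        if prev is not None and prev > t.mempool_timestamp:
            continue                         # 4. was waiting on the sender's previous nonce
        result.append(t)
    return result


txs = [
    Tx("a", "0xA", 0, 1000, 1700, 700, 30, False),  # kept
    Tx("b", "0xB", 0, 1000, 1012, 12, 30, False),   # low delay
    Tx("c", "0xC", 0, 1000, 1700, 700, 30, True),   # failed
    Tx("d", "0xD", 0, 1000, 1700, 700, 1, False),   # underpriced
    Tx("f", "0xE", 4, 1590, 1600, 10, 30, False),   # predecessor of "e", low delay
    Tx("e", "0xE", 5, 1000, 1700, 700, 30, False),  # waited on nonce 4
]
kept = suspicious_transactions(txs, delay_threshold=600, min_gas_price=10)
```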
After forming the truncated dataset, we try to determine the reason for the high delay of suspected censored transactions: government-driven (OFAC) or ethical censorship. We check the sender and receiver addresses against the sanctions list and compare OFAC-compliant and ethically censoring MEV relays. Results for the full daily dataset can be found in our dashboard.
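The address check can be sketched as labelling each suspicious transaction by whether its sender or receiver is sanctioned, then counting labels per relay. The labels are illustrative assumptions: "other" only means no sanctioned address is involved, which may point to ethical censorship or to an unexplained delay. Row keys follow the dataset column names.

```python
from collections import Counter


def censorship_reason(from_address, to_address, sanctioned):
    """Label a transaction; `sanctioned` must be a set of lower-cased addresses."""
    # Hex addresses are compared lower-cased, since checksum casing does not change the address
    parties = {from_address.lower()} | ({to_address.lower()} if to_address else set())
    return "sanctioned_address" if parties & sanctioned else "other"


def reasons_by_relay(rows, sanctioned):
    """Count suspicious transactions per (relay, censured_relay, reason)."""
    sanctioned = {a.lower() for a in sanctioned}
    return Counter(
        (r["relay"], r["censured_relay"],
         censorship_reason(r["from_address"], r["to_address"], sanctioned))
        for r in rows
    )
```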
Censorship analysis (10.02.2023 - 14.02.2023)
The code required to reproduce our results is available here
Exploratory Data Analysis
Our sample dataset contains 4 798 993 transactions, of which 153 847 (~3%) failed.
For 95% of transactions the delay does not exceed 27 seconds, and almost every transaction is included in the next block (block_diff = 1).
Most transaction fees fall within a narrow range, but the distribution has a long right tail: the 99.5% quantile exceeds the 95% quantile by more than a factor of four. The table below shows the main quantiles of transaction delay and fees.
Main quantiles for delay and fees
Variable | 5% | 10% | 25% | 50% | 75% | 90% | 95% | 97.5% | 99% | 99.5% |
---|---|---|---|---|---|---|---|---|---|---|
Delay, secs | 1 | 2 | 4 | 8 | 10 | 15 | 27 | 122 | 972 | 11984 |
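Quantiles like those in the table can be computed from the time_diff column with the standard library. The quantile method behind the table is not stated; this sketch uses linear interpolation (statistics.quantiles with method="inclusive"), so values may differ slightly from approximate methods such as BigQuery's APPROX_QUANTILES.

```python
from statistics import quantiles

LEVELS = (0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.975, 0.99, 0.995)


def delay_quantiles(delays):
    """Return {level: quantile} for the levels shown in the table."""
    # 199 cut points at 0.5% steps; cut point k corresponds to probability (k + 1) / 200
    cuts = quantiles(delays, n=200, method="inclusive")
    return {level: cuts[round(level * 200) - 1] for level in LEVELS}
```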