Ethereum censorability monitor

Introduction

This article is our submission to Lido’s Ethereum censorability monitor grant.

Ever since the Ethereum merge, MEV-boost has become a significant part of the ecosystem. At the same time, the US government, via the Office of Foreign Assets Control (OFAC), has imposed sanctions on certain digital addresses. MEV-relays are now divided between those that are OFAC-compliant and those that are not.

The main goal of this article is to demonstrate the influence of this censorship on blockchain degradation and propose a solution to monitor the censorship problem in the Ethereum blockchain.

For the purposes of this article, we refer to the time between a transaction entering the mempool and being included in a block as the "delay".

Dataset and dashboard

Our sources of data:

  1. Mempool public data.
    This data was collected with the web3 Python package from a single node located in Europe and stored in our data warehouse (DWH). The data was streamed 24/7, and we parsed approximately 1-1.2 million potential transactions per day. Our mempool sample covers about 95% of all transactions in the public Ethereum dataset.
  2. Public Ethereum Dataset
  3. Level of censorship applied by relays
  4. Block information obtained directly from Relays (e.g. flashbots)
  5. Government-sanctioned list of digital addresses
  6. Lido validator pubkeys. We used the validator dataset from Lido.

After the data was processed, we created a large dataset. The table below contains a description of the main variables:

| Column name | Data type (units) | Description |
|---|---|---|
| block_hash | STRING | Unique block identifier from the public Ethereum dataset |
| transaction_hash | STRING | Unique transaction identifier from the public Ethereum dataset |
| to_address | STRING | Transaction receiver |
| from_address | STRING | Transaction sender |
| block_timestamp | TIMESTAMP | Timestamp at which the block was created |
| mempool_timestamp | TIMESTAMP | Timestamp of when we parsed the mempool transaction |
| time_diff | BIGINT (seconds) | Time difference between when a transaction enters the mempool and is included in a block |
| block_diff | INT | Number of blocks produced between when a transaction enters the mempool and is included in a block |
| gas | BIGINT | Gas allocated to the transaction |
| gas_price | BIGINT | Gas price |
| gas_fact | BIGINT | Gas spent |
| max_fee_per_gas | BIGINT | Maximum total fee per gas the sender is willing to pay (covers base_fee + max_priority_fee) |
| max_priority_fee_per_gas | BIGINT | Additional fee to speed up transaction |
| relay | STRING | Name of relay |
| num_transaction | INT | Number of transactions within a block |
| height | BIGINT | Serial number of the block |
| builder_pubkey | STRING | Unique address of MEV-builder |
| lido_validator | STRING | Company of Lido validator (null if it is not a Lido validator) |
| transaction_censured_from | BOOLEAN | True if the sending address is on the sanctioned list |
| transaction_censured_to | BOOLEAN | True if the receiving address is on the sanctioned list |
| error_dummy | BOOLEAN | True if the transaction failed |
| censured_relay | BOOLEAN | True if the relay is censoring transactions |
| lido_validator_dummy | BOOLEAN | True if the block is produced by a Lido validator |
| mev_dummy | BOOLEAN | True if the block is produced by MEV-builders |

This dataset is available on Google BigQuery in the table `p2p-data-warehouse.p2p_public.eth_mev_censored`.

Based on the data described above, we’ve created a dashboard with the main characteristics of Ethereum transactions.

Dashboard description

Our dashboard has 6 parts:

  1. General data info. Data about transactions and blocks within the Ethereum blockchain, delay and transaction cost.
  2. Censorship between relays. Here we showcase the share of blocks for each MEV relay and the average delay for censoring and non-censoring MEV relays.
  3. Censorship addresses. Here we showcase all the available information about addresses that are on the OFAC sanctions list.
  4. Lido vs Other validators. Here we divide transactions between those validated by Lido and those validated by other validators for comparison.
  5. Censorship between MEV-builders. Here we showcase a few metrics for every MEV-builder in the Ethereum ecosystem.
  6. Potential censoring transactions. Here we monitor transactions whose high delay could be explained by censorship.

Building hypothesis

Full sample dataset

Our main goal is to estimate the level of blockchain degradation, i.e. longer time to verify transactions and higher transaction costs, that may be caused by censorship. We hypothesize that longer delays and higher transaction costs could be a sign of censorship.

We hypothesize that our main metrics (delay and transaction costs) could be statistically different between the following subgroups: MEV and non-MEV transactions, Lido and non-Lido validators, and OFAC-compliant and non-compliant relays.

We also want to check the level of censorship employed by Lido validators, so we are going to check the following hypotheses:

  1. MEV transactions may be subject to censorship, which can lead to slower inclusion compared to non-MEV transactions.
  2. Delay and cost of transactions could be different between Lido validators and other validators.
  3. Relays that censor transactions can have a longer delay than other relays.
  4. OFAC-compliant relays could take longer to process transactions compared to other relays.
  5. The probability of a transaction being included in the Nth block could differ between OFAC-compliant and non-compliant relays.
  6. The probability of being included in an OFAC block could differ between Lido validators and non-Lido validators.

Truncated dataset for potentially censored transactions

We want to single out the transactions whose high delay cannot be explained by normal network conditions. Such transactions will be suspected of being subject to censorship. To do this, we must take into account the following transaction properties.

High delay

We will start by choosing all the transactions over a certain threshold for the time delay in seconds.

Successful transactions

Next, we will only consider successful transactions since failure could be a reason for the delay.

Low transaction fees

Another reason for a transaction to have a high delay could be low fees. To account for this, we check the transaction fee.

Previous transaction pending

Sometimes transactions could be delayed simply because a previous transaction from the same sender had not yet been included. We use the transaction nonce to exclude these transactions from our analysis.

After forming the truncated dataset, we will try to find out the reasons for the high delay in censored transactions: government-driven or ethical censorship. We will check the receiver and sender addresses against the sanctioned list and compute the share of OFAC-compliant and ethically censoring MEV-relays. Results across the full daily dataset can be found in our dashboard.

Censorship analysis (10.02.2023 - 14.02.2023)

The code required to reproduce our results is available here

Exploratory Data Analysis

Our sample dataset has 4 798 993 transactions and of those, 153 847 (~3%) are failed transactions.

The delay for most transactions does not exceed 27 seconds (95% quantile) and almost every transaction has been delayed for only one block (block_diff = 1).

Transaction fees are skewed toward zero: most fees are small, and the distribution has a long right tail. The 99.5% quantile is more than 5 times the 95% quantile (0.105 ETH vs 0.0199 ETH). The table below shows the quantiles of transaction delay and fees.

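Main quantiles for delay and fees

| Variable | 5% | 10% | 25% | 50% | 75% | 90% | 95% | 97.5% | 99% | 99.5% |
|---|---|---|---|---|---|---|---|---|---|---|
| Delay, secs | 1 | 2 | 4 | 8 | 10 | 15 | 27 | 122 | 972 | 11984 |
| Fees, ETH | 0.00035 | 0.00042 | 0.000956 | 0.00231 | 0.00528 | 0.0114 | 0.0199 | 0.0375 | 0.0704 | 0.105 |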

Because the data contain quite a few outliers, we used Spearman's rank correlation. The analysis did not reveal any non-obvious relationships between the variables.
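To illustrate why a rank correlation suits data with outliers, here is a minimal sketch that computes Spearman's rho as the Pearson correlation of average ranks, using only the Python standard library. The toy values are made up for illustration and are not taken from our dataset; the variable names simply mirror the `time_diff` and `gas_price` columns described earlier.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (made up): the last delay is an extreme outlier.
time_diff = [4, 8, 10, 15, 11984]
gas_price = [50, 40, 30, 20, 10]

rho = spearman(time_diff, gas_price)
raw = pearson(time_diff, gas_price)
print(f"Spearman rho: {rho:.3f}, Pearson r on raw values: {raw:.3f}")
```

Because only the ordering of the observations matters, the single extreme delay does not dominate the rank-based coefficient the way it dominates the raw Pearson coefficient.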

Relays: Approximately 87% of all transactions within the sample are from the top 5 relays: flashbots (43%, OFAC-compliant), max_profit (22%, non-compliant), ultra_sound (11.3%, non-compliant), agnostic_gnosis (6.62%, non-compliant) and blocknative (4.2%, OFAC-compliant). The table below shows the main quantiles for delay among different relays.

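Main quantiles for top 5 relays based on delay (seconds)

| Relay | 5% | 10% | 25% | 50% | 75% | 90% | 95% | 97.5% | 99% | 99.5% |
|---|---|---|---|---|---|---|---|---|---|---|
| Flashbots (OFAC) | 2 | 2 | 4 | 8 | 10 | 14 | 23 | 95 | 679 | 3639 |
| Max_profit (no) | 1 | 2 | 4 | 7 | 11 | 15 | 26 | 113 | 905 | 12561 |
| Ultra_sound (no) | 1 | 2 | 4 | 7 | 11 | 16 | 27 | 129 | 768 | 9918 |
| Agnostic_gnosis (no) | 1 | 2 | 4 | 7 | 10 | 12 | 19 | 55 | 562 | 1547 |
| Blocknative (OFAC) | 1 | 1 | 4 | 7 | 11 | 18 | 38 | 192 | 1457 | 12126 |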

Notice that the flashbots relay has the highest delay at the 5% quantile (2 seconds vs 1 second for the other relays). This could partly be explained by the fact that flashbots has the largest share of MEV transactions among all relays on Ethereum. To avoid this effect, we will use random sampling and check hypotheses on equal-size samples (in section 4.2).

Apart from the time difference in seconds, it is also interesting to analyze the difference in the number of blocks as a delay metric. The plot and table below show the empirical cumulative probability that a random transaction enters the Nth block or earlier. It is calculated as the share of transactions included in the Nth or an earlier block, separately for compliant and non-compliant MEV relays.

Empirical cumulative probability to be included in Nth block or earlier

Empirical cumulative probability to be included in Nth block or earlier for the block numbers

| Block number | OFAC-compliant | non-compliant | Ethical |
|---|---|---|---|
| 1 | 87.22% | 87.14% | 89.45% |
| 2 | 95.08% | 95.02% | 94.54% |
| 3 | 96.02% | 96.01% | 95.35% |
| 4 | 96.51% | 96.49% | 95.81% |
| 5 | 96.87% | 96.83% | 96.16% |
| 6 | 97.13% | 97.08% | 96.40% |
| 7 | 97.30% | 97.25% | 96.66% |
| 8 | 97.44% | 97.39% | 96.83% |
| 9 | 97.57% | 97.52% | 96.99% |
| 10 | 97.66% | 97.63% | 97.10% |
| 20 | 98.21% | 98.22% | 97.81% |
| 30 | 98.46% | 98.56% | 98.15% |
| 40 | 98.65% | 98.74% | 99.61% |
| 50 | 98.87% | 98.90% | 99.68% |

As the plot and table above show, there is hardly any difference in the probability of being included in the Nth block between transactions under OFAC-compliant and non-compliant MEV-relays. The differences for ethically censoring MEV-relays could be due to a small sample size.
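The empirical cumulative probability used above can be sketched in a few lines. This is an illustration under assumptions rather than our production code: it takes a list of `block_diff` values per relay group and returns, for each N, the share of transactions included in the Nth block or earlier. The group names and sample values are made up.

```python
from collections import Counter


def cumulative_inclusion_share(block_diffs, max_n):
    """Share of transactions included in the Nth block or earlier, for N = 1..max_n."""
    counts = Counter(block_diffs)
    total = len(block_diffs)
    shares = {}
    running = 0
    for n in range(1, max_n + 1):
        running += counts.get(n, 0)  # transactions included exactly in block n
        shares[n] = running / total
    return shares


# Made-up block_diff values per relay group (not taken from the dataset).
samples = {
    "ofac_compliant": [1, 1, 1, 2, 1, 3, 1, 1, 12, 1],
    "non_compliant": [1, 2, 1, 1, 1, 1, 4, 1, 1, 30],
}

for group, diffs in samples.items():
    shares = cumulative_inclusion_share(diffs, max_n=3)
    formatted = ", ".join(f"N={n}: {share:.0%}" for n, share in shares.items())
    print(f"{group}: {formatted}")
```

Transactions with a `block_diff` larger than `max_n` stay in the denominator, so the shares approach but need not reach 100%, as in the table above.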

Statistical analysis

In order to check the hypotheses correctly, in every statistical test we will generate random samples with an equal number of observations (10000).

Let’s check hypotheses 1-3. To do this, we will use Mann-Whitney and bootstrapped tests because the data samples do not follow a normal distribution (we confirmed this using a Shapiro-Wilk test and a QQ-plot).

We start with the Mann-Whitney test because of its robustness for distributions with outliers. We also backed up our results with bootstrap statistical tests.
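As a rough sketch of this procedure, the snippet below draws two equal-size random samples (10 000 observations each, as described above) and applies a two-sided Mann-Whitney U test. It is a standard-library implementation based on the normal approximation with a tie correction and no continuity correction, so its p-values may differ slightly from those of a statistics package. The delay values are synthetic and the group names are placeholders.

```python
import math
import random
from collections import Counter
from statistics import median


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation with tie correction)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Ties shrink the variance of U under the null hypothesis.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    sigma = math.sqrt(max(0.0, variance))
    if sigma == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Synthetic delays in seconds (illustration only, not our data).
rng = random.Random(42)
group_a = [rng.lognormvariate(2.0, 0.5) for _ in range(50_000)]
group_b = [rng.lognormvariate(2.1, 0.5) for _ in range(50_000)]

SAMPLE_SIZE = 10_000  # equal-size random samples
sample_a = rng.sample(group_a, SAMPLE_SIZE)
sample_b = rng.sample(group_b, SAMPLE_SIZE)

u_stat, p_value = mann_whitney_u(sample_a, sample_b)
median_diff = median(sample_a) - median(sample_b)
print(f"median difference: {median_diff:.2f}, significant: {p_value < 0.05}")
```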

Mann-Whitney test results on the full sample

| Variable | Subgroups | Median variable difference between subgroup-1 and subgroup-2 | p-value < 0.05 | Statistical significance |
|---|---|---|---|---|
| Time delayed, secs | MEV / non-MEV | -2 | yes | Significant |
| Time delayed, secs | Lido / non-Lido | -1 | yes | Significant |
| Time delayed, secs | Compliant relay / Non-compliant relay | | yes | Significant |
| Fee, eth | Lido / Non-Lido | 8.7e-5 | no | Insignificant |


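The main conclusions from the Mann-Whitney tests are that delay differs significantly within all three subgroup pairs (for example, the median delay is 2 seconds lower for MEV than for non-MEV transactions), while fees do not differ significantly between Lido and non-Lido validators.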

Next, we used a bootstrap test to confirm the results above and, in addition, to check multiple comparison hypotheses among different relays (hypothesis 4). Because we use equal-size samples, we can also estimate the size of the differences between metrics.

Bootstrap test results for two subgroups

| Variable | Subgroup | Difference in means between subgroup-1 and subgroup-2 | p-value < 0.05 | Statistical significance |
|---|---|---|---|---|
| Delay | MEV / non-MEV | -1.14 secs | yes | Significant |
| Delay | Lido / non-Lido | -0.12 secs | no | Insignificant |
| Delay | Compliant relay / Non-compliant relay | 0.1 secs | no | Insignificant |

We deleted observations above the 95% quantile to calculate sample means more accurately.

The plots below show the bootstrapped distribution of the sample mean differences.

Bootstrapped statistical tests showed a statistically significant difference only for the MEV / non-MEV subgroups. MEV transactions are faster by a little over 1.1 seconds on average.

Although transactions through censoring relays are slower by about 0.1 seconds on average, this difference is not statistically significant. The same conclusion holds for Lido validators: there are no statistically significant differences.
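The bootstrap comparison of means can be sketched as follows. This is a minimal illustration, assuming a percentile bootstrap and trimming at the 95% quantile as described in the note above; the exact resampling scheme, number of resamples and seed used in our notebook may differ. All delay values are synthetic.

```python
import math
import random


def trim_above_quantile(values, q=0.95):
    """Drop observations above the q-quantile (nearest rank) to limit outlier influence."""
    ordered = sorted(values)
    cutoff = ordered[max(0, math.ceil(q * len(ordered)) - 1)]
    return [v for v in values if v <= cutoff]


def bootstrap_mean_diff(a, b, n_boot=1000, alpha=0.05, seed=0):
    """Observed mean(a) - mean(b) and its percentile bootstrap confidence interval."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping the original size.
        mean_a = sum(rng.choices(a, k=len(a))) / len(a)
        mean_b = sum(rng.choices(b, k=len(b))) / len(b)
        diffs.append(mean_a - mean_b)
    diffs.sort()
    lower = diffs[int(alpha / 2 * n_boot)]
    upper = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    observed = sum(a) / len(a) - sum(b) / len(b)
    return observed, (lower, upper)


# Synthetic delays in seconds (illustration only, not our data).
rng = random.Random(7)
group_1 = trim_above_quantile([rng.expovariate(1 / 7) for _ in range(1000)])
group_2 = trim_above_quantile([rng.expovariate(1 / 8) for _ in range(1000)])

diff, (ci_low, ci_high) = bootstrap_mean_diff(group_1, group_2)
# The difference is treated as significant when the interval excludes zero.
significant = ci_low > 0 or ci_high < 0
print(f"difference in means: {diff:.2f} s, 95% CI: ({ci_low:.2f}; {ci_high:.2f})")
```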

We then ran multiple comparison tests among relays to check hypothesis 4. The table below presents the results of bootstrapped statistical tests with Bonferroni correction.

Relays (top 5) pairs with a significant time delay difference

| Relay-1 | Relay-2 | BS 95% confidence intervals for difference |
|---|---|---|
| Flashbots | Agnostic Gnosis | (0.29; 0.63) |
| Max Profit | Agnostic Gnosis | (0.23; 0.55) |
| Ultra Sound | Agnostic Gnosis | (0.3; 0.6) |
| Agnostic Gnosis | Blocknative | (-0.36; -0.09) |
| Ultra Sound | Blocknative | (0.05; 0.37) |
| Flashbots | Blocknative | (0.08; 0.35) |
| Max Profit | Blocknative | (0.01; 0.29) |

The bootstrap multiple comparisons test showed that the Agnostic Gnosis and Blocknative MEV-relays have a lower time delay than the other top 5 relays after deleting the outliers. Note that the Agnostic Gnosis relay is non-compliant while the Blocknative relay is OFAC-compliant. In addition, the Flashbots MEV-relay, which has the biggest share of blocks produced via MEV-boost, has a higher time delay on average compared to Agnostic Gnosis and Blocknative.
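For readers who want to reproduce this kind of pairwise comparison, here is a hedged sketch of bootstrap confidence intervals with a Bonferroni correction: with m pairs and an overall level of 0.05, each interval is built at confidence level 1 - 0.05/m. The relay names and delay values below are hypothetical placeholders, and the resample count is kept small for speed, which makes the tails of the intervals coarse.

```python
import random
from itertools import combinations


def bootstrap_ci(a, b, level, n_boot=1000, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b) at the given confidence level."""
    rng = random.Random(seed)
    diffs = sorted(
        sum(rng.choices(a, k=len(a))) / len(a) - sum(rng.choices(b, k=len(b))) / len(b)
        for _ in range(n_boot)
    )
    tail = (1 - level) / 2
    lower = diffs[int(tail * n_boot)]
    upper = diffs[min(n_boot - 1, int((1 - tail) * n_boot))]
    return lower, upper


def significant_pairs(samples, alpha=0.05, n_boot=1000):
    """Return pairs whose Bonferroni-adjusted CI for the mean difference excludes zero."""
    pairs = list(combinations(samples, 2))
    level = 1 - alpha / len(pairs)  # Bonferroni: split alpha across all pairs
    result = {}
    for first, second in pairs:
        lower, upper = bootstrap_ci(samples[first], samples[second], level, n_boot)
        if lower > 0 or upper < 0:
            result[(first, second)] = (lower, upper)
    return result


# Hypothetical relays with synthetic delays in seconds (illustration only).
rng = random.Random(1)
delays = {
    "relay_a": [rng.expovariate(1 / 8) for _ in range(300)],
    "relay_b": [rng.expovariate(1 / 8) for _ in range(300)],
    "relay_c": [rng.expovariate(1 / 5) for _ in range(300)],
}

for (first, second), (lower, upper) in significant_pairs(delays).items():
    print(f"{first} vs {second}: ({lower:.2f}; {upper:.2f})")
```

The correction widens every interval, so a pair is reported only if the difference survives the stricter per-comparison level.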

To check hypotheses 5-6 from section 3.1, we will use a Bayesian approach. The main statistic here is the chance to beat all (probability to be best). The difference is statistically significant when the chance to beat all exceeds 95%.

First, we check the probability of a transaction being included in the Nth block for the OFAC and non-OFAC samples.

Results of testing probabilities to be included in Nth block between OFAC and non-OFAC transactions

| N block | Probabilities to be included in OFAC/non-OFAC block, % | Probabilities to be best for OFAC/non-OFAC, % | Conclusion |
|---|---|---|---|
| The first 2 blocks | {95.09; 94.98} | {64.02; 35.98} | Insignificant |
| The first 10 blocks | {97.7; 97.58} | {71.15; 28.77} | Insignificant |
| After the 10th block | {2.3; 2.42} | {28.83; 71.2} | Insignificant |

There is no statistically significant difference between OFAC-compliant and non-compliant MEV relays in the chance of a transaction being included within the first 2 or the first 10 blocks. The same is true for the probability of being included after the 10th block.
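The "chance to beat all" statistic can be approximated with a simple Monte Carlo simulation. The sketch below is one common way to compute it and rests on assumptions not stated above: a uniform Beta(1, 1) prior on each inclusion probability, and 10 000 observations per group (the equal sample size mentioned earlier). The counts are derived from the first row of the table purely for illustration.

```python
import random


def chance_to_beat_all(successes, trials, draws=20_000, seed=0):
    """Estimate P(group has the highest rate) from Beta posteriors with Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(draws):
        # One joint draw from the posterior of every group's inclusion probability.
        sampled = [
            rng.betavariate(1 + s, 1 + n - s) for s, n in zip(successes, trials)
        ]
        wins[sampled.index(max(sampled))] += 1
    return [w / draws for w in wins]


# Assumed counts: 95.09% and 94.98% of 10 000 transactions included within 2 blocks.
included = [9509, 9498]  # OFAC, non-OFAC
observed = [10_000, 10_000]

probabilities = chance_to_beat_all(included, observed)
for label, probability in zip(["OFAC", "non-OFAC"], probabilities):
    verdict = "significant" if probability > 0.95 else "insignificant"
    print(f"{label}: chance to beat all = {probability:.2%} ({verdict})")
```

With rates this close and samples of this size, neither group comes near the 95% threshold, which matches the "Insignificant" conclusion in the table.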

Results of testing probabilities to be included in an OFAC / non-OFAC block

| Validators | Probability to be in OFAC block, % | Probability to be best to be in OFAC block, % | Conclusion |
|---|---|---|---|
| Lido validator | 52.73 | 0.01 | Significant |
| Non-Lido validator | 56.2 | 99.9 | |


As the table above shows, the probability to be included in an OFAC block is higher for non-Lido validators.

Potentially censored sample dataset

Here we want to select transactions whose high delay could be explained by censorship. To decrease false positives, we must take into account cases in which delay is caused by any other reason. We will use the term “missing blocks” to describe blocks produced between when the transaction first appears in the mempool and the transaction is confirmed.

Criterion 1. We assume that transactions for which the delay is greater than 8 seconds (50% quantile) are suspected of being censored.

Criterion 2. Next, let’s suppose that failed transactions (error_dummy = 1 in our dataset) are not censored.

Criterion 3. We only consider transactions for which the fee (in ETH) is higher than the 50% quantile of fees within the block.

Criterion 4. We consider transactions where some “from address” is used more than once.

Criterion 5. We must also account for how full a block is. We will compare gas usage and gas limit for every block.

Criterion 6. We should only consider transactions for which the maximum fee per gas (max_fee_per_gas) is higher than the block base fee.

Criterion 7. For every sender, a transaction with a later mempool timestamp must have a higher nonce.

OFAC criterion. The address of the receiver or the address of the sender must be on the OFAC sanctions list.
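A simplified sketch of how some of these criteria could be combined into a single filter is shown below. It covers only criteria 1, 2, 3 and 6 plus the OFAC criterion; criteria 4, 5 and 7 need per-sender and per-block context and are left out. The fields `fee_eth`, `block_median_fee_eth` and `block_base_fee` are hypothetical names introduced for this example and are not columns of the published table.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    time_diff: int                   # seconds between mempool entry and inclusion
    error_dummy: bool                # True if the transaction failed
    fee_eth: float                   # hypothetical: fee paid, in ETH
    block_median_fee_eth: float      # hypothetical: 50% quantile of fees in the block
    max_fee_per_gas: int             # fee cap per gas set by the sender
    block_base_fee: int              # hypothetical: base fee of the including block
    transaction_censured_from: bool  # sender is on the sanctioned list
    transaction_censured_to: bool    # receiver is on the sanctioned list


def is_potentially_censored(tx, delay_threshold=8):
    """Apply criteria 1, 2, 3, 6 and the OFAC criterion to one transaction."""
    return (
        tx.time_diff > delay_threshold                 # Criterion 1: high delay
        and not tx.error_dummy                         # Criterion 2: successful only
        and tx.fee_eth > tx.block_median_fee_eth       # Criterion 3: fee above block median
        and tx.max_fee_per_gas > tx.block_base_fee     # Criterion 6: fee cap above base fee
        and (tx.transaction_censured_from or tx.transaction_censured_to)  # OFAC criterion
    )


# Made-up transactions for illustration.
suspect = Tx(120, False, 0.004, 0.002, 40, 25, False, True)
failed = Tx(120, True, 0.004, 0.002, 40, 25, False, True)
cheap = Tx(120, False, 0.001, 0.002, 40, 25, True, False)

flagged = [tx for tx in (suspect, failed, cheap) if is_potentially_censored(tx)]
print(f"{len(flagged)} of 3 transactions flagged")
```

Each condition removes one alternative explanation for a long delay, so a transaction is flagged only when no benign explanation covered by the filter remains.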

The monitoring of all transactions is on our dashboard in the section “Potential censoring transactions”. The analysis of the data can be found in the Google Colab notebook.

Conclusions

In this article, we attempt to estimate how censorship affects blockchain degradation within the Ethereum ecosystem. The main metrics we used to judge this were “the time difference between when a transaction entered the mempool and it got confirmed” and “the number of blocks created between the time a transaction entered the mempool and it got confirmed”.

MEV-boost and OFAC censorship

Since the Ethereum merge, the share of validators that use MEV-boost relays has increased from 10% to 87% since November 2022. However, since the beginning of 2023, the share of blocks produced by OFAC-compliant MEV relays has decreased from approximately 75% to 40%, which translates to a decrease in the share of blocks produced by flashbots.

Lido validators

Although every Lido validator uses MEV-boost, their use of OFAC-compliant MEV relays went down from 57% on February 10, 2023, to 42% on February 24, 2023.

Non-Lido validators

Among other validators, the use of OFAC-compliant MEV relays barely changed and stayed at around 50% from February 10 to February 24.

Our statistical analysis also shows that the probability to be included in an OFAC block is higher for non-Lido validators when compared to Lido validators.

Effect on blockchain degradation

In general, censorship has an insignificant effect on the Ethereum blockchain degradation.

The delay distribution does not differ statistically between OFAC-compliant and non-compliant relays. However, there are some differences among individual MEV relays. Agnostic Gnosis (non-compliant) has the lowest time delay and Blocknative (compliant) has the second lowest. At the same time, these relays have the lowest share of blocks built over this period among the top 5 MEV relays.

We also did not notice a statistically significant difference in the delay distribution between Lido and other validators.

We did not find any dependence between censorship and transaction cost.

We tried to account for false positive cases of when a high delay is caused by other reasons besides censorship. Among these reasons, we can highlight low transaction cost, failure, repeated transactions from one sender, gas used, and having to wait for transactions with a lower nonce.

Future plans

This is not our final research regarding the effects of censorship on blockchain degradation in the Ethereum network. We are planning to investigate it further, while addressing the following:

Mempool quality data

Our main limitation was manually parsing mempool data rather than using complete, ready-to-use mempool data. Therefore, our time delay metrics could be biased.

Rocketpool data

To compare Lido validators versus Rocketpool validators.

Empty blocks

To improve time delay metrics, considering empty blocks.

Quality of MEV-relays data

Currently, we have some problems collecting MEV-relay data. We are planning to improve this process.

