Maximizing Celestia's Analytical Capabilities

Data analytics is an important part of any blockchain's lifecycle. It provides valuable insights into the network's behaviour, allows for informed decision-making, and helps improve performance.

Adopting a data-driven approach from an early stage, such as the testnet phase, is not only important but necessary. It helps identify and resolve issues, anticipate new ones, smooth the path to mainnet, and ensure that the final product is robust and reliable.

Performing data analytics on Celestia can be challenging due to its modular architecture, which requires a unique approach to data storage and indexing. However, leveraging our expertise in working with indexers and cutting-edge technology, we propose developing a comprehensive set of open-source tools for data indexing and storage. This will enable efficient and reliable data management solutions within the Celestia blockchain.

What to analyze in Celestia?

To understand how the Celestia network grows and operates, we propose analyzing a set of metrics that capture different aspects of the product. As the network is expected to host a variety of rollups, it should be possible to drill down to a particular rollup in addition to the high-level system overview. These metrics will form benchmarks and trends, enabling health checks and predictions.

Measures of demand for Celestia's Data Availability Layer.

The metrics below will help evaluate the scale and rate of adoption and identify the main consumers and their costs; a sketch of how they could be computed follows the list. This is a major part of Celestia's tokenomics.

  • Number of data requests (ever/per period/per sender/rollup)
  • Number of unique request senders/rollups (ever/per period) - this is the number of entities that use DA
  • Amount of data requested (ever/per period/average per sender/rollup)
  • Fees paid for data (per request/per sender/rollup)
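
As an illustration only (not the actual indexer schema), the sketch below shows how these demand metrics could be derived from indexed request-level data. It assumes a hypothetical export of blob-submission ("pay for blobs") transactions with columns block_time, namespace, sender, blob_size_bytes, and fee_utia.

```python
# Illustrative sketch: aggregate demand metrics from a hypothetical export
# of blob-submission transactions produced by the indexer.
import pandas as pd

# Hypothetical file and column names; the real indexer schema may differ.
requests = pd.read_csv("pay_for_blobs.csv", parse_dates=["block_time"])

# Requests, unique senders, bytes, and fees per day and per namespace (rollup).
daily = (
    requests
    .groupby([pd.Grouper(key="block_time", freq="D"), "namespace"])
    .agg(
        request_count=("sender", "count"),
        unique_senders=("sender", "nunique"),
        bytes_requested=("blob_size_bytes", "sum"),
        fees_paid_utia=("fee_utia", "sum"),
    )
    .reset_index()
)

print(daily.head())
```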

Measures of Data Availability Layer operation quality.

While it is valuable to see how Celestia is used, it is also important to track the robustness of the network. The following metrics indicate whether operational requirements are met, security is ensured (enough light nodes performing data availability sampling), and requests are fulfilled in a timely manner; see the sketch after this list.

  • Percentage of fulfilled requests (ever, per period, and per rollup).
  • Percentage of data made available (ever, per period, and per rollup).
  • Average actual number of light nodes in DAS per rollup per request (ever, per period) and average minimum required light nodes in DAS per rollup per request (ever, per period).
  • Average time it takes to fulfill a request after the transaction is sent.
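
To make the quality metrics concrete, here is a minimal sketch of how fulfillment rate and fulfillment latency could be computed. The two input files and their columns (request_id, namespace, submit_time, include_time) are assumptions for illustration, not the actual data model.

```python
# Illustrative sketch: fulfillment rate and latency per namespace (rollup),
# assuming two hypothetical indexer exports.
import pandas as pd

submissions = pd.read_csv("submissions.csv", parse_dates=["submit_time"])
inclusions = pd.read_csv("inclusions.csv", parse_dates=["include_time"])

# Join submitted requests with the blocks in which they were included, if any.
merged = submissions.merge(inclusions, on="request_id", how="left")

# Percentage of fulfilled requests per namespace.
fulfilled = (
    merged.assign(is_fulfilled=merged["include_time"].notna())
    .groupby("namespace")["is_fulfilled"]
    .mean()
    .mul(100)
    .rename("pct_fulfilled")
)

# Average time from submission to inclusion, for fulfilled requests only.
latency = (
    (merged["include_time"] - merged["submit_time"])
    .dt.total_seconds()
    .groupby(merged["namespace"])
    .mean()
    .rename("avg_fulfillment_seconds")
)

print(pd.concat([fulfilled, latency], axis=1))
```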

How to analyze?

P2P.org has already developed a solution that can easily extract raw data from any Cosmos chain: the P2P.org Any-Cosmos-Chain Indexer. While Celestia may have specific data structure requirements, we are confident that we will not need to start development from scratch.
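
For context, the snippet below illustrates the kind of raw extraction such an indexer performs. It assumes a Cosmos SDK / CometBFT node exposing the standard RPC endpoints on port 26657; the node address is a placeholder, and this is a simplified sketch rather than the Indexer's actual code.

```python
# Minimal illustration of raw block extraction over the standard CometBFT RPC.
import requests

RPC_URL = "http://localhost:26657"  # placeholder node address


def latest_height() -> int:
    """Read the current chain height from the node's /status endpoint."""
    resp = requests.get(f"{RPC_URL}/status", timeout=10)
    resp.raise_for_status()
    return int(resp.json()["result"]["sync_info"]["latest_block_height"])


def fetch_block(height: int) -> dict:
    """Fetch one raw block from the node's /block endpoint."""
    resp = requests.get(f"{RPC_URL}/block", params={"height": height}, timeout=10)
    resp.raise_for_status()
    return resp.json()["result"]


if __name__ == "__main__":
    head = latest_height()
    block = fetch_block(head)
    # Transactions arrive base64-encoded; decoding and schema mapping
    # is the indexer's job before the data lands in the database.
    print(head, len(block["block"]["data"]["txs"] or []))
```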

The Indexer is open source and can be deployed and launched by anyone to access raw data. P2P.org proposes running an indexer on its own infrastructure, which includes an Indexer instance, a database for storing data, and several nodes for data extraction. This provides an end-to-end solution for community data analysis.

Data can be stored in any database, but P2P.org offers to store it in a public data warehouse (DWH) project. A public dataset containing all the necessary data is the best way to embed a data-driven culture into the project. An example is already available: a public CosmosHub dataset populated with raw data from our indexer.

Currently, we use Google BigQuery to share this data with the community, and we plan to provide fully free access in the future. For now, a Google account with the free trial is enough to query the data at no cost.
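
Querying such a public dataset could look like the sketch below, using the standard BigQuery Python client. The project, dataset, and table names are placeholders, not the actual public project.

```python
# Illustrative query against a public BigQuery dataset of indexed raw data.
from google.cloud import bigquery

client = bigquery.Client()  # authenticates with your Google account credentials

query = """
    SELECT DATE(block_time) AS day, COUNT(*) AS tx_count
    FROM `example-project.cosmoshub.transactions`  -- placeholder table name
    GROUP BY day
    ORDER BY day DESC
    LIMIT 30
"""

for row in client.query(query).result():
    print(row.day, row.tx_count)
```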

The raw data supports any downstream use, including data mining and analysis. Our main purpose is to extract knowledge from it: we transform the raw data into structured domain data, consisting of metric definitions and metric history storage.
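
As a rough sketch of what "metric definitions and metric history storage" could mean in practice, the example below declares a metric as a small definition object and materializes it into a history table. All names, the raw_requests table, and the SQLite backend are illustrative assumptions, not the actual P2P.org schema or stack.

```python
# Illustrative sketch: declarative metric definitions plus a metric history table.
from dataclasses import dataclass
import sqlite3


@dataclass(frozen=True)
class MetricDefinition:
    name: str         # e.g. "daily_data_requests"
    description: str  # human-readable meaning of the metric
    sql: str          # query that computes (day, value) pairs from raw data


DAILY_REQUESTS = MetricDefinition(
    name="daily_data_requests",
    description="Number of blob submissions per day",
    # Assumes a hypothetical raw_requests table with ISO-formatted block_time.
    sql="SELECT DATE(block_time) AS day, COUNT(*) AS value "
        "FROM raw_requests GROUP BY day",
)


def materialize(conn: sqlite3.Connection, metric: MetricDefinition) -> None:
    """Compute a metric from raw data and append it to the metric history."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metric_history "
        "(metric TEXT, day TEXT, value REAL)"
    )
    for day, value in conn.execute(metric.sql):
        conn.execute(
            "INSERT INTO metric_history (metric, day, value) VALUES (?, ?, ?)",
            (metric.name, day, value),
        )
    conn.commit()
```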

The main goal of any data analysis and research is transparency, clarity, and data visibility, and the best way to achieve this is a public dashboard. P2P.org supports Looker (Google's BI system) and Superset as BI tools for public dashboards, but also works with custom front-end solutions. This is just the tip of the iceberg in P2P.org's data analysis process. Anyone who wants to understand and monitor the situation needs public dashboards to share insights and attract attention from the community.

We welcome any feedback and look forward to receiving the green light to begin.