Overview

The Catalyst-Cardano bridge is a custom bridge interface between Catalyst and a Cardano Node. It tracks, in real time, data relevant to the unified Catalyst system as it appears on the Cardano network.

The bridge is not just a data logger; it also:

Acts as an event trigger for other Catalyst systems.
Acts as an information server for data pertinent to Catalyst operations.

Issues with the previous systems

Catalyst has used a tool called dbsync to acquire “snapshot” data. A “snapshot” is a record, at a moment in time, of all staked ADA in the network.

dbsync is a tool which captures a relational interpretation of the Cardano blockchain in an SQL database. This is useful for general-purpose queries of information contained on Cardano, but it is slow and complex for bulk queries. The relational structure means that individual transactions need to be pieced together from multiple tables. Even with indexes, this exacts a heavy efficiency toll when a single transaction's state is queried. Querying bulk data requires a large and complex query which takes a very long time to run (on the order of hours).

dbsync itself takes a very long time to sync to the blockchain, and gets progressively slower. As of mid-January 2023, one dbsync instance in a production environment took more than 5 days to sync with a local node.

It is supposed to be possible to recover a dbsync database from a backup; however, experience shows this is itself a time-consuming process. It took more than 12 hours just to load the backup image into the database, and even then the node would not sync with mainnet. These issues cause excessive complexity, slow operation and fragile environments.

Project Catalyst is also not in control of the dbsync database schema, and the schema can change between revisions. A schema change could mean the entire database needs to be re-synced (taking days), or it could break tools which rely on the schema.

The solution

The solution detailed here is a new bridge service that has the following features:

Can sync from multiple redundant nodes.
Does not need to trust any single node (so it can sync from public nodes).
Focused on data and events required by Project Catalyst:
- Registration Records at all points in the past.
- Staked ADA at all points in the past.
- Minimum necessary state to track staked ADA.
More efficient database schema.
Schema is not accessed directly but via a simple API Service.
- Prevents downstream consumers from breaking if the DB Schema needs to change.
Does not need to snapshot:
- Data is accumulated progressively, not at instants in time.
- Data storage allows the state at any past time to be calculated simply and efficiently.
Is easy to independently deploy by the Catalyst Community, so they can independently validate data reported by Project Catalyst.
- Distributed use does not rely on any Catalyst-supplied data, which improves auditability and trust.
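
The "does not need to snapshot" feature above can be illustrated with a minimal sketch. It assumes an in-memory, append-only history of balances keyed by slot; the class name `StakeHistory`, its methods and the example address are hypothetical and are not part of the bridge itself. The point is only that, once changes are accumulated in order, the state at any past time is a single binary search.

```python
from bisect import bisect_right
from collections import defaultdict


class StakeHistory:
    """Append-only record of staked lovelace per stake address, keyed by slot.

    Hypothetical illustration only: the real bridge stores this in a database.
    """

    def __init__(self):
        # stake address -> parallel lists of slots and balances, in slot order
        self._slots = defaultdict(list)
        self._balances = defaultdict(list)

    def record(self, stake_address, slot, balance):
        """Append the balance of an address as of `slot` (slots must not decrease)."""
        slots = self._slots[stake_address]
        if slots and slot < slots[-1]:
            raise ValueError("records must be appended in slot order")
        slots.append(slot)
        self._balances[stake_address].append(balance)

    def balance_at(self, stake_address, slot):
        """Return the balance in effect at `slot`, or 0 before the first record."""
        slots = self._slots.get(stake_address, [])
        index = bisect_right(slots, slot)
        if index == 0:
            return 0
        return self._balances[stake_address][index - 1]
```

With records at slots 100 and 250, a query for slot 200 returns the balance recorded at slot 100, because that is the most recent change at or before the requested time. No snapshot has to be taken in advance for that slot.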

Architecture Overview

The System has these components:

1 or more Cardano Nodes (Preferably 2 or more)
A Pipeline which processes the data from the nodes:
- Read blocks from multiple nodes
- Validate blocks by independent reference (A valid block has n independent copies)
- Queue valid blocks for processing.
- Read valid blocks from the queue and process every transaction in the block.
  - Calculate the change in staked ADA caused by all transactions in the block.
  - Validate all Registration Records in the block:
    - Record all validated registrations.
    - Record all invalid registrations (including the reason the registration is invalid).
- Queue the complete block of transactions, ledger state and registration updates for storing and alerting.
- Lock the Databases for writing (Transactional)
- Check if the block being recorded is new:
  - New:
    - Record the updated current ledger state.
    - Record the staked ADA for every stake address which changed in this block (time series record)
    - Record the registrations (time series record)
    - Send alerts to all upstream subscribers that subscribed events have changed.
    - Commit the transaction (unlocks the DB)
  - Already Recorded:
    - Abort the write transaction (release the DB)
    - Read the recorded data from the DB
    - Validate the DB data with the data calculated from the block.
    - If there is any discrepancy, LOG errors and send configured alerts.
A REST/HTTP service to report Catalyst bridge data
- Report current staked/unpaid rewards in ADA for any stake address.
- Report staked/unpaid rewards in ADA for any stake address, at any past time.
- Report staked/unpaid rewards over a period of previous time, with various processing:
  - Daily Averages
  - All records
  - other
- Calculate voting power given a set of voting power options for a single address, or all registrations of a particular type.
  - Snapshot (instantaneous) voting power
  - Time window based voting power calculation
  - Linear vs functional voting power function of raw ADA.
  - Capped at a particular %
  - other parameters which can affect the voting power calculation.
Catalyst Event stream published via:
- Kafka
- other
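
Two steps of the pipeline above lend themselves to a short sketch: accepting a block only once n independent nodes agree on it, and handling the "New" versus "Already Recorded" branches when storing. This is a simplified illustration under stated assumptions: agreement is modelled as matching block hashes for a slot, the database is a plain dict, and the names `BlockQuorum` and `store_block` are hypothetical.

```python
from collections import defaultdict


class BlockQuorum:
    """Accept a block once `required` distinct nodes report the same hash for a slot."""

    def __init__(self, required):
        self.required = required
        # (slot, block_hash) -> set of node ids that reported it
        self._seen = defaultdict(set)

    def observe(self, node_id, slot, block_hash):
        """Record one node's report.

        Returns True only on the report that first reaches quorum, so the
        caller queues each valid block exactly once.
        """
        reporters = self._seen[(slot, block_hash)]
        before = len(reporters)
        reporters.add(node_id)  # a repeated report from the same node is ignored
        return before < self.required <= len(reporters)


def store_block(db, slot, computed):
    """Write a block's computed data if it is new; otherwise compare with storage.

    `db` is a plain dict standing in for the database (an assumption of this
    sketch). Returns "recorded", "verified" or "discrepancy".
    """
    if slot not in db:
        db[slot] = computed
        return "recorded"
    return "verified" if db[slot] == computed else "discrepancy"
```

In a real deployment the write would happen inside a database transaction, and a "discrepancy" result would trigger the logging and alerts described in the list; both are left out here to keep the sketch small.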

Architectural Diagram

Integration to the Catalyst Unified Backend

The Cardano-Catalyst bridge is an essential and integral part of the Catalyst Unified backend. However, it is also a useful and capable tool in its own right.

It has a secondary use case of allowing the community to INDEPENDENTLY validate their registrations and voting power.

Accordingly, it is developed as a stand-alone service. This means it can be easily distributed and deployed INDEPENDENTLY of the rest of the Catalyst Unified Backend services.

It has two internal long-running tasks:

- Read, validate and record the latest registrations/delegations from the linked blockchain.
- Read and record the running total balance and unclaimed rewards for every stake address.

It also exposes a Voting Power API:

- Get voting power for a Stake Address or Voting Key as at (timestamp).
  - Would respect the registrations valid at that time. So if you asked for your voting power but you were delegated, the API would return that you have X personal voting power, and that Y..Z voting power of yours has been delegated to Keys A-B.
  - Options:
    - Max Registration Age (so registrations before this date/time are not considered).
    - Must have a valid payment address (so we can later make a valid payment address a necessity if required; this would also exclude just using a stake address).
    - Voting power calculation type:
      - Absolute on the time of snapshot
      - Average
      - Maximum daily value
      - Parameter: length of time to average over (in days).
    - Voting power linearity:
      - Linear (1 ADA = X voting power), where X is a parameter.
      - Logarithmic (voting power is attenuated by a logarithmic function). Would need parameters to define the curve.
      - Other??
- Get Registration/Delegation information for a Stake Address/Voting Key as at a time.
  - Similar to above but does NOT do any
- Get all active registrations as at a time.
  - Time and max age of registrations are parameters.
  - Whether stake addresses without registration are included in the output.

What do you think?
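
The calculation-type and linearity options of the Voting Power API can be made concrete with a small sketch. Everything here is an assumption for illustration: the function names `raw_stake` and `voting_power`, the parameter names, the use of a list of daily balances as input, and in particular the `log10(1 + ada)` curve, which merely stands in for whatever logarithmic function is eventually chosen.

```python
import math


def raw_stake(daily_balances, mode="absolute", window_days=1):
    """Reduce a list of daily ADA balances (oldest first) to one raw figure.

    "absolute" uses the latest balance; "average" and "maximum" are taken
    over the last `window_days` entries.
    """
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    if not daily_balances:
        return 0.0
    if mode == "absolute":
        return float(daily_balances[-1])
    window = daily_balances[-window_days:]
    if mode == "average":
        return sum(window) / len(window)
    if mode == "maximum":
        return float(max(window))
    raise ValueError(f"unknown calculation type: {mode}")


def voting_power(ada, linearity="linear", factor=1.0):
    """Map raw ADA to voting power.

    "linear" gives ada * factor (1 ADA = `factor` voting power).
    "logarithmic" gives factor * log10(1 + ada), an illustrative curve only.
    """
    if linearity == "linear":
        return ada * factor
    if linearity == "logarithmic":
        return factor * math.log10(1.0 + ada)
    raise ValueError(f"unknown linearity: {linearity}")
```

For example, `voting_power(raw_stake([100, 200, 300], "average", window_days=2))` averages the last two daily balances and applies the linear mapping with X = 1. Registration age limits, payment address checks and delegation splitting would be applied before this step and are not modelled here.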

Catalyst Voting System - Core Technology