On-Chain Data Infrastructure
Every transaction on a public blockchain is permanently recorded and freely readable. This creates a unique situation in financial markets: complete trade-level transparency. In TradFi, getting tick-level exchange data requires expensive subscriptions (Bloomberg, Refinitiv, direct exchange feeds). In DeFi, it’s all public — but raw blockchain data is nearly unusable without significant processing.
The Data Pipeline
The architecture mirrors a CDC (Change Data Capture) pipeline that any data engineer would recognize:
Blockchain nodes (source of truth)
→ Indexers (decode raw bytes into structured events)
→ Columnar store (Parquet, DuckDB, Trino)
→ SQL query engine
→ Dashboards / APIs
Each blockchain transaction contains:
- Sender, receiver, program (smart contract) addresses
- Instruction data (encoded function calls and parameters)
- Logs / events emitted by the smart contract during execution
- State changes (account balance diffs)
The raw format is compact binary. Indexers decode this using the smart contract’s IDL (Interface Definition Language) — analogous to deserializing Protobuf or Avro with a schema.
Key Platforms
DeFiLlama
An open-source DeFi analytics aggregator tracking TVL, revenue, fees, and volumes across chains and protocols.
- Architecture: Per-protocol adapters (open source on GitHub) define how to compute metrics from on-chain data. Adapters run on a schedule, feeding a time-series database exposed via REST API.
- Strengths: Standardized cross-protocol comparison, open methodology, good for aggregate trends.
- Limitations: Adapter quality varies. Metrics definitions (what counts as “revenue” vs. “fees”) can differ from protocol self-reporting.
- URL pattern:
defillama.com/protocol/<name>
Dune Analytics
A SQL query engine over decoded blockchain data. Think of it as BigQuery for blockchains.
- Architecture: Raw chain data is decoded into structured tables using contract ABIs/IDLs. Tables live in a columnar warehouse (DuckDB/Trino). Users write SQL queries and compose them into dashboards.
- Strengths: Full SQL expressiveness. Anyone can write queries — and critically, anyone can read the query behind any chart. This makes analysis reproducible and auditable.
- Limitations: Query quality depends on the author. Popular dashboards are community-vetted but not formally audited. Schema changes when Dune updates their data models can break queries.
- URL pattern:
dune.com/<author>/<dashboard-name>
Other Notable Platforms
- Flipside Crypto — similar to Dune, different query engine
- The Graph — decentralized indexing protocol (subgraphs)
- Helius / Shyft — Solana-specific APIs with decoded transaction data
- Nansen — adds wallet labeling (identify which wallets belong to known entities, funds, MEV bots)
How to Read a Dune Dashboard
When you encounter a Dune dashboard (e.g., the adam_tehc pump.fun
dashboards), here’s what to look for:
- Click “Edit query” on any chart to see the SQL — this tells you exactly what’s being measured and how
- Check the
FROMtables —pumpfun.trades,solana.transactions, etc. These are Dune’s decoded tables - Check date filters — dashboards may show all-time, last 30 days, or custom ranges
- Check for hardcoded addresses — queries often filter by specific program addresses (pump.fun’s Solana program ID)
- Note the last refresh time — Dune queries are not real-time; they run on a schedule or on-demand
Citing On-Chain Data
When referencing on-chain data in articles or analysis:
- Link to the specific dashboard or query
- State the access date (data changes continuously)
- Note whether the source is a community dashboard (Dune) or an open-source adapter (DeFiLlama)
- For critical claims, cross-reference multiple sources
- Prefer linking to the query itself over the chart — queries are the “source code” of the analysis
Connection to Engineering Experience
If you’ve built event-driven systems with Kafka, Spark, and columnar stores, the on-chain data stack is immediately familiar. The main differences:
- Immutability is absolute — blockchain data cannot be corrected or backfilled (unlike a Kafka topic with compaction)
- Schema is encoded in smart contracts — the IDL is the schema, and it’s immutable once deployed (though programs can be upgraded on Solana)
- Every “event” is a financial transaction — there’s no separation between the event log and the ledger
- Latency is block-time — on Solana, ~400ms; on Ethereum, ~12 seconds
Companion Resources
- DeFiLlama pump.fun protocol page
- Dune dashboards by
adam_tehc: pump.fun overview, creator earnings, PumpSwap metrics
Questions to consider: How would you design an alerting system on top of this data? What’s the equivalent of a “materialized view” in the on-chain data world? If you were building a credit scoring system for DeFi wallets, what on-chain signals would you use?