Completeness of crypto subledger data
This blog post was originally published on the Cryptoworth blog.
Ensuring accurate blockchain data in a crypto accounting sub-ledger is a major challenge. Missing or inconsistent transactions can create significant discrepancies, making reconciliation a time-consuming nightmare. Whether due to API limitations, blockchain design quirks, or overlooked internal transactions, these issues are common across crypto accounting. In this article, we’ll explore the ten most frequent causes of data completeness issues and provide actionable solutions.
We will also share our insights on the most effective crypto accounting sub-ledger for tackling these challenges.
#10 - Internal transactions
On Ethereum and other EVM-compatible blockchains, internal transactions are not explicitly recorded on-chain. These transactions, often triggered by smart contracts, can impact account balances without leaving a direct trace in block data. This lack of visibility can lead to incomplete transaction histories if proper indexing tools are not used.
This is how the issue is described in some of the good resources we found on this topic:
“While internal transactions have real consequences to account balances, surprisingly the internal transactions themselves are not stored in on-chain. To see internal transactions, you have to run the transaction and trace the calls that it makes. While some contracts do log events to the chain that record internal activity, many do not because doing so requires additional gas.”
“Internal transactions aren't actual transactions on the blockchain and therefore aren't included directly in the blockchain. Internal transactions are value transfers initiated by a contract or when a contract simply makes a call to another contract. The result of the value transfer or contract call isn't stored on the blockchain and therefore you cannot return the result of the internal transaction from the blockchain.
Etherscan obtains the results of these transactions by running a modified node that records all the value transfers by looking at the actual transaction that resulted in the change, and then stores the effects of the transaction separately as an 'internal transaction'. Currently, there is not any way to imitate this process simply using the web3 API.”
Solution: Since standard blockchain APIs do not provide direct access to internal transactions, specialized explorers and indexing tools are essential for capturing this missing data.
Recommended tools:
For the minimal view use Arkham,
For the most functional view use Blocksec,
For the most detailed view use BitQuery,
For human-readable transaction labels use Noves / Blockscout.
For programmatic access, use the Etherscan API for internal transactions.
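As a rough illustration of the Etherscan route, the sketch below builds a query for an address's internal transactions and nets the value they moved. The base URL, the `txlistinternal` action, and the response field names (`from`, `to`, `value`, `isError`) reflect our understanding of Etherscan's account API and should be checked against its current documentation. The helper names and the sample rows are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: Etherscan's account-module endpoint; newer API versions may
# use a different base URL or require extra parameters such as a chain id.
ETHERSCAN_API = "https://api.etherscan.io/api"


def internal_tx_url(address: str, api_key: str, start_block: int = 0) -> str:
    """Build the query URL for the internal transactions of one address."""
    params = {
        "module": "account",
        "action": "txlistinternal",
        "address": address,
        "startblock": start_block,
        "sort": "asc",
        "apikey": api_key,
    }
    return f"{ETHERSCAN_API}?{urlencode(params)}"


def net_internal_wei(rows: list[dict], address: str) -> int:
    """Net value in wei moved to or from `address` by successful internal calls."""
    me = address.lower()
    net = 0
    for row in rows:
        if row.get("isError") == "1":  # failed calls move no value
            continue
        value = int(row["value"])
        if row["to"].lower() == me:
            net += value
        if row["from"].lower() == me:
            net -= value
    return net


# Hypothetical rows shaped like the "result" list of a response.
sample = [
    {"from": "0xaaa", "to": "0xbbb", "value": "1000", "isError": "0"},
    {"from": "0xbbb", "to": "0xccc", "value": "400", "isError": "0"},
    {"from": "0xaaa", "to": "0xbbb", "value": "50", "isError": "1"},
]
print(net_internal_wei(sample, "0xBBB"))  # 600
```

If the net figure is non-zero while the sub-ledger shows no matching entries, internal transactions are likely missing from the import.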
To better understand how to read data on Etherscan, we also advise reading this guide:
#9 - Genesis allocations
Genesis allocations—the initial distribution of tokens to accounts—are not recorded in any blocks validated by miners. Instead, they are defined in the network’s genesis file, which serves as the foundation for the blockchain’s state at launch. However, these files can be difficult to locate, as they are not always readily available.
The best place to start your search is the GitHub repository of the blockchain network you are working with. Some validators kindly publish these files or provide guidance on retrieving them via command-line tools. If you know who the network’s validators are, checking their official websites may also yield useful information.
Solution: See StakeLabs
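Once a genesis file is located, the allocations can be loaded as opening balances. The sketch below assumes a Geth-style genesis document, where an `alloc` object maps addresses to a `balance` string. Other ecosystems lay out their genesis files differently, so treat the field names as an assumption. The addresses and amounts are made up for illustration.

```python
import json


def genesis_allocations(genesis_text: str) -> dict[str, int]:
    """Return {address: balance in wei} from a Geth-style genesis document."""
    genesis = json.loads(genesis_text)
    allocations = {}
    for address, entry in genesis.get("alloc", {}).items():
        raw = str(entry.get("balance", "0"))
        # Balances appear either as decimal strings or as 0x-prefixed hex.
        balance = int(raw, 16) if raw.lower().startswith("0x") else int(raw)
        allocations[address.lower()] = balance
    return allocations


# Hypothetical genesis fragment.
sample_genesis = json.dumps({
    "alloc": {
        "00000000000000000000000000000000000000AA": {"balance": "0xde0b6b3a7640000"},
        "00000000000000000000000000000000000000bb": {"balance": "2500"},
    }
})

opening = genesis_allocations(sample_genesis)
print(opening["00000000000000000000000000000000000000aa"])  # 1000000000000000000
```

Each entry can then be booked as an opening balance with the genesis file as its source document, since no transaction hash exists for it.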
#8 - Complex transactions
Complex transactions, such as liquidity provisioning, often generate multiple log entries. However, some sub-ledgers may only capture the first log entry, leading to incomplete or incorrect accounting. Consider the following example:
0xTXN…1 - Swap BTC for ETH: In this transaction, an owner sends 1,000 BTC to 0x…2 in exchange for 15,000 ETH from 0x…2 and pays a transaction fee.
0xTXN…2 - Transfer of BTC to another wallet.
To handle this properly, the recommended approach is to add an index to each amount associated with the transaction, message, or event.
This produces a long format: all token quantities are stored in a single column, with the related qualitative features stored in separate columns. Users can then verify the completeness and accuracy of activities by summing the quantity field across records and comparing the result with the change in account state between the beginning and the end of each period.
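A minimal sketch of such a long-format table is shown below, using the swap and transfer from the example. The column names, the transaction labels, the 25 BTC transfer amount, and the opening and closing balances are all assumptions made for illustration. Fees are left out to keep the arithmetic short.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LedgerRow:
    tx_id: str         # transaction identifier
    index: int         # position of this amount within the transaction
    account: str
    asset: str
    quantity: Decimal  # signed: credits positive, debits negative


rows = [
    LedgerRow("tx-1", 0, "owner", "BTC", Decimal("-1000")),  # swap, BTC leg
    LedgerRow("tx-1", 1, "owner", "ETH", Decimal("15000")),  # swap, ETH leg
    LedgerRow("tx-2", 0, "owner", "BTC", Decimal("-25")),    # plain transfer
]


def net_change(rows, account, asset):
    """Sum of the quantity column for one account and asset."""
    return sum(
        (r.quantity for r in rows if r.account == account and r.asset == asset),
        Decimal("0"),
    )


def is_complete(rows, account, asset, opening, closing):
    """True when the recorded rows explain the whole change in balance."""
    return opening + net_change(rows, account, asset) == closing


print(is_complete(rows, "owner", "ETH", Decimal("0"), Decimal("15000")))  # True

# A sub-ledger that keeps only the first entry per transaction loses the ETH leg.
first_only = [r for r in rows if r.index == 0]
print(is_complete(first_only, "owner", "ETH", Decimal("0"), Decimal("15000")))  # False
```

Because every amount carries its own index, the second leg of the swap is a separate row rather than an overwritten value, and the check flags the gap as soon as it is dropped.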
#7 - Data pruning
Many blockchain protocols today have adopted a data pruning model to optimize storage and processing speed. Below is a definition borrowed from a blog post on the official Monero website. Monero, a blockchain focused on private and censorship-resistant transactions, had a market capitalization of $3.8 billion as of March 18, 2025:
“Pruning is the process of removing non-critical blockchain information from local storage. Full nodes keep an entire copy of everything that is stored on the blockchain, including data that is not very useful anymore. Pruned nodes remove much of this less relevant information to have a lighter footprint.”
[Monero]
Under the pruning model, pruned nodes and light clients store only a limited portion of the chain, while archival nodes retain the full transaction history. Since each block header contains the hash of the previous block, the integrity of the chain is preserved, but access to historical data becomes more limited.
When importing blockchain data into a crypto sub-ledger, the system will typically prioritize faster pruned nodes over slower archival nodes. As a result, older transactions may not be imported, leading to potential gaps in historical data.
Solution: To mitigate data loss from pruned nodes, ensure your sub-ledger connects to an archive node that maintains a full transaction history. This can typically be done by selecting a provider like Infura or Alchemy or by running your own archival node.
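One way to check whether an EVM endpoint serves historical state is to ask it for a balance at an old block. The sketch below only builds the JSON-RPC request and interprets the reply, so the HTTP call itself is left to the reader. `eth_getBalance` is a standard Ethereum JSON-RPC method, while the helper names and the sample error text are illustrative assumptions.

```python
def balance_probe(address: str, block_number: int, request_id: int = 1) -> dict:
    """JSON-RPC payload asking for a balance at a historical block."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBalance",
        "params": [address, hex(block_number)],
    }


def serves_history(response: dict) -> bool:
    """True when the node answered with a balance rather than an error."""
    return "result" in response and "error" not in response


# Illustrative replies: an archive node returns a value, a pruned node
# typically returns an error because the old state is no longer stored.
archive_reply = {"jsonrpc": "2.0", "id": 1, "result": "0x0"}
pruned_reply = {
    "jsonrpc": "2.0",
    "id": 1,
    "error": {"code": -32000, "message": "historical state unavailable"},
}

print(serves_history(archive_reply))  # True
print(serves_history(pruned_reply))   # False
```

Running such a probe against an early block before an import gives a quick signal of whether the endpoint can support a full-history backfill.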
#6 - Lazy Blockchains
Certain blockchain data sources include both successful and failed transactions. This happens on so-called “lazy blockchains”, where transaction validation is decoupled from protocol consensus to favor the use of light clients while leaving the validation of transactions to full nodes. This approach is widespread among Cosmos ecosystem blockchains (e.g., Celestia) and results in “dirty ledgers” that contain both successful and failed transactions.
One might conclude that failed transactions can simply be excluded from the ledger. However, both failed and successful transactions require a gas payment to be included in a block, so a failed transaction still changes the account balance by the amount of gas it consumed. Gas fees on this type of blockchain are usually not material, but for companies with a high volume of transactions the cumulative amount of fees may become material.
Solution: Use detailed audit logs in the sub-ledger to identify any hidden, ignored, or deleted transactions.
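The effect of dropping failed transactions can be shown with a few records. In the sketch below the record layout is a simplified assumption, not a real API payload: `code` follows the Cosmos convention that 0 means success, and all three transactions are sent by the same account.

```python
from decimal import Decimal

# Hypothetical, simplified records for one sending account.
txs = [
    {"hash": "A1", "code": 0, "fee": Decimal("0.002"), "amount": Decimal("-10")},
    {"hash": "B2", "code": 5, "fee": Decimal("0.002"), "amount": Decimal("-3")},
    {"hash": "C3", "code": 0, "fee": Decimal("0.001"), "amount": Decimal("-4")},
]


def balance_delta(tx: dict) -> Decimal:
    """The fee is always charged; the amount moves only on success."""
    moved = tx["amount"] if tx["code"] == 0 else Decimal("0")
    return moved - tx["fee"]


with_failed = sum((balance_delta(t) for t in txs), Decimal("0"))
without_failed = sum((balance_delta(t) for t in txs if t["code"] == 0), Decimal("0"))

print(with_failed)                   # -14.005
print(without_failed)                # -14.003
print(without_failed - with_failed)  # 0.002, the fee of the failed transaction
```

The difference between the two totals is exactly the fee paid on the failed transaction, which is the amount that goes unexplained when failed transactions are filtered out.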
#5 - Missing wallets
It is very common for startups to be unable to easily track and identify the complete list of wallets that their organization controls. As a result, previously unknown wallets are regularly discovered during audits of web3 companies.
Solution: Use my tracker for early-stage startups in Notion here.
#4 - Centralized exchange API
CEX APIs are often unreliable, providing incomplete or inconsistent data.
Limited access to historical transactions (e.g., Kraken’s past inconsistencies).
API payloads may omit key transaction details, making reconciliation difficult.
Solution: Regularly export and back up exchange data. Additionally, we are developing a comprehensive crypto exchange reconciliation module—stay tuned for updates.
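A backed-up export is most useful when it is compared against what the API returns. The sketch below assumes a CSV export with a `txid` column and a set of identifiers already fetched from the API. Both the column name and the sample data are hypothetical, since export layouts differ between exchanges.

```python
import csv
import io


def ids_from_export(csv_text: str, id_column: str = "txid") -> set[str]:
    """Collect transaction identifiers from an exported CSV statement."""
    return {row[id_column] for row in csv.DictReader(io.StringIO(csv_text))}


def reconcile(export_ids: set[str], api_ids: set[str]) -> dict[str, list[str]]:
    """List identifiers that appear on only one side."""
    return {
        "missing_from_api": sorted(export_ids - api_ids),
        "missing_from_export": sorted(api_ids - export_ids),
    }


# Hypothetical export and API identifiers.
export_csv = "txid,asset,amount\nT1,BTC,0.5\nT2,ETH,2\nT3,BTC,-0.1\n"
api_ids = {"T1", "T3", "T4"}

print(reconcile(ids_from_export(export_csv), api_ids))
# {'missing_from_api': ['T2'], 'missing_from_export': ['T4']}
```

Anything listed on either side is a candidate for manual follow-up before the period is closed.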
#3 - Module accounts
Let’s take a look at two ecosystems: Cosmos and Solana.
Cosmos
The user’s unrestricted asset balance consists of five components:
Available,
Delegatable Vesting,
Delegated,
Unbonding, and
Reward.
Only Available assets are tracked inside the bank module of the blockchain and represent the actual account balance.
When tokens are staked, unbonded, or unstaked, they are transferred to the bonded and/or unbonded pool accounts managed by the staking program module.
In return, the program module tracks the Delegatable Vesting, Delegated, Unbonding, and Reward balances of wallets that staked their assets.
These records of the staking program module are what users see in Cosmos wallets, even though the assets are technically held outside of user wallets.
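In code, the five components listed above could be combined as follows. This is a sketch with made-up amounts. The field names mirror the list in this section rather than any particular API, and how each figure is retrieved is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CosmosBalances:
    available: int            # the only component tracked by the bank module
    delegatable_vesting: int
    delegated: int
    unbonding: int
    reward: int

    def total(self) -> int:
        """Sum of all five components."""
        return (self.available + self.delegatable_vesting + self.delegated
                + self.unbonding + self.reward)


# Hypothetical position, in the chain's base unit.
position = CosmosBalances(
    available=1_200, delegatable_vesting=0, delegated=50_000,
    unbonding=3_000, reward=175,
)

print(position.total())                       # 54375
print(position.total() - position.available)  # 53175 not visible in the bank module
```

A sub-ledger that imports only the bank module balance would report 1,200 here and miss the rest of the position.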
Solana
Solana's account model separates functionality across different accounts instead of using a single user account as on Ethereum. Having multiple accounts associated with the wallet address is further complicated by multiple types of balances on these accounts. For example, to properly include all account funds, we need to add the active and inactive balances of the stake account and the balance of the vote account.
Similarly to Cosmos, Solana’s token balances may belong to a program or other system account rather than directly to the owner’s identity account.
Solution: See more resources on the Solana account model recently published by the CryptoCFO community here. One tool to highlight is Old Faithful.
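The addition described for Solana can be sketched in a few lines. The function below is illustrative only: it assumes the balances have already been fetched in lamports, and it takes each stake account as an (active, inactive) pair.

```python
def solana_total_lamports(
    wallet: int,
    stake_accounts: list[tuple[int, int]],
    vote_account: int = 0,
) -> int:
    """Wallet balance plus active and inactive stake plus the vote account."""
    staked = sum(active + inactive for active, inactive in stake_accounts)
    return wallet + staked + vote_account


# Hypothetical balances in lamports.
total = solana_total_lamports(
    wallet=2_000_000_000,
    stake_accounts=[(10_000_000_000, 500_000_000)],
    vote_account=30_000_000,
)
print(total)  # 12530000000
```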
#2 - Block header events
Cosmos blockchains handle events differently from transactions, requiring additional tracking mechanisms. Unlike regular transactions, certain state changes occur via block header events, which must be accounted for separately.
For block header events (ResponseBeginBlock and ResponseEndBlock types), the account state may change without any corresponding transaction being recorded in the block.
This is why it is important to identify and account for these events separately from, and in addition to, transactions. To solve this issue, however, users typically have to implement their own custom indexing solutions that track direct changes in account states rather than simply pulling transactional data from a public node API.
Solution: To track these state changes, you may need a custom indexing solution like SubQuery, which allows querying blockchain events beyond simple transactions.
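To give a sense of what such an indexer has to do, the sketch below collects credits emitted in block-level events. It assumes a payload shaped like a node's block results, with `begin_block_events` and `end_block_events` lists, and `coin_received` events carrying plain-text `receiver` and `amount` attributes. Key names and encodings vary between node versions, so treat all of these as assumptions. The address and amounts are made up.

```python
import re
from collections import defaultdict

# Coin strings such as "1500uatom": an integer amount followed by a denom.
COIN = re.compile(r"^(\d+)([a-zA-Z][a-zA-Z0-9/:._-]*)$")


def credits_from_block_events(block_results: dict) -> dict:
    """Sum coins received per (address, denom) in begin/end block events."""
    credits = defaultdict(int)
    for phase in ("begin_block_events", "end_block_events"):
        for event in block_results.get(phase) or []:
            if event.get("type") != "coin_received":
                continue
            attrs = {a["key"]: a["value"] for a in event.get("attributes", [])}
            for coin in attrs.get("amount", "").split(","):
                match = COIN.match(coin.strip())
                if match:
                    key = (attrs.get("receiver"), match.group(2))
                    credits[key] += int(match.group(1))
    return dict(credits)


# Hypothetical block results fragment.
sample_block = {
    "begin_block_events": [
        {"type": "coin_received", "attributes": [
            {"key": "receiver", "value": "cosmos1example"},
            {"key": "amount", "value": "1500uatom"},
        ]},
        {"type": "mint", "attributes": [{"key": "amount", "value": "9"}]},
    ],
    "end_block_events": [
        {"type": "coin_received", "attributes": [
            {"key": "receiver", "value": "cosmos1example"},
            {"key": "amount", "value": "200uatom,5ibc/ABC"},
        ]},
    ],
}

print(credits_from_block_events(sample_block))
# {('cosmos1example', 'uatom'): 1700, ('cosmos1example', 'ibc/ABC'): 5}
```

None of these credits would appear in a transaction list for the address, which is why they need their own ingestion path.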
#1 - Non-transactional debits and credits
The history of blockchains includes many unusual situations and events. Some of them resulted in balance adjustments that were recorded without any accompanying transactions. Some examples are discussed below. One good example is found in the Coinmetrics documentation:
“Even though the overwhelming majority of debits and credits take place within a transaction, some protocols have balance updates that occur outside of transactions (for example, Ethereum blocks rewards are credited implicitly, outside of any transaction). There are also unusual circumstances where a block may carry additional credits and debits so that the ledger can be accurately balanced. For example, the irregular ledger update following Ethereum's notorious DAO hack required us to append additional credits and debits to that block in order for the irregular ledger change to be accounted for.”
Examples:
Staking Rewards
High-volume allocations to network participants are often recorded without separate transactions that would carry an identifier. Instead of being included in the block as transactions, these allocations are recorded directly at the protocol level, as changes in the state of the recipients’ addresses. This is typical for allocations that have a large number of recipients or occur frequently.
Governance Income
This is also an important feature of protocol distributions on blockchains with built-in asset management functionality, which allows tokens to be distributed automatically based on accepted governance proposals. Such distributions originate from submitted governance proposal transactions that may or may not be approved. When the submitted proposal is approved, the recipient’s account state changes; when it is denied, no change in the recipient’s account state occurs. In either case, the outcome is applied without an additional transaction being included in any of the subsequent blocks.
Vesting events
Also, it is important to remember that the complexity of transactions may lie in the impact of the transaction. Some transactions may change the qualities of tokens owned, but they do not actually change the owner. As an example, vesting tranche events are produced based on the vesting schedule of restricted accounts on Avalanche. Vested tokens are reflected as deposits, although tokens are not transferred between the accounts. In such a scenario, only transaction fees (if any) would affect the account state.
Solution: See #2. The same custom indexing of account state changes applies here.
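When no indexer is available, non-transactional credits and debits can at least be detected from balance snapshots. The sketch below compares successive snapshots with the net transactional change recorded for each period and reports whatever remains unexplained. The period labels, the amounts, and the function name are hypothetical.

```python
def implied_adjustments(
    snapshots: list[tuple[str, int]],
    tx_deltas: dict[str, int],
) -> dict[str, int]:
    """Balance change per period that recorded transactions do not explain.

    snapshots: (period, closing balance) pairs in chronological order.
    tx_deltas: net transactional change recorded for each period.
    """
    unexplained = {}
    for (_, previous), (period, current) in zip(snapshots, snapshots[1:]):
        gap = current - previous - tx_deltas.get(period, 0)
        if gap:
            unexplained[period] = gap
    return unexplained


# Hypothetical daily snapshots and recorded transactions.
snapshots = [("d0", 1000), ("d1", 1005), ("d2", 905), ("d3", 905)]
tx_deltas = {"d2": -102}

print(implied_adjustments(snapshots, tx_deltas))  # {'d1': 5, 'd2': 2}
```

Each reported amount then needs a source, such as a staking reward, a governance distribution, or a vesting event, before it is booked.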
Afterthoughts
The Cryptoworth team is committed to helping its clients tackle the challenges outlined in this article. Its strength lies in its ability to aggregate and process data from a vast array of sources and in the dedicated team working hard to solve every new issue the industry brings. It currently covers over 200 blockchains, numerous DeFi protocols, and centralized exchanges, providing a more complete picture than many alternatives. In particular, it’s useful when handling internal transactions on Ethereum. Its detailed transaction logs and flexible categorization also make it easier to audit issues.

That said, the complexities I’ve described—ranging from genesis allocations to non-transactional state changes—are still very much prevalent, even with a tool like Cryptoworth. As new token behaviors emerge in the space—think novel staking mechanisms, cross-chain interactions, or zero-knowledge roll-up transactions—these introduce fresh layers of complexity that demand continuous adaptation. For now, it’s my sub-ledger of choice.
Ensuring the completeness of blockchain data in sub-ledgers remains a challenge, requiring both deep technical knowledge and robust tools. We’ve examined key discrepancies, from internal transactions to missing wallets, and provided practical solutions. As blockchain technology evolves, new complexities will emerge, necessitating continuous adaptation. Cryptoworth, with its ability to aggregate data across 200+ blockchains, remains a strong option for tackling these challenges.