Clear Standards, Clear Future: Why Verifiable Performance Benchmarks Are Critical For Our Industry


Aptos Foundation is putting forth the first fully reproducible performance benchmark test to kick off an industry-wide conversation about how to define ‘performance’, with the aim of reaching a shared definition.


  • Because Web3 lacks transparency and accountability, there is an urgent need for verifiable, real-world performance benchmarks.

  • The Aptos Foundation has developed the first benchmark in its reproducible benchmark suite and put forth definitions of key terms, such as ‘transaction’ and ‘peak performance’, so that performance can be measured properly.

  • Aptos recognizes the need for ongoing community input to establish the most accurate standards, and encourages it.

  • This technical breakdown dives into how Block-STM and Quorum Store are driving Aptos Network’s performance and why horizontal scaling is Aptos’ future.

The industry needs clear standards

As the Web3 landscape grows rapidly, assessing performance across numerous blockchains and scaling solutions poses new challenges for developers and users alike – namely, determining which performance claims are credible in a burgeoning industry.

Trust is built through transparency and accountability. Yet no standardized method of evaluation exists today, which creates an urgent need for verifiable, real-world benchmarks.

Aptos Foundation is putting forth the first fully reproducible performance benchmark test to kick off an industry-wide conversation about how to define ‘performance’, with the aim of reaching a shared definition. Over time, Aptos’ goal is to work with our peers to evolve this framework into the blockchain equivalent of the TPC (Transaction Processing Performance Council) benchmarks used to evaluate database systems.

Before diving into the details of the test, we first need to define the standard unit of blockchain performance: the transaction.

What is a Transaction?

‘Transactions per second’ (TPS) is a metric commonly used to evaluate the performance and scalability of a blockchain network. However, TPS is hard to compare across networks because each network defines a transaction differently, which makes it challenging to assess performance directly.

At Aptos, we believe that a transaction is a user-signed sequence of one or more operations that together form a single logical unit of work. For example, 100 actions signed once by a user count as one transaction. Transactions that the system adds automatically for its correct operation, such as votes, metadata, or other artifacts, are system transactions, not user transactions. Transactions may be short and simple or long and complex, which is why benchmarks that capture common patterns are essential to understanding the throughput of different use cases.
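To make the counting rule concrete, here is a minimal Python sketch, assuming each transaction is tagged as either user-signed or system-generated. The `Origin`, `Transaction`, and `user_tps` names are hypothetical illustrations, not part of any Aptos API; the point is only that the operations inside one signed transaction, and any system transactions, do not inflate the count.

```python
from dataclasses import dataclass, field
from enum import Enum


class Origin(Enum):
    USER = "user"      # signed by a user
    SYSTEM = "system"  # added by the protocol itself (votes, metadata, ...)


@dataclass
class Transaction:
    origin: Origin
    # A single signature covers every operation in this list.
    operations: list[str] = field(default_factory=list)


def user_tps(transactions: list[Transaction], seconds: float) -> float:
    """Count user transactions only: operations inside a transaction
    and system transactions do not add to the total."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    user_count = sum(1 for tx in transactions if tx.origin is Origin.USER)
    return user_count / seconds


batch = [
    Transaction(Origin.USER, ["transfer"] * 100),  # 100 actions, signed once: 1 transaction
    Transaction(Origin.USER, ["transfer"]),
    Transaction(Origin.SYSTEM, ["vote"]),          # system transaction: not counted
]
print(user_tps(batch, 1.0))  # prints 2.0
```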

Building on this definition, one key measure of a blockchain’s success is its ‘peak sustained performance’: the highest throughput the network can maintain without downtime. So, how do you ensure the network is credibly measuring this sustained throughput? We were tasked with conducting a benchmark test, in collaboration with a legacy Web2 company, to assess throughput.
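One possible way to make ‘peak sustained performance’ precise is sketched below, assuming we have per-second counts of committed user transactions. The window length and the rule that any second with zero commits counts as downtime are our assumptions for illustration, not part of the published benchmark, and `peak_sustained_tps` is a hypothetical helper.

```python
def peak_sustained_tps(per_second: list[int], window: int) -> float:
    """Highest average TPS over any `window` consecutive seconds in which the
    network committed at least one transaction every second. Windows that
    contain a zero-commit second are treated as downtime and skipped."""
    if window <= 0 or window > len(per_second):
        raise ValueError("window must be between 1 and len(per_second)")
    running = sum(per_second[:window])                 # sum of the current window
    zeros = sum(1 for c in per_second[:window] if c == 0)  # downtime seconds in window
    best = 0.0
    for start in range(len(per_second) - window + 1):
        if start > 0:
            # Slide the window one second to the right.
            leaving = per_second[start - 1]
            entering = per_second[start + window - 1]
            running += entering - leaving
            zeros += int(entering == 0) - int(leaving == 0)
        if zeros == 0:
            best = max(best, running / window)
    return best


samples = [20_000, 21_000, 0, 25_000, 24_000, 23_000]
print(peak_sustained_tps(samples, 3))  # prints 24000.0
```

A short burst that includes an outage never qualifies, so a network cannot report a high number by averaging over a period in which it stalled.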

Presenting the First Fully Reproducible Performance Benchmark Test 

For an end-to-end, real-world evaluation, we deployed a full network of nodes, each running the complete node stack, and modeled it closely on the mainnet network.

The network consisted of 100 `t2d-standard-48` nodes on GCP, each with 48 vCPUs and 64 GB of memory, physically spread across three regions (US, Asia, and Europe). Maximum egress bandwidth for these nodes is 10 Gbps, and round-trip latency between nodes ranges from 120 ms to 250 ms depending on the regions involved. This setup closely replicates mainnet conditions for network latency and reliability, and its 100 nodes test whether consensus scales to a meaningful number of participants. To generate load, we wrote a transaction load generator that creates coin transfer transactions between random pairs of accounts, drawn from a large working set so that the load contains a reasonable number of conflicts.
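The relationship between working-set size and conflicts can be illustrated with a small Python sketch. This is a hypothetical stand-in, not the published load generator: `CoinTransfer`, `generate_block`, and `conflict_rate` are illustrative names, and we assume two transfers conflict when they touch the same account, since that is what forces a parallel executor to serialize them.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CoinTransfer:
    sender: int
    receiver: int
    amount: int


def generate_block(rng: random.Random, num_accounts: int, block_size: int) -> list[CoinTransfer]:
    """Coin transfers between distinct, uniformly random pairs of accounts."""
    txns = []
    for _ in range(block_size):
        sender, receiver = rng.sample(range(num_accounts), 2)
        txns.append(CoinTransfer(sender, receiver, amount=1))
    return txns


def conflict_rate(txns: list[CoinTransfer]) -> float:
    """Fraction of transactions that touch an account already touched by an
    earlier transaction in the same block."""
    seen: set[int] = set()
    conflicts = 0
    for tx in txns:
        if tx.sender in seen or tx.receiver in seen:
            conflicts += 1
        seen.update((tx.sender, tx.receiver))
    return conflicts / len(txns) if txns else 0.0


rng = random.Random(42)  # fixed seed so runs are reproducible
for accounts in (100, 1_000_000):
    block = generate_block(rng, accounts, block_size=1_000)
    print(f"{accounts:>9} accounts: conflict rate {conflict_rate(block):.1%}")
```

A small working set makes almost every transfer conflict, while a very large one makes conflicts rare; a realistic benchmark sits between these extremes.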

To ensure that the benchmark is repeatable and can be run independently, we published a repository containing all the tools and runbooks needed to run it on GCP: everything required to set up machines and nodes, initialize the network, run the load test, and view the results. As we explore Web3 solutions for gaming, social, and other industries, Aptos aims to expand this generalized benchmark into a suite of benchmark tests that cover common industry use cases.

With this setup, we verified an end-to-end sustained throughput of 20k TPS for 30 minutes. Let’s take a look under the hood to see how a performant blockchain is designed and how these initial results set the stage for significant performance gains through horizontal scaling.
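For scale, the stated figures imply roughly 36 million committed transactions over the test window (simple arithmetic on the numbers above, not a separately reported result):

```latex
20{,}000\ \tfrac{\text{tx}}{\text{s}} \times 30\ \text{min} \times 60\ \tfrac{\text{s}}{\text{min}}
  = 20{,}000 \times 1{,}800\ \text{tx}
  = 3.6 \times 10^{7}\ \text{tx}
```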

How to Build a Performant Blockchain

A typical blockchain stack consists of three main components: consensus, execution, and storage.

  • The consensus layer receives incoming transactions, and is responsible for ensuring that all nodes within the network agree on a particular order of transactions.

  • The execution layer takes the current state and the incoming transactions in the order agreed upon by the consensus, and is responsible for processing smart contracts and executing transactions.

  • The storage layer is responsible for persisting all of the data associated with the blockchain, including the state of the ledger and any associated smart contract data. It provides the current state to the execution layer and updates the state based on the execution results.
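The division of labor among the three layers can be sketched as a toy pipeline in Python. Everything here is hypothetical and heavily simplified: `consensus_order` stands in for a real consensus protocol by sorting on a proposer-assigned sequence number, and `Storage` is an in-memory dictionary rather than a persisted ledger.

```python
from dataclasses import dataclass, field


@dataclass
class Storage:
    """Persists ledger state and serves the current state to execution."""
    state: dict[str, int] = field(default_factory=dict)

    def current_state(self) -> dict[str, int]:
        return dict(self.state)  # a copy, so execution cannot mutate storage directly

    def commit(self, new_state: dict[str, int]) -> None:
        self.state = new_state


def consensus_order(pending: list[tuple[int, str, str, int]]) -> list[tuple[str, str, int]]:
    """Stand-in for consensus: all nodes must agree on one order.
    Here 'agreement' is just sorting by a sequence number."""
    return [(sender, receiver, amount) for _, sender, receiver, amount in sorted(pending)]


def execute(state: dict[str, int], ordered: list[tuple[str, str, int]]) -> dict[str, int]:
    """Apply transfers in the agreed order; skip any with insufficient balance."""
    for sender, receiver, amount in ordered:
        if state.get(sender, 0) >= amount:
            state[sender] = state.get(sender, 0) - amount
            state[receiver] = state.get(receiver, 0) + amount
    return state


storage = Storage({"alice": 10, "bob": 0})
pending = [(2, "bob", "carol", 5), (1, "alice", "bob", 7)]  # arrival order != agreed order
storage.commit(execute(storage.current_state(), consensus_order(pending)))
print(storage.state)  # prints {'alice': 3, 'bob': 2, 'carol': 5}
```

The order matters: had Bob's transfer executed first, it would have been skipped for lack of funds, which is why every node must execute against the same agreed order.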

A high-performance blockchain that supports a large number of transactions per second at low latency requires every one of these components to scale, because the slowest layer bounds the throughput of the whole system.

As Aptos Foundation explored how to best tackle performance, we collaborated with the community on distinct and novel strategies for consensus, execution and storage:

  • Consensus: Quorum Store, Aptos’ implementation of Narwhal, decouples data dissemination from metadata ordering. This moves data dissemination off the critical path of consensus, making it very efficient and scalable. Developers are in the final stages of deploying Quorum Store to mainnet.

  • Execution: Block-STM, Aptos’ parallel execution engine, takes a novel approach that combines Software Transactional Memory (STM) with optimistic concurrency control. It executes transactions in parallel, validates them after execution, and re-executes them if needed.

  • Storage: Aptos’ storage uses a lock-free sparse Merkle tree implementation that combines persisted and in-memory components and is specifically tailored to work with Block-STM for caching and parallelization.
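The execute, validate, and re-execute idea behind Block-STM can be illustrated with a deliberately simplified Python sketch. This is not Block-STM itself: it has no multi-version memory and no collaborative scheduler, and the names `Transfer`, `run`, and `execute_block` are hypothetical. It only shows why optimistic parallel execution with in-order validation yields the same result as sequential execution. (Python threads illustrate the structure but will not speed up CPU-bound work.)

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

State = dict[str, int]


@dataclass(frozen=True)
class Transfer:
    sender: str
    receiver: str
    amount: int


def run(tx: Transfer, state: State) -> tuple[State, State]:
    """Execute one transfer against `state` without mutating it.
    Returns (reads, writes); the writes depend only on the values read."""
    reads = {k: state.get(k, 0) for k in (tx.sender, tx.receiver)}
    if tx.sender == tx.receiver or reads[tx.sender] < tx.amount:
        return reads, {}  # self-transfer or insufficient balance: no-op
    writes = {tx.sender: reads[tx.sender] - tx.amount,
              tx.receiver: reads[tx.receiver] + tx.amount}
    return reads, writes


def execute_block(txs: list[Transfer], state: State) -> tuple[State, int]:
    """Optimistically run all txs in parallel against the pre-block snapshot,
    then validate and commit in block order, re-executing any tx whose reads
    were changed by an earlier tx. Returns the final state and re-execution count."""
    snapshot = dict(state)
    with ThreadPoolExecutor() as pool:
        speculative = list(pool.map(lambda tx: run(tx, snapshot), txs))

    committed = dict(state)
    reexecutions = 0
    for tx, (reads, writes) in zip(txs, speculative):
        # Validation: did anything this tx read change since the snapshot?
        if any(committed.get(k, 0) != v for k, v in reads.items()):
            reads, writes = run(tx, committed)  # re-execute against committed state
            reexecutions += 1
        committed.update(writes)
    return committed, reexecutions


txs = [Transfer("alice", "bob", 5), Transfer("carol", "dave", 1), Transfer("bob", "erin", 8)]
state = {"alice": 10, "bob": 3, "carol": 1}
final, redo = execute_block(txs, state)
print(final, redo)  # prints {'alice': 5, 'bob': 0, 'carol': 0, 'dave': 1, 'erin': 8} 1
```

Only the third transfer, which reads Bob's balance after the first transfer changed it, is re-executed; the independent transfer between Carol and Dave keeps its speculative result.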

Looking ahead, as Aptos Network’s throughput continues to scale, our team has identified horizontal scaling as the most effective approach. Scaling out resources, much as cloud computing does, can bring orders of magnitude more resources into the system, which is crucial for Web3 to be adopted for mainstream use cases.

Scaling the storage layer is the most pressing priority: storage is currently the bottleneck, and it is also the component that can most easily be expanded to multiple disks or machines. There are also opportunities to horizontally scale Quorum Store-based consensus. As demonstrated in the Narwhal paper, when data dissemination is decoupled from metadata ordering, consensus can exceed 600k TPS with multiple worker machines. Lastly, Block-STM has made execution the fastest component of the Aptos stack, achieving 160k TPS on a single 32-core machine, but execution will eventually become a bottleneck as storage and consensus scale. To prepare, developers are exploring sharding as a way to horizontally scale execution across multiple machines.
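The reasoning about bottlenecks can be captured in a simple throughput model, sketched below in Python. All stage throughputs and the scaling-efficiency factor are hypothetical round numbers chosen for illustration, not measurements of the Aptos stack, and `pipeline_tps` and `scale_out` are illustrative helpers. The model assumes end-to-end throughput is bounded by the slowest stage and that adding machines to a stage raises its throughput roughly linearly, discounted by an efficiency factor.

```python
def pipeline_tps(stage_tps: dict[str, float]) -> tuple[str, float]:
    """End-to-end throughput is bounded by the slowest stage (the bottleneck)."""
    name = min(stage_tps, key=stage_tps.get)
    return name, stage_tps[name]


def scale_out(per_machine_tps: float, machines: int, efficiency: float = 1.0) -> float:
    """Idealized horizontal scaling: each extra machine adds `efficiency` times
    the first machine's throughput (1.0 means perfectly linear scaling)."""
    if machines < 1 or not 0 < efficiency <= 1:
        raise ValueError("need machines >= 1 and 0 < efficiency <= 1")
    return per_machine_tps * (1 + (machines - 1) * efficiency)


# Hypothetical numbers for illustration only.
stages = {"consensus": 60_000, "execution": 100_000, "storage": 25_000}
print(pipeline_tps(stages))  # prints ('storage', 25000)

scaled = {**stages, "storage": scale_out(25_000, machines=4, efficiency=0.75)}
print(pipeline_tps(scaled))  # prints ('consensus', 60000)
```

Once storage is scaled out, the bottleneck simply moves to the next-slowest stage, which is why the roadmap above treats storage, consensus, and execution as successive targets for horizontal scaling.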

Building the Future of Web3 Together

At the Aptos Foundation, we are dedicated to upholding the core Web3 tenet of transparency: being open about all aspects of blockchain design and network operations. This commitment was the driving force behind the development of this reproducible performance benchmark. The setup is the first of its kind, and we hope it can serve as a common reference point for conversations that lead to a shared understanding of performance.

The success of Web3 is dependent on collaboration and shared standards. We encourage the community to independently verify the results and look to our peers to challenge us to improve these methods as we work together to move the industry forward.