Parallel Execution
Kaizen Core implements a high-performance parallel execution pipeline to maximize throughput while maintaining determinism.
Architecture Overview
```
┌─────────────────────────────────────────────────────────────────┐
│                   Parallel Execution Pipeline                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌────────────┐    ┌───────────────┐    ┌──────────────────┐    │
│  │  RPC/WS    │───▶│ Verification  │───▶│    Execution     │    │
│  │  Ingress   │    │    (rayon)    │    │    Scheduler     │    │
│  └────────────┘    └───────────────┘    └──────────────────┘    │
│                            │                     │              │
│                      Parallel Sig         Parallel Batch        │
│                        Recovery              Execution          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
Key Components
1. Parallel Signature Verification
Signature recovery is CPU-bound and embarrassingly parallel. The VerificationPipeline uses rayon to parallelize across all available CPU cores:
```rust
use rayon::prelude::*;

// Parallel signature verification: recover each signer on the rayon pool and
// keep only transactions whose recovered signer matches `tx.from`.
let verified: Vec<VerifiedTx> = txs
    .par_iter()
    .filter_map(|(tx, bytes)| {
        auth.recover_signer(tx)
            .ok()
            .filter(|&signer| signer == tx.from)
            .map(|signer| VerifiedTx { tx, signer, bytes })
    })
    .collect();
```

| Batch Size | Parallel | Sequential | Speedup |
|---|---|---|---|
| 100 txs | 229K TPS | 43K TPS | 5.3x |
| 500 txs | 308K TPS | 36K TPS | 8.6x |
| 1000 txs | 318K TPS | 40K TPS | 8.0x |
| 2000 txs | 317K TPS | 35K TPS | 9.0x |
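The same shape can be sketched with only the standard library, which makes the ordering behaviour easy to check without rayon. This sketch splits the batch into one contiguous chunk per available core (via `std::thread::available_parallelism`), drops transactions whose recovered signer does not match `from`, and joins the chunks in order so survivors keep their original position. `Tx`, `VerifiedTx`, and the XOR-based `recover_signer` are hypothetical stand-ins; real signature recovery is far more expensive, which is what makes the parallel speedup in the table above worthwhile.

```rust
use std::thread;

// Hypothetical transaction: `sig` stands in for a real signature.
#[derive(Clone, Debug)]
struct Tx {
    from: u64,
    sig: u64,
}

#[derive(Debug)]
struct VerifiedTx {
    tx: Tx,
    signer: u64,
}

// Hypothetical "recovery"; a real implementation would run ECDSA recovery here.
fn recover_signer(tx: &Tx) -> Result<u64, String> {
    if tx.sig == 0 {
        return Err("malformed signature".to_string());
    }
    Ok(tx.sig ^ 0xABCD)
}

// Verify `txs` on scoped threads, one contiguous chunk per worker.
// Chunks are joined in order, so the output preserves input order.
fn verify_parallel(txs: &[Tx]) -> Vec<VerifiedTx> {
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let chunk_size = ((txs.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = txs
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .filter_map(|tx| {
                            recover_signer(tx)
                                .ok()
                                .filter(|&signer| signer == tx.from)
                                .map(|signer| VerifiedTx { tx: tx.clone(), signer })
                        })
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let txs = vec![
        Tx { from: 1 ^ 0xABCD, sig: 1 }, // valid
        Tx { from: 42, sig: 7 },         // signer mismatch, dropped
        Tx { from: 0, sig: 0 },          // malformed, dropped
        Tx { from: 5 ^ 0xABCD, sig: 5 }, // valid
    ];
    let signers: Vec<u64> = verify_parallel(&txs).iter().map(|v| v.signer).collect();
    println!("{:?}", signers); // prints [43980, 43976]
}
```

Because each chunk is processed independently and results are concatenated by chunk index, thread timing never affects the output order.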
2. Aggregate-Based Scheduling
The ExecutionScheduler groups transactions into non-conflicting batches using the AggregateAccess trait:
```rust
use std::collections::HashSet;

/// Reports which state aggregates a transaction touches.
pub trait AggregateAccess {
    fn required_aggregates(&self) -> HashSet<AggregateId>;
}
```
Transactions touching disjoint state can execute in parallel. The scheduler:
- Analyzes the read/write set of each transaction (the aggregates it touches)
- Groups non-conflicting transactions into batches
- Executes transactions within a batch in parallel, and runs the batches one after another
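A minimal sketch of one way to turn `required_aggregates` into batches, assuming `AggregateId` is a plain integer and that transactions sharing an aggregate must keep their submission order. It re-declares the trait from above; `Tx` and `schedule` are hypothetical and not the actual `ExecutionScheduler`. Each transaction is placed in the batch right after the last batch that touched any of its aggregates (often called level scheduling).

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical aggregate identifier (e.g. an account or market ID).
type AggregateId = u64;

pub trait AggregateAccess {
    fn required_aggregates(&self) -> HashSet<AggregateId>;
}

// Hypothetical transaction that simply lists the aggregates it touches.
struct Tx {
    aggregates: Vec<AggregateId>,
}

impl AggregateAccess for Tx {
    fn required_aggregates(&self) -> HashSet<AggregateId> {
        self.aggregates.iter().copied().collect()
    }
}

// Returns transaction indices grouped into batches. Within a batch no two
// transactions share an aggregate; batches must run in order.
fn schedule<T: AggregateAccess>(txs: &[T]) -> Vec<Vec<usize>> {
    // For each aggregate, the earliest batch it may appear in next.
    let mut next_free: HashMap<AggregateId, usize> = HashMap::new();
    let mut batches: Vec<Vec<usize>> = Vec::new();
    for (i, tx) in txs.iter().enumerate() {
        let aggs = tx.required_aggregates();
        // Taking the max makes the result independent of HashSet iteration order.
        let level = aggs
            .iter()
            .map(|a| next_free.get(a).copied().unwrap_or(0))
            .max()
            .unwrap_or(0);
        if level == batches.len() {
            batches.push(Vec::new());
        }
        batches[level].push(i);
        for a in aggs {
            next_free.insert(a, level + 1);
        }
    }
    batches
}

fn main() {
    let txs = vec![
        Tx { aggregates: vec![1] },    // tx 0
        Tx { aggregates: vec![2] },    // tx 1: disjoint from tx 0
        Tx { aggregates: vec![1, 3] }, // tx 2: conflicts with tx 0 on 1
        Tx { aggregates: vec![3] },    // tx 3: conflicts with tx 2 on 3
        Tx { aggregates: vec![4] },    // tx 4: disjoint from everything
    ];
    println!("{:?}", schedule(&txs)); // prints [[0, 1, 4], [2], [3]]
}
```

A simpler first-fit rule (drop each transaction into the earliest batch with no conflict) would put tx 3 into batch 0, ahead of tx 2, even though both touch aggregate 3. Level scheduling avoids that reordering.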
3. ParallelStateManager
The ParallelStateManager provides thread-safe state access without requiring &mut self:
```rust
// Thread-safe state access
let state = ParallelStateManager::from_storage(storage, read_version, write_version);
// Can be cloned and used across threads
state.set_balance(addr, 1000)?; // Thread-safe via DashMap
```
The underlying RocksDbStorage uses DashMap for pending updates. DashMap splits its entries across independently locked shards, so concurrent writers only contend when they hit the same shard instead of serializing on one global lock.
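To show why no `&mut self` is needed, here is a std-only sketch of a sharded balance map: a simplified, `Mutex`-based take on DashMap's sharding. Keys are hashed onto a fixed number of shards, so `set_balance` takes `&self` and locks only one shard. `ShardedState`, the shard count, and the `u64`/`u128` types are assumptions for illustration; the real `ParallelStateManager` also handles storage and read/write versions, which this omits.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical shard count.
const SHARDS: usize = 16;

// Hypothetical sharded state: each shard is an independently locked HashMap.
struct ShardedState {
    shards: Vec<Mutex<HashMap<u64, u128>>>,
}

impl ShardedState {
    fn new() -> Self {
        Self { shards: (0..SHARDS).map(|_| Mutex::new(HashMap::new())).collect() }
    }

    // Pick the shard responsible for `addr`.
    fn shard(&self, addr: u64) -> &Mutex<HashMap<u64, u128>> {
        let mut h = DefaultHasher::new();
        addr.hash(&mut h);
        &self.shards[(h.finish() as usize) % SHARDS]
    }

    // Takes `&self`: only the target shard is locked.
    fn set_balance(&self, addr: u64, amount: u128) {
        self.shard(addr).lock().unwrap().insert(addr, amount);
    }

    fn balance(&self, addr: u64) -> Option<u128> {
        self.shard(addr).lock().unwrap().get(&addr).copied()
    }
}

fn main() {
    let state = Arc::new(ShardedState::new());
    let handles: Vec<_> = (0..4u64)
        .map(|t| {
            let state = Arc::clone(&state);
            thread::spawn(move || {
                for i in 0..100u64 {
                    state.set_balance(t * 100 + i, 1000);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("{:?}", state.balance(250)); // prints Some(1000)
}
```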
4. TrueParallelExecutor
The TrueParallelExecutor combines all components for true parallel execution:
```rust
let executor = TrueParallelExecutor::new(config);
let result = executor.execute_parallel(
    storage,
    verified_txs,
    read_version,
    write_version,
);
```

| Batch Size | Parallel | Sequential | Speedup |
|---|---|---|---|
| 100 txs | 48K TPS | 33K TPS | 1.5x |
| 500 txs | 52K TPS | 20K TPS | 2.6x |
| 1000 txs | 62K TPS | 33K TPS | 1.9x |
| 2000 txs | 62K TPS | 33K TPS | 1.9x |
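The execution model behind these numbers can be sketched as: batches run one after another, transactions inside a batch run on scoped threads, and each result is written to the slot of its original transaction index, so output order never depends on thread timing. `execute_batches`, `Transfer`, and the single `Mutex<HashMap>` state are hypothetical simplifications, not the `TrueParallelExecutor` API.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

// Hypothetical transfer; `from` and `to` are the aggregates it touches.
struct Transfer {
    from: u64,
    to: u64,
    amount: u64,
}

type State = Mutex<HashMap<u64, u64>>;

// Apply one transfer. A single lock keeps the sketch short; a real engine
// would use finer-grained state so disjoint transfers do not contend.
fn apply(state: &State, tx: &Transfer) -> Result<(), String> {
    let mut accounts = state.lock().unwrap();
    let balance = accounts.get(&tx.from).copied().unwrap_or(0);
    if balance < tx.amount {
        return Err(format!("insufficient balance for {}", tx.from));
    }
    accounts.insert(tx.from, balance - tx.amount);
    *accounts.entry(tx.to).or_insert(0) += tx.amount;
    Ok(())
}

// Batches run sequentially; transactions within a batch run concurrently
// (one thread each, for brevity). Results are stored by original index.
fn execute_batches(state: &State, txs: &[Transfer], batches: &[Vec<usize>]) -> Vec<Result<(), String>> {
    let mut results: Vec<Option<Result<(), String>>> = vec![None; txs.len()];
    for batch in batches {
        let outputs: Vec<(usize, Result<(), String>)> = thread::scope(|s| {
            let handles: Vec<_> = batch
                .iter()
                .map(|&i| s.spawn(move || (i, apply(state, &txs[i]))))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        });
        for (i, r) in outputs {
            results[i] = Some(r);
        }
    }
    results
        .into_iter()
        .map(|r| r.expect("every transaction appears in exactly one batch"))
        .collect()
}

fn main() {
    let state: State = Mutex::new(HashMap::from([(1, 100), (2, 50)]));
    let txs = vec![
        Transfer { from: 1, to: 4, amount: 30 }, // tx 0
        Transfer { from: 2, to: 5, amount: 80 }, // tx 1: sender only has 50
        Transfer { from: 1, to: 6, amount: 70 }, // tx 2: touches account 1, like tx 0
    ];
    // Batches as a conflict-aware scheduler could produce them: tx 2 waits for tx 0.
    let batches = vec![vec![0, 1], vec![2]];
    println!("{:?}", execute_batches(&state, &txs, &batches));
    // prints [Ok(()), Err("insufficient balance for 2"), Ok(())]
}
```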
Determinism Guarantee
Even with parallel execution, results are fully deterministic:
- Batching is deterministic: the same transactions always produce the same batches
- Within-batch order is deterministic: results are collected in original transaction order
- State writes are deterministic: pending DashMap entries are sorted by key hash before the JMT commit
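The last point matters because a concurrent hash map's iteration order depends on its internal layout and typically on a randomly seeded hasher, so it can differ between nodes. A minimal sketch of the fix, assuming pending writes are `(key, value)` byte pairs: sort by a fixed, seed-free hash of the key, breaking ties by the key bytes, before committing. FNV-1a is a stand-in chosen for brevity, not necessarily the hash Kaizen Core uses, and `commit_order` is a hypothetical name.

```rust
// Hypothetical pending write: (key, value).
type Write = (Vec<u8>, Vec<u8>);

// FNV-1a 64-bit: a fixed, seed-free hash, so every node computes the same value.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

// Order pending writes by (key hash, key) so the commit sequence is identical
// no matter what order the parallel workers produced them in.
fn commit_order(mut writes: Vec<Write>) -> Vec<Write> {
    writes.sort_by_cached_key(|(key, _)| (fnv1a(key), key.clone()));
    writes
}

fn main() {
    let run_a: Vec<Write> = vec![
        (b"alice".to_vec(), b"10".to_vec()),
        (b"bob".to_vec(), b"20".to_vec()),
        (b"carol".to_vec(), b"30".to_vec()),
    ];
    // Same writes, different arrival order (e.g. another thread interleaving).
    let mut run_b = run_a.clone();
    run_b.reverse();
    assert_eq!(commit_order(run_a), commit_order(run_b));
    println!("commit order is independent of arrival order");
}
```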
Tile Infrastructure
For specialized workloads, the Tile abstraction provides dedicated threads with optional CPU affinity:
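```rust
let tile = Tile::new(
    "verification",
    Some(core_id), // Pin to specific CPU core
    |rx, tx| {
        // Name the loop variable `item` so it does not shadow the `tx` parameter
        for item in rx {
            // Process `item` on the dedicated thread
        }
    },
);
```
Benefits: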
- More consistent latency (fewer context switches, no migration between cores)
- Better cache locality, since the thread's working set stays on one core
- Isolation from other workloads
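A std-only sketch of the dedicated-thread part of a tile: one named OS thread that owns its input channel and forwards results on an output channel. CPU pinning is omitted because the standard library has no affinity API; it is commonly done with a crate such as `core_affinity` or, on Linux, `sched_setaffinity`. `SketchTile` and `SketchTile::spawn` are hypothetical and do not mirror the real `Tile::new` signature.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

// Hypothetical tile: a named, dedicated thread with its own input channel.
struct SketchTile<In> {
    input: Sender<In>,
    handle: JoinHandle<()>,
}

impl<In: Send + 'static> SketchTile<In> {
    fn spawn<F>(name: &str, body: F) -> Self
    where
        F: FnOnce(Receiver<In>) + Send + 'static,
    {
        let (input, rx) = channel();
        let handle = thread::Builder::new()
            .name(name.to_string())
            // Pinning to a core would happen at the top of this closure.
            .spawn(move || body(rx))
            .expect("failed to spawn tile thread");
        Self { input, handle }
    }

    // Dropping the sender closes the channel, letting the tile's loop finish.
    fn shutdown(self) {
        drop(self.input);
        self.handle.join().unwrap();
    }
}

fn main() {
    let (out_tx, out_rx) = channel::<u64>();
    let tile = SketchTile::spawn("verification", move |rx: Receiver<u64>| {
        for item in rx {
            // Stand-in for real work done on the dedicated thread.
            out_tx.send(item * 2).unwrap();
        }
    });
    for i in 1..=3 {
        tile.input.send(i).unwrap();
    }
    tile.shutdown();
    let results: Vec<u64> = out_rx.iter().collect();
    println!("{:?}", results); // prints [2, 4, 6]
}
```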
Configuration
```rust
let config = ExecutionConfig {
    enable_parallel_verification: true,
    enable_parallel_execution: true,
    verification_threads: 0, // 0 = use all cores
    max_batch_size: 1000,
};
```
When Parallelism Helps
| Scenario | Parallelism | Notes |
|---|---|---|
| Independent transfers | High | Different senders = different state |
| Same sender, multiple TXs | None | Must execute sequentially |
| RFQ submissions | Moderate | Depends on user/solver overlap |
| Oracle updates | None | Single feeder per pair |
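The "None" rows follow from a simple lower bound: if k transactions touch the same aggregate (the same sender account, or the same oracle pair), they need at least k sequential batches no matter how many cores are available. A short sketch of that bound, assuming each transaction is described by a list of hypothetical integer aggregate IDs; `min_batches` is an illustrative name.

```rust
use std::collections::{HashMap, HashSet};

// Lower bound on the number of sequential batches: the largest number of
// transactions that touch any single aggregate. Each inner Vec lists one
// transaction's (hypothetical) aggregate IDs.
fn min_batches(txs: &[Vec<u64>]) -> usize {
    let mut counts: HashMap<u64, usize> = HashMap::new();
    for aggs in txs {
        // Count each aggregate at most once per transaction.
        let unique: HashSet<u64> = aggs.iter().copied().collect();
        for a in unique {
            *counts.entry(a).or_insert(0) += 1;
        }
    }
    let busiest = counts.values().copied().max().unwrap_or(0);
    // Any non-empty set of transactions needs at least one batch.
    if txs.is_empty() { 0 } else { busiest.max(1) }
}

fn main() {
    // Four transfers from independent senders to independent recipients.
    let independent: Vec<Vec<u64>> = vec![vec![1, 11], vec![2, 12], vec![3, 13], vec![4, 14]];
    // Four transactions from sender 1.
    let same_sender: Vec<Vec<u64>> = vec![vec![1, 11], vec![1, 12], vec![1, 13], vec![1, 14]];
    println!("{} {}", min_batches(&independent), min_batches(&same_sender)); // prints "1 4"
}
```

This is only a lower bound: the actual batch count can be higher when conflicts chain across different aggregates (for example A, then A+B, then B).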
Benchmarks
Run benchmarks with:
```shell
cargo bench -p kaizen-engine --bench tps -- parallel
```
Key benchmark groups:
- `parallel_verification`: signature verification speedup
- `parallel_scheduler`: batch scheduling overhead
- `parallel_true_execution`: true parallel execution
- `parallel_full_pipeline`: end-to-end pipeline
