
Parallel Execution

Kaizen Core implements a high-performance parallel execution pipeline to maximize throughput while maintaining determinism.

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                    Parallel Execution Pipeline                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌────────────┐    ┌───────────────┐    ┌──────────────────┐   │
│   │  RPC/WS    │───▶│  Verification │───▶│    Execution     │   │
│   │  Ingress   │    │  (rayon)      │    │    Scheduler     │   │
│   └────────────┘    └───────────────┘    └──────────────────┘   │
│                              │                    │             │
│                        Parallel Sig         Parallel Batch      │
│                          Recovery             Execution         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Key Components

1. Parallel Signature Verification

Signature recovery is CPU-bound and embarrassingly parallel. The VerificationPipeline uses rayon to parallelize across all available CPU cores:

// Parallel signature verification (par_iter comes from rayon's prelude)
use rayon::prelude::*;

let verified: Vec<VerifiedTx> = txs
    .par_iter()
    .filter_map(|(tx, bytes)| {
        // Keep only transactions whose recovered signer matches the declared sender
        auth.recover_signer(tx).ok()
            .filter(|&signer| signer == tx.from)
            .map(|signer| VerifiedTx { tx, signer, bytes })
    })
    .collect();
Performance:

Batch Size   Parallel    Sequential   Speedup
100 txs      229K TPS    43K TPS      5.3x
500 txs      308K TPS    36K TPS      8.6x
1000 txs     318K TPS    40K TPS      8.0x
2000 txs     317K TPS    35K TPS      9.0x

2. Aggregate-Based Scheduling

The ExecutionScheduler groups transactions into non-conflicting batches using the AggregateAccess trait:

pub trait AggregateAccess {
    fn required_aggregates(&self) -> HashSet<AggregateId>;
}

Transactions touching disjoint state can execute in parallel. The scheduler:

  1. Analyzes read/write sets of each transaction
  2. Groups non-conflicting transactions into batches
  3. Executes the transactions within each batch in parallel, and the batches themselves sequentially, one after another (a minimal grouping sketch follows below)
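
For illustration, a first-fit grouping over the AggregateAccess trait might look like the sketch below. The schedule_batches helper and the u64 AggregateId alias are simplifications for this example, not the actual ExecutionScheduler implementation:

use std::collections::HashSet;

// Simplified stand-ins for illustration; the real types live in Kaizen Core.
type AggregateId = u64;

trait AggregateAccess {
    fn required_aggregates(&self) -> HashSet<AggregateId>;
}

// First-fit grouping: a transaction joins the first batch whose claimed
// aggregates are disjoint from its own; otherwise it opens a new batch.
// Input order drives the result, so the same transactions always produce
// the same batches.
fn schedule_batches<T: AggregateAccess>(txs: Vec<T>) -> Vec<Vec<T>> {
    let mut used_sets: Vec<HashSet<AggregateId>> = Vec::new();
    let mut batches: Vec<Vec<T>> = Vec::new();

    for tx in txs {
        let needed = tx.required_aggregates();
        match used_sets.iter().position(|used| used.is_disjoint(&needed)) {
            Some(i) => {
                used_sets[i].extend(needed);
                batches[i].push(tx);
            }
            None => {
                used_sets.push(needed);
                batches.push(vec![tx]);
            }
        }
    }
    batches
}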

3. ParallelStateManager

The ParallelStateManager provides thread-safe state access without requiring &mut self:

// Thread-safe state access
let state = ParallelStateManager::from_storage(storage, read_version, write_version);

// Clones of `state` can be shared across threads; writes do not need &mut self
state.set_balance(addr, 1000)?;  // Thread-safe via DashMap

The underlying RocksDbStorage buffers pending updates in a DashMap, so worker threads get concurrent access without contending on a single global lock.
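
A rough sketch of that pattern, assuming a DashMap of pending balance updates keyed by address (the actual RocksDbStorage layout is more involved):

use dashmap::DashMap;

// Hypothetical simplification: pending balance updates buffered in a DashMap,
// so many worker threads can write through a shared reference (&self, not &mut).
struct PendingState {
    balances: DashMap<[u8; 20], u128>,
}

impl PendingState {
    fn set_balance(&self, addr: [u8; 20], amount: u128) {
        // DashMap shards its entries behind internal locks, so concurrent
        // writers on different shards do not block each other.
        self.balances.insert(addr, amount);
    }

    fn balance(&self, addr: &[u8; 20]) -> Option<u128> {
        self.balances.get(addr).map(|v| *v)
    }
}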

4. TrueParallelExecutor

The TrueParallelExecutor combines all components for true parallel execution:

let executor = TrueParallelExecutor::new(config);
let result = executor.execute_parallel(
    storage,
    verified_txs,
    read_version,
    write_version,
);
True Parallel Performance:

Batch Size   Parallel   Sequential   Speedup
100 txs      48K TPS    33K TPS      1.5x
500 txs      52K TPS    20K TPS      2.6x
1000 txs     62K TPS    33K TPS      1.9x
2000 txs     62K TPS    33K TPS      1.9x

Determinism Guarantee

Even with parallel execution, results are fully deterministic:

  1. Batching is deterministic - Same transactions produce same batches
  2. Within-batch order is deterministic - Results collected in original order
  3. State writes are deterministic - pending DashMap entries are sorted by key hash before the JMT commit (see the sketch below)
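
As a minimal sketch of the third point, assuming pending writes are buffered in a DashMap as described above (sorting by the raw key here stands in for the key-hash ordering applied before the JMT commit):

use dashmap::DashMap;

// Drain the pending writes and sort them before committing, so the commit
// order does not depend on DashMap's internal shard or iteration order.
fn ordered_writes(pending: &DashMap<Vec<u8>, Vec<u8>>) -> Vec<(Vec<u8>, Vec<u8>)> {
    let mut writes: Vec<(Vec<u8>, Vec<u8>)> = pending
        .iter()
        .map(|entry| (entry.key().clone(), entry.value().clone()))
        .collect();
    // Sorting gives every node the same write order regardless of which
    // thread produced which update.
    writes.sort_by(|a, b| a.0.cmp(&b.0));
    writes
}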

Tile Infrastructure

For specialized workloads, the Tile abstraction provides dedicated threads with optional CPU affinity:

let tile = Tile::new(
    "verification",
    Some(core_id),  // Pin to a specific CPU core
    |rx, tx| {
        for msg in rx {
            // Process each message on the dedicated thread;
            // results can be forwarded downstream via `tx`
        }
    },
);

Benefits:

  • Consistent latency (no context switching)
  • Cache locality
  • Isolation from other workloads
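
For comparison, a dedicated, core-pinned worker can be sketched with std::thread and the core_affinity crate. This is an illustrative stand-in, not how Tile is implemented internally:

use std::sync::mpsc;
use std::thread;

// Illustrative stand-in for a Tile: a named worker thread pinned to one core,
// fed through a channel. Uses the `core_affinity` crate for pinning.
fn spawn_pinned_worker(core_index: usize) -> mpsc::Sender<Vec<u8>> {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    thread::Builder::new()
        .name("verification".into())
        .spawn(move || {
            // Pin this thread to the requested core, if it exists.
            if let Some(ids) = core_affinity::get_core_ids() {
                if let Some(&id) = ids.get(core_index) {
                    core_affinity::set_for_current(id);
                }
            }
            // Process messages on the dedicated thread; the loop ends when
            // the sender side of the channel is dropped.
            for msg in rx {
                let _bytes = msg;
            }
        })
        .expect("failed to spawn worker thread");
    tx
}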

Configuration

let config = ExecutionConfig {
    enable_parallel_verification: true,
    enable_parallel_execution: true,
    verification_threads: 0,  // 0 = use all cores
    max_batch_size: 1000,
};
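
As an illustration of the 0 = use all cores convention, a hypothetical helper could map verification_threads to a rayon pool size like this (an assumption about the internals, not the actual wiring):

// Hypothetical helper: map the config value to a rayon thread-pool size,
// where 0 means "let rayon use all available cores".
fn build_verification_pool(verification_threads: usize) -> rayon::ThreadPool {
    let mut builder = rayon::ThreadPoolBuilder::new();
    if verification_threads > 0 {
        builder = builder.num_threads(verification_threads);
    }
    // With num_threads unset, rayon defaults to one thread per logical core.
    builder.build().expect("failed to build rayon thread pool")
}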

When Parallelism Helps

Scenario                    Parallelism   Notes
Independent transfers       High          Different senders = different state
Same sender, multiple TXs   None          Must execute sequentially
RFQ submissions             Moderate      Depends on user/solver overlap
Oracle updates              None          Single feeder per pair
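
Tying this back to the scheduler: a hypothetical transfer might declare its aggregates as shown below (type and field names are assumptions), which is why two transactions from the same sender always conflict while transfers between unrelated accounts can share a batch.

use std::collections::HashSet;

// Illustrative transfer type; the fields mirror the idea of a sender and
// recipient account, not Kaizen's actual transaction layout.
struct Transfer {
    from: u64, // sender account id
    to: u64,   // recipient account id
}

impl Transfer {
    // Both accounts are touched, so two transfers conflict whenever they
    // share a sender or a recipient; otherwise they can land in the same batch.
    fn required_aggregates(&self) -> HashSet<u64> {
        [self.from, self.to].into_iter().collect()
    }
}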

Benchmarks

Run benchmarks with:

cargo bench -p kaizen-engine --bench tps -- parallel

Key benchmark groups:

  • parallel_verification - Signature verification speedup
  • parallel_scheduler - Batch scheduling overhead
  • parallel_true_execution - True parallel execution
  • parallel_full_pipeline - End-to-end pipeline