Changelog

All notable changes to Kaizen Core are documented here.

[Unreleased] - 2025-12-11

Added

Parallel Execution Pipeline

Added a high-performance parallel execution pipeline for improved transaction throughput.

Key Components:
  1. Parallel Signature Verification: Uses rayon to parallelize ECDSA signature recovery across all CPU cores.

  2. Aggregate-Based Scheduling: Groups non-conflicting transactions into parallel batches using the AggregateAccess trait.

  3. ParallelStateManager: Thread-safe state access via DashMap without requiring &mut self.

  4. TrueParallelExecutor: Combines all components for true parallel transaction execution.
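
To make the first component concrete, here is a minimal sketch of rayon-based batch signature recovery; the types and recover_signer are placeholders, not the actual engine API:

```rust
use rayon::prelude::*;

// Placeholder types standing in for the real engine types.
struct Transaction {
    signature: [u8; 65],
    payload: Vec<u8>,
}
struct Address([u8; 20]);

impl Transaction {
    // Stand-in for ECDSA public-key recovery; the real implementation
    // would call into a secp256k1 library.
    fn recover_signer(&self) -> Option<Address> {
        Some(Address([0u8; 20]))
    }
}

// par_iter() spreads recovery across all CPU cores via rayon's
// work-stealing thread pool.
fn verify_batch(txs: &[Transaction]) -> Vec<Option<Address>> {
    txs.par_iter().map(|tx| tx.recover_signer()).collect()
}
```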

Performance Results:
| Component | Batch Size | Parallel | Sequential | Speedup |
|---|---|---|---|---|
| Signature Verification | 1000 txs | 318K TPS | 40K TPS | 8.0x |
| True Parallel Execution | 1000 txs | 62K TPS | 33K TPS | 1.9x |
| Full Pipeline | 1000 txs | 91K TPS | 33K TPS | 2.8x |
New Exports:
  • kaizen_engine::parallel::* - Parallel execution infrastructure
  • kaizen_state::parallel::ParallelStateManager - Thread-safe state access
Files Changed:
  • crates/engine/src/parallel.rs - Parallel execution pipeline
  • crates/state/src/parallel.rs - Thread-safe state manager
  • crates/engine/benches/tps.rs - Added parallel execution benchmarks

Documentation: See Architecture / Parallel Execution for details.


[Previous] - 2025-12-08

Fixed

Settler Sync Error on Pruned Blocks

Fixed settler failing to sync when write-node has pruned historical blocks.

Root Cause: The EventStreamClient in the sidecar expected strictly sequential block heights. When the write-node had pruned blocks (e.g., keeping only the last 50K), the settler would request events from block 1 but receive events starting from the earliest available block (e.g., 28771), causing a height mismatch error.

Previous Behavior:
Height mismatch: expected 1, got 28771
Event stream error, reconnecting in 5s...
(repeat forever)

Solution: Changed event stream sync logic to accept monotonically increasing heights with gaps. Instead of requiring strict sequential heights, we now:

  • Accept any height that's greater than the last processed height
  • Log when blocks are skipped due to pruning
  • Continue syncing from wherever the server starts
// Before: Strict sequential check
if batch.height != expected_height {
    return Err("Height mismatch");
}
 
// After: Monotonic increasing check with gap tolerance
if batch.height <= last_height {
    return Err("Height not increasing");
}
// Gaps allowed, just log them
Files Changed:
  • crates/app/src/sync/sidecar.rs - Relaxed height check in EventStreamClient::event_loop()

Added

RocksDB Native Prometheus Metrics

Added export of RocksDB internal statistics as Prometheus gauges for better storage observability.

New Metrics:
| Metric | Description |
|---|---|
| kaizen_rocksdb_estimate_num_keys | Estimated number of keys in DB |
| kaizen_rocksdb_live_data_size_bytes | Size of live data |
| kaizen_rocksdb_sst_files_size_bytes | Total SST file size |
| kaizen_rocksdb_memtable_size_bytes | Current memtable size |
| kaizen_rocksdb_block_cache_usage_bytes | Block cache memory usage |
| kaizen_rocksdb_block_cache_pinned_bytes | Pinned block cache memory |
| kaizen_rocksdb_num_running_compactions | Active compaction jobs |
| kaizen_rocksdb_num_running_flushes | Active flush jobs |
| kaizen_rocksdb_pending_compaction_bytes | Bytes pending compaction |
Implementation:
  • Added export_metrics() method to RocksDbStorage
  • Periodic export via background task in Server (every 5 seconds when metrics enabled)
  • Uses RocksDB's property API to fetch internal statistics
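
As a rough sketch, the periodic export might look like the following; the property strings are standard RocksDB properties, while the gauge names follow the table above (the exact export_metrics() implementation may differ):

```rust
use rocksdb::DB;

// Read RocksDB internal properties and publish them as Prometheus
// gauges via the `metrics` facade.
fn export_rocksdb_metrics(db: &DB) {
    if let Ok(Some(n)) = db.property_int_value("rocksdb.estimate-num-keys") {
        metrics::gauge!("kaizen_rocksdb_estimate_num_keys").set(n as f64);
    }
    if let Ok(Some(n)) = db.property_int_value("rocksdb.num-running-compactions") {
        metrics::gauge!("kaizen_rocksdb_num_running_compactions").set(n as f64);
    }
}
```
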
Files Changed:
  • crates/state/src/rocksdb_storage.rs - Added export_metrics() method
  • crates/state/src/manager.rs - Added export_metrics() method
  • crates/app/src/server.rs - Added periodic metrics export task
  • crates/metrics/src/lib.rs - Added individual metric recording functions (record_rocksdb_num_keys(), record_rocksdb_live_data_size(), etc.)

JMT Cache Prometheus Metrics

Added metrics for JMT version cache performance monitoring.

New Metrics:
| Metric | Description |
|---|---|
| kaizen_jmt_cache_size | Number of entries in version cache |
| kaizen_jmt_cache_hits_total | Total cache hits |
| kaizen_jmt_cache_misses_total | Total cache misses |
Implementation:
  • Cache hits/misses recorded in JmtStorage::get_value_option()
  • Cache size recorded after each commit in RocksDbStorage::commit()
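
A simplified sketch of the hit/miss recording on the read path (cache and key types reduced for illustration; counter names from the table above):

```rust
use dashmap::DashMap;

// Consult the version cache and record the outcome before falling back
// to a disk read on a miss.
fn cached_version(cache: &DashMap<[u8; 32], u64>, key_hash: &[u8; 32]) -> Option<u64> {
    match cache.get(key_hash) {
        Some(v) => {
            metrics::counter!("kaizen_jmt_cache_hits_total").increment(1);
            Some(*v)
        }
        None => {
            metrics::counter!("kaizen_jmt_cache_misses_total").increment(1);
            None
        }
    }
}
```
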
Files Changed:
  • crates/state/src/jmt_storage.rs - Added hit/miss recording
  • crates/state/src/rocksdb_storage.rs - Added cache size recording
  • crates/metrics/src/lib.rs - Added cache metric functions

Grafana Dashboard: Storage Performance Panels

Added new panels to the Kaizen Performance dashboard for RocksDB and JMT cache monitoring.

New Sections:
  1. 💾 Storage Performance (RocksDB & JMT Cache)
    • JMT Cache Size (per node)
    • JMT Cache Hit Rate
    • RocksDB Database Size
  2. 🗄️ RocksDB Internals (Native Stats)
    • Block Cache & Memtable Usage
    • Running Compactions & Flushes
    • Pending Compaction & Live Data
Files Changed:
  • docker/monitoring/grafana/provisioning/dashboards/json/kaizen-performance.json - Added new panels

Performance

RocksDB Storage Optimizations

Applied comprehensive storage optimizations for improved throughput, lower latency, and reduced disk usage.

1. Column Family-Specific Options

Each column family now has optimized RocksDB options based on its access patterns:

| Column Family | Optimization |
|---|---|
| jmt_nodes | Bloom filter (10-bit), point lookup optimized |
| jmt_values | Prefix bloom (32-byte key_hash), seek optimized |
| stale_jmt_nodes | Lower memory, version prefix for range scans |
| blocks | ZSTD compression, larger blocks for sequential reads |
| block_hashes | Bloom filter for hash↔height lookups |
| meta | Bloom filter for point lookups |
2. JMT Version Cache

Added in-memory cache for latest version per key_hash to avoid disk seeks for current state reads.

// Fast path: cache hit for current state
jmt_version_cache: Arc<DashMap<KeyHash, VersionCacheEntry>>
  • Cache auto-populates on slow-path reads
  • Pruner invalidates cache entries after pruning
  • Zero impact on state consistency (read-only optimization)
3. Range Delete for Stale Node Pruning

Optimized stale JMT node pruning from O(n) individual deletes to O(1) range delete:

// Before: Individual deletes
for key in stale_indices { batch.delete_cf(cf_stale, key); }
 
// After: Single range delete
self.db.delete_range_cf(cf_stale, &start_key, &end_key)?;
4. WAL Tuning

New configuration options for Write-Ahead Log management:

[storage.rocksdb]
max_total_wal_size_mb = 1024  # Prevents unbounded WAL growth
wal_ttl_seconds = 3600        # Auto-delete old WAL files
5. Statistics for Monitoring

Optional RocksDB statistics collection for performance analysis:

[storage.rocksdb]
enable_statistics = true  # ~5-10% overhead

Access via storage.statistics_string() for block cache hit rates, compaction stats, bloom filter effectiveness.

New Configuration Options:
[storage.rocksdb]
write_buffer_size_mb = 128      # Write buffer per CF
max_write_buffer_number = 4     # Max write buffers before flush
block_cache_size_mb = 512       # Shared LRU block cache
max_background_jobs = 4         # Compaction/flush parallelism
enable_compression = true       # LZ4 for hot, ZSTD for cold
bloom_filter_bits = 10          # Bloom filter bits per key
max_total_wal_size_mb = 1024    # Max WAL size before recycling
wal_ttl_seconds = 0             # WAL file TTL (0 = disabled)
enable_statistics = false       # RocksDB internal stats
Presets:
  • RocksDbAppConfig::default() - Balanced for general use
  • RocksDbAppConfig::production() - High-performance (256MB buffers, 1GB cache, stats enabled)
Files Changed:
  • crates/state/src/rocksdb_options.rs - New RocksDB configuration module
  • crates/state/src/rocksdb_storage.rs - CF-specific options, JMT cache wiring
  • crates/state/src/jmt_storage.rs - Version cache support, fast-path reads
  • crates/state/src/pruner.rs - Range delete optimization, cache invalidation
  • crates/state/src/manager.rs - Added with_config() constructor
  • crates/state/src/types.rs - Added StorageConfig composite type
  • crates/app/src/config.rs - Exposed RocksDB options in app config
  • crates/app/src/server.rs - Use new storage config

State Consistency: All optimizations are internal implementation details. State roots remain identical across nodes. Verified with existing test suite (55 state tests, 52 engine tests passed).


Fixed

Tester: WebSocket Race Condition on Wallet Switch

Fixed "WebSocket not connected" error when switching to a different wallet.

Root Cause: When switching wallets, the mainWalletAddress changed and triggered the WebSocket subscription effect. However, the React state isWebSocketConnected was still true from the previous connection (hadn't propagated yet), while the actual client.ws instance was already null or disconnected. The guard check passed but subscribeUserTheses threw.

Solution: Added synchronous check on the client's actual WebSocket state (client.isWebSocketConnected) in addition to the React state check.

// Before: Only React state check
if (!client || !isWebSocketConnected || !mainWalletAddress || isMockMode) {
 
// After: Also check client's internal state
if (
  !client ||
  !isWebSocketConnected ||
  !client.isWebSocketConnected ||  // ← Catches race condition
  !mainWalletAddress ||
  isMockMode
) {
Files Changed:
  • apps/tester/src/hooks/use-thesis-sync.ts - Added client.isWebSocketConnected guard
  • apps/tester/src/hooks/use-price-stream.ts - Same defensive fix applied

Changed

Documentation Restructure

Reorganized docs for better agent-friendliness and task-oriented navigation.

New Structure:
docs/pages/
├── introduction/     ← What is Kaizen
├── api/              ← Quick lookup (NEW)
│   ├── rpc.mdx       ← JSON-RPC + WebSocket
│   ├── transactions.mdx
│   └── errors.mdx
├── sdk/              ← TypeScript SDK
├── deployment/       ← How to run (NEW)
│   ├── docker.mdx
│   ├── configuration.mdx
│   └── monitoring.mdx
├── architecture/     ← How it works (MERGED)
│   ├── overview.mdx
│   ├── stf.mdx
│   ├── block-production.mdx
│   ├── settlement.mdx
│   ├── oracle.mdx
│   └── storage.mdx
├── components/       ← Individual services
└── reference/        ← Misc reference
Key Changes:
| Before | After | Why |
|---|---|---|
| operations/ + advanced/ | deployment/ | Task-oriented |
| core-concepts/ + execution/ | architecture/ | Related content merged |
| reference/transactions.mdx | api/transactions.mdx | Better discoverability |
| API buried in operations/ | api/ section | Quick lookup |
Removed:
  • docs/pages/core-concepts/ - Merged into architecture/
  • docs/pages/execution/ - Merged into architecture/
  • docs/pages/operations/ - Split into api/ and deployment/
  • docs/pages/advanced/ - Moved to deployment/ and architecture/
  • docs/pages/components/tester.mdx - Removed from sidebar (demo app)
Files Changed:
  • docs/vocs.config.ts - New sidebar structure
  • docs/pages/api/* - New API reference section
  • docs/pages/deployment/* - New deployment section
  • docs/pages/architecture/* - Merged architecture section
  • Multiple cross-reference fixes across docs

README Cleanup

Simplified README.md to focus on quick start, pointing to docs for details.

Changes:
  • Fixed outdated binary name (kaizen-app → kaizen-node)
  • Updated project structure (added missing apps)
  • Simplified to ~125 lines (was ~400)
  • Added link to docs.miyao.ai

Fixed

Prometheus Histogram Metrics Export

Fixed histogram metrics being exported as summaries instead of proper histograms, causing histogram_quantile() queries to fail in Grafana.

Root Cause: The metrics-exporter-prometheus crate defaults to exporting histograms as summaries (with quantile labels). Grafana's histogram_quantile() function requires proper histogram format with _bucket suffix.

Solution: Explicitly configured PrometheusBuilder with histogram buckets:

const LATENCY_BUCKETS: &[f64] = &[
    0.0001, 0.0005, 0.001, 0.0025, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0,
];
 
PrometheusBuilder::new()
    .set_buckets(LATENCY_BUCKETS)
    .set_buckets_for_metric(Matcher::Suffix("tx_count"), COUNT_BUCKETS)
    .set_buckets_for_metric(Matcher::Suffix("_amount"), AMOUNT_BUCKETS)
Metrics Now Working:
  • kaizen_tx_execution_duration_seconds - TX execution latency
  • kaizen_tx_validation_duration_seconds - TX validation latency
  • kaizen_block_production_duration_seconds - Block production time
  • kaizen_block_tx_count - Transactions per block
  • kaizen_rfq_bet_amount / kaizen_rfq_payout - RFQ amounts
Files Changed:
  • crates/metrics/src/prometheus.rs - Added bucket configuration

RFQ Settlement Status Labels

Fixed inconsistent status labels for kaizen_rfq_settled_total metric between engine and settler.

Previous Behavior: Engine used debug format (SettledUserWin, SettledSolverWin) while settler used snake_case (user_wins, solver_wins).

Solution: Standardized both to use user_wins / solver_wins labels.

Files Changed:
  • crates/engine/src/lib.rs - Changed status label format to match settler

Grafana Dashboard Cleanup

Removed uninstrumented metric panels from dashboards and fixed broken queries.

Removed Panels (not instrumented in code):
| Dashboard | Removed |
|---|---|
| Overview | Active RFQs, Mempool, Sync Clients, Pending W/D, Uptime, Oracle metrics, Memory, DB Size |
| Debug | Mempool Deep Dive section, Storage Debug section |
| Compare | TX Latency comparisons, Memory comparisons, DB Size Growth |
| Business | Renamed "Won/Lost" to "User Wins/Solver Wins" |
Fixed Queries:
  • Sync Lag: Changed to scalar(kaizen_block_height) - kaizen_sync_height for correct label matching
  • Business dashboard: Updated status labels from won/lost to user_wins/solver_wins
Files Changed:
  • docker/monitoring/grafana/provisioning/dashboards/json/kaizen-overview.json
  • docker/monitoring/grafana/provisioning/dashboards/json/kaizen-debug.json
  • docker/monitoring/grafana/provisioning/dashboards/json/kaizen-compare.json
  • docker/monitoring/grafana/provisioning/dashboards/json/kaizen-business.json

Performance

State Layer Optimizations

Applied drop-in performance optimizations to the state layer for improved throughput and lower latency.

1. DashMap for Pending Updates

Replaced RwLock<HashMap> with lock-free DashMap for pending state updates.

// Before: Lock contention on every read/write
pending_updates: Arc<RwLock<HashMap<KeyHash, Option<Vec<u8>>>>>
 
// After: Lock-free concurrent access
pending_updates: Arc<DashMap<KeyHash, Option<Vec<u8>>>>

Impact: Eliminates lock contention during concurrent state operations within a block.

2. JMT Value Lookup O(1)

Optimized JMT value lookup from O(n) iteration to O(1) using RocksDB's reverse seek.

// Before: Iterate through ALL versions
for item in prefix_iterator { ... }  // O(n)
 
// After: Direct seek to target version
iterator_cf(IteratorMode::From(&seek_key, Direction::Reverse))
iter.next()  // O(1)

Impact: State reads now constant-time regardless of version history depth.

3. Hot Path Caching

Added in-memory caching for frequently-read, rarely-written values.

| Cached Value | Read Frequency | Write Frequency |
|---|---|---|
| SystemPaused | Every transaction | Admin only |
| GlobalConfig | Every RFQ submit | Admin only |

Cache is invalidated on begin_block() and cleared on writes.
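
A rough sketch of the pattern, assuming a parking_lot RwLock around each cached value (field names are illustrative):

```rust
use parking_lot::RwLock;

// Cache for values read on every transaction but written only by admin.
struct HotCache {
    system_paused: RwLock<Option<bool>>,
}

impl HotCache {
    // Called from begin_block() and on writes: drop the cached value so
    // the next read repopulates it from state.
    fn invalidate(&self) {
        *self.system_paused.write() = None;
    }

    // Fast path: return the cached flag without touching storage.
    fn paused(&self) -> Option<bool> {
        *self.system_paused.read()
    }
}
```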

Benchmark Results:
| Metric | Before | After |
|---|---|---|
| State commit (read-node) | ~3-5ms | ~1ms |
| JMT value lookup | O(n) | O(1) |
Dependencies Added:
dashmap     = "6.1"
parking_lot = "0.12"
Files Changed:
  • crates/state/src/rocksdb_storage.rs - DashMap for pending_updates
  • crates/state/src/jmt_storage.rs - Reverse seek optimization
  • crates/state/src/manager.rs - Hot path caching
  • crates/state/Cargo.toml - Added dashmap, parking_lot
  • Cargo.toml - Workspace dependencies

State Consistency: All changes are internal implementation details. State roots remain identical between nodes. Verified with 189k+ blocks synced with 0 mismatches.


Added

Multi-Chain Withdrawal Support

Withdrawals now support specifying both destination address and destination chain ID, enabling proper multi-chain bridge functionality.

Previous Behavior: WithdrawTx only had a single destination field (Address type), which was incorrectly repurposed as chain ID in the bridge relayer.

New Behavior:
// Before
pub struct WithdrawTx {
    pub amount: u64,
    pub destination: Address,
}
 
// After
pub struct WithdrawTx {
    pub amount: u64,
    pub destination_address: Address,
    pub destination_chain_id: u64,
}
Supported Chains:
| Chain ID | Network |
|---|---|
| 42161 | Arbitrum |
| 421614 | Arbitrum Sepolia |
| 8453 | Base |
| 84532 | Base Sepolia |
SDK Changes:
// Before
withdraw(amount, destination);
 
// After
withdraw(amount, destinationAddress, destinationChainId);
Tester UI Updates:
  • Chain selector in Bridge Modal now sets destinationChainId
  • Pending Withdrawals tab shows destination address with chain name
  • Example: 0x1234...abcd (Base Sepolia)
CLI Changes:
# Before
kata tx withdraw --destination 0x... --amount 1000000
 
# After
kata tx withdraw --destination 0x... --chain-id 84532 --amount 1000000
Files Changed:
  • crates/tx/src/payload.rs - WithdrawTx with new fields
  • crates/types/src/withdrawal.rs - WithdrawalRequest with new fields
  • crates/types/src/event.rs - Event::WithdrawRequested with new fields
  • crates/engine/src/executors/bridge.rs - Updated executor
  • crates/app/src/rpc/types.rs - RPC response with new fields
  • crates/app/src/indexer/mod.rs - SQL schema and event handling
  • apps/cli/src/commands/tx.rs - Added --chain-id flag
  • sdk/src/types.ts - Updated TypeScript types
  • sdk/src/schema/tx.ts - Updated Borsh schema
  • sdk/src/signer.ts - Updated EIP-712 type hash
  • apps/tester/src/components/BridgeModal.tsx - Chain selection
  • apps/tester/src/components/ThesisPanel.tsx - Display chain name
  • bridge/src/relayer/server.ts - Use correct fields

Breaking Change: Borsh encoding changed for WithdrawTx. Requires coordinated upgrade of all components.


Settings Modal with Theme Customization

Added a Settings modal accessible from the sidebar, providing user-configurable options for URLs and visual themes.

Features:
  • URL Configuration: Customize Solver URL, Node RPC URL, and Node WebSocket URL
  • Theme Selection: 6 themes each for Normal Mode and Degen Mode
  • Persistence: Settings automatically saved to localStorage
  • Reset: One-click reset to default settings
Normal Mode Themes:
| Theme | Primary Color | Description |
|---|---|---|
| Emerald | #22c55e | Classic green (default) |
| Cyber | #06b6d4 | Cyan/teal |
| Sunset | #f59e0b | Amber/gold |
| Arctic | #a5b4fc | Lavender/indigo |
| Neon | #e879f9 | Pink/fuchsia |
| Monochrome | #d4d4d8 | Gray/zinc |
Degen Mode Themes:
| Theme | Primary Color | Description |
|---|---|---|
| Inferno | #f97316 | Orange (default) |
| Plasma | #ef4444 | Red |
| Blaze | #fbbf24 | Yellow/amber |
| Volcanic | #fb7185 | Rose/pink |
| Supernova | #c026d3 | Purple/fuchsia |
| Ember | #fdba74 | Peach/orange light |
Implementation:
  • Theme colors applied via CSS variables (--theme-primary, --theme-primary-rgb, --theme-bg, etc.)
  • useThemeApplier hook updates document root CSS variables on theme change
  • All major UI components updated to use theme variables instead of hardcoded colors
Files Changed:
  • apps/tester/src/stores/use-settings-store.ts - New store with theme definitions and persistence
  • apps/tester/src/components/SettingsModal.tsx - New settings modal component
  • apps/tester/src/hooks/use-theme.ts - Theme applier and color hooks
  • apps/tester/src/hooks/use-config.ts - Dynamic config based on settings
  • apps/tester/src/styles/globals.css - CSS variable definitions
  • apps/tester/src/pages/_app.tsx - Theme applier integration
  • Multiple components updated to use CSS variables

Pending Margin Reservation for Box Creation

Box creation now reserves margin from available balance, preventing users from creating multiple boxes that exceed their total balance.

Previous Behavior: Users could draw multiple boxes in quick succession. Each box was checked only against the total balance, without accounting for boxes still in QUOTING status. This allowed creating boxes whose combined margins exceeded the user's actual balance, causing on-chain failures.

New Behavior:
  • Available balance = Total balance - Sum of all QUOTING theses' margins
  • Box creation blocked if available balance < bet amount
  • Error message includes pending amount: "Insufficient balance: X USDC available (Y pending)"
  • Double-check in executeDegenThesis prevents race conditions
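
The availability rule itself is simple; a language-neutral sketch in Rust (the actual checks live in the tester's TypeScript hooks):

```rust
// Available balance after reserving the margins of QUOTING theses.
fn available_balance(total: u64, quoting_margins: &[u64]) -> u64 {
    total.saturating_sub(quoting_margins.iter().sum::<u64>())
}

// Box creation is allowed only if the bet fits the available balance.
fn can_create_box(total: u64, quoting_margins: &[u64], bet: u64) -> bool {
    available_balance(total, quoting_margins) >= bet
}
```
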
Affected Flows:
| Mode | Check Location |
|---|---|
| Degen Mode | handleMouseDown + executeDegenThesis in Chart.tsx |
| Modal Mode | useThesisValidation hook (balance check) |
Files Changed:
  • apps/tester/src/components/Chart.tsx - Added pending margin calculation in handleMouseDown and executeDegenThesis
  • apps/tester/src/hooks/use-thesis-validation.ts - Balance check now subtracts pending QUOTING margins

Pending Withdrawals Tab in Tester

Added a new "Pending Withdrawals" tab to the tester footer panel, showing user's pending withdrawal requests.

Features:
  • Displays withdrawal ID, status, amount, destination address, and request timestamp
  • Status indicator with animated pulse for pending items
  • Loading state while fetching data
  • Empty state with guidance to request withdrawal from bridge modal
  • Auto-refreshes every 10 seconds
SDK Changes:
  • Added getWithdrawalsByUser() method to fetch paginated withdrawal IDs for a user
  • Added getPendingWithdrawalsForUser() convenience method that filters unprocessed withdrawals by user
Files Changed:
  • sdk/src/rpc.ts - Added getWithdrawalsByUser()
  • sdk/src/client.ts - Added getWithdrawalsByUser(), getPendingWithdrawalsForUser()
  • apps/tester/src/hooks/use-withdrawals-query.ts - New hook for fetching pending withdrawals
  • apps/tester/src/lib/query-keys.ts - Added withdrawals.pending query key
  • apps/tester/src/components/ThesisPanel.tsx - Added WithdrawalsTab component and tab button

Modal UX Improvements

Improved modal interaction patterns across tester app.

CheatCodePanel:
  • Clicking outside the modal (backdrop) now closes it
BridgeModal:
  • Clicking outside the modal (backdrop) now closes it
  • Auto-closes after 1.5 seconds when transaction is successful (deposit confirmed or withdrawal submitted)
Files Changed:
  • apps/tester/src/components/CheatCodePanel.tsx - Added backdrop click handler
  • apps/tester/src/components/BridgeModal.tsx - Added backdrop click handler and auto-close effect

RPC Pagination Response Format

Updated thesis-related RPC methods to return proper paginated responses with metadata.

Previous Behavior: Methods like kaizen_getThesesByUser returned a plain array Thesis[], making it impossible to know the total count or to implement a pagination UI.

New Behavior: Returns PaginatedResponse<Thesis> with full pagination metadata:

{
  "items": [...],
  "total": 150,
  "limit": 100,
  "offset": 0,
  "hasMore": true
}
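
For reference, a Rust mirror of this envelope (matching the PaginatedResponse<T> type added to the CLI; field names taken from the JSON above):

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct PaginatedResponse<T> {
    pub items: Vec<T>,
    pub total: u64,
    pub limit: u64,
    pub offset: u64,
    pub has_more: bool,
}
```
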
Updated Methods:
| Method | Change |
|---|---|
| kaizen_getThesesByUser | Returns PaginatedResponse<Thesis> |
| kaizen_getThesesBySolver | Returns PaginatedResponse<Thesis> |
| kaizen_internal_getPendingTheses | Returns PaginatedResponse<Thesis> |
| kaizen_internal_getThesesByStatus | Returns PaginatedResponse<Thesis> |
| kaizen_internal_getThesesByPair | Returns PaginatedResponse<Thesis> |

Note: kaizen_getWithdrawalsByUser already returned PaginatedResponse<number> (withdrawal IDs).

SDK Changes:
  • Added PaginatedResponse<T> type export
  • Updated getThesesByUser(), getThesesBySolver(), getMyTheses() return types
  • Added getWithdrawalsByUser() method to SDK client
Files Changed:
  • crates/app/src/rpc/methods.rs - Updated RPC handlers
  • sdk/src/types.ts - Added PaginatedResponse<T> interface
  • sdk/src/rpc.ts - Updated return types, added getWithdrawalsByUser
  • sdk/src/client.ts - Updated return types, added withdrawal methods
  • sdk/src/index.ts - Exported PaginatedResponse
  • apps/tester/src/hooks/use-thesis-sync.ts - Updated to use response.items
  • apps/mock-solver/src/rpc.ts - Updated return type
  • apps/cli/src/rpc.rs - Added PaginatedResponse<T> type
  • apps/cli/src/commands/degen.rs - Updated to use response.items

Breaking Change: SDK methods now return PaginatedResponse<Thesis> instead of Thesis[]. Update callsites to access .items property.


Grafana Dashboard Suite

Expanded monitoring dashboards from 1 to 5 specialized views for different use cases.

New Dashboards:
| Dashboard | UID | Purpose |
|---|---|---|
| Overview | kaizen-overview | Health check, business metrics (slimmed down) |
| Performance | kaizen-performance | Deep-dive latency, throughput analysis |
| Business | kaizen-business | Trading volume, win rates, bridge flows |
| Debug | kaizen-debug | Error analysis, latency breakdown, logs |
| Compare | kaizen-compare | Period-over-period trend comparison |

Kaizen Performance (/d/kaizen-performance):

  • TPS & throughput with success/fail breakdown
  • TX execution latency distribution (p50/p90/p95/p99)
  • TX latency by type (transfer, deposit, submit_thesis, etc.)
  • Block production phase breakdown (validation → execution → commit)
  • Storage IOPS and latency (state, RocksDB)
  • Sync performance and lag tracking
  • Pruning duration breakdown

Kaizen Business (/d/kaizen-business):

  • Total trading volume (USDC)
  • Trade outcomes (won/lost/cancelled/expired)
  • Hourly volume bars and win rate trends
  • Bet size distribution over time
  • Bridge deposit/withdrawal flows and net flow
  • Transaction type mix (pie chart)

Kaizen Debug (/d/kaizen-debug):

  • Error counters with color thresholds (failed TXs, rejections, sig failures)
  • Failed transactions by type
  • Mempool rejection reasons
  • Latency breakdown by TX type and RPC method
  • Mempool queue sizes and eviction rates
  • Storage IOPS and p99 latency
  • Integrated error logs from all services

Kaizen Compare (/d/kaizen-compare):

  • Today vs Yesterday vs Last Week overlays for TPS, latency, RFQ rate
  • Period-over-period % change stat panels
  • Block production and storage growth trends
  • Uses Prometheus offset for time-shifted queries
Overview Dashboard Optimization:
  • Reduced from ~2600 lines to ~500 lines (80% reduction)
  • Removed detailed latency panels (moved to Performance dashboard)
  • Added link to Performance dashboard for deep-dive
  • Focused on business metrics and high-level health
Files Changed:
  • docker/monitoring/grafana/provisioning/dashboards/json/kaizen-overview.json - Slimmed down
  • docker/monitoring/grafana/provisioning/dashboards/json/kaizen-performance.json - New
  • docker/monitoring/grafana/provisioning/dashboards/json/kaizen-business.json - New
  • docker/monitoring/grafana/provisioning/dashboards/json/kaizen-debug.json - New
  • docker/monitoring/grafana/provisioning/dashboards/json/kaizen-compare.json - New

JMT Node Pruning

Fixed unbounded disk growth across all node types (write, read-aggressive, read-archive) by implementing JMT (Jellyfish Merkle Tree) node pruning.

Problem: All three node types experienced identical disk growth rates regardless of pruning configuration. The aggressive and custom pruning modes only pruned JMT values, blocks, and snapshots - but not the JMT tree structure nodes themselves.

Root Cause: The TreeUpdateBatch.stale_node_index_batch from the JMT library was completely ignored. This batch tracks which tree nodes become obsolete at each version, enabling safe deletion of old nodes.

Solution:
  • New column family - CF_STALE_JMT_NODES stores stale node indices on each commit
  • Stale node tracking - Records (stale_since_version, node_key) for each obsolete node
  • Pruning implementation - prune_jmt_nodes() deletes nodes where stale_since_version < min_version_to_keep
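
Combined with the range-delete optimization described under Performance, pruning can reduce to a single call per run. A sketch, assuming stale-index keys are prefixed with the big-endian stale_since_version:

```rust
use rocksdb::DB;

// Delete every stale-node index recorded before the cutoff version with
// one range delete over the stale_jmt_nodes column family.
fn prune_jmt_nodes(db: &DB, min_version_to_keep: u64) -> Result<(), rocksdb::Error> {
    let cf = db.cf_handle("stale_jmt_nodes").expect("CF created on open");
    db.delete_range_cf(cf, 0u64.to_be_bytes(), min_version_to_keep.to_be_bytes())
}
```
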
Expected Behavior After Fix:
| Node Type | Disk Growth |
|---|---|
| write | Stabilizes at blocks_to_keep × avg_block_size |
| read-aggressive | Stabilizes at ~16 min of history |
| read-archive | Grows indefinitely (pruning disabled) |
New Metrics:
  • kaizen_pruning_jmt_nodes_total - Total JMT nodes pruned
  • kaizen_pruning_jmt_nodes_duration_seconds - JMT node pruning duration
  • kaizen_pruning_blocks_duration_seconds - Block pruning duration
  • kaizen_pruning_snapshots_duration_seconds - Snapshot pruning duration
  • kaizen_pruning_jmt_values_duration_seconds - JMT values pruning duration
Files Changed:
  • crates/state/src/jmt_storage.rs - Added CF_STALE_JMT_NODES, stale index storage
  • crates/state/src/rocksdb_storage.rs - Create new CF on DB open
  • crates/state/src/pruner.rs - Implemented prune_jmt_nodes()
  • crates/state/src/block_storage.rs - Added jmt_nodes_pruned to PruneStats
  • crates/metrics/src/lib.rs - Added pruning timing metrics

Migration Note: Existing databases will automatically create the new column family on startup. However, historical stale node data is not available, so previously accumulated JMT nodes won't be pruned. For aggressive pruning nodes, consider wiping data and re-syncing from an archive node.


Batch Settlement for Settler Sidecar

Settler now batches multiple settlements into a single transaction for improved efficiency.

Previous Behavior: Each settlement was submitted as a separate transaction, requiring N signatures and N RPC calls for N settlements.

New Behavior:
  • Batch collection - Collects settlements for 50ms or until batch_size (default 100) is reached
  • Single transaction - All collected settlements are submitted in one SystemSettle transaction
  • Atomic execution - Uses validation-first pattern to ensure all-or-nothing semantics
  • Unified type - SystemSettleTx now contains Vec<Settlement> (single settlement = batch of 1)
Benefits:
  • Reduced transaction count: N settlements → 1 transaction
  • Reduced signature overhead: N signatures → 1 signature
  • Reduced RPC calls: N calls → 1 call
  • Lower latency for burst settlements
API Changes:
// Before: Two separate types
pub struct SystemSettleTx {
    pub thesis_id: u64,
    pub settlement_type: SystemSettlementType,
}
pub struct SystemBatchSettleTx {
    pub settlements: Vec<Settlement>,
}
 
// After: Unified type
pub struct SystemSettleTx {
    pub settlements: Vec<Settlement>,
}
 
pub struct Settlement {
    pub thesis_id: u64,
    pub settlement_type: SystemSettlementType,
}
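
A sketch of the collection loop, reusing the Settlement type above (channel wiring and names assumed):

```rust
use tokio::sync::mpsc;
use tokio::time::{timeout_at, Duration, Instant};

// Gather settlements for up to 50ms after the first one arrives, or
// until the batch is full, then submit them as one SystemSettle tx.
async fn collect_batch(
    rx: &mut mpsc::Receiver<Settlement>,
    batch_size: usize,
) -> Vec<Settlement> {
    let mut batch = Vec::with_capacity(batch_size);
    if let Some(first) = rx.recv().await {
        batch.push(first);
        let deadline = Instant::now() + Duration::from_millis(50);
        while batch.len() < batch_size {
            match timeout_at(deadline, rx.recv()).await {
                Ok(Some(s)) => batch.push(s),
                _ => break, // window elapsed or channel closed
            }
        }
    }
    batch
}
```
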
Files Changed:
  • crates/tx/src/payload.rs - Unified SystemSettleTx with Vec<Settlement>
  • crates/engine/src/executors/rfq.rs - Added execute_settlement() with validation-first pattern
  • crates/engine/src/lib.rs - Single handler for settlement
  • crates/app/src/settler/service.rs - Batch collection and submission logic
  • sdk/src/schema/tx.ts - Added SystemSettleTxSchema, SettlementSchema

Breaking Change: Borsh encoding changed. Requires coordinated upgrade of settler and nodes.


Hybrid Thesis Sync (WebSocket + RPC Polling)

Tester app now uses a hybrid approach for active thesis and thesis history, combining real-time WebSocket events with RPC polling for improved reliability.

Previous Behavior: Thesis data only existed in local memory. If WebSocket disconnected, settlement events were missed. No history persisted across sessions.

New Behavior:
  • Initial load from RPC - Fetches thesis history on connect via getThesesByUser()
  • Real-time WebSocket - Subscribes to subscribeUserTheses() for immediate settlement notifications
  • Fallback RPC polling - When WebSocket disconnects, polls every 5s (only if active theses exist)
  • Reconnection sync - On WebSocket reconnect, syncs from RPC to catch missed events
Benefits:
  • Thesis history persists across browser refreshes
  • Missed settlements are recovered when WebSocket is down
  • Efficient - real-time events when available, polling only as fallback
Files Changed:
  • apps/tester/src/hooks/use-thesis-sync.ts - New hybrid sync hook
  • apps/tester/src/stores/use-thesis-store.ts - Added syncTheses, updateThesisByThesisId, clearAll actions
  • apps/tester/src/hooks/use-price-stream.ts - Removed thesis subscription (moved to sync hook)
  • apps/tester/src/pages/index.tsx - Integrated useThesisSync hook

Fixed

Tester: Bridge Withdrawal Targeting External Chain

Fixed withdrawal requests incorrectly calling external chain gateway contracts instead of Kaizen Core.

Root Cause: BridgeModal.tsx used wagmi's writeContract to call an external gateway contract for withdrawals, which is incorrect. Withdrawals should be submitted to Kaizen Core, which then gets processed by the relayer to send funds to the external chain.

Previous Behavior:
// Wrong: Calling external chain contract
writeContract({
  address: selectedChainConfig.gatewayAddress,
  abi: GATEWAY_ABI,
  functionName: "withdraw",
  args: [amountParsed, wagmiAddress],
});
New Behavior:
// Correct: Submit to Kaizen Core
const payload = withdrawPayload(amountParsed, wagmiAddress);
await client.sendTransaction(payload, { waitForConfirmation: true });
UI Changes:
  • Description updated: "Request withdrawal from Kaizen. Funds will be sent to [chain] by the relayer."
  • Shows Kaizen Core transaction hash instead of external chain explorer link
  • Info message: "πŸ’‘ Withdrawals are submitted to Kaizen Core. The relayer will process and send funds to your selected chain."
  • Withdraw button no longer requires chain switch (only deposits need external chain interaction)
Files Changed:
  • apps/tester/src/components/BridgeModal.tsx - Use SDK's withdraw() payload builder and client.sendTransaction()

WebSocket Duplicate Event Broadcast

Fixed duplicate WebSocket events being sent to frontend clients, which could cause unnecessary re-renders and subscription loops.

Root Cause: In write-node's executor.rs, events were broadcast twice:

  1. Immediately during execute_tx() for real-time feedback
  2. Again during checkpoint() when the block was committed

This meant every thesis settlement, transfer, and oracle price update was delivered to WebSocket subscribers twice.

Solution: Removed immediate event broadcast from execute_tx(). Events are now only broadcast once during checkpoint(), which includes both transaction events and oracle price events from begin_block().

Trade-off: Transaction events now have ~100ms higher latency (wait for next checkpoint) but are guaranteed to be delivered exactly once.

Files Changed:
  • crates/app/src/executor.rs - Removed duplicate event broadcast in execute_tx()

WebSocket UserTheses Subscription Missing Settlement Events

Fixed users not receiving RfqSettled events for their own theses when they lost.

Root Cause: The UserTheses WebSocket subscription in subscriptions.rs filtered RfqSettled events by winner == address instead of user == address. This meant users only received settlement notifications when they won, not when they lost.

Solution: Changed the filter condition to check if the event's user field matches the subscription address, ensuring users receive all settlement events for their theses regardless of outcome.
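
A minimal sketch of the corrected filter, with the event reduced to the two relevant fields:

```rust
struct Address([u8; 20]);
struct RfqSettledEvent {
    user: Address,
    winner: Address,
}

// Match on the thesis owner, not the winner, so subscribers also see
// settlements they lost. Before the fix this compared ev.winner.
fn matches_subscription(ev: &RfqSettledEvent, subscriber: &Address) -> bool {
    ev.user.0 == subscriber.0
}
```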

Files Changed:
  • crates/app/src/ws/subscriptions.rs - Fixed RfqSettled event filter to use user instead of winner

Tester: WebSocket Subscription Loop on Thesis Updates

Fixed infinite WebSocket re-subscription loop that could occur when thesis events were received.

Root Cause: In use-thesis-sync.ts, the handleThesisEvent callback had theses array in its dependency list. When a thesis event arrived and updated the store, the callback was recreated, which triggered the useEffect to unsubscribe and re-subscribe to the WebSocket channel, which could cause duplicate events and further re-renders.

Solution: Wrapped handleThesisEvent in a useRef to keep a stable reference. The subscription useEffect now only depends on connection state, not on the callback itself. The ref is updated on each render to always have access to the latest store state.

Files Changed:
  • apps/tester/src/hooks/use-thesis-sync.ts - Stabilized callback reference with useRef

Settler: Invalid Breach Timestamp Outside Thesis Window

Fixed "Invalid breach timestamp: X not in [start_time, end_time]" error when settler submitted SolverWins settlements.

Root Cause: The find_breach function in settler could return a breach timestamp that was slightly before the thesis's start_time. This happened because get_price_at() uses a 100ms tolerance window, so it might return a price from timestamp T-50ms when querying for timestamp T. The breach was valid (price did breach), but the returned timestamp was outside the thesis's valid observation period.

Solution: Added explicit bounds check in find_breach to ensure the actual timestamp (from the price cache) falls within [thesis.start_time, thesis.end_time] before returning it as a valid breach.
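
The added guard is a simple window check on the actual cached timestamp:

```rust
// Accept a breach only when the actual oracle timestamp (not the query
// timestamp) lies inside the thesis observation window.
fn breach_in_window(actual_ts: u64, start_time: u64, end_time: u64) -> bool {
    (start_time..=end_time).contains(&actual_ts)
}
```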

Files Changed:
  • crates/app/src/settler/service.rs - Added timestamp bounds validation in find_breach()

Settler: State Loss on Restart

Added height persistence to settler so it can resume from the last processed block after restart.

Previous Behavior: The settler always started from block 0 on restart, requiring a full event replay, which could be slow or fail if historical events had been pruned.

New Behavior:
  • Settler saves last processed height to {data_dir}/height.txt every 100 blocks
  • On startup, reads persisted height and resumes event stream from that point
  • Falls back to height 0 if persistence file doesn't exist or is corrupted
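
A minimal sketch of the persistence helpers (file name and fallback behavior as described above):

```rust
use std::{fs, io, path::Path};

// Resume from {data_dir}/height.txt, falling back to 0 when the file is
// missing or unparsable.
fn read_persisted_height(data_dir: &Path) -> u64 {
    fs::read_to_string(data_dir.join("height.txt"))
        .ok()
        .and_then(|s| s.trim().parse().ok())
        .unwrap_or(0)
}

fn write_persisted_height(data_dir: &Path, height: u64) -> io::Result<()> {
    fs::write(data_dir.join("height.txt"), height.to_string())
}
```
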
Configuration:
# CLI
settler --data-dir ./.data/settler
 
# Environment variable
SETTLER_DATA_DIR=./.data/settler
Files Changed:
  • crates/app/src/settler/config.rs - Added data_dir field
  • crates/app/src/settler/service.rs - Added read_persisted_height(), write_persisted_height()
  • apps/settler/src/main.rs - Added --data-dir CLI argument

Settler: Failed Settlement Retry

Fixed settler not retrying settlements that failed due to RPC or execution errors.

Root Cause: When a settlement transaction failed (either RPC error or execution error like "Invalid breach timestamp"), the thesis remained in pending_settlements indefinitely. The breach detector would skip it, assuming a settlement was already in flight.

Solution: Added feedback channel from settlement_submitter to a new settlement_result_handler task. When a settlement fails, its thesis_id is removed from pending_settlements, allowing the breach detector to pick it up again for retry.

Files Changed:
  • crates/app/src/settler/service.rs - Added SettlementResult, settlement_result_handler(), feedback channel

Settler: Breach Price Mismatch on SolverWins Settlement

Fixed "Breach price mismatch" error when settler submitted SolverWins settlements.

Root Cause: The find_breach function used the iteration timestamp (t) instead of the actual price entry timestamp when reporting breaches. When prices were cached at timestamps slightly different from 100ms intervals, the node's ring buffer lookup would return a different price.

Example:
- Oracle price at timestamp 1050 → P1
- Settler iterates at t=1100, finds P1 (within 100ms tolerance)
- Settler sends: breach_timestamp=1100, breach_price=P1
- Node: slot_for_timestamp(1100) = slot 11, which has P2 (from timestamp 1150)
- Mismatch: P1 ≠ P2

Solution: Changed get_price_at() to return (actual_timestamp, price) tuple instead of just price. The breach detector now uses the actual oracle timestamp, ensuring the node's ring buffer lookup returns the same price.

Files Changed:
  • crates/app/src/settler/service.rs - Fixed get_price_at() and find_breach() to use actual timestamps

Settler: Double Settlement Race Condition

Fixed "Thesis not active" error caused by settler submitting duplicate settlements for the same thesis.

Root Cause: After detecting a breach and sending a settlement decision to the channel, the thesis remained in active_theses until the RfqSettled event arrived. The next breach detection cycle would detect the same breach and send another settlement, which failed because the thesis was already settled.

Solution: Added pending_settlements: HashSet<u64> to track theses with in-flight settlements:

  1. Before detection, filter out theses already in pending_settlements
  2. After detection, mark decided theses as pending before sending to channel
  3. When RfqSettled event arrives, clear both active_theses and pending_settlements
Files Changed:
  • crates/app/src/settler/service.rs - Added pending settlement tracking

Settler: Enhanced Settlement Response Logging

Added detailed logging for settlement transaction responses to aid debugging.

New Log Fields:
  • On success: status, block_height, tx_index from receipt
  • On execution failure: receipt.error message, individual settlement details
  • On RPC error: breach_timestamp, breach_price for each failed settlement
New Metric:
  • settler_execution_errors_total - Count of transactions included but failed execution
Files Changed:
  • crates/app/src/settler/service.rs - Enhanced submit_batch() logging

Tester: API Wallet Mismatch on Wallet Switch

Fixed "Invalid user signature: signer is not user nor an authorized API wallet" error when switching accounts in external wallet.

Root Cause: When user switched accounts in MetaMask, the localStorage API wallet still belonged to the previous account. The new account would try to sign quotes with the old API wallet, causing signature verification failures.

Solution: Added wallet change detection that automatically clears mismatched API wallets.

Files Changed:
  • apps/tester/src/stores/use-wallet-store.ts - Added handleMainWalletChange() to clear API wallet on account switch
  • apps/tester/src/hooks/use-kaizen-client.tsx - Calls handleMainWalletChange() when wagmi address changes

Tester: WebSocket Abrupt Disconnect on Wallet Change

Fixed WebSocket connection dropping abruptly when switching wallet accounts, causing poor UX.

Root Cause: React effect cleanup immediately called client.disconnectWebSocket() without any grace period.

Solution: Added graceful disconnection sequence:

  1. Mark WebSocket as disconnected immediately (prevents new requests)
  2. Clear core service client
  3. Wait 100ms before actual WebSocket close
Files Changed:
  • apps/tester/src/hooks/use-kaizen-client.tsx - Added graceful disconnect with timeout

Tester: EnableConnectionModal "Existing Wallet Found" UX Confusion

Fixed confusing "Existing API Wallet Found" message appearing during API wallet setup flow.

Root Cause: The useEffect that reset modal state had apiWallet in dependencies, causing it to re-run and show the message immediately after generating a new wallet.

Solution:
  • Changed effect to only trigger on isOpen change, not wallet state changes
  • Added isReusingExisting state to track if resuming previous setup
  • Changed message from "Existing API Wallet Found" to "Resume Setup" for clarity
Files Changed:
  • apps/tester/src/components/EnableConnectionModal.tsx - Fixed effect dependencies and improved messaging

Tester: Prevent Box Drawing Without Sufficient Balance

Box drawing is now disabled when user balance is below minimum bet amount, regardless of degen mode.

Previous Behavior: Users could draw boxes with 0 balance, only to see error after attempting to execute.

New Behavior:
  • BOX tool button is disabled and grayed out when balance < minimum bet
  • Clicking disabled button shows toast explaining insufficient balance
  • If balance drops while BOX tool is active, tool is auto-deactivated
  • Chart also checks balance before allowing drag start (defense in depth)
Files Changed:
  • apps/tester/src/components/RightPanel.tsx - Added balance check on tool activation
  • apps/tester/src/components/Chart.tsx - Added balance check in mousedown handler

Settler Challenge Deadline Timing Race

Fixed "ChallengeWindowNotOver" error when settler submits UserWins settlement right after deadline passes.

Root Cause: The settler used SystemTime::now() to check the deadline, but core uses the block timestamp, which is aligned down to 100ms intervals via align_timestamp(). This caused a race condition where the settler saw the deadline as passed, but core's block timestamp hadn't yet caught up.

Timeline example:
- T=950ms: Checkpoint → block_timestamp = 900ms (aligned down)
- T=1001ms: Settler sees now >= deadline(1000ms) → submits UserWins
- TX executes with block_timestamp = 900ms
- Core: 900 < 1000 → ChallengeWindowNotOver!

Solution: Added deadline_buffer (default 200ms) to settler config. Settler now waits until now >= challenge_deadline + deadline_buffer before submitting UserWins settlement.
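
The resulting check is a one-liner; a sketch with all times in milliseconds:

```rust
// Only submit UserWins once the buffered deadline has passed, giving
// the 100ms-aligned block timestamp time to catch up.
fn can_submit_user_wins(now: u64, challenge_deadline: u64, deadline_buffer: u64) -> bool {
    now >= challenge_deadline + deadline_buffer
}
```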

Files Changed:
  • crates/app/src/settler/config.rs - Added deadline_buffer field
  • crates/app/src/settler/service.rs - Apply buffer in breach detection
  • apps/settler/src/main.rs - Added --deadline-buffer CLI flag
  • docker-compose.yml - Explicitly set deadline buffer
Configuration:
# CLI (default 200ms)
settler --write-node 127.0.0.1:9000 --deadline-buffer 200

SDK Quote Signing Hash Mismatch

Fixed "Invalid user signature" error when submitting thesis via tester app.

Root Cause: SDK's buildQuoteSigningHash passed hex-encoded bytes to viem's keccak256, which produced a different hash than Rust's direct byte hashing.

Solution: Pass raw Uint8Array directly to keccak256 instead of converting to hex first.

// Before (incorrect)
return keccak256(bytesToHex(message));
 
// After (correct)
return keccak256(message);
Files Changed:
  • sdk/src/signer.ts - Fixed buildQuoteSigningHash function

SDK RfqSettledEvent Schema Mismatch

Fixed WebSocket event deserialization failure for thesis settlement events.

Root Cause: SDK's RfqSettledEvent was missing fields that Rust's Event::RfqSettled had.

Solution: Added missing fields to match Rust schema.

Fields Added:
  • user: AddressSchema
  • solver: AddressSchema
  • oraclePair: OraclePairSchema
  • betAmount: bigint
Files Changed:
  • sdk/src/schema/event.ts - Updated RfqSettledEvent class

Changed

Oracle Service Rename

Renamed mock-oracle to oracle as it's now the official production service.

Changes:
  • Directory: apps/mock-oracle → apps/oracle
  • Package: @kaizen-core/mock-oracle → @kaizen-core/oracle
  • Docker service: mock-oracle → oracle
  • Container: kaizen-mock-oracle → kaizen-oracle
Files Changed:
  • apps/oracle/package.json - Package name
  • apps/oracle/src/logger.ts - Logger name
  • pnpm-workspace.yaml - Workspace path
  • docker-compose.yml - Service config
  • docker/Dockerfile.oracle - Build paths
  • docker/config/write-node.toml - Oracle URL
  • docker/monitoring/prometheus/prometheus.yml - Scrape target
Migration:
# Docker users: rebuild the oracle image
docker compose build oracle
 
# Development: reinstall dependencies
pnpm install

Fixed

Read-Node State Root Divergence

Fixed critical state synchronization issues between write-node and read-node that caused WebSocket disconnections after transaction execution.

Root Causes:
  1. Duplicate Transaction Check During Replay: Read-node was rejecting replayed transactions as duplicates
  2. Non-deterministic HashMap Iteration: pending_updates HashMap iteration order varied between nodes, causing different JMT state roots
  3. Timestamp Inconsistency: Thesis.created_at used different timestamps between write-node (system time) and read-node (block time)
Solutions:
  • Added execute_tx_replay() method to bypass duplicate checks during block sync
  • Sorted pending_updates by KeyHash before JMT commit for deterministic ordering
  • Introduced read_version/write_version separation in StateManager
  • Passed consistent block_timestamp to transaction execution
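
A sketch of the deterministic-ordering fix (key type simplified to a raw 32-byte hash):

```rust
use dashmap::DashMap;

// Snapshot pending updates and sort by key hash so every node feeds the
// JMT in identical order, producing identical state roots.
fn ordered_updates(
    pending: &DashMap<[u8; 32], Option<Vec<u8>>>,
) -> Vec<([u8; 32], Option<Vec<u8>>)> {
    let mut updates: Vec<_> = pending
        .iter()
        .map(|e| (*e.key(), e.value().clone()))
        .collect();
    updates.sort_by_key(|(key_hash, _)| *key_hash);
    updates
}
```
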
Files Changed:
  • crates/engine/src/lib.rs - Added replay mode for transaction execution
  • crates/state/src/manager.rs - Read/write version separation
  • crates/state/src/rocksdb_storage.rs - Deterministic HashMap ordering
  • crates/app/src/sync/client.rs - Snapshot/restore on verification failure
  • crates/app/src/executor.rs - Consistent block timestamp handling

CLI Transaction Encoding

Fixed CLI's tx commands (withdraw, transfer, etc.) not working due to incorrect transaction encoding.

Root Cause: CLI was building transactions with custom format instead of using kaizen_tx::Transaction type.

Solution: Refactored CLI to use kaizen-tx crate for proper transaction building and signing.

Files Changed:
  • apps/cli/Cargo.toml - Added kaizen-tx dependency
  • apps/cli/src/commands/tx.rs - Rewrote using kaizen_tx::Transaction
  • apps/cli/src/rpc.rs - Fixed RPC method name and response type

CLI Signature Mismatch

Fixed "Invalid user signature" error when submitting thesis via CLI.

Root Cause: CLI's sign_quote function used different domain separator than SDK/mock-solver.

Solution: Aligned signing logic to use "Kaizen:SolverQuote" domain separator with keccak256 hashing.

Files Changed:
  • apps/cli/src/commands/thesis.rs - Fixed signature generation

Bridge Withdrawal Status Format

Fixed withdrawal status format incompatibility with bridge service.

Root Cause: Core returned human-readable status ("Pending") but bridge expected numeric string ("0").

Solution: Changed status serialization to output enum discriminant as string.

Files Changed:
  • crates/app/src/rpc/types.rs - Changed format!("{:?}", status) to (status as u8).to_string()
Status Mapping:
| Numeric | Status |
|---|---|
| "0" | Pending |
| "1" | Processing |
| "2" | Completed |
| "3" | Failed |

API Changes

RPC Methods

kaizen_sendTransaction
  • Now properly returns execution result object instead of just transaction hash
{
  "hash": "0x...",
  "status": "executed",
  "success": true,
  "error": null,
  "blockHeight": 1234,
  "txIndex": 0,
  "events": [...]
}
kaizen_getUnprocessedWithdrawals
  • Status field now returns numeric string for bridge compatibility

Testing

Full lifecycle test verified:

  1. ✅ Bridge Deposit (faucet mint)
  2. ✅ Thesis Submit (RFQ)
  3. ✅ Settlement (UserWin/SolverWin)
  4. ✅ Bridge Withdraw
  5. ✅ Read-node Sync (0 state root mismatches)

Stress test results:

  • 87 thesis submissions
  • 50 rapid-fire parallel submissions
  • 0 state root mismatches between write-node and read-node