NoKV

v0.4.1 Latest
Warning

This package is not in the latest version of its module.

Published: Dec 30, 2025 License: Apache-2.0 Imports: 28 Imported by: 0

README

🚀 NoKV – High-Performance Distributed KV Engine


LSM Tree • ValueLog • MVCC • Multi-Raft Regions • Redis-Compatible

NoKV is a Go-native storage engine that mixes RocksDB-style manifest discipline with Badger-inspired value separation. You can embed it locally, drive it via multi-Raft regions, or front it with a Redis protocol gateway—all from a single topology file.


✨ Feature Highlights

  • 🚀 Dual runtime modes – call NoKV.Open inside your process or launch nokv serve for a distributed deployment, no code changes required.
  • 🔁 Hybrid LSM + ValueLog – WAL → MemTable → SST pipeline for latency, with a ValueLog to keep large payloads off the hot path.
  • MVCC-native transactions – snapshot isolation, conflict detection, TTL, and iterators built into the core (no external locks).
  • 🧠 Multi-Raft regions – raftstore manages per-region raft groups, WAL/manifest pointers, and tick-driven leader elections.
  • 🛰️ Redis gateway – cmd/nokv-redis exposes RESP commands (SET/GET/MGET/NX/XX/TTL/INCR...) on top of raft-backed storage.
  • 🔍 Observability first – nokv stats, expvar endpoints, hot key tracking, RECOVERY/TRANSPORT metrics, and ready-to-use recovery scripts.
  • 🧰 Single-source config – raft_config.json feeds local scripts, Docker Compose, Redis gateway, and CI so there’s zero drift.

🚦 Quick Start

Start an end-to-end playground with either the local script or Docker Compose. Both spin up a three-node Raft cluster (plus the optional TSO) and expose the Redis-compatible gateway.

# Option A: local processes
./scripts/run_local_cluster.sh --config ./raft_config.example.json
# In another shell: launch the Redis gateway on top of the running cluster
go run ./cmd/nokv-redis --addr 127.0.0.1:6380 --raft-config raft_config.example.json

# Option B: Docker Compose (cluster + gateway + TSO)
docker compose up --build
# Tear down
docker compose down -v

Once the cluster is running you can point any Redis client at 127.0.0.1:6380 (or the address exposed by Compose).

For quick CLI checks:

# Inspect stats from an existing workdir
go run ./cmd/nokv stats --workdir ./artifacts/cluster/store-1

Minimal embedded snippet:

package main

import (
	"fmt"
	"log"

	NoKV "github.com/feichai0017/NoKV"
	"github.com/feichai0017/NoKV/utils"
)

func main() {
	opt := NoKV.NewDefaultOptions()
	opt.WorkDir = "./workdir-demo"

	db := NoKV.Open(opt)
	defer db.Close()

	key := []byte("hello")
	if err := db.SetCF(utils.CFDefault, key, []byte("world")); err != nil {
		log.Fatalf("set failed: %v", err)
	}

	entry, err := db.Get(key)
	if err != nil {
		log.Fatalf("get failed: %v", err)
	}
	fmt.Printf("value=%s\n", entry.Value)
	entry.DecrRef()
}

ℹ️ run_local_cluster.sh rebuilds nokv, nokv-config, nokv-tso, seeds manifests via nokv-config manifest, and parks logs under artifacts/cluster/store-<id>/server.log. Use Ctrl+C to exit cleanly; if the process crashes, wipe the workdir (rm -rf ./artifacts/cluster) before restarting to avoid WAL replay errors.


🧭 Topology & Configuration

Everything hangs off a single file: raft_config.example.json.

"stores": [
  { "store_id": 1, "listen_addr": "127.0.0.1:20170", ... },
  { "store_id": 2, "listen_addr": "127.0.0.1:20171", ... },
  { "store_id": 3, "listen_addr": "127.0.0.1:20172", ... }
],
"regions": [
  { "id": 1, "range": [-inf,"m"), peers: 101/201/301, leader: store 1 },
  { "id": 2, "range": ["m",+inf), peers: 102/202/302, leader: store 2 }
]
  • Local scripts (run_local_cluster.sh, serve_from_config.sh, bootstrap_from_config.sh) ingest the same JSON, so local runs match production layouts.
  • Docker Compose mounts the file into each container; manifests, transports, and Redis gateway all stay in sync.
  • Need more stores or regions? Update the JSON and re-run the script/Compose—no code changes required.
  • Programmatic access: import github.com/feichai0017/NoKV/config and call config.LoadFile / Validate for a single source of truth across tools.
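
To make the single-file topology idea concrete, here is a minimal sketch that decodes a topology document with the standard library and routes a key to the region whose range contains it. It does not use the real config package. Only store_id, listen_addr, and id appear in the excerpt above; the start_key and end_key field names, the empty-string encoding for an open bound, and the helper names are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Store and Region mirror a subset of the topology file. start_key and
// end_key are hypothetical field names, not confirmed by the README.
type Store struct {
	StoreID    uint64 `json:"store_id"`
	ListenAddr string `json:"listen_addr"`
}

type Region struct {
	ID       uint64 `json:"id"`
	StartKey string `json:"start_key"` // empty means -inf (assumed encoding)
	EndKey   string `json:"end_key"`   // empty means +inf (assumed encoding)
}

type Topology struct {
	Stores  []Store  `json:"stores"`
	Regions []Region `json:"regions"`
}

// parseTopology decodes the JSON and rejects duplicate store IDs.
func parseTopology(data []byte) (*Topology, error) {
	var topo Topology
	if err := json.Unmarshal(data, &topo); err != nil {
		return nil, err
	}
	seen := make(map[uint64]bool)
	for _, s := range topo.Stores {
		if seen[s.StoreID] {
			return nil, fmt.Errorf("duplicate store_id %d", s.StoreID)
		}
		seen[s.StoreID] = true
	}
	return &topo, nil
}

// regionFor returns the ID of the region whose half-open range [start, end)
// contains key, or false if no region covers it.
func (t *Topology) regionFor(key string) (uint64, bool) {
	for _, r := range t.Regions {
		if key >= r.StartKey && (r.EndKey == "" || key < r.EndKey) {
			return r.ID, true
		}
	}
	return 0, false
}

// sample is made-up data shaped like the excerpt above.
const sample = `{
  "stores": [
    {"store_id": 1, "listen_addr": "127.0.0.1:20170"},
    {"store_id": 2, "listen_addr": "127.0.0.1:20171"},
    {"store_id": 3, "listen_addr": "127.0.0.1:20172"}
  ],
  "regions": [
    {"id": 1, "start_key": "", "end_key": "m"},
    {"id": 2, "start_key": "m", "end_key": ""}
  ]
}`

func main() {
	topo, err := parseTopology([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, key := range []string{"apple", "m", "zebra"} {
		id, ok := topo.regionFor(key)
		fmt.Println(key, id, ok)
	}
}
```

Because the ranges are half-open, the boundary key "m" belongs to region 2, matching the [-inf,"m") and ["m",+inf) split shown above.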

🧬 Tech Stack Snapshot

| Layer | Tech/Package | Why it matters |
| --- | --- | --- |
| Storage Core | lsm/, wal/, vlog/ | Hybrid log-structured design with manifest-backed durability and value separation. |
| Concurrency | mvcc/, txn.go, oracle | Timestamp oracle + lock manager for MVCC transactions and TTL-aware reads. |
| Replication | raftstore/* | Multi-Raft orchestration (regions, peers, router, schedulers, gRPC transport). |
| Tooling | cmd/nokv, cmd/nokv-config, cmd/nokv-redis | CLI, config helper, and Redis-compatible gateway share the same topology file. |
| Observability | stats, hotring, expvar | Built-in metrics, hot-key analytics, and crash recovery traces. |

🧱 Architecture Overview

graph TD
    Client[Client API / Txn] -->|Set/Get| DBCore
    DBCore -->|Append| WAL
    DBCore -->|Insert| MemTable
    DBCore -->|ValuePtr| ValueLog
    MemTable -->|Flush Task| FlushMgr
    FlushMgr -->|Build SST| SSTBuilder
    SSTBuilder -->|LogEdit| Manifest
    Manifest -->|Version| LSMLevels
    LSMLevels -->|Compaction| Compactor
    FlushMgr -->|Discard Stats| ValueLog
    ValueLog -->|GC updates| Manifest
    DBCore -->|Stats/HotKeys| Observability

Key ideas:

  • Durability path – WAL first, memtable second. ValueLog writes occur before WAL append so crash replay can fully rebuild state.
  • Metadata – manifest stores SST topology, WAL checkpoints, and vlog head/deletion metadata.
  • Background workers – flush manager handles Prepare → Build → Install → Release, compaction reduces level overlap, and value log GC rewrites segments based on discard stats.
  • Transactions – MVCC timestamps ensure consistent reads; commit reuses the same write pipeline as standalone writes.
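
The write ordering described above can be sketched with a toy in-memory model. This is not NoKV's implementation: the types, the use of ">=" against the threshold, and the replay helper are all assumptions chosen to illustrate value separation and why a WAL record only needs a pointer when the payload already sits in the value log.

```go
package main

import "fmt"

// valuePtr locates a payload inside the toy value log.
type valuePtr struct{ offset, length int }

// record is what the toy WAL and memtable hold: either an inline value or a
// pointer into the value log.
type record struct {
	key    string
	inline []byte
	ptr    *valuePtr
}

type toyDB struct {
	threshold int               // values at least this large go to the value log (assumed rule)
	vlog      []byte            // append-only value log
	wal       []record          // append-only write-ahead log
	mem       map[string]record // memtable
}

func newToyDB(threshold int) *toyDB {
	return &toyDB{threshold: threshold, mem: make(map[string]record)}
}

// set writes in the order described above: value log first (large values
// only), then the WAL, then the memtable.
func (d *toyDB) set(key string, val []byte) {
	rec := record{key: key}
	if len(val) >= d.threshold {
		rec.ptr = &valuePtr{offset: len(d.vlog), length: len(val)}
		d.vlog = append(d.vlog, val...)
	} else {
		rec.inline = append([]byte(nil), val...)
	}
	d.wal = append(d.wal, rec)
	d.mem[key] = rec
}

// get resolves a value pointer through the value log when needed.
func (d *toyDB) get(key string) ([]byte, bool) {
	rec, ok := d.mem[key]
	if !ok {
		return nil, false
	}
	if rec.ptr != nil {
		return d.vlog[rec.ptr.offset : rec.ptr.offset+rec.ptr.length], true
	}
	return rec.inline, true
}

// replay rebuilds the memtable from the WAL, as crash recovery would.
func (d *toyDB) replay() {
	d.mem = make(map[string]record)
	for _, rec := range d.wal {
		d.mem[rec.key] = rec
	}
}

func main() {
	db := newToyDB(8)
	db.set("small", []byte("tiny"))
	db.set("large", []byte("a payload over the threshold"))
	db.replay() // simulate recovery after losing the memtable
	v, _ := db.get("large")
	fmt.Printf("vlog=%d bytes, large=%q\n", len(db.vlog), v)
}
```

Since the value log append happens before the WAL record is written, every pointer found during replay refers to bytes that already exist.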

Dive deeper in docs/architecture.md.


🧩 Module Breakdown

| Module | Responsibilities | Source | Docs |
| --- | --- | --- | --- |
| WAL | Append-only segments with CRC, rotation, replay (wal.Manager). | wal/ | WAL internals |
| LSM | MemTable, flush pipeline, leveled compactions, iterator merging. | lsm/ | Memtable, Flush pipeline, Cache |
| Manifest | VersionEdit log + CURRENT handling, WAL/vlog checkpoints, Region metadata. | manifest/ | Manifest semantics |
| ValueLog | Large value storage, GC, discard stats integration. | vlog.go, vlog/ | Value log design |
| Transactions | MVCC oracle, managed/unmanaged transactions, iterator snapshots. | txn.go | Transactions & MVCC |
| RaftStore | Multi-Raft Region management, hooks, metrics, transport. | raftstore/ | RaftStore overview |
| HotRing | Hot key tracking, throttling helpers. | hotring/ | HotRing overview |
| Observability | Periodic stats, hot key tracking, CLI integration. | stats.go, cmd/nokv | Stats & observability, CLI reference |
| Filesystem | mmap-backed file helpers shared by SST/vlog. | file/ | File abstractions |

Each module has a dedicated document under docs/ describing APIs, diagrams, and recovery notes.
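
As an illustration of what "append-only segments with CRC" and "replay" mean in the WAL row, the sketch below frames each record with a length prefix and a CRC32 trailer, then replays a buffer until the first damaged record. The on-disk layout shown here is an assumption for teaching purposes; the actual format used by wal.Manager is documented in docs/wal.md.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

var errCorrupt = errors.New("wal: checksum mismatch or truncated record")

// appendRecord frames payload as [4-byte length][payload][4-byte CRC32] and
// appends it to buf. This layout is hypothetical.
func appendRecord(buf, payload []byte) []byte {
	var hdr [4]byte
	binary.LittleEndian.PutUint32(hdr[:], uint32(len(payload)))
	buf = append(buf, hdr[:]...)
	buf = append(buf, payload...)
	var sum [4]byte
	binary.LittleEndian.PutUint32(sum[:], crc32.ChecksumIEEE(payload))
	return append(buf, sum[:]...)
}

// replay decodes records in order and stops at the first damaged one,
// returning the intact prefix together with errCorrupt.
func replay(buf []byte) ([][]byte, error) {
	var out [][]byte
	for len(buf) > 0 {
		if len(buf) < 4 {
			return out, errCorrupt
		}
		n := int(binary.LittleEndian.Uint32(buf))
		if n > len(buf)-8 {
			return out, errCorrupt // payload or checksum is cut off
		}
		payload := buf[4 : 4+n]
		want := binary.LittleEndian.Uint32(buf[4+n:])
		if crc32.ChecksumIEEE(payload) != want {
			return out, errCorrupt
		}
		out = append(out, payload)
		buf = buf[4+n+4:]
	}
	return out, nil
}

func main() {
	var segment []byte
	segment = appendRecord(segment, []byte("put k1=v1"))
	segment = appendRecord(segment, []byte("del k2"))
	records, err := replay(segment)
	fmt.Println(len(records), err)
}
```

Stopping at the first bad checksum lets recovery keep every record that was fully written before a crash and discard a torn tail.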


📡 Observability & CLI

  • Stats.StartStats publishes metrics via expvar (flush backlog, WAL segments, value log GC stats, txn counters).
  • cmd/nokv gives you:
    • nokv stats --workdir <dir> [--json] [--no-region-metrics]
    • nokv manifest --workdir <dir>
    • nokv regions --workdir <dir> [--json]
    • nokv vlog --workdir <dir>
  • hotring continuously surfaces hot keys in stats + CLI so you can pre-warm caches or debug skewed workloads.

More in docs/cli.md and docs/testing.md.


🔌 Redis Gateway

  • cmd/nokv-redis exposes a RESP-compatible endpoint. In embedded mode (--workdir) every command runs inside local MVCC transactions; in distributed mode (--raft-config) calls are routed through raftstore/client and committed with TwoPhaseCommit so NX/XX, TTL, arithmetic and multi-key writes match the single-node semantics.
  • TTL metadata is stored under !redis:ttl!<key> and is automatically cleaned up when reads detect expiration.
  • --metrics-addr publishes NoKV.Redis statistics via expvar and --tso-url can point to an external TSO service (otherwise a local oracle is used).
  • A ready-to-use cluster configuration is available at cmd/nokv-redis/raft_config.example.json, matching both scripts/run_local_cluster.sh and the Docker Compose setup.

For the complete command matrix, configuration and deployment guides, see docs/nokv-redis.md.


📚 Documentation

| Topic | Document |
| --- | --- |
| Architecture deep dive | docs/architecture.md |
| WAL internals | docs/wal.md |
| Flush pipeline | docs/flush.md |
| Memtable lifecycle | docs/memtable.md |
| Transactions & MVCC | docs/txn.md |
| Manifest semantics | docs/manifest.md |
| ValueLog manager | docs/vlog.md |
| Cache & bloom filters | docs/cache.md |
| Hot key analytics | docs/hotring.md |
| Stats & observability | docs/stats.md |
| File abstractions | docs/file.md |
| Crash recovery playbook | docs/recovery.md |
| Testing matrix | docs/testing.md |
| CLI reference | docs/cli.md |
| RaftStore overview | docs/raftstore.md |
| Redis gateway | docs/nokv-redis.md |

📄 License

Apache-2.0. See LICENSE.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ColumnFamilySnapshot

type ColumnFamilySnapshot struct {
	Writes uint64 `json:"writes"`
	Reads  uint64 `json:"reads"`
}

type CoreAPI

type CoreAPI interface {
	Set(data *kv.Entry) error
	Get(key []byte) (*kv.Entry, error)
	Del(key []byte) error
	SetCF(cf kv.ColumnFamily, key, value []byte) error
	GetCF(cf kv.ColumnFamily, key []byte) (*kv.Entry, error)
	DelCF(cf kv.ColumnFamily, key []byte) error
	NewIterator(opt *utils.Options) utils.Iterator
	Info() *Stats
	Close() error
}

CoreAPI is the set of operations that NoKV exposes to callers.

type DB

type DB struct {
	sync.RWMutex
	// contains filtered or unexported fields
}

DB is the externally exposed handle. It is globally unique and holds the various resource handles.

func Open

func Open(opt *Options) *DB

Open opens the DB with the given options.

func (*DB) Close

func (db *DB) Close() error

func (*DB) Del

func (db *DB) Del(key []byte) error

func (*DB) DelCF

func (db *DB) DelCF(cf kv.ColumnFamily, key []byte) error

DelCF deletes a key from the specified column family.

func (*DB) DeleteVersionedEntry added in v0.2.0

func (db *DB) DeleteVersionedEntry(cf kv.ColumnFamily, key []byte, version uint64) error

DeleteVersionedEntry marks the specified version as deleted by writing a tombstone record.

func (*DB) Get

func (db *DB) Get(key []byte) (*kv.Entry, error)

func (*DB) GetCF

func (db *DB) GetCF(cf kv.ColumnFamily, key []byte) (*kv.Entry, error)

GetCF reads a key from the specified column family.

func (*DB) GetVersionedEntry added in v0.2.0

func (db *DB) GetVersionedEntry(cf kv.ColumnFamily, key []byte, version uint64) (*kv.Entry, error)

GetVersionedEntry retrieves the value stored at the provided MVCC version. The caller is responsible for releasing the returned entry via DecrRef.

func (*DB) Info

func (db *DB) Info() *Stats

func (*DB) IsClosed

func (db *DB) IsClosed() bool

func (*DB) Manifest

func (db *DB) Manifest() *manifest.Manager

Manifest exposes the manifest manager for coordination components.

func (*DB) NewIterator

func (db *DB) NewIterator(opt *utils.Options) utils.Iterator

func (*DB) NewTransaction

func (db *DB) NewTransaction(update bool) *Txn

func (*DB) RunValueLogGC

func (db *DB) RunValueLogGC(discardRatio float64) error

RunValueLogGC triggers a value log garbage collection.

func (*DB) Set

func (db *DB) Set(data *kv.Entry) error

func (*DB) SetCF

func (db *DB) SetCF(cf kv.ColumnFamily, key, value []byte) error

SetCF writes a key/value pair into the specified column family.

func (*DB) SetRegionMetrics

func (db *DB) SetRegionMetrics(rm *storepkg.RegionMetrics)

SetRegionMetrics attaches region metrics recorder so Stats snapshot and expvar include region state counts.

func (*DB) SetVersionedEntry added in v0.2.0

func (db *DB) SetVersionedEntry(cf kv.ColumnFamily, key []byte, version uint64, value []byte, meta byte) error

SetVersionedEntry writes a value to the specified column family using the provided version. It mirrors SetCF but allows callers to control the MVCC timestamp embedded in the internal key.

func (*DB) Update

func (db *DB) Update(fn func(txn *Txn) error) error

Update executes a function, creating and managing a read-write transaction for the user. Error returned by the function is relayed by the Update method. Update cannot be used with managed transactions.

func (*DB) View

func (db *DB) View(fn func(txn *Txn) error) error

View executes a function creating and managing a read-only transaction for the user. Error returned by the function is relayed by the View method. If View is used with managed transactions, it would assume a read timestamp of MaxUint64.

func (*DB) WAL

func (db *DB) WAL() *wal.Manager

WAL exposes the underlying WAL manager.

type DBIterator

type DBIterator struct {
	// contains filtered or unexported fields
}

func (*DBIterator) Close

func (iter *DBIterator) Close() error

func (*DBIterator) Item

func (iter *DBIterator) Item() utils.Item

func (*DBIterator) Next

func (iter *DBIterator) Next()

func (*DBIterator) Rewind

func (iter *DBIterator) Rewind()

func (*DBIterator) Seek

func (iter *DBIterator) Seek(key []byte)

func (*DBIterator) Valid

func (iter *DBIterator) Valid() bool

type HotKeyStat

type HotKeyStat struct {
	Key   string `json:"key"`
	Count int32  `json:"count"`
}

type Item

type Item struct {
	// contains filtered or unexported fields
}

func (*Item) Entry

func (it *Item) Entry() *kv.Entry

func (*Item) ValueCopy

func (it *Item) ValueCopy(dst []byte) ([]byte, error)

ValueCopy returns a copy of the current value into dst (if provided). Mirrors Badger's semantics to aid callers expecting defensive copies.

type IteratorOptions

type IteratorOptions struct {
	Reverse        bool // Direction of iteration. False is forward, true is backward.
	AllVersions    bool // Fetch all valid versions of the same key.
	InternalAccess bool // Used to allow internal access to keys.
	KeyOnly        bool // Avoid eager value materialisation.

	Prefix  []byte // Only iterate over this given prefix.
	SinceTs uint64 // Only read data that has version > SinceTs.
	// contains filtered or unexported fields
}

type LSMLevelStats added in v0.4.0

type LSMLevelStats struct {
	Level              int     `json:"level"`
	TableCount         int     `json:"tables"`
	SizeBytes          int64   `json:"size_bytes"`
	ValueBytes         int64   `json:"value_bytes"`
	StaleBytes         int64   `json:"stale_bytes"`
	IngestTables       int     `json:"ingest_tables"`
	IngestSizeBytes    int64   `json:"ingest_size_bytes"`
	IngestValueBytes   int64   `json:"ingest_value_bytes"`
	ValueDensity       float64 `json:"value_density"`
	IngestValueDensity float64 `json:"ingest_value_density"`
	IngestRuns         int64   `json:"ingest_runs"`
	IngestMs           float64 `json:"ingest_ms"`
	IngestTablesCount  int64   `json:"ingest_tables_compacted"`
	MergeRuns          int64   `json:"ingest_merge_runs"`
	MergeMs            float64 `json:"ingest_merge_ms"`
	MergeTables        int64   `json:"ingest_merge_tables"`
}

LSMLevelStats captures aggregated metrics per LSM level.

type Options

type Options struct {
	ValueThreshold     int64
	WorkDir            string
	MemTableSize       int64
	SSTableMaxSz       int64
	MaxBatchCount      int64
	MaxBatchSize       int64 // max batch size in bytes
	ValueLogFileSize   int
	ValueLogMaxEntries uint32

	// ValueLogGCInterval specifies how frequently to trigger a check for value
	// log garbage collection. Zero or negative values disable automatic GC.
	ValueLogGCInterval time.Duration
	// ValueLogGCDiscardRatio is the discard ratio for a value log file to be
	// considered for garbage collection. It must be in the range (0.0, 1.0).
	ValueLogGCDiscardRatio float64

	// Value log GC sampling parameters. Ratios <= 0 fall back to defaults.
	ValueLogGCSampleSizeRatio  float64
	ValueLogGCSampleCountRatio float64
	ValueLogGCSampleFromHead   bool

	// ValueLogVerbose enables verbose logging across value-log operations.
	ValueLogVerbose bool

	WriteBatchMaxCount int
	WriteBatchMaxSize  int64

	DetectConflicts bool
	HotRingEnabled  bool
	HotRingBits     uint8
	HotRingTopK     int
	// HotRingDecayInterval controls how often HotRing halves its global counters.
	// Zero disables periodic decay.
	HotRingDecayInterval time.Duration
	// HotRingDecayShift determines how aggressively counters decay (count >>= shift).
	HotRingDecayShift uint32
	// HotRingWindowSlots controls the number of sliding-window buckets tracked per key.
	// Zero disables the sliding window.
	HotRingWindowSlots int
	// HotRingWindowSlotDuration sets the duration of each sliding-window bucket.
	HotRingWindowSlotDuration time.Duration

	SyncWrites   bool
	ManifestSync bool
	// WriteHotKeyLimit caps how many consecutive writes a single key can issue
	// before the DB returns utils.ErrHotKeyWriteThrottle. Zero disables write-path
	// throttling.
	WriteHotKeyLimit int32
	// HotWriteBurstThreshold marks a key as “hot” for batching when its write
	// frequency exceeds this count; zero disables hot write batching.
	HotWriteBurstThreshold int32
	// HotWriteBatchMultiplier scales write batch limits when a hot key is
	// detected, allowing short-term coalescing of repeated writes.
	HotWriteBatchMultiplier int
	// WriteBatchWait adds an optional coalescing delay when the commit queue is
	// momentarily empty, letting small bursts share one WAL fsync/apply pass.
	// Zero disables the delay.
	WriteBatchWait time.Duration
	// CommitPipelineDepth controls the buffering between commit queue, value log
	// writes, and LSM apply. Values <= 0 fall back to a small default.
	CommitPipelineDepth int

	// Block cache configuration for read path optimization. Cached blocks
	// target L0/L1; colder data relies on the OS page cache.
	BlockCacheSize int
	BloomCacheSize int

	// RaftLagWarnSegments determines how many WAL segments a follower can lag
	// behind the active segment before stats surfaces a warning. Zero disables
	// the alert.
	RaftLagWarnSegments int64

	// EnableWALWatchdog enables the background WAL backlog watchdog which
	// surfaces typed-record warnings and optionally runs automated segment GC.
	EnableWALWatchdog bool
	// WALAutoGCInterval controls how frequently the watchdog evaluates WAL
	// backlog for automated garbage collection.
	WALAutoGCInterval time.Duration
	// WALAutoGCMinRemovable is the minimum number of removable WAL segments
	// required before an automated GC pass will run.
	WALAutoGCMinRemovable int
	// WALAutoGCMaxBatch bounds how many WAL segments are removed during a single
	// automated GC pass.
	WALAutoGCMaxBatch int
	// WALTypedRecordWarnRatio triggers a typed-record warning when raft records
	// constitute at least this fraction of WAL writes. Zero disables ratio-based
	// warnings.
	WALTypedRecordWarnRatio float64
	// WALTypedRecordWarnSegments triggers a typed-record warning when the number
	// of WAL segments containing raft records exceeds this threshold. Zero
	// disables segment-count warnings.
	WALTypedRecordWarnSegments int64

	// DiscardStatsFlushThreshold controls how many discard-stat updates must be
	// accumulated before they are flushed back into the LSM. Zero keeps the
	// default threshold.
	DiscardStatsFlushThreshold int

	// NumCompactors controls how many background compaction workers are spawned.
	// Zero uses an auto value derived from the host CPU count.
	NumCompactors int
	// NumLevelZeroTables controls when write throttling kicks in and feeds into
	// the compaction priority calculation. Zero falls back to the legacy default.
	NumLevelZeroTables int
	// IngestCompactBatchSize decides how many L0 tables to promote into the
	// ingest buffer per compaction cycle. Zero falls back to the legacy default.
	IngestCompactBatchSize int
	// IngestBacklogMergeScore triggers an ingest-merge task when the ingest
	// backlog score exceeds this threshold. Zero keeps the default (2.0).
	IngestBacklogMergeScore float64

	// CompactionValueWeight adjusts how aggressively the scheduler prioritises
	// levels whose entries reference large value log payloads. Higher values
	// make the compaction picker favour levels with high ValuePtr density.
	CompactionValueWeight float64

	// CompactionValueAlertThreshold triggers stats alerts when a level's
	// value-density (value bytes / total bytes) exceeds this ratio.
	CompactionValueAlertThreshold float64

	// IngestShardParallelism caps how many ingest shards can be compacted in a
	// single ingest-only pass. A value <= 0 falls back to 1 (sequential).
	IngestShardParallelism int
}

Options is the top-level configuration for NoKV.
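
The comment on HotRingDecayShift describes decay as count >>= shift. The sketch below applies that rule to a plain map of counters to show its effect. It is not the hotring package: the map representation and the choice to drop counters that reach zero are assumptions.

```go
package main

import "fmt"

// decay applies count >>= shift to every counter and drops keys that reach
// zero. Dropping zeroed keys is an assumption made to keep the map small.
func decay(counters map[string]int32, shift uint32) {
	for key, count := range counters {
		count >>= shift
		if count == 0 {
			delete(counters, key)
			continue
		}
		counters[key] = count
	}
}

func main() {
	counters := map[string]int32{"hot": 40, "warm": 5, "cold": 1}
	decay(counters, 1) // shift 1 halves every counter
	fmt.Println(counters)
}
```

A shift of 1 halves each counter per decay interval, while larger shifts forget old traffic faster: a shift of 2 divides by four.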

func NewDefaultOptions

func NewDefaultOptions() *Options

NewDefaultOptions returns the default options.

type Stats

type Stats struct {
	EntryNum int64 // Mirrors Entries for backwards compatibility.
	// contains filtered or unexported fields
}

func (*Stats) SetRegionMetrics

func (s *Stats) SetRegionMetrics(rm *storepkg.RegionMetrics)

SetRegionMetrics attaches region metrics recorder used in snapshots.

func (*Stats) Snapshot

func (s *Stats) Snapshot() StatsSnapshot

Snapshot returns a point-in-time metrics snapshot without mutating state.

func (*Stats) StartStats

func (s *Stats) StartStats()

StartStats runs periodic collection of internal backlog metrics.

type StatsSnapshot

type StatsSnapshot struct {
	Entries                        int64                           `json:"entries"`
	FlushPending                   int64                           `json:"flush_pending"`
	FlushQueueLength               int64                           `json:"flush_queue_length"`
	FlushActive                    int64                           `json:"flush_active"`
	FlushWaitMs                    float64                         `json:"flush_wait_ms"`
	FlushLastWaitMs                float64                         `json:"flush_last_wait_ms"`
	FlushMaxWaitMs                 float64                         `json:"flush_max_wait_ms"`
	FlushBuildMs                   float64                         `json:"flush_build_ms"`
	FlushLastBuildMs               float64                         `json:"flush_last_build_ms"`
	FlushMaxBuildMs                float64                         `json:"flush_max_build_ms"`
	FlushReleaseMs                 float64                         `json:"flush_release_ms"`
	FlushLastReleaseMs             float64                         `json:"flush_last_release_ms"`
	FlushMaxReleaseMs              float64                         `json:"flush_max_release_ms"`
	FlushCompleted                 int64                           `json:"flush_completed"`
	CompactionBacklog              int64                           `json:"compaction_backlog"`
	CompactionMaxScore             float64                         `json:"compaction_max_score"`
	CompactionLastDurationMs       float64                         `json:"compaction_last_duration_ms"`
	CompactionMaxDurationMs        float64                         `json:"compaction_max_duration_ms"`
	CompactionRuns                 uint64                          `json:"compaction_runs"`
	CompactionIngestRuns           int64                           `json:"compaction_ingest_runs"`
	CompactionMergeRuns            int64                           `json:"compaction_ingest_merge_runs"`
	CompactionIngestMs             float64                         `json:"compaction_ingest_ms"`
	CompactionMergeMs              float64                         `json:"compaction_ingest_merge_ms"`
	CompactionIngestTables         int64                           `json:"compaction_ingest_tables"`
	CompactionMergeTables          int64                           `json:"compaction_ingest_merge_tables"`
	CompactionValueWeight          float64                         `json:"compaction_value_weight"`
	CompactionValueWeightSuggested float64                         `json:"compaction_value_weight_suggested,omitempty"`
	ValueLogSegments               int                             `json:"vlog_segments"`
	ValueLogPendingDel             int                             `json:"vlog_pending_deletes"`
	ValueLogDiscardQueue           int                             `json:"vlog_discard_queue"`
	ValueLogHead                   kv.ValuePtr                     `json:"vlog_head"`
	WALActiveSegment               int64                           `json:"wal_active_segment"`
	WALSegmentCount                int64                           `json:"wal_segment_count"`
	WALActiveSize                  int64                           `json:"wal_active_size"`
	WALSegmentsRemoved             uint64                          `json:"wal_segments_removed"`
	WALRecordCounts                wal.RecordMetrics               `json:"wal_record_counts"`
	WALSegmentsWithRaftRecords     int                             `json:"wal_segments_with_raft_records"`
	WALRemovableRaftSegments       int                             `json:"wal_removable_raft_segments"`
	WALTypedRecordRatio            float64                         `json:"wal_typed_record_ratio"`
	WALTypedRecordWarning          bool                            `json:"wal_typed_record_warning"`
	WALTypedRecordReason           string                          `json:"wal_typed_record_reason,omitempty"`
	WALAutoGCRuns                  uint64                          `json:"wal_auto_gc_runs"`
	WALAutoGCRemoved               uint64                          `json:"wal_auto_gc_removed"`
	WALAutoGCLastUnix              int64                           `json:"wal_auto_gc_last_unix"`
	RaftGroupCount                 int                             `json:"raft_group_count"`
	RaftLaggingGroups              int                             `json:"raft_lagging_groups"`
	RaftMinLogSegment              uint32                          `json:"raft_min_log_segment"`
	RaftMaxLogSegment              uint32                          `json:"raft_max_log_segment"`
	RaftMaxLagSegments             int64                           `json:"raft_max_lag_segments"`
	RaftLagWarnThreshold           int64                           `json:"raft_lag_warn_threshold"`
	RaftLagWarning                 bool                            `json:"raft_lag_warning"`
	WriteQueueDepth                int64                           `json:"write_queue_depth"`
	WriteQueueEntries              int64                           `json:"write_queue_entries"`
	WriteQueueBytes                int64                           `json:"write_queue_bytes"`
	WriteAvgBatchEntries           float64                         `json:"write_avg_batch_entries"`
	WriteAvgBatchBytes             float64                         `json:"write_avg_batch_bytes"`
	WriteAvgRequestWaitMs          float64                         `json:"write_avg_request_wait_ms"`
	WriteAvgValueLogMs             float64                         `json:"write_avg_vlog_ms"`
	WriteAvgApplyMs                float64                         `json:"write_avg_apply_ms"`
	WriteBatchesTotal              int64                           `json:"write_batches_total"`
	WriteThrottleActive            bool                            `json:"write_throttle_active"`
	TxnsActive                     int64                           `json:"txns_active"`
	TxnsStarted                    uint64                          `json:"txns_started"`
	TxnsCommitted                  uint64                          `json:"txns_committed"`
	TxnsConflicts                  uint64                          `json:"txns_conflicts"`
	RegionTotal                    int64                           `json:"region_total"`
	RegionNew                      int64                           `json:"region_new"`
	RegionRunning                  int64                           `json:"region_running"`
	RegionRemoving                 int64                           `json:"region_removing"`
	RegionTombstone                int64                           `json:"region_tombstone"`
	RegionOther                    int64                           `json:"region_other"`
	HotKeys                        []HotKeyStat                    `json:"hot_keys,omitempty"`
	HotWriteLimited                uint64                          `json:"hot_write_limited"`
	BlockL0HitRate                 float64                         `json:"block_l0_hit_rate"`
	BlockL1HitRate                 float64                         `json:"block_l1_hit_rate"`
	BloomHitRate                   float64                         `json:"bloom_hit_rate"`
	IndexHitRate                   float64                         `json:"index_hit_rate"`
	IteratorReused                 uint64                          `json:"iterator_reused"`
	ColumnFamilies                 map[string]ColumnFamilySnapshot `json:"column_families,omitempty"`
	LSMLevels                      []LSMLevelStats                 `json:"lsm_levels,omitempty"`
	LSMValueBytesTotal             int64                           `json:"lsm_value_bytes_total"`
	LSMValueDensityMax             float64                         `json:"lsm_value_density_max"`
	LSMValueDensityAlert           bool                            `json:"lsm_value_density_alert"`
}

StatsSnapshot captures a point-in-time view of internal backlog metrics.
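
Since every StatsSnapshot field carries a JSON tag, a monitoring script can decode just the fields it cares about. The sketch below assumes that JSON output (for example from nokv stats --json) follows these tags; that mapping is not confirmed here. The sample document and the alert thresholds are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hotKey and statsSubset reuse JSON tags from HotKeyStat and StatsSnapshot.
// Only a handful of fields are declared; unknown fields are ignored.
type hotKey struct {
	Key   string `json:"key"`
	Count int32  `json:"count"`
}

type statsSubset struct {
	Entries         int64    `json:"entries"`
	FlushPending    int64    `json:"flush_pending"`
	WALSegmentCount int64    `json:"wal_segment_count"`
	RaftLagWarning  bool     `json:"raft_lag_warning"`
	HotKeys         []hotKey `json:"hot_keys,omitempty"`
}

func decodeStats(data []byte) (statsSubset, error) {
	var s statsSubset
	err := json.Unmarshal(data, &s)
	return s, err
}

// needsAttention is an example alerting rule; the threshold is arbitrary.
func needsAttention(s statsSubset) bool {
	return s.RaftLagWarning || s.FlushPending > 10
}

// sampleJSON is invented data, not real NoKV output.
const sampleJSON = `{
  "entries": 42,
  "flush_pending": 1,
  "wal_segment_count": 3,
  "raft_lag_warning": false,
  "hot_keys": [{"key": "user:1", "count": 7}],
  "txns_active": 0
}`

func main() {
	s, err := decodeStats([]byte(sampleJSON))
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Entries, s.HotKeys, needsAttention(s))
}
```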

type Txn

type Txn struct {
	// contains filtered or unexported fields
}

func (*Txn) Commit

func (txn *Txn) Commit() error

Commit commits the transaction, following these steps:

1. If there are no writes, return immediately.

2. Check if read rows were updated since txn started. If so, return ErrConflict.

3. If no conflict, generate a commit timestamp and update written rows' commit ts.

4. Batch up all writes, write them to value log and LSM tree.

5. If a callback is provided (see CommitWith), the call returns immediately after checking for conflicts, and the writes happen in the background. If there is a conflict, an error is returned and the callback does not run. If there are no conflicts, the callback is invoked in the background once the writes complete successfully or fail with an error.

If error is nil, the transaction is successfully committed. In case of a non-nil error, the LSM tree won't be updated, so there's no need for any rollback.
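
Steps 1 to 3 above can be sketched as a toy timestamp oracle. This is a single-threaded illustration of read-set conflict detection, not NoKV's oracle: the type names, the per-key commit map, and the absence of locking are all simplifying assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("transaction conflict")

// oracle hands out timestamps and remembers when each key was last committed.
type oracle struct {
	lastCommitTs uint64
	committed    map[string]uint64 // key -> commit timestamp of its latest write
}

type txn struct {
	o      *oracle
	readTs uint64
	reads  map[string]bool
	writes map[string]bool
}

func newOracle() *oracle { return &oracle{committed: make(map[string]uint64)} }

// begin starts a transaction that reads as of the latest commit timestamp.
func (o *oracle) begin() *txn {
	return &txn{o: o, readTs: o.lastCommitTs, reads: map[string]bool{}, writes: map[string]bool{}}
}

func (t *txn) read(key string)  { t.reads[key] = true }
func (t *txn) write(key string) { t.writes[key] = true }

// commit follows the listed steps: return early without writes, fail if any
// key that was read has a newer commit, otherwise stamp the writes.
func (t *txn) commit() error {
	if len(t.writes) == 0 {
		return nil
	}
	for key := range t.reads {
		if t.o.committed[key] > t.readTs {
			return errConflict
		}
	}
	t.o.lastCommitTs++
	for key := range t.writes {
		t.o.committed[key] = t.o.lastCommitTs
	}
	return nil
}

func main() {
	o := newOracle()
	t1 := o.begin()
	t1.read("a")

	t2 := o.begin()
	t2.write("a")
	fmt.Println("t2:", t2.commit())

	t1.write("b") // t1 read "a" before t2 changed it
	fmt.Println("t1:", t1.commit())
}
```

Because the failed transaction never stamps its writes, nothing has to be rolled back, which mirrors the note that the LSM tree is not updated on error.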

func (*Txn) CommitWith

func (txn *Txn) CommitWith(cb func(error))

CommitWith acts like Commit, but takes a callback, which is run in a goroutine so that it does not block this function. The callback is guaranteed to run, so it is safe to increment a sync.WaitGroup before calling CommitWith and decrement it in the callback in order to block until all callbacks have run.
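
The WaitGroup pattern mentioned above might look like the following sketch. To keep it runnable without a database, it uses a stand-in: the committer interface copies the CommitWith signature from this page, while fakeTxn and commitAll are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// committer matches the CommitWith signature documented above.
type committer interface {
	CommitWith(cb func(error))
}

// fakeTxn is a stand-in for *Txn that reports a preset result from a
// separate goroutine, like the documented behaviour.
type fakeTxn struct{ err error }

func (f fakeTxn) CommitWith(cb func(error)) { go cb(f.err) }

var errBoom = errors.New("boom")

// commitAll starts every commit, then blocks until all callbacks have run
// and returns the errors they reported.
func commitAll(txns []committer) []error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	for _, txn := range txns {
		wg.Add(1) // increment before calling CommitWith
		txn.CommitWith(func(err error) {
			defer wg.Done() // decrement inside the callback
			if err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		})
	}
	wg.Wait()
	return errs
}

func main() {
	errs := commitAll([]committer{fakeTxn{}, fakeTxn{err: errBoom}, fakeTxn{}})
	fmt.Println(len(errs), errs)
}
```

The mutex matters because callbacks run on different goroutines and may append to the shared slice at the same time.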

func (*Txn) Delete

func (txn *Txn) Delete(key []byte) error

Delete deletes a key.

This is done by adding a delete marker for the key at commit timestamp. Any reads happening before this timestamp would be unaffected. Any reads after this commit would see the deletion.

The current transaction keeps a reference to the key byte slice argument. Users must not modify the key until the end of the transaction.

func (*Txn) Discard

func (txn *Txn) Discard()

Discard discards a created transaction. This method must always be called. Commit calls it internally, and calling it multiple times is harmless, so it can safely be deferred right after the transaction is created.

NOTE: If any operations are run on a discarded transaction, ErrDiscardedTxn is returned.

func (*Txn) Get

func (txn *Txn) Get(key []byte) (item *Item, rerr error)

Get looks for key and returns corresponding Item. If key is not found, ErrKeyNotFound is returned.

func (*Txn) NewIterator

func (txn *Txn) NewIterator(opt IteratorOptions) *TxnIterator

NewIterator creates a new transaction iterator. The options control whether it iterates over keys only or over key-value pairs.

func (*Txn) NewKeyIterator

func (txn *Txn) NewKeyIterator(key []byte, opt IteratorOptions) *TxnIterator

NewKeyIterator is just like NewIterator, but allows the user to iterate over all versions of a single key. Internally, it sets the Prefix option in provided opt, and uses that prefix to additionally run bloom filter lookups before picking tables from the LSM tree.

func (*Txn) ReadTs

func (txn *Txn) ReadTs() uint64

ReadTs returns the read timestamp of the transaction.

func (*Txn) Set

func (txn *Txn) Set(key, val []byte) error

Set adds a key-value pair to the database. It will return ErrReadOnlyTxn if update flag was set to false when creating the transaction.

The current transaction keeps a reference to the key and val byte slice arguments. Users must not modify key and val until the end of the transaction.

func (*Txn) SetEntry

func (txn *Txn) SetEntry(e *kv.Entry) error

SetEntry takes a kv.Entry struct and adds the key-value pair in the struct, along with other metadata, to the database.

The current transaction keeps a reference to the entry passed in argument. Users must not modify the entry until the end of the transaction.

type TxnIterator

type TxnIterator struct {
	// contains filtered or unexported fields
}

TxnIterator helps iterate over the KV pairs in lexicographically sorted order.

func (*TxnIterator) Close

func (it *TxnIterator) Close()

Close would close the iterator. It is important to call this when you're done with iteration.

func (*TxnIterator) Item

func (it *TxnIterator) Item() *Item

Item returns pointer to the current key-value pair. This item is only valid until it.Next() gets called.

func (*TxnIterator) Next

func (it *TxnIterator) Next()

Next would advance the iterator by one. Always check it.Valid() after a Next() to ensure you have access to a valid it.Item().

func (*TxnIterator) Rewind

func (it *TxnIterator) Rewind()

Rewind would rewind the iterator cursor all the way to zero-th position, which would be the smallest key if iterating forward, and largest if iterating backward. It does not keep track of whether the cursor started with a Seek().

func (*TxnIterator) Seek

func (it *TxnIterator) Seek(key []byte) uint64

Seek would seek to the provided key if present. If absent, it would seek to the next smallest key greater than the provided key if iterating in the forward direction. Behavior would be reversed if iterating backwards.

func (*TxnIterator) Valid

func (it *TxnIterator) Valid() bool

Valid returns false when iteration is done.

func (*TxnIterator) ValidForPrefix

func (it *TxnIterator) ValidForPrefix(prefix []byte) bool

ValidForPrefix returns false when iteration is done or when the current key is not prefixed by the specified prefix.
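
The usual way to combine Seek, ValidForPrefix, and Next is a prefix scan. The sketch below shows the loop shape against a stand-in iterator over a sorted string slice. sliceIter and its Key method are hypothetical and use string keys for brevity; with a real TxnIterator the keys are []byte and the current entry comes from Item.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sliceIter is a stand-in for TxnIterator over lexicographically sorted keys.
type sliceIter struct {
	keys []string // must be sorted
	pos  int
}

// Seek moves to the first key >= key (forward iteration only).
func (it *sliceIter) Seek(key string) { it.pos = sort.SearchStrings(it.keys, key) }

func (it *sliceIter) Valid() bool { return it.pos < len(it.keys) }

// ValidForPrefix is false once iteration is done or the current key no
// longer carries the prefix.
func (it *sliceIter) ValidForPrefix(prefix string) bool {
	return it.Valid() && strings.HasPrefix(it.keys[it.pos], prefix)
}

func (it *sliceIter) Next()       { it.pos++ }
func (it *sliceIter) Key() string { return it.keys[it.pos] }

// scanPrefix collects every key that starts with prefix.
func scanPrefix(it *sliceIter, prefix string) []string {
	var out []string
	for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
		out = append(out, it.Key())
	}
	return out
}

func main() {
	it := &sliceIter{keys: []string{"app", "apple", "apply", "banana"}}
	fmt.Println(scanPrefix(it, "appl"))
}
```

Because keys are sorted, all matches are contiguous, so the loop can stop at the first key that lacks the prefix instead of scanning to the end.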

Directories

Path Synopsis
cmd
nokv command
nokv-config command
nokv-redis command
lsm
kv
scripts
tso command
