
Manu: A Cloud Native Vector Database Management System

Rentong Guo†∗ , Xiaofan Luan†∗ , Long Xiang‡∗ , Xiao Yan‡∗ , Xiaomeng Yi†∗ , Jigao Luo†§
Qianya Cheng† , Weizhi Xu† , Jiarui Luo‡ , Frank Liu† , Zhenshan Cao† , Yanliang Qiao† , Ting Wang†
Bo Tang‡ , Charles Xie†
† Zilliz
‡ Department of Computer Science and Engineering, Southern University of Science and Technology
§ Technical University of Munich
† {firstname.lastname}@zilliz.com
‡ {xiangl3@mail., yanx@, 11911419@mail., tangb3@}sustech.edu.cn, § [email protected]
arXiv:2206.13843v1 [cs.DB] 28 Jun 2022

ABSTRACT

With the development of learning-based embedding models, embedding vectors are widely used for analyzing and searching unstructured data. As vector collections exceed billion-scale, fully managed and horizontally scalable vector databases are necessary. In the past three years, through interaction with our 1200+ industry users, we have sketched a vision for the features that next-generation vector databases should have, which include long-term evolvability, tunable consistency, good elasticity, and high performance.

We present Manu, a cloud native vector database that implements these features. It is difficult to integrate all these features if we follow traditional DBMS design rules. As most vector data applications do not require complex data models and strong data consistency, our design philosophy is to relax the data model and consistency constraints in exchange for the aforementioned features. Specifically, Manu first exposes the write-ahead log (WAL) and binlog as backbone services. Second, write components are designed as log publishers, while all read-only analytic and search components are designed as independent subscribers to the log services. Finally, we utilize multi-version concurrency control (MVCC) and a delta consistency model to simplify the communication and cooperation among the system components. These designs achieve low coupling among the system components, which is essential for elasticity and evolution. We also extensively optimize Manu for performance and usability with hardware-aware implementations and support for complex search semantics. Manu has been used for many applications, including, but not limited to, recommendation, multimedia, language, medicine, and security. We evaluated Manu in three typical application scenarios to demonstrate its efficiency, elasticity, and scalability.

PVLDB Reference Format:
Rentong Guo, Xiaofan Luan, Long Xiang, Xiao Yan, Xiaomeng Yi, Jigao Luo, Qianya Cheng, Weizhi Xu, Jiarui Luo, Frank Liu, Zhenshan Cao, Yanliang Qiao, Ting Wang, Bo Tang, and Charles Xie. Manu: A Cloud Native Vector Database Management System. PVLDB, 15(12): XXX-XXX, 2022. doi:XX.XX/XXX.XX

PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at https://github.com/milvus-io/milvus/tree/2.0.

∗ Co-first-authors are ordered alphabetically.
‡ Work done while working with Zilliz; correspondence to Bo Tang.

This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing [email protected]. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment. Proceedings of the VLDB Endowment, Vol. 15, No. 12 ISSN 2150-8097. doi:XX.XX/XXX.XX

1 INTRODUCTION

According to IDC, unstructured data, such as text, images, and video, took up about 80% of the 40,000 exabytes of new data generated in 2020, and their share keeps rising due to the increasing amount of human-generated rich media [48]. With the rise of learning-based embedding models, especially deep neural networks, using embedding vectors to manage unstructured data has become commonplace in many applications such as e-commerce, social media, and drug discovery [49, 63, 68]. A core feature of these applications is that they encode the semantics of unstructured data into a high-dimensional vector space. Given the representation power of embedding vectors, operations like recommendation, search, and analysis can be implemented via similarity-based vector search. To support these applications, many specialized vector databases have been built to manage vector data [11, 13, 18–20, 81].

In 2019, we open sourced Milvus [81], our previous vector database, under the LF AI & Data Foundation. Since then, we have collected feedback from more than 1200 industry users and found that some of the design principles adopted by Milvus are not suitable. Milvus followed the design principles of relational databases, which are optimized for either transactional [52] or analytical [81] workloads, and focused on functionality support (e.g., attribute filtering and multi-vector search) and execution efficiency (e.g., SIMD and cache optimizations). However, vector database applications have different requirements in the following three aspects, which motivated us to rebuild Manu from scratch with a focus on a cloud-native architecture.

• Support for complex transactions is not necessary. Instead of decomposing entity representations into different fields or tables, learning-based models encode complex and hybrid data semantics into a single vector. As a result, multi-row or multi-table transactions are not necessary; row-level ACID is sufficient for the majority of vector database applications.

• A tunable performance-consistency trade-off is important. Different users have different consistency requirements; some users prefer high throughput and eventual consistency, while
others require some level of guaranteed consistency, i.e., newly inserted data should be visible to queries either immediately or within a pre-configured time. Traditional relational databases generally support either strong consistency or eventual consistency; there is little to no room for customization between these two extremes. As such, tunable consistency is a crucial attribute for cloud-native vector databases.

• High hardware cost calls for fine-grained elasticity. Some vector database operations (e.g., vector search and index building) are computationally intensive, and hardware accelerators (e.g., GPUs or FPGAs) and/or a large working memory are required for good performance. However, depending on application type, workload differs amongst database functionalities. Thus, resources can be wasted or improperly allocated if the vector database does not have fine-grained elasticity. This necessitates a careful decoupling of the functional and hardware layers; system-level decoupling such as separating read from write logic is insufficient. Elasticity and resource isolation should be managed at the functionality level rather than the system level.

In summary, modern vector databases should have tunable consistency, functionality-level decoupling, and per-component scalability. Following the design principles of traditional relational databases makes achieving these design goals extremely difficult, if not impossible. A key opportunity for achieving these design goals lies in the potential for relaxing transaction complexity.

Manu follows the “log as data” paradigm. Specifically, Manu structures the entire system as a group of log publish/subscribe microservices. The write-ahead log (WAL) and inter-component messages are published as “logs”, i.e., durable data streams that can be subscribed to. Read-side components, such as search and analytical engines, are all built as log subscribers. This architecture provides a simple yet effective way to decouple system functionalities; it enables the decoupling of read from write, stateless from stateful, and storage from computing. Each log entry is assigned a globally unique timestamp, and special log entries called time-ticks (similar to watermarks in Apache Flink [26]) are periodically inserted into each log channel, signaling the progress of event time for log subscribers. The timestamp and time-tick form the basis of the tunable consistency mechanism and multi-version concurrency control (MVCC). To control the consistency level, a user can specify a tolerable time lag between a query’s timestamp and the latest time-tick consumed by a subscriber.

Additionally, we extensively optimize Manu for performance and usability. Manu supports various indexes for vector search, including vector quantization [22, 34, 37, 83], inverted indexes [24], and proximity graphs [33]. In particular, we tailor the implementations to better utilize the parallelization capabilities of modern CPUs and GPUs along with the improved read/write speeds of SSDs over HDDs. Manu also integrates refactored functionalities from Milvus [81], such as attribute filtering and multi-vector search. Moreover, we build a visualization tool that allows users to track the performance of Manu in real time, and include an auto-configuration tool that recommends indexing algorithm parameters using machine learning.

To summarize, this paper makes the following contributions:

• We summarize lessons learned from communicating with over 1200 industry users over three years. We shed light on typical application requirements of vector databases and show how they differ from those of traditional relational databases. We then outline the key design goals that vector databases should meet.
• We introduce Manu’s key architectural designs as a cloud native vector database, built around the core design philosophy of relaxing transaction complexity in exchange for tunable consistency and fine-grained elasticity.
• We present important usability and performance-related enhancements, e.g., a high-level API, a GUI tool, automatic parameter configuration, and SSD support.

The rest of the paper is organized as follows. Section 2 provides background on the requirements and design goals of vector databases. Section 3 dives deep into Manu’s design. Section 4 highlights the key features for usability and performance. Section 5 discusses representative use cases of Manu. Section 6 reviews related work. Section 7 concludes the paper and outlines future work.

2 BACKGROUND AND MOTIVATION

Consider video recommendation as a typical use case of vector databases. The goal is to help users discover new videos based on their personal preferences and previous browsing history. Using machine learning models (especially deep neural networks), features of users and videos, such as search history, watch history, age, gender, video language, and tags, are converted to embedding vectors. These models are carefully designed and trained to encode the similarity between user and video vectors into a common vector space. Recommendation is conducted by retrieving candidate videos from the collection of video vectors via similarity scores with respect to the specified user vector. The system also needs to handle updates to vectors when new videos are uploaded, some videos are deleted, or the embedding model is changed.

Video recommendation and other applications of vector databases can involve hundreds of billions of vectors with daily growth at hundred-million scale, and serve million-scale queries per second (QPS). Existing DBMSs (e.g., relational databases [9, 12], NoSQL [76, 86], NewSQL [40, 74]) were not built to manage vector data at that scale. Moreover, the underlying data management requirements of their applications differ greatly from those of vector database applications.

First, when compared with relational databases, both the architecture and theory of vector databases are far from mature. A key reason for this is that AI- and data-driven applications are still in a state of constant evolution, thereby necessitating continued architectural and functionality changes to vector databases as well.

Second, complex transactions are unnecessary for vector databases. In the above example, the recommendation system encodes all semantic features of users and videos into standalone vectors as opposed to multi-row or multi-column entity fields in a relational database. As a result, row-level ACID is sufficient; multi-table operations (such as joins) are inessential.

Figure 1: An example of Manu’s schema. User fields: primary key (ID), feature vector (embedding), label, and numerical attribute; system field: logical sequence number (LSN).

Third, vector database applications need a flexible performance-consistency trade-off. While some applications adopt a strong or eventual consistency model, there are others that fall between the two extremes. Users may wish to relax consistency constraints in exchange for better system throughput. In the video recommendation example, observing a newly uploaded video after several seconds is acceptable, but keeping users waiting for recommendations harms user experience. Thus, the application can configure the allowed maximal delay for video updates in order to improve system throughput.
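The trade-off above can be sketched as a reader that tolerates a bounded lag behind the writers. This is a minimal, hypothetical model (the names `StalenessBoundedView`, `consume_tick`, and `ready` are illustrative, not Manu's actual API): a query is allowed to run only if the data it would see is no staler than a configured tolerance.

```python
class StalenessBoundedView:
    """Toy model of a reader that tolerates bounded staleness.

    `last_tick` is the event time up to which this reader has
    consumed updates (analogous to the latest time-tick it has seen).
    """

    def __init__(self):
        self.last_tick = 0.0

    def consume_tick(self, tick_time):
        # The reader advances as it consumes log entries/time-ticks.
        self.last_tick = max(self.last_tick, tick_time)

    def ready(self, query_time, max_staleness):
        # A query issued at `query_time` may run only if the data it
        # would see lags it by at most `max_staleness` time units;
        # otherwise the reader must wait for more ticks.
        return query_time - self.last_tick <= max_staleness

view = StalenessBoundedView()
view.consume_tick(100.0)            # reader is synced up to t = 100
assert view.ready(105.0, 10.0)      # 5s stale, within a 10s tolerance
assert not view.ready(120.0, 10.0)  # 20s stale: must wait for a tick
view.consume_tick(118.0)
assert view.ready(120.0, 10.0)      # now only 2s stale
```

Setting the tolerance to zero forces a wait until the reader has caught up with the query's issue time (strong consistency), while an unbounded tolerance never waits (eventual consistency).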
Fourth, vector databases have more stringent and diversified hardware requirements compared with traditional databases. This is attributed to three reasons. First, vector database operations are computation-intensive, and thus hardware accelerators such as GPUs are critical for computing functionalities such as search and indexing. Second, accesses to vector data (e.g., search or update) generally have poor locality, thereby requiring large RAM for good performance. Third, different applications vary significantly in their resource demands for the system functionalities. Core functionalities of a vector database include data insertion, indexing, filtering, and vector search. Applications such as video recommendation require online insertion and high-concurrency vector search. In contrast, for interactive use cases such as drug discovery, offline data ingestion and indexing are generally acceptable. Although interactive applications usually require lower throughput than recommendation systems, they have high demands for real-time filtering, similarity-based vector search, and hybrid queries. The high hardware costs as well as the diverse workload features call for fine-grained elasticity.

The key design goals of Manu are summarized below; these design goals not only fully encompass the above characteristics but also share some common goals with generic cloud-based databases.

• Long-term evolvability: Overall system complexity must be controlled for the continuous evolution of Manu’s functionalities. Without the need to support complex transactions, there lies an opportunity to model all the event sequences (such as the WAL and inter-component messages) as message queues to cleanly decouple the entire system. In this way, individual components can evolve, be added, or be replaced easily with minimal interference to other components. This design echoes large-scale data analytics platforms, which often rely on data streaming systems such as Kafka to connect system components.
• Tunable consistency: To enable a flexible consistency-performance trade-off, Manu should introduce delta consistency, which falls between strong consistency and eventual consistency: a read operation returns the last value that was produced at most delta time units before it. It is worth noting that strong consistency and eventual consistency can be realized as special cases of this model, with delta being zero and infinity, respectively.
• Good elasticity: Workload fluctuations can cause different loads on individual system components. In order to dynamically allocate compute resources to high-load tasks, components must be carefully decoupled, taking both functionality and hardware dependencies into consideration. System elasticity and resource isolation should be managed at the component level rather than at the system level (e.g., decoupling indexing from querying versus decoupling read from write).
• High availability: Availability is a must-have for modern cloud-based applications; Manu must isolate system failures at the component level and make failure recovery transparent.
• High performance: Query processing performance is key to vector databases. For good performance, implementations need to be extensively optimized for hardware. Moreover, the framework should be carefully designed so as to minimize system overheads in query serving.
• Strong adaptability: Our customers use vector databases in a variety of environments, ranging from prototyping on laptops to large-scale deployments on the cloud. A vector database should provide a consistent user experience and reduce code/data migration overhead across environments.

3 THE MANU SYSTEM

In this section, we begin by introducing the basic concepts of Manu. Next, we present the system designs, including the overall system architecture, the log backbone, and how Manu conducts vector searches and builds vector search indexes.

3.1 Schema, Collection, Shard, and Segment

Schema: The basic data types of Manu are vector, string, boolean, integer, and floating point. A schema example is given in Figure 1. Suppose each entity consists of five fields and corresponds to a product on an e-commerce platform. The Primary key is the ID of the entity; it can be either an integer or a string. If users do not specify this field, the system automatically adds an integer primary key for each entity. The Feature vector is the embedding of the product. The Label is the category of the product, such as food, book, or clothing. The Numerical attribute is a float or an integer associated with the product, such as price, weight, or production date. Manu supports multiple labels and numerical attributes in each entity. Note that these fields are used for filtering, rather than for joins or aggregation. The Logical sequence number (LSN) is a system field hidden from users.

Collection: A collection is a set of entities, similar to the concept of a table in relational databases. For example, a collection can contain all the products of an e-commerce platform. The key difference is that collections have no relations with each other; thus, relational algebra, such as join operations, is not supported.

Shard: A shard corresponds to an insertion/deletion channel. Entities are hashed into multiple shards based on their primary keys during insertion/deletion. Manu’s data placement unit is the segment rather than the shard.¹

¹ Using segments for data placement is more flexible than using shards, as the number of shards is static, while the number of segments grows as the volume of the collection increases.

Segment: Entities from each shard are organized into segments. A segment can be in either a growing or a sealed state. Sealed segments are read-only, while growing segments can accept new entities. A growing segment switches to the sealed state when it reaches a predefined size (512MB by default) or if a period of time has passed without an insertion (e.g., 10 seconds). As some segments may be small (e.g., when insertions have a low arrival rate), Manu merges small segments into larger ones for search efficiency.

Figure 2: The architecture of Manu, with four layers: the access layer (proxies), coordinator layer (root, data, query, and index coordinators), worker layer (query, data, and index nodes), and storage layer (KV storage and object storage).

3.2 System Architecture

Manu adopts a service-oriented design [65] to achieve fine-grained decoupling among the system components. As shown in Figure 2, from top to bottom, Manu has four layers, i.e., the access layer, coordinator layer, worker layer, and storage layer.

Access layer consists of stateless proxies that serve as the user endpoints. They work in parallel to receive requests from clients, distribute the requests to the corresponding processing components, and aggregate partial search results before returning them to clients. Furthermore, the proxies cache a copy of the metadata for verifying the legitimacy of search requests (e.g., whether the collection to search exists). Search request verification is lightweight, and moving it to the proxies has two key benefits. First, requests that fail verification are rejected early, thus lowering the load on other system components. Second, it reduces the number of routing hops for requests, thus shortening request processing latency.

Coordinator layer manages system status, maintains metadata of the collections, and coordinates the system components for processing tasks. There are four coordinators, each responsible for different tasks. The Root coordinator handles data definition requests, such as creating/deleting collections, and maintains meta-information about the collections. The Data coordinator records detailed information about the collections (e.g., the routes of the segments on storage), and coordinates the data nodes to transform data update requests into binlogs [4]. The Query coordinator manages the status of the query nodes, and adjusts the assignment of segments (along with their related indexes) to query nodes for load balancing. The Index coordinator maintains meta-information about the indexes (e.g., index types and storage routes), and coordinates the index nodes in index building tasks. A coordinator can have multiple instances (e.g., one main and two backups) for reliability. As vector databases usually do not have the cross-table operations that relational databases have, different collections can be served by separate coordinator instances for throughput.

Worker layer conducts the actual computation tasks. The worker nodes are stateless—they fetch read-only copies of data to conduct tasks and do not need to coordinate with each other. This ensures that computation-intensive (and thus expensive) worker nodes can be easily scaled on demand. We use different worker nodes for different tasks, i.e., query nodes for query processing, index nodes for index building, and data nodes for log archiving. Because the workloads of different computation tasks vary significantly over time and across applications, each worker type can scale independently. This design also achieves resource isolation, as different computation tasks have different QoS requirements.

Storage layer persists system status, metadata, the collections, and their associated indexes. Manu uses etcd [7] (a key-value store) to host system status and metadata for the coordinators, as etcd provides high availability with its leader election mechanism for failure recovery. When metadata is updated, the updated data is first written to etcd and then synchronized to the coordinators. Since the volume of the other data (e.g., binlog, data, index) is large, Manu uses AWS S3 [14] (an object storage service) for persistence due to its high availability and low cost. The APIs of many other object storage systems are compatible with AWS S3, which allows Manu to easily swap storage engines if necessary. At present, storage engines including AWS S3, MinIO [8], and the Linux file system are supported. Note that the high latency that comes with object storage is not a performance bottleneck, as the worker nodes conduct computation tasks on in-memory, read-only copies of data.

Figure 3: Overview of Manu’s log system: loggers publish to the WAL (stream), data nodes convert it into the binlog (batch), and query, index, data, and other analytic nodes act as log subscribers.

3.3 The Log Backbone

The log system is the backbone of Manu, which connects the decoupled system components. As shown in Figure 3, Manu exposes the write-ahead log (WAL) and the binlog as backbone services. The WAL is the incremental part of the system log, while the binlog is the base part; they complement each other in delay, capacity, and cost. Loggers are the entry points for publishing data onto the WAL. Data nodes subscribe to the WAL and convert the row-based WAL into column-based binlogs. All read-only components, such as index nodes and query nodes, are independent subscribers to the log services and thereby keep themselves up to date. This architecture completely decouples the write and read components, thus allowing the components (e.g., WAL, binlog, data nodes, index nodes, and query nodes) to scale independently.

Manu records to the log all the requests that change system state, including data definition requests (e.g., create/delete a collection), data manipulation requests (e.g., insert/delete a vector), and system coordination messages (e.g., load/dump a collection to/from memory). Note that vector search requests are not written to the log, as they are read-only operations and do not change system state. We use logical logs instead of physical logs, as logical logs focus on event recording rather than on describing the modifications to physical data pages. This allows the subscribers to consume the log data in different ways depending on their functions.
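The idea that one logical log feeds very different consumers can be reduced to a small sketch. This is an illustrative in-memory model, not Manu's actual interfaces (all names here are hypothetical): loggers append state-changing requests to an append-only log, and two independent subscribers replay the same entries for different purposes — one regroups rows into columns (as a data node does for the binlog), the other maintains a searchable view of live entities (as a query node does).

```python
from collections import defaultdict

class Log:
    """Minimal stand-in for a durable, subscribable log (a WAL channel)."""

    def __init__(self):
        self.entries = []  # append-only: (lsn, kind, payload)

    def append(self, lsn, kind, payload):
        self.entries.append((lsn, kind, payload))

    def read_from(self, offset):
        return self.entries[offset:]

def archive_as_columns(log, offset=0):
    """Data-node-like subscriber: regroup row entries by field (binlog)."""
    columns = defaultdict(list)
    for _lsn, kind, payload in log.read_from(offset):
        if kind == "insert":
            for field, value in payload.items():
                columns[field].append(value)
    return columns

def apply_to_view(log, offset=0):
    """Query-node-like subscriber: keep a view of the live entity IDs."""
    live = set()
    for _lsn, kind, payload in log.read_from(offset):
        if kind == "insert":
            live.add(payload["id"])
        elif kind == "delete":
            live.discard(payload["id"])
    return live

wal = Log()
wal.append(1, "insert", {"id": 7, "vec": [0.1, 0.9]})
wal.append(2, "insert", {"id": 8, "vec": [0.4, 0.2]})
wal.append(3, "delete", {"id": 7})

assert apply_to_view(wal) == {8}
assert archive_as_columns(wal)["id"] == [7, 8]
```

Because each subscriber only tracks its own read offset, subscribers can be added, removed, or scaled without the writers knowing about them — which is the decoupling property the log backbone is after.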
Figure 4 illustrates the detailed architecture of the log system. For the sake of clarity, we only illustrate the parts related to insert requests. The loggers are organized in a hash ring, and each logger handles one or more logical buckets in the hash ring based on consistent hashing. Each shard corresponds to a logical bucket in the hash ring and a WAL channel. Each entity in an insert request is hashed to a shard (and thus a channel) based on its ID. When a logger receives a request, it first verifies the legality of the request, assigns an LSN to the logged entity by consulting the central time service oracle (TSO), determines the segment the entity should go to, and writes the entity to the WAL. The logger also writes the mapping of the new entity ID to segment ID into a local LSM tree and periodically flushes the incremental part of the LSM tree to object storage, which keeps the entity-to-segment mapping in the SSTable format of RocksDB. Each logger caches the segment mapping (e.g., for checking whether an entity to delete exists) for the shards it manages by consulting the SSTables in object storage.

Figure 4: Detailed structure of Manu’s log system (proxies, loggers on a hash ring, the TSO, WAL channels in a cloud-native event-stream engine, and data nodes writing binlogs and SSTables to cloud-native object storage).

Table 1: Major indexes in Manu

  Vector Quantization:  PQ, OPQ, RQ, SQ
  Inverted Index:       IVF-Flat, IVF-PQ, IVF-SQ, IVF-HNSW, IMI
  Proximity Graph:      HNSW, NSG, NGT
  Numerical Attribute:  B-Tree, Sorted List

The WAL is row-based and read in a streaming manner for low delay and fine-grained log pub/sub. It is implemented via a cloud-based message queue such as Kafka or Pulsar. We use multiple logical channels for the WAL in order to prevent different types of requests from interfering with each other, thus achieving high throughput. Data definition requests and system coordination messages use their own channels, while data manipulation requests are hashed across multiple channels to increase throughput.

Data nodes subscribe to the WAL and convert the row-based WAL into column-based binlogs. Specifically, values of the same field (e.g., attribute or vector) from the WAL are stored together in column format in binlog files. The column-based nature of the binlog makes it suitable for reading per-field values in batches, thus increasing storage and IO efficiency. An example of this efficiency comes with the index nodes: index nodes only read the required fields (e.g., attribute or vector) from the binlog for index building and are thus free from read amplification.

System coordination: Inter-component messages are also passed via the log; e.g., data nodes announce when segments are written to storage, and index nodes announce when indexes have been built. This is because the log system provides a simple and reliable mechanism for broadcasting system events. Moreover, the time semantics of the log system provide a deterministic order for coordination messages. For example, when a collection should be released from memory, the query coordinator publishes the request to the log, and does not need to confirm whether the query nodes receive the message or to handle query node failure. The query nodes independently subscribe to the log and asynchronously release the segments of the collection.

3.4 Tunable Consistency

We adopt a delta consistency model to enable flexible performance-consistency trade-offs, which guarantees a bounded staleness of the data seen by search queries. Specifically, the data seen by a query can be stale for up to delta time units with respect to the time of the last data update, where delta is a user-specified “staleness tolerance” given in virtual time.

In practice, users prefer to define the temporal tolerance in physical time, e.g., 10 seconds. Manu achieves this by making the LSN assigned to each request extremely close to physical time. Manu uses a hybrid logical clock in the TSO to generate timestamps. Each timestamp has two components: a physical component that tracks physical time, and a logical component that tracks event order. The logical component is needed since multiple events may happen in the same physical time unit. Since a timestamp is used as a request’s LSN, the value of the physical component indicates the physical time when the request was received by Manu.

For a log subscriber, e.g., a query node, to run the delta consistency model, it needs to know three things: (1) the user-specified staleness tolerance 𝜏, (2) the time of the last data update, and (3) the issue time of the search request. To let each log subscriber know (2), we introduce a time-tick mechanism. Special control messages called time-ticks (similar to watermarks in Apache Flink [26]) are periodically inserted into each log channel (for example, each WAL channel), signaling the progress of data synchronization. Denote the latest time-tick a subscriber has consumed as 𝐿𝑠 and the issue time of a query as 𝐿𝑟 ; if 𝐿𝑟 − 𝐿𝑠 < 𝜏 is not satisfied, the query node waits for the next time-tick before executing the query.

Note that strong consistency and eventual consistency are two special cases of delta consistency, where delta equals 0 and infinity, respectively. To the best of our knowledge, our work is the first to support delta consistency in a vector database.

3.5 Index Building

Searching similar vectors in large collections by brute force, i.e., scanning the whole dataset, usually yields unacceptably long delays. Numerous indexes have been proposed to accelerate vector search, and Manu automatically builds user-specified indexes. Table 1 summarizes the indexes currently supported by Manu, and we are continuously adding new indexes following the latest indexing algorithms. These indexes differ in their properties and use
cases. Vector quantization (VQ) [34, 45] methods compress vectors to reduce the memory footprint and the cost of vector distance/similarity computation. For example, scalar quantization (SQ) [91] maps each dimension of a vector (data types are typically int32 and float) to a single byte. Inverted indexes [69] group vectors into clusters and only scan the most promising clusters for a query. Proximity graphs [33, 42, 61] connect similar vectors to form a graph, and achieve high accuracy and low latency at the cost of high memory consumption [54]. Besides vector indexes, Manu also supports indexes on the attribute fields of entities to accelerate attribute-based filtering.

There are two index building scenarios in Manu, i.e., batch indexing and stream indexing. Batch indexing occurs when the user builds an index for an entire collection (e.g., when all vectors are updated with a new embedding model). In this case, the index coordinator obtains the paths of all segments in the collection from the data coordinator and instructs the index nodes to build an index for each segment. Stream indexing happens when users continuously insert new entities, and indexes are built asynchronously on the fly without stopping search services. Specifically, after a segment accumulates a sufficient number of vectors, its resident data node seals the segment and writes it to object storage as a binlog. The data coordinator then notifies the index coordinator, which instructs an index node to build an index for the segment. The index node loads only the required column (e.g., vector or attribute) of the segment from object storage for index building to avoid read amplification. For entity deletions, Manu uses a bitmap to record the deleted vectors and rebuilds the index for a segment when a sufficient number of its entities have been deleted. In both batch and stream indexing scenarios, after the required index is built for a segment, the index node persists it in object storage and sends its path to the index coordinator, which notifies the query coordinator so that query nodes can load the index for processing queries. The index coordinator also monitors the status of the index nodes and shuts down idle index nodes to save costs. As vector indexes generally have sub-linear search complexity w.r.t. the number of vectors, and searching one large segment is cheaper than searching several small ones, Manu builds joint indexes on multiple segments when appropriate.

3.6 Vector Search
Manu supports classical vector search, attribute filtering, and multi-vector search. For classical vector search, the distance/similarity function can be Euclidean distance, inner product, or angular distance. Attribute filtering is useful when searching for vectors similar to the query subject to some attribute constraints. For example, an e-commerce platform may want to find products that interest the customer and cost less than $100. Manu supports three strategies for attribute filtering and uses a cost-based model to choose the most suitable strategy for each segment. Multi-vector search is required when an entity is encoded by multiple vectors; for example, a product can be described by both embeddings of its image and embeddings of its text description. In this case, the similarity function between entities is defined as a composition of the similarity functions on the constituting vectors, and Manu supports two strategies for multi-vector search, choosing the one to use according to the entity similarity function. For more details about how Manu handles attribute filtering and multi-vector search, interested readers can refer to Milvus [81].

For vector search, Manu partitions a collection into segments and distributes the segments among query nodes for parallel execution.² The proxies cache a copy of the distribution of segments on query nodes by inquiring the query coordinator, and dispatch search requests to the query nodes that hold segments of the searched collection. The query nodes perform vector searches on their local segments without coordination using a two-phase reduce procedure. For a top-𝑘 vector search request, the query nodes search their local segments to obtain the segment-wise top-𝑘 results. These results are merged by each query node to form the node-wise top-𝑘 results. Then, the node-wise top-𝑘 results are aggregated by the proxy into the global top-𝑘 results and returned to the application. To handle the deletion of vectors, the query nodes use a bitmap to record the deleted vectors in each segment and filter the deleted vectors from the segment-wise search results. Users can configure Manu to batch search requests to improve efficiency. In this case, the proxies cache search requests if the results of the previous batches have not been returned yet. In the cache, requests of the same type (i.e., targeting the same collection and using the same similarity function) are organized into one batch and handled by Manu together. Manu also allows maintaining multiple hot replicas of a collection to serve queries for availability and throughput.

Query nodes obtain data from three sources, i.e., the WAL, the index files, and the binlog. For data in the growing segments, query nodes subscribe to the WAL and conduct searches using brute-force scan so that updates can be searched within a short delay. A dilemma for segment size is that a larger size yields better search efficiency once the index is built, but brute-force scan on a growing segment is also more costly. To tackle this problem, we divide each segment into slices (each containing 10,000 vectors by default). New data are inserted into the slices sequentially, and after a slice is full, a lightweight temporary index (e.g., IVF-FLAT) is built for it. Empirically, we observed that the temporary index brings up to 10x speedup for searching growing segments. When a segment changes from the growing state to the sealed state, its index will be built by an index node and then stored in object storage. After that, the query nodes are notified to load the index and replace the temporary index.

Query nodes access the binlog for data when the distribution of segments among the query nodes changes, which may happen during scaling, load balancing, and query node failure and recovery. Specifically, the query coordinator manages the segment distribution and monitors the query nodes for liveness and workload to coordinate failure recovery and scaling. On failure recovery, the segments and their corresponding indexes (if they exist) handled by the failed query nodes are loaded to the healthy ones.³ In the case of scaling down, a query node can be removed once other query nodes load the indexes for the segments it handles from the object storage. When scaling up, the query coordinator assigns some of

² Manu loads all data to the query nodes as different queries may access different parts of the data, and a hot compute-side cache is necessary for low latency. This is different from general cloud DBMSs that decouple compute and storage (e.g., Snowflake [29]), which only fetch the required data to the compute side upon request.
³ The WAL channels subscribed to by failed query nodes are also assigned to healthy ones.
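The two-phase reduce procedure described above can be sketched as follows. This is a minimal single-process simulation with brute-force distance computation standing in for the per-segment index scan; the data layout and function names are illustrative, not Manu's internals:

```python
import heapq

def segment_topk(segment, query, k):
    """Per-segment top-k: brute-force scan stands in for an index scan."""
    dists = ((sum((a - b) ** 2 for a, b in zip(vec, query)), vid)
             for vid, vec in segment)
    return heapq.nsmallest(k, dists)

def node_topk(segments, query, k):
    """Phase 1 (on each query node): merge segment-wise top-k lists."""
    return heapq.nsmallest(
        k, (r for s in segments for r in segment_topk(s, query, k)))

def global_topk(nodes, query, k):
    """Phase 2 (on the proxy): aggregate node-wise top-k into the global top-k."""
    return [vid for _, vid in heapq.nsmallest(
        k, (r for n in nodes for r in node_topk(n, query, k)))]

# Two query nodes, each holding segments of (id, vector) pairs.
nodes = [
    [[(0, (0.0, 0.0)), (1, (1.0, 1.0))], [(2, (2.0, 2.0))]],
    [[(3, (0.1, 0.1)), (4, (5.0, 5.0))], [(5, (0.2, 0.2))]],
]
print(global_topk(nodes, (0.0, 0.0), k=3))  # [0, 3, 5]
```

Because each merge step only keeps 𝑘 candidates, the proxy aggregates at most 𝑘 results per node regardless of how many segments each node holds.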
Table 2: Main commands of the Python-based PyManu API
Methods Description
Collection(name, schema) Create a collection with name name and schema schema
Collection.insert(vec) Insert vector vec into the collection
Collection.delete(expr) Delete vectors satisfying boolean expression expr from the collection
Collection.create_index(field, params) Create an index on field field of the vectors using parameters params
Collection.search(vec, params) Vector search for vec with parameters params
Collection.query(vec, params, expr) Vector search for vec with boolean expression expr as filter
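The ORM surface in Table 2 can be illustrated with a toy in-memory stand-in. The class below mimics insert/delete/search on a collection of (id, vector) pairs; it is a sketch for illustration only, not the real PyManu client (in particular, real PyManu takes a boolean expression string for delete, while the toy takes a predicate):

```python
import heapq

class Collection:
    """Toy stand-in for the Table 2 ORM surface (not the real client)."""
    def __init__(self, name, schema):
        self.name, self.schema, self._rows = name, schema, {}

    def insert(self, vec):
        vid = len(self._rows)       # assign a fresh identifier
        self._rows[vid] = vec
        return vid

    def delete(self, expr):
        # expr is a predicate here; real PyManu takes an expression string.
        for vid in [v for v, vec in self._rows.items() if expr(vec)]:
            del self._rows[vid]

    def search(self, vec, params):
        # Euclidean top-k, mirroring {"metric_type": "Euclidean"} and "limit".
        k = params["limit"]
        d = ((sum((a - b) ** 2 for a, b in zip(v, vec)), vid)
             for vid, v in self._rows.items())
        return [vid for _, vid in heapq.nsmallest(k, d)]

c = Collection("products", schema={"vector": 2})
for v in [(0.0, 0.0), (1.0, 1.0), (0.2, 0.1), (5.0, 5.0)]:
    c.insert(v)
c.delete(lambda vec: vec[0] > 4)           # drop the outlier
print(c.search((0.0, 0.0), {"limit": 2}))  # [0, 2]
```

The returned identifiers are what an application would use to fetch the corresponding objects from other systems, as described in Section 4.2.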
the segments to the newly added nodes. A new query node can join after it loads the assigned segments, and the existing query nodes can release the segments no longer handled by them. The query coordinator also balances the workloads (and memory consumption) of the query nodes by migrating segments. Note that Manu does not ensure that segment redistribution is atomic, and a segment can reside on more than one query node. This does not affect correctness, as the proxies remove duplicate result vectors for a query.

4 FEATURE HIGHLIGHTS
In this part, we introduce several key features of Manu for usability and performance.

4.1 Cloud Native and Adaptive
The primary design goal of Manu is to be a cloud native vector database such that it fits well into cloud-based data pipelines. To this end, Manu decouples system functionalities into storage, coordinators, and workers in the overall design. For storage, Manu uses a transactional KV for metadata, message queues for logs, and an object KV for data, which are all general storage services provided by major cloud vendors and thus enable easy deployment. For the coordinators that manage system functionalities, Manu uses the standard one-main-plus-two-hot-backups configuration for high availability. For workers, Manu decouples vector search, log archiving, and index building tasks for component-wise scaling, a model suitable for cloud-based on-demand resource provisioning. The log backbone allows the system components to interact by writing/reading logs in their own ways. This enables the system components to evolve independently and makes it easy to add new components. The log backbone also provides consistent time semantics in the system, which are crucial for deterministic execution and failure recovery.

Our customers use vector databases in the entire life-cycle of their applications. For example, an application usually starts with data scientists conducting a proof of concept (PoC) on their personal computers. Then, it is migrated to dedicated clusters for testing and finally deployed on the cloud. Thus, to reduce migration costs, our customers expect vector databases to adapt to different deployment scenarios while providing a consistent set of APIs. To this end, Manu defines a unified interface for the system components but provides different invocation methods and implementations for different platforms. For example, on the cloud, a local cluster, and a personal computer, Manu uses cloud service APIs, remote procedure calls (RPC), and direct function calls to invoke system functionalities, respectively. The object KV can be a local store such as MinIO [8] on personal computers, and S3 on AWS. Thus, Manu applications can migrate with little or no change across different deployment scenarios.

4.2 Good Usability
Data pipelines interact with Manu in simple ways: vector collections, updates for vector data, and search requests are fed to Manu, and Manu returns the identifiers of the search results for each search request, which can be used to retrieve objects (e.g., images, advertisements, movies) in other systems. Because different users adopt different programming languages and development environments, Manu provides APIs in popular languages including Python, Java, Go, and C++, along with RESTful APIs. As an example, we show the key commands of the Python-based PyManu API in Table 2, which uses the object-relational mapping (ORM) model; most commands are related to the collection class. As shown in Table 2, PyManu allows users to manage collections and indexes, update collections, and conduct vector searches. The search command is used for similarity-based vector search, while the query command is mainly used for attribute filtering. We show an example of conducting a top-𝑘 vector search by specifying the parameters in params as follows.

query_param = {
    "vec": [[0.6, 0.3, ..., 0.8]],
    "field": "vector",
    "param": {"metric_type": "Euclidean"},
    "limit": 2,
    "expr": "product_count > 0",
}
res = collection.search(**query_param)

In the above example, the search request provides a high dimensional vector [0.6, 0.3, ..., 0.8] as the query and searches the feature vector field of the collection. The similarity function is Euclidean distance and the targets are the top-2 similar vectors in the collection (i.e., with limit = 2).

For easy system management, Manu provides a GUI tool called Attu, for which a screenshot is shown in Figure 5. In the system view, users can observe the overall system status, including queries processed per second (QPS), average query latency, and memory consumption, at the top of the screen. By clicking a specific service (e.g., the data service), users can view detailed information of the worker nodes for the service on the side. We also allow users to add and drop worker nodes with mouse clicks. In the collection view, users can check the collections in the system, load/dump collections to/from memory, delete/import collections, check the indexes built for the collections, and build new indexes. In the vector search view, users can check the search traffic and performance on each collection, configure the
Figure 5: A screenshot of Attu, the GUI tool of Manu.

Figure 6: Manu and Milvus for mixed workloads; the numbers behind the legends (e.g., 1k) indicate the insertion rate. (Axes: latency in seconds vs. time in seconds.)
index and search parameters to use for each collection. The vector SSD is 100x cheaper than dram and offers 10x larger bandwidth
search view also allows to issue queries for functionality test. than HDD. thus, Manu supports using SSD to store large vector
For vector search, using different parameters for the indexes collections on cheap query nodes with limited dram capacity. the
(e.g., neighbor size 𝑀 and queue size 𝐿 for HNSW [61]) yields dif- challenge is that SSD bandwidth is still much smaller than dram,
ferent trade-offs among cost, accuracy, and performance. However, which may lead to low query processing throughput and thus ne-
even experts find it difficult to set proper index parameters as the cessitates careful designs for storage layout and index structure. as
parameters are interdependent and their influences vary across SSD reads are conducted with 4kb blocks (i.e., reading less than
collections. Manu adopts a Bayesian Optimization with Hyperband 4kb has the same cost as reading 4kb), Manu organizes the vectors
(BOHB) [32] method to automatically explore good index param- into buckets whose sizes are close to but smaller than 4kb. 4 this is
eter configurations. Users provide a utility function to score the achieved by conducting hierarchical k-means for the vectors and
configurations (e.g., according to search recall, query throughput) controlling the sizes of the clusters. each bucket is stored on 4kb
and set a budget to limit the costs of parameter search. BOHB starts aligned blocks on SSD for efficient read and represented by its k-
with a group of initial configurations and evaluates their utilities. means center in dram. these centers are organized using existing
Then, Bayesian Optimization is used to generate new candidate vector search indexes (e.g., ivf-flat, hnsw).
configurations according to historical trials and Hyperband is used vector search with SSD is conducted in two stages. first, we
to allocate budgets to different areas in the configuration space. search the cluster centers in dram for the ones that are most similar
The idea is to prioritize the exploration of areas close to high util- to the query. then, the corresponding buckets are loaded from
ity configurations to find even better configurations. Manu also SSD for scan. to reduce the amount of data fetched from SSD, we
supports sampling a subset of the collection for the trails to reduce compress the vectors using scalar quantization, which has negligible
search costs. We are still improving the automatic parameter search influence on the quality of search results according to our trials.
module and plan to extend it to searching system configurations another problem is that k-means can put vectors similar to a query
(e.g., the number and type of query nodes). into several buckets but the centers of some buckets may not be
similar to the query, which leads to a low recall. to tackle this
4.3 Time Travel problem, Manu uses a strategy similar to multiple hash tables in
Users often need to rollback the database to fix corrupted data or locality sensitive hashing [41]. hierarchical k-means is conducted
code bugs. Manu allows users to specify a target physical time 𝑇 by multiple times, each time assigning a vector to a bucket. this
for database restore, and jointly uses checkpoint and log replay for means that a vector is replicated multiple times in SSD and we index
rollback. We mark each segment with its progress 𝐿 and periodi- all cluster centers for bucket search in dram. Manu’s SSD solution
cally checkpoints the segment map for a collection, which contains wins track 2 (search with SSD) of the billion-scale approximate
information (such a route, rather than data) of all its segments. To nearest neighbor search challenge at neurips’2021 [3]. tests results
restore the database at time 𝑇 , we read the closest checkpoint before show that Manu’s solution improves the recall of the competition
𝑇 , load all segments in the segment map and replay the WAL log for baseline by up to 60% at the same query processing throughput. 5
each segment from its local progress 𝐿. This design reduces storage we notice that another work adopts similar designs for SSD-based
consumption as we do not write entire collection for each check- vector search [27].
point. Instead, segments that have no changes are shared among
checkpoints. The replay overhead is also reduced as each segment 5 USE CASES AND EVALUATION
has its own progress. Users can also specify a expiration period to Before introducing the use cases of Manu, we first compare Manu
delete outdated log and segments to reduce storage consumption. with Milvus, our previous vector database. Milvus adopts an even-
tual consistency model and thus does not support the tunable con-
4.4 Hardware Optimizations sistency of Manu. To show the advantages brought by Manu’s
Manu comes with extensively optimized implementations for CPU, 4 we set the bucket size to a few times (e.g., 4 and 8) of 4kb if the size of an individual
GPU and SSD for efficiency. For more details about our CPU and vector is large.
GPU optimizations, interested readers can refer to Milvus [81]. 5 for more details about results please refer to [72].
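The two-stage SSD search described in Section 4.4 can be sketched as follows. This is a minimal in-memory simulation: the bucket layout and toy data are illustrative assumptions, and a brute-force scan of the centers stands in for the IVF-FLAT/HNSW center index kept in DRAM:

```python
import heapq
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ssd_search(query, centers, buckets, nprobe, k):
    """Two-stage search: (1) scan bucket centers kept in DRAM,
    (2) 'load' the nprobe most promising buckets (on SSD in Manu)
    and scan their vectors for the global top-k."""
    # Stage 1: find the nprobe centers most similar to the query.
    probe = heapq.nsmallest(nprobe, range(len(centers)),
                            key=lambda i: l2(query, centers[i]))
    # Stage 2: scan the vectors of the selected buckets.
    candidates = [(l2(query, vec), vid)
                  for b in probe for vid, vec in buckets[b]]
    return [vid for _, vid in heapq.nsmallest(k, candidates)]

# Toy data: 8 buckets of 2-d vectors; each bucket is represented by its mean.
random.seed(0)
buckets = [[(b * 100 + i, [random.gauss(b, 0.1), random.gauss(-b, 0.1)])
            for i in range(16)] for b in range(8)]
centers = [[sum(v[0] for _, v in bkt) / len(bkt),
            sum(v[1] for _, v in bkt) / len(bkt)] for bkt in buckets]
print(ssd_search([3.0, -3.0], centers, buckets, nprobe=2, k=5))
```

In Manu, each bucket additionally fits within a 4KB-aligned SSD block and its vectors are scalar-quantized, so stage 2 costs roughly nprobe block reads.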
fine-grained functionality decomposition, we create a mixed workload. Specifically, we start with an empty collection, insert vectors at a fixed rate (e.g., 2k vectors per second), and measure the latency of search requests over time. Both Manu and Milvus use 6 nodes and are properly configured for good performance. The results in Figure 6 show that the search latency of Milvus is significantly longer than that of Manu, especially when insertion rates are high (e.g., at 3k and 4k). Milvus has multiple read nodes but only one write node to ensure eventual consistency. The write node is responsible for data insertion and index construction, and thus write tasks and index building tasks contend for resources. As a result, the index building latency is long and brute-force search is used for a large amount of data. In contrast, with dedicated index nodes, Manu finishes index building quickly and thus its search latency remains low over the entire period.

5.1 Overview of Use Cases
We classify our customers into 5 application domains in Figure 7 and briefly elaborate on them as follows.

Figure 7: The use cases of Manu.

Recommendation: Platforms for e-commerce [79], music [77], news [55], video [28], and social networks [44] record user-content interactions, and use the data to map users and contents to embedding vectors with techniques such as ALS [75] and deep learning [50]. Finding contents of interest for a user is conducted by searching for content vectors having large similarity scores (typically inner product) with the user vector.

Multimedia: Multimedia contents (e.g., images, video, and audio) are becoming increasingly popular, and searches for multimedia contents from large corpora are common online. The general practice is to embed both the user query and corpus contents into vectors using tools such as CNNs [49] and RNNs [87]. Searching multimedia contents is conducted by finding vectors similar to the user query.

Language: Automatic question answering and machine-based dialogue have attracted much attention recently with products such as Siri [15] and Xiaoice [21], and search for text contents is a general need. With models such as Word2Vec [63] and BERT [31], language sequences are embedded into vectors such that retrieving language contents boils down to finding content vectors that are similar to the user query.

Security: Blocking spam and scanning viruses are important for security. The common practice is to map spam and viruses into vectors using hashing [59] or tailored algorithms [39]. After that, suspicious spam and viruses can be checked by finding the most similar candidates in the corpus for further inspection.

Medicine: Many medical applications search for certain chemical structures and gene sequences for drug discovery or health risk identification. With tools such as GNNs [68] and LSTMs [84], chemical structures and gene sequences can be embedded into vectors, and their search tasks are cast into vector search.

Full-fledged vector databases are necessary for the foregoing domains as they require much more complex functionality support in addition to vector search. Specifically, as the vector datasets are large and applications have high requirements for throughput, they need distributed computing with multiple nodes for scalability. The vectors are also continuously updated when new users/contents come, user behavior changes, or the embedding model is updated. Since most of these applications serve end users, they require high availability and durability. Some of our customers have deployed Manu in their production environments, and they found Manu satisfactory in terms of usability, performance, elasticity, and adaptability. In what follows, we simulate some typical application scenarios of our customers to demonstrate the advantages of Manu.

5.2 Example Use Cases
Due to business security, the names of the customers are kept anonymous. For the experiments, we use two datasets widely used in vector search research, i.e., SIFT [5] (with 128-dim vectors) and DEEP [2] (with 96-dim vectors), and extract sub-datasets of the required sizes. By default, we use two query nodes, one data node, and one index node for Manu. Each worker node is an EC2 m5.4xlarge instance running on Amazon Linux AMI version 5.4.129. For indexes, we experiment with IVF-FLAT [46] and HNSW [61], which are widely used in practice. When comparing Manu with other systems, we always ensure that the systems use the same resources and are properly configured. Due to time and expense limits, we are only able to compare with some vector databases in a subset of the experiments. We search for the top-50 most similar vectors for each query, and ensure that the average search recall is above 0.8 if recall is not reported explicitly.

Figure 8: Recall vs. throughput comparison on SIFT10M (Euclidean) and DEEP10M (inner product).
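The recall metric used throughout this section, i.e., the fraction of the true top-𝑘 neighbors that an approximate search returns, can be computed as in the following sketch (the function name is illustrative):

```python
def recall_at_k(returned_ids, ground_truth_ids):
    """Recall@k: fraction of the true top-k neighbors that appear
    in the k ids returned by the (approximate) search."""
    k = len(ground_truth_ids)
    return len(set(returned_ids[:k]) & set(ground_truth_ids)) / k

# e.g., 4 of the true top-5 neighbors were found:
print(recall_at_k([1, 2, 3, 4, 99], [1, 2, 3, 4, 5]))  # 0.8
```

The recall values reported in Figures 8, 10, and 11 are averages of this quantity over the query set, with 𝑘 = 50.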
Figure 9: Search workload, query latency, and number of query nodes used by Manu over time. Different colors indicate different numbers of query nodes used (2, 4, 8, 16).

Figure 10: Scalability of Manu w.r.t. query nodes on SIFT10M (Euclidean) and DEEP10M (inner product).
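The scaling behavior in Figure 9 follows a simple latency-driven rule, with the thresholds taken from the e-commerce elasticity experiment described below; the function below is an illustrative sketch, not Manu's actual policy interface:

```python
def scale_decision(latency_ms, num_query_nodes):
    """Latency-driven scaling rule: halve the query nodes when search
    latency is below 100 ms, double them when it exceeds 150 ms."""
    if latency_ms > 150:
        return num_query_nodes * 2           # scale out
    if latency_ms < 100 and num_query_nodes > 1:
        return max(1, num_query_nodes // 2)  # scale in
    return num_query_nodes                   # keep the current size

print(scale_decision(160, 4))  # 8
print(scale_decision(80, 4))   # 2
print(scale_decision(120, 4))  # 4
```

Because query nodes load segments and indexes directly from object storage, such a decision can be enacted without data re-partitioning.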
E-commerce recommendation. Company A is a leading online shopping platform in China that mainly sells clothing and makeup. They use Manu for recommendation: products are recommended to a user according to their similarity scores with the user embedding vector. They have three main requirements for a vector database: (1) high throughput, as they need to handle the requests of many concurrent customers; (2) high-quality search results for good recommendation effect; (3) good elasticity for low costs, as their search requests fluctuate heavily over time (peaking in the evening, very low at midnight, and very high during promotion events).

In Figure 8, we compare the recall-throughput performance of Manu with Elasticsearch (ES for short) [6], Vearch [52], Vald [18], and Vespa [19], four popular open-source vector search systems, when using a single node. Note that the ES we use is the latest 8.0 version with tailored support for vector search instead of the ES plugin. We use Euclidean distance for SIFT and inner product for DEEP to test different similarity functions. Datasets with 10 million (10M) vectors are used, as ES takes too much time to build indexes for larger datasets. As Vald only supports the NGT index [10] and Vespa only supports the HNSW index [61] (both are efficient proximity graphs), we have only a single curve for them in each plot. The results show that Manu consistently outperforms the baselines across different datasets and similarity functions. ES and Vearch achieve significantly lower query processing throughput than Manu at the same recall. This is because ES is a disk-based solution, and Vearch's three-layer aggregation procedure (searcher-broker-blender) for search results introduces high overhead. The performance of Vald and Vespa is much better than that of ES and Vearch, but still inferior to Manu. We conjecture this is because Manu has better implementations, with optimizations for CPU cache and SIMD.

To test the elasticity of Manu, we use the search traffic of an e-commerce platform over a one-day period [17], which is plotted as the purple curve in Figure 9. The results show that the search workload fluctuates violently over time, and the peak is much higher than the valley. We use SIFT100M as the dataset and Euclidean distance as the similarity function. Manu is configured to reduce the query nodes by 0.5x when search latency is shorter than 100ms and to increase the query nodes to 2x when search latency is over 150ms. The colors in Figure 9 indicate the number of query nodes used by Manu, which shows that Manu has good elasticity to adapt to the query workload. The black line reports the search latency and shows that Manu can keep the search latency within the target range via scaling.

Figure 11: Scalability of Manu w.r.t. data volume on SIFT (Euclidean) and DEEP (inner product).

Video deduplication. Company B is a video sharing website in Europe, on which users can upload videos and share them with others. They found that many duplicate videos resulted in high management costs, and thus conduct deduplication before archiving the videos. They model a video as a set of its critical frames and encode each frame into a vector. They use vector search to find videos in the corpus that are most similar to a new video and conduct further checking on the shortlisted videos to determine if the new video is a duplicate. They also use vector search to find videos similar to those viewed by users for recommendation. They require the vector DBMS to have good scalability with respect to both data volume and computing resources, as their corpus grows quickly.

In Figure 10 and Figure 11, we test the scalability of Manu when changing the number of query nodes and the size of the dataset, respectively. The results show that query processing throughput scales almost linearly with the number of query nodes and with the reciprocal of the dataset size. The observation is consistent for different datasets, indexes, and similarity functions. This is because Manu uses segments to distribute search tasks among query nodes. With the segment size fixed, each query node handles more segments when the data volume increases, and fewer segments when the number of query nodes increases. Note that better scalability w.r.t. data volume can be achieved by configuring Manu to use larger segments when the dataset size increases. This is because similarity search indexes usually have sub-linear complexity w.r.t. dataset size.

Virus scan. Company C is a world-leading software security service provider, and one of its main services is scanning viruses for smartphones. They have a virus base that continuously collects new
Figure 12: The relation between search latency and grace time on SIFT10M (Euclidean) and DEEP10M (inner product); the legends (200ms, 400ms, 800ms) stand for the time-tick interval.

Figure 13: Index construction time of Manu vs. data volume on SIFT (Euclidean) and DEEP (inner product), for HNSW and IVF_FLAT.

viruses and develops specialized algorithms to map viruses and user APKs to vector embeddings. To conduct a virus scan, they find viruses in their base that have embeddings similar to the query APK, and then compare the search results with the APK in more detail. They have two requirements for the vector DBMS: (1) a short delay for streaming updates, as new viruses (vectors) are continuously added to their virus base and vector search needs to observe the latest viruses with a short delay; (2) fast index building, as they frequently adjust their embedding algorithm to fix problems, which leads to updates of the entire dataset and requires rebuilding the index.

In Figure 12, we show the average delay of search requests for Manu. Recall that the grace time (i.e., 𝜏) means that a search request must observe updates that happen time 𝜏 before it, and it is configurable by users. The legends correspond to different time-tick intervals, with which the loggers write time-ticks to the WAL. The results show that search latency decreases quickly with grace time, and a shorter time-tick interval results in a shorter search latency. This is because with a longer grace time, search requests can tolerate a longer update delay and are less likely to wait for updates. When the time-tick interval reduces, each segment can confirm more quickly that all updates have been received, and thus the search requests wait for a shorter time. In Figure 13, we report the index building time of Manu when changing the data volume. The results show that the index building time scales linearly with the data volume. This is because Manu builds an index for each segment, and a larger data volume leads to more segments.

Vector search algorithms. Existing vector search algorithms can be classified into four categories, i.e., space partitioning tree (SPT), locality sensitive hashing (LSH), vector quantization (VQ), and proximity graph (PG). SPT algorithms divide the space into areas and use tree structures to quickly narrow down the search results to some areas [30, 58, 64, 71, 80]. LSH algorithms design hash functions such that similar vectors are hashed to the same bucket with high probability, and examples include [35, 36, 41, 43, 53, 56, 57, 60, 70, 90]. VQ algorithms compress vectors and accelerate similarity computation by quantizing the vectors using a set of vector codebooks, and well-known VQ algorithms include [23, 34, 42, 45, 91]. PG algorithms form a graph by connecting a vector with those most similar to it in the dataset, and conduct vector search by graph walk [33, 61, 73, 89]. Different algorithms have different trade-offs, e.g., LSH is cheap in index building but poor in result quality, VQ reduces memory and computation but also harms result quality, and PG has high efficiency but requires large memory. Manu supports a comprehensive set of search algorithms such that users can trade off between the different factors.

Vector databases. Vector data management solutions have gone through two stages of development. Solutions in the first stage are libraries (e.g., Facebook Faiss [46], Microsoft SPTAG [16], HNSWlib [61], and Annoy [1]) and plugins (e.g., ES plugins [6], Postgres plugins [12]) for vector search. They are insufficient for current applications, as full-fledged management functionalities are required, e.g., distributed execution for scalability, online data updates, and failure recovery. Two OLAP database systems, AnalyticDB-V [82] and PASE [85], support vector data by adding a table column to store them, but lack optimizations tailored for vector data. The second-stage solutions are full-fledged vector databases such as Vearch [52], Vespa [19], Weaviate [20], Vald [18], Qdrant [13], Pinecone [11], and our previous effort Milvus [81].⁶ Vearch uses Faiss as the underlying search engine and adopts a three-layer aggregation procedure to conduct distributed search. Similarly, Vespa distributes data over nodes for scalability. A modified version of the HNSW algorithm is used to support online updates for vector data, and Vespa also allows attribute filtering during search and learning-based inference on search results (e.g., for re-ranking). Weaviate adopts a GraphQL interface and allows storing objects (e.g., texts, images), properties, and vectors. Users can directly import vectors or customize embedding models to map objects to vectors, and Weaviate can retrieve objects based on vector search results. Vald supports horizontal scalability by partitioning a vector dataset into segments, and builds indexes without stopping search services. Qdrant is a single-machine vector search engine with extensive support for attribute filtering. It allows filtering with various data types and query conditions (e.g., string matching, numerical ranges, geo-locations), and uses a tailored optimizer to determine the filtering strategy. Note that Vespa, Weaviate, and Vald only support the proximity graph index. We can observe that these vector databases focus on different functionalities, e.g., learning-based inference, embedding genera-
tion, object retrieval, and attribute filtering. Thus, we treat evolv-
6 RELATED WORK ability as first class priority when design Manu such that new func-
tionalities can be easily introduced. Manu also differs from these
Vector search algorithms. Vector search algorithms have a long
research history, and most works focus on efficient approximate 6 Pinecone is offered as SaaS and closed source. Thus, we do not know its design
search on large-scale datasets. Existing algorithms can be roughly details.
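The grace-time rule evaluated with Figure 12 above amounts to a simple freshness check. The sketch below is an illustrative model only, not Manu's actual API: `segment_tick` stands for the latest time tick a segment has received from the WAL, and all names are hypothetical.

```python
def is_fresh(segment_tick, request_ts, grace_time):
    """A segment may serve a search issued at request_ts once it has
    confirmed, via time ticks written to the WAL, all updates up to
    request_ts - grace_time; otherwise the request must wait."""
    return segment_tick >= request_ts - grace_time

# A segment that has confirmed updates through tick 95:
assert is_fresh(95, request_ts=100, grace_time=10)     # large tau: serve immediately
assert not is_fresh(95, request_ts=100, grace_time=2)  # small tau: wait for tick >= 98
```

This makes both experimental trends plain: a larger grace time lowers the freshness bar, so requests rarely wait, while a shorter time tick interval advances `segment_tick` sooner, and both therefore reduce the measured search latency.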
Manu also differs from these vector databases in important respects. First, the log backbone of Manu provides time semantics and allows tunable consistency. Second, Manu decomposes the system functionalities at fine granularity and instantiates them as cloud services for performance and failure isolation, and is thus better suited to cloud deployment. Third, Manu comes with more comprehensive optimizations for usability and performance, e.g., support for various indexes, hardware-tailored implementations, and GUI tools.

Cloud native databases. Many OLAP databases have recently been designed to run on the cloud; examples include Redshift [38], BigQuery [62], Snowflake [29], and AnalyticDB [88]. Redshift is a data warehouse system offered as a service on Amazon Web Services that adopts a shared-nothing architecture. It scales by adding or removing EC2 instances, and data is redistributed at the granularity of columns. Snowflake uses a shared-data architecture that delegates data storage to Amazon S3. Its compute nodes are stateless and fetch read-only copies of data for their tasks, and thus can be easily scaled; for efficiency, high-performance local disks are used to cache hot data.

Aurora [78] and PolarDB Serverless [51] are two cloud native OLTP databases. Aurora uses a shared-disk architecture and proposes the "log is database" principle by pushing transaction processing down to the storage engine. It observes that the bottleneck of cloud-based platforms has shifted from computation and storage I/O to network I/O; thus, it persists only the redo log for transaction processing and commits transactions by processing the log according to the LSN. PolarDB Serverless adopts a disaggregation architecture, which uses a high-speed RDMA network to decouple hardware resources (e.g., compute, memory, and storage) into resource pools.

Manu follows the general design principles of cloud native databases by decoupling the system functionalities at fine granularity for high elasticity, fast evolution, and failure isolation. However, we also exploit the unique design opportunities of vector databases, trading the simple data model and the weak consistency requirements of applications for performance, cost, and flexibility. Specifically, complex transactions are not supported, the log backbone is utilized to support a tunable consistency-performance trade-off, and the vector search, index building, and log archiving tasks are further decoupled because their workloads may vary significantly.

7 CONCLUSIONS AND FUTURE DIRECTIONS

In this paper, we introduce the design of Manu, a cloud native vector database. To ensure that Manu suits vector data applications, we set ambitious design goals, including good evolvability, tunable consistency, high elasticity, and good efficiency. To meet these goals, Manu trades the simple data model of vectors and the weak consistency requirements of applications for performance, cost, and flexibility. Specifically, Manu conducts fine-grained decoupling of the system functionalities for component-wise scaling and evolution, and uses the log backbone to connect the system components while providing time semantics and simplifying inter-component interaction. We also introduce important features such as the high-level API, GUI tools, hardware optimizations, and complex search. We think Manu is still far from perfect, and some of our future directions include:

• Multi-way search: Many applications jointly search multiple types of content, e.g., vector and primary key, or vector and text. The log system of Manu allows search engines for other content (e.g., primary key and text) to be added as co-processors that subscribe to the log stream. We will explore how multiple search engines can interact efficiently and how to flexibly coordinate different search engines to meet application requirements.

• Modularized algorithms: We think vector search algorithms can be distilled into independent components, e.g., compression for memory reduction and efficient computation, indexing for limiting computation to a small portion of the vectors, and bucketing for grouping similar vectors. Existing vector search algorithms explore only some combinations of techniques for these components. We will provide a unified framework for vector search such that users can flexibly combine different techniques according to their desired trade-off between cost and performance.

• Hierarchical storage aware index: Current vector search indexes assume a single type of storage, e.g., GPU memory, main memory, or disk. We will explore indexes that can jointly utilize all devices in the storage hierarchy. For example, most applications have some hot vectors (e.g., popular products in e-commerce) that are frequently accessed by search requests and can be placed in fast storage. As a query accesses only a portion of the vectors and a node processes many concurrent queries, the storage swap latency may be hidden by pipelining.

• Advanced hardware: NVM [67] costs about one-third of DRAM per unit capacity but provides comparable read bandwidth and latency, which makes it a good choice for replacing expensive DRAM when storing large datasets. RDMA [25, 47] significantly reduces the communication latency among nodes, and NVLink [66] directly connects GPUs with much larger bandwidth than PCIe. By exploiting these fast interconnects, we will explore indexes and search algorithms that jointly use multiple devices. We are also working with hardware vendors to apply FPGAs and MLUs to vector search and index building.

• Embedding generation toolbox: For better application-level integration, we plan to incorporate an application-oriented toolbox for generating embedding vectors. This toolbox would incorporate model fine-tuning in addition to providing a number of pre-trained models that can be used out-of-the-box, allowing for rapid prototyping.

ACKNOWLEDGMENTS

Manu is a multi-year project open sourced by Zilliz. The development of Manu involves many engineers in its community. In particular, we thank Bingyi Sun, Weida Zhu, Yudong Cai, Yihua Mo, Xi Ge, Yihao Dai, Jiquan Long, Cai Zhang, Congqi Xia, Xuan Yang, Binbin Lv, Xiaoyun Liu, Wenxing Zhu, Yufen Zong, Jie Zeng, Shaoyue Chen, Jing Li, Zizhao Chen, Jialian Ji, Min Tian, Yan Wang, and all the other contributors in the community for their contributions. We also thank Filip Haltmayer for proofreading the paper and for his valuable suggestions to improve its quality.
REFERENCES
[1] 2021. Annoy: Approximate Nearest Neighbors Oh Yeah. https://github.com/spotify/annoy.
[2] 2021. Benchmarks for Billion-Scale Similarity Search. https://research.yandex.com/datasets/biganns.
[3] 2021. Billion-Scale Approximate Nearest Neighbor Search Challenge. https://big-ann-benchmarks.com.
[4] 2021. binlog. https://hevodata.com/learn/using-mysql-binlog/.
[5] 2021. Datasets for approximate nearest neighbor search. http://corpus-texmex.irisa.fr/.
[6] 2021. ElasticSearch: Open Source, Distributed, RESTful Search Engine. https://github.com/elastic/elasticsearch.
[7] 2021. etcd. https://etcd.io/.
[8] 2021. MinIO. https://min.io/.
[9] 2021. MySQL. https://www.mysql.com/.
[10] 2021. NGT. https://github.com/yahoojapan/NGT.
[11] 2021. Pinecone. https://www.pinecone.io/.
[12] 2021. PostgreSQL: The World's Most Advanced Open Source Relational Database. https://www.postgresql.org/.
[13] 2021. Qdrant. https://qdrant.tech/.
[14] 2021. S3. https://aws.amazon.com/cn/s3/.
[15] 2021. Siri. https://www.apple.com/siri/.
[16] 2021. SPTAG: A library for fast approximate nearest neighbor search. https://github.com/microsoft/SPTAG.
[17] 2021. User Behavior Data from Taobao for Recommendation. https://tianchi.aliyun.com/dataset/dataDetail?dataId=649.
[18] 2021. Vald. https://github.com/vdaas/vald.
[19] 2021. Vespa. https://vespa.ai/.
[20] 2021. Weaviate. https://github.com/semi-technologies/weaviate.
[21] 2021. Xiaoice. https://en.wikipedia.org/wiki/Xiaoice.
[22] Reza Akbarinia, Esther Pacitti, and Patrick Valduriez. 2007. Best Position Algorithms for Top-k Queries. In Proceedings of the 33rd International Conference on Very Large Data Bases (Vienna, Austria). VLDB Endowment, 495–506.
[23] Fabien André, Anne-Marie Kermarrec, and Nicolas Le Scouarnec. 2015. Cache Locality is Not Enough: High-Performance Nearest Neighbor Search with Product Quantization Fast Scan. Proc. VLDB Endow. 9, 4, 288–299.
[24] Artem Babenko and Victor S. Lempitsky. 2015. The Inverted Multi-Index. IEEE Trans. Pattern Anal. Mach. Intell. 37, 6 (2015), 1247–1260. https://doi.org/10.1109/TPAMI.2014.2361319
[25] Wei Cao, Yingqiang Zhang, Xinjun Yang, Feifei Li, Sheng Wang, Qingda Hu, Xuntao Cheng, Zongzhi Chen, Zhenjun Liu, Jing Fang, et al. 2021. PolarDB Serverless: A Cloud Native Database for Disaggregated Data Centers. In Proceedings of the 2021 International Conference on Management of Data. 2477–2489.
[26] Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. 2015. Apache Flink: Stream and batch processing in a single engine. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 36, 4 (2015).
[27] Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zhiyong Zheng, Mao Yang, and Jingdong Wang. 2021. SPANN: Highly-efficient Billion-scale Approximate Nearest Neighborhood Search. Advances in Neural Information Processing Systems 34 (2021).
[28] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for YouTube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. 191–198.
[29] Benoît Dageville, Thierry Cruanes, Marcin Zukowski, Vadim Antonov, Artin Avanes, Jon Bock, Jonathan Claybaugh, Daniel Engovatov, Martin Hentschel, Jiansheng Huang, Allison W. Lee, Ashish Motivala, Abdul Q. Munir, Steven Pelley, Peter Povinec, Greg Rahn, Spyridon Triantafyllis, and Philipp Unterbrunner. 2016. The Snowflake Elastic Data Warehouse. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016. ACM, 215–226.
[30] Sanjoy Dasgupta and Yoav Freund. 2008. Random projection trees and low dimensional manifolds. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing. 537–546.
[31] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers). Association for Computational Linguistics, 4171–4186. https://doi.org/10.18653/v1/n19-1423
[32] Stefan Falkner, Aaron Klein, and Frank Hutter. 2017. Combining hyperband and Bayesian optimization. In NIPS 2017 Bayesian Optimization Workshop (Dec 2017).
[33] Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. 2019. Fast approximate nearest neighbor search with the navigating spreading-out graph. Proceedings of the VLDB Endowment 12, 5 (2019), 461–474.
[34] Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. 2013. Optimized product quantization for approximate nearest neighbor search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2946–2953.
[35] Aristides Gionis, Piotr Indyk, Rajeev Motwani, et al. 1999. Similarity search in high dimensions via hashing. In VLDB, Vol. 99. 518–529.
[36] Long Gong, Huayi Wang, Mitsunori Ogihara, and Jun Xu. 2020. iDEC: indexable distance estimating codes for approximate nearest neighbor search. Proceedings of the VLDB Endowment 13, 9 (2020).
[37] Ruiqi Guo, Sanjiv Kumar, Krzysztof Choromanski, and David Simcha. 2016. Quantization based fast inner product search. In Artificial Intelligence and Statistics. PMLR, 482–490.
[38] Anurag Gupta, Deepak Agarwal, Derek Tan, Jakub Kulesza, Rahul Pathak, Stefano Stefani, and Vidhya Srinivasan. 2015. Amazon Redshift and the Case for Simpler Data Warehouses. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31 - June 4, 2015. ACM, 1917–1923.
[39] Michael Hersovici, Michal Jacovi, Yoelle S Maarek, Dan Pelleg, Menachem Shtalhaim, and Sigalit Ur. 1998. The shark-search algorithm. An application: tailored web site mapping. Computer Networks and ISDN Systems 30, 1-7 (1998), 317–326.
[40] Dongxu Huang, Qi Liu, Qiu Cui, Zhuhe Fang, Xiaoyu Ma, Fei Xu, Li Shen, Liu Tang, Yuxing Zhou, Menglong Huang, et al. 2020. TiDB: a Raft-based HTAP database. Proceedings of the VLDB Endowment 13, 12 (2020), 3072–3084.
[41] Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing. 604–613.
[42] Masajiro Iwasaki and Daisuke Miyazaki. 2018. Optimization of indexing based on k-nearest neighbor graph for proximity search in high-dimensional data. arXiv preprint arXiv:1810.07355 (2018).
[43] Omid Jafari, Parth Nagarkar, and Jonathan Montaño. 2020. mmLSH: A Practical and Efficient Technique for Processing Approximate Nearest Neighbor Queries on Multimedia Data. In International Conference on Similarity Search and Applications. Springer, 47–61.
[44] Mohsen Jamali and Martin Ester. 2010. A matrix factorization technique with trust propagation for recommendation in social networks. In Proceedings of the Fourth ACM Conference on Recommender Systems. 135–142.
[45] Herve Jegou, Matthijs Douze, and Cordelia Schmid. 2010. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 1 (2010), 117–128.
[46] Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2019. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data (2019).
[47] Anuj Kalia, Michael Kaminsky, and David G Andersen. 2014. Using RDMA efficiently for key-value services. In Proceedings of the 2014 ACM Conference on SIGCOMM. 295–306.
[48] Timothy King. 2019. 80 Percent of Your Data Will Be Unstructured in Five Years. https://solutionsreview.com/data-management/80-percent-of-your-data-will-be-unstructured-in-five-years/.
[49] Yann LeCun, Yoshua Bengio, et al. 1995. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks 3361, 10 (1995), 1995.
[50] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436–444.
[51] Feifei Li. 2019. Cloud native database systems at Alibaba: Opportunities and Challenges. Proc. VLDB Endow. 12, 12 (2019), 2263–2272.
[52] Jie Li, Haifeng Liu, Chuanghua Gui, Jianyu Chen, Zhenyuan Ni, Ning Wang, and Yuan Chen. 2018. The design and implementation of a real time visual search system on JD e-commerce platform. In Proceedings of the 19th International Middleware Conference Industry. 9–16.
[53] Mingjie Li, Ying Zhang, Yifang Sun, Wei Wang, Ivor W Tsang, and Xuemin Lin. 2020. I/O efficient approximate nearest neighbour search based on learned functions. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 289–300.
[54] Wen Li, Ying Zhang, Yifang Sun, Wei Wang, Mingjie Li, Wenjie Zhang, and Xuemin Lin. 2019. Approximate nearest neighbor search on high dimensional data—experiments, analyses, and improvement. IEEE Transactions on Knowledge and Data Engineering 32, 8 (2019), 1475–1488.
[55] Jiahui Liu, Peter Dolan, and Elin Rønby Pedersen. 2010. Personalized news recommendation based on click behavior. In Proceedings of the 15th International Conference on Intelligent User Interfaces. 31–40.
[56] Wanqi Liu, Hanchen Wang, Ying Zhang, Wei Wang, and Lu Qin. 2019. I-LSH: I/O efficient c-approximate nearest neighbor search in high-dimensional space. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE, 1670–1673.
[57] Kejing Lu and Mineichi Kudo. 2020. R2LSH: A Nearest Neighbor Search Scheme Based on Two-dimensional Projected Spaces. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 1045–1056.
[58] Kejing Lu, Hongya Wang, Wei Wang, and Mineichi Kudo. 2020. VHP: approximate nearest neighbor search via virtual hypersphere partitioning. Proceedings of the VLDB Endowment 13, 9 (2020), 1443–1455.
[59] Lailong Luo, Deke Guo, Richard TB Ma, Ori Rottenstreich, and Xueshan Luo. 2018. Optimizing bloom filter: Challenges, solutions, and comparisons. IEEE Communications Surveys & Tutorials 21, 2 (2018), 1912–1949.
[60] Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. 2017. Intelligent probing for locality sensitive hashing: Multi-probe LSH and beyond. (2017).
[61] Yu A Malkov and Dmitry A Yashunin. 2018. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 4 (2018), 824–836.
[62] Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, and Theo Vassilakis. 2010. Dremel: Interactive Analysis of Web-Scale Datasets. Proc. VLDB Endow. 3, 1 (2010), 330–339.
[63] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111–3119.
[64] Marius Muja and David G Lowe. 2014. Scalable nearest neighbor algorithms for high dimensional data. IEEE Transactions on Pattern Analysis and Machine Intelligence 36, 11 (2014), 2227–2240.
[65] Michael P Papazoglou and Willem-Jan Van Den Heuvel. 2006. Service-oriented design and development methodology. International Journal of Web Engineering and Technology 2, 4 (2006), 412–442.
[66] Carl Pearson, I-Hsin Chung, Zehra Sura, Wen-Mei Hwu, and Jinjun Xiong. 2018. NUMA-aware data-transfer measurements for power/NVLink multi-GPU systems. In International Conference on High Performance Computing. Springer, 448–454.
[67] Jie Ren, Minjia Zhang, and Dong Li. 2020. HM-ANN: Efficient billion-point nearest neighbor search on heterogeneous memory. Advances in Neural Information Processing Systems 33 (2020).
[68] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2008. The graph neural network model. IEEE Transactions on Neural Networks 20, 1 (2008), 61–80.
[69] Falk Scholer, Hugh E Williams, John Yiannis, and Justin Zobel. 2002. Compression of inverted indexes for fast query evaluation. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 222–229.
[70] Anshumali Shrivastava and Ping Li. 2014. Asymmetric LSH (ALSH) for sublinear time maximum inner product search (MIPS). Advances in Neural Information Processing Systems 27 (2014).
[71] Chanop Silpa-Anan and Richard Hartley. 2008. Optimised KD-trees for fast image descriptor matching. In 2008 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1–8.
[72] Harsha Vardhan Simhadri, George Williams, Martin Aumüller, Matthijs Douze, Artem Babenko, Dmitry Baranchuk, Qi Chen, Lucas Hosseini, Ravishankar Krishnaswamy, Gopal Srinivasa, et al. 2022. Results of the NeurIPS'21 Challenge on Billion-Scale Approximate Nearest Neighbor Search. arXiv preprint arXiv:2205.03763 (2022).
[73] Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnaswamy, and Rohan Kadekodi. 2019. Rand-NSG: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada. 13748–13758.
[74] Rebecca Taft, Irfan Sharif, Andrei Matei, Nathan VanBenschoten, Jordan Lewis, Tobias Grieger, Kai Niemi, Andy Woods, Anne Birzin, Raphael Poss, et al. 2020. CockroachDB: The resilient geo-distributed SQL database. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 1493–1509.
[75] Gábor Takács and Domonkos Tikk. 2012. Alternating least squares for personalized ranking. In Proceedings of the Sixth ACM Conference on Recommender Systems. 83–90.
[76] Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, and Raghotham Murthy. 2009. Hive: a warehousing solution over a map-reduce framework. Proceedings of the VLDB Endowment 2, 2 (2009), 1626–1629.
[77] Aäron Van Den Oord, Sander Dieleman, and Benjamin Schrauwen. 2013. Deep content-based music recommendation. In Neural Information Processing Systems Conference (NIPS 2013), Vol. 26. Neural Information Processing Systems Foundation (NIPS).
[78] Alexandre Verbitski, Anurag Gupta, Debanjan Saha, Murali Brahmadesam, Kamal Gupta, Raman Mittal, Sailesh Krishnamurthy, Sandor Maurice, Tengiz Kharatishvili, and Xiaofeng Bao. 2017. Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14-19, 2017. ACM, 1041–1052.
[79] Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale commodity embedding for e-commerce recommendation in Alibaba. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 839–848.
[80] Jingdong Wang, Naiyan Wang, You Jia, Jian Li, Gang Zeng, Hongbin Zha, and Xian-Sheng Hua. 2014. Trinary-Projection Trees for Approximate Nearest Neighbor Search. IEEE Trans. Pattern Anal. Mach. Intell. 36, 2 (2014), 388–403.
[81] Jianguo Wang, Xiaomeng Yi, Rentong Guo, Hai Jin, Peng Xu, Shengjun Li, Xiangyu Wang, Xiangzhou Guo, Chengming Li, Xiaohai Xu, et al. 2021. Milvus: A Purpose-Built Vector Data Management System. In Proceedings of the 2021 International Conference on Management of Data. 2614–2627.
[82] Chuangxian Wei, Bin Wu, Sheng Wang, Renjie Lou, Chaoqun Zhan, Feifei Li, and Yuanzhe Cai. 2020. AnalyticDB-V: A Hybrid Analytical Engine Towards Query Fusion for Structured and Unstructured Data. Proc. VLDB Endow. 13, 12 (2020), 3152–3165.
[83] Xiang Wu, Ruiqi Guo, Ananda Theertha Suresh, Sanjiv Kumar, Daniel N Holtmann-Rice, David Simcha, and Felix Yu. 2017. Multiscale quantization for fast similarity search. Advances in Neural Information Processing Systems 30 (2017), 5745–5755.
[84] SHI Xingjian, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo. 2015. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Advances in Neural Information Processing Systems. 802–810.
[85] Wen Yang, Tao Li, Gai Fang, and Hong Wei. 2020. PASE: PostgreSQL Ultra-High-Dimensional Approximate Nearest Neighbor Search Extension. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14-19, 2020. ACM, 2241–2253.
[86] Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, Ion Stoica, et al. 2010. Spark: Cluster computing with working sets. HotCloud 10, 10-10 (2010), 95.
[87] Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. 2014. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329 (2014).
[88] Chaoqun Zhan, Maomeng Su, Chuangxian Wei, Xiaoqiang Peng, Liang Lin, Sheng Wang, Zhe Chen, Feifei Li, Yue Pan, Fang Zheng, et al. 2019. AnalyticDB: Real-time OLAP database system at Alibaba Cloud. Proceedings of the VLDB Endowment 12, 12 (2019), 2059–2070.
[89] Weijie Zhao, Shulong Tan, and Ping Li. 2020. SONG: Approximate Nearest Neighbor Search on GPU. In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20-24, 2020. IEEE, 1033–1044.
[90] Bolong Zheng, Zhao Xi, Lianggui Weng, Nguyen Quoc Viet Hung, Hang Liu, and Christian S Jensen. 2020. PM-LSH: A fast and accurate LSH framework for high-dimensional approximate NN search. Proceedings of the VLDB Endowment 13, 5 (2020), 643–655.
[91] Wengang Zhou, Yijuan Lu, Houqiang Li, and Qi Tian. 2012. Scalar quantization for large scale image search. In Proceedings of the 20th ACM International Conference on Multimedia. 169–178.