A real time search index for e-commerce
Umesh Prasad
Thejus V M
Oh!! Out Of Stock
Damn !! Out of Stock
Damn !! Missed the Offer
E-commerce Index Attributes
[Diagram: PRODUCT (aka SKU) and LISTING documents, with attributes supplied by the catalogue service, Promise Engine, Availability Service, Seller Rating, Offer Engine, and Pricing Engine.]
Out Of Stock, but Why Show?
Index has Stale Availability Data
234K Products
Outline
❏ E-commerce search Challenge
❏ Challenges in Keeping an Inverted Index Updated
❏ Our approach to Near Real Time indexing
Challenge 1 : Update rates
signal             updates/sec (min)   updates/sec (max)   max updates/hr
text / catalogue   ~10                 ~100                ~100K
pricing            ~100                ~1K                 ~10 million
availability       ~100                ~10K                ~10 million
offer              ~100                ~10K                ~10 million
seller rating      ~10                 ~1K                 ~1 million
signal 6           ~10                 ~100                ~1 million
signal 7           ~100                ~10K                ~10 million
signal 8           ~100                ~10K                ~10 million
Challenge 2 : Lucene Index Update
● Lucene doesn’t support partial updates.
● Update = Delete Old Doc + Add New Document
– Recreate the entire document for every update
– Not friendly to multiple micro-services with different update rates
● Problem compounded by the marketplace
● Product + all its listings == SINGLE BLOCK
● BLOCK structure chosen for query performance (~100X better latencies)
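The cost of "Update = Delete Old Doc + Add New Document" on a block can be illustrated with a minimal Python sketch. This is not Lucene code: `BlockIndex`, `add_block`, `update_listing_price`, and the document layout are hypothetical names chosen for illustration, and the toy index only mimics the append-only behaviour described above.

```python
from dataclasses import dataclass, field


@dataclass
class BlockIndex:
    """Toy append-only index: documents are never modified in place."""
    docs: list = field(default_factory=list)     # doc id -> document dict
    deleted: set = field(default_factory=set)    # doc ids marked as deleted
    blocks: dict = field(default_factory=dict)   # product id -> live doc ids

    def add_block(self, product, listings):
        """Append a product and all its listings as one contiguous block."""
        ids = []
        for doc in listings + [product]:         # listings first, product last
            self.docs.append(dict(doc))
            ids.append(len(self.docs) - 1)
        self.blocks[product["id"]] = ids
        return ids

    def update_listing_price(self, product_id, listing_id, price):
        """Change one field of one listing: delete and re-add the whole block."""
        old_ids = self.blocks[product_id]
        old_docs = [dict(self.docs[i]) for i in old_ids]
        self.deleted.update(old_ids)             # step 1: delete the old block
        for doc in old_docs:
            if doc["id"] == listing_id:
                doc["price"] = price
        product, listings = old_docs[-1], old_docs[:-1]
        return self.add_block(product, listings)  # step 2: re-add every document


index = BlockIndex()
index.add_block({"id": "P1", "brand": "Apple"},
                [{"id": "L1", "price": 45000}, {"id": "L2", "price": 46000}])
new_ids = index.update_listing_price("P1", "L1", 44000)
print(len(index.docs), sorted(index.deleted), new_ids)
```

One price change on one listing rewrites all three documents of the block, which is why high-rate signals such as pricing and availability are expensive to push through this path.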
Challenge 3 : Refresh Cycle
[Diagram: Ingestion pipeline → batch of documents → Solr Master (commit, fsync) → replication → six Solr Slaves, each of which opens a new index.]
Documents
ProductA   brand : Apple     availability : T   price : 45000
ProductB   brand : Samsung   availability : F   price : 23000
ProductC   brand : Apple     availability : T   price : 5000
Lucene Index → Lucene Segment
Document ID Mappings
0 ProductA
1 ProductB
2 ProductC
Posting List (Inverted Index): terms → sparse bitsets
availability : T   0 , 2
brand : Samsung    1
brand : Apple      0 , 2
DocValues (columnar data)
Price   45000 23000 5000
Root Cause : Updating Data Structures
Document → thousands of terms (Term1, Term2, Term3, Term4, …)
POSTING LIST → millions of terms, one bitset per term (Term 1 → BitSet 1, Term 2 → BitSet 2, Term 3 → BitSet 3, …), each bitset covering millions of documents
Posting List / Bit Set                Updatable ?
D  : 0 1 0 0 0 0 1 0 0 0 0 0 0 1      Yes
S  : 2,7,14                           May Be
SE : 2,5,7                            NO
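The three representations on this slide can be reproduced with a short Python sketch. It assumes that D is a dense bitset, S is the sorted list of set positions (1-based, as on the slide), and SE stores the gaps between consecutive positions; the function names are hypothetical.

```python
def dense_to_sparse(bits):
    """Dense bitset -> sorted list of 1-based positions that are set."""
    return [i + 1 for i, b in enumerate(bits) if b]


def sparse_to_delta(positions):
    """Sorted positions -> gaps between consecutive positions."""
    deltas, prev = [], 0
    for p in positions:
        deltas.append(p - prev)
        prev = p
    return deltas


def delta_to_sparse(deltas):
    """Inverse of sparse_to_delta: running sum of the gaps."""
    positions, total = [], 0
    for d in deltas:
        total += d
        positions.append(total)
    return positions


dense = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]   # D from the slide
sparse = dense_to_sparse(dense)                      # S  -> [2, 7, 14]
delta = sparse_to_delta(sparse)                      # SE -> [2, 5, 7]

# Adding document 4 to each representation:
dense[4 - 1] = 1                                     # D: flip one bit in place
sparse_updated = sorted(sparse + [4])                # S: insert and shift later entries
delta_updated = sparse_to_delta(sparse_updated)      # SE: every later gap may change

print(sparse, delta, sparse_updated, delta_updated)
```

The dense form takes a single in-place write, the sparse form needs an ordered insert, and the gap-encoded form has to be decoded and re-encoded, which is one way to read the Yes / May Be / NO column.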
Outline
❏ E-commerce search Challenge
❏ Challenges in Keeping an Inverted Index Updated
❏ Our approach to Near Real Time indexing
A Typical Search Flow
[Diagram: Query → Query Rewrite → Matching → Ranking / Faceting / Stats / Other Components → Results. Data sources shown: the Lucene Segment (Posting List as the Inverted Index, Doc Values as the Forward Index) and the NRT Store.]
NRT Forward Index - Considerations
● Lookup efficiency
– 50th percentile : ~10K matches
– 99th percentile : ~1 million matches
● Data on Java heap
– Memory efficiency
● Hook it to Lucene
NRT Store - Forward Index Naive
Lucene Segment (DocId → ProductId)
0 ProductB
1 ProductA
2 ProductC
3 ProductD
NRT Forward Index
ProductId   Availability   Price
ProductA    True           100
ProductB    False          150
ProductC    False          200
ProductD    True           250
Lookup Engine: DocId : 3, field : price → ProductId(3) → <ProductC,price> → 200
Latency : ~10 secs for ~1 Million lookups
NRT Store - Forward Index Optimized
Lucene Segment (DocId → ProductId)
0 ProductB
1 ProductA
2 ProductC
3 ProductD
NRT Forward Index (Segment Independent)
NrtId   ProductId   Availability   Price
0       ProductA    T              100
1       ProductC    F              200
2       ProductD    F              250
3       ProductB    T              150
DocId - NrtId
0 → 3
1 → 0
2 → 1
3 → 2
Lookup Engine: DocId : 3, field : price → NrtId(3) → 2 → Price(2) → 200
NRT Store - Invert index
[Diagram: NRT Forward Store → NRT Inverter → NRT Invert Store. A query such as Availability:T is answered from the NRT Invert Store as a Matching BitSet.]
Lucene Segment
0 ProductB
1 ProductA
2 ProductC
3 ProductD
NRT Invert Store
Availability : T   0 3
Offer : O1         2 3
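A minimal Python sketch of the inverter step, assuming the forward store is a column indexed by NRT id and that the inverter emits one bit per Lucene doc id so the result can be combined with ordinary matches. The names `invert` and `text_match` are hypothetical, and the data reuses the optimized forward index shown earlier.

```python
def invert(column, doc_to_nrt, value):
    """Return a bitset (list of 0/1) in Lucene doc-id order for column == value."""
    return [1 if column[nrt_id] == value else 0 for nrt_id in doc_to_nrt]


doc_to_nrt = [3, 0, 1, 2]                     # Lucene doc id -> NRT id
availability = [True, False, False, True]     # forward store, indexed by NRT id

available_bits = invert(availability, doc_to_nrt, True)

# Hypothetical result of a text query on the same segment.
text_match = [1, 0, 1, 1]

# Matching = text match AND Availability:T, evaluated bit by bit.
matching = [a & b for a, b in zip(text_match, available_bits)]
print(available_bits, matching)
```

Rebuilding the bitsets at regular intervals keeps them aligned with Lucene's internal document order without touching the segment itself.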
Near Real Time Solr Architecture
[Diagram: update sources are Catalogue, Pricing, Availability, Offers, Seller Quality, and Others.
Text Updates: Ingestion pipeline → Solr Master → Commit + Replicate + Reopen → Lucene index in Solr.
NRT Updates: Kafka → NRT Forward Index and NRT Inverted store in Solr, with Redis used for Bootstrap.
Consumers in Solr: Ranking, Matching, Faceting.]
Accomplishments
● Real time consumption for Ranking Signals
● BBD (Big Billion Day) saw up to ~30K updates/second
● Query latency comparable to DocValues
– Consistent 99th percentile performance
Thank you & Questions
A Typical Search Flow
[Diagram: Query → Query Rewrite → Matching → Ranking / Faceting / Stats / Other Components → Results. Data sources shown: the Lucene Index (Schema, Posting List as the Inverted Index, Doc Values as the Forward Index) and the NRT Store with its own Schema.]
Lucene Index
docId   External ID   Brand    Availability   Price
0       ProductA      Adidas   True           250
1       ProductB      Adidas   False          230
2       ProductC      Nike     True           500
Terms Dictionary and Posting List (inverted index)
term ord   term                 posting list
0          availability:true    0,2
1          availability:false   1
0          brand:adidas         0,1
1          brand:nike           2
1          price:230            1
2          price:250            0
Doc Value (Forward index): term ords per docId
field          0   1   2
price          2   2   3
brand          0   0   1
availability   0   1   0
● Lucene Index = Multiple Mini Indexes aka Segments
● Lucene Segment
○ Write Once → Immutable Data structures
○ Posting Lists (sparse encoded bitsets)
○ Doc Values (columnar data structures)
C5 : Lucene in-place update
● Only numeric / byte Array fields
● Updates to go through the entire refresh cycle
● Not exposed via Solr
Forward Index - API Hook
● Lucene API Hook
– ValueSource
● Input
– Lucene Internal Document Id
– Field Name
● Output
– Field Value
NRT Store - Inverted Index
● Input
– Lucene Segment
– query
• Field Name : Field Value
• offer : o1
● Output
– DocSet (posting list)

Slash n near real time indexing


Editor's Notes

  • #4: Going from a Page 1 to Page could be a matter of seconds on Sales Day ( Big Billion Day)
  • #6: Hierarchical documents (Product → Listing). Highly structured: free text, numeric, tags. Micro-services for individual field updates, with different update rates and independently updating fields.
  • #7: Availability has been used in ranking, but it is stale, hence OOS (out of stock). Explain the challenge of 234K.
  • #9: This means the entire index will be recreated every hour.
  • #10: Product documents + seller SKU documents. Block-join index block: a composite document with the product and all its seller SKUs. Con: any update = delete + recreate the entire block, which aggravates the delete + recreate problem.
  • #11: Remove animation, don’t spend too much time on it.
  • #12: Posting =
  • #15: Keep the fast-changing data outside of the index. Update this data independently of Solr updates. Hooks in Lucene/Solr for retrieval: ValueSource, Filter, Collector.
  • #17: Explain the API Hook
  • #18: Lucene APIs: internal document id. Columnar data structures, with the implementation dependent on data type and chosen for memory efficiency: boolean : 1 bit; enum : log(#enumerations) bits; int : 4 bytes; multi val : array of the above data structures.
  • #19: Filter API of Lucene: DocIdSet getDocIdSet(LuceneIndex). Invert the data to adhere to Lucene’s internal order at regular intervals of time.
  • #24: Extract segment structure in a different slide
  • #25: Extract segment structure in a different slide
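
The bit widths listed in note #18 (enum : log(#enumerations) bits) can be illustrated with a small Python sketch of a fixed-width packed column. `PackedEnumColumn` is a hypothetical class written for this illustration; a single Python integer stands in for the packed long array a Java implementation would more likely use.

```python
class PackedEnumColumn:
    """Fixed-width packed column: each slot uses ceil(log2(#enumerations)) bits."""

    def __init__(self, enumerations):
        self.values = list(enumerations)
        self.codes = {v: i for i, v in enumerate(self.values)}
        # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1; use at least 1 bit.
        self.bits = max(1, (len(self.values) - 1).bit_length())
        self.mask = (1 << self.bits) - 1
        self.data = 0   # all slots start at code 0, i.e. the first enumeration

    def set(self, index, value):
        shift = index * self.bits
        self.data &= ~(self.mask << shift)         # clear the slot
        self.data |= self.codes[value] << shift    # write the new code

    def get(self, index):
        shift = index * self.bits
        return self.values[(self.data >> shift) & self.mask]


offers = PackedEnumColumn(["NONE", "O1", "O2", "O3", "O4"])
offers.set(2, "O3")
offers.set(3, "O4")
offers.set(2, "O1")                                # overwrite in place
print(offers.bits, offers.get(0), offers.get(2), offers.get(3))
```

With five possible values each slot takes 3 bits instead of a full object reference, which is the kind of saving that matters when the column is held on the Java heap for millions of documents.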