What’s new in Apache Solr 5.0
Who am I?
• Anshum Gupta, Apache Lucene/Solr committer,
Lucidworks Employee.
• Search and related stuff for 9+ years.
• Apache Lucene since 2006 and Solr since 2010.
• Organizations I am or have been a part of:
Solr - Releases
–Someone
Ease of Use: Because usability doesn’t end after
the first five minutes!
Scripts - Richer, faster, easier!
• Solr Demo:
• bin/post script
• Auto config-set copying
• Create -> Post -> Browse -> Delete
• bin/solr start -e cloud -noprompt ; bin/post -c gettingstarted http://lucidworks -recursive 2; open http://localhost:8983/solr/gettingstarted/browse
Example is now Server
• No default collection1
• Configset options
• ant example server
• post.sh
Posting documents was never so easy!
• bin/post script wraps around the improved
SimplePostTool
• Index JSON directly OTB
• Developers: SolrServer is now SolrClient
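To illustrate "Index JSON directly", here is a minimal sketch of the kind of HTTP request a posting tool sends. It assumes a local Solr at localhost:8983, the gettingstarted collection from the demo, and the standard /update handler; the document fields are made up for illustration. The request is only built, not sent, so the sketch runs without a Solr instance.

```python
import json
import urllib.request


def build_json_update(base_url, collection, docs, commit=True):
    """Build (but do not send) a POST request carrying JSON documents."""
    url = f"{base_url}/{collection}/update"
    if commit:
        # Ask Solr to commit so the documents become searchable right away.
        url += "?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document; field names are assumptions, not from the slides.
docs = [{"id": "1", "title": "Hello Solr 5.0"}]
req = build_json_update("http://localhost:8983/solr", "gettingstarted", docs)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it to a running Solr.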
Managing Solr
Managing Solr Configuration - Application
• Paramsets: Add/Edit
• initParams: Generic appends, invariants and defaults
outside of the component
• Schema API: REST API for adding field types, and
dynamic fields
• Managing requestHandlers through API
• Implicit registration of replication, get and admin
Handlers.
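As a rough sketch of the Schema API bullet above: the payload below uses the add-field-type and add-dynamic-field commands, which would be POSTed as JSON to /solr/<collection>/schema. The type name text_simple and the *_txt_simple pattern are hypothetical names chosen for this example.

```python
import json


def schema_commands(field_type=None, dynamic_field=None):
    """Assemble a Schema API command payload as a JSON string."""
    payload = {}
    if field_type is not None:
        payload["add-field-type"] = field_type
    if dynamic_field is not None:
        payload["add-dynamic-field"] = dynamic_field
    return json.dumps(payload, indent=2)


# Hypothetical field type and dynamic field, for illustration only.
field_type = {
    "name": "text_simple",
    "class": "solr.TextField",
    "analyzer": {"tokenizer": {"class": "solr.StandardTokenizerFactory"}},
}
dynamic_field = {"name": "*_txt_simple", "type": "text_simple", "stored": True}

body = schema_commands(field_type, dynamic_field)
print(body)
```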
Managing the cluster - Systems
• Collection APIs
• BALANCESHARDUNIQUE: Even distribution of custom replica properties
• Improved APIs
• Option to not shuffle nodeSet specified during CREATE Collection
• Logging
• Transaction log replay status
• Slow request (optional)
• Support for editing common solrconfig.xml values
• Scripts to support installing and running Solr as a service on Linux.
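The two Collections API items above can be sketched as plain URLs. This assumes a local Solr at localhost:8983; the collection name demo, the node names, and the choice of preferredLeader as the property to balance are illustrative assumptions.

```python
from urllib.parse import urlencode

ADMIN_URL = "http://localhost:8983/solr/admin/collections"


def collections_api_url(action, params):
    """Return a Collections API URL for the given action and parameters."""
    query = {"action": action}
    query.update(params)
    return ADMIN_URL + "?" + urlencode(query)


# CREATE with an explicit node list that should be used in the given order.
create_url = collections_api_url("CREATE", {
    "name": "demo",                      # hypothetical collection name
    "numShards": 2,
    "replicationFactor": 2,
    "createNodeSet": "host1:8983_solr,host2:8983_solr",  # hypothetical nodes
    "createNodeSet.shuffle": "false",
})

# Spread a replica property evenly across the nodes hosting the collection.
balance_url = collections_api_url("BALANCESHARDUNIQUE", {
    "collection": "demo",
    "property": "preferredLeader",
})

print(create_url)
print(balance_url)
```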
Keeping Solr Instance(s) Stable
• ReplicationHandler now has an option to throttle the speed of
replication
• timeAllowed respected more widely - Query expansion,
collection and LBHTTPSolrClient retries
• Finite default timeouts for select and update requests
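A small sketch of how a client might use timeAllowed: pass it as a request parameter (in milliseconds) and check the response header for the partialResults flag. The URL assumes a local Solr and the gettingstarted collection; the sample response is a hand-written stand-in, not real Solr output.

```python
from urllib.parse import urlencode


def select_url(collection, q, time_allowed_ms):
    """Build a /select URL that caps query time via timeAllowed."""
    params = {"q": q, "timeAllowed": time_allowed_ms, "wt": "json"}
    return f"http://localhost:8983/solr/{collection}/select?{urlencode(params)}"


def is_partial(response):
    """True if the response header flags the results as partial."""
    header = response.get("responseHeader", {})
    return bool(header.get("partialResults", False))


url = select_url("gettingstarted", "solr", 500)

# Hypothetical parsed JSON response, for illustration only.
sample = {"responseHeader": {"status": 0, "partialResults": True},
          "response": {"numFound": 3, "docs": []}}
print(url, is_partial(sample))
```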
Scalability
• Splitting of ClusterState
• Every collection has its own cluster state
• No need to watch what everyone else is doing
• Might be the default in 5.0
• Improved Solr - Zk communication
• Speed up overseer operations avoiding cluster state
reads from zookeeper at the start of each loop
• Better default timeouts to operate at a large scale
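Why splitting the cluster state helps can be shown with a toy model. It assumes the shared state lives at /clusterstate.json and the per-collection state at /collections/<name>/state.json in ZooKeeper, and that a node watches only the state of collections it hosts. Node and collection names are made up, and real watch handling is far more involved than this.

```python
def state_path(collection, per_collection):
    """ZooKeeper path holding the state for a collection."""
    if per_collection:
        return f"/collections/{collection}/state.json"
    return "/clusterstate.json"


def nodes_notified(changed_collection, hosting, per_collection):
    """Nodes whose watches fire when one collection's state changes."""
    changed = state_path(changed_collection, per_collection)
    return sorted(
        node for node, collections in hosting.items()
        if changed in {state_path(c, per_collection) for c in collections}
    )


# Hypothetical layout: which node hosts replicas of which collection.
hosting = {
    "node1": ["logs"],
    "node2": ["products"],
    "node3": ["logs", "products"],
}

shared = nodes_notified("logs", hosting, per_collection=False)
split = nodes_notified("logs", hosting, per_collection=True)
print(shared, split)
```

In this model a change to one collection wakes every node under the shared file, but only the hosting nodes once the state is split.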
Solr scalability is unmatched.
Features
Distributed IDF
• Multiple contributors and almost 5 years.
• 4 implementations OTB:
• LocalStatsCache: Local Stats
• ExactStatsCache: One time use aggregation
• ExactSharedStatsCache: Stats shared across requests
• LRUStatsCache: Stats shared in an LRU cache across requests
• Flow:
• Conditionally Send GET_TERM_STATS request to participating nodes
• Compute global values, another request for SET_TERM_STATS + GET_TOP_IDS
• Conditional GET_FIELDS
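A worked example of why distributed IDF matters. The sketch uses the classic Lucene TF-IDF formula idf = 1 + ln(numDocs / (docFreq + 1)); the per-shard numbers are invented, and the aggregation is a simplified model of what the exact stats caches do, not Solr's actual code.

```python
import math


def idf(doc_freq, num_docs):
    """Classic TF-IDF inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical term statistics: the term is rare on shard1, common on shard2.
shards = {
    "shard1": {"doc_freq": 5, "num_docs": 1000},
    "shard2": {"doc_freq": 400, "num_docs": 1000},
}

# Local stats: each shard scores with its own view of the term.
local_idf = {name: idf(s["doc_freq"], s["num_docs"]) for name, s in shards.items()}

# Global stats: sum the per-shard values first, then compute one IDF.
global_df = sum(s["doc_freq"] for s in shards.values())
global_n = sum(s["num_docs"] for s in shards.values())
global_idf = idf(global_df, global_n)

print(local_idf, global_idf)
```

With local stats the same term is weighted very differently on each shard, so scores are not comparable when results are merged; the global value removes that skew.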
Stats Component
• stats.field can now be used to generate stats over
the numeric results of arbitrary functions, e.g.:
• stats.field={!func}product(price,popularity)
• Stats hang off pivots via tags
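Both stats features can be combined in one request, sketched here as a query string. The function example comes from the slide; the tag name piv1 and the pivot fields cat and inStock are assumptions for illustration.

```python
from urllib.parse import urlencode, parse_qsl

params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("stats", "true"),
    # Stats over the result of a function (example from the slide).
    ("stats.field", "{!func}product(price,popularity)"),
    # Tag a stats field so a pivot facet can refer to it.
    ("stats.field", "{!tag=piv1}price"),
    ("facet", "true"),
    # Hang the tagged stats off each pivot bucket (hypothetical fields).
    ("facet.pivot", "{!stats=piv1}cat,inStock"),
]

query_string = urlencode(params)
print(query_string)
```

Appending the string to a collection's /select URL would run the query against a live Solr.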
And there are more…
• DateRangeField for indexing date ranges, especially multi-valued ones.
• Spatial fields that used to require units=degrees now take
distanceUnits=degrees/kilometers/miles instead.
• MoreLikeThis QueryParser: Works in SolrCloud mode too.
• API for managing blobs
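For the MoreLikeThis query parser, a request boils down to a local-params query naming a source document. This sketch only assembles the query string; the field name, document id, and threshold values are assumptions.

```python
def mlt_query(doc_id, qf, mintf=None, mindf=None):
    """Build a {!mlt} query that finds documents similar to doc_id."""
    local_params = [f"qf={qf}"]
    if mintf is not None:
        local_params.append(f"mintf={mintf}")  # minimum term frequency
    if mindf is not None:
        local_params.append(f"mindf={mindf}")  # minimum document frequency
    return "{!mlt " + " ".join(local_params) + "}" + doc_id


# Hypothetical document id and field name.
q = mlt_query("1234", "name", mintf=1, mindf=1)
print(q)
```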
and more…
• First class support in SolrJ for Collection API calls
• Upgrade Tika to 1.7: This adds support for parsing
Outlook PST and Matlab (MAT) files.
Maturity
• Jepsen tests
• More unit tests and more success
stories of Solr.
• Protection of ZK content
No more WAR!
• Solr is now an app: no more shipping a WAR, starting
with Solr 5.0
• Upgrade to Jetty 9 coming soon
• Will allow for a lot of things (SPDY) that wouldn’t be
possible if we had to support tomcat/netty/jetty/
everything else.
Between 4.10 and 5.0: The new Identity
Timeline*
• Release branch cut
• 2nd RC vote in progress.
• Vote - 3 days, 3 votes
• Artifacts propagation to ASF mirrors - 1 day
• Official release note - Right after!
* prospective and subject to how things go
Coming soon
• Collections API: REBALANCESHARDS
• Spatial 2D heat-map faceting
• Facet and analytics
• Replication performance
• More API goodness
Questions?
Connect @
http://www.twitter.com/anshumgupta
http://www.linkedin.com/in/anshumgupta/
anshum@apache.org

