Solr Powr
Enterprise-grade search for your app
Nick Zadrozny
Hi, my name is Nick.
       I’m a webdev — full-time
       w/ Rails since 2005.

       Generalist background.

       Perspective of a relative
       Solr noob.
Brought my generalist
perspective to Websolr about
six months ago.

We do hosted search

I enjoy doing things the Right
Way.




                             websolr
What is Solr?
How can we make the most of it?
Take some text

Make a list of the words
and where they show up

Of course, being geeks,
we throw a lot of
features into that




                           Indexing
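
To make "a list of the words and where they show up" concrete, here is a minimal sketch of an inverted index in plain Python. It is only an illustration of the idea, not how Lucene stores its index; the `build_index` name and the sample documents are made up for the example.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Split into words and remember where each one appears.
        for position, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].setdefault(doc_id, []).append(position)
    return index


# Hypothetical sample documents, keyed by id.
docs = {1: "The quick brown fox", 2: "The lazy dog and the fox"}
index = build_index(docs)
print(index["fox"])  # which documents contain "fox", and at which positions
```

Looking up a word is then a single dictionary access, which is the reason for building the list up front.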
Java search library that
does indexing. You give it
some words, it builds
those indexes.

Most of what we will
talk about is actually
Lucene.




                    Apache Lucene
What is Solr?

Web application interface for Lucene

Essentially RESTful

  POST in data, GET with queries

Various administrative features

Various web scaling features
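
A rough sketch of the "GET with queries" half, using only the Python standard library. The base URL is an assumption (a local Solr on its default port); the exact path depends on your Solr version and core name. `select_url` is a hypothetical helper, and the field names in the query are made up.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance. Adjust host, port and core for your setup.
SOLR_URL = "http://localhost:8983/solr"


def select_url(query, **params):
    """Build a GET URL for Solr's select handler, asking for JSON output."""
    all_params = {"q": query, "wt": "json", **params}
    return f"{SOLR_URL}/select?{urlencode(all_params)}"


# Hypothetical fields "title" and "body".
url = select_url("title:lucene AND body:search", rows=10)
print(url)
```

Fetching that URL with any HTTP client returns the matching documents.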
Just so you know, I’m
going to be blurring Solr
and Lucene from here on
out.




                            Still with me?
Do smarter things with a
little bit of structure.




                           Schema
binary, boolean, byte, date, double,
external file, float, geohash, int, integer,
long, short, string, text, trie
Most of the interesting
stuff happens here




                          Text
adding and updating
records, doing statistics,
correlating with your sql
database, etc




                             Unique key
                             Not required, but handy.
Tokenize on whitespace or non-letter chars.

The standard tokenizer is sort of “type aware” and
understands acronyms, URLs, words with apostrophes.

So-called stop words, since we’re not doing
actual semantic language search.

Shingles: consecutive n-sized word groups:
“the quick” “quick brown” “brown fox” “fox jumped”

Tokenize words
Strip HTML
Normalize case
Normalize accented characters
Pattern replacement
Stop words
Language stemming
Phonetic stemming
Synonyms
Word shingles

                             Text
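
A few of these steps are easy to sketch in plain Python. This only illustrates the ideas (tokenizing, case normalization, stop words, shingles); it is not Solr's analyzer chain, and the stop word list and function names are assumptions made for the example.

```python
import re

# Assumption: a tiny stop word list, just for illustration.
STOP_WORDS = {"the", "a", "an", "and", "of"}


def tokenize(text):
    """Lowercase the text and split on anything that is not a letter, digit or apostrophe."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Drop very common words that carry little meaning for search."""
    return [token for token in tokens if token not in STOP_WORDS]


def shingles(tokens, size=2):
    """Return consecutive word groups of the given size."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


tokens = tokenize("The quick brown fox jumped")
print(remove_stop_words(tokens))
print(shingles(tokens))
```

With `size=2` the shingles come out as the same word pairs listed in the notes above.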
Index rich content
 HTML, PDF, Word, etc.
Add and Update
Serialize your documents to XML, JSON and a handful
of others.

HTTP POST to your Solr URL

Solr hands your data to Lucene for processing

Updates are incremental
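
A minimal sketch of the "serialize, then POST" step with the standard library. The update URL is an assumption, and the path and accepted formats vary by Solr version. The document fields are made up. The request is only built here, not sent.

```python
import json
from urllib.request import Request

# Assumption: a local Solr update handler that accepts JSON documents.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_request(docs):
    """Serialize documents to JSON and wrap them in an HTTP POST request."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        UPDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document with a unique key ("id") and one text field.
request = build_add_request([{"id": "post-1", "title": "Solr Powr"}])
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it to a running Solr.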
Querying
Powerful query syntax.
  Boolean logic is just the start.
min, max, average,
stddev




              Numeric operations.
do stuff relative to
“now”




                       Date ranges,
                        date math.
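
Solr's date math lets a range query refer to `NOW`, with offsets like `-7DAYS` and rounding like `/DAY`. A small sketch that builds such range expressions as strings; the `date_range` helper and the `published_at` field are hypothetical.

```python
def date_range(field, start, end="NOW"):
    """Build a Solr range query over a date field, e.g. for use as a filter query."""
    return f"{field}:[{start} TO {end}]"


# Everything from the last seven days.
last_week = date_range("published_at", "NOW-7DAYS")
# Everything from today, rounded to the start of the day.
today = date_range("published_at", "NOW/DAY", "NOW/DAY+1DAY")
print(last_week)
print(today)
```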
Yeah, one killer feature
here is that Solr supports
spatial search.

Give it a lat/lon.




                             Distance.
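
What "distance from a lat/lon" boils down to can be sketched with the haversine formula. This is a stand-in to show the idea, not Solr's spatial implementation; the function names and sample points are made up.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within(points, lat, lon, radius_km):
    """Names of the points that lie within radius_km of (lat, lon)."""
    return [name for name, (plat, plon) in points.items()
            if haversine_km(lat, lon, plat, plon) <= radius_km]


# Hypothetical points, as name -> (lat, lon).
points = {"origin": (0.0, 0.0), "near": (0.0, 0.5), "far": (0.0, 5.0)}
print(within(points, 0.0, 0.0, 100))
```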
Present the available values so your users
can filter by it.

Great for building out rich taxonomies.

Example: facet books by language, author,
genre.




                                             Faceting.
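
Conceptually, a facet is a count of documents per value of a field. A rough sketch of that counting in plain Python, using the books example; Solr does this inside the index, and the `facet_counts` helper and sample data are assumptions.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many documents have each value of the given field."""
    return Counter(doc[field] for doc in docs if field in doc)


# Hypothetical book records.
books = [
    {"title": "Book A", "language": "en", "genre": "fiction"},
    {"title": "Book B", "language": "en", "genre": "history"},
    {"title": "Book C", "language": "fr", "genre": "fiction"},
]
print(facet_counts(books, "language").most_common())
print(facet_counts(books, "genre").most_common())
```

Those counts are what you would show next to each filter link.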
spelling suggestion for
user queries.

query auto-suggest from
popular queries




                   “Did you mean…?”
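
The general shape of a spelling suggestion can be sketched with Python's `difflib`. This is a stand-in technique, not Solr's spellcheck component, which works from terms in the index. The `did_you_mean` helper and the vocabulary are made up.

```python
from difflib import get_close_matches


def did_you_mean(query, vocabulary):
    """Return the closest known term to the query, or None if nothing is close."""
    matches = get_close_matches(query, vocabulary, n=1)
    return matches[0] if matches else None


# Hypothetical vocabulary of known terms.
vocabulary = ["search", "solr", "lucene"]
print(did_you_mean("serch", vocabulary))
```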
Generate a list of similar
documents. Consider blog
posts.




                             More Like This
Probably more.
Solr in Production
This is why we run Solr.




                     It’s really, really fast.
                           When properly configured.
Average max response
time is 75ms.

Even the 95th percentile is
way below that.
updates are incremental to keep things
running fast

for performance reasons, they don’t show up
in search results until you issue a commit

Commits are sorta heavy

200ms – 2 sec




                                              Commits
Most of the time you don’t have to worry about this,
but it’s easy to screw this up if you flood the
system with updates and commits.

        Lock the writer
        Flush updates to disk
        Start a new reader
        Warm up the reader’s cache
        Register the reader with Solr
        Tear down the old reader
        Unlock the writer
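
One way to avoid flooding the system is to send updates in batches and commit once at the end. A minimal sketch, assuming you supply your own `send_batch` and `commit` callables (both hypothetical, as is the batch size):

```python
def batches(docs, size):
    """Yield lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def index_all(docs, send_batch, commit, batch_size=100):
    """Send every document in batches, then issue a single commit."""
    sent = 0
    for batch in batches(docs, batch_size):
        send_batch(batch)
        sent += len(batch)
    commit()  # one commit for the whole run, not one per document
    return sent


# Record the calls instead of talking to a real server.
calls = []
total = index_all(range(5), lambda b: calls.append(("add", b)),
                  lambda: calls.append(("commit",)), batch_size=2)
print(total, calls)
```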
As you’re committing changes,
you’re usually creating new
files in “segments”

Optimize takes your index
and rewrites it into a
smaller number of files

Good to do this periodically to
use less memory and avoid
running out of open files




                                  Optimize
Replication is pull-based (the slave pulls from the
master) and really fast. Like, don’t worry.

Best way to deal with high IO.

Reads go to read cores, writes go to write
cores.

Scale read resources separately.

Make sure writes don’t interrupt reads.




                                            Replication.
                                             Stupidly easy.
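
On the app side, "reads go to read cores, writes go to write cores" can be as simple as picking a URL per request. A rough sketch with round-robin reads; the class name and URLs are hypothetical.

```python
from itertools import cycle


class ReadWriteRouter:
    """Send writes to one core and spread reads across the read cores."""

    def __init__(self, write_url, read_urls):
        self.write_url = write_url
        self._read_urls = cycle(read_urls)  # round-robin over read cores

    def url_for_write(self):
        return self.write_url

    def url_for_read(self):
        return next(self._read_urls)


# Assumption: one write core and two read cores at made-up addresses.
router = ReadWriteRouter(
    "http://write.example.com:8983/solr",
    ["http://read1.example.com:8983/solr", "http://read2.example.com:8983/solr"],
)
reads = [router.url_for_read() for _ in range(3)]
print(router.url_for_write(), reads)
```

Adding read capacity then means adding another URL to the list.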
All I’ll say is that it’s really
powerful and gives you a lot
of rope.

I’ve seen cache warmups
take down Tomcat — in
particular, on a very large
index with spatial search.




                                   Caching
I’m a Rails generalist

I like to do things the right way.

Solr is fast, fully-featured, and can be
scaled separately from the rest of your
app.

It takes the load off your database and
app servers, and does a better job.

In some cases, it offers features that just
aren’t otherwise even possible.




                                           In Conclusion
Questions?
Thanks!

2010 08-06 - sd ruby - solr
