2010 08-06 - sd ruby - solr

Solr Powr
Enterprise-grade search for your app
Nick Zadrozny

Hi, my name is Nick.
I’m a webdev — full-time
w/ Rails since 2005.

Generalist background.

Perspective of a relative
Solr noob.

Brought my generalist
perspective to Websolr about
six months ago.

We do hosted search

I enjoy doing things the Right
Way.

websolr

What is Solr?
How can we make the most of it?

Take some text

Make a list of the words
and where they show up

Of course, being geeks,
we throw a lot of
features into that

Indexing
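
To make "a list of the words and where they show up" concrete, here is a toy inverted index in Python. It is only a sketch of the idea, not how Lucene actually stores things, and the function and sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to {doc_id: [positions]} (toy inverted index)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        # Split on anything that is not a word character, after lowercasing.
        for position, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word][doc_id].append(position)
    return index


# Hypothetical sample documents, keyed by id.
docs = {
    1: "The quick brown fox",
    2: "The lazy dog and the quick cat",
}
index = build_index(docs)
print(dict(index["quick"]))  # {1: [1], 2: [5]}
```

Looking up a word is then a dictionary access instead of a scan over every document.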

Java search library that
does indexing. You give it
some words, it builds
those indexes.

Most of what we will
talk about is actually
Lucene.

Apache Lucene

What is Solr?

Web application interface for Lucene

Essentially RESTful

POST in data, GET with queries

Various administrative features

Various web scaling features

Just so you know, I’m
going to be blurring Solr
and Lucene from here on
out.

Still with me?

Do smarter things with a
little bit of structure.

Schema

binary, boolean, byte, date, double,
external file, float, geohash, int, integer,
long, short, string, text, trie

Most of the interesting
stuff happens here

Text

adding and updating
records, doing statistics,
correlating with your sql
database, etc

Unique key
Not required, but handy.

tokenize on whitespace or non-letter chars

standard tokenizer is sort of “type aware” and
understands acronyms, URLs, words with apostrophes

Text

so-called stop words since we’re not doing
actual semantic language search

Shingles: consecutive n-sized word groups
“the quick” “quick brown” “brown fox” “fox
jumped”

Tokenize words
Strip HTML
Normalize case
Normalize accented characters
Pattern replacement
Stop words
Language stemming
Phonetic stemming
Synonyms
Word shingles
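
A few of these steps can be sketched in plain Python to show how they chain together. This is a rough illustration, not Solr's analyzers: the stop word list, the naive HTML regex, and the `analyze` function are all assumptions of the sketch.

```python
import re

# Tiny illustrative stop word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of"}


def analyze(text, shingle_size=2):
    """Return (terms, shingles) for a piece of text."""
    text = re.sub(r"<[^>]+>", " ", text)           # strip HTML tags (naive)
    tokens = re.findall(r"[a-z]+", text.lower())   # normalize case, split on non-letters
    terms = [t for t in tokens if t not in STOP_WORDS]  # drop stop words
    # Shingles: consecutive n-sized word groups, built from the unfiltered tokens.
    shingles = [
        " ".join(tokens[i:i + shingle_size])
        for i in range(len(tokens) - shingle_size + 1)
    ]
    return terms, shingles


terms, shingles = analyze("<p>The quick brown fox jumped</p>")
print(terms)     # ['quick', 'brown', 'fox', 'jumped']
print(shingles)  # ['the quick', 'quick brown', 'brown fox', 'fox jumped']
```

Splitting on non-letters breaks words with apostrophes apart, which is the kind of case the standard tokenizer handles better.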

Index rich content
HTML, PDF, Word, etc.

Add and Update

Serialize your
documents to XML,
JSON and a handful
of others.

Updates are
incremental

HTTP POST to your
Solr URL

Solr hands your data
to Lucene for
processing
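
The serialize-and-POST flow could look roughly like this in Python. The URL below is an assumption (a local Solr on its default port), and the helper names are made up; the `<add><doc><field>` layout is Solr's XML update message format.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr

# Assumed URL for a local Solr instance; substitute your own.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def to_add_xml(docs):
    """Serialize a list of dicts into a Solr <add> XML message."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append(
                "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
            )
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def post_xml(xml, url=SOLR_UPDATE_URL):
    """HTTP POST an XML message to Solr and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


xml = to_add_xml([{"id": 1, "title": "Solr & Lucene"}])
print(xml)
# post_xml(xml)  # uncomment once a Solr server is running at the assumed URL
```

The field names `id` and `title` are placeholders and would need to match whatever your schema defines.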

Powerful query syntax.
Boolean logic is just the start.
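
As a small illustration, a boolean query is just a string in the `q` parameter of a GET request. The sketch below only builds the URL; the host, the field names (`title`, `body`, `draft`), and the `select_url` helper are assumptions for the example.

```python
from urllib.parse import urlencode

# Assumed URL for a local Solr instance; substitute your own.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def select_url(query, rows=10, base_url=SOLR_SELECT_URL):
    """Build a GET URL for a Solr query, asking for a JSON response."""
    return base_url + "?" + urlencode({"q": query, "rows": rows, "wt": "json"})


# Hypothetical fields combined with AND / OR / NOT.
url = select_url("title:solr AND (body:search OR body:index) NOT draft:true")
print(url)
```

Fetching that URL with any HTTP client returns the matching documents.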

min, max, average,
stddev

Numeric operations.

do stuff relative to
“now”

Date ranges,
date math.
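
For a feel of what "relative to now" looks like, Solr's date math lets a range query say things like `NOW-7DAYS`. The helper and the `published_at` field below are hypothetical; only the `[NOW-... TO NOW]` syntax comes from Solr.

```python
def recent_range(field, amount, unit="DAYS"):
    """Build a Solr range clause covering the last `amount` units up to now."""
    return "%s:[NOW-%d%s TO NOW]" % (field, amount, unit)


# Hypothetical date field; use whatever your schema calls it.
print(recent_range("published_at", 7))            # published_at:[NOW-7DAYS TO NOW]
print(recent_range("published_at", 1, "MONTH"))   # published_at:[NOW-1MONTH TO NOW]
```

The resulting clause can be used as a query on its own or combined with other terms.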

Yeah, one killer feature
here is that Solr supports
spatial search.

Give it a lat/lon.

Distance.
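
To show what "distance from a lat/lon" means in practice, here is the standard haversine great-circle formula in Python. This is a generic sketch and not a claim about how Solr computes distances internally; the function name and the Earth radius constant are choices made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of latitude along a meridian is roughly 111 km.
one_degree = haversine_km(0.0, 0.0, 1.0, 0.0)
print(round(one_degree, 1))
```

A spatial search is then a matter of keeping or sorting documents by that distance from the point you give it.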

Present the available values so your users
can filter by them.

Great for building out rich taxonomies.

Example: facet books by language, author,
genre.

Faceting.
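
Conceptually, a facet is a count of documents per value of a field. The toy Python version below uses made-up book records and a hypothetical `facet_counts` helper; Solr computes these counts for you as part of the search response.

```python
from collections import Counter

# Made-up records for illustration only.
books = [
    {"title": "Book A", "language": "English", "genre": "Fiction"},
    {"title": "Book B", "language": "English", "genre": "History"},
    {"title": "Book C", "language": "Spanish", "genre": "Fiction"},
]


def facet_counts(docs, field):
    """Count how many documents have each value of `field`."""
    return Counter(doc[field] for doc in docs if field in doc)


print(facet_counts(books, "language"))  # Counter({'English': 2, 'Spanish': 1})
print(facet_counts(books, "genre"))     # Counter({'Fiction': 2, 'History': 1})
```

Those counts are what you show next to each filter link, so users know how many results each choice would leave them with.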

spelling suggestion for
user queries.

query auto-suggest from
popular queries

“Did you mean…?”

Generate a list of similar
documents. Consider blog
posts.

More Like This

This is why we run Solr.

It’s really, really fast.
When properly configured.

Average max response
time is 75ms.

Even the 95th percentile is
way below that.

updates are incremental to keep things
running fast

for performance reasons, they don’t show up
in search results until you issue a commit

Commits are sorta heavy

200ms – 2 sec

Commits

most of the time you
don’t have to worry
about this

but it’s easy to screw
this up if you flood the
system with updates and
commits

Lock the writer
Flush updates to disk
Start a new reader
Warm up the reader’s cache
Register the reader with Solr
Tear down the old reader
Unlock the writer
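
One way to avoid flooding the system is to batch updates and commit once at the end, instead of committing after every document. The class below is a hypothetical sketch: `send` stands in for whatever function POSTs XML to your Solr URL, and here it just appends to a list so the example runs without a server.

```python
class BatchingIndexer:
    """Buffer documents, send them in batches, and commit only when asked."""

    def __init__(self, send, batch_size=100):
        self.send = send              # callable that delivers one XML message
        self.batch_size = batch_size
        self.pending = []

    def add(self, doc_xml):
        self.pending.append(doc_xml)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Send buffered docs as a single <add> message; this does not commit.
        if self.pending:
            self.send("<add>" + "".join(self.pending) + "</add>")
            self.pending = []

    def commit(self):
        # Flush whatever is left, then issue exactly one commit.
        self.flush()
        self.send("<commit/>")


sent = []  # stand-in for the network: collects the messages that would be POSTed
indexer = BatchingIndexer(send=sent.append, batch_size=2)
for i in range(5):
    indexer.add('<doc><field name="id">%d</field></doc>' % i)
indexer.commit()
print(len(sent))  # 4 messages: three <add> batches and one <commit/>
```

Five documents turn into three update requests and a single commit, so the heavy commit cycle runs once instead of five times.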

As you’re committing changes,
you’re usually creating new
files in “segments”

Optimize takes your index
and rewrites it into a smaller,
more compact set of files

Good to do this periodically to
use less memory and avoid
running out of open files

Optimize

Actual replication is a pull initiated by the slave,
and it’s really fast. Like, don’t worry.

Best way to deal with high IO.

Reads go to read cores, writes go to write
cores.

Scale read resources separately.

Make sure writes don’t interrupt reads.

Replication.
Stupidly easy.

All I’ll say is that it’s really
powerful and gives you a lot
of rope.

I’ve seen cache warmups
take down Tomcat — in
particular, on a very large
index with spatial search.

Caching

I’m a Rails generalist

I like to do things the right way.

Solr is fast, fully-featured, and can be
scaled separately from the rest of your
app.

It takes the load off your database and
app servers, and does a better job.

In some cases, it offers features that just
aren’t otherwise even possible.

In Conclusion
