Been dealing with more and more SOLR work lately and was recently asked how you can return only a subset of facets based on a wildcard.

Normally one could use the terms component in SOLR to get back all of a field's indexed terms (effectively its facet values) to implement something like auto-complete / auto-suggest.

However, in this instance what we want is to run a query (either an fq or a q) against a slice of the data and then provide auto-complete for the facet values within that slice.

The terms component doesn’t help here because it cannot be restricted to a slice of the data. Instead, you can fall back to a regular query with a filter query on top of your slice, like so:


http://localhost:8080/apache-solr-1.4.1/geo_us/select/?q=str_state_cd:FL&rows=0&facet=true&facet.field=str_nm&fq=str_nm:T*

In the above I am querying for every store in FL and asking for a faceted list of the store names in FL that start with T. This works fine provided your facet values match the case of the query exactly. However, if you have a facet value “target”, you will not get that result back, because a wildcard match against a “string” field is case-sensitive.
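If you build these requests from code rather than by hand, a minimal sketch using only Python's standard library might look like this. The base URL and field names come from the example above; the helper name `facet_prefix_url` is hypothetical. Called with `("FL", "str_nm", "str_nm", "T")` it reproduces the first URL above.

```python
from urllib.parse import urlencode

# Base URL of the geo_us core used in the example above.
BASE = "http://localhost:8080/apache-solr-1.4.1/geo_us/select/"


def facet_prefix_url(state: str, facet_field: str, filter_field: str, prefix: str) -> str:
    """Hypothetical helper: slice by state, facet on facet_field, filter by prefix.

    q selects the slice, rows=0 suppresses the document list, and fq narrows
    the slice to documents whose filter_field matches the wildcard prefix.
    """
    params = [
        ("q", f"str_state_cd:{state}"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("fq", f"{filter_field}:{prefix}*"),
    ]
    # safe=":*" keeps the field separator and the wildcard unescaped.
    return BASE + "?" + urlencode(params, safe=":*")


print(facet_prefix_url("FL", "str_nm", "str_nm", "T"))
```

Because the prefix is interpolated verbatim, user-typed input containing spaces or query syntax characters would still need escaping before it goes into the fq.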

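The only way I found (as of SOLR 1.4.1) to perform a wildcard query with case-insensitive matches for facets is to use copyField in your schema.xml to copy your “string” fields into another field whose analyzer lowercases its tokens. Because SOLR 1.4.1 does not run wildcard terms through the analyzer, you also have to lowercase the prefix yourself (t* rather than T*). That way your statement becomes: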


http://localhost:8080/apache-solr-1.4.1/geo_us/select/?q=str_state_cd:FL&rows=0&facet=true&facet.field=str_nm&fq=str_nm_indx:t*

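Note that the fq now targets str_nm_indx, a dynamic field that receives a copy of str_nm (via copyField) and is indexed with the textgen field type from the SOLR example schema, which lowercases its tokens. The facet.field is still str_nm, so the facet values come back in their original case.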
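To see why filtering on the copy while faceting on the original works, here is a small pure-Python model of the two queries. It is an illustration, not SOLR: it assumes the copy field is split on whitespace and lowercased (a simplification of textgen, which also splits on punctuation and case changes), treats a wildcard fq as "any token starts with the prefix", and uses made-up store names.

```python
from collections import Counter

# Hypothetical store names in the FL slice (the str_nm "string" field).
stores = ["Target", "target", "Super Target", "Trader Joe's", "Walmart"]


def string_field_matches(value: str, prefix: str) -> bool:
    # A "string" field indexes the whole value as one untouched term,
    # so str_nm:T* is a case-sensitive prefix test on the full value.
    return value.startswith(prefix)


def idx_field_matches(value: str, prefix: str) -> bool:
    # Simplified lowercased copy field: whitespace tokens, lowercased.
    # The prefix is not analyzed, so the caller passes it in lowercase.
    return any(tok.startswith(prefix) for tok in value.lower().split())


def facet_counts(values, matches, prefix):
    # Facet on the original str_nm values of the documents that pass the fq.
    return Counter(v for v in values if matches(v, prefix))


print(facet_counts(stores, string_field_matches, "T"))
print(facet_counts(stores, idx_field_matches, "t"))
```

The second call returns "target" as well as "Target", each in its original case, because counting happens on the untouched values. It also shows a side effect of tokenizing: "Super Target" matches t* because one of its tokens starts with t.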

Maintaining a lowercased copy of every field by hand could become a maintenance nightmare, but our SOLR schema.xml file can take care of the heavy lifting for us.

Example

To keep this simple, I strongly urge people to use CSV for data import into SOLR; it makes things drop-dead simple to handle. In our case, below is the header of the CSV we want to use (columns are separated by ~):

acct_num~acct_type~acct_rating~acct_timehorizon... etc
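For the import itself, SOLR 1.4 ships a CSV update handler (registered at `/update/csv` in the example solrconfig.xml) that accepts a `separator` parameter, so a ~-delimited file can be posted as-is. The sketch below only builds the request with Python's standard library; the core name `accounts` and the sample record are hypothetical, and nothing is sent until you pass the request to `urllib.request.urlopen`.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical core name; adjust host, port and path to your install.
UPDATE_URL = "http://localhost:8080/apache-solr-1.4.1/accounts/update/csv"


def csv_update_request(csv_text: str) -> Request:
    # separator tells the CSV handler the columns are "~"-delimited;
    # commit=true asks SOLR to commit once the whole file has been indexed.
    params = urlencode({"separator": "~", "commit": "true"})
    return Request(
        UPDATE_URL + "?" + params,
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


# Hypothetical file: the header row followed by one made-up record.
body = "acct_num~acct_type~acct_rating~acct_timehorizon\n1001~IRA~A~long\n"
req = csv_update_request(body)
print(req.get_method(), req.full_url)
```

The handler reads the first row as the field names by default, which is why the header doubles as the source for the schema below.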

Now let's write a quick regex to pull out the header values:

([^~]+)~?
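The same extraction in Python, as a quick check of the pattern. This sketch uses `([^~]+)~?`, where `~?` makes the trailing separator optional so the last column is captured even though the header does not end with a ~. The header string is the example above with its trailing "... etc" dropped.

```python
import re

header = "acct_num~acct_type~acct_rating~acct_timehorizon"

# One match per column; "[^~]+" never matches an empty string, and "~?"
# lets the final column (with no "~" after it) match as well.
columns = re.findall(r"([^~]+)~?", header)
print(columns)
```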

Next, load up an online regular expression editor (for example http://gskinner.com/RegExr/) and do a regex replacement with a pattern like so:

<field name="$1" type="string" indexed="true" stored="true" multiValued="false"/>\r

And bingo, we have the basics of our schema. Next we need to create the indexed search fields that solve the facet problem. Again, with regex this is simple:

<copyField source="$1" dest="$1_idx"/>\r
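If you would rather script the two replacements than paste them into an online editor, Python's `re.sub` does the same job, with `\1` in place of RegExr's `$1`. A sketch, assuming the same four-column header; it emits `\n` line endings rather than `\r`:

```python
import re

header = "acct_num~acct_type~acct_rating~acct_timehorizon"
COLUMN = r"([^~]+)~?"  # one match per column, trailing "~" optional

# Equivalent of the first replacement: one <field> line per column.
fields = re.sub(
    COLUMN,
    r'<field name="\1" type="string" indexed="true" stored="true" multiValued="false"/>\n',
    header,
)

# Equivalent of the second replacement: copy each column into its *_idx twin.
copy_fields = re.sub(COLUMN, r'<copyField source="\1" dest="\1_idx"/>\n', header)

print(fields + copy_fields)
```

The generated lines still need a few hand edits, such as `required="true"` on the unique key (acct_num in the schema below).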

This creates a schema like so:

<fields>
<field name="acct_num" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="acct_type" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="acct_rating" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="acct_timehorizon" type="string" indexed="true" stored="true" multiValued="false"/>

<!-- index fields used solely for searching: in most cases we only want exact matches for slicing, -->
<!-- but to support wildcard faceted drilldown we need to allow searching via the lowercased *_idx copies -->
<dynamicField name="*_idx"  type="textgen"  indexed="true"  stored="false"/>

<copyField source="acct_num" dest="acct_num_idx"/>
<copyField source="acct_type" dest="acct_type_idx"/>
<copyField source="acct_rating" dest="acct_rating_idx"/>
<copyField source="acct_timehorizon" dest="acct_timehorizon_idx"/>

<!-- catch-all field for Google-style searching -->
<field name="text" type="text_ws" indexed="true" stored="false" multiValued="true"/>
<copyField source="acct_num" dest="text"/>
<copyField source="acct_type" dest="text"/>

<!-- ignore all other fields -->
<dynamicField name="*" type="ignored" multiValued="true"/>
</fields>

<uniqueKey>acct_num</uniqueKey>
<defaultSearchField>text</defaultSearchField>
<solrQueryParser defaultOperator="AND"/>
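Because a copyField wired the wrong way round silently leaves a field empty, it can be worth sanity-checking the generated fragment. Below is a hypothetical checker using Python's `xml.etree.ElementTree`. It flags any copyField whose source is not an explicitly declared field (stricter than SOLR, which also accepts dynamic sources, but it catches a source that is itself only a copy target, since copyField does not chain) and any destination that resolves to nothing or to the `ignored` type. Trying longer dynamicField patterns first mirrors SOLR's precedence rule. The embedded fragment is a hypothetical two-column example wired the way the copyField regex produces it.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical two-column fragment: each column is copied to its *_idx twin.
SCHEMA = """
<fields>
  <field name="acct_num" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
  <field name="acct_type" type="string" indexed="true" stored="true" multiValued="false"/>
  <dynamicField name="*_idx" type="textgen" indexed="true" stored="false"/>
  <field name="text" type="text_ws" indexed="true" stored="false" multiValued="true"/>
  <dynamicField name="*" type="ignored" multiValued="true"/>
  <copyField source="acct_num" dest="acct_num_idx"/>
  <copyField source="acct_type" dest="acct_type_idx"/>
  <copyField source="acct_num" dest="text"/>
</fields>
"""


def resolve_type(name, fields, dynamic):
    """Return the field type a name resolves to (simplified model of SOLR)."""
    if name in fields:
        return fields[name]
    # Longer dynamicField patterns take precedence, so try them first.
    for pattern in sorted(dynamic, key=len, reverse=True):
        if fnmatch.fnmatchcase(name, pattern):
            return dynamic[pattern]
    return None


def check_copy_fields(xml_text):
    root = ET.fromstring(xml_text)
    fields = {f.get("name"): f.get("type") for f in root.iter("field")}
    dynamic = {d.get("name"): d.get("type") for d in root.iter("dynamicField")}
    problems = []
    for cf in root.iter("copyField"):
        src, dest = cf.get("source"), cf.get("dest")
        if src not in fields:
            problems.append(f"copyField source {src!r} is not a declared field")
        dest_type = resolve_type(dest, fields, dynamic)
        if dest_type in (None, "ignored"):
            problems.append(f"copyField dest {dest!r} resolves to type {dest_type!r}")
    return problems


print(check_copy_fields(SCHEMA))  # an empty list means nothing was flagged
```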
