Haystack with Solr: Indexing Tips
I was troubleshooting an issue a couple of weeks ago with some of our Django views taking too long to return a response. The cause was the post_save and post_delete signals: each one made a request to Solr to update the index with the 'commit' parameter set to 'true' (the Haystack default). This tells Solr to commit the change right away (a hard commit), and each commit was taking a few seconds to finish.
Below are some tips for handling this.
Use Celery
This is a perfect use for Celery, as we don't care whether new data (such as user questions and answers) is searchable right away. It's not a big deal even if it takes 5 minutes to make the data available for searching.
Another plus is that if the Solr server goes down, it won't affect the user experience. The Celery tasks will just fail in the background and retry after a few minutes. You don't want your users seeing 500 errors when they hit save just because the request is waiting on an external service that is currently down and has nothing to do with what they're trying to accomplish.
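To make the decoupling concrete, here is a minimal sketch that uses only the standard library as a stand-in for a Celery worker. The names IndexQueue, SolrUnavailable and flaky_indexer are hypothetical and exist only for illustration. In a real project, enqueue() would be a Celery task's .delay() call made from the signal handler, and Celery itself would handle the retries.

```python
from collections import deque


class SolrUnavailable(Exception):
    """Raised by the stand-in indexer when the search server cannot be reached."""


class IndexQueue:
    """Tiny in-process stand-in for a task queue with retries (not Celery)."""

    def __init__(self, indexer, max_attempts=3):
        self.indexer = indexer            # callable that performs the index update
        self.max_attempts = max_attempts  # total tries before giving up on a job
        self.pending = deque()            # (object_id, failed_attempts_so_far)
        self.failed = []                  # jobs that exhausted their attempts

    def enqueue(self, object_id):
        # Called from the request path (e.g. a post_save handler): returns at once.
        self.pending.append((object_id, 0))

    def run_once(self):
        # Called by the background worker: process every job queued right now.
        for _ in range(len(self.pending)):
            object_id, attempts = self.pending.popleft()
            try:
                self.indexer(object_id)
            except SolrUnavailable:
                if attempts + 1 < self.max_attempts:
                    self.pending.append((object_id, attempts + 1))  # retry later
                else:
                    self.failed.append(object_id)


indexed = []
calls = {"count": 0}


def flaky_indexer(object_id):
    # Simulates Solr being down for the first call only.
    calls["count"] += 1
    if calls["count"] == 1:
        raise SolrUnavailable("Solr is down")
    indexed.append(object_id)


queue = IndexQueue(flaky_indexer)
queue.enqueue("question.42")  # the view returns immediately, no Solr call here
queue.run_once()              # first attempt fails, the job is re-queued
queue.run_once()              # the retry succeeds
```

The point is that the failure and the retry both happen outside the request, so the user who hit save never sees them.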
Change the 'autocommit' settings in the Solr config
In solrconfig.xml, look for this setting (available since Solr 4.0):
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
These are the default settings. The maxTime value is in milliseconds, so in this case Solr will automatically commit pending changes no later than 15 seconds after they arrive. An auto-commit is always a hard commit: it flushes the changes to disk and starts a new transaction log. Setting openSearcher to 'true' additionally opens a new searcher, which is what makes the changes visible to queries.
How you set this will depend on your environment. In our case, since our indexing is pretty light and we don't need the changes to be visible right away, I set maxTime to 60 seconds (60000) with openSearcher set to 'true'.
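The ${name:default} syntax in maxTime means "use the named property if it is set, otherwise fall back to the default after the colon". As a rough illustration of how such a fragment resolves, here is a small Python sketch. It is not how Solr itself is implemented, and the helper names resolve and describe_autocommit are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Matches ${name} or ${name:default}
PLACEHOLDER = re.compile(r"^\$\{([^:}]+)(?::([^}]*))?\}$")

FRAGMENT = """
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
"""


def resolve(value, properties):
    """Resolve a ${name:default} placeholder against a dict of properties."""
    value = value.strip()
    match = PLACEHOLDER.match(value)
    if match is None:
        return value  # plain literal such as "false"
    name, default = match.groups()
    return properties.get(name, default)


def describe_autocommit(xml_fragment, properties=None):
    """Return the effective auto-commit interval (seconds) and searcher flag."""
    properties = properties or {}
    node = ET.fromstring(xml_fragment)
    max_time_ms = int(resolve(node.findtext("maxTime"), properties))
    open_searcher = resolve(node.findtext("openSearcher"), properties) == "true"
    return {"max_time_seconds": max_time_ms / 1000, "open_searcher": open_searcher}


# No property set: the default after the colon applies.
defaults = describe_autocommit(FRAGMENT)

# Property set to 60000 ms, i.e. the 60 seconds mentioned above.
overridden = describe_autocommit(FRAGMENT, {"solr.autoCommit.maxTime": "60000"})
```

So you can either edit the default in solrconfig.xml directly or leave the file alone and supply the solr.autoCommit.maxTime property when starting Solr.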
Here's a nice article explaining how these all work.
Create a custom Haystack Solr backend to set the 'commit' option to 'False' by default
We can create a custom backend that overrides the 'commit' parameter and sets it to 'False' by default, so that Haystack no longer forces a hard commit on every update and Solr is left to commit the changes automatically.
Here's an example (stolen from here):
from haystack.backends.solr_backend import SolrEngine, SolrSearchBackend


class AutoCommitSolrSearchBackend(SolrSearchBackend):
    # Same behaviour as the stock backend, except that 'commit' defaults to
    # False so Solr's own autoCommit settings decide when to commit.

    def update(self, index, iterable, commit=False):
        super(AutoCommitSolrSearchBackend, self).update(index, iterable, commit=commit)

    def remove(self, obj_or_string, commit=False):
        super(AutoCommitSolrSearchBackend, self).remove(obj_or_string, commit=commit)

    def clear(self, models=[], commit=False):
        super(AutoCommitSolrSearchBackend, self).clear(models, commit=commit)


class AutoCommitSolrEngine(SolrEngine):
    # Tell Haystack to use the backend above for this engine.
    backend = AutoCommitSolrSearchBackend
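To actually use the custom engine, Haystack has to be pointed at it in the Django settings. A minimal sketch, assuming the classes above live in a module called myproject/search_backends.py (a hypothetical path) and that Solr is reachable at the placeholder URL shown:

```python
# settings.py (sketch)
HAYSTACK_CONNECTIONS = {
    "default": {
        # Hypothetical dotted path: adjust to wherever AutoCommitSolrEngine lives.
        "ENGINE": "myproject.search_backends.AutoCommitSolrEngine",
        # Placeholder Solr URL: replace with the address of your own core.
        "URL": "http://127.0.0.1:8983/solr/mycore",
    },
}
```

With this in place, index updates triggered by Haystack are sent without an explicit commit, and the autoCommit settings in solrconfig.xml determine when they become searchable.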