One of the common mistakes I've seen in customizing Oak Lucene Indexes is with the included/query paths feature. To understand why this is a common area for mistakes and what impact this has, you have to understand what the includedPaths and queryPaths configurations do.
When executing an Oak query, you can supply constraints in the form of node types, property values or paths. As discussed previously, to write a fast query, you must narrowly constrain the set of nodes your query may apply to. Therefore, it’s best to always use a path constraint so only a portion of the JCR needs to be included. This is especially important if you need to run a query against a generic node type such as nt:base or nt:hierarchyNode, as constraining to a specific path can eliminate thousands of nodes that would otherwise need to be evaluated.
How Path Restrictions Work in Oak Indexes
If you set the property evaluatePathRestrictions=true on your index, your index will use the path restriction to filter out nodes without having to traverse the nodes.
There are two main properties to configure which paths are relevant to the index. The trick is that the Query Engine only cares about one when picking the index to use.
- includedPaths - used to configure what nodes will be included in the index at indexing time. This property is not used by the query engine when picking the index.
- queryPaths - this property is used at query time by the Query Engine to determine whether the path constraint of the query falls within the paths configured for the index.
For example, if I ran a query like: SELECT * FROM [dam:Asset] WHERE ISDESCENDANTNODE([/content/dam/testfolder]) OPTION(LIMIT 10) the query engine would pick an index with queryPaths=/content/dam but not one with queryPaths=/content/dam/prodfolder, because /content/dam/testfolder sits under the first path but not the second.
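To make the matching rule concrete, here is a minimal Python sketch of the check described above. It is a simplified model rather than Oak's actual implementation: the function names are hypothetical, and treating an index with no queryPaths as matching every query is an assumption made for the sketch.

```python
def is_same_or_descendant(path: str, ancestor: str) -> bool:
    """True if `path` equals `ancestor` or lies beneath it in the tree."""
    ancestor = ancestor.rstrip("/") or "/"
    if ancestor == "/":
        return path.startswith("/")
    # Append "/" so that /content/damaged is not treated as under /content/dam.
    return path == ancestor or path.startswith(ancestor + "/")


def index_is_candidate(query_path: str, query_paths: list[str]) -> bool:
    """Simplified queryPaths check for a query with a single path restriction.

    Assumption: an index that declares no queryPaths is a candidate for any path.
    """
    if not query_paths:
        return True
    return any(is_same_or_descendant(query_path, p) for p in query_paths)


# The two indexes from the example above.
print(index_is_candidate("/content/dam/testfolder", ["/content/dam"]))
print(index_is_candidate("/content/dam/testfolder", ["/content/dam/prodfolder"]))
```

The first call reports the index as a candidate and the second does not, which mirrors the behaviour described for the two queryPaths values.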
If you mismatch these values, the Query Engine will match your index for paths not contained in the index. Because the Query Engine uses the estimated count of nodes matching the query from the index evaluation to determine which index to use, this can mean that your index may even override an OOTB index. This is because an index with a restrictive includedPaths (e.g. /content/dam/custom) and a permissive queryPaths (e.g. /content) will look like it filters out more non-matching results simply because it doesn't contain them.
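The effect of a mismatch can be illustrated with a toy model in Python. Everything here is hypothetical (the `ToyIndex` class, the index names, the sample asset paths), and the cost estimate is reduced to "number of entries in the index", which is far cruder than Oak's real cost calculation. It only shows why a small index with an overly broad queryPaths can win and then return nothing.

```python
from dataclasses import dataclass


def under(path: str, root: str) -> bool:
    """True if `path` equals `root` or is a descendant of it."""
    return path == root or path.startswith(root.rstrip("/") + "/")


@dataclass
class ToyIndex:
    name: str
    included_paths: list[str]  # decides what gets indexed
    query_paths: list[str]     # decides which queries the index is offered for

    def indexed(self, nodes: list[str]) -> list[str]:
        return [n for n in nodes if any(under(n, p) for p in self.included_paths)]


def pick_index(indexes: list[ToyIndex], nodes: list[str], query_path: str) -> ToyIndex:
    # Only queryPaths is consulted when selecting candidates.
    candidates = [i for i in indexes if any(under(query_path, q) for q in i.query_paths)]
    # Toy cost: the index with the fewest entries looks cheapest.
    return min(candidates, key=lambda i: len(i.indexed(nodes)))


nodes = [
    "/content/dam/custom/a.png",
    "/content/dam/other/b.png",
    "/content/dam/other/c.png",
    "/content/dam/other/d.png",
]
ootb = ToyIndex("ootb", ["/content/dam"], ["/content/dam"])
custom = ToyIndex("custom", ["/content/dam/custom"], ["/content"])  # mismatched

query_path = "/content/dam/other"
chosen = pick_index([ootb, custom], nodes, query_path)
results = [n for n in chosen.indexed(nodes) if under(n, query_path)]
print(chosen.name, results)
```

In this model the mismatched index is selected for a query on /content/dam/other and yields no results, even though three assets live under that path.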
What all this Means
What all this means is that:
- Whenever possible you should support path restrictions in your index by setting evaluatePathRestrictions=true
- You should set the includedPaths property to the most restrictive path(s) to limit the size of the index by not indexing paths you don't need to query
- You should set the queryPaths to the same list of paths as the includedPaths to ensure that your index will not match for paths not contained in the index
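These three recommendations can be turned into a quick sanity check. The sketch below assumes the index definition has already been loaded into a plain Python dict keyed by the property names used in this post; the `check_index_paths` helper and the sample definition are hypothetical and not part of any Oak tooling.

```python
def check_index_paths(definition: dict) -> list[str]:
    """Return a list of warnings for an index definition given as a dict."""
    problems = []
    if definition.get("evaluatePathRestrictions") is not True:
        problems.append("evaluatePathRestrictions is not set to true")
    included = set(definition.get("includedPaths", []))
    query = set(definition.get("queryPaths", []))
    if not included:
        problems.append("includedPaths is not set, so the whole repository is indexed")
    if included != query:
        problems.append(
            f"includedPaths {sorted(included)} and queryPaths {sorted(query)} differ"
        )
    return problems


# Hypothetical definition with the mismatch described earlier.
definition = {
    "evaluatePathRestrictions": True,
    "includedPaths": ["/content/dam/custom"],
    "queryPaths": ["/content"],
}
for problem in check_index_paths(definition):
    print("WARNING:", problem)
```

An empty list means the definition follows all three recommendations. A deliberate mismatch would still be flagged, so treat the output as a prompt to review rather than a hard failure.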
I would be remiss to not mention that there are rare use cases where you may want to set these values differently. If you have content that you do not want to ever appear in search, or you want to exclude content from a query with a broader path restriction, you can mismatch the includedPaths and queryPaths to support this. You can even include a base path and exclude additional paths with the excludedPaths property.
These are all edge cases and generally you'd be better off either updating the query or your content model rather than using the index to control this.
Hopefully you enjoyed this blog post. To continue to dive into Oak Search and Indexing, you can read more on my series Demystifying Oak Search: