When searching for words that contain one or more dashes/hyphens ('sign-in', 'ice-cream', ...), some valid records (i.e., records containing the word you're looking for) may not be returned.
The issue happens when the word that contains the dash is located after position 1000 in a document.
When the token you're looking for contains a dash, it triggers sequence expression matching.
What we mean by "sequence expression matching" is a lookup where the position of each subpart of the token must be adjacent to the previous one.
For instance, with 'ice-cream', the engine looks for 'ice' immediately followed by 'cream'; the parts have to be consecutive (so 'ice cream' will match, but 'ice and cream' will not).
The issue with sequence expression matching is that it does not work for any word whose position in a document is greater than 1000.
This means that if 'ice-cream' comes after 1000 other words in the document, the 'ice-cream' lookup won't work: the positions of 'ice' and 'cream' haven't been recorded, so the engine cannot verify that the two words are consecutive.
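To make the behaviour concrete, here is a minimal toy model in Python. It is not the engine's actual implementation: the names `build_index`, `sequence_match` and `MAX_POSITION` are hypothetical, and the only detail taken from the text is that positions past 1000 are not recorded.

```python
MAX_POSITION = 1000  # cap taken from the text; the real engine's internals may differ


def build_index(text):
    """Return (words, positions): every word seen, and positions recorded only up to the cap."""
    words, positions = set(), {}
    tokens = text.lower().replace("-", " ").split()
    for pos, word in enumerate(tokens, start=1):
        words.add(word)
        if pos <= MAX_POSITION:
            positions.setdefault(word, set()).add(pos)
    return words, positions


def sequence_match(positions, query):
    """True if the dash-separated parts of `query` occur at consecutive recorded positions."""
    parts = query.lower().split("-")
    return any(
        all(start + i in positions.get(part, ()) for i, part in enumerate(parts))
        for start in positions.get(parts[0], ())
    )


early = "I like ice-cream"                # 'ice' and 'cream' at positions 3 and 4
late = "filler " * 1000 + "ice-cream"     # 'ice' and 'cream' at positions 1001 and 1002
apart = "ice and cream"                   # both words present, but not adjacent

print(sequence_match(build_index(early)[1], "ice-cream"))  # True
print(sequence_match(build_index(late)[1], "ice-cream"))   # False
print(sequence_match(build_index(apart)[1], "ice-cream"))  # False
```

In this sketch the `late` document still contains both words, but because neither position was recorded, the adjacency check has nothing to compare and the record is not returned.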
There are two different ways to mitigate the issue.
- The first one, which we recommend, is to use smaller records whenever possible.
- The second is to replace dashes '-' with spaces ' '. In that case, records that contain all the words, regardless of their position, will be returned.
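The second mitigation could be sketched as follows, assuming the replacement is applied to the query string before it is sent to the engine. The helpers `dashes_to_spaces` and `all_words_match` are hypothetical; the second one is only a toy stand-in for a plain all-words lookup that ignores positions.

```python
def dashes_to_spaces(query):
    """Replace dashes with spaces so each part is searched as an independent word."""
    return " ".join(query.replace("-", " ").split())


def all_words_match(record_text, query):
    """True if every query word appears somewhere in the record, in any position."""
    record_words = set(record_text.lower().replace("-", " ").split())
    return all(word in record_words for word in query.lower().split())


record = "filler " * 1000 + "ice-cream"   # the word sits past position 1000
query = dashes_to_spaces("ice-cream")

print(query)                                    # ice cream
print(all_words_match(record, query))           # True
print(all_words_match("ice and cream", query))  # True: adjacency is no longer required
```

The trade-off is visible in the last line: once the dash is removed, a record such as 'ice and cream' also matches, since the words no longer have to be consecutive.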