Say you want users to be able to search for questions in your index by phrasing the search itself as a question. For example, a user wants to find a question about how high a cat can jump, so they enter a query like the one below.
"How high can a cat jump?"
For the sake of example, let's say you have two records with searchable attributes like the ones below:
- "Look how high this cat can jump!"
- "Cats can jump pretty high, but how high can they actually go?"
With the default configuration, the user's query "How high can a cat jump?" returns both records, but the first record ranks ahead of the second. The user wanted an answer to their question, and the question mark should be part of the query, so how do we fix this?
The first step is to add "?" to separatorsToIndex. By default, only non-separator characters are indexed and therefore searchable. Adding "?" to separatorsToIndex lets the Algolia engine tokenize "?" and treat it as a searchable character.
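As a rough sketch, the setting change can be written as the small payload below. Only the setting name separatorsToIndex and the "?" value come from the text above. How the payload is sent (dashboard, API client, or raw HTTP) is an assumption left out on purpose, since it depends on your setup.

```python
import json

# Index settings payload: separatorsToIndex takes a string containing the
# separator characters that should be indexed and made searchable.
settings = {"separatorsToIndex": "?"}

# Serialize the payload for whichever API client or HTTP tool you use.
# The transport itself is not shown; it is outside what the text describes.
payload = json.dumps(settings)
print(payload)
```

Any other separators you want to make searchable would be appended to the same string rather than set in a separate entry.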
However, you might notice something strange: "jump" and "?" are not highlighted in the search results for the current query. If you add a space between "jump" and "?", both are highlighted. This is due to the way the Algolia engine handles splitting.
Splitting is a technique we apply only at query time. For each non-separator token in a query, we try to split the token into two parts at each possible position. The query "How high can a cat jump?" does not match exactly on the word "jump" and the now-indexed "?" separator because the engine does not split "jump?" by default, since "?" is a separator token. It looks for records that match "jump?" exactly, treating it as one word. If you add a space between the question mark and the last word of the query, they are treated as two words, which makes it a question of proximity.
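The difference between the two forms of the query can be illustrated with a toy model. The function below is hypothetical and only splits on whitespace. It is not the engine's tokenizer, but it shows why "jump?" behaves as one word while "jump ?" behaves as two.

```python
def naive_tokens(query):
    # Toy model only: split on whitespace, so an indexed "?" stays attached
    # to whatever word it touches. This is not how the engine tokenizes.
    return query.split()

attached = naive_tokens("How high can a cat jump?")
spaced = naive_tokens("How high can a cat jump ?")

print(attached[-1])  # jump?
print(spaced[-2:])   # ['jump', '?']
```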
To resolve this, you can create a rule that replaces "?" with " ?" whenever the query contains "?". Adding a space before the "?" lets the engine see two separate words and return the most relevant results.
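A minimal sketch of such a rule is shown below as a plain payload, together with a local stand-in that mimics the intended rewrite. The objectID is a made-up name, and the field layout (conditions with pattern and anchoring, and a consequence with query edits) is an assumption about the Rules schema that should be checked against the Rules documentation before use. The helper apply_rule_locally is hypothetical and exists only to show the effect on a query string.

```python
import json

# Sketch of a rule payload. The objectID is hypothetical, and the field
# layout is an assumption to verify against the Rules documentation.
rule = {
    "objectID": "space-before-question-mark",
    "conditions": [{"pattern": "?", "anchoring": "contains"}],
    "consequence": {
        "params": {
            "query": {
                "edits": [{"type": "replace", "delete": "?", "insert": " ?"}]
            }
        }
    },
}

def apply_rule_locally(query):
    # Local stand-in for the rule's intended effect: whenever the query
    # contains "?", replace it with " ?" so the engine sees two words.
    if "?" not in query:
        return query
    return query.replace("?", " ?")

rewritten = apply_rule_locally("How high can a cat jump?")
print(json.dumps(rule, indent=2))
print(rewritten)
```

Note that this simple replacement also adds a second space when the user has already typed "jump ?", so a real rule may need to account for that case.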