Problem
Some sites can see sudden increases in the number of search API requests being made, caused by bot traffic on the site.
Standard bot traffic that crawls the site (such as Google or Bing) shouldn't cause any increase in the number of API requests seen on a site. However, a bot that is scraping the site is likely to make a large number of search API requests on a regular basis, causing a large increase in traffic.
Algolia is not a bot detection service, but it does provide some ways to monitor a high number of unexpected search requests.
If, having reviewed this, the site is still experiencing a high number of search requests, you may need to look at ways to prevent the bot from scraping the site.
Prevention
When a site receives a high number of requests from one or more IP addresses, this can indicate web scraping or, at the extreme, a DoS/DDoS attack. When this happens, often the only way to stop the bot from scraping your site is to block its IP addresses from accessing the site.
Detecting and blocking these high volumes of requests can often be handled by third-party tools such as Cloudflare, AWS, and Akamai.
However, with these types of requests the originating IP address can change regularly, which makes it very difficult to keep on top of the changes. Bots are also becoming increasingly intelligent about how they scrape sites, and can adapt and evolve as changes are made to block them.
Possible solutions
Algolia provides a few ways in which you can try to prevent a bot from constantly scraping a site:
- Rate limit the API key
- Secured API keys
- Proxy through a CDN
Rate limit the API key
Algolia’s InstantSearch uses a search API key to provide access to the relevant index. This API key can be configured with a rate limit to provide some level of protection against the site being scraped.
While this limits the number of API requests that can be made in an hour, it may have a negative effect on standard traffic if the limit is set too low, as it will prevent normal users from being able to search your content.
In this situation, we recommend starting with a higher limit and reducing it gradually until you find the best balance between stopping spam searches and affecting your normal users.
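For example, a rate-limited search key can be created through the API. The following is a minimal sketch, assuming v4 of the algoliasearch JavaScript client and an admin API key with permission to create keys; the limit of 100 queries per IP per hour is purely illustrative and should be tuned to your own traffic:
// Create a search-only key that is rate limited per IP address
const algoliasearch = require('algoliasearch');
const client = algoliasearch('YourApplicationId', 'YourAdminAPIKey');

client
  .addApiKey(['search'], {
    description: 'Rate-limited search key',
    // Maximum number of API calls allowed from a single IP address per hour
    maxQueriesPerIPPerHour: 100, // illustrative value
  })
  .then(({ key }) => console.log('New search key:', key));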
Secured API keys
Bots that replay the site's API requests can be stopped by changing the search API key on a regular basis. However, this is a time-consuming manual process: it requires generating a new search key, updating the frontend application with it, and removing the old one.
Also, as mentioned earlier, bots have become more intelligent and can often detect when their requests have failed. When this happens, they will check the site again to pick up an updated request URL.
To help with this, Algolia has secured API keys. These are virtual API keys generated on the fly, usually for particular circumstances such as granting temporary access or giving a user access to a subset of data. However, they can also be used here to automatically generate an API key that is only valid for a short amount of time.
Algolia provides a way to generate secured API keys through its API clients. This needs to run on a backend service that the frontend implementation can call. The call returns the generated key, which is then used within InstantSearch in place of the standard search API key.
When generating a secured API key, you can set the validUntil parameter, a Unix timestamp of when the key will expire. This can be used to give the key a short lifetime (e.g. a day or a few hours, depending on how much you want to restrict the bot).
To check whether a key is still usable, its remaining validity can be retrieved. If the key has expired or is close to expiring, a new one can be generated.
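The following sketch, assuming v4 of the algoliasearch JavaScript client (the one-hour validity is purely illustrative), generates a secured key and checks how long it remains valid:
const algoliasearch = require('algoliasearch');
const client = algoliasearch('YourApplicationId', 'YourSearchOnlyAPIKey');

// Generate a key that expires in one hour (validUntil is a Unix timestamp)
const securedApiKey = client.generateSecuredApiKey('YourSearchOnlyAPIKey', {
  validUntil: Math.floor(Date.now() / 1000) + 60 * 60,
});

// Check how many seconds of validity remain; this throws if the key
// was generated without a validUntil parameter
const remainingSeconds = client.getSecuredApiKeyRemainingValidity(securedApiKey);
if (remainingSeconds < 5 * 60) {
  // Close to expiry: generate a fresh secured key here
}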
As the secured API keys are generated on the backend, a mechanism needs to be put in place for the frontend to request them (e.g. your own API endpoint). This endpoint will also need appropriate levels of protection to prevent a bot from making a large number of requests against it.
With this mechanism in place, the API key the frontend uses is constantly changing without any further involvement from a developer or the team. While a bot may capture a search request initially, the API key used in that request will expire, so the replayed requests stop working.
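As a sketch of what that mechanism might look like, the following exposes a hypothetical /secured-key endpoint on an Express server, similar to the proxy example later in this article. The route name and one-hour validity are assumptions, not a prescribed design:
const algoliasearch = require('algoliasearch');
const express = require('express');

const client = algoliasearch('YourApplicationId', 'YourSearchOnlyAPIKey');
const app = express();

// Hypothetical endpoint the frontend calls to obtain a fresh short-lived key.
// In production, this endpoint itself needs rate limiting or bot protection.
app.get('/secured-key', (req, res) => {
  const securedApiKey = client.generateSecuredApiKey('YourSearchOnlyAPIKey', {
    validUntil: Math.floor(Date.now() / 1000) + 60 * 60, // valid for one hour
  });
  res.status(200).send({ securedApiKey });
});

app.listen(3000);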
Proxy through a CDN
Another option is to proxy the Algolia API requests through a CDN service such as Cloudflare, AWS, or Akamai. These services can detect higher-than-normal request volumes from a single IP address and block that address.
Building your own backend proxy
Algolia’s InstantSearch UI uses a search client to query the API directly from the user's browser. It is possible to implement your own search client that queries your own backend service, which then queries Algolia’s API from your server.
First, you will need to configure your own backend implementation to receive the search requests. The following is an example using Node.js with an Express server:
// Instantiate an Algolia client and an Express app
const algoliasearch = require('algoliasearch');
const express = require('express');

const algoliaClient = algoliasearch('YourApplicationId', 'YourSearchOnlyAPIKey');
const app = express();

// Parse JSON request bodies so `body.requests` is available
app.use(express.json());

// Add the search endpoint
app.post('/search', async ({ body }, res) => {
  const { requests } = body;
  const results = await algoliaClient.search(requests);
  res.status(200).send(results);
});

// Add the search-for-facet-values endpoint
app.post('/sffv', async ({ body }, res) => {
  const { requests } = body;
  const results = await algoliaClient.searchForFacetValues(requests);
  res.status(200).send(results);
});

app.listen(3000);
Having set up the backend service, a custom search client needs to be created that provides the two methods search and searchForFacetValues:
const customSearchClient = {
  search(requests) {
    return fetch('http://localhost:3000/search', {
      method: 'post',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ requests }),
    }).then(res => res.json());
  },
  searchForFacetValues(requests) {
    return fetch('http://localhost:3000/sffv', {
      method: 'post',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ requests }),
    }).then(res => res.json());
  },
};
Finally, when initializing InstantSearch, the custom search client can be set rather than the standard Algolia one:
const search = instantsearch({
  indexName: 'YourIndexName',
  searchClient: customSearchClient,
});

search.start();
Further details on setting up a backend search implementation can be found in Algolia's documentation.