Why was one of my pages not crawled? – Algolia

There are several reasons why one page on your domain might not be crawled while others are. Check whether:

The Crawler reached the maxUrls limit and stopped before it got to that URL.
The crawl has finished. Crawling a large site can take time, so check its progress on the Crawler page.
The page is linked from the rest of your site. Make sure you can trace a path of links from your startUrls to the missing page: it must either be reachable from these starting points or be listed in your sitemap. If it isn't, add the page as a start URL.
The page matches the paths you've told the crawler to look for. It must match at least one of your pathsToMatch patterns.
The page is excluded. If the page matches one of your exclusionPatterns, the crawler ignores it.
The page requires a login. If so, add the login parameter to your configuration.
The page is rendered with JavaScript. If so, you may need to set renderJavaScript to true in your configuration. Note that this makes crawling slower.
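You can approximate the exclusion, path, reachability and maxUrls checks offline. The following Python sketch is hypothetical and is not the Crawler's actual logic. It rests on these assumptions:

- It uses a flattened config dict. In a real configuration, pathsToMatch sits inside each entry of actions.
- You supply the link graph by hand.
- Pages are discovered breadth-first.
- Every discovered page counts toward maxUrls.
- Python's fnmatch is a reasonable stand-in for the Crawler's glob patterns.

The helper names (matches_any, discovery_order, diagnose, sitemap_pages) are illustrative only. The sketch cannot detect login or JavaScript-rendering problems.

```python
from collections import deque
from fnmatch import fnmatchcase
from typing import Optional


def matches_any(url: str, patterns: list[str]) -> bool:
    # Assumption: fnmatch approximates the Crawler's glob matching.
    # Here "*" also matches "/", so "*" and "**" behave the same.
    return any(fnmatchcase(url, pattern) for pattern in patterns)


def discovery_order(seeds: list[str], links: dict[str, list[str]],
                    exclusions: list[str]) -> list[str]:
    # Hypothetical breadth-first walk from the seed URLs. Excluded pages are
    # not visited, so links that appear only on them are never discovered.
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if url in seen or matches_any(url, exclusions):
            continue
        seen.add(url)
        order.append(url)
        queue.extend(links.get(url, []))
    return order


def diagnose(target: str, config: dict, links: dict[str, list[str]],
             sitemap_pages: Optional[list[str]] = None) -> list[str]:
    """Return likely reasons why `target` was not crawled (empty if none found)."""
    exclusions = config.get("exclusionPatterns", [])
    if matches_any(target, exclusions):
        return ["matches exclusionPatterns"]

    reasons: list[str] = []
    if not matches_any(target, config.get("pathsToMatch", [])):
        reasons.append("matches no pathsToMatch pattern")

    # Seeds: start URLs plus page URLs listed in the sitemap (hypothetical input).
    seeds = config.get("startUrls", []) + (sitemap_pages or [])
    order = discovery_order(seeds, links, exclusions)
    if target not in order:
        reasons.append("not reachable from startUrls or the sitemap")
    elif order.index(target) >= config.get("maxUrls", float("inf")):
        # Assumption: every discovered page counts toward maxUrls.
        reasons.append("discovered after maxUrls was reached")
    return reasons


# Usage with a made-up site and a deliberately small maxUrls.
config = {
    "startUrls": ["https://www.example.com/"],
    "pathsToMatch": ["https://www.example.com/docs/*"],
    "exclusionPatterns": ["https://www.example.com/docs/private/*"],
    "maxUrls": 3,
}
links = {
    "https://www.example.com/": [
        "https://www.example.com/docs/a",
        "https://www.example.com/docs/b",
    ],
    "https://www.example.com/docs/a": ["https://www.example.com/docs/c"],
}
print(diagnose("https://www.example.com/docs/c", config, links))
```

In this example, /docs/c is the fourth page discovered, so the sketch prints `['discovered after maxUrls was reached']`. A page that only appears in your sitemap can be passed through `sitemap_pages` to check whether listing it there would be enough. For an authoritative answer about a specific URL, use the Crawler's own tools.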

If none of these checks solves the problem, an error may have occurred while crawling the page. Check your logs in the Monitoring and URL Inspector tabs.

You can also use the URL Tester in the Editor tab of the Admin to see why a URL was skipped or ignored.

For a complete troubleshooting guide, see the official documentation: https://www.algolia.com/doc/tools/crawler/troubleshooting/crawl-status
