Many documents, such as PDFs, use complex formats and are not structured as HTML pages.
To let the crawler index documents in other formats, we rely on Apache Tika Server, which is maintained by the Apache Software Foundation. The server extracts a document’s content and transforms it into a basic HTML file.
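To get a feel for what the crawler receives, you can send a document to a Tika Server yourself and look at the HTML it returns. The following is a minimal sketch, not the crawler's actual implementation. It assumes a Tika Server running locally on its default port (9998) and uses the server's `/tika` endpoint with an `Accept: text/html` header. The URL, the function names, and the sample file name are assumptions made for illustration.

```python
import urllib.request
from html.parser import HTMLParser

# Assumption: a Tika Server is running locally on its default port.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(data: bytes, tika_url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request asking Tika to return the document as HTML."""
    return urllib.request.Request(
        tika_url,
        data=data,
        method="PUT",
        headers={"Accept": "text/html"},
    )


def extract_html(path: str, tika_url: str = TIKA_URL, timeout: float = 60.0) -> str:
    """Send the file at `path` to Tika and return the HTML it produces."""
    with open(path, "rb") as f:
        data = f.read()
    request = build_tika_request(data, tika_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


class _TextCollector(HTMLParser):
    """Collect the non-empty text nodes of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Return the text found in the HTML (including any title text), with tags removed."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    # "example.pdf" is a hypothetical file name; replace it with your own document.
    html = extract_html("example.pdf")
    print(visible_text(html)[:500])
```

If the printed text is empty or much shorter than the document, the extraction probably did not reach all of the content, and the document is worth inspecting more closely.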
In this example, you can inspect the document by opening the PDF and pressing Ctrl+A to select all, then checking whether all of its contents are selected.
If only part of the PDF is selected, this indicates that the PDF has restricted access. If the PDF was originally generated by Microsoft Word, for example, you can check why the document is not accessible as described in the documentation below, and ensure that all of the PDF's contents are accessible: