This repository has been archived by the owner on Jun 15, 2023. It is now read-only.

PDFx won't see links in some PDFs #37

Open
ghost opened this issue Sep 26, 2019 · 1 comment

ghost commented Sep 26, 2019

PDFx won't see most of the links in the PDF below. Is this a known issue? Is there a fix for it?
Many thanks!
https://webarchive.nationalarchives.gov.uk/20160613090753/https://www.litvinenkoinquiry.org/files/Litvinenko-Inquiry-Report-web-version.pdf

@htInEdin

There are actually several problems with link processing in pdfx and pdfminer. I'm working on a pull request to address as many of them as I can, but in the interim the attached simple patch (against pdfx version 1.3.0) will fix the most serious one.

With this patch, the number of links recovered from the file linked above by harvesting Link annotations (as opposed to those scraped from the text) rises from 3 to 1067.

backends_patch.txt
