This repository has been archived by the owner on Jun 15, 2023. It is now read-only.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 62: invalid continuation byte #48

Open
Helias opened this issue Oct 15, 2021 · 2 comments · May be fixed by #49

Comments

Helias commented Oct 15, 2021

Running pdfx file.pdf -v > output.txt fails with this error:

  File "/home/helias/.local/bin/pdfx", line 8, in <module>
    sys.exit(main())
  File "/home/helias/.local/lib/python3.8/site-packages/pdfx/cli.py", line 158, in main
    pdf = pdfx.PDFx(args.pdf)
  File "/home/helias/.local/lib/python3.8/site-packages/pdfx/__init__.py", line 128, in __init__
    self.reader = PDFMinerBackend(self.stream)
  File "/home/helias/.local/lib/python3.8/site-packages/pdfx/backends.py", line 236, in __init__
    refs = self.resolve_PDFObjRef(page.annots)
  File "/home/helias/.local/lib/python3.8/site-packages/pdfx/backends.py", line 273, in resolve_PDFObjRef
    return [self.resolve_PDFObjRef(item) for item in obj_ref]
  File "/home/helias/.local/lib/python3.8/site-packages/pdfx/backends.py", line 273, in <listcomp>
    return [self.resolve_PDFObjRef(item) for item in obj_ref]
  File "/home/helias/.local/lib/python3.8/site-packages/pdfx/backends.py", line 305, in resolve_PDFObjRef
    return Reference(obj_resolved["A"]["URI"].decode("utf-8"), self.curpage)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 62: invalid continuation byte

I guess it is related to the utf-8 codec. Is there a way to solve it?

It should be related to this: https://github.com/metachris/pdfx/blob/master/pdfx/backends.py#L305
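
For anyone trying to reproduce this outside pdfx, here is a minimal sketch of what the error message means. The byte string is a made-up example, not the actual URI from the PDF, and it assumes the URI was stored as ISO-8859-1 (Latin-1), where 0xe9 is "é". In UTF-8, 0xe9 starts a three-byte sequence, so the decoder fails when the next byte is not a continuation byte.

```python
# Hypothetical bytes: "café menu" encoded as ISO-8859-1, where 0xe9 is "é".
raw = b"caf\xe9 menu"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xe9 announces a 3-byte UTF-8 sequence, but the next byte (a space)
    # is not a continuation byte, so decoding stops here.
    error_reason = exc.reason
    error_position = exc.start

# The same bytes decode cleanly as ISO-8859-1.
latin1_text = raw.decode("ISO-8859-1")

print(error_reason, error_position, latin1_text)
```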

Helias commented Oct 15, 2021

I solved it by replacing decode('utf-8') with decode('ISO-8859-1') in the code. I don't know whether a straight replacement is a good idea, or whether we should use a try/except instead and call decode('ISO-8859-1') in the except branch.
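
The try/except idea could look roughly like the sketch below. The helper name decode_uri is hypothetical and is not part of pdfx; the code only illustrates the fallback and is not necessarily what the linked pull request does. It tries UTF-8 first, so correctly encoded URIs are unaffected, and falls back to ISO-8859-1 only on failure.

```python
def decode_uri(raw: bytes) -> str:
    """Decode URI bytes as UTF-8, falling back to ISO-8859-1.

    Hypothetical helper for illustration; not a pdfx function.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte value to a character, so this
        # fallback never raises, but it may produce the wrong characters
        # if the bytes were really in some other encoding.
        return raw.decode("ISO-8859-1")


# Example URIs are made up for this sketch.
utf8_uri = decode_uri(b"http://example.com/caf\xc3\xa9")
latin1_uri = decode_uri(b"http://example.com/caf\xe9")
print(utf8_uri, latin1_uri)
```

In backends.py the call would then presumably wrap obj_resolved["A"]["URI"] instead of calling .decode("utf-8") on it directly.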

Helias added a commit to Helias/pdfx that referenced this issue Oct 15, 2021
@Helias Helias linked a pull request Oct 15, 2021 that will close this issue

Helias commented Oct 15, 2021

I made a pull request for this. I hope you find it useful.

The try/except feels a bit dirty to me, but it works locally, so it may be a good temporary solution.
