Add PDF metadata with accents in Python
I want to use this code to change the metadata of a pdf file:
from PyPDF2 import PdfFileReader, PdfFileWriter
title = "Vice-présidence pour l'éducation"
fin = open(filename, 'rb')
reader = PdfFileReader(fin)
writer = PdfFileWriter()
writer.appendPagesFromReader(reader)
metadata = reader.getDocumentInfo()
metadata.update({'/Title':title})
writer.addMetadata(metadata)
fout = open(filename, 'wb')
writer.write(fout)
fin.close()
fout.close()
If the title is in English (without accents), it works fine, but when it has accents, I get the following error:
TypeError: createStringObject should have str or unicode arg
How do I add accented titles to metadata?
Thanks
Solution
The only way to get this error message is to pass an argument of the wrong type to the library's createStringObject(string) function.
The library uses these definitions in utils.py to determine the string and bytes types:
import builtins
bytes_type = type(bytes()) # Works the same in Python 2.X and 3.X
string_type = getattr(builtins, "unicode", str)
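To illustrate what these two names mean for your title, here is a minimal sketch. It assumes the library tests the argument with isinstance against the two types; the helper accepted_by_type_check is hypothetical and is not part of PyPDF2.

```python
import builtins

# Same definitions as in the excerpt above.
bytes_type = type(bytes())
string_type = getattr(builtins, "unicode", str)


def accepted_by_type_check(value):
    """Hypothetical helper: True if value is one of the two accepted types."""
    return isinstance(value, (string_type, bytes_type))


title = "Vice-présidence pour l'éducation"
print(accepted_by_type_check(title))      # accents do not change the type
print(accepted_by_type_check([1, 2, 3]))  # a list is neither text nor bytes
```

Under that assumption, an accented text string passes the check just like an unaccented one, so the accents alone should not trigger the TypeError.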
I can only reproduce your error if I rewrite your code with an obviously wrong type (the code is rewritten using the with statement, but only the commented lines matter):
from PyPDF2 import PdfFileReader, PdfFileWriter
with open(inputfile, "rb") as fr, open(outputfile, "wb") as fw:
    reader = PdfFileReader(fr)
    writer = PdfFileWriter()
    writer.appendPagesFromReader(reader)
    metadata = reader.getDocumentInfo()
    # metadata.update({'/Title': "Vice-présidence pour l'éducation"})
    metadata.update({'/Title': [1, 2, 3]}) # <- wrong type here !
    writer.addMetadata(metadata)
    writer.write(fw)
Your title, title = "Vice-présidence pour l'éducation", apparently does not resolve to bytes_type or string_type. Either the title variable has an unexpected type (which I cannot see in your code, perhaps because you reduced it to an MCVE), or bytes_type and string_type are not resolving to the types the library author intended (this could be a bug in the library or a faulty installation; it is hard for me to tell).
Without reproducible code, it's hard to provide a solution, but hopefully this points you in the right direction. Perhaps converting the title to any type that resolves to bytes_type or string_type is sufficient. Other solutions would have to come from the library's site or from simple hacks.
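Forcing the title to a text type before handing it to the library could look like the sketch below. The helper to_text and the UTF-8 default are assumptions for illustration; they are not part of PyPDF2, and this is written for Python 3.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as a text (str) object.

    Bytes are decoded with the given encoding (UTF-8 is an assumption);
    anything else is passed through str().
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


raw_title = "Vice-présidence pour l'éducation".encode("utf-8")
title = to_text(raw_title)
print(type(title).__name__)

# The converted value would then be used as in the question:
# metadata.update({'/Title': to_text(title)})
```

Printing type(title) just before the call to update is also a quick way to see which type actually reaches the library.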