Python – Add PDF metadata with accents in python

I want to use this code to change the metadata of a pdf file:

from PyPDF2 import PdfFileReader, PdfFileWriter

title = "Vice-présidence pour l'éducation"
fin = open(filename, 'rb')
reader = PdfFileReader(fin)
writer = PdfFileWriter()
writer.appendPagesFromReader(reader)
metadata = reader.getDocumentInfo()

metadata.update({'/Title':title})

writer.addMetadata(metadata)

fout = open(filename, 'wb')
writer.write(fout)

fin.close()
fout.close()

If the title is in English (without accents), it works fine, but when it has accents, I get the following error:

TypeError: createStringObject should have str or unicode arg

How do I add accented titles to metadata?

Thanks

Solution

This error message can only come from the library's createStringObject(string) function, which raises it when its string argument is neither a text string nor a byte string.

To tell text from bytes, it relies on these definitions in utils.py:

import builtins
bytes_type = type(bytes()) # Works the same in Python 2.X and 3.X
string_type = getattr(builtins, "unicode", str)
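To make the check concrete, here is a simplified sketch of that dispatch. It is not PyPDF2's actual source: the function name create_string_object_sketch is hypothetical, and it returns a label instead of building PDF objects. It shows that on Python 3 an accented str passes the text check, while something like a list falls through to the TypeError.

```python
# Hypothetical, simplified sketch of the type check (NOT PyPDF2's real code).
import builtins

bytes_type = type(bytes())                       # bytes on Python 3
string_type = getattr(builtins, "unicode", str)  # str on Python 3


def create_string_object_sketch(value):
    """Return a label describing how the value would be handled."""
    if isinstance(value, string_type):
        return "text"   # text values, accented or not, are accepted here
    elif isinstance(value, bytes_type):
        return "bytes"  # raw bytes take a separate decoding path
    else:
        # Any other type ends up here, like the error in the question.
        raise TypeError("createStringObject should have str or unicode arg")


print(create_string_object_sketch("Vice-présidence pour l'éducation"))  # text
print(create_string_object_sketch(b"plain bytes"))                       # bytes
try:
    create_string_object_sketch([1, 2, 3])
except TypeError as exc:
    print("TypeError:", exc)
```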

I can only reproduce your error by passing an obviously wrong type (I rewrote your code using a with statement, but only the lines marked with comments matter):

from PyPDF2 import PdfFileReader, PdfFileWriter

with open(inputfile, "rb") as fr, open(outputfile, "wb") as fw:
    reader = PdfFileReader(fr)
    writer = PdfFileWriter()

    writer.appendPagesFromReader(reader)
    metadata = reader.getDocumentInfo()

    # metadata.update({'/Title': "Vice-présidence pour l'éducation"})
    metadata.update({'/Title': [1, 2, 3]})  # <- wrong type here !
    writer.addMetadata(metadata)

    writer.write(fw)

The type of your title = "Vice-présidence pour l'éducation" does not seem to match what bytes_type or string_type resolve to. Either your title variable has an unusual type (it is not visible in your code, probably because the code was reduced to an MCVE), or bytes_type and string_type do not resolve to the types the library author intended (this could be a bug in the library or a broken installation; it's hard for me to tell).
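One way to spot an unexpected type is to inspect the dictionary right before writer.addMetadata(metadata). The helper find_bad_metadata_types below is hypothetical (not part of PyPDF2); it reuses the same bytes_type/string_type definitions and reports every value that would fail a text-or-bytes check. The sample dictionary is made up for illustration.

```python
# Hypothetical diagnostic helper (not part of PyPDF2).
import builtins

bytes_type = type(bytes())
string_type = getattr(builtins, "unicode", str)


def find_bad_metadata_types(metadata):
    """Return {key: type name} for values that are neither text nor bytes."""
    return {
        key: type(value).__name__
        for key, value in metadata.items()
        if not isinstance(value, (string_type, bytes_type))
    }


# Illustrative data only: the last entry has the wrong type on purpose.
metadata = {
    "/Title": "Vice-présidence pour l'éducation",  # str: accepted
    "/Author": b"Someone",                          # bytes: accepted
    "/Subject": [1, 2, 3],                          # list: would raise
}
print(find_bad_metadata_types(metadata))  # {'/Subject': 'list'}
```

An empty result for your real metadata would point away from the title's type and toward the library or installation.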

Without reproducible code, it's hard to provide a definitive solution, but hopefully this points you in the right direction. Converting the title to a type that resolves to bytes_type or string_type may be enough. Otherwise, look for solutions on the library's website, or try a simple workaround.
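As a sketch of that idea, assuming Python 3, the hypothetical helper normalize_metadata below converts every value to str before it reaches addMetadata. Byte values are decoded on the assumption that they hold UTF-8 text; anything else goes through str().

```python
# Hypothetical workaround sketch (not a PyPDF2 API), assuming Python 3.
def normalize_metadata(metadata, encoding="utf-8"):
    """Return a new dict whose values are all str."""
    normalized = {}
    for key, value in metadata.items():
        if isinstance(value, bytes):
            # Assumption: byte values contain text in the given encoding.
            normalized[key] = value.decode(encoding)
        else:
            # str() leaves str values unchanged and stringifies anything else.
            normalized[key] = str(value)
    return normalized


title = "Vice-présidence pour l'éducation"
clean = normalize_metadata({"/Title": title, "/Subject": b"caf\xc3\xa9"})
print(clean["/Title"])    # Vice-présidence pour l'éducation
print(clean["/Subject"])  # café
```

You would then call writer.addMetadata(normalize_metadata(metadata)). Be aware that str() silently turns a value of a genuinely wrong type (such as the list in the reproduction above) into text, so this works around the error rather than explaining it.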

Related Problems and Solutions