docx: iterate through paragraphs, tables and images while maintaining order
This is my first time posting here. I would like to write a script that takes a .docx file as input, selects certain paragraphs (along with the tables and images between them), and copies them into another template document in the same order (not simply appended at the end). The problem is that when I iterate through the document's elements, my code does not detect images, so I cannot tell where an image sits relative to the text and tables, nor which image it is.
In short, doc1 contains:
text
Image
text
table
Text
What I end up with is:
text
[image missing]
text
table
Text
What I've got so far:
I can iterate through paragraphs and tables:
from docx.document import Document as _Document
from docx.oxml.table import CT_Tbl
from docx.oxml.text.paragraph import CT_P
from docx.table import _Cell, Table
from docx.text.paragraph import Paragraph


def iter_block_items(parent):
    """
    Generate a reference to each paragraph and table child within *parent*,
    in document order. Each returned value is an instance of either Table or
    Paragraph. *parent* would most commonly be a reference to a main
    Document object, but also works for a _Cell object, which itself can
    contain paragraphs and tables.
    """
    if isinstance(parent, _Document):
        parent_elm = parent.element.body
    elif isinstance(parent, _Cell):
        parent_elm = parent._tc
    else:
        raise ValueError("something's not right")
    # Only block-level children (w:p and w:tbl) are yielded here; pictures are
    # nested inside paragraphs, so they never show up as separate items.
    for child in parent_elm.iterchildren():
        if isinstance(child, CT_P):
            yield Paragraph(child, parent)
        elif isinstance(child, CT_Tbl):
            yield Table(child, parent)
I can get an ordered list of the document's images:
from docx.enum.shape import WD_INLINE_SHAPE

# dwo is the source document object
pictures = []
for pic in dwo.inline_shapes:
    if pic.type == WD_INLINE_SHAPE.PICTURE:
        pictures.append(pic)
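I can insert a specific image at the end of a paragraph:
from io import BytesIO

from docx.shared import Inches


def insert_picture(index, paragraph):
    # Find the relationship id (rId) of the embedded image in the source document.
    inline = pictures[index]._inline
    rId = inline.xpath('./a:graphic/a:graphicData/pic:pic/pic:blipFill/a:blip/@r:embed')[0]
    # Read the image bytes from the related part and add them to the target paragraph.
    image_part = dwo.part.related_parts[rId]
    image_bytes = image_part.blob
    image_stream = BytesIO(image_bytes)
    paragraph.add_run().add_picture(image_stream, Inches(6.5))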
I use iter_block_items() to copy everything between two marker paragraphs:
start_copy = False
for block in iter_block_items(document):
    # Stop as soon as the end marker is reached.
    if isinstance(block, Paragraph):
        if block.text == "TEXT FROM WHERE WE STOP COPYING":
            break
    if start_copy:
        if isinstance(block, Paragraph):
            last_paragraph = insert_paragraph_after(last_paragraph, block.text)
        elif isinstance(block, Table):
            # Remember which paragraph each table has to follow.
            paragraphs_with_table.append(last_paragraph)
            tables_to_apppend.append(block._tbl)
    # Start copying with the block after the start marker.
    if isinstance(block, Paragraph):
        if block.text == "TEXT FROM WHERE WE START COPYING":
            start_copy = True
Solution
You can find exactly the same working implementation of this at the following link:
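Independently of that implementation, here is a minimal sketch of why the image goes missing and one way to detect it. Inline pictures are not block-level elements: they sit inside a paragraph's run as a w:drawing element, so iterating only over w:p and w:tbl children never yields an image on its own. The sketch below uses only the standard library (not python-docx) and reads word/document.xml directly; the names iter_blocks_with_images and read_document_xml are hypothetical helpers invented for this illustration, and it only looks at DrawingML pictures (a:blip), not legacy VML images.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespaces used by WordprocessingML / DrawingML.
NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
}
W = "{%s}" % NS["w"]
A_BLIP = "{%s}blip" % NS["a"]
R_EMBED = "{%s}embed" % NS["r"]


def read_document_xml(docx_path_or_file):
    """Return the raw bytes of word/document.xml from a .docx archive."""
    with zipfile.ZipFile(docx_path_or_file) as zf:
        return zf.read("word/document.xml")


def iter_blocks_with_images(document_xml):
    """Yield (kind, text, image_rids) for each body-level block, in order.

    kind is "paragraph" or "table"; image_rids lists the r:embed ids of the
    pictures nested inside that block (empty when it contains none).
    """
    body = ET.fromstring(document_xml).find("w:body", NS)
    for child in body:
        if child.tag == W + "p":
            kind = "paragraph"
        elif child.tag == W + "tbl":
            kind = "table"
        else:
            continue  # skip non-block children such as w:sectPr
        # Concatenate all text runs found anywhere inside the block.
        text = "".join(t.text or "" for t in child.iter(W + "t"))
        # Collect the relationship ids of embedded pictures, in order.
        rids = [b.get(R_EMBED) for b in child.iter(A_BLIP) if b.get(R_EMBED)]
        yield kind, text, rids
```

Each rId returned this way is the same kind of relationship id that insert_picture() above extracts, so it could be passed to dwo.part.related_parts[rId] to fetch the image bytes. A paragraph with a non-empty image_rids list marks the exact position where the picture belongs between the surrounding text and tables.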