Python – Parsing binary messages using Kaitai Struct and Python


I need to extract and process data (variable-sized binary messages) from a very large message log. Using the GIF sample and the online documentation, I defined a variable-size message layout and compiled it into msg_log.py. Calling msg_log.from_file("small_logfile") lets me examine and verify the field values of the first message in the log file.

For small log files that fit in memory, how do I get msg_log.py to move on to messages 2, 3, and so on?

For very large log files, I want to page the input through a byte buffer. I haven’t done this yet and haven’t found examples or discussions on how to do it. How do I keep msg_log.py in sync with the paged byte buffer when content changes?

My message structure is currently defined as follows. (I also tried "seq" instead of "instances", but it still only checks the first message.)

meta:
  id: message
  endian: be
instances:
  msg_header:
    pos: 0x00
    type: message_header
  dom_header:
    pos: 0x06
    type: domain_header
  body:
    pos: 0x2b
    size: msg_header.length - 43
types:
  message_header:
    seq:
      - id: length
        type: u1
      <other fixed-size fields - 5 bytes>
  domain_header:
    seq:
      <fixed-size fields - 37 bytes>
  message_body:
    seq:
      - id: body
        size-eos: true

Solution

Parsing multiple structures one after another from a single stream can be done like this:

from msg_log import Message
from kaitaistruct import KaitaiStream

f = open("yourfile.bin", "rb")
stream = KaitaiStream(f)
obj1 = Message(stream)
obj2 = Message(stream)
obj3 = Message(stream)
# etc.
stream.close()

I'm not sure what "paging the input through a byte buffer" means. The method above does not load the entire file into memory; it reads from the file on demand with ordinary read()-like calls.
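The same sequential-read principle can be shown without Kaitai at all. The sketch below is a simplified stand-in (a hypothetical format with a 1-byte big-endian length prefix, not your real 43-byte header layout) that loops over a stream until end-of-file:

```python
import io
import struct

def read_messages(stream):
    """Yield (length, body) pairs from a stream of length-prefixed
    messages: a 1-byte big-endian length followed by that many payload
    bytes. A simplified stand-in for the real message layout."""
    while True:
        header = stream.read(1)
        if not header:                 # end of stream: no more messages
            return
        (length,) = struct.unpack(">B", header)
        body = stream.read(length)
        if len(body) != length:
            raise EOFError("truncated message")
        yield length, body

# Two messages: a 3-byte body "abc" and a 2-byte body "de"
log = io.BytesIO(b"\x03abc\x02de")
msgs = list(read_messages(log))
```

With the real compiled classes, the analogous loop would construct Message(stream) repeatedly; the Kaitai Python runtime's KaitaiStream exposes is_eof() for the stopping condition.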

If you want better performance and you're dealing with a large file of fixed size, memory mapping may help. The file appears as a single region of memory, and the operating system handles the I/O needed to bring the relevant parts into physical memory on demand. For Python, there is a pull request to add this to the Kaitai Struct runtime, or you can do it yourself:

from io import BytesIO
import mmap

from msg_log import Message
from kaitaistruct import KaitaiStream

f = open("yourfile.bin", "rb")
with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
    stream = KaitaiStream(BytesIO(buf))
    obj1 = Message(stream)
    obj2 = Message(stream)
    obj3 = Message(stream)
    # etc.
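As a self-contained illustration of what memory mapping buys you (using a throwaway temp file here, since your actual log file is not available), indexing and slicing the mapped buffer only pulls in the pages that are actually touched:

```python
import mmap
import os
import tempfile

# Write a small sample file (same toy length-prefixed format as above),
# then map it read-only. The OS pages the file in on demand, so only
# the regions actually read occupy physical memory.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x03abc\x02de")
    with open(path, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
        first_len = buf[0]                      # 1-byte length prefix
        first_body = bytes(buf[1:1 + first_len])  # slice reads on demand
finally:
    os.remove(path)
```

The same buffer can be wrapped in BytesIO and handed to KaitaiStream exactly as shown above.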
