Python – Extract specific words (not keywords) from log files

I’m trying to extract some words from the following sample.log, as shown in the expected output, and put them into a list. I’m having trouble extracting the correct fields: my approach fails most of the time. I’d prefer to do this in Python, but I’m open to other languages. Any pointers to other methods are greatly appreciated.

sample.log

//*********************************************************************************
 update section
//*********************************************************************************
      for (i=0; i< models; i = i+1) begin:modelgen

model_ip model_inst
         (
          .model_powerdown(model_powerdown),
          .mcg(model_powerdown),
          .lambda(_lambda[i])
          );
      assign fnl_verifier_lock = (tx_ready & rx_ready) ? &verifier_lock :1'b0;

native_my_ip native_my_inst
     (
      .tx_analogreset(tx_analogreset),     
     .unused_tx_parallel_data({1536{1'b0}})

);

 END Section I : 
   //*********************************************************************************
   resync 
     #(
       . INIT_VALUE (1)
       ) inst_reset_sync 
       (
    .clk    (tx_coreclkin),
    .reset  (!tx_ready), // tx_digitalreset from reset 
    .d      (1'b0),
    .q      (srst_tx_common  )
    );

Expected output

model_ip
native_my_ip
resync

My attempt

import re

input_file = open("sample.log", "r")
result = []
for line in input_file:
    # need a more generic match condition to extract expected results
    match_instantiation = re.match(r'\s(.*) ([a-zA-Z_0-9]+) ([a-zA-Z_0-9]+)_inst (.*)', line)
    if match_instantiation:
        print(match_instantiation.group(1))
        result.append(match_instantiation.group(1))
    else:
        continue

Solution

You may need to read more than one line at a time to determine whether a string is a module name, because the instance name and the opening parenthesis can be on different lines.
Try the following:

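import re

# Read the whole file into one string so the pattern can span line breaks
with open("sample.log", "r") as input_file:
    lines = input_file.read()

# A word at the start of a line counts as a module name if it is followed by
# an instance name and "(", or by "#("
for m in re.finditer(r'^\s*([a-zA-Z_0-9]+)\s+([a-zA-Z_0-9]+\s+\(|#\()', lines, re.MULTILINE):
    print(m.group(1))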

This produces:

model_ip
native_my_ip
resync

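The regular expression above captures a word at the start of a line only if it is followed by either an instance name and an opening parenthesis, or by #(. Because \s also matches newlines, this works even when the parenthesis or #( is on the next line.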
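
Since the question asks for the names in a list rather than printed, the same idea can be wrapped in a small function. The following is only a sketch and a variation on the regex above, not the answer's exact code: the helper name find_module_names and the inline sample string are made up for illustration, and the check for the instance name or #( is moved into a lookahead so that only the module name is consumed.

```python
import re

# Verbose form of the pattern; in re.VERBOSE mode a literal "#" must be escaped.
MODULE_RE = re.compile(
    r"""
    ^\s*                # leading whitespace, possibly including blank lines
    ([A-Za-z_0-9]+)     # group 1: candidate module name
    (?=\s+              # lookahead: whitespace, then one of the following
        (?: [A-Za-z_0-9]+\s+\(   # an instance name and an opening parenthesis
          | \#\(                 # or the start of a parameter list
        )
    )
    """,
    re.MULTILINE | re.VERBOSE,
)


def find_module_names(text):
    """Return the captured module names in order of appearance (hypothetical helper)."""
    # With a single capturing group, findall returns a list of that group's strings.
    return MODULE_RE.findall(text)


# Shortened excerpt of the sample, inlined so the sketch runs without a file.
sample = """\
      for (i=0; i< models; i = i+1) begin:modelgen

model_ip model_inst
         (
          .model_powerdown(model_powerdown)
          );
      assign fnl_verifier_lock = (tx_ready & rx_ready) ? &verifier_lock :1'b0;

   resync
     #(
       . INIT_VALUE (1)
       ) inst_reset_sync
       (
    .clk    (tx_coreclkin)
    );
"""

result = find_module_names(sample)
print(result)  # ['model_ip', 'resync']
```

Note that the pattern is purely textual: any line of the form "word word (" would match as well, so a line such as "module foo (" would yield "module". If the real files contain such lines, the results may need to be filtered afterwards.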

Hope this helps you.
