Why does the INST_PTR (instruction pointer) value of the same program change between runs?
In an Intel Pin tool (Pintool), you can use IARG_INST_PTR or INS_Address to print the “instruction address” of each instruction in the program. I have observed that running the same program at different times produces different instruction addresses for the exact same instructions. However, I want the addresses to stay the same across runs. What is the root cause of this change? Below are two sample outputs showing the opcodes and instruction addresses of the first three instructions executed.
How do I find the PC of each instruction, or the address that objdump displays, from a Pintool?
–Run 1–
op: MOV addr:0x00007fac87a8d2d0
op: CALL_NEAR addr:0x00007fac87a8d2d3
op: PUSH addr:0x00007fac87a90a70
–Run 2–
op: MOV addr:0x00007fc529f402d0
op: CALL_NEAR addr:0x00007fc529f402d3
op: PUSH addr:0x00007fc529f43a70
Solution
(tl;dr version: there is a possible solution at the end.)
This is almost certainly because address space layout randomization (ASLR) is applied to shared libraries. Running the following command multiple times gives you an idea of how it works:
$ cat /proc/self/maps
/proc/self/ refers to the current process (the process that opens the file), which in this case is the cat process itself.
This is the output of running it once on my system:
00400000-0040c000 r-xp 00000000 08:01 3409248 /bin/cat
0060b000-0060c000 r--p 0000b000 08:01 3409248 /bin/cat
0060c000-0060d000 rw-p 0000c000 08:01 3409248 /bin/cat
0063a000-0065b000 rw-p 00000000 00:00 0 [heap]
7f017ef95000-7f017f761000 r--p 00000000 08:01 8126750 /usr/lib/locale/locale-archive
7f017f761000-7f017f91b000 r-xp 00000000 08:01 11155466 /lib/x86_64-linux-gnu/libc-2.19.so
7f017f91b000-7f017fb1a000 ---p 001ba000 08:01 11155466 /lib/x86_64-linux-gnu/libc-2.19.so
7f017fb1a000-7f017fb1e000 r--p 001b9000 08:01 11155466 /lib/x86_64-linux-gnu/libc-2.19.so
7f017fb1e000-7f017fb20000 rw-p 001bd000 08:01 11155466 /lib/x86_64-linux-gnu/libc-2.19.so
7f017fb20000-7f017fb25000 rw-p 00000000 00:00 0
7f017fb25000-7f017fb48000 r-xp 00000000 08:01 11155454 /lib/x86_64-linux-gnu/ld-2.19.so
7f017fd1c000-7f017fd1f000 rw-p 00000000 00:00 0
7f017fd23000-7f017fd47000 rw-p 00000000 00:00 0
7f017fd47000-7f017fd48000 r--p 00022000 08:01 11155454 /lib/x86_64-linux-gnu/ld-2.19.so
7f017fd48000-7f017fd49000 rw-p 00023000 08:01 11155454 /lib/x86_64-linux-gnu/ld-2.19.so
7f017fd49000-7f017fd4a000 rw-p 00000000 00:00 0
7fffacef5000-7fffacf16000 rw-p 00000000 00:00 0 [stack]
7fffacf5a000-7fffacf5c000 r-xp 00000000 00:00 0 [vdso]
7fffacf5c000-7fffacf5e000 r--p 00000000 00:00 0 [vvar]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
The first three lines are the code (text) segment, the read-only data segment, and the read-write data segment of the executable. The remaining lines are the heap, the stack, the individual segments of the shared libraries, a memory-mapped file (as a side note, a library is also just a memory-mapped file), and some internals related to how certain system calls are implemented.
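To make the layout above easier to compare between runs, here is a minimal Python sketch that reduces maps output to one load base per file-backed mapping. The function name `load_bases` and the shortened `SAMPLE_MAPS` excerpt are illustrative assumptions; on a live system you would pass in the contents of /proc/self/maps instead.

```python
# A few lines copied from the maps output above (shortened for illustration).
SAMPLE_MAPS = """\
00400000-0040c000 r-xp 00000000 08:01 3409248 /bin/cat
0060b000-0060c000 r--p 0000b000 08:01 3409248 /bin/cat
0063a000-0065b000 rw-p 00000000 00:00 0 [heap]
7f017f761000-7f017f91b000 r-xp 00000000 08:01 11155466 /lib/x86_64-linux-gnu/libc-2.19.so
7f017fb1e000-7f017fb20000 rw-p 001bd000 08:01 11155466 /lib/x86_64-linux-gnu/libc-2.19.so
7f017fb20000-7f017fb25000 rw-p 00000000 00:00 0
7f017fb25000-7f017fb48000 r-xp 00000000 08:01 11155454 /lib/x86_64-linux-gnu/ld-2.19.so
7fffacef5000-7fffacf16000 rw-p 00000000 00:00 0 [stack]
"""


def load_bases(maps_text):
    """Return {pathname: lowest mapped start address} for file-backed mappings."""
    bases = {}
    for line in maps_text.splitlines():
        # Columns: address range, permissions, file offset, device, inode, pathname.
        fields = line.split(None, 5)
        # Skip anonymous mappings and pseudo-entries such as [heap] or [stack].
        if len(fields) < 6 or not fields[5].startswith("/"):
            continue
        start = int(fields[0].split("-")[0], 16)
        path = fields[5].strip()
        bases[path] = min(start, bases.get(path, start))
    return bases


for path, base in sorted(load_bases(SAMPLE_MAPS).items(), key=lambda kv: kv[1]):
    print(f"{base:#014x}  {path}")
```

Running this against the maps output of two separate runs and comparing the results should show which files moved and which stayed put.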
If you repeat the command a few times, you should see all the mappings move around randomly except for the code and data segments of the executable. This is a security measure. Not knowing where things are in memory makes some attacks harder to carry out, because you can’t jump directly to an address where you know there will be a useful routine.
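The two runs in the question are consistent with this: only the base moves, while the distances between instructions stay the same. The following Python sketch checks that using the six addresses from the question. The helper names are illustrative, and the 4 KiB page size is an assumption.

```python
# Addresses copied from the two sample runs in the question.
run1 = [0x00007fac87a8d2d0, 0x00007fac87a8d2d3, 0x00007fac87a90a70]
run2 = [0x00007fc529f402d0, 0x00007fc529f402d3, 0x00007fc529f43a70]

PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def relative_offsets(addrs):
    """Distance of every address from the first address of the same run."""
    return [a - addrs[0] for a in addrs]


def slides(first_run, second_run):
    """Per-instruction difference between the two runs."""
    return [b - a for a, b in zip(first_run, second_run)]


# Same instruction-to-instruction distances in both runs.
print([hex(o) for o in relative_offsets(run1)])
print([hex(o) for o in relative_offsets(run2)])

# One single slide for all instructions, and it is a whole number of pages.
print({hex(s) for s in slides(run1, run2)})
print(slides(run1, run2)[0] % PAGE_SIZE == 0)
```

Because the slide is a whole number of pages, the low 12 bits of each address (0x2d0, 0x2d3, 0xa70) are identical in both runs.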
The main reason that address space randomization is not applied to the code and data segments of the executable itself is probably efficiency. Code that is not loaded at a fixed address must be position-independent, which adds some overhead. This is why shared libraries need to be explicitly compiled with -fPIC.
(Shared libraries also need to be position-independent for reasons other than security: if each library used a fixed address, two libraries that happened to get overlapping load addresses would cause problems.)
Unfortunately, I’m not familiar with PinTool. I believe GDB simply disables address space randomization (using the personality(2) system call) to get predictable addresses for shared libraries.
Address space randomization can be turned on/off for a single shell session (this also seems to use personality()), or globally by executing echo 0 > /proc/sys/kernel/randomize_va_space (see /proc/sys/).
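As a rough illustration of the per-process approach, the following Python sketch sets the ADDR_NO_RANDOMIZE personality flag and then replaces the current process with the target program. The function names `with_aslr_disabled` and `run_without_aslr` are made up for this example. It assumes Linux with glibc, and it is not necessarily how GDB or any other tool does it internally.

```python
import ctypes
import os

# Flag value from <sys/personality.h> on Linux.
ADDR_NO_RANDOMIZE = 0x0040000
# Passing this value to personality(2) queries the current persona without changing it.
QUERY_PERSONA = 0xFFFFFFFF


def with_aslr_disabled(persona):
    """Return the persona value with the no-randomization flag set."""
    return persona | ADDR_NO_RANDOMIZE


def run_without_aslr(argv):
    """Disable ASLR for this process, then exec argv (does not return on success)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.personality.argtypes = [ctypes.c_ulong]
    libc.personality.restype = ctypes.c_int

    current = libc.personality(QUERY_PERSONA)
    if current == -1:
        raise OSError(ctypes.get_errno(), "personality query failed")
    if libc.personality(with_aslr_disabled(current)) == -1:
        raise OSError(ctypes.get_errno(), "personality update failed")

    # The persona is kept across exec, so the new program starts with ASLR off.
    os.execvp(argv[0], argv)
```

For example, calling `run_without_aslr(["cat", "/proc/self/maps"])` several times should print the same library addresses each time, if the kernel honors the flag.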
I found the following on this page; it may be relevant.
Does Pin change the application code and data addresses?
…
Note: Recent linux kernels intentionally move the location of stack and dynamically allocated data from run to run, even if you are not using pin. On RedHat-based systems you can workaround this by running Pin as follows:
$ setarch i386 pin -t pintool -- app
tl;dr
If all you need is to map an address from the Pintool that happens to come from a library to the corresponding address in the objdump disassembly, and you don’t mind doing some manual work each time, the following should work:
1. Print /proc/self/maps from within your process. (You can also run it in the background and print /proc/<pid>/maps from the shell, for example using $! to get the PID.)
2. Check which mapping the address belongs to. For a library, it will likely be the library's text (code) mapping (labeled r-xp in the maps output).
3. Subtract the mapping's start address from the address you see in the Pintool.
This will give you the address you see in the objdump disassembly when you run it on the same library. If the library has debugging information, you can also use addr2line(1) to get the source code lines.
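The manual lookup and subtraction can be sketched in Python as follows. The names `parse_maps` and `to_module_offset` are illustrative, and the address used in the usage line is a made-up value inside the libc text mapping from the earlier output, not something observed in a real run. The sketch also adds the mapping's file offset column, which is 0 for the text mapping here, so the result reduces to the plain subtraction. It assumes the library's link-time addresses start at 0, which is typical for shared objects but not guaranteed.

```python
# Two libc mappings copied from the maps output earlier in this answer.
SAMPLE_MAPS = """\
7f017f761000-7f017f91b000 r-xp 00000000 08:01 11155466 /lib/x86_64-linux-gnu/libc-2.19.so
7f017fb1e000-7f017fb20000 rw-p 001bd000 08:01 11155466 /lib/x86_64-linux-gnu/libc-2.19.so
"""


def parse_maps(maps_text):
    """Parse maps text into (start, end, perms, file_offset, path) tuples."""
    mappings = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append((start, end, fields[1], int(fields[2], 16), path))
    return mappings


def to_module_offset(addr, mappings):
    """Return (path, offset within the file) for a runtime address, or None."""
    for start, end, _perms, file_offset, path in mappings:
        if start <= addr < end:
            return path, addr - start + file_offset
    return None


# Hypothetical runtime address as a Pintool might report it.
runtime_addr = 0x7f017f781234
path, offset = to_module_offset(runtime_addr, parse_maps(SAMPLE_MAPS))
print(f"{path} + {offset:#x}")
```

The printed offset is the value to look for in the objdump disassembly of that library, or to pass to addr2line(1).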
Of course there may be a better workflow. This worked for me at least when using dlopen(3) and dlsym(3).
The core dump should contain the library load address, so maybe it can be used somehow….