Python – Look for cache misses, hit ratios in I/O trace files

I have an I/O trace file with the following fields (‘asu’, ‘block_address’, ‘size’, ‘opcode’, ‘time_stamp’).
The data looks like this (over 5 million rows):

0,20941264,8192,W,0.551706
0,20939840,8192,W,0.554041
0,20939808,8192,W,0.556202
1,3436288,15872,W,1.250720
1,3435888,512,W,1.609859
1,3435889,512,W,1.634761
0,7695360,4096,R,2.346628
1,10274472,4096,R,2.436645
2,30862016,4096,W,2.448003
2,30845544,4096,W,2.449733
1,10356592,4096,W,2.449733 
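For context, here is a minimal sketch of how blk_trace could be built from a file in this format. It assumes comma-separated rows with no header line, uses only the standard csv module, and the filename trace.csv in the comment is hypothetical:

```python
import csv
import io

def load_block_trace(lines):
    """Return the block_address column (second field) as a list of ints.

    Assumes comma-separated rows with no header:
    asu, block_address, size, opcode, time_stamp
    """
    # csv.reader yields [] for blank lines, so skip empty rows
    return [int(row[1]) for row in csv.reader(lines) if row]

# Three rows from the sample above. For the real file, pass an open file
# object instead, e.g. open("trace.csv", newline="") (hypothetical filename).
sample = io.StringIO(
    "0,20941264,8192,W,0.551706\n"
    "0,20939840,8192,W,0.554041\n"
    "1,3436288,15872,W,1.250720\n"
)
blk_trace = load_block_trace(sample)
print(blk_trace)  # [20941264, 20939840, 3436288]
```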

I want to add a caching layer to my project and count hits and misses.
I’m using @functools.lru_cache(maxsize=None).
I want to find cache hits and misses for block_address.
Following a tutorial, I tried to count misses/hits like this (blk_trace is the trace array for block_address):

import functools

@functools.lru_cache(maxsize=None)
def blk_iter():
    # blk_trace is the list of block addresses, defined elsewhere
    blk_len = len(blk_trace)
    for i in range(0, blk_len):
        print(blk_trace[i])

When I check the cache statistics with blk_iter.cache_info(), I get CacheInfo(hits=0, misses=1, maxsize=None, currsize=1). This is not what I expected.
I’m still new to Python and caching concepts. I don’t know what I’m doing wrong.
How do I find misses/hits for block addresses?

Solution

The cache is attached to the function blk_iter itself, keyed on its arguments. You only called blk_iter once, so the cache size is 1 and there is 1 miss.
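Because blk_iter takes no arguments, there is only one possible cache key. A small sketch with a hypothetical zero-argument function, no_args, shows that a second call would be recorded as a hit, but the body (and any printing inside it) would not run again:

```python
import functools

calls = 0  # counts how many times the function body actually runs

@functools.lru_cache(maxsize=None)
def no_args():
    global calls
    calls += 1
    return "result"

no_args()  # first call: the body runs and the call is recorded as a miss
no_args()  # same (empty) argument list: served from the cache, recorded as a hit
print(no_args.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
print(calls)                 # 1
```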

Consider the following function decorated with lru_cache:

from functools import lru_cache

@lru_cache(maxsize=None)
def myfunc(x):
    print('Cache miss: ', x)
    return x + 1

When called with a specific value of x, the function runs and the result is stored in the cache. If you call it again with the same argument, the function body does not run at all; the cached value is returned instead.

>>> for i in range(3):
...     print(myfunc(i))
...
Cache miss:  0
1
Cache miss:  1
2
Cache miss:  2
3
>>> myfunc(0) # this will be a cache hit
1
>>> myfunc(3) # this will be another miss
Cache miss:  3
4
>>> myfunc.cache_info()
CacheInfo(hits=1, misses=4, maxsize=None, currsize=4)   

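In your case, even if you cached a function called once per iteration of for i in range(0,blk_len), you would get all misses and no hits: each iteration passes a different argument (a new i), so the cache never hits. To count hits and misses for block addresses, the cached function has to take the block address itself as its argument, so that an address that repeats in the trace becomes a cache hit.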
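A minimal sketch of that idea: replay the block addresses through a function cached with functools.lru_cache and read the counters from cache_info(). The helper names simulate_lru and read_block are hypothetical, and the trace below is made up (the sample rows in the question never repeat an address):

```python
import functools

def simulate_lru(trace, maxsize):
    """Replay block addresses through an LRU cache holding `maxsize` entries.

    Returns (hits, misses, hit_ratio). maxsize=None means an unbounded cache.
    """
    @functools.lru_cache(maxsize=maxsize)
    def read_block(block_address):
        # Hypothetical stand-in for fetching the block from disk; the body
        # only runs on a miss, so the return value itself does not matter here.
        return block_address

    for addr in trace:
        read_block(addr)

    info = read_block.cache_info()
    total = info.hits + info.misses
    return info.hits, info.misses, (info.hits / total if total else 0.0)

# Hypothetical trace with repeated addresses.
trace = [20941264, 20939840, 20941264, 3436288, 20939840, 20941264]
print(simulate_lru(trace, maxsize=None))  # (3, 3, 0.5)
print(simulate_lru(trace, maxsize=2))     # (1, 5, 0.16666666666666666)
```

With maxsize=None every distinct address stays cached, so the hit count is simply the number of repeated addresses; a finite maxsize models a real cache that evicts the least recently used block. For the real trace you would pass blk_trace. If the same block_address can appear under different asu values and refer to different blocks (an assumption about your trace), you could pass (asu, block_address) tuples instead, since tuples are hashable.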

Related Problems and Solutions