Python – An effective way to find the path with the highest average edge value between two points in a Pandas data frame?

An effective way to find the path with the highest average edge value between two points in a Pandas data frame?… here is a solution to the problem.

An effective way to find the path with the highest average edge value between two points in a Pandas data frame?

Forgive the seemingly confusing wording of this question. This is what I want to do.

Given a data frame df

Fruit1     Fruit2      Weight
orange     apple       0.2
orange     grape       0.4
orange     pineapple   0.6
orange     banana      0.8
apple      grape       0.9
apple      pineapple   0.3
apple      banana      0.2
grape      pineapple   0.1
pineapple  banana      0.8

and constraints on the maximum allowable path length, L

I want to return a dataframe with the highest average path (i.e. the sum of all edges between the point/path length is the maximum), where the edge is represented by the weight column, assuming no more than the length L.

Give an example of what the highest average path means:

Suppose we only have 4 points A, B, C, and D. We are interested in finding the highest average path between A and D.

The highest average path is max((A->D)/1, (A->B + B->D)/2, (A->C + C->D)/2, (A->B + B->C + C-> D)/3, (A->C + C->B

+ B-> D)/3 ) in the case of L=3

For L=2, it will be max((A->D)/1, (A->B + B->D)/2

, (A->C + C->D)/2

).

In the case of df, we get a similar result for L=2

Fruit1     Fruit2      Weight   MaxAvgPath(L=2)
orange     apple       0.2       [orange, grape, apple]  
orange     grape       0.4       [orange, apple, grape]
orange     pineapple   0.6       [orange, banana, pineapple]
orange     banana      0.8       [orange, banana]
apple      grape       0.9       [apple, grape]
apple      pineapple   0.3       [apple, grape, pineapple]
apple      banana      0.2       [apple, pineapple, banana] 
grape      pineapple   0.1       [grape, orange, pineapple]
grape      banana      0.1       [grape, apple, banana]
pineapple  banana      0.8       [pineapple, banana]

Note: This edge set contains just enough rows to represent the entire network. For example, while (orange, apple) exists, (apple, orange) does not exist because (apple, orange) has the same weight. However, if you include those that help create the solution, feel free to use them and leave them in the final output. The return path of the mirror pair should be the same as the return path of the original pair.

Solution

Thanks for the clarification, so you require the path with the highest cost/(number of edges) between each pair of nodes in the graph, where the path is limited to the upper limit of the connecting edges. The longest path problem is np-hard, so an effective solution is only possible if there is a limitation
(See https://en.wikipedia.org/wiki/Longest_path_problem). I think your edge join restriction artificially enforces the longest path of length L, so it can be reduced to an exponential of L.

The networkX module can find our simple paths with depth-first search, and we can rank paths by manually summing weights and sorting. We can do this for each pair of nodes and add it as a new series to the data frame.

import pandas
import networkx

# Create a graph from the dataframe
G = networkx.from_pandas_dataframe(path_frame, 'Fruit1', 'Fruit2', 'Weight')

# Find the longest path between source and target up to length L
def maxpath_cond(G, source, target, edge_attr, L=None):
    #Use networkx simple paths function which uses a depth first search
    paths = networkx.simple_paths.all_simple_paths(G,source, target, L)
    # Calculate and sort the costs of the path
    costs = [(pathcost(G, pth, edge_attr), pth) for pth in paths]
    return sorted(costs, key=lambda x:x[0], reverse=True)

def pathcost(G,path, edge_attr):
    lp = len(path)-1
    return sum(G[path[n]][path[n+1]][edge_attr] for n in range(lp))/lp
#Iterate through the dataframe and create a new series made up of long paths
mxs = []
for n in range(len(path_frame)):
    src, targ = path_frame.loc[n]['Fruit1'], path_frame.loc[n]['Fruit2']
    mxl = maxpath_cond(G, src, targ, 'Weight', 2)[0]
    mxs.append( mxl[1])

print(path_frame.assign(MaxAvgPath=mxs))

Which outputs:

      Fruit1     Fruit2  Weight                   MaxAvgPath
0     orange      apple     0.2       [orange, grape, apple]
1     orange      grape     0.4       [orange, apple, grape]
2     orange  pineapple     0.6  [orange, banana, pineapple]
3     orange     banana     0.8             [orange, banana]
4      apple      grape     0.9               [apple, grape]
5      apple  pineapple     0.3    [apple, grape, pineapple]
6      apple     banana     0.2   [apple, pineapple, banana]
7      grape  pineapple     0.1    [grape, apple, pineapple]
8  pineapple     banana     0.8          [pineapple, banana]

Related Problems and Solutions