Python – pd.qcut returns a negative value

This is a simple data sample series:

sample
Out[2]: 
0    0.047515
1    0.026392
2    0.024652
3    0.022854
4    0.020397
5    0.000087
6    0.000087
7    0.000078
8    0.000078
9    0.000078

The minimum value is 0.000078 and the maximum value is 0.047515.
When I use the qcut function on it, the first category in the result has a negative lower bound.

pd.qcut(sample, 4)
Out[31]: 
0         (0.0242, 0.0475]
1         (0.0242, 0.0475]
2         (0.0242, 0.0475]
3         (0.0102, 0.0242]
4         (0.0102, 0.0242]
5       (8.02e-05, 0.0102]
6       (8.02e-05, 0.0102]
7    (-0.000922, 8.02e-05]
8    (-0.000922, 8.02e-05]
9    (-0.000922, 8.02e-05]
Name: data, dtype: category
Categories (4, interval[float64]): [(-0.000922, 8.02e-05] < (8.02e-05, 0.0102] < (0.0102, 0.0242] < (0.0242, 0.0475]]

Is this expected behavior? I expected my minimum and maximum to be the lower and upper limits of the categories.

(I’m using pandas 0.22.0 and Python 2.7.)

Solution

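This happens because the binning step subtracts .001 (10 ** -precision, with the default precision=3) from the lowest edge when it builds the interval labels. The intervals are open on the left and closed on the right, so if the first interval started exactly at the minimum of the series, the minimum would sit on an open edge and would not belong to any bin. Shifting the lowest edge down slightly ensures the minimum lands in the first quantile bin.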

See lines 210-213 in the pd.cut source code. https://github.com/pandas-dev/pandas/blob/v0.23.4/pandas/core/reshape/tile.py#L210-L213

0.000078 -.001
Out[21]: -0.0009220000000000001
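
To check this on the sample from the question, here is a minimal sketch. It assumes a reasonably recent pandas; the variable names are only illustrative. It uses the documented retbins and labels parameters of pd.qcut to compare the shifted edge in the label with the actual quantile edge, and to get plain bin numbers when the interval labels are not needed.

```python
import pandas as pd

# The sample series from the question.
sample = pd.Series([0.047515, 0.026392, 0.024652, 0.022854, 0.020397,
                    0.000087, 0.000087, 0.000078, 0.000078, 0.000078])

# retbins=True also returns the quantile edges that were computed.
binned, edges = pd.qcut(sample, 4, retbins=True)

# The first interval label has its left edge shifted below the minimum...
first_interval = binned.cat.categories[0]
lowest_label_edge = first_interval.left

# ...while the first computed edge is the minimum of the data.
true_min_edge = edges[0]

# Because of the shift, the minimum falls inside the first (left-open) interval.
min_is_binned = sample.min() in first_interval

# If the interval labels are not needed, labels=False returns integer bin codes.
codes = pd.qcut(sample, 4, labels=False)

print(lowest_label_edge, true_min_edge, min_is_binned)
print(codes.tolist())
```

The negative number only appears in the label of the first interval. The edges returned by retbins still start at the minimum of the data, and every value is assigned to a bin.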
