Python – The sorting algorithm used by pandas' sort_values when the kind parameter is not applied

In pandas' sort_values method, the kind parameter is applied only when sorting on a single column or label. Why is this? Which sorting algorithm is used in the cases where kind is not applied? Is it a stable sort?

(For documentation, see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html.)

Solution

This is the docstring of get_group_index_sorter(group_index, ngroups) in the pandas source, which a multi-column sort ends up calling:

algos.groupsort_indexer implements `counting sort` and it is at least
O(ngroups), where
    ngroups = prod(shape)
    shape = map(len, keys)
that is, linear in the number of combinations (cartesian product) of unique
values of groupby keys. This can be huge when doing multi-key groupby.
np.argsort(kind='mergesort') is O(count x log(count)) where count is the
length of the data-frame;

Both algorithms are `stable` sort and that is necessary for correctness of
groupby operations. e.g. consider:
    df.groupby(key)[col].transform('first')
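
As a quick check of the stability claim, the sketch below sorts a small frame by two columns whose keys contain ties and looks at the order of the tied rows. The frame, its column names and its values are made up for illustration. The result reflects the stable algorithms the docstring describes; it should not be read as a documented guarantee for every pandas version.

```python
import pandas as pd

# Made-up data: column "b" is constant, so every row with the same "a" is a tie.
df = pd.DataFrame({
    "a": [2, 1, 2, 1, 2, 1],
    "b": [0, 0, 0, 0, 0, 0],
    "tag": list("uvwxyz"),
})

# Multi-column sort: the kind parameter is not applied here.
multi = df.sort_values(["a", "b"])
multi_tags = multi["tag"].tolist()

# Single-column sort with an explicitly stable algorithm, for comparison.
single_tags = df.sort_values("a", kind="mergesort")["tag"].tolist()

# The docstring's groupby example: 'first' relies on rows keeping their order
# within each group.
first_per_group = df.groupby("a")["tag"].transform("first").tolist()

print(multi_tags)
print(single_tags)
print(first_per_group)
```

If the multi-column sort is stable, tied rows appear in their original order (v, x, z for a == 1, then u, w, y for a == 2), matching the explicit mergesort result.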

P.S. Here is the call chain:

pandas.core.frame.DataFrame.sort_values() -> \
  pandas.core.sorting.lexsort_indexer() ->  \
    pandas.core.sorting.indexer_from_factorized() -> \
      pandas.core.sorting.get_group_index_sorter()
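
The idea behind this chain can be sketched with public APIs only. The helper below is hypothetical and is not pandas' internal code: it factorizes each key, folds the codes into one integer per row (the group index), and then applies a stable np.argsort. It assumes the keys contain no missing values, and it always uses mergesort, whereas the docstring says pandas chooses between counting sort and mergesort.

```python
import numpy as np
import pandas as pd


def sketch_multi_key_sorter(df, keys):
    """Hypothetical helper: return (sorter, ngroups) for a multi-key sort.

    Assumes no missing values in the key columns.
    """
    group_index = np.zeros(len(df), dtype=np.int64)
    ngroups = 1
    for key in keys:
        # sort=True makes the integer codes follow the sorted order of the values.
        codes, uniques = pd.factorize(df[key], sort=True)
        # Earlier keys are more significant, like digits in a mixed-radix number.
        group_index = group_index * len(uniques) + codes
        # ngroups = prod(shape), the size of the cartesian product of the keys.
        ngroups *= len(uniques)
    # A stable sort keeps rows with equal group_index in their original order.
    sorter = np.argsort(group_index, kind="mergesort")
    return sorter, ngroups


# Made-up example data.
df = pd.DataFrame({
    "a": [2, 1, 2, 1],
    "b": ["y", "x", "x", "x"],
})

sorter, ngroups = sketch_multi_key_sorter(df, ["a", "b"])
print(sorter.tolist(), ngroups)
print(df.sort_values(["a", "b"]).index.tolist())
```

Because ngroups grows with the product of the number of unique values per key, a counting sort over the group index can become expensive with many keys, which is the trade-off the docstring describes.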
