Python – Django queryset: Need help in optimizing this set of queries

Django queryset: Need help in optimizing this set of queries… here is a solution to the problem.

Django queryset: Need help in optimizing this set of queries

I’m trying to filter out some common label combinations from the list of education issue records.

For this example, I just look at the 2-tag example (tag-tag), I should get an example of the result like:
“Points” + “Curves” (65 bars)
“Plus” + “Minus” (40 articles)

This is the desired result in the SQL statement:

SELECT a.tag, b.tag, count(*)
FROM examquestions.dbmanagement_tag as a
INNER JOIN examquestions.dbmanagement_tag as b on a.question_id_id = b.question_id_id
where a.tag != b.tag
group by a.tag, b.tag

Basically, we identify different tags with common problems into a list and group them in the same matching label combination.

I tried a similar query using the Django queryset :

    twotaglist = [] #final set of results

alphatags = tag.objects.all().values('tag', 'type').annotate().order_by('tag')
    betatags = tag.objects.all().values('tag', 'type').annotate().order_by('tag')
    startindex = 0 #startindex reduced by 1 to shorten betatag range each time the atag changes. this is to reduce the double count of comparison of similar matches of tags
    for atag in alphatags:
        for btag in betatags[startindex:]:
            if (atag['tag'] != btag['tag']):
                commonQns = [] #to check how many common qns
                atagQns = tag.objects.filter(tag=atag['tag'], question_id__in=qnlist).values('question_id').annotate()
                btagQns = tag.objects.filter(tag=btag['tag'], question_id__in=qnlist).values('question_id').annotate()
                for atagQ in atagQns:
                    for btagQ in btagQns:
                        if (atagQ['question_id'] == btagQ['question_id']):
                            commonQns.append(atagQ['question_id'])
                if (len(commonQns) > 0):
                    twotaglist.append({'atag': atag['tag'],
                                        'btag': btag['tag'],
                                        'count': len(commonQns)})
        startindex=startindex+1

The logic works fine, but since I’m new to this platform, I’m not sure if there’s a shorter workaround to be more efficient.

Currently, queries for approximately 5K X 5K label comparisons take about 45 seconds 🙁

Plugin: Label class

class tag(models. Model):
    id = models. IntegerField('id',primary_key=True,null=False)
    question_id = models. ForeignKey(question,null=False)
    tag = models. TextField('tag',null=True)
    type = models. CharField('type',max_length=1)

def __str__(self):
        return str(self.tag)

Solution

Unfortunately, Django is not allowed to join unless it involves foreign keys (or one-to-one). You will have to do this in your code. I found a way (completely untested) to do it with a single query, which should significantly reduce execution time.

from collections import Counter
from itertools import combinations

# Assuming Models
class Question(models. Model):
    ...

class Tag(models. Model):
    tag = models. CharField(..)
    question = models. ForeignKey(Question, related_name='tags')

c = Counter()
questions = Question.objects.all().prefetch_related('tags') # prefetch M2M
for q in questions:
    # sort them so 'point' + 'curve' == 'curve' + 'point'
    tags = sorted([tag.name for tag in q.tags.all()])
    c.update(combinations(tags,2)) # get all 2-pair combinations and update counter
c.most_common(5) # show the top 5

The above code uses Counters, itertools.combinations, and django prefetch_related this should cover most of what may be unknown above. If the above code is not working, review these resources and modify them accordingly.

If you are not using M2M fields on the Problem model, you can still use them reverse relations Access tag as if it were M2M fields are the same. Review my edits, changing the inverse relationship from tag_set to tags. I made some other edits that should be consistent with the way you defined the model.

If you don’t specify related_name=’tags’, you can simply change the tags and prefetch_related in the filter to tag_set and you're good to go.

Related Problems and Solutions