Django queryset: Need help in optimizing this set of queries
I’m trying to filter out some common label combinations from the list of education issue records.
For this example, I just look at the 2-tag example (tag-tag), I should get an example of the result like:
“Points” + “Curves” (65 bars)
“Plus” + “Minus” (40 articles)
…
This is the desired result in the SQL statement:
SELECT a.tag, b.tag, count(*)
FROM examquestions.dbmanagement_tag as a
INNER JOIN examquestions.dbmanagement_tag as b on a.question_id_id = b.question_id_id
where a.tag != b.tag
group by a.tag, b.tag
Basically, we identify different tags with common problems into a list and group them in the same matching label combination.
I tried a similar query using the Django queryset :
twotaglist = [] #final set of results
alphatags = tag.objects.all().values('tag', 'type').annotate().order_by('tag')
betatags = tag.objects.all().values('tag', 'type').annotate().order_by('tag')
startindex = 0 #startindex reduced by 1 to shorten betatag range each time the atag changes. this is to reduce the double count of comparison of similar matches of tags
for atag in alphatags:
for btag in betatags[startindex:]:
if (atag['tag'] != btag['tag']):
commonQns = [] #to check how many common qns
atagQns = tag.objects.filter(tag=atag['tag'], question_id__in=qnlist).values('question_id').annotate()
btagQns = tag.objects.filter(tag=btag['tag'], question_id__in=qnlist).values('question_id').annotate()
for atagQ in atagQns:
for btagQ in btagQns:
if (atagQ['question_id'] == btagQ['question_id']):
commonQns.append(atagQ['question_id'])
if (len(commonQns) > 0):
twotaglist.append({'atag': atag['tag'],
'btag': btag['tag'],
'count': len(commonQns)})
startindex=startindex+1
The logic works fine, but since I’m new to this platform, I’m not sure if there’s a shorter workaround to be more efficient.
Currently, queries for approximately 5K X 5K label comparisons take about 45 seconds 🙁
Plugin: Label class
class tag(models. Model):
id = models. IntegerField('id',primary_key=True,null=False)
question_id = models. ForeignKey(question,null=False)
tag = models. TextField('tag',null=True)
type = models. CharField('type',max_length=1)
def __str__(self):
return str(self.tag)
Solution
Unfortunately, Django is not allowed to join unless it involves foreign keys (or one-to-one). You will have to do this in your code. I found a way (completely untested) to do it with a single query, which should significantly reduce execution time.
from collections import Counter
from itertools import combinations
# Assuming Models
class Question(models. Model):
...
class Tag(models. Model):
tag = models. CharField(..)
question = models. ForeignKey(Question, related_name='tags')
c = Counter()
questions = Question.objects.all().prefetch_related('tags') # prefetch M2M
for q in questions:
# sort them so 'point' + 'curve' == 'curve' + 'point'
tags = sorted([tag.name for tag in q.tags.all()])
c.update(combinations(tags,2)) # get all 2-pair combinations and update counter
c.most_common(5) # show the top 5
The above code uses Counters, itertools.combinations, and django prefetch_related this should cover most of what may be unknown above. If the above code is not working, review these resources and modify them accordingly.
If you are not using M2M fields on the Problem
model, you can still use them reverse relations Access tag as if it were M2M fields are the same. Review my edits, changing the inverse relationship from tag_set
to tags
. I made some other edits that should be consistent with the way you defined the model.
If you don’t specify related_name=’tags’, you can simply change the tags
and prefetch_related in the filter to tag_set
and you're
good to go.