What’s wrong with Python – lazy = “dynamic”? What are the options?
10/
17/18 This question has been completely rewritten
In order to have an “edit version control system” (similar to StackOverflow), I configured the following class:
tags = db. Table(
"tags",
.db. Column("tag_id", db. Integer, db. ForeignKey("tag.id")),
.db. Column("post_version_id", db. Integer,
.db. ForeignKey("post_version.id"))
)
class Tag(db. Model):
id = db. Column(db. Integer, primary_key=True)
tag = db. Column(db. String(128), index=True, unique=True)
class Post(db. Model):
id = db. Column(db. Integer, primary_key=True)
head_id = db. Column(db. Integer, db. ForeignKey("post_version.id"))
class PostVersion(db. Model):
id = db. Column(db. Integer, primary_key=True)
previous_id = db. Column(db. Integer, db. ForeignKey("post_version.id"), default=None)
pointer_id = db. Column(db. Integer, db. ForeignKey("annotation.id"))
current = db. Column(db. Boolean, index=True)
tags = db.relationship("Tag", secondary=tags)
Excluding irrelevant columns such as post content, the real data model is annotations; For the sake of versatility, I simplified these models
The actual data consists of 136 POST
with variable markup and variable versioning through editing; That is: I generated 136 Post
. I have 15 Tags
. The initial 136 Posts
are marked as 2 Tags
. I then tagged Post
with different tags (using my editing system; So there are multiple PostVersion
for editing).
You may notice that there are circular references between Post and PostVersion; I use it to configure the following two relationships for experimentation:
Relationship 1 posts
posts = db.relationship("Post",
secondary="join(tags, PostVersion,"
"tags.c.post_version_id==PostVersion.id)",
primaryjoin="Tag.id==tags.c.tag_id",
secondaryjoin="Post.head_id==PostVersion.id",
lazy="dynamic")
Based on SQL statements
SELECT
post.id
FROM
tag
JOIN
tags ON tag.id=tags.tag_id
JOIN
post_version ON tags.post_version_id=post_version.id
JOIN
post ON post.head_id=post_version.id
WHERE
tag.id=<tag_id>
and
Relationship 2 posts2
posts2 = db.relationship("Post",
secondary="join(tags, PostVersion,"
"and_(tags.c.post_version_id==PostVersion.id,"
"AnnotationVersion.current==True))",
primaryjoin="Tag.id==tags.c.tag_id",
secondaryjoin="PostVersion.pointer_id==Post.id",
lazy="dynamic")
Based on SQL statements
SELECT
annotation.id
FROM
tag
JOIN
tags ON tag.id=tags.tag_id
JOIN
annotation_version ON tags.annotation_version_id=annotation_version.id AND
annotation_version.current=1
JOIN
annotation ON annotation_version.pointer_id = annotation.id
WHERE
tag_id=8;
This results in the following data:
Tag Actual len(t.posts.all()) len(t.posts.paginate(1,5,False).items)
t1 0 0 0
t2 1 136 5
t3 1 136 5
t8 136 136 1
t14 136 136 1
t15 24 136 1
Tag Actual t.posts.count() t.posts2.count()
t1 0 0 0
t2 1 136 163
t3 1 136 163
t8 136 22168 26569
t14 136 22168 26569
t15 24 3264 3912
I’ve ruled out redundant tags (i.e. all other tags
with 0 Post
) and the same data (i.e. the result from posts2
is the same as posts
).
As you can see, there are some serious problems with the results! Especially for both relationships, if lazy="dynamic"
is closed, the correct Post
always returns.
Use echo=True
to change SQL @IljaEverilä discovery lazy="dynamic"
when creating the engine. I quote the comment in this issue :
In a nutshell: with
lazy="dynamic"
you getFROM post, tags, post_version WHERE ...
, but without you getFROM post, tags JOIN post_version ON tags.post_version_id = post_version.id WHERE ....
As you can see your composite secondary is pretty much ignored with the dynamic setting. Now the question is “why?”
My question:
1。 Is this a bug?
2。 What can I do to correct this dilemma?
Update:
As if lazy
=”dynamic” is explicitly discouraged here But no other options are suggested. What is an alternative to still allowing paging and relying on large collections? This isn’t allowed by default (or at least, in the way I access it) and the documentation doesn’t seem to clarify the problem! In the title What kind of loading to use? It seems that the recommended loading strategy for large collections is lazy="subquery",
but this does not allow paginate()
and count().
Solution
This is indeed how an issue SQLAlchemy handles queries that form dynamic load relationships. Although the query should be
SELECT post.id AS post_id, post.head_id AS post_head_id
FROM post, tags JOIN post_version ON tags.post_version_id = post_version.id
WHERE ? = tags.tag_id AND post.head_id = post_version.id
The result is
SELECT post.id AS post_id, post.head_id AS post_head_id
FROM post, tags, post_version
WHERE ? = tags.tag_id AND post.head_id = post_version.id
So, while there is an internal connection between post
and post_version
(in pre-SQL-92 style), but The
internal join between tags and post_version
lost, so there is a CROSS JOIN Between tags and other tags
. The result is that the query loads all current post versions, regardless of tags, because each post is concatenated to every line in the tags
. This also explains the multiplication of t.posts.count().
The solution is to wait for fix while using some other relationship loading strategy.