What’s wrong with Python – lazy = “dynamic”? What are the options?

What’s wrong with Python – lazy = “dynamic”? What are the options? … here is a solution to the problem.

What’s wrong with Python – lazy = “dynamic”? What are the options?

10/

17/18 This question has been completely rewritten

In order to have an “edit version control system” (similar to StackOverflow), I configured the following class:

tags = db. Table(
    "tags",
    .db. Column("tag_id", db. Integer, db. ForeignKey("tag.id")),
    .db. Column("post_version_id", db. Integer,
        .db. ForeignKey("post_version.id"))
    )

class Tag(db. Model):
    id = db. Column(db. Integer, primary_key=True)
    tag = db. Column(db. String(128), index=True, unique=True)

class Post(db. Model):
    id = db. Column(db. Integer, primary_key=True)
    head_id = db. Column(db. Integer, db. ForeignKey("post_version.id"))

class PostVersion(db. Model):
    id = db. Column(db. Integer, primary_key=True)
    previous_id = db. Column(db. Integer, db. ForeignKey("post_version.id"), default=None)
    pointer_id = db. Column(db. Integer, db. ForeignKey("annotation.id"))
    current = db. Column(db. Boolean, index=True)
    tags = db.relationship("Tag", secondary=tags)

Excluding irrelevant columns such as post content, the real data model is annotations; For the sake of versatility, I simplified these models

The actual data consists of 136 POST with variable markup and variable versioning through editing; That is: I generated 136 Post. I have 15 Tags. The initial 136 Posts are marked as 2 Tags. I then tagged Post with different tags (using my editing system; So there are multiple PostVersion for editing).

You may notice that there are circular references between Post and PostVersion; I use it to configure the following two relationships for experimentation:

Relationship 1 posts

posts = db.relationship("Post",
    secondary="join(tags, PostVersion,"
        "tags.c.post_version_id==PostVersion.id)",
    primaryjoin="Tag.id==tags.c.tag_id",
    secondaryjoin="Post.head_id==PostVersion.id",
    lazy="dynamic")

Based on SQL statements

SELECT
    post.id
FROM
    tag
JOIN
    tags ON tag.id=tags.tag_id
JOIN
    post_version ON tags.post_version_id=post_version.id
JOIN
    post ON post.head_id=post_version.id
WHERE
    tag.id=<tag_id>

and

Relationship 2 posts2

posts2 = db.relationship("Post",
    secondary="join(tags, PostVersion,"
    "and_(tags.c.post_version_id==PostVersion.id,"
    "AnnotationVersion.current==True))",
    primaryjoin="Tag.id==tags.c.tag_id",
    secondaryjoin="PostVersion.pointer_id==Post.id",
    lazy="dynamic")

Based on SQL statements

SELECT
    annotation.id
FROM
    tag
JOIN
    tags ON tag.id=tags.tag_id
JOIN
    annotation_version ON tags.annotation_version_id=annotation_version.id AND 
    annotation_version.current=1
JOIN
    annotation ON annotation_version.pointer_id = annotation.id
WHERE
    tag_id=8;

This results in the following data:

Tag         Actual      len(t.posts.all())  len(t.posts.paginate(1,5,False).items)
t1          0           0                   0
t2          1           136                 5
t3          1           136                 5
t8          136         136                 1
t14         136         136                 1
t15         24          136                 1

Tag         Actual      t.posts.count()     t.posts2.count()
t1          0           0                   0
t2          1           136                 163
t3          1           136                 163
t8          136         22168               26569
t14         136         22168               26569
t15         24          3264                3912

I’ve ruled out redundant tags (i.e. all other tags with 0 Post) and the same data (i.e. the result from posts2 is the same as posts).

As you can see, there are some serious problems with the results! Especially for both relationships, if lazy="dynamic" is closed, the correct Post always returns.

Use echo=True to change SQL @IljaEverilä discovery lazy="dynamic" when creating the engine. I quote the comment in this issue :

In a nutshell: with lazy="dynamic" you get FROM post, tags, post_version WHERE ..., but without you get FROM post, tags JOIN post_version ON tags.post_version_id = post_version.id WHERE .... As you can see your composite secondary is pretty much ignored with the dynamic setting. Now the question is “why?”


My question:

1。 Is this a bug?

2。 What can I do to correct this dilemma?


Update:

As if lazy=”dynamic” is explicitly discouraged here But no other options are suggested. What is an alternative to still allowing paging and relying on large collections? This isn’t allowed by default (or at least, in the way I access it) and the documentation doesn’t seem to clarify the problem! In the title What kind of loading to use? It seems that the recommended loading strategy for large collections is lazy="subquery", but this does not allow paginate() and count().

Solution

This is indeed how an issue SQLAlchemy handles queries that form dynamic load relationships. Although the query should be

SELECT post.id AS post_id, post.head_id AS post_head_id 
FROM post, tags JOIN post_version ON tags.post_version_id = post_version.id 
WHERE ? = tags.tag_id AND post.head_id = post_version.id

The result is

SELECT post.id AS post_id, post.head_id AS post_head_id 
FROM post, tags, post_version
WHERE ? = tags.tag_id AND post.head_id = post_version.id

So, while there is an internal connection between post and post_version (in pre-SQL-92 style), but The internal join between tags and post_version lost, so there is a CROSS JOIN Between tags and other tags. The result is that the query loads all current post versions, regardless of tags, because each post is concatenated to every line in the tags. This also explains the multiplication of t.posts.count().

The solution is to wait for fix while using some other relationship loading strategy.

Related Problems and Solutions