What is the most efficient way to store a list of tuples in App Engine?
When storing and retrieving a datastore entity that contains a list of tuples, what is the most efficient way of storing this list?
When I have encountered this problem, the tuples could be anything from key-value pairs, to a datetime and sample results, to (x, y) coordinates.
The number of tuples is variable and ranges from 1 to a few hundred.
The entity containing these tuples needs to be referenced quickly/cheaply, and the tuple values do not need to be indexed.
I have had this problem a few times, and have solved it a number of different ways.
Method 1:
Convert the tuple values to a string and concatenate them together with some delimiter.
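    def PutEntity(entity, tuples):
        # Join each tuple's values into one delimited string; str() covers non-string values.
        entity.tuples = ['_'.join(str(value) for value in item) for item in tuples]
        entity.put()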
Advantages: Results are easily readable in the Datastore Viewer, everything is fetched in one get.
Disadvantages: Potential precision loss, programmer required to deserialize/serialize, more bytes required to store data in string format.
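To make the serialize/deserialize burden of Method 1 concrete, here is a minimal sketch in plain Python with no datastore calls. The helper names `serialize_tuples` and `deserialize_tuples`, and the `types` argument, are hypothetical and not part of the original post; the sketch also assumes the delimiter never occurs inside a value.

```python
DELIMITER = '_'  # assumed delimiter; must never occur inside a value


def serialize_tuples(tuples):
    """Join each tuple's values into one delimited string."""
    return [DELIMITER.join(str(value) for value in item) for item in tuples]


def deserialize_tuples(strings, types):
    """Split each string and convert every field back with the matching type."""
    return [
        tuple(convert(field) for convert, field in zip(types, s.split(DELIMITER)))
        for s in strings
    ]


points = [(1, 2.5), (3, 4.25)]
stored = serialize_tuples(points)                    # what would go into the list property
restored = deserialize_tuples(stored, (int, float))  # caller must know the field types
```

The caller has to remember the field types when reading the data back, which is the bookkeeping cost listed under the disadvantages.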
Method 2:
Store each tuple position in its own list and zip/unzip the tuples.
    def PutEntity(entity, tuples):
        # Split the tuples into two parallel lists; their order must stay aligned.
        entity.keys = [item[0] for item in tuples]
        entity.values = [item[1] for item in tuples]
        entity.put()
Advantages: No loss of precision; data is confusing but still viewable in the Datastore Viewer; types can be enforced; everything is fetched in one get.
Disadvantages: Programmer needs to zip/unzip the tuples or carefully maintain order in the lists.
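The zip/unzip step of Method 2 might look like the following sketch, again in plain Python without datastore calls. The helpers `split_tuples` and `join_tuples` are hypothetical names, and the length check is an added safeguard that the original post does not mention.

```python
def split_tuples(tuples):
    """Unzip (key, value) tuples into two parallel lists."""
    keys = [item[0] for item in tuples]
    values = [item[1] for item in tuples]
    return keys, values


def join_tuples(keys, values):
    """Zip two parallel lists back into tuples, refusing mismatched lengths."""
    if len(keys) != len(values):
        raise ValueError('parallel lists are out of sync')
    return list(zip(keys, values))


samples = [('a', 1.5), ('b', 2.0)]
keys, values = split_tuples(samples)
rebuilt = join_tuples(keys, values)
```

A length check only catches lists of different sizes; it cannot detect two equally long lists whose order has drifted apart, which is the risk the disadvantage above describes.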
Method 3:
Serialize the list of tuples in some manner (JSON, pickle, protocol buffers) and store it in a blob or text property.
Advantages: Usable with objects, including more complex ones; less risk of a bug mismatching tuple values.
Disadvantages: Blob store access requires an additional fetch (?); cannot view data in the Datastore Viewer.
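As an illustration of Method 3, this sketch uses the standard `json` module to turn the list into a single string that could be assigned to a text property. The helper names `dump_tuples` and `load_tuples` are hypothetical, and the sketch assumes the tuple values are JSON-compatible (numbers, strings, booleans).

```python
import json


def dump_tuples(tuples):
    """Serialize a list of tuples to one JSON string."""
    return json.dumps(tuples)


def load_tuples(text):
    """Parse the JSON string and turn the inner lists back into tuples."""
    return [tuple(item) for item in json.loads(text)]


coords = [(1, 2), (3.5, 4.5)]
text = dump_tuples(coords)      # a single string value to store
restored = load_tuples(text)
```

JSON has no tuple type, so the inner values come back as lists unless they are converted on load, as `load_tuples` does here.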
Method 4:
Store the tuples in another entity and keep a list of the keys.
Advantages: More obvious architecture. If the entity is a view, we no longer need to keep two copies of the tuple data.
Disadvantages: Two fetches required: one for the entity and key list, and one for the tuples.
I am wondering if anyone knows which one performs best, and whether there is a way I haven't thought of?
Thanks,
Jim
Comments (1)
I use Method 3. Blobstore may require an extra fetch, but db.BlobProperty does not. For objects that must come out of storage exactly as they were put in, I use PickleProperty (which can be found in tipfy and some other utility libraries).
For objects where I just need the state stored, I wrote a JsonProperty that works similarly to PickleProperty (but uses SimpleJson, obviously).
For me, getting all data in a single fetch and being idiot-proof is more important than CPU performance (in App Engine). According to the Google I/O talk on AppStats, a trip to the datastore is almost always going to be more expensive than a bit of local parsing.
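The difference between "exactly as it was put in" and "just the state" can be seen with the standard `pickle` and `json` modules alone. This is a rough sketch of the serialization step only, not the PickleProperty or JsonProperty code mentioned above; the sample data and the choice of ISO 8601 strings for datetimes are assumptions made for illustration.

```python
import datetime
import json
import pickle

samples = [(datetime.datetime(2010, 5, 1, 12, 0), 0.25)]

# pickle: the value comes back with the same types it went in with.
pickled = pickle.dumps(samples)
from_pickle = pickle.loads(pickled)

# json: only the state survives. Datetimes need a manual encoding
# (ISO 8601 strings here, an assumption) and tuples come back as lists.
encoded = json.dumps([(when.isoformat(), value) for when, value in samples])
from_json = json.loads(encoded)
```

Either byte string or text can be stored in a single property on the entity, so both variants keep the one-fetch behaviour the answer values.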