What is the best data structure I can use to store tabular data?

Posted on 2024-12-29 17:16:24

I have a list of more than 10,000 frames and a list of sources (coordinates), and I want to find which source appears on which frame. Each frame has a filter attribute, and a source is expected to be found on one or more frames of the same filter. If this is the case, I want to record only one occurrence of such an event.

Eventually I want to run a script that generates a web table. Below is an example of the table I want to generate.

Source | filter_1 | filter_2 | filter_3 | filter_4 |
--------------------------------------------------
1      | image1   | image2   | image3   | image4   |
2      | image5   | image6   | image7   | image8   |

This is my code:

webtable = []
for frame in frames:
    for x, y in sources:
        if x_y_on_frame():
            webtable.append({
                'source': (x, y),
                'ifilter': frame.filter.name,
                'ifile': frame.filename,
                'pFile': frame.pngfile,
                'fFile': frame.fitsfile,
            })

I need to check whether a combination of a source, i.e. (x, y), and ifilter already exists in webtable before I append the record. What is the best data structure to implement this?

Comments (3)

烟酒忠诚 2025-01-05 17:16:24

I need to check if a combination of a source, i.e. (x, y), and ifilter
already exists in webtable before I append the record. What is the best
data structure to implement this?

Assuming that x, y and ifilter can all be represented as strings or integers (or other hashable, immutable types), it would be even easier to store your information in a dictionary keyed by the tuple (x, y, ifilter). This requires a minimal amount of code and is still very efficient:

webtable = {}
for frame in frames:
    for x, y in sources:
        if x_y_on_frame():
            keyTuple = (x, y, frame.filter.name)
            # Keep only the first frame seen for each (x, y, filter) combination.
            if keyTuple not in webtable:
                webtable[keyTuple] = {
                    'ifile': frame.filename,
                    'pFile': frame.pngfile,
                    'fFile': frame.fitsfile,
                }
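
To get from this dictionary to the table in the question, the entries can be regrouped by source with one column per filter. What follows is only a minimal sketch, assuming the dictionary has the shape built above. The sample data and the helper name pivot_by_source are made up for illustration and are not part of the original answer.

```python
from collections import defaultdict


def pivot_by_source(webtable):
    """Regroup {(x, y, filter_name): record} into {(x, y): {filter_name: record}}."""
    rows = defaultdict(dict)
    for (x, y, filter_name), record in webtable.items():
        rows[(x, y)][filter_name] = record
    return dict(rows)


# Hypothetical sample data in the shape produced by the loop above.
webtable = {
    (10, 20, 'g'): {'ifile': 'image1', 'pFile': 'image1.png', 'fFile': 'image1.fits'},
    (10, 20, 'r'): {'ifile': 'image2', 'pFile': 'image2.png', 'fFile': 'image2.fits'},
    (30, 40, 'g'): {'ifile': 'image5', 'pFile': 'image5.png', 'fFile': 'image5.fits'},
}

rows = pivot_by_source(webtable)
filters = sorted({filter_name for (_, _, filter_name) in webtable})

# One line per source, one column per filter; '-' marks a missing filter.
lines = []
for source in sorted(rows):
    cells = [rows[source].get(f, {}).get('ifile', '-') for f in filters]
    lines.append(' | '.join([str(source)] + cells))
```

Keeping the flat (x, y, filter) key for de-duplication and pivoting only when the table is rendered keeps the scanning loop simple.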

阪姬 2025-01-05 17:16:24

A Python dict would be just fine. If there is already an entry with the given ifilter, x and y, continue to the next item in sources:

webtable = []
webtable_cache = {}

for frame in frames:
    for x, y in sources:
        if x_y_on_frame():

            ifilter = frame.filter.name

            if ifilter in webtable_cache:
                if y in webtable_cache[ifilter]:
                    if x in webtable_cache[ifilter][y]:
                        continue     # already in webtable
                    else:
                        webtable_cache[ifilter][y][x] = True
                else:
                    webtable_cache[ifilter][y] = {x: True}
            else:
                webtable_cache[ifilter] = {y: {x: True}}

            webtable.append({
                'source': (x, y),
                'ifilter': ifilter,
                'ifile': frame.filename,
                'pFile': frame.pngfile,
                'fFile': frame.fitsfile,
            })
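
The three levels of nesting are only there to answer one question: has this (ifilter, x, y) combination been seen before? A flatter variant, offered here as an alternative and not as the answer's own code, keeps a set of tuples next to the list. This is a minimal sketch under assumptions: the Frame tuple and the on_frame check are hypothetical stand-ins for the real frame objects and for x_y_on_frame.

```python
from collections import namedtuple

# Hypothetical stand-in for the real frame objects.
Frame = namedtuple('Frame', 'filter_name filename')


def on_frame(frame, x, y):
    # Placeholder for the real x_y_on_frame() test; here every source matches.
    return True


frames = [Frame('g', 'image1'), Frame('g', 'image2'), Frame('r', 'image3')]
sources = [(10, 20), (30, 40)]

webtable = []
seen = set()  # (filter_name, x, y) combinations already recorded

for frame in frames:
    for x, y in sources:
        if not on_frame(frame, x, y):
            continue
        key = (frame.filter_name, x, y)
        if key in seen:
            continue  # already in webtable
        seen.add(key)
        webtable.append({
            'source': (x, y),
            'ifilter': frame.filter_name,
            'ifile': frame.filename,
        })
```

Membership tests on a set are hash lookups, just like the nested dictionaries, so the cost per check should be comparable while the code stays shorter.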

她比我温柔 2025-01-05 17:16:24

Since your data dictionaries have a static set of keys, a namedtuple from the collections module would actually be better than an anonymous dictionary. Namedtuples have lower overhead than dictionaries (the keys don't have to be stored again for every item) while keeping the convenience of named access.

You could define your namedtuple similar to:

from collections import namedtuple
Row = namedtuple('Row', 'iFile pFile fFile')

Then, rather than creating a dictionary of the form:

{ 'iFile': foo, 'pFile': bar, ...}

you would create an instance of the namedtuple class you got back from the factory function:

Row(iFile=foo, pFile=bar, ...)

If you need to access a stored value, you just access it as an attribute:

foo = Row(iFile="somevalue", pFile="different_value", fFile="yet another value")
if foo.iFile == "whatever":
    ...
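
Putting the pieces together, here is a short runnable sketch with made-up file names (the values are illustrative only). It also shows two properties worth knowing before switching: a namedtuple is immutable, and it can be turned back into a mapping with _asdict() when something downstream expects one.

```python
from collections import namedtuple

Row = namedtuple('Row', 'iFile pFile fFile')

# Made-up values for illustration.
row = Row(iFile='frame1', pFile='frame1.png', fFile='frame1.fits')

name = row.iFile            # named access
first = row[0]              # positional access still works, as with any tuple
as_dict = row._asdict()     # mapping of field name to value

try:
    row.iFile = 'other'     # namedtuples are immutable, so this raises
    mutated = True
except AttributeError:
    mutated = False
```

If the records need to be updated after creation, a plain dictionary may remain the simpler choice.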