Tokenizing the strings in a file
I have a file, and I am tokenizing all the strings in it.
Each token is stored in a buffer allocated with char *token = (char *) malloc(len + 1);
The token is freed before the next one is allocated, so I need a way to store the tokens for further use.
What's a good strategy to store the tokens? I have a function which prints out one single token string at a time.
My question is not about how to tokenize or parse, so please disregard how that part is implemented. I have a bunch of strings that are allocated and freed repeatedly inside a loop. How would I store each allocation somewhere else for further use?
Comments (1)
Generally tokens are not stored to file. They are requested by the parser when the parser is ready to read more input.
As such, tokens are stored in memory on the program's heap, and after they have been processed (which might be long before the file is fully parsed) they are freed.
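If the tokens do need to outlive the loop, one simple option is to copy each one into a growable array before the original buffer is freed. The following is only a minimal sketch under that assumption; TokenList, token_list_push and token_list_free are hypothetical names, not part of the question's code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: a growable array that owns a copy of each token. */
typedef struct {
    char **items;
    size_t count;
    size_t capacity;
} TokenList;

/* Copies the first len bytes of text into the list.
   Returns 0 on success, -1 if an allocation fails. */
static int token_list_push(TokenList *list, const char *text, size_t len)
{
    if (list->count == list->capacity) {
        /* Double the capacity (16 slots to start with) so pushes stay cheap. */
        size_t new_cap = list->capacity ? list->capacity * 2 : 16;
        char **grown = realloc(list->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        list->items = grown;
        list->capacity = new_cap;
    }

    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, len);
    copy[len] = '\0';

    list->items[list->count++] = copy;
    return 0;
}

/* Frees every stored token and the array itself. */
static void token_list_free(TokenList *list)
{
    for (size_t i = 0; i < list->count; i++)
        free(list->items[i]);
    free(list->items);
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
}
```

Start from TokenList list = {0}; and call token_list_push(&list, token, len) inside the loop before freeing token. Alternatively, the list could take ownership of the malloc'd pointer directly, in which case the loop would skip the free altogether.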
--- Update to follow the edit ---
If you are worried about excessive allocation and deallocation, then you have a number of solutions, depending on the detail of the issue you are attempting to solve.
For strings, you can create them through a "string builder" interface, which checks to see if a string is already present with that text, and if so, returns a reference to the already present string. Note that for this to work properly, all returned strings must be immutable (as changing a string in one reference will change the string in all references). Similar solutions are possible for numbers, boolean values, etc.
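The technique described here is usually called string interning. A rough sketch of it, assuming a small fixed-size hash table with chaining, could look like this; InternTable, intern and intern_free are made-up names for illustration, and the bucket count is arbitrary.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INTERN_BUCKETS 256  /* arbitrary size chosen for this sketch */

typedef struct InternEntry {
    struct InternEntry *next;  /* next entry in the same bucket */
    size_t len;
    char text[];               /* the stored string, NUL-terminated */
} InternEntry;

typedef struct {
    InternEntry *buckets[INTERN_BUCKETS];
} InternTable;

/* 32-bit FNV-1a hash, reduced to a bucket index. */
static size_t intern_hash(const char *s, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h % INTERN_BUCKETS;
}

/* Returns the single shared copy of the first len bytes of s,
   creating it on first use. Returns NULL if allocation fails.
   Callers must treat the result as read-only. */
static const char *intern(InternTable *table, const char *s, size_t len)
{
    size_t b = intern_hash(s, len);

    for (InternEntry *e = table->buckets[b]; e != NULL; e = e->next) {
        if (e->len == len && memcmp(e->text, s, len) == 0)
            return e->text;  /* already present: reuse it */
    }

    InternEntry *e = malloc(sizeof *e + len + 1);
    if (e == NULL)
        return NULL;
    e->len = len;
    memcpy(e->text, s, len);
    e->text[len] = '\0';
    e->next = table->buckets[b];
    table->buckets[b] = e;
    return e->text;
}

/* Releases every interned string; all returned pointers become invalid. */
static void intern_free(InternTable *table)
{
    for (size_t b = 0; b < INTERN_BUCKETS; b++) {
        InternEntry *e = table->buckets[b];
        while (e != NULL) {
            InternEntry *next = e->next;
            free(e);
            e = next;
        }
        table->buckets[b] = NULL;
    }
}
```

Because equal strings share one copy, two tokens with the same text can be compared by pointer, and a repeated token costs no new allocation. The const return type is there to discourage the mutation problem mentioned above.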
For token reuse, you can make the token into a structure that mostly references, by pointer, the data likely to be "used" by the parser. That way the parser grabs the "fields" of the token, and the "skeleton" token can be added back to a "reuse queue". The reuse queue should reset the "data" references of the token before returning it to the tokenizer, which would be rewritten to ask the queue for its data structures. If there are no tokens "in the queue", the queue should silently allocate them.
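As a rough illustration of the reuse idea, the sketch below keeps idle token skeletons on a singly linked free list (last in, first out, rather than a strict queue, which is simpler and behaves the same for this purpose). Token, TokenPool and the pool_* functions are hypothetical names, and the kind field is an assumed type tag.

```c
#include <stdlib.h>

typedef struct Token {
    int kind;            /* hypothetical token type tag */
    char *text;          /* heap data the parser may take ownership of */
    struct Token *next;  /* links idle skeletons inside the pool */
} Token;

typedef struct {
    Token *idle;         /* head of the list of reusable skeletons */
} TokenPool;

/* Returns a blank token: an idle skeleton if one exists,
   otherwise a freshly allocated one. NULL if allocation fails. */
static Token *pool_acquire(TokenPool *pool)
{
    Token *tok = pool->idle;
    if (tok != NULL) {
        pool->idle = tok->next;
    } else {
        tok = malloc(sizeof *tok);
        if (tok == NULL)
            return NULL;
    }
    /* Reset every field so no stale data reaches the tokenizer. */
    tok->kind = 0;
    tok->text = NULL;
    tok->next = NULL;
    return tok;
}

/* Returns a skeleton to the pool. The caller is assumed to have
   already taken ownership of (or freed) tok->text. */
static void pool_release(TokenPool *pool, Token *tok)
{
    tok->text = NULL;
    tok->next = pool->idle;
    pool->idle = tok;
}

/* Frees all idle skeletons; tokens still in use are not touched. */
static void pool_destroy(TokenPool *pool)
{
    Token *tok = pool->idle;
    while (tok != NULL) {
        Token *next = tok->next;
        free(tok);
        tok = next;
    }
    pool->idle = NULL;
}
```

The tokenizer would call pool_acquire instead of malloc, and the parser would call pool_release once it has taken the text pointer. After a short warm-up the loop stops allocating token structures at all.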
Other solutions exist too, depending on how crafty you want to get.