Is there a way to clean up a text file (get rid of similar words) without using nested for loops?

Posted on 2025-01-02 13:49:46

I'm trying to think of the optimal way to clean up text in a file. So what I want to do is, given an input file, match words that are similar and replace them. So if apple and ApPle are in the file, ApPle would be replaced by apple.

Is there any way to do this without using two for loops like so:

for $word in @file
  for $word2 in @file
    if $word matches $word2
      replace $word2 with $word
    end
  end
end

I'm always hesitant to use nested for loops, so I'm just wondering if there's a more elegant solution. Also, if you're wondering why it's pseudocode, it's because I haven't decided what to program this in yet. (For those who don't know, @file is a list of words and $word is a non-whitespace string of characters.)

不醒的梦 2025-01-09 13:49:46

Perhaps this will work:

Define a unique representation (a "hash function") for similar words. (If the words differ only in case, that's easy. If they are similar in pronunciation, that's more difficult.)
Read the file in one pass, maintain a "hash table", and print a word only if its representation is not yet in the hash table.

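for $word in @file
  # reduce the word to its shared representation, e.g. lowercase it
  $hash = hashfunction($word)
  if $hash not in §hashtable
    add $hash to §hashtable
    # print the word itself, not its hash
    print $word
  end
end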

If your hash function is not injective, things get slightly more complicated.
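
The one-pass idea above can be sketched in Python. This is only an illustration under stated assumptions: "similar" is taken to mean "differs only in case", `canonical_key` and `normalize_words` are hypothetical names, and instead of skipping repeated words the sketch replaces each word with the first spelling seen for its key, which is what the question asks for.

```python
def canonical_key(word: str) -> str:
    """Hypothetical 'hash function': words that differ only in case share a key."""
    return word.casefold()


def normalize_words(words):
    """Replace every word with the first spelling seen for its key, in one pass."""
    first_seen = {}  # key -> first spelling encountered
    result = []
    for word in words:
        # setdefault stores the word if its key is new, then returns the stored spelling
        result.append(first_seen.setdefault(canonical_key(word), word))
    return result


words = "apple ApPle pear APPLE Pear".split()
print(" ".join(normalize_words(words)))
```

Run as written, this prints `apple apple pear apple pear`. Each word costs one dictionary lookup, so the nested loop is not needed. A different notion of similarity would only require a different `canonical_key`.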

德意的啸 2025-01-09 13:49:46

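It really depends on what "similar" means to you, and on when a word should be replaced. Should the code determine that? Do you want to convert all uppercase letters to lowercase, or should the code use a different criterion?

In PHP you can use (a combination of) the following functions:
http://www.php.net/manual/en/function.str-ireplace.php (case-insensitive replace)
http://www.php.net/manual/en/function.strtolower.php (convert a string to lowercase)
http://www.php.net/manual/en/function.strtoupper.php (convert a string to uppercase)
http://php.net/manual/en/function.similar-text.php (see how similar string A is to string B)

If you can post more details about your intended use case, you will probably get better answers :)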
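
For readers not using PHP, rough Python counterparts of those four functions might look like the sketch below. The helper names `replace_ignoring_case` and `similarity` are made up for illustration. The replacement here matches whole words only, which is narrower than a plain substring replace, and `difflib.SequenceMatcher` uses its own scoring, so its numbers should not be expected to match those of PHP's `similar_text`.

```python
import re
from difflib import SequenceMatcher


def replace_ignoring_case(text: str, target: str, replacement: str) -> str:
    """Replace whole-word occurrences of target, regardless of case."""
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    # A function is used as the replacement so backslashes in it are not interpreted
    return pattern.sub(lambda _match: replacement, text)


def similarity(a: str, b: str) -> float:
    """Return a score between 0 and 1; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


print(replace_ignoring_case("An ApPle and an APPLE pie", "apple", "apple"))
print("ApPle".lower(), "ApPle".upper())
print(similarity("apple", "appel") > similarity("apple", "pear"))
```

Where a similarity threshold should sit is a judgment call that depends on the data, so any cutoff would need to be tuned against real input.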


About the author

墨落画卷
