Creating a human-readable short string from a longer string
I have a requirement to contract a string such as...
"Would you consider becoming a robot? You would be provided with a free annual oil change."
...to something much shorter but still recognizable to a human (it will need to be found in a select list; my current solution has users entering an arbitrary title for the sole purpose of selection).
I would like to extract only the portion of the string which forms a question (if possible) and then somehow reduce it to something like
WouldConsiderBecomingRobot
Are there any grammatical algorithms out there that might help me with this? I'm thinking there might be something that allows me to pick out just the verbs and nouns.
As this is just to act as a key, it doesn't have to be perfect; I'm not seeking to trivialise the inherent complexity of the English language.
Probably too simplistic, but I might be tempted to start with a list of "filler words":
Then extract everything before the question mark (using a regex, string manipulation, whatever you fancy), yielding "Would you consider becoming a robot".
Then go through that string and remove every word that appears in the filler list.
PascalCasing the remaining words would give you your desired string; I'll leave that as an exercise for the reader.
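For illustration, the three steps above (cut at the question mark, drop filler words, PascalCase the rest) might look like the following minimal Python sketch. The answer's actual filler list did not survive, so `FILLER_WORDS` here is a hypothetical, deliberately small set, and `make_key` is an illustrative name, not anything from the original post.

```python
import re

# Hypothetical filler list: the answer's own list was not preserved,
# so these words are an illustrative assumption only.
FILLER_WORDS = {"a", "an", "the", "you", "i", "we", "they", "he", "she",
                "it", "of", "to", "in", "on", "with", "for", "be"}


def make_key(text: str) -> str:
    """Build a PascalCase key from the question part of `text`."""
    # Keep only the text before the first question mark (or all of it if none).
    question = text.split("?", 1)[0]
    # Split into words; punctuation such as quotes is dropped.
    words = re.findall(r"[A-Za-z']+", question)
    # Remove filler words, comparing case-insensitively.
    kept = [w for w in words if w.lower() not in FILLER_WORDS]
    # PascalCase: upper-case the first letter of each remaining word.
    return "".join(w[0].upper() + w[1:] for w in kept)


print(make_key('"Would you consider becoming a robot? '
               'You would be provided with a free annual oil change."'))
```

With this particular list the sentence from the question becomes `WouldConsiderBecomingRobot`; in practice the quality of the key depends almost entirely on how well the filler list is tuned to your titles.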
Create a popular social media website. When users want to join or post comments, prompt them to solve a captcha. The captcha will consist of matching your shortened versions of the long strings to their full versions. Your shortening algorithm will be based on a neural net or genetic algorithm which is trained on the captcha results.
You can also sell advertising on the website.
I ended up creating the following extension method which does work surprisingly well. Thanks to Joe Blow for his excellent and effective suggestions:
This contracts the following to 15 chars:
I don't think there is any algorithm that can identify whether each word of a string is a noun, an adjective or whatever. The only solution would be to use a custom dictionary: just create a list of words that can't be identified as verbs or nouns (I, you, they, them, his, hers, of, a, the, etc.).
Then you just have to keep all the words before the question mark that are not in the list.
It is just a workaround and, as you said, it doesn't need to be perfect.
Hope this helps!
Welcome to the wonderful world of natural language processing. If you want to identify nouns and verbs, you will need a part of speech tagger.
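Once a tagger has labelled each word, building the key reduces to filtering on the tags. The sketch below assumes Penn Treebank tags (the tag set used by NLTK's default `nltk.pos_tag`) and keeps modals (`MD`), verbs (`VB*`) and nouns (`NN*`). The tagged input is written by hand as an assumption rather than taken from real tagger output, and `key_from_tagged` is a hypothetical helper name.

```python
# Penn Treebank tag prefixes to keep: modals (MD), verbs (VB, VBG, ...)
# and nouns (NN, NNS, NNP, ...). This choice of tags is an assumption.
KEEP_PREFIXES = ("MD", "VB", "NN")


def key_from_tagged(tagged):
    """Build a PascalCase key from (word, tag) pairs, keeping modals, verbs and nouns."""
    kept = [word for word, tag in tagged if tag.startswith(KEEP_PREFIXES)]
    return "".join(w[:1].upper() + w[1:] for w in kept)


# Hand-tagged example (assumed tags, not real tagger output). With NLTK,
# after downloading its tokenizer and tagger models, such pairs would
# typically come from nltk.pos_tag(nltk.word_tokenize(text)).
tagged = [("Would", "MD"), ("you", "PRP"), ("consider", "VB"),
          ("becoming", "VBG"), ("a", "DT"), ("robot", "NN")]
print(key_from_tagged(tagged))  # WouldConsiderBecomingRobot
```

A real tagger will occasionally mislabel words, but since the result only needs to serve as a human-recognizable key, that is usually tolerable.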