Using jQuery to split a complex, punctuated string into regular words longer than 2 characters

Posted 2024-12-09 05:22:41 · 519 characters · 0 views · 0 comments

As the title suggests:

I'm trying to split sentences into either a comma-separated string or array consisting of sanitized words greater than 2 characters in length and unique (duplicates removed).

An example string might be:

$sString = 'Stackoverflow\'s users are awesome!!! Stackoverflow, is the "best" technical questions and answers website on the interwebnet!';

Finished article:

$sStringAfterProcessing = 'stackoverflow, users, are, awesome, the, best, technical, questions, and, answers, website, interwebnet';

Note that the first "Stackoverflow" has its 's removed, and that punctuation and duplicates are removed.

This seems like it could get very complicated.

Suggestions welcome and all help is much appreciated.

Comments (3)

从此见与不见 2024-12-16 05:22:41

Here goes...

str = str.replace(/[^\w\s]/ig, "").replace(/\s/g, ", ");

will yield:

Stackoverflows, users, are, awesome, Stackoverflow, is, the, best, technical, questions, and, answers, website, on, the, interwebnet

Example: http://jsfiddle.net/ktFj2/1/

Or, in array format:

var arr = str.replace(/[^\w\s]/ig, "").split(" ");

Example: http://jsfiddle.net/nnKV8/

Update: To remove duplicates from the array (and items of 2 characters or fewer), something like this:

var cleaned = [];
for(var i = 0; i < arr.length; i++) {
    var el = arr[i];

    if (el.length > 2 && $.inArray(el, cleaned) < 0) {
        cleaned.push(el);
    }
}
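
The snippets above leave two gaps relative to the question: the result keeps its original case, and removing the apostrophe turns "Stackoverflow's" into "Stackoverflows", which is then not treated as a duplicate of "Stackoverflow". Here is a minimal sketch that closes both, assuming it is acceptable to lowercase everything and drop a trailing 's. The function name `extractWords` is made up for illustration, and plain `indexOf` stands in for `$.inArray` so the sketch runs without jQuery:

```javascript
// Hypothetical helper: the name and the exact cleaning rules are assumptions.
function extractWords(str) {
    var words = str
        .toLowerCase()            // so "Stackoverflow" and "stackoverflow" compare equal
        .replace(/'s\b/g, "")     // drop a possessive 's before stripping punctuation
        .replace(/[^\w\s]/g, "")  // remove the remaining punctuation
        .split(/\s+/);            // split on runs of whitespace
    var cleaned = [];
    for (var i = 0; i < words.length; i++) {
        // keep words longer than 2 characters that have not been seen yet
        if (words[i].length > 2 && cleaned.indexOf(words[i]) < 0) {
            cleaned.push(words[i]);
        }
    }
    return cleaned;
}

var sample = 'Stackoverflow\'s users are awesome!!! Stackoverflow, is the "best" technical questions and answers website on the interwebnet!';
console.log(extractWords(sample).join(", "));
// stackoverflow, users, are, awesome, the, best, technical, questions, and, answers, website, interwebnet
```

Call `.join(", ")` on the returned array when the comma-separated string is wanted, or use the array directly.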
偏闹i 2024-12-16 05:22:41

Here is a basic way (edited):

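    var s = 'Stackoverflow\'s users are awesome!!! Stackoverflow, is the "best" technical questions and answers website on the interwebnet!',
        a = s.split(/[^\w]/),    // split on every non-word character
        h = Object.create(null), // words seen so far; no prototype, so a word like "constructor" is not a false hit
        l = a.length,
        i = 0,
        f = [];                  // unique words longer than 2 characters
    for (; i < l; i++) {
        if (!h[a[i]] && a[i].length > 2) {
            h[a[i]] = true;
            f.push(a[i]);
        }
    }
    console.log(f);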
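
In environments that support ES2015, the hand-rolled lookup object can be swapped for a built-in `Set`, which keeps the first occurrence of each value in insertion order. This is a sketch of that alternative, not part of the answer above. Splitting on `\W+` (runs of non-word characters) is also a substitution on my part, and the variable name `unique` is chosen for illustration:

```javascript
var s = 'Stackoverflow\'s users are awesome!!! Stackoverflow, is the "best" technical questions and answers website on the interwebnet!';

// Split on runs of non-word characters, drop short fragments (including the
// stray "s" left over from "Stackoverflow's"), then let Set remove duplicates.
var unique = Array.from(new Set(
    s.split(/\W+/).filter(function (w) { return w.length > 2; })
));

console.log(unique.length); // 12
console.log(unique[0]);     // Stackoverflow
```

Like the loop above, this is case-sensitive, so "Stackoverflow" and "stackoverflow" would count as different words unless the string is lowercased first.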
隔纱相望 2024-12-16 05:22:41
var newStrings = str.split(/[ .,]/);

Put any other punctuation in the regex, or use \W for non-alphanumeric characters.

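This will yield an array of the words in str, plus an empty string wherever two delimiters are adjacent.

Then, iterate through newStrings and keep only the elements whose length is greater than 2, which also discards the empty strings.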
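
To make the last two steps concrete, here is a small sketch on a shortened input. The sample string and the variable name `words` are assumptions for illustration. It shows the empty strings that a single-character delimiter class produces, and how the length check removes them:

```javascript
var str = 'Stackoverflow, is the best.';

// Each space, period, or comma is a separate split point, so ", " and the
// trailing "." each leave an empty string behind.
var newStrings = str.split(/[ .,]/);
console.log(newStrings); // [ 'Stackoverflow', '', 'is', 'the', 'best', '' ]

// Keeping only elements longer than 2 characters drops "is" and the empty strings.
var words = newStrings.filter(function (w) { return w.length > 2; });
console.log(words);      // [ 'Stackoverflow', 'the', 'best' ]
```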
