清理带有标题的 URL 的最佳方法是什么

发布于 2024-08-03 07:46:19 字数 1766 浏览 7 评论 0原文

清理 URL 的最佳方法是什么?我正在寻找这样的 URL

What_is_the_best_headache_medication

我当前的代码

public string CleanURL(string str)
{
    str = str.Replace("!", "");
    str = str.Replace("@", "");
    str = str.Replace("#", "");
    str = str.Replace("$", "");
    str = str.Replace("%", "");
    str = str.Replace("^", "");
    str = str.Replace("&", "");
    str = str.Replace("*", "");
    str = str.Replace("(", "");
    str = str.Replace(")", "");
    str = str.Replace("-", "");
    str = str.Replace("_", "");
    str = str.Replace("+", "");
    str = str.Replace("=", "");
    str = str.Replace("{", "");
    str = str.Replace("[", "");
    str = str.Replace("]", "");
    str = str.Replace("}", "");
    str = str.Replace("|", "");
    str = str.Replace(@"\", "");
    str = str.Replace(":", "");
    str = str.Replace(";", "");
    str = str.Replace(@"\", "");
    str = str.Replace("'", "");
    str = str.Replace("<", "");
    str = str.Replace(">", "");
    str = str.Replace(",", "");
    str = str.Replace(".", "");
    str = str.Replace("`", "");
    str = str.Replace("~", "");
    str = str.Replace("/", "");
    str = str.Replace("?", "");
    str = str.Replace("  ", " ");
    str = str.Replace("   ", " ");
    str = str.Replace("    ", " ");
    str = str.Replace("     ", " ");
    str = str.Replace("      ", " ");
    str = str.Replace("       ", " ");
    str = str.Replace("        ", " ");
    str = str.Replace("         ", " ");
    str = str.Replace("          ", " ");
    str = str.Replace("           ", " ");
    str = str.Replace("            ", " ");
    str = str.Replace("             ", " ");
    str = str.Replace("              ", " ");
    str = str.Replace(" ", "_");
    return str;
}

What is the best way to clean a URL? I am looking for a URL like this

what_is_the_best_headache_medication

My current code

public string CleanURL(string str)
{
    str = str.Replace("!", "");
    str = str.Replace("@", "");
    str = str.Replace("#", "");
    str = str.Replace("$", "");
    str = str.Replace("%", "");
    str = str.Replace("^", "");
    str = str.Replace("&", "");
    str = str.Replace("*", "");
    str = str.Replace("(", "");
    str = str.Replace(")", "");
    str = str.Replace("-", "");
    str = str.Replace("_", "");
    str = str.Replace("+", "");
    str = str.Replace("=", "");
    str = str.Replace("{", "");
    str = str.Replace("[", "");
    str = str.Replace("]", "");
    str = str.Replace("}", "");
    str = str.Replace("|", "");
    str = str.Replace(@"\", "");
    str = str.Replace(":", "");
    str = str.Replace(";", "");
    str = str.Replace(@"\", "");
    str = str.Replace("'", "");
    str = str.Replace("<", "");
    str = str.Replace(">", "");
    str = str.Replace(",", "");
    str = str.Replace(".", "");
    str = str.Replace("`", "");
    str = str.Replace("~", "");
    str = str.Replace("/", "");
    str = str.Replace("?", "");
    str = str.Replace("  ", " ");
    str = str.Replace("   ", " ");
    str = str.Replace("    ", " ");
    str = str.Replace("     ", " ");
    str = str.Replace("      ", " ");
    str = str.Replace("       ", " ");
    str = str.Replace("        ", " ");
    str = str.Replace("         ", " ");
    str = str.Replace("          ", " ");
    str = str.Replace("           ", " ");
    str = str.Replace("            ", " ");
    str = str.Replace("             ", " ");
    str = str.Replace("              ", " ");
    str = str.Replace(" ", "_");
    return str;
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

幽蝶幻影 2024-08-10 07:46:19

正则表达式肯定是:(

public string CleanURL(string str)
{
    str = Regex.Replace(str, "[^a-zA-Z0-9 ]", "");
    str = Regex.Replace(str, " +", "_");
    return str;
}

没有实际测试过,超出我的想象。)

让我解释一下:

第一行删除所有不是字母数字字符(大写或小写)或空格的内容。
第二行用单个下划线替换任何空格序列(1 个或多个,按顺序)。

Regular expressions for sure:

public string CleanURL(string str)
{
    str = Regex.Replace(str, "[^a-zA-Z0-9 ]", "");
    str = Regex.Replace(str, " +", "_");
    return str;
}

(Not actually tested, off the top of my head.)

Let me explain:

The first line removes everything that's not an alphanumeric character (upper or lowercase) or a space .
The second line replaces any sequence of spaces (1 or more, sequentially) with a single underscore.

执妄 2024-08-10 07:46:19

一般来说,您最好的选择是使用白名单正则表达式方法,而不是删除所有不需要的字符,因为您肯定会错过一些字符。

到目前为止,这里的答案都很好,但我个人不想完全删除变音符号和带有重音符号的字符。所以我想出的最终解决方案如下所示:

public static string CleanUrl(string value)
{
    if (value.IsNullOrEmpty())
        return value;

    // replace hyphens to spaces, remove all leading and trailing whitespace
    value = value.Replace("-", " ").Trim().ToLower();

    // replace multiple whitespace to one hyphen
    value = Regex.Replace(value, @"[\s]+", "-");

    // replace umlauts and eszett with their equivalent
    value = value.Replace("ß", "ss");
    value = value.Replace("ä", "ae");
    value = value.Replace("ö", "oe");
    value = value.Replace("ü", "ue");

    // removes diacritic marks (often called accent marks) from characters
    value = RemoveDiacritics(value);

    // remove all left unwanted chars (white list)
    value = Regex.Replace(value, @"[^a-z0-9\s-]", String.Empty);

    return value;
}

使用的 RemoveDiacritics 方法基于 布莱尔·康拉德(Blair Conrad)的回答

public static string RemoveDiacritics(string value)
{
    if (value.IsNullOrEmpty())
        return value;

    string normalized = value.Normalize(NormalizationForm.FormD);
    StringBuilder sb = new StringBuilder();

    foreach (char c in normalized)
    {
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            sb.Append(c);
    }

    Encoding nonunicode = Encoding.GetEncoding(850);
    Encoding unicode = Encoding.Unicode;

    byte[] nonunicodeBytes = Encoding.Convert(unicode, nonunicode, unicode.GetBytes(sb.ToString()));
    char[] nonunicodeChars = new char[nonunicode.GetCharCount(nonunicodeBytes, 0, nonunicodeBytes.Length)];
    nonunicode.GetChars(nonunicodeBytes, 0, nonunicodeBytes.Length, nonunicodeChars, 0);

    return new string(nonunicodeChars);
}

希望能帮助那些通过slugifying URL并保留变音符号和朋友与他们的URL友好等效项来挑战的人同时。

Generally your best bet is to go with a white list regular expression approach instead of removing all the unwanted characters because you definitely are going to miss some.

The answers here are fine so far but I personally did not want to remove umlauts and characters with accent marks entirely. So the final solution I came up with looks like this:

public static string CleanUrl(string value)
{
    if (value.IsNullOrEmpty())
        return value;

    // replace hyphens to spaces, remove all leading and trailing whitespace
    value = value.Replace("-", " ").Trim().ToLower();

    // replace multiple whitespace to one hyphen
    value = Regex.Replace(value, @"[\s]+", "-");

    // replace umlauts and eszett with their equivalent
    value = value.Replace("ß", "ss");
    value = value.Replace("ä", "ae");
    value = value.Replace("ö", "oe");
    value = value.Replace("ü", "ue");

    // removes diacritic marks (often called accent marks) from characters
    value = RemoveDiacritics(value);

    // remove all left unwanted chars (white list)
    value = Regex.Replace(value, @"[^a-z0-9\s-]", String.Empty);

    return value;
}

The used RemoveDiacritics method is based on the SO answer by Blair Conrad:

public static string RemoveDiacritics(string value)
{
    if (value.IsNullOrEmpty())
        return value;

    string normalized = value.Normalize(NormalizationForm.FormD);
    StringBuilder sb = new StringBuilder();

    foreach (char c in normalized)
    {
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            sb.Append(c);
    }

    Encoding nonunicode = Encoding.GetEncoding(850);
    Encoding unicode = Encoding.Unicode;

    byte[] nonunicodeBytes = Encoding.Convert(unicode, nonunicode, unicode.GetBytes(sb.ToString()));
    char[] nonunicodeChars = new char[nonunicode.GetCharCount(nonunicodeBytes, 0, nonunicodeBytes.Length)];
    nonunicode.GetChars(nonunicodeBytes, 0, nonunicodeBytes.Length, nonunicodeChars, 0);

    return new string(nonunicodeChars);
}

Hope that helps somebody challenged by slugifying URLs and keeping umlauts and friends with their URL friendly equivalent at the same time.

昔日梦未散 2024-08-10 07:46:19

您应该考虑使用正则表达式。它比您上面尝试做的事情要高效得多。

有关正则表达式的更多信息请此处

You should consider using a regular expression instead. It's much more efficient than what you're trying to do above.

More on Regular Expressions here.

茶底世界 2024-08-10 07:46:19
  1. 你如何定义“友好”URL - 我假设你的意思是删除 _ 等。
  2. 我会在这里研究正则表达式。

如果您想坚持使用上面的方法,我建议通过字符串转向 StringBuilder。这是因为每个替换操作都会创建一个新字符串。

  1. How do you define "friendly" URL - I'm assuming you mean to remove _'s etc.
  2. I'd look into a regular expression here.

If you want to persist with the method above, I would suggest moving to StringBuilder over a string. This is because each of your replace operations is creating a new string.

背叛残局 2024-08-10 07:46:19

我可以收紧其中的一部分:

while (str.IndexOf("  ") > 0)
    str = str.Replace("  ", " ");

...而不是无限数量的 " " 替换。但您几乎肯定需要一个正则表达式。

I can tighten up one piece of that:

while (str.IndexOf("  ") > 0)
    str = str.Replace("  ", " ");

...instead of your infinite number of " " replacements. But you almost certainly want a regular expression instead.

眼波传意 2024-08-10 07:46:19

或者,更详细一点,但这只允许字母数字和空格(用“-”替换)

string Cleaned = String.Empty;
foreach (char c in Dirty)
    if (((c >= 'a') && (c <= 'z')) ||
         (c >= 'A') && (c <= 'Z') ||
         (c >= '0') && (c <= '9') ||
         (c == ' '))
           Cleaned += c;
Cleaned = Cleaned.Replace(" ", "-");

Or, a bit more verbose, but this only allows alphanumeric and spaces (which are replaced by '-')

string Cleaned = String.Empty;
foreach (char c in Dirty)
    if (((c >= 'a') && (c <= 'z')) ||
         (c >= 'A') && (c <= 'Z') ||
         (c >= '0') && (c <= '9') ||
         (c == ' '))
           Cleaned += c;
Cleaned = Cleaned.Replace(" ", "-");
瑕疵 2024-08-10 07:46:19

stackoverflow 的做法可以在这里找到:

https://stackoverflow.com/a/25486/142014

优化速度(“这是第二个版本,展开后性能提高 5 倍”)并处理许多特殊字符。

The way stackoverflow is doing it can be found here:

https://stackoverflow.com/a/25486/142014

optimized for speed ("This is the second version, unrolled for 5x more performance") and taking care of a lot of special characters.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文