String tokenizer

Posted on 2024-08-17 13:20:41

Can anybody help me understand how this string tokenizer works by adding some comments to the code? I would very much appreciate any help, thanks!

public String[] split(String toSplit, char delim, boolean ignoreEmpty) {

    StringBuffer buffer = new StringBuffer();
    Stack stringStack = new Stack();

    for (int i = 0; i < toSplit.length(); i++) {
        if (toSplit.charAt(i) != delim) {
            buffer.append((char) toSplit.charAt(i));
        } else {
            if (buffer.toString().trim().length() == 0 && ignoreEmpty) {
            } else {
                stringStack.addElement(buffer.toString());
            }
            buffer = new StringBuffer();
        }
    }

    if (buffer.length() !=0) {
        stringStack.addElement(buffer.toString());
    }

    String[] split = new String[stringStack.size()];
    for (int i = 0; i < split.length; i++) {
        split[split.length - 1 - i] = (String) stringStack.pop();
    }

    stringStack = null;
    buffer = null;

//        System.out.println("There are " + split.length + " Words");
    return split;
}

佼人 2024-08-24 13:20:42

Not the best-written method in the world! But comments are below. Overall, it splits a string into "words", using the character delim to delimit them. If ignoreEmpty is true, then empty words (and, because of the trim() call, whitespace-only words) are not counted, so two consecutive delimiters act as one.

public String[] split(String toSplit, char delim, boolean ignoreEmpty) {

    // Buffer to construct words
    StringBuffer buffer = new StringBuffer();
    // Stack to store complete words
    Stack stringStack = new Stack();

    // Go through input string one character at a time
    for (int i = 0; i < toSplit.length(); i++) {
        // If next character is not the delimiter,
        // add it to the buffer
        if (toSplit.charAt(i) != delim) {
            buffer.append((char) toSplit.charAt(i));
        // Else it is the delimiter, so process the
        // complete word
        } else {
            // If the word is empty or only whitespace, we
            // have the choice of ignoring it
            if (buffer.toString().trim().length() == 0 && ignoreEmpty) {
            // Otherwise, we push it onto the stack
            } else {
                stringStack.addElement(buffer.toString());
            }
            // Clear the buffer ready for the next word
            buffer = new StringBuffer();
        }
    }

    // If there are remaining characters in the buffer,
    // then a word rather than the delimiter ends the
    // string, so we push that onto the stack as well
    if (buffer.length() !=0) {
        stringStack.addElement(buffer.toString());
    }

    // We set up a new array to store the contents of
    // the stack
    String[] split = new String[stringStack.size()];

    // Then we pop each element from the stack into an
    // indexed position in the array, starting at the
    // end as the last word was last on the stack
    for (int i = 0; i < split.length; i++) {
        split[split.length - 1 - i] = (String) stringStack.pop();
    }

    stringStack = null;
    buffer = null;

    // Then return the array
//        System.out.println("There are " + split.length + " Words");
    return split;
}

You could write a far more efficient one using the String.split method, translating the delimiter into a suitable regular expression (ending with + if ignoreEmpty is true).
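
A minimal sketch of that String.split idea, in case it helps. The class name RegexSplitSketch is made up for illustration, and this is not a drop-in replacement: String.split always drops trailing empty strings, keeps an empty first element when the input starts with the delimiter, and does not drop whitespace-only tokens the way the trim() check in the original does.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical class name; a sketch, not the original author's code.
class RegexSplitSketch {

    static String[] split(String toSplit, char delim, boolean ignoreEmpty) {
        // Quote the delimiter so regex metacharacters such as '.' or '|'
        // are treated as literal characters.
        String regex = Pattern.quote(String.valueOf(delim));
        if (ignoreEmpty) {
            // One or more consecutive delimiters act as a single separator.
            regex = "(?:" + regex + ")+";
        }
        return toSplit.split(regex);
    }

    public static void main(String[] args) {
        // prints [foo, bar, foobar]
        System.out.println(Arrays.toString(split("foo,,bar,foobar", ',', true)));
        // prints [foo, , bar, foobar]
        System.out.println(Arrays.toString(split("foo,,bar,foobar", ',', false)));
    }
}
```

Wrapping the quoted delimiter in a non-capturing group before appending + is just a cautious way to make sure the quantifier applies to the whole quoted delimiter.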


路还长，别太狂 2024-08-24 13:20:42

public String[] split(String toSplit, char delim, boolean ignoreEmpty) {

    // Holds each character efficiently while parsing the string
    // in a temporary buffer
    StringBuffer buffer = new StringBuffer();
    // Collection for holding the intermediate result
    Stack stringStack = new Stack();

    // for each character in the string to split
    for (int i = 0; i < toSplit.length(); i++) 
    {
        // if the character is NOT the delimiter
        if (toSplit.charAt(i) != delim) 
        {
            // add this character to the temporary buffer
            buffer.append((char) toSplit.charAt(i));
        } else { // we are at a delimiter!
            // if the buffer is empty and we are ignoring empty
            if (buffer.toString().trim().length() == 0 && ignoreEmpty) {
              // do nothing
            } else { // if the buffer is not empty or if ignoreEmpty is not true
                // add the buffer to the intermediate result collection
                stringStack.addElement(buffer.toString());
            }
            // reset the buffer 
            buffer = new StringBuffer();
        }

    }
    // we might have extra characters left in the buffer from the last loop
    // if so, add it to the intermediate result
    // IMHO, this might contain a bug
    // what happens when the buffer contains a space at the end and 
    // ignoreEmpty is true?  Seems like it would still be added
    if (buffer.length() !=0) {
        stringStack.addElement(buffer.toString());
    }
    // we are going to convert the intermediate result to an array
    // we create a result array the size of the stack
    String[] split = new String[stringStack.size()];
    // and copy each item from the stack into the return array
    for (int i = 0; i < split.length; i++) {
        split[split.length - 1 - i] = (String) stringStack.pop();
    }

    // release our temp vars
    // (to let the GC collect at the earliest possible moment)
    stringStack = null;
    buffer = null;

    // and return it
    return split;
}

Is this directly from String.Split or is it something else? Because it seems to me there's a bug in the code (empty result added if left over at the end even when IgnoreEmpty is true)?
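
To make the suspected bug concrete: with the input "a, " and ignoreEmpty set to true, the final check only tests buffer.length() != 0, so the trailing " " is still added. Below is a rough sketch of one possible fix, which runs the same ignoreEmpty check on the last token. The names SplitFixSketch and addToken are hypothetical, and I swapped in StringBuilder and ArrayList for the StringBuffer and Stack of the original; treat it as an illustration under those assumptions, not the intended behaviour of the original author.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name; a sketch of one possible fix.
class SplitFixSketch {

    static String[] split(String toSplit, char delim, boolean ignoreEmpty) {
        List<String> tokens = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();

        for (int i = 0; i < toSplit.length(); i++) {
            char c = toSplit.charAt(i);
            if (c != delim) {
                buffer.append(c);
            } else {
                addToken(tokens, buffer.toString(), ignoreEmpty);
                buffer.setLength(0); // reuse the builder instead of allocating a new one
            }
        }

        // As in the original, a completely empty trailing buffer is never added,
        // but a non-empty one now goes through the same ignoreEmpty check.
        if (buffer.length() != 0) {
            addToken(tokens, buffer.toString(), ignoreEmpty);
        }
        return tokens.toArray(new String[0]);
    }

    // Hypothetical helper: skips empty or whitespace-only tokens when ignoreEmpty is set.
    private static void addToken(List<String> tokens, String token, boolean ignoreEmpty) {
        if (ignoreEmpty && token.trim().isEmpty()) {
            return;
        }
        tokens.add(token);
    }

    public static void main(String[] args) {
        // prints [a]
        System.out.println(Arrays.toString(split("a, ", ',', true)));
    }
}
```

Because a List keeps insertion order, the backwards copy from the stack is no longer needed.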


找回味觉 2024-08-24 13:20:42

This code loops through a string, splits it into words by looking for a delimiter, and returns a string array with all the words found.

In C# you could write the same code as:

toSplit.Split(
    new char[]{ delim }, !ignoreEmpty ? 
        StringSplitOptions.None:
        StringSplitOptions.RemoveEmptyEntries);


黄昏下泛黄的笔记 2024-08-24 13:20:42

public String[] split(String toSplit, char delim, boolean ignoreEmpty) { 

    StringBuffer buffer = new StringBuffer(); //Make a StringBuffer
    Stack stringStack = new Stack();          //Make a set of elements, a stack

    for (int i = 0; i < toSplit.length(); i++) { //For how many characters are in the string, run this loop
        if (toSplit.charAt(i) != delim) { //If the current character is NOT equal to the specified delimiter (passed into the function), add it to the buffer
            buffer.append((char) toSplit.charAt(i));
        } else { //Otherwise...
            if (buffer.toString().trim().length() == 0 && ignoreEmpty) { //If it's empty or only whitespace, do nothing (only if ignoreEmpty is true)
            } else { //otherwise...
                stringStack.addElement(buffer.toString()); //Add the previously found characters to the output stack
            }
            buffer = new StringBuffer(); //Make another buffer.
        }
    }

    if (buffer.length() !=0) { //If characters are left in the buffer after the loop
        stringStack.addElement(buffer.toString()); //Add them as the last word
    }

    String[] split = new String[stringStack.size()]; //Array sized to hold every word on the stack
    for (int i = 0; i < split.length; i++) {
        split[split.length - 1 - i] = (String) stringStack.pop();
    }

    stringStack = null;
    buffer = null;

//        System.out.println("There are " + split.length + " Words");
    return split;
}


情泪▽动烟 2024-08-24 13:20:42

Ok, before proceeding to the answer, I should point out that there are multiple issues with this code. Here goes:

/**
*
*/
public String[] split(   
    String toSplit       //string to split in tokens, delimited by delim
,   char delim           //character that delimits tokens
,   boolean ignoreEmpty  //if true, tokens consisting of only whitespace are ignored
) {

StringBuffer buffer = new StringBuffer();
Stack stringStack = new Stack();

for (int i = 0; i < toSplit.length(); i++) {     //examine each character
    if (toSplit.charAt(i) != delim) {            //no delimiter: this char is part of a token, so add it to the current (partial) token.
        buffer.append((char) toSplit.charAt(i)); 
    } else {
        if (buffer.toString().trim().length() == 0 && ignoreEmpty) {   //'token' consists only of whitespace, and ignoreEmpty was set: do nothing
        } else {
            stringStack.addElement(buffer.toString());  //found a token, so save it.
        }
        buffer = new StringBuffer();                    //reset the buffer so we can store the next token.
    }
}

if (buffer.length() !=0) {                              //save the last (partial) token (if it contains at least one character)
    stringStack.addElement(buffer.toString());
}

String[] split = new String[stringStack.size()];        //copy the stack of tokens to an array
for (int i = 0; i < split.length; i++) {
    split[split.length - 1 - i] = (String) stringStack.pop();
}

stringStack = null;                                     //uhm?...
buffer = null;

//        System.out.println("There are " + split.length + " Words");
return split;                                           //return the array of tokens.

}

Issues:

There is a perfectly good built-in string tokenizer, java.util.StringTokenizer.
The code allocates a new StringBuffer for each token! It should simply reset the length of the StringBuffer (setLength(0)).
The nested ifs inside the loop can be written more efficiently, or at least more readably.
The tokens are copied to an array for return. Callers should be satisfied with being passed some structure that can be iterated. If you need an array, you can copy it outside of this function. This may save considerable memory and CPU resources.

Probably all of these issues could be resolved by simply using the built-in java.util.StringTokenizer.
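
For what it's worth, a minimal sketch of the StringTokenizer route could look like the following. TokenizerSketch and tokens are hypothetical names, and the method returns a List rather than an array, in line with the last point above. Note the assumption baked in: StringTokenizer never returns empty tokens, so this roughly corresponds to ignoreEmpty being true, except that whitespace-only tokens are still returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical class name; a sketch, not a drop-in replacement.
class TokenizerSketch {

    static List<String> tokens(String toSplit, char delim) {
        // The second argument is the set of delimiter characters.
        StringTokenizer tokenizer = new StringTokenizer(toSplit, String.valueOf(delim));
        List<String> result = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [foo, bar, foobar]
        System.out.println(tokens("foo,,bar,foobar", ','));
    }
}
```

A caller that really needs an array can still call toArray(new String[0]) on the result.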


戏蝶舞 2024-08-24 13:20:42

This piece of code splits a string into substrings based on a given delimiter. For example, the string:

String str = "foo,bar,foobar";
String[] strArray = split(str, ',', true);

would get returned as this array of strings:

strArray ==> [ "foo", "bar", "foobar" ];


public String[] split(String toSplit, char delim, boolean ignoreEmpty) {

    StringBuffer buffer = new StringBuffer();
    Stack stringStack = new Stack();

    // Loop through each char in the string (so 'f', then 'o', then 'o' etc).
    for (int i = 0; i < toSplit.length(); i++) {
        if (toSplit.charAt(i) != delim) {
            // If the char at the current position in the string does not equal 
            // the delimiter, add this char to the string buffer (so we're 
            // building up another string that consists of letters between two 
            // of the 'delim' characters).
            buffer.append((char) toSplit.charAt(i));
        } else {
            // If the string is just whitespace or has length 0 and we are 
            // removing empty strings, do not include this substring
            if (buffer.toString().trim().length() == 0 && ignoreEmpty) {
            } else {
                // It's not empty, add this substring to a stack of substrings.
                stringStack.addElement(buffer.toString());
            }
            // Reset the buffer for the next substring.
            buffer = new StringBuffer();
        }
    }

    if (buffer.length() !=0) {
        // Make sure to add the last buffer/substring to the stack!
        stringStack.addElement(buffer.toString());
    }

    // Make an array of string the size of the stack (the number of substrings found)
    String[] split = new String[stringStack.size()];
    for (int i = 0; i < split.length; i++) {
        // Pop off each substring we found and add it into the array we are returning.
        // Fill up the array backwards, as we are taking values off a stack.
        split[split.length - 1 - i] = (String) stringStack.pop();
    }

    // Unnecessary, but clears the variables
    stringStack = null;
    buffer = null;

//        System.out.println("There are " + split.length + " Words");
    return split;
}
