
Splitting a string in DXL

Posted 2024-07-25 15:40:50, 189 characters, 12 views, 0 comments

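I have a string.

Example: "We prefer questions that can be answered; not just discussed"

Now I want to split this string at ";" into "We prefer questions that can be answered" and "not just discussed".

Is this possible in DXL?

I am learning DXL, so I don't know whether a string can be split at all.

Note: this is not homework.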


两人的回忆 2024-08-01 15:40:50

I'm sorry for necroing this post. Being new to DXL, I spent some time on the same challenge. I noticed that the implementations available online have different specifications of "splitting" a string. Loving the Ruby language, I missed an implementation that comes at least close to Ruby's String#split.
Maybe my findings will be helpful to somebody.

Here's a functional comparison of

Variant A: niol's implementation (which, at first glance, appears to be the same implementation usually found at Capri Soft),
Variant B: PJT's implementation,
Variant C: Brett's implementation and
Variant D: my implementation (which provides the correct functionality, imo).

To eliminate structural differences, all implementations were wrapped in functions returning a Skip list or an Array.

Splitting results

Note that all implementations return different results, depending on their definition of "splitting":

string mellow yellow; delimiter ello

    splitVariantA returns 1 elements: ["mellow yellow" ]
    splitVariantB returns 2 elements: ["m" "llow yellow" ]
    splitVariantC returns 3 elements: ["w" "w y" "" ]
    splitVariantD returns 3 elements: ["m" "w y" "w" ]

string " now's  the time" (one leading space, two spaces after now's); delimiter " " (a single space)

    splitVariantA returns 3 elements: ["now's" "the" "time" ]
    splitVariantB returns 2 elements: ["" "now's  the time" ]
    splitVariantC returns 5 elements: ["time" "the" "" "now's" "" ]
    splitVariantD returns 3 elements: ["now's" "the" "time" ]

string 1,2,,3,4,,; delimiter ,

    splitVariantA returns 4 elements: ["1" "2" "3" "4" ]
    splitVariantB returns 2 elements: ["1" "2,,3,4,," ]
    splitVariantC returns 7 elements: ["" "" "4" "3" "" "2" "" ]
    splitVariantD returns 7 elements: ["1" "2" "" "3" "4" "" "" ]

Timing

Splitting the string 1,2,,3,4,, with the pattern , for 10000 times on my machine gives these timings:

    splitVariantA() : 406 ms
    splitVariantB() : 46 ms
    splitVariantC() : 749 ms
    splitVariantD() : 1077 ms

Unfortunately, my implementation D is the slowest. Surprisingly, the regular expressions implementation C is pretty fast.
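
The post does not show how the timings were taken. A minimal sketch of one way to measure them, assuming splitVariantD() from this answer is already defined and that the undocumented getTickCount_() function is available in your DOORS version (the variable names are made up for illustration):

```dxl
// Sketch only, not the author's benchmark.
// Assumes splitVariantD() is defined above and getTickCount_() exists
// in this DOORS version (it returns a millisecond tick count).
pragma runLim, 0          // switch off the execution-limit prompt for the loop

int runs = 10000
int i
int started = getTickCount_()

for (i = 0; i < runs; i++) {
    Skip parts = splitVariantD("1,2,,3,4,,", ",")
    delete parts          // free each result so Skip lists do not pile up
}

int elapsed = getTickCount_() - started
print "splitVariantD() : " elapsed " ms\n"
```

Swapping in another variant only changes the call inside the loop; variant A returns an Array, so its result needs an Array variable instead of a Skip.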

Source code

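// niol, modified
// Compares one character at a time, so only single-character delimiters work.
// array_push_str() is the helper from the join/split answer on this page.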
Array splitVariantA(string splitter, string str){
    Array tokens = create(1, 1);
    Buffer buf = create;
    int str_index;
    buf = "";

    for(str_index = 0; str_index < length(str); str_index++){
        if( str[str_index:str_index] == splitter ){
            array_push_str(tokens, stringOf(buf));
            buf = "";
        } 
        else
            buf += str[str_index:str_index];
    }
    array_push_str(tokens, stringOf(buf));
    delete buf;
    return tokens;
}

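// PJT, modified
// Splits only at the first occurrence and skips exactly one character
// (offset + 1), so a longer delimiter leaves its tail in the second part.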
Skip splitVariantB(string s, string delimiter) {

    int offset  
    int len
    Skip skp = create

    if ( findPlainText(s, delimiter, offset, len, false)) {
        put(skp, 0, s[0 : offset -1])
        put(skp, 1, s[offset +1 :])
    }

    return skp  
}

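// Brett, modified
// The greedy (.*) finds the last delimiter first, so parts are stored in
// reverse order. delim is treated as a regular expression, not plain text.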
Skip splitVariantC (string s, string delim) {

    Skip skp = create
    int i = 0
    Regexp split = regexp "^(.*)" delim "(.*)$"
    while (split s) {
        string temp_s = s[match 1]
        put(skp, i++, s[match 2])
        s = temp_s
    }
    put(skp, i++, s[match 2])
    return  skp
}

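// Variant D
// stringStrip() and stringSqueeze() are not shown in the post; presumably they
// trim the string and collapse runs of the given character.
Skip splitVariantD(string str, string pattern) {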

    if (null(pattern) || 0 == length(pattern))
        pattern = " ";

    if (pattern == " ")
        str = stringStrip(stringSqueeze(str, ' '));

    Skip result = create;
    int i = 0; // index for searching in str
    int j = 0; // index counter for result array
    bool found = true;

    while (found) {
        // find pattern     
        int pos = 0;
        int len = 0;
        found = findPlainText(str[i:], pattern, pos, len, true);

        if (found) {
            // insert into result
            put(result, j++, str[i:i+pos-1]);
            i += pos + len;
        }
    }
    // append the rest after last found pattern
    put(result, j, str[i:]);

    return result;
}
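
The result lines above can be reproduced with a small print helper. This is only a sketch: printSkip is a hypothetical name that does not appear in the original post, and it assumes the Skip list uses int keys 0..n-1 with string values, as variants B, C and D do (variant A returns an Array and would need its own helper).

```dxl
// Hypothetical helper, not part of the original answer.
// Assumes int keys 0..n-1 and string values.
void printSkip(string label, Skip skp) {
    Buffer out = create
    string part
    int n = 0

    // find() returns false for the first missing key, which ends the loop
    while (find(skp, n, part)) {
        out += "\""
        out += part
        out += "\" "
        n++
    }

    // copy the text out before freeing the buffer
    string joined = stringOf(out)
    delete out
    print label " returns " n " elements: [" joined "]\n"
}

printSkip("splitVariantD", splitVariantD("1,2,,3,4,,", ","))
```

The helper has to be defined after the split functions it is used with, since DXL needs a function to be declared before it is called.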


幸福丶如此 2024-08-01 15:40:50

A quick join and split I could come up with. Seems to work okay.

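// Counts entries until the first null one. Judging by the results reported
// for this implementation elsewhere on this page, an empty string counts as
// null too, so empty tokens never show up in the split result.
int array_size(Array a){
    int size = 0;
    while( !null(get(a, size, 0) ) )
        size++;
    return size;
}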

void array_push_str(Array a, string str){
    int array_index = array_size(a);

    put(a, str, array_index, 0);
}

string array_get_str(Array a, int index){
    return (string get(a, index, 0));
}

string str_join(string joiner, Array str_array){
    Buffer joined = create;
    int array_index = 0;

    joined += "";

    for(array_index = 0; array_index < array_size(str_array); array_index++){
        joined += array_get_str(str_array, array_index);
        if( array_index + 1 < array_size(str_array) )
            joined += joiner;
    }

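    // copy the result out, then free the buffer so it does not leak
    string result = stringOf(joined);
    delete joined;
    return result;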
}

Array str_split(string splitter, string str){
    Array tokens = create(1, 1);
    Buffer buf = create;
    int str_index;

    buf = "";

    for(str_index = 0; str_index < length(str); str_index++){
        if( str[str_index:str_index] == splitter ){
            array_push_str(tokens, stringOf(buf));
            buf = "";
        }else{
            buf += str[str_index:str_index];
        }
    }
    array_push_str(tokens, stringOf(buf));

    delete buf;
    return tokens;
}


小傻瓜 2024-08-01 15:40:50

If you only split the string once, this is how I would do it:

string s = "We prefer questions that can be answered; not just discussed"
string sub = ";"
int offset
int len

if (findPlainText(s, sub, offset, len, false)) {
    /* Subtract one from the offset and add the match length so that the
       delimiter is left out of the output. The first print is the prefix,
       the second is the suffix. */
    print s[0 : offset - 1]
    print s[offset + len :]
} else {
    // no delimiter found
    print "Failed to match"
}

You could also use regular expressions; refer to the DXL reference manual. Regular expressions are the better choice if you want to split the string at several delimiters, as in str = "this ; is an;example".


抚你发端 2024-08-01 15:40:50

ACTUALLY WORKS:

This solution will split as many times as needed, or none, if the delimiter doesn't exist in the string.

This is what I have used instead of a traditional "split" command.
It actually skips the creation of an array, and just loops through each string that would be in the array and calls "someFunction" on each of those strings.

string s = "We prefer questions that can be answered; not just discussed"

// for this example, ";" is used as the delimiter
Regexp split = regexp "^(.*);(.*)$"

// while a ";" exists in s
while (split s) {

    // save the text before the last ";"
    string temp_s = s[match 1]

    // call someFunction on the text after the last ";"
    someFunction(s[match 2])

    // remove the text after the last ";" (including ";")
    s = temp_s
}

// call someFunction again for the last (or only) string
someFunction(s)

Sorry for necroing an old post; I just didn't find the other answers useful.


帅气尐潴 2024-08-01 15:40:50

Perhaps someone will find this fused solution handy as well. It splits a string into a Skip list based on a delimiter, which can be longer than one character.

Skip splitString(string s1, string delimit)
{
    int offset, len
    int i = 0
    Skip splited = create

    while (findPlainText(s1, delimit, offset, len, false))
    {
        // key by position so the parts keep their order
        put(splited, i++, s1[0:offset-1])
        // continue after the delimiter
        s1 = s1[offset+len:]
    }

    // keep the remainder after the last delimiter, if any
    if (length(s1) > 0)
    {
        put(splited, i, s1)
    }

    return splited
}
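
A short usage sketch for the function above, assuming splitString() is already defined in the same script. The variable names parts and part and the sample input are made up for illustration.

```dxl
// Sketch only: walks over the values stored in the returned Skip list.
Skip parts = splitString("alpha::beta::gamma", "::")
string part

// "for ... in skip do" visits each stored value in key order
for part in parts do {
    print part "\n"
}

delete parts   // free the Skip list once it is no longer needed
```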


情泪▽动烟 2024-08-01 15:40:50

I tried this out and it worked for me:

string s = "We prefer questions that can be answered,not just discussed,hiyas"
string sub = ","
int offset
int len
string s1=s

while(length(s1)>0){
    if ( findPlainText(s1, sub, offset, len, false)) {
        print s1[0 : offset -1]"\n"
        s1= s1[offset+1:length(s1)]
    }
    else
    {
        print s1
        s1=""
    }
}


っ左 2024-08-01 15:40:50

Here is a better implementation. It splits the string repeatedly, in nested loops, by searching for a keyword.

pragma runLim, 10000
string s = "We prefer questions that can be answered,not just discussed,hiyas;
Next Line,Var1,Nemesis;
Next Line,Var2,Nemesis1;
Next Line,Var3,Nemesis2;
New,Var4,Nemesis3;
Next Line,Var5,Nemesis4;
New,Var5,Nemesis5;"
string sub = "," 
int offset
int len

string searchkey=null
string curr=s
string nxt=s
string searchline=null
string Modulename=""
string Attributename=""
string Attributevalue=""

while(findPlainText(curr,"Next Line", offset,len,false))
{
    int intlen=offset

    searchkey=curr[offset:length(curr)]

    if(findPlainText(searchkey,"Next Line",offset,len,false))
    {
        curr=searchkey[offset+1:length(searchkey)]
    }

    if(findPlainText(searchkey,";",offset,len,false))
    {       
        searchline=searchkey[0:offset]  
    }

    int counter=0
    while(length(searchline)>0)
    {   
        if (findPlainText(searchline, sub, offset, len, false))
        {
            if(counter==0)
            {
                Modulename=searchline[0 : offset -1]
                counter++
            }
            else if(counter==1)
            {
                Attributename=searchline[0 : offset -1]
                counter++
            }
            searchline= searchline[offset+1:length(searchline)]
        }
        else
        {

            if(counter==2)
            {
                Attributevalue=searchline[0:length(searchline)-2]
                counter++
            }
            searchline=""
        }       
    }
    print "Modulename="Modulename " Attributename=" Attributename " Attributevalue= "Attributevalue "\n"
}


天荒地未老 2024-08-01 15:40:50

None of the above solutions worked for me, so I wrote my own:

// Splits a string by a delimiter; empty parts are dropped
// (warning: "//aaaa//bbbbb///cccc" with delimiter "/" --> "aaaa", "bbbbb", "cccc")
//
// -->| in's:
//      s - the string to split
//      delimiter - the substring at which the string is divided
//      skp - skip list in which the parts are stored
// |--> out's:
//      none
void splitVariant(string s, string delimiter, Skip skp) {

    int offset  
    int len
    int i = 0
    
    if (!findPlainText(s, delimiter, offset, len, false))
    {
        return
    }
    
    
    while ( findPlainText(s, delimiter, offset, len, false)) {
        
        bool pass = false
        
        if (s[0 : offset - 1] == "")
        {
            pass = true
        }
        
        if (!pass)
        {
            put(skp, i, s[0 : offset - 1])
            i++ 
        }
        
        s = s[offset + len :]
    }
    
    bool pass = false
    
    if (s == "")
    {
        pass = true
    }
    
    if (!pass)
    {
        put(skp, i, s)
    }
    
    return
}

Results (countSkip is custom too):

Skip otpt = create
splitVariant("//aaaaaa//ffadsfasdf/ddddddddddds/asdassf//fsdfsdfffffffffff", "//", otpt)

string o

print "\n ******** \n"

int i

for (i = 0; i < countSkip(otpt); i++)
{
    find(otpt, i, o)
    print "This is " i " element of skip list: " o "\n"
}

Output:

******** 
This is 0 element of skip list: aaaaaa
This is 1 element of skip list: ffadsfasdf/ddddddddddds/asdassf
This is 2 element of skip list: fsdfsdfffffffffff

********
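
The countSkip helper used in the loop above is not shown in the answer. A minimal sketch of what it might look like, assuming the Skip list stores string values as splitVariant does (this body is a guess, not the author's code):

```dxl
// Hypothetical stand-in for the countSkip() helper the answer mentions
// but does not show. Assumes the Skip stores string values.
int countSkip(Skip skp) {
    int n = 0
    string part

    // "for ... in skip do" visits every stored value once
    for part in skp do {
        n++
    }

    return n
}
```

It has to be defined before the loop that calls it.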
