拥有带有路径的地图如何将其与给定路径进行比较?

发布于 2024-11-07 03:35:07 字数 685 浏览 3 评论 0原文

我们有 boost 路径到字符串对的映射,例如 name:location (绝对位置路径,如 usr/myfolder/)。我们获得了 usr/myfolder/mysubfolder/myfile 中的某个位置。如何找到哪个地图位置最适合给定的网址?

例如,我们有一个地图,如果需要,我们可以使用它:

service1:myfolder/
service2:myfolder/mysubfolder/
service3:myfolder/myothersubfolder/
service4:myfolder/mysubfolder/myfile

我们被赋予值myfolder/mysubfolder/myfile/blablabla/(路径)。 我们想要找出它与地图中的哪个项目最相关。 搜索结果应为 service4 作为具有最相关内容的地图项。

那么如何根据给定的字符串值找到与哪个映射元素最相关的呢?

所以原始问题是关于一般字符串情况,但我进行了一些重新配置,所以不,我只是在升压路径上工作。

We have map of boost path to string pairs like name:location (absolute location paths a la usr/myfolder/). We are given with some location a la usr/myfolder/mysubfolder/myfile. How to find which of maps location fit to given url most?

Example we have a map like which we can resort if we need:

service1:myfolder/
service2:myfolder/mysubfolder/
service3:myfolder/myothersubfolder/
service4:myfolder/mysubfolder/myfile

We are given value myfolder/mysubfolder/myfile/blablabla/ (path).
We want to find out to which item in our map it relates the most.
Search result shall be service4 as map item with most related content.

So how to find by given string value to which map element it relates the most?

So original question was about general string case but I had some reconfiguration so no I just work on boost paths.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

笑叹一世浮沉 2024-11-14 03:35:07

我确实没有现成的 C++ 答案,但最近我不得不在 C# 中做一些类似的事情,并提出以下建议:

循环遍历整个向量,检查有趣的路径以查看它是否以元素开头。此类比赛中最长的获胜者。这将是一个 O(n) 操作,具体取决于比较集中的路径数量。

我对上述内容的改进版本变得有点不同,因为我将检查我之前已经检查过的一些条目。

因此,我按路径长度降序对向量进行排序,这样我遇到的第一个匹配项也是最好的(我认为给我一个平均 O(n/2) 操作),并将结果存储到字典中,所以我不需要再次强力搜索。

希望这有帮助!

I don't really have a ready C++ answer, but I had to do something similar in C# recently, and came up with the following:

Loop through the whole vector, checking the interesting path to see if it begins with an element. The longest such match is the winner. This would be an O(n) operation, depending upon the number of paths in the comparison set.

My refined version of the above became a little different, because I was going to be checking against a number of entries I'd already checked before.

So, I sorted the vector by descending length of path, so that the first match I come across would also be the best (giving me an average O(n/2) operation, I think), and stored results into a dictionary, so I wouldn't need to brute force the search again.

Hope this helps!

來不及說愛妳 2024-11-14 03:35:07

您可以使用 Levenshtein 距离

编辑

因为我自己终于需要类似的东西,这个问题仍然悬而未决。这是我使用过的一些代码。既可以直接计算字符串距离,也可以将 Levenshtein 算法应用于路径标记。

C++代码

/*
----- String based Levenshtein ----
Matching : this/is/a/strange/path

0 : this/is/a/strange/path
2 : this/is/a/strong/path
2 : that/is/a/strange/path
4 : is/this/a/strange/path
5 : this/is/a/strange
13 : this/is/a
15 : this/is
16 : that/was
18 : this
24 : completely/different/folder

----- Path based Levenshtein ----
Matching : this/is/a/strange/path

0 : this/is/a/strange/path
1 : this/is/a/strange
2 : this/is/a/strong/path
2 : that/is/a/strange/path
2 : this/is/a
2 : is/this/a/strange/path
3 : this/is
4 : this
7 : that/was
8 : completely/different/folder
*/

#include <string>
#include <vector>
#include <algorithm>
#include <stdio.h>

/// returns the smaller of two parameters
template< typename T >
    T levmin( T v1, T v2 )
    {   return ( v1 < v2 ) ? v1 : v2; }

/// Returns the Levenshtein distance between the specified strings
template < typename T, typename T_STR > 
    typename T_STR::size_type levstr(const T_STR &s1, const T_STR &s2)
{
    typename T_STR::size_type l1 = s1.length(), l2 = s2.length();
    std::vector< typename T_STR::size_type > d( ( l1 + 1 ) * ( l2 + 1 ) );

    typename T_STR::size_type i, j;
    for ( i = 0; i <= l1; i++ )
        d[ i * l2 ] = i;

    for ( i = 0; i <= l2; i++ )
        d[ i ] = i;

    for ( i = 1; i <= l1; i++ )
        for ( j = 1; j <= l2; j++ )
            d[ i * l2 + j ] = levmin( levmin( d[ ( i - 1 ) * l2 + j ] + 1, d[ i * l2 + ( j - 1 ) ] + 1 ),
                                      d[ ( i - 1 ) * l2 + ( j - 1 ) ] + ( s1[ i - 1 ] == s2[ j - 1 ] ? 0 : 1 ) 
                                    );

    return d[ ( l1 * l2 ) + l2 ];       
}

/// Returns the Levenshtein distance between the specified split paths
template < typename T, typename T_STR, typename T_LST > 
    typename T_STR::size_type levpath(const T_LST &lst1, const T_LST &lst2)
{
    typename T_STR::size_type l1 = lst1.size(), l2 = lst2.size();
    std::vector< typename T_STR::size_type > d( ( l1 + 1 ) * ( l2 + 1 ) );

    typename T_STR::size_type i, j;
    for ( i = 0; i <= l1; i++ )
        d[ i * l2 ] = i;

    for ( i = 0; i <= l2; i++ )
        d[ i ] = i;

    for ( i = 1; i <= l1; i++ )
        for ( j = 1; j <= l2; j++ )
            d[ i * l2 + j ] = levmin( levmin( d[ ( i - 1 ) * l2 + j ] + 1, d[ i * l2 + ( j - 1 ) ] + 1 ),
                                      d[ ( i - 1 ) * l2 + ( j - 1 ) ] + levstr< T, T_STR>( lst1[ i - 1 ], lst2[ j - 1 ] )
                                    );

    return d[ ( l1 * l2 ) + l2 ];       
}

/// Generic split path function
/*
    Returns a vector of path tokens
*/
template < typename T, typename T_STR, typename T_LST >
    T_LST splitpath( const T_STR &s, const T sep )
    {   T_LST lst;
        typename T_STR::size_type i = 0, l = 0;
        while( T_STR::npos != ( i = s.find_first_of( sep, i ) ) )
        {   if ( l < i )
                lst.push_back( T_STR( s, l, i - l ) );
            l = ++i;
        } // end while
        if ( l < s.length() )
            lst.push_back( T_STR( s, l, s.length() - l ) );
        return lst;
    }

/// Generic join path function
/*
    Returns a string of joined vector tokens
*/
template < typename T, typename T_STR, typename T_LST >
    T_STR joinpath( const T_LST &p, const T sep )
    {   T_STR s;
        for ( typename T_LST::const_iterator it = p.begin(); it != p.end(); it++ )
        {   if ( s.length() ) s += sep; s += *it; }
        return s;
    }


// String types
typedef char t_levchar;
typedef std::basic_string< t_levchar > t_levstr;
typedef std::vector< t_levstr > t_levpath;
typedef std::vector< t_levpath > t_levpathlist;

// Sort compare for split paths
template< typename T, typename T_STR, typename T_LST > struct levcmp 
{   levcmp( const T_LST &p ) { m_p = p; }
    bool operator() ( const T_LST &i, const T_LST &j ) 
    { return levpath< T, T_STR, T_LST >( i, m_p ) < levpath< T, T_STR, T_LST >( j, m_p ); }
    T_LST m_p;
};

// Sort compare for strings
template< typename T, typename T_STR > struct levcmp_str 
{   levcmp_str( const T_STR &s ) { m_s = s; }
    bool operator() ( const T_STR &i, const T_STR &j ) 
    { return levstr< T, T_STR >( i, m_s ) < levstr< T, T_STR >( j, m_s ); }
    T_STR m_s;
};

int main( int argc, char* argv[] )
{
    // Path to compare with
    const t_levchar* compare_path = "this/is/a/strange/path";

    // Paths to sort
    const t_levchar* path_list[] = 
    { 
        "this/is/a/strong/path",
        "that/is/a/strange/path",
        "this/is/a/strange",
        "this/is",
        "this/is/a",
        "this",
        "this/is/a/strange/path", 
        "is/this/a/strange/path", 
        "that/was",
        "completely/different/folder",
        0
    };

    printf( "\n----- String based Levenshtein ----\n"
            "Matching : %s\n\n", compare_path );

    // Create vector from paths         
    std::vector< t_levstr > paths;
    for( int i = 0; path_list[ i ]; i++ ) 
        paths.push_back( path_list[ i ] );

    // Sort the paths
    std::sort( paths.begin(), paths.end(), levcmp_str< t_levchar, t_levstr >( compare_path ) );

    // Show the result
    for ( std::vector< t_levstr >::iterator it = paths.begin(); it != paths.end(); it++ )
        printf( "%d : %s\n", 
                (int)levstr< t_levchar, t_levstr >( *it, compare_path ),
                (const char*)it->c_str() );

    printf( "\n----- Path based Levenshtein ----\n"
            "Matching : %s\n\n", compare_path );

    // Create vector from split paths
    t_levpath splitcompare = splitpath< t_levchar, t_levstr, t_levpath >( compare_path, '/' );
    t_levpathlist splitpaths;
    for ( int i = 0; path_list[ i ]; i++ ) 
        splitpaths.push_back( splitpath< t_levchar, t_levstr, t_levpath >( path_list[ i ], '/' ) );

    // Sort the paths
    std::sort( splitpaths.begin(), splitpaths.end(),
                levcmp< t_levchar, t_levstr, t_levpath >( splitcompare ) );

    // Show the results
    for ( t_levpathlist::iterator it = splitpaths.begin(); it != splitpaths.end(); it++ )
        printf( "%d : %s\n", 
                (int)levpath< t_levchar, t_levstr, t_levpath >( *it, splitcompare ),
                (const char*)joinpath< t_levchar, t_levstr, t_levpath >( *it, '/' ).c_str() );

    return 0;
}

You could use the Levenshtein distance

EDIT

Since I finally needed something similar myself, and this question remains open. Here is some code I played around with. Both straight up string distance and also applying the Levenshtein algorithm to the path tokens.

C++ Code

/*
----- String based Levenshtein ----
Matching : this/is/a/strange/path

0 : this/is/a/strange/path
2 : this/is/a/strong/path
2 : that/is/a/strange/path
4 : is/this/a/strange/path
5 : this/is/a/strange
13 : this/is/a
15 : this/is
16 : that/was
18 : this
24 : completely/different/folder

----- Path based Levenshtein ----
Matching : this/is/a/strange/path

0 : this/is/a/strange/path
1 : this/is/a/strange
2 : this/is/a/strong/path
2 : that/is/a/strange/path
2 : this/is/a
2 : is/this/a/strange/path
3 : this/is
4 : this
7 : that/was
8 : completely/different/folder
*/

#include <string>
#include <vector>
#include <algorithm>
#include <stdio.h>

/// returns the smaller of two parameters
template< typename T >
    T levmin( T v1, T v2 )
    {   return ( v1 < v2 ) ? v1 : v2; }

/// Returns the Levenshtein distance between the specified strings
template < typename T, typename T_STR > 
    typename T_STR::size_type levstr(const T_STR &s1, const T_STR &s2)
{
    typename T_STR::size_type l1 = s1.length(), l2 = s2.length();
    std::vector< typename T_STR::size_type > d( ( l1 + 1 ) * ( l2 + 1 ) );

    typename T_STR::size_type i, j;
    for ( i = 0; i <= l1; i++ )
        d[ i * l2 ] = i;

    for ( i = 0; i <= l2; i++ )
        d[ i ] = i;

    for ( i = 1; i <= l1; i++ )
        for ( j = 1; j <= l2; j++ )
            d[ i * l2 + j ] = levmin( levmin( d[ ( i - 1 ) * l2 + j ] + 1, d[ i * l2 + ( j - 1 ) ] + 1 ),
                                      d[ ( i - 1 ) * l2 + ( j - 1 ) ] + ( s1[ i - 1 ] == s2[ j - 1 ] ? 0 : 1 ) 
                                    );

    return d[ ( l1 * l2 ) + l2 ];       
}

/// Returns the Levenshtein distance between the specified split paths
template < typename T, typename T_STR, typename T_LST > 
    typename T_STR::size_type levpath(const T_LST &lst1, const T_LST &lst2)
{
    typename T_STR::size_type l1 = lst1.size(), l2 = lst2.size();
    std::vector< typename T_STR::size_type > d( ( l1 + 1 ) * ( l2 + 1 ) );

    typename T_STR::size_type i, j;
    for ( i = 0; i <= l1; i++ )
        d[ i * l2 ] = i;

    for ( i = 0; i <= l2; i++ )
        d[ i ] = i;

    for ( i = 1; i <= l1; i++ )
        for ( j = 1; j <= l2; j++ )
            d[ i * l2 + j ] = levmin( levmin( d[ ( i - 1 ) * l2 + j ] + 1, d[ i * l2 + ( j - 1 ) ] + 1 ),
                                      d[ ( i - 1 ) * l2 + ( j - 1 ) ] + levstr< T, T_STR>( lst1[ i - 1 ], lst2[ j - 1 ] )
                                    );

    return d[ ( l1 * l2 ) + l2 ];       
}

/// Generic split path function
/*
    Returns a vector of path tokens
*/
template < typename T, typename T_STR, typename T_LST >
    T_LST splitpath( const T_STR &s, const T sep )
    {   T_LST lst;
        typename T_STR::size_type i = 0, l = 0;
        while( T_STR::npos != ( i = s.find_first_of( sep, i ) ) )
        {   if ( l < i )
                lst.push_back( T_STR( s, l, i - l ) );
            l = ++i;
        } // end while
        if ( l < s.length() )
            lst.push_back( T_STR( s, l, s.length() - l ) );
        return lst;
    }

/// Generic join path function
/*
    Returns a string of joined vector tokens
*/
template < typename T, typename T_STR, typename T_LST >
    T_STR joinpath( const T_LST &p, const T sep )
    {   T_STR s;
        for ( typename T_LST::const_iterator it = p.begin(); it != p.end(); it++ )
        {   if ( s.length() ) s += sep; s += *it; }
        return s;
    }


// String types
typedef char t_levchar;
typedef std::basic_string< t_levchar > t_levstr;
typedef std::vector< t_levstr > t_levpath;
typedef std::vector< t_levpath > t_levpathlist;

// Sort compare for split paths
template< typename T, typename T_STR, typename T_LST > struct levcmp 
{   levcmp( const T_LST &p ) { m_p = p; }
    bool operator() ( const T_LST &i, const T_LST &j ) 
    { return levpath< T, T_STR, T_LST >( i, m_p ) < levpath< T, T_STR, T_LST >( j, m_p ); }
    T_LST m_p;
};

// Sort compare for strings
template< typename T, typename T_STR > struct levcmp_str 
{   levcmp_str( const T_STR &s ) { m_s = s; }
    bool operator() ( const T_STR &i, const T_STR &j ) 
    { return levstr< T, T_STR >( i, m_s ) < levstr< T, T_STR >( j, m_s ); }
    T_STR m_s;
};

int main( int argc, char* argv[] )
{
    // Path to compare with
    const t_levchar* compare_path = "this/is/a/strange/path";

    // Paths to sort
    const t_levchar* path_list[] = 
    { 
        "this/is/a/strong/path",
        "that/is/a/strange/path",
        "this/is/a/strange",
        "this/is",
        "this/is/a",
        "this",
        "this/is/a/strange/path", 
        "is/this/a/strange/path", 
        "that/was",
        "completely/different/folder",
        0
    };

    printf( "\n----- String based Levenshtein ----\n"
            "Matching : %s\n\n", compare_path );

    // Create vector from paths         
    std::vector< t_levstr > paths;
    for( int i = 0; path_list[ i ]; i++ ) 
        paths.push_back( path_list[ i ] );

    // Sort the paths
    std::sort( paths.begin(), paths.end(), levcmp_str< t_levchar, t_levstr >( compare_path ) );

    // Show the result
    for ( std::vector< t_levstr >::iterator it = paths.begin(); it != paths.end(); it++ )
        printf( "%d : %s\n", 
                (int)levstr< t_levchar, t_levstr >( *it, compare_path ),
                (const char*)it->c_str() );

    printf( "\n----- Path based Levenshtein ----\n"
            "Matching : %s\n\n", compare_path );

    // Create vector from split paths
    t_levpath splitcompare = splitpath< t_levchar, t_levstr, t_levpath >( compare_path, '/' );
    t_levpathlist splitpaths;
    for ( int i = 0; path_list[ i ]; i++ ) 
        splitpaths.push_back( splitpath< t_levchar, t_levstr, t_levpath >( path_list[ i ], '/' ) );

    // Sort the paths
    std::sort( splitpaths.begin(), splitpaths.end(),
                levcmp< t_levchar, t_levstr, t_levpath >( splitcompare ) );

    // Show the results
    for ( t_levpathlist::iterator it = splitpaths.begin(); it != splitpaths.end(); it++ )
        printf( "%d : %s\n", 
                (int)levpath< t_levchar, t_levstr, t_levpath >( *it, splitcompare ),
                (const char*)joinpath< t_levchar, t_levstr, t_levpath >( *it, '/' ).c_str() );

    return 0;
}
下壹個目標 2024-11-14 03:35:07
//I don't know what path is, I'll pretend it's a std::string
//algorithm is the same, just functions change
const std::vector<boost::path>& services;
const std::string& findme = ???

std::vector<boost::path>::iterator result = services.end();
for(auto i = services.begin(); i != services.end(); ++i) {
     //if this path is shorter than the best so far, skip it
     if (result == services.end() || i->second.size()>result->second.size()) {
         const size_t colonpos = i->second.find(':', 8); //find colon
         //if this path is longer than findme, skip it
         if (findme.size() >= i->second.size()-colonpos) {
             //make sure they match, in _reverse_ order (faster in this case)
             int o=i->second.size()-1;
             for( ; o>=colonpos; --o) {
                 if (i->second[o] != findme[o-colonpos])
                     break;
             }
             //if it was a match, this is the new best result
             if (o < colonpos)
                 result = i;
         }
     }
}

//either result is services.end() meaning none matched
//or result is the _longest_ path that matched

显然,boost::path不是std::string,但可能有一个成员来获取std::string或类似的对象,因此您只需将该成员添加到 i->secondresult->second

//I don't know what path is, I'll pretend it's a std::string
//algorithm is the same, just functions change
const std::vector<boost::path>& services;
const std::string& findme = ???

std::vector<boost::path>::iterator result = services.end();
for(auto i = services.begin(); i != services.end(); ++i) {
     //if this path is shorter than the best so far, skip it
     if (result == services.end() || i->second.size()>result->second.size()) {
         const size_t colonpos = i->second.find(':', 8); //find colon
         //if this path is longer than findme, skip it
         if (findme.size() >= i->second.size()-colonpos) {
             //make sure they match, in _reverse_ order (faster in this case)
             int o=i->second.size()-1;
             for( ; o>=colonpos; --o) {
                 if (i->second[o] != findme[o-colonpos])
                     break;
             }
             //if it was a match, this is the new best result
             if (o < colonpos)
                 result = i;
         }
     }
}

//either result is services.end() meaning none matched
//or result is the _longest_ path that matched

Obviously, boost::path isn't a std::string, but probably has a member to get a std::string or similar object, so you'll just have to add that member to i->second and result->second

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文