Assigning numbers to entries in a file in C#

Posted on 2024-11-06 06:06:20

None of the approaches I tried earlier helped me cluster my log file data, so now I am going to try an indexing approach. For that I need to index each log file entry based on the keyword that appears in its URL field.

Example:

192.162.1.4 [3/May/2009 00:34:45] "GET /books/casual/4534.pdf" 200 454353 "http://ljdhjg.com" "Mozillablahblah"
190.100.1.4 [3/May/2009 00:37:45] "GET /resources/help.pdf" 200 4353 "http://ljdhjg.com" "Mozillablahblah"
192.162.1.4 [3/May/2009 00:40:45] "GET /books/serious/44.pdf" 200 234353 "http://ljdhjg.com" "Mozillablahblah"

...and I have thousands more entries like this.

Every "books" entry needs to be assigned a number, say 1, and every "resources" entry needs to be assigned 2. How do I go about accomplishing this in C#? I know the logic:

Extract the keyword, assign it a number, then compare the keyword array with each line of the file and assign the number on a match. But since I am new to C#, I don't really know how to code this logic. Any help?


Comments (1)

黑色毁心梦 2024-11-13 06:06:20

You can try this ad hoc approach to assign the numbers (I am assuming that "assign" means prefixing the index to the log entry):

/// <summary>
/// Gets the indexed entry.
/// </summary>
/// <param name="entry">The log entry.</param>
/// <returns>The log entry prefixed with the index.</returns>
private string GetIndexedEntry(string entry)
{
    string keyword = GetKeyword(entry);

    switch (keyword)
    {
        case "books":
            entry = "1 : " + entry;
            break;

        case "resources":
            entry = "2 : " + entry;
            break;
    }

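    // Alternative code (using the dictionary); TryGetValue avoids an exception for unknown keywords
    // int number;
    // if (keyword != null && keywordIndexDictionary.TryGetValue(keyword, out number))
    //     entry = number + " : " + entry;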

    return entry;
}

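/// <summary>
/// Gets the keyword.
/// </summary>
/// <param name="entry">The log entry.</param>
/// <returns>The keyword for the specified log entry, or null if none is found.</returns>
private string GetKeyword(string entry)
{
    const string marker = "\"GET";
    int index = entry.IndexOf(marker);
    if (index < 0)
    {
        return null; // the line contains no GET request
    }

    // Keep only the request path between the marker and the closing quote.
    entry = entry.Substring(index + marker.Length);
    index = entry.IndexOf('"');
    if (index >= 0)
    {
        entry = entry.Substring(0, index);
    }

    // "/books/casual/4534.pdf" splits into "", "books", "casual", "4534.pdf".
    string[] segments = entry.Trim().Split('/');
    return segments.Length > 1 ? segments[1] : null;
}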

// Alternative approach (requires "using System.Collections.Generic;" at the top of the file)
/// <summary>
/// Stores the keyword-index pairs.
/// </summary>
private Dictionary<string, int> keywordIndexDictionary = new Dictionary<string, int>();

/// <summary>
/// Builds the dictionary.
/// </summary>
private void BuildDictionary()
{
    // Build the dictionary manually,
    // or alternatively read the pairs from a settings file and then build the dictionary.
    keywordIndexDictionary.Add("books", 1);
    keywordIndexDictionary.Add("resources", 2);
}
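
The comment in BuildDictionary() mentions reading the pairs from a settings file. Here is a minimal sketch of how that might look. It assumes a plain-text file with one keyword=number pair per line; that file format and the names KeywordSettings, Parse and Load are made up for illustration and are not part of the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class KeywordSettings
{
    // Parses lines such as "books=1" into a keyword -> number dictionary.
    // The "keyword=number" format is an assumption made for this sketch.
    public static Dictionary<string, int> Parse(IEnumerable<string> lines)
    {
        var result = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in lines)
        {
            string[] parts = line.Split('=');
            int number;
            // Skip blank or malformed lines instead of throwing.
            if (parts.Length == 2
                && parts[0].Trim().Length > 0
                && int.TryParse(parts[1].Trim(), out number))
            {
                result[parts[0].Trim()] = number;
            }
        }
        return result;
    }

    // Hypothetical helper: loads the pairs from a file on disk.
    public static Dictionary<string, int> Load(string path)
    {
        return Parse(File.ReadLines(path));
    }
}
```

The result of KeywordSettings.Load(...) could then be assigned to keywordIndexDictionary instead of adding the pairs by hand.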

A call to GetIndexedEntry() would look like this:

string indexedLogEntry = GetIndexedEntry(logEntry);

where logEntry is the string representing each entry in the log file.

For a logEntry of

192.162.1.4 [3/May/2009 00:34:45] "GET /books/casual/4534.pdf" 200 454353 "http://ljdhjg.com" "Mozillablahblah"

the indexedLogEntry would be

1 : 192.162.1.4 [3/May/2009 00:34:45] "GET /books/casual/4534.pdf" 200 454353 "http://ljdhjg.com" "Mozillablahblah"
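
With thousands of entries you will probably not want to hard-code every keyword. The following is a rough sketch of one way to process all lines and give each new keyword the next free number in order of first appearance. LogIndexer and IndexLines are names I made up, and the keyword extraction is passed in as a delegate so that a method like GetKeyword above can be reused.

```csharp
using System;
using System.Collections.Generic;

public static class LogIndexer
{
    // Prefixes each line with the number of its keyword. A keyword seen for
    // the first time gets the next free number (1, 2, 3, ...).
    // getKeyword is supplied by the caller and may return null.
    public static IEnumerable<string> IndexLines(
        IEnumerable<string> lines, Func<string, string> getKeyword)
    {
        var numbers = new Dictionary<string, int>();
        foreach (string line in lines)
        {
            string keyword = getKeyword(line);
            if (string.IsNullOrEmpty(keyword))
            {
                yield return line; // no keyword: leave the entry unchanged
                continue;
            }

            int number;
            if (!numbers.TryGetValue(keyword, out number))
            {
                number = numbers.Count + 1;
                numbers.Add(keyword, number);
            }
            yield return number + " : " + line;
        }
    }
}
```

From inside the class that contains GetKeyword, this could be combined with File.ReadLines and File.WriteAllLines (the file names would be your own), for example File.WriteAllLines(outputPath, LogIndexer.IndexLines(File.ReadLines(inputPath), GetKeyword)).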

A more elegant approach is possible if one uses regular expressions.
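
As a sketch of what the regular-expression version might look like: the pattern below assumes that every request of interest is a GET and that the keyword is the first path segment, as in the sample lines. LogKeyword and Extract are illustrative names, not an existing API.

```csharp
using System.Text.RegularExpressions;

public static class LogKeyword
{
    // Matches the quoted request, e.g. "GET /books/casual/4534.pdf",
    // and captures the first path segment ("books") in the group "keyword".
    private static readonly Regex RequestPattern =
        new Regex("\"GET\\s+/(?<keyword>[^/\"\\s]+)", RegexOptions.Compiled);

    // Returns the first path segment of the requested URL, or null if the
    // line does not contain a matching GET request.
    public static string Extract(string entry)
    {
        Match match = RequestPattern.Match(entry);
        return match.Success ? match.Groups["keyword"].Value : null;
    }
}
```

LogKeyword.Extract could stand in for GetKeyword; lines with other request methods (POST, HEAD and so on) would return null unless the pattern is widened.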
