Regular expression for parsing a simple text-based data file

Posted on 2024-07-21 08:43:24

Can anyone give me a hand with a touch of regex?

I'm reading in a list of "locations" for a simple text adventure (those so popular back in the day). However, I'm unsure as to how to obtain the input.

The locations all follow the format:

<location_name>, [<item>]
    [direction, location_name]

Such as:

Albus Square, Flowers, Traffic Cone
    NORTH, Franklandclaw Lecture Theatre
    WEST, Library of Enchanted Books
    SOUTH, Furnesspuff College

Library of Enchanted Books
    EAST, Albus Square
    UP, Reading Room

(Subsequent locations are separated by a blank line.)

I'm storing these as Location objects with the structure:

public class Location {

    private String name;

    private Map<Direction, Location> links;

    private List<Item> items;

}

I use a method to retrieve the data from a URL and create the Location objects from the text it reads, but I'm completely stuck on how to do this. I think regex would help. Can anyone lend me a much-needed hand?

无声静候 2024-07-28 08:43:24

Agree w/ willcodejavaforfood, regex could be used but isn't a big boost here.

Sounds like you just need a little algorithm help (sloppy p-code follows)...

currloc = null
while( line from file )
    if line is blank
        continue    // blank lines only separate locations
    if line begins w/ whitespace
        (dir, loc) = split( trim( line ), ", " )
        add dir, loc to currloc
    else
        newlocdata = split( line, ", " )
        currloc = new location named newlocdata[0]
        for i = 1 to size( newlocdata ) - 1
            item = newlocdata[i]
            add item to currloc
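
A minimal Java sketch of the loop above, under some assumptions: `ParsedLocation` is a hypothetical holder (not the asker's `Location` class), exits are kept as direction-name to location-name strings so that the real `Map<Direction, Location>` links can be resolved in a second pass once every location exists, and fields are separated by a comma with optional surrounding whitespace.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LocationParser {

    /** Hypothetical holder: exits map a direction name to a location name. */
    public static final class ParsedLocation {
        public final String name;
        public final List<String> items = new ArrayList<>();
        public final Map<String, String> exits = new LinkedHashMap<>();

        ParsedLocation(String name) {
            this.name = name;
        }
    }

    public static List<ParsedLocation> parse(String text) {
        List<ParsedLocation> result = new ArrayList<>();
        ParsedLocation current = null;
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // blank lines only separate locations
            }
            String[] parts = line.trim().split("\\s*,\\s*");
            if (Character.isWhitespace(line.charAt(0))) {
                // Indented line: "DIRECTION, location name"
                if (current == null || parts.length != 2) {
                    throw new IllegalArgumentException("Bad exit line: " + line);
                }
                current.exits.put(parts[0], parts[1]);
            } else {
                // Unindented line: location name followed by zero or more items
                current = new ParsedLocation(parts[0]);
                for (int i = 1; i < parts.length; i++) {
                    current.items.add(parts[i]);
                }
                result.add(current);
            }
        }
        return result;
    }
}
```

Keeping exits as plain strings sidesteps the forward-reference problem: "Albus Square" points at "Library of Enchanted Books" before that location has been read.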

苍景流年 2024-07-28 08:43:24

You don't want to use a text-only format for this:

What happens when you have more than a single flower item? Are they all the same? Can't an adventurer collect a bouquet by picking single flowers at several locations?
There will probably be several rooms with the same name ("cellar", "street corner"), i.e. filler rooms which add to the atmosphere but nothing to the game. They don't get a description of their own, though. How to keep them apart?
What if a name contains a comma?
Eventually, you'll want to use Unicode for foreign names or formatting instructions.

Since this is structured data which can contain lots of odd cases, I suggest using XML for this:

<locations>
    <location>
        <name>Albus Square</name>
        <summary>Short description for returning adventurer</summary>
        <description>Long text here ... with formatting, etc.</description>
        <items>
            <item>Flowers</item>
            <item>Traffic Cone</item>
        </items>
        <directions>
            <north>Franklandclaw Lecture Theatre</north>
            <west>Library of Enchanted Books</west>
            <south>Furnesspuff College</south>
        </directions>
    </location>
    <location>
        <name>Library of Enchanted Books</name>
        <directions>
            <east>Albus Square</east>
            <up>Reading Room</up>
        </directions>
    </location>
</locations>

This allows for much greater flexibility and solves a lot of issues, such as formatting description text and Unicode characters. You can also have more than one item or location with the same name by referring to them by IDs (numbers) instead of text.

Use JDom or DecentXML to parse the game config.
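
As a rough illustration of reading such a file, here is a sketch that uses the JDK's built-in DOM parser (`javax.xml.parsers`) rather than JDom or DecentXML, only so that it runs without extra dependencies. The class and method names are hypothetical, and it assumes a well-formed document (with `</items>` closed) and unique location names; with duplicate names you would key on the IDs mentioned above instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XmlLocationReader {

    /** Maps each location name to its exits (direction tag -> target name). */
    public static Map<String, Map<String, String>> readExits(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid game config XML", e);
        }
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        NodeList locations = doc.getElementsByTagName("location");
        for (int i = 0; i < locations.getLength(); i++) {
            Element location = (Element) locations.item(i);
            String name = location.getElementsByTagName("name")
                    .item(0).getTextContent().trim();
            Map<String, String> exits = new LinkedHashMap<>();
            NodeList dirs = location.getElementsByTagName("directions");
            if (dirs.getLength() > 0) {
                // Each child element of <directions> is one exit, e.g. <north>.
                NodeList children = dirs.item(0).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        exits.put(child.getNodeName(), child.getTextContent().trim());
                    }
                }
            }
            result.put(name, exits);
        }
        return result;
    }
}
```

The `<item>` elements can be collected the same way with `getElementsByTagName("item")` on each `location` element.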

知你几分 2024-07-28 08:43:24

Can't get my head into Java-mode right now, so here's some pseudo-code that should do it:

Data = MyString.split('\n\n++\s*+');

for ( i=0 ; i<Data.length ; i++ )
{
    CurLocation = Data[i].split('\n\s*+');

    LocationInfo = CurLocation[0].split(',\s*+');

    LocationName = LocationInfo[0];

    for ( n=1 ; n<LocationInfo.length ; n++ )
    {
        Items[n-1] = LocationInfo[n];
    }


    for ( n=1 ; n<CurLocation.length ; n++ )
    {
        DirectionInfo = CurLocation[n].split(',\s*+');

        DirectionName = DirectionInfo[0];

        for ( x=1 ; x<DirectionInfo.length ; x++ )
        {
            DirectionLocation[x-1] = DirectionInfo[x];
        }

    }


}
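
For reference, the same split-based idea as a Java sketch. The regular expressions are the ones from the pseudo-code, written as Java string literals; the `Block` holder and the class name are hypothetical, and the input is assumed to use `\n` line endings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitLocationParser {

    /** Hypothetical holder for one blank-line-separated block. */
    public static final class Block {
        public final String name;
        public final List<String> items;
        public final Map<String, String> exits = new LinkedHashMap<>();

        Block(String name, List<String> items) {
            this.name = name;
            this.items = items;
        }
    }

    public static List<Block> parse(String text) {
        List<Block> blocks = new ArrayList<>();
        // One or more blank lines separate locations.
        for (String chunk : text.trim().split("\\n\\n++\\s*+")) {
            // A newline plus any indentation separates the lines of a block.
            String[] lines = chunk.split("\\n\\s*+");
            String[] info = lines[0].split(",\\s*+");
            List<String> items =
                    new ArrayList<>(Arrays.asList(info).subList(1, info.length));
            Block block = new Block(info[0], items);
            for (int n = 1; n < lines.length; n++) {
                // Limit 2: everything after the first comma is the target name.
                String[] dir = lines[n].split(",\\s*+", 2);
                if (dir.length != 2) {
                    throw new IllegalArgumentException("Bad exit line: " + lines[n]);
                }
                block.exits.put(dir[0], dir[1]);
            }
            blocks.add(block);
        }
        return blocks;
    }
}
```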

末骤雨初歇 2024-07-28 08:43:24

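Can you change the format of the data? This format is clumsy. I suspect you're busy reinventing a square wheel... For me this is a case of "just use XML".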

_蜘蛛 2024-07-28 08:43:24

I think using XML is overkill (shooting sparrows with cannons) while regexps are "underkill" (using too weak a tool, like scrubbing floors with a toothbrush).

The right balance sounds like it's "the .ini format" or "mail headers with sections". For python there are library docs at http://docs.python.org/library/configparser.html.

A brief example:

[albus_square]
name: Albus Square
items: Flowers, Traffic Cone
north: lecture_theatre
west: library_enchanted_books
south: furnesspuff_college

I'd assume there's a Java library for this format. As another poster has pointed out, you might have name collisions, so I took the liberty of adding a "name:" field. The name in the square brackets would be the unique identifier.
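
If no library is at hand, the sectioned format above is simple enough to read by hand. The following Java sketch is only an illustration under stated assumptions: the class name is hypothetical, every entry is a single `key: value` line, and section ids in square brackets must be unique.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SectionFileParser {

    /** Maps each section id to its key/value pairs, in file order. */
    public static Map<String, Map<String, String>> parse(String text) {
        Map<String, Map<String, String>> sections = new LinkedHashMap<>();
        Map<String, String> current = null;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                // "[albus_square]" opens a new section with a unique id.
                String id = line.substring(1, line.length() - 1).trim();
                if (sections.containsKey(id)) {
                    throw new IllegalArgumentException("Duplicate section: " + id);
                }
                current = new LinkedHashMap<>();
                sections.put(id, current);
                continue;
            }
            int colon = line.indexOf(':');
            if (current == null || colon < 0) {
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
            // Split on the first colon only, so values may contain colons.
            current.put(line.substring(0, colon).trim(),
                    line.substring(colon + 1).trim());
        }
        return sections;
    }
}
```

The `items` value still arrives as one comma-separated string and would need a further split; the direction keys hold section ids, so links can be resolved by a lookup in the returned map.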
