Pro regex to convert these impossible regex examples?

Posted 2024-12-23 08:35:26 · 3232 characters · 1 view · 0 comments

Example of input

vulture (wing)
tabulations: one leg; two legs; flying
father; master; patriarch    
mat (box)
pedistal; blockade; pilar
animal belly (oval)
old style: naval
jackal's belly; jester    slope of hill (arch)
key; visible; enlightened

Basically, I'm having trouble with some more complicated regex commands. Most of the code I'm finding that uses regex is very simple, but I could use it in so many places if I could get good with it. Would you look at the kind of stuff I'm trying to do and see if you can convert any of it?

1. Arrayize the word or words between the braces, "(" and ")".
2. Arrayize the first words following a new line ending xor four spaces and then a closing brace, ")", and a space and an open brace " (" AND the first words in the document up until a space and an open brace " (".
3. On any line with semicolons, arrayize the words which are separated by semicolons. Get the word or words after the last semicolon but do not get the words after a line break or four consecutive spaces. Words from lines that begin with the string "tabulations:" should not be included in this array, even though lines that begin with the string "tabulations:" have semicolons on them. If a new line ending in a close brace, ")" comes before a line containing semicolons and not starting with "tabulations", add "no alternates" to the array, instead.
4. Get the word or words following the colon and preceding the line break on a line that begins with the string "old style:". If a new line ending in a close brace, ")" comes before a "tabulations:"-starting line, add "no old style" to the array, instead.
5. The same as 3, except only for lines that begin with the string "tabulations:". If a new line ending in a close brace, ")" comes before a "tabulations:"-starting line, add "no tabulations" to the array, instead.
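
Items 2 and 4 can be approximated with multiline patterns. The following is only a rough sketch: it assumes Unix line endings, and it assumes an entry name is any text followed by " (" that starts either at the beginning of a line or right after four spaces. The function names `entry_names` and `old_styles` are made up for illustration and do not come from the question.

```php
<?php
// Hypothetical helpers; the names and the exact patterns are assumptions.

// Item 2: text before " (" that starts a line or follows four spaces.
function entry_names(string $text): array {
    preg_match_all('/(?:^| {4})([^(;\n]+?) \(/m', $text, $m);
    return array_map('trim', $m[1]);
}

// Item 4: text after "old style:" up to the end of that line.
function old_styles(string $text): array {
    preg_match_all('/^old style:[ \t]*(.+)$/m', $text, $m);
    return array_map('trim', $m[1]);
}

// The sample input from the question, with the four-space runs made explicit.
$sample = implode("\n", [
    "vulture (wing)",
    "tabulations: one leg; two legs; flying",
    "father; master; patriarch    ",
    "mat (box)",
    "pedistal; blockade; pilar",
    "animal belly (oval)",
    "old style: naval",
    "jackal's belly; jester    slope of hill (arch)",
    "key; visible; enlightened",
]);

print_r(entry_names($sample)); // vulture, mat, animal belly, slope of hill
print_r(old_styles($sample));  // naval
```

The `m` modifier makes `^` and `$` match at every line boundary rather than only at the ends of the whole string. The "no old style" fallback is not covered here, because it depends on knowing which lines belong to which entry.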

I am trying to figure out how to do this via PHP, but I would be happy if anyone could field these requests in any language, especially php, C++, javascript, or batch. I also know that these are all very difficult to show, even for a puzzle lover. So, I promise 100 bonus points as soon as bounties are available for any complete answer.

-Edit-

First solution I was working on

Okay, so the first solution I was working on is to solve 3. I tried breaking the lines at the semicolons, and I was then hoping to grab the data, line-by-line and edit it further.

$input = file_get_contents('explode.txt');
foreach(explode("\n", $input) as $line){
  $words = explode(';', $line); 
  foreach($words as $word){
    echo $word;
  }
}

Basically, looking at the output, the data ended up in the same format it was already in, only without the semicolons. This wasn't very useful, so I decided to stop.
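
The output looks unchanged because `echo` prints each piece back to back with nothing in between, so the split is never visible. A small sketch of keeping the pieces in an array instead, assuming that empty pieces should be dropped; `split_line` is a hypothetical name:

```php
<?php
// Hypothetical helper: split one line on semicolons and trim each piece.
function split_line(string $line): array {
    $parts = array_map('trim', explode(';', $line));
    // Drop empty pieces, for example after a trailing semicolon.
    $parts = array_filter($parts, function ($p) {
        return $p !== '';
    });
    return array_values($parts); // reindex from 0
}

$rows = [];
$text = "father; master; patriarch    \npedistal; blockade; pilar";
foreach (explode("\n", $text) as $line) {
    $rows[] = split_line($line); // one array of words per input line
}
print_r($rows);
```

With the words held in `$rows`, later steps can filter or relabel them instead of re-reading the printed text.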

Second solution I am working on

This is based around this line of code: preg_match_all('/\;([^;]+)\}/', $myFile, $matches).

There's now a working solution to part 1 of the question, thanks to EPB and fge:

$myFile = file_get_contents('fakexample.txt');
function get_between($startString, $endString, $myFile){
  //Escape start and end strings.
  $startStringSafe = preg_quote($startString, '/');
  $endStringSafe = preg_quote($endString, '/');
  //non-greedy match any character between start and end strings. 
  //s modifier should make it also match newlines.
  preg_match_all("/$startStringSafe(.*?)$endStringSafe/s", $myFile, $matches);
  return $matches;
}
$list = get_between("(", ")", $myFile);
//$list[1] holds the text captured between the delimiters.
foreach($list[1] as $item){
  echo $item."\n";
}

Some issues I had were that I wasn't using RegEx correctly. I think the ArrayArray return problem was because I didn't encapsulate the preg_match_all function such that it returned $matches to a private function. I'm still unsure. I'm also still unsure about whether I should be using the file_get_contents() function to read the file.
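
On the "ArrayArray" output: the failing code isn't shown, so this is a guess, but in PHP echoing an array prints the literal word "Array", and `preg_match_all` fills `$matches` with one array per group. Echoing each top-level element of `$matches` would therefore print "Array" twice for a pattern with one capture group. A short sketch of the structure, using an inline string in place of the file:

```php
<?php
$text = "vulture (wing)\nmat (box)";
preg_match_all('/\((.*?)\)/s', $text, $matches);

// With the default flags, $matches is grouped by pattern part:
// $matches[0] holds the full matches:      ['(wing)', '(box)']
// $matches[1] holds the first capture group: ['wing', 'box']
foreach ($matches[1] as $inside) {
    echo $inside, "\n";
}
```

As for `file_get_contents()`, it reads the whole file into one string, which is a reasonable choice for a small text file like this one.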

The third solution attempt

So, I had an initial idea of how I wanted to approach this, and I decided to go about it my own way. Again, I started with question 1 because it seemed easiest. It has the fewest exceptions.

function find_between($input,$start,$end) {
  if (strpos($input,$start) === false || strpos($input,$end) === false) {
    return false;
  } else {
    $start_position = strpos($input,$start)+strlen($start);
    $end_position = strpos($input,$end);
    return substr($input,$start_position,$end_position-$start_position);
  }
}

$myFile = file_get_contents('explode.txt');

$output = find_between($myFile,'(',')');

echo $output;

As far as I can tell, this will work. The issue I'm having is with the recursion. I tried foreach($output as $output){echo $output;}, but this gave me an error. It seems obvious to me that it's because I haven't recursed and so haven't arrayized. The reason I stopped along this path is because I was told by several programmers that I was doomed to failure. So, I'm currently back to working on solution 2.
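
The `foreach` fails because `find_between` returns a single string (or `false`), not an array. Recursion isn't required to collect every match; a loop that restarts the search after each closing delimiter is enough. A minimal sketch along those lines, where `find_all_between` is a hypothetical name and nested delimiters are assumed not to occur:

```php
<?php
// Hypothetical variant of find_between that returns every match as an array.
function find_all_between(string $input, string $start, string $end): array {
    $found = [];
    if ($start === '' || $end === '') {
        return $found; // empty delimiters would never advance the search
    }
    $offset = 0;
    while (($s = strpos($input, $start, $offset)) !== false) {
        $s += strlen($start);           // first character after the opener
        $e = strpos($input, $end, $s);  // next closer after that point
        if ($e === false) {
            break;                      // unmatched opener: stop
        }
        $found[] = substr($input, $s, $e - $s);
        $offset = $e + strlen($end);    // continue after the closer
    }
    return $found;
}

foreach (find_all_between("vulture (wing)\nmat (box)", '(', ')') as $word) {
    echo $word, "\n";
}
```

The third argument to `strpos` is the offset to start searching from, which is what lets the loop move past matches it has already collected.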

怕倦 2024-12-30 08:35:26

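Is this homework? The instructions (1-5) don't make sense to me as something you would have a reason to do outside of an academic pursuit. It looks like you're new not only to regex but to PHP in general. As @Howard pointed out, we're not going to do your work for you.

That aside, if you need help with regular expressions, I'd be more than happy to help; however, that doesn't seem to be what you need the most help with.

So, regarding your question, here is what I can offer:

3) "On any line with semicolons, arrayize the words which are separated by semicolons. Get the word or words after the last semicolon but do not get the words after a line break or four consecutive spaces." -> Easy: explode on line breaks (\n).

"Words from lines that begin with the string "tabulations:" should not be included in this array, even though lines that begin with the string "tabulations:" have semicolons on them." -> This is a bit trickier. First, the regex is for a semicolon rather than a colon. This will most likely have to be handled by two separate regexes: first look for "tabulations:", and if it is not found, search for semicolons. If that regex succeeds, you can explode on the semicolons, and you now have all the data that makes up your arrays.

"If a new line ending in a close brace, ")" comes before a line containing semicolons and not starting with "tabulations", add "no alternates" to the array, instead." -> This one I'll leave up to you to figure out, for more reasons than one ;-)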
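
One way to read the approach above is as a line-by-line pass that groups lines under the most recent "name (word)" header. The sketch below is an interpretation, not a confirmed spec: it treats a run of four spaces like a line break, uses plain prefix checks with `strpos` in place of the two regexes suggested above, and fills in the "no ..." strings whenever an entry lacks the matching line. The names `parse_entries` and `split_on_semicolons` and the array keys are hypothetical.

```php
<?php
// Hypothetical helper: split on semicolons and trim each piece.
function split_on_semicolons(string $s): array {
    return array_map('trim', explode(';', $s));
}

// Hypothetical parser; the grouping rules are an assumption about the task.
function parse_entries(string $text): array {
    $entries = [];
    // Treat a line break or a run of four spaces as a separator.
    foreach (preg_split('/\R| {4}/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        if (preg_match('/^(.+) \((.+)\)$/', $line, $m)) {
            // A header such as "vulture (wing)" starts a new entry
            // with the fallback values from items 3, 4 and 5.
            $entries[] = [
                'name'        => $m[1],
                'paren'       => $m[2],
                'tabulations' => ['no tabulations'],
                'old style'   => 'no old style',
                'alternates'  => ['no alternates'],
            ];
            continue;
        }
        if (!$entries) {
            continue; // ignore anything before the first header
        }
        $last = count($entries) - 1;
        if (strpos($line, 'tabulations:') === 0) {
            $rest = substr($line, strlen('tabulations:'));
            $entries[$last]['tabulations'] = split_on_semicolons($rest);
        } elseif (strpos($line, 'old style:') === 0) {
            $entries[$last]['old style'] = trim(substr($line, strlen('old style:')));
        } elseif (strpos($line, ';') !== false) {
            $entries[$last]['alternates'] = split_on_semicolons($line);
        }
    }
    return $entries;
}

print_r(parse_entries("mat (box)\npedistal; blockade; pilar"));
```

Checking for the "tabulations:" prefix before checking for semicolons is what keeps tabulation words out of the alternates list, which matches the order of tests described above.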
