Using regex to convert the contents of tags to lipsum

Posted 2024-11-24 01:50:08 · 290 words · 2 views · 0 comments

I'm debranding a micro-site to use as a portfolio piece. It's built with static HTML, and I need to replace the contents of every non-script tag with lipsum or even scrambled text. The replacement has to have the same number of characters as the current text to keep the formatting intact. I would also much prefer to do this in a GUI grep editor rather than by writing a script, because there may be a few tags whose contents I need to keep.

I used the regex \>([^$]+?)\< to find the text (all the scripts start with $, so it skips the script tags), but I can't find any way to count the number of characters matched and replace them with a corresponding number of lipsum or random characters.

Thanks for any help!


Comments (1)

猫烠⑼条掵仅有一顆心 2024-12-01 01:50:08

I was able to do this successfully, though I ended up having to write a Java program. It turns out regex is fine here because I'm not parsing the whole document, just a few parts. There are a few quirks, but this got the job done.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// StdIn, StdOut and LoremIpsum4J are external classes and must be on the classpath.
public class Debrander {

    public static void main(String[] args) {

        // read the whole HTML page from standard input
        String htmlPage = StdIn.readAll();

        // matches the text after a '>' up to the next '<' that does not start </script or </style
        Pattern tagContentRegex = Pattern.compile("\\>(.*?)\\<(?!/script)(?!/style)");
        Matcher myMatcher = tagContentRegex.matcher(htmlPage);

        // matches any single non-whitespace character
        Pattern nonWhiteRegex = Pattern.compile("\\S");

        StringBuffer sb = new StringBuffer();

        LoremIpsum4J loremIpsum = new LoremIpsum4J();
        loremIpsum.setStartWithLoremIpsum(false);

        // loop through all matches
        while (myMatcher.find()) {
            String tagContent = myMatcher.group(1);
            // only replace content that has at least one non-whitespace character
            if (nonWhiteRegex.matcher(tagContent).find()) {
                int charCount = tagContent.length();

                String[] lipsum = loremIpsum.getBytes(charCount);
                StringBuilder replacement = new StringBuilder(">");
                for (String piece : lipsum) {
                    replacement.append(piece);
                }
                replacement.append("<");
                // quoteReplacement keeps '$' and '\' from being read as group references
                myMatcher.appendReplacement(sb, Matcher.quoteReplacement(replacement.toString()));
            }
        }
        myMatcher.appendTail(sb);
        StdOut.println(sb.toString());
    }

}
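
For anyone who does not have the lorem ipsum library to hand, the same idea can be sketched with only the standard library. This is not the code from the answer above: the class name SameLengthFiller, the LETTERS filler string, and the pattern are all assumptions made for illustration. The sketch copies script and style elements through untouched, and in every other text run it swaps each non-whitespace character for a filler letter, so the output has the same length, word lengths, and line breaks as the input. Like any regex approach it is not a real HTML parser, so a '>' inside an attribute value, for example, would likely confuse it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SameLengthFiller {

    // Filler letters, cycled as needed (illustrative; any letters would do).
    private static final String LETTERS =
            "loremipsumdolorsitametconsecteturadipiscingelit";

    // Alternative 1: a whole <script> or <style> element, copied through untouched.
    // Alternative 2 (group 2): text sitting between a '>' and the next '<'.
    private static final Pattern PIECE = Pattern.compile(
            "(?is)<(script|style)\\b.*?</\\1\\s*>|(?<=>)([^<>]+)(?=<)");

    // Swaps every non-whitespace character for a filler letter, so the result
    // has the same length, word lengths, and line breaks as the input.
    static String filler(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int next = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                out.append(c);
            } else {
                char letter = LETTERS.charAt(next++ % LETTERS.length());
                out.append(Character.isUpperCase(c) ? Character.toUpperCase(letter) : letter);
            }
        }
        return out.toString();
    }

    // Rebuilds the page, replacing text runs and leaving tags and scripts alone.
    static String debrand(String html) {
        Matcher m = PIECE.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String text = m.group(2);
            // group(2) is null when a script/style element matched: keep it verbatim
            String replacement = (text == null) ? m.group() : filler(text);
            // quoteReplacement matters here because scripts often contain '$'
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "<p>Hello world</p><script>$(function(){});</script>";
        System.out.println(debrand(sample));
    }
}
```

Keeping whitespace in place is a design choice rather than a requirement: it preserves the wrapping points of the original text, which tends to matter more for layout than the exact letters used. Tags whose contents must be kept would need an extra alternative in the pattern or a manual pass afterwards.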