Java Scanner headache

Posted 2024-08-07 04:14:40

I have a text file which looks like:

i.e., a plaintext label followed by a few rows of 1/0 values separated by spaces. The number of rows is variable, but every row between any two particular labels should have the same number of 1/0s (though in practice it might not).

How do I grab each name+rows chunk with a Scanner? Is there an elegant way to enforce consistency in the number of values per row (and provide some sort of feedback if the rows aren't consistent)?

I'm thinking there might be a convenient way with clever delimiter specification, but I can't seem to get that working.

好菇凉咱不稀罕他 2024-08-14 04:14:40

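I'd do this the simple way. Read each line as a String and test it against a regular expression that matches a 1 or 0 followed by a space. If it matches, treat the line as a row; if not, treat it as a plaintext label. Check row/column size consistency by verifying that each label's data array matches the size of the first label's data array.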
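A minimal sketch of that line-by-line approach, under a few assumptions: the class name `LineClassifier`, the `parse` method and the exact row pattern are illustrative choices, not something from the question. Here each row is compared against the first row under its own label (which is what the question asks for); comparing against the first label's block instead would be a small change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class LineClassifier {
    // Assumed row format: 1/0 values separated by single spaces.
    static final Pattern ROW = Pattern.compile("[01]( [01])*");

    /** Groups rows under the label that precedes them; problems are appended to warnings. */
    static Map<String, List<int[]>> parse(List<String> lines, List<String> warnings) {
        Map<String, List<int[]>> blocks = new LinkedHashMap<>();
        List<int[]> current = null;
        String label = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            if (!ROW.matcher(line).matches()) {
                // Anything that is not a row is treated as a label and starts a new block.
                label = line;
                current = new ArrayList<>();
                blocks.put(label, current);
                continue;
            }
            if (current == null) {
                warnings.add("row before any label: " + line);
                continue;
            }
            String[] tokens = line.split(" ");
            int[] row = new int[tokens.length];
            for (int i = 0; i < tokens.length; i++) row[i] = Integer.parseInt(tokens[i]);
            // Compare the width with the first row of the same block.
            if (!current.isEmpty() && current.get(0).length != row.length) {
                warnings.add(label + ": expected " + current.get(0).length
                        + " values but got " + row.length + " in \"" + line + "\"");
            }
            current.add(row);
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        Map<String, List<int[]>> blocks = parse(
                Arrays.asList("alpha", "1 0 1", "0 1 1", "beta", "1 1", "1 0 0"), warnings);
        System.out.println(blocks.keySet());
        warnings.forEach(System.out::println);
    }
}
```

Note that a label consisting of a single `0` or `1` would be mistaken for a one-value row under this pattern, so the sketch only suits labels that don't look like data.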

Edit: I wasn't aware of the Scanner class, though it sounds handy. I think the basic idea still holds: use Scanner to parse your input, and handle the size checks yourself.

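Also, in theory, you could write a single regex that matches a label and its entire array, though I don't know whether you could write one that is guaranteed to match only sets of rows that all have the same number of values. To make the check more automatic, you would probably need to construct a second regex that matches exactly the array size of the first entry, and use it for all the other entries. I think this is a case where the cure is worse than the disease.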
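For illustration, the "second regex" idea could be sketched as below: count the values in the first row, then build a pattern that accepts exactly that many. The names `WidthPattern`, `forWidth` and `fromFirstRow` are hypothetical, and the sketch assumes values are separated by single spaces.

```java
import java.util.regex.Pattern;

class WidthPattern {
    /** Builds a pattern matching exactly `width` 1/0 values separated by single spaces. */
    static Pattern forWidth(int width) {
        if (width < 1) throw new IllegalArgumentException("width must be positive");
        // One value, followed by (width - 1) repetitions of "space + value".
        return Pattern.compile("[01]( [01]){" + (width - 1) + "}");
    }

    /** Derives the pattern from a sample row, such as the first row under a label. */
    static Pattern fromFirstRow(String firstRow) {
        return forWidth(firstRow.trim().split(" +").length);
    }

    public static void main(String[] args) {
        Pattern p = fromFirstRow("1 0 1 1");
        System.out.println(p.matcher("0 0 1 0").matches());
        System.out.println(p.matcher("0 0 1").matches());
    }
}
```

Every later row under the same label can then be checked with `matches()`, which rejects rows that are too short, too long, or contain anything other than 1/0.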

沉鱼一梦 2024-08-14 04:14:40

Even better, after a helpful answer to another question (thanks Bart):

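import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Members of the parser class. GraphState and rowsToList(...) are defined
// elsewhere in that class and are not shown here.

// A label is a line made up only of word characters.
static final String labelRegex = "^\\s*\\w+$";
static final Pattern labelPattern = Pattern.compile(labelRegex, Pattern.MULTILINE);
Matcher labelMatcher = labelPattern.matcher("");

// A row is a run of 1/0 values separated by single spaces. The trailing \\s*
// (not \\s+) lets the last row match after trim() has removed its line terminator.
static final String stateRegex = "([10] )+[10]\\s*";
static final String statesRegex = "(" + stateRegex + ")+";
static final Pattern statesPattern = Pattern.compile(statesRegex, Pattern.MULTILINE);
Matcher stateMatcher = statesPattern.matcher("");

// Zero-width delimiter: split the input just before each label line.
static final String chunkRegex = "(?=" + labelRegex + ")";
static final Pattern chunkPattern = Pattern.compile(chunkRegex, Pattern.MULTILINE);
Scanner chunkScan;

public void setSource(File source) {
    if (source != null && source.canRead()) {
        try {
            chunkScan = new Scanner(new BufferedReader(new FileReader(source)));
            chunkScan.useDelimiter(chunkPattern);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

// Reads up to n label+rows chunks and maps each label to its parsed rows.
public Map<String, List<GraphState>> next(int n) {
    Map<String, List<GraphState>> result = new LinkedHashMap<>(n);
    String chunk, rows;
    int i = 0;
    while (chunkScan.hasNext() && i++ < n) {
        chunk = chunkScan.next().trim();
        labelMatcher.reset(chunk);
        stateMatcher.reset(chunk);
        if (labelMatcher.find() && stateMatcher.find()) {
            // Drop the spaces so each row becomes a compact string such as "1011".
            rows = stateMatcher.group().replace(" ", "");
            result.put(labelMatcher.group(), rowsToList(rows.split("\\n")));
        }
    }
    return result;
}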
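To see the lookahead delimiter in isolation, here is a small self-contained sketch that splits an in-memory string into label+rows chunks. The class name `ChunkDemo`, the `chunks` method and the sample input are made up for the example; only the delimiter regex comes from the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class ChunkDemo {
    // Zero-width delimiter: a position where a line of only word characters starts.
    static final Pattern CHUNK_START =
            Pattern.compile("(?=^\\s*\\w+$)", Pattern.MULTILINE);

    /** Splits the text into chunks, each beginning with a label line. */
    static List<String> chunks(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner scan = new Scanner(text)) {
            scan.useDelimiter(CHUNK_START);
            while (scan.hasNext()) {
                String chunk = scan.next().trim();
                if (!chunk.isEmpty()) out.add(chunk);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "alpha\n1 0 1\n0 1 1\nbeta\n1 1\n1 0\n";
        for (String chunk : chunks(sample)) {
            System.out.println("--- chunk ---");
            System.out.println(chunk);
        }
    }
}
```

Because the delimiter is a lookahead, it consumes nothing, so each label stays attached to the rows that follow it. One caveat: a row holding a single value (just `1` or `0` on its own line) also looks like a label to this regex.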

躲猫猫 2024-08-14 04:14:40

You would need to open the file and loop through every line with readLine() until you hit the end of the file.

I assumed you are checking consistency as you traverse the file. If you want to store the information and use it later, I would consider using some type of data structure.

As you traverse the file, you can test each line with a simple regex to see whether it is a label name. If it is not, split the line on ' ' (the space character), which gives you an array of values. Then check the array's length against the expected size.

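Basic sketch (the label test is an assumption):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class RowCheck {
    public static void main(String[] args) throws IOException {
        int consistentSize = 5; // assume you have a size in mind
        int rowNumber = 0;      // 1-based line number, used in the feedback messages
        try (BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = reader.readLine()) != null) { // readLine() returns null at end of file
                rowNumber++;
                line = line.trim();
                if (line.isEmpty()) continue;
                // check for a label; assumption: a label starts with a letter, so no regex is needed
                if (Character.isLetter(line.charAt(0))) {
                    // not sure if you want to do any consistency checking in here
                } else {
                    String[] currLine = line.split(" +");
                    boolean consist = true;
                    // now loop through currLine and check that each value is a 1 or a 0
                    for (String value : currLine) {
                        if (!value.equals("0") && !value.equals("1")) { consist = false; break; }
                    }
                    // could easily keep a list of the rows that didn't have valid values
                    if (!consist) System.out.println("row " + rowNumber + " has invalid values");
                    // a row is inconsistent if it is shorter or longer than expected
                    if (currLine.length != consistentSize) System.out.println("row " + rowNumber + " is inconsistent");
                }
            }
        }
    }
}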

You could also add another loop if you don't know the size you expect for each row and put some logic in to find the most common size and then figure out what doesn't match. I am unsure of how complicated your consistency checking needs to be.
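A rough sketch of that "most common size" idea, assuming the rows of one label have already been collected into a list of strings. The names `CommonWidth`, `mostCommonWidth` and `mismatches` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CommonWidth {
    /** Number of space-separated values in a row. */
    static int width(String row) {
        return row.trim().split(" +").length;
    }

    /** Returns the row width that occurs most often; on a tie, the width that reaches the top count first wins. */
    static int mostCommonWidth(List<String> rows) {
        Map<Integer, Integer> counts = new HashMap<>();
        int best = -1;
        int bestCount = 0;
        for (String row : rows) {
            int w = width(row);
            int count = counts.merge(w, 1, Integer::sum);
            if (count > bestCount) {
                best = w;
                bestCount = count;
            }
        }
        return best;
    }

    /** Returns the zero-based indexes of rows whose width differs from the most common one. */
    static List<Integer> mismatches(List<String> rows) {
        int expected = mostCommonWidth(rows);
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            if (width(rows.get(i)) != expected) bad.add(i);
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("1 0 1", "0 1", "1 1 0", "0 0 1");
        System.out.println("expected width: " + mostCommonWidth(rows));
        System.out.println("mismatched rows: " + mismatches(rows));
    }
}
```

This takes two passes over the rows of a block, which is fine for small files; whether majority vote is the right rule depends on how strict the consistency check needs to be.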
