Generating a tree structure from CSV
I have been scratching my head over this problem for a while now. I am basically trying to generate a tree hierarchy from a set of CSV data. The CSV data is not necessarily ordered. It looks something like this:
Header: Record1,Record2,Value1,Value2
Row: A,XX,22,33
Row: A,XX,777,888
Row: A,YY,33,11
Row: B,XX,12,0
Row: A,YY,13,23
Row: B,YY,44,98
I am trying to make the way the grouping is performed as flexible as possible. The simplest form of grouping would be to group on Record1 and Record2, with Value1 and Value2 stored under Record2, so that we get the following output:
Record1
    Record2
        Value1 Value2
Which would be:
A
    XX
        22,33
        777,888
    YY
        33,11
        13,23
B
    XX
        12,0
    YY
        44,98
I am currently storing my group settings in a List, and I don't know whether this is hindering my thinking. The list contains a hierarchy of the groups, for example:
Record1 (SchemaGroup)
    .column = Record1
    .columns = null
    .childGroups =
        Record2 (SchemaGroup)
            .column = Record2
            .columns = Value1 (CSVColumnInformation), Value2 (CSVColumnInformation)
            .childGroups = null
The code for this looks as follows:
private class SchemaGroup {
    private SchemaGroupType type = SchemaGroupType.StaticText; // default to text
    private String text;
    private CSVColumnInformation column = null;
    private List<SchemaGroup> childGroups = new ArrayList<SchemaGroup>();
    private List<CSVColumnInformation> columns = new ArrayList<CSVColumnInformation>();
}

private enum SchemaGroupType {
    /** Allow fixed text groups to be added */
    StaticText,
    /** Related to a column with common value */
    ColumnGroup
}
I am struggling to produce an algorithm for this and to decide on the underlying structure to use. At present I am parsing the CSV top to bottom, using my own wrapper class:
CSVParser csv = new CSVParser(content);
String[] line;
while ((line = csv.readLine()) != null) {
    ...
}
I am just trying to kick start my coding brain.
Any thoughts?
Comments (6)
The basic idea isn't difficult: group by the first record, then by the second record, etc. until you get something like this:
and then work backwards to build the trees.
However, there is a recursive component that makes it somewhat hard to reason about this problem, or show it step by step, so it's actually easier to write pseudocode.
I'll assume that every row in your CSV is represented as a tuple. Each tuple has "records" and "values", using the same terms you use in your question. "Records" are the things that must be put into a hierarchical structure. "Values" will be the leaves of the tree. I'll put these terms in quotation marks whenever I use them with these specific meanings.
I also assume that all "records" come before all "values".
Without further ado, the code:
Now you'd have to think about the specific data structures to use, but hopefully this shouldn't be too difficult if you understand the algorithm (as you mention, I think deciding on a data structure early on may have been hindering your thoughts).
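The recursive grouping described above could be sketched in Java roughly as follows. This is only an illustration under the stated assumptions (all "records" come before all "values", and each row is already split into a String[]); it is not the pseudocode from this answer, and the names CsvTreeSketch, Node and build are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CsvTreeSketch {

    /** Hypothetical tree node: inner nodes use children, leaf nodes use values. */
    static final class Node {
        final Map<String, Node> children = new LinkedHashMap<>();
        final List<List<String>> values = new ArrayList<>();
    }

    /**
     * Groups the rows by the "record" at index depth, then recurses on each
     * group with depth + 1. Once every "record" column has been consumed,
     * the remaining columns of each row are stored as leaf "values".
     */
    static Node build(List<String[]> rows, int recordCount, int depth) {
        Node node = new Node();
        if (depth == recordCount) {
            for (String[] row : rows) {
                node.values.add(Arrays.asList(row).subList(recordCount, row.length));
            }
            return node;
        }
        // LinkedHashMap keeps groups in order of first appearance,
        // so unordered input rows still end up under the right parent.
        Map<String, List<String[]>> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(row[depth], k -> new ArrayList<>()).add(row);
        }
        for (Map.Entry<String, List<String[]>> e : groups.entrySet()) {
            node.children.put(e.getKey(), build(e.getValue(), recordCount, depth + 1));
        }
        return node;
    }

    /** Appends an indented text rendering of the tree to out. */
    static void print(Node node, String indent, StringBuilder out) {
        for (List<String> v : node.values) {
            out.append(indent).append(String.join(",", v)).append('\n');
        }
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            out.append(indent).append(e.getKey()).append('\n');
            print(e.getValue(), indent + "    ", out);
        }
    }

    public static void main(String[] args) {
        String[] lines = {"A,XX,22,33", "A,XX,777,888", "A,YY,33,11",
                          "B,XX,12,0", "A,YY,13,23", "B,YY,44,98"};
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.split(","));
        }
        StringBuilder out = new StringBuilder();
        print(build(rows, 2, 0), "", out);
        System.out.print(out);
    }
}
```

Because the number of "record" columns is a parameter, the same sketch handles three or more grouping levels without changes; only the leaf representation would need to be adapted to a real schema.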
Here is a basic working solution in the form of a JUnit test (without assertions), simplified by using Google Guava collections. The code is self-explanatory; instead of file I/O, you would use a CSV library to read the CSV. This should give you the basic idea.
If you know you'll have just two levels of Records, I would use something like a nested map: an outer map keyed by Record1 whose values are inner maps keyed by Record2, each of which holds a List of values.
When you read a new line, you look into the outer map to check whether that value for Record1 already exists and, if not, create a new empty inner Map for it.
Then check the inner map for whether a value for that Record2 exists. If not, create a new List.
Then read the values and add them to the list.
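A minimal Java sketch of these steps might look like the following. It assumes exactly two "record" columns followed by two value columns, as in the question's sample data; the class name TwoLevelGrouping and the choice to store each value pair as a single comma-joined String are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TwoLevelGrouping {

    /**
     * Builds Record1 -> (Record2 -> list of "Value1,Value2" strings).
     * TreeMap is used so that keys come out sorted; this is a choice of
     * this sketch, any Map implementation would work.
     */
    static Map<String, Map<String, List<String>>> group(List<String[]> rows) {
        Map<String, Map<String, List<String>>> outer = new TreeMap<>();
        for (String[] row : rows) {
            // Create the inner map the first time this Record1 value is seen.
            Map<String, List<String>> inner =
                    outer.computeIfAbsent(row[0], k -> new TreeMap<>());
            // Create the list the first time this Record2 value is seen.
            List<String> values =
                    inner.computeIfAbsent(row[1], k -> new ArrayList<>());
            values.add(row[2] + "," + row[3]);
        }
        return outer;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (String line : new String[]{"A,XX,22,33", "A,XX,777,888", "A,YY,33,11",
                                        "B,XX,12,0", "A,YY,13,23", "B,YY,44,98"}) {
            rows.add(line.split(","));
        }
        System.out.println(group(rows));
    }
}
```

Map.computeIfAbsent folds the "check whether it exists, otherwise create it" step into one call, which keeps the loop body to three statements.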
I recently had a need to do pretty much the same thing and wrote tree-builder.com to accomplish the task. The only difference is that as you have your CSV laid out, the last two parameters will be parent and child instead of peers. Also, my version doesn't accept a header row.
The code is all in JavaScript; it uses jstree to build the tree. You can use Firebug or just view the source on the page to see how it's done. It would probably be pretty easy to tweak it to escape the comma in your CSV in order to keep the last two parameters as a single child.
Based upon how this problem is posed, I would do the following:
tree.
(perhaps a linked list for flexibility)
Does that help?