从 csv 生成树结构

发布于 2024-12-08 10:48:16 字数 1720 浏览 0 评论 0原文

我已经为这个问题摸不着头脑有一段时间了。我基本上试图从一组 CSV 数据生成树层次结构。 CSV 数据不一定是有序的。这就像如下所示:

Header: Record1,Record2,Value1,Value2
Row: A,XX,22,33
Row: A,XX,777,888
Row: A,YY,33,11
Row: B,XX,12,0
Row: A,YY,13,23
Row: B,YY,44,98

我试图使分组的执行方式尽可能灵活。最简单的分组方法是对 Record1 和 Record2 进行分组,并将 Value1 和 Value2 存储在 Record2 下,以便我们得到以下输出:

Record1
    Record2
        Value1 Value2

这将是:

A
    XX
        22,33
        777,888
    YY
        33,11
        13,23
B
    XX
        12,0
    YY
        44,98 

我目前将我的组设置存储在列表中 - 我不这样做知道这是否阻碍了我的想法。该列表包含组的层次结构,例如:

Record1 (SchemaGroup)
    .column = Record1
    .columns = null
    .childGroups =
        Record2 (SchemaGroup)
            .column = Record1
            .columns = Value1 (CSVColumnInformation), Value2 (CSVColumnInformation)
            .childGroups = null

其代码如下所示:

private class SchemaGroup {
    private SchemaGroupType type = SchemaGroupType.StaticText;  // default to text
    private String text;
    private CSVColumnInformation column = null;
    private List<SchemaGroup> childGroups = new ArrayList<SchemaGroup>();
    private List<CSVColumnInformation> columns = new ArrayList<CSVColumnInformation>();
}


private enum SchemaGroupType {
    /** Allow fixed text groups to be added */
    StaticText,
    /** Related to a column with common value */
    ColumnGroup
}

我正在努力为此生成一个算法,试图考虑要使用的底层结构。目前,我正在使用我自己的包装类从上到下解析 CSV:

CSVParser csv = new CSVParser(content);
String[] line;
while((line = csv.readLine()) != null ) {
    ...
}

我只是想启动我的编码大脑。

有什么想法吗?

I have scratched my head over this problem for a while now. I am basically trying to generate a tree hierarchy from a set of CSV data. The CSV data is not necessarily ordered. This is like something as follows:

Header: Record1,Record2,Value1,Value2
Row: A,XX,22,33
Row: A,XX,777,888
Row: A,YY,33,11
Row: B,XX,12,0
Row: A,YY,13,23
Row: B,YY,44,98

I am trying to make the way the grouping is performed as flexible as possible. The simplest for of grouping would to do it for Record1 and Record2 with the Value1 and Value2 stored under Record2 so that we get the following output:

Record1
    Record2
        Value1 Value2

Which would be:

A
    XX
        22,33
        777,888
    YY
        33,11
        13,23
B
    XX
        12,0
    YY
        44,98 

I am storing my group settings in a List at present - which I don't know if this is hindering my thoughts. This list contains a hierarchy of the groups for example:

Record1 (SchemaGroup)
    .column = Record1
    .columns = null
    .childGroups =
        Record2 (SchemaGroup)
            .column = Record1
            .columns = Value1 (CSVColumnInformation), Value2 (CSVColumnInformation)
            .childGroups = null

The code for this looks like as follows:

private class SchemaGroup {
    private SchemaGroupType type = SchemaGroupType.StaticText;  // default to text
    private String text;
    private CSVColumnInformation column = null;
    private List<SchemaGroup> childGroups = new ArrayList<SchemaGroup>();
    private List<CSVColumnInformation> columns = new ArrayList<CSVColumnInformation>();
}


private enum SchemaGroupType {
    /** Allow fixed text groups to be added */
    StaticText,
    /** Related to a column with common value */
    ColumnGroup
}

I am stuggling producing an algorithm for this, trying to think of the underlying structure to use. At present I am parsing the CSV top to bottom, using my own wrapper class:

CSVParser csv = new CSVParser(content);
String[] line;
while((line = csv.readLine()) != null ) {
    ...
}

I am just trying to kick start my coding brain.

Any thoughts?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

孤者何惧 2024-12-15 10:48:16

基本思想并不困难:按第一条记录分组,然后按第二条记录分组,依此类推,直到得到类似这样的结果:

(A,XX,22,33)
(A,XX,777,888)
-------------------------
(A,YY,33,11)
(A,YY,13,23)
=============
(B,XX,12,0)
-------------------------
(B,YY,44,98)

然后向后构建树。

然而,有一个递归组件使得推理这个问题或一步一步地显示它变得有些困难,所以实际上编写伪代码更容易。

我假设 csv 中的每一行都表示为一个元组。每个元组都有“记录”和“值”,使用您在问题中使用的相同术语。 “记录”是必须放入层次结构中的东西。 “价值观”将是树的叶子。当我使用具有这些特定含义的术语时,我将使用引号。

我还假设所有“记录”都位于所有“值”之前。

言归正传,代码:

// builds tree and returns a list of root nodes
// list_of_tuples: a list of tuples read from your csv
// curr_position: used to keep track of recursive calls
// number_of_records: assuming each csv row has n records and then m values, number_of_records equals n
function build_tree(list_of_tuples, curr_position, number_of_records) {
    // check if we have already reached the "values" (which shouldn't get converted into trees)
    if (curr_position == number_of_records) {
        return list of nodes, each containing a "value" (i.e. everything from position number_of_records on)
    }

    grouped = group tuples in list_of_tuples that have the same value in position curr_position, and store these groups indexed by such common value
    unique_values = get unique values in curr_position

    list_of_nodes = empty list

   // create the nodes and (recursively) their children
    for each val in unique_values {
        the_node = create tree node containing val
        the_children = build_tree(grouped[val], curr_position+1, number_of_records)
        the_node.set_children(the_children)

        list_of_nodes.append(the_node)
    }

    return list_of_nodes
}

// in your example, this returns a node with "A" and a node with "B"
// third parameter is 2 because you have 2 "records"
build_tree(list_parsed_from_csv, 0, 2)

现在您必须考虑要使用的具体数据结构,但希望如果您理解算法,这应该不会太困难(正如您提到的,我认为尽早决定数据结构)可能阻碍了你的想法)。

The basic idea isn't difficult: group by the first record, then by the second record, etc. until you get something like this:

(A,XX,22,33)
(A,XX,777,888)
-------------------------
(A,YY,33,11)
(A,YY,13,23)
=============
(B,XX,12,0)
-------------------------
(B,YY,44,98)

and then work backwards to build the trees.

However, there is a recursive component that makes it somewhat hard to reason about this problem, or show it step by step, so it's actually easier to write pseudocode.

I'll assume that every row in your csv is represented like a tuple. Each tuple has "records" and "values", using the same terms you use in your question. "Records" are the things that must be put into a hierarchic structure. "Values" will be the leaves of the tree. I'll use quotations when I use these terms with these specific meanings.

I also assume that all "records" come before all "values".

Without further ado, the code:

// builds tree and returns a list of root nodes
// list_of_tuples: a list of tuples read from your csv
// curr_position: used to keep track of recursive calls
// number_of_records: assuming each csv row has n records and then m values, number_of_records equals n
function build_tree(list_of_tuples, curr_position, number_of_records) {
    // check if we have already reached the "values" (which shouldn't get converted into trees)
    if (curr_position == number_of_records) {
        return list of nodes, each containing a "value" (i.e. everything from position number_of_records on)
    }

    grouped = group tuples in list_of_tuples that have the same value in position curr_position, and store these groups indexed by such common value
    unique_values = get unique values in curr_position

    list_of_nodes = empty list

   // create the nodes and (recursively) their children
    for each val in unique_values {
        the_node = create tree node containing val
        the_children = build_tree(grouped[val], curr_position+1, number_of_records)
        the_node.set_children(the_children)

        list_of_nodes.append(the_node)
    }

    return list_of_nodes
}

// in your example, this returns a node with "A" and a node with "B"
// third parameter is 2 because you have 2 "records"
build_tree(list_parsed_from_csv, 0, 2)

Now you'd have to think about the specific data structures to use, but hopefully this shouldn't be too difficult if you understand the algorithm (as you mention, I think deciding on a data structure early on may have been hindering your thoughts).

巷子口的你 2024-12-15 10:48:16

这是通过使用 google-guava 简化的 junit 形式的基本工作解决方案(尽管没有断言)集合。该代码是不言自明的,您使用 csv 库代替 file io 来读取 csv。这应该给你基本的想法。

import java.io.File;
import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;

import org.junit.Test;

import com.google.common.base.Charsets;
import com.google.common.base.Splitter;
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Iterables;
import com.google.common.collect.Multimap;
import com.google.common.collect.Sets;
import com.google.common.io.Files;

public class MyTest
{
    @Test
    public void test1()
    {
        List<String> rows = getAllDataRows();

        Multimap<Records, Values> table = indexData(rows);

        printTree(table);

    }

    private void printTree(Multimap<Records, Values> table)
    {
        Set<String> alreadyPrintedRecord1s = Sets.newHashSet();

        for (Records r : table.keySet())
        {
            if (!alreadyPrintedRecord1s.contains(r.r1))
            {
                System.err.println(r.r1);
                alreadyPrintedRecord1s.add(r.r1);
            }

            System.err.println("\t" + r.r2);

            Collection<Values> allValues = table.get(r);

            for (Values v : allValues)
            {
                System.err.println("\t\t" + v.v1 + " , " + v.v2);
            }
        }
    }

    private Multimap<Records, Values> indexData(List<String> lines)
    {
        Multimap<Records, Values> table = ArrayListMultimap.create();

        for (String row : lines)
        {
            Iterable<String> split = Splitter.on(",").split(row);
            String[] data = Iterables.toArray(split, String.class);

            table.put(new Records(data[0], data[1]), new Values(data[2], data[3]));
        }
        return table;
    }

    private List<String> getAllDataRows()
    {
        List<String> lines = Collections.emptyList();

        try
        {
            lines = Files.readLines(new File("C:/test.csv"), Charsets.US_ASCII);
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }

        lines.remove(0);// remove header

        return lines;
    }
}



public class Records
{
    public final String r1, r2;

    public Records(final String r1, final String r2)
    {
        this.r1 = r1;
        this.r2 = r2;
    }

    @Override
    public int hashCode()
    {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((r1 == null) ? 0 : r1.hashCode());
        result = prime * result + ((r2 == null) ? 0 : r2.hashCode());
        return result;
    }

    @Override
    public boolean equals(final Object obj)
    {
        if (this == obj)
        {
            return true;
        }
        if (obj == null)
        {
            return false;
        }
        if (!(obj instanceof Records))
        {
            return false;
        }
        Records other = (Records) obj;
        if (r1 == null)
        {
            if (other.r1 != null)
            {
                return false;
            }
        }
        else if (!r1.equals(other.r1))
        {
            return false;
        }
        if (r2 == null)
        {
            if (other.r2 != null)
            {
                return false;
            }
        }
        else if (!r2.equals(other.r2))
        {
            return false;
        }
        return true;
    }

    @Override
    public String toString()
    {
        StringBuilder builder = new StringBuilder();
        builder.append("Records1and2 [r1=").append(r1).append(", r2=").append(r2).append("]");
        return builder.toString();
    }

}


public class Values
{
    public final String v1, v2;

    public Values(final String v1, final String v2)
    {
        this.v1 = v1;
        this.v2 = v2;
    }

    @Override
    public int hashCode()
    {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((v1 == null) ? 0 : v1.hashCode());
        result = prime * result + ((v2 == null) ? 0 : v2.hashCode());
        return result;
    }

    @Override
    public boolean equals(final Object obj)
    {
        if (this == obj)
        {
            return true;
        }
        if (obj == null)
        {
            return false;
        }
        if (!(obj instanceof Values))
        {
            return false;
        }
        Values other = (Values) obj;
        if (v1 == null)
        {
            if (other.v1 != null)
            {
                return false;
            }
        }
        else if (!v1.equals(other.v1))
        {
            return false;
        }
        if (v2 == null)
        {
            if (other.v2 != null)
            {
                return false;
            }
        }
        else if (!v2.equals(other.v2))
        {
            return false;
        }
        return true;
    }

    @Override
    public String toString()
    {
        StringBuilder builder = new StringBuilder();
        builder.append("Values1and2 [v1=").append(v1).append(", v2=").append(v2).append("]");
        return builder.toString();
    }

}

Here is the basic working solution in the form of junit (no assertions though) simplified by using google-guava collections. The code is self-explanatory and instead of file io you use csv libraries for reading the csv. This should give you the basic idea.

import java.io.File;
import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;

import org.junit.Test;

import com.google.common.base.Charsets;
import com.google.common.base.Splitter;
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Iterables;
import com.google.common.collect.Multimap;
import com.google.common.collect.Sets;
import com.google.common.io.Files;

public class MyTest
{
    @Test
    public void test1()
    {
        List<String> rows = getAllDataRows();

        Multimap<Records, Values> table = indexData(rows);

        printTree(table);

    }

    private void printTree(Multimap<Records, Values> table)
    {
        Set<String> alreadyPrintedRecord1s = Sets.newHashSet();

        for (Records r : table.keySet())
        {
            if (!alreadyPrintedRecord1s.contains(r.r1))
            {
                System.err.println(r.r1);
                alreadyPrintedRecord1s.add(r.r1);
            }

            System.err.println("\t" + r.r2);

            Collection<Values> allValues = table.get(r);

            for (Values v : allValues)
            {
                System.err.println("\t\t" + v.v1 + " , " + v.v2);
            }
        }
    }

    private Multimap<Records, Values> indexData(List<String> lines)
    {
        Multimap<Records, Values> table = ArrayListMultimap.create();

        for (String row : lines)
        {
            Iterable<String> split = Splitter.on(",").split(row);
            String[] data = Iterables.toArray(split, String.class);

            table.put(new Records(data[0], data[1]), new Values(data[2], data[3]));
        }
        return table;
    }

    private List<String> getAllDataRows()
    {
        List<String> lines = Collections.emptyList();

        try
        {
            lines = Files.readLines(new File("C:/test.csv"), Charsets.US_ASCII);
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }

        lines.remove(0);// remove header

        return lines;
    }
}



public class Records
{
    public final String r1, r2;

    public Records(final String r1, final String r2)
    {
        this.r1 = r1;
        this.r2 = r2;
    }

    @Override
    public int hashCode()
    {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((r1 == null) ? 0 : r1.hashCode());
        result = prime * result + ((r2 == null) ? 0 : r2.hashCode());
        return result;
    }

    @Override
    public boolean equals(final Object obj)
    {
        if (this == obj)
        {
            return true;
        }
        if (obj == null)
        {
            return false;
        }
        if (!(obj instanceof Records))
        {
            return false;
        }
        Records other = (Records) obj;
        if (r1 == null)
        {
            if (other.r1 != null)
            {
                return false;
            }
        }
        else if (!r1.equals(other.r1))
        {
            return false;
        }
        if (r2 == null)
        {
            if (other.r2 != null)
            {
                return false;
            }
        }
        else if (!r2.equals(other.r2))
        {
            return false;
        }
        return true;
    }

    @Override
    public String toString()
    {
        StringBuilder builder = new StringBuilder();
        builder.append("Records1and2 [r1=").append(r1).append(", r2=").append(r2).append("]");
        return builder.toString();
    }

}


public class Values
{
    public final String v1, v2;

    public Values(final String v1, final String v2)
    {
        this.v1 = v1;
        this.v2 = v2;
    }

    @Override
    public int hashCode()
    {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((v1 == null) ? 0 : v1.hashCode());
        result = prime * result + ((v2 == null) ? 0 : v2.hashCode());
        return result;
    }

    @Override
    public boolean equals(final Object obj)
    {
        if (this == obj)
        {
            return true;
        }
        if (obj == null)
        {
            return false;
        }
        if (!(obj instanceof Values))
        {
            return false;
        }
        Values other = (Values) obj;
        if (v1 == null)
        {
            if (other.v1 != null)
            {
                return false;
            }
        }
        else if (!v1.equals(other.v1))
        {
            return false;
        }
        if (v2 == null)
        {
            if (other.v2 != null)
            {
                return false;
            }
        }
        else if (!v2.equals(other.v2))
        {
            return false;
        }
        return true;
    }

    @Override
    public String toString()
    {
        StringBuilder builder = new StringBuilder();
        builder.append("Values1and2 [v1=").append(v1).append(", v2=").append(v2).append("]");
        return builder.toString();
    }

}
眼眸 2024-12-15 10:48:16

如果您知道只有两层记录,我会使用类似

Map<string, Map<string, List<Values>>>

当您读取新行时,您会查看外部映射来检查记录1的值是否为code> 已经存在,如果不存在,则为其创建新的空内部 Map

然后检查内部映射是否存在该 Record2 的值。如果没有,则创建新的List

然后读取值并将它们添加到列表中。

If you know you'll have just two levels of Records, I would use something like

Map<string, Map<string, List<Values>>>

When you read new line, you look into the outer map to check whether that value for Record1 already exists and if not, create new empty inner Map for it.

Then check the inner map whether a value for that Record2 exists. If not, create new List.

Then read the values and add them to the list.

半枫 2024-12-15 10:48:16

我最近需要做几乎同样的事情,并写了 tree-builder.com 来完成任务。唯一的区别是,当您布置 CSV 时,最后两个参数将是父参数和子参数,而不是同级参数。另外,我的版本不接受标题行。

代码全部用 JavaScript 编写;它使用 jstree 来构建树。您可以使用 firebug 或仅查看页面上的源代码来了解它是如何完成的。调整它以转义 CSV 中的逗号以保持最后两个参数是单个子参数可能非常容易。

I recently had a need to do pretty much the same thing and wrote tree-builder.com to accomplish the task. The only difference is that as you have your CSV laid out, the last two parameters will be parent and child instead of peers. Also, my version doesn't accept a header row.

The code is all in JavaScript; it uses jstree to build the tree. You can use firebug or just view the source on the page to see how it's done. It would probably be pretty easy to tweak it to escape the comma in your CSV in order to keep the last two parameters is a single child.

时间你老了 2024-12-15 10:48:16
    public static void main (String arg[]) throws Exception
{
    ArrayList<String> arRows = new ArrayList<String>();
    arRows.add("A,XX,22,33");
    arRows.add("A,XX,777,888");
    arRows.add("A,YY,33,11");
    arRows.add("B,XX,12,0");
    arRows.add("A,YY,13,23");
    arRows.add("B,YY,44,98");
    for(String sTreeRow:createTree(arRows,",")) //or use //// or whatever applicable
        System.out.println(sTreeRow);
}
    public static ArrayList<String> createTree (ArrayList<String> arRows, String sSeperator) throws Exception
{
    ArrayList<String> arReturnNodes = new ArrayList<String>();
    Collections.sort(arRows);
    String sLastPath = "";
    int iFolderLength = 0;
    for(int iRow=0;iRow<arRows.size();iRow++)
    {
        String sRow = arRows.get(iRow);
        String[] sFolders = sRow.split(sSeperator);
        iFolderLength = sFolders.length;
        String sTab = "";
        String[] sLastFolders = sLastPath.split(sSeperator);
        for(int i=0;i<iFolderLength;i++)
        {
            if(i>0)
                sTab = sTab+"    ";
            if(!sLastPath.equals(sRow))
            {

                if(sLastFolders!=null && sLastFolders.length>i)
                {
                    if(!sLastFolders[i].equals(sFolders[i]))
                    {
                        arReturnNodes.add(sTab+sFolders[i]+"");
                        sLastFolders = null;
                    }
                }
                else
                {
                    arReturnNodes.add(sTab+sFolders[i]+"");
                }
            }
        }
        sLastPath = sRow;
    }
    return arReturnNodes;
}
    public static void main (String arg[]) throws Exception
{
    ArrayList<String> arRows = new ArrayList<String>();
    arRows.add("A,XX,22,33");
    arRows.add("A,XX,777,888");
    arRows.add("A,YY,33,11");
    arRows.add("B,XX,12,0");
    arRows.add("A,YY,13,23");
    arRows.add("B,YY,44,98");
    for(String sTreeRow:createTree(arRows,",")) //or use //// or whatever applicable
        System.out.println(sTreeRow);
}
    public static ArrayList<String> createTree (ArrayList<String> arRows, String sSeperator) throws Exception
{
    ArrayList<String> arReturnNodes = new ArrayList<String>();
    Collections.sort(arRows);
    String sLastPath = "";
    int iFolderLength = 0;
    for(int iRow=0;iRow<arRows.size();iRow++)
    {
        String sRow = arRows.get(iRow);
        String[] sFolders = sRow.split(sSeperator);
        iFolderLength = sFolders.length;
        String sTab = "";
        String[] sLastFolders = sLastPath.split(sSeperator);
        for(int i=0;i<iFolderLength;i++)
        {
            if(i>0)
                sTab = sTab+"    ";
            if(!sLastPath.equals(sRow))
            {

                if(sLastFolders!=null && sLastFolders.length>i)
                {
                    if(!sLastFolders[i].equals(sFolders[i]))
                    {
                        arReturnNodes.add(sTab+sFolders[i]+"");
                        sLastFolders = null;
                    }
                }
                else
                {
                    arReturnNodes.add(sTab+sFolders[i]+"");
                }
            }
        }
        sLastPath = sRow;
    }
    return arReturnNodes;
}
霊感 2024-12-15 10:48:16

根据这个问题的提出方式,我将执行以下操作:

  1. 定义最终的数据结构将包含以下内容:
    树。
  2. 为原始文本中的每一行定义表示形式
    (为了灵活性,可能是链表)
  3. 编写一个方法,获取表示的行并将其插入到树数据结构中。对于每个不存在的分支,创建它;对于每个现有分支,当您逐步浏览“行”链接列表结构时,遍历它。
  4. 从一棵空树开始。
  5. 将文件的每一行读入行项结构并调用步骤 3 中定义的方法。

这有帮助吗?

Based upon how this problem is posed, I would do the following:

  1. Define what your final data structure will look like to contain the
    tree.
  2. Define a representation for each row in your original text
    (perhaps a linked list for flexibility)
  3. Write a method that takes the represented row and inserts it into the tree data structure. For each non-existent branch, create it; for each existing branch, traverse it, as you step through your "row" link list structure.
  4. Start with an empty tree.
  5. Read each line of the file into your row item structure and call the method defined in step 3.

Does that help?

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文