从 csv 生成树结构

发布于 2024-12-08 10:48:16 字数 1720 浏览 0 评论 0原文

我已经为这个问题摸不着头脑有一段时间了。我基本上试图从一组 CSV 数据生成树层次结构。 CSV 数据不一定是有序的。这就像如下所示：

Header: Record1,Record2,Value1,Value2
Row: A,XX,22,33
Row: A,XX,777,888
Row: A,YY,33,11
Row: B,XX,12,0
Row: A,YY,13,23
Row: B,YY,44,98

我试图使分组的执行方式尽可能灵活。最简单的分组方法是对 Record1 和 Record2 进行分组，并将 Value1 和 Value2 存储在 Record2 下，以便我们得到以下输出：

Record1
    Record2
        Value1 Value2

这将是：

我目前将我的组设置存储在列表中 - 我不这样做知道这是否阻碍了我的想法。该列表包含组的层次结构，例如：

Record1 (SchemaGroup)
    .column = Record1
    .columns = null
    .childGroups =
        Record2 (SchemaGroup)
            .column = Record1
            .columns = Value1 (CSVColumnInformation), Value2 (CSVColumnInformation)
            .childGroups = null

其代码如下所示：

private class SchemaGroup {
    private SchemaGroupType type = SchemaGroupType.StaticText;  // default to text
    private String text;
    private CSVColumnInformation column = null;
    private List<SchemaGroup> childGroups = new ArrayList<SchemaGroup>();
    private List<CSVColumnInformation> columns = new ArrayList<CSVColumnInformation>();
}


private enum SchemaGroupType {
    /** Allow fixed text groups to be added */
    StaticText,
    /** Related to a column with common value */
    ColumnGroup
}

我正在努力为此生成一个算法，试图考虑要使用的底层结构。目前，我正在使用我自己的包装类从上到下解析 CSV：

CSVParser csv = new CSVParser(content);
String[] line;
while((line = csv.readLine()) != null ) {
    ...
}

我只是想启动我的编码大脑。

有什么想法吗？

原文

I have scratched my head over this problem for a while now. I am basically trying to generate a tree hierarchy from a set of CSV data. The CSV data is not necessarily ordered. This is like something as follows:

Header: Record1,Record2,Value1,Value2
Row: A,XX,22,33
Row: A,XX,777,888
Row: A,YY,33,11
Row: B,XX,12,0
Row: A,YY,13,23
Row: B,YY,44,98

I am trying to make the way the grouping is performed as flexible as possible. The simplest for of grouping would to do it for Record1 and Record2 with the Value1 and Value2 stored under Record2 so that we get the following output:

Record1
    Record2
        Value1 Value2

Which would be:

I am storing my group settings in a List at present - which I don't know if this is hindering my thoughts. This list contains a hierarchy of the groups for example:

Record1 (SchemaGroup)
    .column = Record1
    .columns = null
    .childGroups =
        Record2 (SchemaGroup)
            .column = Record1
            .columns = Value1 (CSVColumnInformation), Value2 (CSVColumnInformation)
            .childGroups = null

The code for this looks like as follows:

private class SchemaGroup {
    private SchemaGroupType type = SchemaGroupType.StaticText;  // default to text
    private String text;
    private CSVColumnInformation column = null;
    private List<SchemaGroup> childGroups = new ArrayList<SchemaGroup>();
    private List<CSVColumnInformation> columns = new ArrayList<CSVColumnInformation>();
}


private enum SchemaGroupType {
    /** Allow fixed text groups to be added */
    StaticText,
    /** Related to a column with common value */
    ColumnGroup
}

I am stuggling producing an algorithm for this, trying to think of the underlying structure to use. At present I am parsing the CSV top to bottom, using my own wrapper class:

CSVParser csv = new CSVParser(content);
String[] line;
while((line = csv.readLine()) != null ) {
    ...
}

I am just trying to kick start my coding brain.

Any thoughts?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

孤者何惧 2024-12-15 10:48:16

基本思想并不困难：按第一条记录分组，然后按第二条记录分组，依此类推，直到得到类似这样的结果：

(A,XX,22,33)
(A,XX,777,888)
-------------------------
(A,YY,33,11)
(A,YY,13,23)
=============
(B,XX,12,0)
-------------------------
(B,YY,44,98)

然后向后构建树。

然而，有一个递归组件使得推理这个问题或一步一步地显示它变得有些困难，所以实际上编写伪代码更容易。

我假设 csv 中的每一行都表示为一个元组。每个元组都有“记录”和“值”，使用您在问题中使用的相同术语。 “记录”是必须放入层次结构中的东西。 “价值观”将是树的叶子。当我使用具有这些特定含义的术语时，我将使用引号。

我还假设所有“记录”都位于所有“值”之前。

言归正传，代码：

// builds tree and returns a list of root nodes
// list_of_tuples: a list of tuples read from your csv
// curr_position: used to keep track of recursive calls
// number_of_records: assuming each csv row has n records and then m values, number_of_records equals n
function build_tree(list_of_tuples, curr_position, number_of_records) {
    // check if we have already reached the "values" (which shouldn't get converted into trees)
    if (curr_position == number_of_records) {
        return list of nodes, each containing a "value" (i.e. everything from position number_of_records on)
    }

    grouped = group tuples in list_of_tuples that have the same value in position curr_position, and store these groups indexed by such common value
    unique_values = get unique values in curr_position

    list_of_nodes = empty list

   // create the nodes and (recursively) their children
    for each val in unique_values {
        the_node = create tree node containing val
        the_children = build_tree(grouped[val], curr_position+1, number_of_records)
        the_node.set_children(the_children)

        list_of_nodes.append(the_node)
    }

    return list_of_nodes
}

// in your example, this returns a node with "A" and a node with "B"
// third parameter is 2 because you have 2 "records"
build_tree(list_parsed_from_csv, 0, 2)

现在您必须考虑要使用的具体数据结构，但希望如果您理解算法，这应该不会太困难（正如您提到的，我认为尽早决定数据结构）可能阻碍了你的想法）。

The basic idea isn't difficult: group by the first record, then by the second record, etc. until you get something like this:

(A,XX,22,33)
(A,XX,777,888)
-------------------------
(A,YY,33,11)
(A,YY,13,23)
=============
(B,XX,12,0)
-------------------------
(B,YY,44,98)

and then work backwards to build the trees.

However, there is a recursive component that makes it somewhat hard to reason about this problem, or show it step by step, so it's actually easier to write pseudocode.

I'll assume that every row in your csv is represented like a tuple. Each tuple has "records" and "values", using the same terms you use in your question. "Records" are the things that must be put into a hierarchic structure. "Values" will be the leaves of the tree. I'll use quotations when I use these terms with these specific meanings.

I also assume that all "records" come before all "values".

Without further ado, the code:

// builds tree and returns a list of root nodes
// list_of_tuples: a list of tuples read from your csv
// curr_position: used to keep track of recursive calls
// number_of_records: assuming each csv row has n records and then m values, number_of_records equals n
function build_tree(list_of_tuples, curr_position, number_of_records) {
    // check if we have already reached the "values" (which shouldn't get converted into trees)
    if (curr_position == number_of_records) {
        return list of nodes, each containing a "value" (i.e. everything from position number_of_records on)
    }

    grouped = group tuples in list_of_tuples that have the same value in position curr_position, and store these groups indexed by such common value
    unique_values = get unique values in curr_position

    list_of_nodes = empty list

   // create the nodes and (recursively) their children
    for each val in unique_values {
        the_node = create tree node containing val
        the_children = build_tree(grouped[val], curr_position+1, number_of_records)
        the_node.set_children(the_children)

        list_of_nodes.append(the_node)
    }

    return list_of_nodes
}

// in your example, this returns a node with "A" and a node with "B"
// third parameter is 2 because you have 2 "records"
build_tree(list_parsed_from_csv, 0, 2)

Now you'd have to think about the specific data structures to use, but hopefully this shouldn't be too difficult if you understand the algorithm (as you mention, I think deciding on a data structure early on may have been hindering your thoughts).

回复收藏 0 原文

巷子口的你 2024-12-15 10:48:16

这是通过使用 google-guava 简化的 junit 形式的基本工作解决方案（尽管没有断言）集合。该代码是不言自明的，您使用 csv 库代替 file io 来读取 csv。这应该给你基本的想法。

import java.io.File;
import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;

import org.junit.Test;

import com.google.common.base.Charsets;
import com.google.common.base.Splitter;
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Iterables;
import com.google.common.collect.Multimap;
import com.google.common.collect.Sets;
import com.google.common.io.Files;

public class MyTest
{
    @Test
    public void test1()
    {
        List<String> rows = getAllDataRows();

        Multimap<Records, Values> table = indexData(rows);

        printTree(table);

    }

    private void printTree(Multimap<Records, Values> table)
    {
        Set<String> alreadyPrintedRecord1s = Sets.newHashSet();

        for (Records r : table.keySet())
        {
            if (!alreadyPrintedRecord1s.contains(r.r1))
            {
                System.err.println(r.r1);
                alreadyPrintedRecord1s.add(r.r1);
            }

            System.err.println("\t" + r.r2);

            Collection<Values> allValues = table.get(r);

            for (Values v : allValues)
            {
                System.err.println("\t\t" + v.v1 + " , " + v.v2);
            }
        }
    }

    private Multimap<Records, Values> indexData(List<String> lines)
    {
        Multimap<Records, Values> table = ArrayListMultimap.create();

        for (String row : lines)
        {
            Iterable<String> split = Splitter.on(",").split(row);
            String[] data = Iterables.toArray(split, String.class);

            table.put(new Records(data[0], data[1]), new Values(data[2], data[3]));
        }
        return table;
    }

    private List<String> getAllDataRows()
    {
        List<String> lines = Collections.emptyList();

        try
        {
            lines = Files.readLines(new File("C:/test.csv"), Charsets.US_ASCII);
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }

        lines.remove(0);// remove header

        return lines;
    }
}



public class Records
{
    public final String r1, r2;

    public Records(final String r1, final String r2)
    {
        this.r1 = r1;
        this.r2 = r2;
    }

    @Override
    public int hashCode()
    {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((r1 == null) ? 0 : r1.hashCode());
        result = prime * result + ((r2 == null) ? 0 : r2.hashCode());
        return result;
    }

    @Override
    public boolean equals(final Object obj)
    {
        if (this == obj)
        {
            return true;
        }
        if (obj == null)
        {
            return false;
        }
        if (!(obj instanceof Records))
        {
            return false;
        }
        Records other = (Records) obj;
        if (r1 == null)
        {
            if (other.r1 != null)
            {
                return false;
            }
        }
        else if (!r1.equals(other.r1))
        {
            return false;
        }
        if (r2 == null)
        {
            if (other.r2 != null)
            {
                return false;
            }
        }
        else if (!r2.equals(other.r2))
        {
            return false;
        }
        return true;
    }

    @Override
    public String toString()
    {
        StringBuilder builder = new StringBuilder();
        builder.append("Records1and2 [r1=").append(r1).append(", r2=").append(r2).append("]");
        return builder.toString();
    }

}


public class Values
{
    public final String v1, v2;

    public Values(final String v1, final String v2)
    {
        this.v1 = v1;
        this.v2 = v2;
    }

    @Override
    public int hashCode()
    {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((v1 == null) ? 0 : v1.hashCode());
        result = prime * result + ((v2 == null) ? 0 : v2.hashCode());
        return result;
    }

    @Override
    public boolean equals(final Object obj)
    {
        if (this == obj)
        {
            return true;
        }
        if (obj == null)
        {
            return false;
        }
        if (!(obj instanceof Values))
        {
            return false;
        }
        Values other = (Values) obj;
        if (v1 == null)
        {
            if (other.v1 != null)
            {
                return false;
            }
        }
        else if (!v1.equals(other.v1))
        {
            return false;
        }
        if (v2 == null)
        {
            if (other.v2 != null)
            {
                return false;
            }
        }
        else if (!v2.equals(other.v2))
        {
            return false;
        }
        return true;
    }

    @Override
    public String toString()
    {
        StringBuilder builder = new StringBuilder();
        builder.append("Values1and2 [v1=").append(v1).append(", v2=").append(v2).append("]");
        return builder.toString();
    }

}

Here is the basic working solution in the form of junit (no assertions though) simplified by using google-guava collections. The code is self-explanatory and instead of file io you use csv libraries for reading the csv. This should give you the basic idea.

import java.io.File;
import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;

import org.junit.Test;

import com.google.common.base.Charsets;
import com.google.common.base.Splitter;
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Iterables;
import com.google.common.collect.Multimap;
import com.google.common.collect.Sets;
import com.google.common.io.Files;

public class MyTest
{
    @Test
    public void test1()
    {
        List<String> rows = getAllDataRows();

        Multimap<Records, Values> table = indexData(rows);

        printTree(table);

    }

    private void printTree(Multimap<Records, Values> table)
    {
        Set<String> alreadyPrintedRecord1s = Sets.newHashSet();

        for (Records r : table.keySet())
        {
            if (!alreadyPrintedRecord1s.contains(r.r1))
            {
                System.err.println(r.r1);
                alreadyPrintedRecord1s.add(r.r1);
            }

            System.err.println("\t" + r.r2);

            Collection<Values> allValues = table.get(r);

            for (Values v : allValues)
            {
                System.err.println("\t\t" + v.v1 + " , " + v.v2);
            }
        }
    }

    private Multimap<Records, Values> indexData(List<String> lines)
    {
        Multimap<Records, Values> table = ArrayListMultimap.create();

        for (String row : lines)
        {
            Iterable<String> split = Splitter.on(",").split(row);
            String[] data = Iterables.toArray(split, String.class);

            table.put(new Records(data[0], data[1]), new Values(data[2], data[3]));
        }
        return table;
    }

    private List<String> getAllDataRows()
    {
        List<String> lines = Collections.emptyList();

        try
        {
            lines = Files.readLines(new File("C:/test.csv"), Charsets.US_ASCII);
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }

        lines.remove(0);// remove header

        return lines;
    }
}



public class Records
{
    public final String r1, r2;

    public Records(final String r1, final String r2)
    {
        this.r1 = r1;
        this.r2 = r2;
    }

    @Override
    public int hashCode()
    {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((r1 == null) ? 0 : r1.hashCode());
        result = prime * result + ((r2 == null) ? 0 : r2.hashCode());
        return result;
    }

    @Override
    public boolean equals(final Object obj)
    {
        if (this == obj)
        {
            return true;
        }
        if (obj == null)
        {
            return false;
        }
        if (!(obj instanceof Records))
        {
            return false;
        }
        Records other = (Records) obj;
        if (r1 == null)
        {
            if (other.r1 != null)
            {
                return false;
            }
        }
        else if (!r1.equals(other.r1))
        {
            return false;
        }
        if (r2 == null)
        {
            if (other.r2 != null)
            {
                return false;
            }
        }
        else if (!r2.equals(other.r2))
        {
            return false;
        }
        return true;
    }

    @Override
    public String toString()
    {
        StringBuilder builder = new StringBuilder();
        builder.append("Records1and2 [r1=").append(r1).append(", r2=").append(r2).append("]");
        return builder.toString();
    }

}


public class Values
{
    public final String v1, v2;

    public Values(final String v1, final String v2)
    {
        this.v1 = v1;
        this.v2 = v2;
    }

    @Override
    public int hashCode()
    {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((v1 == null) ? 0 : v1.hashCode());
        result = prime * result + ((v2 == null) ? 0 : v2.hashCode());
        return result;
    }

    @Override
    public boolean equals(final Object obj)
    {
        if (this == obj)
        {
            return true;
        }
        if (obj == null)
        {
            return false;
        }
        if (!(obj instanceof Values))
        {
            return false;
        }
        Values other = (Values) obj;
        if (v1 == null)
        {
            if (other.v1 != null)
            {
                return false;
            }
        }
        else if (!v1.equals(other.v1))
        {
            return false;
        }
        if (v2 == null)
        {
            if (other.v2 != null)
            {
                return false;
            }
        }
        else if (!v2.equals(other.v2))
        {
            return false;
        }
        return true;
    }

    @Override
    public String toString()
    {
        StringBuilder builder = new StringBuilder();
        builder.append("Values1and2 [v1=").append(v1).append(", v2=").append(v2).append("]");
        return builder.toString();
    }

}

回复收藏 0 原文

眼眸 2024-12-15 10:48:16

如果您知道只有两层记录，我会使用类似

Map<string, Map<string, List<Values>>>

当您读取新行时，您会查看外部映射来检查记录1的值是否为code> 已经存在，如果不存在，则为其创建新的空内部 Map。

然后检查内部映射是否存在该 Record2 的值。如果没有，则创建新的List。

然后读取值并将它们添加到列表中。

If you know you'll have just two levels of Records, I would use something like

Map<string, Map<string, List<Values>>>

When you read new line, you look into the outer map to check whether that value for Record1 already exists and if not, create new empty inner Map for it.

Then check the inner map whether a value for that Record2 exists. If not, create new List.

Then read the values and add them to the list.

回复收藏 0 原文

半枫 2024-12-15 10:48:16

我最近需要做几乎同样的事情，并写了 tree-builder.com 来完成任务。唯一的区别是，当您布置 CSV 时，最后两个参数将是父参数和子参数，而不是同级参数。另外，我的版本不接受标题行。

代码全部用 JavaScript 编写；它使用 jstree 来构建树。您可以使用 firebug 或仅查看页面上的源代码来了解它是如何完成的。调整它以转义 CSV 中的逗号以保持最后两个参数是单个子参数可能非常容易。

回复收藏 0 原文

时间你老了 2024-12-15 10:48:16

    public static void main (String arg[]) throws Exception
{
    ArrayList<String> arRows = new ArrayList<String>();
    arRows.add("A,XX,22,33");
    arRows.add("A,XX,777,888");
    arRows.add("A,YY,33,11");
    arRows.add("B,XX,12,0");
    arRows.add("A,YY,13,23");
    arRows.add("B,YY,44,98");
    for(String sTreeRow:createTree(arRows,",")) //or use //// or whatever applicable
        System.out.println(sTreeRow);
}
    public static ArrayList<String> createTree (ArrayList<String> arRows, String sSeperator) throws Exception
{
    ArrayList<String> arReturnNodes = new ArrayList<String>();
    Collections.sort(arRows);
    String sLastPath = "";
    int iFolderLength = 0;
    for(int iRow=0;iRow<arRows.size();iRow++)
    {
        String sRow = arRows.get(iRow);
        String[] sFolders = sRow.split(sSeperator);
        iFolderLength = sFolders.length;
        String sTab = "";
        String[] sLastFolders = sLastPath.split(sSeperator);
        for(int i=0;i<iFolderLength;i++)
        {
            if(i>0)
                sTab = sTab+"    ";
            if(!sLastPath.equals(sRow))
            {

                if(sLastFolders!=null && sLastFolders.length>i)
                {
                    if(!sLastFolders[i].equals(sFolders[i]))
                    {
                        arReturnNodes.add(sTab+sFolders[i]+"");
                        sLastFolders = null;
                    }
                }
                else
                {
                    arReturnNodes.add(sTab+sFolders[i]+"");
                }
            }
        }
        sLastPath = sRow;
    }
    return arReturnNodes;
}

    public static void main (String arg[]) throws Exception
{
    ArrayList<String> arRows = new ArrayList<String>();
    arRows.add("A,XX,22,33");
    arRows.add("A,XX,777,888");
    arRows.add("A,YY,33,11");
    arRows.add("B,XX,12,0");
    arRows.add("A,YY,13,23");
    arRows.add("B,YY,44,98");
    for(String sTreeRow:createTree(arRows,",")) //or use //// or whatever applicable
        System.out.println(sTreeRow);
}
    public static ArrayList<String> createTree (ArrayList<String> arRows, String sSeperator) throws Exception
{
    ArrayList<String> arReturnNodes = new ArrayList<String>();
    Collections.sort(arRows);
    String sLastPath = "";
    int iFolderLength = 0;
    for(int iRow=0;iRow<arRows.size();iRow++)
    {
        String sRow = arRows.get(iRow);
        String[] sFolders = sRow.split(sSeperator);
        iFolderLength = sFolders.length;
        String sTab = "";
        String[] sLastFolders = sLastPath.split(sSeperator);
        for(int i=0;i<iFolderLength;i++)
        {
            if(i>0)
                sTab = sTab+"    ";
            if(!sLastPath.equals(sRow))
            {

                if(sLastFolders!=null && sLastFolders.length>i)
                {
                    if(!sLastFolders[i].equals(sFolders[i]))
                    {
                        arReturnNodes.add(sTab+sFolders[i]+"");
                        sLastFolders = null;
                    }
                }
                else
                {
                    arReturnNodes.add(sTab+sFolders[i]+"");
                }
            }
        }
        sLastPath = sRow;
    }
    return arReturnNodes;
}

回复收藏 0 原文