How do I sort a CSV file by multiple columns using Python?
I am having trouble figuring out how to sort a large data set into more useful data.
The original file is in CSV format, as shown below. Each row gives the x, y, z position followed by the energy. The x, y, z coordinates span a wide range, so this is only a small snippet; basically, it was an energy search over a volume.
-2.800000,-1.000000,5.470000,-0.26488315
-3.000000,1.000000,4.070000,-0.81185718
-2.800000,-1.000000,3.270000,1.29303723
-2.800000,-0.400000,4.870000,-0.51165026
Unfortunately, it's very difficult to plot in the requisite four dimensions, so I need to trim this data. I would like to do this by turning the volume into a surface of the lowest energy along the z axis. On smaller data sets this was simple: in Excel, sort by X, then Y, then energy, and then delete all energies above the lowest. That was easy enough for small sets of data but has quickly become problematic.
I have tried various ways of doing this such as splitting the csv and using the sort command, but I am having little luck. Any advice on how to approach this would be much appreciated.
This does what you ask in your comment to Raymond's answer: it returns just the row with the lowest z for each x, y pair.
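The code that originally accompanied this answer is not preserved here. As a minimal sketch of the same idea, assuming the four-column layout from the question, the following keeps a dictionary keyed by (x, y) and replaces the stored row whenever a smaller z is seen. The names `SAMPLE` and `lowest_z_rows` are hypothetical, chosen for illustration only.

```python
import csv
import io

# Sample rows from the question: x, y, z, energy
SAMPLE = """\
-2.800000,-1.000000,5.470000,-0.26488315
-3.000000,1.000000,4.070000,-0.81185718
-2.800000,-1.000000,3.270000,1.29303723
-2.800000,-0.400000,4.870000,-0.51165026
"""


def lowest_z_rows(lines):
    """Return one row per (x, y) pair: the row with the smallest z."""
    best = {}
    for record in csv.reader(lines):
        if not record:  # skip blank lines
            continue
        x, y, z, energy = map(float, record)
        key = (x, y)
        # Keep this row if the pair is new or its z is lower than the stored one
        if key not in best or z < best[key][2]:
            best[key] = (x, y, z, energy)
    # Return the rows ordered by (x, y)
    return [best[key] for key in sorted(best)]


rows = lowest_z_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

For a real file, an open file object (for example `open(path, newline="")`) could be passed in place of the `io.StringIO` wrapper. Using floats as dictionary keys assumes that repeated coordinates are written identically in the file.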
After reading the data into a list of tuples with csv.reader, sort the data by (x, y) values. For clarity, use named tuples to identify the fields.
Then use itertools.groupby to cluster the related (x, y) data points. For each group, use min to isolate the one with the lowest energy.
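The csv.reader, named tuple, and itertools.groupby approach can be sketched roughly as follows. This is an illustration under the assumption that each row is x, y, z, energy as in the question; `Point`, `SAMPLE`, and `lowest_energy_surface` are hypothetical names, not taken from the original answer.

```python
import csv
import io
from collections import namedtuple
from itertools import groupby

# Named tuple so fields can be referred to by name instead of index
Point = namedtuple("Point", ["x", "y", "z", "energy"])

# Sample rows from the question: x, y, z, energy
SAMPLE = """\
-2.800000,-1.000000,5.470000,-0.26488315
-3.000000,1.000000,4.070000,-0.81185718
-2.800000,-1.000000,3.270000,1.29303723
-2.800000,-0.400000,4.870000,-0.51165026
"""


def xy(point):
    """Grouping key: the (x, y) position of a point."""
    return (point.x, point.y)


def lowest_energy_surface(lines):
    """Return the lowest-energy point for each (x, y) pair."""
    points = [Point(*map(float, record))
              for record in csv.reader(lines) if record]
    # groupby only merges adjacent items, so sort by the same key first
    points.sort(key=xy)
    return [min(group, key=lambda p: p.energy)
            for _, group in groupby(points, key=xy)]


surface = lowest_energy_surface(io.StringIO(SAMPLE))
for point in surface:
    print(point)
```

Sorting before grouping matters here: without it, rows sharing an (x, y) pair that are not adjacent in the file would end up in separate groups.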
I think numpy's lexsort will address your sorting needs.
In general I think your steps are:
Read the csv into a numpy array (have you tried Python's csv package or numpy's genfromtxt() function?)
Sort using lexsort
Trim off unnecessary rows
Trim off unnecessary rows
EDIT: See this related SO question.
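The three steps above might look something like the sketch below. It assumes the four-column layout from the question and that numpy is installed; `SAMPLE`, `is_first`, and `surface` are hypothetical names used only for illustration.

```python
import io

import numpy as np

# Sample rows from the question: x, y, z, energy
SAMPLE = """\
-2.800000,-1.000000,5.470000,-0.26488315
-3.000000,1.000000,4.070000,-0.81185718
-2.800000,-1.000000,3.270000,1.29303723
-2.800000,-0.400000,4.870000,-0.51165026
"""

# Step 1: read the csv into a 2-D array (atleast_2d guards the one-row case)
data = np.atleast_2d(np.genfromtxt(io.StringIO(SAMPLE), delimiter=","))

# Step 2: lexsort treats the LAST key as the primary one,
# so this sorts by x, then y, then energy
order = np.lexsort((data[:, 3], data[:, 1], data[:, 0]))
data = data[order]

# Step 3: within each run of equal (x, y) the first row has the lowest
# energy, so keep only rows whose (x, y) differs from the previous row
is_first = np.ones(len(data), dtype=bool)
is_first[1:] = np.any(data[1:, :2] != data[:-1, :2], axis=1)
surface = data[is_first]

print(surface)
```

For a file on disk, the path can be passed to `np.genfromtxt` directly instead of the `io.StringIO` wrapper. The exact comparison of coordinates assumes that repeated positions are written identically in the file.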