Normalizing data in a huge XLS file before cross-reference checking it
I have a huge 14MB XLS file with unnormalized data, as shown in the attached picture. (The picture uses fake data to illustrate the point.)
I need to run a semantic integrity check. For each row (event), I want to cross-reference whether it sends data to some other event. If it does, then there must also be an entry in the "gets data from" column of the receiving event, and vice versa. The problem is that the data is not normalized: a single cell can hold several values separated by semicolons.
[![Dataset without normalisation][1]][1]
Question: What is the best solution for normalization and then semantic reference?
Possible Solution A:
- Save to a CSV file and write a script that reads each line and decomposes the data. If it finds a ";" in any row, it adds a new line with that data to a new file.
- Write another script in Java, JavaScript, or Python that runs the semantic reference check on the normalized set.
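The two steps of Solution A can be sketched in Python roughly as follows. This is only a minimal sketch under assumptions: the column name `event` is made up for illustration (only "sends to" and "gets data from" are named above), the sheet is assumed to have been exported as a comma-delimited CSV, and the sample data is invented.

```python
import csv
import io

def normalize(rows, sep=";"):
    """Yield one (event, relation, other_event) tuple per separated value."""
    for row in rows:
        event = row["event"].strip()  # "event" is an assumed column name
        for column in ("sends to", "gets data from"):
            for value in (row.get(column) or "").split(sep):
                value = value.strip()
                if value:
                    yield (event, column, value)

def find_missing(normalized):
    """Return the sorted list of counterpart entries that are missing."""
    facts = set(normalized)
    missing = []
    for event, relation, other in facts:
        counterpart = "gets data from" if relation == "sends to" else "sends to"
        if (other, counterpart, event) not in facts:
            missing.append((other, counterpart, event))
    return sorted(missing)

# Invented sample: A sends to B and C, but C does not list A as a source.
SAMPLE = """event,sends to,gets data from
A,B;C,
B,,A
C,,
"""

rows = csv.DictReader(io.StringIO(SAMPLE))
for event, relation, other in find_missing(normalize(rows)):
    print(f"{event}: missing '{relation}' entry for {other}")
```

For a real export, `io.StringIO(SAMPLE)` would be replaced by an opened file. If the export itself uses ";" as the field delimiter, the in-cell separator would collide with it, so a different delimiter for the export is probably needed.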
Possible Solution B:
Just write a VBA macro that takes each row (each event), reads the "sends to" column, and immediately checks whether there is a corresponding entry in the "gets data from" column of the event it refers to.
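The logic of Solution B, checking directly without writing a normalized file first, might look like the sketch below. It is written in Python rather than VBA purely for illustration; the `event` column name and the sample rows are assumptions, and each event is assumed to appear in exactly one row.

```python
def split_cell(cell, sep=";"):
    """Split one cell into a set of trimmed, non-empty values."""
    return {part.strip() for part in (cell or "").split(sep) if part.strip()}

def check_rows(rows):
    """Return a list of human-readable integrity problems."""
    sends, gets = {}, {}
    for row in rows:
        event = row["event"].strip()  # "event" is an assumed column name
        sends[event] = split_cell(row.get("sends to"))
        gets[event] = split_cell(row.get("gets data from"))

    problems = []
    # Every "sends to" entry needs a matching "gets data from" entry.
    for event, targets in sends.items():
        for target in sorted(targets):
            if target not in gets:
                problems.append(f"{event} sends to unknown event {target}")
            elif event not in gets[target]:
                problems.append(f"{target} does not list {event} under 'gets data from'")
    # And the reverse direction.
    for event, sources in gets.items():
        for source in sorted(sources):
            if source not in sends:
                problems.append(f"{event} gets data from unknown event {source}")
            elif event not in sends[source]:
                problems.append(f"{source} does not list {event} under 'sends to'")
    return problems

# Invented sample rows, as they might come out of csv.DictReader.
rows = [
    {"event": "A", "sends to": "B;C", "gets data from": ""},
    {"event": "B", "sends to": "", "gets data from": "A"},
    {"event": "C", "sends to": "", "gets data from": "B"},
]
for problem in check_rows(rows):
    print(problem)
```

Holding the split values in per-event sets keeps each lookup cheap, which matters more than the choice of language for a file of this size.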