My data keeps ending up in the wrong fields. I have a few large CSV files that I have to fix by hand before loading them into QLIK. The files use a comma (,) delimiter and a double quote (") text qualifier. Some values contain extra quote characters, which shifts the columns so that numbers land in text fields and vice versa. What is the best/fastest way to remove the unwanted " characters, so that I don't have to delete quotes and paste values into the correct fields by hand for hundreds of records? I have created dummy data below.
Please note that I am a bit limited in the tools I have available to clean the CSV. Alternatively, could you advise which tools/applications would be best for this? I'm just unsure where to start.
IN NOTEPAD:
ID,T_No,T_Type,T_Date,T_Name,T_TNo,
2,256,House,30/05/2021,Airport,75.1,
3,268,Hotel,31/05/2021,Hotel Antel""",76.1
4,269,House,31/05/2021,Bank of USA,"LA Branch""""",77.1
IN EXCEL:
[enter image description here][1]
Any assistance is greatly appreciated.
Thank you
[1]: https://i.sstatic.net/vyYAT.png
Comments (2)
If the issue is just with the `T_Name` column, you could set the mode to `CsvMode.NoEscape`, use a `ClassMap` to get the fields you know you can read without issue, and then use some logic to figure out where the `T_Name` column ends and the `T_TNo` column starts. There is a lot that could break in this approach, depending on what the rest of the data looks like, but it should at least give you some ideas.
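To make the "figure out where `T_Name` ends" idea concrete, here is a minimal sketch in Python rather than C#/CsvHelper (a swapped-in tool, not the library named above). It assumes the layout of the dummy data: four clean leading columns, `T_TNo` as the last column, and a trailing comma being a stray delimiter rather than an empty value. The names `repair_line` and `repair_text` are hypothetical.

```python
import csv
import io

LEADING = 4  # assumed: ID, T_No, T_Type, T_Date always come before T_Name


def repair_line(line):
    """Split one raw line on every comma and rebuild the six expected fields."""
    parts = line.rstrip("\r\n").split(",")
    if parts and parts[-1] == "":
        parts.pop()  # assumed: a trailing comma is a stray delimiter, not an empty T_TNo
    if len(parts) < LEADING + 2:
        raise ValueError(f"too few fields: {line!r}")
    head = parts[:LEADING]
    t_tno = parts[-1]
    # Everything between the leading fields and the last field is treated as T_Name.
    t_name = ",".join(parts[LEADING:-1]).replace('"', "").strip()
    return head + [t_name, t_tno]


def repair_text(text):
    """Repair every non-blank line and return the result as well-formed CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")  # re-quotes names that contain commas
    for line in text.splitlines():
        if line.strip():
            writer.writerow(repair_line(line))
    return out.getvalue()
```

Reading a file's contents, passing them to `repair_text`, and writing the result to a new file would give QLIK a CSV with one name per `T_Name` field. This only works while the commas and stray quotes are confined to `T_Name`; if other columns can also contain them, the split points are no longer unambiguous.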
If you have access to C# (there is a free version), you could process the file and fix the bad records. I would first check whether a record has an issue and, if it does, figure out where the name field starts and ends and fix the quotes.
This would be a good starting point:
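As an illustration of the "check first whether a record has an issue" step, here is a minimal sketch in Python rather than C#. It is not the answerer's code. The expected column count of 6 is an assumption taken from the sample header, and `is_suspect` and `suspect_rows` are hypothetical names.

```python
def is_suspect(line, expected=6):
    """Return True if a raw line has a quote character or the wrong comma count."""
    raw = line.rstrip("\r\n")
    if raw.endswith(","):
        raw = raw[:-1]  # assumed: a trailing comma is a stray delimiter
    return '"' in raw or raw.count(",") != expected - 1


def suspect_rows(lines, expected=6):
    """Return the 1-based numbers of the non-blank lines that look damaged."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if line.strip() and is_suspect(line, expected)
    ]
```

On the four dummy lines from the question this flags lines 3 and 4. It also flags rows whose quoting is perfectly valid, so it is better suited to narrowing down which records to inspect or repair than to proving that a record is broken.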