Pentaho Spoon: search and replace special characters in rows
I have a CSV file with MIME type US-ASCII, and one column in the dataset looks like this:
id | V_name |
---|---|
210001 | cha?ne des Puys |
210030 | M?los |
213004 | G?ll? |
213021 | S?phan |
221110 | Afd?ra |
And so on.
I would like to change those characters to:
id | V_name |
---|---|
210001 | chaine des Puys |
210030 | Milos |
213004 | Gollu |
213021 | Suphan |
221110 | Afdera |
The thing is that there are 95 rows of this kind. How can I search and replace the characters in those rows?
I am using Spoon from the PDI suite.
Thanks in advance.
Comments (1)
As @Iłya Bursov has stated, the source file you are reading does not contain the correct characters; the ? is literally present in the source. If you want to correct it, you'll have to do it manually.
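Before correcting anything by hand, it may be worth confirming that the file really contains literal `?` characters rather than bytes in some other encoding. The following is a minimal Python sketch of such a check, run outside PDI; the function name `summarize_bytes` and the sample bytes are made up for illustration.

```python
def summarize_bytes(raw: bytes) -> dict:
    """Count literal '?' bytes (0x3F) and non-ASCII bytes (0x80 and above)."""
    return {
        "question_marks": raw.count(b"?"),
        "non_ascii": sum(1 for b in raw if b >= 0x80),
    }


# Hypothetical sample mimicking two rows from the question.
sample = b"210001 | cha?ne des Puys\n210030 | M?los\n"
print(summarize_bytes(sample))
# {'question_marks': 2, 'non_ascii': 0}
```

If `non_ascii` is zero and the question marks are counted, the original characters were already lost when the file was generated. If `non_ascii` is not zero, the file is probably not pure US-ASCII, and reading it with the right encoding might recover the characters.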
I don't think it is worth it, unless you know you will always get the same set of V_name values over time and across different files. In that case you could create a file that maps each V_name containing ? characters in your source to a V_name_corrected with the correct characters. This seems to be an exercise, so I would leave the names as they are. In real life, I would contact the provider of the file and tell them they need to fix how the file is generated so that it contains the correct characters.
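The mapping-file idea can be sketched as follows. This is only an illustration in Python, not a PDI transformation: the `CORRECTIONS` dictionary stands in for the hand-built mapping file, its entries are taken from the example in the question, and `correct_rows` is a hypothetical helper name.

```python
# Hand-maintained mapping from the V_name as it appears in the source
# to V_name_corrected. Entries come from the question's example rows.
CORRECTIONS = {
    "cha?ne des Puys": "chaine des Puys",
    "M?los": "Milos",
    "G?ll?": "Gollu",
    "S?phan": "Suphan",
    "Afd?ra": "Afdera",
}


def correct_rows(rows, corrections):
    """Yield (id, V_name_corrected); names without a mapping pass through unchanged."""
    for row_id, name in rows:
        yield row_id, corrections.get(name, name)


rows = [
    ("210001", "cha?ne des Puys"),
    ("210030", "M?los"),
    ("999999", "Etna"),  # hypothetical row with no mapping entry
]
for row_id, name in correct_rows(rows, CORRECTIONS):
    print(row_id, name)
```

In Spoon, a similar effect could presumably be achieved by loading the mapping file and joining it to the main stream on V_name with a lookup-style step such as Stream lookup; that is a suggestion rather than something this answer specifies. Because the mapping must be built by hand either way, it only pays off if the same names keep recurring.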