Substring extraction with multiple options
I have a string variable in my Stata dataset that looks like this:
city
Washington city
Boston city
El Paso city
Nashville-Davidson metropolitan government (balance)
Lexington-Fayette urban county
And I want it to look like:
city
Washington
Boston
El Paso
Nashville-Davidson
Lexington-Fayette
"city", "county", and "urban county" are the only three suffixes that follow a city name (the "metropolitan government (balance)" entry for Nashville-Davidson is the one exception in the sample).
In other words, I want to keep the substring from the start of the string up to the space before "city", "county", or "urban".
The only approach I can think of uses subinstr():
replace city = subinstr(city, " city", "", .)
However, I don't think I can pass multiple search strings to subinstr() at once.
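The literal goal described in the question, keeping everything to the left of the space that starts a suffix, can be sketched with strpos() and substr() instead of subinstr(). This is a minimal sketch, not code from the thread: it assumes the five example values as demo data, a str60 storage width, and a hypothetical helper variable suffix_pos. Suffixes are checked in a fixed order rather than by leftmost position, so " urban" is tested before " county".

```stata
* Demo data (assumption): the five example values from the question
clear
input str60 city
"Washington city"
"Boston city"
"El Paso city"
"Nashville-Davidson metropolitan government (balance)"
"Lexington-Fayette urban county"
end

* suffix_pos (hypothetical helper) = where the suffix's leading space sits.
* Suffixes are tried in a fixed order; 0 means "not found yet".
* " urban" is tried before " county" so "urban county" is cut at "urban".
generate suffix_pos = strpos(city, " city")
replace suffix_pos = strpos(city, " urban") if suffix_pos == 0
replace suffix_pos = strpos(city, " county") if suffix_pos == 0
replace suffix_pos = strpos(city, " metropolitan") if suffix_pos == 0

* Keep only the text to the left of that space
replace city = substr(city, 1, suffix_pos - 1) if suffix_pos > 0
drop suffix_pos
list city
```

Names with no recognized suffix keep suffix_pos at 0 and are left unchanged.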
I used subinstr() to replace the desired words with empty strings, and trim() to remove extra spaces.
Edit: As suggested, I have added a leading space to each search string so that places with "city" in their name (e.g., Audacity) are not inadvertently changed. The same applies to "county" (although this seems less likely).
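A minimal sketch of the approach this answer describes (its actual code is not shown on this page): loop over the suffixes, remove each one together with its leading space via subinstr(), then trim. The demo data, the str60 width, and the local macro name suffix are assumptions; strtrim() is the Stata 14+ name for trim().

```stata
* Demo data (assumption): the five example values from the question
clear
input str60 city
"Washington city"
"Boston city"
"El Paso city"
"Nashville-Davidson metropolitan government (balance)"
"Lexington-Fayette urban county"
end

* Remove each suffix with a leading space, so "Audacity" etc. are untouched.
* "urban county" comes before "county"; otherwise "urban" would be left behind.
foreach suffix in "urban county" "county" "city" "metropolitan government (balance)" {
    replace city = subinstr(city, " `suffix'", "", .)
}

* Strip any leftover leading/trailing spaces
replace city = strtrim(city)
list city
```

Because the last argument to subinstr() is `.`, every occurrence is removed, not only one at the end of the string, so a suffix word preceded by a space in the middle of a name would also be deleted.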
split could be a way.
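One way split might be used, sketched under assumptions: parse() is given several delimiters, each with a leading space, and only the first piece is kept. The stub name part, the demo data, and the use of " urban" rather than " urban county" as a delimiter are illustrative choices, not taken from the thread. Listing " urban" ahead of " county" keeps the "urban county" row correct.

```stata
* Demo data (assumption): the five example values from the question
clear
input str60 city
"Washington city"
"Boston city"
"El Paso city"
"Nashville-Davidson metropolitan government (balance)"
"Lexington-Fayette urban county"
end

* Split at any of these delimiters (each starts with a space, so only
* separate words act as separators); new variables are part1, part2, ...
split city, parse(" city" " urban" " county" " metropolitan") generate(part)

* The first piece is the place name; discard the rest
replace city = part1
drop part*
list city
```

Spaces inside names such as "El Paso" survive because a plain space is not one of the delimiters.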
I think using regular expression replacement to search for a space followed by a relevant substring would be the most flexible option here. For example:
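The answer's original example did not survive on this page. Below is a sketch of the idea, assuming Stata 14 or later for ustrregexra() and the five sample values as demo data: the pattern matches a space, one of the known suffixes, and then the end of the string.

```stata
* Demo data (assumption): the five example values from the question
clear
input str60 city
"Washington city"
"Boston city"
"El Paso city"
"Nashville-Davidson metropolitan government (balance)"
"Lexington-Fayette urban county"
end

* Delete " <suffix>" only when it ends the string ($). The leftmost match
* wins, so "urban county" is removed as a whole rather than just "county".
replace city = ustrregexra(city, " (city|urban county|county|metropolitan government.*)$", "")
list city
```

Anchoring with `$` means suffix words are removed only at the end of a name. The `metropolitan government.*` alternative is an assumption added to cover the Nashville-Davidson entry; adding another suffix means adding one more alternative to the pattern.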