How can I extract some content from a cell in a web-scraped CSV file?
I am struggling with a CSV file that was scraped from a crowdfunding website.
My goal is to load all of the information as separate columns, but some of it ends up mixed together in a single column when I load the file with 1) R, 2) Stata, or 3) Python.
Since the real data is really dirty, here is an abbreviated version of the current dataset.
ID | Pledge | creator |
---|---|---|
000001 | 13.7 | {"urls":{"web":{"user":"www.kickstarter.com/profile/731"}}, "name":John","id":709510333} |
000002 | 26.4 | {"urls":{"web":{"user":"www.kickstarter.com/profile/759"}}, "name":Kellen","id":703514812} |
000003 | 7.6 | {"urls":{"web":{"user":"www.kickstarter.com/profile/7522"}}, "name":Jach","id":609542647} |
My goal is to extract "name" and "id" as separate columns, but they are mixed in with the URLs in the creator column.
Is there any way to extract the names (John, Kellen, Jach) and the ids as separate columns?
I prefer R, but Stata and Python would also be helpful!
Thank you so much for considering this.
Comments (1)
If you want to extract the name and id without any other values, you can replace the code that sets the creator column with an expression that keeps only the "name" and "id" keys of the dictionary.
In that expression, replace creator with whatever variable holds the dictionary.
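The code that originally accompanied this answer is not preserved here, so the following is only a minimal sketch of the idea in Python, assuming the creator cell is valid JSON. The helper `extract_name_and_id`, the column name `creator_id`, and the sample row (written with the quote around the name restored) are hypothetical and not taken from the answer.

```python
import json


def extract_name_and_id(creator):
    """Return (name, id) from a creator value.

    Accepts either an already-parsed dict or a valid JSON string.
    """
    if isinstance(creator, str):
        creator = json.loads(creator)
    return creator.get("name"), creator.get("id")


# Hypothetical, well-formed version of the first row from the question.
rows = [
    {
        "ID": "000001",
        "Pledge": "13.7",
        "creator": '{"urls":{"web":{"user":"www.kickstarter.com/profile/731"}},'
                   '"name":"John","id":709510333}',
    },
]

# Replace the mixed creator cell with two separate columns.
for row in rows:
    row["name"], row["creator_id"] = extract_name_and_id(row.pop("creator"))

print(rows)
```

The same helper could be applied per row after loading the file with the `csv` module or a data-frame library; that choice is left open here.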
Also, if the JSON data is not formatted correctly (for example, a missing quote), you can try using regular expressions.
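As an illustration of the regular-expression route, here is a hedged sketch in Python that works on the sample rows from the question, where the opening quote before each name is missing. The patterns `NAME_RE` and `ID_RE` and the helper `extract_with_regex` are assumptions made for this example; really dirty data may need stricter or different patterns.

```python
import re

# The opening and closing quotes around the name are optional, so both
# "name":"John" and the malformed "name":John" are accepted.
NAME_RE = re.compile(r'"name"\s*:\s*"?([^",}]+)"?')
ID_RE = re.compile(r'"id"\s*:\s*(\d+)')


def extract_with_regex(raw):
    """Return (name, id) from a raw creator string, or None for a missing part."""
    name_match = NAME_RE.search(raw)
    id_match = ID_RE.search(raw)
    name = name_match.group(1).strip() if name_match else None
    creator_id = int(id_match.group(1)) if id_match else None
    return name, creator_id


# Creator cells copied from the question, including the missing quote.
samples = [
    '{"urls":{"web":{"user":"www.kickstarter.com/profile/731"}}, "name":John","id":709510333}',
    '{"urls":{"web":{"user":"www.kickstarter.com/profile/759"}}, "name":Kellen","id":703514812}',
    '{"urls":{"web":{"user":"www.kickstarter.com/profile/7522"}}, "name":Jach","id":609542647}',
]

results = [extract_with_regex(s) for s in samples]
print(results)
```

Because the patterns only look at the `"name"` and `"id"` keys, the nested URL part of the cell is ignored, which is why this approach tolerates JSON that a strict parser would reject.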