Split an HTML row into a string array
I have data in an html file, in a table:
<table>
<tr><td>001</td><td>MC Hammer</td><td>Can't Touch This</td></tr>
<tr><td>002</td><td>Tone Loc</td><td>Funky Cold Medina</td></tr>
<tr><td>003</td><td>Funkdoobiest</td><td>Bow Wow Wow</td></tr>
</table>
How do I split a single row into an array or list?
string row = streamReader.ReadLine();
List<string> data = row.Split //... how do I do this bit?
string artist = data[1];
Short answer: never try to parse HTML from the wild with regular expressions. It will most likely come back to haunt you.
Longer answer: As long as you can absolutely, positively guarantee that the HTML that you are parsing fits the given structure, you can use string.Split() as Jenni suggested.
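A minimal sketch of that string.Split() approach, assuming each row sits on a single line with lowercase, attribute-free tags exactly as in the question. The class and method names (RowSplitter, SplitRow) are hypothetical, chosen only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RowSplitter
{
    // Tags to split on, listed one per entry so the set is easy to read and extend.
    private static readonly string[] Tags = { "<tr>", "</tr>", "<td>", "</td>" };

    // Splits one "<tr>...</tr>" line into the text of its cells.
    // Assumption: tags are lowercase and carry no attributes.
    public static List<string> SplitRow(string row)
    {
        // RemoveEmptyEntries drops the empty strings that would otherwise
        // appear between adjacent tags such as "</td><td>".
        return row.Split(Tags, StringSplitOptions.RemoveEmptyEntries).ToList();
    }
}
```

With the question's code this would be used as `List<string> data = RowSplitter.SplitRow(row);` followed by `string artist = data[1];`. Note that an empty cell (`<td></td>`) is also dropped by RemoveEmptyEntries, which shifts the indices of the cells after it, and any whitespace between tags survives as its own entry.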
Listing the tags independently keeps this slightly more readable, and the StringSplitOptions.RemoveEmptyEntries option will keep you from getting an empty string in your list between adjacent closing and opening tags.
If this HTML is coming from the wild, or from a tool that may change - in other words, if this is more than a one-off transaction - I strongly encourage you to use something like the HTML Agility Pack instead. It's pretty easy to integrate, and there are lots of examples on the Intarwebs.
If your HTML is well-formed you could use LINQ to XML:
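As a rough sketch of the LINQ to XML route, assuming the row is a well-formed XML fragment (every tag closed, no HTML-only entities such as `&nbsp;`). The names XmlRowParser and ParseRow are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlRowParser
{
    // Parses one "<tr>...</tr>" fragment as XML and returns its cell texts.
    // XElement.Parse throws XmlException if the fragment is not well-formed.
    public static List<string> ParseRow(string row)
    {
        XElement tr = XElement.Parse(row);

        // Take the direct <td> children in document order.
        return tr.Elements("td")
                 .Select(td => td.Value)
                 .ToList();
    }
}
```

Unlike the Split approach, this keeps empty cells as empty strings, so column indices stay stable. The whole table could be handled the same way by parsing it once and iterating over its `tr` elements.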
You could try:
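One possible regular-expression version, offered as a sketch rather than the answerer's exact code. It assumes plain `<td>` tags with no attributes and no nested tables. The names RegexRowParser and ParseRow are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexRowParser
{
    // Lazily captures whatever sits between one <td> and the next </td>.
    private static readonly Regex Cell = new Regex("<td>(.*?)</td>");

    // Returns the captured cell texts of a single row, in order.
    public static List<string> ParseRow(string row)
    {
        return Cell.Matches(row)
                   .Cast<Match>()
                   .Select(m => m.Groups[1].Value)
                   .ToList();
    }
}
```
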
But it depends on how regular the HTML is. Is it programmatically generated, or does a human write it? You should only use a regular expression if you're sure the HTML will always be generated the same way; otherwise, use a proper HTML parser.
When parsing HTML, I usually turn to the HTML Agility Pack.
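A short sketch of what that might look like for the table in the question, assuming the HtmlAgilityPack NuGet package is referenced. The class and method names (AgilityTableParser, ParseRows) are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class AgilityTableParser
{
    // Returns one list of cell texts per <tr> found in the HTML.
    public static List<List<string>> ParseRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null)
        {
            return new List<List<string>>();
        }

        return rows
            .Select(tr => tr.Elements("td")
                            // Decode entities such as &amp; and trim stray whitespace.
                            .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                            .ToList())
            .ToList();
    }
}
```

Here the whole file is parsed at once instead of line by line, so `ParseRows(html)[0][1]` would be the artist of the first row. This tolerates attributes, line breaks inside rows, and other irregularities that the string-based approaches do not.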