Extracting text from HTML with JavaScript
I would like to extract text from HTML with pure JavaScript (this is for a Chrome extension).
Specifically, I would like to be able to find text on a page and extract text after it.
Even more specifically, on a page like
https://picasaweb.google.com/kevin.smilak/BestOfAmericaSGrandCircle#4974033581081755666
I would like to find the text "Latitude" and extract the value that follows it. The HTML there is not very structured.
What is an elegant way to do this?
Comments (5)
In my opinion there is no elegant solution because, as you said, the HTML is not structured, and the words "Latitude" and "Longitude" depend on page localization.
The best I can think of is relying on the cardinal points, which might not change...
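One way to read the cardinal-point idea is to ignore the labels and look for a number followed by a degree sign and one of N, S, E or W. The sketch below is only an illustration under that assumption: findCoordinates is a hypothetical helper name, and the "number° letter" layout is inferred from the "36.872068° N" example quoted elsewhere in this thread. The cardinal letters themselves could also be localized.

```javascript
// Hypothetical helper (not from the original answer): locate coordinates by
// their cardinal suffix rather than by a localized label such as "Latitude".
// Assumes values look like "36.872068° N".
function findCoordinates(text) {
  // Treat non-breaking spaces (U+00A0) like ordinary spaces.
  const normalized = text.replace(/\u00A0/g, ' ');
  const pattern = /(\d+(?:\.\d+)?)\s*°\s*([NSEW])\b/g;
  const found = [];
  for (const match of normalized.matchAll(pattern)) {
    found.push({ degrees: parseFloat(match[1]), direction: match[2] });
  }
  return found;
}

// In a content script this might be called as:
//   findCoordinates(document.body.innerText)
```

An entry whose direction is N or S would be the latitude, and E or W the longitude.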
you could do
The text you're interested in is found inside of a div with class gphoto-exifbox-exif-field. Since this is for a Chrome extension, we have document.querySelectorAll, which makes selecting that element easy:
It's easy to get what you want now:
I used trim() instead of split('Longitude: ') since that's not actually a space character in the innerText (URL-encoded, it's %C2%A0 ...no time to figure out what that maps to, sorry).
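The code from the gphoto-exifbox-exif-field answer did not survive on this page, so the following is only a sketch of the querySelectorAll approach it describes, not the author's code. For reference, %C2%A0 is the UTF-8 encoding of U+00A0, the non-breaking space, so the sketch normalizes it before looking for the label. The helper names valueAfterLabel and findExifValue are hypothetical, and it is an assumption that a field's text contains "Label:" followed by the value on the same line.

```javascript
// Hypothetical helper: return the text that follows "label:" on the same line,
// or null if the label is not present.
function valueAfterLabel(text, label) {
  // U+00A0 (non-breaking space) is what %C2%A0 decodes to; make it a plain space.
  const normalized = text.replace(/\u00A0/g, ' ');
  const start = normalized.indexOf(label + ':');
  if (start === -1) return null;
  const rest = normalized.slice(start + label.length + 1);
  return rest.split('\n')[0].trim();
}

// Hypothetical helper: scan every EXIF field under `root` for the label.
// The class name is the one mentioned in the answer; the page may have changed.
function findExifValue(root, label) {
  const fields = root.querySelectorAll('.gphoto-exifbox-exif-field');
  for (const field of fields) {
    const value = valueAfterLabel(field.innerText || field.textContent || '', label);
    if (value !== null) return value;
  }
  return null;
}

// In a content script this might be called as:
//   findExifValue(document, 'Latitude')
```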
I would query the DOM and just collect the image information into an object, so you can reference any property you want.
E.g.
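The original example is missing here, so this is a hedged sketch of what collecting the fields into an object could look like. It assumes each field's text has the form "Label: value" and reuses the gphoto-exifbox-exif-field class mentioned in another answer; collectFields and collectImageInfo are hypothetical names.

```javascript
// Hypothetical helper: turn "Label: value" strings into a plain object.
function collectFields(texts) {
  const info = {};
  for (const raw of texts) {
    // Non-breaking spaces (U+00A0) are treated as ordinary spaces.
    const text = raw.replace(/\u00A0/g, ' ').trim();
    const colon = text.indexOf(':');
    if (colon === -1) continue; // skip fields that have no label
    const key = text.slice(0, colon).trim();
    info[key] = text.slice(colon + 1).trim();
  }
  return info;
}

// Hypothetical DOM wrapper; the class name is an assumption about the page.
function collectImageInfo(root) {
  const nodes = root.querySelectorAll('.gphoto-exifbox-exif-field');
  return collectFields(Array.from(nodes, (node) => node.innerText || node.textContent || ''));
}

// In a content script this might be used as:
//   const info = collectImageInfo(document);
//   info['Latitude']
```

Splitting on the first colon only keeps values that themselves contain colons (such as times) intact.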
Well, if a more general answer is required for other sites, then you can try something like:
For that example, an array of one element containing "Latitude: 36.872068° N" is returned (which should be easy to parse).
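The snippet this answer refers to is missing from the page. As a rough stand-in, and not necessarily what the author wrote, a site-independent version could search the page's visible text for the label and return every "label: value" run it finds. findLabelled is a hypothetical name, and the sketch assumes the value sits on the same line as the label.

```javascript
// Hypothetical helper: return every "label: value" run found in a block of text.
function findLabelled(text, label) {
  // Treat non-breaking spaces (U+00A0) like ordinary spaces.
  const normalized = text.replace(/\u00A0/g, ' ');
  // Escape regex metacharacters so the label is matched literally.
  const escaped = label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Match the label, a colon, and the rest of that line.
  const pattern = new RegExp(escaped + ':[^\\n]*', 'g');
  return (normalized.match(pattern) || []).map((m) => m.trim());
}

// In a content script this might be called as:
//   findLabelled(document.body.innerText, 'Latitude')
```

Each returned string can then be split on the first colon to separate the label from the value.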