How to find element text in an XHTML document using lxml

Posted 2024-10-14 05:51:21 · 939 characters · 5 views · 0 comments

I've been bashing my head against this for ages; I must be doing something stupid.

I am trying to retrieve all of the possible Wikipedia supported languages and output them to a text file by traversing the tables on List_of_Wikipedias.

Here is my Python code so far, which simply tries to retrieve one of the tables:

import httplib
from lxml import etree

def main():
    conn = httplib.HTTPConnection("meta.wikimedia.org")
    conn.request("GET","/wiki/List_of_Wikipedias")
    res = conn.getresponse()
    root = etree.fromstring(res.read())
    table = root.xpath('//table')
    print table

main()

On my machine this only prints an empty list. To increase speed I cached the page locally and used:

wikipage = open("wikipage.html")
root = etree.parse(wikipage)

but this makes no impact whatsoever (other than the obvious speedup). I have also tried

root.find('table')

and:

for element in root.iter():
    print("%s - %s" % (element.tag, element.text))

which successfully prints out all of the elements, so I know the tree is being created.

What am I doing wrong?

Any help would be appreciated.
Thanks.

Comments (3)

↘紸啶 2024-10-21 05:51:21
I am trying to retrieve all of the possible Wikipedia supported languages and output them to a text file by traversing the tables on List_of_Wikipedias

Your problem is that the element names in the document are in a default namespace. How to write XPath expressions that involve such element names is the most frequently asked question about XPath, and it has numerous good answers under the SO xpath tag. Just search for them.

Here is a complete solution:

Use:

(//x:table)[1]/x:tr[not(x:th)]/x:td[2]//text()

where you have registered the XHTML namespace ("http://www.w3.org/1999/xhtml") bound to the prefix "x".
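In lxml, one way to register that prefix is to pass a `namespaces` mapping to `xpath()`. The following is a minimal sketch, not the code the answer was tested with: the tiny inline document and its two rows are made-up stand-ins for the real page, and only the namespace URI and the XPath expression come from the text above.

```python
from lxml import etree

# Made-up miniature of the real page; note the default XHTML namespace on <html>.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table>
      <tr><th>No</th><th>Language</th></tr>
      <tr><td>1</td><td><a href="#">English</a></td></tr>
      <tr><td>2</td><td><a href="#">German</a></td></tr>
    </table>
  </body>
</html>"""

root = etree.fromstring(XHTML)

# Bind the prefix "x" to the XHTML namespace URI for this query only.
ns = {"x": "http://www.w3.org/1999/xhtml"}

# An unprefixed name means "no namespace" in XPath 1.0, so this finds nothing.
unprefixed = root.xpath("//table")

# The prefixed expression from the answer selects the text of the second column.
languages = root.xpath(
    "(//x:table)[1]/x:tr[not(x:th)]/x:td[2]//text()", namespaces=ns
)

print(unprefixed)
print(languages)
```

The prefix name is arbitrary; what matters is that it maps to the same URI the document declares with `xmlns`.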

When I evaluated this XPath expression against the document obtained from: http://s23.org/wikistats/wikipedias_html

I needed to add the following at the start of the document, because I was working locally and didn't have the DTD for XHTML -- maybe you will not need these:

<!DOCTYPE html [
<!ENTITY uarr "↑">
<!ENTITY darr "↓">
<!ENTITY ccedil "Ç">
<!ENTITY oslash "Ø">
<!ENTITY aacute "á">
<!ENTITY aring "å">
<!ENTITY agrave "À">
<!ENTITY egrave "è">
<!ENTITY ograve "Ò">
<!ENTITY ocirc "ô">
]>
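To illustrate why such declarations matter, here is a small sketch using lxml's default XML parser. The two-cell document is invented for the example; it assumes nothing about the real page beyond the entity names listed above. An entity reference that has no declaration is a well-formedness error, while one declared in the internal DTD subset is expanded.

```python
from lxml import etree

NS = {"x": "http://www.w3.org/1999/xhtml"}

# Invented sample: two entities declared in the internal DTD subset.
DOC = """<!DOCTYPE html [
<!ENTITY uarr "↑">
<!ENTITY darr "↓">
]>
<html xmlns="http://www.w3.org/1999/xhtml"><body><table>
<tr><td>&uarr;</td><td>&darr;</td></tr>
</table></body></html>"""

root = etree.fromstring(DOC)
arrows = root.xpath("//x:td/text()", namespaces=NS)
print(arrows)

# The same reference without a declaration is rejected by the XML parser.
try:
    etree.fromstring("<p>&uarr;</p>")
    undeclared_ok = True
except etree.XMLSyntaxError:
    undeclared_ok = False
print(undeclared_ok)
```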

The result of applying the above XPath expression to this document is:

                    English

                    German

                    French

                    Polish

                    Italian

                    Japanese

                    Spanish

                    Portuguese

                    Dutch

                    Russian

                    Swedish

                    Chinese

                    Catalan

                    Norwegian (Bokmål)

                    Finnish

                    Ukrainian

                    Czech

                    Hungarian

                    Romanian

                    Korean

                    Turkish

                    Vietnamese

                    Indonesian

                    Danish

                    Arabic

                    Esperanto

                    Serbian

                    Lithuanian

                    Slovak

                    Volapük

                    Persian

                    Hebrew

                    Bulgarian

                    Slovenian

                    Malay

                    Waray-Waray

                    Croatian

                    Estonian

                    Newar / Nepal Bhasa

                    Simple English

                    Hindi

                    Galician

                    Thai

                    Basque

                    Norwegian (Nynorsk)

                    Aromanian

                    Greek

                    Haitian

                    Azerbaijani

                    Tagalog

                    Latin

                    Telugu

                    Georgian

                    Macedonian

                    Cebuano

                    Serbo-Croatian

                    Breton

                    Piedmontese

                    Marathi

                    Latvian

                    Luxembourgish

                    Javanese

                    Belarusian (Taraškievica)

                    Welsh

                    Icelandic

                    Bosnian

                    Albanian

                    Tamil

                    Belarusian

                    Bishnupriya Manipuri

                    Aragonese

                    Occitan

                    Bengali

                    Swahili

                    Ido

                    Lombard

                    West Frisian

                    Gujarati

                    Afrikaans

                    Low Saxon

                    Malayalam

                    Quechua

                    Sicilian

                    Urdu

                    Kurdish

                    Cantonese

                    Sundanese

                    Asturian

                    Neapolitan

                    Samogitian

                    Armenian

                    Yoruba

                    Irish

                    Chuvash

                    Walloon

                    Nepali

                    Ripuarian

                    Western Panjabi

                    Kannada

                    Tajik

                    Tarantino

                    Venetian

                    Yiddish

                    Scottish Gaelic

                    Tatar

                    Min Nan

                    Ossetian

                    Uzbek

                    Alemannic

                    Kapampangan

                    Sakha

                    Egyptian Arabic

                    Kazakh

                    Maori

                    Limburgian

                    Amharic

                    Nahuatl

                    Upper Sorbian

                    Gilaki

                    Corsican

                    Gan

                    Mongolian

                    Scots

                    Interlingua

                    Central_Bicolano

                    Burmese

                    Faroese

                    Võro

                    Dutch Low Saxon

                    Sinhalese

                    Turkmen

                    West Flemish

                    Sanskrit

                    Bavarian

                    Malagasy

                    Manx

                    Ilokano

                    Divehi

                    Norman

                    Pangasinan

                    Banyumasan

                    Sorani

                    Romansh

                    Northern Sami

                    Zazaki

                    Mazandarani

                    Wu

                    Friulian

                    Uyghur

                    Ligurian

                    Maltese

                    Bihari

                    Novial

                    Tibetan

                    Anglo-Saxon

                    Kashubian

                    Sardinian

                    Classical Chinese

                    Fiji Hindi

                    Khmer

                    Ladino

                    Zamboanga Chavacano

                    Pali

                    Franco-Provençal/Arpitan

                    Pashto

                    Hakka

                    Cornish

                    Punjabi

                    Navajo

                    Silesian

                    Kalmyk

                    Pennsylvania German

                    Hawaiian

                    Saterland Frisian

                    Interlingue

                    Somali

                    Komi

                    Karachay-Balkar

                    Crimean Tatar

                    Tongan

                    Acehnese

                    Meadow Mari

                    Picard

                    Erzya

                    Lingala

                    Kinyarwanda

                    Extremaduran

                    Guarani

                    Kirghiz

                    Emilian-Romagnol

                    Assyrian Neo-Aramaic

                    Papiamentu

                    Aymara

                    Chechen

                    Lojban

                    Wolof

                    Banjar

                    Bashkir

                    North Frisian

                    Greenlandic

                    Tok Pisin

                    Udmurt

                    Kabyle

                    Tahitian

                    Sranan

                    Zealandic

                    Hill Mari

                    Komi-Permyak

                    Lower Sorbian

                    Abkhazian

                    Gagauz

                    Igbo

                    Oriya

                    Lao

                    Kongo

                    Avar

                    Moksha

                    Mirandese

                    Romani

                    Old Church Slavonic

                    Karakalpak

                    Samoan

                    Moldovan

                    Tetum

                    Gothic

                    Kashmiri

                    Bambara

                    Inupiak

                    Sindhi

                    Bislama

                    Lak

                    Nauruan

                    Norfolk

                    Inuktitut

                    Pontic

                    Assamese

                    Cherokee

                    Min Dong

                    Swati

                    Palatinate German

                    Hausa

                    Ewe

                    Tigrinya

                    Oromo

                    Zulu

                    Zhuang

                    Venda

                    Tsonga

                    Kirundi

                    Dzongkha

                    Sango

                    Cree

                    Chamorro

                    Luganda

                    Buginese

                    Buryat (Russia)

                    Fijian

                    Chichewa

                    Akan

                    Sesotho

                    Xhosa

                    Fula

                    Tswana

                    Kikuyu

                    Tumbuka

                    Shona

                    Twi

                    Cheyenne

                    Ndonga

                    Sichuan Yi

                    Choctaw

                    Marshallese

                    Afar

                    Kuanyama

                    Hiri Motu

                    Muscogee

                    Kanuri

                    Herero

Do note: Every second selected node is a white-space-only text node. If you don't want these selected, use:

(//x:table)[1]/x:tr[not(x:th)]/x:td[2]//text()[normalize-space()]
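The effect of the extra predicate can be checked on a small sample. This is a sketch with an invented one-row table whose cell has whitespace around the link, which is an assumption about the layout rather than a copy of the real page.

```python
from lxml import etree

NS = {"x": "http://www.w3.org/1999/xhtml"}

# Invented sample: the second cell contains whitespace before and after the link.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml"><body><table>
<tr><th>No</th><th>Language</th></tr>
<tr><td>1</td><td>
    <a href="#">English</a>
</td></tr>
</table></body></html>"""

root = etree.fromstring(XHTML)
base = "(//x:table)[1]/x:tr[not(x:th)]/x:td[2]//text()"

# Without the predicate, whitespace-only text nodes are selected too.
all_text = root.xpath(base, namespaces=NS)

# normalize-space() yields an empty string for whitespace-only nodes,
# and an empty string is false in an XPath predicate.
non_blank = root.xpath(base + "[normalize-space()]", namespaces=NS)

print(len(all_text), non_blank)
```

The predicate only drops whitespace-only nodes; a surviving node that still carries leading or trailing whitespace can be cleaned in Python with `str.strip()`.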
手心的海 2024-10-21 05:51:21

Parse it as html.

from lxml import html

url = 'http://meta.wikimedia.org/wiki/List_of_Wikipedias'
tree = html.parse(url)
languages = tree.xpath('//table/tr/td[2]/a/text()')
print('\n'.join(languages))

Output

English
German
French
Polish
Italian
Japanese
Spanish
Portuguese
Dutch
Russian
Swedish
Chinese
Catalan
Norwegian (Bokmål)
Finnish
Ukrainian
Czech
Hungarian
Romanian
Korean
Turkish
Vietnamese
Indonesian
Danish
Arabic
Esperanto
Serbian
Lithuanian
Slovak
Volapük
Persian
Hebrew
Bulgarian
Slovenian
Malay
Waray-Waray
Croatian
Estonian
Newar / Nepal Bhasa
Simple English
Hindi
Galician
Thai
Basque
Norwegian (Nynorsk)
Aromanian
Greek
Haitian
Azerbaijani
Tagalog
Latin
Telugu
Georgian
Macedonian
Cebuano
Serbo-Croatian
Breton
Piedmontese
Marathi
Latvian
Luxembourgish
Javanese
Belarusian (Taraškievica)
Welsh
Icelandic
Bosnian
Albanian
Tamil
Belarusian
Bishnupriya Manipuri
Aragonese
Occitan
Bengali
Swahili
Ido
Lombard
West Frisian
Gujarati
Afrikaans
Low Saxon
Malayalam
Quechua
Sicilian
Urdu
Kurdish
Cantonese
Sundanese
Asturian
Neapolitan
Samogitian
Armenian
Yoruba
Irish
Chuvash
Walloon
Nepali
Ripuarian
Western Panjabi
Kannada
Tajik
Tarantino
Venetian
Yiddish
Scottish Gaelic
Tatar
Min Nan
Ossetian
Uzbek
Alemannic
Kapampangan
Sakha
Kazakh
Egyptian Arabic
Maori
Amharic
Limburgian
Nahuatl
Upper Sorbian
Gilaki
Corsican
Gan
Mongolian
Scots
Interlingua
Central_Bicolano
Burmese
Faroese
Võro
Dutch Low Saxon
Sinhalese
Turkmen
West Flemish
Sanskrit
Bavarian
Malagasy
Manx
Ilokano
Divehi
Norman
Pangasinan
Banyumasan
Sorani
Romansh
Northern Sami
Zazaki
Mazandarani
Wu
Friulian
Uyghur
Ligurian
Maltese
Bihari
Novial
Tibetan
Anglo-Saxon
Kashubian
Sardinian
Classical Chinese
Fiji Hindi
Khmer
Ladino
Zamboanga Chavacano
Pali
Franco-Provençal/Arpitan
Pashto
Hakka
Cornish
Punjabi
Navajo
Silesian
Kalmyk
Pennsylvania German
Hawaiian
Saterland Frisian
Interlingue
Somali
Komi
Karachay-Balkar
Crimean Tatar
Tongan
Acehnese
Meadow Mari
Picard
Kinyarwanda
Erzya
Lingala
Extremaduran
Guarani
Kirghiz
Emilian-Romagnol
Assyrian Neo-Aramaic
Papiamentu
Aymara
Chechen
Lojban
Wolof
Banjar
Bashkir
North Frisian
Greenlandic
Tok Pisin
Udmurt
Kabyle
Tahitian
Sranan
Zealandic
Hill Mari
Komi-Permyak
Lower Sorbian
Abkhazian
Gagauz
Igbo
Oriya
Lao
Kongo
Avar
Moksha
Mirandese
Romani
Old Church Slavonic
Karakalpak
Samoan
Moldovan
Tetum
Gothic
Kashmiri
Bambara
Inupiak
Sindhi
Bislama
Lak
Nauruan
Norfolk
Inuktitut
Pontic
Assamese
Cherokee
Min Dong
Palatinate German
Swati
Hausa
Ewe
Tigrinya
Oromo
Zulu
Zhuang
Venda
Tsonga
Kirundi
Cree
Dzongkha
Sango
Chamorro
Luganda
Buginese
Buryat (Russia)
Fijian
Chichewa
Akan
Sesotho
Xhosa
Fula
Tswana
Kikuyu
Tumbuka
Shona
Twi
Cheyenne
Ndonga
Sichuan Yi
Choctaw
Marshallese
Afar
Kuanyama
Hiri Motu
Muscogee
Kanuri
Herero
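The reason the unprefixed expression works here is that lxml's HTML parser does not put elements into a namespace, even when the page declares one. A minimal offline sketch, using an invented two-row page in place of the live URL:

```python
from lxml import html

# Invented stand-in for the downloaded page, including the XHTML namespace declaration.
PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" dir="ltr">
<body><table>
<tr><th>No</th><th>Language</th></tr>
<tr><td>1</td><td><a href="#">English</a></td></tr>
<tr><td>2</td><td><a href="#">German</a></td></tr>
</table></body></html>"""

root = html.fromstring(PAGE)

# Plain element names match because the HTML parser ignores XML namespaces.
tables = root.xpath("//table")
languages = root.xpath("//table/tr/td[2]/a/text()")

print(len(tables))
print("\n".join(languages))
```

The trade-off is that the HTML parser is lenient and will accept markup that is not well-formed XML, which is usually what you want for pages fetched from the web.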
梓梦 2024-10-21 05:51:21

XPath requires namespaces. The page you have downloaded starts:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" dir="ltr">

So you actually want

xpath('//html:table')

where html is the prefix bound to "http://www.w3.org/1999/xhtml".

You will have to find out how to bind namespaces in lxml; I am not a Python expert.

If this is your problem I sympathize - it has caught me and many others out!
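For reference, a sketch of how the binding might look in lxml. The one-cell document is invented for illustration; `xpath()` accepts a `namespaces` mapping, and `findall()` accepts names in `{uri}name` form, which is also how lxml reports `element.tag`.

```python
from lxml import etree

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Invented miniature page with the same root element as the downloaded one.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml" lang="en" dir="ltr">
<body><table><tr><td>cell</td></tr></table></body></html>"""

root = etree.fromstring(PAGE)

# The tag carries the namespace, which is why a bare "table" never matches.
print(root.tag)

no_prefix = root.xpath("//table")
with_prefix = root.xpath("//html:table", namespaces={"html": XHTML_NS})

# Alternative without a prefix: spell out the namespace in braces.
via_find = root.findall(".//{%s}table" % XHTML_NS)

print(len(no_prefix), len(with_prefix), len(via_find))
```

Printing `element.tag` while iterating over the tree is a quick way to spot this problem, since namespaced tags show up with the URI in braces.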
