Golang Table Webscraping
I have the code below to scrape a specific cell value from an HTML table. You can go to https://www.haremaltin.com/altin-fiyatlari and search for "satis__ATA_ESKI" in inspect mode to see that value. I am a beginner in Go and did my best, but unfortunately I couldn't get that value. Can anybody help me? By the way, they don't have a community API. One more thing: please add a time.Sleep to wait for the page to load. If it returns "-", it is because the page hasn't loaded yet.
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	url := "https://www.haremaltin.com/altin-fiyatlari"
	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != 200 {
		log.Fatalf("failed to fetch data: %d %s", resp.StatusCode, resp.Status)
	}
	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	doc.Find("tr__ATA_ESKI tr").Each(func(j int, tr *goquery.Selection) {
		data := []string{}
		tr.Find("td").Each(func(ix int, td *goquery.Selection) {
			e := td.Text()
			data = append(data, e)
			fmt.Println(data)
		})
	})
}
SOLUTION:
See the answers below; they also explain why this kind of solution is needed.
By the way, we can iterate over the decoded map to fetch the specific value. I have code for this too, but if you know an easier method, please comment.
for _, v := range data { // we only need the values of the top-level map
	// each value must be asserted to a map before we can range over it
	m, ok := v.(map[string]interface{})
	if !ok {
		fmt.Printf("unexpected type %T\n", v)
		continue
	}
	for k, l := range m {
		if k == "ATA_ESKI" { // the value we want is inside this map
			a, ok := l.(map[string]interface{}) // assert again for the nested map
			if !ok {
				fmt.Printf("unexpected type %T\n", l)
				continue
			}
			for b, c := range a {
				if b == "satis" { // the value we want
					fmt.Println("Price is", c)
				}
			}
		}
	}
}
Full solution with iteration below:
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"strings"
)

func main() {
	if _, err := fetchData(); err != nil {
		log.Fatal(err)
	}
}

// fetchData posts to the AJAX endpoint, decodes the JSON reply, and prints
// the "satis" value of the "ATA_ESKI" entry.
func fetchData() (map[string]interface{}, error) {
	body := strings.NewReader("dil_kodu=tr")
	req, err := http.NewRequest("POST",
		"https://www.haremaltin.com/dashboard/ajax/doviz", body)
	if err != nil {
		return nil, err
	}
	// Mark the request as an AJAX call.
	req.Header.Set("X-Requested-With", "XMLHttpRequest")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	// io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+).
	jsonData, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	var data map[string]interface{}
	if err := json.Unmarshal(jsonData, &data); err != nil {
		return nil, err
	}
	for _, v := range data {
		m, ok := v.(map[string]interface{})
		if !ok {
			fmt.Printf("unexpected type %T\n", v)
			continue
		}
		for k, l := range m {
			if k == "ATA_ESKI" {
				a, ok := l.(map[string]interface{})
				if !ok {
					fmt.Printf("unexpected type %T\n", l)
					continue
				}
				for b, c := range a {
					if b == "satis" {
						fmt.Println("Price", c)
					}
				}
			}
		}
	}
	return data, nil
}
Comments (2)
You can fetch the data via an HTTP POST request. Do not forget to add the X-Requested-With header to the request.
Since the table is populated by JavaScript, I would suggest a different approach. Here's why.
What you're really scraping is this web page. You can run this curl in a terminal and get the same reply as Go's HTTP request ("exact" is a strong word, but it holds most of the time, and certainly in this case).
As you can see, no values are present in the out.html file you created; that's why your Go script isn't returning any values. You need JavaScript running to populate the page so that you can then scrape it.
I've used https://github.com/chromedp/chromedp in a couple of projects with great success. With this tool, your workflow will look something like..