Problem extracting data from a website in .NET and C#

Posted 2024-09-05 10:52:33

I have written a web scraping program that goes through a list of pages and writes all the HTML to a file. The problem is that when I pull a block of text, some of the characters get written as '�'. How do I get those characters into my text file intact? Here is my code:

string baseUri = String.Format("http://www.rogersmushrooms.com/gallery/loadimage.asp?did={0}&blockName={1}", id.ToString(), name.Trim());

// our third request is for the actual webpage after the login.
HttpWebRequest request =
(HttpWebRequest)WebRequest.Create(baseUri);
request.Method = "GET";
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)";
//get the response object, so that we may get the session cookie.
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader reader = new StreamReader(response.GetResponseStream());

// and read the response
string page = reader.ReadToEnd();

StreamWriter SW;
string filename = string.Format("{0}.txt", id.ToString());
SW = File.AppendText("C:\\Share\\" + filename);

SW.Write(page);

reader.Close();
response.Close();

三五鸿雁 2024-09-12 10:52:33

You are saving a page called loadimage to a text file. Are you sure the response is really all text?

Either way, you can save a lot of code by using System.Net.WebClient.DownloadFile().
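To illustrate the suggestion above, here is a minimal sketch using `WebClient.DownloadFile`. The class name `PageDownloader` and the method `Save` are hypothetical names chosen for this example. Because the response bytes are copied to disk without being decoded, no character can be lost to a wrong encoding guess. Two caveats: `DownloadFile` overwrites the target file instead of appending to it as the original code does, and `WebClient` is marked obsolete in recent .NET versions.

```csharp
using System;
using System.Net;

static class PageDownloader
{
    // Hypothetical helper: copies the raw response bytes to a file.
    // Nothing is decoded, so the file holds exactly what the server sent.
    public static void Save(string url, string path)
    {
        using (var client = new WebClient())
        {
            // Overwrites 'path' if it already exists.
            client.DownloadFile(url, path);
        }
    }
}
```

It could be called as `PageDownloader.Save(baseUri, "C:\\Share\\" + filename);`, assuming `baseUri` and `filename` are built as in the question.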

半城柳色半声笛 2024-09-12 10:52:33

You need to specify the encoding on this line:

StreamReader reader = new StreamReader(response.GetResponseStream());

Also note that

File.AppendText("C:\\Share\\" + filename); writes UTF-8
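A small sketch of why the '�' shows up may help here. A `StreamReader` created without an encoding argument decodes as UTF-8, and any byte that is not valid UTF-8 is replaced by U+FFFD ('�'). The example assumes, purely for illustration, that the page is served as ISO-8859-1; the real page's encoding is not stated in the question. `EncodingDemo` and `Decode` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingDemo
{
    // Decodes a byte buffer through a StreamReader with the given encoding,
    // the same way the question's code reads the response stream.
    public static string Decode(byte[] body, Encoding encoding)
    {
        using (var reader = new StreamReader(new MemoryStream(body), encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // "café au lait" encoded as ISO-8859-1: 'é' is the single byte 0xE9.
        byte[] body = { 0x63, 0x61, 0x66, 0xE9, 0x20, 0x61, 0x75, 0x20, 0x6C, 0x61, 0x69, 0x74 };

        // Wrong encoding: 0xE9 is not a valid UTF-8 sequence here, so it becomes U+FFFD.
        Console.WriteLine(Decode(body, Encoding.UTF8) == "caf\uFFFD au lait");

        // Matching encoding: the character survives.
        Console.WriteLine(Decode(body, Encoding.GetEncoding("iso-8859-1")) == "caf\u00E9 au lait");
    }
}
```

Once the replacement character is in the string, writing it out as UTF-8 cannot recover the original byte, which is why the fix belongs on the reader.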

看轻我的陪伴 2024-09-12 10:52:33

Specify a Unicode encoding explicitly, like so:

new StreamReader(response.GetResponseStream(), System.Text.Encoding.UTF8)

and do the same for the StreamWriter.
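Putting the answers together, the question's code could be reworked roughly as follows. This is a sketch under assumptions, not a definitive fix: it reads with the charset the server declares (`HttpWebResponse.CharacterSet`) and falls back to UTF-8 when that value is missing or unknown, which will still be wrong if the server declares the wrong charset. `PageSaver`, `ResolveEncoding`, and `SavePage` are hypothetical names. On .NET Core and later, legacy code pages such as windows-1252 resolve only after `CodePagesEncodingProvider` has been registered.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class PageSaver
{
    // Hypothetical helper: maps a charset name to an Encoding,
    // falling back to UTF-8 when the name is empty or not recognised.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
        {
            return Encoding.UTF8;
        }
        try
        {
            return Encoding.GetEncoding(charset.Trim('"', ' '));
        }
        catch (ArgumentException)
        {
            return Encoding.UTF8;
        }
    }

    // Downloads 'url' and appends the decoded text to 'path' as UTF-8.
    public static void SavePage(string url, string path)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)";

        // The using blocks close the response, reader, and writer, and flush
        // the writer; the original code never closes its StreamWriter.
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream(),
                                             ResolveEncoding(response.CharacterSet)))
        using (var writer = new StreamWriter(path, true, Encoding.UTF8))
        {
            writer.Write(reader.ReadToEnd());
        }
    }
}
```

A call such as `PageSaver.SavePage(baseUri, "C:\\Share\\" + filename);` would stand in for the body of the question's code, assuming `baseUri` and `filename` are built as before.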
