Downloading a tarball from a repository

Posted 2024-11-09 06:59:35

I am currently working on a project for scraping source code from SourceForge.
I would like to download the tarball from the code repository.

An example link is given below:
http://wurfl.cvs.sourceforge.net/viewvc/wurfl/?view=tar

The problem I face is that I am unable to download the file with conventional APIs such as URLConnection, HttpClient, HtmlUnit and Jsoup. The specified link does not contain any filename or extension, which makes the download even more complicated.

Given a set of tarball links as parameters, how can I download them to my disk? I was able to download the file with wget. Is there a way to do this programmatically in Java on Windows?

Comments (2)

懷念過去 2024-11-16 06:59:35

Before you go any further with your efforts, carefully read the SourceForge Terms of Use page. If you don't understand the ToS, contact SourceForge and ask them whether you are allowed to do what you are proposing.

"The problem I faced while downloading is that I am unable to use conventional URL, HTTP, HtmlUnit, Jsoup APIs etc. to download the file."

Your assumption is incorrect.

You CAN use APIs such as the standard HttpURLConnection API or the Apache HttpClient APIs to do this kind of thing. If it is not working, it is because

  • you are doing something the wrong way (e.g. you haven't configured your Java app to use your local HTTP proxy), or
  • SourceForge is using some technical means to stop you doing this; see the ToS.

If you post some details on what is happening when you try these approaches, maybe we can help you.

(HtmlUnit and Jsoup are probably inappropriate because they target HTML content.)
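For illustration, here is a minimal sketch of the HttpURLConnection route, assuming the server answers the plain URL with the tarball bytes. It streams each URL given on the command line into a numbered file and lets you route the request through an HTTP proxy. The proxyFor helper, the tarball-N file naming and the example proxy host are hypothetical choices, not anything SourceForge documents.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class TarballDownload {
    // Returns an HTTP proxy for host:port, or Proxy.NO_PROXY when no host is given.
    public static Proxy proxyFor(String host, int port) {
        if (host == null || host.isEmpty()) {
            return Proxy.NO_PROXY;
        }
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    // Streams the response body of url into target, failing on a non-2xx status.
    public static void download(String url, Path target, Proxy proxy) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection(proxy);
        try {
            int status = conn.getResponseCode();
            if (status < 200 || status >= 300) {
                throw new IOException("HTTP " + status + " for " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical proxy settings: replace null/0 with e.g. "proxy.example.com", 8080 if needed.
        Proxy proxy = proxyFor(null, 0);
        for (int i = 0; i < args.length; i++) {
            // The URL carries no filename, so a numbered name is used here.
            Path target = Paths.get("tarball-" + i);
            download(args[i], target, proxy);
            System.out.println(args[i] + " -> " + target.toAbsolutePath());
        }
    }
}
```

Alternatively, the standard http.proxyHost and http.proxyPort system properties (for example -Dhttp.proxyHost=... on the java command line) configure a proxy for every connection without changing the code.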

"The specified link does not contain any filename or extension, which makes the download process even more complicated."

You can get the source filename and/or content type from the response headers (Content-Disposition and Content-Type). Refer to the HTTP specifications for details.
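A small sketch of picking a filename from those headers, assuming the server sends a Content-Disposition header of the form attachment; filename="..." and otherwise falling back to the Content-Type. The chooseFilename helper, its fallback rules and the .tar.gz / .tar suffixes are illustrative assumptions; the encoded filename* form is not handled.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResponseFilename {
    // Matches filename="x" or filename=x; the encoded filename*= form does not match.
    private static final Pattern FILENAME =
            Pattern.compile("filename\\s*=\\s*\"?([^\";]+)\"?", Pattern.CASE_INSENSITIVE);

    // Returns the server-suggested name, else a name derived from the Content-Type.
    public static String chooseFilename(String contentDisposition, String contentType, String fallbackBase) {
        if (contentDisposition != null) {
            Matcher m = FILENAME.matcher(contentDisposition);
            if (m.find()) {
                // Drop any directory part so the server cannot choose the target folder.
                String name = m.group(1).trim();
                name = name.substring(Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\')) + 1);
                if (!name.isEmpty()) {
                    return name;
                }
            }
        }
        String type = contentType == null ? "" : contentType.toLowerCase(Locale.ROOT);
        if (type.startsWith("application/x-gzip") || type.startsWith("application/gzip")) {
            return fallbackBase + ".tar.gz"; // assumption: a gzip-compressed tarball
        }
        if (type.startsWith("application/x-tar")) {
            return fallbackBase + ".tar";
        }
        return fallbackBase;
    }

    public static void main(String[] args) {
        System.out.println(chooseFilename("attachment; filename=\"wurfl.tar.gz\"", null, "download")); // prints wurfl.tar.gz
        System.out.println(chooseFilename(null, "application/x-gzip", "download")); // prints download.tar.gz
    }
}
```

With an HttpURLConnection, the two header values would come from conn.getHeaderField("Content-Disposition") and conn.getContentType().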

自在安然 2024-11-16 06:59:35

If you really DO want to risk violating SourceForge's ToS, then this may help.

You will need wget.exe, as you suggested.

// --no-proxy: ignore any proxy configured in the environment
ProcessBuilder pb = new ProcessBuilder("wget.exe", "http://wurfl.cvs.sourceforge.net/viewvc/wurfl/?view=tar", "--no-proxy");
Process p = pb.start();

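This will work as long as Windows can find wget.exe: typically that means it is on the PATH or in the working directory you start the program from (which is the class-file directory only if you launch java from there).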
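To handle a whole set of links, as the question asks, one option is to run wget once per URL. This sketch assumes wget.exe can be found as described above and passes -O so each tarball gets an explicit output name, since a URL ending in ?view=tar gives wget no useful one. The WgetBatch class and the tarball-N.tar.gz naming (which assumes a gzip-compressed tarball) are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class WgetBatch {
    // Builds the wget argument list for one URL, writing the body to outFile via -O.
    public static List<String> wgetCommand(String url, String outFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("wget.exe");
        cmd.add("--no-proxy");
        cmd.add("-O");
        cmd.add(outFile);
        cmd.add(url);
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Each command-line argument is one tarball URL.
        for (int i = 0; i < args.length; i++) {
            Process p = new ProcessBuilder(wgetCommand(args[i], "tarball-" + i + ".tar.gz"))
                    .inheritIO() // show wget's progress output in this console
                    .start();
            int exit = p.waitFor();
            System.out.println(args[i] + " -> exit code " + exit);
        }
    }
}
```

Passing the arguments as a list (rather than one command string) also means the ? in the URL reaches wget untouched, because no shell is involved.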

You may also want to check whether the file DOES exist first, in which case you would do something along the lines of:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class WgetCheckThenDownload {
    static final String URL = "http://wurfl.cvs.sourceforge.net/viewvc/wurfl/?view=tar";

    public static void main(String[] args) throws IOException, InterruptedException {
        // --spider only checks that the URL exists; --server-response prints the HTTP headers.
        ProcessBuilder pb = new ProcessBuilder("wget.exe", "--spider", "--server-response", "--no-proxy", URL);
        pb.redirectErrorStream(true); // wget writes its log, including the headers, to stderr
        Process p = pb.start();

        // Read all output before waitFor() so wget cannot block on a full pipe.
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String temp;
            while ((temp = reader.readLine()) != null) {
                sb.append(temp).append('\n');
            }
        }
        int exitValue = p.waitFor();
        System.out.println(sb);

        if (exitValue != 0 || sb.indexOf("404") != -1) {
            // means that the file does not exist
            System.out.println("File does not exist, or access is denied");
        } else if (sb.indexOf("200") != -1) {
            // file exists, download it
            System.out.println("File exists, downloading...");
            Process download = new ProcessBuilder("wget.exe", "--no-proxy", URL)
                    .inheritIO()
                    .start();
            download.waitFor();
        }
    }
}
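Instead of searching wget's output for "200" or "404" (which can also match unrelated numbers such as byte counts), the exit status can be checked directly. The codes below follow the "Exit Status" section of the GNU Wget manual; versions before 1.12 did not report them consistently, so treat this as a sketch assuming a reasonably recent wget build. The WgetExitStatus class is hypothetical.

```java
import java.io.IOException;

public class WgetExitStatus {
    // Exit codes as listed in the GNU Wget manual's "Exit Status" section.
    public static String describe(int code) {
        switch (code) {
            case 0: return "No problems occurred";
            case 1: return "Generic error code";
            case 2: return "Parse error (e.g. command-line options)";
            case 3: return "File I/O error";
            case 4: return "Network failure";
            case 5: return "SSL verification failure";
            case 6: return "Username/password authentication failure";
            case 7: return "Protocol errors";
            case 8: return "Server issued an error response (e.g. 404)";
            default: return "Unknown exit code " + code;
        }
    }

    // Exit code 8 means the server answered with an error status such as 403 or 404.
    public static boolean serverRefused(int code) {
        return code == 8;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String url = args.length > 0 ? args[0] : "http://wurfl.cvs.sourceforge.net/viewvc/wurfl/?view=tar";
        Process p = new ProcessBuilder("wget.exe", "--spider", "--no-proxy", url).inheritIO().start();
        int code = p.waitFor();
        System.out.println(code + ": " + describe(code));
        if (serverRefused(code)) {
            System.out.println("File does not exist, or access is denied");
        }
    }
}
```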

But I recommend NOT scraping SourceForge, unless it's your own code that you are scraping (I did that once for an updater program). If you do, and my example helps, please kindly don't mention me. =]

Hope I helped!
