Can Selenium verify text in a PDF loaded by the browser?

Posted on 2024-09-15 12:57:36

My web application loads a PDF in the browser. I have figured out how to check that the PDF has loaded correctly using this Selenium command (command, target, value):

verifyAttribute
xpath=//embed/@src
{URL of PDF goes here}

It would be really nice to be able to check the contents of the PDF with Selenium, for example to verify that some text is present. Is there any way to do this?
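The `verifyAttribute` check only confirms that the `src` attribute holds the expected URL, not that the URL serves a PDF. A minimal sketch of a stronger check is shown below, assuming the test can fetch the first bytes of that URL by some means of its own (the fetching step is not shown). The class and method names are hypothetical. The sketch relies only on the fact that PDF files begin with the ASCII marker `%PDF-`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: checks whether downloaded bytes start with the PDF header.
class PdfHeaderCheck {
    // Every PDF file starts with the ASCII bytes "%PDF-" followed by a version number.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < PDF_MAGIC.length) {
            return false; // too short to contain the header
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdfLike = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] htmlErrorPage = "<html>Not found</html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikePdf(pdfLike));
        System.out.println(looksLikePdf(htmlErrorPage));
    }
}
```

This catches the case where the URL is correct but the server returns an HTML error page instead of the document. It still says nothing about the text inside the PDF.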

Comments (5)

无语# 2024-09-22 12:57:36

While this is not natively supported, I have found a couple of ways to do it using the Java driver. One way is to open the PDF in your browser (with Adobe Acrobat installed), use keyboard shortcuts to select all the text (CTRL+A) and copy it to the clipboard (CTRL+C), and then verify the text in the clipboard. For example:

protected String getLastWindow() {
    // Returns the ID of the last window tracked by Selenium RC.
    return session().getEval("var windowId; for (var x in selenium.browserbot.openedWindows) { windowId = x; } windowId;");
}

@Test
public void testTextInPDF() throws InterruptedException {
    session().click("link=View PDF");
    String popupName = getLastWindow();
    session().waitForPopUp(popupName, PAGE_LOAD_TIMEOUT);
    session().selectWindow(popupName);

    session().windowMaximize();
    session().windowFocus();
    Thread.sleep(3000); // give the PDF viewer time to render

    // Select all text: CTRL+A
    session().keyDownNative("17");  // 17 is the key code for CTRL (KeyEvent.VK_CONTROL)
    session().keyPressNative("65"); // 65 is the key code for A (KeyEvent.VK_A)
    session().keyUpNative("17");    // release CTRL
    Thread.sleep(1000);

    // Copy the selection to the clipboard: CTRL+C
    session().keyDownNative("17");  // press CTRL
    session().keyPressNative("67"); // 67 is the key code for C (KeyEvent.VK_C)
    session().keyUpNative("17");    // release CTRL

    // TextTransfer is not part of Selenium or the JDK; it is assumed to be
    // a helper class of your own that reads the system clipboard.
    TextTransfer textTransfer = new TextTransfer();
    assertTrue(textTransfer.getClipboardContents().contains("Some text in my pdf"));
}

Another way, still in Java, is to download the PDF and then convert it to text with PDFBox; see http://www.prasannatech.net/2009/01/convert-pdf-text-parser-java-api-pdfbox.html for an example of how to do this.
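Whichever way the text is obtained (clipboard or PDFBox), a plain `contains` check can fail when the extracted text has line breaks, repeated spaces, or non-breaking spaces where the rendered page appears to show a single space. The following is a minimal sketch of a whitespace-tolerant comparison; the class and method names are hypothetical, and it uses only the standard library.

```java
// Hypothetical helper for comparing extracted PDF text with an expected phrase.
class PdfTextAssert {
    // Collapses every run of whitespace (including non-breaking spaces) to one space.
    static String normalize(String s) {
        return s.replace('\u00A0', ' ').replaceAll("\\s+", " ").trim();
    }

    // True if the expected phrase occurs in the extracted text, ignoring whitespace differences.
    static boolean containsNormalized(String extracted, String expected) {
        return normalize(extracted).contains(normalize(expected));
    }

    public static void main(String[] args) {
        String extracted = "Header\r\nSome text\r\nin   my pdf\r\n";
        System.out.println(containsNormalized(extracted, "Some text in my pdf"));
    }
}
```

In the test above, the final assertion could then become `assertTrue(PdfTextAssert.containsNormalized(textTransfer.getClipboardContents(), "Some text in my pdf"))`.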

月棠 2024-09-22 12:57:36

You cannot do this using WebDriver natively. However, the PDFBox API can be used to read the content of the PDF file. You first have to switch focus to the browser window in which the PDF file is open. You can then parse the content of the PDF file and search for the desired text string.

Here is code that uses the PDFBox API to search within a PDF document.

十二 2024-09-22 12:57:36

Unfortunately, you cannot do this at all with Selenium.

双马尾 2024-09-22 12:57:36

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.PrintWriter;
import org.pdfbox.cos.COSDocument;
import org.pdfbox.pdfparser.PDFParser;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.util.PDFTextStripper;

// Note: these imports match the older SourceForge PDFBox JAR linked below.
// Apache PDFBox releases use the org.apache.pdfbox package prefix instead.
public class pdfToTextConverter {

    public static void pdfToText(String path_to_PDF_file, String Path_to_output_text_file) throws IOException {
        // Parse text from a PDF into a string variable
        // (use the parameter itself, not a string literal with its name)
        File f = new File(path_to_PDF_file);

        PDFParser parser = new PDFParser(new FileInputStream(f));
        parser.parse();

        COSDocument cosDoc = parser.getDocument();
        PDDocument pdDoc = new PDDocument(cosDoc);

        String parsedText;
        try {
            PDFTextStripper pdfStripper = new PDFTextStripper();
            parsedText = pdfStripper.getText(pdDoc);
        } finally {
            pdDoc.close(); // release the resources held by the document
        }

        System.out.println(parsedText);

        // Write parsed text into a file
        PrintWriter pw = new PrintWriter(Path_to_output_text_file);
        pw.print(parsedText);
        pw.close();
    }

}


JAR Source
http://sourceforge.net/projects/pdfbox/files/latest/download?source=files
难忘№最初的完美 2024-09-22 12:57:36

There is a way.

  1. Before you click the link, get its href value (href is an attribute, not a tag):
    element.GetAttribute("href")
  2. Then, after the PDF loads, get the current URL:
    driver.Url
  3. Then check whether the URL contains the href.

This only confirms that the expected document was opened, not what it contains. It's not the best, but it's better than nothing.
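The comparison in step 3 can be sketched as follows. This is an illustration in Java rather than the C# used in the steps above, and the class and method names are hypothetical. It assumes the href may be written relative to the page that contains the link, so it resolves the href against that page's URL before comparing, using only `java.net.URI` from the standard library.

```java
import java.net.URI;

// Hypothetical helper: compares a link's href with the URL the browser ended up at.
class PdfUrlCheck {
    static boolean urlMatchesHref(String pageUrl, String href, String currentUrl) {
        // Resolve a relative href against the page it appeared on; an absolute href is kept as-is.
        URI expected = URI.create(pageUrl).resolve(href).normalize();
        URI actual = URI.create(currentUrl).normalize();
        return expected.equals(actual);
    }

    public static void main(String[] args) {
        String page = "http://example.com/app/index.html";
        System.out.println(urlMatchesHref(page, "docs/report.pdf",
                "http://example.com/app/docs/report.pdf"));
    }
}
```

An exact comparison is stricter than a substring check; if the viewer appends a fragment or query string to the URL, strip it before comparing or fall back to `contains`.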
