Can I safely query a DOM document with XPath expressions from multiple threads?

Posted on 2024-09-03 04:44:22

I plan to use a dom4j DOM Document as a static cache in an application where multiple threads can query the document.
Taking into account that the document itself will never change, is it safe to query it from multiple threads?

I wrote the following code to test it, but I am not sure that it actually proves the operation is safe.

    package test.concurrent_dom;

    import org.dom4j.Document;
    import org.dom4j.DocumentException;
    import org.dom4j.DocumentHelper;
    import org.dom4j.Element;
    import org.dom4j.Node;

    /**
     * Hello world!
     *
     */
    public class App extends Thread
    {
        private static final String xml = 
            "<Session>"
                + "<child1 attribute1=\"attribute1value\" attribute2=\"attribute2value\">"
                + "ChildText1</child1>"
                + "<child2 attribute1=\"attribute1value\" attribute2=\"attribute2value\">"
                + "ChildText2</child2>" 
                + "<child3 attribute1=\"attribute1value\" attribute2=\"attribute2value\">"
                + "ChildText3</child3>"
            + "</Session>";

        private static Document document;

        private static Element root;

        public static void main( String[] args ) throws DocumentException
        {
            document = DocumentHelper.parseText(xml);
            root = document.getRootElement();

            Thread t1 = new Thread(){
                public void run(){
                    while(true){

                        try {
                            sleep(3);
                        } catch (InterruptedException e) {                  
                            e.printStackTrace();
                        }

                        Node n1 = root.selectSingleNode("/Session/child1");                 
                        if(!n1.getText().equals("ChildText1")){                     
                            System.out.println("WRONG!");
                        }
                    }
                }
            };

            Thread t2 = new Thread(){
                public void run(){
                    while(true){

                        try {
                            sleep(3);
                        } catch (InterruptedException e) {                  
                            e.printStackTrace();
                        }

                        Node n1 = root.selectSingleNode("/Session/child2");                 
                        if(!n1.getText().equals("ChildText2")){                     
                            System.out.println("WRONG!");
                        }
                    }
                }
            };

            Thread t3 = new Thread(){
                public void run(){
                    while(true){

                        try {
                            sleep(3);
                        } catch (InterruptedException e) {                  
                            e.printStackTrace();
                        }

                        Node n1 = root.selectSingleNode("/Session/child3");                 
                        if(!n1.getText().equals("ChildText3")){                     
                            System.out.println("WRONG!");
                        }
                    }
                }
            };

            t1.start();
            t2.start();
            t3.start();
            System.out.println( "Hello World!" );
        }    

    }


Comments (2)

白馒头 2024-09-10 04:44:22

http://xerces.apache.org/xerces2-j/faq-dom.html says:

"No. DOM does not require implementations to be thread safe. If you need to access the DOM from multiple threads, you are required to add the appropriate locks to your application code."

Without seeing the implementation, it's impossible to know if selectSingleNode uses any shared state for reading the DOM. I think it's safest to assume that it's not thread-safe.
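
As a rough illustration of the "add the appropriate locks" advice quoted above, here is a minimal sketch. To stay self-contained it uses the JDK's own W3C DOM and `javax.xml.xpath` rather than dom4j, which is an assumption on my part; the class name `LockedDomCache` and its `query` method are hypothetical, not part of any library. Every read of the shared document goes through a single lock.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: all reads of the shared W3C DOM are serialized by one lock.
final class LockedDomCache {
    private final Document document;          // parsed once, never modified afterwards
    private final Object lock = new Object(); // guards every access to the document

    LockedDomCache(String xml) {
        try {
            document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Evaluates an XPath expression against the document and returns its string value.
    String query(String expression) {
        synchronized (lock) {
            try {
                // javax.xml.xpath.XPath is documented as not thread-safe,
                // so a fresh instance is created for each call.
                return XPathFactory.newInstance().newXPath().evaluate(expression, document);
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Bad XPath: " + expression, e);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockedDomCache cache = new LockedDomCache(
                "<Session><child1>ChildText1</child1><child2>ChildText2</child2></Session>");
        Thread[] readers = new Thread[4];
        for (int i = 0; i < readers.length; i++) {
            final String path = "/Session/child" + (i % 2 + 1);
            readers[i] = new Thread(() -> System.out.println(path + " -> " + cache.query(path)));
            readers[i].start();
        }
        for (Thread reader : readers) {
            reader.join();
        }
    }
}
```

A single lock serializes all readers, so it trades throughput for certainty. Creating an `XPathFactory` on every call is also relatively expensive; that choice is only there to keep the sketch short.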

An alternative is to use your own XPath processor, such as Jaxen, which is thread-safe.

"XPath objects are fully reentrant and thread-safe. They contain no internal state for evaluation and thus can be cached easily and shared within an application. Once you have an XPath object, you can apply it against various initial contexts and retrieve results in several different ways:"
--- Introduction to SAX path and Jaxen

The JAXEN Jira has various fixes for thread-safety issues, which is evidence that Jaxen is designed to be thread-safe. This is one I came across by chance, along with confirmation from one of the authors that Jaxen is thread-safe.

As well as being thread-safe, Jaxen is model-agnostic - it works with many models (W3C DOM, XOM, Dom4J, JDOM) and custom models can be plugged in by implementing a couple of interfaces.

I would imagine that simple accessors and iterators on the W3C DOM are thread-safe. But this is just a hunch, not a concrete fact. If you want to be 100% sure, then use a DOM that is designed for thread-safety, for example, dom4j.

Some resources to get started:
- An example of using Jaxen.
- Jaxen FAQ and homepage

葵雨 2024-09-10 04:44:22

I am actually not familiar with dom4j, but if you cannot be sure that it handles read-only data properly, I am not sure how good it is.

I will make the operational assumption that the executable part of your runnables (the part after the sleep) takes less than one microsecond, so in your test run the reads happened consecutively, not concurrently. Thus your test does not really prove anything.

For a more robust test, I

  1. eliminated the 3 millisecond sleep - your test code should be busy generating potential conflicts, not sleeping.
  2. increased the thread count - the more concurrently executing threads, the greater the chance of a conflict.
  3. added primitive conflict detection.

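    // Requires: import java.util.concurrent.atomic.AtomicReference;
    // Declare both members inside App so that the static `root` field is visible.
    static final AtomicReference<Thread> owner = new AtomicReference<Thread>();

    static class TestThread extends Thread
    {
        private final String url;
        private final String expected;
        TestThread(int index) { url = "/Session/child" + index; expected = "ChildText" + index; }
        public String toString() { return expected; }
        public void run()
        {
            while (true)
            {
                // own is false when another thread is already between its claim and release
                boolean own = owner.compareAndSet(null, this);
                Node n1 = root.selectSingleNode(url);
                boolean wrong = !n1.getText().equals(expected);
                // release the marker only if this thread was the one that claimed it
                owner.compareAndSet(this, null);
                if (!own) { System.out.println(owner.get() + " conflicts " + this); }
                if (wrong) { System.out.println(this + " WRONG!"); }
            }
        }
    }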

then

    try {
        while (true) {
            Thread t1 = new TestThread(1);
            t1.start();
            Thread t2 = new TestThread(2);
            t2.start();
            Thread t3 = new TestThread(3);
            t3.start();
        }
    }
    catch (Throwable thr) {
        thr.printStackTrace();
    }

If it works as predicted (this is uncompiled and untested), it will keep generating new threads, and the new threads will try to read the document. They will report if their read potentially overlaps in time with another thread's, and they will report if they read a wrong value. It will keep generating new threads until your system runs out of resources, then it will crash.
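
If running until the machine falls over is not appealing, a bounded variant of the same idea might look like the sketch below. It uses a fixed thread pool and a latch so that all readers start together with no sleep. The class `ConcurrentReadStress` and its `run` method are hypothetical names of mine, and the demo reads from a plain map as a stand-in so that the block is self-contained; with dom4j you would pass something like `path -> root.selectSingleNode(path).getText()` as the query instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical bounded stress harness: N threads, released together, no sleeping.
final class ConcurrentReadStress {

    // Each worker calls query.apply(key) `iterations` times and compares the result
    // with expected.get(key). Returns the number of mismatches plus worker exceptions.
    // `expected` must be non-empty and must not be modified while the test runs.
    static int run(int threads, int iterations,
                   Map<String, String> expected,
                   Function<String, String> query) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        AtomicInteger failures = new AtomicInteger();
        List<String> keys = new ArrayList<>(expected.keySet());
        for (int t = 0; t < threads; t++) {
            final String key = keys.get(t % keys.size());
            pool.execute(() -> {
                try {
                    start.await(); // wait so that all workers begin at the same moment
                    for (int i = 0; i < iterations; i++) {
                        if (!expected.get(key).equals(query.apply(key))) {
                            failures.incrementAndGet();
                        }
                    }
                } catch (Exception e) {
                    failures.incrementAndGet(); // an exception inside the query counts too
                } finally {
                    done.countDown();
                }
            });
        }
        start.countDown(); // release every worker at once
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return failures.get();
    }

    public static void main(String[] args) {
        // Stand-in data source (assumption): a map that is never modified after setup.
        Map<String, String> data = new HashMap<>();
        data.put("/Session/child1", "ChildText1");
        data.put("/Session/child2", "ChildText2");
        data.put("/Session/child3", "ChildText3");
        System.out.println("failures: " + run(8, 100_000, data, data::get));
    }
}
```

Even a clean run only shows that no problem was observed on this machine with this timing. It cannot prove that the reads are thread-safe, which is why checking the library's documentation or source still matters.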
