Techniques for querying a set of objects in memory in a Java application
We have a system which performs a 'coarse search' by invoking an interface on another system which returns a set of Java objects. Once we have received the search results I need to be able to further filter the resulting Java objects based on certain criteria describing the state of the attributes (e.g. from the initial objects return all objects where x.y > z && a.b == c).
The criteria used to filter the set of objects each time are partially user-configurable: users will be able to select the values and ranges to match on, but the attributes they can pick from will be a fixed set.
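One way to read the "fixed attributes, user-chosen values" requirement is as a small criteria model: each filterable attribute is declared once in code, and the user only supplies the values to compare against. The following is a minimal sketch under that reading, using only the JDK. `Trade`, `Attribute`, `between` and `equalTo` are hypothetical names invented for illustration; the real domain classes and attributes would differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

final class CriteriaDemo {
    // The fixed set of filterable attributes; users pick from these, never add to them.
    static final Attribute<Trade, Double> PRICE = new Attribute<Trade, Double>(t -> t.price);
    static final Attribute<Trade, String> REGION = new Attribute<Trade, String>(t -> t.region);

    // Keeps only the objects from the coarse result that satisfy the criteria.
    static <T> List<T> filter(List<T> coarse, Predicate<T> criteria) {
        List<T> out = new ArrayList<>();
        for (T t : coarse) {
            if (criteria.test(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Trade> coarse = Arrays.asList(
                new Trade("EU", 10.0), new Trade("EU", 99.0), new Trade("US", 12.0));
        // User-chosen values: price in [5, 20] and region equal to "EU".
        Predicate<Trade> criteria = PRICE.between(5.0, 20.0).and(REGION.equalTo("EU"));
        System.out.println(filter(coarse, criteria).size()); // prints 1
    }
}

// Hypothetical domain object standing in for one of the real JPA entities.
final class Trade {
    final String region;
    final double price;
    Trade(String region, double price) { this.region = region; this.price = price; }
}

// One filterable attribute: wraps a getter and builds predicates from user-supplied values.
final class Attribute<T, V extends Comparable<V>> {
    private final Function<T, V> getter;
    Attribute(Function<T, V> getter) { this.getter = getter; }

    // Matches objects whose attribute value lies in the closed range [low, high].
    Predicate<T> between(V low, V high) {
        return t -> {
            V v = getter.apply(t);
            return v != null && v.compareTo(low) >= 0 && v.compareTo(high) <= 0;
        };
    }

    // Matches objects whose attribute value equals the expected value.
    Predicate<T> equalTo(V expected) {
        return t -> expected.equals(getter.apply(t));
    }
}
```

Because the predicates compose with `and`/`or`, the user-facing part only has to map a chosen attribute and its values onto one of these factory calls.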
The data sets are likely to contain <= 10,000 objects for each search. Searches will be executed manually by the application's users, probably no more than about 2000 times a day. It's probably worth mentioning that all the objects in the result set are known domain object classes which have Hibernate and JPA annotations describing their structure and relationships.
Possible Solutions
Off the top of my head I can think of 3 ways of doing this:
- For each search persist the initial result set objects in our database, then use Hibernate to re-query them using the finer grained criteria.
- Use an in-memory Database (such as hsqldb?) to query and refine the initial result set.
- Write some custom code which iterates the initial result set and pulls out the desired records.
Option 1
Option 1 seems to involve a lot of toing and froing across a network to a physical Database (Oracle 10g) which might result in a lot of network and disk activity. It would also require the results from each search to be isolated from other result sets to ensure that different searches don't interfere with each other.
Option 2
Option 2 seems like a good idea in principle as it would allow me to do the finer query in memory and would not require the persistence of result data which would only be discarded after the search was complete. Gut feeling is that this could be pretty performant too but might result in larger memory overheads (which is fine as we can be pretty flexible on the amount of memory our JVM gets).
Option 3
Option 3 could be very performant but is something I would like to avoid, as any code we write would require such careful testing that the time taken to achieve something flexible and robust enough would probably be prohibitive.
I don't have time to prototype all 3 ideas so I am looking for comments people may have on the 3 options above, plus any further ideas I have not considered, to help me decide which idea might be most suitable. I'm currently leaning toward option 2 (in memory database) so would be keen to hear from people with experience of querying POJOs in memory too.
Hopefully I have described the situation in enough detail but don't hesitate to ask if any further information is required to better understand the scenario.
Cheers,
Edd
Comments (4)
Options 1 and 2 are quite compatible: having implemented one, you can replace it with the other by simply reconfiguring persistence.xml (provided the in-memory database is JPA compatible, e.g. JavaDB, Derby, etc.).
Option 3 is re-implementing both third-party software (database) and your own code (existing JPA entities). You also listed its advantages as concerns. It's clearly a less feasible option in your case. I can't think of anything else to promote Option 3 either.
An in-memory database seems more suitable given the use cases and their time span. If the requirements evolve into less transient ones, you can then switch to Oracle.
If your expressions are not too complex, you can use an expression language for evaluating string queries on your Java objects (POJOs). I can recommend MVEL http://mvel.codehaus.org .
The idea is that you put your objects into an MVEL context. Then you provide a string query written in MVEL's simple notation, and finally evaluate the expression.
Example taken from MVEL site:
Usually expression languages support traversing your object graph (collections) and accessing members in JSP EL style (dot notation).
Also, I can suggest looking at OGNL (google it, I can't add more than one link)
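To make the "dot notation" idea concrete, here is a minimal sketch of what walking an object graph along a dotted path involves, using plain JDK reflection. It is not how MVEL or OGNL is implemented, and it does not use their APIs; real expression languages also handle operators, collections and compiled expressions. `PropertyPath`, `Pet` and `Owner` are hypothetical names, and the sketch assumes conventional public `getX()` getters.

```java
import java.lang.reflect.Method;

final class PropertyPath {
    // Resolves a dotted path such as "owner.name" by calling JavaBean getters in turn.
    static Object resolve(Object root, String path) {
        Object current = root;
        for (String name : path.split("\\.")) {
            if (current == null) {
                return null; // a null link ends the walk early
            }
            String getter = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            try {
                Method m = current.getClass().getMethod(getter);
                current = m.invoke(current);
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot read '" + name + "' in path " + path, e);
            }
        }
        return current;
    }

    public static void main(String[] args) {
        Pet pet = new Pet(new Owner("Ann"));
        System.out.println(resolve(pet, "owner.name")); // prints Ann
    }
}

// Hypothetical beans used only to exercise the resolver.
final class Owner {
    private final String name;
    Owner(String name) { this.name = name; }
    public String getName() { return name; }
}

final class Pet {
    private final Owner owner;
    Pet(Owner owner) { this.owner = owner; }
    public Owner getOwner() { return owner; }
}
```

An expression library adds parsing and comparison operators on top of this kind of traversal, which is what lets a query be supplied as a string.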
How complex are the refining criteria? If the majority are quite simple, I'd be tempted to go for option (3) to start with, but make sure it's encapsulated behind a suitable interface so that if you come across something that is too complex or inefficient to code up yourself you can switch to the in-memory DB at that point (either wholesale for all queries, or just for the complex ones if there's an overhead in setting up the temporary tables).
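The encapsulation suggested here could look roughly like the sketch below. It assumes the criteria are passed as plain data (attribute name plus range) rather than as Java lambdas, so that a later database-backed implementation could translate the same criteria into a query. `ResultRefiner`, `IteratingRefiner`, `RangeCriterion` and `Parcel` are hypothetical names, and only numeric range criteria are covered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

final class RefinerDemo {
    public static void main(String[] args) {
        Map<String, ToDoubleFunction<Parcel>> getters = new HashMap<>();
        getters.put("weight", p -> p.weight);
        ResultRefiner<Parcel> refiner = new IteratingRefiner<>(getters);

        List<Parcel> coarse = Arrays.asList(new Parcel(1.0), new Parcel(5.0), new Parcel(9.0));
        List<RangeCriterion> criteria =
                Collections.singletonList(new RangeCriterion("weight", 2.0, 6.0));
        System.out.println(refiner.refine(coarse, criteria).size()); // prints 1
    }
}

// Hypothetical domain object standing in for a real entity.
final class Parcel {
    final double weight;
    Parcel(double weight) { this.weight = weight; }
}

// Criteria as plain data, so a loop-based or a SQL-based implementation can both interpret it.
final class RangeCriterion {
    final String attribute;
    final double min;
    final double max;
    RangeCriterion(String attribute, double min, double max) {
        this.attribute = attribute;
        this.min = min;
        this.max = max;
    }
}

// Callers depend only on this interface, so the implementation can be swapped later.
interface ResultRefiner<T> {
    List<T> refine(List<T> coarseResults, List<RangeCriterion> criteria);
}

// Option 3: plain iteration over the in-memory result set.
final class IteratingRefiner<T> implements ResultRefiner<T> {
    private final Map<String, ToDoubleFunction<T>> getters;

    IteratingRefiner(Map<String, ToDoubleFunction<T>> getters) {
        this.getters = getters;
    }

    @Override
    public List<T> refine(List<T> coarseResults, List<RangeCriterion> criteria) {
        List<T> matches = new ArrayList<>();
        for (T candidate : coarseResults) {
            if (matchesAll(candidate, criteria)) {
                matches.add(candidate);
            }
        }
        return matches;
    }

    // An object matches only if every criterion's closed range contains its attribute value.
    private boolean matchesAll(T candidate, List<RangeCriterion> criteria) {
        for (RangeCriterion c : criteria) {
            ToDoubleFunction<T> getter = getters.get(c.attribute);
            if (getter == null) {
                throw new IllegalArgumentException("Unknown attribute: " + c.attribute);
            }
            double value = getter.applyAsDouble(candidate);
            if (value < c.min || value > c.max) {
                return false;
            }
        }
        return true;
    }
}
```

If a criterion later turns out to be too complex for the loop, a second `ResultRefiner` backed by an in-memory database could be introduced without touching the calling code.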
Option 2 seems good, since you can toggle between options 1 and 2 as needed. Option 3 is also restricted when it comes to future growth in data size. Querying the objects directly would imply a greater dependency on the code structure for both storage and querying.
It would probably be a good idea to include some caching mechanism (ehcache/memcache) alongside Option 2, and then profile to check the performance difference.