Java: File exists vs. searching a large XML database
I'm quite new to Java programming and am writing my first desktop app. The app takes a unique ISBN and first checks whether it is already held in the local DB. If it is, the app just reads from the local DB; if not, it requests the data from isbndb.com and enters it into the local DB, which is in XML format. What I'm wondering is which of the following two methods would create the least overhead when checking whether an entry already exists.
Method 1.) File Exists.
On creating a DB entry, the app would create a separate file for every ISBN, named after the ISBN (e.g. 3846504937540.xml). When checking, it would use a file-exists method to see whether an entry already exists for the user-provided ISBN.
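For illustration, Method 1 could look roughly like the sketch below. The class name, the `dbDir` directory and the `<isbn>.xml` naming are assumptions taken from the description above, not an existing implementation.

```java
import java.nio.file.Files;
import java.nio.file.Path;

final class IsbnFileCheck {
    // Returns true if a regular file named "<isbn>.xml" exists under dbDir.
    // dbDir and the naming scheme are assumptions based on the description above.
    static boolean entryExists(Path dbDir, String isbn) {
        return Files.isRegularFile(dbDir.resolve(isbn + ".xml"));
    }

    public static void main(String[] args) throws Exception {
        // Demo in a throwaway directory so nothing real is touched.
        Path dbDir = Files.createTempDirectory("isbndb");
        Files.createFile(dbDir.resolve("3846504937540.xml"));
        System.out.println(entryExists(dbDir, "3846504937540"));
        System.out.println(entryExists(dbDir, "0000000000000"));
    }
}
```

The check itself is a single file-system lookup, so its cost does not depend on how much data each entry holds.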
Method 2.) SAX XML Parser.
All entries would be stored in a single large XML file. When checking for existing entries, the SAX XML parser would parse the file, and the user-provided ISBN would be checked against those in the XML DB for a match.
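A rough sketch of Method 2 follows. The XML layout (`<books>` containing `<book isbn="..."/>` elements) is purely an assumption for the example, since the actual file format is not specified. The handler aborts parsing as soon as a match is found, but a miss still reads the whole file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class IsbnSaxLookup {
    // Sentinel exception used only to stop parsing once the ISBN is found.
    private static final class Found extends SAXException {
        Found() { super("found"); }
    }

    // Assumes entries look like <book isbn="..."/> (hypothetical layout).
    static boolean contains(InputStream xml, String isbn) throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) throws SAXException {
                if ("book".equals(qName) && isbn.equals(atts.getValue("isbn"))) {
                    throw new Found(); // stop early, no need to read the rest
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
            return false; // reached end of document without a match
        } catch (Found e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book isbn=\"3846504937540\"/></books>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(contains(in, "3846504937540"));
    }
}
```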
Note: The resulting entries could number in the thousands over time.
Any information would be greatly appreciated.
Comments (5)
I don't think either of your methods is all that great. I strongly suggest using a DBMS to store the data. If you don't have a DBMS on the system, or if you want an app that can run on systems without an installed DBMS, take a look at SQLite. You can use it from Java with SQLiteJDBC by David Crawshaw.
As far as your two methods are concerned, the first will generate a huge amount of file clutter, not to mention maintenance and consistency headaches. The second will be slow once you have a sizable number of entries, because you basically have to read (on average) half the database for every query. With a DBMS, you can avoid this by defining indexes on the fields you need to look up quickly. The DBMS will maintain the indexes automatically.
I don't much like the idea of relying on the file system for that task. I don't know how critical your application is, but many things may happen to these XML files :) Plus, if the folder gets very big, you would need to think about splitting the files into some hierarchical folder structure to get decent performance.
On the other hand, I don't see why you would use an XML file as a database if you need to update it frequently.
I would use a relational database and add a new record to a table for each entry, with an index on the isbn_number column.
If you are in the thousands of records, you may very well go with SQLite, and you can replace it with a more powerful non-embedded DB if you ever need to, with no (or little :) ) code modification.
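As a minimal sketch of that idea using plain JDBC: the table name `books` and the `title` column are hypothetical, and only `isbn_number` comes from the text above. The `IF NOT EXISTS` clauses are SQLite syntax and may need adjusting for another DBMS. The code takes an already-open `java.sql.Connection`, so which driver and JDBC URL you use is left as an assumption.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

final class BookSchema {
    // Hypothetical schema: one row per book, keyed by ISBN.
    static final String CREATE_TABLE =
        "CREATE TABLE IF NOT EXISTS books (isbn_number TEXT NOT NULL, title TEXT)";
    // A unique index makes lookups by ISBN fast and rejects duplicate entries.
    static final String CREATE_INDEX =
        "CREATE UNIQUE INDEX IF NOT EXISTS idx_books_isbn ON books (isbn_number)";
    static final String INSERT =
        "INSERT INTO books (isbn_number, title) VALUES (?, ?)";

    // Creates the table and index if they are not there yet.
    static void createSchema(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(CREATE_TABLE);
            st.executeUpdate(CREATE_INDEX);
        }
    }

    // Adds one record; values are bound as parameters, never concatenated.
    static void insert(Connection conn, String isbn, String title) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT)) {
            ps.setString(1, isbn);
            ps.setString(2, title);
            ps.executeUpdate();
        }
    }
}
```

With a SQLite JDBC driver on the classpath, the connection would typically come from something like `DriverManager.getConnection("jdbc:sqlite:books.db")`, where the file name is again just an example.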
I think you'd be better off using a DBMS instead of either of your two methods.
If you want the least overhead just for checking existence, then option 1 is probably what you want, since it's a direct lookup. Parsing the XML for every check requires you to pass through the whole XML file in the worst case. You could add caching to option 2, but that gets more complicated than option 1.
With option 1, though, be aware that there may be a limit on how many files you can store in a single directory (depending on the file system), so you would probably have to store the XML files in multiple directory levels (for example /xmldb/38/46/3846504937540.xml).
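The layered layout in that example path can be computed from the ISBN itself. Here is a small sketch, assuming two levels built from the first four digits as in /xmldb/38/46/3846504937540.xml; the class and method names are made up for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class ShardedPath {
    // Maps an ISBN to root/<digits 1-2>/<digits 3-4>/<isbn>.xml.
    static Path pathFor(Path root, String isbn) {
        if (isbn.length() < 4) {
            throw new IllegalArgumentException("ISBN too short: " + isbn);
        }
        return root.resolve(isbn.substring(0, 2))
                   .resolve(isbn.substring(2, 4))
                   .resolve(isbn + ".xml");
    }

    public static void main(String[] args) {
        System.out.println(pathFor(Paths.get("/xmldb"), "3846504937540"));
    }
}
```

One caveat: real ISBN-13 numbers start with 978 or 979, so sharding on the leading digits would put most files under very few top-level folders. Using trailing digits instead may spread them more evenly.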
That said, neither of your options is a good way to store data in the long run. You will find that they become quite restrictive and hard to manage as the data grows.
People have already recommended using a DBMS, and I agree. On top of that, I would suggest looking into a document-based database such as MongoDB.
Extend your DB table to include not only the XML string but also the ISBN.
Then select the XML column based on the ISBN column.
Query, as an escaped Java string:
"select XMLString from cacheTable where isbn='"+ isbn +"'"
A different approach could be to use an ORM like Hibernate.
With an ORM, instead of saving the whole XML document in one column, you use a separate column for each element and attribute, and you could even split your document over several tables for a simpler long-term design.
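Coming back to the plain SQL query above: building it by string concatenation is open to SQL injection and quoting bugs if `isbn` comes from user input. A hedged alternative is to bind the value with a `PreparedStatement`. The table and column names (`cacheTable`, `XMLString`, `isbn`) are taken from the query above; the class and method names are invented for the example, and the open `Connection` is assumed to be supplied by the caller.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Optional;

final class CacheLookup {
    // Same query as above, with the ISBN as a bound parameter.
    static final String SELECT_XML =
        "SELECT XMLString FROM cacheTable WHERE isbn = ?";

    // Returns the cached XML for the ISBN, or empty if there is no such row.
    static Optional<String> findXml(Connection conn, String isbn) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SELECT_XML)) {
            ps.setString(1, isbn); // the driver handles quoting and escaping
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    return Optional.ofNullable(rs.getString(1));
                }
                return Optional.empty();
            }
        }
    }
}
```

An empty result is the signal to fetch the record from isbndb.com and insert it into the table.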