Updating a local sqlite database used for local metadata and caching from a service?
I've searched the site and haven't found a question/answer that quite answers my question; the closest one I found was: Syncing objects between two disparate systems best approach.
Anyway, to begin: because there are no RSS feeds available, I'm screen scraping a webpage. It does a fetch, then goes through the webpage to scrape out all of the information I'm interested in, and dumps that information into a sqlite database so that I can query it at my leisure without repeatedly fetching from the website.
However, I'm also storing various metadata about the data that is stored in the sqlite db, such as: have I looked at the data, is the data new or old, and a bookmark into a chunk of data (think of the chunk as a collection of unrelated data; the bookmark is just a pointer to where I am in processing/reading that data).
So right now my current problem is trying to figure out how to update the local sqlite database with new data and/or changed data from the website in a manner that is effective and straightforward.
Here's my current idea:
- Download the page itself
- Create a temporary table for the parsed data to go into
- Do a comparison between the official and the temporary table and copy updates and/or new information to the official table
This process seems kind of complicated, because I would have to figure out how to determine whether the data in the temporary table is new, updated, or unchanged. So I am wondering if there is a better approach, or if anyone has any suggestions on how to architect/structure such a system?
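For illustration, the temporary-table idea above could look roughly like the sketch below. The table and column names (`items`, `staging`, `key`, `body`, `seen`) are hypothetical, and it assumes every scraped item has some stable identifier to use as `key`, which may not be true of the actual page. Rows whose body did not change are left alone, so their local metadata is preserved.

```python
import sqlite3

def sync_from_staging(conn, rows):
    """Load parsed (key, body) rows into a temp table and merge into `items`.

    Returns (new_keys, changed_keys). Unchanged rows are not touched,
    so their `seen` flag survives the sync.
    """
    cur = conn.cursor()
    cur.execute(
        "CREATE TEMP TABLE IF NOT EXISTS staging (key TEXT PRIMARY KEY, body TEXT)"
    )
    cur.execute("DELETE FROM staging")
    cur.executemany("INSERT INTO staging (key, body) VALUES (?, ?)", rows)

    # New: present in staging, absent from the official table.
    new_keys = [r[0] for r in cur.execute(
        "SELECT s.key FROM staging s LEFT JOIN items i ON i.key = s.key "
        "WHERE i.key IS NULL ORDER BY s.key"
    )]
    # Changed: present in both, but the body differs (IS NOT is NULL-safe).
    changed_keys = [r[0] for r in cur.execute(
        "SELECT s.key FROM staging s JOIN items i ON i.key = s.key "
        "WHERE i.body IS NOT s.body ORDER BY s.key"
    )]

    cur.executemany(
        "INSERT INTO items (key, body, seen) "
        "SELECT key, body, 0 FROM staging WHERE key = ?",
        [(k,) for k in new_keys],
    )
    # Reset `seen` on changed rows so they show up as unread again.
    cur.executemany(
        "UPDATE items SET body = (SELECT body FROM staging WHERE key = ?), "
        "seen = 0 WHERE key = ?",
        [(k, k) for k in changed_keys],
    )
    conn.commit()
    return new_keys, changed_keys

# Usage with an in-memory database and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (key TEXT PRIMARY KEY, body TEXT, "
    "seen INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, 1)",
    [("a", "old text"), ("b", "same text")],
)
new_keys, changed_keys = sync_from_staging(
    conn, [("a", "new text"), ("b", "same text"), ("c", "brand new")]
)
print(new_keys, changed_keys)
```

Whether resetting `seen` on a changed row is the right behaviour is a design choice; it is only one way to treat "updated" data.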
Edit 1:
I'm not sure where to put the additional information, in a comment or as an edit, so I'm going to add it here.
This expands a bit on the metadata with regard to bookmarking. Basically, the data source can create new data or add to the current data, so one reason I was thinking of the temporary table idea was so that I would be able to determine whether a data source that has been "bookmarked" has any new data or not.
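As a rough sketch of that bookmark check: if each chunk within a source gets an increasing position number (an assumption, the site may not expose anything like that), then "does a bookmarked source have new data" becomes a single query. The `chunks` and `bookmarks` tables and their columns are hypothetical names.

```python
import sqlite3

def sources_with_new_data(conn):
    """Return {source: unread_count} for bookmarked sources that have
    chunks stored past the bookmark position."""
    rows = conn.execute(
        """
        SELECT b.source, COUNT(*) AS unread
        FROM bookmarks b
        JOIN chunks c
          ON c.source = b.source AND c.position > b.last_position
        GROUP BY b.source
        ORDER BY b.source
        """
    ).fetchall()
    return dict(rows)

# Usage with an in-memory database and made-up sources.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks (source TEXT, position INTEGER, body TEXT, "
    "PRIMARY KEY (source, position))"
)
conn.execute(
    "CREATE TABLE bookmarks (source TEXT PRIMARY KEY, last_position INTEGER)"
)
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?)",
    [("feed-a", 1, "x"), ("feed-a", 2, "y"), ("feed-a", 3, "z"),
     ("feed-b", 1, "p"), ("feed-b", 2, "q")],
)
conn.executemany(
    "INSERT INTO bookmarks VALUES (?, ?)",
    [("feed-a", 2), ("feed-b", 2)],
)
unread = sources_with_new_data(conn)
print(unread)
```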
Comments (1)
Is it really important to determine whether the data in the temporary table is new, updated, or unchanged? Do you really need to keep a history of the changes?
NO: don't use the temporary table. Just mark your old records as old (with a timestamp), don't do updates, and simply insert your new data.
YES: your idea seems correct to me, but it all depends on how much data you need to process each time; I don't think it is feasible with a large amount of data.
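A minimal sketch of the "NO" option, assuming a hypothetical `records` table with `fetched_at` and `retired_at` timestamp columns (none of these names come from the question): each fetch stamps the currently live rows as old and then inserts the new snapshot, with no comparison step at all.

```python
import sqlite3
from datetime import datetime, timezone

def replace_snapshot(conn, rows, now=None):
    """Mark all live rows as old, then insert the freshly scraped rows."""
    now = now or datetime.now(timezone.utc).isoformat()
    with conn:  # run both statements in one transaction
        conn.execute(
            "UPDATE records SET retired_at = ? WHERE retired_at IS NULL",
            (now,),
        )
        conn.executemany(
            "INSERT INTO records (key, body, fetched_at) VALUES (?, ?, ?)",
            [(key, body, now) for key, body in rows],
        )

def current_records(conn):
    """Rows from the latest snapshot only."""
    return conn.execute(
        "SELECT key, body FROM records WHERE retired_at IS NULL ORDER BY key"
    ).fetchall()

# Usage with an in-memory database and two made-up fetches.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, key TEXT, body TEXT, "
    "fetched_at TEXT NOT NULL, retired_at TEXT)"
)
replace_snapshot(conn, [("a", "v1"), ("b", "v1")], now="2024-01-01T00:00:00")
replace_snapshot(conn, [("a", "v2"), ("c", "v1")], now="2024-01-02T00:00:00")
live = current_records(conn)
print(live)
```

One caveat with this layout: per-record metadata such as a "seen" flag would have to live in a separate table keyed by something stable, since every fetch creates new rows.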