Best way to update a MySQL database
I have read through the solutions to similar problems, but they all seem to involve scripts and extra tools. I'm hoping my problem is simple enough to avoid that.
So the user uploads a csv of next week's data. It gets inserted into the DB, no problem.
BUT
an hour later he gets feedback from everyone, and must make updates accordingly. He updates the csv and goes to upload it to the DB.
Right now, the system I'm using checks whether the data for that week is already there. If it is, it pulls all of that data from the DB, a script finds the differences and sends them out, and after all of this, the old data is deleted and replaced with the new data.
Obviously, it is a lot easier to just wipe it clean and re-enter the data, but that's not the best method, especially if there are lots of changes or tons of data. But I have to know WHAT changes have been made in order to send out alerts. And I don't want a transaction log, as the alerts only need to be sent out once, and after that the old data is useless.
So!
Is there a smart way to compare the new data to the already existing data, get only the rows that are changed/deleted/added, and make those changes? Right now it seems like I could do an update, but then I won't get any response on what has changed...
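One common approach, sketched here in Python under stated assumptions, is to give every row a natural key and compare the old and new uploads in memory. The column names (`employee`, `date`, `task`, `start`, `end`) and the choice of key columns are hypothetical; nothing in the question fixes them.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, str]

def diff_rows(old: Iterable[Row], new: Iterable[Row], key_fields: Sequence[str]
              ) -> Tuple[List[Row], List[Row], List[Tuple[Row, Row]]]:
    """Return (added, removed, changed); changed holds (old_row, new_row) pairs."""
    def index(rows: Iterable[Row]) -> Dict[tuple, Row]:
        # Map each row's natural key to the row (a later duplicate key wins).
        return {tuple(r[f] for f in key_fields): r for r in rows}

    old_by_key = index(old)
    new_by_key = index(new)

    added = [row for k, row in new_by_key.items() if k not in old_by_key]
    removed = [row for k, row in old_by_key.items() if k not in new_by_key]
    changed = [(old_by_key[k], row) for k, row in new_by_key.items()
               if k in old_by_key and old_by_key[k] != row]
    return added, removed, changed
```

Rows that appear in both uploads with identical values end up in none of the three lists, so only those lists need to be turned into alerts and SQL statements.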
Thanks!
Quick Edit:
No foreign keys are currently in use. This will soon change, but it shouldn't make a difference, because the foreign keys will only point to who the data affects and thus won't need to be changed. As far as primary keys go, that does present a bit of a dilemma:
The data in question is everyone's work schedule. So it would be nice (for specific applications of this schedule beyond simple output) for each shift to have a key. But here is the problem: let's say that user1 was late on Monday. The tardiness is recorded in a separate table and is tied to the shift using the shift key. But if on Tuesday some changes need to be made to the week already in progress, my fear is that it will become too difficult to ensure that entries in the DB that have already happened (and thus may have associations that shouldn't be broken) don't get re-keyed in the process. Unfortunately, it is not as simple as only updating events occurring AFTER the current time, as this would add work for the people who do the uploading (and thus make the product less marketable). Basically, they make the schedule in one program, export it to a CSV, and then upload it on a web page for all of the webapps that need that data. So it is simply much easier for them (and less stressful for everyone involved) to follow the same routine every time: export the entire week and upload it.
So my biggest concern is to make the upload script as smart as possible on both ends: it shouldn't get bloated trying to find the changes, it should find the changes no matter the input, AND none of the unchanged data should risk getting re-keyed.
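To keep unchanged shifts from being re-keyed, the script can apply only the differences and never touch unchanged rows. The sketch below uses Python's built-in `sqlite3` with an in-memory database purely so it runs on its own; the `shifts` table, its columns and the natural key are hypothetical. Against MySQL the same statements would go through a MySQL driver, which typically uses `%s` rather than `?` placeholders.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical schema; shift_id is the surrogate key that other tables
# (e.g. a tardiness table) would reference.
SCHEMA = ("CREATE TABLE shifts (shift_id INTEGER PRIMARY KEY, week TEXT, "
          "employee TEXT, day TEXT, task TEXT, start_time TEXT, end_time TEXT)")

KEY = ("employee", "day", "task", "start_time")  # hypothetical natural key
COLS = KEY + ("end_time",)                        # key columns + the one value column

Row = Dict[str, str]

def sync_week(conn: sqlite3.Connection, week: str, new_rows: List[Row]):
    """Apply only the differences for one week and return (added, removed, changed).

    Unchanged rows are never touched, so their shift_id survives re-uploads."""
    existing: Dict[tuple, Tuple[int, Row]] = {}
    cur = conn.execute(
        "SELECT shift_id, employee, day, task, start_time, end_time "
        "FROM shifts WHERE week = ?", (week,))
    for shift_id, *values in cur.fetchall():
        row = dict(zip(COLS, values))
        existing[tuple(row[k] for k in KEY)] = (shift_id, row)

    # Keep only the stored columns, so extra CSV columns don't count as changes.
    incoming = {tuple(r[k] for k in KEY): {c: r[c] for c in COLS} for r in new_rows}

    added, removed, changed = [], [], []
    with conn:  # one transaction: commit on success, roll back on error
        for key, (shift_id, old) in existing.items():
            if key not in incoming:
                conn.execute("DELETE FROM shifts WHERE shift_id = ?", (shift_id,))
                removed.append(old)
            elif incoming[key] != old:
                # Same natural key, so only the non-key column can differ.
                conn.execute("UPDATE shifts SET end_time = ? WHERE shift_id = ?",
                             (incoming[key]["end_time"], shift_id))
                changed.append((old, incoming[key]))
        for key, row in incoming.items():
            if key not in existing:
                conn.execute(
                    "INSERT INTO shifts (week, employee, day, task, start_time, end_time) "
                    "VALUES (?, ?, ?, ?, ?, ?)",
                    (week, *(row[c] for c in COLS)))
                added.append(row)
    return added, removed, changed
```

The three returned lists are what the alert step needs; the old values can be thrown away afterwards, so no transaction log is kept.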
Here's a related question:
Suppose Joe User was scheduled to wash dishes from 7:00 PM to 8:00 PM, but the new
data has him working 6:45 PM to 8:30 PM. Has the shift been changed? Or has the old
one been deleted and a new one added?
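Whether Joe's shift "changed" is a policy decision rather than something the data can tell you. One possible rule, shown as a hypothetical sketch: if the same person has the same task and the old and new time ranges overlap, treat it as a modification (so the shift keeps its key); otherwise treat it as a delete plus an add. The field names are assumptions.

```python
from datetime import time

def overlaps(a_start: time, a_end: time, b_start: time, b_end: time) -> bool:
    # Ranges are half-open [start, end): a shift ending at 8:00 PM
    # does not overlap one starting at 8:00 PM.
    return a_start < b_end and b_start < a_end

def is_same_shift(old: dict, new: dict) -> bool:
    """Hypothetical rule: same employee, same task, overlapping times => modified shift."""
    return (old["employee"] == new["employee"]
            and old["task"] == new["task"]
            and overlaps(old["start"], old["end"], new["start"], new["end"]))

# Joe: 7:00-8:00 PM becomes 6:45-8:30 PM, which this rule reports as a change.
old_joe = {"employee": "joe", "task": "dishes", "start": time(19, 0), "end": time(20, 0)}
new_joe = {"employee": "joe", "task": "dishes", "start": time(18, 45), "end": time(20, 30)}
print(is_same_shift(old_joe, new_joe))  # True
```

This assumes shifts do not cross midnight; an overnight shift would need full datetimes instead of `time` values.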
And another one:
Say Jane was scheduled to work 1:00 PM to 3:00 PM, but now everyone has a mandatory
staff meeting from 2:00 to 3:00. Has she lost one shift and gained two? Or has one
shift changed and she gained one?
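Jane's case is harder because one old shift overlaps two new ones. A hedged sketch of one way to decide: match each old shift to the single unclaimed new shift (for the same employee) that overlaps it most, preferring one with the same task; that pair counts as "changed", and any new shift left unclaimed counts as "added". Under this hypothetical rule Jane keeps one (shortened) shift and gains the meeting. The tie-breaking order and the field names are assumptions you would tune to your own business rules.

```python
from datetime import time
from typing import Any, Dict, List, Optional, Tuple

Shift = Dict[str, Any]

def to_minutes(t: time) -> int:
    return t.hour * 60 + t.minute

def overlap_minutes(a: Shift, b: Shift) -> int:
    """Minutes two same-day shifts share; 0 if they do not overlap."""
    start = max(to_minutes(a["start"]), to_minutes(b["start"]))
    end = min(to_minutes(a["end"]), to_minutes(b["end"]))
    return max(0, end - start)

def match_shifts(old: List[Shift], new: List[Shift]):
    """Greedy matching: returns (changed pairs, removed old shifts, added new shifts)."""
    claimed = set()
    changed: List[Tuple[Shift, Shift]] = []
    removed: List[Shift] = []
    for o in old:
        best: Optional[int] = None
        best_score: Optional[Tuple[bool, int]] = None
        for i, n in enumerate(new):
            if i in claimed or n["employee"] != o["employee"]:
                continue
            minutes = overlap_minutes(o, n)
            if minutes == 0:
                continue
            # Prefer the same task first, then the larger overlap.
            score = (n["task"] == o["task"], minutes)
            if best_score is None or score > best_score:
                best, best_score = i, score
        if best is None:
            removed.append(o)  # nothing overlaps: the old shift was dropped
        else:
            claimed.add(best)
            changed.append((o, new[best]))
    added = [n for i, n in enumerate(new) if i not in claimed]
    return changed, removed, added

jane_old = [{"employee": "jane", "task": "work", "start": time(13), "end": time(15)}]
jane_new = [{"employee": "jane", "task": "work", "start": time(13), "end": time(14)},
            {"employee": "jane", "task": "meeting", "start": time(14), "end": time(15)}]
changed, removed, added = match_shifts(jane_old, jane_new)
print(len(changed), len(removed), [s["task"] for s in added])  # 1 0 ['meeting']
```

Because the matching is greedy, the order of the old shifts can affect the result when several of them compete for the same new shift.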
I'm really interested in knowing how this kind of data is typically handled/approached, more than specific answers to the above.
Again, thank you.
Comments (2)
So your script knows the differences, right? And you don't want to use any extra tools apart from your script and MySQL, right?
I'm quite convinced that MySQL doesn't offer any "diff" tool by itself, so the best you can achieve is making a new CSV file for updates only. I mean, it should contain only the changed rows. Updating would be quicker, and all changed data would be easily available.
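If the script builds such an update-only file itself, a sketch with Python's standard `csv` module could look like this. The `change` column, the key columns and the field names are hypothetical additions, not part of the answer above.

```python
import csv
from typing import Dict, List, Sequence, TextIO

def write_change_csv(old_rows: List[Dict[str, str]], new_rows: List[Dict[str, str]],
                     key_fields: Sequence[str], fieldnames: Sequence[str],
                     out: TextIO) -> int:
    """Write only added/changed/removed rows, tagged in a hypothetical 'change' column.

    Returns the number of data rows written."""
    old_by_key = {tuple(r[f] for f in key_fields): r for r in old_rows}
    new_by_key = {tuple(r[f] for f in key_fields): r for r in new_rows}

    writer = csv.DictWriter(out, fieldnames=["change", *fieldnames], extrasaction="ignore")
    writer.writeheader()
    count = 0
    for key, row in new_by_key.items():
        if key not in old_by_key:
            writer.writerow({"change": "added", **row})
        elif old_by_key[key] != row:
            writer.writerow({"change": "changed", **row})
        else:
            continue  # unchanged rows are left out of the update file
        count += 1
    for key, row in old_by_key.items():
        if key not in new_by_key:
            writer.writerow({"change": "removed", **row})
            count += 1
    return count
```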
If you have a unique key on one of the fields, you can use: