What is the relational database equivalent of the Factorial and Fibonacci functions?
When learning a new programming language there are always a couple of traditional problems that are good to get yourself moving. For example, Hello world and Fibonacci will show how to read input, print output and compute functions (the bread and butter that will solve basically everything), and while they are really simple, they are nontrivial enough to be worth the time (and there is always some fun to be had calculating the factorial of a ridiculously large number in a language with bignums).
So now I'm trying to get to grips with some SQL system and all the textbook examples I can think of involve mind-numbingly boring tables like "Student" or "Employee". What nice alternate datasets could I use instead? I am looking for something that (in order of importance) ...
- The data can be generated by a straightforward algorithm.
- I don't want to have to enter things by hand.
- I want to be able to easily increase the size of my tables to stress efficiency, etc.
- Can be used to showcase as much stuff as possible. Selects, Joins, Indexing... You name it.
- Can be used to get back some interesting results.
- I can live with "boring" data manipulation if the data is real and has a use by itself, but I'd rather have something more interesting if I am creating the dataset from scratch.
In the worst case, I at least presume there should be some sort of benchmark dataset out there that would at least fit the first two criteria and I would love to hear about that too.
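As a point of comparison, SQL itself can generate the literal Factorial and Fibonacci sequences with a recursive common table expression, and no data has to be entered by hand. Below is a minimal sketch using Python's built-in `sqlite3` module. The function names are illustrative. SQLite integers are 64-bit, so, unlike a bignum language, factorials past 20! and Fibonacci numbers past F(92) no longer fit as exact integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def fibonacci_rows(conn, n):
    """Return [(i, F(i))] for i in 0..n-1 (n >= 1), computed entirely in SQL."""
    return conn.execute(
        """
        WITH RECURSIVE fib(i, a, b) AS (
            SELECT 0, 0, 1                                   -- seed row: F(0)=0, F(1)=1
            UNION ALL
            SELECT i + 1, b, a + b FROM fib WHERE i + 1 < ?  -- slide the (a, b) window
        )
        SELECT i, a FROM fib ORDER BY i
        """,
        (n,),
    ).fetchall()


def factorial_rows(conn, n):
    """Return [(i, i!)] for i in 0..n, computed entirely in SQL."""
    return conn.execute(
        """
        WITH RECURSIVE fact(i, f) AS (
            SELECT 0, 1                                           -- 0! = 1
            UNION ALL
            SELECT i + 1, f * (i + 1) FROM fact WHERE i + 1 <= ?  -- (i+1)! = i! * (i+1)
        )
        SELECT i, f FROM fact ORDER BY i
        """,
        (n,),
    ).fetchall()


for i, value in fibonacci_rows(conn, 10):
    print(i, value)
```

Because the row count is just a parameter, the same CTE can also feed an `INSERT ... SELECT` to fill a table of any size you want to stress.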
Answers (7)
The benchmark database in the Microsoft world is Northwind. A similar open source (EPL) one is Eclipse's Classic Models database.
You can't autogenerate either as far as I know.
However, Northwind "imports and exports specialty foods from around the world", while Classic Models sells "scale models of classic cars". Both are pretty interesting. :)
SQL is a query language, not a procedural language, so unless you will be playing with PL/SQL or something similar, your examples will be manipulating data.
So here is what was fun for me -- data mining! Go to:
http://usa.ipums.org/usa/
And download their micro-data (you will need to make an account, but it's free).
You'll need to write a little script to load the fixed-width file into your db, which in itself should be fun. You will also need to write a little script that auto-creates the fields (since there are many) by parsing their meta-file. That's fun, too.
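The two loader scripts described above might look roughly like the sketch below. It assumes the meta-file has already been parsed into a list of `(column name, 1-based start, width)` tuples. The `LAYOUT` values and the sample lines are hypothetical, and real IPUMS extracts ship their own codebook format that you would parse instead. Identifiers are interpolated into the SQL, so this only makes sense for trusted metadata.

```python
import sqlite3

# HYPOTHETICAL layout, as if parsed from the extract's meta-file:
# (column name, 1-based start position, field width).
LAYOUT = [("year", 1, 4), ("statefip", 5, 2), ("hhincome", 7, 7)]


def create_table(conn, table, layout):
    """Create one INTEGER column per field listed in the layout."""
    cols = ", ".join(f'"{name}" INTEGER' for name, _, _ in layout)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')


def load_fixed_width(conn, table, layout, lines):
    """Slice each fixed-width line by the layout and bulk-insert the rows."""
    placeholders = ", ".join("?" for _ in layout)
    rows = (
        [int(line[start - 1:start - 1 + width]) for _, start, width in layout]
        for line in lines
        if line.strip()  # skip blank lines
    )
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
create_table(conn, "person", LAYOUT)
# Hypothetical sample records; with a real extract, pass an open file object instead.
sample = ["2000060075000", "2010360120000", ""]
load_fixed_width(conn, "person", LAYOUT, sample)
print(conn.execute("SELECT COUNT(*), MAX(hhincome) FROM person").fetchone())
```

For a real file, `with open(path) as f: load_fixed_width(conn, "person", LAYOUT, f)` streams it line by line without reading it all into memory.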
Then, you can start asking questions. Suppose the questions are about house prices:
Say you want to look at how house values have evolved over the last 40 years for people with incomes in the top 10% of the population. Then restrict that to people living in California. See if there is a correlation between income and mortgage payments as a percentage of income. Then group this by geographic area. Then see if there is a correlation between the areas with the highest mortgage burden and the percentage of units occupied by renters. Your db will have some built-in statistical functions, but you can always program your own as well, so correl might be your equivalent of Fibonacci. Then write a little script to do the same thing in R: import the data from your db, manipulate it, and store the result.
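To make the "program your own correl" idea concrete, here is a minimal sketch that registers a Pearson correlation aggregate through `sqlite3.Connection.create_aggregate` and uses it with `GROUP BY`. SQLite has no built-in correlation function, so `CORREL` is our own name here. The `household` table and its values are hypothetical toy data standing in for the micro-data.

```python
import math
import sqlite3


class Correl:
    """Pearson correlation coefficient as a SQL aggregate: CORREL(x, y)."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def step(self, x, y):
        if x is None or y is None:
            return  # skip NULLs, as built-in aggregates do
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def finalize(self):
        if self.n < 2:
            return None
        cov = self.sxy - self.sx * self.sy / self.n
        var_x = self.sxx - self.sx ** 2 / self.n
        var_y = self.syy - self.sy ** 2 / self.n
        if var_x <= 0 or var_y <= 0:
            return None  # undefined when one of the variables is constant
        return cov / math.sqrt(var_x * var_y)


conn = sqlite3.connect(":memory:")
conn.create_aggregate("CORREL", 2, Correl)

# Hypothetical toy table standing in for the real micro-data.
conn.execute("CREATE TABLE household (area TEXT, income INTEGER, mortgage INTEGER)")
conn.executemany(
    "INSERT INTO household VALUES (?, ?, ?)",
    [
        ("north", 50000, 1000), ("north", 100000, 2000), ("north", 150000, 3000),
        ("south", 50000, 3000), ("south", 100000, 2000), ("south", 150000, 1000),
    ],
)

rows = conn.execute(
    "SELECT area, CORREL(income, mortgage) FROM household GROUP BY area ORDER BY area"
).fetchall()
print(rows)
```

To follow the answer's wording more closely, pass the burden ratio as the second argument, e.g. `CORREL(income, 100.0 * mortgage / income)`.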
The best way to learn about DBs is to use them for some other purpose.
Once you are done playing with IPUMS, take a look at GEO data with (depending on your database) something like PostGIS. The only difference is that IPUMS gives you resolution in terms of tracts, whereas GIS data has latitude/longitude coordinates. Then you can plot a heat map of mortgage burdens for the U.S. and evolve this heat map over different time scales.
Perhaps you can do something with chemistry. Input the 118 elements, or extract them from an online source. Use basic rules to combine them into molecules, which you can store in the database. Combine molecules into bigger molecules and perform more complex queries on them.
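One possible schema for the chemistry idea uses an `element` table, a `molecule` table, and a junction table recording how many atoms of each element a molecule contains. That shape already exercises joins and aggregation. The sketch below assumes this design, seeds only four elements (standard atomic weights), and computes molecular masses with a join plus `GROUP BY`. The table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE element (
    symbol      TEXT PRIMARY KEY,
    name        TEXT NOT NULL,
    atomic_mass REAL NOT NULL
);
CREATE TABLE molecule (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Junction table: number of atoms of each element in a molecule.
CREATE TABLE composition (
    molecule_id INTEGER NOT NULL REFERENCES molecule(id),
    symbol      TEXT NOT NULL REFERENCES element(symbol),
    atoms       INTEGER NOT NULL,
    PRIMARY KEY (molecule_id, symbol)
);
INSERT INTO element VALUES
    ('H', 'hydrogen', 1.008), ('C', 'carbon', 12.011),
    ('N', 'nitrogen', 14.007), ('O', 'oxygen', 15.999);
INSERT INTO molecule VALUES (1, 'water'), (2, 'carbon dioxide'), (3, 'ethanol');
INSERT INTO composition VALUES
    (1, 'H', 2), (1, 'O', 1),
    (2, 'C', 1), (2, 'O', 2),
    (3, 'C', 2), (3, 'H', 6), (3, 'O', 1);
""")

# Molecular mass = sum over elements of (atom count * atomic mass).
rows = conn.execute("""
    SELECT m.name, ROUND(SUM(c.atoms * e.atomic_mass), 3) AS mass
    FROM molecule m
    JOIN composition c ON c.molecule_id = m.id
    JOIN element e ON e.symbol = c.symbol
    GROUP BY m.id, m.name
    ORDER BY mass
""").fetchall()
for name, mass in rows:
    print(f"{name}: {mass}")
```

Generating molecules "by rules" would then mean inserting new `composition` rows from code, and larger molecules could reference smaller ones through a further self-referencing table.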
You will have a hard time finding interesting database-agnostic tutorials. The main reason is that the SQL-92 standard, on which most examples are based, is plain old boring. There are updated standards, but most database-agnostic tutorials will dumb it down to the lowest common denominator: SQL-92.
If you want to learn about databases as a software engineer, I would definitely recommend starting with Microsoft SQL Server. There are many reasons for that, some are facts, some are opinions. The primary reason though is that it's a lot easier to get a lot further with SQL Server.
As for sample data, Northwind has been replaced by AdventureWorks. You can get the latest versions from CodePlex. It is a much more realistic database and allows demonstrating far more than basic joins, filtering and roll-ups. Another great thing is that it is actually maintained for each release of SQL Server and updated to showcase some of the database's new features.
Now, as for your goal #1, I would treat scaling out as an exercise. After you go through the basic and boring stuff, you should gradually be able to perform efficient large-scale data manipulation. While that is not really generating data, you can at least copy/paste/modify your SQL data to grow it to whatever size you need.
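The "copy/modify your own data" approach can be automated with an `INSERT ... SELECT` that reads from the same table it writes to, which doubles the row count on every pass. Below is a minimal sketch assuming a hypothetical `reading(id, sensor_id, value)` table. SQLite evaluates the `SELECT` before inserting, so each pass copies only the rows that existed when it started.

```python
import sqlite3


def grow_by_doubling(conn, target_rows):
    """Double the hypothetical `reading` table until it holds at least target_rows rows."""
    (n,) = conn.execute("SELECT COUNT(*) FROM reading").fetchone()
    if n == 0:
        raise ValueError("seed the table with at least one row first")
    while n < target_rows:
        # Copy every existing row, nudging the value so copies are not exact duplicates.
        conn.execute(
            "INSERT INTO reading (sensor_id, value) "
            "SELECT sensor_id, value * 1.01 FROM reading"
        )
        n *= 2
    conn.commit()
    return n


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (id INTEGER PRIMARY KEY, sensor_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO reading (sensor_id, value) VALUES (?, ?)",
    [(1, 20.5), (2, 21.0), (3, 19.8)],
)
final_rows = grow_by_doubling(conn, 20)
print(final_rows)
```

Since each pass doubles the table, reaching millions of rows takes only about twenty statements.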
Keep in mind, though, that benchmarking databases is not trivial. The performance and efficiency of a database depend on many aspects of your application: how it is used is just as important as how it is set up.
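A small way to see "how it is set up" matter is to compare a query plan and a rough timing before and after adding an index. The sketch below assumes a hypothetical `reading` table and uses SQLite's `EXPLAIN QUERY PLAN`, whose fourth column holds the human-readable plan text. In recent SQLite versions the plan typically changes from a full `SCAN` to a `SEARCH ... USING INDEX`. Treat the timings only as a relative indication, in line with the caveat above.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (id INTEGER PRIMARY KEY, sensor_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO reading (sensor_id, value) VALUES (?, ?)",
    ((i % 1000, i * 0.5) for i in range(50_000)),
)
conn.commit()

QUERY = "SELECT AVG(value) FROM reading WHERE sensor_id = ?"


def plan(sql, params=()):
    """Return the plan detail text (4th column of EXPLAIN QUERY PLAN rows)."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def timed(sql, params, repeat=20):
    """Wall-clock seconds for `repeat` runs; only meaningful relative to another run."""
    start = time.perf_counter()
    for _ in range(repeat):
        conn.execute(sql, params).fetchone()
    return time.perf_counter() - start


plan_before, time_before = plan(QUERY, (42,)), timed(QUERY, (42,))
conn.execute("CREATE INDEX idx_reading_sensor ON reading(sensor_id)")
plan_after, time_after = plan(QUERY, (42,)), timed(QUERY, (42,))

print(plan_before, f"{time_before:.4f}s")
print(plan_after, f"{time_after:.4f}s")
```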
Good luck and do let us know if you find a viable solution outside this forum.
Implement your genealogical tree within a single table and print it. In itself this is not a very general problem, but the approach certainly is, and it should prove reasonably challenging.
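One common way to hold a tree in a single table is an adjacency list, where each row points to its parent, and a recursive CTE can walk it. The sketch below is a simplification: it assumes a hypothetical `person` table with one `parent_id` per person and made-up names, whereas a real family tree would need two parent columns. It prints the tree depth-first by sorting on a zero-padded path string built during the recursion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES person(id)  -- NULL for the root ancestor
);
INSERT INTO person (id, name, parent_id) VALUES
    (1, 'Ada', NULL), (2, 'Ben', 1), (3, 'Cleo', 1),
    (4, 'Dan', 2), (5, 'Eve', 3), (6, 'Finn', 3);
""")


def tree_lines(conn):
    """Return the tree as indented lines, parents before children (depth-first)."""
    rows = conn.execute("""
        WITH RECURSIVE tree(id, name, depth, path) AS (
            SELECT id, name, 0, printf('%05d', id)
            FROM person WHERE parent_id IS NULL
            UNION ALL
            SELECT p.id, p.name, t.depth + 1, t.path || '/' || printf('%05d', p.id)
            FROM person p JOIN tree t ON p.parent_id = t.id
        )
        SELECT name, depth FROM tree ORDER BY path
    """).fetchall()
    return ["  " * depth + name for name, depth in rows]


for line in tree_lines(conn):
    print(line)
```

The zero padding keeps the text sort of `path` consistent with numeric id order, so siblings appear in id order beneath their parent.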
Geographic data can showcase a lot of SQL capabilities while being somewhat complicated (but not too complicated). It's also readily available from many sources online - international organizations, etc.
You could create a database with countries, cities, zip codes, etc. Mark capitals of countries (remember that some countries have more than one capital city...). Include GIS data if you want to get really fancy. Also, consider how you might model different address information. Now what if the address information had to support international addresses? You can do the same with phone numbers as well. Once you get the hang of things you could even integrate with Google Maps or something similar.
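The "some countries have more than one capital" wrinkle is a nice schema-design exercise. A single `capital_city_id` column on `country` cannot express it, but an `is_capital` flag on `city` can. Below is a minimal sketch of that choice, with illustrative table names and a handful of rows. South Africa's three capitals (Pretoria, Cape Town, Bloemfontein) exercise the multi-capital case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (
    iso2 TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE city (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    country    TEXT NOT NULL REFERENCES country(iso2),
    is_capital INTEGER NOT NULL DEFAULT 0  -- a flag per city allows several capitals
);
INSERT INTO country VALUES ('ZA', 'South Africa'), ('FR', 'France');
INSERT INTO city (name, country, is_capital) VALUES
    ('Pretoria', 'ZA', 1), ('Cape Town', 'ZA', 1), ('Bloemfontein', 'ZA', 1),
    ('Johannesburg', 'ZA', 0), ('Paris', 'FR', 1), ('Lyon', 'FR', 0);
""")

# Every capital, per country.
capitals = conn.execute("""
    SELECT co.name, ci.name
    FROM country co JOIN city ci ON ci.country = co.iso2
    WHERE ci.is_capital = 1
    ORDER BY co.name, ci.name
""").fetchall()

# Countries with more than one capital.
multi = conn.execute("""
    SELECT co.name, COUNT(*)
    FROM country co JOIN city ci ON ci.country = co.iso2
    WHERE ci.is_capital = 1
    GROUP BY co.iso2, co.name
    HAVING COUNT(*) > 1
""").fetchall()
print(capitals)
print(multi)
```

Extending this to international addresses or phone numbers follows the same pattern: decide which facts are per-country rules and which are per-row values.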
You'd likely have to do the database design and import work yourself, but really that's a pretty huge part of working with databases.
Eclipse's Classic Models database is the best open source database equivalent of Factorial and the Fibonacci function. Microsoft's Northwind is another powerful alternative that you can use.