这是一个很好的位置数据库架构吗
我正在开发一个特定于位置的应用程序 - 将其视为商店定位器,商店所有者在其中输入他们的地址信息,而其他用户只能看到一定范围内的附近商店。然而,从某种意义上说,它有点不同,不需要确切的位置,只需要城市/州(我知道很奇怪)。我考虑过存储位置的模式,并决定采用这个模式。
地点
id -- int
formatted_address -- varchar(200)
is_point_of_interest -- bool
name -- varchar(100) -- NULL
street_number -- varchar(10) -- NULL
street -- varchar(40) -- NULL
city -- varchar(40)
state -- varchar(40)
state_code -- varchar(3)
postal_code -- varchar(10)
country -- varchar(40)
country_code -- varchar(3)
latitude -- float(10,6)
longitude -- float(10,6)
last_updated_at -- timestamp
以下是有关该应用程序的一些注意事项:
- 我想对国际地点敞开大门
- 我计划使用地理编码服务来搜索和验证店主指定的地点
- 我确实只需要纬度/经度,但显示商店信息需要其他数据。
- formatted_address 字段将包含完全格式化的地址 - 例如,Giants Stadium, 50 NJ-120, East Rutherford, NJ 07073, USA - 以允许更容易搜索存储的位置
- 可能会有很多重复字段,因为每行可能具有不同的粒度级别 - 例如,
123 Main Street, City, State 12345
与不同>Main Street, City, State 12345
因为一个有指定的街道号码,而另一个则不
知道该模式不是很标准化,但我也认为没有必要再对其进行标准化,因为位置非常复杂,其中这就是为什么我依赖稳定的地理编码服务(谷歌)。另外,我计划允许自由格式的文本输入/搜索,因此不需要任何下拉列表。
考虑到我提到的内容,有人认为有什么问题或有任何改进吗?我可以看到这张桌子变得相当大。
I'm working on an application that is location specific -- think of it as a store locator where store owners enter their address information and other users can only see nearby stores within a certain range. However, it's a little different in the sense that an exact location is not required, only the city/state is required (weird, I know). I have thought about the schema for storing locations, and have decided on this one.
Locations
id -- int
formatted_address -- varchar(200)
is_point_of_interest -- bool
name -- varchar(100) -- NULL
street_number -- varchar(10) -- NULL
street -- varchar(40) -- NULL
city -- varchar(40)
state -- varchar(40)
state_code -- varchar(3)
postal_code -- varchar(10)
country -- varchar(40)
country_code -- varchar(3)
latitude -- float(10,6)
longitude -- float(10,6)
last_updated_at -- timestamp
Here are some notes about the application:
- I want to keep the door open for international locations
- I plan to use a geocoding service to search for and validate the locations specified by the store owner
- I truly only need the lat/lon, but the other data is necessary for displaying store information
- The formatted_address field will contain the fully formatted address -- e.g., Giants Stadium, 50 NJ-120, East Rutherford, NJ 07073, USA -- to allow for easier searching of stored locations
- There will possibly be a lot of duplicate fields, because each row may have a different level of granularity -- for instance,
123 Main Street, City, State 12345
is different fromMain Street, City, State 12345
because one has a specified street number and the other doesn't
I understand that the schema is not very normalized, but I also do not see the need to normalize it any more because locations are very complex, which is why I'm relying on a stable geocode service (google). Also, I plan to allow freeform text input/search, so theres no need for any dropdown lists.
Does anybody see anything wrong or have any improvements, taking into consideration what I've mentioned? I can see this table growing rather large.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我不这么认为。这是我的两分钟概要:
这非常糟糕的标准化。至少应该将
city
->country
移出到另一个表(并从那里标准化)。我相信邮政编码可以跨越城市边界(或者我记错了);我不知道有这样一个跨越国界的城市。formatted_address
是一个“优化”,应该是一个计算字段:也就是说,重新创建它的所有数据都应该存在于其他地方。 (这意味着现在不需要担心。)快乐设计。
简单的“更规范化”形式只是执行上述建议:
当然,CITIES可以进一步规范化,因此可以一个POSTALS表:我知道得不够关于邮政编码或应用程序域。
postal_code
充当隐式复合代理 FK 的一部分,因此它并不像它所在的那样超级可怕。但是,将其移至单独的表中可以轻松实现验证和完整性约束。编辑:规范化 POSTALs 表是最好的,因为对于给定城市只有很少数量的邮政编码有效:不过,我不确定邮政编码和城市之间的关系,所以我无法推荐如何做到这一点。也许看看现有的使用模式?
I do not think so. Here is my two-minute synopsis:
This very badly normalized. At least
city
->country
should be moved out to a different table (and normalized from there). I believe postal codes can cross city boundaries though (or I am very badly misremembering); I am not aware of such a city that crosses a state boundary.formatted_address
is an "optimization" and should likely be a computed field: that is, all the data to re-create it should exist elsewhere. (This means that it doesn't need to worried about now.)Happy designing.
The simple "more-normalized" form just doing the above proposed:
Of course, CITIES can be further normalized, and so could a POSTALS table: I don't know enough about postal codes, or the application domain though.
postal_code
acts as part of an implicit compound-surrogate-FK so it's not super terrible as it is there. However, moving it into a separate table could easily allow verification and integrity constraints.Edit: Normalizing a POSTALs table would be best, as only a very samll number of postal codes are valid for a given city: I am not sure the relation between a postal code and a city, though, so I can't recommend how to do this. Perhaps look at existing schemas used?
既然你说“我真的只需要纬度/经度”,我鼓励你使用 2 个具有 1:1 关系的表。
在许多(大多数?)情况下,将会缓存更多的纬度/经度对,从而加快您的工作速度。如果您需要其他信息,请在需要时获取。
简短形式:不要强迫数据库通过 IO 和 RAM 移动不需要的数据
此外,这样的模式将为进一步的自然扩展敞开大门:可以通过添加其他表而不是更改现有表来链接其他信息。我认为这对于你的软件质量来说是一件好事。
Since you say "I truly only need the lat/lon" I encourage you to use 2 tables with a 1:1 relationship.
In many (most?) cases a LOT more of lat/lon pairs will be cached, speeding up your workhorse. If you need the additional info, get it when you need it.
Short form: Dont force the DB to move data you don't need through IO and RAM
Additionally, such a schema would keep your doors open for further natural expansion: Linking other info can be done by adding other tables rather than altering existing ones. I consider this a good thing for your SW quality.