MySQL子查询中的IP地址数字

发布于 2024-09-06 12:34:14 字数 2071 浏览 6 评论 0原文

我在涉及存储在 MySQL (MySQL 5.0) 中的 IPV4 地址的子查询时遇到问题。

IP 地址存储在两个表中,均采用网络号格式,例如 MySQL 的 INET_ATON() 输出的格式。第一个表(“events”)包含许多行以及与其关联的 IP 地址,第二个表(“network_providers”)包含给定网络块的提供商信息列表。

events 表(约 4,000,000 行):

event_id (int)
event_name (varchar)
ip_address (unsigned int)

network_providers 表(约 60,000 行):

ip_start (unsigned int)
ip_end  (unsigned int)
provider_name (varchar)

针对我遇到的问题进行了简化,目标是创建导出 也就是说

event_id,event_name,ip_address,provider_name

如果按照以下任一方式进行查询,我会得到我期望的结果:

SELECT provider_name FROM network_providers WHERE INET_ATON('192.168.0.1') >= network_providers.ip_start ORDER BY network_providers.ip_start DESC LIMIT 1

SELECT provider_name FROM network_providers WHERE 3232235521 >= network_providers.ip_start ORDER BY network_providers.ip_start DESC LIMIT 1

,它会为我查找的任何 IP 返回正确的 provider_name (当然我在查询中并没有真正使用 192.168.0.1)。

但是,当以以下方式执行相同的查询作为子查询时,它不会产生我期望的结果:

SELECT 
events.event_id,
events.event_name,
    (SELECT provider_name FROM network_providers 
    WHERE events.ip_address >= network_providers.ip_start 
    ORDER BY network_providers.ip_start DESC LIMIT 1) as provider
FROM events

而是返回 provider 的不同(不正确)值。 provider 列中返回的值超过 90%(但奇怪的是不是全部)包含该 IP 的错误提供商信息。

在子查询中使用 events.ip_address 只是为了回显该值,确认它包含我期望的值并且子查询可以解析它。将 events.ip_address 替换为实际的网络号也可以,只是以这种方式在子查询中动态使用它,这对我来说不起作用。

我怀疑问题在于 MySQL 中的子查询有一些基本且重要的东西我不明白。我以前在 MySQL 中使用过类似的 IP 地址,但之前没有使用子查询来查找它们。

问题:

我真的很感激一个关于如何获得我想要的输出的例子,如果这里有人知道,一些关于为什么我正在做的事情不起作用的启发,这样我就可以避免再犯这个错误。

注释:

我尝试做的实际实际用法要复杂得多(涉及连接两个或三个表)。这是一个简化版本,以避免使问题过于复杂。

另外,我知道我没有在 ip_start 和 ip_start 之间使用ip_end - 这是故意的(数据库可能会过时,在这种情况下,数据库中的所有者几乎总是在下一个指定的范围内,并且“最佳猜测”在这种情况下很好),但是我很感激任何改进建议与问题相关的。

效率总是好的,但在这种情况下绝对不是必要的 - 任何帮助表示赞赏。

I have a problem with a subquery involving IPV4 addresses stored in MySQL (MySQL 5.0).

The IP addresses are stored in two tables, both in network number format - e.g. the format output by MySQL's INET_ATON(). The first table ('events') contains lots of rows with IP addresses associated with them, the second table ('network_providers') contains a list of provider information for given netblocks.

events table (~4,000,000 rows):

event_id (int)
event_name (varchar)
ip_address (unsigned int)

network_providers table (~60,000 rows):

ip_start (unsigned int)
ip_end  (unsigned int)
provider_name (varchar)

Simplified for the purposes of the problem I'm having, the goal is to create an export along the lines of:

event_id,event_name,ip_address,provider_name

If do a query along the lines of either of the following, I get the result I expect:

SELECT provider_name FROM network_providers WHERE INET_ATON('192.168.0.1') >= network_providers.ip_start ORDER BY network_providers.ip_start DESC LIMIT 1

SELECT provider_name FROM network_providers WHERE 3232235521 >= network_providers.ip_start ORDER BY network_providers.ip_start DESC LIMIT 1

That is to say, it returns the correct provider_name for whatever IP I look up (of course I'm not really using 192.168.0.1 in my queries).

However, when performing this same query as a subquery, in the following manner, it doesn't yield the result I would expect:

SELECT 
events.event_id,
events.event_name,
    (SELECT provider_name FROM network_providers 
    WHERE events.ip_address >= network_providers.ip_start 
    ORDER BY network_providers.ip_start DESC LIMIT 1) as provider
FROM events

Instead the a different (incorrect) value for provider is returned. Over 90% (but curiously not all) values returned in the provider column contain the wrong provider information for that IP.

Using events.ip_address in a subquery just to echo out the value confirms it contains the value I'd expect and that the subquery can parse it. Replacing events.ip_address with an actual network number also works, just using it dynamically in the subquery in this manner that doesn't work for me.

I suspect the problem is there is something fundamental and important about subqueries in MySQL that I don't get. I've worked with IP addresses like this in MySQL quite a bit before, but haven't previously done lookups for them using a subquery.

The question:

I'd really appreciate an example of how I could get the output I want, and if someone here knows, some enlightenment as to why what I'm doing doesn't work so I can avoid making this mistake again.

Notes:

The actual real-world usage I'm trying to do is considerably more complicated (involving joining two or three tables). This is a simplified version, to avoid overly complicating the question.

Additionally, I know I'm not using a between on ip_start & ip_end - that's intentional (the DB's can be out of date, and such cases the owner in the DB is almost always in the next specified range and 'best guess' is fine in this context) however I'm grateful for any suggestions for improvement that relate to the question.

Efficiency is always nice, but in this case absolutely not essential - any help appreciated.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

顾忌 2024-09-13 12:34:14

您应该看看这篇文章:

http://jcole.us/blog/archives/2007/11/24/on-efficiently-geo-referencing-ips-with-maxmind-geoip-and-mysql- gis/

它有一些在与您的查询非常相似的查询中使用 IP 的好主意。

您应该尝试的另一件事是使用存储函数而不是子查询。这将简化您的查询,如下所示:

SELECT 
event.id,
event.event_name,
GET_PROVIDER_NAME(event.ip_address) as provider
FROM events

You should take a look at this post:

http://jcole.us/blog/archives/2007/11/24/on-efficiently-geo-referencing-ips-with-maxmind-geoip-and-mysql-gis/

It has some nice ideas for working with IPs in queries very similar to yours.

Another thing you should try is using a stored function instead of a sub-query. That would simplify your query as follows:

SELECT 
event.id,
event.event_name,
GET_PROVIDER_NAME(event.ip_address) as provider
FROM events
喜爱纠缠 2024-09-13 12:34:14

似乎没有办法通过 JOIN 或子查询来实现我想要的。

为了扩展 Ike Walker 使用存储函数的建议,我最终在 MySQL 中创建了一个存储函数,如下所示:

DELIMITER //
DROP FUNCTION IF EXISTS get_network_provider //
CREATE FUNCTION get_network_provider(ip_address_number INT) RETURNS VARCHAR(255)
BEGIN
DECLARE network_provider VARCHAR(255);
    SELECT provider_name INTO network_provider FROM network_providers
    WHERE ip_address_number >= network_providers.ip_start
    AND network_providers.provider_name != ""
    ORDER BY provider_name.ip_start DESC LIMIT 1;
RETURN network_provider;
END //

说明:

检查忽略空白名称,并使用 >= & ORDER BY for ip_start 而不是 BETWEEN ip_start 和 ip_end 是针对我正在使用的两个组合网络提供商数据库的特定捏造,这两个数据库都需要以这种方式进行查询。

当调用函数的查询只需要返回几百个结果(尽管可能需要几秒钟)时,这种方法效果很好。对于返回几千个结果的查询,可能需要 2 或 3 分钟。对于具有数万个结果(或更多)的查询,它太慢而无法实际使用。

使用这样的存储函数(即返回的每个结果都会触发一个单独的查询)并不出乎意料,但我确实比预期更快地遇到了性能下降。

建议:

这样做的结果是我需要接受数据结构不适合我的需求。一位朋友已经向我指出了这一点,这并不是我当时真正想听到的(因为我真的想使用特定的 network_provider DB,因为表中的其他键对我有用,例如地理定位之类的事情)。

如果您最终尝试使用任何遵循类似可疑数据格式的 IP 提供商数据库(或者实际上任何其他数据库),那么我只能建议它们不适合,并且不值得尝试将某些东西拼凑在一起这将与他们一起工作。

至少您需要重新格式化数据,以便可以通过简单的 BETWEEN 语句可靠地使用它们(没有排序,也没有其他比较),这样您就可以将其与子查询(或 JOINS)一起使用 - 尽管它可能表明任何无论如何,混乱的数据可能并不那么可靠。

There seems to be no way to achieve what I wanted with a JOIN or Subquery.

To expand on Ike Walker's suggestion of using a stored function, I ended up creating a stored function in MySQL with the following:

DELIMITER //
DROP FUNCTION IF EXISTS get_network_provider //
CREATE FUNCTION get_network_provider(ip_address_number INT) RETURNS VARCHAR(255)
BEGIN
DECLARE network_provider VARCHAR(255);
    SELECT provider_name INTO network_provider FROM network_providers
    WHERE ip_address_number >= network_providers.ip_start
    AND network_providers.provider_name != ""
    ORDER BY provider_name.ip_start DESC LIMIT 1;
RETURN network_provider;
END //

Explanation:

The check to ignore blank names, and using >= & ORDER BY for ip_start rather than BETWEEN ip_start and ip_end is a specific fudge for the two combined network provider databases I'm using, both of which need to be queried in this way.

This approach works well when the query calling the function only needs to return a few hundred results (though it may take a handful of seconds). On queries that return a few thousand results, it may take 2 or 3 minutes. For queries with tens of thousands of results (or more) it's too slow to be practical use.

This was not unexpected from using a stored function like this (i.e. every result returned triggering a separate query) but I did hit a drop in performance sooner than I had expected.

Recommendation:

The upshot of this was that I needed to accept that the data structure is just not suitable for my needs. This had been already pointed out to me by a friend, it just wasn't something I really wanted to hear at the time (because I really wanted to use that specific network_provider DB due to other keys in the table that were useful to me, e.g. for things like geolocation).

If you end up trying to use any of the IP provider DB's (or indeed any other database) that follow a similar dubious data format, then I can only suggest they are just not going to be suitable and it's not worth trying to cobble something together that will work with them as they are.

At the very least you need to reformat the data so that they can be reliably used with a simple BETWEEN statement (no sorting, and no other comparisons) so you can use it with subqueries (or JOINS) - although it's likely an indicator that any data that messed up is probably not all that reliable anyway.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文