Is it possible to search dates as strings in a database-agnostic way?

Posted on 2024-08-19 19:50:41

I have a Ruby on Rails application with a PostgreSQL database; several tables have created_at and updated_at timestamp attributes. When displayed, those dates are formatted in the user's locale; for example, the timestamp 2009-10-15 16:30:00.435 becomes the string 15.10.2009 - 16:30 (the date format for this example being dd.mm.yyyy - hh:mm).

The requirement is that the user must be able to search for records by date, as if they were strings formatted in the current locale. For example, searching for 15.10.2009 would return records with dates on October 15th 2009, searching for 15.10 would return records with dates on October 15th of any year, searching for 15 would return all dates that match 15 (be it day, month or year). Since the user can use any part of a date as a search term, it cannot be converted to a date/timestamp for comparison.

One (slow) way would be to retrieve all records, format the dates, and perform the search on that. This could be sped up by retrieving only the id and dates at first, performing the search, and then fetching the data for the matching records; but it could still be slow for large numbers of rows.
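
For reference, the application-side approach described above could look roughly like the sketch below. It is written in Python purely for illustration (the application in question is Rails); the `DISPLAY_FORMAT` constant, the `matching_ids` helper and the sample rows are hypothetical stand-ins for the result of a query that selects only the id and the timestamp.

```python
from datetime import datetime

# Display format taken from the example in the question: dd.mm.yyyy - hh:mm
DISPLAY_FORMAT = "%d.%m.%Y - %H:%M"


def matching_ids(rows, term):
    """Return the ids whose formatted timestamp contains the search term.

    rows is an iterable of (id, datetime) pairs, i.e. only the id and the
    date column have been fetched, not the full records.
    """
    return [row_id for row_id, ts in rows if term in ts.strftime(DISPLAY_FORMAT)]


# Hypothetical sample data
rows = [
    (1, datetime(2009, 10, 15, 16, 30, 0, 435000)),
    (2, datetime(2010, 10, 15, 9, 5)),
    (3, datetime(2015, 3, 2, 12, 0)),
]

print(matching_ids(rows, "15.10.2009"))  # [1]
print(matching_ids(rows, "15.10"))       # [1, 2]
print(matching_ids(rows, "15"))          # [1, 2, 3]
```

The last call also returns row 3 because "15" occurs inside "2015". Every search still has to touch every row, which is why this gets slow on large tables.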

Another (not database-agnostic) way would be to cast/format the dates to the right format in the database with PostgreSQL functions or operators, and have the database do the matching (with the PostgreSQL regexp operators or whatnot).

Is there a way to do this efficiently (without fetching all rows) in a database-agnostic way? Or do you think I am going in the wrong direction and should approach the problem differently?

Comments (4)

夜灵血窟げ 2024-08-26 19:50:41

Building on the answer from Carlos, this should allow all of your searches without full table scans if you have indexes on all the date and date part fields. Function-based indexes would be better for the date part columns, but I'm not using them since this should not be database-specific.

CREATE TABLE mytable (
    col1 varchar(10),
    -- ...
    inserted_at timestamp,
    updated_at timestamp);

INSERT INTO mytable
VALUES
    ('a', '2010-01-02', NULL),
    ('b', '2009-01-02', '2010-01-03'),
    ('c', '2009-11-12', NULL),
    ('d', '2008-03-31', '2009-04-18');

ALTER TABLE mytable
    ADD inserted_at_month integer,
    ADD inserted_at_day integer,
    ADD updated_at_month integer,
    ADD updated_at_day integer;

-- you will have to find your own way to maintain these values...
-- EXTRACT is the SQL-standard spelling; date_part is PostgreSQL-specific
UPDATE mytable
SET
    inserted_at_month = EXTRACT(MONTH FROM inserted_at),
    inserted_at_day = EXTRACT(DAY FROM inserted_at),
    updated_at_month = EXTRACT(MONTH FROM updated_at),
    updated_at_day = EXTRACT(DAY FROM updated_at);

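If the user enters only the year, use a half-open range: WHERE Date >= 'YYYY-01-01' AND Date < the first day of the following year. (On a timestamp column, BETWEEN 'YYYY-01-01' AND 'YYYY-12-31' would miss everything after midnight on December 31.)

SELECT *
FROM mytable
WHERE
    (inserted_at >= '2010-01-01' AND inserted_at < '2011-01-01')
    OR (updated_at >= '2010-01-01' AND updated_at < '2011-01-01');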

If the user enters the year and month, use WHERE Date >= 'YYYY-MM-01' AND Date < the first day of the following month (this also avoids any adjustment for 30/29/28-day months)

SELECT *
FROM mytable
WHERE
    (inserted_at >= '2010-01-01' AND inserted_at < '2010-02-01')
    OR (updated_at >= '2010-01-01' AND updated_at < '2010-02-01');

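If the user enters all three values, match the whole day: WHERE Date >= 'YYYY-MM-DD' AND Date < the following day (on a timestamp column, Date = 'YYYY-MM-DD' would match only midnight)

SELECT *
FROM mytable
WHERE
    (inserted_at >= '2009-11-12' AND inserted_at < '2009-11-13')
    OR (updated_at >= '2009-11-12' AND updated_at < '2009-11-13');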

If the user enters the month and day (both parts must match on the same column)

SELECT *
FROM mytable
WHERE
    (inserted_at_month = 3 AND inserted_at_day = 31)
    OR (updated_at_month = 3 AND updated_at_day = 31);

If the user enters Month or Day (you could optimize to not check values > 12 as a month)

SELECT *
FROM mytable
WHERE
    inserted_at_month = 12
    OR inserted_at_day = 12
    OR updated_at_month = 12
    OR updated_at_day = 12;
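
As a rough, runnable illustration of the date-part columns idea, here is a sketch in Python using the standard-library sqlite3 module as a stand-in database. The composite index name, the `insert_row` helper and the choice to fill the part columns in application code are assumptions made for this example; only one timestamp column is shown to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (
        col1 TEXT,
        inserted_at TEXT,           -- ISO 8601 text; SQLite has no timestamp type
        inserted_at_month INTEGER,
        inserted_at_day INTEGER
    );
    CREATE INDEX ix_inserted_at ON mytable (inserted_at);
    CREATE INDEX ix_inserted_month_day
        ON mytable (inserted_at_month, inserted_at_day);
""")


def insert_row(col1, inserted_at):
    # One way to maintain the part columns: fill them on every write.
    _, month, day = (int(p) for p in inserted_at[:10].split("-"))
    conn.execute(
        "INSERT INTO mytable VALUES (?, ?, ?, ?)",
        (col1, inserted_at, month, day),
    )


for col1, ts in [
    ("a", "2010-01-02 08:00:00"),
    ("b", "2009-01-02 23:15:00"),
    ("c", "2009-11-12 10:30:00"),
    ("d", "2008-03-31 17:45:00"),
]:
    insert_row(col1, ts)


def by_month_and_day(month, day):
    cur = conn.execute(
        "SELECT col1 FROM mytable"
        " WHERE inserted_at_month = ? AND inserted_at_day = ?"
        " ORDER BY col1",
        (month, day),
    )
    return [row[0] for row in cur]


def by_year(year):
    # Half-open range: start inclusive, end exclusive.
    cur = conn.execute(
        "SELECT col1 FROM mytable"
        " WHERE inserted_at >= ? AND inserted_at < ?"
        " ORDER BY col1",
        (f"{year:04d}-01-01", f"{year + 1:04d}-01-01"),
    )
    return [row[0] for row in cur]


print(by_month_and_day(1, 2))  # ['a', 'b']
print(by_year(2009))           # ['b', 'c']
```

Both queries use only plain comparisons on indexed columns, so nothing in the WHERE clauses depends on a particular database's date functions.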

执笏见 2024-08-26 19:50:41

"Database-agnostic way" is usually a synonym for "slow way", so such solutions are unlikely to be efficient.

Parsing all records on the client side would be the least efficient solution in any case.

You can process your locale string on the client side and form a correct condition for a LIKE, RLIKE or REGEXP_SUBSTR operator. The client side of course should be aware of the database the system uses.

Then you should apply the operator to a string formed according to the locale with a database-specific formatting function, like this (in Oracle):

SELECT  *
FROM    mytable
WHERE   TO_CHAR(mydate, 'dd.mm.yyyy - hh24.mi') LIKE '%15.10%'

A more efficient way (which works only in PostgreSQL, though) would be to create a GIN index on the individual date parts:

CREATE INDEX ix_dates_parts
ON      dates
USING   GIN
        (
        (ARRAY
        [
        DATE_PART('year', date)::INTEGER,
        DATE_PART('month', date)::INTEGER,
        DATE_PART('day', date)::INTEGER,
        DATE_PART('hour', date)::INTEGER,
        DATE_PART('minute', date)::INTEGER,
        DATE_PART('second', date)::INTEGER
        ]
        )
        )

and use it in a query:

SELECT  *
FROM    dates
WHERE   ARRAY[11, 19, 2010] <@ (ARRAY
        [
        DATE_PART('year', date)::INTEGER,
        DATE_PART('month', date)::INTEGER,
        DATE_PART('day', date)::INTEGER,
        DATE_PART('hour', date)::INTEGER,
        DATE_PART('minute', date)::INTEGER,
        DATE_PART('second', date)::INTEGER
        ]
        )
LIMIT 10

This will select records that have all three numbers (11, 19 and 2010) in any of the date parts: for example, all records from November 19, 2010, plus all records from 19:11 on any day in 2010, etc.
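
To make the matching rule concrete, the following Python sketch mimics the array containment test (`<@`) with set containment. It is only an illustration of the semantics, not of how the GIN index works internally; `date_parts`, `contains_all` and the sample timestamps are hypothetical.

```python
from datetime import datetime


def date_parts(ts):
    """The same six parts that the indexed array expression contains."""
    return {ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second}


def contains_all(ts, numbers):
    """True if every searched number appears in some part of the timestamp."""
    return set(numbers) <= date_parts(ts)


samples = [
    datetime(2010, 11, 19, 8, 0, 0),   # November 19, 2010
    datetime(2010, 3, 5, 19, 11, 0),   # 19:11 on another day in 2010
    datetime(2009, 11, 19, 8, 0, 0),   # right day and month, wrong year
]

matches = [contains_all(ts, [11, 19, 2010]) for ts in samples]
print(matches)  # [True, True, False]
```

The second sample matches even though it is not November 19, which is the over-matching described above: the test does not know which part each number came from.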

醉态萌生 2024-08-26 19:50:41

Whatever the user enters, you should extract three values: Year, Month and Day, using their locale as a guide. Some values may be empty.

  • If the user enters only Year, use WHERE Date BETWEEN 'YYYY-01-01' AND 'YYYY-12-31'
  • If the user enters Year and Month, use WHERE Date BETWEEN 'YYYY-MM-01' AND 'YYYY-MM-31' (may need adjustment for 30/29/28)
  • If the user enters all three values, use SELECT .... WHERE Date = 'YYYY-MM-DD'
  • If the user enters Month and Day, you'll have to use the 'slow' way
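
One possible way to compute the boundaries for the first three cases is sketched below in Python. The `date_range` helper is hypothetical, and it returns a half-open range (start inclusive, end exclusive) rather than the BETWEEN bounds listed above; that is a substitution made here because it avoids the 30/29/28 adjustment and also covers timestamp columns.

```python
from datetime import date, timedelta


def date_range(year, month=None, day=None):
    """Return (start, end) with start inclusive and end exclusive."""
    if month is None:
        # Year only: the whole calendar year.
        return date(year, 1, 1), date(year + 1, 1, 1)
    if day is None:
        # Year and month: up to the first day of the following month.
        start = date(year, month, 1)
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        return start, end
    # Full date: that single day.
    start = date(year, month, day)
    return start, start + timedelta(days=1)


print(date_range(2010))          # (datetime.date(2010, 1, 1), datetime.date(2011, 1, 1))
print(date_range(2010, 2))       # (datetime.date(2010, 2, 1), datetime.date(2010, 3, 1))
print(date_range(2009, 11, 12))  # (datetime.date(2009, 11, 12), datetime.date(2009, 11, 13))
```

The resulting pair would be used as WHERE Date >= start AND Date < end, with both values passed as query parameters.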

将军与妓 2024-08-26 19:50:41

IMHO, the short answer is No. But definitely avoid loading all rows.

A few notes:

  • If you had only simple queries for exact dates or ranges, I would recommend using the ISO format for DATE (YYYY-MM-DD, e.g. 2010-02-01) or DATETIME. But since you seem to need queries like "October 15th of any year", you need custom queries anyway.
  • I suggest you create a "parser" that takes your date query and gives you the corresponding part of the SQL WHERE clause. I am certain that you will end up with fewer than a dozen cases, so you can have an optimal WHERE for each of them. This way you will avoid loading all records.
    • You definitely do not want to do anything locale-specific in the SQL. Therefore, convert the locale-specific input to some standard form in the non-SQL code, then use that to perform your query (basically, separate localization/globalization from query execution).
    • Then you can optimize. If you see that you have a lot of queries just for the year, you might create a COMPUTED COLUMN that contains only the YEAR and put an index on it.
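
A minimal sketch of such a parser, in Python for illustration only, assuming the dd.mm.yyyy input format from the question and hypothetical columns `created_at`, `created_at_day`, `created_at_month` and `created_at_year` (the last three being precomputed part columns). It handles only whole numbers separated by dots; partial matches such as "15" inside "2015", and the time part, are deliberately left out.

```python
from datetime import date, timedelta


def where_for(term):
    """Translate a dd.mm.yyyy-style search term into (sql_fragment, params)."""
    parts = [int(p) for p in term.strip(".").split(".")]

    if len(parts) == 3:
        # Full date: one calendar day as a half-open range.
        day, month, year = parts
        start = date(year, month, day)
        end = start + timedelta(days=1)
        return (
            "created_at >= ? AND created_at < ?",
            [start.isoformat(), end.isoformat()],
        )

    if len(parts) == 2:
        # Day and month of any year: needs the precomputed part columns.
        day, month = parts
        return ("created_at_day = ? AND created_at_month = ?", [day, month])

    if len(parts) == 1:
        # A single number may be a day, a month or a year.
        n = parts[0]
        return (
            "created_at_day = ? OR created_at_month = ? OR created_at_year = ?",
            [n, n, n],
        )

    raise ValueError(f"unsupported search term: {term!r}")


print(where_for("15.10.2009"))
print(where_for("15.10"))
print(where_for("15"))
```

The locale handling stays entirely in this function, and the SQL it produces uses only comparisons and placeholders, so the same fragments should carry over to most databases.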