如何准确检测SQL Server作业是否正在运行并处理已经运行的作业？

发布于 2024-11-05 03:09:59 字数 1870 浏览 1 评论 0原文

我目前正在使用这样的代码来检测 SQL Server 作业是否正在运行。（这是 SQL Server 2005，所有 SP）

return (select isnull(  
(select top 1 CASE 
    WHEN current_execution_status = 4 THEN 0
    ELSE 1
    END
from openquery(devtestvm, 'EXEC msdb.dbo.sp_help_job')
where current_execution_status = 4 and
    name = 'WQCheckQueueJob' + cast(@Index as varchar(10))
), 1)
)

没有问题，一般来说，它工作得很好。

但是....（总是但是）

有时，我会调用它，返回“作业未运行”结果，此时我将尝试启动作业，通过

exec msdb.dbo.sp_start_job @JobName

SQL 将返回“SQLAgent”已拒绝开始该作业，因为它已有待处理的请求”。

好的。也不是问题。可以想象，在该代码启动目标作业之前，但在检查它是否已启动之后，有一个轻微的窗口可以启动目标作业。但是，我可以将其包装在 try catch 中并忽略该错误，对吧？

begin try
if dbo.WQIsQueueJobActive(@index) = 0 begin
    exec msdb.dbo.sp_start_job @JobName
    break
end         
end try begin catch
    -- nothing here
end catch

但这就是问题所在。

十分之九的情况下，这个效果很好。 SQL 代理将引发错误，它被捕获，并且处理将继续，因为作业已经在运行，没有坏处，也没有犯规。

但偶尔，我会在“作业历史记录”视图中收到一条消息（请记住上面的代码，以检测特定作业是否正在运行，如果没有实际从另一个作业运行，则启动它），说该作业失败，因为“SQLAgent 已拒绝开始该作业，因为它已经有一个待处理的请求”。

当然，这正是 TRY CATCH 应该处理的错误！

当这种情况发生时，正在执行的作业就会终止，但据我所知，不会立即终止，只是非常接近。我把日志记录到处都是，但没有一致性。一次失败，它会在位置 a，下一次会在位置 b。什么也没有

select @var = 'message'

在某些情况下，地点 A 和地点 B 之间除了一个地方之外。很奇怪。基本上，该作业似乎被毫不客气地转储，作业中剩下的任何要执行的内容都根本不被执行。

但是，如果我删除“exec StartJob”（或者当我知道目标作业尚未运行时仅调用它一次），则一切都会正常运行，并且作业中的所有处理都会运行。

所有这一切背后的目的是通过触发器（除其他外）启动一项作业，并且，如果该作业已经启动，则实际上不需要“再次启动它”。

有人在 SQL 代理的作业处理中遇到过这样的行为吗？

编辑：当前的控制流程如下：

更改表（更新或插入）...
触发触发器，该触发器调用...
一个存储过程，该过程调用...
sp_Start_Job，该过程...
启动一个特定作业，该作业...
调用另一个作业存储过程（称为 CheckQueue），它...
执行一些处理并...
检查几个表，并且根据它们的内容可能...
调用另一个作业上的 sp_start_job 来启动第二个同步作业处理额外的工作（第二个作业也调用 CheckQueue 存储过程）但是这两个调用对完全独立的数据集进行操作）

原文

I'm currently using code like this to detect if a SQL server job is running. (this is SQL Server 2005, all SP's)

return (select isnull(  
(select top 1 CASE 
    WHEN current_execution_status = 4 THEN 0
    ELSE 1
    END
from openquery(devtestvm, 'EXEC msdb.dbo.sp_help_job')
where current_execution_status = 4 and
    name = 'WQCheckQueueJob' + cast(@Index as varchar(10))
), 1)
)

No problems there, and generally speaking, it works just fine.

But.... (always a but)

On occasion, I'll invoke this, get back a "job is not running" result, at which point I'll try and start the job, via

exec msdb.dbo.sp_start_job @JobName

and SQL will return that "SQLAgent has refused to start the job because it already has a pending request".

Ok. Also not a problem. It's conceivable that there's a slight window where the target job could get started before this code can start it, but after checking if it's started. However, I can just wrap that up in a try catch and just ignore the error, right?

begin try
if dbo.WQIsQueueJobActive(@index) = 0 begin
    exec msdb.dbo.sp_start_job @JobName
    break
end         
end try begin catch
    -- nothing here
end catch

here's the problem, though.

9 times out of 10, this works just fine. SQL agent will raise the error, it's caught, and processing just continues on, since the job is already running, no harm no foul.

But occasionally, I'll get a message in the Job History view (keep in mind the above code to detect if a specific job is running and start it if not is actually running from another job) saying that the job failed because "SQLAgent has refused to start the job because it already has a pending request".

Of course, this is the exact error that TRY CATCH is supposed to be handling!

When this happens, the executing job just dies, but not immediately from what I can tell, just pretty close. I've put logging all over the place and there's no consistency. One time it fails, it'll be at place a, the next time at place b. In some cases, Place A and place B have nothing but a

select @var = 'message'

in between them. Very strange. Basically, the job appears to be unceremoniously dumped and anything left to execute in the job is +not+ executed at all.

However, if I remove the "exec StartJob" (or have it invoked exactly one time, when I KNOW that the target job can't already be running), everything works perfectly and all my processing in the job runs through.

The purpose behind all this is to have a job started as a result of a trigger (among other things), and, if the job is already started, there's really no need to "start it again".

Anyone ever run into behavior like this with SQL Agent's Job handling?

EDIT:
Current flow of control is like so:

Change to a table (update or insert)...
fires trigger which calls...
a stored proc which calls...
sp_Start_Job which...
starts a specific job which...
calls another stored proc (called CheckQueue) which...
performs some processing and...
checks several tables and depending on their contents might...
invoke sp_start_job on another job to start up a second, simultaneous job
to process the additional work (this second job calls the CheckQueue sproc also
but the two invocations operate on completely separate sets of data)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

随心而道 2024-11-12 03:09:59

首先，您有机会了解服务经纪人吗？从你的描述来看，这似乎就是你真正想要的。

区别在于，您不是启动作业，而是将数据放入 SB 队列，SB 会异步调用您的处理过程，并完全回避已运行作业等的问题。它将自动生成/终止其他线程并根据需求指示，它负责顺序等。

这是一个很好的（并且模糊相关的）教程。 http://www.sqlteam.com/article/centralized-asynchronous -auditing-with-service-broker

让我们假设您出于任何原因都不能使用 SB（但说真的，可以！）。

使用作业 spid 的 context_info 怎么样？

您的工作调用一个包装程序来单独执行每个步骤。

包装过程中的第一条语句是

声明 @context_info VARBINARY(30)
SET @context_info = CAST('MyJob1' AS VARBINARY)
设置 CONTEXT_INFO @context_info

当您的过程完成时（或在您的 catch 块中）
<前><代码>设置 CONTEXT_INFO 0x0

当您正在考虑调用您的作业时，请执行以下操作：

如果不存在（SELECT * FROM master..sysprocesses with (NOLOCK) WHERE context_info=CAST('MyJob1' AS VARBINARY)）
    EXEC 启动作业

当您的包装程序终止或连接关闭时，您的 context_info 将消失。

您还可以使用全局临时表（即 ##JobStatus），当引用它的所有 spid 断开连接或显式删除它时，它们将消失。

只是一些想法。

First of all, have you had a chance to look at service broker? From your description, it sounds like that's what you actually want.

The difference would be instead of starting a job, you put your data into a SB queue and SB will call your processing proc asynchronously and completely side-step issues with already-running jobs etc. It will auto spawn/terminate additional threads and demand dictates, it takes care of order etc.

Here's a good (and vaguely related) tutorial. http://www.sqlteam.com/article/centralized-asynchronous-auditing-with-service-broker

Let's assume that you can't use SB for whatever reason (but seriously, do!).

What about using the job spid's context_info.

Your job calls a wrapper proc that execs each step individually.

The first statement inside the wrapper proc is

DECLARE @context_info VARBINARY(30)
SET @context_info = CAST('MyJob1' AS VARBINARY)
SET CONTEXT_INFO @context_info

When your proc finishes (or in your catch block)
```
SET CONTEXT_INFO 0x0
```

When you are looking at calling your job, do this:

IF NOT EXISTS (SELECT * FROM master..sysprocesses WITH (NOLOCK) WHERE context_info=CAST('MyJob1' AS VARBINARY))
    EXEC StartJob

When your wrapper proc terminates or the connection is closed, your context_info goes away.

You could also use a global temp table (i.e. ##JobStatus) They will disappear when all spids that reference it disconnect or if it's explicitly dropped.

Just a few thoughts.

回复收藏 0 原文

憧憬巴黎街头的黎明 2024-11-12 03:09:59

我有一个查询可以为我提供正在运行的作业，也许它可以帮助您。它一直对我有用，但如果您发现任何错误，请告诉我，我会尽力纠正。干杯。

-- get the running jobs
--marcelo miorelli
-- 10-dec-2013


SELECT sj.name
      ,DATEDIFF(SECOND,aj.start_execution_date,GetDate()) AS Seconds
 FROM msdb..sysjobactivity aj
 JOIN msdb..sysjobs sj on sj.job_id = aj.job_id
WHERE aj.stop_execution_date IS NULL -- job hasn't stopped running
 AND aj.start_execution_date IS NOT NULL -- job is currently running
--AND sj.name = 'JobName'
and not exists( -- make sure this is the most recent run
    select 1
    from msdb..sysjobactivity new
    where new.job_id = aj.job_id
      and new.start_execution_date > aj.start_execution_date )

I have a query that gives me the running jobs, maybe it can help you. It has been working for me, but if you find any fault on it, let me know, I will try to rectify. cheers.

-- get the running jobs
--marcelo miorelli
-- 10-dec-2013


SELECT sj.name
      ,DATEDIFF(SECOND,aj.start_execution_date,GetDate()) AS Seconds
 FROM msdb..sysjobactivity aj
 JOIN msdb..sysjobs sj on sj.job_id = aj.job_id
WHERE aj.stop_execution_date IS NULL -- job hasn't stopped running
 AND aj.start_execution_date IS NOT NULL -- job is currently running
--AND sj.name = 'JobName'
and not exists( -- make sure this is the most recent run
    select 1
    from msdb..sysjobactivity new
    where new.job_id = aj.job_id
      and new.start_execution_date > aj.start_execution_date )

回复收藏 0 原文