Spot the flaw! Reliably performing long tasks with the task queue

Posted on 2024-10-18 21:30:55


I'm making a gradebook on Google App Engine. I keep track of each student's grade per grading period. The grading periods can overlap. Since I may display hundreds of these grades at a time, I precalculate the grades on the server. So, for any one student, I may have many calculated grades, one for each grading period.

Now, the teacher enters a new score from a quiz. That score may affect many of the calculated grades, because it may fall into many grading periods. I need to recalculate all of the affected grades. This could take a long time, since for each grading period I need to fetch all relevant scores and run a complex routine over those scores. I don't think the 30-second request deadline is enough, especially if the datastore is feeling slow today. Furthermore, failure is not an option. It is unacceptable for some grades to update and others to fall silently out of date.

So I think to myself, what a wonderful time to learn about the task queue!

I'm not an expert in DB structure or anything, but here's an outline of what I want to do:

public ReturnCode addNewScore(Float score, Date date, Long studentId)
{
    List<CalculatedGrade> existingGrades = getAllRelevantGradesForStudent(studentId, date);

    for (CalculatedGrade grade : existingGrades)
    {
        grade.markDirty(); //leaves a record that this grade is no longer up to date
    }

    persistenceManager.makePersistentAll(existingGrades);
    //DANGER ZONE?
    persistenceManager.makePersistent(new IndividualScore(score, date, studentId));

    tellTheTaskQueueToStartCalculating();

    return OMG_IT_WORKED;
}

This seems like a fast way to mark all of the relevant grades dirty. If it fails half-way through, then failure is returned and the client will know to try again. If a client later tries to fetch a dirty grade, we can return an error there.
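
To make the read side of that idea concrete, here is a minimal sketch of "return an error when a dirty grade is fetched". It uses a plain in-memory map as a stand-in for the datastore; DirtyGradeStore, fetchClean and the grading-period keys are hypothetical names for illustration, not part of JDO or App Engine.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** In-memory stand-in for the datastore; all names here are hypothetical. */
class DirtyGradeStore {
    /** A precalculated grade plus the dirty flag described above. */
    static final class CalculatedGrade {
        float value;
        boolean dirty;
        CalculatedGrade(float value) { this.value = value; }
    }

    private final Map<String, CalculatedGrade> grades = new HashMap<>();

    /** Stores a freshly calculated (clean) grade for one grading period. */
    void put(String gradingPeriod, float value) {
        grades.put(gradingPeriod, new CalculatedGrade(value));
    }

    /** Records that the grade for this grading period is out of date. */
    void markDirty(String gradingPeriod) {
        grades.get(gradingPeriod).dirty = true;
    }

    /** Returns the grade only if it exists and is clean; empty means "not ready, retry later". */
    Optional<Float> fetchClean(String gradingPeriod) {
        CalculatedGrade g = grades.get(gradingPeriod);
        if (g == null || g.dirty) {
            return Optional.empty();
        }
        return Optional.of(g.value);
    }
}
```

A caller would treat an empty result as the error case and ask again once the recalculation task has had time to run.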

Then, the task queue code would look something like this:

public void calculateThemGrades()
{
    List<CalculatedGrade> dirtyGrades = getAllDirtyGrades();

    try
    {
        for (CalculatedGrade grade : dirtyGrades)
        {
            List<Score> relevantScores = getAllRelevantScores();
            Float cleanGrade = calculateGrade(relevantScores);
            grade.setGrade(cleanGrade);
            grade.markClean();

            persistenceManager.flush();
        }
    }
    catch(Throwable anything)
    {
        //if there was any problem, like we ran out of time or the datastore is down or whatever, just try again
        tellTheTaskQueueToStartCalculating();
    }
}
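
The "just try again" behaviour can be simulated without App Engine. The sketch below is an in-memory model, assuming each grade's update is saved on its own before the next one starts (which is exactly what the flush() question further down is about). RecalcSimulation, failOnCall and runUntilDone are hypothetical names, and the loop in runUntilDone stands in for re-enqueueing the task.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory simulation of the recalculation task; every name is hypothetical. */
class RecalcSimulation {
    // gradingPeriod -> dirty flag; LinkedHashMap keeps iteration order stable.
    final Map<String, Boolean> dirty = new LinkedHashMap<>();
    int runs = 0;         // how many times the task body has started
    int failOnCall = -1;  // inject one failure at this recalculation call, -1 for none
    private int calls = 0;

    /** One task execution: returns true if it finished, false if it must be re-enqueued. */
    boolean runOnce() {
        runs++;
        // Snapshot the grades that are dirty right now, like getAllDirtyGrades().
        List<String> pending = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : dirty.entrySet()) {
            if (e.getValue()) pending.add(e.getKey());
        }
        try {
            for (String period : pending) {
                if (++calls == failOnCall) {
                    throw new IllegalStateException("simulated datastore timeout");
                }
                dirty.put(period, false); // stands in for "recalculate, mark clean, save"
            }
            return true;
        } catch (RuntimeException e) {
            // Grades cleaned before the failure stay clean; the next run skips them.
            return false;
        }
    }

    /** Re-runs the task until it completes, mimicking repeated re-enqueueing. */
    void runUntilDone() {
        while (!runOnce()) {
            // try again
        }
    }
}
```

With three dirty grades and a failure injected on the second recalculation, the first run cleans one grade and the second run finishes the remaining two. That only holds if partial progress really is persisted per grade.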

Here's my question: does this guarantee that there will never be a calculated grade that is marked clean after a new score has been added?

Specific areas of concern:

Will the existingGrades always be persisted before the new IndividualScore in the first snippet, around the danger zone?
Is it possible that another thread will start the task queue code in the danger zone, so that those existingGrades might be marked clean again before the new IndividualScore is really entered? If so, how can I make sure that won't happen (a single transaction across all of the grades is not an option)?
Is persistenceManager.flush() enough to save partially-done calculations, even though the PersistenceManager is not closed?
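
The second concern can be replayed step by step. This is a deterministic, single-threaded sketch of that one interleaving, not a claim about how App Engine actually schedules tasks. DangerZoneReplay is a hypothetical in-memory model, and the grade is assumed to be a plain average purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Deterministic replay of one possible interleaving; all names are hypothetical. */
class DangerZoneReplay {
    final List<Float> scores = new ArrayList<>(); // persisted IndividualScore values
    float grade;   // the precalculated grade (assumed here to be a plain average)
    boolean dirty;

    /** Step 1 of addNewScore: mark the grade dirty. */
    void markDirty() { dirty = true; }

    /** Step 2 of addNewScore: persist the new score. */
    void persistScore(float score) { scores.add(score); }

    /** The task body: recalculate from whatever scores are persisted right now. */
    void recalculate() {
        grade = average();
        dirty = false;
    }

    /** True if the grade is marked clean but does not reflect every persisted score. */
    boolean isStaleButClean() {
        return !dirty && grade != average();
    }

    private float average() {
        if (scores.isEmpty()) return 0f;
        float sum = 0f;
        for (float s : scores) sum += s;
        return sum / scores.size();
    }
}
```

Calling markDirty(), then recalculate() (a task run that lands in the danger zone), then persistScore(...) leaves the grade clean but stale. The task enqueued afterwards finds nothing dirty, so nothing corrects it. Calling markDirty(), persistScore(...), recalculate() in that order does not have this problem.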

This must be a common sort of problem. I'd appreciate any links to tutorials, especially those for App Engine. Thanks for reading so much!
