[WIP] Swarmkit task reaper should be more robust for dead nodes.#2408
anshulpundir wants to merge 1 commit into moby:master
Force-pushed from 4a8a50d to a5506ee.
Codecov Report
@@            Coverage Diff             @@
##           master    #2408      +/-   ##
==========================================
+ Coverage   61.89%   61.98%   +0.08%
==========================================
  Files         134      134
  Lines       21771    21771
==========================================
+ Hits        13476    13494      +18
+ Misses       6836     6819      -17
+ Partials     1459     1458       -1
  case api.EventUpdateTask:
      t := v.Task
-     if t.Status.State >= api.TaskStateOrphaned && t.ServiceID == "" {
+     if t.Status.State >= api.TaskStateOrphaned {
This will only fire when the task actually gets updated, right? If we load this task out of a snapshot, this event will not fire for the task.
Good point. I'll address that and see if I can add a test for that case.
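One way to cover the snapshot case is to run the same predicate over the tasks already in the store when the reaper starts, before it begins handling events. The following is only a minimal sketch under stated assumptions: `Task`, `TaskState`, `Reaper`, `Init`, `HandleUpdate`, and `Pending` are hypothetical stand-ins written for illustration, not swarmkit's actual types or API. The predicate is the one from the pre-change line of the diff.

```go
package main

// TaskState is a hypothetical ordered state enum; the values are
// illustrative and do not match swarmkit's real constants.
type TaskState int

const (
	TaskStateRunning TaskState = iota
	TaskStateShutdown
	TaskStateOrphaned
)

// Task is a stripped-down, hypothetical stand-in for api.Task.
type Task struct {
	ID        string
	ServiceID string
	State     TaskState
}

// Reaper tracks the IDs of tasks that are pending deletion.
type Reaper struct {
	dirty map[string]struct{}
}

// NewReaper returns a Reaper with no pending deletions.
func NewReaper() *Reaper {
	return &Reaper{dirty: make(map[string]struct{})}
}

// shouldReap is shared by the startup scan and the event handler so the
// two code paths cannot drift apart.
func shouldReap(t Task) bool {
	return t.State >= TaskStateOrphaned && t.ServiceID == ""
}

// Init scans tasks that were already in the store (for example, loaded
// from a snapshot). No update event fires for these, so they have to be
// checked explicitly.
func (r *Reaper) Init(existing []Task) {
	for _, t := range existing {
		if shouldReap(t) {
			r.dirty[t.ID] = struct{}{}
		}
	}
}

// HandleUpdate covers the steady-state path: a task update event.
func (r *Reaper) HandleUpdate(t Task) {
	if shouldReap(t) {
		r.dirty[t.ID] = struct{}{}
	}
}

// Pending reports how many tasks are queued for deletion.
func (r *Reaper) Pending() int {
	return len(r.dirty)
}
```

Calling `Init` with the store's contents before subscribing to events would pick up orphaned, service-less tasks that never receive another update.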
The […]. The idea was that we don't want to be forced to delete a task in order to free its resources, because then we lose the history associated with that task. The task can remain in the task list in an […]. This change seems to negate the purpose of the […].
I looked at the PR in which the Orphaned state was added, and at #440, and neither made it clear why the check for t.ServiceID == "" made sense, or why it made sense for service-less tasks to be deleted right away but not service tasks. The code has no comments on the motivation behind this, so thanks for providing this clarification. I have updated the description to provide the motivation for this change.

The change was made before I was clear on the motivation for the Orphaned state (I tried digging around but wasn't able to find a good description of it). I agree now that removing orphaned tasks right away is not the right direction.

I think the one concern that may still need to be addressed is cleaning out the task history upon restart, when there are no longer any EventCreateTask events (is that even possible?). Either way, this may not be that high a priority.

I also think it would be very helpful to add comments to the code. It's not scalable to have to piece together the motivation behind a change from PRs and issues.
@aaronlehmann The original idea behind this change is that orphaned tasks are supposed to get cleaned up when you run […]
Got it. Sounds like the task reaper needs some reconciliation that makes sure global service tasks for dead nodes are deleted. I assume the problem is for global service tasks? Replicated tasks would be retained up to the "task history limit", and then deleted.
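A rough sketch of what such a reconciliation pass could look like, assuming the reaper can see each task's service mode and the set of nodes that still exist. Every name here (`ServiceMode`, `NodeTask`, `tasksOnMissingNodes`) is hypothetical and only illustrates the selection logic; none of it is taken from swarmkit.

```go
package main

// ServiceMode is a hypothetical two-value enum for the service kinds
// mentioned in the discussion.
type ServiceMode int

const (
	ModeReplicated ServiceMode = iota
	ModeGlobal
)

// NodeTask is a hypothetical, minimal view of a task for this pass.
type NodeTask struct {
	ID        string
	ServiceID string
	NodeID    string
}

// tasksOnMissingNodes returns the IDs of global-service tasks whose node
// is not in liveNodes. Replicated-service tasks are skipped on purpose,
// since they are expected to be bounded by the task history limit.
// Results keep the order of the input slice.
func tasksOnMissingNodes(tasks []NodeTask, modes map[string]ServiceMode, liveNodes map[string]bool) []string {
	var stale []string
	for _, t := range tasks {
		mode, known := modes[t.ServiceID]
		if !known || mode != ModeGlobal {
			continue
		}
		if !liveNodes[t.NodeID] {
			stale = append(stale, t.ID)
		}
	}
	return stale
}
```

Whether such a pass should run once at startup, periodically, or on node removal events is left open here.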
Signed-off-by: Anshul Pundir <anshul.pundir@docker.com>
The issue we're trying to address is bloating of the store caused by a large number of orphaned tasks. As the code stands, orphaned tasks are not deleted unless the task's service ID is empty (t.ServiceID == "").
Also, task history is only cleaned up when an EventCreateTask is seen for that task's service. It is not clear whether this is enough when restarting from a snapshot, since orphaned tasks may not have been cleaned out of the history before the restart.
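To illustrate the restart concern, here is a minimal sketch of a history-trimming pass that could run over all tasks at startup instead of waiting for a create event. It is a simplification under stated assumptions: `HistTask`, `slotKey`, and `excessHistory` are hypothetical names, history is grouped per service and slot, and only terminal tasks count against the limit. The real retention rules may differ.

```go
package main

import "sort"

// HistTask is a hypothetical, minimal view of a task for history trimming.
type HistTask struct {
	ID        string
	ServiceID string
	Slot      int
	CreatedAt int64 // hypothetical creation timestamp; larger is newer
	Terminal  bool  // true once the task will never run again
}

// slotKey groups tasks that are successive incarnations of the same slot.
type slotKey struct {
	serviceID string
	slot      int
}

// excessHistory returns the IDs of the oldest terminal tasks in each slot
// beyond the given limit, sorted by ID so the output is deterministic.
func excessHistory(tasks []HistTask, limit int) []string {
	if limit < 0 {
		limit = 0
	}
	bySlot := make(map[slotKey][]HistTask)
	for _, t := range tasks {
		if !t.Terminal {
			continue // never trim tasks that may still be running
		}
		k := slotKey{t.ServiceID, t.Slot}
		bySlot[k] = append(bySlot[k], t)
	}

	var doomed []string
	for _, hist := range bySlot {
		if len(hist) <= limit {
			continue
		}
		// Oldest first, so the slice prefix is what gets removed.
		sort.Slice(hist, func(i, j int) bool {
			return hist[i].CreatedAt < hist[j].CreatedAt
		})
		for _, t := range hist[:len(hist)-limit] {
			doomed = append(doomed, t.ID)
		}
	}
	sort.Strings(doomed)
	return doomed
}
```

Running a pass like this once after loading a snapshot would trim history that accumulated before the restart, independent of any later EventCreateTask.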