JSON file generation failed while deploying a pipeline #12672
Comments
Thanks for reporting with some details. This has been reported before at https://groups.google.com/g/go-cd/c/gQ5wWIadNqg however without narrowing it down, or knowing some approach to replicate it, I am not sure how to resolve it. In theory I don't think this value should ever be null (although there is some very weird code and DB triggers which are responsible for updating it when jobs complete....), which means something has gone wrong somewhere.
I could just YOLO it and make the code in the serializer null-safe; however, I'm afraid something else might break if I do that without understanding whether it's an expected situation. (Line 30 in 67fbd63)
(Note for myself: this field was only added to V3 of the API in #8681/#8690 for …)
Hi @chadlwilson,
Let me know if I can provide more details.
It sounds to me like the GoCD server version change is probably a coincidence; however, which GoCD server version were you running previously where you hadn't seen this issue?
OK, this is interesting, thanks. You might want to check the logs of the gocd-server during the period that …
Yes, I was referring to the API call to retrieve the output, not the API call to trigger the stage/pipeline run. Why do you need a new release of the product? You can test the same API call manually, outside your deploy-triggering tool, to see whether it is consistently failing for … The reason to ask you to test the get-stage-instance API call via GET is …
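The suggested manual check of the get-stage-instance call could look roughly like the sketch below. The endpoint path and v3 `Accept` header follow the published GoCD stages API; the host, pipeline/stage names, and bearer-token auth are placeholders, not details taken from this thread.

```python
# Hypothetical sketch: exercise the get-stage-instance API directly,
# outside the deploy tool, to see whether the failure is consistent.
import json
import urllib.request

def stage_instance_url(base, pipeline, pipeline_counter, stage, stage_counter):
    """Build the GoCD get-stage-instance URL for one specific stage run."""
    return f"{base}/go/api/stages/{pipeline}/{pipeline_counter}/{stage}/{stage_counter}"

def fetch_stage_instance(url, token):
    """Call the API with the v3 Accept header, as the deploy tool does."""
    req = urllib.request.Request(url, headers={
        "Accept": "application/vnd.go.cd.v3+json",
        "Authorization": f"Bearer {token}",
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (placeholder values): repeat this call by hand after a failure
# to check whether the error reproduces every time or only transiently.
url = stage_instance_url("https://gocd.example.com", "my-pipeline", 42, "deploy", 1)
```

Repeating the same GET after the stage has settled would also show whether the failure is permanent for that instance or clears up on its own.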
R: We had version 19.12 before, and we performed an upgrade to 23.5.
Those are the logs during that run; there are a few hanging jobs that are not related to this one (they also completed successfully, but have been hanging in the UI since the server upgrade):
OK, thanks - this implies someone must have done an H2 database migration, as …
OK. The stack trace implies the API call is failing, whereas the above indicates successful calls to get the stage instance status. These results imply that the stage now has a …

Are you saying that there are no longer issues with the stage instance which earlier had a problem? Are you sure you are checking the right pipeline/stage instance that your deploy tool had a problem with? We want to make sure we are looking at the one which had an error at …

If your deploy tool constantly polls the API to get all of the stage instances in the pipeline (even ones it did not directly trigger), perhaps this error is temporarily created by a race condition (job status is completed in the DB, but the trigger to populate the stage transition time has not yet fired), after which it fixes itself? I'd probably need to understand a bit more about how your deploy tool interacts with the various APIs to guess whether that is what is happening and see if I can replicate it. If your deploy tool is polling APIs, a workaround for now might be to tell it to try the …
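If the race-condition theory is right, the transition time may be momentarily null just after a job completes, so a polling client can treat that as "not ready yet" and retry instead of failing. A minimal sketch of that workaround, assuming a JSON shape with per-job entries and a hypothetical `last_transition_time` field (the actual field name is not confirmed in this thread; retry parameters are also assumptions):

```python
# Client-side workaround sketch for the suspected race condition:
# retry the stage-instance fetch while the transition time is still null.
import time

def transition_time_ready(stage_instance):
    """Null-safe check: True only when every job has a transition time set."""
    jobs = stage_instance.get("jobs") or []
    return bool(jobs) and all(
        j.get("last_transition_time") is not None for j in jobs
    )

def poll_until_ready(fetch, attempts=5, delay=10):
    """Re-fetch the stage instance until transition times are populated.

    `fetch` is any zero-argument callable returning the parsed JSON dict.
    """
    for _ in range(attempts):
        instance = fetch()
        if transition_time_ready(instance):
            return instance
        time.sleep(delay)  # back off and let the DB trigger catch up
    raise TimeoutError("transition time still null after retries")
```

This keeps the tool's behaviour unchanged for healthy stages while tolerating a brief window where the server-side trigger has not yet fired.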
Yes, the migration was done using the migration tool. H2 DB size was around 240MB.
No, we still have the issue with multiple pipelines. Regarding our deploy tool: yes, it constantly polls the API to get all the stage instances. We followed the suggestion to increase the polling interval to see if we could avoid the race condition; however, no new deployments similar to the failed ones have been triggered since then.
The deploy tool is Python-based and uses the following, via requests, to trigger a deployment:
And the following, also via requests, to get the status:
All of this is triggered from a Slack bot that interacts with the Python library; it also manages user permissions, and the output of failed or successful pipelines goes to Slack as well. In our case, instead of the output we are receiving the error mentioned when the issue was opened.
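The actual snippets were not captured above; a hypothetical sketch of the two calls such a tool makes, based on the public GoCD API docs, might look like this. The host, pipeline names, and auth are placeholders, and while the real tool uses the requests library, the standard library is used here only to keep the sketch self-contained.

```python
# Hypothetical shape of the deploy tool's two API calls (trigger + status).
import urllib.request

BASE = "https://gocd.example.com"  # placeholder host

def schedule_request(pipeline, token):
    """POST /go/api/pipelines/:pipeline/schedule triggers a new run."""
    return urllib.request.Request(
        f"{BASE}/go/api/pipelines/{pipeline}/schedule",
        data=b"{}",
        method="POST",
        headers={
            "Accept": "application/vnd.go.cd.v1+json",
            "Content-Type": "application/json",
            "X-GoCD-Confirm": "true",
            "Authorization": f"Bearer {token}",
        },
    )

def status_request(pipeline, counter, stage, stage_counter, token):
    """GET the stage instance that the Slack bot reports back on."""
    return urllib.request.Request(
        f"{BASE}/go/api/stages/{pipeline}/{counter}/{stage}/{stage_counter}",
        headers={
            "Accept": "application/vnd.go.cd.v3+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

The failure in this issue occurs on the second call: the trigger succeeds and the pipeline completes, but the status GET returns the JSON serialization error instead of the stage instance.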
As a further update, this just happened again today with a different pipeline, even after increasing the polling interval. Logs from the server at the time:
Hello Chad,
As you might see in the history, we added an exception to our deploy tool to suppress the error in our output for that set of pipelines. Meanwhile, is there any other information you would like us to provide?
Issue Type
Summary
While triggering a pipeline that contains multiple stages with automatic approval for the next stage, we are seeing an error generating the JSON file.
The pipeline itself succeeds, but our internal deployment tool can't receive the JSON file because of the generation failure. The previous stages, and different pipelines without multiple automatic approvals, work without issues.
Environment
Basic environment details
23.5.0 (18179-7702b283accd1f90f014f0087aa2e9bd8baf4a97)
17.0.9
Linux 4.19.0-23-cloud-amd64
Additional Environment Details
Expected Results
We expected it to behave normally like the previous stage:
Actual Results
See the log snippet
Possible Fix
Log snippets
Code snippets/Screenshots
Any other info