-
-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CFn: DAG based deploy order #10849
CFn: DAG based deploy order #10849
Conversation
LocalStack Community integration with Pro 2 files 2 suites 1h 39m 46s ⏱️ Results for commit 1aa3144. ♻️ This comment has been updated with latest results. |
e483288
to
0ba64e5
Compare
@@ -74,6 +74,9 @@ def resolve_dependencies(d: dict, evaluated_conditions: dict[str, bool]) -> set[ | |||
items = items.union(resolve_dependencies(item, evaluated_conditions)) | |||
else: | |||
pass | |||
elif isinstance(d, list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to support the second argument of Fn::Join
e.g.
Fn::Join:
- ""
- - https://
- Ref: LsApi42D61DD0
- .execute-api.
- Ref: AWS::Region
- "."
- Ref: AWS::URLSuffix
- /
- Ref: LsApiDeploymentStageprod82D8C5E7
- /
S3 Image Test Results (AMD64 / ARM64) 2 files 2 suites 3m 7s ⏱️ Results for commit 1aa3144. ♻️ This comment has been updated with latest results. |
f4fbf38
to
9196d9d
Compare
|
||
|
||
@markers.aws.only_localstack | ||
def test_unexisting_resource_dependency(deploy_cfn_template, aws_client): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a duplicate of test_useful_error_when_invalid_ref
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM👌
@@ -133,7 +133,7 @@ def delete( | |||
""" | |||
iam_client = request.aws_client_factory.iam | |||
iam_client.delete_user(UserName=request.desired_state["Id"]) | |||
return ProgressEvent(status=OperationStatus.SUCCESS, resource_model=None) | |||
return ProgressEvent(status=OperationStatus.SUCCESS, resource_model=request.previous_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this is the appropriate value from now on right?
@pytest.mark.parametrize( | ||
"deploy_order", TEMPLATE_ORDER_CASES, ids=["-".join(vals) for vals in TEMPLATE_ORDER_CASES] | ||
) | ||
def test_stack_deploy_order(deploy_cfn_template, aws_client, snapshot, deploy_order: tuple[str]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great test 👌
@@ -57,8 +57,8 @@ def test_fn_sub_cases(self, deploy_cfn_template, aws_client, snapshot): | |||
snapshot.match("outputs", deployment.outputs) | |||
|
|||
|
|||
@markers.aws.only_localstack | |||
def test_useful_error_when_invalid_ref(deploy_cfn_template): | |||
@markers.aws.validated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! this is great, a DAG is definitely the better concept for the deployer 🙌
just curious: have you looked into using cfnlint? i played around with this over a year ago so it's been a while, but IIRC it can create a graph from a cfn template that you can then run a depth first search, which i assume then gives the correct the deployment order?
I think it would be a definite improvement to use something like After investigating, I found that we don't actually use I want to get a first working version in place (even if hacky) which we can then refactor in a follow up PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great change! Let's get it over the line and see if we encounter any issues with it.
I haven't checked every line of the copied TemplateDeployerBase subclasses and focused mostly on the other parts where I didn't find any obvious issues 🥳 We also already did a walkthrough of the changes recently, so from my side it's a go 👍
@@ -70,7 +70,7 @@ def delete( | |||
|
|||
return ProgressEvent( | |||
status=OperationStatus.SUCCESS, | |||
resource_model=None, | |||
resource_model=request.previous_state, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a particular reason this was sneaked in here? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious since I saw some warnings related to that operation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was fixing warnings with teardowns, and found some warnings to do with the resource_model
of the resulting ProgressEvent
from the delete method be None
when trying to access the physical resource id. I think it is within the scope to reduce the stack delete warnings, as this is evidence that the deploy order is correct.
@@ -791,7 +802,102 @@ class ChangeConfig(TypedDict): | |||
ResourceChange: ResourceChange | |||
|
|||
|
|||
class TemplateDeployer: | |||
def order_resources( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would prefer having those outside of the template deployer, just to avoid ballooning that file again
nodes: dict[str, list[str]] = {} | ||
for logical_resource_id, properties in resources.items(): | ||
nodes.setdefault(logical_resource_id, []) | ||
deps = get_deps_for_resource(properties, resolved_conditions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great that you were able to reuse that 👍
indegrees = {k: 0 for k in nodes.keys()} | ||
for dependencies in nodes.values(): | ||
for dependency in dependencies: | ||
indegrees[dependency] += 1 | ||
|
||
# Place all elements with indegree 0 in queue | ||
queue = [k for k in nodes.keys() if indegrees[k] == 0] | ||
|
||
sorted_logical_resource_ids = [] | ||
|
||
# Continue until all nodes have been dealt with | ||
while len(queue) > 0: | ||
# node of current iteration is the first one from the queue | ||
curr = queue.pop(0) | ||
sorted_logical_resource_ids.append(curr) | ||
|
||
# remove the current node from other dependencies | ||
for dependency in nodes[curr]: | ||
indegrees[dependency] -= 1 | ||
|
||
if indegrees[dependency] == 0: | ||
queue.append(dependency) | ||
|
||
# check for circular dependencies | ||
if len(sorted_logical_resource_ids) != len(nodes): | ||
raise Exception("Circular dependency found.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extracting the circular dependency detection would help make it more readable
return sorted_changes | ||
|
||
|
||
class TemplateDeployerBase(ABC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I appreciate it, I also really hope we can remove this soon*ish 😬
8882126
to
1aa3144
Compare
Motivation
When deploying stacks, our CloudFormation engine currently performs a number of retries to deploy a stack. This is required because the implementation does not take dependency order into account when deploying resources. Resources are strictly taken from the template in the order they are defined. We retry up to 30 times for deployments and 10 times for deletions.
This means that a stack with 31 resources defined in the reverse order of their dependencies will fail to deploy.
This is a particularly bad example, yes, but it is a fragile and brittle implementation that (especially on deletes) leads to potential breakages.
If we had a way of ordering dependencies so that we could iterate through them in order, then we would not need a retry process, and the behaviour would conform to the AWS behaviour. This would also be a first step towards concurrent deployment in our CFn engine.
Changes
order_resources
andorder_changes
which order resources and changes based on the dependencies of each resource (or associated change).apply_changes_in_loop
anddelete_stack
CFN_LEGACY_TEMPLATE_DEPLOYER
flag to revert back to old behaviourcreate_changeset
where references to non-existent resources leads to a validation error during the call tocreate_changeset
as per AWS, where before it would fail the stack deploy.resolve_dependencies
forFn::Join
use caseNone
in a progress event from thedelete
handlers:CDK::Metadata
AWS::IAM::User
AWS::Lambda::Alias