12. October 2023 By Dominik Táskai
Lambda-backed Providers and Custom Resources in AWS CDK
Even if you are a seasoned AWS CDK/CloudFormation veteran, you have most likely tried to work with a resource that was not yet available in CDK/CloudFormation (looking at you, Lake Formation). Luckily, if the resource and the accompanying operations are available through the AWS API, AWS has a way to incorporate these API calls into your templates/code and bridge the gap created by the unavailability of some resources.
Custom Resources
The first solution to our aforementioned issue is the `AwsCustomResource` construct included in the `aws-cdk-lib.custom_resources` package.
import { custom_resources as cr } from 'aws-cdk-lib';

// Inside a Stack or Construct, where `this` is the enclosing scope
const listObjects = new cr.AwsCustomResource(this, 'ListObjectsInBucket', {
  onCreate: {
    service: 'S3',
    action: 'listObjectsV2',
    parameters: {
      Bucket: 'demo-bucket',
      MaxKeys: 3,
    },
    physicalResourceId: cr.PhysicalResourceId.of('demo-bucket'),
  },
  policy: cr.AwsCustomResourcePolicy.fromSdkCalls({
    resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE,
  }),
});
This construct can fill most gaps, but it only allows a single API call per CloudFormation event: one call each for the `CREATE`, `UPDATE` and `DELETE` events. The data returned by the API call is exposed through the `Fn::GetAtt` intrinsic function, which the construct wraps in its `getResponseField` method. The response is flattened, so nested values are addressed with a dotted path. Going back to the example above, we can access the key of the first object returned by our Custom Resource the following way:
const firstObjectKey = listObjects.getResponseField('Contents.0.Key');
Hint: the API calls use the underlying AWS SDK, so the response follows the syntax of the standard API; this makes it easy to deduce how to extract your desired object(s).
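To make the dotted paths less abstract, here is a minimal Python sketch of how a nested SDK response can be flattened into path keys such as `Contents.0.Key`. It only illustrates the idea and is not the construct's actual implementation; the `flatten` helper and the sample values are assumptions made up for this example.

```python
from typing import Any, Dict


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into a single dict keyed by dotted paths."""
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        # Leaf value: store it under the path built so far.
        flat[prefix] = value
        return flat
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat


# Trimmed-down shape of an S3 ListObjectsV2 response (sample values are made up).
sample_response = {
    "Name": "demo-bucket",
    "KeyCount": 2,
    "Contents": [
        {"Key": "reports/a.csv", "Size": 120},
        {"Key": "reports/b.csv", "Size": 64},
    ],
}

flat_response = flatten(sample_response)
print(sorted(flat_response))
```

With this shape in mind, the path for a value is simply the chain of keys and list indexes you would follow in the standard API response.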
Physical Resource IDs
A very important aspect of both Providers and Custom Resources is the physical resource ID attached to them. It is recommended that you are at least somewhat familiar with physical resource IDs, as this makes it easier to debug your resources when an anomaly occurs during the resource lifecycle.
Every CloudFormation resource has a physical resource ID tied to it, which is returned from the `Create` operation and is assigned to the logical ID defined for the resource.
When an `Update` operation happens and a different physical resource ID is returned, CloudFormation treats it as a resource replacement and issues the `Delete` operation for the old resource.
This is why, when you change a specific property of a construct, in some cases the old resource is deleted and a new one is created in its place with the new properties.
Here is a quick example to give a better understanding:
const listBuckets = new cr.AwsCustomResource(this, 'ListBuckets', {
  // onUpdate is called for a CREATE event if no onCreate behaviour is specified explicitly
  onUpdate: {
    service: 'S3',
    action: 'listBuckets',
    // Evaluated at synthesis time, so every deployment gets a new physical resource ID
    physicalResourceId: cr.PhysicalResourceId.of(Date.now().toString()),
  },
  policy: cr.AwsCustomResourcePolicy.fromSdkCalls({
    resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE,
  }),
});
// The response is flattened, so a single value is addressed with a dotted path
const firstBucketName = listBuckets.getResponseField('Buckets.0.Name');
In the above example, if we had used a fixed string as the physical resource ID, a redeployment would not change the resource and the API call would not run again, so we would have no way of knowing whether buckets had been created or deleted in our account. By giving it a new value each time the stack is synthesized and deployed (through `Date.now().toString()`), the old resource is replaced by a new one and the API call is executed again, returning the most recent state of the buckets in our account.
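The replacement rule can be modelled in a few lines of Python. This is a simplified sketch of the behaviour described above, not CloudFormation's implementation; the `cleanup_events` helper and the timestamp values are assumptions made up for illustration.

```python
from typing import Dict, List


def is_replacement(current_id: str, returned_id: str) -> bool:
    """An update counts as a replacement when the physical ID changes."""
    return returned_id != current_id


def cleanup_events(current_id: str, returned_id: str) -> List[Dict[str, str]]:
    """Events issued after an Update: a Delete for the old ID on replacement."""
    if is_replacement(current_id, returned_id):
        return [{"RequestType": "Delete", "PhysicalResourceId": current_id}]
    return []


# Fixed physical ID: the update is handled in place, nothing is deleted.
print(cleanup_events("list-buckets", "list-buckets"))

# Timestamp-based physical ID: every deployment returns a new ID,
# so the resource from the previous deployment is cleaned up.
first_deploy_id = "1697100000000"
second_deploy_id = "1697103600000"
print(cleanup_events(first_deploy_id, second_deploy_id))
```

The second call yields a single `Delete` event for the ID of the previous deployment, which is the replacement described above.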
Providers
As mentioned, Custom Resources can only execute a single API call per event, but what if you want to perform multiple operations for one event? Providers to the rescue!
The `Provider` construct (also included in the `aws-cdk-lib.custom_resources` package, or in `@aws-cdk/custom-resources` if you are still on CDK v1) is a mini-framework for creating Providers for Custom Resources. The framework allows you to package all your business logic for each type of event in a Lambda function and have the framework take the wheel in handling the details.
There is of course a gotcha to all this: you have to write your Lambda function in a specific way, which means implementing an `onEvent` handler that handles all three event types.
def on_event(event, context):
    print(event)
    request_type = event['RequestType']
    if request_type == 'Create':
        return on_create(event)
    if request_type == 'Update':
        return on_update(event)
    if request_type == 'Delete':
        return on_delete(event)
    raise Exception(f"Invalid request type: {request_type}")


def on_create(event):
    props = event["ResourceProperties"]
    # Create the resource here and return the ID that identifies it
    physical_resource_id = "my-new-custom-resource"
    return {'PhysicalResourceId': physical_resource_id}


def on_update(event):
    physical_resource_id = event["PhysicalResourceId"]
    props = event["ResourceProperties"]
    props_old = event["OldResourceProperties"]
    # Returning the same ID keeps the update in place (no replacement)
    return {
        "PhysicalResourceId": physical_resource_id,
        "Data": {"Status": "onUpdate"},
    }


def on_delete(event):
    physical_resource_id = event["PhysicalResourceId"]
    return {
        "PhysicalResourceId": physical_resource_id,
        "Data": {"Status": "onDelete"},
    }
In case of an `UPDATE` event you have access to both the old and the new resource properties, so for an IAM policy update, for example, you can compare the original permissions with the new ones.
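As a rough illustration of such a comparison, the sketch below diffs `OldResourceProperties` against `ResourceProperties` from an `Update` event. The `changed_properties` helper, the `PolicyName` and `Actions` property names and the sample values are assumptions for this example, not part of the provider framework.

```python
from typing import Any, Dict, Tuple


def changed_properties(event: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each changed property name to its (old value, new value) pair."""
    new_props = event.get("ResourceProperties", {})
    old_props = event.get("OldResourceProperties", {})
    changes: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(new_props) | set(old_props)):
        if old_props.get(key) != new_props.get(key):
            changes[key] = (old_props.get(key), new_props.get(key))
    return changes


# Made-up Update event carrying hypothetical policy properties.
sample_update_event = {
    "RequestType": "Update",
    "PhysicalResourceId": "my-new-custom-resource",
    "OldResourceProperties": {
        "PolicyName": "demo-policy",
        "Actions": ["s3:GetObject"],
    },
    "ResourceProperties": {
        "PolicyName": "demo-policy",
        "Actions": ["s3:GetObject", "s3:PutObject"],
    },
}

changes = changed_properties(sample_update_event)
print(changes)
```

Inside `on_update` you could call such a helper first and only touch the parts of the resource that actually changed.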
A `PhysicalResourceId` is always returned for every type of event, implicitly unless you manage it yourself. If it is omitted for a `CREATE` event, the `RequestId` is used. For both `UPDATE` and `DELETE` events the original `PhysicalResourceId` is returned. It is important to note that if you explicitly return a different `PhysicalResourceId` for an `UPDATE` event, a subsequent `DELETE` event will be issued for the old ID and your implementation for the `DELETE` event will be called.
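The defaulting rules can be summarised as a small Python function. This is a sketch of the behaviour described in the paragraph above, not the framework's source code; `resolve_physical_id` is a hypothetical name used only here.

```python
from typing import Any, Dict, Optional


def resolve_physical_id(event: Dict[str, Any], handler_result: Optional[Dict[str, Any]]) -> str:
    """Pick the physical ID that ends up attached to the resource."""
    explicit = (handler_result or {}).get("PhysicalResourceId")
    if explicit:
        # The handler manages the ID itself.
        return explicit
    if event["RequestType"] == "Create":
        # Nothing returned on Create: fall back to the request ID.
        return event["RequestId"]
    # Update and Delete: keep the ID the resource already has.
    return event["PhysicalResourceId"]


create_event = {"RequestType": "Create", "RequestId": "req-1"}
update_event = {"RequestType": "Update", "RequestId": "req-2", "PhysicalResourceId": "req-1"}

print(resolve_physical_id(create_event, {}))
print(resolve_physical_id(update_event, None))
print(resolve_physical_id(update_event, {"PhysicalResourceId": "replacement-id"}))
```

Only the last call changes the ID, which is the case that leads to a follow-up `DELETE` event for the old one.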
You also have the option to return data from your Custom Resource by specifying a `Data` object in your return value. The data included in the object can be read with the aforementioned `Fn::GetAtt` function, for example through the `getAtt` or `getAttString` methods of the `CustomResource` construct.
Asynchronous Providers
Some resource provisioning APIs are asynchronous, meaning you need to "wait" until the provisioning finishes. A good example is Athena query execution: starting a query does not block until it finishes, so you have to request the query status/results periodically afterwards.
The provider framework lets you implement a separate Lambda function just for checking the operation status. The framework only submits a "SUCCESS" signal to CloudFormation once this Lambda reports that the operation is complete; until then the Stack stays in an in-progress state (for example `CREATE_IN_PROGRESS`).
A simple implementation of the handler would look like the following:
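import boto3

athena_client = boto3.client("athena")

def is_complete(event, context):
    # Assumption: on_event returned the Athena query execution ID as the PhysicalResourceId
    execution_id = event["PhysicalResourceId"]
    response = athena_client.get_query_execution(QueryExecutionId=execution_id)
    state = response["QueryExecution"]["Status"]["State"]
    if state in ("FAILED", "CANCELLED"):
        raise Exception(f"Query {execution_id} ended in state {state}")
    return {'IsComplete': state == "SUCCEEDED"}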
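To show how the two handlers can cooperate, here is a sketch in which the `on_event` side starts the query and hands the execution ID to the `is_complete` side through the physical resource ID. Using the execution ID as the physical ID is an assumption of this example, and `start_query`, `check_query` and `FakeAthena` are hypothetical names; the fake client only exists so the flow can be run locally without AWS access.

```python
from typing import Any, Dict, List


def start_query(athena: Any, sql: str, output_location: str) -> Dict[str, str]:
    """on_event side: start the query and use its execution ID as the physical ID."""
    response = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    return {"PhysicalResourceId": response["QueryExecutionId"]}


def check_query(athena: Any, event: Dict[str, Any]) -> Dict[str, bool]:
    """is_complete side: look the execution up again through the physical ID."""
    execution_id = event["PhysicalResourceId"]
    response = athena.get_query_execution(QueryExecutionId=execution_id)
    state = response["QueryExecution"]["Status"]["State"]
    if state in ("FAILED", "CANCELLED"):
        # Raising stops the polling instead of waiting for the timeout.
        raise RuntimeError(f"Athena query {execution_id} ended in state {state}")
    return {"IsComplete": state == "SUCCEEDED"}


class FakeAthena:
    """Hypothetical in-memory stand-in for boto3.client("athena"), for local runs only."""

    def __init__(self, states: List[str]) -> None:
        self._states = list(states)

    def start_query_execution(self, **kwargs: Any) -> Dict[str, str]:
        return {"QueryExecutionId": "query-1234"}

    def get_query_execution(self, QueryExecutionId: str) -> Dict[str, Any]:
        # Walk through the scripted states and stay on the last one.
        state = self._states.pop(0) if len(self._states) > 1 else self._states[0]
        return {"QueryExecution": {"Status": {"State": state}}}


fake = FakeAthena(["QUEUED", "RUNNING", "SUCCEEDED"])
created = start_query(fake, "SELECT 1", "s3://example-results-bucket/")
polls = [check_query(fake, created)["IsComplete"] for _ in range(3)]
print(created, polls)
```

In a real Lambda you would pass `boto3.client("athena")` instead of the fake client.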
The `is_complete` handler is retried periodically until it returns `{'IsComplete': True}`. The retry interval can be specified through the `queryInterval` parameter of the Provider construct, and retries continue until they hit either the maximum 2 hour limit or the value set through the `totalTimeout` parameter.
const myCustomProvider = new cr.Provider(this, 'MyCustomProvider', {
  onEventHandler: onEventLambda,
  isCompleteHandler: isCompleteLambda,
  queryInterval: Duration.seconds(5),
  totalTimeout: Duration.minutes(30), // Can exceed the 15 minute Lambda timeout (max 2 hrs)
  role: providerRole
});
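The interplay between `queryInterval` and `totalTimeout` can be pictured with a small simulation. This is a simplified model on a simulated clock, not how the framework is implemented internally; `wait_until_complete` and `make_handler` are hypothetical helpers written for this sketch.

```python
from typing import Callable, Dict


def wait_until_complete(
    is_complete: Callable[[], Dict[str, bool]],
    query_interval_s: float,
    total_timeout_s: float,
) -> int:
    """Poll is_complete on a simulated clock and return the number of calls made."""
    elapsed = 0.0
    calls = 0
    while elapsed <= total_timeout_s:
        calls += 1
        if is_complete().get("IsComplete"):
            return calls
        elapsed += query_interval_s  # simulated wait, no real sleeping
    raise TimeoutError(f"not complete after {total_timeout_s} seconds")


def make_handler(ready_after_calls: int) -> Callable[[], Dict[str, bool]]:
    """Hypothetical handler that reports completion on the n-th call."""
    state = {"calls": 0}

    def handler() -> Dict[str, bool]:
        state["calls"] += 1
        return {"IsComplete": state["calls"] >= ready_after_calls}

    return handler


# Same values as the Provider above: poll every 5 seconds, give up after 30 minutes.
calls_needed = wait_until_complete(make_handler(4), query_interval_s=5, total_timeout_s=1800)
print(calls_needed)
```

A short interval gives faster feedback at the cost of more Lambda invocations, while the total timeout bounds how long the stack can stay in progress.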
Conclusion
Both Providers and Custom Resources, bundled together with AWS CDK, provide us a great toolbox for dealing with the imperfections of CloudFormation/CDK, especially if you are working in an environment which makes quick use of newly released AWS services and features. Make sure to also check the official documentation for further gotchas, as state handling and especially physical resource IDs can easily turn your world upside down.