Troubleshoot KMS Permissions for Encrypted EBS-Backed AMIs Launched in Secondary Account ASG

Tony Tannous
Jun 15, 2023

In most cases, troubleshooting EC2 issues starts with reviewing system logs through the AWS console, or by connecting to the instance via SSH/Session Manager. But what happens when an instance fails to launch?

The following is a sample error message observed from an instance which had failed to launch:

State transition reason: Client.InternalError
State transition message: Client.InternalError: Client error on launch

It turned out the error was caused by missing KMS permissions. OK, but what does “missing KMS permissions” actually mean?

This article is all about getting down to the level of granularity required to zero in on the root cause and, finally, a resolution.

Two AWS accounts are used throughout the article for the sake of explanation. They are referenced as follows:

AWS Account ID: 111122223333 --> secondary account

AWS Account ID: 444455556666 --> primary account

The scenario leading to the error mentioned above can be summarised as follows (some resource names have been altered):

  • An encrypted EBS-backed AMI with "ImageId": "ami-0ad79b02b5570cdcc" has been created in the primary account
  • The KMS customer managed key (CMK) used to encrypt the AMI/EBS snapshot within the primary account has an ARN of:
arn:aws:kms:us-east-1:444455556666:key/12ee9c11-3476-492c-b5cc-ed4a4d636337
  • The AMI has been shared with the secondary account
  • The CMK's key policy includes the statements required to allow access from the secondary account
  • When attempting to launch the AMI from the secondary account within an Auto Scaling group (ASG), the following error message is encountered: Client.InternalError: Client error on launch

Understanding the Architecture Components

The following diagram is a visual representation of the architecture to keep in mind during the troubleshooting/setup process.

Green circles (numbered 1 to 6) are assigned to various components to assist with explanations/considerations in the sections to follow.

Each section includes the respective number(s) as depicted in the diagram.

(1) KMS

KMS Key

Starting with circle numbered (1):

  • If creating an encrypted EBS-backed AMI with the aim of sharing it with the secondary account, a CMK must be used
  • AWS managed keys cannot be used for encrypted AMIs intended for cross-account sharing

KMS key policy

The CMK's key policy needs to include the following statements.

For our sample key with ARN:

arn:aws:kms:us-east-1:444455556666:key/12ee9c11-3476-492c-b5cc-ed4a4d636337

the additional key policy statements required are:

...

{
  "Sid": "Allow use of the key",
  "Effect": "Allow",
  "Principal": {
    "AWS": [
      "arn:aws:iam::111122223333:root"
    ]
  },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey"
  ],
  "Resource": "*"
},
{
  "Sid": "Allow attachment of persistent resources",
  "Effect": "Allow",
  "Principal": {
    "AWS": [
      "arn:aws:iam::111122223333:root"
    ]
  },
  "Action": [
    "kms:CreateGrant",
    "kms:ListGrants",
    "kms:RevokeGrant"
  ],
  "Resource": "*"
}
...
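
Once the statements are saved, it can be worth reading the policy back to confirm they are really attached to the key. The following is a minimal sketch, assuming the AWS CLI is run with credentials for the primary account. The function names are illustrative and not part of the original setup; `default` is the only policy name KMS accepts for this call.

```shell
# Sketch only: read back the key policy and check that the secondary
# account's root principal appears in it. Function names are hypothetical.
KEY_ARN="arn:aws:kms:us-east-1:444455556666:key/12ee9c11-3476-492c-b5cc-ed4a4d636337"
SECONDARY_ACCOUNT="111122223333"

# Print the key policy document as JSON text.
show_key_policy() {
  aws kms get-key-policy \
    --key-id "$1" \
    --policy-name default \
    --query Policy \
    --output text
}

# Succeed only if the policy references arn:aws:iam::<account>:root.
policy_mentions_account() {
  show_key_policy "$1" | grep -q "arn:aws:iam::$2:root"
}

# Example call (needs primary-account credentials):
#   policy_mentions_account "$KEY_ARN" "$SECONDARY_ACCOUNT" && echo "referenced"
```

This only confirms that the principal is referenced somewhere in the policy; the actions in each statement still need to be checked by eye.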

(2&3) Baked AMI Created using Correct KMS Key

Numbered circles 2 & 3 in the diagram serve as a reminder that the correct CMK should be used when the AMI is created.

For the sample AMI with ImageId: ami-0ad79b02b5570cdcc, the KMS key ID should match the one mentioned in the previous section.
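
One way to check this from the command line is to look up the snapshot(s) behind the AMI and read their KmsKeyId. This is a rough sketch, assuming it is run in the primary account (which owns the snapshot); the function name is made up for illustration.

```shell
# Sketch only: print the KMS key ARN(s) behind an AMI's EBS snapshots.
# The function name is hypothetical.
ami_kms_keys() {
  # Look up the snapshot IDs referenced by the AMI's block device mappings.
  snapshot_ids=$(aws ec2 describe-images \
    --image-ids "$1" \
    --query 'Images[0].BlockDeviceMappings[].Ebs.SnapshotId' \
    --output text)

  # $snapshot_ids is left unquoted on purpose so each ID is a separate argument.
  aws ec2 describe-snapshots \
    --snapshot-ids $snapshot_ids \
    --query 'Snapshots[].KmsKeyId' \
    --output text
}

# Example call:
#   ami_kms_keys ami-0ad79b02b5570cdcc
```

The value printed should be the CMK ARN from section (1); anything else suggests the AMI was baked with a different key.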

(4) Ensure the AMI has been Shared

Check that the AMI has been shared. Explicitly sharing the snapshot is not required.

ImageId: ami-0ad79b02b5570cdcc

To share the AMI with the secondary account:

aws ec2 modify-image-attribute \
--image-id ami-0ad79b02b5570cdcc \
--launch-permission "Add=[{UserId=111122223333}]"
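
To confirm the share took effect, the launch permissions can be read back. The sketch below assumes it is run from the primary account, as the AMI owner; the function names are hypothetical.

```shell
# Sketch only: list the account IDs an AMI is shared with.
ami_shared_accounts() {
  aws ec2 describe-image-attribute \
    --image-id "$1" \
    --attribute launchPermission \
    --query 'LaunchPermissions[].UserId' \
    --output text
}

# Succeed only if the given account ID is among the launch permissions.
ami_is_shared_with() {
  ami_shared_accounts "$1" | tr '\t' '\n' | grep -qx "$2"
}

# Example call:
#   ami_is_shared_with ami-0ad79b02b5570cdcc 111122223333 && echo "shared"
```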

Checkpoint

At this point, it would seem that almost everything is covered. However, attempting to launch the AMI in the secondary account’s ASG still leads to the same Client.InternalError: Client error on launch.

Following the suggested troubleshooting step at https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/troubleshooting-launch.html and running the following for the instance in question does not yield any solid clues:

aws --profile sandbox_admin ec2 describe-instances \
  --instance-ids i-08eac0xxxxxxxxxxx \
  --region us-east-1

output:

{
  "Reservations": [
    {
      "Groups": [],
      "Instances": [
        {
          "AmiLaunchIndex": 0,
          "ImageId": "ami-0ad79b02b5570cdcc",
          "InstanceId": "i-08eac0xxxxxxxxxxx",
          ...
          ...
          "State": {
            "Code": 48,
            "Name": "terminated"
          },
          "StateTransitionReason": "Client.InternalError",
          "Architecture": "x86_64",
          "BlockDeviceMappings": [],
          ...
          ...
          "StateReason": {
            "Code": "Client.InternalError",
            "Message": "Client.InternalError: Client error on launch"
          },
          ...
}
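
Since only a handful of fields in that output matter here, the call can be trimmed with a `--query` expression. This is a small sketch; the function name is illustrative, and the profile is assumed to be supplied through the environment (for example AWS_PROFILE) rather than on the command line.

```shell
# Sketch only: print the state and state reason for one instance.
# The function name is hypothetical.
instance_state_reason() {
  aws ec2 describe-instances \
    --instance-ids "$1" \
    --region us-east-1 \
    --query 'Reservations[].Instances[].[State.Name, StateReason.Code, StateReason.Message]' \
    --output text
}

# Example call:
#   instance_state_reason i-08eac0xxxxxxxxxxx
```

It surfaces the same Client.InternalError as the full output, just with less noise, so it does not add any new diagnostic detail on its own.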

Create a Trail (AWS CloudTrail)

Digging for more diagnostics in the standard CloudTrail logs for the secondary account didn’t reveal anything obvious.

I decided to create a trail, hoping this would enrich the existing logs with messages leading to further clues. The trail would send logs to a nominated S3 bucket (trail log location), and was configured to capture all API activity (including AWS KMS events) along with EC2 Instance Connect Endpoint data events.

After enabling the trail, I attempted launching the AMI once again (through the ASG and its launch template).

Not knowing the exact intervals at which CloudTrail “flushes” data to S3, I waited around 5 minutes before downloading the logs from S3.

Running a quick grep -i against the logs for any of the strings kms, access, or error narrowed the output down to:

{
  "eventVersion": "1.08",
  "userIdentity": {
    "type": "AssumedRole",
    "principalId": "XXXXXXXXXXXXXXX:AutoScaling",
    "arn": "arn:aws:sts::111122223333:assumed-role/AWSServiceRoleForAutoScaling/AutoScaling",
    "accountId": "111122223333",
    ...
    "sessionContext": {
      "sessionIssuer": {
        "type": "Role",
        ...
        "arn": "arn:aws:iam::111122223333:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling",
        "accountId": "111122223333",
        "userName": "AWSServiceRoleForAutoScaling"
        ...
    ...
    "invokedBy": "autoscaling.amazonaws.com"
  ...
  ...
  "userAgent": "autoscaling.amazonaws.com",
  "errorCode": "AccessDenied",
  "errorMessage": "The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.",
  ...
  "eventType": "AwsApiCall",
  "managementEvent": true,
  "recipientAccountId": "111122223333",
  "eventCategory": "Management"
}
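
The grep step can also be turned into a quick summary of which error codes appear across the downloaded files. The sketch below assumes the trail's objects have already been copied to a local directory as the .json.gz files CloudTrail delivers; the function name and the bucket placeholder in the comment are hypothetical.

```shell
# Sketch only: count the errorCode values found in downloaded CloudTrail logs.
# $1 is a local directory holding the .json.gz files copied from the trail's
# bucket, e.g. (bucket name is a placeholder):
#   aws s3 sync s3://EXAMPLE-TRAIL-BUCKET/AWSLogs/111122223333/CloudTrail/us-east-1/ ./trail-logs
trail_error_codes() {
  find "$1" -type f -name '*.json.gz' -exec gzip -dc {} + \
    | grep -o '"errorCode": *"[^"]*"' \
    | sort \
    | uniq -c
}

# Example call:
#   trail_error_codes ./trail-logs
```

An AccessDenied entry in the summary is the cue to open the matching event and look at its userIdentity, which is what identifies the caller in the log above.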

From the above log, we can see that the Auto Scaling service-linked role was denied access when attempting to use the KMS key, i.e.:

...
"arn": "arn:aws:iam::111122223333:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling",
"accountId": "111122223333",
"userName": "AWSServiceRoleForAutoScaling"
...
...
"invokedBy": "autoscaling.amazonaws.com"
...
"errorMessage": "The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.",

After reading through more documentation, it became clear that a KMS grant was required to allow this role access to the CMK.

Getting back to our green numbered circles in the architecture diagram, this would be our number (5).

(5) KMS Key Grant for Autoscaling Service

The service-linked role ARN for the Auto Scaling service was listed in the previous trail log:

arn:aws:iam::111122223333:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling

To grant this role access to the CMK, the following requirement from the AWS documentation must be met:

“If you create a customer managed key in a different account than the Auto Scaling group, you must use a grant in combination with the key policy to allow cross-account access to the key.”

i.e., the required grant would be:

aws --profile sandbox_admin kms create-grant \
  --key-id arn:aws:kms:us-east-1:444455556666:key/12ee9c11-3476-492c-b5cc-ed4a4d636337 \
  --grantee-principal \
    arn:aws:iam::111122223333:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling \
  --operations Decrypt \
    GenerateDataKeyWithoutPlaintext \
    ReEncryptFrom \
    ReEncryptTo \
    CreateGrant

output:


{
  "GrantToken": "AQpAZTZm....",
  "GrantId": "cff2...."
}

To list the grant after it has been applied:

{
  "Grants": [
    {
      "KeyId": "arn:aws:kms:us-east-1:444455556666:key/12ee9c11-3476-492c-b5cc-ed4a4d636337",
      ...
    {
      "KeyId": "arn:aws:kms:us-east-1:444455556666:key/12ee9c11-3476-492c-b5cc-ed4a4d636337",
      "GrantId": "cff2....",
      ...
      "GranteePrincipal": "arn:aws:iam::111122223333:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling",
      "IssuingAccount": "arn:aws:iam::111122223333:root",
      "Operations": [
        "Decrypt",
        "GenerateDataKeyWithoutPlaintext",
        "ReEncryptFrom",
        "ReEncryptTo",
        "CreateGrant"
        ...
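
The command that produced this listing is not included in the text. The output is consistent with `kms list-grants`, so a sketch under that assumption, reusing the same profile as the create-grant call, might look like this (the function name is hypothetical):

```shell
# Sketch only: list the grants on the CMK from the secondary account.
# Assumes the listing above came from `kms list-grants`.
KEY_ARN="arn:aws:kms:us-east-1:444455556666:key/12ee9c11-3476-492c-b5cc-ed4a4d636337"

list_key_grants() {
  aws --profile sandbox_admin kms list-grants --key-id "$1"
}

# Example call:
#   list_key_grants "$KEY_ARN"
```

Because the key lives in the primary account, the full key ARN is passed rather than a bare key ID.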

(6) Launch Instance using AMI

With the ASG set to Desired capacity = 1 and Maximum capacity >= 1, launching an instance from the AMI now leads to a successful boot, with no errors reported.
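
To confirm the result from the command line, the ASG's recent scaling activities can be listed. This is a minimal sketch; the function name and the group name `my-asg` in the example are placeholders, since the article does not name the ASG.

```shell
# Sketch only: show start time, status and description of an ASG's scaling activities.
# The function name is hypothetical.
recent_scaling_activities() {
  aws autoscaling describe-scaling-activities \
    --auto-scaling-group-name "$1" \
    --query 'Activities[].[StartTime, StatusCode, Description]' \
    --output text
}

# Example call ("my-asg" is a placeholder name):
#   recent_scaling_activities my-asg
```

A launch activity with a StatusCode of Successful, rather than Failed, is the sign that the grant has taken effect.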
