documentation.pdf

Getting Started

Also see the README in the GitHub repository.

- Install Cloud Custodian
- Explore Cloud Custodian
- Cloud Provider Specific Help
- Monitor resources
- Tab Completion
- Community Resources

Install Cloud Custodian

These instructions will install Cloud Custodian. Cloud Custodian is a Python application that supports Python 3 on Linux, macOS and Windows. We recommend using at least the minimum supported version of Python.

If python3 --version shows a Python version that is not actively supported and the steps above don’t apply to your environment, you can still install a current release of Python manually. This guide may be a useful reference.

NOTE: Some Enterprise/Long Term Support Linux distributions may support Python longer than the upstream support period. If you are in this situation it might be prudent to contact your vendor to see if there are other supported ways to use a more recent version. In some cases using the Docker container might be an appropriate solution.

NOTE: Ensure you install the correct follow-on package depending on the cloud you are deploying to, otherwise you won’t have the right modules for that specific cloud.
Linux and Mac OS

To install Cloud Custodian:

    python3 -m venv custodian
    source custodian/bin/activate
    pip install c7n         # This includes AWS support

To install Cloud Custodian for Azure, you will also need to run:

    pip install c7n-azure   # Install Azure package

To install Cloud Custodian for GCP, you will also need to run:

    pip install c7n-gcp     # Install GCP Package

To install Cloud Custodian for Oracle Cloud Infrastructure (OCI), you will also need to run:

    pip install c7n-oci     # Install OCI Package

Windows (CMD/PowerShell)

To install Cloud Custodian, run:

    python3 -m venv custodian
    .\custodian\Scripts\Activate.ps1    # For Powershell users
    #.\custodian\Scripts\activate.bat   # Or use this for CMD users
    pip install c7n                     # This includes AWS support

To install Cloud Custodian for Azure, you will also need to run:

    pip install c7n-azure

To install Cloud Custodian for GCP, you will also need to run:

    pip install c7n-gcp

Docker

To install via Docker, run:

    docker pull cloudcustodian/c7n

You’ll need to export your cloud provider credentials to the container when executing. For example, if you’re using environment variables for provider credentials:

    docker run -it \
      -v $(pwd)/output:/home/custodian/output \
      -v $(pwd)/policy.yml:/home/custodian/policy.yml \
      --env-file [...]

[...] schema.json

Next install a YAML plug-in for your editor, like YAML for Visual Studio Code or coc-yaml for coc.nvim. Both plugins use the yaml-language-server under the hood.

You’ll then need to configure your plug-in to use the generated schema.json as the schema for your policy files. For example in Visual Studio Code, navigate to the settings for the YAML plug-in and under Schemas, edit configuration file and add the following schema configuration:

    "yaml.schemas": {
      "./schema.json": "*yml"
    },

Note the path to schema.json can be either relative or the full path.

You’ll now have completion and validation while authoring policies.
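The schema mapping above can also be merged into an existing settings object with a few lines of Python. This is only an illustrative sketch and not part of Cloud Custodian: the helper name add_yaml_schema is hypothetical, and it assumes the editor settings are already available as a plain dictionary (Visual Studio Code settings files may contain comments, which Python's json module does not accept).

```python
import json
from typing import Any, Dict


def add_yaml_schema(settings: Dict[str, Any],
                    schema_path: str = "./schema.json",
                    pattern: str = "*yml") -> Dict[str, Any]:
    """Return a copy of `settings` with schema_path mapped to pattern.

    Hypothetical helper: the key name "yaml.schemas" and the default
    values are taken from the configuration fragment in the text.
    """
    updated = dict(settings)
    # Copy the existing mapping so the caller's dictionary is not mutated.
    schemas = dict(updated.get("yaml.schemas", {}))
    schemas[schema_path] = pattern
    updated["yaml.schemas"] = schemas
    return updated


# Example: an existing settings object with one unrelated option.
existing = {"editor.tabSize": 2}
print(json.dumps(add_yaml_schema(existing), indent=2))
```

Existing entries are preserved, so the helper can be applied repeatedly without losing other schema mappings.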
Note if you’re authoring policies in json you can also configure the json-language-server for the same.

Also, if you’re seeing errors like 'Request textDocument/hover failed with message: Cannot read property '$ref' of null', try re-creating your schema.json file.

Tab Completion

To enable command-line tab completion for custodian on bash do the following one-time steps:

Run:

    activate-global-python-argcomplete

Now launch a new shell (or refresh your bash environment by sourcing the appropriate file).

Troubleshooting

If you get an error about “complete -D” not being supported, you need to update bash. See the “Base Version Compatability” note in the argcomplete docs.

If you have other errors, or for tcsh support, see the argcomplete docs.

If you are invoking custodian via the python executable tab completion will not work. You must invoke custodian directly.

Community Resources

We have a regular community meeting that is open to all users and developers of every skill level. Joining the mailing list will automatically send you a meeting invite. See the notes below for more technical information on joining the meeting.

- Community Meeting Videos
- Community Meeting Notes Archive

Generic Filters

The following filters can be applied to all policies for all resources. See the provider specific resource reference for additional information.

Value Filter

Cloud Custodian provides for a flexible query language on any resource by allowing for rich queries on JSON objects via JMESPath, and allows for mixing and combining those with boolean conditional operators that are nest-able. (Tutorial here on JMESPath syntax)

The base value filter enables the use of jmespath with data returned from a describe call.

    filters:
      - type: value
        key: "State"         ─▶ The value from the describe call
        value: "running"     ─▶ Value that is being filtered against

There are several ways to get a list of possible keys for each resource.
Via Custodian CLI

Create a new custodian yaml file with just the name and resource fields. Then run custodian run -s OUTPUT_DIR. The valid key fields can be found in the output directory in resources.json.

    policies:
      - name: my-first-policy
        resource: aws.ec2

Via Cloud Providers CLI

Use the relevant cloud provider cli to run the describe call to view all available keys. For example using aws cli run aws ec2 describe-instances or with azure az vm list.

Note: You do not need to include the outermost json field in most cases since custodian removes this field from the results.

Via Cloud Provider Documentation

Go to the relevant cloud provider sdk documentation and search for the describe api call for the resource you’re interested in. The available fields will be listed under the results of that api call.

Special Values

These meta-values can be used to test whether or not a resource contains a specific value, and if the value is empty.

- absent: matches when a key does not exist
- present: matches when a key does exist
- empty: matches when a value is false, empty, or missing
- not-null: matches when a value exists, and is not false or empty

Consider an S3 bucket with this abbreviated set of attributes:

    {
      "Name": "my_bucket",
      "Versioning": {},
      "Tags": [{
        "Environment": "dev",
        "Owner": ""
      }]
    }

All of the following filters would match this resource:

    filters:
      - "tag:Environment": "dev"
      - "tag:Environment": "not-null"
      - "tag:Environment": "present"
      - "tag:Owner": "empty"
      - "tag:Owner": "present"
      - "tag:Team": "empty"
      - "tag:Team": "absent"
      - "Versioning": "empty"
      - "Versioning": "present"
      - "Versioning.Status": "empty"
      - "Versioning.Status": "absent"

Comparison Operators

The generic value filter allows for comparison operators to be used:

- equal or eq
- not-equal or ne
- greater-than or gt
- gte or ge
- less-than or lt
- lte or le
- in
- not-in or ni
- contains

    filters:
      - type: value
        key: CpuOptions.CoreCount    ─▶ The value from the describe call
        value: 36                    ─▶ Value that is being compared
        op: greater-than             ─▶ Comparison Operator

Logical Operators

- or or Or
- and or And
- not

    filters:
      - or:                              ─▶ Logical Operator
        - type: value
          key: CpuOptions.CoreCount      ─▶ The value from the describe call
          value: 36                      ─▶ Value that is being compared
        - type: value
          key: CpuOptions.CoreCount      ─▶ The value from the describe call
          value: 42                      ─▶ Value that is being compared

List Operators

There is a collection of operators that can be used with user supplied lists. The operators are evaluated as value from key in (the operator) given value. If you would like it evaluated in the opposite way, given value in (the operator) value from key, then you can include the swap transformation or use the contains operator.

- in
- not-in or ni
- contains
- intersect - Match if two lists share any elements
- difference - Match if the first list has any values not in the second list

This filter only matches resources whose ImageId property appears in a predefined list:

    filters:
      - type: value
        key: ImageId                 ─▶ The value from the describe call
        op: in                       ─▶ List operator
        value: [ID-123, ID-321]      ─▶ List of Values to be compared against

Some resource properties are lists themselves. For example, EC2 instances can have multiple security groups. For the next few examples, assume the filters are evaluating three instances:

    Instance     Security Group Names
    instance1    default, common, custom
    instance2    common, custom, extra
    instance3    common

This filter matches instance1, whose security group list contains the default group:

    filters:
      - type: value
        key: SecurityGroups[].GroupName
        op: contains
        value: default

The difference operator can find instances with security groups that don’t appear in a predefined list.
This filter matches instance1 and instance2, because default and extra aren’t in the list of expected security groups:

    filters:
      - type: value
        key: SecurityGroups[].GroupName
        op: difference
        value:
          - common
          - custom

value_type: swap can invert that logic, checking to see if the predefined list has any values that don’t appear on an instance. This filter matches instance3, because it is missing the custom security group:

    filters:
      - type: value
        key: SecurityGroups[].GroupName
        op: difference
        value:
          - common
          - custom
        value_type: swap

Pattern Matching Operators

- glob - Provides Glob matching support
- regex - Provides Regex matching support but ignores case (1)
- regex-case - Provides case sensitive Regex matching support (1)

    filters:
      - type: value
        key: FunctionName                ─▶ The value from the describe call, or resources.json
        op: regex                        ─▶ Special operator
        value: '(custodian|c7n)_\w+'     ─▶ Regex string: match all values beginning with custodian_ or c7n_

      - type: value
        key: name                        ─▶ The value from the describe call, or resources.json
        op: regex                        ─▶ Special operator
        value: '^.*c7n.*$'               ─▶ Regex string: match all values containing c7n

      - type: value
        key: name                        ─▶ The value from the describe call, or resources.json
        op: regex                        ─▶ Special operator
        value: '^((?!c7n).)*$'           ─▶ Regex string: match all values not containing c7n

1. These operators are implemented using re.match. If a filter isn’t working as expected take a look at the re documentation.

Value Type Transformations

Transformations on the value can be done using the value_type keyword.
The following value types are supported:

- age - convert to a datetime (for past date comparisons)
- cidr - parse an ipaddress
- cidr_size - the length of the network prefix
- expiration - convert to a datetime (for future date comparisons)
- integer - convert the value to an integer
- normalize - convert the value to lowercase
- resource_count - compare against the number of matched resources
- size - the length of an element
- swap - swap the value and the evaluated key
- date - parse the filter’s value as a date.

Note that the age and expiration transformations expect a value given as a number of days. Use a floating point value to match time periods shorter than a day.

Examples:

    # Get the size of a group
    - type: value
      key: SecurityGroups[].GroupId
      value_type: size
      value: 2

    # Membership example using swap
    - type: value
      key: SecurityGroups[].GroupId
      value_type: swap
      op: in
      value: sg-49b87f44

    # Convert to integer before comparison
    - type: value
      key: tag:Count
      op: greater-than
      value_type: integer
      value: 0

    # Apply only to rds instances created after the given date
    - type: value
      key: InstanceCreateTime
      op: greater-than
      value_type: date
      value: "2019/05/01"

    # Find instances launched within the last 31 days
    - type: value
      key: LaunchTime
      op: less-than
      value_type: age
      value: 32

    # Find instances launched within the past 12 hours
    - type: value
      key: LaunchTime
      op: less-than
      value_type: age
      value: 0.5

    # Use `resource_count` to filter resources based on the number that matched
    # Note that no `key` is used for this value_type since it is matching on
    # the size of the list of resources and not a specific field.
    - type: value
      value_type: resource_count
      op: lt
      value: 2

    # This policy will use `intersect` op to compare rds instances subnet group list
    # against a user provided list of public subnets from a s3 txt file.
    - name: find-rds-on-public-subnets-using-s3-list
      comment: |
        The txt file needs to be in utf-8 no BOM format and contain one
        subnet per line in the file no quotes around the subnets either.
      resource: aws.rds
      filters:
        - type: value
          key: "DBSubnetGroup.Subnets[].SubnetIdentifier"
          op: intersect
          value_from:
            url: s3://cloud-custodianbucket/PublicSubnets.txt
            format: txt

    # This policy will compare rds instances subnet group list against a
    # inline user provided list of public subnets.
    - name: find-rds-on-public-subnets-using-inlinelist
      resource: aws.rds
      filters:
        - type: value
          key: "DBSubnetGroup.Subnets[].SubnetIdentifier"
          op: intersect
          value:
            - subnet-2a8374658
            - subnet-1b8474522
            - subnet-2d2736444

Additional JMESPath Functions

Cloud Custodian supports additional custom JMESPath functions, including:

- split(separator, input_string) -> list[str]: takes 2 arguments, the separator token as well as the input string. Returns a list of strings.

    policies:
      - name: copy-related-tag-with-split
        resource: aws.log-group
        filters:
          - type: value
            key: logGroupName
            value: "/aws/lambda/"
            op: in
            value_type: swap
        actions:
          - type: copy-related-tag
            resource: aws.lambda
            # split the log group's name to get the lambda function's name
            key: "split(`/`, logGroupName)[-1]"
            tags: "*"

Value Regex

When using a Value Filter, a value_regex can be specified. This means that the value used for comparison is the output from evaluating a regex on the value found on a resource using key.

The filter expects that there will be exactly one capturing group, however non-capturing groups can be specified as well, e.g. (?:newkey|oldkey).

Note that if the value regex does not find a match, it will return a None value.

In this example there is an expiration comparison, which needs a datetime, however the tag containing this information also has other data in it. By setting the value_regex to capture just the datetime part of the tag, the filter can be evaluated as normal.
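The extraction step described above can be sketched in plain Python. This is a rough model under stated assumptions, not Cloud Custodian's own code: the function name apply_value_regex and the sample tag string are hypothetical, and anchoring the pattern at the start of the value with re.match is an assumption based on how the regex operators are described elsewhere in this document.

```python
import re
from typing import Optional


def apply_value_regex(pattern: str, raw: Optional[str]) -> Optional[str]:
    """Return the text of the single capturing group, or None on no match.

    Hypothetical helper that mirrors the documented behaviour: exactly
    one capturing group is expected, non-capturing groups are allowed.
    """
    compiled = re.compile(pattern)
    # re counts only capturing groups, so (?:...) groups are not included.
    if compiled.groups != 1:
        raise ValueError("value_regex must contain exactly one capturing group")
    if raw is None:
        return None
    match = compiled.match(raw)
    return match.group(1) if match else None


# Hypothetical tag value that mixes a date with other data.
tag_value = "owner=data-team;delete_after=2030-01-31;tier=dev"
date_pattern = r".*delete_after=([0-9]{4}-[0-9]{2}-[0-9]{2}).*"

print(apply_value_regex(date_pattern, tag_value))   # 2030-01-31
print(apply_value_regex(date_pattern, "owner=x"))   # None
```

Only the captured date string is handed on to the comparison, which is what allows a date-based value_type to be applied to a tag that carries additional data.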
    # Find expiry from tag contents
    - type: value
      key: "tag:metadata"
      value_type: expiration
      value_regex: ".*delete_after=([0-9]{4}-[0-9]{2}-[0-9]{2}).*"
      op: less-than
      value: 0

Value From

value_from allows the use of external values in the Value Filter.

Retrieve values from a url.

Supports json, csv and line delimited text files and expressions to retrieve a subset of values.

Expression syntax:

- on json, a jmespath expr is evaluated
- on csv, an integer column or jmespath expr can be specified
- on csv2dict, a jmespath expr (the csv is parsed into a dictionary where the keys are the headers and the values are the remaining columns)

Text files are expected to be line delimited values.

Examples:

    value_from:
      url: s3://bucket/xyz/foo.json
      expr: [].AppId

    value_from:
      url: http://foobar.com/mydata
      format: json
      expr: Region."us-east-1"[].ImageId
      headers:
        authorization: my-token

    value_from:
      url: s3://bucket/abc/foo.csv
      format: csv2dict
      expr: key

    # inferred from extension
    format: [json, csv, csv2dict, txt]

Value Path

Retrieve values using JMESPath.

The filter expects that a properly formatted ‘string’ is passed containing a valid JMESPath. (Tutorial here on JMESPath syntax)

When using a Value Filter, a value_path can be specified. This means the value(s) the filter will compare against are calculated during the initialization of the filter.

Note that this option only pulls properties of the resource currently being filtered.

    - name: find-admins-with-user-roles
      resource: gcp.project
      filters:
        - type: iam-policy
          doc:
            key: bindings[?(role=='roles/admin')].members[]
            op: intersect
            value_path: bindings[?(role=='roles/user_access')].members[]

The iam-policy uses the generic Value Filter implementation. This implementation allows for the comparison of two separate lists of values within the same resource.

List Item Filter

The list-item filter makes it easier to evaluate resource properties that contain a list of values.
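As a rough mental model, and only under the assumption that list-item matches when at least one element of the list satisfies every condition given for it, the idea can be sketched in Python. The names below (any_item_matches, the sample lifecycle rules) are hypothetical and are not Cloud Custodian APIs.

```python
from typing import Any, Callable, Dict, Iterable, List

Item = Dict[str, Any]
Check = Callable[[Item], bool]


def any_item_matches(items: Iterable[Item], checks: List[Check]) -> bool:
    """True when a single item satisfies all checks at the same time."""
    return any(all(check(item) for check in checks) for item in items)


# Hypothetical S3 lifecycle rules: each condition is met by some rule,
# but no individual rule meets both conditions together.
rules = [
    {"Status": "Enabled", "AbortIncompleteMultipartUpload": None},
    {"Status": "Disabled",
     "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}},
]

checks = [
    lambda rule: rule.get("Status") == "Enabled",
    lambda rule: bool(
        (rule.get("AbortIncompleteMultipartUpload") or {}).get("DaysAfterInitiation")
    ),
]

print(any_item_matches(rules, checks))  # False
```

Evaluating the conditions per element is the point of the sketch: checking each property across the whole list separately would wrongly report a match here.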
Example 1: AWS ECS Task Definitions

AWS ECS task definitions include a list of container definitions. This policy matches a task definition if any of its container images reference an image from outside a given account and region:

    - name: find-task-def-not-using-registry
      resource: aws.ecs-task-definition
      filters:
        - not:
          - type: list-item
            key: containerDefinitions
            attrs:
              - not:
                - type: value
                  key: image
                  value: "${account_id}.dkr.ecr.us-east-2.amazonaws.com.*"
                  op: regex

That check is not possible with the value filter alone, because the regex operator cannot operate directly against a list.

Example 2: S3 Lifecycle Rules

S3 buckets can have lifecycle policies that include multiple rules. This policy matches buckets that are missing a rule for cleaning up incomplete multipart uploads.

    - name: s3-mpu-cleanup-not-configured
      resource: aws.s3
      filters:
        - not:
          - type: list-item
            key: Lifecycle.Rules[]
            attrs:
              - Status: Enabled
              - AbortIncompleteMultipartUpload.DaysAfterInitiation: not-null

Here the list-item filter ensures that we check a combination of multiple properties for each individual lifecycle rule.

Event Filter

Filter against a CloudWatch event JSON associated to a resource type. The list of possible keys are now from the cloudtrail event and not the describe resource call as is the case in the ValueFilter.

    - name: no-ec2-public-ips
      resource: aws.ec2
      mode:
        type: cloudtrail
        events:
          - RunInstances
      filters:
        - type: event
          # The key is a JMESPath Query of the event JSON from CloudWatch.
          key: "detail.requestParameters.networkInterfaceSet.items[]
          # The key expression returns a list. Combining "op: contains" with "value: true"
          # allows this filter to match if any network interface has a public IP address.
          op: contains
          value: true
      actions:
        - type: terminate
          force: true

Reduce Filter

The reduce filter lets you group, sort, and limit the number of resources to act on. Maybe you want to delete AMIs, but want to do it in small batches where you act on the oldest AMIs first.
Or maybe you want to do some chaos engineering and randomly select ec2 instances part of ASGs, but want to make sure no more than one instance per ASG is affected. This filter lets you do that.

This works using this process:

1. Group resources
2. Sort each group of resources
3. Select a number of resources in each group
4. Combine the resulting resources

Grouping resources

Resources are grouped based on the value extracted as defined by the group-by attribute. All resources not able to extract a value are placed in a group by themselves. This is also the case when group-by is not specified.

Sorting resources

Sorting of individual resources within a group is controlled by a combination of the sort-by and order attributes. sort-by determines which value to use to sort and order controls how they are sorted. For any resources with a null value, those are by default sorted last. You can optionally sort those first with the null-order attribute.

Note: if neither sort-by nor order is specified, no sorting is done.

Selecting resources

Once groups have been sorted, we can then apply rules to select a specific number of resources in each group. We first discard some resources and then limit the remaining set to a maximum count.

When the discard or discard-percent attributes are specified, we take the ordered resources in each group and discard the first discard-percent of them or discard absolute count, whichever is larger.

After discarding resources, we then limit the remaining set. limit-percent is applied first to reduce the number of resources to this percentage of the original. limit is then applied to allow for an absolute count. Resources are kept from the beginning of the list.

To explain this with an example, suppose you have 50 resources in a group with all of these set:

    discard: 5
    discard-percent: 20
    limit: 10
    limit-percent: 30

This would first discard the first 10 resources because 20 percent of 50 is 10, which is greater than 5.
You now have 40 resources left in the group and the limit settings are applied. 30% of 40 is 12, but limit is set to 10, which is lower, so the first 10 of the remaining are kept. If they were numbered #1-50, you’d have discarded 1-10, kept 11-20, and dropped the remaining 21-50.

If you had the following settings:

    discard-percent: 25
    limit-percent: 50

We’d discard the first 25% of 50 (12), then of the remaining 38 resources, we’d keep 50% of those (19). You’d end up with resources 13-31.

Now, some of these could eliminate all resources from a group. If you have 20 resources in one group and 5 in another and specify limit-percent = 10, you’ll get 2 resources from the first group and 0 resources from the second.

Combining resource groups

Once the groups have been modified, we now need to combine them back to one set of resources. Since the groups are determined by a JMESPath expression, we sort the groups first based on the order attribute the same way we sort within a group. After the groups are sorted, it’s a simple concatenation of resources.

Attributes

- group-by, sort-by - These are both defined the same way…
  Note: For simplicity, you can specify these as just a single string which is treated as the key.
  - key - The JMESPath expression to extract a value
  - value_regex - A regular expression with a single capture group that extracts a portion of the result of the key expression.
  - value_type - parse the value as one of the following:
    - string (default)
    - number
    - date
- order - controls how sorting is done
  - asc (default) - sort in ascending order based on key
  - desc - sort in descending order based on key
  - reverse - reverse the order of resources (ignores key)
  - randomize - randomize the order of resources (ignores key)
- null-order - when sorting, where to put resources that have a null value
  - last (default) - at the end of the list
  - first - at the start of the list
- discard - discard the first N resources within each group
- discard-percent - discard the first N percentage of resources within each group
- limit - select the first N resources within each group (after discards)
- limit-percent - select the first N percentage of resources within each group (after discards)

Examples

This example will select the longest running instance from each ASG, then randomly choose 10% of those, making sure to not affect more than 15 instances total, then terminate them.

    - name: chaos-engineering
      resource: aws.ec2
      filters:
        - "State.Name": "running"
        - "tag:aws:autoscaling:groupName": present
        - type: reduce
          group-by: "tag:aws:autoscaling:groupName"
          sort-by: "LaunchTime"
          order: asc
          limit: 1
        - type: reduce
          order: randomize
          limit: 15
          limit-percent: 10
      actions:
        - terminate

This example will delete old AMIs, but make sure to only do the top 10 based on age.

    - name: limited-ami-expiration
      resource: aws.ami
      filters:
        - type: image-age
          days: 180
          op: ge
        - type: reduce
          sort-by: "CreationDate"
          order: asc
          limit: 10
      actions:
        - deregister

This example simply sorts the resources by when they are marked for expiration. We use a date type because the tags might be in different date formats or are not text-sortable.
    - name: ami-expiration-by-expire-date
      resource: aws.ami
      filters:
        - type: value
          key: "tag:expire-after"
          value_type: age
          op: gt
          value: 0
        - type: reduce
          sort-by:
            key: "tag:expire-after"
            value_type: date
          order: asc
          limit: 10
      actions:
        - deregister

Generic Actions

The following actions can be applied to all policies for all resources. See the provider specific resource references.

Webhook Action

The webhook action allows invoking a webhook with information about your resources.

You may initiate a call per resource, or a call referencing a batch of resources. Additionally you may define the body and query string using JMESPath references to the resource or resource array.

JMESPath queries for query-params, headers and body will have access to the following data:

    {
      'account_id',
      'region',
      'execution_id',
      'execution_start',
      'policy',
      'resource',    ─▶ if Batch == false
      'resources',   ─▶ if Batch == true
    }

Examples:

    actions:
      - type: webhook
        url: http://foo.com?hook-id=123     ─▶ Call will default to POST
        query-params:                       ─▶ Additional query string query-params
          resource_name: resource.name      ─▶ Value is a JMESPath query into resource dictionary
          policy_name: policy.name

    actions:
      - type: webhook
        url: http://foo.com
        batch: true                         ─▶ Single call for full resource array
        body: 'resources[].name'            ─▶ JMESPath will reference array of resources
        query-params:
          count: 'resources[] | length(@)'  ─▶ Include resource count in query string
          static-value: '`foo`'             ─▶ JMESPath string literal in ticks

    actions:
      - type: webhook
        url: http://foo.com
        batch: true
        batch-size: 10
        method: POST
        headers:
          static-value: '`foo`'             ─▶ JMESPath string literal in ticks
        query-params:
          count: 'resources[] | length(@)'

Advanced Usage

- Running against multiple regions
- Reporting against multiple regions
- Adding custom fields to reports
- Limiting how many resources custodian affects

Running against multiple regions

By default Cloud Custodian determines the
region to run against in the following order:

- the --region flag
- the AWS_DEFAULT_REGION environment variable
- the region set in the ~/.aws/config file

It is possible to run policies against multiple regions by specifying the --region flag multiple times:

    custodian run -s out --region us-east-1 --region us-west-1 policy.yml

If a supplied region does not support the resource for a given policy that region will be skipped.

The special all keyword can be used in place of a region to specify the policy should run against all applicable regions for the policy’s resource:

    custodian run -s out --region all policy.yml

Note: when running reports against multiple regions the output is placed in a different directory than when running against a single region. See the multi-region reporting section below.

Reporting against multiple regions

When running against multiple regions the output files are placed in a different location than when running against a single region. When generating a report, specify multiple regions the same way as with the run command:

    custodian report -s out --region us-east-1 --region us-west-1 policy.yml

A region column will be added to reports generated that include multiple regions to indicate which region each row is from.

Conditional Policy Execution

Cloud Custodian can skip policies that are included in a policy file when running if the policy specifies conditions that aren’t met by the current environment.

The available environment keys are:

    Key          Description
    name         Name of the policy
    region       Region the policy is being evaluated in.
    resource     The resource type of the policy.
    account_id   The account id (subscription, project) the policy i
    provider     The name of the cloud provider (aws, azure, gcp,
    policy       The policy data as structure
    now          The current time
    account      When running in c7n-org, current account info p

As an example, one can set up policy conditions to only execute between a given set of dates.
    policies:
      # other compliance related policies that
      # should always be running...
      - name: holiday-break-stop
        description: |
          This policy will stop all EC2 instances
          if the current date is between 12-15-2018
          to 12-31-2018 when the policy is run.
          Use this in conjunction with a cron job
          to ensure that the environment is fully
          turned off during the break.
        resource: ec2
        conditions:
          - type: value
            key: now
            op: greater-than
            value_type: date
            value: "2018-12-15"
          - type: value
            key: now
            op: less-than
            value_type: date
            value: "2018-12-31"
        filters:
          - "tag:holiday-off-hours": present
        actions:
          - stop

      - name: holiday-break-start
        description: |
          This policy will start up all EC2 instances
          and only run on 1-1-2019.
        resource: ec2
        conditions:
          - type: value
            key: now
            value_type: date
            op: greater-than
            value: "2019-1-1"
          - type: value
            key: now
            value_type: date
            op: less-than
            value: "2019-1-1 23:59:59"
        filters:
          - "tag:holiday-off-hours": present
        actions:
          - start

If a policy is executing in a serverless mode, the above environment keys are evaluated during the deployment of the policy using type: value conditions (any type: event conditions are skipped).

The execution of the policy will evaluate these again, but will also include the triggering event. These events can be evaluated using a type: event condition. This is useful for cases where you have a more complex condition than can be handled by an event pattern expression, but you want to short-circuit the execution before it queries the resources.

For instance, the below example will only deploy the policy to the us-west-2 and us-east-2 regions. The policy will stop execution before querying any resources if the event looks like it was created by a service or automation identity matching a complex regular expression.

    policies:
      - name: ec2-auto-tag-creator
        description: Auto-tag Creator on EC2 if not set.
        resource: aws.ec2
        mode:
          type: cloudtrail
          events: RunInstances
        conditions:
          - type: value                ─▶ evaluated at deployment and execution
            key: region
            op: in
            value:
              - us-east-2
              - us-west-2
          - not:
            - type: event              ─▶ evaluated at execution only
              key: "detail.userIdentity.arn"
              op: regex-case
              value: '.* (CloudCustodian|Jenkins|AWS.*ServiceRole|LambdaFunction|\ |\/i-|\d{8,}$)'
        filters:
          - "tag:Creator": empty
        actions:
          - type: auto-tag-user
            tag: Creator

Limiting how many resources custodian affects

Custodian by default will operate on as many resources as exist within an environment that match a policy’s filters. Custodian also allows policy authors to stop policy execution if a policy affects more resources than expected, either as a number of resources or as a percentage of total extant resources.

    policies:
      - name: log-delete
        description: |
          This policy will delete all log groups
          that haven't been written to in 5 days.

          As a safety belt, it will stop execution
          if the number of log groups that would
          be affected is more than 5% of the total
          log groups in the account's region.
        resource: aws.log-group
        max-resources-percent: 5
        filters:
          - type: last-write
            days: 5

Max resources can also be specified as an absolute number using max-resources specified on a policy. When executing, if the limit is exceeded, policy execution is stopped before taking any actions:

    custodian run -s out policy.yml
    custodian.commands:ERROR policy: log-delete exceeded resource limit: 2.5% found: 1 total: 1

If metrics are being published (-m/--metrics) then an additional metric named ResourceCount will be published with the number of resources that matched the policy.

Max resources can also be specified as an object with an or or and operator if you would like both a resource percent and a resource amount enforced.

    policies:
      - name: log-delete
        description: |
          This policy will not execute if
          the resources affected are over 50% of
          the total resource type amount and that
          amount is over 20.
        resource: aws.log-group
        max-resources:
          percent: 50
          amount: 20
          op: and
        filters:
          - type: last-write
            days: 5
        actions:
          - delete

Adding custom fields to reports

Reports use a default set of fields that are resource-specific. To add other fields use the --field flag, which can be supplied multiple times. The syntax is: --field KEY=VALUE where KEY is the header name (what will print at the top of the column) and the VALUE is a JMESPath expression accessing the desired data:

    custodian report -s out --field Image=ImageId policy.yml

If hyphens or other special characters are present in the JMESPath it may require quoting, e.g.:

    custodian report -s . --field "AccessKey1LastRotated"='"c7n:credentialreport".access_keys.last_rotated' policy.yml

To remove the default fields and only add the desired ones, the --no-default-fields flag can be specified and then specific fields can be added in, e.g.:

    custodian report -s out --no-default-fields --field Image=ImageId policy.yml

Example tag compliance policy

In this sample policy we are filtering for EC2 instances that are: running, not part of an Auto Scaling Group (ASG), not already marked for an operation, have fewer than 10 tags, and are missing one or more of the required tags. Once Custodian has filtered the list, it will mark all EC2 instances that match the above criteria with a tag. That tag specifies an action that will take place at a certain time. This policy is one of three that are needed to manage tag compliance. The other two policies in this set are, 1) checking to see if the tags have been corrected before the four day period is up, and 2) performing the operation of stopping all instances with the status to be stopped on that particular day.
    - name: ec2-tag-compliance-mark
      resource: ec2
      comment: |
        Mark non-compliant, Non-ASG EC2 instances with stoppage in 4 days
      filters:
        - "State.Name": running                     # If instance is not running, skip
        - "tag:aws:autoscaling:groupName": absent   # If instance is part of an ASG, skip
        - "tag:c7n_status": absent                  # If instance already has a c7n_status, skip
        - type: tag-count                           # If instance has 10 tags, skip
        - or:                                       # If any of these tags are missing, then select instance
            - "tag:Owner": absent
            - "tag:CostCenter": absent
            - "tag:Project": absent
      actions:                                      # For selected instances, run this action
        - type: mark-for-op                         # Mark instance for operation
          tag: c7n_status                           # Use the "c7n_status" tag instead of the legacy default "maid_status"
          op: stop                                  # Stop instance
          days: 4                                   # After 4 days

Deployment

In this section we will cover a few different deployment options for Cloud Custodian.

Compliance as Code

When operating Cloud Custodian, it is highly recommended to treat the policy files as code, similar to Terraform or CloudFormation files. Cloud Custodian has a built-in dryrun mode and policy syntax validation which, when paired with an automated CI system, can help you release policies with confidence.

This tutorial assumes that you have working knowledge of GitHub, Git, Docker, and a continuous integration tool (Jenkins, Drone, Travis, etc.).

To begin, check your policy files into a source control management tool like GitHub. This allows us to version the policies and collaborate through git pull requests and issues. In this example, we will be setting up a new repo in GitHub.

First, set up a new repo in GitHub and grab the repository url.
You don’t need to add a README or any other files to it first.

    mkdir my-policies
    cd my-policies
    git init
    git remote add origin <repository url>
    touch policy.yml

Next, we’ll add a policy to our new policy.yml file.

    policies:
      - name: aws-vpcs
        resource: aws.vpc

Once you’ve added the policy to your policy file, stage the changes from your working directory and push them up to the remote:

    # this should show your policy.yml as an untracked file
    git status
    git add policy.yml
    git commit -m 'init my first policy'
    git push -u origin master

Once you’ve pushed your changes you should be able to see them inside of GitHub. Congratulations, you’re now ready to start automatically validating and testing your policies!

Continuous Integration of Policies

Next, enable a CI webhook back to your CI system of choice when pull requests targeting your master branch are opened or updated. This allows us to continuously test and validate the policies that are being modified. In this example, we will be using Microsoft Azure DevOps Pipelines.

First, navigate to https://azure.microsoft.com/en-us/services/devops/pipelines/, click the “Start pipelines free with Github” button, and follow the flow to connect your GitHub account with DevOps Pipelines.

Next, click on the Pipelines section in the left-hand sidebar and connect with GitHub. Once the pipeline is set up, we can add the following Azure DevOps configuration to our repo:

    trigger:
    - master

    jobs:
      - job: 'Validate'
        pool:
          vmImage: 'Ubuntu-16.04'
        steps:
          - checkout: self
          - task: UsePythonVersion@0
            displayName: "Set Python Version"
            inputs:
              versionSpec: '3.7'
              architecture: 'x64'
          - script: pip install --upgrade pip
            displayName: Upgrade pip
          - script: pip install c7n c7n_azure c7n_gcp
            displayName: Install custodian
          - script: custodian validate policy.yml
            displayName: Validate policy file

This configuration will install Cloud Custodian and validate the policy.yml file that we created in the previous step.
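A CI job can also run a quick structural pre-check before calling custodian validate. The following is a minimal sketch, not part of Cloud Custodian: it assumes the policy file has already been parsed into a Python dict (for example with a YAML library), and the function name check_policy_structure is hypothetical. It only checks the top-level shape shown in this guide (a policies list whose entries have a name and a resource) and does not replace custodian validate.

```python
from collections import Counter


def check_policy_structure(doc):
    """Return a list of human-readable problems found in a parsed policy file.

    Hypothetical helper: it only inspects the top-level shape and is no
    substitute for `custodian validate`.
    """
    policies = doc.get("policies") if isinstance(doc, dict) else None
    if not isinstance(policies, list):
        return ["top-level 'policies' key must be a list"]

    problems = []
    for index, policy in enumerate(policies):
        if not isinstance(policy, dict):
            problems.append(f"policy #{index} is not a mapping")
            continue
        # Every policy in this guide carries at least a name and a resource.
        for key in ("name", "resource"):
            if key not in policy:
                problems.append(f"policy #{index} is missing '{key}'")

    # Flag policy names that appear more than once in the same file.
    names = Counter(
        p["name"]
        for p in policies
        if isinstance(p, dict) and isinstance(p.get("name"), str)
    )
    for name, count in names.items():
        if count > 1:
            problems.append(f"policy name '{name}' is used {count} times")
    return problems


if __name__ == "__main__":
    example = {"policies": [{"name": "aws-vpcs", "resource": "aws.vpc"}]}
    for problem in check_policy_structure(example):
        print(problem)
```

An empty list means no structural problems were found; a CI step could fail the build whenever the list is non-empty.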
Finally, we can run the new policies against your cloud environment in dryrun mode. This mode only queries the resources and applies the filters to them; no actions are executed. Doing this allows you to assess the potential blast radius of a given policy change.

Setting up the automated dryrun of policies is left as an exercise to the user: it requires hosting your cloud authentication tokens inside of a CI system, or hosting your own CI system and using Managed Service Identities (Azure) or Instance Profiles (AWS).

It’s important to verify that the results of the dryrun match your expectations. Custodian is a very powerful tool that will do exactly what you tell it to do! In this case, you should always “measure twice, cut once”.

IAM Setup

To run Cloud Custodian against your account, you will need an IAM role with appropriate permissions. The permissions required depend on the scope of each policy and may differ from policy to policy. As a baseline, the managed read-only policies in each of the respective cloud providers are enough to dryrun your policies. Actions require additional IAM permissions, which should be added at your discretion.

For serverless policies, Custodian will need the corresponding permissions to provision serverless functions.
In AWS, you will need ReadOnly access as well as the following permissions:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "CustodianLambdaPermissions",
          "Effect": "Allow",
          "Action": [
            "cloudwatch:PutMetricData",
            "ec2:DescribeNetworkInterfaces",
            "ec2:DeleteNetworkInterface",
            "ec2:CreateNetworkInterface",
            "events:PutRule",
            "events:PutTargets",
            "iam:PassRole",
            "lambda:CreateFunction",
            "lambda:TagResource",
            "lambda:CreateEventSourceMapping",
            "lambda:UntagResource",
            "lambda:PutFunctionConcurrency",
            "lambda:DeleteFunction",
            "lambda:UpdateEventSourceMapping",
            "lambda:InvokeFunction",
            "lambda:UpdateFunctionConfiguration",
            "lambda:UpdateAlias",
            "lambda:UpdateFunctionCode",
            "lambda:AddPermission",
            "lambda:DeleteAlias",
            "lambda:DeleteFunctionConcurrency",
            "lambda:DeleteEventSourceMapping",
            "lambda:RemovePermission",
            "lambda:CreateAlias",
            "logs:CreateLogStream",
            "logs:PutLogEvents",
            "logs:CreateLogGroup"
          ],
          "Resource": "*"
        }
      ]
    }

Note: These are only the permissions needed to deploy Custodian Lambda functions; they are not the permissions required to run Custodian in the Lambda function. Those are defined by the role attribute in the policy or by the assume role used in the CLI.

Single Node Deployment

Now that your policies are stored and available in source control, you can fill in the next pieces of the puzzle to deploy. The simplest way to operate Cloud Custodian is to start by running it against a single account (or subscription or project) on a virtual machine.

To start, create a virtual machine on your cloud provider of choice. It’s recommended to execute Cloud Custodian in the same cloud provider that you are operating against, to avoid a hard dependency of one cloud on another and to be able to use your cloud’s best practices for credentials (instance profile, service account, etc).

Then, log into the instance and set up Custodian, following the instructions in the Install Cloud Custodian guide.
Once you have Cloud Custodian installed, download the policies that you created in the Compliance as Code section. If using git, simply do a git clone:

    git clone <repository url>

You now have your policies and custodian available on the instance. Typically, policies that query the extant resources in the account/project/subscription should be run on a regular basis to ensure that resources stay compliant. To do this you can set up a cron job to run custodian on a set cadence.

Monitoring Cloud Custodian

Cloud Custodian ships with the ability to emit metrics on policy execution and transport logs to cloud provider native logging solutions.

When executing Custodian, you can enable metrics by adding the -m flag and the cloud provider:

    # AWS
    custodian run -s output -m aws policy.yml

    # Azure
    custodian run -s output -m azure policy.yml

    # GCP
    custodian run -s output -m gcp policy.yml

When you enable metrics, a new namespace will be created and the following metrics will be recorded there:

    ResourceCount
    ResourceTime
    ActionTime

To enable logging to CloudWatch logs, Stackdriver, or Azure AppInsights, use the -l flag:

    # AWS CloudWatch Logs
    custodian run -s output -l /cloud-custodian/policies policy.yml

    # Azure App Insights Logs
    custodian run -s output -l azure://cloudcustodian/policies policy.yml

    # Stackdriver Logs
    custodian run -s output -l gcp://cloudcustodian/policies policy.yml

You can also store the output of your Custodian logs in a cloud provider’s blob storage like S3 or Azure Storage accounts:

    # AWS S3
    custodian run -s s3://my-custodian-bucket policy.yml

    # Azure Storage Accounts
    custodian run -s azure://my-custodian-storage-account policy.yml

Mailer and Notifications Deployment

For instructions on how to deploy the mailer for notifications, see c7n-mailer: Custodian Mailer.

Multi Account Execution

For more advanced setups, such as executing Custodian against multiple accounts, we distribute the tool c7n-org.
c7n-org uses an accounts configuration file and assume roles to operate against multiple accounts, projects, or subscriptions in parallel. More information can be found in c7n-org: Multi Account Custodian Execution.

Advanced Continuous Integration Tips

When policy files grow large, dryruns can take a long time to execute. In most cases, the only policies that actually need to be tested are the ones that changed. The following example will download the cloudcustodian/policystream image and generate a policy file containing only the policies that changed between the most recent commit and master.

    # in your git directory for policies
    docker pull cloudcustodian/policystream
    docker run -v $(pwd):/home/custodian/policies cloudcustodian/policystream > policystream-diff.yml
    custodian run -s output -v --dryrun policystream-diff.yml

After running your new policy file (policystream-diff.yml), the outputs will be stored in the output directory.

Additional Resources

manheim-c7n-tools - Manheim’s Cloud Custodian (c7n) wrapper package, policy generator/interpolator, runner, error scanner, and supporting tools.

Getting Started

Write your first policy

A policy specifies the following items:

    The type of resource to run the policy against
    Filters to narrow down the set of resources
    Actions to take on the filtered set of resources

For this tutorial, let’s stop all EC2 instances that are tagged with Custodian. To get started, go make an EC2 instance in your AWS console, and tag it with the key Custodian (any value). Also, make sure you have an access key handy.

Then, create a file named custodian.yml with this content:

    policies:
      - name: my-first-policy
        resource: aws.ec2
        filters:
          - "tag:Custodian": present

At this point, we have specified the following things:

1. The name of the policy
2. The resource type to query against, in this case (aws.ec2)
3. The filters list
4. The Custodian tag filter

Running this policy will not execute any actions, because the policy has no actions list. We can extend this example to stop the instances matched by the Custodian tag filter by specifying the stop action:

    policies:
      - name: my-first-policy
        resource: aws.ec2
        filters:
          - "tag:Custodian": present
        actions:
          - stop

Run your policy

Now, run Custodian:

    AWS_ACCESS_KEY_ID="foo" AWS_SECRET_ACCESS_KEY="bar" custodian run --output-dir=. custodian.yml

Note: If you already have AWS credentials configured for AWS CLI or SDK access, then you may omit providing them on the command line.

If successful, you should see output similar to the following on the command line:

    2016-12-20 08:35:06,133: custodian.policy:INFO Running policy my-first-policy resource: ec2 region:us-east-1 c7n:0.8.21.2
    2016-12-20 08:35:07,514: custodian.resources.ec2:INFO Filtered from 3 to 1 ec2
    2016-12-20 08:35:07,514: custodian.policy:INFO policy: my-first-policy resource:ec2 has count:1 time:1.38
    2016-12-20 08:35:07,515: custodian.actions:INFO Stop 1 of 1 instances
    2016-12-20 08:35:08,188: custodian.policy:INFO policy: my-first-policy action: stop resources: 1 execution_time: 0.67

You should also find a new my-first-policy directory with a log and other files (subsequent runs will append to the log by default rather than overwriting it). Lastly, you should find the instance stopping or stopped in your AWS console. Congratulations, and welcome to Custodian!

For an extended example of a policy’s structure, see the tag compliance policy, or browse all of our use case recipes.

A 2nd Example Policy

First, a role must be created with the appropriate permissions for custodian to act on the resources described in the example policies YAML given below. For convenience, an example policy is provided for this quick start guide. Customized AWS IAM policies will be necessary for your own custodian policies.

To implement the policy:

1. Open the AWS console
2. Navigate to IAM -> Policies
3. Use the json option to copy the example policy as a new AWS IAM Policy
4. Give the IAM policy a recognizable name and save it.
5. Navigate to IAM -> Roles and create a role called CloudCustodian-QuickStart
6. Assign the role the IAM policy created above.
7. With the prerequisite completed, you are ready to continue and run custodian.

A custodian policy file needs to be created in YAML format, for example:

    policies:
      - name: s3-cross-account
        description: |
          Checks S3 for buckets with cross-account access and
          removes the cross-account access.
        resource: s3
        conditions:
          - region: us-east-1
        filters:
          - type: cross-account
        actions:
          - type: remove-statements
            statement_ids: matched

      - name: ec2-require-non-public-and-encrypted-volumes
        resource: aws.ec2
        description: |
          Provision a lambda and cloud watch event target
          that looks at all new instances and terminates those with
          unencrypted volumes.
        mode:
          type: cloudtrail
          role: CloudCustodian-QuickStart
          events:
            - RunInstances
        filters:
          - type: ebs
            key: Encrypted
            value: false
        actions:
          - terminate

      - name: tag-compliance
        resource: aws.ec2
        description: |
          Schedule a resource that does not meet tag compliance policies
          to be stopped in four days.
        filters:
          - State.Name: running
          - "tag:Environment": absent
          - "tag:AppId": absent
          - or:
            - "tag:OwnerContact": absent
            - "tag:DeptID": absent
        actions:
          - type: mark-for-op
            op: stop
            days: 4

Given that, you can run Cloud Custodian with

    # Validate the configuration (note this happens by default on run)
    custodian validate policy.yml

    # Dryrun on the policies (no actions executed) to see what resources
    # match each policy.
    custodian run --dryrun -s out policy.yml

    # Run the policy
    custodian run -s out policy.yml

Monitor AWS

You can generate CloudWatch metrics by specifying the --metrics flag with the value aws:

    custodian run -s <output_directory> --metrics aws <policyfile>.yml

You can also upload Cloud Custodian logs to CloudWatch logs:

    custodian run --log-group=/cloud-custodian/<account>/<region> -s <output_directory> <policyfile>.yml

And you can output logs and resource records to S3:

    custodian run -s s3://<bucket>/<prefix> <policyfile>.yml

If Custodian is being run without Assume Roles, all output will be put into the same account. Custodian is built with the ability to be run from different accounts and leverage STS Role Assumption for cross-account access. Users can leverage the metrics that are being generated after each run by creating Custodian Dashboards in CloudWatch.

Troubleshooting & Tinkering

If you are not using the us-east-1 region, then you’ll need to specify that as well, either on the command line or in an environment variable:

    --region=us-west-1
    AWS_DEFAULT_REGION=us-west-1

Example Policies

These use cases provide examples of specific policies for individual AWS modules.
    Account - Login From Invalid IP Address
    Account - Detect Root Logins
    Account - Service Limit
    AMI - Stop EC2 using Unapproved AMIs
    AutoScaling Group - Verify ASGs have valid configurations
    AMI - ASG Garbage Collector
    ASG - Offhours Support
    Block New Resources In Non-Standard Regions
    DMS - DB Migration Service Endpoint - Enforce SSL
    EBS - Garbage Collect Unattached Volumes
    EBS - Create and Manage Snapshots
    EBS - Delete Unencrypted
    EC2 - auto-tag aws userName on resources
    EC2 - Modify Instance Metadata Options
    EC2 - Offhours Support
    EC2 - Old Instance Report
    EC2 - Power On For Scheduled Patching
    EC2 - Terminate Unpatchable Instances
    EIP - Garbage Collect Unattached Elastic IPs
    ELB - Delete New Internet-Facing ELBs
    ELB - Delete Unused Elastic Load Balancers
    ELB - SSL Blacklist
    ELB - SSL Whitelist
    IAM - Manage Whether A Specific IAM Policy is Attached to Roles
    Lambda - Notify On Lambda Errors
    Example offhours policy
    RDS - Delete Unused Databases With No Connections
    RDS - Terminate Unencrypted Public Instances
    S3 - Configure New Buckets Settings and Standards
    S3 - Block Public S3 Object ACLs
    S3 - Encryption
    S3 - Global Grants
    S3 - Add lifecycle policy on bucket delete
    SageMaker Notebook - Delete Public or Unencrypted
    Security Groups - add permission
    Security Groups - Detect and Remediate Violations
    Tag Compliance Across Resources (EC2, ASG, ELB, S3, etc)
    VPC - Flow Log Configuration Check
    VPC - Notify On Invalid External Peering Connections

Account - Login From Invalid IP Address

The following example policy will automatically create a Lambda function, triggered by a CloudWatch Event Rule, in your account and region. The function is triggered any time a user logs in from an invalid IP address. If the source IP address of the event is outside of the ranges provided in the policy, the cloud admins or security team are notified for further investigation.
Using the cloudtrail mode provides near-realtime auto-remediation (typically within 1-2 minutes of the event occurring). Such quick auto-remediation greatly reduces the attack window! Once notified, the cloud admins or security team can validate the login and, if it is not valid, revoke the login session and then change the password for, or disable, the compromised user.

In the example below the filter being applied is a regex and reads as follows: notify if the source IP address of the event is not from one of the valid IP CIDRs:

    158.103.0.0/16
    142.179.0.0/16
    187.39.0.0/16
    12.0.0.0/8

You can generate the regex for IP ranges on a site like: http://www.analyticsmarket.com/freetools/ipregex

    policies:
      - name: invalid-ip-address-login-detected
        resource: account
        description: |
          Notifies on invalid external IP console logins
        mode:
          type: cloudtrail
          events:
            - ConsoleLogin
        filters:
          - not:
            - type: event
              key: 'detail.sourceIPAddress'
              value: |
                '^((158\.103\.|142\.179\.|187\.39\.)([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])\.([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5]))|(12\.([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])\.([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])\.([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5]))$'
              op: regex
        actions:
          - type: notify
            template: default.html
            priority_header: 1
            subject: "Login From Invalid IP Detected [custodian {{ account }} - {{ region }}]"
            violation_desc: "A User Has Logged In Externally From A Invalid IP Address Outside The Company's Range:"
            action_desc: |
              "Please investigate and revoke the invalid session along
              with any other restrictive actions if appropriate"
            to:
              - [email protected]
              - [email protected]
            transport:
              type: sqs
              queue: https://sqs.us-east-1.amazonaws.com/12345678900/cloud-custodian-mailer
              region: us-east-1

Note that the notify action requires the cloud custodian mailer tool to be installed.
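Hand-written IP regexes are easy to get wrong, so it can help to cross-check sample addresses against the CIDR ranges listed for this policy. The following is a minimal sketch using Python's standard ipaddress module. It is not how Custodian evaluates the filter; the names VALID_CIDRS and is_valid_source_ip are hypothetical and exist only for this illustration.

```python
import ipaddress

# CIDR ranges copied from the policy description in this section.
VALID_CIDRS = [
    ipaddress.ip_network(cidr)
    for cidr in ("158.103.0.0/16", "142.179.0.0/16", "187.39.0.0/16", "12.0.0.0/8")
]


def is_valid_source_ip(source_ip):
    """Return True if source_ip falls inside one of the approved CIDR ranges."""
    try:
        address = ipaddress.ip_address(source_ip)
    except ValueError:
        # The value may not be a parsable address at all; this sketch
        # treats anything unparsable as not approved.
        return False
    return any(address in network for network in VALID_CIDRS)


if __name__ == "__main__":
    for candidate in ("158.103.4.5", "12.200.1.1", "8.8.8.8"):
        print(candidate, is_valid_source_ip(candidate))
```

Running a handful of known-good and known-bad addresses through a check like this, and through the regex, is a cheap way to confirm the two agree before deploying the policy.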
Account - Detect Root Logins

The following example policy will automatically create a Lambda function, triggered by a CloudWatch Event Rule, in your account and region. The function is triggered any time the root user of the account logs in. Typically the root user of an AWS account should never need to log in after the initial account setup, and root user access should be very tightly controlled with hardware MFA and other controls, as root has full control of everything in the account. Having visibility into if and when someone logs in as root is very important.

    policies:
      - name: root-user-login-detected
        resource: account
        description: |
          Notifies Security and Cloud Admins teams on any AWS root user console logins
        mode:
          type: cloudtrail
          events:
            - ConsoleLogin
        filters:
          - type: event
            key: "detail.userIdentity.type"
            value_type: swap
            op: in
            value: Root
        actions:
          - type: notify
            template: default.html
            priority_header: 1
            subject: "Root User Login Detected! [custodian {{ account }} - {{ region }}]"
            violation_desc: "A User Has Logged Into the AWS Console With The Root User:"
            action_desc: |
              Please investigate and if needed revoke the root users session along
              with any other restrictive actions if it's an unapproved root login
            to:
              - [email protected]
              - [email protected]
            transport:
              type: sqs
              queue: https://sqs.us-east-1.amazonaws.com/12345678900/cloud-custodian-mailer
              region: us-east-1

Note that the notify action requires the cloud custodian mailer tool to be installed.

Account - Service Limit

The following example policy will find any service in your region that is using more than 50% of its limit and request a 25% increase of that limit.
    policies:
      - name: account-service-limits
        resource: account
        filters:
          - type: service-limit
            threshold: 50
        actions:
          - type: request-limit-increase
            percent-increase: 25

Note that the threshold in the service-limit filter is an optional field. If it is not specified in the policy, the default value is 80.

As there are numerous services available in AWS, you have the option to specify the services you wish to include or exclude, thereby preventing prolonged execution times and unnecessary API calls. Use either of the attributes “include_service_codes” or “exclude_service_codes”. This special filter only works for aws.service-quota. An example is provided below.

    policies:
      - name: service-quota-usage
        resource: aws.service-quota
        query:
          - include_service_codes:
              - ec2

Global Services

Services like IAM are not region-based. Custodian will put the limit information only in us-east-1. When running the policy above in multiple regions, the limit of global services will ONLY be raised in us-east-1.

Additionally, if you want to target any of the global services in the policy, you will need to set the region to us-east-1 in the policy. Here is an example.
    policies:
      - name: account-service-limits
        resource: account
        conditions:
          - region: us-east-1
        filters:
          - type: service-limit
            services:
              - IAM
            threshold: 50

AMI - Stop EC2 using Unapproved AMIs

    - name: ec2-invalid-ami
      resource: ec2
      comment: |
        Find all running EC2 instances that are using invalid AMIs and stop them
      filters:
        - "State.Name": running
        - type: value
          key: ImageId
          op: in
          value:
            - ami-12324567  # Invalid
            - ami-12324567  # Invalid
            - ami-12324567  # Invalid
            - ami-12324567  # Invalid
            - ami-12324567  # Invalid
      actions:
        - stop

AutoScaling Group - Verify ASGs have valid configurations

The following example policy will check all AutoScaling Groups in the current account and region for configuration issues which could prevent the ASG from functioning properly or launching an instance. The ASG resource owner and a cloud admins group then get an email showing the affected ASG(s). The following ASG items are checked when using the invalid filter:

    invalid subnets
    invalid security groups
    invalid key pair name
    invalid launch config volume snapshots
    invalid AMIs
    invalid ELB health check

    policies:
      - name: asg-invalid-configuration
        resource: asg
        filters:
          - invalid
        actions:
          - type: notify
            template: default.html
            priority_header: 1
            subject: "ASG-Invalid Config-[custodian {{ account }} - {{ region }}]"
            violation_desc: |
              "New ASG instances may fail to launch or scale! The following Autoscaling
              Groups have invalid AMIs, SGs, KeyPairs, Launch Configs, or Health Checks"
            action_desc: |
              "Actions Taken: Notification Only.
              Please investigate and fix your ASGs configuration to prevent you
              from having any outages or issues"
            to:
              - [email protected]
              - resource-owner
            transport:
              type: sqs
              queue: https://sqs.us-east-1.amazonaws.com/12345678900/cloud-custodian-mailer
              region: us-east-1

Note that the notify action requires the cloud custodian mailer tool to be installed.

AMI - ASG Garbage Collector

The ASG garbage collector works as follows:

    Check if an ASG has MinSize = 0 and DesiredCapacity = 0
    Mark the ASG for a notify operation
    If the values have not changed when the operation comes due, Cloud Custodian sends an alert listing the ASGs

    - name: asg-mark-as-unused
      resource: asg
      comments: |
        Mark any unused ASG checking it every day.
      filters:
        - type: value
          key: MinSize
          value: 0
          op: eq
        - type: value
          key: DesiredCapacity
          value: 0
          op: eq
      actions:
        - type: mark-for-op
          op: notify
          days: 30

    - name: asg-unmark-as-unused
      resource: asg
      comments: |
        Unmark any ASG that has a value greater than 0.
      filters:
        - type: value
          key: DesiredCapacity
          op: greater-than
          value: 0
        - "tag:maid_status": not-null
      actions:
        - unmark

    - name: asg-slack-alert
      resource: asg
      comments: |
        Alert for ASG which have MinSize = 0 and DesiredCapacity = 0
      filters:
        - "tag:maid_status": not-null
        - type: marked-for-op
          op: notify
      actions:
        - type: notify
          slack_template: slack
          violation_desc: Having ASG with both (DesiredCapacity and MinSize) = 0.
          action_desc: Please investigate if you can delete this ASG.
          to:
            - https://hooks.slack.com/services/TXXXXX/XXXXXX/XXXxxXXX
          transport:
            type: sqs
            queue: https://sqs.us-east-1.amazonaws.com/12345678900/cloud-custodian-mailer
            region: us-east-1

ASG - Offhours Support

The following example policy will stop all ASGs with the custodian_downtime tag at 10pm daily and start them back up at 10am daily, leaving them off during weekends.
    policies:
      - name: offhour-stop-22
        resource: asg
        comments: |
          Daily stoppage at 10pm
        filters:
          - type: offhour
            tag: custodian_downtime
            offhour: 22
        actions:
          - suspend

      - name: onhour-start-10
        resource: asg
        comments: |
          Daily start at 10am
        filters:
          - type: onhour
            tag: custodian_downtime
            onhour: 10
        actions:
          - resume

For detailed information on offhours/onhours support and configuration, see Example offhours policy.

Block New Resources In Non-Standard Regions

The following are examples of Cloud Custodian policies which detect the region a resource is being launched in and delete the resource if it’s outside your standard approved regions. These examples block the full creation of resources launched outside of the us-east-1 and eu-west-1 regions and then email the event-owner (the person launching the resource) and the Cloud Team. This set of policies covers several of the common AWS services, but you may add your desired services if they are supported by Cloud Custodian.

While a proactive approach through IAM or AWS Organizations policies is the ideal way to go, that isn’t always possible or manageable for all users. These policies take a reactive approach and may be a fitting use case for some users. For the notify action to work you will need to have installed and configured the Cloud Custodian c7n-mailer tool.
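The policies in this section all share one check: the event's detail.awsRegion value must not be in the approved list. As a minimal sketch of that comparison (not Custodian's own implementation), the following Python function applies it to a CloudTrail-style event dict. The names APPROVED_REGIONS and is_outside_approved_regions are hypothetical, and the event shape is assumed only from the key path used in the filters.

```python
# Approved regions named in this section's policies.
APPROVED_REGIONS = frozenset({"us-east-1", "eu-west-1"})


def is_outside_approved_regions(event, approved=APPROVED_REGIONS):
    """Return True when the event's region is not in the approved set.

    Mirrors a `not-in` comparison on the key path detail.awsRegion.
    A missing region is treated as outside the approved set; that is a
    choice made by this sketch, not documented Custodian behaviour.
    """
    region = event.get("detail", {}).get("awsRegion")
    return region not in approved


if __name__ == "__main__":
    sample_event = {"detail": {"awsRegion": "ap-southeast-2"}}
    if is_outside_approved_regions(sample_event):
        print("resource launched outside the approved regions")
```

Keeping the approved regions in one place, as the frozenset does here, makes it easier to see that every policy in the set compares against the same list.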
    policies:

      - name: ec2-terminate-non-standard-region
        resource: ec2
        description: |
          Any EC2 instance launched in a non standard region outside
          of us-east-1 and eu-west-1 will be terminated
        mode:
          type: cloudtrail
          events:
            - RunInstances
        filters:
          - type: event
            key: "detail.awsRegion"
            op: not-in
            value:
              - us-east-1
              - eu-west-1
        actions:
          - type: terminate
            force: true
          - type: notify
            template: default.html
            priority_header: 1
            subject: "EC2 SERVER TERMINATED - Non-Standard Region [custodian {{ account }} - {{ region }}]"
            violation_desc: "Launching resources outside of the standard regions is prohibited"
            action_desc: "Actions Taken: Your new EC2 server has been terminated. Please relaunch the server in your account's standard region which is either eu-west-1 or us-east-1."
            to:
              - [email protected]
              - event-owner
            transport:
              type: sqs
              queue: https://sqs.us-east-1.amazonaws.com/XXXXXXXXXXX/cloud-custodian-mailer
              region: us-east-1

      - name: asg-terminate-non-standard-region
        resource: asg
        mode:
          type: cloudtrail
          events:
            - source: autoscaling.amazonaws.com
              event: CreateAutoScalingGroup
              ids: requestParameters.autoScalingGroupName
        description: |
          Detect when a new AutoScaling Group is created in a
          non-standard region and delete it and notify the customer
        filters:
          - type: event
            key: "detail.awsRegion"
            op: not-in
            value:
              - us-east-1
              - eu-west-1
        actions:
          - type: delete
            force: true
          - type: notify
            template: default.html
            priority_header: 1
            subject: "ASG TERMINATED - Non-Standard Region [custodian {{ account }} - {{ region }}]"
            violation_desc: "Launching resources outside of the standard regions is prohibited"
            action_desc: "Actions Taken: Your new ASG has been terminated. Please relaunch the ASG in your account's standard region which is either eu-west-1 or us-east-1."
            to:
              - [email protected]
              - event-owner
            transport:
              type: sqs
              queue: https://sqs.us-east-1.amazonaws.com/XXXXXXXXXXX/cloud-custodian-mailer
              region: us-east-1

      - name: app-elb-terminate-non-standard-region
        resource: app-elb
        mode:
          type: cloudtrail
          events:
            - source: "elasticloadbalancing.amazonaws.com"
              event: CreateLoadBalancer
              ids: "requestParameters.name"
        description: |
          Detect when a new Application Load Balancer Group is created in a
          non-standard region and delete it and notify the customer
        filters:
          - type: event
            key: "detail.awsRegion"
            op: not-in
            value:
              - us-east-1
              - eu-west-1
        actions:
          - type: delete
          - type: notify
            template: default.html
            priority_header: 1
            subject: "App ELB TERMINATED - Non-Standard Region [custodian {{ account }} - {{ region }}]"
            violation_desc: "Launching resources outside of the standard regions is prohibited"
            action_desc: "Actions Taken: Your new App ELB has been deleted. Please relaunch the App ELB in your account's standard region which is either eu-west-1 or us-east-1."
            to:
              - [email protected]
              - event-owner
            transport:
              type: sqs
              queue: https://sqs.us-east-1.amazonaws.com/XXXXXXXXXXX/cloud-custodian-mailer
              region: us-east-1

      - name: elb-terminate-non-standard-region
        resource: elb
        mode:
          type: cloudtrail
          events:
            - CreateLoadBalancer
        description: |
          Detect when a new Load Balancer is created in a
          non-standard region and delete it and notify the customer
        filters:
          - type: event
            key: "detail.awsRegion"
            op: not-in
            value:
              - us-east-1
              - eu-west-1
        actions:
          - type: delete
          - type: notify
            template: default.html
            priority_header: 1
            subject: "ELB TERMINATED - Non-Standard Region [custodian {{ account }} - {{ region }}]"
            violation_desc: "Launching resources outside of the standard regions is prohibited"
            action_desc: "Actions Taken: Your new ELB has been deleted. Please relaunch the ELB in your account's standard region which is either eu-west-1 or us-east-1."
            to:
              - [email protected]
              - event-owner
            transport:
              type: sqs
              queue: https://sqs.us-east-1.amazonaws.com/XXXXXXXXXXX/cloud-custodian-mailer
              region: us-east-1

      - name: es-terminate-non-standard-region
        resource: elasticsearch
        mode:
          type: cloudtrail
          events:
            - CreateElasticsearchDomain
        description: |
          Detect when a new Elasticsearch Domain is created in a
          non-standard region and delete it and notify the customer
        filters:
          - type: event
            key: "detail.awsRegion"
            op: not-in
            value:
              - us-east-1
              - eu-west-1
        actions:
          - delete
          - type: notify
            template: default.html
            priority_header: 1
            subject: "ES DOMAIN TERMINATED - Non-Standard Region [custodian {{ account }} - {{ region }}]"
            violation_desc: "Launching resources outside of the standard regions is prohibited"
            action_desc: "Actions Taken: Your new Elasticsearch Domain has been deleted. Please relaunch the Domain in your account's standard region which is either eu-west-1 or us-east-1."
            to:
              - [email protected]
              - event-owner
            transport:
              type: sqs
              queue: https://sqs.us-east-1.amazonaws.com/XXXXXXXXXXX/cloud-custodian-mailer
              region: us-east-1

      - name: lambda-terminate-non-standard-region
        resource: lambda
        mode:
          type: cloudtrail
          events:
            - source: lambda.amazonaws.com
              event: CreateFunction20150331
              ids: "requestParameters.functionName"
        description: |
          Detect when a new Lambda Function is created in a
          non-standard region and delete it and notify the customer
        filters:
          - type: event
            key: "detail.awsRegion"
            op: not-in
            value:
              - us-east-1
              - eu-west-1
          - not:
            - or:
              - type: value
                key: FunctionName
                op: regex
                value: ^(custodian?)\w+
        actions:
          - delete
          - type: notify
            template: default.html
            priority_header: 1
            subject: "LAMBDA DELETED - Non-Standard Region [custodian {{ account }} - {{ region }}]"
            violation_desc: "Launching resources outside of the standard regions is prohibited"
            action_desc: "Actions Taken: Your new Lambda Function has been deleted. Please relaunch in your account's standard region which is either eu-west-1 or us-east-1."
        to:
          - [email protected]
          - event-owner
        transport:
          type: sqs
          queue: https://sqs.us-east-1.amazonaws.com/XXXXXXXXXXX/cloud-custodian-mailer
          region: us-east-1

  - name: rds-terminate-non-standard-region
    resource: rds
    mode:
      type: cloudtrail
      events:
        - source: rds.amazonaws.com
          event: CreateDBInstance
          ids: "requestParameters.dBInstanceIdentifier"
    description: |
      Detect when a new RDS is created in a non-standard
      region and delete it and notify the customer
    filters:
      - type: event
        key: "detail.awsRegion"
        op: not-in
        value:
          - us-east-1
          - eu-west-1
    actions:
      - type: delete
        skip-snapshot: true
      - type: notify
        template: default.html
        priority_header: 1
        subject: "RDS DELETED - Non-Standard Region [custodian {{ account }} - {{ region }}]"
        violation_desc: "Launching resources outside of the standard regions is prohibited"
        action_desc: "Actions Taken: Your new RDS Database has been deleted. Please relaunch in your account's standard region which is either eu-west-1 or us-east-1."
        to:
          - [email protected]
          - event-owner
        transport:
          type: sqs
          queue: https://sqs.us-east-1.amazonaws.com/XXXXXXXXXXX/cloud-custodian-mailer
          region: us-east-1

  - name: rdscluster-terminate-non-standard-region
    resource: rds-cluster
    mode:
      type: cloudtrail
      events:
        - CreateCluster
    description: |
      Detect when a new RDS Cluster is created in a non-standard
      region and delete it and notify the customer
    filters:
      - type: event
        key: "detail.awsRegion"
        op: not-in
        value:
          - us-east-1
          - eu-west-1
    actions:
      - type: delete
        skip-snapshot: true
        delete-instances: true
      - type: notify
        template: default.html
        priority_header: 1
        subject: "RDS CLUSTER DELETED - Non-Standard Region [custodian {{ account }} - {{ region }}]"
        violation_desc: "Launching resources outside of the standard regions is prohibited"
        action_desc: "Actions Taken: Your new RDS Database Cluster has been deleted. Please relaunch in your account's standard region which is either eu-west-1 or us-east-1."
        to:
          - [email protected]
          - event-owner
        transport:
          type: sqs
          queue: https://sqs.us-east-1.amazonaws.com/XXXXXXXXXXX/cloud-custodian-mailer
          region: us-east-1

  - name: sg-terminate-non-standard-region
    resource: security-group
    mode:
      type: cloudtrail
      events:
        - source: ec2.amazonaws.com
          event: CreateSecurityGroup
          ids: "responseElements.groupId"
    description: |
      Detect when a new Security Group is created in a non-standard
      region and delete it and notify the customer
    filters:
      - type: event
        key: "detail.awsRegion"
        op: not-in
        value:
          - us-east-1
          - eu-west-1
    actions:
      - delete
      - type: notify
        template: default.html
        priority_header: 1
        subject: "SG DELETED - Non-Standard Region [custodian {{ account }} - {{ region }}]"
        violation_desc: "Launching resources outside of the standard regions is prohibited"
        action_desc: "Actions Taken: Your new Security Group has been deleted. Please recreate in your account's standard region which is either eu-west-1 or us-east-1."
        to:
          - [email protected]
          - event-owner
        transport:
          type: sqs
          queue: https://sqs.us-east-1.amazonaws.com/XXXXXXXXXXX/cloud-custodian-mailer
          region: us-east-1

  - name: ami-terminate-non-standard-region
    resource: ami
    mode:
      type: cloudtrail
      events:
        - source: "ec2.amazonaws.com"
          event: "CreateImage"
          ids: "responseElements.imageId"
    description: |
      Detect when a new Amazon Machine Image is created in a non-standard
      region and delete it and notify the customer
    filters:
      - type: event
        key: "detail.awsRegion"
        op: not-in
        value:
          - us-east-1
          - eu-west-1
    actions:
      - deregister
      - remove-launch-permissions
      - type: notify
        template: default.html
        priority_header: 1
        subject: "AMI DELETED - Non-Standard Region [custodian {{ account }} - {{ region }}]"
        violation_desc: "Launching resources outside of the standard regions is prohibited"
        action_desc: "Actions Taken: Your new Amazon Machine Image has been deleted. Please recreate in your account's standard region which is either eu-west-1 or us-east-1."
        to:
          - [email protected]
          - event-owner
        transport:
          type: sqs
          queue: https://sqs.us-east-1.amazonaws.com/XXXXXXXXXXX/cloud-custodian-mailer
          region: us-east-1

  - name: s3-terminate-non-standard-region
    resource: s3
    mode:
      type: cloudtrail
      events:
        - CreateBucket
      role: arn:aws:iam::{account_id}:role/Cloud_Custodian_Role
      timeout: 200
    description: |
      Detect when a new S3 Bucket is created in a non-standard
      region and delete it and notify the customer
    filters:
      - type: event
        key: "detail.awsRegion"
        op: not-in
        value:
          - us-east-1
          - eu-west-1
    actions:
      - type: delete
        remove-contents: true
      - type: notify
        template: default.html
        priority_header: 1
        subject: "S3 DELETED - Non-Standard Region [custodian {{ account }} - {{ region }}]"
        violation_desc: "Launching resources outside of the standard regions is prohibited"
        action_desc: "Actions Taken: Your new S3 Bucket has been deleted. Please recreate in your account's standard region which is either eu-west-1 or us-east-1."
        to:
          - [email protected]
          - event-owner
        transport:
          type: sqs
          queue: https://sqs.us-east-1.amazonaws.com/XXXXXXXXXXX/cloud-custodian-mailer
          region: us-east-1

  - name: dynamo-terminate-non-standard-region
    resource: dynamodb-table
    mode:
      type: cloudtrail
      events:
        - CreateTable
    description: |
      Detect when a new DynamoDB Table is created in a non-standard
      region and delete it and notify the customer
    filters:
      - type: event
        key: "detail.awsRegion"
        op: not-in
        value:
          - us-east-1
          - eu-west-1
    actions:
      - delete
      - type: notify
        template: default.html
        priority_header: 1
        subject: "DYNAMODB DELETED - Non-Standard Region [custodian {{ account }} - {{ region }}]"
        violation_desc: "Launching resources outside of the standard regions is prohibited"
        action_desc: "Actions Taken: Your new DynamoDB Table has been deleted. Please recreate in your account's standard region which is either eu-west-1 or us-east-1."
        to:
          - [email protected]
          - event-owner
        transport:
          type: sqs
          queue: https://sqs.us-east-1.amazonaws.com/XXXXXXXXXXX/cloud-custodian-mailer
          region: us-east-1

  - name: kinesis-terminate-non-standard-region
    resource: kinesis
    mode:
      type: cloudtrail
      events:
        - source: "kinesis.amazonaws.com"
          event: "CreateStream"
          ids: "requestParameters.streamName"
    description: |
      Detect when a new Kinesis Stream is created in a non-standard
      region and delete it and notify the customer
    filters:
      - type: event
        key: "detail.awsRegion"
        op: not-in
        value:
          - us-east-1
          - eu-west-1
    actions:
      - type: delete
      - type: notify
        template: default.html
        priority_header: 1
        subject: "KINESIS DELETED - Non-Standard Region [custodian {{ account }} - {{ region }}]"
        violation_desc: "Launching resources outside of the standard regions is prohibited"
        action_desc: "Actions Taken: Your new Kinesis Stream has been deleted. Please recreate in your account's standard region which is either eu-west-1 or us-east-1."
        to:
          - [email protected]
          - event-owner
        transport:
          type: sqs
          queue: https://sqs.us-east-1.amazonaws.com/XXXXXXXXXXX/cloud-custodian-mailer
          region: us-east-1

  - name: firehose-terminate-non-standard-region
    resource: firehose
    mode:
      type: cloudtrail
      events:
        - source: "firehose.amazonaws.com"
          event: "CreateDeliveryStream"
          ids: "requestParameters.deliveryStreamName"
    description: |
      Detect when a new Firehose is created in a non-standard
      region and delete it and notify the customer
    filters:
      - type: event
        key: "detail.awsRegion"
        op: not-in
        value:
          - us-east-1
          - eu-west-1
    actions:
      - type: delete
      - type: notify
        template: default.html
        priority_header: 1
        subject: "FIREHOSE DELETED - Non-Standard Region [custodian {{ account }} - {{ region }}]"
        violation_desc: "Launching resources outside of the standard regions is prohibited"
        action_desc: "Actions Taken: Your new Firehose has been deleted. Please recreate in your account's standard region which is either eu-west-1 or us-east-1."
        to:
          - [email protected]
          - event-owner
        transport:
          type: sqs
          queue: https://sqs.us-east-1.amazonaws.com/XXXXXXXXXXX/cloud-custodian-mailer
          region: us-east-1

DMS - DB Migration Service Endpoint - Enforce SSL

The following example policies enforce SSL connectivity on any new or modified DMS endpoints. The supported SSL methods vary by database engine. See https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Security.SSL.html for more information.

There are two policies to handle the different types of SSL. With sqlserver, mongodb, and postgres you can set the SSL mode to require without passing in a certificate. Most other database engines require the ARN of a CA certificate, which makes them difficult to automate in a c7n policy, so this example policy deletes those endpoints instead. DMS certificate ARNs are unique per account and region, which is why multi-account policy runs would not work. Both policies trigger on the creation or modification of any DMS endpoint, so if a user tries to disable SSL, the policies either re-enable SSL or delete the user's endpoint and email them, depending on the SSL modes the engine supports.
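The split between the two policies can be pictured as a small decision function. This is only a sketch of the reasoning in the paragraph above, not code from c7n: `planned_action` and its return labels are hypothetical names, and the engine lists are the ones used in the example policies.

```python
# Engines that accept SslMode "require" without a certificate
# (listed in the description above).
REQUIRE_WITHOUT_CERT = {"sqlserver", "mongodb", "postgres"}

# Engines the second example policy matches; these need a CA certificate ARN.
NEEDS_CA_CERT = {"aurora", "mariadb", "mysql", "sybase", "oracle"}


def planned_action(engine_name: str, ssl_mode: str) -> str:
    """Return which remediation the example policies would apply.

    Hypothetical helper for illustration only; it is not part of c7n.
    """
    if ssl_mode != "none":
        return "compliant"  # SSL is already on, neither policy matches
    if engine_name in REQUIRE_WITHOUT_CERT:
        return "modify-endpoint"  # first policy: set SslMode to require
    if engine_name in NEEDS_CA_CERT:
        return "delete-and-notify"  # second policy: delete and email the owner
    return "unmatched"  # engine not covered by either example policy


if __name__ == "__main__":
    for engine in ("postgres", "mysql"):
        print(engine, "->", planned_action(engine, "none"))
```

Any engine outside both lists falls through to "unmatched", which mirrors the fact that the example policies only filter on the engines they name.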
For the notify action in the second policy to work you must have set up the c7n_mailer tool: https://github.com/cloud-custodian/cloud-custodian/tree/master/tools/c7n_mailer

policies:
  - name: dms-endpoint-enable-ssl-require-realtime
    resource: dms-endpoint
    description: |
      If the SSL Mode is none for a DMS Endpoint with engine of sql, mongo,
      or postgres it gets turned on to Require SSL setting
    mode:
      type: cloudtrail
      events:
        - source: dms.amazonaws.com
          event: CreateEndpoint
          ids: "responseElements.endpoint.endpointArn"
        - source: dms.amazonaws.com
          event: ModifyEndpoint
          ids: "responseElements.endpoint.endpointArn"
    filters:
      - or:
          - SslMode: none
          - type: event
            key: "detail.requestParameters.sslMode"
            op: eq
            value: "none"
      - or:
          - EngineName: sqlserver
          - EngineName: mongodb
          - EngineName: postgres
    actions:
      - type: modify-endpoint
        SslMode: require

  - name: dms-delete-endpoint-missing-ssl-ca-cert-realtime
    resource: dms-endpoint
    description: |
      If the SSL Mode is none for a DMS Endpoint with engine that is not
      one of sql, mongo, or postgres the endpoint is deleted and an email
      is sent stating that CA Certificates need to be used as a requirement
    mode:
      type: cloudtrail
      events:
        - source: dms.amazonaws.com
          event: CreateEndpoint
          ids: "responseElements.endpoint.endpointArn"
        - source: dms.amazonaws.com
          event: ModifyEndpoint
          ids: "responseElements.endpoint.endpointArn"
    filters:
      - or:
          - SslMode: none
          - type: event
            key: "detail.requestParameters.sslMode"
            op: eq
            value: "none"
      - or:
          - EngineName: aurora
          - EngineName: mariadb
          - EngineName: mysql
          - EngineName: sybase
          - EngineName: oracle
    actions:
      - delete
      - type: notify
        template: default.html
        priority_header: 1
        subject: "DMS Endpoint Deleted As It's Non-Compliant! - [custodian {{ account }} - {{ region }}]"
        violation_desc: |
          Per regulations all DMS Endpoints have to use SSL connections and
          your endpoint was set up as 'none' for SSL mode!
        action_desc: |
          Actions Taken: You are required to enable SSL on your endpoint
          for a secure transmission of data.
          This incident has been reported and the invalid endpoint has been
          deleted. Please launch a new endpoint using SSL
        to:
          - [email protected]
          - resource-owner
          - event-owner
        transport:
          type: sqs
          queue: https://sqs.us-east-1.amazonaws.com/123456789012/cloud-custodian-mailer
          region: us-east-1

EBS - Garbage Collect Unattached Volumes

Use the mark-for-op action to mark a resource for action later. One common pattern is to mark a resource with an operation (for example, delete) in n days. In the days leading up to the marked date, run an unmark or untag policy if the resource has become compliant in the meantime.

You can use this principle to implement garbage collection on resources. In this example, Custodian first marks all unattached EBS volumes for deletion. The next policy then unmarks any volume that has been attached and has the maid_status tag, indicating that it had been previously marked. Finally, the third policy filters in any resources that have been marked and runs the delete action.

The mark operation only tags the resource with metadata about the upcoming operation, so the delete policy still has to be executed on or after the day specified in the tag; otherwise the resource will continue to exist in the account.

Note: all resources that are marked-for-op up to and including the current date will be filtered in when utilizing the marked-for-op filter.

policies:
  - name: ebs-mark-unattached-deletion
    resource: ebs
    comments: |
      Mark any unattached EBS volumes for deletion in 30 days.
      Volumes set to not delete on instance termination do have
      valid use cases as data drives, but 99% of the time they
      appear to be just garbage creation.
    filters:
      - Attachments: []
      - "tag:maid_status": absent
    actions:
      - type: mark-for-op
        op: delete
        days: 30

  - name: ebs-unmark-attached-deletion
    resource: ebs
    comments: |
      Unmark any EBS volumes that were scheduled for deletion
      if they are currently attached
    filters:
      - type: value
        key: "Attachments.Device"
        value: not-null
      - "tag:maid_status": not-null
    actions:
      - unmark

  - name: ebs-delete-marked
    resource: ebs
    comments: |
      Delete any EBS volumes that were scheduled for deletion
    filters:
      - type: marked-for-op
        op: delete
    actions:
      - delete

EBS - Create and Manage Snapshots

The following example policy snapshots all EBS volumes attached to EC2 instances and copies the instance's tags to the snapshot. When the snapshots are 7 days old they are deleted, so you always have a rolling 7 days' worth of snapshots.

policies:
  - name: ec2-create-ebs-snapshots
    resource: ec2
    actions:
      - type: snapshot
        copy-tags:
          - CreatorName
          - "Resource Contact"
          - "Resource Purpose"
          - Environment
          - "Billing Cost Center"
          - Name
        tags:
          CloudCustodian: true

  - name: ebs-delete-old-ebs-snapshots
    resource: ebs-snapshot
    filters:
      - type: age
        days: 7
        op: ge
      - "tag:custodian_snapshot": present
    actions:
      - delete

EBS - Delete Unencrypted

policies:
  - name: terminate-unencrypted-ebs
    description: |
      Terminate all unencrypted EBS volumes upon creation
    resource: ebs
    mode:
      type: cloudtrail
      events:
        - CreateVolume
    filters:
      - Encrypted: false
    actions:
      - delete

EC2 - auto-tag aws userName on resources

Note that this can work for other resources besides EC2, and the principalId is optional. The principalId tag is useful if you want to prevent users from shutting down each other's VMs unless their principalId matches (meaning they originally spun up the resource).
Documentation about principalId here: https://aws.amazon.com/blogs/security/how-to-automatically-tag-amazon-ec2-resources-in-response-to-api-events/

policies:
  - name: ec2-auto-tag-user
    resource: ec2
    mode:
      type: cloudtrail
      role: arn:aws:iam::{account_id}:role/custodian-auto-tagger
      # note {account_id} is optional. If you put that there instead of
      # your actual account number, when the policy is provisioned it
      # will automatically inherit the account_id properly
      events:
        - RunInstances
    filters:
      - tag:CreatorName: absent
    actions:
      - type: auto-tag-user
        tag: CreatorName
        principal_id_tag: CreatorId

EC2 - Modify Instance Metadata Options

The following examples allow you to enforce instance metadata options on EC2 instances. To learn more about instance metadata options, please visit: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_ModifyInstanceMetadataOptions.html

To filter the list of instances you can choose any combination of the EC2 instance metadata option elements. The following options are currently available:

HttpEndpoint
  Valid values: disabled | enabled
  Action value: endpoint
HttpPutResponseHopLimit
  Possible values: integers from 1 to 64
  Action value: HttpPutResponseHopLimit
HttpTokens
  Valid values: optional | required
  Action value: tokens
InstanceMetadataTags
  Valid values: disabled | enabled
  Action value: metadata-tags

Examples:

policies:
  - name: ec2-require-imdsv2
    resource: ec2
    description: |
      Finds all instances with optional HttpTokens and changes the
      policy to Required.
    filters:
      - MetadataOptions.HttpTokens: optional
    actions:
      - type: set-metadata-access
        tokens: required

policies:
  - name: ec2-disable-imds
    resource: ec2
    description: |
      Finds all instances with HttpEndpoint enabled and changes it to
      disabled. This option is enabled by default, so make sure it is
      safe to disable before applying this policy.
    filters:
      - MetadataOptions.HttpEndpoint: enabled
    actions:
      - type: set-metadata-access
        endpoint: disabled

policies:
  - name: ec2-enable-metadata-tags
    resource: ec2
    description: |
      Finds all instances with instance metadata tags disabled and
      enables them.
    filters:
      - MetadataOptions.InstanceMetadataTags: disabled
    actions:
      - type: set-metadata-access
        metadata-tags: enabled

Instance Metadata Tags reference: https://amzn.to/2XOuxpQ
Custodian Filters reference: https://cloudcustodian.github.io/cloud-custodian/docs/filters.html

EC2 - Offhours Support

Offhours are based on the current time of the machine that is running Custodian. Note that in this case you could tag an instance with the following two tags: StopAfterHours: off=(M-F,18);tz=est; and StartAfterHours: on=(M-F,8). This would have the instance turn off every weekday at 6pm NY time and turn on every weekday at 8am California time (since if no tz is set, it uses the default, which is pt).

When Custodian runs, if it is anywhere from 6:00pm to 6:59pm NY time, it will shut down the VM you tagged this way. The key is the hour integer on the NY clock matching 18. If Custodian runs at 5:59pm or 7:00pm NY time, it won't shut down the VM. The same idea applies to starting.

The stop policy only filters in instances older than 1 hour. If a developer is on a VM that is shut down by the offhours schedule and turns it back on, a later Custodian run in the same hour should not shut the VM down on them again.

policies:
  - name: stop-after-hours
    resource: ec2
    filters:
      - type: offhour
        tag: CustodianOffHours
        default_tz: pt
        offhour: 19
      - type: instance-age
        hours: 1
    actions:
      - stop

  - name: start-after-hours
    resource: ec2
    filters:
      - type: onhour
        tag: CustodianOffHours
        default_tz: pt
        onhour: 7
      - type: value
        value: 1
        key: LaunchTime
        op: less-than
        value_type: age
    actions:
      - start

For detailed information on offhours/onhours support and configuration, see Example offhours policy.
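The hour-matching rule described above can be sketched in a few lines of Python. This is a simplified illustration, not Custodian's own parser: `parse_entry` and `should_act` are hypothetical helpers, only the single `(days,hour)` form from the example tags is handled, the day letters `M T W H F S U` are assumed, and the caller is assumed to pass a datetime that is already expressed in the tag's timezone.

```python
import re
from datetime import datetime

# Day letters, Monday first, so the index lines up with datetime.weekday().
# Assumed lettering for this sketch: M T W H F S U.
DAY_LETTERS = "MTWHFSU"


def parse_entry(tag_value: str, kind: str):
    """Return (weekdays, hour) for kind 'on' or 'off', or None if absent.

    Handles only the single "(days,hour)" form, e.g. "off=(M-F,18)".
    """
    pattern = rf"\b{kind}=\s*\((\w)(?:-(\w))?,\s*(\d{{1,2}})\)"
    match = re.search(pattern, tag_value)
    if match is None:
        return None
    first, last, hour = match.groups()
    start = DAY_LETTERS.index(first)
    end = DAY_LETTERS.index(last or first)
    return set(range(start, end + 1)), int(hour)


def should_act(tag_value: str, kind: str, local_now: datetime) -> bool:
    """True when the weekday matches and the clock hour equals the tag hour.

    local_now must already be in the timezone named by the tag.
    """
    entry = parse_entry(tag_value, kind)
    if entry is None:
        return False
    days, hour = entry
    return local_now.weekday() in days and local_now.hour == hour


if __name__ == "__main__":
    tag = "off=(M-F,18);tz=est;"
    # Wednesday 3 January 2024, at several NY wall-clock times.
    for hh, mm in ((17, 59), (18, 0), (18, 59), (19, 0)):
        print(f"{hh}:{mm:02d}", should_act(tag, "off", datetime(2024, 1, 3, hh, mm)))
```

Because only the hour integer is compared, any run between 18:00 and 18:59 matches, while 17:59 and 19:00 do not, which is why the schedule on which Custodian itself runs matters.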
EC2 - Old Instance Report

policies:
  - name: ec2-old-instances
    resource: ec2
    comment: |
      Report running instances older than 60 days
    filters:
      - "State.Name": running
      - type: instance-age
        days: 60

  # Use Case: Report all AMIs that are 120+ days or older
  - name: ancient-images-report
    resource: ami
    comment: |
      Report on all images older than 120 days which should be de-registered.
    filters:
      - type: image-age
        days: 120

Instance Age Filter

The instance age filter allows for filtering the set of EC2 instances by their LaunchTime, e.g. all instances older than 60 or 90 days. The default value is 60 days if not otherwise specified. The following policy configures a specific value for instance-age to report all instances older than 90 days.

policies:
  - name: old-instances
    resource: ec2
    filters:
      - type: instance-age
        days: 90

EC2 - Power On For Scheduled Patching

The following example policies automatically create Lambda functions in your account and region that are triggered by CloudWatch cron or rate expressions. The Lambda functions run on the schedule expression you provide in the mode section of the policy.

The example policies find all EC2 instances that are both in a stopped state and have a tag called Patch Group with a value of Linux Dev. Those instances are then started and tagged with an additional tag of PowerOffWhenDone with a value of True so that they can be stopped again after the patching window. Then all instances with the Linux Dev Patch Group get another tag called PatchingInProgress with a value of True. The PatchingInProgress tag can be used by other policies, such as offhours policies, where the presence of that tag excludes the instance from being stopped by the offhours schedule.
When the patching window is done, the last two policies in this example remove the PatchingInProgress tag from all instances in that group, then remove the PowerOffWhenDone tag from, and stop, those instances that were previously stopped.

The cron expressions in this example read as follows: cron(0 3 ? 1/1 SUN#1 *) means trigger on the first Sunday of every month at 3:00 UTC, and cron(0 13 ? 1/1 SUN#1 *) is the same day at 13:00 UTC, which allows for a 10-hour patching window. Learn more about AWS cron and rate expressions at https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html

policies:
  - name: power-on-patch-group-linux-dev
    resource: ec2
    mode:
      type: periodic
      schedule: "cron(0 3 ? 1/1 SUN#1 *)"
    filters:
      - "State.Name": stopped
      - type: value
        key: "tag:Patch Group"
        op: eq
        value: "Linux Dev"
    actions:
      - start
      - type: tag
        key: PowerOffWhenDone
        value: "True"

  - name: patching-exception-tag-linux-dev
    resource: ec2
    mode:
      type: periodic
      schedule: "cron(0 3 ? 1/1 SUN#1 *)"
    filters:
      - type: value
        key: "tag:Patch Group"
        op: eq
        value: "Linux Dev"
    actions:
      - type: tag
        key: PatchingInProgress
        value: "True"

  - name: patching-exception-removal-linux-dev
    resource: ec2
    mode:
      type: periodic
      schedule: "cron(0 13 ? 1/1 SUN#1 *)"
    filters:
      - type: value
        key: "tag:Patch Group"
        op: eq
        value: "Linux Dev"
    actions:
      - type: unmark
        tags: ["PatchingInProgress"]

  - name: power-down-patch-group-linux-dev
    resource: ec2
    mode:
      type: periodic
      schedule: "cron(0 13 ? 1/1 SUN#1 *)"
    filters:
      - "State.Name": running
      - "tag:PowerOffWhenDone": present
      - type: value
        key: "tag:Patch Group"
        op: eq
        value: "Linux Dev"
    actions:
      - stop
      - type: unmark
        tags: ["PowerOffWhenDone"]

Note that the notify action requires the Cloud Custodian mailer tool to be installed.
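To check when the two cron expressions above actually fire in a given month, the first-Sunday rule can be worked out with the standard library. This is a small sketch for sanity-checking a schedule, not something Custodian or AWS provides; `first_sunday` and `patch_window` are hypothetical helper names, and the 3:00 and 13:00 UTC hours are taken from the example expressions.

```python
from datetime import datetime, timedelta, timezone


def first_sunday(year: int, month: int) -> datetime:
    """Midnight UTC on the first Sunday of the month (the SUN#1 part)."""
    first_day = datetime(year, month, 1, tzinfo=timezone.utc)
    # weekday(): Monday is 0 and Sunday is 6.
    days_until_sunday = (6 - first_day.weekday()) % 7
    return first_day + timedelta(days=days_until_sunday)


def patch_window(year: int, month: int, start_hour: int = 3, end_hour: int = 13):
    """Return (start, end) of the patching window in UTC for one month."""
    day = first_sunday(year, month)
    return day.replace(hour=start_hour), day.replace(hour=end_hour)


if __name__ == "__main__":
    start, end = patch_window(2024, 1)
    print("power on :", start.isoformat())
    print("power off:", end.isoformat())
    print("window   :", end - start)
```

Changing either hour in the cron expressions changes the window length, so it is worth re-running a check like this whenever the schedules are edited.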
EC2 - Terminate Unpatchable Instances

The following example policy workflow uses the mark-for-op action and the marked-for-op filter to chain together a set of policies to accomplish a task. In this example it finds and tags any instances that are in a stopped state. The example specifies a custom tag called c7n_stopped_instance, and the value of the tag is an op action of terminate 60 days in the future. The reasoning behind terminating unpatchable instances is that after 60 days the instance will be far enough behind on patching and virus definitions (if used) that starting it would present too large a security risk. Note the use of the skew option with the marked-for-op filter in some of the policies to notify the resource owners X number of days ahead of the scheduled marked-for-op action date.

policies:
  - name: ec2-mark-stopped-instance
    resource: ec2
    description: |
      Mark any stopped ec2 instance for deletion in 60 days.
      If an instance has not been started for 60 days or over
      then it will be deleted, similar to internal policies,
      as it won't be patched.
    filters:
      - "tag:c7n_stopped_instance": absent
      - "State.Name": stopped
    actions:
      - type: mark-for-op
        tag: c7n_stopped_instance
        op: terminate
        days: 60

  - name: ec2-unmark-previously-stopped
    resource: ec2
    description: |
      Unmark/untag any ec2 instance that was scheduled for deletion
      due to being stopped if they are currently running.
    filters:
      - "State.Name": running
      - "tag:c7n_stopped_instance": present
    actions:
      - type: unmark
        tags: ["c7n_stopped_instance"]

  - name: ec2-notify-before-delete-marked-14-days
    resource: ec2
    description: |
      Notify on any ec2 instances that will be deleted in 14 days
      if not started
    comments: |
      Your EC2 server will be terminated in 14 days if not started
      and patched by then. Please start your stopped servers and
      leave them on for 24 hours minimum to allow for patching to occur.
    filters:
      - type: marked-for-op
        tag: c7n_stopped_instance
        op: terminate
        skew: 14
    actions:
      - type: notify
        template: default.html
        priority_header: 2
        subject: "EC2 Stopped Instance Termination Scheduled! [custodian {{ account }} - {{ region }}]"
        violation_desc: "EC2(s) have been in a stopped state for 46 days and at 60 days will be terminated:"
        action_desc: |
          Your EC2 server will be terminated in 14 days if not started
          and patched by then. Please start your stopped servers and
          leave them on for 24 hours minimum to allow for patching to occur.
        to:
          - [email protected]
          - resource-owner
        transport:
          type: sqs
          queue: https://sqs.us-east-1.amazonaws.com/12345678900/cloud-custodian-mailer
          region: us-east-1

  - name: ec2-notify-before-delete-marked-7-days
    resource: ec2
    description: |
      Notify on any ec2 instances that will be deleted in 7 days
      if not started
    filters:
      - type: marked-for-op
        tag: c7n_stopped_instance
        op: terminate
        skew: 7
    actions:
      - type: notify
        template: default.html
        priority_header: 1
        subject: "EC2 Stopped Instance Termination Scheduled! [custodian {{ account }} - {{ region }}]"
        violation_desc: "EC2(s) have been in a stopped state for 53 days and at 60 days will be terminated:"
        action_desc: |
          Your EC2 server will be terminated in 7 days if not started
          and patched by then. Please start your stopped servers and
          leave them on for 24 hours minimum to allow for patching to occur.
        to:
          - [email protected]
          - resource-owner
        transport:
          type: sqs
          queue: https://sqs.us-east-1.amazonaws.com/12345678900/cloud-custodian-mailer
          region: us-east-1

  - name: ec2-delete-marked
    resource: ec2
    description: |
      Terminate and notify on any ec2 instances that were scheduled
      for deletion if they have been stopped for 60 days and are no
      longer up-to-date on patching.
    filters:
      - type: marked-for-op
        tag: c7n_stopped_instance
        op: terminate
    actions:
      - type: terminate
        force: true
      - type: notify
        template: default.html
        priority_header: 1
        subject: "EC2 Stopped Instance Terminated [custodian {{ account }} - {{ region }}]"
        violation_desc: "EC2(s) had been stopped for 60 days and have now been terminated:"
        action_desc: |
          Your EC2 server has been terminated as its patching is too far
          out-of-date and beyond the 60 day window.
        to:
          - [email protected]
          - resource-owner
        transport:
          type: sqs
          queue: https://sqs.us-east-1.amazonaws.com/12345678900/cloud-custodian-mailer
          region: us-east-1

EIP - Garbage Collect Unattached Elastic IPs

Use the mark-for-op action to mark a resource for action later. One common pattern is to mark a resource with an operation (for example, release) in n days. In the days leading up to the marked date, run an unmark or untag policy if the resource has become compliant in the meantime.

You can use this principle to implement garbage collection on resources. In this example, Custodian first marks all unattached Elastic IPs for removal. The next policy then unmarks any EIP that has been attached and has the maid_status tag, indicating that it had been previously marked. Finally, the third policy filters in any resources that have been marked and runs the release action.

The mark operation only tags the resource with metadata about the upcoming operation, so the release policy still has to be executed on or after the day specified in the tag; otherwise the resource will continue to exist in the account.

Note: all resources that are marked-for-op up to and including the current date will be filtered in when utilizing the marked-for-op filter.
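The mark, then filter-on-date pattern described above can be sketched as two small functions. This is an illustration of the idea only: `mark_for_op` and `marked_for_op` here are hypothetical Python helpers, and the `op@YYYY/MM/DD` tag value is a simplified stand-in assumed for this sketch, so the tag Custodian really writes may look different.

```python
from datetime import date, datetime, timedelta


def mark_for_op(op: str, days: int, today: date) -> str:
    """Build a tag value recording the operation and the date it is due."""
    due = today + timedelta(days=days)
    return f"{op}@{due.strftime('%Y/%m/%d')}"


def marked_for_op(tag_value: str, op: str, today: date) -> bool:
    """True when the tag names this op and its due date is today or earlier."""
    tag_op, _, raw_date = tag_value.rpartition("@")
    if tag_op != op:
        return False
    due = datetime.strptime(raw_date, "%Y/%m/%d").date()
    return due <= today


if __name__ == "__main__":
    tag = mark_for_op("release", 7, date(2024, 1, 1))
    print(tag)
    for day in (date(2024, 1, 7), date(2024, 1, 8), date(2024, 1, 10)):
        print(day, marked_for_op(tag, "release", day))
```

The comparison is "due date is today or earlier", so a resource whose date has already passed is still filtered in the next time the policy runs.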
vars:
  notify: &notify
    type: notify
    to:
      - slack://#slack-channel
    subject: "EIP - No Instances Attached [custodian {{ account }} - {{ region }}]"
    transport:
      type: sqs
      queue: https://sqs.us-east-2.amazonaws.com/123456789012/mailer
      region: us-east-2
  run_mode: &run_mode
    type: periodic
    schedule: "rate(1 day)"
    tags:
      app: "c7n"
      env: "tools"
      account: "{account_id}"
  eip_filters: &eip_filters
    - InstanceId: absent
    - AssociationId: absent

policies:
  - name: unused-eip-mark
    resource: elastic-ip
    description: "Mark any EIP with no instances attached for action in 7 days"
    filters:
      - "tag:maid_status_eip": absent
      - and: *eip_filters
    mode:
