Underrated: Custom Azure CLI extensions

Adham · 12 min read

Most people who use Azure CLI never think, “I should build my own extension.” They install whatever already exists, run a few commands, and move on.

That is totally normal, but it also leaves a lot of productivity on the table. If your team repeats the same command chains every week, an extension can turn that workflow into one clean command. You can package it as a Python wheel and install it locally in minutes.

This post shows why that is worth doing, and how to do it with two real examples from infrastructure automation.

Why Write a Custom Extension?

The Azure CLI is already enormous. So what could possibly be missing?

Think about the glue between tools. You use AKS, so you need kubectl, but first you need az aks get-credentials, and maybe kubelogin after that. Three commands before you can do anything. A small extension can handle that handshake so your team only runs one command.

Or think about compliance checks. Azure Policy is great, but it is a whole system to set up and manage. Sometimes you just want a quick script that reads resource state, checks it against your own spec, and gives you a report. An extension gives that script a proper command interface.

Or team conventions. Your team probably chains the same five Azure CLI commands every release. Putting that into an extension means everyone runs the same thing, the same way.

None of these need the official extension index. A local wheel file is enough for internal tooling.

How Azure CLI Extensions Work

Azure CLI is built on a Python library called knack. An extension is a Python wheel that registers command groups with knack at load time. When you run a command like az mygroup mycommand, knack finds the registered handler and calls it.

The entry point is a folder called azext_yourname/ with an __init__.py file that exposes a load_command_table function and a load_arguments function. That is all you need.

You install the extension from a local wheel file or a URL:

az extension add --source ./dist/myextension-0.1.0-py3-none-any.whl

Or, for development, the azdev tool (covered below) can install it straight from source so you can edit without rebuilding, once the repo is registered with azdev extension repo add:

azdev extension add myextension

Let’s build two examples.

Prerequisites

You need Python 3.10 or later, the Azure CLI, and the azdev tool:

pip install azdev

azdev handles scaffolding, style checks, and wheel builds. It is the official development tool for extensions. You don’t have to use it, but it saves time.


Example 1: az azk kube, wrapping kubectl

Your team uses AKS. Before running any kubectl command, someone needs to run az aks get-credentials, maybe kubelogin convert-kubeconfig after that, and on private clusters they also need to pick the right subscription first. Every new team member gets this wrong at least once.

So we write an extension that handles the login side and passes everything else to kubectl.

Project layout

azk/
  azext_azk/
    __init__.py
    commands.py
    _help.py
  setup.py
  setup.cfg
  HISTORY.rst

setup.py

from setuptools import setup, find_packages

setup(
    name="azk",
    version="0.1.0",
    description="AKS helper commands for Azure CLI",
    author="Your Name",
    author_email="you@example.com",
    license="MIT",
    packages=find_packages(),
    install_requires=[],
    python_requires=">=3.10",
)

Note that install_requires is empty because we only import from azure-cli-core, which is already on the path when the extension runs.

setup.cfg

[bdist_wheel]
universal = 0

[metadata]
description-file = README.md

azext_azk/__init__.py

This is the entry point. Knack calls load_command_table to register commands and load_arguments to attach parameters.

from azure.cli.core import AzCommandsLoader
from azext_azk.commands import load_command_table
from azext_azk._help import helps  # noqa: F401


class AzkCommandsLoader(AzCommandsLoader):

    def __init__(self, cli_ctx=None):
        super().__init__(cli_ctx=cli_ctx)

    def load_command_table(self, args):
        load_command_table(self, args)
        return self.command_table

    def load_arguments(self, command):
        pass


COMMAND_LOADER_CLS = AzkCommandsLoader

The COMMAND_LOADER_CLS name is the contract. Azure CLI discovers it when loading the extension.

azext_azk/commands.py

import subprocess

from azure.cli.core.commands import CliCommandType


def load_command_table(loader, args):
    # custom_command_type tells knack which module holds the handler functions
    custom = CliCommandType(operations_tmpl="azext_azk.commands#{}")
    with loader.command_group("azk", custom_command_type=custom) as g:
        g.custom_command("kube", "kube_command")


def kube_command(cmd, resource_group, cluster_name, kubectl_args=None):
    """Log in to an AKS cluster and pass remaining args to kubectl."""
    # Step 1: get credentials (check=True raises on a non-zero exit code)
    subprocess.run(
        [
            "az", "aks", "get-credentials",
            "--resource-group", resource_group,
            "--name", cluster_name,
            "--overwrite-existing",
        ],
        check=True,
    )

    # Step 2: convert kubeconfig for non-interactive login (kubelogin)
    try:
        subprocess.run(
            ["kubelogin", "convert-kubeconfig", "-l", "azurecli"],
            capture_output=True,
        )
    except FileNotFoundError:
        pass  # kubelogin is optional - skip if not installed

    # Step 3: if the caller passed extra args, run them via kubectl
    if kubectl_args:
        subprocess.run(["kubectl"] + list(kubectl_args), check=False)

There is a problem with this, though. We want to pass whatever kubectl arguments the user types after the login flags, but knack's argument parser will try to interpret everything itself; it has no passthrough mode.

The fix is to read sys.argv directly for everything after a -- separator. This is a common UNIX convention:

def kube_command(cmd, resource_group, cluster_name):
    """Log in to an AKS cluster, then pass anything after -- to kubectl."""
    import subprocess
    import sys

    # Everything after '--' belongs to kubectl
    argv = sys.argv
    separator = "--"
    kubectl_args = []
    if separator in argv:
        kubectl_args = argv[argv.index(separator) + 1:]

    # Step 1: get credentials
    subprocess.run(
        [
            "az", "aks", "get-credentials",
            "--resource-group", resource_group,
            "--name", cluster_name,
            "--overwrite-existing",
        ],
        check=True,
    )

    # Step 2: kubelogin (optional - silently skip if not installed)
    try:
        subprocess.run(
            ["kubelogin", "convert-kubeconfig", "-l", "azurecli"],
            capture_output=True,
        )
    except FileNotFoundError:
        pass

    # Step 3: run the kubectl command if provided
    if kubectl_args:
        subprocess.run(["kubectl"] + kubectl_args, check=False)

azext_azk/_help.py

from knack.help_files import helps

helps["azk"] = """
    type: group
    short-summary: AKS helper commands.
"""

helps["azk kube"] = """
    type: command
    short-summary: Log in to an AKS cluster and optionally run a kubectl command.
    examples:
      - name: Log in only
        text: az azk kube -g my-rg -n my-cluster
      - name: Log in and get pods
        text: az azk kube -g my-rg -n my-cluster -- get pods -n default
"""

Register the arguments

Add a load_arguments override in __init__.py:

    def load_arguments(self, command):
        if command == "azk kube":
            with self.argument_context("azk kube") as c:
                c.argument(
                    "resource_group",
                    options_list=["--resource-group", "-g"],
                    help="Resource group containing the AKS cluster.",
                )
                c.argument(
                    "cluster_name",
                    options_list=["--name", "-n"],
                    help="Name of the AKS cluster.",
                )

Build and install

cd azk
pip install wheel
python setup.py bdist_wheel
az extension add --source dist/azk-0.1.0-py3-none-any.whl --yes

Then use it:

# Just log in
az azk kube -g my-rg -n my-cluster

# Log in and run kubectl
az azk kube -g my-rg -n my-cluster -- get pods -n kube-system

Both the login and kubelogin steps run for you. Everything after -- goes straight to kubectl.


Example 2: az checker run, YAML-driven resource compliance reports

This one comes up a lot on operations teams. You have a bunch of resources and you want to check if they meet certain conditions, maybe on a schedule, maybe before a deployment. Not full Azure Policy, not a separate script for each environment. Just a short YAML file you commit next to your IaC, and a command that reads it and produces an HTML report.

What the spec file looks like

# checks.yaml
report: html
checks:
  - name: "No AKS clusters in a failed provisioning state"
    description: "Flags any AKS cluster where provisioning state is Failed or Canceled."
    scope:
      resourceType: "Microsoft.ContainerService/managedClusters"
      resourceGroup: "rg-platform"
    condition:
      - field: "properties.provisioningState"
        operator: "in"
        value: ["Failed", "Canceled"]

  - name: "Storage accounts must have HTTPS-only enabled"
    description: "All storage accounts in the subscription must enforce HTTPS."
    scope:
      resourceType: "Microsoft.Storage/storageAccounts"
    condition:
      - field: "properties.supportsHttpsTrafficOnly"
        operator: "equals"
        value: false

The field key uses dot notation to navigate the JSON structure of any Azure resource. The operator key supports equals, notEquals, and in.
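Before wiring this into an extension, the matching semantics can be sketched standalone. The function names here (get_nested, matches) are illustrative, not part of any Azure library; the point is that a condition list describes the violation, so a check fails when any resource matches it:

```python
def get_nested(obj, dotted_key):
    """Walk a dict with a dotted path; return None for missing keys."""
    current = obj
    for key in dotted_key.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def matches(resource, conditions):
    """True when the resource satisfies every rule (AND logic)."""
    ops = {
        "equals": lambda actual, expected: actual == expected,
        "notEquals": lambda actual, expected: actual != expected,
        "in": lambda actual, expected: actual in expected,
    }
    return all(
        ops[rule["operator"]](get_nested(resource, rule["field"]), rule["value"])
        for rule in conditions
    )


cluster = {"name": "aks-prod", "properties": {"provisioningState": "Failed"}}
rules = [{"field": "properties.provisioningState", "operator": "in",
          "value": ["Failed", "Canceled"]}]
print(matches(cluster, rules))  # True -> this cluster is a violation
```

A cluster whose provisioningState is Succeeded would not match, so the check would pass for it.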

Project layout

checker/
  azext_checker/
    __init__.py
    commands.py
    engine.py
    reporter.py
    _help.py
  setup.py
  setup.cfg

setup.py

from setuptools import setup, find_packages

setup(
    name="checker",
    version="0.1.0",
    description="YAML-driven Azure resource compliance reporter",
    author="Your Name",
    author_email="you@example.com",
    license="MIT",
    packages=find_packages(),
    install_requires=[
        "pyyaml>=6.0",
        "jinja2>=3.1",
    ],
    python_requires=">=3.10",
)

We pull in PyYAML for reading the spec and Jinja2 for rendering the HTML report. Both are small and stable. Jinja2 saves us from building HTML strings by hand.

azext_checker/__init__.py

from azure.cli.core import AzCommandsLoader
from azext_checker.commands import load_command_table


class CheckerCommandsLoader(AzCommandsLoader):

    def __init__(self, cli_ctx=None):
        super().__init__(cli_ctx=cli_ctx)

    def load_command_table(self, args):
        load_command_table(self, args)
        return self.command_table

    def load_arguments(self, command):
        if command == "checker run":
            with self.argument_context("checker run") as c:
                c.argument(
                    "spec_file",
                    options_list=["--spec", "-s"],
                    help="Path to the YAML checks file.",
                )
                c.argument(
                    "output_file",
                    options_list=["--output", "-o"],
                    default="report.html",
                    help="Path for the output report. Default: report.html",
                )
                c.argument(
                    "subscription",
                    options_list=["--subscription"],
                    help="Override the active subscription.",
                )


COMMAND_LOADER_CLS = CheckerCommandsLoader

azext_checker/commands.py

from azure.cli.core.commands import CliCommandType


def load_command_table(loader, args):
    # custom_command_type tells knack which module holds the handler functions
    custom = CliCommandType(operations_tmpl="azext_checker.commands#{}")
    with loader.command_group("checker", custom_command_type=custom) as g:
        g.custom_command("run", "run_checks")


def run_checks(cmd, spec_file, output_file="report.html", subscription=None):
    """Load a YAML spec and evaluate each check against live Azure resources."""
    from azext_checker.engine import load_spec, evaluate_checks
    from azext_checker.reporter import write_html_report

    spec = load_spec(spec_file)
    results = evaluate_checks(cmd.cli_ctx, spec, subscription)
    write_html_report(results, output_file)
    print(f"Report written to {output_file}")

azext_checker/engine.py

This is where the real work happens. Under the hood we call the same Resource Management API that powers az resource list, through the SDK client that azure-cli-core already ships with.

import yaml


def load_spec(path):
    with open(path, "r") as f:
        return yaml.safe_load(f)


def _get_nested(obj, dotted_key):
    """Navigate a dict using dot notation. Returns None if key is missing."""
    keys = dotted_key.split(".")
    current = obj
    for key in keys:
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def _matches(resource, condition):
    """Return True if the resource matches ALL conditions in the list (AND logic)."""
    for rule in condition:
        field = rule["field"]
        operator = rule["operator"]
        expected = rule["value"]
        actual = _get_nested(resource, field)

        if operator == "equals":
            if actual != expected:
                return False
        elif operator == "notEquals":
            if actual == expected:
                return False
        elif operator == "in":
            if actual not in expected:
                return False
        else:
            raise ValueError(f"Unknown operator: {operator}")
    return True


def evaluate_checks(cli_ctx, spec, subscription_override=None):
    from azure.mgmt.resource import ResourceManagementClient
    from azure.cli.core.commands.client_factory import get_mgmt_service_client

    # Build the resource client once; honor --subscription if given
    client = get_mgmt_service_client(
        cli_ctx, ResourceManagementClient, subscription_id=subscription_override
    )

    results = []

    for check in spec.get("checks", []):
        scope = check.get("scope", {})
        resource_type = scope.get("resourceType")
        resource_group = scope.get("resourceGroup")
        condition = check.get("condition", [])

        # Query resources
        filter_parts = []
        if resource_type:
            filter_parts.append(f"resourceType eq '{resource_type}'")
        resource_filter = " and ".join(filter_parts) if filter_parts else None

        if resource_group:
            raw_list = client.resources.list_by_resource_group(
                resource_group,
                filter=resource_filter,
                expand="properties",
            )
        else:
            raw_list = client.resources.list(
                filter=resource_filter,
                expand="properties",
            )

        violations = []
        for res in raw_list:
            # Convert the resource object to a plain dict for navigation
            res_dict = res.as_dict()
            if _matches(res_dict, condition):
                rg = ""
                if "/resourceGroups/" in (res.id or ""):
                    rg = res.id.split("/resourceGroups/")[1].split("/")[0]
                violations.append({
                    "id": res.id,
                    "name": res.name,
                    "location": res.location,
                    "resourceGroup": rg,
                })

        results.append({
            "name": check.get("name", "Unnamed check"),
            "description": check.get("description", ""),
            "passed": len(violations) == 0,
            "violation_count": len(violations),
            "violations": violations,
        })

    return results

The get_mgmt_service_client helper comes from azure-cli-core. It handles authentication for us, using whatever the user configured: service principal, device code flow, or managed identity. We don't write any auth code ourselves.

One thing to watch out for is the expand="properties" parameter. Without it, the properties object comes back empty and all the field checks will fail.

azext_checker/reporter.py

The reporter takes the results and turns them into an HTML page. Each check shows up as a green PASS or red FAIL card, and failed checks list every violating resource in a table.

The core function is short:

from jinja2 import Environment, BaseLoader

# Abbreviated stand-in - the real template is about 60 lines of HTML.
_TEMPLATE = """\
<html><body>
{% for r in results %}
  <h2>{{ "PASS" if r.passed else "FAIL" }}: {{ r.name }}</h2>
  <p>{{ r.description }} ({{ r.violation_count }} violations)</p>
{% endfor %}
</body></html>
"""


def write_html_report(results, output_path, template_string=_TEMPLATE):
    env = Environment(loader=BaseLoader(), autoescape=True)
    template = env.from_string(template_string)
    html = template.render(results=results)
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(html)

The full HTML template is about 60 lines. I did not include it here because it would just be a wall of HTML, but you get the idea.

Build and install

cd checker
pip install wheel
python setup.py bdist_wheel
az extension add --source dist/checker-0.1.0-py3-none-any.whl --yes

Run it

az checker run --spec checks.yaml --output report.html

Open report.html in a browser and you get a table of every failing resource for each check you defined.

To add a new check you only touch the YAML file. No code changes.
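For example, a hypothetical additional check that flags storage accounts still accepting old TLS versions is just one more entry appended under checks (the field path and the TLS1_0/TLS1_1 values follow the ARM storage account schema):

```yaml
  - name: "Storage accounts must require TLS 1.2"
    description: "Flags storage accounts whose minimum TLS version is below 1.2."
    scope:
      resourceType: "Microsoft.Storage/storageAccounts"
    condition:
      - field: "properties.minimumTlsVersion"
        operator: "in"
        value: ["TLS1_0", "TLS1_1"]
```

Re-run az checker run and the new check shows up in the report alongside the others.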


Sharing with Your Team

For internal tooling, you do not need to publish to the official extension index. Two practical options:

Option 1: Host the wheel file on an Azure Blob Storage container with public read access.

# anyone on the team installs it once:
az extension add --source https://myaccount.blob.core.windows.net/extensions/checker-0.1.0-py3-none-any.whl

Option 2: Commit the wheel file to a private Git repo and install from a path.

git clone https://github.com/myorg/internal-tools
az extension add --source ./internal-tools/dist/checker-0.1.0-py3-none-any.whl --yes

If your team provisions machines automatically, add the az extension add line to a bootstrap script and everyone gets the extensions without thinking about it.


What to Keep in Mind

Dependencies. If your extension pulls in a package that conflicts with something already in azure-cli-core, the install will fail or break in weird ways. Check the azure-cli-core setup.py before adding anything unusual to install_requires. PyYAML and Jinja2 are safe.

Error handling. The Azure CLI already turns Python exceptions into user-facing errors. You don’t need try/except everywhere. But for things like a missing spec file or a bad field name, raising CLIError from knack.util gives a clean message:

from knack.util import CLIError
import os

if not os.path.exists(spec_file):
    raise CLIError("Spec file not found: " + spec_file)

Version bumps. Bump the version in setup.py every time you rebuild. If the version stays the same, az extension add will not install over the existing one unless you uninstall first.


Wrapping Up

The extension model is just a Python wheel, a loader class, and your logic. The CLI gives you auth, argument parsing, and help text for free.

Both examples in this post solve real problems I ran into. If you find yourself chaining Azure CLI commands in a bash script and sharing it over Slack, that is probably an extension waiting to happen.
