This article introduces Kpow for Apache Kafka®'s new Temporary Policies feature.
Introducing Temporary Policies
Version 79 of the Kpow Kafka Management and Monitoring toolkit introduces the ability to Stage Mutations, create Temporary Role-Based Access Control Policies (temporary policies), and a suite of new admin features that give Admin Users greater control over Kpow.
This blog post introduces temporary policies through the lens of a common real-world scenario.
Temporary policies allow Admins to assign access control policies for a fixed duration. A common use case is granting a user TOPIC_INSPECT access to read data from a topic for an hour while resolving an issue in a production environment.
Temporary Policies Use Case
You wake up one morning to a dreaded sight: a poison message has taken down one of your services.
Your team decides the simplest solution is to skip the message by incrementing your consumer group's offset for the topic.
Now here's the problem. Access to production is limited, and for such a simple action (incrementing the offset), a team member generally must jump through the hoops of configuring the VPN, connecting to the jumpbox, and making sure they execute the right combination of bash commands against the Kafka cluster.
These operations are often unnecessarily time-consuming, brittle, and frustrating at a time-critical moment when you need to restore a production service. Furthermore, the jumpbox generally has full access to the Kafka cluster, and there is no audit log recording the actions being taken.
In combination with Kpow's existing Role-Based Access Controls and powerful mutation actions, Temporary Policies improve this experience by giving teams the tools they need to easily effect change in a secured environment, like production, when things go wrong.
Configuring Role-Based Access Control
In this example, two roles come from our identity provider: devs and owners.
We will assign anyone with the role owners admin access, and give them GROUP_EDIT access to the production cluster.
The devs role will be implicitly denied from undertaking any action against the cluster, but it is authorized for read-only access to view the production cluster in Kpow.
Our example RBAC YAML file might look something like this:
admin_roles:
  - "owners"
authorized_roles:
  - "owners"
  - "devs"
policies:
  - actions:
      - GROUP_EDIT
    effect: Allow
    resource:
      - "*"
    role: "owners"
This configuration prevents regular developers from making changes to the production cluster.
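The implicit deny can be sketched in a few lines of Python. This is a minimal illustration, not Kpow's implementation: the policy list mirrors the YAML above, `is_allowed` is a hypothetical helper, resource matching is reduced to the `*` wildcard this example uses, and the rule that an explicit `Deny` would override an `Allow` is an assumption.

```python
# Minimal sketch of role-based policy evaluation with implicit deny.
# The data mirrors the example RBAC YAML; this is NOT Kpow's code.

AUTHORIZED_ROLES = {"owners", "devs"}
POLICIES = [
    {"actions": ["GROUP_EDIT"], "effect": "Allow", "resource": ["*"], "role": "owners"},
]


def is_allowed(role, action, policies=POLICIES):
    """Return True only if some policy explicitly allows `action` for `role`."""
    if role not in AUTHORIZED_ROLES:
        return False  # roles outside authorized_roles get no access at all
    matching = [
        p for p in policies
        # Simplification: only the "*" wildcard resource is modelled here.
        if p["role"] == role and action in p["actions"] and "*" in p["resource"]
    ]
    if any(p["effect"] == "Deny" for p in matching):  # assumption: Deny wins
        return False
    # No matching Allow policy means the action is implicitly denied.
    return any(p["effect"] == "Allow" for p in matching)


print(is_allowed("owners", "GROUP_EDIT"))  # True
print(is_allowed("devs", "GROUP_EDIT"))    # False (implicit deny)
```

Because devs appears in authorized_roles but in no policy, the role can view the cluster yet every action check falls through to the final implicit deny.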
The Poison Pill
Today is the day when your team has to fix the consumer group on the production cluster.
Everyone has been briefed on the plan, and the team lead has decided to temporarily grant the devs role Allow access for GROUP_EDIT. This will enable one of the developers on the team to make the required change to the production cluster.
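A temporary policy can be thought of as an ordinary policy with an expiry. The Python sketch below layers a one-hour GROUP_EDIT grant for devs on top of the permanent owners policy. The `Policy` class, its `expires_at` field, the `is_allowed` helper, and the grant time and duration are all hypothetical illustrations, not Kpow's data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy record; expires_at=None means a permanent policy."""
    role: str
    action: str
    effect: str = "Allow"
    expires_at: Optional[datetime] = None

    def is_active(self, now: datetime) -> bool:
        # A temporary policy stops applying once its expiry time is reached.
        return self.expires_at is None or now < self.expires_at


def is_allowed(role: str, action: str, policies: List[Policy], now: datetime) -> bool:
    """Allow only if an active policy allows it; anything else is implicitly denied."""
    active = [
        p for p in policies
        if p.role == role and p.action == action and p.is_active(now)
    ]
    if any(p.effect == "Deny" for p in active):  # assumption: Deny overrides Allow
        return False
    return any(p.effect == "Allow" for p in active)


granted_at = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)  # hypothetical grant time
policies = [
    Policy("owners", "GROUP_EDIT"),  # permanent, from the RBAC YAML
    Policy("devs", "GROUP_EDIT", expires_at=granted_at + timedelta(hours=1)),  # temporary
]

print(is_allowed("devs", "GROUP_EDIT", policies, granted_at + timedelta(minutes=30)))  # True
print(is_allowed("devs", "GROUP_EDIT", policies, granted_at + timedelta(hours=2)))     # False
```

Once the expiry passes, devs falls back to the implicit deny without anyone having to remember to revoke the grant, while the owners policy is unaffected.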
The team lead creates this grant through the Temporary Policies section of Kpow's settings UI.
Once a temporary policy has been created, team members can be notified via Slack with the Kpow Slack integration.
Incrementing the offset
A team member has been tasked with the job of incrementing the offset of the consumer group for the problematic topic.
The developer looks at the application logs and notices that partition 3 of topic tx_trade1 contains the poison message.
The erroring consumer group is named trade_b2.
The developer then opens Kpow, navigates to the "Workflows" tab, and selects the consumer group.
From within the consumer group view, the dev clicks on the partition and selects "Skip Offset".
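This action schedules the mutation. Because Kafka only allows a consumer group's committed offsets to be changed while the group has no active members, the offset is incremented once someone on the team scales down the trade_b2 service.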
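Skipping a message amounts to moving the group's committed offset for that partition forward by one, so the group's next read starts just after the poison message. The Python sketch below models that scheduled mutation in memory. `ConsumerGroup`, `SkipOffsetMutation`, and the committed offset 1042 are hypothetical; this is not how Kpow or the Kafka client libraries are implemented, and it only illustrates why the change waits for the group to become empty.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ConsumerGroup:
    """Hypothetical in-memory stand-in for a Kafka consumer group."""
    name: str
    active_members: int
    # (topic, partition) -> committed offset, i.e. the next offset the group reads
    committed: Dict[Tuple[str, int], int] = field(default_factory=dict)


@dataclass
class SkipOffsetMutation:
    """A scheduled 'skip offset' that only applies once the group is empty."""
    topic: str
    partition: int
    applied: bool = False

    def try_apply(self, group: ConsumerGroup) -> bool:
        if self.applied:
            return True  # never increment twice
        if group.active_members > 0:
            # Kafka rejects offset changes for a group with active members, so wait.
            return False
        group.committed[(self.topic, self.partition)] += 1  # step past the poison message
        self.applied = True
        return True


group = ConsumerGroup("trade_b2", active_members=2, committed={("tx_trade1", 3): 1042})
mutation = SkipOffsetMutation("tx_trade1", 3)

print(mutation.try_apply(group))          # False: the service is still running
group.active_members = 0                  # the team scales down the trade_b2 service
print(mutation.try_apply(group))          # True
print(group.committed[("tx_trade1", 3)])  # 1043
```

Guarding on `applied` keeps the mutation idempotent, so a retry after the service restarts cannot skip a second, healthy message.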
Post-Mortem
Kpow also provides valuable information and insights for teams completing a post-mortem after a production incident.
Kpow has an Audit Log for Data Governance, and every action undertaken to resolve a production incident is persisted in Kpow's audit log topic. This means you can use the Audit Log to see the recorded history of all actions taken to restore the production service.
Inspecting the audit log message reveals the offset that was skipped.
You can use Kpow's data inspect functionality to view the poison message to help investigate why that message took down the consumer group.
You can find further information on setting up, viewing, and managing temporary policies in the Kpow documentation.
Manage, Monitor and Learn Apache Kafka with Kpow by Factor House.
We know how easy Apache Kafka® can be with the right tools. We built Kpow to make the developer experience with Kafka simple and enjoyable, and to save businesses time and money while growing their Kafka expertise. A single Docker container or JAR file that installs in minutes, Kpow's unique Kafka UI gives you instant visibility of your clusters and immediate access to your data.
Kpow is compatible with Apache Kafka 1.0+, Red Hat AMQ Streams, Amazon MSK, Instaclustr, Aiven, Vectorized, Azure Event Hubs, Confluent Platform, and Confluent Cloud.
Start with a free 30-day trial and solve your Kafka issues within minutes.