This article dives into the various ways you can delete records in Kafka
Overview
Have you ever wondered how to effectively delete records in a Kafka topic? Well, there are actually several ways to do it, each with its own implications and granularity.
In this article, we'll explore these different approaches in detail, from the complete deletion of a topic to the more granular erasure of individual records. Understanding these methods is essential for anyone working with Kafka, as it can have significant implications for data retention, storage, and processing. By the end of this article, you'll have a better understanding of the different methods available for record deletion in Kafka, and how to choose the best approach for your specific use case.
About Kpow
This article uses Kpow for Apache Kafka as a companion to demonstrate how you can delete records in Kafka.
Kpow is a powerful tool that makes it easy to manage and monitor Kafka clusters, and its intuitive user interface simplifies the process of deleting records.
Explore with the live multi-cluster demo environment or grab the free community edition of Kpow and get to work deleting records in your own Kafka cluster. If you need a Kafka cluster to play with, check out our local Docker Compose environment to spin up Kpow alongside a 3-node Kafka cluster on your machine.
1. Deleting Topics
The most blunt and impactful way of deleting records on a Kafka cluster is by deleting the topic that contains the records.
While this can be an effective way to remove all data associated with a topic, it's important to note that this action is permanent and irreversible. Once a topic is deleted, all data will no longer be available, and any running applications that depend on this topic will likely throw exceptions. If topic auto-create is enabled on the broker, the topic could even get created again with the default topic configuration, potentially causing data loss or other issues.
Despite these risks, there may be cases where deleting a Kafka topic is necessary, such as when the topic is no longer needed or contains sensitive data that must be removed. Much like dropping a table in a traditional relational database, it's important to proceed with caution and have a clear understanding of the potential impacts before deleting a topic.
Deleting topics is simple in Kpow!
Navigate to Topic -> Details in the UI and select the topic you wish to delete.
The result of deleting topics in Kpow (like all other actions) gets persisted to Kpow's audit log for data governance. Kpow also provides a Slack webhook integration to notify a channel when the deletion of a topic has been performed.
2. Truncating Records
Truncating records is another method for deleting data from a Kafka topic, specifically a range of records from a topic partition. Truncation removes all records before a specified offset for a given topic partition. This can be useful when you want to remove a specific range of records without deleting the entire topic.
How topic partitions work in Kafka
In Kafka, all topic partitions have a start and end offset.
The start offset is the offset of the very first record on the topic partition. A fresh topic partition will have a start offset of 0. However, because of topic retention, cleanup policies, or even truncation, the start offset could be any value over time.
Similarly, the end offset is always the offset of the last record on a topic partition. It keeps growing as producers write more records to the topic.
One thing to note: producing a single record may not result in a simple increment of the end offset. For example, transactional producers write additional metadata records when committing.
Viewing the start and end offsets inside Kpow is easy! Simply navigate to the topic partitions table in Topic -> Details and select the start and end offset columns.
An example of truncation
Consider a topic partition with 6 records. The start offset is 0 and the end offset is 5.
If we make a request to truncate the topic partition before offset 3, all records before that offset (offsets 0, 1, and 2) will be deleted.
After we have performed this action, the new start offset will be 3 and the end offset will remain 5.
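The example above can be reproduced with a small in-memory model. This is only an illustrative sketch, not Kafka client code: the `Partition` class and its `truncate_before` method are hypothetical names. It follows this article's convention of treating the end offset as the offset of the last record; Kafka's client APIs report the end offset as the position of the next record to be written, which would be one higher.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Partition:
    """Toy in-memory model of one topic partition (not a Kafka client)."""
    records: Dict[int, str] = field(default_factory=dict)  # offset -> value

    @property
    def start_offset(self) -> int:
        # Offset of the first record still present on the partition.
        return min(self.records)

    @property
    def end_offset(self) -> int:
        # Article's convention: the offset of the last record.
        return max(self.records)

    def truncate_before(self, offset: int) -> None:
        """Delete every record whose offset is strictly less than `offset`."""
        self.records = {o: v for o, v in self.records.items() if o >= offset}


partition = Partition({o: f"record-{o}" for o in range(6)})  # offsets 0..5
partition.truncate_before(3)
print(partition.start_offset, partition.end_offset)  # 3 5
```

Offsets are never renumbered: the surviving records keep offsets 3, 4, and 5, which is why the start offset moves while the end offset does not.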
Truncating records in Kpow
Kpow provides a convenient way to truncate topics with its intuitive UI.
To truncate a topic in Kpow, simply follow these steps:
- Navigate to the topic you want to truncate in the UI.
- Select the partitions you want to truncate.
- Choose to truncate by either the last observed end offset or by group offset.
- Click "Truncate" to delete the specified range of records from the topic.
By default, Kpow populates the last observed end offset of each partition in the form. This will delete all records up to and including the specified offset.
Alternatively, you can choose to truncate by group offset, which deletes all records a consumer group has consumed. This has the advantage of not impacting the correctness/behavior of the consumer group, by only deleting records it has read.
It's important to note that truncating a topic is a destructive action and requires careful consideration. If multiple consumers are reading from the topic, truncating by group offset could impact the other consumers.
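To make the multi-consumer concern concrete, here is a minimal sketch of how one might pick a truncation offset that no consumer group has yet to read past. The group names, the offsets, and the `safe_truncation_offsets` helper are all hypothetical, and the sketch assumes a committed offset means "the next offset the group will read" and that every group has committed an offset for every partition.

```python
from typing import Dict

# Hypothetical committed offsets: group -> {partition -> next offset to read}.
committed: Dict[str, Dict[int, int]] = {
    "billing": {0: 120, 1: 95},
    "analytics": {0: 80, 1: 110},
}


def safe_truncation_offsets(committed: Dict[str, Dict[int, int]]) -> Dict[int, int]:
    """Per partition, return the lowest committed offset across all groups."""
    lowest: Dict[int, int] = {}
    for offsets in committed.values():
        for partition, offset in offsets.items():
            lowest[partition] = min(offset, lowest.get(partition, offset))
    return lowest


offsets = safe_truncation_offsets(committed)
print(offsets)  # {0: 80, 1: 95}
```

Truncating partition 0 before offset 120 (the "billing" position) would remove records 80 to 119 that "analytics" has not read yet; truncating before 80 affects neither group.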
Implications of truncating a topic
Truncating a topic in Kafka is a less intrusive way of deleting records than deleting the topic entirely. This is because the topic configuration, including the number of partitions and replicas, remains unchanged. Additionally, you have more granular control over which records get deleted.
However, truncation still permanently removes data, and it may change the behavior of any consumers reading from the affected partitions.
As a best practice, it's generally recommended to rely on the semantics of how you configure a topic to manage topic growth, rather than resorting to truncation. For example, you can use the retention.ms configuration parameter to automatically age out data after a certain period of time, or configure a cleanup policy to remove old or irrelevant data. This blog post covers how these retention policies work in Kafka. How these get configured will depend on the use case of your topic.
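As a small illustration, retention.ms is expressed in milliseconds, so a retention period has to be converted before it is applied to a topic. The sketch below assumes a seven-day retention period purely as an example; the `retention_ms` helper and the `topic_config` dictionary are hypothetical and only show the shape of the values, not how they are submitted to a cluster.

```python
from datetime import timedelta


def retention_ms(period: timedelta) -> str:
    """Render a duration as the millisecond string used for retention.ms."""
    return str(int(period.total_seconds() * 1000))


# Hypothetical topic settings: age out records after seven days.
topic_config = {
    "cleanup.policy": "delete",
    "retention.ms": retention_ms(timedelta(days=7)),
}
print(topic_config["retention.ms"])  # 604800000
```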
That said, there are still valid reasons to truncate a topic on a running Kafka cluster. For instance, you may want to reset a topic to a specific state for testing or debugging purposes, or you may have encountered a production issue that requires you to delete a range of records from a topic. In these cases, truncation can be a useful tool.
If you do decide to truncate a topic, it's important to be aware of the potential impacts on your Kafka cluster and consumers. For example, truncating a topic may cause consumers to experience data gaps or inconsistencies. As a best practice, you should always test truncation in a non-production environment before running it in a production context.
3. Tombstoning Records
The final and most granular way of deleting a record in Kafka is via tombstoning. Tombstoning deletes an individual record based on its key.
How tombstoning works in Kafka
Tombstoning works by producing a record with a null value and the key of the record that needs to be deleted to a topic. Note: null in this case means the record has no value at all, rather than a serialized representation of null. For example, producing the value null with a JSON serializer will not have the same effect.
Tombstoning allows you to delete individual records from a topic without affecting the rest of the data in the topic.
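The difference between a missing value and a serialized null can be shown in a few lines. This is a sketch under one assumption: a JSON serializer that encodes null as the literal text null (some serializers behave differently). The `is_tombstone` helper is a hypothetical name used only for illustration.

```python
import json
from typing import Optional


def is_tombstone(value: Optional[bytes]) -> bool:
    """A record acts as a tombstone only when it carries no value at all."""
    return value is None


# Assumption: the JSON serializer writes null as the four-byte text "null".
json_null = json.dumps(None).encode("utf-8")

print(json_null, is_tombstone(json_null))  # b'null' False
print(is_tombstone(None))  # True
```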
Note: tombstoning will only work when the topic has been configured with a cleanup.policy of compact or compact,delete.
Compacted topics
Compacted topics in Kafka ensure that only the latest record per message key is retained within the log of data for a single topic partition. This policy is useful for implementing key/value stores or aggregated views where only the most recent state is needed.
For example, a KTable that holds the latest count of Covid-19 cases by country, where each record is keyed by the country, would benefit from a compacted topic.
It is important to note that compaction does not happen immediately: it runs in the background, and how often it runs depends on your topic and broker configuration. Therefore, a record is not removed the moment its tombstone is produced.
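The end result of compaction can be approximated with a toy model. This sketch is a deliberate simplification and not how the broker implements it: real compaction works on older log segments in the background, and tombstones themselves are retained for a while (governed by delete.retention.ms) before they disappear. The `compact` function and the sample keys and counts are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[str]]  # (key, value); a value of None is a tombstone


def compact(log: List[Record]) -> List[Record]:
    """Keep the latest record per key, then drop keys whose latest record is a tombstone."""
    latest: Dict[str, Optional[str]] = {}
    for key, value in log:
        latest.pop(key, None)  # re-insert so ordering follows the latest write
        latest[key] = value
    return [(k, v) for k, v in latest.items() if v is not None]


log: List[Record] = [
    ("NZ", "10"),
    ("AU", "25"),
    ("NZ", "12"),
    ("AU", None),  # tombstone for key "AU"
]
print(compact(log))  # [('NZ', '12')]
```

Only the newest value for "NZ" survives, and every record for "AU" is gone because the latest record for that key is a tombstone.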
This blog post goes into finer details about the different broker/topic configuration that can have an impact on when compaction happens.
Producing tombstone messages in Kpow
First, we can ensure that compaction has been enabled on our topic by navigating to the Topic Configuration table and selecting our topic and the config value cleanup.policy.
If cleanup.policy hasn't been correctly set, we can click the pencil icon to edit the topic configuration and set it to compact,delete.
Next, navigate to Kpow's Data Produce UI and select None for the value serializer while specifying the key you wish to delete.
Done! You have successfully produced a tombstone message!
Querying for data to be deleted
We can use Kpow to query for data we want to tombstone on a topic.
For example, consider a topic that contains the following data:
{ "name": "John Smith", "score": 10, "expires": "2022-10-10" }
Let's say we want to query for all records that have expired. We could write a kJQ query like so:
.value.expires | from-date < now
kJQ is Kpow's powerful query language for searching data on a Kafka topic. It is our implementation of the jq language with added features built specifically for Kafka.
The above query parses the expires field as an ISO 8601 date time and checks whether it is before the current date time (now). Here, now is resolved to the current date time when the query executes.
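For readers less familiar with jq-style filters, the same predicate can be sketched in Python. This is an approximation of the filter's logic, not of kJQ itself: the `is_expired` helper is hypothetical, the second record is made up for contrast, and a fixed date stands in for now so that the result is repeatable.

```python
from datetime import date
from typing import Dict, List


def is_expired(value: Dict[str, object], today: date) -> bool:
    """True when the record's `expires` date is before `today`."""
    return date.fromisoformat(str(value["expires"])) < today


values: List[Dict[str, object]] = [
    {"name": "John Smith", "score": 10, "expires": "2022-10-10"},
    {"name": "Jane Doe", "score": 7, "expires": "2999-01-01"},  # made-up record
]

today = date(2023, 1, 1)  # fixed stand-in for `now`
expired = [v for v in values if is_expired(v, today)]
print([v["name"] for v in expired])  # ['John Smith']
```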
After executing this query in Kpow, we can see a list of results that match our filtered query. These are the expired records!
We can now click the 'Produce results' button and produce these records back to the topic as tombstones, by selecting the value serializer as None.
Done! We have managed to delete a collection of records based on a query filter.
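Conceptually, producing query results back as tombstones amounts to keeping each matching record's key and dropping its value. The sketch below illustrates that mapping only; it is not Kpow's implementation, and the key "user-1" and the `to_tombstones` helper are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical query results as (key, value) pairs.
results: List[Tuple[str, str]] = [
    ("user-1", '{"name": "John Smith", "score": 10, "expires": "2022-10-10"}'),
]


def to_tombstones(results: List[Tuple[str, str]]) -> List[Tuple[str, Optional[str]]]:
    """Keep each key and replace its value with None (no value)."""
    return [(key, None) for key, _ in results]


tombstones = to_tombstones(results)
print(tombstones)  # [('user-1', None)]
```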
Conclusion
In this article we have demonstrated the various ways you can delete records in Kafka using Kpow.
You should now have a better understanding of the different deletion methods, the implications of each, and when each one is applicable.
Get started with the free community edition of Kpow today!