
Pod JVM Kafka Exception


The Pod JVM Kafka exception fault simulates Kafka producer and consumer failures. It raises exceptions for Kafka operations executed by the Java process running inside a Kubernetes pod, which helps you test the application's behavior and resilience against Kafka-related errors.

Tip: JVM chaos faults use the Byteman utility to inject faults into the JVM.

Use cases

Use the Pod JVM Kafka exception fault to:

  • Validate the application's resilience by simulating Kafka exceptions to ensure it can recover gracefully, retry operations, or switch to backup message brokers without affecting functionality.
  • Assess whether monitoring systems and alerting mechanisms accurately detect and report Kafka exceptions in real time.
  • Trigger exception-handling paths in the application to ensure coverage of edge cases related to Kafka message production/consumption failures during testing.
  • Test circuit breaker patterns and fallback mechanisms when Kafka operations fail.

Permissions required

Below is a sample Kubernetes role that defines the permissions required to execute the fault.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: hce
  name: pod-jvm-kafka-exception
spec:
  definition:
    scope: Namespaced
permissions:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "delete", "get", "list", "patch", "deletecollection", "update"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "get", "list", "patch", "update"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list", "watch"]
  # deployments and statefulsets belong to the "apps" API group, not the core group
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets"]
    verbs: ["get", "list"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "delete", "get", "list", "deletecollection"]

Java requirements

This fault requires the following Java-specific prerequisites:

  • The Java process must allow agent attachment (Attach API must be available).
  • Utilities such as ps, pgrep, and bash must be available in the target container.
  • File permissions must allow the JVM to read and execute agent files.
  • Agent attachment must not be restricted by user or security context configurations.
  • The target container image must not use a restricted or minimal Java runtime that removes attach-related modules (such as jdk.attach).
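The following sketch shows what the attach-related requirements above can look like in a workload. It is a hypothetical Deployment container fragment. The container name order-service and the image example/order-service:1.0 are placeholders. The fragment assumes a HotSpot-based JVM, where the -XX:+DisableAttachMechanism option turns off agent attachment. That option must therefore stay out of the JVM options for this fault to work.

```yaml
# Hypothetical container fragment from a Deployment's pod template (placeholder names).
containers:
  - name: order-service                # placeholder container name
    image: example/order-service:1.0   # placeholder image; must ship a full JRE/JDK plus ps, pgrep, and bash
    env:
      # JVM options read by the HotSpot JVM at startup.
      # Do NOT add -XX:+DisableAttachMechanism here: it prevents the Byteman agent from attaching.
      - name: JAVA_TOOL_OPTIONS
        value: "-Xmx512m"
```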

Supported environments

| Platform | Support status |
|---|---|
| GKE (Google Kubernetes Engine) | ✅ Supported |
| EKS (Amazon Elastic Kubernetes Service) | ✅ Supported |
| AKS (Azure Kubernetes Service) | ✅ Supported |
| GKE Autopilot | ✅ Supported |
| Self-managed Kubernetes | ✅ Supported |

Mandatory tunables

| Tunable | Description | Notes |
|---|---|---|
| KAFKA_MODE | The Kafka operation mode to target. | Supported values: producer, consumer. For more information, go to Parameters. |
| KAFKA_TOPIC | The name of the Kafka topic to target. | For more information, go to Parameters. |
| EXCEPTION_CLASS | The name of the exception class to raise, for example, org.apache.kafka.common.errors.TimeoutException. | For more information, go to Parameters. |
| EXCEPTION_MESSAGE | The message of the raised exception. | For more information, go to Parameters. |
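The mandatory tunables have two roles. KAFKA_MODE and KAFKA_TOPIC select which Kafka operations fail, and EXCEPTION_CLASS and EXCEPTION_MESSAGE define what is thrown. The following env fragment is a sketch that targets consumers instead of the producer shown in Parameters. The topic name payments is a hypothetical placeholder. The class org.apache.kafka.common.KafkaException is used only as an illustrative exception class from the Kafka client library.

```yaml
# Hypothetical env fragment for the chaos definition:
# raise an exception for consumer operations on a single topic.
env:
  - name: KAFKA_MODE
    value: "consumer"                               # target consumer operations instead of producer ones
  - name: KAFKA_TOPIC
    value: "payments"                               # placeholder topic name
  - name: EXCEPTION_CLASS
    value: "org.apache.kafka.common.KafkaException" # illustrative exception class
  - name: EXCEPTION_MESSAGE
    value: "Simulated consumer failure"
```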

Optional tunables

| Tunable | Description | Notes |
|---|---|---|
| TOTAL_CHAOS_DURATION | Duration for which chaos is injected into the target resource. | Provide in [numeric-hours]h[numeric-minutes]m[numeric-seconds]s format. Default: 30s. Examples: 1m25s, 1h3m2s, 1h3s. For more information, go to duration of the chaos. |
| TRANSACTION_PERCENTAGE | Percentage of total Kafka operations to target. | Supports values in the (0.00, 1.00] range. If not provided, all Kafka operations are targeted. For more information, go to Parameters. |
| POD_AFFECTED_PERCENTAGE | Percentage of total pods to target. | Provide a numeric value. Default: 0 (corresponds to 1 replica). For more information, go to pods affected percentage. |
| JAVA_HOME | Path to the Java installation directory. | For example, /tmp/dir/jdk. |
| BYTEMAN_PORT | Port used by the Byteman agent. | Default: 9091. |
| CONTAINER_RUNTIME | Container runtime interface for the cluster. | Default: containerd. Supported values: docker, containerd, and crio. For more information, go to container runtime. |
| SOCKET_PATH | Path of the containerd, CRI-O, or Docker socket file. | Default: /run/containerd/containerd.sock. For more information, go to socket path. |
| RAMP_TIME | Period to wait before and after injecting chaos. | Provide in [numeric-hours]h[numeric-minutes]m[numeric-seconds]s format. Default: 0s. Examples: 1m25s, 1h3m2s, 1h3s. For more information, go to ramp time. |
| SEQUENCE | Sequence of chaos execution for multiple target pods. | Default: parallel. Supported values: serial, parallel. For more information, go to sequence of chaos execution. |
| TARGET_CONTAINER | Name of the container subject to Kafka exception injection. | Default: none. For more information, go to target specific container. |
| TARGET_PODS | Comma-separated list of application pod names subject to pod JVM Kafka exception. | If not provided, the fault selects target pods randomly based on the provided appLabels. For more information, go to target specific pods. |
| NODE_LABEL | Restricts the target pods to those scheduled on nodes matching the specified label. | For more information, go to node label. |
| LIB_IMAGE | Image used to inject chaos. | Default: harness/chaos-ddcr-faults:1.72.0. For more information, go to image used by the helper pod. |
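Several of the optional tunables can be combined to control targeting and runtime access. The following env fragment is an illustration. It pins the fault to two named pods and one container, injects chaos into those pods one after another, and points the helper at a CRI-O runtime. The pod and container names are hypothetical. The socket path assumes CRI-O's default location, /var/run/crio/crio.sock, so verify it on your nodes.

```yaml
# Hypothetical env fragment: target specific pods and a container on a CRI-O cluster.
env:
  - name: TARGET_PODS
    value: "order-service-7d9f-abcde,order-service-7d9f-fghij"  # placeholder pod names, comma-separated
  - name: TARGET_CONTAINER
    value: "order-service"             # placeholder container name inside those pods
  - name: SEQUENCE
    value: "serial"                    # inject chaos into one pod at a time
  - name: RAMP_TIME
    value: "10s"                       # wait 10s before and after injecting chaos
  - name: TOTAL_CHAOS_DURATION
    value: "2m"
  - name: CONTAINER_RUNTIME
    value: "crio"
  - name: SOCKET_PATH
    value: "/var/run/crio/crio.sock"   # assumed default CRI-O socket path
```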

Parameters

The following YAML snippet illustrates the use of these tunables:

kind: KubernetesChaosExperiment
apiVersion: litmuschaos.io/v1alpha1
metadata:
  name: pod-jvm-kafka-exception
  namespace: hce
spec:
  tasks:
    - definition:
        chaos:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: "60s"
            # Kafka mode: producer or consumer
            - name: KAFKA_MODE
              value: "producer"
            # name of the Kafka topic to be targeted
            - name: KAFKA_TOPIC
              value: "orders"
            # name of the exception class
            - name: EXCEPTION_CLASS
              value: "org.apache.kafka.common.errors.TimeoutException"
            # provide the exception message
            - name: EXCEPTION_MESSAGE
              value: "Kafka operation timeout!"
            # provide the transaction percentage, in the (0.00, 1.00] range
            - name: TRANSACTION_PERCENTAGE
              value: "0.50"
            # provide the Byteman port
            - name: BYTEMAN_PORT
              value: "9091"