Chaos Engineering with Apache Kafka and Gremlin

The most secure clusters aren’t built on the hope that they’ll never break. They are the clusters that are broken on purpose, with a specific goal in mind. For organizations that want to avoid systemic weaknesses, chaos engineering with Apache Kafka® is the way to go. Your system is only as reliable as its weakest point.

Patrick Brennan (Principal Architect) and Tammy Butow (Principal SRE) from Gremlin discuss how they practice chaos engineering themselves to manage and resolve high-severity incidents across the company. But why would an engineer break things when they will then have to fix them? Brennan explains that finding weaknesses in the cloud environment helps Gremlin to:

- Avoid lengthy downtime when there is an issue (not if, but when)
- Halt lost revenue that results from service interruptions
- Maintain customer satisfaction with their stream processing services
- Steer clear of burnout for the SRE team

Chaos engineering is about experimenting by injecting failure directly into clusters in the cloud. The key is to start with a small blast radius and then scale as needed. It is critical that SREs have a plan for failure and practice intensive communication with the development team. This plan has to be detailed and include precise diagrams so that nothing in the chaos engineering process comes as a surprise. Once the process is confirmed, SREs can automate it, and nothing about it is random.

When something breaks or you find a vulnerability, it only helps the overall network become stronger. This becomes a way for engineering teams to solve problems collaboratively. Chaos engineering makes it easier for SRE and development teams to do their jobs, and it helps the organization demonstrate security and reliability to its customers. With Kafka, companies don’t have to wait for an issue to happen.
They can create their own disorder within microservices in the cloud and fix vulnerabilities before anything catastrophic happens.

EPISODE LINKS
- Try Gremlin’s free tier
- Join Gremlin’s Slack channel
- Learn more about Girl Geek Academy
- Learn more about gardening
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)

About the Podcast

Streaming Audio features all things Apache Kafka®, Confluent, real-time data, and the cloud. We cover frequently asked questions, best practices, and use cases from the Kafka community, ranging from Kafka connectors and distributed systems to data integration, modern data architectures, and data mesh built with Confluent and cloud Kafka as a service. Join our hosts as they stream through a series of interviews, stories, and use cases with guests from the data streaming industry. Apache®, Apache Kafka, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.