Topics such as application resiliency, self-healing, and antifragility are areas of interest for many. This article tries to distinguish, define, and visualize these concepts, and to create solutions with these characteristics.
I agree with all the suggestions above, such as timeouts, bulkheads, and circuit breakers. But relying on these alone is a very narrow view of resiliency.
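A minimal Python sketch of a timeout combined with a bulkhead, assuming a thread-pool-based service. The `Bulkhead` class and `BulkheadFullError` are hypothetical names for illustration, not a library API. Each dependency would get its own bulkhead, so a slow dependency can exhaust only its own slots rather than every thread in the caller.

```python
import concurrent.futures
import threading


class BulkheadFullError(Exception):
    """Raised when every slot in the bulkhead is busy (hypothetical name)."""


class Bulkhead:
    """Hypothetical bulkhead: caps concurrent calls to one dependency and
    stops waiting for any single call after a timeout."""

    def __init__(self, max_concurrent: int, timeout_s: float):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._timeout_s = timeout_s
        # One worker thread per slot.
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=max_concurrent)

    def _run(self, fn, args):
        try:
            return fn(*args)
        finally:
            # Free the slot only when the call really finishes, even if the
            # caller already gave up waiting on it.
            self._slots.release()

    def call(self, fn, *args):
        # Reject immediately instead of queueing when all slots are busy.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("bulkhead full, rejecting call")
        future = self._pool.submit(self._run, fn, args)
        # Timeout: raises concurrent.futures.TimeoutError after timeout_s
        # seconds; the worker keeps running and holds its slot until fn returns.
        return future.result(timeout=self._timeout_s)
```

Because a timed-out call keeps its slot until it actually returns, a hung dependency fills only its own bulkhead, and further calls to it are rejected quickly instead of piling up.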
Once you have isolated the service instances from one another and contained failures between service processes by running them in containers, the next step is to protect against VM, node, and host failures.
The circuit breaker pattern provides auto-recovery and self-healing for failures that arise from service interactions.
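A minimal single-threaded Python sketch of a circuit breaker, showing where the auto-recovery comes from: after `reset_timeout_s` the breaker moves from OPEN to HALF_OPEN and lets a trial call through, closing again if it succeeds. The class, state names, and thresholds are illustrative assumptions, and the sketch is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised while the circuit is open and calls are short-circuited."""


class CircuitBreaker:
    """Illustrative breaker with CLOSED, OPEN and HALF_OPEN states (not thread-safe)."""

    def __init__(self, failure_threshold=3, reset_timeout_s=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock  # injectable so tests can control time
        self.state = "CLOSED"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, fn, *args):
        if self.state == "OPEN":
            # Self-healing step: once the cool-down has elapsed, allow a trial call.
            if self._clock() - self._opened_at >= self.reset_timeout_s:
                self.state = "HALF_OPEN"
            else:
                raise CircuitOpenError("circuit open, failing fast")
        try:
            result = fn(*args)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self._failures += 1
        # A failed trial call, or too many consecutive failures, (re)opens the circuit.
        if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
            self.state = "OPEN"
            self._opened_at = self._clock()

    def _on_success(self):
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self.state = "CLOSED"
```

While the circuit is open, callers fail fast instead of waiting on a struggling dependency, which gives that dependency room to recover before the trial call probes it again.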
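Health checks such as Kubernetes liveness and readiness probes monitor the services and detect failures: when a liveness probe fails, Kubernetes restarts the container, and when a readiness probe fails, it stops routing traffic to that instance until the probe succeeds again.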
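A minimal Python sketch of the service side of these probes, assuming HTTP probes (`httpGet`) pointed at two hypothetical paths, `/healthz` for liveness and `/ready` for readiness, on a hypothetical port. Kubernetes counts an HTTP probe as successful for status codes from 200 to 399, so the handler returns 503 when a flag is false. `HealthState` and `start_health_server` are illustrative names, not part of any library.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthState:
    """Hypothetical process-local health flags."""

    def __init__(self):
        self.alive = True   # set to False if the process is wedged (e.g. a deadlocked worker)
        self.ready = False  # set to True only once dependencies are warmed up


STATE = HealthState()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            ok = STATE.alive   # liveness: repeated failures lead to a container restart
        elif self.path == "/ready":
            ok = STATE.ready   # readiness: failures remove the pod from Service endpoints
        else:
            self.send_error(404)
            return
        body = b"ok" if ok else b"unhealthy"
        # 200-399 counts as probe success; 503 counts as failure.
        self.send_response(200 if ok else 503)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep frequent probe requests out of the logs


def start_health_server(host="0.0.0.0", port=8080):
    """Serve the health endpoints on a background daemon thread."""
    server = ThreadingHTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Keeping the two checks separate matters: a service that is alive but still warming up should fail only readiness, so it is taken out of load balancing rather than restarted.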
To build a system that can self-heal from different kinds of failures, you need several resiliency primitives used together.
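As a small illustration of primitives used together, the following Python sketch combines retries with exponential backoff and a fallback. The function name and defaults are hypothetical, and in practice such a helper would typically sit alongside timeouts and a circuit breaker rather than replace them. The `sleep` parameter is injectable only so the backoff can be observed without waiting.

```python
import time


def call_with_retry_and_fallback(fn, fallback, attempts=3, base_delay_s=0.1, sleep=time.sleep):
    """Hypothetical helper: retry fn with exponential backoff, and return
    fallback() if every attempt fails."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                break  # no attempts left, fall through to the fallback
            # Exponential backoff: base, 2*base, 4*base, ... between attempts.
            sleep(base_delay_s * (2 ** attempt))
    # Graceful degradation: serve a default or cached answer instead of an error.
    return fallback()
```

Retries absorb transient faults, while the fallback keeps the caller functioning when the fault is not transient.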