Use case: Save me from tons of alarms – #OpenStack
Fujitsu Cloud had a performance issue with the OpenStack API. The average API response time was usually good (less than a few seconds); however, once the trouble started, a large number of timeout errors occurred. We tried to detect the trouble with metrics monitoring (CPU, memory…), but could not configure the threshold for each metric properly. As a result, we only got a ton of alarms after the trouble had already happened. It is very hard for operators to go through every alarm and check whether it is necessary or not.