It’s amazing how some companies manage to function, or even survive.
I worked with a company that operated a large IBM mainframe network.
Back in the day (the late 1980s), there were essentially two products that could be used to monitor an IBM SNA network: the NetMaster product and the somewhat inferior IBM NetView product.
You could use either product to detect network problems, but the (free) NetView product required more skill to interpret the information.
The types of problems we’d generally aim to spot were latency and transmission errors.
Latency is a measure of how long it takes for a packet to travel along a network. With IBM SNA, you would get a broken network connection once latency went above 2 seconds. So you’d start to worry about traffic taking 1.2 seconds or more.
Transmission errors. Much like how static on a phone line makes it hard to talk, a high transmission error rate makes it hard for computers to talk on a network. Eventually, as transmission errors grow, the computers stop talking altogether.
So in a well-managed network, you’d monitor both, with the aim of keeping uptime up.
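To make that concrete, here is a rough sketch in Python of the kind of check you’d run against each link. It is not how NetMaster or NetView worked; the names (`LinkSample`, `check_link`) are made up for illustration, and the 1% error-rate threshold is purely an assumption. Only the 1.2 second and 2 second latency figures come from the description above.

```python
from dataclasses import dataclass

LATENCY_WARN_S = 1.2    # from the text: the point where you start to worry
LATENCY_FAIL_S = 2.0    # from the text: beyond this the connection breaks
ERROR_RATE_WARN = 0.01  # assumption: illustrative threshold, not an SNA figure


@dataclass
class LinkSample:
    """One hypothetical measurement taken from a network link."""
    name: str
    latency_s: float
    frames_sent: int
    frames_in_error: int


def check_link(sample: LinkSample) -> list[str]:
    """Return a list of alert strings for one sample (empty if healthy)."""
    alerts = []

    # Latency: warn early, well before the connection actually drops.
    if sample.latency_s > LATENCY_FAIL_S:
        alerts.append("latency: connection likely broken")
    elif sample.latency_s >= LATENCY_WARN_S:
        alerts.append("latency: approaching the limit")

    # Transmission errors: compare the error ratio against the threshold,
    # guarding against a link that sent nothing in the sample window.
    if sample.frames_sent:
        error_rate = sample.frames_in_error / sample.frames_sent
    else:
        error_rate = 0.0
    if error_rate >= ERROR_RATE_WARN:
        alerts.append("errors: transmission error rate is high")

    return alerts
```

Calling `check_link(LinkSample("LINK01", 1.5, 1000, 20))` would flag both problems at once, which is the point: you want to hear about a link while it is merely struggling, not after it has gone quiet.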
“We don’t do that here,” was the comment I heard in my first week there. I was amazed the company managed to function at all.