If you could boil reliability 'advice' down to one thing ... what would it be?
There is only one answer to this: Do everything possible to identify and replicate failure throughout the design process.
This piece of advice has a couple of layers ... but let's start with the obvious one. We often use the term 'physics of failure' when talking about reliability. The 'pure' element of design is focused on the 'physics of success.' That is, designers first and foremost have to create a system that works. To do this, the components need to interact and perform the function the system exists for.
This goes some way toward explaining the sometimes adversarial relationship design teams have with reliability engineers. And if there is an adversarial relationship ... you have already failed organizationally.
Reliable systems are those designed to accommodate failure mechanisms, software coding faults, manufacturing defects and so on. To do this ... you need to know how the system will fail. And every system that is designed from scratch presents a new suite of ways it can fail.
Another layer to the answer above revolves around culture. Many design engineers 'protect' their design from failing in the laboratory or on the manufacturing line. That is, they do whatever they can to get their system, subsystem or component to pass qualification tests - even if this means making those tests less challenging.
You should never test to pass - you should always test to learn.
A management team serious about reliability needs to get serious about breaking systems as soon as possible and as often as possible. This means changing cultures in workplaces and within design teams. Producing prototypes for destructive testing as early as possible in the design process is immensely valuable. Getting buy-in from your team is essential.
If you break a system in a laboratory, you have just found out how it will break in the hands of the customer. Which sort of failure do you want?
And it doesn't just have to be testing (although nothing can replace testing). Failure Mode and Effects Analyses (FMEAs) have proven to be perhaps the single most important activity for improving reliability during design.
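To make the FMEA idea concrete: one common convention (not the only one) is to score each failure mode for severity, occurrence and detection, then multiply the three into a Risk Priority Number (RPN) and work on the highest numbers first. The sketch below assumes a 1 to 10 scale for each rating. The failure modes and scores are invented for illustration, and the names `FailureMode` and `rank_by_rpn` are hypothetical, not taken from any standard or tool.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One row of a (hypothetical) FMEA worksheet."""
    name: str
    severity: int    # 1 = negligible effect, 10 = catastrophic
    occurrence: int  # 1 = very unlikely, 10 = almost certain
    detection: int   # 1 = almost always caught, 10 = almost never caught

    @property
    def rpn(self) -> int:
        # Risk Priority Number: product of the three ratings.
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Invented example data, for illustration only.
modes = [
    FailureMode("solder joint fatigue", severity=8, occurrence=3, detection=5),
    FailureMode("connector corrosion", severity=6, occurrence=4, detection=7),
    FailureMode("watchdog timer not reset", severity=9, occurrence=2, detection=3),
]

for mode in rank_by_rpn(modes):
    print(f"{mode.rpn:4d}  {mode.name}")
```

The ranking is only as good as the ratings behind it, which is where early destructive testing feeds back in: every failure you provoke in the laboratory replaces a guessed rating with an observed one.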
So in short, embrace failure during the design process. Because if you don't - you will be forced to embrace failure during operations.