Move beyond basic threshold alerts! Define clear Service Level Objectives (SLOs) and measure Service Level Indicators (SLIs) to track real user impact. Use Prometheus to alert when your SLOs are at risk, ensuring you focus on what truly matters to your users. #Monitoring #SRE #Prometheus
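A rough sketch of what "SLO at risk" can mean in practice: the error-budget burn-rate arithmetic, in Python. Everything here is an assumption for illustration and not from the post: the function names, the 99.9% target, and the 14.4 threshold (which corresponds to spending 2% of a 30-day budget in one hour). In a real setup the request counts would come from Prometheus queries rather than being passed in by hand.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail, e.g. about 0.001 for a 99.9% SLO."""
    return 1.0 - slo_target


def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """How fast the error budget is being consumed.

    1.0 means the budget lasts exactly the length of the SLO window;
    higher values mean it runs out sooner.
    """
    if total == 0:
        return 0.0  # no traffic in the window, so nothing was burned
    error_ratio = failed / total  # the SLI, expressed as an error ratio
    return error_ratio / error_budget(slo_target)


def slo_at_risk(failed: int, total: int,
                slo_target: float = 0.999,   # assumed SLO, illustrative only
                threshold: float = 14.4) -> bool:  # assumed alert threshold
    """True when the observed burn rate reaches the alerting threshold."""
    return burn_rate(failed, total, slo_target) >= threshold
```

The point of alerting on burn rate instead of a raw error count is that the same function works for any traffic volume: 20 failures out of 1,000 requests and 20,000 out of 1,000,000 burn the budget at the same speed.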
@bender@twtxt.net Bahahah 🤣😂 mate, me and one of my SRE colleagues actually came up with the terminology ourselves! 😛
Parity (YC S24) is hiring founding engineers to build an AI SRE (in-person, SF)
This is an example of what I believe every SRE should master and what every Post Incident Review (PIR) should focus on: Where did the system fail? What are the missing or incomplete Safety Controls?
I did a take-home software engineering test for a company recently. Unfortunately I was really sick at the time (have finally recovered) 😢 I was also interviewing for an SRE position at the same time (as well as Software Engineering).
Got the results of my take-home today and whilst there was some good feedback, man the criticisms of my work were harsh. I’m strictly not allowed to share the work I did for this take-home test, and I really can only agree with the “no unit tests” piece of the feedback. I could have done better there, but I was time pressured, sick and ran out of steam. I was using a lot of libraries to do the work so in the end found it difficult to actually think about a proper set of “Unit Tests”. I did write one (in shell) but I guess it wasn’t seen?
The other points were on my report and future work. Not detailed enough I guess? Hmmm 🤔
Am I really this bad? Does my code suck? 🤔 Have I completely lost touch with software engineering? 🤦‍♂️
Signal is experiencing technical difficulties. We are working hard to restore service as quickly as possible.
One thing I’d like to have one day is e2e encrypted messaging that is self-hosted and federated and doesn’t suck operationally. So many of the existing solutions are complicated and hard to set up, even for a Senior DevOps/SRE. It would be nice if it were integrated into twtxt.net and other pods, with a familiar and pleasant user experience on Desktop, Web and Mobile.