Site Reliability Engineer (SRE) with solid experience in observability and monitoring of critical systems in high-availability enterprise environments. Specialized in early issue detection and in expanding monitoring coverage across complex infrastructure. Experienced in implementing monitoring-as-code with Terraform across AWS, Docker, and Kubernetes environments. Scrum certified, currently working on improving system reliability for enterprise clients.
- Observability architectures for enterprise infrastructure
- Implementation of synthetic monitors and advanced alert configuration
- Integration of contextual metrics with kube-state-metrics
- Event correlation to eliminate false positives
- Leading the implementation of an observability architecture for enterprise infrastructure running multiple critical services
- Increased metrics coverage by configuring Kubernetes monitoring with the Prometheus Operator
- Development of dashboards in Grafana Cloud for monitoring key components
- Implementation of an alerting system with optimized thresholds and correlation patterns
- Contributing to the design of a high-availability architecture based on Grafana and Prometheus
- Configuration of synthetic monitors to validate availability
- Implementation of network device monitoring with the SNMP Exporter
- Integration of notification channels (Slack, Teams, Email, PagerDuty)
- Oracle Cloud Infrastructure Foundations Associate
- Microsoft Certified: Azure AI Fundamentals
- LFFL1009: Getting Started with OpenTofu (The Linux Foundation)
- Develop Your Google Cloud Network Skill Badge
- AWS Educate (Serverless, Machine Learning, Cloud Ops)
- Scrum Foundation Professional
- Data Science (FCFM, Universidad de Chile)
- Advanced monitoring techniques with multi-window alerts and contextual alerting
- Google Professional Cloud Architect
- 📧 [email protected]
- LinkedIn
- 📱 +56 9 6414 2352
From fabianimv