Join a rapid response team to resolve technical incidents, improve processes, and build automation t
Job Responsibilities
As a Site Reliability Engineer, you will be part of a rapid response team focused on resolving business-impacting technical incidents. Your primary responsibilities include:
Incident Response: Quickly diagnose and resolve production incidents to minimize downtime and restore services.
Process Improvement: Analyze recurring issues and implement process changes to prevent future incidents.
Automation Development: Build and maintain automation tools to reduce manual intervention and improve system reliability.
Application Requirements
Qualifications:
Recent training or entry-level experience in DevOps or Site Reliability Engineering.
Basic understanding of cloud infrastructure (e.g., AWS, GCP, Azure).
Familiarity with scripting languages (e.g., Python, Bash).
Knowledge of monitoring tools (e.g., Prometheus, Grafana).