Role: Site Reliability Engineer (SRE)
Role Type: Full Time
Location: Remote or Southern California

Key Competencies: Maintaining production systems and services, designing systems and solutions, incident monitoring, Infrastructure as Code (IaC), cloud technology, scripting, monitoring, containerization, and DevOps.

Job Summary: As a Site Reliability Engineer, you will play a critical role in ensuring the stability, reliability, and performance of our production systems and services. You will work closely with software engineering, DevOps, and IT operations teams to design and build scalable, reliable, and efficient systems that support our business operations. This role combines software development, systems engineering, and operational expertise to keep our applications running smoothly.

Responsibilities:
System Reliability & Performance: Design, implement, and maintain solutions to improve the reliability, scalability, and performance of production systems.
Monitoring & Incident Response: Set up monitoring, alerting, and incident response systems to detect, troubleshoot, and resolve production issues proactively.
Automation & Infrastructure as Code (IaC): Develop and maintain automation scripts to manage infrastructure, deployment, and routine tasks, minimizing human intervention.
Capacity Planning & Scaling: Collaborate with cross-functional teams to manage system capacity planning and scaling, ensuring our systems meet current and future demands.
System Health & Troubleshooting: Monitor system health, troubleshoot issues, and address service failures, latency issues, and performance bottlenecks.
On-Call Support: Participate in the on-call rotation for monitoring and support of production systems.
Documentation: Maintain detailed documentation for system designs, processes, and procedures to support team knowledge sharing and continuity.
Requirements:
Educational Background: Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience).
Experience: Proven experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role.
Technical Skills:
Proficiency with cloud platforms (e.g., AWS, GCP, Azure).
Strong scripting and automation skills (e.g., Python, Bash, Ansible).
Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Splunk).
Knowledge of containerization and orchestration (e.g., Docker, Kubernetes).
Familiarity with Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation).
Soft Skills: Strong problem-solving skills, excellent communication skills, and a team-oriented mindset.

Preferred Qualifications:
Experience with CI/CD pipelines and DevOps best practices.
Familiarity with security best practices for system reliability.
Certifications in cloud technologies or DevOps practices are a plus.

Job Type: Full-time
Pay: $110,000.00 - $140,000.00 per year

Benefits:
401(k)
Dental insurance
Health insurance
Paid time off
Vision insurance

Compensation Package: Yearly bonus
Schedule: 8-hour shift, Monday to Friday

Education: Bachelor's (Required)
Experience:
Site Reliability Engineer (SRE): 2 years (Required)
DevOps: 1 year (Required)
Cloud platforms: 2 years (Required)
IT: 4 years (Required)
License/Certification: Cloud Certification (Preferred)
Work Location: Remote

Salary

Competitive

Project basis

Remote Job

Worldwide

Job Overview
Job Posted: 1 year ago
Job Type: Contractual
Job Role: Any
Education: Any
Experience: Any
Total Vacancies: -

Location

United States