Site Reliability Engineer (Local Candidate)
Job Description
Responsibilities
· Design, manage and optimise Continuous Integration and Continuous Delivery (CI/CD) pipelines.
· Monitor applications and cloud infrastructure to ensure system security and availability, using real-time dashboards and active alerting mechanisms.
· Manage and monitor archival, backup and housekeeping of data and application resources.
· Make logs and other information from production and test environments securely accessible to the people who need them.
· Perform necessary updates to the application, database and infrastructure as required by business, operational and security requirements.
· Run security scans on the system and source code, perform an early assessment of reported security findings and escalate to the development team if necessary.
· Strive to increase service reliability by establishing guidance and methods for improvement.
· Collaborate and cultivate relationships with Development and Support teams to improve reliability, stability and scalability of services.
· Deliver data and analytics to provide insights for our team from a reliability and resilience perspective.
· Identify and resolve problems relating to critical service operations, and prevent their recurrence using automation.
· Improve the incident management lifecycle to identify, mitigate, and learn from reliability risks.
· Work closely with internal teams to ensure technical and operational compliance with ISO 27001 requirements.
Requirements
Skills and Qualifications
· 3 years' experience in Site Reliability Engineering, DevOps, or related roles.
· Hands-on experience with AWS cloud infrastructure and services (e.g. EC2, RDS, IAM).
· Experience with infrastructure as code (e.g. Terraform).
· Experience with containerization and orchestration (e.g. Docker, AWS EKS).
· Experience in managing and improving processes within CI/CD tools (e.g. Jenkins), Bitbucket repositories and code quality scanners (e.g. SonarQube).
· Experience in application and infrastructure monitoring, and familiarity with application logging and monitoring tools such as Datadog and the Grafana-Loki-Promtail stack.
· Familiarity with scripting for task automation (e.g. Bash, Python, Go).
· Experience in managing Linux servers.
· Awareness of security practices, standards and processes is an advantage.
· Experience analyzing and resolving performance, scalability and reliability issues.
· Knowledge of web application environments and technologies, such as TCP/IP, SSL/TLS, HTTP, DNS, routing, load balancing, CDNs, Tomcat, Apache, etc.
Preferred Qualifications
· Experience with ISO 27001 policies and processes.
· Experience in SaaS environments with multi-tenant architectures.
· Experience with PDPA, GDPR, SOC 2 or other compliance frameworks.
· Experience with performance testing using JMeter.
Benefits of Joining our Team include:
· Opportunities to work with both international and high-profile clients.
· Enjoy hybrid work, flexible hours and a results-oriented culture, and collaborate with a mission-driven team invested in your growth.
· Outpatient medical coverage for employees, their spouses, and children, in accordance with company policy.
· Provision for spectacles and dental expenses for employees.
· A range of allowances, including travel and mobile phone expenses, among others.
· Product and technical training is provided from both internal and external sources.
· Accelerate your career through hands-on challenges, mentorship from leadership, and opportunities to lead as the team scales.
· Partner closely with product and operations teams while owning decisions in a flat hierarchy.
· Collaborate with a passionate team using the latest technologies and frameworks.