Job Description
Join KMC Solutions as a Senior Site Reliability Engineer and play a crucial role in maintaining and optimizing our large-scale platforms and data centers. We are seeking a highly skilled professional with a strong background in Linux infrastructure and automation to ensure the reliability, scalability, and performance of our critical systems. In this remote position, you will have the opportunity to work with cutting-edge technologies and collaborate with a team of talented engineers to solve complex challenges.
As a Senior Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining robust infrastructure solutions that support our growing business needs. You will leverage your expertise in automation tools and practices to streamline operations, reduce downtime, and improve system efficiency. This role offers the flexibility of remote work while providing opportunities for professional growth and impact in a dynamic environment.
Responsibilities
- Design, implement, and maintain scalable Linux infrastructure solutions for large-scale platforms
- Develop and maintain automation tools and scripts to streamline operational processes
- Monitor system performance, identify bottlenecks, and implement optimization strategies
- Collaborate with development teams to ensure seamless integration of new features
- Participate in the on-call rotation and respond to system incidents promptly
- Document infrastructure configurations, procedures, and best practices
- Stay updated with industry trends and emerging technologies to continuously improve our systems
Qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field
- Minimum of 5 years of experience in Linux system administration and operations
- Strong expertise in automation tools such as Ansible, Puppet, or Chef
- Experience with containerization technologies like Docker and Kubernetes
- Proficiency in scripting languages such as Python, Bash, or Perl
- Familiarity with cloud platforms (AWS, Azure, or GCP) and their services
- Experience with monitoring and logging tools such as Prometheus, Grafana, or the ELK stack
- Strong problem-solving skills and ability to work in a fast-paced environment