MathWorks Site Reliability Engineer in Natick, Massachusetts
Site Reliability Engineer
Department: Infrastructure and Architecture
We are looking for individuals who will help lead our cloud-native transformation from a successful DevOps team to a Site Reliability Engineering team. As an SRE, you will be responsible for leading the change to bring a software engineering mindset into our current implementation, and you will help maintain and elevate the reliability, availability, and performance of our SaaS and corporate web applications.
You will be successful in this role if you:
Understand the difference between systems and software engineering, and are either proficient in both or have the capability to grow into them.
Understand what it takes, in terms of both technology and process, to improve an application SLA.
Are able to work closely with application development teams to build reliability into their designs as a core feature.
Detest manual, repeatable work and have the drive and skills to eliminate it through automation and infrastructure as code.
Design, implement, and document highly reliable and secure systems using industry-wide best practices and architectures.
Collaborate with application developers and other system engineers to keep improving reliability and observability, all while relentlessly reducing toil.
Improve our automation framework to make it more scalable, reliable, and testable.
Participate in rotating on-call support to ensure high availability of our production systems.
A bachelor's degree and 5 years of professional work experience (or a master's degree, or equivalent experience) is required.
Deep knowledge of AWS services and design patterns such as the AWS Well-Architected Framework.
Strong experience with infrastructure as code tools such as CloudFormation, Puppet, and Chef.
Strong experience with scripting and automation tools such as Python and Ansible.
Experience designing and deploying application infrastructures in AWS.
Experience with programming languages such as Java, Ruby, or Go is a big plus.
Experience with containers and container management platforms such as Docker and Kubernetes.
Strong Linux and application framework (Spring Boot, Rails, Gin) knowledge.
Experience with application load balancers such as F5, AWS ALB, and Nginx.
Experience with CI/CD tools such as Jenkins, TeamCity, GitHub Actions, and XebiaLabs Release.
Experience with infrastructure and application performance monitoring and analysis tools such as Sensu, New Relic, Splunk, and Dynatrace.
Experience with MATLAB is a plus.