Senior Production Reliability Engineer

About CK-12

CK-12’s vision is to provide content and tools that increase student learning through engagement, and to provide more universal access to learning and learning content, irrespective of the educational resources available to a student or region.

To achieve this noble and ambitious vision, we at CK-12 are challenging the traditional model of education to transform it dramatically. Technology has opened up many opportunities to revolutionize education for the benefit of students, teachers and parents.

We have chosen to be a non-profit so that we can effectively realize our mission and do the right thing! It also gives us the ability to experiment with big and bold ideas. CK-12 is backed by Vinod Khosla, a renowned technologist and philanthropist.

At CK-12, you’ll experience the benefits of working in a dynamic, entrepreneurial, innovative and non-bureaucratic environment where you will get more cool things done than you ever imagined! We are a small group of passionate folks who are determined to disrupt the current form of education. We came together from companies such as Apple, eBay, Amazon, McGraw-Hill, and startups. To achieve our goals, we use cutting-edge technologies like Amazon cloud (AWS), MongoDB, GraphDB, HTML5/CSS3/JavaScript, mobile apps on Android, iOS and Windows, PhoneGap and data science / machine learning. We are reimagining our cloud-based platform with a mobile-first strategy, providing easy-to-use, intuitive and simple interfaces to millions of users worldwide.

Do our mission, people and technologies excite you? Best-in-class user experience powered by data science / machine learning is a key part of providing customized solutions for our users' needs. Do you want to revolutionize the way teachers teach and students learn with free access to highly engaging experiences, anytime and anywhere? If the answer is YES! and you are a great technologist who will challenge the status quo (no order takers please!) by innovating, please come join us! Together, we will change the world!

CK-12 Foundation is looking for a Sr. Production Reliability Engineer to advance stability and performance in our production systems. The ideal candidate has experience with large-scale cloud deployments for a consumer-facing website, including maintaining global SLAs. You are very hands-on, yet analytical in your approach to solving problems, and you constantly look for things to improve even when they are running smoothly. In this role, you will have the opportunity to implement any procedures, architectures, and automation necessary to achieve the desired outcome. You will be joining CK-12 at an exciting time of growth and rapid iteration; this will be an opportunity to use technology to impact the learning outcomes of millions of learners.


  • A proven track record of scaling consumer-facing products and maintaining large-scale systems
  • Deep understanding of modern web services architectures and cloud platforms such as AWS, GCP, and Azure
  • Take on performance and stability issues using a wide variety of tools
  • Strong core infrastructure debugging skills
  • Maintain and optimize critical services and provide visibility to internal teams and stakeholders
  • Experience implementing observability and monitoring, with hands-on experience in log-mining frameworks like Splunk, ELK, etc.
  • In-depth knowledge of build/release systems and processes
  • Strong attention to detail and excellent analytical capabilities
  • Be part of the on-call rotation for production issues, during shift or as required
  • A self-motivated, proactive and solution-oriented individual, able to quickly learn new and existing technologies

BS in Computer Science or equivalent combination of academic and professional experience

At least 10 years of experience

Nice to have:

  • Experience being responsible for production deployments of 100+ nodes
  • Ability to troubleshoot Python
  • Experience with web server performance tuning
  • Strong programming and scripting fundamentals (Python/Bash)

Technologies deployed at CK-12:

  • Continuous integration: Jenkins
  • Observability: OpenTelemetry, Dynatrace, New Relic
  • Web Stack: Apache, MySQL, MongoDB, Neo4j, Redis, Python, Linux
  • Load Balancer: ELB + HAProxy
  • Configuration management: Ansible
  • Amazon Cloud: EC2, EBS, RDS, S3, ELB, CloudWatch
  • Application servers: Tomcat, Solr
  • NoSQL, ZooKeeper, Storm, Kafka

Must be a US citizen or Green Card holder