The Internet Archive is looking for an expert DevOps / SRE engineer to join the UX Team, working remotely.
You will be one of the primary engineers responsible for the Archive.org website (a Top 250 website) and related services. You will be in charge of maintaining and developing the mostly Ansible-managed production cluster, provisioning and configuring servers, maintaining applications, setting up monitoring and alerts, and generally helping keep things running smoothly. There is also the possibility of contributing to front-end development and participating in other UX-related activities. This is a rare opportunity to become a critical member of a small team making a huge impact in the world.
Responsibilities:
Operationally maintaining Archive.org servers and services
Maintaining and evolving the Ansible-based provisioning and configuration infrastructure
Collaboratively managing the deployment architecture of our staging and production apps
Setting up and maintaining monitoring and alerts
Identifying and triaging problems when they arise; researching, building consensus around, and implementing solutions
Responding to external stakeholders who have apps hosted in our server cluster
Working with other DevOps engineers, both on the UX Team and on other teams
Communicating effectively with stakeholders
Reducing technical debt
Being a role model for effective and collaborative engineering practices
Maintaining the blog and other WordPress sites
Requirements:
3+ years of relevant work experience in a collaborative software development environment
Strong Linux system administration skills
Expertise in maintaining and optimizing a server cluster over time
Experience setting up monitoring and alerting at all levels within a system
Excellent problem-solving and debugging skills
Excellent verbal and written communication skills
Familiarity with website and server security
Comfort working in a loosely structured environment requiring individual autonomy and initiative within one's scope of responsibilities
Willingness to learn, adapt, and reach compromise with others
Remote work with occasional optional on-sites
Preferred Skills:
Automated server provisioning with Ansible (or similar tooling)
Web servers, load-balancing, and caching (e.g. nginx, HAProxy)
Network & DNS configuration
Containerization and clustering (e.g. Docker, Nomad, Consul)
Monitoring and observability (e.g. Grafana, Prometheus, Loki, Sentry)
Git, GitLab
Jira, Agile-ish software development
About Us:
We are a 501(c)(3) non-profit digital research library with a bold mission: to provide universal access to all knowledge, including the books, music, images, audio, television, websites, and software that form our shared human culture. Our dedicated team of engineers, archivists, librarians, and other professionals has created one of the world’s top 300 websites, archive.org. Each day, the Internet Archive digitizes thousands of books, and each week it captures hundreds of millions of web pages. Over the past 25 years, we have built one of the largest digital libraries in existence, serving millions of people worldwide. This achievement is made possible through collaborations with hundreds of libraries, archives, museums, universities, and non-profits across the globe.
Benefits & Perks:
The Internet Archive provides a comprehensive benefits package including PTO, paid holidays, medical, dental, vision, FSA, commuter, STD, LTD, and 401(k)/Roth accounts. Work-life balance is important to us. For engineers located near HQ, we offer catered Friday lunches.
Internet Archive is an Equal Opportunity Employer M/F/D/V/L/G/B/T and will consider for employment qualified applicants with criminal histories in a manner consistent with the requirements of the Fair Chance Ordinance.