Linux Storage Engineer
Job ID: 5434
Location: SLAC – Menlo Park, CA
Full-Time, Regular
Position overview:
Would you like to enable groundbreaking research through innovative scientific computing? Do you enjoy learning and applying leading-edge technologies? SLAC National Accelerator Laboratory seeks a talented storage engineer to provide the multi-petabyte object and POSIX storage solutions required for massive-scale analytics.
SLAC is one of the world’s premier research laboratories, with leading capabilities in photon science, accelerator physics, high energy physics (HEP), and energy sciences. Our Scientific Computing Systems (SCS) department is responsible for the scalable high-throughput computing infrastructure that enables our major facilities and experiments. These include: Rubin Observatory, the Linac Coherent Light Source (LCLS), CryoEM and the LHC ATLAS detector at CERN.
You must be a pragmatic and skillful storage specialist who blends hardware and software solutions to deliver the best outcomes for our scientists. A broad understanding of technology and of the implications of design decisions will be important, as will the ability to communicate clearly both within our team and with our partners.
You will play a critical role in designing, deploying and supporting the storage system required by the Vera C. Rubin Observatory. This system will eventually hold exabytes of data and require terabit/sec aggregate transfer speeds. Rubin is a next-generation astronomical facility currently under construction in Chile, with staff based at SLAC. Rubin Observatory will undertake the Legacy Survey of Space and Time (LSST), which will be one of the largest and most comprehensive astronomy surveys of its kind. Rubin will bring countless discoveries in almost every area of modern astronomical research. We encourage free-thinking, open dialog and provide opportunities to explore and implement new technologies and ideas. There is huge potential for career growth. High performance computing is recognized as a SLAC core competency.
Given the nature of this position, SLAC is open to on-site and hybrid work options.
Your specific responsibilities will be to:
- Engage in, support, improve and evolve the whole lifecycle of the object and POSIX storage services portfolio, from inception and design through deployment, operation and sunset
- Plan, operate and manage 500+ PB of disk storage and 1+ EB of tape storage
- Investigate new storage technologies through research, collaboration with peers, and participation in standards organizations, industry groups, conferences, etc.
- Gather data, perform analysis and help troubleshoot issues across the entire scientific storage services portfolio
- Provide documentation, monitoring, alerting and reporting of the entire storage portfolio
- Support day-to-day operations of scientific storage services at SLAC
- Provide 24×7 on-call support for all storage platforms on a rotational basis
To be successful in this position you will bring:
- Bachelor’s degree in computer science or a related field and 8 years of relevant experience in Unix storage (design, operation and lifecycle) or a combination of education and relevant experience
- Expertise in Erasure Coding, RAID, fault tolerant/HA storage architectures
- Expertise in deploying and managing object storage software (Ceph, MinIO, or similar)
- Expertise in any/all of: Lustre, WekaIO, GPFS, ZFS, XFS, SANtricity, MegaRAID
- Expertise with storage hardware/arrays (e.g., Dell, Supermicro, DDN, Seagate, NetApp)
- Proficient with VMware or other leading virtualization platforms
- Proficient with general Unix administration, configuration management and monitoring
- Understanding of high performance networking (including 100GbE+)
- Proficient with programming in Python and/or Ruby, and Bash
- Track record of detecting and resolving service and performance issues
- Ability to establish and promote best practices
- Excellent organizational and communication skills
- Ability to work effectively in a team environment
In addition, preferred requirements include:
- Expertise in tape library management (Oracle STK, IBM, and/or Spectra Logic)
- Expertise in HPSS, TSM/Spectrum Protect, and/or TiBS tape storage software
- Experience with Cloud storage technologies
SLAC employee competencies:
- Effective Decisions: Uses job knowledge and solid judgment to make quality decisions in a timely manner.
- Self-Development: Pursues a variety of venues and opportunities to continue learning and developing.
- Dependability: Can be counted on to deliver results with a sense of personal responsibility for expected outcomes.
- Initiative: Pursues work and interactions proactively with optimism, positive energy, and motivation to move things forward.
- Adaptability: Flexes as needed when change occurs, maintains an open outlook while adjusting and accommodating changes.
- Communication: Ensures effective information flow to various audiences and creates and delivers clear, appropriate written, spoken, and presented messages.
- Relationships: Builds relationships to foster trust, collaboration, and a positive climate to achieve common goals.
Physical Requirements and Working Conditions:
- You are expected to reside locally and work onsite up to 3 days a week
- Consistent with its obligations under the law, the University will provide reasonable accommodation to any employee with a disability who requires accommodation to perform the essential functions of the job.
- May work extended hours during peak business cycles.
Work Standards:
- Interpersonal Skills: Demonstrates the ability to work well with Stanford colleagues and clients and with external organizations.
- Promote Culture of Safety: Demonstrates commitment to personal responsibility and value for environment, safety and security; communicates related concerns; uses and promotes safe behaviors based on training and lessons learned. Meets the applicable roles and responsibilities as described in the ESH Manual, Chapter 1-General Policy and Responsibilities: http://www-group.slac.stanford.edu/esh/eshmanual/pdfs/ESHch01.pdf
- Subject to and expected to comply with all applicable University policies and procedures, including but not limited to the personnel policies and other policies found in the University’s Administrative Guide, http://adminguide.stanford.edu
Classification Title: System Administrator 3
Grade: K
Job code: 4833
Duration: Regular Continuing
_The expected pay range for this position is $119,000 to $150,000 per annum. SLAC National Accelerator Laboratory/Stanford University provides pay ranges representing its good faith estimate of what the university reasonably expects to pay for a position. The pay offered to a selected candidate will be determined based on factors such as (but not limited to) the scope and responsibilities of the position, the qualifications of the selected candidate, departmental budget availability, internal equity, geographic location and external market pay for comparable jobs._
SLAC National Accelerator Laboratory is an Affirmative Action / Equal Opportunity Employer and supports diversity in the workplace. All employment decisions are made without regard to race, color, religion, sex, national origin, age, disability, veteran status, marital or family status, sexual orientation, gender identity, or genetic information. All staff at SLAC National Accelerator Laboratory must be able to demonstrate the legal right to work in the United States. SLAC is an E-Verify employer.