SRE Network Engineer
Job Details
Experienced
Full Time
Description
  • Effectively manage troubleshooting and recovery of complex production incidents, ranging from low to critical impacts.
  • Drive incident resolution through a systematic problem-solving approach, coupled with a strong sense of ownership and drive.
  • Actively participate in the team’s Agile stories (project work) to streamline and enhance its day-to-day operations.
  • Create, manage, and utilize appropriate technical procedural documentation (run books).
  • Proactively monitor all applications and infrastructure behind TokenEx’s external and internal customer-facing services, including availability, latency, performance, and capacity. 
  • Influence resiliency and scalability in production environments in Azure and Amazon Web Services (AWS).
  • Assist with conducting Root Cause Analysis (RCA) on critical production outages, and develop and implement mitigation strategies.
  • Utilize production support expertise to influence and support new designs, architectures, standards, and methods, maintaining stability and availability for large-scale distributed systems.
  • Proactively identify and implement opportunities for automation of routine maintenance tasks, data gathering, and resolution of common issues.
  • Continuously seek to develop new skills and technical expertise, as well as proactively share knowledge with others.
  • Build software and systems to manage platform infrastructure and applications to improve reliability, quality, and time-to-market of our suite of software solutions.
  • Gather and analyze operating systems/applications metrics to assist in performance tuning and fault finding.
  • Participate in system design consulting, platform management, capacity planning, and testing and release procedures.
  • Create sustainable systems and services through automation and uplifts.
  • Balance feature development speed and reliability with well-defined service level objectives.
  • Perform disaster recovery operations, monitor network performance, and troubleshoot, diagnose, and resolve hardware, software, and other network and system problems.
Qualifications
  • Bachelor’s degree in Computer Science or equivalent relevant experience (degree preferred but not required)
  • 5+ years of software development experience, ideally in an Agile SaaS/product development company
  • In-depth understanding of web service protocols and REST API design and consumption
  • Excellent .NET (C#) development and debugging skills
  • Experience with both container and serverless computing
  • Microsoft Azure/AWS developer/architecture certifications preferred
  • Skilled in Cloud/PaaS Environments (e.g., AWS, Azure), LAN, WAN, Network Security
  • Proficient, collaborative, and experienced in building reliable, scalable enterprise systems
  • Ability to identify root-cause sources of instability in high-traffic, large-scale distributed systems
  • Linux administration, troubleshooting, and performance tuning experience
  • Understanding of observability principles (monitoring, logging, tracing, alerting), tools, and practices that promote observability
  • Experience with continuous integration tools (e.g., GitLab, AWS CodeBuild, CodeDeploy, CodePipeline, Azure DevOps)
  • Troubleshooting skills that span systems, network, and code
  • Strong understanding of network infrastructure and network hardware
  • Ability to implement, administer, and troubleshoot network infrastructure devices, including firewalls and load balancers
  • Experience with configuration management and orchestration tools (e.g., Terraform, CloudFormation, Ansible, Chef)