Senior DevOps Engineer

Middle

DevOps

Linux

3+ years in DevOps/Platform Engineering and Linux/Windows administration, skilled in AWS/Azure, Kubernetes, IaC, GitOps, CI/CD, and DevSecOps, capable of designing and operating scalable, resilient cloud and AI infrastructures

Responsibilities:

  • Designing and deploying scalable, multi-tenant cloud (AWS or Azure) and hybrid/on-premises architectures tailored to diverse client needs, including specialized infrastructure for AI and machine learning workloads.

  • Understanding business requirements, evaluating architectural trade-offs, and translating them into cost-effective, production-ready technical solutions.

  • Developing declarative scripts and modules for automating infrastructure provisioning, configuration management, and environment replication.

  • Designing, implementing, and optimizing GitOps-driven CI/CD pipelines to achieve automated, self-healing software delivery cycles for both application code and AI assets (models, prompts, and evaluation datasets).

  • Building and maintaining comprehensive observability (monitoring, logging, and tracing) systems to ensure proactive anomaly detection, including tracking LLM performance metrics (latency, token usage, and drift).

  • Ensuring system security and protection by integrating security guardrails (SAST/DAST, container scanning, prompt injection defense, and data anonymization) directly into the delivery pipeline (DevSecOps).

  • Designing and implementing robust disaster recovery (DR), failover procedures, and high-availability strategies across multi-region setups.

  • Automating the deployment of system updates and patches, as well as zero-downtime releases of microservices and AI model endpoints.

  • Adhering to corporate security, data privacy (GDPR/HIPAA/SOC2), and industry-standard regulatory rules and compliance practices, with specific guardrails for AI data handling and model usage.

  • Providing technical leadership, architectural guidance, and mentorship to developers, data scientists, system engineers, and cross-functional client teams.

  • Supporting team infrastructure, unblocking development workflows, and rapidly resolving complex configuration, network, and automation issues across multi-cloud environments.

  • Ensuring high availability, scalability, elasticity, and maximum resilience against infrastructure and service component failures.

  • Staying up to date on the latest cloud-native technologies, CNCF ecosystem projects, FinOps (specifically managing unpredictable cloud AI/GPU spend), and industry best practices to drive continuous innovation.

What we expect from you:

  • Practical administration experience with Linux/UNIX and Windows systems (mandatory, at least 3 years in a senior or lead capacity).

  • Strong understanding of modern web architectures, microservices, distributed systems, and networking protocols.

  • Practical experience in DevOps/Platform Engineering roles involving end-to-end infrastructure development and client-facing delivery (mandatory, at least 3 years).

  • Practical database administration and optimization experience with relational, non-relational (NoSQL), and vector databases (e.g., Pinecone, Milvus, Qdrant, or pgvector) used in AI applications.

  • Deep understanding and production experience with Infrastructure as Code (IaC) principles, focusing on modularity, reusability, and state management.

  • Automation experience with enterprise configuration management tools like Ansible, or modern alternatives/code-driven IaC (e.g., Pulumi).

  • Experience in designing, deploying, and managing environments in AWS or Azure using advanced, automated GitOps/IaC workflows (Terraform, OpenTofu, CloudFormation, or Bicep/ARM).

  • Practical skills in automating code compilation, artifact management, and continuous deployment using GitHub Actions, GitLab CI, Jenkins, or cloud-native tooling (ArgoCD, Flux).

  • Experience implementing automated code testing and shift-left compliance checks in the CI process, extending to continuous evaluation pipelines for LLM-backed applications (using tools such as Ragas or Langfuse).

  • Proficiency in containerization and cloud-native orchestration using Docker, Kubernetes (EKS/AKS), Helm, and ingress management.

  • Experience deploying, scaling, and managing service meshes, microservices releases, and containerized AI model deployment frameworks (e.g., vLLM, Triton Inference Server, Hugging Face TGI).

  • Experience with enterprise artifact repository managers (JFrog Artifactory, Nexus, or cloud-native container registries).

  • Advanced scripting and programming skills in Python (essential for AI ecosystems), Bash, Go, or PowerShell for building custom automation tools.

  • Expert knowledge of Git, including advanced branching strategies (GitFlow, Trunk-Based Development), repository management, and managing version control for application code, configuration, and prompt templates.

  • Proven experience implementing DevSecOps, secrets management (HashiCorp Vault, AWS Secrets Manager), and identity and access management (IAM).

  • Experience implementing cloud financial management (FinOps), with a strong focus on tracking and optimizing high-cost AI infrastructure and API token spending.

  • Great communication and consultancy skills, with the ability to articulate technical concepts clearly to both technical teams and non-technical client stakeholders.

Will be a plus:

  • Deep knowledge of Linux/Windows OS internals, low-level troubleshooting, kernel tuning, and advanced performance diagnostics.

  • Deep knowledge of Enterprise Networking (VPC peering, SD-WAN, VPNs) and Cloud Security Architecture (Zero Trust models, WAF, DDoS mitigation).

  • Experience in the end-to-end design, business justification, documentation, and implementation of complex, large-scale enterprise architectural solutions.

  • Hands-on experience building or operating Retrieval-Augmented Generation (RAG) pipelines and managing LLM-backed agent orchestration frameworks (e.g., LangChain, AutoGen).

  • Active professional-level certifications (e.g., AWS Certified Solutions Architect Professional, Azure Solutions Architect Expert, CKA/CKAD, or cloud AI/Machine Learning specializations).

What we offer:

  • Long-term career stability with a competitive salary paid in USD.

  • Conditions for steady career development.

  • Development supported by dedicated mentors and a variety of programs focused on expertise and innovation.

  • Private medical insurance provided after successful completion of the probationary period.

  • A well-equipped, cozy office that supports comfort and productivity across all project stages.

  • Welcoming atmosphere and a friendly corporate culture.

If you feel this opportunity resonates with you, apply now. We’re looking forward to getting to know you!
