Oracle Unity brings together online, offline, and third-party customer data sources to create a single, dynamic view of the customer. With built-in artificial intelligence (AI) and machine learning, Oracle Unity derives and delivers timely intelligence about your customers so you can optimize their brand experience across marketing, sales, and service. Embedded within Oracle Customer Experience Cloud, Oracle Unity is open and extensible for integrating actionable intelligence into partner and ecosystem applications for the fastest time to value.
- Work with engineering teams to design robust cloud-based architectures and redundant, fault-tolerant solutions using practices such as CI/CD, blue-green deployments, canary testing, and traffic management.
- Define non-functional requirements (NFRs) for engineering teams around security, logging, monitoring, alerting, configuration, and testing, and work with those teams as they implement apps and services.
- Develop runbooks and standard operating procedures (SOPs) for each service and application to ensure DevOps and SRE teams can detect incidents or issues before customers are impacted and act quickly to restore impacted services.
- Define practices and procedures around postmortems and root cause analysis to ensure that service quality and maintainability KPIs are improving and that downtime and service interruptions are negligible.
- Ensure that day-to-day operational requirements and SLAs are met
- Handle capacity planning and hardware requests
- Ensure production security standards are followed
- Ensure monitoring is robust and effective
- Deliver zero-downtime deployments with a high-availability mindset
- Tackle problems at both large and small scale, with constant focus on optimization, high availability, and security as they relate to the CI/CD process
- Experience working with and managing multiple Kubernetes clusters, preferably with federation
- Certified Kubernetes Administrator (CKA) a plus
- Experience integrating CI/CD feedback with code review systems like GitLab and group chat software such as Slack or Mattermost
- Experience working with Salt and Terraform for configuration management, automation, and IaC
Qualification & Experience:
- 12+ years in a technical role such as senior engineer, lead, or architect in software engineering, DevOps, or SRE functions
- 8+ years of experience being responsible for the uptime and reliability of customer-facing web applications, critical services, or mobile systems.
- 8+ years of experience maintaining and administering large-scale Linux-based environments with best practices for security and automation.
- 8+ years of experience providing and maintaining cloud-based infrastructure such as AWS, GCP, or Azure, with broad experience in Infrastructure as Code (IaC) solutions such as Terraform, Terragrunt, and Atlantis.
- 5+ years implementing and maintaining monitoring and alerting systems, creating service level indicators (SLIs), service level objectives (SLOs), and focusing on systems that self-heal or alert teams to take action before system downtime.
- 5+ years designing and operating fault-tolerant systems with little to no downtime.
- 3+ years of working with and implementing compliance architectures and protocols for SOC 2, PCI, GDPR, ISO, and others as needed to ensure compliance of the application and the infrastructure.
- Expert knowledge of Kubernetes (K8s), distributed computing, containers, and scaling
- Expert knowledge of network architectures, security, and troubleshooting of connectivity or latency issues.
Vacancy Type: Full Time
Job Functions: Sales Business Development
Job Location: San Francisco, CA, US
Application Deadline: N/A