DIRECTV is seeking an AIOps Lead (Principal, IT Software Engineer 2) who will play a crucial role in driving the adoption and execution of Artificial Intelligence for IT Operations (AIOps) practices across the organization. This individual will be responsible for leading observability standards, AIOps initiatives, and automation-first strategies, leveraging AI and machine learning technologies to optimize IT operations, detect anomalies, improve system performance, and automate incident and problem management processes. The ideal candidate will have a strong background in IT operations, SRE, DevOps, and software development; a deep understanding of observability platforms and AIOps tools; and the ability to lead cross-functional teams to drive innovation in IT operations automation and monitoring.

Here’s what you’ll do:

Team Leadership and Guidance:
Lead projects with a team of 3-4 NPW engineers dedicated to stability, observability, and operational efficiency improvements.
Serve as technical lead for a team that designs and develops end-to-end solutions, managing dependencies and cross-team impacts.
Provide hands-on guidance and support to team members (50% hands-on, 50% managerial).
Lead a team of AIOps engineers and specialists, ensuring their development, coaching, and alignment with organizational goals.
Develop and report on team performance KPIs.
Foster a culture of continuous learning and DevOps excellence through regular technical sessions and internal workshops.
Participate actively in the development community (Business Unit), promoting best practices by educating peers.
Manage risk and request help from leadership, when necessary, to meet commitments or change direction.

Observability, AIOps Strategy and Execution:
Define and implement an observability and AIOps strategy aligned with business objectives and an autonomous IT operations vision.
Plan short-term (sprint-to-sprint) and long-term (multiple PI) initiatives, organizing work and designs to meet the long-term target.
Implement and optimize AI and machine learning algorithms to detect performance anomalies, predict outages, automate incident response, and improve overall operational efficiency.
Implement automated workflows for proactive issue resolution, reducing manual intervention and improving operational agility.
Seek opportunities to improve processes and take an automation-first approach.
Lead the evaluation, selection, and deployment of AIOps platforms and tools.
Design and implement cost-efficient observability and AIOps solutions across cloud and on-premises environments using a mix of commercial, open-source, and CNCF solutions.
Leverage data analytics and monitoring systems to generate actionable insights that improve system health, application performance, and availability.
Develop internal resources and training materials to ease the adoption and implementation of AIOps tools and practices.

Cross-functional Collaboration:
Work closely with IT operations, DevOps, SRE, and application development teams to identify pain points and automate processes with AIOps tools and techniques.
Present findings, improvements, and key metrics to senior management and stakeholders.

Automation and Process Improvement:
Apply scripting, AI/ML, and automation skills to drive an automation-first approach.
Embed observability and AIOps capabilities into reusable platform services using DevOps, CI/CD, and IaC tools and practices such as Terraform, Jenkins, GitHub, ArgoCD, Harness, and Ansible.

Technical Implementation and Management:
Establish and enforce observability standards, policies, and best practices across the enterprise.
Ensure compliance with regulatory and security requirements.
Plan and execute the migration of legacy tools and functions to the new AIOps approach.
Develop and maintain AIOps dashboards, extensions, applications, and workflow automation.
Integrate AIOps with tools such as Jira, ServiceNow, MS Teams, Slack, xMatters, Confluence/wiki/KB, and Moogsoft/BigPanda.
Set up and manage observability stacks for cloud monitoring (AWS, Azure), VMs, Kubernetes, and various databases.
Optimize naming conventions, management zones, alerting profiles, and tagging to align with business processes.

Performance Monitoring and Reporting:
Analyze and report on observability metrics, KPIs, Service Level Indicators (SLIs), and Service Level Objectives (SLOs).
Develop and recommend baseline monitoring thresholds, SLOs, and error budgets to drive continuous improvement in MTTR and availability.

What you’ll need to be successful:

Educational and Professional Experience:
Bachelor’s degree in computer science, engineering, or a related field.
5-7 years of experience (7+ years preferred) in IT operations, DevOps, or site reliability engineering, with at least 2 years in AIOps-related roles.
Strong experience with AIOps tools such as Moogsoft, BigPanda, Splunk, Dynatrace, Datadog, ServiceNow, xMatters, or similar.
Solid understanding of machine learning algorithms and their application in IT operations.
Hands-on experience with cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).

Technical Skills:
3+ years of experience with Dynatrace SaaS, DQL, and Logs on Grail, or similar.
Strong scripting/automation skills in Python, Perl, Shell, and JavaScript.
Experience with automation, DevOps, GitOps, CI/CD, and IaC tools (Terraform, Jenkins, GitHub, Ansible).
Experience integrating and automating ITSM tools such as ServiceNow, xMatters, PagerDuty, and Jira.
Hands-on experience building and operating open-source observability tools such as ELK, Grafana, Prometheus, Fluentd, Fluent Bit, Loki, OpenTelemetry, OpenSearch, and Thanos.
Experience designing and implementing observability and AIOps solutions for complex, distributed systems.
Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions (both frontend and backend).
Experience with operating systems (Linux and Windows), Java, NodeJS, ReactJS, databases (Oracle, Cassandra), Kafka, MuleSoft, Salesforce, and networking.
Expertise in incident management, monitoring systems, and ITSM processes.

Leadership and Communication:
2+ years of experience leading engineering teams in Observability, SRE, Platform, Infrastructure, or Application organizations.
Excellent communication, collaboration, and problem-solving skills.
Proficient in developing and maintaining technical documentation, runbooks, and processes.
Proven track record of driving change and innovation in a fast-paced, dynamic environment.

This role may require a background check due to job duties requiring routine access to DIRECTV and DIRECTV customers’ proprietary data. Qualified applicants with arrest and conviction records will be considered for employment in accordance with local ordinances and state law.
This role may require occasional travel (less than 5%).
This is a remote position that can be located anywhere in the United States.

A career with us comes with big rewards:
DIRECTV's compensation structure is designed to be market-competitive and fully supports efforts to attract and retain employees. It is the company's policy to offer pay that is competitive with other employers in the local market. Our salary ranges are determined by role, level, and location. The base salary range displayed below reflects the minimum and maximum target salary for each of DIRECTV's four US Labor Market Zones. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training.