Friday, April 26, 2024

The Evolution of an IT Professional into an AI Engineer for Top IT Companies

 


In today's rapidly evolving technological landscape, the demand for skilled professionals in artificial intelligence (AI) has reached unprecedented levels. As AI continues to transform industries and redefine the way businesses operate, there is a pressing need for IT professionals to adapt and upskill themselves to become proficient AI engineers. This essay explores the journey of an IT professional as they transition into an AI engineer and position themselves for success in top IT companies.

To embark on this transformative journey, an IT professional must first recognize the significance of AI in shaping the future of technology. AI encompasses a broad spectrum of technologies, including machine learning, natural language processing, computer vision, and robotics, among others. Understanding the fundamentals of AI and its applications across various domains lays the foundation for an IT professional's transition into this field.

The next step involves acquiring specialized skills and knowledge in AI. This may entail pursuing advanced degrees or certifications in fields such as data science, machine learning, or AI engineering. Top IT companies often value candidates with a strong educational background and practical experience in AI-related disciplines. Therefore, investing in continuous learning and professional development is crucial for staying abreast of the latest advancements in AI technology.

Furthermore, gaining hands-on experience through real-world projects and internships can significantly enhance an IT professional's proficiency in AI. Engaging in practical applications of AI, such as developing predictive models, building intelligent systems, or implementing AI-driven solutions, provides invaluable insights and skills that are highly sought after by top IT companies. Collaborating with multidisciplinary teams and leveraging cutting-edge AI tools and frameworks further enriches one's experience and expertise in this field.

Networking and building connections within the AI community also play a pivotal role in the transition process. Attending industry conferences, workshops, and meetups enables IT professionals to engage with AI experts, share knowledge, and stay updated on industry trends. Joining online forums, participating in open-source projects, and contributing to AI communities foster collaboration and create opportunities for learning and growth.

In addition to technical skills, soft skills such as problem-solving, critical thinking, and communication are indispensable for success as an AI engineer in top IT companies. AI projects often involve complex challenges that require creative solutions and effective teamwork. Being able to communicate ideas clearly, collaborate with diverse stakeholders, and adapt to evolving requirements are essential qualities that distinguish exceptional AI engineers.

Finally, positioning oneself for a career in top IT companies requires a strategic approach to career planning and self-promotion. Building a strong online presence through professional networking platforms like LinkedIn, showcasing AI projects and achievements through personal portfolios or GitHub repositories, and actively seeking out opportunities for advancement and recognition are key strategies for establishing credibility and visibility in the AI industry.

In conclusion, the journey of an IT professional to becoming an AI engineer for top IT companies is a multifaceted process that requires dedication, continuous learning, and a proactive approach to skill development and career advancement. By embracing the transformative power of AI, acquiring specialized knowledge and experience, cultivating essential soft skills, and strategically positioning themselves within the AI community, IT professionals can successfully transition into rewarding careers as AI engineers and contribute to the innovation and growth of top IT companies in the digital age.


For AI engineers, there are several certifications available that can help demonstrate expertise and proficiency in artificial intelligence and related fields. Some of the most recognized certifications for AI engineers include:

  1. AWS Certified Machine Learning - Specialty: This certification validates skills in designing, implementing, deploying, and maintaining machine learning solutions on the AWS platform.

  2. Microsoft Certified: Azure AI Engineer Associate: This certification demonstrates expertise in designing and implementing AI solutions on the Microsoft Azure platform, including natural language processing, computer vision, and machine learning.

  3. Google Professional Machine Learning Engineer: This certification showcases proficiency in designing, building, and deploying scalable machine learning models on the Google Cloud Platform.

  4. IBM Certified Data Engineer - Big Data: This certification focuses on skills related to designing, building, and optimizing data processing systems for analytics, including machine learning models.

  5. Certified AI Engineer (CAIE): Offered by various organizations, this certification typically covers a broad range of AI concepts, tools, and techniques, demonstrating proficiency in designing, developing, and deploying AI solutions.

  6. NVIDIA Deep Learning Institute (DLI) Certifications: NVIDIA offers several certifications focused on deep learning and AI, covering topics such as computer vision, natural language processing, and reinforcement learning.

  7. Certified Artificial Intelligence Professional (CAIP): This certification covers fundamental AI concepts, algorithms, and techniques, demonstrating proficiency in various areas of artificial intelligence.

These certifications vary in focus and depth, so it's essential to choose one that aligns with your career goals, interests, and level of expertise. Additionally, pursuing hands-on projects and gaining practical experience in AI development is crucial for complementing certification credentials and showcasing real-world skills to potential employers.


Tuesday, August 29, 2023

Next-Gen Cloud Computing

Introduction


In the rapidly evolving landscape of technology, cloud computing has emerged as a transformative force, reshaping the way individuals, businesses, and industries approach data storage, processing, and collaboration. The term "next-generation cloud computing" encapsulates the ongoing evolution of cloud technologies, encompassing advancements in areas such as edge computing, serverless computing, hybrid cloud models, and AI-driven automation. This essay explores the key aspects of next-gen cloud computing, its implications, benefits, and challenges, as well as its potential to drive innovation across various sectors.


Emergence of Next-Gen Cloud Computing


Next-generation cloud computing builds upon the foundation established by traditional cloud computing, which allowed users to access resources remotely and scale their operations dynamically. However, as technology needs have grown increasingly complex, next-gen cloud computing solutions have arisen to address emerging challenges.


1. Edge Computing: Bridging the Latency Gap


One significant advancement in next-gen cloud computing is edge computing, which involves processing data closer to the source rather than relying solely on centralized data centers. This approach reduces latency, enabling real-time applications such as IoT devices and augmented reality systems to operate more efficiently. Edge computing not only enhances user experience but also supports time-sensitive applications that require immediate data analysis.


2. Serverless Computing: Efficient Resource Utilization


Serverless computing, another facet of next-gen cloud computing, abstracts the underlying infrastructure from developers. This allows them to focus solely on writing code without concerning themselves with resource provisioning or management. This approach enhances resource utilization and cost-effectiveness, as users are billed only for the actual computing resources consumed during execution.
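To make the model concrete, here is a minimal sketch of a serverless function in Python, written in the style of an AWS Lambda handler. The event shape and the greeting use case are illustrative assumptions, not any platform's actual contract:

import json

def handler(event, context):
    # The platform invokes this function on demand; the developer never
    # provisions or manages a server.
    name = event.get("name", "world")
    # Billing covers only the compute consumed by this invocation.
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

The same code scales from zero to many concurrent invocations with no capacity planning by the developer, which is precisely the resource-utilization benefit described above.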


3. Hybrid and Multi-Cloud Models: Optimizing Workloads


Hybrid and multi-cloud models offer organizations the flexibility to distribute workloads across a combination of public and private clouds. This approach optimizes performance, security, and cost efficiency by allowing organizations to leverage the strengths of various cloud providers. It also mitigates vendor lock-in concerns, giving businesses greater control over their infrastructure.


4. AI-Driven Automation: Enhancing Efficiency


Artificial intelligence and machine learning are being integrated into cloud platforms to automate various tasks such as resource allocation, security monitoring, and data analysis. These AI-driven capabilities optimize system performance, reduce manual intervention, and enhance overall operational efficiency.
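As a hedged illustration of what such automation can look like at its simplest, the Python sketch below flags anomalous CPU readings with a rolling z-score. Real cloud platforms use far more sophisticated learned models; the window size, threshold, and sample data here are invented:

from statistics import mean, stdev

def flag_anomalies(samples, window=12, threshold=3.0):
    """Return indices of samples that deviate strongly from the recent
    rolling window - a stand-in for ML-based resource monitoring."""
    anomalies = []
    for i in range(window, len(samples)):
        recent = samples[i - window:i]
        mu, sigma = mean(recent), stdev(recent)
        if sigma > 0 and abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

cpu = [38, 41, 40, 39, 42, 40, 41, 39, 40, 42, 41, 40, 95]
print(flag_anomalies(cpu))  # [12] - the spike to 95% stands out

A platform that pairs such detection with automated remediation (scaling out, throttling, alerting) reduces the manual intervention described above.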


Benefits and Implications


The next-gen cloud computing paradigm brings forth a multitude of benefits and implications across different sectors:


1. Innovation Acceleration


Next-gen cloud computing fosters innovation by providing a scalable, cost-effective platform for experimenting with new technologies and services. Startups and developers can easily access cutting-edge tools, enabling them to focus on creativity and business logic rather than infrastructure management.


2. Industry Transformation


Industries such as healthcare, finance, and manufacturing can leverage next-gen cloud computing to streamline processes, improve decision-making through real-time analytics, and offer personalized experiences to customers. For example, remote patient monitoring in healthcare or predictive maintenance in manufacturing can be significantly enhanced through edge computing.


3. Data Security and Privacy


While next-gen cloud computing offers numerous benefits, it also raises concerns about data security and privacy. As data processing becomes more distributed, ensuring the protection of sensitive information becomes increasingly complex. Striking a balance between efficient processing and robust security measures will be pivotal.


4. Workforce Transformation


As automation becomes more prevalent due to AI-driven capabilities, the nature of the workforce might evolve. Certain roles may shift from manual tasks to overseeing and optimizing automated processes. Upskilling and reskilling efforts will be essential to equip the workforce with the necessary skills to thrive in this changing landscape.


Challenges


While next-gen cloud computing presents significant opportunities, several challenges must be addressed:


1. Connectivity Issues


Edge computing heavily relies on reliable and high-speed network connections. In areas with limited connectivity, the benefits of edge computing might not be fully realized.


2. Interoperability Complexity


Hybrid and multi-cloud models introduce challenges related to interoperability and data portability. Ensuring seamless communication between different cloud environments requires standardized protocols and interfaces.


3. Security and Compliance


Distributed processing and data storage increase the attack surface for potential cyber threats. Maintaining strong security measures and compliance with regulations across diverse cloud environments is paramount.


Conclusion


Next-gen cloud computing represents a pivotal shift in how technology is utilized and harnessed. The integration of edge computing, serverless architectures, hybrid models, and AI-driven automation is reshaping industries and driving innovation. As we navigate the benefits and challenges of this evolving paradigm, it is imperative to strike a balance between technological advancement, security, and ethical considerations. Embracing next-gen cloud computing will not only reshape the digital landscape but also empower individuals and organizations to unlock new levels of efficiency and creativity.

Sunday, January 29, 2023

Corporate Governance Beyond the written words

 


Corporate Governance, I believe, goes beyond the written word and is much more than just following the book. In fact, many a time its principles are unwritten. Simply put, corporate governance is a set of rules and procedures for steering corporate behaviour. It is less about policing and more about inculcating transparency and accountability as an intrinsic part of the corporate culture.

Embedding an efficacious corporate governance framework is really a function of how well its need is understood by the three key stakeholders - Management, the Board and the Shareholders. Needless to say, tone-at-the-top is the most important ingredient.

The importance of good corporate governance is that much greater because of the fiduciary duty towards customers and other stakeholders, including investors, and the trust they repose in a company. In addition to asking 'what a company does', people today are also asking, and are more interested in, 'how it does it'.

One important skill set that every governance manager must possess is the ability to understand the business, because there is no one-size-fits-all solution for governance. While the basic tenets remain constant, the framework needs to be tailor-made for the best fit.

Let’s talk about some key cornerstones of effective corporate governance:

  • Transparency – Transparency is a critical pillar of corporate governance which ensures that the processes and transactions of a company are open to scrutiny and verification and that the Company has nothing to hide; it means the Company has made meaningful disclosures, keeps all stakeholders updated and complies with applicable legal requirements.
  • Accountability – Accountability is a trait that helps all actions reach the planned goals and objectives. It makes management responsible for its actions – not only for the failings, but also for the accomplishments. Taken positively, accountability brings in motivation and drives employees across the pyramid.
  • Diversity at the board level - A diverse Board is a strong Board. In many countries, board composition continues to be skewed in favour of men, even though studies have shown that companies with women on Boards perform better, and not just in terms of profitability. Beyond gender, a diverse skill set among directors also brings better ideas and judgement to the Board table. A healthy mix of independent and executive directors is also essential, so that management and the Board work at their respective levels and there is no overlap of authority.
  • Flow of information – It is a known fact that more information makes for better decisions. It is the responsibility of management to ensure a sufficient and relevant flow of information to the Board, which helps effective decision-making. Regular disclosures to customers, regulators, shareholders etc. also increase confidence and say much about the fairness and transparency of the company.
  • Control functions - It is important to have a clear demarcation between the three lines of defence within an organisation - business, risk & compliance, and internal audit - which ensures that conflict-of-interest situations are handled effectively and risks are properly mitigated. All well-governed companies also need a mechanism for employees and others to raise concerns about irregularities they notice, and a defined framework to handle those concerns.
  • Board evaluation – Stakeholders are increasingly interested in Board evaluation results, which are a direct indicator of the effectiveness of a Board and its accountability. An effective evaluation process helps the Board, its committees and individual directors perform to their optimum capabilities.
  • Effective delegation - The Board needs to have a well-defined charter/terms of reference, which lists its roles and responsibilities and the Board room processes. The Board must also effectively delegate responsibilities to its committees, so as to allow adequate time to discharge its strategic responsibilities and provide directional advice.

Why do we need corporate governance?

It is a well-known fact that good governance leads to higher returns and profitability. Well-governed companies are rewarded with governance premiums, while a governance deficit leads to serious erosion of profits in the long term. A strong culture of corporate integrity is a direct contributor to sustainable growth.

Many corporate failures have revealed chinks in the corporate governance framework, be it non-disclosures, lack of control mechanisms or the like. Investor attention to the corporate governance of investee companies is increasing, and a large body of empirical evidence indicates that well-governed companies not only attract higher market valuations but are also able to attract greater capital flows, because of the trust factor.

Good governance delivers good businesses and good businesses lead to a good reputation. Good corporate reputation converts good companies into great companies.

Governance and ethics are two sides of the same coin and go hand in hand. In a true sense, ethics is what our primary teachers taught us in moral science classes - speak the truth, be good to others, don't hurt the environment and do your work diligently.

Tuesday, June 28, 2022


Capacity and Performance Management

 


What Is ITIL Capacity and Performance Management?

ITIL capacity and performance management is one of 34 ITIL management practices. This service management practice falls within the service design lifecycle stage.

 

What Is the Objective of ITIL Capacity and Performance Management?

The objective of ITIL capacity and performance management is to ensure that your IT capacity meets your business needs. Satisfying current and future demand in a timely and cost-effective way is key to this ITIL practice.

 

ITIL Capacity and Performance Management Sub-Practices

Capacity and performance management is complex. ITIL breaks the practice into three sub-practices: business, service, and component capacity management.

 

Business Capacity Management: Business capacity management is a strategic process that translates business strategy into IT service requirements. As part of IT capacity planning, business capacity management accounts for future changes to service requirements.

Service Capacity Management: Service capacity management focuses on monitoring live IT services and gathering data to identify trends. Monitoring solutions help IT teams detect usage and performance problems in order to prevent incidents from occurring.

Component Capacity Management: Component capacity management focuses on the performance, utilization, and capacity of individual technology components. For example, components include hard disk storage and internet throughput. Component capacity management can be reactive when an incident occurs or proactive based on trends that help predict how services impact component usage.

ITIL Capacity and Performance Management Roles and Responsibilities

People are fundamental to the ITIL framework. You must define appropriate roles to manage practices. ITIL capacity and performance management roles vary by business. Roles may include process owner, manager and practitioner, and service owner.

 

Capacity Manager: The capacity manager is responsible for ensuring that you have adequate IT capacity to meet service levels, communicating with IT teams about balancing capacity and demand, and optimizing capacity. This person is responsible and accountable for the overall practice, subpractices, and results.

Service Owner: The service owner is responsible for the service capacity management subprocess.

Applications Analyst: The applications analyst is responsible for the component capacity management subprocess.

Technical Analyst: The technical analyst is also responsible for the component capacity management subprocess.

ITIL V4 Capacity and Performance Management Practice Steps and Activities

The capacity and performance management team performs many tasks. These activities concern applications, hardware, and external services. Below are the team’s six major areas of responsibility, including role assignment, monitoring, analysis, and more:

 

Assign Roles and Responsibilities: Use the roles outlined above to identify the appropriate team member for each role. It is not uncommon for one person to wear multiple hats. 

Research and Monitor Current Service Performance: Monitor and collect data associated with your company’s cloud services, end-user devices, networks, servers, and storage devices.

Perform Capacity and Performance Modeling: Identify trends that help you predict future capacity requirements, and then build models based on those expected changes (a minimal trend-fitting sketch follows this list).

Analyze Capacity Requirements: Evaluate your current capacity in the context of your future needs, and then calculate the impact such changes will have on your business and services.

Forecast Demand and Plan Resources: Understand and anticipate the growth or shrinkage in demand for IT services, and then apply infrastructure resources accordingly while also reducing costs. 

Plan Performance Improvements: Develop your capacity management plan so that it satisfies your infrastructure and resource efficiency requirements. 
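As a hedged illustration of the modeling step above, the Python sketch below fits a least-squares trend line to monthly disk usage samples and projects when an assumed 500 GB ceiling would be reached. The figures and the ceiling are invented for the example; a real model would draw on your own monitoring data.

def fit_trend(usage_gb):
    """Least-squares slope and intercept for monthly usage samples."""
    n = len(usage_gb)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(usage_gb) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, usage_gb))
             / sum((x - x_mean) ** 2 for x in xs))
    return slope, y_mean - slope * x_mean

usage = [310, 322, 335, 349, 361, 374]          # GB used, one sample per month
slope, intercept = fit_trend(usage)
months_left = (500 - intercept) / slope - (len(usage) - 1)
print(f"trend: ~{slope:.1f} GB/month; 500 GB ceiling in ~{months_left:.0f} months")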

 

 

An ITIL capacity and performance management planning template helps you anticipate future capacity requirements. A good template contains examples of both capacity planning and business impact information, and includes the data that most professionals use when planning for future capacity.

 

Best Practices for ITIL Capacity and Performance Management

Industry experts emphasize the importance of learning the ITIL framework as part of the overall IT initiative. See the additional best practices below.

 

Professor Gladstone provides two fundamental tips when it comes to ITIL capacity and performance management: “Learn ITIL, and ensure that you have data collection, monitoring, alerts, and reporting in place for the components/services/businesses you support.” Regarding the value that tools bring to the practice, he says, “There are lots of very sophisticated tools out there for these processes, but when it comes right down to it, you may be able to do the job with a competent data manipulation tool (i.e., a very good spreadsheet).”

 

Benefits of ITIL Capacity and Performance Management

Technological demands shift with business growth, new projects, and ad hoc work. ITIL capacity and performance management ensures that resources function regardless of the volatility of those demands. The practices below help companies remain productive amid constant change.

 

Larry Klosterboer is a certified IT architect who specializes in systems management at IBM’s Technology Integration Management Center in Austin, Texas. In his book, ITIL Capacity Management, he suggests, “The more processes you implement and the more tightly you integrate them, the more benefit your organization sees.”

 

ITIL capacity and performance management reduces potential downtime through planning: “You can start the analysis of potential benefits by reviewing your incident tickets over the past year or so. If you have a way to record capacity and performance-related tickets, focus there first. If you don’t have that capability yet, look for tickets on which the description field includes ‘slow down,’ ‘delay,’ or ‘response time,’” Klosterboer recommends. “Assume that you could have eliminated half of those tickets through better capacity planning, and you can start to project what the benefit would be to your organization.”
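Klosterboer's ticket review lends itself to a trivial script. The sketch below filters invented ticket records for the phrases he suggests; the field names are assumptions rather than any particular ITSM tool's schema.

KEYWORDS = ("slow down", "delay", "response time")

tickets = [
    {"id": 101, "description": "Users report response time over 30 seconds"},
    {"id": 102, "description": "Password reset request"},
    {"id": 103, "description": "Nightly batch job delay on the DB server"},
]

# Keep tickets whose description mentions a capacity-related phrase.
capacity_related = [t for t in tickets
                    if any(k in t["description"].lower() for k in KEYWORDS)]
print(f"{len(capacity_related)} of {len(tickets)} tickets look capacity-related")

# Klosterboer's rule of thumb: assume about half were avoidable.
print(f"Estimated avoidable incidents: {len(capacity_related) // 2}")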

 

These benefits range from making more informed decisions to improving performance and reducing costs.

 

How Is Capacity Management Implemented in ITIL?

ITIL practices focus on delivering end-to-end service. You do not implement ITIL. Instead, you use it as a framework to guide your IT organization.

 

Whether you are following an older version of ITIL (v3) or the latest evolution (v4), you should follow the best practices above, lean into executive support, and focus on changing attitudes and behaviors. Most important, focus on customer outcomes and business value. ITIL is not a start-to-finish process; it is about continual improvement.

 


 


Tuesday, February 9, 2021



ITIL Service Level Agreements, or SLAs, are recognized as an essential requirement for delivering consistent IT services to the customer community, irrespective of whether those customers are internal or external to the organization. What is often neglected when establishing the SLA is the corresponding necessity for Operating Level Agreements, or OLAs. ITIL offers little guidance on the production of OLAs, choosing to focus more on the production of the SLA.
 
ITIL Service Level Management, or SLM, is a critical component of every IT organization delivering services, but to underpin the success of the SLA it is imperative that the IT organization or department works collectively to deliver and support the IT services to the customer base as defined within the ITIL Service Catalogue. The OLA ensures that each department's roles and responsibilities, and its interactions with each of the other teams, groups and suppliers within the IT department, are formalized, documented, and agreed to.
 
The OLA and SLA work 'hand in glove' and ensure a consistent understanding throughout the IT organization. Furthermore, a newcomer to the department is able to come up to speed more quickly, and therefore be more effective, in the manner in which the team, group, and department operate - leaving little to chance. As a 'living document' the OLA should be updated on a regular basis to ensure it remains effective, and where possible it should be published on the organization's intranet.
 
Probably one of the best examples of where the OLA is most effective is where it defines the response times of the IT support teams for the various 'severities' assigned to Incidents. The response time stated in the OLA should be shorter than that stipulated in the SLA and agreed with the customer, to ensure that IS is, wherever possible, able to meet its service levels.
 

In summary, the Service Level Agreement is customer-facing and supports the services offered by the IT department. The Operating Level Agreement permits the various teams, groups, and suppliers to work cohesively together to deliver the IT services in support of the SLA.

Service Level Agreements (SLAs), defining the quality attributes (QoS - Quality of Service) and guarantees a service is required to meet, are of growing commercial interest, with a deep impact on strategic and organizational processes, as many research studies and the intensified interest in accepted management standards like ITIL v4 show. They are used in all areas of IT, ranging from hosting or communication services to help desk or problem resolution. A well-defined and effective SLA correctly fulfills the expectations of all participants and provides metrics for accurately measuring performance against the guaranteed Service Level (SL) objectives. During the monitoring and enforcement phase, the defined metrics are used to detect violations of the promised SLs and to derive consequential activities in terms of rights and obligations. They play a key role in metering, accounting and reporting, and provide data for further analysis and refinement of SLAs in the analysis phase. SLA metrics are defined from a variety of disciplines, such as business process management, service and application management, or traditional systems and network management.

Different organizations have different definitions for crucial IT parameters such as Availability, Throughput, Downtime, Response Time, etc. For example, some focus on the infrastructure (TCP connections) to define service availability, while others refer to the service application (the ability to access it). Ambiguity, unfulfilled expectations and problems during the fulfilment of SLAs are the result. A poor choice of metrics results in SLAs that are difficult to enforce automatically and may motivate the wrong behaviour. Currently, practitioners have almost no support in selecting appropriate metrics for the implementation of successful SLAs (in terms of automation and compliance with the service objects and IT management processes) in order to automatically gauge service performance. The paper does not attempt to define an exhaustive list of metrics that should be included in an SLA; the topic is too large given the enormous number of potential metrics, and it varies, as seen before, from organization to organization and service to service. We propose a general categorisation scheme for typical metrics for basic service objects and IT management processes and populate it with metrics which commonly appear in SLAs. The metrics are derived from industrial requirements, i.e. they are taken from SLAs currently in use, in an effort to provide realistic terms that are both useful and usable, in particular for the automation of SLAs. To our knowledge, this is a first-of-a-kind approach, and a multi-dimensional categorization of SLA contents and metrics is missing in the literature. The contribution of the categorization is manifold. It supports SLA engineers in their design decisions, in particular concerning the specification of SLAs which are intended to be monitored and enforced automatically. During execution time it might contribute to root-cause analysis, identifying problems such as infrastructure instability, low performance levels of service objects or poorly designed, critical IT processes for which responsible persons can be derived. Furthermore, it might be used to analyse existing SLAs, indicating the extent to which an SLA is already oriented towards ITIL and whether there is improvement potential.


Service Level Agreements

This section gives an insight into Service Level Agreements and IT service contracts in general. It categorizes different types of service contracts, presents the main component parts and defines the goals in order to reach a common understanding. We first start with the definition of some terms used throughout the paper:

  • SLA metrics are used to measure the performance characteristics of the service objects. They are either retrieved directly from the managed resources, such as servers, middleware or instrumented applications, or are created by aggregating such direct metrics into higher-level composite metrics. Typical examples of direct metrics are the MIB variables of the IETF Structure of Management Information (SMI), such as number of invocations, system uptime or outage period, and technical network performance metrics such as loss, delay and utilization, which are collected via measurement directives such as management interfaces, protocol messages, URIs etc. Composite metrics use a specific function averaging one or more metrics over a specific amount of time, e.g. average availability, or breaking them down according to certain criteria, e.g. maximum response time, minimum throughput, top 5%, etc.
  • Service Levels and Guarantees, a.k.a. SLA rules, represent the promises and guarantees with respect to graduated high/low ranges, e.g. an average availability range [low: 95%, high: 99%, median: 97%], so that it can be evaluated whether the measured metrics exceed, meet or fall below the defined service levels at a certain time point or in a certain validity period. They can be informally represented as if-then rules, which might be chained in order to form graduations, complex policies, and conditional guarantees, e.g. conditional rights and obligations with exceptions, violations, and consequential actions: "If the average service availability during one month is below 95%, then the service provider is obliged to pay a penalty of 20%."
  • IT Management Processes / ITIL Processes are IT management processes defining common practices in areas such as Incident, Problem, Configuration, Change or Service Level Management.
  • SLA (Service Level Agreement): An SLA is a document that describes the performance criteria a provider promises to meet while delivering a service. It typically also sets out the remedial actions and any penalties that will take effect if performance falls below the promised standard. It is an essential component of the legal contract between a service consumer and the provider.

According to the Hurwitz Group, the life cycle of an SLA (Fig. 1, SLA life cycle [St00]) is defined as follows:

  1. SLA design
  2. Assign SLA owner
  3. Monitor SLA compliance
  4. Collect and analyze data
  5. Improve the service provided
  6. Refine the SLA

The objectives of SLAs are manifold. In a nutshell, the substantial goals are [Pa04]:

  • Verifiable, objective agreements
  • Known risk distribution
  • Trust and reduction of opportunistic behavior
  • Fixed rights and obligations
  • Support of short- and long-term planning and further SLM processes
  • Decision support: quality signal (e.g. assessment of new market participants)

According to their intended purpose, their scope of application or their versatility, SLAs can be grouped into different (contract) categories (Table 1):

Table 1: SLA categorization

Intended Purpose
  • Basic Agreement - Defines the general framework for the contractual relationship and is the basis for all subsequent SLAs, inclusive of the severability clause.
  • Service Agreement - Subsumes all components which apply to several subordinated SLAs.
  • Service Level Agreement - A normal Service Level Agreement.
  • Operation Level Agreement (OLA) - A contract with internal operational partners who are needed to fulfill a superior SLA.
  • Underpinning Contract (UC) - A contract with an external operational partner who is needed to fulfill a superior SLA.

Scope of Application
  • Internal Agreement - Rather an informal agreement than a legal contract.
  • In-House Agreement - Between internal departments or divisions.
  • External Agreement - Between the service provider and an external service consumer.
  • Multi-tiered Agreement - Including third parties, up to a multitude of parties.

Versatility (according to [Bi01])
  • Standard Agreement - A standard contract without special agreements.
  • Extensible Agreement - A standard contract with additional specific agreements.
  • Individual Agreement - Customized, individual agreements.
  • Flexible Agreement - A mixture of standard and individual contract.

A particular service contract might belong to more than one category, e.g. an Operation Level Agreement (OLA) might also be an individual in-house agreement, and several service contracts can be organized in a unitized structure according to a taxonomical hierarchy. Service Level Agreements come in several varieties and comprise different technical, organizational and legal components. Table 2 lists some typical contents.

Table 2: Categorization of SLA contents

  • Technical components - Service description, service objects, SLA/QoS parameters, metrics, actions, …
  • Organizational components - Liability and liability limitations, level of escalation, maintenance/service periods, monitoring and reporting, change management, …
  • Legal components - Obligations to co-operate, legal responsibilities, proprietary rights, modes of invoicing and payment.
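The if-then guarantee quoted above maps directly onto code. A minimal sketch in Python, assuming a flat 20% penalty and invented availability figures; a real SLA engine would evaluate chained rules over a validity period:

def monthly_availability(uptime_minutes, total_minutes):
    """Composite metric aggregated from direct uptime measurements."""
    return 100.0 * uptime_minutes / total_minutes

def evaluate_sla_rule(availability, low=95.0, penalty_pct=20.0):
    """'If the average service availability during one month is below
    95%, the provider is obliged to pay a penalty of 20%.'"""
    return penalty_pct if availability < low else 0.0

# A 30-day month: 43,200 minutes in total, 40,900 of them up.
avail = monthly_availability(uptime_minutes=40_900, total_minutes=43_200)
print(f"availability = {avail:.2f}%, penalty = {evaluate_sla_rule(avail)}%")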


SLA Metrics

In order to develop a useful categorization scheme for IT metrics, we have spoken to close to three dozen IT service providers, from small and medium-sized enterprises to big companies, and we have analyzed nearly fifty state-of-the-art SLAs currently used throughout the industry in the areas of IT outsourcing, Application Service Provisioning (ASP), hardware hosting, service suppliers and many others. One of the biggest problems we identified is the lack of rapport between metrics and service objects/IT processes, as well as the lack of automation in SLA management and monitoring, which is directly influenced by the underlying metrics and their ability to be automated. Based on this observation, we use three major categories to structure the field of SLA metrics: the service objects under consideration, ITIL processes, and automation grade. The first category distinguishes basic service objects such as hardware, software, network etc. Composite metrics such as end-to-end availabilities can be broken down into smaller direct metrics which are assigned to one of these basic object types. The second category is organized around the eleven ITIL management processes. This leads to clear responsibilities and procedures, and the metrics might reveal potential for process optimization. The last category deals with the question of measurability and therefore, implicitly, with the automation of metrics. It helps to find "easy-to-collect" metrics and to identify problematic SLA rules in existing SLAs, i.e. rules with metrics which can be measured manually only or which cannot be measured at all. In a nutshell, each category answers different questions relating to the design, implementation and analysis of SLAs, such as "Which metrics can be used for a particular service object?", "Can the metric be automatically measured, and what are the possible units?" or "Does a particular SLA sufficiently support the ITIL processes, or is there improvement potential in terms of missing metrics?". Furthermore, the combination of the categories helps to identify dependencies between the SLA resources used and the performance of management processes and management tools.
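A hedged sketch of how this three-dimensional scheme could be represented in code. The category values follow the text above; the dataclass itself and the example metric are illustrative assumptions:

from dataclasses import dataclass

SERVICE_OBJECTS   = {"hardware", "software", "network", "storage", "help_desk"}
AUTOMATION_GRADES = {"automatic", "manual_only", "not_measurable"}

@dataclass
class SLAMetric:
    name: str
    service_object: str    # one of the five basic object classes
    itil_process: str      # e.g. "Service Level Management"
    automation_grade: str  # measurability, hence automation potential
    unit: str

metric = SLAMetric("average availability", "network",
                   "Service Level Management", "automatic", "%")
assert metric.service_object in SERVICE_OBJECTS
assert metric.automation_grade in AUTOMATION_GRADES

Tagging every metric along all three axes makes the questions above queryable: filter by service object when designing a new SLA, or by automation grade to spot rules that cannot be monitored automatically.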


Categorization according to Service Objects

Although the particular end-to-end service objects may differ considerably among SLAs, they can mostly be reduced to five basic IT object classes, namely: hardware, software, network, storage and help desk (a.k.a. service desk). The respective instances can be combined in any combination to form complex and compound services, such as ASP solutions including servers (hardware), applications such as SAP (software), databases or a data warehouse (storage) and support (help desk). Each object class has its own set of typical quality metrics. In the following we present useful metrics in each class and give examples of their units.




Friday, April 5, 2019

Cisco ACI Guide for Humans

Cisco ACI Guide for Humans, Part 1: Physical Connectivity

First of all, I need to explain why I decided to write such a post. The reason is obvious to anyone who has ever tried to deploy, configure or simply understand how Cisco ACI works using the official Cisco documentation. Cisco ACI is a very powerful architecture, and once you learn it, you start loving it. My assumption is that, for some reason, Cisco hired app-development experts to build the ACI GUI and to write the ACI design and configuration guides, and the final product turned out to be hard to digest for both DevOps and networking professionals. That is why I feel there is a need to explain the concepts in a way that is easier to understand for us, humans.

TIP: APIC maintains an audit log for all configuration changes to the system. This means that all the changes can be easily reverted.
Before the ACI installation starts, we need to connect every ACI controller (APIC) to 2 Leafs. There should be 3 or 5 APICs, for high availability, and a standard procedure, once the cabling is done, should be:

  • Turn ON and perform a Fabric Discovery.
  • Configure Out-of-Band Management.
  • Configure the NTP Server in the Fabric Policies -> POD Policies menu. This is very important because if, for example, the Fabric and the Controllers are in different time zones, the ACI won't synchronise correctly.


Once the Fabric Discovery is done, you need to enter the mgmt tenant, and within the Node Management Addresses create the Static Entries for all your nodes. In our case, we have 3 nodes: Spine (201) and 2 Leafs (101 and 102). This means that since the nodes are not consecutive, you should create 2 Static Entries, one for nodes 101-102, and the second one for the node 201. You should choose the “default” Node Management EPG for now, and you will end up with:




When we are looking at a real world ACI deployment, in the Typical Migration Scenario a client would want us to migrate 2 different environments:
- Virtual Environment, where we would need to first define all VM types and "group" them (define EPGs).
- Physical Environment.
Once we have the environments defined, we need to build the ANPs (Application Network Profiles), where we will group all the EPGs that need to inter-communicate.
Once we have done the initial design, we need to make a list of all the tasks ahead and start building up the Tenants. Be sure you understand what the Infra and Common tenants are before you start planning the configuration. Configuration objects in the Common tenant are shared with all other tenants (things that affect the entire fabric):
- Private Networks (Context or VRF)
- Bridge Domains
- Subnets

1. Physical Connectivity/Fabric Policies
The communication with the outside world (external physical network) starts with a simple question: who from the outside world needs to access the "Service" (ANP in the ACI "language")? Once we have this answered, we need to define an EPG with these users. Let's say the financial department needs to access the ANP, which is a Salary Application. We will create an EPG called "Financial_EPG", which might be an External L2 EPG where we group all the users from Finance. This EPG will access the Financial Application Web Server, so the Financial_Web_EPG will need to PROVIDE a CONTRACT that the Financial_EPG consumes, allowing the access.
Domains are used to interconnect the Fabric configuration with the Policy configuration. Different domain types are created depending on how a device is connected to the leaf switch. There are four different domain types:
- Physical domains, for physical servers (no hypervisor).
- External bridged domains, for a connection to L2 Switch via dot1q trunk.
- External routed domains, for a connection to a Router/WAN Router.
- VMM domains, which are used for Hypervisor integration. 1 VMM domain per 1 vCenter Data Center.
The ACI fabric provides multiple attachment points that connect through leaf ports to various external entities such as baremetal servers, hypervisors, Layer 2 switches (for example, the Cisco UCS fabric interconnect), and Layer 3 routers (for example Cisco Nexus 7000 Series switches). These attachment points can be physical ports, port channels, or a virtual port channel (vPC) on the leaf switches.
VLANs are instantiated on leaf switches based on AEP configuration. An attachable entity profile (AEP) represents a group of external entities with similar infrastructure policy requirements. The fabric knows where the various devices in the domain live and the APIC can push the VLANs and policy where it needs to be. AEPs are configured under global policies. The infrastructure policies consist of physical interface policies, for example, Cisco Discovery Protocol (CDP), Link Layer Discovery Protocol (LLDP), maximum transmission unit (MTU), and Link Aggregation Control Protocol (LACP). A VM Management (VMM) domain automatically derives the physical interfaces policies from the interface policy groups that are associated with an AEP.
VLAN pools contain the VLANs used by the EPGs the domain will be tied to. A domain is associated to a single VLAN pool. VXLAN and multicast address pools are also configurable. VLANs are instantiated on leaf switches based on AEP configuration. Forwarding decisions are still based on contracts and the policy model, not subnets and VLANs. Different overlapping VLAN pools must not be associated with the same attachable access entity profile (AAEP).

The two types of VLAN-based pools are as follows:

  • Dynamic pools - Managed internally by the APIC to allocate VLANs for endpoint groups (EPGs). A VMware vCenter domain can associate only to a dynamic pool. This is the pool type that is required for VMM integration.
  • Static pools - The EPG has a relation to the domain, and the domain has a relation to the pool. The pool contains a range of encapsulated VLANs and VXLANs. For static EPG deployment, the user defines the interface and the encapsulation. The encapsulation must be within the range of a pool that is associated with a domain with which the EPG is associated.

An AEP provisions the VLAN pool (and associated VLANs) on the leaf. The VLANs are not actually enabled on the port, and no traffic flows unless an EPG is deployed on the port. Without VLAN pool deployment using an AEP, a VLAN is not enabled on the leaf port even if an EPG is provisioned. The infrastructure VLAN is required for AVS communication to the fabric using the OpFlex control channel.
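The dependency chain described above (VLAN pool -> domain -> AEP -> port, plus an EPG deployment before any traffic flows) can be captured in a toy membership check. This is a conceptual Python sketch with invented names, not a representation of real APIC object classes:

# Toy model of ACI provisioning logic: a VLAN carries traffic on a port
# only if (a) the AEP on that port provisions a domain whose pool
# contains the VLAN, and (b) an EPG is deployed there with that VLAN.
vlan_pools   = {"pool_A": range(500, 600)}
domains      = {"phys_dom": "pool_A"}          # domain -> its single pool
aeps         = {"aep_1": ["phys_dom"]}         # AEP -> provisioned domains
port_aep     = {("leaf101", "e1/21"): "aep_1"}
epg_bindings = {("leaf101", "e1/21"): [("Financial_EPG", 502)]}

def traffic_flows(leaf, port, vlan):
    aep = port_aep.get((leaf, port))
    if aep is None:
        return False
    provisioned = any(vlan in vlan_pools[domains[d]] for d in aeps[aep])
    deployed = any(v == vlan for _, v in epg_bindings.get((leaf, port), []))
    return provisioned and deployed    # the AEP alone does not enable the VLAN

print(traffic_flows("leaf101", "e1/21", 502))  # True
print(traffic_flows("leaf101", "e1/21", 501))  # False: no EPG deployed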

Now that this is all clear, we can configure, for example, a Virtual Port Channel between our Leaf Switches and an external Nexus Switch. In our case, we are using a Nexus 5548 (NX-OS 5.2). Physical connectivity to ACI will generally be handled using the Access Policies. There is a somewhat non-intuitive procedure that needs to be followed here, so let's go through it together:

1.1 Create the Interface Policies you need.
You only need to create Interface Policies if you need a policy on the interface that differs from the default policy. For example, the default LLDP state is ENABLED, so if you want to enable LLDP - just use the default policy. In this case you will most probably need only the Port-Channel Policy, because the default Port-Channel policy uses the "ON" mode (static port-channel), while here we want LACP to match the Nexus side.

1.2 Create the Switch Policy.
This is the step where you will have to choose the Physical Leaf Switches where you need to apply your Policy. In our case we will choose the both Leaf Switches (101 and 102). This is done under Switch Policies -> Policies -> Virtual Port Channel Default.

1.3 Create the Interface Policy Group.
In this step you will need to create the Group that gathers the Interface Policies you want to use on the vPC. This means that we need to create a vPC Interface Policy Group and attach to it the Interface Policies defined in step 1.1 (in our case, the LACP Port-Channel policy).

1.4 Create the Interface Profile.
This is the step that will let you specify on which ports the vPC will be configured. In our case we want to choose the interface e1/3 of each Leaf.

1.5 Create the Switch Profile.
Switch Profile lets you choose the exact Leaf Switches you want the policy applied on, and select the previously configured Interface Profile to specify the vPC Interfaces on each of those leaf switches.
Check if everything is in order:

Nexus# show port-channel summary

3     Po3(SU)     Eth      LACP      Eth1/17(P)   Eth1/18(P)

Leaf1# show vpc ext
Legend:
(*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                     : 10
Peer status                       : peer adjacency formed ok
vPC keep-alive status             : Disabled
Configuration consistency status  : success
Per-vlan consistency status       : success
Type-2 consistency status         : success
vPC role                          : primary
Number of vPCs configured         : 1
Peer Gateway                      : Disabled
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Enabled
Auto-recovery status              : Enabled (timeout = 240 seconds)
Operational Layer3 Peer           : Disabled

vPC Peer-link status
---------------------------------------------------------------------
id   Port   Status Active vlans
--   ----   ------ --------------------------------------------------
1           up     -

vPC status
---------------------------------------------------------------------------------
id   Port   Status Consistency Reason               Active vlans Bndl Grp Name
--   ----   ------ ----------- ------               ------------ ----------------
1    Po1    up     success     success              -            vPC_101_102


IMPORTANT: ID and port-channel number (Po#) are automatically created and will vary. Notice no active VLANs. They will appear once you have created and associated an AEP.
Multicast is also allowed in the ACI Fabric: the MCAST trees are built, and in the case of failure there is FRR (Fast Re-Route). The ACI Fabric knows the MCAST tree and delivers the MCAST frames exactly on the ports of the Leaf Switches where they are supposed to go. This might be a bit confusing when you consider that ACI actually STRIPS the encapsulation to save bandwidth when the frame gets to the Leaf port (this applies to all external encapsulations: dot1q, VXLAN, NVGRE…), and adds it back when the "exit" Leaf needs to forward the frame to the external network.

2. Tenant(s) and 3. VRF are concepts that I think are clear enough even from the official Cisco documentation, so I won't go too deep into them.

4. Bridge Domains and EPGs
Once you create the Bridge Domain, you need to define the Subnets that will reside within the Bridge Domain. These Subnets are used as the Default Gateway within the ACI Fabric, and the Default Gateway of the Subnet is equivalent to the SVI on a Switch.
In our case we created a Bridge Domain called "ACI_Local_BD" and decided to interconnect 2 physical PCs on different subnets, to see whether they can ping each other if we put them in the same EPG. In order to do this we created the following Subnets within the Bridge Domain:

  • 172.2.1.0/24 with the GW 172.2.1.1 (configured as a Private IP on ACI Fabric, and as the Principal GW within the Bridge Domain)
  • 172.2.2.0/24 with the GW 172.2.2.1 (configured as a Private IP on ACI Fabric)




Once we have the BD and the Subnets created, we need to define the EPG(s). Since in our case we are dealing with physical servers, we know exactly which physical port of each Leaf they are plugged into. This means that the easiest way to assign the physical servers to the EPG is to define the Static Bindings.

IMPORTANT: If you use the Static Bindings (Leafs), all the ports within the Leaf you configure will statically belong to the EPG.

In our case we configured the ports e1/21 and e1/22 of Leaf2, and the port e1/21 of Leaf1, as shown on the screenshot below.

TIP: At one point you will need to manually define the encapsulation of the traffic coming from this Node within the ACI Fabric. This is not the access VLAN number on the Leaf port - that VLAN is assigned locally by the Leaf. This is a VLAN that needs to come from the VLAN Pool you defined for the Physical Domain.



Now comes the “cool” part (at least for the Networking guys). We will check what is happening with the VLANs on the Leaf Switches.

Leaf2# show vlan extended

 VLAN Name                             Status    Ports
 ---- -------------------------------- --------- -------------------------------
 7    infra:default                    active    Eth1/1, Eth1/5
 8    Connectivity_Tests:ACI_Local_BD  active    Eth1/21, Eth1/22
 9    Connectivity_Tests:Logicalis_Int active    Eth1/22
      ernal:Portatiles_Logicalis
 10   Connectivity_Tests:Logicalis_Int active    Eth1/21
      ernal:Portatiles_Logicalis

 VLAN Type  Vlan-mode  Encap
 ---- ----- ---------- -------------------------------
 7    enet  CE         vxlan-16777209, vlan-4093
 8    enet  CE         vxlan-16121790
 9    enet  CE         vlan-502
 10   enet  CE         vlan-501



Leaf1# show vlan ext

 VLAN Name                             Status    Ports
 ---- -------------------------------- --------- -------------------------------
 7    infra:default                    active    Eth1/1, Eth1/5
 10   Connectivity_Tests:ACI_Local_BD  active    Eth1/21
 11   Connectivity_Tests:Logicalis_Int active    Eth1/21
      ernal:Portatiles_Logicalis

 VLAN Type  Vlan-mode  Encap
 ---- ----- ---------- -------------------------------
 7    enet  CE         vxlan-16777209, vlan-4093
 10   enet  CE         vxlan-16121790
 11   enet  CE         vlan-502         
                                                                                                                                                 
First of all, have in mind that the VLANs have only local significance on the switch; they are NOT propagated within the ACI Fabric. Notice the following VLANs in the previous output:
-        VLAN 7: the default infra VLAN. This VLAN number has no importance at all. The important part of the output is the "Encap" column, where VXLAN 16777209 and VLAN 4093 (the actual default infrastructure VLAN) appear. These two entities carry the traffic between the Spines and the Leafs.
-        VLANs 8, 9, 10 and 11 are likewise only locally significant to the Leafs. This means that on the Leaf ports there is effectively a "switchport access vlan 8" command configured. The important parts are VLANs 501 and 502, which carry the traffic within the ACI Fabric.
If you look at how the local Leaf VLANs are named, you will recognize the following structure: Tenant -> ANP -> EPG. ACI does this to give you a better preview of what these local VLANs are for.

5. ANP and 6. Contracts will not be explained at this moment.

7. Virtual Machine Manager Integration
Virtual Machine Manager Domain or VMM Domain - Groups VM controllers with similar networking policy requirements. For example, the VM controllers can share VLAN or Virtual Extensible Local Area Network (VXLAN) space and application endpoint groups (EPGs).
The APIC communicates with the controller to publish network configurations such as port groups that are then applied to the virtual workloads.
Note: A single VMM domain can contain multiple instances of VM controllers, but they must be from the same vendor (for example, from VMware or from Microsoft).
The objective here is to create a VMM Domain. Upon creating the VMM domain, the APIC will populate the datacenter object in vCenter with a virtual distributed switch (VDS). You need to create a VLAN pool to be associated with the VMM domain. Have in mind that the VLAN Pools configuration is global to the ACI Fabric, because the VLANs apply to physical Leaf Switches; they are configured in the Fabric -> Access Policies -> Pools menu.
Apart from this, you will need to actually create the VMM Domain (VM Networking Menu), and define the Hypervisor IP and credentials and associate the previously created AEP to your VMM Domain. Once you have the VMM Domain created and all the hosts in the new VDS, you need to associate your EPGs with the VMM Domain, in order to add the Endpoints from the Hypervisor to the EPG.

TIP: Don't forget that you need to add the ESXi hosts to your newly created VDS manually, from vSphere.

8. RBAC - works exactly the same as RBAC (Role-Based Access Control) on any other Cisco platform.

9. Layer 2 and 3 External Connectivity
L2 Bridge: Packet forwarding between EP in bridge domain “BD1” and external hosts in VLAN 500 is a L2 bridge.

IMPORTANT: We need one external EPG for each L2 external connection (VLAN).
Trunking multiple VLANs over the same link requires multiple L2 External EPGs, each in a unique BD. A contract is required between the L2 external EPG and the EPG inside the ACI fabric.

10. Layer 4 to Layer 7 Services/Devices [Service Function Insertion]
There are 3 major steps we need to perform in order to integrate an external L4-7 Service with ACI:
  •        Import the Device package to ACI.
  •        Create the logical devices
  •        Create the concrete devices

The APIC uses northbound APIs for configuring the network and services. You use these APIs to create, delete, and modify a configuration using managed objects. When a service function is inserted in the service graph between applications, traffic from these applications is classified by the APIC and identified using a tag in the overlay network. Service functions use the tag to apply policies to the traffic. For the ASA integration with the APIC, the service function forwards traffic using either routed or transparent firewall operation.
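For reference, the APIC's northbound REST API is reachable over HTTPS: authentication goes through the aaaLogin endpoint, after which managed objects can be read or written as JSON. A minimal Python sketch using the requests library; the hostname and credentials are placeholders:

import requests

APIC = "https://apic.example.com"   # placeholder hostname
session = requests.Session()

# Authenticate: the APIC returns a session token as a cookie.
login = {"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}}
session.post(f"{APIC}/api/aaaLogin.json", json=login, verify=False)

# Read all tenant managed objects (class fvTenant).
resp = session.get(f"{APIC}/api/class/fvTenant.json")
for mo in resp.json()["imdata"]:
    print(mo["fvTenant"]["attributes"]["name"])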
