
December 15, 2021

Day 15 - Introduction to the PagerDuty API

By: Mandi Walls (@lnxchk)
Edited by: Joe Block (@curiousbiped)

Keeping track of all the data generated by a distributed ecosystem is a daunting task. When something goes wrong, or a service isn’t behaving properly, tracking down the culprit and getting the right folks enabled to fix it is also challenging. PagerDuty can help you with these challenges.

The PagerDuty platform integrates with over 600 other components to gather data, add context, and process automation. Under the hood of all of these integrations is the PagerDuty API, ready to help you programmatically interact with your PagerDuty account.

What’s Exposed Via the API

The PagerDuty API provides access to all the structural objects in your PagerDuty account - users, teams, services, escalation policies, etc - and also to the data objects including incidents, events, and change events.

For objects like users, teams, escalation policies, schedules, and services, you may find using the PagerDuty Terraform Provider will help you maintain the state of your account more efficiently without using the API directly.

The other object types in PagerDuty are more useful when we can send them anytime from anywhere, including via the API from our own code. Let’s take a look at three of them: incidents, events, and change events. If you’d like a copy of the code for these examples, you can find them on Github.

API Basics

To write new information into PagerDuty via the API, you'll need some authorization. You can use OAuth or create an API key. There are account-level and user-level API keys available. You'll use account-level keys for the rest of the examples here to keep things simple.

To create a key in your PagerDuty account, you'll need Admin, Global Admin, or Account Owner access. The PagerDuty documentation has more detail on these roles.

In PagerDuty, navigate to Integrations and then choose API Access Keys. Create a new key, give it a description, and save it somewhere safe. The keys are strings that look like y_NbAkKc66ryYTWUXYEu.

Now you’re ready to generate some incidents! These examples use curl, but there are a number of client libraries for the API as well.

Incidents

Incidents are probably what you’re most familiar with in PagerDuty - they represent a problem or issue that needs to be addressed and resolved. Sometimes this includes alerting a human responder. Many of the integrations in the PagerDuty ecosystem generate incidents from other systems and services to send to PagerDuty.

In PagerDuty, incidents are assigned explicitly to services in your account, so an incoming incident will register with only that service. If your database has too many long-running queries, you want an incident to be assigned to the PagerDuty service representing that database so responders have all the correct context to fix the issue.

If you have a service that doesn’t have an integration out of the box, you can still get information from that service into PagerDuty via the API, and you don’t need anything special to do it. You can send an incident to the API via a curl request to the https://api.pagerduty.com/incidents endpoint.

These requests require three headers: Accept, Content-Type, and From. The From header must be an email address associated with your account and is used for attribution of the incident. Setting up the request will look something like this:


curl -X POST --header 'Content-Type: application/json' \
--url https://api.pagerduty.com/incidents \
--header 'Accept: application/vnd.pagerduty+json;version=2' \
--header 'Authorization: Token token=y_NbAkKc66ryYTWUXYEu' \
--header 'From: [email protected]' \

Now you need the information bits of the incident. These will be passed as --data in the curl request. There are just a few required pieces to set up the format and a number of optional pieces that help add context to the incident.

The most important piece you'll need is the service ID. Every object in the PagerDuty platform has a unique identifier. You can find the ID of a service in its URL in the UI. It will be something like https://myaccount.pagerduty.com/service-directory/SERVICEID.
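
If you'd rather not dig through the UI, you can also look the ID up via the API itself. Here's a hedged sketch using the List Services endpoint; the query parameter filters by name, and the service name "My Database" is just a placeholder for whatever you're searching for:

curl --request GET \
  --url 'https://api.pagerduty.com/services?query=My%20Database' \
  --header 'Accept: application/vnd.pagerduty+json;version=2' \
  --header 'Authorization: Token token=y_NbAkKc66ryYTWUXYEu' \
  --header 'Content-Type: application/json'

The response includes each matching service's "id" field, which is the value you'll use below.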

Now you can create the rest of the message with JSON:


curl -X POST --header 'Content-Type: application/json' \
--url https://api.pagerduty.com/incidents \
--header 'Accept: application/vnd.pagerduty+json;version=2' \
--header 'Authorization: Token token=y_NbAkKc66ryYTWUXYEu' \
--header 'From: [email protected]' \
--data '{
  "incident": {
    "type": "incident",
    "title": "Too many blocked requests",
    "service": {
      "id": "PWIXJZS",
      "summary": null,
      "type": "service_reference",
      "self": null,
      "html_url": null
    },
    "body": {
      "type": "incident_body",
      "details": "The service queue is full. Requests are no longer being fulfilled."
    }
  }
}'

When you run this curl command, it will generate a new incident on the service PWIXJZS with the title "Too many blocked requests", along with some context in the "body" of the data to help your responders. You can add diagnostics or other information here to help your team fix whatever is wrong.
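
If you want to confirm the incident landed on the right service, one option is to list that service's open incidents and look for the new one. A sketch reusing the same token and service ID; the service_ids[] and statuses[] filters are URL-encoded here:

curl --request GET \
  --url 'https://api.pagerduty.com/incidents?service_ids%5B%5D=PWIXJZS&statuses%5B%5D=triggered' \
  --header 'Accept: application/vnd.pagerduty+json;version=2' \
  --header 'Authorization: Token token=y_NbAkKc66ryYTWUXYEu' \
  --header 'Content-Type: application/json'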

What if there is information being generated that might not need an immediate response? Instead of an incident, you can create an event.

Events

Events are non-alerting items sent to PagerDuty. They can be processed via Event Rules to help create context on incidents or provide information about the behavior of your services. They utilize the PagerDuty Common Event Format to make processing and collating more effective.

Events are registered to a particular routing_key via an integration on a particular service in your PagerDuty account. In your PagerDuty account, select a service you'd like to send events to, or create a new one to practice with. On the page for that service, select the Integrations tab and Add an Integration. For this integration, select "Events API V2" and click Add. You'll have a new integration on your service page. Click the gear icon, and copy the Integration Key. For the full walkthrough of this setup, see the docs.

The next step is to set up the event. The request is a little different from the incident request - the url is different, the From: header is not required, and the authorization is completely handled in the routing_key instead of using an API token.

The content of the request is more structured, based on the Common Event Format, so that you can create event rules and take actions if necessary based on what the events contain.



curl --request POST \
  --url https://events.pagerduty.com/v2/enqueue \
  --header 'Content-Type: application/json' \
  --data '{
  "payload": {
    "summary": "DISK at 99% on machine prod-datapipe03.example.com",
    "timestamp": "2021-11-17T08:42:58.315+0000",
    "severity": "critical",
    "source": "prod-datapipe03.example.com",
    "component": "mysql",
    "group": "prod-datapipe",
    "class": "disk",
    "custom_details": {
      "free space": "1%",
      "ping time": "1500ms",
      "load avg": 0.75
    }
  },
  "event_action": "trigger",
  "routing_key": "e93facc04764012d7bfb002500d5d1a6"
}'

Change Events

A third type of contextual data you can send to the API is a Change Event. Change events are non-alerting, and help add context to a service. They are informational data about what's changing in your environment, and while they don't generate an incident, they can inform responders about other activities in the system that might have contributed to a running incident. Change events might come from build and deploy services, infrastructure as code, security updates, or other places that change is generated in your environment.

These events have a similar basic structure to the general events, and the setup with the routing_key is the same, as you can see in the below example. The custom_details can contain anything you want, like the build number, a link to the build report, or the list of objects that were changed during an Infrastructure as Code execution.

Change events have a time horizon. They expire after 90 days in the system, so you aren't looking at old context based on past changes.



curl --request POST \
  --url https://events.pagerduty.com/v2/change/enqueue \
  --header 'Content-Type: application/json' \
  --data '{
  "routing_key": "737ea619db564d41bd9824063e1f6b08",
  "payload": {
    "summary": "Build Success: Increase snapshot create timeout to 30 seconds",
    "timestamp": "2021-11-17T09:42:58.315+0000",
    "source": "prod-build-agent-i-0b148d1040d565540",
    "custom_details": {
      "build_state": "passed",
      "build_number": "220",
      "run_time": "1236s"
    }
  }
}'

Adding Notes

One final fun bit of functionality you can leverage in PagerDuty's API is notes. Notes are short text entries added to the timeline of an incident. In some integrations, like the one between PagerDuty and Slack, notes will be sent to any Slack channel that is configured to receive updates for an impacted service, making them helpful for responders coordinating and recording activity across different teams.

Notes are associated with a specific incident, so when you are creating a note, the url will include the incident ID. Incident IDs are similar to the other object IDs in PagerDuty in that you can find them in the URL of the incident in the UI. They are longer strings than the service ID used in the examples above.

The content of a note can be anything that might be interesting to the timeline of the incident, like commands that have been run, notifications that have been sent, or additional data and links for responders and stakeholders.


curl --request POST \
  --url https://api.pagerduty.com/incidents/{id}/notes \
  --header 'Accept: application/vnd.pagerduty+json;version=2' \
  --header 'Authorization: Token token=y_NbAkKc66ryYTWUXYEu' \
  --header 'Content-Type: application/json' \
  --header 'From: [email protected]' \
  --data '{
  "note": {
    "content": "Firefighters are on the scene."
  }
}'

Responders utilizing the UI will see notes in a widget on the incident page.
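
You can also read an incident's notes back out of the API with a GET against the same endpoint - a quick sketch, again with a placeholder incident ID:

curl --request GET \
  --url https://api.pagerduty.com/incidents/{id}/notes \
  --header 'Accept: application/vnd.pagerduty+json;version=2' \
  --header 'Authorization: Token token=y_NbAkKc66ryYTWUXYEu' \
  --header 'Content-Type: application/json'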

Next Steps

Using the API to create tooling where integrations don't yet exist, or for internally-developed services, can help your team stay on top of all the moving parts of your ecosystem when you have an incident. Learn more about the PagerDuty resources available at https://developer.pagerduty.com/. Join the PagerDuty Community to learn from other folks working in PagerDuty, ask questions, and get answers.

December 1, 2013

Day 1 - System APIs

Written By: Tim Ray (@drstrangecode)
Edited By: Adam Compton

Are We There Yet?

There has been a lot of talk about "infrastructure as code" over the past couple of years and how we can use various configuration management tools to work like developers. The idea is great but the practice leaves a lot to be desired.

We can keep the definitions for our infrastructure in source control. We can iteratively develop our infrastructure. We can even practice the ops version of TDD and use tests to drive out our infrastructure. But are we really programming?

If we think about object oriented programming from an operations perspective, the analogies are fairly easy. Say you have a fairly simple app that has a couple of web servers, a couple cache servers, and a database. We might think about a collection of web server objects, a collection of cache server objects, and a database object. Not hard to conceptualize. But what happens in practice and how does it relate to good programming principles?

First we set up our configuration management, defining the roles that our servers will perform. The web servers get apache. The cache servers get memcached. The database gets sqlite because you, madam, are evil and love to see them cry.

Things are going well. Everything is up and running. You're browsing the bofh archives, reminiscing about the glory days. Then it happens. Like every app ever written, you have to log in to the machine to do a little bit of maintenance.

Stop right here. If we remember our desire to treat things like a developer would his application: why are we messing with the internal state of the object by logging in to the system? We habitually log in to our systems to perform routine tasks but by doing this we have thrown out any similarity to programming. We decry the practice of developers making a bugfix on the production server, yet we don't think twice when we log in to restart a service.

This raises the question, "What would our systems look like if we were to treat them like developers treat their objects?"

Do It Like They Do On The IRC Channel

Good developers don't mess with the internal state of their objects. They define behaviors, interfaces, and interactions. It's the sanest way to deal with large systems.

If you need an object to do something or look a certain way, you don't poke around in its guts: you define an interface to achieve the desired behavior.

Take the simple case of managing a service. Sometimes you need to turn a service off or on without the need to change a config file. Configuration management is a poor choice for the periodic or non-normal state. I don't want to deploy a new configuration for something that's only temporary. I want to be able to "manage" the service.

If we extend that thought a little more: wouldn't it be interesting to have a program that we can deploy that models the functionality of our system? A program that lets us perform those routine tasks we're so used to doing by hand? A program that's in source control? A program that we can have tests for? A program that can handle the little bits that configuration management is weak at?

One of the easiest entry points to treating systems more like programs is to put an actual API on the system, focused on system functionality.

My API Is Happy To See You

An API provides a clean, well defined interface for systems to interact with each other.

That interface provides the perfect mechanism to ensure that system operations are performed the same way every time. Configuration management lets us standardize what our systems look like; APIs allow us to standardize how we manage our systems.

There are a lot of different kinds of APIs. One of the simplest, and the kind we'll focus on here, is a web API. They're easy to interact with on the command line with tools like curl, and when you want to start putting together some more complex behavior, you can write a command line client in your language of choice.

Let's look at what the interfaces for our system APIs might look like with our earlier simple app example:

Web Servers

/current_user_count
/start_web
/stop_web
/web_status

Cache Servers

/clear_cache
/start_cache
/stop_cache
/cache_status

Database Servers

/compact
/clear_old_stuff
/start_database
/stop_database
/database_status

These are somewhat contrived examples of things that configuration management is a poor fit for: things that cause us to log on to the systems.

If we want to get the status of our database, we don't need to log into the system. Instead, we can do a simple curl:

curl http://database.server.company.internal.dns.com/database_status

An example of more complex behavior you might find in a custom client (a rough Ruby sketch; the hostnames are placeholders):

require 'net/http'
require 'uri'

# Tiny helper: POST to a bare URL with an empty form body
def post(url)
  Net::HTTP.post_form(URI(url), {})
end

# Placeholder host lists for the example app
WEB_SERVERS     = ['http://web01.internal', 'http://web02.internal']
CACHE_SERVERS   = ['http://cache01.internal', 'http://cache02.internal']
DATABASE_SERVER = 'http://database.internal'

def weekly_downtime_maintenance
  # stop web servers
  WEB_SERVERS.each { |server| post "#{server}/stop_web" }

  # db maintenance
  post "#{DATABASE_SERVER}/clear_old_stuff"
  post "#{DATABASE_SERVER}/stop_database"
  post "#{DATABASE_SERVER}/compact"
  post "#{DATABASE_SERVER}/start_database"

  # clear cache
  CACHE_SERVERS.each { |server| post "#{server}/clear_cache" }

  # start web servers
  WEB_SERVERS.each { |server| post "#{server}/start_web" }
end

Now we have a fairly complex task broken down into steps that anyone can run with our client. It also ensures that things happen in the correct order every time, with no typos. This is a lot easier to hand off to new admins or even pass back to the developers to run.

Warning: Be careful about what you expose and how, especially if your systems are publicly accessible. Firewalls, SSL, authentication, and all the other goodies associated with secure web apps are in order here.

The DevOps Win

One of the core principles of DevOps is culture. Put another way, it's a sense of community. How can you have a sense of community with someone you don't talk to?

Interface design presents a great opportunity to collaborate with developers. It provides developers an opportunity to work with you on good programming techniques. It brings up questions like "Should we break this one API call up into smaller pieces and what's the best way to do that?".

These kinds of conversations can be very helpful in developing a sense of community. You're able to talk about something developers love in a setting that interests you. They provide a safe topic when things have gotten a little tense because of system downtime or bugs in production.

API Quick Start

You've got two options for starting out with APIs: write your own or use a tool that makes it easy.

PyJoJo

I like to start with pyJoJo. It basically turns bash scripts into API calls. Don't expect it to tie into your authentication system; that's what a custom API is for. What it's great at is easily getting an API onto your system that lets you start exploring what we've been discussing.

Word to the wise: be specific. If you write a little method that lets you install any package in the world, don't be surprised when someone uses it to install something stupid. If you want to write a method that installs the latest version of the software package the developers have been working on, consider hard-coding the script to the name of the software but making the version a parameter you can pass, as in the sketch below.
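
Here's a minimal bash sketch of that idea, independent of any particular tool's conventions. The package name "ourapp" and the use of yum are placeholders for whatever your environment actually runs:

#!/bin/bash
# Install a specific version of one known package; refuse to do anything else.
set -euo pipefail

VERSION="${1:?usage: $0 <version>}"
PACKAGE="ourapp"   # hard-coded on purpose: this endpoint installs ourapp, nothing else

yum install -y "${PACKAGE}-${VERSION}"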

Write Your Own

Eventually you're going to want things out of the API that pyjojo won't do. It's time to start writing some code. Better yet, get the developers to do it! Make a demo of what you're thinking in pyjojo and then let the developers show you "how it's really done."

Parting Thought

System APIs are more of a first step than a destination.

The real power comes when we realize what we've just done. We've put APIs on our systems. I don't know about you, but I think I know a guy or two at the office that messes with this "web stuff." Maybe one of them could make, I dunno, a dashboard or something?

Something that would tell us who did what, when, and what happened.

Something that provides an easy interface to complex functionality.

Something that has a deploy pipeline and tests.

Now this is starting to sound like programming.

December 16, 2009

Day 16 - Hudson: Build Server and More

Hudson is a job monitor. It is primarily used as a build server for doing continuous builds, testing, and other build engineering activities. Continuous build and integration is a useful tool in improving the day-to-day quality of your company's code.

My first week's impression of Hudson was that it is really great software. Things work the way I expect! It has good documentation and a good web interface. Additionally, the APIs are easy to use. Compared to other learning curves, Hudson's was extremely short, thanks to one part documentation and one part ease of use. I was building some work-related software packages in Hudson after only a few minutes of playing with the tool.

Setting up a new job in Hudson is really easy, and every field in the job configuration interface has a little question mark icon that reveals useful documentation when clicked, so it's not often you get lost. Functionally, it has the build-related features I expect: build on demand, build if there's been new commits, email folks who break builds, show reports on build histories, etc.

Getting a bit more advanced with Hudson, beyond simply building stuff or running jobs, let's talk administrative tasks. Hudson has administrative documentation detailing how to back up and restore Hudson configurations, rename jobs, and so on. Hudson's configuration is stored in XML files in a sane directory hierarchy that makes it easy to back up specific jobs or specific configurations.

Hudson also has an API. The main thing you need to know about the API is that any URL you visit in the web interface can also be accessed through the API: adding '/api/' to any URL gives you the API documentation for that URL. There's only one thing to remember when asking "what's the API for this page?" - totally awesome.

I wanted to take the latest successful build of a specific job and put the resulting files (artifacts) into my local yum repo. Artifacts are what Hudson calls the files you archive after a build is complete. Fetching the artifacts from the latest successful build of any given job is fairly straightforward from the web interface. The XML API makes this easy, allowing you to find the artifacts for your builds from scripts:

% GET http://build/hudson/job/helloworld/lastSuccessfulBuild/api/xml
<?xml version="1.0"?>
<freeStyleBuild>
  ...
  <artifact>
    <displayPath>rpmbuild/RPMS/x86_64/helloworld-1.0-1.x86_64.rpm</displayPath>
    <fileName>helloworld-1.0-1.x86_64.rpm</fileName>
    <relativePath>deploy/rpmbuild/RPMS/x86_64/helloworld-1.0-1.x86_64.rpm</relativePath>
  </artifact>
  ...
</freeStyleBuild>

Parsing XML in shell without the proper tools should make anyone a sad panda. Luckily, Hudson's XML API allows you to give an XPath query to restrict the output:

# Show me the text contents of the first artifact/relativePath
% GET 'http://build/hudson/job/helloworld/lastSuccessfulBuild/api/xml?xpath=//artifact[1]/relativePath/text()'
deploy/rpmbuild/RPMS/x86_64/helloworld-1.0-1.x86_64.rpm

That path is relative to the URL without the '/api/xml' part, so fetching the RPM becomes: http://build/hudson/job/helloworld/lastSuccessfulBuild/deploy/rpmbuild/RPMS...
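
Putting those two pieces together, a small script can ask the XML API for the artifact's relative path and then download it. A sketch using curl instead of lwp-request's GET; the job name and base URL are the same placeholders as above:

#!/bin/bash
# Download the artifact from the last successful build of a job.
set -e

BASE="http://build/hudson/job/helloworld/lastSuccessfulBuild"

# Ask the XML API (restricted with XPath) for the artifact's relative path;
# -g turns off curl's URL globbing so the [1] passes through literally
RPM_PATH=$(curl -s -g "${BASE}/api/xml?xpath=//artifact[1]/relativePath/text()")

# The path is relative to the build URL, so fetch the file from there
curl -s -O "${BASE}/${RPM_PATH}"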

Fetching the RPM was only the first step of a deployment process I was building with Hudson. At work, sometimes engineers do deployments. It is not totally productive to require them to have working knowledge of rpm, yum, mrepo, puppet, linux, ssh, and other tools that may be necessary. The deployment learning curve can be reduced to almost zero if we have a one-click deployment option that would do all the right things, in the right order. Having this would also save us from having to (poorly) maintain a wiki describing the steps required to perform an on-demand upgrade.

As shown above with the API, we can fetch the latest packages. Once that happens, the exact method of deployment is really up to you and what your infrastructure needs. My version pushed the rpm to my yum master, then replicated to the per-site mirrors, then ssh'd to each related server, upgraded the package, and restarted any service required. The benefits of this are two-fold: we retire a poorly maintained upgrade howto document, and we relieve some engineers of the burdens of being intimate with the infrastructure, so they can just worry about writing and testing code.
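
For the curious, the shape of that script was roughly the following. This is only a sketch: the hostnames, repo path, mirror sync command, and service name are all placeholders for whatever your infrastructure uses:

#!/bin/bash
# Rough shape of the deployment job (every name here is a placeholder).
set -e

RPM="helloworld-1.0-1.x86_64.rpm"

# 1. Push the freshly fetched RPM to the yum master and rebuild the repo metadata
scp "$RPM" yum-master:/srv/yum/x86_64/
ssh yum-master "createrepo /srv/yum/x86_64/"

# 2. Replicate the repo to the per-site mirrors (however your mirrors sync)
ssh yum-master "/usr/local/bin/sync-mirrors"

# 3. Upgrade the package and restart the service on each related server
for host in app01 app02 app03; do
  ssh "$host" "yum clean metadata && yum upgrade -y helloworld && /etc/init.d/helloworld restart"
done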

One small trick, however. Since Hudson has a 'build now' button, I didn't want stray clicks or accidents to trigger a new deployment. The workaround was to add a checkbox to the build that simply asked for confirmation. Without the checkbox checked, the build would fail.
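
One way to implement that check: Hudson exposes build parameters to the job as environment variables, so the script can bail out early unless the box was ticked. A sketch; the parameter name CONFIRM_DEPLOY is made up for this example:

# Fail the build unless the confirmation checkbox was checked
if [ "${CONFIRM_DEPLOY:-false}" != "true" ]; then
  echo "Deployment not confirmed; check the CONFIRM_DEPLOY box to proceed."
  exit 1
fi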

Now armed with an automated, but still human-initiated, deployment process, your deployment job can be extended to include email notifications, nagios silences, and other actions that help make your deployment more safe and reliable.

While I am not certain that Hudson is the end-game of helping coworkers do deployments to production and staging, I am comfortable with this solution given that it works well, was very easy to implement, and relieves the knowledge transfer problems mentioned above.

Even if you don't like Hudson for deployment, it's a great build server.
