Webhook Notes
In this document, I have endeavored to “dump” everything I know about webhooks.
This document has been compiled from my experience:
- working with webhook systems in content management
- proactively researching webhook systems for product evaluation and development
- building multiple webhook systems and related technologies. Examples:
What’s described in this document is a broad survey of webhook styles, architectures, and options. This is not presented as a list of requirements for any particular webhook system, but rather as a catalog of all possible options. To my knowledge, no observed system implements every option presented here.
Also, the usage of webhooks in this document is biased toward content management scenarios, simply due to the majority of my experience.
Finally, in some cases, architectural concepts have been defined. These definitions are for illustration, or represent commonly accepted norms, features, or conventions.
Introduction
A “webhook” is an informal term for an HTTP request made from an originating system in response to an internal event. The goal is to notify another system of the event and transmit data related to it.
A webhook effectively allows an originating system to notify a remote system of an event and provide it with data describing the event.
Remote procedure calls are not new (consider DCOM or gRPC as historical examples), however webhooks differ in that:
- They use HTTP, over the “public” Internet (hence the “web” moniker)
- They are configured, not coded. Webhooks are specified by users of the originating system as ad hoc methods of connecting it to other systems, unknown to the developers of the originating system.
There has never been a formal specification or even agreement on specifics, simply an informal understanding of the architecture, purpose, and usage.
There has traditionally been a website at webhooks.org which served as a wiki for informal webhook information, however it appears to be offline. Here’s the last capture of the site from the Internet Archive.
I was personally using webhooks as far back as 2003, however I called them a “pinged script” at the time. I built a plugin to Movable Type, a popular Perl-based blogging platform. In the related blog posts, I discussed the logic and reasoning behind what I was doing.
Extending Movable Type Using a Pinged Script
From that blog post (emphasis added):
There are a few cases where I want to do interesting things with entries, but I don’t want to hack into Ben’s Perl code. I solved this problem by inserting just enough code to ping a specified URL whenever an entry is saved.
The accepted term “webhook” dates to 2007 (originally “web hook”). As near as I can tell, this is the first reference to them:
Web hooks to revolutionize the web :: Jeff Lindsay
Sample Use Cases
- Notify a remote static site builder to regenerate a website when any content change occurs
- Store a serialized representation of any content in a remote archival system when it is deleted
- Submit an MP3 to a transcription service when one is published
- Clear CDN caches when content changes
- Update a remote, synchronized datastore
- Update an audit log to record a change to content
- Update a remote search index
- Log an event to an analytics platform
- Trigger a content review or Digital Quality Management process on newly approved content (it’s assumed the DQM system would have its own notification process if errors were detected)
- Add redirection URLs to a CDN when a content URL has changed
- Request a translation of newly published content
- Generate and store a preview of content when that content is saved/checked-in (viewable in some other system)
- Provide new content for AI LLM/RAG ingestion
Common Webhook Events
In content scenarios, events are usually raised in response to actions involving a stored entity, meaning a content item, a file, a folder, or a user.
- Creating a new entity (this would include copying an existing entity; the copy would be a new entity)
- Saving an existing entity (given that the UI of many systems saves in the background every few seconds, this would need to be a proactive save, sometimes referred to as a “check-in”)
- Publishing a previously saved entity (whatever that means to the specific system)
- Deletion of an existing entity
- Moving an entity (if the system is spatially oriented)
- Workflow/Approval State Change
- Order checkout
- Login
- Logout is less common, given the vagaries of session abandonment
Possible Alternatives to Webhooks
Let’s first specify the simplest requirements as –
- We want to notify a remote system of events which occur in an originating system, and pass related data
- We want these notifications to be specified by the end users of the system, not the developers, while that system is running in a production environment.
These requirements necessarily preclude anything that involves the source code of the system – therefore nothing like traditional gRPC or SOAP or REST can be used. The system is required to be “ignorant” of any specific webhook, so they can be added by users in the production environment.
Given that limitation, the only real alternative becomes some type of API polling. This could take two forms:
- A scheduled API call which catalogs all entities and diffs that catalog against the last call. This would need to detect new entities (creations) and missing entities (deletions, which would not be in the results). Edits would need to be detected using a last edit time, or perhaps a hash of the actual content.
- A “sync” API call. This would need to be provided by the originating system, but it’s a change or audit log of everything that has happened in that system since a specified point in time.
Unfortunately, polling presents several drawbacks.
- It’s inefficient. To ensure accuracy, you would effectively have to produce a catalog of every entity in the system, every time. If you could order the API results by last edit time, you might be able to cut the catalog down to the “last X items edited,” but you then run the risk of a surge of edits which exceeds the polling interval.
- There is latency, given that this method operates on a schedule
- Unless there is a sync API specifically designed for this purpose (#2, above), polling can be inaccurate. Even assuming the logic to diff the derived catalogs is valid, entities could be created and deleted between polling intervals.
- It requires action from the receiving system to poll the API, or it requires a third system if setting up a general polling system, independent of the target system.
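As a rough illustration of the first polling approach, the catalog diff might be sketched like this in Python. The entity shape (a dict with an "id" key) and all function names are assumptions made for the example, not part of any particular system.

```python
import hashlib
import json


def content_hash(entity: dict) -> str:
    """Stable hash of an entity's content, used to detect edits."""
    serialized = json.dumps(entity, sort_keys=True).encode("utf-8")
    return hashlib.sha256(serialized).hexdigest()


def snapshot(entities: list[dict]) -> dict[str, str]:
    """Catalog every entity as {id: content hash}."""
    return {e["id"]: content_hash(e) for e in entities}


def diff_catalogs(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two catalogs and report creations, deletions, and edits."""
    return {
        "created": sorted(current.keys() - previous.keys()),
        "deleted": sorted(previous.keys() - current.keys()),
        "edited": sorted(
            k for k in current.keys() & previous.keys() if current[k] != previous[k]
        ),
    }


# Two polls of a hypothetical "list all entities" API.
first_poll = snapshot([
    {"id": "a", "title": "Hello"},
    {"id": "b", "title": "World"},
])
second_poll = snapshot([
    {"id": "a", "title": "Hello, again"},  # edited since the first poll
    {"id": "c", "title": "New item"},      # created; "b" has been deleted
])

changes = diff_catalogs(first_poll, second_poll)
```

Note what this sketch cannot see: an entity that was both created and deleted between the two polls never appears in either catalog, which is the accuracy problem described above.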
Architectural Definitions
Note that some of the definitions below can be construed to describe a specific architecture. However, the intention is merely to illustrate concepts, not dictate development.
(Also, these definitions are naturally interlinked, so there’s no easy way to order them to avoid the need for look-ahead. They are, very roughly, ordered by the timing of their appearance and impact on the overall process.)
In the text following this list, defined terms will be capitalized.
- Originating System: any system that generates Events which are transmitted to the Framework. More than one Originating System can share the same Framework. Each Originating System simply needs to communicate Events via the same communication method: deposit them in the same message queue, the same communication bus, the same database table, etc.
- Event: a notification from a source system. This event is raised to the Framework through some communication protocol – insertion of a database row, event raised in a bus, etc. An Event may or may not generate a Webhook, depending on the logic of the configured Factories.
- Event Payload: the collection of data provided to the Factories. This data can be used by each Factory to (1) decide whether or not to generate a Webhook, or (2) pass the data to the Target in the Webhook Payload.
- Webhook Framework: the computational system which houses the Webhook Factories and manages the Webhooks themselves. In some cases this is built into the Originating System and is part of its immediate concerns. In other cases, the Framework is remote to the Originating System and the Event is transmitted using some type of communication bus, perhaps connected to multiple Originating Systems.
- Webhook Factory: the logic and configuration that decides which Events to act on, the Webhook Target, and how the Webhook will be formed. Called a “factory” because it generates/manufactures Webhooks. This is the primary interface for users: users will configure multiple Webhook Factories (though, to them, the Factories are often/usually just called “Webhooks”). When an Event is raised, every configured Factory is given the option to generate a Webhook based on its data and Payload. Factories are sometimes called “Triggers.”
- Webhook Factory Template: a set of preconfigured settings that a user can copy to create their own Factory. These often represent the configuration needed for common actions on well-known services (ex: “Trigger a Build on Netlify”)
- Webhook: a specific intention to contact a Target. When created, a Webhook is “unresolved.” An individual Webhook has a lifespan from when it’s generated by a Factory until it resolves (it either gets an acceptable response via the Attempt Processor, or it fails a maximum number of times and is abandoned). While “alive” (unresolved), the Webhook effectively becomes a factory that generates Attempts and places them in the Queue.
- Webhook Target: the URL and parameters (method, querystring arguments, HTTP headers, Basic Auth, etc.) to which the Webhook is directed
- Target System: the system that receives the Attempt and acts on it
- Webhook Attempt: a single attempt to reach the Target System. An Attempt is separate from a Webhook because if an Attempt fails, many systems have logic to retry. Thus, a single Webhook might generate more than one Attempt.
- Webhook Payload: the serialized content POSTed with the Attempt to the Target (for a GET request there will be no Webhook Payload)
- Attempt Queue: a “holding pen” for Attempts. A Factory generates a Webhook, which in turn generates Attempts which it places in the Queue.
- Attempt Processor(s): the computational agent that works through the Queue, actually sending Attempts, logging the results, and notifying Webhooks of the response. In high-volume scenarios, there might be multiple Processors working Attempts from the same Queue.
- Webhook Resolution: a state/event in which a Webhook is considered “resolved” and requires no more action. A Resolution doesn’t necessarily mean a Webhook succeeded: multiple Attempt failures and abandonment are a valid Resolution. Once resolved, the Webhook is no longer “active”; it is deleted or archived, and might exist only as Log entries of Attempts.
- Attempt Log: a record of every Webhook generated, along with its corresponding Attempts (although some systems only record the Events, if the Payload contains enough data to define the Webhook)
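To make the relationships between these terms a little more concrete, here is one way they might be modeled. This is only a sketch: the class and field names are my own invention for illustration, and a real Framework would carry far more configuration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    entity_type: str          # ex: "News"
    action: str               # ex: "Published"
    payload: dict             # the Event Payload


@dataclass
class Target:
    url: str
    method: str = "GET"
    headers: dict = field(default_factory=dict)


@dataclass
class Attempt:
    webhook_id: int
    number: int                          # 1 for the first try, 2 for the first retry
    status_code: Optional[int] = None    # None until a Processor records a response


@dataclass
class Webhook:
    id: int
    target: Target
    payload: Optional[dict] = None
    attempts: list = field(default_factory=list)
    resolved: bool = False

    def next_attempt(self) -> Attempt:
        """While unresolved, a Webhook acts as a factory for Attempts."""
        attempt = Attempt(webhook_id=self.id, number=len(self.attempts) + 1)
        self.attempts.append(attempt)
        return attempt


@dataclass
class Factory:
    entity_type: str
    action: str
    target: Target

    def evaluate(self, event: Event, webhook_id: int) -> Optional[Webhook]:
        """Return a Webhook if the Event matches this Factory, else None."""
        if (event.entity_type, event.action) != (self.entity_type, self.action):
            return None
        return Webhook(id=webhook_id, target=self.target)


factory = Factory("News", "Published", Target("https://example.com/notify"))
hook = factory.evaluate(Event("News", "Published", {"id": 123}), webhook_id=1)
ignored = factory.evaluate(Event("News", "Deleted", {"id": 123}), webhook_id=2)
first = hook.next_attempt()
```

The point of separating Webhook from Attempt is visible in `next_attempt`: one Webhook can produce several Attempts, each numbered and logged on its own.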
Example Flow
- Jane publishes a news article in the Originating System. That system raises an Event that News Article #123 was published by Jane. The Event Payload includes a JSON-serialized representation of the article itself.
- The Framework detects the Event, and retrieves three Factories configured for Jane’s organization. The Event parameters are submitted to each Factory. One factory is configured thusly:
- Item Type: News
- Action: Published
- Target: GET https://other-system.com/api/notify-new-content?id=${item.id}
- Retries: 5x, 15s delay, 30s timeout
- In this case, that factory generates a Webhook:
- Target: GET https://other-system.com/api/notify-new-content?id=123
- Payload: none
- At this point, the Event is no longer needed and ceases to exist
- The created Webhook immediately generates an Attempt which is placed in Queue
- A Processor which is monitoring the Queue, retrieves the Attempt, makes a request to the Target System, receives no response in the 30s configured timeout, and closes the connection. The Processor enters a record of this Attempt in the Log, and notifies the Webhook of the (lack of) response.
- The Webhook places another Attempt in Queue, scheduled for 15 seconds in the future (alternately, the Webhook sleeps for 15s, then awakens and places the Attempt in Queue)
- The Processor retrieves the new Attempt, makes a request to the Target, and receives a “200 OK” response. The Processor places a record of this Attempt in the Log, and notifies the Webhook of the response.
- The Webhook resolves and ceases to exist. The only record of it is the set of Attempt records in the Log.
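The retry portion of this flow can be simulated without any network access. In the sketch below, `send` is a stand-in for the real HTTP request and the clock is simulated, so nothing actually sleeps; the function name and the log record shape are assumptions for illustration only.

```python
from collections import deque


def run_webhook(send, max_attempts=5, retry_delay=15, timeout=30):
    """Work one Webhook to Resolution.

    `send` takes the Attempt number and returns a status code, or None
    when the request timed out.
    """
    queue = deque([{"number": 1, "not_before": 0}])   # the Attempt Queue
    log = []                                          # the Attempt Log
    clock = 0                                         # simulated seconds

    while queue:
        attempt = queue.popleft()
        clock = max(clock, attempt["not_before"])
        status = send(attempt["number"])
        if status is None:
            clock += timeout   # the Processor waited out the full timeout
        log.append({"number": attempt["number"], "finished_at": clock, "status": status})

        if status is not None and 200 <= status < 300:
            return "succeeded", log
        if attempt["number"] >= max_attempts:
            return "abandoned", log
        # The Webhook reacts to the failure by queueing another Attempt.
        queue.append({"number": attempt["number"] + 1, "not_before": clock + retry_delay})

    # Not reached: every iteration either returns or queues another Attempt.
    return "abandoned", log


# First Attempt times out, second receives "200 OK", as in the flow above.
responses = {1: None, 2: 200}
outcome, attempt_log = run_webhook(lambda number: responses[number])
```

Either outcome is a Resolution. If every Attempt had failed, the function would return "abandoned" after the fifth one, and the Log would be the only trace left.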
Webhook Factory Configuration
A Factory is a set of configuration parameters or “rules” that execute for every Event. As a result of these rules, the Factory may or may not generate a Webhook.
Trigger Configuration Options
The most common configuration options to trigger Webhook generation:
- Content Type of the Event (ex: “News” or “User”)
- Action of the Event (ex: “Published” or “Deleted”)
- Location of the event. In systems with a spatial architecture (a content tree, for example), you might specify that the Factory is only concerned with content in a particular “branch” or “section”
- Logical Filters. Simple JSON comparison logic can be performed against the event payload (ex: “$.item.en_US.title startsWith 'foo'” or “/user/id/username equals 'jsmith'”). Some systems have domain-specific languages, or allow server-side JavaScript evaluation. If more than one filter can be specified, then it’s common to also specify boolean logic, such as requiring “Match One” or “Match All”.
- Conceivably, both old and new versions of existing content could be provided to detect specific changes (ex: to detect URL changes, “$.item.url <> $.old.item.url”)
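Filter evaluation with “Match One” / “Match All” logic might look something like the following. This is a simplified sketch: it uses plain dotted paths rather than a real JSONPath or JSON Pointer implementation, and the operator names are just the two from the examples above.

```python
def lookup(payload: dict, path: str):
    """Resolve a dotted path such as "item.en_US.title"; None if a key is missing."""
    current = payload
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


OPERATORS = {
    "equals": lambda actual, expected: actual == expected,
    "startsWith": lambda actual, expected: isinstance(actual, str)
    and actual.startswith(expected),
}


def matches(payload: dict, filters: list, mode: str = "all") -> bool:
    """Evaluate (path, operator, expected) filters with Match All or Match One logic."""
    results = [
        OPERATORS[operator](lookup(payload, path), expected)
        for path, operator, expected in filters
    ]
    return all(results) if mode == "all" else any(results)


event_payload = {
    "item": {"en_US": {"title": "foo bar"}},
    "user": {"id": {"username": "jsmith"}},
}
filters = [
    ("item.en_US.title", "startsWith", "foo"),   # passes
    ("user.id.username", "equals", "asmith"),    # fails
]

match_all = matches(event_payload, filters, mode="all")
match_one = matches(event_payload, filters, mode="one")
```

With one passing and one failing filter, “Match All” rejects the Event and “Match One” accepts it.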
Target Configuration Options
If a Factory generates a Webhook, the following options are commonly offered to specify the Target:
- URL, including query parameters: the URL is often processed through a simple templating engine to allow dynamic tokens from data provided by the Event (ex: “?id=${item.id}”)
- Method: GET, POST, PUT, etc.
- HTTP Headers: zero or multiple
- Basic Auth Parameters: username and password
- Content Type: technically this can be specified as an arbitrary HTTP header (above), but it’s commonly broken out as its own option
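The URL templating mentioned above can be quite small. Here is a minimal sketch of `${...}` token replacement, assuming tokens are dotted paths into the Event Payload; the function name and the choice to URL-encode every value are my own assumptions.

```python
import re
from urllib.parse import quote

# Matches tokens such as ${item.id}
TOKEN = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def render_url(template: str, event_payload: dict) -> str:
    """Replace ${dotted.path} tokens with URL-encoded values from the Event Payload."""

    def replace(match: re.Match) -> str:
        value = event_payload
        for key in match.group(1).split("."):
            value = value[key]   # an unknown token raises KeyError, deliberately
        return quote(str(value), safe="")

    return TOKEN.sub(replace, template)


payload = {"item": {"id": 123, "title": "Hello World"}}
url = render_url(
    "https://other-system.com/api/notify-new-content?id=${item.id}",
    payload,
)
```

Failing loudly on an unknown token is a design choice for the sketch. A real system might prefer to substitute an empty string, or refuse to save the Factory in the first place.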
Payload Configuration Options
In the event of a POST request, a Payload can be specified which forms the body of the request.
- Automatic Payload where the Event Payload is simply passed-through
- Manual Payload can be specified, often processed through a simple templating system to allow dynamic data transformation from the Event Payload to another JSON document. Some systems will allow Payload generation through some domain-specific language or server-side JavaScript execution.
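The difference between the two Payload styles can be sketched as follows. The “template” here is a deliberately simple mapping of output keys to dotted paths in the Event Payload, which is an assumption for illustration; real systems often use a fuller templating language.

```python
import json


def build_payload(event_payload: dict, template=None) -> str:
    """Return the request body for a POST Attempt.

    With no template, the Event Payload passes through unchanged (automatic).
    With a template, each output key is filled from a dotted path (manual).
    """
    if template is None:
        return json.dumps(event_payload)

    def pick(path: str):
        value = event_payload
        for key in path.split("."):
            value = value[key]
        return value

    return json.dumps({out_key: pick(path) for out_key, path in template.items()})


event_payload = {"item": {"id": 123, "title": "Hello"}, "user": {"name": "jane"}}

automatic = build_payload(event_payload)
manual = build_payload(event_payload, {"article_id": "item.id", "author": "user.name"})
```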
Default / Automatic Headers
Most systems will add custom HTTP headers by default to transmit meta-information about the Factory or Webhook that generated the Attempt (ex: “X-Webhook-ID”):
- Timestamp of Attempt creation
- Entity ID
- Entity Version Number/ID
- Factory ID
- Webhook ID (this can be used to detect duplicate/repeat Attempts)
- Attempt ID
- Action Name
- Triggering User ID
- Some unique hash formed of multiple of the above values
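As an illustration, automatic headers might be assembled like this. Apart from “X-Webhook-ID,” every header name below is invented for the sketch, as is the choice of which values feed the hash.

```python
import hashlib
from datetime import datetime, timezone


def default_headers(factory_id: str, webhook_id: str, attempt_id: str,
                    entity_id: str, action: str, created_at: datetime) -> dict:
    """Build metadata headers for one Attempt."""
    headers = {
        "X-Webhook-Timestamp": created_at.isoformat(),
        "X-Webhook-Entity-ID": entity_id,
        "X-Webhook-Factory-ID": factory_id,
        "X-Webhook-ID": webhook_id,
        "X-Webhook-Attempt-ID": attempt_id,
        "X-Webhook-Action": action,
    }
    # Hash only the values that stay constant across retries, so the Target
    # System can recognize repeat Attempts for the same Webhook.
    stable = "|".join([factory_id, webhook_id, entity_id, action])
    headers["X-Webhook-Digest"] = hashlib.sha256(stable.encode("utf-8")).hexdigest()
    return headers


created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
first = default_headers("f1", "w1", "a1", "123", "Published", created)
retry = default_headers("f1", "w1", "a2", "123", "Published", created)
```

Here the retry carries a different Attempt ID but the same Webhook ID and digest, which is what lets a Target System detect and ignore duplicates.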
Attempt/Resolution Configuration Options
If an Attempt receives a 2XX response, the Webhook resolves. In the event of any other response, retry options can usually be specified.
- Timeout. The amount of time to wait for a response to an Attempt before abandoning the HTTP request.
- Retry Delay. The amount of time to wait from a failed Attempt before generating or retrieving a new Attempt, in the event the Target System is temporarily under load. In some systems, this delay will increase with every failed Attempt:
- Linear: the delay is the configured delay multiplied by the iteration (ex: 15s, 30s, 45s, etc.)
- Exponential: the configured delay is used for the first retry, and then doubles every subsequent retry (ex: 15s, 30s, 60s, 120s, etc.)
- Abandonment. The Webhook can Resolve without a successful response under specific conditions:
- Status Code: Upon receiving a specific status code, the Webhook resolves
- Number of Attempts. If a specified number of Attempts have failed the Webhook resolves
- Timespan. In the absence of a specific number of attempts, a Processor might auto-generate retry Attempts for a specific window of time after Webhook creation (example: “retry every 60s for three (3) hours”)
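The delay strategies and abandonment rules above reduce to a few lines of arithmetic. In this sketch the function names are hypothetical, and treating “410 Gone” as a status code that ends the Webhook is purely an example of the Status Code rule, not a convention I have observed.

```python
def retry_delay(base: float, retry_number: int, strategy: str = "fixed") -> float:
    """Seconds to wait before retry `retry_number` (1 = first retry)."""
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    if strategy == "linear":
        return base * retry_number
    if strategy == "exponential":
        return base * 2 ** (retry_number - 1)
    return base  # fixed delay


def should_abandon(status_code, attempts_made: int, max_attempts: int,
                   fatal_statuses=frozenset({410})) -> bool:
    """Decide whether a failed Attempt should end the Webhook without success."""
    if status_code in fatal_statuses:
        return True
    return attempts_made >= max_attempts


linear = [retry_delay(15, n, "linear") for n in range(1, 4)]
exponential = [retry_delay(15, n, "exponential") for n in range(1, 5)]
```

With a configured delay of 15 seconds, `linear` comes out as 15, 30, 45 and `exponential` as 15, 30, 60, 120, matching the examples in the list.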
Some systems will have notification/alert systems for Webhook failures (though, in a high-volume system which experiences extended downtime from a Target, individual notifications would quickly become unmanageable).
Some systems will also disable the Factory itself if too many Attempts fail from Webhooks generated by that Factory. This is to prevent wasted work against a crippled or unreachable Target System. (Clearly, this would necessitate some type of notification system to alert that the Factory had been disabled.)
Webhook UI
The only “intentional user” of a webhook system is an administrator. For editorial users, there is no UI – the webhooks will generate and resolve in the background silently.
The webhook administrator needs a UI to accomplish the following:
- Create, update, or delete Factories
- Create a new Factory from a Factory Template
- Activate or suspend a specific Factory
- Export/import/move a Factory between instances (in multi-instance/repository systems where Webhooks are instance-specific)
- Determine if a Webhook was generated in response to an Event, as a debugging tool
- View the response to a specific Attempt
The administration UI generally consists of these displays:
- A list of all configured Factories, with the ability to (1) create a new Factory, (2) edit an existing Factory, or (3) delete an existing Factory
- As noted above, the word “Factory” is appropriate from a technical/architectural perspective. However, from the user’s perspective, Factories are often referred to simply as “Webhooks”
- Some systems have a limit to the number of Factories that can be created per account.
- Some systems will display performance metrics for each Factory (ex: number of Webhooks generated from that Factory in the last 24 hours; average Attempt response time; percentage of failed Attempts)
- The details and configuration UI for a specific Factory
- Some systems require Target URLs to be secure (“https://”)
- Some systems will allow the proactive, on-demand generation of a Webhook from this UI for testing. The user might be able to specify a fake Webhook Payload to be sent, or a fake Event Payload from which the Factory can derive a Webhook Payload.
- A log of all Attempts for Webhooks generated by a Factory
- Technically, the Factory should show the Webhooks generated, which should then show the Attempts generated by that Webhook; however, I have not observed this in any system
- The log of Attempts could be filtered by parameters present in Event Payload (ex: “Show me all attempts generated by the publication of News Article #123”)
In some systems, creating or updating a Factory will generate a “ping” request to the Target (often as an OPTIONS method) to ensure it exists and can receive connections.
Directionality, Synchronicity, and Reactivity
Most Webhooks are intended to be asynchronous and one-way, meaning the Originating System sends an Attempt to a Target System, and does not (1) block any user-detectable process while waiting for the response, nor (2) use the response in any logical processing, other than simply logging it.
(Colloquially, webhooks are “fire and forget.”)
Two-way Webhooks, in which the Originating System operates on the response data received from the Target System, are theoretically possible, but rare, for several reasons:
- Webhooks involve network I/O by definition, and blocking a process on that I/O can be problematic from a UX perspective, and for general efficiency and stability
- Attempts are user-configured, which means the internal processing of the Originating System is now dependent on an external resource not under its control. Any logical processing which depends on a response would need considerable edge case and error handling.
- Given that webhooks are configured, not coded, any response logic would have to be similarly dynamically programmed, which is beyond the capabilities of most users and systems. (Put another way, once you capture the response from an Attempt… what are you going to do with it?)
A potential two-way use case might be for data validation. Upon data form submittal, a Webhook might send the submitted data to a Target and use the response to prevent saving the data and instead return validation errors to the user. However, this stretches the definition of the term “webhook” and might be framed and documented as a “validation API” instead.
Webhooks are normally not guaranteed to arrive in any specific order, relative to the order of Events occurring in the Originating System. This necessarily depends on how the Processor is designed to manage the Queue, but given the general vagaries of multi-threaded processing and network transport, order consistency cannot be depended on.
Some systems allow “chaining” webhooks, meaning when Webhook A resolves, it can trigger Webhook B, etc. This is rare, and of limited utility in most cases. The chained Webhooks would usually be configured this way so that Webhook B can make use of data returned by Webhook A, thus making the webhook two-way.
However, as noted above, configuring this level of coupling (and the required edge and error handling) is beyond the scope and capabilities of most webhook users and would normally be accomplished by more programmatic methods. (ex: the Target for Webhook A is an API endpoint that then makes several HTTP requests from code.)
Webhook Security and Load Management
Any webhook necessarily involves two systems: Originating and Target. There are security considerations on both sides.
- Originating System
- Access to the functionality and UI to create and manage Factories and view the Log needs to be restricted to a subset of users (normally administrators, developers, and “power users”).
- In the absence of user configuration, every Factory should have a hard-coded maximum timeout that it specifies on any Attempt it generates to prevent an Attempt that hangs
- In the absence of user configuration, every Factory should have a hard-coded retry limit, that prevents Webhooks from generating unlimited Attempts
- Processor(s) should have a “throttle” limiting the Attempt frequency and/or number of Processor threads to avoid performance overload on either the Originating or Target systems (ex: “Each Processor may only make one Attempt every two (2) seconds.”)
- Target System
- The Target System has normal load/performance management concerns to avoid becoming overloaded by an Originating System that generates a large number of Attempts.
- In most cases, the Target System will want to ensure that the Attempt actually comes from the Originating System.
- Authentication data can be proactively sent with the Attempt in the form of a bearer token or Basic Auth (some form of “shared secret”). Some systems allow decryption/verification via public/private key infrastructure, or via more complex protocols (ex: OAuth or SAML). (Perhaps worth reading: this post about signed webhook requests; though, I have never personally signed a webhook request.)
- Some systems will publish a stable range of IP addresses from which Attempts originate. When this is available, the Target System can check the originating IP before processing.
- Network security such as a firewall or VPN can manage the low-level access between systems.
- Finally, the Target of a Webhook could conceivably be an API call on the Originating System. This might create a circular reference where an Attempt could generate another Event, which would then generate another Webhook, which would generate another Attempt, ad nauseam.
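For the “shared secret” approach, one common pattern is to sign the request body with an HMAC and have the Target System recompute and compare it. The sketch below uses Python's standard `hmac` module; the header name in the comment and the secret value are hypothetical, and this is an illustration rather than something I have deployed.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Originating side: hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Target side: recompute the signature and compare in constant time."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected, received_signature)


secret = b"shared-secret-configured-on-both-systems"
body = b'{"id": 123, "action": "Published"}'

# Sent alongside the Attempt, ex: in a hypothetical X-Webhook-Signature header.
signature = sign(secret, body)

accepted = verify(secret, body, signature)
rejected = verify(secret, b'{"id": 999, "action": "Published"}', signature)
```

Using `hmac.compare_digest` rather than `==` avoids leaking timing information about how much of the signature matched.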
The Minimum Viable System
As noted earlier, this document is a broad survey of all features and architectures in observed webhook frameworks. As also noted, there is no formal definition of what constitutes a “webhook framework,” leaving the features subject to interpretation by an implementing system.
Based on my experience, here is the minimal featureset required to achieve a generally accepted definition of a “webhook framework.”
- Identification and documentation of one or more Events that will be evaluated by Factories
- Minimum expectation would be CrUD operations on the core system entities
- Specification and documentation of the Event Payload
- UI to create and manage Factories
- Specification of the triggering entity type (ex: “News Article”) and event type (ex: “Published”)
- Specification of the Target, with a minimum of:
- URL (with optional querystring args)
- Limited templating (minimum expectations would be token replacement of a unique entity ID provided by the Event)
- HTTP method
- Multiple HTTP headers
- Basic Auth credentials
- An automatic Payload for POST Webhooks
- Meaning, when the method is POST, the Attempt will simply pass-through the provided Event Payload as the body of the request
- Asynchronous processing of Attempts
- Responses written to the Log
- Initial retry logic can be hard-coded and non-configurable
- UI to review the Log of responses to Attempts
- Log can auto-prune after a specified period of time (ex: 7 days)
Though not required, a final recommendation would be that the Webhook Framework run in its own environment, retrieving Events from some type of communication bus shared with the Originating System (usually a formal queuing system, though a database table or even the file system might be used for low-volume scenarios). This architecture would insulate the Originating System from any performance issues, as its only concern would be generating the Event and sending it to the bus.
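Here is a small sketch of that separation, using an in-process `queue.Queue` and a thread as stand-ins for a real message bus and a separately deployed Framework. The function names and the single hard-coded rule are assumptions for illustration.

```python
import queue
import threading

bus = queue.Queue()   # stand-in for a message queue shared by both systems
generated = []        # Webhooks the Framework decided to create


def publish_article(article_id: int) -> str:
    """Originating System: its only webhook-related work is one enqueue."""
    bus.put({"entity_type": "News", "action": "Published", "id": article_id})
    return "saved"


def framework_worker() -> None:
    """Webhook Framework: runs separately and drains Events from the bus."""
    while True:
        event = bus.get()
        if event is None:                    # sentinel used to stop the sketch
            bus.task_done()
            return
        if event["action"] == "Published":   # a single hard-coded Factory rule
            generated.append(f"notify?id={event['id']}")
        bus.task_done()


worker = threading.Thread(target=framework_worker, daemon=True)
worker.start()

result = publish_article(123)   # returns without waiting on any Target System
bus.put(None)
bus.join()                      # wait only so the sketch can show the outcome
```

The Originating System's call returns as soon as the Event is on the bus. Everything after that, including slow or failing Targets, happens on the Framework's side.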
Demarcation and Support
One universal principle of webhooks is simply that the Originating System and Webhook Framework are not responsible for what happens on the Target System. The only requirement is to make a valid Attempt and record the result, whatever it might be.
This, then, becomes a clear demarcation point for support issues: if the Webhook Framework can prove that the Attempt was made with the correct/expected data, and the response was recorded, then it bears no liability for how the Target System functions.
That said, here are four points where webhook users might run into issues, and the Framework would need to prove correct functioning:
- The Originating System raised the event correctly.
- The Factories correctly interpreted the event and correctly generated a Webhook (or didn’t)
- An Attempt was made with the correct/expected data
- The response was logged
Conclusion: Managing Developer Tension
Webhooks are an extension feature. As such, they permit the users of the Originating System to extend it to do things not understood or foreseen by the originating developers.
This tends to create tension among the developers and architects of the Originating System, and it tends to move the system toward more of a “platform” model, rather than a simple “product.”
One of the key sources of tension with webhooks is to what extent they should contain programming logic themselves, or to what extent they should simply be treated as notifications to other systems. Put another way, are we expecting our users to become developers?
Within most webhook frameworks, there are two points of logic:
- Event evaluation by Factories, which sometimes includes the ability to provide code-based logic
- Payload specification, which sometimes includes the ability to template data received from the Event, rather than simply passing it through
With those two exceptions, webhook development should be a configuration task, not a code task. If Factories provide some type of domain-specific language (DSL) for trigger logic or payload specification, it should never be required, only offered as an option for more advanced requirements.
Another point of tension is whether webhooks should be expected to make “blind” calls to APIs, or whether they should just notify other systems to make more complicated API calls.
For example, if you wanted to use a web API to create a Slack notification that new content has been published, you can approach it two ways:
- Directly: Configure a Factory to create a Webhook that generates the API call directly, with the correct headers, body, authentication, and querystring arguments for the Slack API
- Indirectly: Configure a Factory to simply notify some other logic unit (an AWS Lambda function, for example), which contains specific code to communicate with the Slack API
The former option is often not realistic, given the idiosyncrasies of web APIs. Trying to create a system whose configuration options could adapt to any web API would invite complication and frustration: it would open the system up to endless edge cases and random requirements.
In most cases, webhooks should be treated as simple notifications, and direct communication with other systems should be written in more fully-realized code environments. Webhooks should be developed with the expectation that connections to the “terminal system” are indirect, and proxied through another environment.
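To illustrate the indirect approach, here is roughly what the intermediate code might look like. It takes a generic notification and prepares a Slack-specific request. The notification shape, the function name, and the URL are all placeholders I have made up; the sketch returns the prepared request instead of sending it, so it runs without network access.

```python
import json
import urllib.request


def handle_notification(notification: dict, slack_webhook_url: str) -> urllib.request.Request:
    """Turn a generic webhook notification into a Slack-specific request.

    This is the code that would live in the intermediate environment
    (ex: a serverless function), not in the Webhook Framework.
    """
    text = f"New content published: {notification['title']} (#{notification['id']})"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        slack_webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = handle_notification(
    {"id": 123, "title": "Hello World"},
    "https://hooks.slack.example/placeholder",   # placeholder, not a real endpoint
)
# Passing `request` to urllib.request.urlopen() would actually deliver it.
```

All of the Slack-specific knowledge sits in this one function, in a real programming environment, while the Factory itself stays a simple notification.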
A last point of tension is the introduction of the Target System. For developers of the Originating System, there’s a tendency to view the Target System as a dependency that can’t be controlled. However, as noted earlier, this can be mitigated by several principles:
- Completely divorcing the webhook system from any synchronous processing in the Originating System. The only synchronous operation of the Originating System should be to deposit an Event in some communication bus shared with the Webhook Framework. Beyond that, the Originating System has no further connection with the process.
- Clearly demarcating and limiting the responsibility of the Webhook Framework to simply making a correctly formatted Attempt and recording the results. What happens in the Target System is of no consequence to the Framework, and this should be clearly communicated to users.
- Setting sensible, conservative maximums and defaults on Attempt logic. Timeouts should be limited, throttles should be specified, and maximum retries should be capped.
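The last principle, capping whatever a user configures, might be sketched like this. The specific limits are arbitrary values chosen for the example, and the names are hypothetical.

```python
from dataclasses import dataclass

# Hard limits chosen for this sketch; real values depend on the system.
MAX_TIMEOUT_SECONDS = 30
MAX_RETRIES = 5
MIN_SECONDS_BETWEEN_ATTEMPTS = 2


@dataclass(frozen=True)
class AttemptPolicy:
    timeout_seconds: int
    max_retries: int
    min_interval_seconds: int


def effective_policy(requested_timeout=None, requested_retries=None) -> AttemptPolicy:
    """Apply defaults when nothing is configured, and cap anything over the limits."""
    timeout = MAX_TIMEOUT_SECONDS if requested_timeout is None else requested_timeout
    retries = MAX_RETRIES if requested_retries is None else requested_retries
    return AttemptPolicy(
        timeout_seconds=max(1, min(timeout, MAX_TIMEOUT_SECONDS)),
        max_retries=max(0, min(retries, MAX_RETRIES)),
        min_interval_seconds=MIN_SECONDS_BETWEEN_ATTEMPTS,
    )


unconfigured = effective_policy()
overreaching = effective_policy(requested_timeout=600, requested_retries=100)
modest = effective_policy(requested_timeout=10, requested_retries=2)
```

A user who asks for a 600-second timeout and 100 retries gets the same policy as one who configures nothing, while a more modest request is honored as-is.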
To be clear, the existence of webhooks will change the nature and perception of the Originating System. Even if it has an existing API, it will now become a proactive generator of activity. Additionally, it introduces a pseudo-programming environment, which must be managed.
As with anything, clear communication of the intention and limitations of the system is key to ensuring it fulfills the expectations of the user.