Uniting publishers and subscribers, WebSub has become a well-adopted method for updating content on the internet. Since the protocol can be used for any data type accessible via HTTP, it’s a popular method for communicating notifications and changes, and is a great alternative to the constant server polling of the past. But, how does the WebSub ecosystem work exactly?
In this article, we’ll explore WebSub — the powerful ecosystem used by WordPress, CNN, and many other publishing networks to push content across the web. As we’ve done with other internet standards in the past, we’ll dig into how this one works, identify use cases where it shines, and provide example WebSub implementations to showcase how the protocol securely transmits data.
Some Background on WebSub
WebSub originally went by a much harder-to-pronounce name: PubSubHubbub. It started out as an extension to the Atom and RSS feed formats that added real-time change notifications, and it has since been generalized to pretty much any data type. As long as the data is accessible over HTTP, a change can be pushed to a client without requiring the client to use up resources for polling.
This design style is great for news, forum updates, and more, because it allows for a sort of asynchronous update scheme that flips the traditional polling approach on its head. Instead of a client going to a website to check for an update, that update can be automatically pushed to a client who has “opted-in” to that communication, allowing for passive knowledge of content through alerts.
In January of 2018, WebSub was adopted by W3C as a recommendation based upon these, and other, merits. To see why it’s such a promising technology, let’s look at how it actually works.
How it Works: What Is A Publish-Subscribe Pattern?
WebSub forms a network of publishers, subscribers, and hubs. This network determines how content is sent, when it is sent, and to whom it is sent. The publishers are the linchpin of the network, being the actual content providers who generate the data that is being updated and sent to the various clients. The publishers generate this content and expose the content through hub references in the HTTP headers, attaching it to a “topic,” which broadly labels what the content actually is and how it pertains to the user.
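To make the “hub references in the HTTP headers” idea concrete, here is a minimal sketch of how a publisher might advertise its hub and topic, and how a subscriber might read them back. It assumes the references are carried in a Link header with rel="hub" and rel="self"; the URLs and the helper names build_link_header and discover are hypothetical, and the parser is deliberately simplified (it does not handle every Link header variant).

```python
import re
from typing import Dict, List

# Simplified pattern for entries such as: <https://hub.example.com/>; rel="hub"
LINK_PATTERN = re.compile(r'<([^>]*)>\s*;\s*rel="?([^",;]+)"?')


def build_link_header(hub_url: str, topic_url: str) -> str:
    """Publisher side: advertise the hub and the topic's own URL."""
    return f'<{hub_url}>; rel="hub", <{topic_url}>; rel="self"'


def discover(link_header: str) -> Dict[str, List[str]]:
    """Subscriber side: collect the URLs found for each rel value."""
    found: Dict[str, List[str]] = {}
    for url, rel in LINK_PATTERN.findall(link_header):
        found.setdefault(rel.strip(), []).append(url)
    return found


# Hypothetical URLs, used only for illustration.
header = build_link_header(
    "https://hub.example.com/",
    "https://publisher.example.com/feed",
)
links = discover(header)
print(links["hub"])   # where to send the subscription request
print(links["self"])  # the topic URL to subscribe to
```

A production subscriber would also look for equivalent link elements inside the document body, which this sketch leaves out.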
When something is changed, the publisher posts these changes to the hub itself. The subscriber retrieves HTTP content from the web server, and if the response contains a reference to a hub, the subscriber can register with that hub for the referenced topic URL. In doing so, it’s basically letting the hub know that it wants to be “kept in the loop,” and that when an update is pushed, it wants to be notified.
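As a rough illustration of the subscription step, the sketch below builds the form-encoded request a subscriber could POST to a hub. The parameter names (hub.mode, hub.topic, hub.callback, hub.secret) follow the WebSub specification; the URLs, the secret, and the helper name build_subscription_request are assumptions made for this example. The request is only constructed here, not sent.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_subscription_request(
    hub_url: str,
    topic_url: str,
    callback_url: str,
    secret: Optional[str] = None,
    mode: str = "subscribe",
) -> Request:
    """Build (but do not send) a subscription request for a hub."""
    params = {
        "hub.mode": mode,              # "subscribe" or "unsubscribe"
        "hub.topic": topic_url,        # the topic the subscriber wants to follow
        "hub.callback": callback_url,  # where the hub should deliver updates
    }
    if secret is not None:
        params["hub.secret"] = secret  # shared secret used to sign deliveries
    body = urlencode(params).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical endpoints, used only for illustration.
request = build_subscription_request(
    hub_url="https://hub.example.com/",
    topic_url="https://publisher.example.com/feed",
    callback_url="https://subscriber.example.com/websub/callback",
    secret="replace-with-a-random-secret",
)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing the resulting object to urllib.request.urlopen would send it; the secret should only ever be sent to a hub over HTTPS.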
Hubs utilize webhook mechanics to notify subscribers of these changes, remotely updating their content and pushing reference content automatically. The hub, then, acts as a sort of passive intermediary, pushing content out and receiving subscriptions to that content without actually generating content or subscribing itself.
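The hub’s role as an intermediary can be sketched with a toy, in-memory model. This is not how a real hub is built: the class name InMemoryHub is hypothetical, and a plain Python callable stands in for the HTTP POST a real hub would make to each subscriber’s callback URL. The point is only to show that the hub keeps a topic-to-subscriber mapping and fans content out, without creating or consuming content itself.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A delivery function stands in for an HTTP POST to a callback URL.
Deliver = Callable[[str, bytes], None]


class InMemoryHub:
    """Toy hub: remembers who follows which topic and fans updates out."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Deliver]] = defaultdict(list)

    def subscribe(self, topic: str, deliver: Deliver) -> None:
        self._subscribers[topic].append(deliver)

    def publish(self, topic: str, content: bytes) -> int:
        """Push content to every subscriber of a topic; return the count."""
        callbacks = list(self._subscribers.get(topic, []))
        for deliver in callbacks:
            deliver(topic, content)
        return len(callbacks)


hub = InMemoryHub()
inbox: List[bytes] = []
hub.subscribe("https://publisher.example.com/feed", lambda topic, body: inbox.append(body))

delivered = hub.publish("https://publisher.example.com/feed", b"<entry>New post</entry>")
ignored = hub.publish("https://publisher.example.com/other", b"<entry>Unrelated</entry>")
print(delivered, ignored, inbox)
```

Only the subscriber that opted in to the first topic receives anything; the second publish reaches nobody because no one subscribed to that topic.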
One of the main benefits of WebSub is that it includes a rather robust security mechanism. In this process, the subscriber can send a secret to the hub when it subscribes. The hub then uses that secret as the key to compute an HMAC signature over each notification it delivers, and sends the signature along with the content. The subscriber can verify the origin by comparing that signature with one computed locally from its own copy of the secret. This authenticates the delivery, because only the subscriber and the hub hold the secret needed to produce a valid signature.
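The signature check described above can be sketched with Python’s standard hmac module. The sketch assumes the hub sends the signature in a header value of the form method=hexdigest (for example sha256=…), which is the shape the WebSub specification gives for its X-Hub-Signature header; the secret, the payload, and the function names sign and verify_signature are made up for illustration.

```python
import hashlib
import hmac

# Hash methods this sketch is willing to accept from a hub.
ALLOWED_METHODS = {"sha1", "sha256", "sha384", "sha512"}


def sign(secret: bytes, body: bytes, method: str = "sha256") -> str:
    """Hub side: sign the delivery body with the shared secret."""
    digest = hmac.new(secret, body, getattr(hashlib, method)).hexdigest()
    return f"{method}={digest}"


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Subscriber side: recompute the signature and compare in constant time."""
    method, separator, received = header_value.partition("=")
    if not separator or method not in ALLOWED_METHODS:
        return False
    expected = hmac.new(secret, body, getattr(hashlib, method)).hexdigest()
    return hmac.compare_digest(expected.encode(), received.encode())


# Hypothetical secret and payload.
secret = b"replace-with-a-random-secret"
body = b'{"title": "New post"}'
header = sign(secret, body)

print(verify_signature(secret, body, header))                   # genuine delivery
print(verify_signature(secret, b'{"title": "Forged"}', header)) # tampered body
```

Using hmac.compare_digest rather than == avoids leaking timing information about how much of the signature matched.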
Event-Driven Architecture and WebHooks
The entire concept behind WebSub, and really WebHooks in general, is for notifications to be pushed for changed content. This obviously works great for regularly publishing websites, news aggregators, etc., because content is always changing and is best pushed remotely without requiring polling.
Despite that, the greatest value of event-driven notifications does not rest in news aggregator implementations. Because the content is sent in summary form, and is sent without demanding that the user first find the event on a website or portal, the event-driven nature of WebSub and WebHooks is well suited to updating content from APIs, pushing new version changes, and even pushing entire new codebases to the user. To see how this works, and why it’s valuable, let’s first define and separate WebSub and WebHooks in general.
WebHooks Lack Verification
WebHooks is a concept in which the customer registers for a webhook using a callback URL, and this URL is then used to push content to the user by calling said URL. On the API side, this can be implemented in a variety of ways and levels of complexity. It can be as simple as providing a single webhook that can be utilized to send content that is ephemeral (that is, automatically expiring for the user after a set time), and as complex as offering a variety of GET calls that can retrieve a list of callbacks, edit existing URLs, and modify these contents.
WebHooks can be quite complex, but for our purposes, we should understand them to be a mechanic for pushing content via callback URLs. In theory, this implementation is great, but WebHooks often require additional systems to actually secure them. Since the callback URL is public and may be exposed in the clear, a malicious actor can intercept this communication or send false data. Additionally, a denial of service attack could very easily cripple the endpoint. Even something as simple as user application errors could cause delivery to fail, and simple user negligence could result in changes being missed.
Simply put, WebHooks do not provide verification of intent in their default configuration.
WebSub Is More Secure
WebSub was designed to fix many of these issues. First and foremost, since WebSub has a built-in mechanism for security, it can essentially challenge the intent of the request at any stage in the process. Ensuring that the information was sent by the hub it claims to come from can be done by comparing the HMAC signature attached to the delivery with a locally computed one, verifying that the hub is acting in good faith.
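One place where this “challenge of intent” shows up is at subscription time: before activating a subscription, the hub calls the subscriber’s callback URL with a random challenge, and the subscriber confirms by echoing it back. The sketch below shows the subscriber side of that exchange under some assumptions: the query parameter names (hub.mode, hub.topic, hub.challenge) follow the WebSub specification, while the function name handle_verification and the idea of tracking pending requests in a set are simplifications made for this example.

```python
from typing import Set, Tuple
from urllib.parse import parse_qs


def handle_verification(query_string: str, pending: Set[Tuple[str, str]]) -> Tuple[int, str]:
    """Answer a hub's verification GET; returns (status code, response body).

    `pending` holds the (mode, topic) pairs this subscriber actually requested.
    """
    params = parse_qs(query_string)
    mode = params.get("hub.mode", [""])[0]
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]

    if challenge and (mode, topic) in pending:
        # We did ask for this: confirm by echoing the challenge back.
        return 200, challenge
    # We never asked for this subscription: refuse it.
    return 404, ""


# Hypothetical pending request and incoming verification query.
pending = {("subscribe", "https://publisher.example.com/feed")}
query = (
    "hub.mode=subscribe"
    "&hub.topic=https%3A%2F%2Fpublisher.example.com%2Ffeed"
    "&hub.challenge=abc123"
)
print(handle_verification(query, pending))
print(handle_verification("hub.mode=subscribe&hub.topic=https%3A%2F%2Fevil.example%2F&hub.challenge=x", pending))
```

Because the subscriber only echoes challenges for subscriptions it actually requested, a third party cannot sign it up for topics it never asked for.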
Ensuring that content remains unchanged can be done by hashing content from the publisher and comparing the hashes locally, which is only meaningful if we intrinsically trust the publisher in the first place. DoS attacks can be mitigated by this same process, since requests that fail verification can be rejected early, and integrated rate limiting methods or simple challenge responses to excess traffic make that mitigation even more effective.
There are a wide range of additional benefits to using WebSub over WebHooks, especially in certain applications, but all in all, they are far less important than the added security mechanism that creates a trust of intent.
It should be noted that WebSub by default sends content via a light ping, meaning that only header parts like the title, links, etc. are sent. This makes it lighter by default than WebHooks. WebHooks does have a similar light ping option that can be configured, and as such, could in theory be made to function similarly. Even so, the default light ping behavior adds real value, though it comes with the caveat that a POST call morphing into multiple GET or sub-POST calls can quickly run into rate limiting if not properly handled.
Benefits and Common Cases
Let’s look at the advantages of this Publish-Subscribe pattern to see where some common cases might arise. This pattern is first and foremost an excellent choice for scalability. Since only the subscribed content is sent to the user, and typically sent via a light ping, the use of resources for notification is much lower than with traditional automated polling systems, and thus the system can quickly scale to larger sizes. This benefit can vaporize when moving to enterprise-level implementations, however, with larger numbers of subscribers demanding resources from the publisher.
Another huge benefit is the fact that this pattern implements loose coupling. The subscriber doesn’t need to know anything about the structure, organization, topology, or function of the server and its data systems in order to get content, and instead only has to know the principal endpoint being accessed. Because of this, there’s an increased level of security inherent in opacity that results in more server-level dependability and privacy.
Finally, in this pattern, the system has a level of asynchronous functionality that is hard to achieve in traditional systems. If the subscriber fails, the publisher still functions, pushing content. Conversely, if the publisher fails, the subscriber still has access to historical records and content, and isn’t “cut off” from further changes.
With all of this in mind, any situation in which content is best pushed from a provider to a subscriber in a secure methodology would hugely benefit from the WebSub implementation. Let’s look at two theoretical implementations to see this in action.
Hypothetical Implementation 1 – Echo
“Echo” is an API that utilizes a series of third party APIs to collate content and generate lists according to a stated desire, interest, or need for the user. When this content is found, it’s pushed to the user as part of a collection – i.e., a new album from their favorite artist is released, so Echo pushes this album in a news aggregator format to the subscriber with a link to listen to the single from the album.
A collection can be defined by the user to be pretty much anything, ranging from new albums to featured articles on a news site, from new video game news to titles being added to their favorite streaming service. In some cases, this might require passing credentials to login to third party APIs and systems to generate these lists of content.
In this use case, the security inherent in WebSub is what’s truly in play. Being able to determine that the actor in question is who they say they are is extremely important, especially if we’re going to be trusting them with any amount of credentials. If this information is encrypted in transit, confidentiality is increased, and when married with at-rest encryption, it makes for a secure notification service that is not limited to a single provider, social network, or data source.
Hypothetical Implementation 2 – ASyncee
“ASyncee” is a startup looking to provide short-term work projects as a middleman between workers and employers. The company has identified a major weakness in current solutions, which require the use of traditional communication and mean that for workers, looking for a job is often a job in and of itself.
The startup is thus designed to register a job from a client and broadcast that job to users who have logged an interest in that “topic” or job type. Because of the nature of these jobs, the delay between major postings, and the fact that job postings are not consistent or predictable, ASyncee has chosen to leverage WebSub to update clients asynchronously.
By allowing the user to opt in to certain topics, such as “Art Jobs” or “Writing Jobs”, the data sent to them can be tailored specifically to the user in question and their specific skill sets, removing the need to search specific terms and overly-broad categorizations to find relevant postings. The user can tap into additional endpoints or topics to request additional information.
Key to this is that, should the client or provider go silent, the other can still function and leverage the data generated. If there are no additional jobs, the user can still use their locally stored values to apply to jobs and utilize other endpoints to check for other offerings. Should the user go silent, the content generator will never know anything has changed. Since the nature of this data is somewhat sensitive, and some jobs want to keep their applicants somewhat hidden, the fact that there is a divider between the content generators and the end user means that the job poster need not know who is interested until they have actually applied.
WebSub is a promising technology, and given the correct use case, can be leveraged to provide quite a bit of value. Notification of new content, be it new codebase revisions or new job postings, is the principal function of many APIs, and as such, simplifying this process and reducing demanded resources can be hugely beneficial for most APIs that fit into the given criteria.
That being said, WebSub is just one approach. For a service that cannot utilize that pattern or a service that demands added functionality, WebSub is not a strong choice – for services that can work within that structure, however, it may be the most powerful offering to date.