Featuring Platform Summit 2017 speaker Audrey Neveu

Streaming and real-time are the new fashion. Developers want their applications to provide an interactive experience, which implies not only receiving data in real time but also being able to stream events. But for someone first digging into WebHooks and WebSub, the lack of documentation on the terminology can be confusing. So let’s start from the beginning.

Event Streaming: What Is It For?

When talking about streaming, you’re probably thinking of data streaming and less often of event streaming. The two notions are very different because they rely on different technologies.

When we discuss event streaming, we’re not really interested in the data underneath.
What really matters is that something has happened — something that either implies a reaction from you, or something you want your users to be notified of.

WebHooks

Sometimes called User Defined HTTP Callback or User Defined Post HTTP Callback, WebHook is a concept. Nothing more. You won’t find true WebHook specifications or libraries; it’s up to you to implement it the way you like.

How does it work?

As a customer, subscribing to a WebHook is as simple as registering a callback URL on the provider’s website. Then, whenever a new event occurs on the server, the provider sends a request to your callback URL to notify you about the update. Simple, isn’t it?

Indeed! So, what do I need to start?

For a consumer, the only task is to define a callback URL. But keep in mind that this URL has to be reachable from the outside, so no localhost and no firewall in the way.
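To make the consumer side concrete, here is a minimal sketch of a callback endpoint using only Python’s standard library. Since there is no WebHook specification, the JSON payload with a "type" field, the status codes, and the names handle_event, CallbackHandler and serve are all assumptions made for this example, not something a given provider requires.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_event(raw_body: bytes) -> int:
    """Return the HTTP status to answer with for one notification."""
    try:
        event = json.loads(raw_body)
    except ValueError:
        return 400  # body was not valid JSON
    if not isinstance(event, dict) or "type" not in event:
        return 400  # assumed payload shape: {"type": "...", ...}
    # React to the event here; this sketch only logs it.
    print("received event:", event["type"])
    return 204


class CallbackHandler(BaseHTTPRequestHandler):
    """Accepts the provider's POST requests on the callback URL."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status = handle_event(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Block and serve the callback endpoint on the given port."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Calling serve() starts the listener; the machine still has to be reachable from the provider for notifications to arrive.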

As a WebHook provider, you have two tasks on your checklist:

  • Define a subscription endpoint
  • Implement your WebHook queue

And of course, once this checklist is complete, don’t forget to create a nice interface on your website if you want users to be able to subscribe to your WebHook!

Define a subscription endpoint

You could declare only one endpoint to accept new subscription requests sent on a POST /WebHook call, but it could be nice to have something a little bit more fleshed out.

For example, you can expose some other useful resources like:

  • GET /WebHook to list existing WebHooks
  • GET /WebHook/{id} to retrieve details of an existing WebHook
  • PUT /WebHook/{id} to update an existing WebHook
  • DELETE /WebHook/{id} to unsubscribe from an existing WebHook

But as I said previously, there is no specification so everything above is optional.
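As an illustration, the resources above could be backed by something like the following in-memory registry. This is only a sketch: the class name WebHookRegistry, the field names callback_url and event, and the choice of a UUID as identifier are assumptions, and a real provider would persist subscriptions and put an HTTP layer in front of it.

```python
import uuid


class WebHookRegistry:
    """In-memory stand-in for the storage behind the /WebHook resources."""

    def __init__(self):
        self._hooks = {}

    def create(self, callback_url, event):
        """POST /WebHook: register a new subscription."""
        hook_id = uuid.uuid4().hex
        hook = {"id": hook_id, "callback_url": callback_url, "event": event}
        self._hooks[hook_id] = hook
        return hook

    def list_all(self):
        """GET /WebHook: list existing subscriptions."""
        return list(self._hooks.values())

    def get(self, hook_id):
        """GET /WebHook/{id}: returns None when the id is unknown."""
        return self._hooks.get(hook_id)

    def update(self, hook_id, **changes):
        """PUT /WebHook/{id}: change fields, but never the id itself."""
        hook = self._hooks.get(hook_id)
        if hook is not None:
            hook.update({k: v for k, v in changes.items() if k != "id"})
        return hook

    def delete(self, hook_id):
        """DELETE /WebHook/{id}: returns True if something was removed."""
        return self._hooks.pop(hook_id, None) is not None
```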

Implement your WebHook queue

Here’s the moment where things become a little bit tricky. We have 4 different solutions here:

  • Inline HTTP Requests: Each time an event is triggered, you’re going to look at any existing subscriptions for this event, loop over each, send POST requests, and then perform any cleanup, failure, or retry logic that might be needed. It will work, but I wouldn’t recommend it unless you’re absolutely sure you’ll have a small number of events and subscriptions. And even in that case, keep in mind that all those actions will delay your response.
  • Create a DB queue: With this option, you’ll be creating a record in your existing database for each notification you need to send. Then, you’ll need a separate process to frequently look for new notifications and send them. It doesn’t scale well but is a solution if you can’t add another piece of technology to your stack.
  • Use a proper queue: If you know you’ll need to scale your solution, it’s probably better to use a proper message broker like RabbitMQ, ActiveMQ, or Redis. You will still need to have a separate process to consume items from the queue and send notifications, but this way, you’ll save database resources.
  • Batch: If you know that your API is going to send a large number of hooks at the same time, it’s probably better for you to bundle them into a single hook. For example, Facebook aggregates changes and sends them in batch at most once every 5 seconds, or when the number of unsent changes exceeds 1000.
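The idea shared by the queue-based options, taking delivery off the request path and letting a separate consumer send notifications, can be sketched as follows. Python’s in-process queue.Queue and a thread stand in here for a real broker and a separate process; the names start_worker and publish, the injected send function, and the limit of three attempts are assumptions made for the example.

```python
import queue
import threading


def start_worker(jobs, send, max_attempts=3):
    """Consume (url, payload, attempt) jobs and deliver them one by one."""

    def run():
        while True:
            job = jobs.get()
            if job is None:  # sentinel: stop the worker
                jobs.task_done()
                return
            url, payload, attempt = job
            try:
                send(url, payload)
            except Exception:
                # Requeue failed deliveries until max_attempts is reached.
                if attempt + 1 < max_attempts:
                    jobs.put((url, payload, attempt + 1))
            finally:
                jobs.task_done()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def publish(jobs, subscriptions, event, payload):
    """Called when an event fires: enqueue one job per subscriber and return."""
    for url in subscriptions.get(event, []):
        jobs.put((url, payload, 0))
```

Because publish only enqueues, the code that triggered the event is not slowed down by slow or failing subscribers; send would be whatever function performs the actual POST.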

Ok, everything’s settled. What could possibly go wrong?

Some pitfalls may appear on your road to real-time events. First and foremost: DDoS attacks.

If the WebHook you’ve subscribed to generates more requests than your application can handle, the effect is the same as a DDoS attack. So first of all, try to estimate the expected volume of the WebHook you’ve subscribed to and make sure your application will be able to absorb the traffic.

Since you’re exposing a public URL to receive notifications, a malicious application could call it to send you false data or to DDoS your app, this time on purpose. That’s why it’s usually recommended to A) use one callback URL per registration, and B) make it less human-readable, and therefore harder to guess.
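One way to follow both recommendations is to generate a random path for every registration. The sketch below uses Python’s secrets module; the "/hooks/" prefix, the in-memory callbacks dictionary, and the function names are assumptions made for illustration.

```python
import secrets

callbacks = {}  # callback path -> name of the provider it was registered with


def new_callback_path(prefix="/hooks/"):
    """Return a path ending in 32 random bytes, URL-safe encoded."""
    return prefix + secrets.token_urlsafe(32)


def register(provider):
    """Create a dedicated, hard-to-guess callback path for one registration."""
    path = new_callback_path()
    callbacks[path] = provider
    return path


def provider_for(path):
    """Return the provider a path belongs to, or None for unknown paths."""
    return callbacks.get(path)
```

Requests arriving on a path that provider_for does not know can then be rejected before any processing happens, and a leaked URL can be revoked without touching the other registrations.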

Missed notifications are another possible issue. A WebHook’s job is to deliver data, not necessarily to pay attention to what you’re doing with it. If, for any reason, your application is in error and the WebHook you’ve subscribed to does not pay attention to your answer, you may lose data.

Conversely, if it does check your answer and makes multiple attempts when you don’t provide the one it expects, you may end up with duplicate notifications. Facebook, for example, will retry immediately, and then a few more times with decreasing frequency over the next 24 hours. After that, updates that were not delivered are dropped.
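A common way for a consumer to cope with duplicates is to remember which notifications it has already handled. The sketch below assumes each notification carries a unique "id" field, which is not guaranteed by every provider; the names SeenEvents and handle and the size limit are likewise assumptions.

```python
from collections import OrderedDict


class SeenEvents:
    """Remember recently handled event IDs so retries are processed once."""

    def __init__(self, max_size=10_000):
        self._ids = OrderedDict()
        self._max_size = max_size

    def first_time(self, event_id):
        """Return True the first time an ID is seen, False on a repeat."""
        if event_id in self._ids:
            return False
        self._ids[event_id] = True
        if len(self._ids) > self._max_size:
            self._ids.popitem(last=False)  # forget the oldest ID
        return True


seen = SeenEvents()


def handle(event, process):
    """Process an event unless its (assumed) "id" was already handled."""
    if seen.first_time(event["id"]):
        process(event)
    return 200  # acknowledge either way so the provider stops retrying
```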

WebSub

Previously called PubSubHubbub, PubSub, or PuSH, WebSub is an open protocol, based on the Publish/Subscribe pattern and on WebHooks. Initially designed to extend Atom and RSS protocols for data feeds, it can be applied to any data type as long as it’s accessible via HTTP. Since April 2017, WebSub has been adopted by the W3C as a Candidate Recommendation.

The Pub, The Sub, and the Hub.

WebSub is built upon an ecosystem of Publishers (Medium, WordPress, etc.), Hubs (Superfeedr, Switchboard, etc.) and Subscribers (Feedly, Flipboard, etc.).

Compared to WebHooks, WebSub requires far less effort from Publishers: all they need to do is declare the Hub they’re using with the Link header, and then ping it when they publish new content.
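In practice, the Publisher’s side can be as small as the sketch below. The Link headers with rel="hub" and rel="self" follow the WebSub discovery mechanism; the ping body with hub.mode=publish and hub.url is a convention inherited from PubSubHubbub that many Hubs accept, but it is an assumption here, so check what your Hub expects. The function names are made up for the example.

```python
from urllib.parse import urlencode


def discovery_headers(hub_url, topic_url):
    """Headers a Publisher adds to its feed response to advertise its Hub."""
    return [
        ("Link", f'<{hub_url}>; rel="hub"'),
        ("Link", f'<{topic_url}>; rel="self"'),
    ]


def publish_ping_body(topic_url):
    """Form-encoded body to POST to the Hub when the topic has new content."""
    return urlencode({"hub.mode": "publish", "hub.url": topic_url})
```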

For Subscribers, it doesn’t make a big difference: they still make a subscription request, but this time to the Hub, which in turn pings them when new content has been published.
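A subscription request is a form-encoded POST to the Hub carrying the hub.callback, hub.mode and hub.topic parameters defined by WebSub. The sketch below only builds the request object without sending it; the function name and the example URLs are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request


def subscription_request(hub_url, topic_url, callback_url, mode="subscribe"):
    """Build (but do not send) the POST a Subscriber makes to the Hub."""
    body = urlencode({
        "hub.callback": callback_url,
        "hub.mode": mode,  # "subscribe" or "unsubscribe"
        "hub.topic": topic_url,
    }).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = subscription_request(
    "https://hub.example/",
    "https://blog.example/feed",
    "https://reader.example/callback/abc123",
)
```

Passing the object to urllib.request.urlopen would actually send it to the Hub.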

At this point you might ask yourself…

Is that all? A new player in the middle but everything remains the same. What’s the point?!

… and you’d be right! But the thing is, this particular player is going to change some rules.

Verification of intent

A huge difference between WebHook and WebSub is the verification of intent. We want to be sure the user has asked to subscribe to the content. And in the same way, if they decide to unsubscribe, we still want to be sure nobody has done so on their behalf.

To do so, the Hub simply asks for confirmation of both actions. Concretely, once the subscription request has been received, the Hub sends a GET request to the user’s callback URL with a hub.challenge parameter. This parameter is just a random string, but it has to be sent back to the Hub (and only to the Hub!) as the body of the response. The same mechanism applies to unsubscription.
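On the Subscriber’s side, answering the verification request could look like the sketch below. The hub.mode, hub.topic and hub.challenge parameters come from WebSub; the pending set, used to remember which requests this Subscriber really made, and the function name are assumptions for the example.

```python
from urllib.parse import parse_qs

# (mode, topic) pairs this Subscriber has actually requested from the Hub.
pending = {("subscribe", "https://blog.example/feed")}


def verify_intent(query_string, pending_requests):
    """Return (status, body) for the Hub's GET on the callback URL."""
    params = parse_qs(query_string)
    mode = params.get("hub.mode", [""])[0]
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if challenge and (mode, topic) in pending_requests:
        return 200, challenge  # echo the challenge to confirm the request
    return 404, ""  # nobody here asked for this, so refuse it
```

Echoing the challenge only for requests found in pending is what stops a third party from subscribing or unsubscribing on the user’s behalf.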

Light ping

The second big difference from WebHooks is that by default, new content notifications are a light ping. Basically, instead of sending the whole content, the Hub sends subscribers only the new entries in the feed with their header parts (title, links, etc.). It’s up to them to fetch the rest… or not.

This makes a big difference because we know that not all content will be read, so why send it all to every subscriber, every time? Why not save resources by letting subscribers fetch it only if they want to?

However, if it doesn’t fit your needs, depending on the Hub you’re using, you can decide to send a fat ping. But be careful, it might be a paid option.

And what about WebHook’s issues? Is it all fixed?

While the verification of intent step protects you from intentional DDoS attacks and false notifications, the other pitfalls still have to be considered.

So, when should I opt for one or the other?

WebHooks are absolutely perfect when you want to build workflows that react immediately to an event (Example: medical connected devices). On the other hand, WebSub is the one you should opt for if you want to provide events along with larger amounts of data (Example: a new blog post publication is an interesting event, but you’re also interested in the data behind it).

Conclusion

That’s it for the big picture. This post doesn’t claim to be exhaustive; it is meant as an introduction to both solutions. If you want to go further, the following resources might be useful:

WebHook:

  • WebHooks.org: Although the wiki has not been updated for a while, it remains mostly up to date and the mailing list is still active.
  • What’s a WebHook by Nick Quinlan

WebSub:

Audrey Neveu

About Audrey Neveu

Audrey is a full-stack developer at Saagie, specialized in APIs and Lucene-based search engines. Heavily involved in the French Java community, she’s part of Devoxx France and of Devoxx4Kids, a not-for-profit global initiative to get children coding.