Welcome, readers! Our On-call product has been out in the wild for a few months now, and I’m excited to dive into the journey of building a time-sensitive system. In this post, we’ll explore the challenges we faced, what our scheduler handles, the fundamentals of working with time, and how we tested our system.
What’s involved with building an on-call scheduler?
Our On-call product revolves around alerting, scheduling, escalations, and paging. Today, we’ll zoom in on our scheduler, which is responsible for determining which users are on shift at any given time based on the schedule configuration.
So, what does a schedule configuration entail? Our schedules consist of independent rotations, each including on-call users and rules for defining:
- When the rotation is active
- When on-call duty transitions between users
These rotations can either be always-on or follow defined working hours, such as 9-to-5 shifts. Most rotations assign users to shifts on a round-robin basis, with additional complexities like shift overrides and irregular handover periods.
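To make the round-robin idea concrete, here is a minimal sketch of picking the on-call user for an always-on rotation. It is not our production logic: the function name, its parameters, and the example users are all hypothetical, and it deliberately ignores overrides, working hours, and irregular handovers by assuming a fixed handover interval measured in absolute time.

```python
from datetime import datetime, timedelta, timezone


def on_call_user(users, rotation_start, handover_interval, at):
    """Return the user on shift at instant `at` (hypothetical helper).

    Assumes an always-on rotation with a fixed, absolute handover interval.
    """
    if at < rotation_start:
        return None  # the rotation has not started yet
    # Whole handover intervals elapsed since the rotation began.
    shifts_elapsed = (at - rotation_start) // handover_interval
    # Round-robin: wrap around the list of users.
    return users[shifts_elapsed % len(users)]


users = ["alice", "bob", "carol"]
start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
weekly = timedelta(weeks=1)

first = on_call_user(users, start, weekly, datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc))
second = on_call_user(users, start, weekly, datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc))
wrapped = on_call_user(users, start, weekly, datetime(2024, 1, 23, 12, 0, tzinfo=timezone.utc))
# first is "alice", second is "bob", wrapped is "alice" again
```

Everything here is computed on UTC instants, which is exactly why the sketch breaks down once handovers have to follow a local wall-clock time.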
Building a system for this may seem straightforward, especially with modern databases and libraries for handling time. However, the intricacies of time zones, DST, and date calculations can add layers of complexity that require careful consideration.
Components of time
When working with time, precision is key. We often take local time for granted, but creating a global scheduler necessitates a deeper understanding of different time components:
Local time
Local time, also known as “wall-clock” time, is the time we interact with daily. It’s crucial for our users to work in local time and view outputs in their time zone.
Time offsets
Time offsets represent the difference between local time and UTC, for example UTC+01:00. The offset that applies in a given place depends on its geographical location and on whether DST is in effect at that moment.
Timezones
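Timezones are named regions, such as Europe/London, that share the same rules for which offset applies at any given moment. A timezone is not a fixed offset: its offset can shift during the year with DST, and the rules themselves change over time. These rules are recorded in the tz database, which is maintained by IANA.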
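As an illustration of how one timezone maps to different offsets, the sketch below uses Python's standard zoneinfo module (Python 3.9+), which reads the tz database. The choice of Europe/London and the specific dates are just examples.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

london = ZoneInfo("Europe/London")

# Same timezone, same wall-clock time, different times of year.
winter = datetime(2024, 1, 15, 12, 0, tzinfo=london)
summer = datetime(2024, 7, 15, 12, 0, tzinfo=london)

winter_offset = winter.utcoffset()  # timedelta(0): GMT
summer_offset = summer.utcoffset()  # timedelta(hours=1): BST
```

A label like "Europe/London" therefore cannot be replaced by a single number: the offset has to be looked up for the instant in question.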
Timestamps/Instants
Timestamps denote specific points in time, usually measured from a well-known reference point such as the Unix epoch (1970-01-01T00:00:00Z). Because they are independent of any timezone, they provide unambiguous time references for calculations.
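A short sketch of the distinction, again using Python's standard library as an assumed toolset: a single instant can be rendered as different local times, yet it still compares equal to itself.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# One instant, expressed in UTC.
instant = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)

# The same instant rendered as local time in two zones.
in_london = instant.astimezone(ZoneInfo("Europe/London"))       # 13:00 BST
in_new_york = instant.astimezone(ZoneInfo("America/New_York"))  # 08:00 EDT

# Aware datetimes in different zones compare by instant, not by wall-clock reading.
same_instant = in_london == in_new_york  # True

# Seconds since the Unix epoch: a zone-free representation of the instant.
epoch_seconds = int(instant.timestamp())
```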
Durations
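Durations measure the passing of time between two points. Absolute durations, such as 24 hours, are straightforward. Relative durations, such as "one day", depend on the calendar and timezone: across a DST change, a day in local time can last 23 or 25 hours rather than 24.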
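The difference is easiest to see across a DST change. The sketch below assumes Python's zoneinfo and uses the 2024 spring-forward in Europe/London (clocks moved from 01:00 to 02:00 on 31 March) purely as an example.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

UTC = timezone.utc
london = ZoneInfo("Europe/London")

start = datetime(2024, 3, 30, 12, 0, tzinfo=london)  # 12:00 GMT

# Relative duration: "same wall-clock time tomorrow".
# Python adds the timedelta to the local fields, so this is 12:00 BST.
one_calendar_day = start + timedelta(days=1)

# Measure real elapsed time in UTC. Subtracting two datetimes that share
# the same tzinfo ignores the offset change, so convert first.
elapsed = one_calendar_day.astimezone(UTC) - start.astimezone(UTC)  # 23 hours

# Absolute duration: exactly 24 hours later, computed on the instant.
absolute_24h = (start.astimezone(UTC) + timedelta(hours=24)).astimezone(london)  # 13:00 BST
```

Which of the two a scheduler wants depends on the rule: a daily 12:00 handover is a relative duration, while "page again after 24 hours" is an absolute one.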
The build
On the server
We use Postgres for data storage and rely on the timestamptz column type, which stores each value as an instant and converts it to and from a timezone on input and output. By working with instants and converting to local times only when necessary, we maintain consistency and accuracy in our scheduling logic.
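One way to picture "instants first, local time when necessary" is a rule defined in wall-clock terms that gets resolved to an instant before it is stored or compared. The following is a sketch under that assumption, not our actual server code; shift_start_instant is a hypothetical helper.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def shift_start_instant(day, local_start, zone_name):
    """Resolve a wall-clock shift start on `day` to a UTC instant (hypothetical helper)."""
    local = datetime.combine(day, local_start, tzinfo=ZoneInfo(zone_name))
    return local.astimezone(timezone.utc)


# A 09:00 London shift start, before and after the 2024 spring DST change.
before_dst = shift_start_instant(date(2024, 3, 29), time(9, 0), "Europe/London")  # 09:00 UTC
after_dst = shift_start_instant(date(2024, 4, 1), time(9, 0), "Europe/London")    # 08:00 UTC
```

The rule stays "09:00 local" while the stored instants shift by an hour. Local times that fall in a DST gap or overlap need an explicit policy, which this sketch does not attempt.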
The UI
Displaying schedules in the user’s local timezone has been a learning experience. Flexibility in how timezones are represented, along with tools like the Sherlock library for natural language input, has improved the user experience.
Testing and reliability
Reliability is paramount for our on-call product. Robust testing strategies, including unit testing, snapshot testing, and runtime auditing, ensure the scheduler performs accurately and consistently.
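As a rough illustration of the kind of unit test that pays off here, the sketch below generates daily shifts with a local 09:00 handover and checks them across a DST change. The generator daily_shifts and the test are hypothetical and far simpler than a real scheduler; they only assume Python's standard library.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def daily_shifts(first_day, days, handover, zone_name):
    """Return (start, end) UTC instants for daily shifts that hand over
    at a fixed local wall-clock time (hypothetical generator)."""
    zone = ZoneInfo(zone_name)
    boundaries = [
        datetime.combine(first_day + timedelta(days=i), handover, tzinfo=zone)
        .astimezone(timezone.utc)
        for i in range(days + 1)
    ]
    return list(zip(boundaries, boundaries[1:]))


def test_shifts_across_spring_dst():
    # Four shifts spanning the Europe/London change on 31 March 2024.
    shifts = daily_shifts(date(2024, 3, 29), 4, time(9, 0), "Europe/London")

    # No gaps or overlaps: each shift ends exactly when the next begins.
    for (_, end), (next_start, _) in zip(shifts, shifts[1:]):
        assert end == next_start

    # The shift containing the DST change is one hour shorter.
    lengths = [end - start for start, end in shifts]
    assert lengths == [
        timedelta(hours=24),
        timedelta(hours=23),
        timedelta(hours=24),
        timedelta(hours=24),
    ]


test_shifts_across_spring_dst()
```

Pinning the dates to a known DST transition keeps the test deterministic while still exercising the case most likely to go wrong.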
Wrap up
Exploring the complexities of time management in a global on-call system has been enlightening. Building and testing our scheduler has been a journey filled with challenges and learnings. If you’re navigating similar territory, remember to prioritize precision, flexibility, and reliability in your time-based systems.