Migrating JSONB columns in Go

At incident.io, we are strong advocates for database migrations. Being able to execute them quickly and safely is crucial to keeping up with the growth of our product.

While we have previously shared our guidelines for handling migrations safely, today we are delving into a specific topic – migrating JSONB columns.
It’s important to note that although we use Postgres, this approach should translate to any SQL database with a JSON column type.

So, what happens when the structure of a JSON-stored type changes?
Following our playbook, the goal is to migrate all existing data into the new format. But how do we manage both formats simultaneously during the transition period?

Let’s consider a recent example: OR Conditions.
Throughout our application, we support the concept of conditions – “If A and B, do X”.
Recently, we introduced support for OR logic. For instance, “If (A and B) or (C and D), do X”.
Internally, we label each set of OR’d conditions as a condition group.




[Figure: OR conditions in the Workflow builder]

In the previous system, we stored our conditions as a JSON array of objects, with each object representing a condition. With the introduction of condition groups, we needed to store an array of group objects, each containing its own array of conditions.

Below is our original domain type representing the schema:
conditions is a JSONB Postgres column, which is unmarshalled into a slice of Go structs.



```go
type EngineConfig struct {
	ID         string     `json:"id" gorm:"type:text;primaryKey"`
	Conditions Conditions `json:"conditions" gorm:"type:jsonb"`
}

type Conditions []Condition

type Condition struct {
	Operation string `json:"operation"`
	// ...
}
```

And here is our new domain type, now storing condition groups:



```go
type EngineConfig struct {
	ID              string          `json:"id" gorm:"type:text;primaryKey"`
	ConditionGroups ConditionGroups `json:"conditions" gorm:"column:conditions;type:jsonb"`
}

type ConditionGroups []ConditionGroup

type ConditionGroup struct {
	Conditions []Condition `json:"conditions"`
}
```

How do we transition data from the old format []Condition to the new []ConditionGroup?
The solution lies in the sql.Scanner and driver.Valuer interfaces.



```go
// sql.Scanner controls how a value is read from the database.
type Scanner interface {
	Scan(src any) error
}

// driver.Valuer controls how a value is written to the database.
type Valuer interface {
	Value() (Value, error)
}
```

By implementing Scan and Value on a column’s type, we can read and write data to and from the database, respectively.

Thus, we can provide custom marshaling logic to manage data in both formats securely.



```go
func (g *ConditionGroups) Scan(val any) error {
	bytes, ok := val.([]byte)
	if !ok {
		return errors.New("could not parse condition groups as []byte")
	}

	// First, try to parse the payload as the new format.
	if err := json.Unmarshal(bytes, g); err == nil {
		// If the first group has conditions set, this is new-format data.
		if len(*g) > 0 && (*g)[0].Conditions != nil {
			return nil
		}
	}

	// Otherwise, fall back to the old format: a flat array of conditions.
	conditions := []Condition{}
	if err := json.Unmarshal(bytes, &conditions); err != nil {
		return errors.New("could not unmarshal bytes into []Condition")
	}

	if len(conditions) == 0 {
		*g = ConditionGroups{}
		return nil
	}

	// Wrap the old-format conditions in a single group.
	*g = ConditionGroups{
		ConditionGroup{
			Conditions: conditions,
		},
	}
	return nil
}

// Value always writes data in the new format.
func (g ConditionGroups) Value() (driver.Value, error) {
	return json.Marshal(g)
}
```

With our custom scanner in place, we can write data in the new format while still being able to read both old and new data.
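To sanity-check the dual-read behaviour in isolation, here is a standalone sketch (its types mirror the snippets above; the payloads are made up) showing that old-format and new-format rows decode to the same shape:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type Condition struct {
	Operation string `json:"operation"`
}

type ConditionGroup struct {
	Conditions []Condition `json:"conditions"`
}

type ConditionGroups []ConditionGroup

// Scan mirrors the custom scanner above: try the new format first,
// then fall back to wrapping old-format data in a single group.
func (g *ConditionGroups) Scan(val any) error {
	bytes, ok := val.([]byte)
	if !ok {
		return errors.New("could not parse condition groups as []byte")
	}
	if err := json.Unmarshal(bytes, g); err == nil {
		if len(*g) > 0 && (*g)[0].Conditions != nil {
			return nil // already new-format data
		}
	}
	conditions := []Condition{}
	if err := json.Unmarshal(bytes, &conditions); err != nil {
		return errors.New("could not unmarshal bytes into []Condition")
	}
	if len(conditions) == 0 {
		*g = ConditionGroups{}
		return nil
	}
	*g = ConditionGroups{{Conditions: conditions}}
	return nil
}

func main() {
	var fromOld, fromNew ConditionGroups

	// An old-format row is wrapped into a single condition group.
	_ = fromOld.Scan([]byte(`[{"operation":"=="}]`))
	// A new-format row decodes directly.
	_ = fromNew.Scan([]byte(`[{"conditions":[{"operation":"=="}]}]`))

	fmt.Println(len(fromOld) == len(fromNew)) // true
}
```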

This allows us to continue work on the rest of the codebase, enabling the change to progress through the backend and frontend simultaneously with the remaining migration tasks.

The subsequent step in the runbook involves backfilling all old data in an asynchronous process to unify the data state.

Once again, the scanner proves invaluable – we can simply read the data and immediately rewrite it.

Performing this operation in a batch with an `ON CONFLICT DO UPDATE` clause significantly improves performance.

Please note that we have also introduced a temporary indexed column `conditions_migrated` to assist with the backfill process.

```go
func backfillConditions(ctx context.Context, db *gorm.DB) error {
	maxExpectedRows := 300000
	batchSize := 1000
	maxIts := maxExpectedRows / batchSize

	for i := 0; i < maxIts; i++ {
		// Load a batch of rows that haven't been migrated yet.
		existing, err := domain.NewQuerier[domain.EngineConfig](db).
			Filter(domain.EngineConfigBuilder(
				domain.EngineConfigBuilder.ConditionsMigrated(false),
			)).
			Limit(batchSize).
			Find(ctx)
		if err != nil {
			return err
		}
		if len(existing) == 0 {
			break
		}

		// Rewriting the conditions as-is is enough: the scanner has
		// already decoded them into the new format.
		payloads := []domain.EngineConfig{}
		for _, e := range existing {
			payloads = append(payloads, domain.EngineConfigBuilder(
				domain.EngineConfigBuilder.ID(e.ID),
				domain.EngineConfigBuilder.OrganisationID(e.OrganisationID),
				domain.EngineConfigBuilder.Conditions(e.Conditions),
				domain.EngineConfigBuilder.ConditionsMigrated(true),
			).Build())
		}

		// Upsert the whole batch in a single statement.
		res := db.
			WithContext(ctx).
			Clauses(clause.OnConflict{
				Columns: []clause.Column{{Name: "id"}},
				DoUpdates: clause.AssignmentColumns([]string{
					"conditions",
					"conditions_migrated",
				}),
			}).
			Create(&payloads)
		if res.Error != nil {
			return res.Error
		}
	}

	fmt.Println("Finished!")
	return nil
}
```

Once the backfill operation is complete, the data will be in a consistent state. You can then remove the custom scanner and carry on as normal.

