File format, mirroring and federation

Bonjour,

Over the coming months, as part of the forgefriends project, I’ll work on the Gitea migration file format and on mirroring projects in Gitea (not just git, also issues, etc.). My main motivation is that I think this provides an essential building block for federation.

Here is a high-level view of my roadmap, in the hope that it will help the people funded to further federation in Gitea figure out where I’m headed and why. I very much look forward to reading their roadmap so that I can adjust mine accordingly. Ideally it will all fit nicely together 🙂

Cheers


Maintaining and documenting a migration format

Gitea migration data structures and format as of January 2022

The Gitea migration data structures are used as a pivot when importing software projects from GitLab, GitHub, and more. They are internal, undocumented and subject to breaking changes whenever a new Gitea version is released.

The dump-repo command dumps these data structures as YAML files to be read by the restore-repo command. It is not designed for archival because there is no guarantee that future Gitea versions will be able to read the files back. It is, however, useful for temporarily storing the data structures on disk when the import of a large project takes a long time, or when creating a new software project from these structures uncovers bugs and requires multiple attempts to get it right.

The migration data structures are different from the database schema or the data structure used by the Gitea API.

Requirements for a durable Gitea migration format

In order for a software project to be dumped and successfully restored by future versions of Gitea, the migration file format must be:

  • Validated
  • Documented
  • Versioned
  • Backward compatible

File format validation

For each file format, a corresponding JSON schema is created to list the required fields, their data types, etc. See for instance the schema describing the file format of an issue.
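The idea of checking required fields and their types can be sketched as follows. This is only an illustration: it uses a hand-written table as a simplified stand-in for a real JSON schema and validator, it reads JSON rather than the YAML files written by dump-repo, and the field names (number, title, state) are assumptions, not the actual Gitea ones.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// schema is a hypothetical, heavily simplified stand-in for a JSON schema:
// it maps each required field to the JSON type it must have.
type schema map[string]string

// jsonType returns the JSON type name of a value decoded by encoding/json.
func jsonType(v interface{}) string {
	switch v.(type) {
	case string:
		return "string"
	case float64:
		return "number"
	case bool:
		return "boolean"
	case nil:
		return "null"
	case []interface{}:
		return "array"
	case map[string]interface{}:
		return "object"
	}
	return "unknown"
}

// validate checks that every required field is present with the expected type.
func validate(s schema, raw []byte) error {
	var doc map[string]interface{}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return fmt.Errorf("not a JSON object: %w", err)
	}
	for field, want := range s {
		value, ok := doc[field]
		if !ok {
			return fmt.Errorf("missing required field %q", field)
		}
		if got := jsonType(value); got != want {
			return fmt.Errorf("field %q: expected %s, got %s", field, want, got)
		}
	}
	return nil
}

// issueSchema uses illustrative field names, not the real Gitea ones.
var issueSchema = schema{"number": "number", "title": "string", "state": "string"}

func main() {
	good := []byte(`{"number": 1, "title": "Crash on start", "state": "open"}`)
	bad := []byte(`{"number": "1", "title": "Crash on start"}`)
	fmt.Println(validate(issueSchema, good)) // no error for the valid issue
	fmt.Println(validate(issueSchema, bad))  // reports one of the two problems
}
```

A real implementation would load the published JSON schema and rely on an existing validator instead of this table.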

Documented

The JSON schema includes reference documentation of the semantics of each field. It is exhaustive and unambiguous.

Versioned

A version number X.Y is included in each file. When validating the file, the JSON schema matching the version is used. Y is incremented every time the JSON schema changes. X is incremented when the JSON schema changes in a way that is not backward compatible.
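Selecting the schema that matches the version of a file could look like the sketch below. The schema file names and the set of published versions are hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Version is the X.Y number stored in each file.
type Version struct {
	Major, Minor int
}

// ParseVersion parses a string such as "1.3" into a Version.
func ParseVersion(s string) (Version, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 2 {
		return Version{}, fmt.Errorf("version %q is not of the form X.Y", s)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil || major < 0 {
		return Version{}, fmt.Errorf("invalid major number in %q", s)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil || minor < 0 {
		return Version{}, fmt.Errorf("invalid minor number in %q", s)
	}
	return Version{Major: major, Minor: minor}, nil
}

// schemas maps each published version to its schema file (hypothetical names).
var schemas = map[Version]string{
	{1, 0}: "issue-1.0.json",
	{1, 1}: "issue-1.1.json",
	{2, 0}: "issue-2.0.json",
}

// SchemaFor returns the schema that matches the version found in a file.
func SchemaFor(version string) (string, error) {
	v, err := ParseVersion(version)
	if err != nil {
		return "", err
	}
	name, ok := schemas[v]
	if !ok {
		return "", fmt.Errorf("no schema published for version %d.%d", v.Major, v.Minor)
	}
	return name, nil
}

func main() {
	fmt.Println(SchemaFor("1.1")) // a published version
	fmt.Println(SchemaFor("3.0")) // an unknown version yields an error
}
```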

Backward compatibility

Software reading a file with version X.Y is expected to also work when reading files with version X.Y+N. For instance, older Gitea versions will be able to import a file from a newer Gitea version as long as they are both compatible with version X of the format. However, if an older version of Gitea only supports X-1, it will not be able to read the files.
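The rule above boils down to comparing only X. A minimal sketch, assuming a reader declares the list of major versions it supports (the function and type names are made up for the example):

```go
package main

import "fmt"

// Version is the X.Y number stored in each file.
type Version struct {
	Major, Minor int
}

// CanRead reports whether a reader supporting the given major versions can
// read a file: only X matters, Y is deliberately ignored.
func CanRead(supportedMajors []int, file Version) bool {
	for _, major := range supportedMajors {
		if major == file.Major {
			return true
		}
	}
	return false
}

func main() {
	older := []int{1} // an older Gitea that only knows version 1 of the format

	fmt.Println(CanRead(older, Version{1, 7})) // true: same X, newer Y
	fmt.Println(CanRead(older, Version{2, 0})) // false: X is not supported
}
```

For this to hold in practice, changes that only increment Y presumably need to be additive, so that a reader can ignore the fields it does not know about.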

Mirroring

Gitea mirroring as of January 2022

Mirroring is implemented in Gitea to push or pull git repositories. Other project information can be migrated but not mirrored.

The code used to migrate a project from one forge to another is neither idempotent nor incremental. If interrupted for any reason, it has to start over from scratch.

Mirroring a project as a whole

The codepath used to migrate a project is modified to be idempotent. It can resume when interrupted. It can also be run on a regular basis to mirror a project instead of migrating it.
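What idempotent and resumable means here can be shown with a toy model. This is not the Gitea code path: the Issue type, the in-memory Destination and the simulated interruption are all assumptions made for the example. The key point is that items are keyed by their identifier in the source forge, so running the same operation again only writes what is missing or changed.

```go
package main

import (
	"errors"
	"fmt"
)

// Issue is a hypothetical, minimal project item.
type Issue struct {
	Number int
	Title  string
	State  string
}

// Destination stores issues keyed by their number in the source forge.
type Destination struct {
	issues map[int]Issue
	writes int // number of actual modifications, to observe idempotence
}

func NewDestination() *Destination {
	return &Destination{issues: map[int]Issue{}}
}

// Upsert creates or updates an issue and does nothing if it is up to date.
func (d *Destination) Upsert(i Issue) {
	if current, ok := d.issues[i.Number]; ok && current == i {
		return
	}
	d.issues[i.Number] = i
	d.writes++
}

var errInterrupted = errors.New("interrupted")

// Mirror copies source into d. A failAfter >= 0 simulates an interruption
// just before the item at that index is copied.
func Mirror(d *Destination, source []Issue, failAfter int) error {
	for n, issue := range source {
		if failAfter >= 0 && n == failAfter {
			return errInterrupted
		}
		d.Upsert(issue)
	}
	return nil
}

func main() {
	source := []Issue{
		{1, "Crash on start", "closed"},
		{2, "Add dark mode", "open"},
		{3, "Typo in README", "open"},
	}
	d := NewDestination()

	err := Mirror(d, source, 2)
	fmt.Println(err, d.writes) // interrupted 2

	err = Mirror(d, source, -1)
	fmt.Println(err, d.writes) // <nil> 3: only the missing issue is written

	err = Mirror(d, source, -1)
	fmt.Println(err, d.writes) // <nil> 3: running again changes nothing
}
```

Running Mirror on a regular basis then amounts to mirroring: when an issue changes in the source, only that issue is written again.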

Using the Gitea migration format for federation

Federating forges requires two kinds of communication:

  • Notification (e.g., a pull request was merged)
  • Project state synchronization (e.g., the pull request is now closed, the modified state of the associated issues, the effect on milestone completion, etc.)

While notification is in the scope of ActivityPub, project state synchronization is not. ActivityPub does not provide any kind of guarantee to ensure the consistency of a data set. The project state is best shared between federated Gitea forges using git.


Great to see this coming together so quickly. Looking through it, I thought it might be worth mentioning that, since the backend of Gitea is written in Golang and there’s going to be lots of chatter between the forges, it might be worth looking at protobuf (google.golang.org/protobuf) rather than JSON. Not just for speed but also for backward compatibility.


I think you can also take a look at GitHub’s migration archive format. I am currently working on “Add support to import repository data from an exported data of Github” (Pull Request #18165, go-gitea/gitea) and found some things we could learn from it.


A short (7 minutes) video was published to explain why a file format would be generally useful for mirroring and federation.
