Script to download all repos for one user

How would I go about this?

I can query all repos from one user with:
curl -X 'GET' 'https://xxx.xxx.xxx/api/v1/repos/search?access_token=111111111111111111'

I get a long JSON-formatted string with all the info, but I’m lost as to what to do next.
Any hints appreciated.

You could use a scripting language like Python to accomplish this:

import requests  # must install this module with pip or package manager
from git import Repo  # must install gitpython module

def main():
    host = "https://try.gitea.io"
    token = "your_token_here"

    # Page through the repository search endpoint until it stops returning data.
    # Gitea documents "page" as 1-based, so start at 1.
    page = 1
    repositories = []
    while True:
        r = requests.get("{}/api/v1/repos/search?limit=50&page={}&token={}".format(host, page, token))
        r.raise_for_status()  # fail early on an HTTP error status such as 404
        data = r.json()["data"]
        if not data:
            break
        repositories.extend(data)
        page = page + 1

    # Loop through each repository returned, cloning it over SSH
    for repository in repositories:
        Repo.clone_from(repository["ssh_url"], "repos/" + repository["full_name"])

if __name__ == "__main__":
    main()

I tested this time around and it runs fine for me against try.gitea.io.

Hi Jake,
Thanks for the idea. I have to admit I know nothing about Python, but I tried: I installed both modules with pip, changed the values that needed to be changed (token, URL), and executed it with
python backup_gitea.py and this is the result:

Traceback (most recent call last):
  File "my_path/backup_gitea.py", line 14, in <module>
    main()
  File "my_path/backup_gitea.py", line 10, in main
    for repository in r.json()["data"]:
  File "/usr/lib/python3/dist-packages/requests/models.py", line 900, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

As I said, I’m not familiar with Python at all and have no idea what’s wrong, but it’s obviously not working for me.

Thanks again

I noticed that your script is supposed to download from repository.clone_url.
I checked the JSON returned by the URL from my initial question, and this value is nowhere to be found in it.

No worries. I think the URL might have been set up wrong in your Python file, so instead of JSON data it got HTML from a 404 page (hence the JSONDecodeError, which just means the response wasn’t valid JSON). You could debug that further by printing out what is actually being returned, e.g. print(r.text), if you want to see what went wrong.
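To narrow down a JSONDecodeError like the one above, it can help to look at the status code and content type before parsing the body. A minimal sketch, assuming the search endpoint returns a JSON object with a "data" list; the helper name describe_response is hypothetical and not part of the script:

```python
import json

def describe_response(status_code, content_type, text):
    """Summarise an HTTP response well enough to explain a JSONDecodeError."""
    if status_code != 200:
        return "HTTP {}: check the host and API path".format(status_code)
    if "application/json" not in content_type:
        # Probably an HTML page; show the first characters of the body.
        return "unexpected content type {!r}: {}".format(content_type, text[:80])
    try:
        payload = json.loads(text)
    except json.JSONDecodeError as err:
        return "body is not valid JSON: {}".format(err)
    # Assumption: a successful search response is an object with a "data" list.
    return "ok: {} repositories on this page".format(len(payload.get("data", [])))
```

With requests, it could be called as print(describe_response(r.status_code, r.headers.get("Content-Type", ""), r.text)) right after the requests.get line.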

I’ve edited my original post/script now that I’ve had time to play around with it, and now it actually works :smile:. You’ll just have to replace try.gitea.io with your Gitea server’s hostname. You may also want to change where it outputs repositories, but that should be pretty simple (right now "repos/" + repository["full_name"] just evaluates to repos/username/repository/).


Thank you so much, it now works for me as well with the caveat that I have to enter my userid and pass for every repository.

Is there a way I could add those as environment variables for the script? And if so, where should they go? I mean something like
user = "this_is_me"
pass = "pass"

Repo.clone_from(repository["user+pass+clone_url"], "repos/" + repository["full_name"])

Would that work?
Ok I now know it doesn’t work.
I’ll keep looking for the right way to do it.

But again much appreciated.
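
For private repositories cloned over HTTPS, one possible approach is to read the credentials from environment variables and insert them into the clone URL. This is only a sketch: the variable names GITEA_USER and GITEA_PASS and the helper with_credentials are hypothetical, and it assumes each repository entry in the API response carries an HTTPS clone_url field. Note that Git saves the remote URL, credentials included, in each clone's .git/config.

```python
import os
from urllib.parse import quote, urlsplit, urlunsplit

def with_credentials(clone_url, user, password):
    """Return clone_url with user:password inserted before the host."""
    parts = urlsplit(clone_url)
    # Percent-encode so characters like "@" or ":" cannot break the URL.
    userinfo = "{}:{}".format(quote(user, safe=""), quote(password, safe=""))
    return urlunsplit(parts._replace(netloc="{}@{}".format(userinfo, parts.netloc)))

# Hypothetical variable names; export them in the shell before running the script.
user = os.environ.get("GITEA_USER", "")
password = os.environ.get("GITEA_PASS", "")
```

The clone line would then become something like Repo.clone_from(with_credentials(repository["clone_url"], user, password), "repos/" + repository["full_name"]).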

In the meantime I’ve made all my repositories public, which is one way of solving this issue.
It now works like a charm.
Once again, thanks a lot. Really useful for me.
May I suggest you post that script on GitHub? If it’s useful for me, it probably will be for others.

CORRECTION:
I have 41 repos and it downloaded 30, with no error messages at all. I double-checked: they’re all public.
I don’t have time now, but I will try later to debug as you suggested to see what happens. Will keep you posted.

I checked the JSON returned by your first request and it does contain exactly 30 “data” items. Not sure why…
Anyway, the problem’s with Gitea. I even tried the request without a token (now that the repos are public) and it still returns the first 30 in alphabetical order.
So your script is fine; the problem is either with Gitea or with the request (maybe something to do with pagination?).

OK, problem solved.
After looking at the API docs for Gitea, I realized there’s a search parameter called limit which tells it how many items to return per page (so I was right, it had to do with pagination!).
I set the limit at 80 and it returned all 41 repos.
So your request line should be something like:

r = requests.get("https://try.gitea.io/api/v1/repos/search?limit={}".format(limit))

No need for the token since the repos are public (if they’re not, it asks for credentials before cloning each and every repo). There should be a variable called limit, set high enough to guarantee that all repos are returned.

Sorry if I’m so talkative but this was really bothering me. I mean why 30?
Anyway that’s it, as said now I know.

I edited the post again, this time I made it clone over SSH since using an SSH key is a lot simpler than adding logic to modify the HTTP URL to add username and password. Let me know if you can only do HTTP cloning and I can add that.

I also fixed the issue where it would only get 30 repositories. It looks like that’s the default number returned by Gitea, so I bumped it to the max (limit=50) and also added logic to request successive pages until Gitea stops returning data, which should future-proof the script if you go over 50.
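
The paging logic described here can be sketched separately from the cloning step. This is an illustration under stated assumptions rather than the script itself: it uses only the standard library instead of requests, the names fetch_all and gitea_search_page are hypothetical, it assumes every repository entry has an "id" field, and it starts at page 1 because Gitea's API documentation describes page as 1-based.

```python
import json
import urllib.request

def fetch_all(fetch_page, limit=50):
    """Collect items from fetch_page(page, limit) until nothing new comes back."""
    seen, items, page = set(), [], 1
    while True:
        batch = fetch_page(page, limit)
        # Keep only entries not seen before, in case a page is served twice.
        new = [repo for repo in batch if repo["id"] not in seen]
        if not new:
            return items
        seen.update(repo["id"] for repo in new)
        items.extend(new)
        page += 1

def gitea_search_page(host, token):
    """Build a fetch_page callable for the repos/search endpoint."""
    def fetch_page(page, limit):
        url = "{}/api/v1/repos/search?limit={}&page={}".format(host, limit, page)
        # Send the token in a header instead of the query string.
        request = urllib.request.Request(url, headers={"Authorization": "token " + token})
        with urllib.request.urlopen(request) as response:
            return json.load(response)["data"]
    return fetch_page
```

A call such as fetch_all(gitea_search_page("https://try.gitea.io", "your_token_here")) would return the combined list. Keeping the fetcher as a separate callable means the loop can be exercised with fake pages, without a server.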

No problem! Feel free to share it if you run into anyone it might help.

Didn’t work for me cause ssh is not on port 22.
Maybe variable for port #?

Are your SSH clone URLs also wrong in the web interface? You should be able to fix it in Gitea’s config SSH_PORT (see Config Cheat Sheet - Docs).

If you don’t have access to configure Gitea, you could do a workaround in ~/.ssh/config like:

Host example.com
     Port 2222

Hi Jake,
I set up my Gitea instance a long time ago and forgot that I never opened its SSH port to the Internet.
My server is on the public net and I’m rather cautious with security.
Until this mass-download need came along I had no reason to, and to tell you the truth, your script without SSH answers my needs quite nicely. As an ex-programmer, the inelegance of having to plug in an arbitrary value for the limit does irk me, but I can live with that.

Also, for obvious security reasons, and because it just clogs up the logs, I NEVER have SSH running on port 22 except on LANs.

That being said, I could open Gitea’s SSH port, but not on 22; it would have to be another one, which would still mean that your script would have to accept a value other than 22 for the port.
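
One way a script could accept a non-default port is to rewrite the scp-style ssh_url into the ssh:// form, which can carry a port number. A minimal sketch, assuming the API returns URLs shaped like git@host:owner/repo.git; the function name with_ssh_port and the port 2222 are illustrative only.

```python
def with_ssh_port(ssh_url, port):
    """Turn git@host:owner/repo.git into ssh://git@host:PORT/owner/repo.git."""
    if ssh_url.startswith("ssh://"):
        return ssh_url  # already in URL form; keep whatever port it specifies
    # Split "git@host" from "owner/repo.git" at the first colon.
    user_host, _, path = ssh_url.partition(":")
    return "ssh://{}:{}/{}".format(user_host, int(port), path.lstrip("/"))
```

For example, with_ssh_port(repository["ssh_url"], 2222) could be passed to Repo.clone_from in place of repository["ssh_url"].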

For your amusement: not knowing much about Python, I asked ChatGPT, first submitting your script and asking if “he” had something to suggest to correct the problem. He came up with one script and, when that didn’t work, another. Neither works (I suspect the latter failed because my Gitea SSH port was not open), and I’ll quote them below for your reading pleasure.

So the first:
import requests  # must install this module with pip or package manager
from git import Repo  # must install gitpython module

def main():
    host = "https://try.gitea.io"
    token = "your_token_here"
    ssh_port = "1011"  # the custom SSH port number you want to use

    # Page through repository search endpoint until we stop getting data
    page = 0
    repositories = []
    r = requests.get("{}/api/v1/repos/search?limit=50&page={}&token={}".format(host, page, token))
    while len(r.json()["data"]):
        repositories.extend(r.json()["data"])
        page = page + 1
        r = requests.get("{}/api/v1/repos/search?limit=50&page={}&token={}".format(host, page, token))

    # Loop through each repository returned, cloning it over SSH
    for repository in repositories:
        ssh_url = repository["ssh_url"]
        custom_ssh_command = "ssh -p {} -o StrictHostKeyChecking=no".format(ssh_port)
        Repo.clone_from(ssh_url, "repos/" + repository["full_name"], ssh_command=custom_ssh_command)

if __name__ == "__main__":
    main()

And the second:

import requests  # must install this module with pip or package manager
from git import Repo  # must install gitpython module

def main():
    host = "https://try.gitea.io"
    token = "your_token_here"
    ssh_port = "1011"  # the custom SSH port number you want to use

    # Page through repository search endpoint until we stop getting data
    page = 0
    repositories = []
    r = requests.get("{}/api/v1/repos/search?limit=50&page={}&token={}".format(host, page, token))
    while len(r.json()["data"]):
        repositories.extend(r.json()["data"])
        page = page + 1
        r = requests.get("{}/api/v1/repos/search?limit=50&page={}&token={}".format(host, page, token))

    # Loop through each repository returned, cloning it over SSH
    for repository in repositories:
        ssh_url = repository["ssh_url"].replace("https://", "")  # Remove "https://" from the SSH URL
        custom_ssh_command = "ssh -p {} -o StrictHostKeyChecking=no".format(ssh_port)
        Repo.clone_from("git@{}:{}.git".format(host, ssh_url), "repos/" + repository["full_name"], ssh_command=custom_ssh_command)

if __name__ == "__main__":
    main()
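
A possible reason both attempts fail, independent of the firewall, is that extra keyword arguments to Repo.clone_from appear to be forwarded to git clone as command-line options, and git clone has no ssh-command option. Git does read the GIT_SSH_COMMAND environment variable, so one alternative is to set that instead. The sketch below calls git directly through the standard library's subprocess module; the function names ssh_env and clone_over_ssh are illustrative, and port 1011 is just the value used in the scripts above.

```python
import os
import subprocess

def ssh_env(port, base_env=None):
    """Copy the environment and point Git at ssh on a custom port."""
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_SSH_COMMAND"] = "ssh -p {}".format(int(port))
    return env

def clone_over_ssh(ssh_url, destination, port):
    """Run `git clone`, letting GIT_SSH_COMMAND select the port."""
    subprocess.run(["git", "clone", ssh_url, destination], env=ssh_env(port), check=True)
```

GitPython's Repo.clone_from also accepts an env argument, so passing env=ssh_env(1011) there should have the same effect while keeping the rest of the script unchanged.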

Cheers!