Script to download all repos for one user

airdogvan · February 17, 2023, 6:30pm

How would I go about this?

I can query all repos from 1 user with:
curl -X 'GET' 'https://xxx.xxx.xxx/api/v1/repos/search?access_token=111111111111111111'

I get a long JSON-formatted string with all the info, but I’m lost as to what to do next.
Any hints appreciated.

jake · February 17, 2023, 7:34pm

You could use a scripting language like Python to accomplish this:

import requests  # must install this module with pip or package manager
from git import Repo  # must install gitpython module

def main():
    host = "https://try.gitea.io"
    token = "your_token_here"

    # Page through repository search endpoint until we stop getting data
    page = 0
    repositories = []
    r = requests.get("{}/api/v1/repos/search?limit=50&page={}&token={}".format(host, page, token))
    while len(r.json()["data"]):
        repositories.extend(r.json()["data"])
        page = page + 1
        r = requests.get("{}/api/v1/repos/search?limit=50&page={}&token={}".format(host, page, token))

    # Loop through each repository returned, cloning it over SSH
    for repository in repositories:
        Repo.clone_from(repository["ssh_url"], "repos/" + repository["full_name"])

if __name__ == "__main__":
    main()

I tested this time around and it runs fine for me against try.gitea.io.

airdogvan · February 17, 2023, 8:11pm

Hi Jake,
Thanks for the idea. I have to admit I know nothing about Python, but I tried: I installed both modules with pip, changed the values (token, URL) that needed to be changed, and ran it with
python backup_gitea.py. This is the result:

Traceback (most recent call last):
  File "my_path/backup_gitea.py", line 14, in <module>
    main()
  File "my_path/backup_gitea.py", line 10, in main
    for repository in r.json()["data"]:
  File "/usr/lib/python3/dist-packages/requests/models.py", line 900, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

As I said, I’m not familiar with Python at all and have no idea what’s wrong, but it’s obviously not working for me.

Thanks again

I noticed that your script is supposed to download from repository.clone_url.
I checked the JSON returned by the URL I used in my initial question, and that value is nowhere to be found in it.

jake · February 18, 2023, 2:28am

No worries. I think the URL might have been set up wrong in your Python file, so instead of getting JSON data it got HTML from a 404 page (hence the JSONDecodeError, which just means it didn’t get proper JSON). You could debug that further by printing the data that is actually returned, e.g. print(r.text), to see what went wrong.
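
A minimal sketch of that debugging step, assuming you want the script to say what came back instead of failing inside r.json(). The helper name parse_json_or_explain is made up for illustration; it only uses the standard library and takes the pieces of the response as plain arguments.

```python
import json


def parse_json_or_explain(status_code, content_type, body):
    """Return the parsed JSON, or raise ValueError describing what came back."""
    if status_code != 200:
        # A 404 page or an auth error ends up here, with the start of its body.
        raise ValueError("HTTP {}: {}".format(status_code, body[:200]))
    if "json" not in content_type.lower():
        # The server answered 200 but with HTML or plain text.
        raise ValueError("expected JSON, got {!r}: {}".format(content_type, body[:200]))
    return json.loads(body)
```

With requests, it could be called as parse_json_or_explain(r.status_code, r.headers.get("Content-Type", ""), r.text), so a wrong URL shows up as "HTTP 404: ..." rather than a JSONDecodeError.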

I’ve edited my original post/script now that I’ve had time to play around with it, and now it actually works. You’ll just have to replace try.gitea.io with your Gitea server’s hostname. You may also want to change where it outputs repositories, but that should be pretty simple (right now "repos/" + repository["full_name"] just evaluates to repos/username/repository/).

airdogvan · February 18, 2023, 6:53pm

Thank you so much, it now works for me as well, with the caveat that I have to enter my user ID and password for every repository.

Is there a way I could pass those to the script as environment variables? And if so, where would they go? I mean something like
user = "this_is_me"
pass = "pass"

Repo.clone_from(repository["user+pass+clone_url"], "repos/" + repository["full_name"])

Would that work?
Ok I now know it doesn’t work.
I’ll keep looking for the right way to do it.
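
One possible way to do this over HTTPS, sketched under assumptions: the credentials are read from two environment variables (GITEA_USER and GITEA_PASS are hypothetical names) and embedded in the clone URL before it is handed to Repo.clone_from. The helper names are invented for this example.

```python
import os
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(clone_url, user, password):
    """Return clone_url with user:password embedded, percent-encoding both."""
    parts = urlsplit(clone_url)
    # Drop any user info already in the URL, keeping host[:port].
    host = parts.netloc.rpartition("@")[2]
    netloc = "{}:{}@{}".format(quote(user, safe=""), quote(password, safe=""), host)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def clone_url_from_env(clone_url):
    """Read GITEA_USER / GITEA_PASS (hypothetical names) and embed them."""
    return with_credentials(clone_url, os.environ["GITEA_USER"], os.environ["GITEA_PASS"])
```

In the loop this would look like Repo.clone_from(clone_url_from_env(repository["clone_url"]), "repos/" + repository["full_name"]). Note that git stores the remote URL, credentials included, in each clone’s .git/config, which is one reason an SSH key can be the simpler choice.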

But again much appreciated.

In the meantime I’ve made all my repositories public, which is one way of solving this issue.
It now works like a charm.
Once again, thanks a lot. Really useful for me.
May I suggest you post that script on GitHub? If it’s useful for me, it probably will be for others.

CORRECTION:
I have 41 repos and it downloaded 30, with no error messages, nothing. I double-checked: they’re all public.
Don’t have time now but will try later to debug as you suggested to see what happens. Will keep you posted.

Checked the JSON returned by your first request and it does contain exactly 30 “data” items. Not sure why…
Anyway, the problem’s with Gitea. I even tried the request without a token (now that the repos are public) and it still returns the first 30 in alphabetical order.
So your script is fine; the problem is either with Gitea or with the request (maybe something to do with pagination???).

airdogvan · February 18, 2023, 10:17pm

OK, problem solved.
After looking at the API docs for Gitea, I realized there’s a search parameter called limit which sets how many items are returned per page (so I was right, it had to do with pagination!!!).
I set the limit at 80 and it returned all 41 repos.
So your request line should be something like:

r = requests.get("https://try.gitea.io/api/v1/repos/search?limit={}".format(limit))

No need for the token since the repos are public (if they’re not, it asks for credentials before cloning each and every repo), and there should be a variable called “limit” set high enough to guarantee that all repos are returned.

Sorry if I’m so talkative but this was really bothering me. I mean why 30?
Anyway that’s it, as said now I know.

jake · February 18, 2023, 10:49pm

I edited the post again. This time I made it clone over SSH, since using an SSH key is a lot simpler than adding logic to modify the HTTP URL to add a username and password. Let me know if you can only do HTTP cloning and I can add that.

I also fixed the issue where it would only get 30 repositories. It looks like that’s the default number returned by Gitea, so I bumped it to the max (limit=50) and also added logic to request every page (page=0, 1, 2, etc.) until Gitea stops returning data, so that should future-proof the script if you go over 50.
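
The page-until-empty idea can be sketched independently of the HTTP call. This is an illustration, not the script from the post: fetch_all and its parameters are made-up names, fetch_page is any function that returns the list of repository dicts for a page number, and the de-duplication by full_name is a precaution in case a server answers two page numbers (say 0 and 1) with the same page, which is an assumption to guard against rather than something confirmed here.

```python
def fetch_all(fetch_page, first_page=0, max_pages=1000):
    """Collect repositories from consecutive pages until an empty page comes back."""
    seen = set()
    repositories = []
    # max_pages is a safety cap so a misbehaving server cannot loop forever.
    for page in range(first_page, first_page + max_pages):
        items = fetch_page(page)
        if not items:
            break
        for item in items:
            # Keep each repository once, even if two pages overlap.
            if item["full_name"] not in seen:
                seen.add(item["full_name"])
                repositories.append(item)
    return repositories
```

With requests, fetch_page could be something like lambda page: requests.get(host + "/api/v1/repos/search", params={"limit": 50, "page": page, "token": token}).json()["data"].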

No problem! Feel free to share it if you run into anyone it might help.

airdogvan · February 19, 2023, 1:23am

Didn’t work for me because SSH is not on port 22.
Maybe a variable for the port number?

jake · February 21, 2023, 4:54pm

Are your SSH clone URLs also wrong in the web interface? You should be able to fix that with the SSH_PORT setting in Gitea’s config (see Config Cheat Sheet - Docs).

If you don’t have access to configure Gitea, you could do a workaround in ~/.ssh/config like:

Host example.com
     Port 2222
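
If neither option is available, another workaround is to rewrite the URL inside the script. A sketch, assuming the API returns the short scp-like form (git@host:owner/repo.git) and that the server accepts the equivalent ssh:// URL with an explicit port; ssh_url_with_port is a hypothetical helper name.

```python
import re

# Matches the scp-like form user@host:path, but not ssh://... URLs.
_SCP_LIKE = re.compile(r"^(?P<user>[^@/]+)@(?P<host>[^:/]+):(?P<path>.+)$")


def ssh_url_with_port(ssh_url, port):
    """Turn git@host:owner/repo.git into ssh://git@host:PORT/owner/repo.git."""
    match = _SCP_LIKE.match(ssh_url)
    if match is None:
        # Already an ssh:// URL or some other form; leave it unchanged.
        return ssh_url
    return "ssh://{user}@{host}:{port}/{path}".format(port=port, **match.groupdict())
```

In the clone loop this would replace repository["ssh_url"] with ssh_url_with_port(repository["ssh_url"], 2222).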

airdogvan · February 21, 2023, 6:05pm

Hi Jake,
I set up my Gitea instance a long time ago and forgot that I hadn’t opened the port to the Internet.
My server is on the public net and I’m rather cautious with security.
Until this mass-download need came along I had no reason to, and to tell you the truth, your script without SSH meets my needs quite nicely. As an ex-programmer, the inelegance of having to plug in an arbitrary value for the limit does irk me, but I can live with that.

Also, for obvious security reasons and because it just clogs up the logs, I NEVER have SSH running on 22 except on LANs.

That said, I could open Gitea’s SSH, but not on 22; it would have to be another port, which would still mean that your script would have to accept a value other than 22 for the port.

For your pleasure: not knowing much about Python, I asked ChatGPT, first submitting your script and then asking if “he” had something to suggest to correct the problem. He came up with one script and, when that didn’t work, another. Neither works (I suspect the latter because my Gitea SSH port was not open), and I’ll quote them below for your reading pleasure.

So the first:
import requests  # must install this module with pip or package manager
from git import Repo  # must install gitpython module

def main():
    host = "https://try.gitea.io"
    token = "your_token_here"
    ssh_port = "1011"  # the custom SSH port number you want to use

    # Page through repository search endpoint until we stop getting data
    page = 0
    repositories = []
    r = requests.get("{}/api/v1/repos/search?limit=50&page={}&token={}".format(host, page, token))
    while len(r.json()["data"]):
        repositories.extend(r.json()["data"])
        page = page + 1
        r = requests.get("{}/api/v1/repos/search?limit=50&page={}&token={}".format(host, page, token))

    # Loop through each repository returned, cloning it over SSH
    for repository in repositories:
        ssh_url = repository["ssh_url"]
        custom_ssh_command = "ssh -p {} -o StrictHostKeyChecking=no".format(ssh_port)
        Repo.clone_from(ssh_url, "repos/" + repository["full_name"], ssh_command=custom_ssh_command)

if __name__ == "__main__":
    main()

And the second:

import requests  # must install this module with pip or package manager
from git import Repo  # must install gitpython module

def main():
    host = "https://try.gitea.io"
    token = "your_token_here"
    ssh_port = "1011"  # the custom SSH port number you want to use

    # Page through repository search endpoint until we stop getting data
    page = 0
    repositories = []
    r = requests.get("{}/api/v1/repos/search?limit=50&page={}&token={}".format(host, page, token))
    while len(r.json()["data"]):
        repositories.extend(r.json()["data"])
        page = page + 1
        r = requests.get("{}/api/v1/repos/search?limit=50&page={}&token={}".format(host, page, token))

    # Loop through each repository returned, cloning it over SSH
    for repository in repositories:
        ssh_url = repository["ssh_url"].replace("https://", "")  # Remove "https://" from the SSH URL
        custom_ssh_command = "ssh -p {} -o StrictHostKeyChecking=no".format(ssh_port)
        Repo.clone_from("git@{}:{}.git".format(host, ssh_url), "repos/" + repository["full_name"], ssh_command=custom_ssh_command)

if __name__ == "__main__":
    main()
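
For comparison, here is a sketch of how a custom SSH port is more commonly passed to git from GitPython: through the GIT_SSH_COMMAND environment variable, using the env argument of Repo.clone_from, rather than an ssh_command keyword. It is untested against this particular server; ssh_env and clone_over_ssh are hypothetical names, and the port value is only an example.

```python
def ssh_env(port):
    """Environment for git that makes ssh connect on a non-default port."""
    return {"GIT_SSH_COMMAND": "ssh -p {}".format(int(port))}


def clone_over_ssh(ssh_url, target, port):
    """Clone ssh_url into target, with ssh using the given port."""
    # Imported here so ssh_env() stays usable without GitPython installed.
    from git import Repo

    return Repo.clone_from(ssh_url, target, env=ssh_env(port))
```

In the loop this would be clone_over_ssh(repository["ssh_url"], "repos/" + repository["full_name"], 1011). The sketch deliberately leaves out StrictHostKeyChecking=no, so the host key is still checked on first connection.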

Cheers!
