Project CHARSET (codepage)

I am a newbie, so it’s possible this already exists and I just need to find it. I tried anyway, with no luck. My project is not written in UTF-8. Where can I change the codepage (charset) just for the code of my project? My non-Latin characters look completely unreadable, for example in the comments inside the code. I can’t convert the project to UTF-8 or any other codepage, because that is how the IDE works. If it’s not possible to change this, could you add this functionality in a future release?

I’m not aware of any settings in Gitea to help you with this, but I’d suggest using the .gitattributes file to specify your charset and have the repository in utf-8 (that Gitea can understand) but your checkout in your correct charset.

You can try this in a new repository and see if it works for you. The file should be added at the top folder of your project and contain entries like this:

*.js text working-tree-encoding=LATIN-GREEK
*.sql text working-tree-encoding=LATIN-GREEK

Please read the pitfalls in the linked git documentation, as this change is not without them. Also, you need a recent git on both your server and your clients in order to support this (the attribute was added in the first half of 2018 and, as far as I can tell, first shipped in Git 2.18, so it should be >= 2.18).

Currently all your text content is converted to UTF-8 when rendering, and the charset is detected automatically, so a non-UTF-8 project should also display well.

@lunny, many charsets look similar and can’t possibly be told apart (e.g. the different variations of ISO-8859-x or the old DOS code pages for different European countries). Gitea does make its best effort to translate, but your results may vary.

I haven’t tried this yet:

*.js text working-tree-encoding=LATIN-GREEK
*.sql text working-tree-encoding=LATIN-GREEK

This is how my characters look inside Gitea:

As far as I can tell from decoding them, it’s Windows-1251.
These are Cyrillic characters in Structured Text source (maybe that affects how Gitea interprets the charset).

Yes, that’s the expected outcome. First, the libraries used by Gitea only look at the first xxx bytes of the file to decide on an encoding (I don’t remember how many). Second… Yes, Windows-1251 is indistinguishable from Windows-1252 (Western European) unless you do some language analysis. And Windows-1252 takes precedence as a choice.
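To illustrate why byte-level detection cannot separate the two codepages, here is a minimal Python sketch. It uses Python's built-in `cp1251` and `cp1252` codecs, not the detection library Gitea uses, and the sample word is just an assumption for the example. The same bytes decode without error under both codepages, so only language analysis could tell which reading is the intended one.

```python
# Encode a Cyrillic word the way a Windows-1251 editor would store it.
cyrillic = "Привет"
raw = cyrillic.encode("cp1251")

# Both decodes succeed: every byte here is a valid letter in either codepage.
as_1251 = raw.decode("cp1251")
as_1252 = raw.decode("cp1252")

print(as_1251)  # the intended Cyrillic text
print(as_1252)  # accented Latin letters: valid text, wrong language
```

If a detector prefers Windows-1252, the second reading is what ends up on screen.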

Thank you, it works for me with the line below. (Added after one day of use: it causes a lot of trouble, so don’t do this. I’m leaving it here just so you are aware; additional info below.)

*.* text working-tree-encoding=Windows-1251

Summary

For people who follow these steps:

  1. To create the .gitattributes file in Windows, name it “.gitattributes.” (with a dot at the end); otherwise Windows says it is not possible to create a file with such a name.
  2. This line should be added before (or at the same time as) the files are pushed to the server. Otherwise the files stay as they were pushed before, unless you force a re-push somehow.
  3. It can cause a lot of headaches afterwards, and I don’t know how to get around it. It looks like this:

error: failed to encode ‘filename’ from UTF-8 to Windows-1251

I checked it all day. This method via working-tree-encoding is a real pain in the neck.

Each time after adding this line I get changes in files. Even if I just clone into an empty location (without the project folder at all), I already get changes. Just clone and run status: the changes are there.

If I commit these changes, I get automatic changes AND codepage changes in the real file! Why? It’s not just an interpretation applied when the file is read: this method really changes the codepage somehow.

Plus, each time it changes the codepage again and again. After a few saves (two for me) I can’t decode the text at all.

Not a useful method in real life at all!

I don’t know whether it’s a bug or not, but I return to my suggestion and my need: I just need an encoder inside Gitea!!!

My .gitattributes looks like this (it’s about the 10th variation of it):
* text=auto

*.prg    text working-tree-encoding=UTF-8        git-encoding=utf-8
*.st     text working-tree-encoding=Windows-1251 git-encoding=Windows-1251
*.var    text working-tree-encoding=Windows-1251 git-encoding=Windows-1251
*.typ    text working-tree-encoding=Windows-1251 git-encoding=Windows-1251

*.pkg    text working-tree-encoding=UTF-8        git-encoding=utf-8
*.fun    text working-tree-encoding=Windows-1251 git-encoding=Windows-1251
*.lby    text working-tree-encoding=UTF-8        git-encoding=utf-8

*.layer  text working-tree-encoding=UTF-8        git-encoding=utf-8
*.clm    text working-tree-encoding=UTF-8        git-encoding=utf-8
*.vcp    text working-tree-encoding=UTF-8        git-encoding=utf-8
*.vcr    text working-tree-encoding=UTF-8        git-encoding=utf-8
*.dob    text working-tree-encoding=UTF-8        git-encoding=utf-8
*.vc     text working-tree-encoding=UTF-8        git-encoding=utf-8
*.vcvk   text working-tree-encoding=UTF-8        git-encoding=utf-8
*.tdc    text working-tree-encoding=UTF-8        git-encoding=utf-8
*.tpr    text working-tree-encoding=UTF-8        git-encoding=utf-8
*.txtgrp text working-tree-encoding=UTF-8        git-encoding=utf-8
*.vcs    text working-tree-encoding=UTF-8        git-encoding=utf-8
*.page   text working-tree-encoding=UTF-8        git-encoding=utf-8
*.layer  text working-tree-encoding=UTF-8        git-encoding=utf-8
*.fninfo text working-tree-encoding=UTF-8        git-encoding=utf-8
*.bdr    text working-tree-encoding=UTF-8        git-encoding=utf-8
*.bminfo text working-tree-encoding=UTF-8        git-encoding=utf-8
*.bmgrp  text working-tree-encoding=UTF-8        git-encoding=utf-8

*.hw     text working-tree-encoding=UTF-8        git-encoding=utf-8
*.hwl    text working-tree-encoding=UTF-8        git-encoding=utf-8
*.per    text working-tree-encoding=UTF-8        git-encoding=utf-8
*.sw     text working-tree-encoding=UTF-8        git-encoding=utf-8
*.iom    text working-tree-encoding=Windows-1251 git-encoding=Windows-1251
*.vvm    text working-tree-encoding=Windows-1251 git-encoding=Windows-1251
*.role   text working-tree-encoding=UTF-8        git-encoding=utf-8
*.user   text working-tree-encoding=UTF-8        git-encoding=utf-8
*.dis    text working-tree-encoding=UTF-8        git-encoding=utf-8

*.jpg    binary
*.png    binary

I tried it without git-encoding: it works almost the same.

Some additional info:

After I pull or clone the project with .gitattributes, local git starts physically re-encoding files on the hard disk: it treats them as if they were in the UTF-8 codepage and converts them to the other codepage (Windows-1251 in my situation). You just need to wait a few minutes and more files will be re-encoded on the HDD.
If you commit and push that, then pull or clone again, local git will again treat them as UTF-8 files and change the codepage again. The more often you commit and pull, the more times local git re-encodes them.
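The effect of repeated conversions can be sketched in Python. This is not a reproduction of what git did internally in this case. It only assumes that each cycle takes bytes that are really UTF-8 and reads them as Windows-1251. The helper name `reinterpret` and the sample word are made up for the illustration.

```python
def reinterpret(text: str, actual: str = "utf-8", assumed: str = "cp1251") -> str:
    """Store text in its actual encoding, then read the bytes back as the assumed one."""
    return text.encode(actual).decode(assumed)

original = "Привет"
once = reinterpret(original)   # one wrong conversion
twice = reinterpret(once)      # the same mistake applied to the already damaged text

print(once)
print(len(original), len(once), len(twice))  # the text grows with every cycle
```

One wrong pass can still be undone by reversing it, but every further pass stacks another layer of garbling on top, which matches the observation that the text becomes undecodable after a couple of rounds.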

Full description here: https://stackoverflow.com/questions/58352957/what-going-on-git-see-changes-without-changes

@VitaliyAT Sorry to hear that. It happens with this kind of change of .gitattributes when some files are changed and some are not. In my case we didn’t have an encoding problem but a crlf problem, which gave us the same symptoms. The problem is that your local git assumes that the rest of the repo matches the settings in .gitattributes when in fact it doesn’t. We “solved” that by patching all the affected files before adding the .gitattributes setting. For charset conversion… it can be more complicated than that (and exceeds my experience).
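A rough Python model of that mismatch, under the assumption that working-tree-encoding expects UTF-8 blobs in the repository and converts them on checkout and on add. The functions `checkout`, `add` and `patch_legacy_blob` are hypothetical names for this sketch, not git commands or git internals.

```python
WORKTREE = "cp1251"  # the encoding named in .gitattributes

def checkout(blob: bytes) -> bytes:
    """Repository -> working tree, assuming the blob really is UTF-8."""
    return blob.decode("utf-8").encode(WORKTREE)

def add(worktree: bytes) -> bytes:
    """Working tree -> repository."""
    return worktree.decode(WORKTREE).encode("utf-8")

def patch_legacy_blob(blob: bytes) -> bytes:
    """Re-encode a blob committed as raw Windows-1251; leave valid UTF-8 alone."""
    try:
        blob.decode("utf-8")
        return blob
    except UnicodeDecodeError:
        return blob.decode(WORKTREE).encode("utf-8")

text = "Привет"
good_blob = text.encode("utf-8")      # committed after conversion
legacy_blob = text.encode("cp1251")   # committed before .gitattributes existed

# A UTF-8 blob survives the round trip unchanged.
round_trip_ok = add(checkout(good_blob)) == good_blob

# A legacy blob does not match the assumption, so the conversion breaks.
try:
    checkout(legacy_blob)
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True

print(round_trip_ok, mismatch_detected)
```

Patching every legacy blob first, as in `patch_legacy_blob`, is the charset equivalent of normalizing line endings before turning the attribute on.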

There is a setting in Gitea that might be of use for you:

[repository]
ANSI_CHARSET = Windows-1251

It turns off ANSI character set detection for all repositories in the system except for UTF-8, which always takes precedence.
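The precedence described here can be pictured with a small Python sketch. It is only an illustration of "UTF-8 first, otherwise the configured charset", not Gitea's actual code. The function name `decode_for_display` and its `ansi_charset` parameter are assumptions for the example.

```python
def decode_for_display(data: bytes, ansi_charset: str = "cp1251") -> str:
    """Prefer UTF-8; fall back to the configured ANSI charset without guessing."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # No detection step: the configured charset is used as-is.
        return data.decode(ansi_charset, errors="replace")

utf8_shown = decode_for_display("Привет".encode("utf-8"))
ansi_shown = decode_for_display("Привет".encode("cp1251"))
print(utf8_shown, ansi_shown)
```

With a fixed fallback there is no Windows-1251 versus Windows-1252 ambiguity left, at the cost of applying one charset to every non-UTF-8 file on the instance.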