[solved] Database issue with SQL queries containing emoji

Hi folks,
I recently ran into the Emoji mysql issue.
I notices the error only occures if the browser / chrome replaces the emoji instead of keeping the :emoji: syntax.

Here is log output when trying to add an issue containing the emoji data:

2020/12/04 09:48:02 ...m.io/xorm/core/tx.go:130:ExecContext() [I] [SQL] INSERT INTO `issue` (`repo_id`,`poster_id`,`original_author`,`original_author_id`,`name`,`content`,`milestone_id`,`priority`,`is_closed`,`is_pull`,`num_comments`,`ref`,`deadline_unix`,`created_unix`,`updated_unix`,`closed_unix`,`is_locked`,`index`) SELECT ?,?,?,?,?,?,?,?,false,false,?,?,?,?,?,?,false,coalesce(MAX(`index`),0)+1 FROM `issue` WHERE (repo_id=?) [1 1  0 parsed emoji 💣  testhing this\01503d
        \01503d
         0 0 0  0 1607075282 1607075282 0 1] - 991.735µs
2020/12/04 09:48:02 ...o/xorm/session_tx.go:46:Rollback() [I] [SQL] ROLL BACK [] - 448.716µs
2020/12/04 09:48:02 models/issue.go:956:NewIssue() [E] NewIssue: error attempting to insert the new issue; will retry. Original error: Error 1366: Incorrect string value: '\xF0\x9F\x92\xA3  ...' for column `giteadb`.`issue`.`content` at row 1

Whats the state of this issue? - Are the emojis now supported wir mariadb + utf8mb4 ? - Or did I missed anything in the install instructions?

my docker-compose looks like this: (I didn’t paste the full one with traefik, secrets and network details)

...
...
services:
  db:
    image: mariadb:10
    restart: unless-stopped
    secrets:
      - "MYSQL_ROOT_PASSWORD"
      - "MYSQL_PASSWORD"
    environment:
      - "MYSQL_ROOT_PASSWORD_FILE=/run/secrets/MYSQL_ROOT_PASSWORD"
      - "MYSQL_DATABASE=foo"
      - "MYSQL_USER=bar"
      - "MYSQL_PASSWORD_FILE=/run/secrets/MYSQL_PASSWORD"
    volumes:
      - db:/var/lib/mysql
  gitea:
    image: gitea/gitea:1.12
    ports:
      - "3000:3000"
      - "222:22"
    depends_on:
      - db
    restart: unless-stopped
    environment:
      - "APP_NAME=MyGitea"
      - "RUN_MODE=prod"
      - "DOMAIN=foobar.com"
      - "SSH_PORT=222"
      - "SSH_LISTEN_PORT=22"
      - "DISABLE_SSH=true"
      - "ROOT_URL=https://gitea.foobar.com"
      - "LFS_START_SERVER=true"
      - "DB_TYPE=mysql"
      - "DB_HOST=db:3306"
      - "DB_NAME=foo"
      - "DB_USER=bar"
      #- "DB_PASSWD=``"
      - "REQUIRE_SIGIN_VIEW=true"
...
...

The default encoding seems to be fine:

image

But tables like issue still differs:

SELECT CCSA.character_set_name FROM information_schema.`TABLES` T,
       information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA
WHERE CCSA.collation_name = T.table_collation
  AND T.table_schema = "giteadb"
  AND T.table_name = "issue";

+--------------------+
| character_set_name |
+--------------------+
| utf8               |
+--------------------+

issue table: columns details:

+--------------------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
| Field              | Type         | Collation       | Null | Key | Default | Extra          | Privileges                      | Comment |
+--------------------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
| id                 | bigint(20)   | NULL            | NO   | PRI | NULL    | auto_increment | select,insert,update,references |         |
| repo_id            | bigint(20)   | NULL            | YES  | MUL | NULL    |                | select,insert,update,references |         |
| index              | bigint(20)   | NULL            | YES  |     | NULL    |                | select,insert,update,references |         |
| poster_id          | bigint(20)   | NULL            | YES  | MUL | NULL    |                | select,insert,update,references |         |
| original_author    | varchar(255) | utf8_general_ci | YES  |     | NULL    |                | select,insert,update,references |         |
| original_author_id | bigint(20)   | NULL            | YES  | MUL | NULL    |                | select,insert,update,references |         |
| name               | varchar(255) | utf8_general_ci | YES  |     | NULL    |                | select,insert,update,references |         |
| content            | text         | utf8_general_ci | YES  |     | NULL    |                | select,insert,update,references |         |
| milestone_id       | bigint(20)   | NULL            | YES  | MUL | NULL    |                | select,insert,update,references |         |
| priority           | int(11)      | NULL            | YES  |     | NULL    |                | select,insert,update,references |         |
| is_closed          | tinyint(1)   | NULL            | YES  | MUL | NULL    |                | select,insert,update,references |         |
| is_pull            | tinyint(1)   | NULL            | YES  | MUL | NULL    |                | select,insert,update,references |         |
| num_comments       | int(11)      | NULL            | YES  |     | NULL    |                | select,insert,update,references |         |
| ref                | varchar(255) | utf8_general_ci | YES  |     | NULL    |                | select,insert,update,references |         |
| deadline_unix      | bigint(20)   | NULL            | YES  | MUL | NULL    |                | select,insert,update,references |         |
| created_unix       | bigint(20)   | NULL            | YES  | MUL | NULL    |                | select,insert,update,references |         |
| updated_unix       | bigint(20)   | NULL            | YES  | MUL | NULL    |                | select,insert,update,references |         |
| closed_unix        | bigint(20)   | NULL            | YES  | MUL | NULL    |                | select,insert,update,references |         |
| is_locked          | tinyint(1)   | NULL            | NO   |     | 0       |                | select,insert,update,references |         |
+--------------------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+

I’m really not a sql pro. So I’m not quite sure if it depende on the encoding, collation …


Update:
If found the doc section database-prep. Does there instructions also affect / apply to a docker based installation? - In this case there really should be a link to the topic!

You should also set

[database]
charset=utf8mb4

and covert the original tables from utf8 to utf8mb4

Do you mean the mysql client or server config?

I mean Gitea’s configuration file.

1 Like

Thanks, I’ll try this tomorrow. Would it help other users, if I create a PR which contains this in the example docker config?

Just an update: I could my problems with:

  • setting the database charset in the app.ini to utf8mb4
  • Exporting the database with adminer tool.
  • Setting the correct default server collation (utf8_general_ci) and encoding (utf8mb4)
  • Replacing the table charset value from utf8 in to utf8mb4

Example:
image

  • And finally importing the sql dump again.

Thanks again @lunny for your help! :beers:

I’m sorry I have forgot we have a command to do that. ./gitea convert could help you to convert the database automatically. You can type ./gitea convert --help to know how to use it.

1 Like