translating every text layer
#6
If your text layer contains "Déjà" and you retrieve its text or markup, you get 'D\xc3\xa9j\xc3\xa0'. The len() of this is 6, although "Déjà" has only 4 characters. What you got is a sequence of bytes: the UTF-8 encoding of the Unicode text "Déjà", in which the é (U+00E9) and the à (U+00E0) are replaced by their UTF-8 encodings(*), which take two bytes each.

This happens because in Python2 plain strings are just arrays of bytes. Since Gimp supports the whole Unicode set, when you obtain a text from the Gimp API, Gimp returns the UTF-8 encoding of that text.

Python2 does, however, support text that uses all Unicode characters through the unicode type, and you can convert a str to unicode with decode() and a unicode back to str with encode().

So, going back to the text layer, if we do pdb.gimp_text_layer_get_text(layer).decode('utf-8'), we get a unicode object with a length of 4, u'D\xe9j\xe0': each non-ASCII character is now represented by its Unicode code point, which fits in a single element of the sequence.
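
To try this outside Gimp, here is a minimal sketch. The bytes literal is only a stand-in for what pdb.gimp_text_layer_get_text(layer) would return for a "Déjà" layer (an assumption, since there is no real layer here), and the variable names are made up for the example. It is written so that it should run unchanged in Python2 and Python3.

```python
# Stand-in for the value returned by pdb.gimp_text_layer_get_text(layer)
# on a text layer containing "Déjà" (assumption: no Gimp involved here).
raw = b'D\xc3\xa9j\xc3\xa0'

# Bytes -> text: interpret the bytes as UTF-8.
text = raw.decode('utf-8')

# Text -> bytes: what you would hand back to the Gimp API.
round_trip = text.encode('utf-8')

print(len(raw))   # 6 (bytes)
print(len(text))  # 4 (characters)
```

The usual pattern is to decode as soon as the text comes out of Gimp, work on the unicode object, and encode again just before passing the text back.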



(*) Technically, the whole string is encoded in UTF-8 but, by design, plain ASCII characters (up to 0x7F) are encoded in UTF-8 as themselves, so if you only deal with American English, not handling UTF-8 sort of works.
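
A quick way to see the footnote in action, again as a sketch with made-up variable names that should run in both Python2 and Python3: ASCII text has the same values before and after decoding, while each accented character takes two bytes.

```python
# Pure ASCII: decoding from UTF-8 changes nothing in the values.
ascii_raw = b'Hello'
ascii_text = ascii_raw.decode('utf-8')
same_values = [ord(c) for c in ascii_text] == list(bytearray(ascii_raw))

# Number of UTF-8 bytes needed for each character of "Déjà".
byte_counts = [len(ch.encode('utf-8')) for ch in u'D\xe9j\xe0']

print(same_values)  # True
print(byte_counts)  # [1, 2, 1, 2]
```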

Messages In This Thread
translating every text layer - by jacques_duflos - 07-12-2023, 11:42 PM
RE: translating every text layer - by Ofnuts - 07-13-2023, 06:51 AM
RE: translating every text layer - by Ofnuts - 07-19-2023, 06:37 AM
RE: translating every text layer - by Ofnuts - 07-21-2023, 07:31 AM
RE: translating every text layer - by Ofnuts - 07-25-2023, 06:23 AM
