Reply To: svn-1655 invalid UTF-8

#12430
rpedde
Participant

@fizze wrote:

Hm, also, I tried to play with the codepage conversion parameter. Didn’t really find a setting that fits.

Also googling didn’t actually reveal a tool to identify the codepage an ID3 tag is encoded in? (neither id3v1, id3v2)

At least this setting seems to do something.
When I set it to ISO-8859-1 I’m getting other _weird_ characters instead of, say, German umlauts.

I think the problem might be that my mp3 library spans quite some years, and thus lots of sources for tags. I remember old ones using XING mp3 encoder, mp3enc, lame, mp3compressor etc etc. So likely there is a huge variety of encodings. I’ve got used to it, kinda, though. 😉

The thing is, the id3v2 spec only allows a handful of encodings for id3v2 tags: iso-8859-1, ucs-2/utf-16, and (from id3v2.4 on) utf-8. If a tag is utf-8, I keep it, as I’m maintaining strings internally as utf-8. If it’s iso-8859-1, I do an iso-8859-1 to utf-8 conversion.
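To make that concrete, here is a rough Python sketch of the idea. It is not the server's actual code. The names `ID3V2_ENCODINGS`, `decode_text_frame` and `to_internal_utf8` are made up for illustration, and it assumes you already have the raw body of a text frame, where the first byte is the encoding marker defined by the id3v2 spec.

```python
# Encoding marker byte -> Python codec, per ID3v2.4.
# (ID3v2.3 only defines 0 and 1.)
ID3V2_ENCODINGS = {
    0: "iso-8859-1",
    1: "utf-16",      # byte order is taken from the BOM
    2: "utf-16-be",   # no BOM, always big-endian
    3: "utf-8",
}


def decode_text_frame(payload: bytes) -> str:
    """Decode a text frame body: one encoding byte followed by the text."""
    if not payload:
        return ""
    codec = ID3V2_ENCODINGS.get(payload[0])
    if codec is None:
        raise ValueError(f"unknown ID3v2 encoding byte: {payload[0]}")
    text = payload[1:].decode(codec)
    # Strings in tags are often null-terminated; drop trailing terminators.
    return text.rstrip("\x00")


def to_internal_utf8(payload: bytes) -> bytes:
    """Return the frame text as utf-8 bytes, whatever the tag encoding was."""
    return decode_text_frame(payload).encode("utf-8")


if __name__ == "__main__":
    # A frame labelled iso-8859-1 (marker 0) containing "Müller".
    print(to_internal_utf8(b"\x00M\xfcller"))
```

Note that this trusts the encoding byte completely, which is exactly where things go wrong with badly written tags.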

But sometimes stupid taggers mark a tag as iso-8859-1 when the bytes are really in something else, shift-jis or whatever. This seems to be a particular problem with Chinese and Japanese tags, I guess, as that’s where I hear the most complaints about this.

Of course, if you set a different codepage but the tag really is encoded the way it says it is, then you’ll get junk when you do the conversion from the wrong codepage.
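Both failure modes are easy to reproduce in a few lines of Python. This is only an illustration under assumptions: `decode_latin1_labelled` is a hypothetical helper standing in for "decode a tag that claims iso-8859-1 using whatever codepage is configured", and the sample strings are made up.

```python
def decode_latin1_labelled(raw: bytes, codepage: str = "iso-8859-1") -> str:
    """Decode bytes from a tag labelled iso-8859-1, using an override codepage.

    Undecodable bytes are replaced rather than raising, so bad input
    shows up as junk characters instead of an exception.
    """
    return raw.decode(codepage, errors="replace")


# Case 1: the tagger wrote shift-jis bytes but labelled them iso-8859-1.
sjis_bytes = "音楽".encode("shift_jis")
wrong = decode_latin1_labelled(sjis_bytes)               # trusts the label: junk
right = decode_latin1_labelled(sjis_bytes, "shift_jis")  # override fixes it

# Case 2: the tag is honestly iso-8859-1, but the override is set anyway.
latin1_bytes = "Müller".encode("iso-8859-1")
junk = decode_latin1_labelled(latin1_bytes, "shift_jis")  # override breaks it

if __name__ == "__main__":
    print(repr(wrong), repr(right), repr(junk))
```

So a single global codepage setting can only ever be right for one of the two kinds of file, which is why a library with mixed sources never looks completely clean.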

Of course, I maintain that the best idea in the first place is to just clean up your tags, but hey… that’s just me.

— Ron