svn-1655 invalid UTF-8

Viewing 6 posts - 1 through 6 (of 6 total)
  • #1697
    fizze
    Participant

    Seems to work fine, although scanning on the slug seems to be considerably slower than with the ~1400-something builds, I guess.

    I did a full rescan, this popped up in the logs:

    <25>Sep  8 11:33:12 mt-daapd[15554]: Invalid UTF-8 in /share/hdd/data/public/music/G-O/Jürgen von der Lippe/der blumenmann/vd lippe - 02 irrtümer.mp3
    <25>Sep 8 11:33:13 mt-daapd[15554]: Invalid UTF-8 in /share/hdd/data/public/music/G-O/Jürgen von der Lippe/der blumenmann/vd lippe - 03 und´nen bauch hab ich auch.mp3
    <25>Sep 8 11:33:30 mt-daapd[15554]: Invalid UTF-8 in /share/hdd/data/public/music/G-O/Jürgen von der Lippe/der blumenmann/vd lippe - 17 rock´n´roll, du hast mich nie geliebt.mp3

    I guess it does not like the acute accents (´) in the latter two examples and the umlaut “ü” in the first one. So, is there any check I can run on the files to tell whether they are actually invalid, or is this a bug in Firefly?
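
    One way to run that kind of check by hand is to test whether a byte string decodes cleanly as UTF-8. The sketch below is only an illustration, not mt-daapd code: the helper names `is_valid_utf8` and `find_invalid_names` are made up here, and it assumes the string being complained about is a file name, although the same test applies to raw tag text.

```python
import os


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_invalid_names(root: bytes):
    """Yield paths under root whose file or directory name is not valid UTF-8.

    Passing a bytes path makes os.walk return the raw, undecoded names.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                yield os.path.join(dirpath, name)


# The same name stored two ways: "ü" as the single Latin-1 byte 0xFC,
# and "ü" as the two-byte UTF-8 sequence 0xC3 0xBC.
latin1_name = "vd lippe - 02 irrtümer.mp3".encode("iso-8859-1")
utf8_name = "vd lippe - 02 irrtümer.mp3".encode("utf-8")

print(is_valid_utf8(latin1_name))  # False
print(is_valid_utf8(utf8_name))    # True
```

    A name that fails this test is not damaged as such. It is just stored in some other encoding, typically a single-byte codepage such as ISO-8859-1.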

    edit:
    I tried to access the library from my SoundBridge. It could connect, but failed to get the browse-artists info (probably a timeout).
    I got tons of these in the log:

    <25>Sep  8 11:52:14 mt-daapd[16181]: Error writing to client socket: Broken pipe

    #12426
    rpedde
    Participant

    Were you trying the mp3_tag_codepage stuff?

    #12427
    fizze
    Participant

    *cough*
    I should check the config file before posting stuff here 😉
    Mea culpa.

    Anywho, what’s with the hundreds of lines about the broken pipe there? Is that really just the SoundBridge timing out during a full scan on the slug?

    #12428
    rpedde
    Participant

    D’oh?

    #12429
    fizze
    Participant

    Hm, also, I tried to play with the codepage conversion parameter. Didn’t really find a setting that fits.

    Also, googling didn’t turn up a tool to identify the codepage an ID3 tag is encoded in (neither for id3v1 nor for id3v2).
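
    A rough sketch of what such a tool could do is to decode the raw tag bytes under several candidate codepages and show every result that does not fail. The candidate list and the helper name `candidate_decodings` are assumptions made for this illustration. Single-byte codepages such as ISO-8859-1 accept any byte sequence, so a person still has to judge which result looks right.

```python
# Hypothetical candidate list; extend it with whatever codepages seem plausible.
CANDIDATES = ["utf-8", "shift_jis", "cp1252", "iso-8859-1"]


def candidate_decodings(raw: bytes, candidates=CANDIDATES) -> dict:
    """Map each candidate codec to its decoding of raw, skipping codecs that fail."""
    results = {}
    for codec in candidates:
        try:
            results[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            # This codec cannot represent the byte sequence at all.
            pass
    return results


# Tag text as a Latin-1 tagger would have stored it ("ü" is the byte 0xFC).
raw_tag = "irrtümer".encode("iso-8859-1")
for codec, text in candidate_decodings(raw_tag).items():
    print(f"{codec:12} {text}")
```

    This only narrows things down. It rules out encodings that cannot fit the bytes, but it cannot pick a winner among the ones that can.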

    At least this setting seems to do something.
    When I set it to ISO-8859-1, I’m getting other _weird_ characters instead of, say, German umlauts.

    I think the problem might be that my mp3 library spans quite some years, and thus lots of sources for tags. I remember old ones using XING mp3 encoder, mp3enc, lame, mp3compressor, etc. So there is likely a huge variety of encodings. I’ve got used to it, kinda, though. 😉

    #12430
    rpedde
    Participant

    The thing is, the id3v2 spec allows only a handful of encodings for id3v2 tags: iso-8859-1 and utf-16 (ucs-2) in id3v2.3, plus utf-8 in id3v2.4. If a tag is utf-8, I keep it, as I’m maintaining strings internally as utf-8. If it’s iso-8859-1, I do an iso-8859-1 to utf-8 conversion.
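
    To make that concrete: an id3v2 text frame starts with a one-byte encoding marker, and the rest of the frame is the text in that encoding. The sketch below normalizes such a frame body to utf-8. It is not the mt-daapd implementation; the function name `text_frame_to_utf8` and the table name are made up for illustration, and the frame header is assumed to be stripped already.

```python
# Encoding marker byte -> Python codec name.
ID3V2_TEXT_ENCODINGS = {
    0x00: "iso-8859-1",
    0x01: "utf-16",     # UTF-16 with a byte order mark
    0x02: "utf-16-be",  # UTF-16 big-endian without BOM (id3v2.4 only)
    0x03: "utf-8",      # id3v2.4 only
}


def text_frame_to_utf8(frame_body: bytes) -> bytes:
    """Convert the body of a text frame (marker byte + text) to UTF-8 bytes."""
    if not frame_body:
        return b""
    codec = ID3V2_TEXT_ENCODINGS.get(frame_body[0])
    if codec is None:
        raise ValueError(f"unknown text encoding marker: {frame_body[0]:#04x}")
    # Decode according to the marker, then drop any terminating NULs.
    text = frame_body[1:].decode(codec).rstrip("\x00")
    return text.encode("utf-8")


latin1_frame = b"\x00" + "Jürgen von der Lippe".encode("iso-8859-1")
print(text_frame_to_utf8(latin1_frame).decode("utf-8"))  # Jürgen von der Lippe
```

    The conversion is only as good as the marker byte. If the tagger wrote the wrong marker, the decode step silently produces the wrong characters.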

    But sometimes stupid taggers mark a tag as iso-8859-1 when the bytes are really in something else, shift-jis or whatever. This seems to be a particular problem with Chinese and Japanese tags, I guess, as that’s where I hear the most complaints about this.

    Of course, if you try and set a different codepage, but it really is encoded right in the id3 tag, then you’ll get junk when you do the conversion from the wrong codepage.
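
    Both failure modes are easy to reproduce. The sketch below is an illustration only: `convert_tag` is a hypothetical stand-in for a codepage conversion step, and the sample strings are invented. It decodes a correctly labeled Latin-1 tag under the wrong codepage, then a Shift-JIS tag that was mislabeled as iso-8859-1.

```python
def convert_tag(raw: bytes, assumed_codepage: str) -> str:
    """Decode raw tag bytes under the assumed codepage, replacing undecodable bytes."""
    return raw.decode(assumed_codepage, errors="replace")


# Case 1: the tag really is iso-8859-1, but a different codepage is configured.
honest = "Jürgen".encode("iso-8859-1")
print(convert_tag(honest, "iso-8859-1"))  # Jürgen
print(convert_tag(honest, "cp1251"))      # junk: the 0xFC byte is read as a Cyrillic letter

# Case 2: the tag is marked iso-8859-1, but the bytes are really Shift-JIS.
mislabeled = "音楽".encode("shift_jis")
print(convert_tag(mislabeled, "iso-8859-1"))  # junk: each byte becomes one Latin-1 character
print(convert_tag(mislabeled, "shift_jis"))   # 音楽
```

    Neither wrong conversion raises an error, which is why the result shows up as odd characters in the library rather than as a log message.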

    Of course, I maintain that the best idea in the first place is to just clean up your tags, but hey… that’s just me.

    — Ron

  • The forum ‘Nightlies Feedback’ is closed to new topics and replies.