FireFly Media Server

23/01/2008 at 3:27 PM #8757

Participant

Well, yes and no.

For an index to work you need to re-write the query.
I wasn’t aware that sqlite supports indices until now. (Is that a new thing in sqlite3 – anyone?) I did some comparisions and figured that the SELECT DISTINCT is a lot faster than a SELECT … GROUP BY.

On a proper SQL database, the GROUP BY is the way to go, but the difference is slim to none. On sqlite the performance hit of the GROUP BY is almost 100%. Maybe the index changes that. I’ll try that on the slug later today. 😉

edit:
Ok, here come the sqlite3 impressions:

Verdict NÂ°1:
I think the browse by artist, album etc queries should look like this:

SELECT DISTINCT MIN(ALBUM) FROM songs GROUP BY UPPER(REPLACE (ALBUM,' ',''));

That way, albums like “hello kitty”, “Hello Kitty” and “Hellokitty” would appear as a single entry. The min of those would be displayed, in this case it would be “hello kitty”. It has always nerved me when I had slightly untidy metadata to get multiple artist / album entries.

Unfortunately the performance hit seems to be hideous. Now from this point I cant tell wether the screen output of the sqlite3 client on the slug is actually that slow, or wether its just the ssh terminal.
Either way, the performance seems to be 2-3x worse than compared to the traditional

SELECT DISTINCT artist FROM songs ORDER BY artist;

I used a shell script and date to measure the execution time. Its not rocket science, but it should do the trick.
My slug was far from being idle when I crunched those numbers. So it would be nice if someone (perhaps with a bigger library) could give some numbers as well?
Here’s the script:

#!/bin/bash

echo "This is the old way:"

echo "SELECT DISTINCT $1 FROM songs ORDER BY $1"

time1=`date +%s%N`

sqlite3 /opt/var/mt-daapd/songs3.db "SELECT DISTINCT $1 FROM songs ORDER BY $1" >/dev/null

time2=`date +%s%N`

runtime1=`calc a=$time2 - $time1, a / 10^9`

echo "Execution time: $runtime1 seconds"





echo ""

echo "This is the new way:"

echo "SELECT DISTINCT MIN($1) FROM songs GROUP BY UPPER(REPLACE $1,' ','');"

time3=`date +%s%N`

sqlite3 /opt/var/mt-daapd/songs3.db "SELECT DISTINCT MIN($1) FROM songs GROUP BY UPPER(REPLACE($1,' ',''));" >/

time4=`date +%s%N`

runtime2=`calc a=$time4 - $time3, a / 10^9`

echo "Execution time: $runtime2 seconds"

echo ""

echo "Difference is `calc $runtime2-$runtime1` seconds."

And here’s what I get on songs:

./sqlite_performance.sh album
This is the old way:
SELECT DISTINCT album FROM songs ORDER BY album
Execution time: 1.110705 seconds

This is the new way:
SELECT DISTINCT MIN(album) FROM songs GROUP BY UPPER(REPLACE album,’ ‘,”);
Execution time: 4.777444 seconds

Difference is 3.666739 seconds.

Please note that I did create an index on album and artist, but as this query does a full table scan anyway, only the aggregate (MIN) actually would gain. An index certainly speeds up browsing albums by a single artist though.

PS:
For this to run on the slug you need calc, date and bash, next to sqlite3. Duh. 😉