View Full Version : mySQL - random queries
Hunkorama417
11-30-2000, 07:46 PM
I have a table and I'd like mySQL to randomly select 50 rows from it which can't be duplicated and the field "won" must be 0. How can I go about doing this with one mySQL query? I need to know ASAP.
------------------
Andrew
www.digi-FX.net (http://www.digi-FX.net)
PaulKroll
11-30-2000, 07:56 PM
See http://www.mysql.com/documentation/mysql/commented/manual.php?section=Mathematical_functions and look at the rand() function, then adapt for the previous mysql version (which is what we're running here) (OK, that's not necessarily a trivial exercise, but it MIGHT be...)
which can't be duplicated SELECT DISTINCT ...
I'll go back into the corner now and wait to see the other answers...
Distinct
PaulKroll
11-30-2000, 11:11 PM
Modified from the book "MySQL", page 188:
SELECT DISTINCT Table.Field1,Table.Field2,Table.SomeField * 0 + rand() as rand_col FROM Table WHERE won = 0 ORDER BY rand_col LIMIT 50;
I actually tried (something similar to) this on a table on the FQ servers, and it seems to work. Given that you're the one who needs this, you may want to subject it to more stress than I did. :)[nbsp][nbsp]Also, just because it's a single query, doesn't mean it's fast. For a table with 123,000 rows (and otherwise small, one numeric and one length 2 CHAR field) this puppy takes nearly a full second to run. (Usually, a query for a few records won't take anywhere near as much time, maybe just .02 seconds or so.) If this is gonna be called a lot, you might want to consider some alternative options.
As the MySQL book explains, you can't just do "rand() as rand_col" because the optimizer will assume the call is a constant and will eliminate it from the query. You've got to force the optimizer to not optimize that statement.
It's a good book to have if you do any MySQL coding at all, really. ISBN 0-7357-0921-1
[This message has been edited by PaulKroll (edited 11-30-00@10:22 pm)]
PaulKroll
12-01-2000, 12:55 AM
Alternatives:
The basic problem with the "rand() as rand_col" technique in my previous message, is that MySQL has to create a "fakey" field with as many entries as you have in the table... or at least, in the WHERE part of the query. If this is a large number, then MySQL has got a BUNCH of work to do, and then, of course, the rand_col fake-field is NOT indexed, so searching/ordering-by it is pretty slow business. The only way around this that I can think of (and that may well not be the only way around this... :) ) is to plan ahead for the day when you need random numbers, to wit:
A) When each entry is being made on the table, another field, "RandomNum" is filled with rand().[nbsp][nbsp]
B) When you need the 50 entries that haven't Won, you do WHERE Won = 0 ORDER BY RandomNum. The sorting is random, because the numbers placed there originally are random (pseudo-random actually). If they're indexed, so much the better. Only...
Well, the obvious flaw with this scheme is that while the query returns a list of non-winners in random order, the query returns the SAME list every time until another entry is given, which may or may not influence the results of the query. (Because if all 50 non-winning entries are under, say, "0.3" in the RandomNum field, any new entry that gets a rand() of "0.31" or above is not going to show up in those results.)
If you are in fact picking the next 50 winners, then you'll likely be setting "Won" to 1 for each, and the next time the query runs, those 50 are out of contention. So maybe this isn't such a problem.
If it is, well, then the question is how often does the query take place? If this can be cached, it's not a big deal at all: just use the "rand() as rand_col" piece and that'll be OK. Or, what's the actual number of records that we're talking about here? If it's just a couple thousand, or less, again it's probably no big deal to just use the more straightforward query.
Hope something here has helped!
Hunkorama417
12-01-2000, 02:59 AM
Paul, thanks for all your help. This query will only be ran once a week so I'm going to use your first example. Thanks!
Hunkorama417
12-09-2000, 11:47 PM
The query "SELECT DISTINCT primary_key FROM SHdata WHERE won='no' AND week1score='10' ORDER BY RAND()" works; however, when I add "LIMIT 2" to the end of the query mySQL throws an error, but gives no reason. FYI, I'm not using a FutureQuest server. It'd be great if I could get the LIMIT 2 to work. Any ideas...
------------------
Andrew
www.digi-FX.net (http://www.digi-FX.net)
vBulletin® v3.6.8, Copyright ©2000-2009, Jelsoft Enterprises Ltd.