PDA

View Full Version : mySQL indexing - a personal testimony


pdstein
11-30-2000, 10:28 AM
Terra is probably going to reach through my cable modem and choke me when he reads this, but on the otherhand if I can help just one other wayward mySQL programner to reform it will be worth it :) ...

Like most of you, I'm a self-taught PHP/mySQL code writer.[nbsp][nbsp]I've always heard indexing tables was a good idea, but struggled to understand how indexing worked and what tables/columns to index to make my code more efficient.[nbsp][nbsp]The website I maintain, OurChurch.Com, is heavily mySQL-dependent, but pages were always displayed quickly so indexing never seemed to be a pressing need.[nbsp][nbsp]Even so, for months I've been hoping to take the time to learn about indexing and do a few experiments, but as an entrepreneur I've always had so many other higher priority items that I never got to it.

I have a PHP script that runs by cron job every morning at 5AM.[nbsp][nbsp]It's a nightly maintenance script which processes data from the day before and recalcuates a bunch of things.[nbsp][nbsp]As our number of clients tripled in the last year, the processing time of the script grew exponentially and I'm embarassed to say that it was taking slightly over an hour to complete.[nbsp][nbsp]This week, I did some testing on some of the queries and added an index to one particularly large table.[nbsp][nbsp]This morning the script finished in 39 seconds.[nbsp][nbsp](Insert silence as I bow at the feet of the table index.)

Needless to say, I will be combing through other queries and tables in the coming days.

SneakyDave
11-30-2000, 02:35 PM
I don't have complicated tables, but I'll index almost anything that I have a WHERE clause that could return a good number of rows or is used quite frequently, and I never use "SELECT *", even though its convenient.

One question I have is, if you have a table with existing rows, and create an index on a column in that table, are the indecies for rows already in the table automatically created, or just on new rows to the table. I'm sure the answer is around here somewhere.

Now if Terra would come up with a FutureQuest VSAM file system, I could probably provide more information on the subject of indexes.


//STEP01 [nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]EXEC PGM=IDCAMS[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]
//SYSPRCMR DD SYSOUT=*[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]
//SYSIN DD *[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]
[nbsp]DELETE FQST.TERRA.VSAM.FILE.CLUSTER[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]
[nbsp]IF LASTCC LE 12 THEN SET MAXCC = 0[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]
[nbsp]DEFINE CLUSTER ( NAME(FQST.TERRA.VSAM.FILE.CLUSTER) -[nbsp][nbsp]
[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]INDEXED[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp] -[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]
[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]RECORDSIZE(27 27)[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp] -[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]
[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]KEYS(26 0)[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]-[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]
[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]VOLUMES('*')[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp] -[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]
[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]SHAREOPTIONS(2 3))[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp] -[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]
[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]DATA[nbsp][nbsp][nbsp][nbsp]( NAME(FQST.TERRA.VSAM.FILE.DATA) -[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]
[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]CYLINDERS(5 5)[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp] -[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]
[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]CISZ(4096) )[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp] -[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]
[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]INDEX[nbsp][nbsp] ( NAME(FQST.TERRA.VSAM.FILE.INDEX)[nbsp][nbsp]-
[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]CISZ(1024) )[nbsp][nbsp][nbsp][nbsp]



[This message has been edited by SneakyDave (edited 11-30-00@1:41 pm)]

pdstein
11-30-2000, 02:51 PM
SneakyDave,

mySQL indexes the exisiting rows when an index is added.

Your suggestion to index any columns used in the where clause of a query that returns a significant number of rows is good.[nbsp][nbsp]I believe it also helps to index a column used in an ORDER BY clause, but I'm not positive about that.

For example, if there's a query:
SELECT * from table where col_A=val_A order by col_B

I believe it is beneficial to have a table with:
index (col_A, col_B)

pdstein
11-30-2000, 03:17 PM
The aspect of indexing I'm trying to understand is how to optimize for queries which pull data from more than one table.

dank
11-30-2000, 04:42 PM
The aspect of indexing I'm trying to understand is how to optimize for queries which pull data from more than one table. That's the same thing I'm struggling to understand.[nbsp][nbsp]From what I've read, it can make a difference of several thousand times faster (internally), but it's not clear how to actually go about it.

Furthermore, for the project I'm working on, I have about a dozen fields in one table, joined to half a dozen fields in another table, and pretty much any of those fields can be searched on.[nbsp][nbsp]This would make a ridiculously large number of indexes, as I understand the process.[nbsp][nbsp]Are indexes only useful if you know certain columns are most likely to be searched relative to each other?[nbsp][nbsp]I could guess at a couple main ones, but I have a feeling it would need to be an all or nothing solution...

In this example, I actually do need to use SELECT * WHERE..., as every column of each matching row is to be returned.

Dan

PaulKroll
11-30-2000, 06:15 PM
The (some would say THE) book "MySQL" (ISBN: 0735709211) has ample info and is probably the best place to go for MySQL info...

...after you've exhausted the MySQL site, which has a LOT of information on it on getting the most out of MySQL. The only serious problem (and it is serious) is that the docs assume you're using the latest and greatest MySQL, which I believe is either not available at FQ or is only on one server so far. (The latest and greatest is, after all, also "The most likely to burst into flames" so a cautious rollout is to be expected.)

Start at http://www.mysql.com/ and then try, specifically, "How MySQL uses indexes" (http://www.mysql.com/documentation/mysql/commented/manual.php?section=MySQL_indexes) and "Getting Maximum Performance From MySQL" (http://www.mysql.com/documentation/mysql/commented/manual.php?section=Performance)

When the docs refer to the system administrator optimizing how the server is set up, it's safe to assume Terra is aware of the issue and has done the Right Thing for the FQ servers. :)

While benchmarking is mentioned in these docs, please remember that you're on a shared server at FQ: benchmarks are often Denial Of Service attacks from everyone elses perspective, so please benchmark on a local system.

Terra
11-30-2000, 11:20 PM
Just a minor addition:

EXPLAIN SELECT ...

can be your very best friend in determining the complexity/deficiency of your JOIN...

There are many good articles on the net that cover the theory of 'indexing'...[nbsp][nbsp]Another place that I read a lot of SQL theory from are places like Oracle and Sybase, which fills in some empty holes in the MySQL Docs...

Though we run the 3.22.x version, I personally think that MySQL has done a terrific job of noting 3.22.x and 3.23.x differences...

Upgrading 3.22.x --> 3.23.x, for existing MYSQL engines, is not an option right now...[nbsp][nbsp]I will most likely re-evaluate this after the holiday season as now is not the time to toss hand gernades into peoples shopping carts...

--
Terra
--Since 9:35am, I've been trying to peel myself off the roof--
FutureQuest

dank
12-01-2000, 01:44 AM
Impressive.[nbsp][nbsp]Any good sources of indexing knowledge?[nbsp][nbsp]I'm still a bit hazy on the subject and can't find a good concise explanation of what is and is not an index, and what indexes are time savers.

Dan

Justin
12-01-2000, 05:07 PM
In this example, I actually do need to use SELECT * WHERE..., as every column of each matching row is to be returned. Nonetheless, it is still better to name the columns you want. SELECT * requires an additional query (internally, which you never see) to determine what columns are available, before it can actually perform your query. It basically does a DESCRIBE behind the scenes.

As for table joins, *always* index joined columns. My rule on indexing is to index any column that will:

- be used in a WHERE clause
- be used in an ORDER/GROUP BY clause
- be used to join multiple tables

However, the more columns you index, the less benefit you receive from those columns that are indexed. If you index every column, you won't gain much of anything. I also try to avoid indexing text fields, as there is additional overhead there (case insensitive matching, etc), especially on a VARCHAR (or worse a TEXT/BLOB field). If it is a frequently searched field, then by all means index it -- but if it is only rarely used in a WHERE clause (like in an 'Advanced Search' feature), it might be best not to index it.

Unique fields (for example, an AUTO_INCREMENT field that is your PRIMARY KEY) are the best to use for joins, and are indexed automatically. Using the primary key in the WHERE/ORDER/GROUP/JOIN and so on is the most efficient, but of course only works when you know exactly what row you want returned.

Hope this helps.

------------------
Justin Nelson
FutureQuest (http://www.FutureQuest.net/index.php) Support

dank
12-03-2000, 03:15 AM
The (some would say THE) book "MySQL" ... has ample info and is probably the best place to go for MySQL info... I was hoping you wouldn't say that...[nbsp][nbsp]As good as a book as it is (that's a lot of as's), I haven't found it to be terribly enlightening on the subject of indexes.[nbsp][nbsp]Of course, I haven't read it cover to cover by any means, but I've read over the indexes section several times, plus at least some of those sections of the MySQL docs...

My rule on indexing is to index any column that will:
- be used in a WHERE clause
- be used in an ORDER/GROUP BY clause
- be used to join multiple tables
Yikes, I meet all three criteria (ORDER BY is an optional search)!

I also try to avoid indexing text fields, as there is additional overhead there (case insensitive matching, etc), especially on a VARCHAR Hmm, that's what the majority of my fields are (VARCHAR), due to it being mostly text and needing to allow for a variety of field content formats.

Unique fields (for example, an AUTO_INCREMENT field that is your PRIMARY KEY) are the best to use for joins, and are indexed automatically. That's part of what confuses me.[nbsp][nbsp]The examples say that joining tables makes indexes absolutely essential, but it seems that a join will almost always be an index anyway.[nbsp][nbsp]Why would you join tables by something other than the primary keys?

Nonetheless, it is still better to name the columns you want. SELECT * requires an additional query Good to know, thanks.[nbsp][nbsp]I've been wanting to take the next step (in customizability (sp?) ) with this program and pull all the field and search-dependent variables out and define them in an array, and your point about an extra query convinced me to figure out how.[nbsp][nbsp]:)

I set one of the fields of the array to have a yes/no flag for whether or not to build that as one of the "SELECT ___" fields, then assembled them via a while loop.[nbsp][nbsp]Any thoughts on how much more efficient this is?[nbsp][nbsp]I would assume the while loop (through 16 variables at present) is more optimal than an extra database query?

Thanks for the pointers.[nbsp][nbsp]I'll definitely be referring back to this thread and trying to piece together the puzzle.

Dan

jimbo
12-03-2000, 03:24 AM
As good as a book as it is (that's a lot of as's)
That's because it should have been "As good of a book as it is".

;)

-jim

*Following this thread feverishly*

dank
12-03-2000, 03:41 AM
The one time I decide not to edit a post....

doh!
[nbsp]- just be glad I didn't forget the apostrophe.[nbsp][nbsp]:P

PaulKroll
12-03-2000, 05:09 AM
That's part of what confuses me.[nbsp][nbsp]The examples say that joining tables makes indexes absolutely essential, but it seems that a join will almost always be an index anyway.[nbsp][nbsp]Why would you join tables by something other than the primary keys?
A JOIN would usually be done on primary keys. But a table doesn't have to have a primary key: you can make a set of tables with a "CommonID" field to JOIN the tables together, without actually telling MySQL to make indexes. Then, there are circumstances where a field is used in a JOIN and it ISN'T indexed, because you know for a fact that the field wasn't worth indexing for one reason or another.

A large, large part of this is the fundamental design of the database in question: you mentioned a table w/about a dozen fields and another with half a dozen fields, and you have to join them and search on pretty much all of the fields.[nbsp][nbsp]There may be different ways to set up that database, that would lend it to faster querying, but it's impossible to say that for sure without knowing All The Details. (What are the fields, what do they represent, what are the most common queries going to be...)

I'd be interested in seeing the db design if that's something you'd be interested in describing here. (Perfectly understandable if it's not, of course!) :)

PaulKroll
12-03-2000, 05:37 PM
First, I apologize for the length of this message.

I searched for "Paul" in the name field, and got several folks whose first name is Paul, and then... a Noel Paulson. That's not a problem: maybe it's exactly what you want. But it tells me you're searching using "WHERE Name LIKE '%$Name%'".[nbsp][nbsp]If you use LIKE, and have a wildcard (%) at the front, MySQL will have to look at every single entry in the table.[nbsp][nbsp]An index won't help: indexes can only help LIKE if the wildcard does not appear at the very front of the field. (so "WHERE Name LIKE '$Name%'" would benefit from an index). Course, you don't have that field indexed anyway, at this time... but you might want to, because it's really likely that folks would go to the site and check their friends' results, as well as their own, and they'll do that by name. That and event name, but it's no where near as important to index the event name, because the list of events is so small: if you get many events here, it'll be best to index it as well. Keep in mind that the current form is going to give people fits if you have, say, 500 events because the select box you have for that field is going to be a huge, frightening list. :) At that point or somewhere near it, you may want to replace the select box with a text field and let folks enter the name (or first part of the name) of the event.

I'd be looking to make the Name search as efficient as possible, since it's almost certainly going to be the heavily-hit search, and that probably means not letting folks search for "%$Name%".[nbsp][nbsp]That may seem like a big limitation, but even with the data you have entered in there now, searching for "Paul" gets 38 responses. As the site gathers more data, having all those non-leading "Paul" matches will seem less like "Oh, that's good, it returned everything" and more like "Wow, I can't find myself in all this data...." That's IMHO, of course.

You have Age and Sex fields: neither of these would get much of anything from indexing. The Sex field is the least prone to indexing here: since on average, 50% of the table entries would match. An index might actually slow down a search for these.[nbsp][nbsp]Age is a little better, but since the range is still going to be relatively small (humans not living hundreds of years...), it's unlikely that an index would help much.

The Age, Sex, DivPlace, Time, and Pace are all VARCHARs. The DivPlace and Age fields should certainly be numeric and Time should be a Date/Time field: they'd take up less space in the database, and anything that makes the table record size smaller helps. (Age, for instance, could be an unsigned TINYINT, which would take only 1 byte each (and will be fine, unless one of the runners is over 255 years old). The current VARCHAR(3) you're using actually takes up 4 bytes each. (3 for the text and 1 to track the length).

Similarly, the Sex field is taking 7 bytes of space where 1 (as an ENUM column) could do. Using a CHAR here and doing M/F would seem reasonable, but MySQL doesn't let you have VARCHAR and CHAR fields in the same table: it silently converts all CHARs to VARCHARS.

The Time field could be, well, a Time column: the VARCHAR(15) is taking 16 bytes: a Time column would take 3 bytes. The Pace field could be a Time column too, going from 11 bytes to 3. This won't cover fractions of a second: I'd actually suggest going with a TINYINT for the fractions of a second. Really. Why do that instead of just using the VARCHAR? Well, first, because the VARCHAR is such a space eater. Second, because the Time results you're getting back from the database aren't being sorted as they should: do a sort-by-Time from the form. Because &quot;20:57.5&quot; and &quot;1:45:59.9&quot; are being evaluated as strings (VARCHARS) instead of as time (Time column), the 20 minute time gets sorted >AFTER< the 1 hour time.[nbsp][nbsp]The only other solution to this is to make sure your PHP filters the input so that absolutely every entry gets put into the exact same format, so that 20 minute result above would be stored as &quot;00:20:57.5&quot; (you might only need one leading 0 for the hours...). It's also possible that storing the time in a totally different way (that would take only 4 bytes) would be viable (sort of how a TimeStamp column operates) but I'd need to test some things out to be sure and I don't have the time (HA!) right now.

BibNum I'd call as a numeric as well, except I'm not sure about something: are there ever letters in the field? From memory, all races I've watched on TV involve a bib with a number, but no letters, but I'm not exactly deep into the genre and don't know it for sure. If the answer is &quot;99% of races don't have letters as part of their bib numbers&quot; then I'd say go numeric. MEDIUMINT will take 3 bytes, VARCHAR(6), 7.

I don't know enough about the genre to talk about the Division field. :(

Now...Sorting!

Using PHP to do the sorting on data returned by MySQL is like using a dump truck to drive to a work site, put some dirt in your pockets, get back in the truck, and go to the dump site and empty them out. :)[nbsp][nbsp]MySQL can sort your results far better (faster) than PHP can, using ORDER BY. ORDER BY is your friend.

I hope at least some of this helps. The gist of it is: index the Name field, don't let folks search Name with a leading wildcard. That seems offhand like the most safe suggestion. I'd probably also suggest changing the form a bit, so that either Event or Name would be a required part of the search: either one would be useful in cutting down the query so any other options would be working on a smaller result. For example, even though Event isn't indexed right now, it's still a very small number of entries: picking one event and searching by Sex means you've cut down the number of results to be returned to what, a 60th of the total? versus returning half the results if someone searches on Sex alone with no restriction on the search. (And since the Events.ID >IS< indexed, along with Results.EID, MySQL will quickly chop the Results down to 1/29th (or however many events) before it checks the Sex field).

Whew. I sure can ramble. :)
[This message has been edited by PaulKroll (edited 12-03-00@4:41 pm)]

dank
12-04-2000, 01:11 AM
Hi Paul,

Thanks for the interest.[nbsp][nbsp]As the confusion slowly parts and reveals microscopic traces of blue sky, I realize I must correct an earlier statement or two and partially answer my own question in the process:[nbsp][nbsp]I am joining one primary key column and one non-primary key column, not a primary key from each table as earlier indicated.[nbsp][nbsp]I'll explain why in a minute...

I don't mind sharing the database design, as long as you don't mind reading it.[nbsp][nbsp]:)[nbsp][nbsp]Here's the structure:

CREATE TABLE Events (
[nbsp]ID SMALLINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
[nbsp]EventName VARCHAR(100) NOT NULL,
[nbsp]Distance VARCHAR(25),
[nbsp]Finishers MEDIUMINT(6),
[nbsp]Location VARCHAR(100),
[nbsp]Date DATE NOT NULL
);

CREATE TABLE Results (
[nbsp]ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
[nbsp]Place SMALLINT(6),
[nbsp]DivPlace VARCHAR(6),
[nbsp]BibNum VARCHAR(6),
[nbsp]Name VARCHAR(100),
[nbsp]Age VARCHAR(3),
[nbsp]Sex VARCHAR(6),
[nbsp]Division VARCHAR(10),
[nbsp]Time VARCHAR(15),
[nbsp]Pace VARCHAR(10),
[nbsp]EID INT NOT NULL
);

The basic idea is that an event (road races and the like; should make field names pretty self explanatory) is placed in the Events table, and it has a varying amount of results that are loaded into the Results table.[nbsp][nbsp]The results have a column, EID, which joins them to the appropriate Event (WHERE Results.EID=Events.ID).

It might help to see it in use:

http://www.activesalem.com/results.php

(The site won't go &quot;live&quot; until Monday or Tuesday, but I think it's ok to jump the gun...)

Any of the fields can be searched on, although &quot;only&quot; 10 of them have been set up as search options.[nbsp][nbsp]Name, EventName, and Date are probably the most likely to be searched on, with EventName and Date usually tied together (one EventName will apply to multiple dates over time).[nbsp][nbsp]A flag is set in the array for each field, specifying whether or not it is searchable and whether or not to display results for it, and that assembles the query and the output (no more SELECT * ).

I'm also playing with another array field declaring the sort order for the output, then sorting the fields according to that for the search results (so that the html for the table output is automated based on the array setup).[nbsp][nbsp]PHP's numerical sort utilities seem a bit lacking from what I can tell, and I have a feeling the loopy method I've come up with is not the most efficient:


echo (&quot;<tr bgcolor=\&quot;$row_color\&quot;>&quot;);
$i = 0;
$counter = 1;
while ($i <= $numfields) {
[nbsp][nbsp][nbsp][nbsp] if ( $dbfields[$i][7] == $counter ) {
[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]echo (&quot;<td>&quot;. $dbfields[$i][1] .&quot;</td>&quot;);
[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]$counter++;
[nbsp][nbsp][nbsp][nbsp] }
[nbsp][nbsp][nbsp][nbsp] $i++;
[nbsp][nbsp][nbsp][nbsp] if ( ($i == $numfields) &amp;&amp; ($counter < $orderMax) ) { $i = 0; }
}
echo (&quot;</tr>&quot;);

Where $dbfields[$i][7] is the sort order and $dbfields[$i][1] is the output variable (that is also done for the table's first row, with a separate array column for the output).[nbsp][nbsp]$numfields is the number of fields/internal arrays (for lack of a better word; multi-dimensional inner array?), and $orderMax is the number of the last array column/field for the search/sort ouput.[nbsp][nbsp]Anyone cringing?[nbsp][nbsp];)[nbsp][nbsp]

Thanks,
Dan

edit: I should mention that the sort ouptut stuff is not in use on the page posted above, just a test version of it.[nbsp][nbsp]The ol' &quot;one-one thousand, two-one thousand...&quot; method indicates little difference in query time...

[This message has been edited by dank (edited 12-03-00@12:19 pm)]

dank
12-04-2000, 04:06 PM
Hi Paul,

My browser has been cacheing pages on this forum like crazy lately, and I somehow didn't see this thread was updated in the necessitated refreshes of the main page...

First, I apologize for the length of this message. No worries, thanks for taking the time to make it lengthy![nbsp][nbsp]:)

I searched for &quot;Paul&quot; in the name field, and got several folks whose first name is Paul, and then... a Noel Paulson. He's a very good runner...

You are correct that I'm doing LIKE '%$name%' searches for that field.[nbsp][nbsp]I originally set it up for first and last name searches (which would address one of your questions), but it became apparent that, at least for this project, the vast majority of event results would not have the first and last name separated.[nbsp][nbsp]Instead of manually creating two fields prior to importing them into the database, we decided it would be better to leave it as is, which reads well.

Exact name searches might be a good idea for speed and narrowed-down searches, but typos are so common in event registration and results that it might be too restrictive.[nbsp][nbsp]I'm not sure how other people would view it, but I would tend to search by name and event (and probably year) to find something specific, entering my last name to narrow it down quickly.

Keep in mind that the current form is going to give people fits if you have, say, 500 events because the select box you have for that field is going to be a huge, frightening list. True.[nbsp][nbsp]That's why the same event name is used across multiple years (both dropdown boxes are dynamically built from the database content for event names and dates).[nbsp][nbsp]That particular site is for local stuff, and mostly the same events are held each year, so it probably won't grow much in terms of the event search box.[nbsp][nbsp]For something less restrictive in scope, a text search box probably would be preferable.[nbsp][nbsp]I'd like to avoid that, though, as it introduces more guess work (one such event is known both as the Salem Art Fair 5k and the Walk or Run for the Arts -- what would people search for?).

Good points on the indexing of the different fields.[nbsp][nbsp]That helps to understand some of the needs and benefits of indexing.

The Age, Sex, DivPlace, Time, and Pace are all VARCHARs. The DivPlace and Age fields should certainly be numeric and Time should be a Date/Time field I started out with DivPlace and Age being SMALLINT (I think that was the size I used), but I ran into a troubling problem:[nbsp][nbsp]Many event results do not include Divisions, Ages, Age Groups, etc.[nbsp][nbsp]Thus, it has to be left blank.[nbsp][nbsp]With integer fields, a blank default was given a value of &quot;0&quot;, which is not desirable.[nbsp][nbsp]:(

As you can see, the biggest problem with optimizing this project is that maximum flexibility is needed to allow for the variety of formats.[nbsp][nbsp](That's also very likely the reason that, to my knowledge, no one else has put this particular application together before.)

I'm not sure about using a Date/Time field for the times.[nbsp][nbsp]I'll have to see if that would be flexible enough for things ranging from a matter of seconds to several hours, with all combinations of colons and periods.

Similarly, the Sex field is taking 7 bytes of space where 1 (as an ENUM column) could do. Many results use different things, such as W, Women, F, Female, G, Girls (or even lower case) to represent that field.[nbsp][nbsp]I went for the upper limit of such possibilities and did some regex matching to determine what's what.[nbsp][nbsp]Disallowing searches by *only* the sex field was merely a matter of commenting out one line.[nbsp][nbsp]:)

Using a CHAR here and doing M/F would seem reasonable, but MySQL doesn't let you have VARCHAR and CHAR fields in the same table Sigh, I was disappointed when I discovered that...

The Pace field could be a Time column too Would that run into the same non-zero default problem as INT fields?[nbsp][nbsp]Pace values are only included in about half of the results, maybe more.

Second, because the Time results you're getting back from the database aren't being sorted as they should: do a sort-by-Time from the form. Because &quot;20:57.5&quot; and &quot;1:45:59.9&quot; are being evaluated as strings I've run into that problem in a few places, but I think it is true for pretty much any numerical sort.[nbsp][nbsp]Sorting by place can be a bit funky, as 10, sorts between 1 and 2.[nbsp][nbsp]Adding leading zeros to every number isn't the most viable option...[nbsp][nbsp]The Time sorts actually work pretty well for the most part, but you found one of the examples where they break down.[nbsp][nbsp]As long as they are spread across a consistent number of units (assuming a single event is being searched, which may be a poor assumption), and assuming Excel didn't make a mess of them between event results text file importing and loading into the database (auto formatting can be a real pain with time/date values), they work as they should.

BibNum I'd call as a numeric as well, except I'm not sure about something: are there ever letters in the field? I wrestled with the same question.[nbsp][nbsp]There are occasionally blanks (and very infrequently no numbers at all; more common for track-type events, which the program isn't overly suited for) and I think I remember seeing some numbers with letters or possibly even hypens in them, but it certainly isn't common.[nbsp][nbsp]I suppose those could be edited as needed.

Using PHP to do the sorting on data returned by MySQL is like using a dump truck to drive to a work site... Quite the analogy![nbsp][nbsp]Really now, Paul, what are your feelings on sorting in PHP?[nbsp][nbsp];)[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]

The reason I don't want to do the sorting for the HTML output in MySQL is that setting the sort order in the configuration array (PHP) makes it super easy to adjust things on the fly.[nbsp][nbsp]The internal database query uses ORDER BY if specified, but I don't see a way of telling MySQL what the output order is other than by creating a third database table.

I did come up with what looks like a better method for the PHP sorting, although it only works in PHP4:


$i = 0;
$counter = 0;
while ($i <= $numfields) {[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]// sort the output column contents in the specified order
[nbsp][nbsp][nbsp][nbsp] if ( !($dbfields[$i][7] == &quot;&quot;) ) {
[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]$newarray_1[$counter] = $dbfields[$i][7];
[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]$newarray_2[$counter] = $dbfields[$i][1];
[nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp][nbsp]$counter++;
[nbsp][nbsp][nbsp][nbsp] }
[nbsp][nbsp][nbsp][nbsp] $i++;
}
array_multisort ($newarray_1, $newarray_2, SORT_ASC, SORT_NUMERIC);
echo (&quot;<tr bgcolor=\&quot;$row_color\&quot;>&quot;);
$i = 0;
while ($i < $orderMax) {
[nbsp][nbsp][nbsp][nbsp] echo (&quot;<td>&quot;. $newarray_2[$i] .&quot;</td>&quot;);
[nbsp][nbsp][nbsp][nbsp] $i++;
}
echo (&quot;</tr>&quot;);


Anyone know of a way to do something similar that is not version 4 dependent?[nbsp][nbsp]The three methods (the loopy sort in the previous message, the one above, and the manual/non-sorted HTML output) all seem to return results in a very consistent amount of time -- no perceptible difference -- so I'm guessing there isn't much lost there.[nbsp][nbsp]Please correct me if I'm wrong...

Thanks for all the help!
Dan

edit: form != forum

[This message has been edited by dank (edited 12-04-00@5:18 pm)]