PDA

View Full Version : Hierachical Data and challenging MySQL recursion query


Matt
06-30-2005, 07:04 PM
I have a table 'categories' that holds the 'categories_id' row and a 'parent_id' for the parent category This is hierachical data with the topmost, or root, category equaling '0.' Recursive database calls simplify the problem, but for a category 5 or 6 levels deep, this is a lot of overhead to simply retrieve the path. I would like to accomplish the task in a single SQL query. Here is what I have so far:
$path = $db->get_row("SELECT t1.categories_id AS lev1,
t2.categories_id as lev2,
t3.categories_id as lev3,
t4.categories_id as lev4,
t5.categories_id as lev5 FROM categories as t1
LEFT JOIN categories as t2 on t2.parent_id=t1.categories_id
LEFT JOIN categories as t3 on t3.parent_id=t2.categories_id
LEFT JOIN categories as t4 on t4.parent_id=t3.categories_id
LEFT JOIN categories as t5 on t5.parent_id=t4.categories_id
WHERE t5.categories_id=$current_level");

This works in the event that the category is 5 levels deep (useful info obtained from http://conf.phpquebec.com/slides/2005/Dealing_with_hierarchical.ppt). Unfortunately, all categories are not going to have the same depth. Running this query with an additional join (which would cover 6 levels) returns nothing if the category is less than 6 levels deep, so I can't simply assume a maximum depth and use the same query for all categories. I'm also not sure how one would efficiently determine the depth of a given category to determine how many joins are needed (not to mention that this would be an additional database call).

Short of reformatting the database (which is not practical in this particular circumstance), does anyone have an idea of how I might solve this particular problem with a maximum of 2 (1 query to determine depth and another to get complete path) and preferably 1 query?

Thanks in advance, Matt

Terra
06-30-2005, 07:29 PM
I don't believe you can without some sort of table redesign, at least not in one query... To do that, you would need to use MySQL conditionals like IIF() - however since you are joining tables together, the only way to satisfy the 'and' style join is to flip a negative condition to positive... This unfortunately would cause all records between the tables to match...

You may be able to get away with giving your data a 'hint', like adding a new field labeled 'depth'... You would then SELECT for the id to retrieve the depth, then have a case statement that can dynamically build the right statement depth for your final data retrieval step...

It is overall much cheaper to store the depth and run two SELECTS then it is to have multiple inlined conditional test statements that will always be run regardless for all rows that would be tested to satisfy the join... There is also a slight chance that MySQL may see the inlined conditional evaluation and choose to ignore your indexes, which is _never_ good...

--
Terra
--Two proper indexed SELECTS -or- thousands of conditional tests per each SELECT, do the math--
FutureQuest

Stephen
06-30-2005, 09:13 PM
yes, i do this kind of thing with 2 selects as Terra suggests. first i retrieve the full category name (stored in the category record along with the local name) and determine the depth by looking the number of subcategory delimiters contained within the name. then i recursively formulate a query by joining the appropriate number of aliased category tables.

obviously i don't use this sort of query to retrieve the full category name, or path, because i already store it. but there are other category-related quantities that require a similar strategy to retrieve.

Matt
07-01-2005, 01:04 PM
I believe I have a single query solution, but this may fall under the multiple in-line conditionals that you cautioned about, Terra. Here it is (accomodates up to 6 levels, with depth between 2 and 6):

$path = $db->get_row("SELECT t1.categories_id AS lev1,
t2.categories_id as lev2, t3.categories_id as lev3,
t4.categories_id as lev4, t5.categories_id as lev5,
t6.categories_id as lev6 FROM categories as t1
LEFT JOIN categories as t2 on t2.parent_id=t1.categories_id
LEFT JOIN categories as t3 on t3.parent_id=t2.categories_id
LEFT JOIN categories as t4 on t4.parent_id=t3.categories_id
LEFT JOIN categories as t5 on t5.parent_id=t4.categories_id
LEFT JOIN categories as t6 on t6.parent_id=t5.categories_id WHERE t6.categories_id=$current_level OR t5.categories_id=$current_level
OR t4.categories_id=$current_level OR t3.categories_id=$current_level
OR t2.categories_id=$current_level");

In a traditional C program, the evaluation will return when the first condition is met, never evaluating the remaining conditions. Is this not the case with MySQL? Respective to the idea of providing a depth clue within the database, I would need to run a script to calculate and insert the category depth into the database. I do not see how this is any less computationally intensive than retrieving the path without depth clues. It would be worthwhile if the full path is needed on a regular basis, but in this particular circumstance, the full path is needed infrequently (like once every six months).

One additional step I am taking is caching paths, as they are calculated, in memory. This ensures that I never calculate the same path more than once, cutting back on the number of queries being run.

So, I've cut the number of queries by reducing traversal to a single query and am caching results to eliminate duplicate queries. In consideration of how infrequently this process needs to be done, is this "good enough?"

-Matt

Stephen
07-01-2005, 04:38 PM
if it works and you use it infrequently then the query is good enough. if used frequently and with a category table that has lots of records it's not so good (left joins generally result in a lot of rows to be sifted for the matches). this is the reason i store the path, as i use it all the time to provide navigational information.