Starting a new open source project...


FancyToy
10-19-2000, 10:17 PM
Hi,

All right, many groups have formed to develop content management systems based on Apache/PHP/MySQL...

I will attempt to create such a group.

My focus will be on designing a flexible system.

There are 2 areas on which I would like to get some feedback:

1) How would you manage sessions?
2) How would you implement a page cache using Apache/PHP/MySQL?

I am not a newbie, so I did my homework.
I know about session support in PHP4.
I know about PHPLIB.
I know about FastTemplate.
I know about a few such content management systems.
I know I must limit the number of database connections per page request.

If you know of any Apache/PHP/MySQL-based content management systems, please point me to them...

I am looking for bright and original ideas... :-)

Regards,

FancyToy

PaulKroll
10-21-2000, 01:23 AM
Midgard, at http://www.midgard-project.org/, and E-Grail at http://www.egrail.org/, are "The Big Ones" as far as I can tell. I've not actually installed either one yet, but I probably will.

I'm very interested in this subject, as we're in desperate need of a good content management system at work. I'm not yet convinced that the current PHP/Apache/MySQL solutions are all that viable compared to the Big Boys (not that the Big Boys are without problems, such as 5 to 6-figure price tags...).

FancyToy
10-21-2000, 01:53 AM
PaulKroll,

The big boy in that field is PortalBuilder, from TIBCO.
http://www.yahoo.com is running it.

I knew about Midgard and eGrail.
Midgard is really good.
eGrail is a monster. They preferred to make it an open source project because they could not sell it.

Mason is also an impressive product.

Now, all of these good products have severe design flaws and critical proprietary components.

I intend to design and build a system that will be entirely open source.

With PHP/Apache/MySQL, efficient implementation of persistent objects is an issue.

In terms of design, I currently face issues with regard to efficient caching and real-time delivery.

I really appreciate your input and interest.

FancyToy

PaulKroll
10-21-2000, 04:19 AM
The big boy in that field is PortalBuilder, from TIBCO.
Well, you SAID "If you know any Apache/PHP/MySQL-based content management systems, please point them to me...", which does not describe PortalBuilder. :) In the Apache/PHP/MySQL arena, Midgard is probably the Big Boy, eGrail is, well, they're pushing hard. (Ads on slashdot, etc.)

Just the fact that you're considering caching issues places you way ahead of the game.

I'm sure I've seen at least a couple of articles with variations on the theme of using a MySQL table as a cache: basically the URL is one field, indexed of course, and there's a date field and a big blob-ish text field for the static page. Maybe a reference count field. For each attempted hit, the cache is checked first, and if the date field is within "acceptable limits" the cached text is sent along to the browser and the date or reference count is bumped. Regularly, older or unreferenced pages are destroyed. This is still a couple of queries per page, aside from any cookie tracking, but then we're assuming the fully-dynamic way would involve many queries (which may not be true depending on how complex the pages are).
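A minimal sketch of that table-as-cache idea follows. It is written in Python with the standard-library `sqlite3` module standing in for MySQL, purely so it runs on its own. The table name `page_cache`, the column names, the `MAX_AGE_SECONDS` limit, and the `build_page` callback are all assumptions made for illustration, not taken from any of the articles mentioned.

```python
import sqlite3
import time

MAX_AGE_SECONDS = 300  # hypothetical "acceptable limit" for a cached page


def open_cache(path=":memory:"):
    """Create the cache table: URL as indexed key, a date, a hit count, the page."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_cache ("
        " url TEXT PRIMARY KEY,"
        " cached_at REAL NOT NULL,"
        " hits INTEGER NOT NULL DEFAULT 0,"
        " body TEXT NOT NULL)"
    )
    return conn


def fetch_page(conn, url, build_page, now=None):
    """Return the cached page if it is fresh enough, otherwise rebuild and store it."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT cached_at, body FROM page_cache WHERE url = ?", (url,)
    ).fetchone()
    if row is not None and now - row[0] <= MAX_AGE_SECONDS:
        # Cache hit: bump the reference count and send the stored text.
        conn.execute("UPDATE page_cache SET hits = hits + 1 WHERE url = ?", (url,))
        conn.commit()
        return row[1]
    # Cache miss or stale entry: do the expensive build, then store the result.
    body = build_page(url)
    conn.execute(
        "INSERT OR REPLACE INTO page_cache (url, cached_at, hits, body)"
        " VALUES (?, ?, 0, ?)",
        (url, now, body),
    )
    conn.commit()
    return body


def purge_stale(conn, now=None):
    """Periodic cleanup: delete entries older than the limit; return how many."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "DELETE FROM page_cache WHERE cached_at < ?", (now - MAX_AGE_SECONDS,)
    )
    conn.commit()
    return cur.rowcount
```

A hit costs two statements (one SELECT, one UPDATE), which matches the "couple of queries per page" estimate above.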

Conversely, I've seen no solutions that actually write static pages out to files as cache (though eGrail writes static content when told, instead of doing db pulls, yes?), perhaps because that's just slightly hard to do when PHP is running under safe_mode, which is the case on most shared servers.

Depending on how the particular implementation of dbm "databases" is done (and we've got two here on FQ, ndbm on some, sleepycat on more recent servers), it might be possible to use THAT as a cache, which might (...might...) be faster than the MySQL server. Some implementations have strict limits on the size of the value that you can associate with a key, so one or both might balk at storing a whole page. (I once dealt with a dbm implementation that limited you to 1K blocks, and had a minimum size per block of 1K. This gets weird really fast...)
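As a rough illustration of the dbm-as-cache idea, here is a sketch using Python's standard-library `dbm` module, which picks whichever dbm backend is available. The function names, the file name `pagecache`, and the trick of storing the timestamp on the first line of the value are assumptions for the example. Whether a given backend accepts values as large as a whole page is exactly the limit discussed above and would need checking per implementation.

```python
import dbm
import os
import time


def open_page_cache(directory):
    """Open (or create) a dbm file in the given directory."""
    return dbm.open(os.path.join(directory, "pagecache"), "c")


def cache_put(db, url, body, now=None):
    """Store the page under its URL, with the store time on the first line."""
    now = time.time() if now is None else now
    db[url.encode("utf-8")] = ("%.3f\n" % now).encode("ascii") + body.encode("utf-8")


def cache_get(db, url, max_age, now=None):
    """Return the cached page, or None if it is missing or older than max_age."""
    now = time.time() if now is None else now
    try:
        raw = db[url.encode("utf-8")]
    except KeyError:
        return None
    # Split only at the first newline; the page body may contain newlines itself.
    stamp, _, body = raw.partition(b"\n")
    if now - float(stamp) > max_age:
        return None
    return body.decode("utf-8")
```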

Course, when the page has to be generated and put into the cache, you still want that to be as quick as possible. I don't believe anyone has benchmarked the various ways of implementing templates (ereg, preg, str_replace, maybe a couple of others). The number of queries per page can be freakishly high under some systems, where every single tiny little part of the page is pulled from a separate table and each table may or may not contain a valid element.
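A benchmark of that kind could be sketched as below. This is Python rather than PHP, so `str.replace` and `re.sub` only stand in for `str_replace` and `preg_replace`. The template, the `{NAME}` placeholder syntax, and the run count are assumptions, and the timings it produces say nothing about how the PHP functions would compare.

```python
import re
import timeit

# Hypothetical template and values, using {NAME}-style placeholders.
TEMPLATE = "<h1>{TITLE}</h1><p>{BODY}</p><small>{FOOTER}</small>"
VALUES = {"TITLE": "News", "BODY": "Hello, world.", "FOOTER": "(c) 2000"}
PLACEHOLDER = re.compile(r"\{([A-Z_]+)\}")


def fill_with_replace(template, values):
    """Plain string substitution, one pass over the template per placeholder."""
    out = template
    for name, value in values.items():
        out = out.replace("{" + name + "}", value)
    return out


def fill_with_regex(template, values):
    """Single regex pass; unknown placeholders are left untouched."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)


def benchmark(runs=10000):
    """Time both approaches on the same template; returns seconds per approach."""
    return {
        "replace": timeit.timeit(lambda: fill_with_replace(TEMPLATE, VALUES), number=runs),
        "regex": timeit.timeit(lambda: fill_with_regex(TEMPLATE, VALUES), number=runs),
    }
```

Calling `benchmark()` returns the two timings. The useful comparison is on templates and placeholder counts close to the real pages, since the relative cost shifts with both.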

Let's not even get into putting PHP code in the MySQL database and running it from there: PHPLIB notwithstanding, that way lies madness. :)

You also mention "Now, all of these good products have severe design flaws..." well, OK, but given how loosely defined the content management field is, that's going to be true of any solution. i.e., no matter what you do, someone else will look at your eventual solution as having severe design flaws. It's the nature of the beast: if everyone is talking about "cars" and the range is from Geo Metros to Indy Racers, some people are going to look at each of those as fundamentally flawed, but they're really there to do different things. We can say "race car" but we really don't have a set number of standard modifiers for "content management system." A CMS optimized for thorough proofing of data by a few editors is going to be a very different beast than a slashdot-like system where anyone can contribute to the chaos.

Now that I've written a book here... geez, I get long-winded late at night. :) I suppose I'm saying, don't expect to solve the world's content management problems because... you won't, and please describe which specific set of CMS problems you DO want to solve, for what situation.
[This message has been edited by PaulKroll (edited 10-21-00@04:30 am)]

FancyToy
10-22-2000, 02:38 PM
Hi,

please describe which specific set of CMS problems you DO want to solve, for what situation
Well, I am still going through design, so I cannot claim to have fully scoped the question. However, I am interested in addressing the following major concerns:

1) Scalability
2) Granularity
3) Modularity
4) Extensibility

eGrail dies as far as "1)" is concerned.
Mason does better with "1)" but not so well with "2)".
Etc.

The context of this effort would be a "portal" receiving between 1 and 5 million hits per day.

"4)" is also becoming increasingly important. I will address the problem this way:


Portal ---> User Interface ---> HTML
                           ---> XML
                           ---> WML
                           ---> Text
                           ---> Etc.
       ---> Agency         ---> User Interface Agent 1
                           ---> User Interface Agent 2
                           ---> Etc.


User interface agents do not deliver the content of the portal to the end user directly. Rather, they are entities such as syndication agents, transformation engines, feed handlers, content aggregators, etc.
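One way to read the diagram is as a single content item fanned out to several output formats and to several agents. The sketch below is only an interpretation in Python: the item layout (`title`, `body`), the renderer names, and the idea of an agent as a plain callable are all assumptions, not part of the design described here.

```python
import html
from xml.sax.saxutils import escape as xml_escape


def render_html(item):
    return "<h1>%s</h1><p>%s</p>" % (html.escape(item["title"]), html.escape(item["body"]))


def render_xml(item):
    return "<item><title>%s</title><body>%s</body></item>" % (
        xml_escape(item["title"]),
        xml_escape(item["body"]),
    )


def render_text(item):
    return "%s\n\n%s" % (item["title"], item["body"])


# User Interface branch: one renderer per output format. Supporting a new
# format (WML, for instance) would mean registering one more function here.
RENDERERS = {"html": render_html, "xml": render_xml, "text": render_text}


def deliver(item, fmt):
    """Render one content item in the requested format."""
    try:
        renderer = RENDERERS[fmt]
    except KeyError:
        raise ValueError("unsupported format: %s" % fmt)
    return renderer(item)


def dispatch_to_agents(item, agents):
    """Agency branch: hand the same item to each agent (syndication, feeds, ...)."""
    return [agent(item) for agent in agents]
```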

In short, I think I understand what you are saying and I must apologize for saying other systems have severe design flaws. This is certainly an exaggeration due to my personal views on what a CMS should be able to do.

Thank you for taking the time to read my rambling.

FancyToy
[This message has been edited by FancyToy (edited 10-22-00@10:34 pm)]

heath
10-25-2000, 07:23 PM
1) Scalability
2) Granularity
3) Modularity
4) Extensibility

eGrail dies as far as "1)" is concerned.
Mason does better with "1)" but not so well with "2)".
Etc.

The context of this effort would be a "portal" receiving between 1 and 5 million hits per day.
Can't all of these be solved by throwing more hardware at the application as traffic increases?

Yahoo, I am sure, is using a hacked-up Apache with their own solutions. One of their developers is a regular on the mysql mailing list (they use mysql on parts of their site).

Some of the best stuff I've seen on templates, etc. is the caching articles started by Jesus C. at phpbuilder, then expanded on by others.

I have found that fasttemplate, et al. don't do well under high traffic loads - the solution provided at phpbuilder may do better since the cache is a static html page, not something in the mysql server.

Also, see freshmeat.org under 'portal'; there are a few projects out there that do what you are talking about.

Also - 'caching' the data in mysql seems like an oxymoron to me. Doing a database hit to retrieve data is the easiest way to develop a web page, but it wouldn't be my choice for 'caching' - in fact it doesn't even meet the definition of caching.

If you were to use mysql, copy all the data that you can to heap tables (faster, but in memory only), otherwise use wget or a perl script to write out the static contents of the pages (yahoo certainly does something like this I would think) every x minutes or whatever.
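The "write out the static pages every x minutes" option could look roughly like the sketch below. It is Python rather than the wget or perl script suggested, and the function names and the `build_page` callback are assumptions. The write goes to a temporary file that is then renamed into place, so a request never reads a half-written page.

```python
import os
import tempfile
import time


def write_static(path, body):
    """Write the page to a temp file in the same directory, then swap it in."""
    directory = os.path.dirname(path) or "."
    # mkstemp creates the file with owner-only permissions; a real deployment
    # would adjust them so the web server can read the result.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(body)
    os.replace(tmp_path, path)


def refresh_if_stale(path, build_page, max_age, now=None):
    """Regenerate the static file when it is missing or older than max_age seconds.

    Returns True if the file was rewritten, False if it was still fresh.
    """
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        age = None  # file does not exist yet
    if age is None or age > max_age:
        write_static(path, build_page())
        return True
    return False
```

Run from cron or a loop, `refresh_if_stale` does the periodic regeneration, while Apache serves the resulting file directly with no PHP or database involved.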

I doubt this helps, but I still wanted to add my thoughts... good luck...
heath

FancyToy
10-25-2000, 11:56 PM
Can't all of these be solved by throwing more hardware at the application as traffic increases?
Money will never make up for poor design.

You may ask IBM. They blew quite a bit of money and hardware during the Olympics... only to get poor results.

Some of the best stuff I've seen on templates, etc. is the caching articles started by Jesus C. at phpbuilder, then expanded on by others.

I have found that fasttemplate, et al. don't do well under high traffic loads - the solution provided at phpbuilder may do better since the cache is a static html page, not something in the mysql server.

Also, see freshmeat.org under 'portal'; there are a few projects out there that do what you are talking about.
Thank you for the tips. I will check that out.

Also - 'caching' the data in mysql seems like an oxymoron to me. Doing a database hit to retrieve data is the easiest way to develop a web page, but it wouldn't be my choice for 'caching' - in fact it doesn't even meet the definition of caching.

If you were to use mysql, copy all the data that you can to heap tables (faster, but in memory only), otherwise use wget or a perl script to write out the static contents of the pages (yahoo certainly does something like this I would think) every x minutes or whatever.
I won't argue. I have no intention of using MySQL for caching.
Instead, I am investigating another scheme for that.
I will post more information on this board when I get to the point where feedback is necessary.

FancyToy

PaulKroll
10-26-2000, 12:19 PM
heath wrote:
Doing a database hit to retrieve data is the easiest way to develop a web page, but it wouldn't be my choice for 'caching' - in fact it doesn't even meet the definition of caching.
FancyToy wrote:
I won't argue.
OK, well, I will. :)

Caching pages in mysql may or may not be efficient in a given case, especially compared to saving the page as a file, but any time you save a result that takes a lot of computation in a way that can be recalled with little computation, you're caching. Definition-wise, caching covers a whole lotta ground.
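That broad definition fits in a few lines. The sketch below uses Python's `functools.lru_cache` to keep the result of an expensive call in memory. The `render_page` function and the `RENDER_CALLS` list are hypothetical and exist only to show that the second request does no work.

```python
from functools import lru_cache

RENDER_CALLS = []  # records every time the expensive work actually runs


@lru_cache(maxsize=128)
def render_page(url):
    """Stand-in for an expensive page build (queries, templating, ...)."""
    RENDER_CALLS.append(url)
    return "<html><body>Content for %s</body></html>" % url
```

Calling `render_page("/news")` twice runs the body once. Where the saved result lives (memory, a file, a dbm file, a MySQL row) changes the cost of recalling it, not whether it counts as caching.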

"All programming is an exercise in caching."
-Terje Mathisen