PDA

View Full Version : Database design of user footprints


marty1101
07-31-2002, 10:23 PM
Hello,

One database table design question that may exist in some books, but I'd like to hear your experience.

A B2C web site wants to record users' browsing habit and shopping experience, so all data about the user are collected into a database, possiblly including IP, time on each page/step, click path ... etc. The data is huge. Data mining might be possible in the future. Current database is mysql. PHP scripts. Two ways to design the database:

1. Each table for each user. A new table will be created upon a new registration.
2. One table logs every action occured

What are the pros and cons of each approach?


Marty

psumner
08-02-2002, 06:43 PM
I would never create a different table for each user. Doing anything with the data would be a monumental task and I'm not sure how many tables MySQL even allows in a single data base.

I think the only reliable way to do this would be with sessions so you can be sure that each user is unique. If you're simply using IP address, you could end up with mutiple users on the same IP address.

You'd have to be much more specific about what data you wanted to collect to get any meaningful feedback. Maybe session number, page identification, duration on the page, previous page, and where he/she clicked through to next in each row?

session table
-------------------
id
session_id
page_id
duration
previous_page_id
next_page_id

page table
---------------
id
page


With this scheme you would know what the entry page was because previous_page_id would be NULL. You would know what the final page visited was if next_page_id was NULL. You'll have a hell of a lot of MySQL activity for this kind of thing though since for each page load you're going to have to connect to the DB and log the event. I would set it up to log to text files and use a cron job to update the data base on a schedule to reduce the load on the DB. Otherwise you'll have the overhead of the DB connection on each page hit and probably piss everybody off by loading the DB server way down if your site's busy.

Most of my pages have banner ads and call a function to display the banner and log the impression count to a text file as well as increment another file for hit count. Seems to work pretty well with a nightly (very late when the server is not busy) cron job to increment the banner impression counts in the DB.

Rich
08-02-2002, 10:37 PM
I would agree with what psummer has said...

A B2C web site wants to record users' browsing habit and shopping experience...

If you mean "visitor's browsing habits" as opposed to "user's browsing habits" then I thing collecting the necessary data is pretty straight-forward. If you really did mean user than you have the monumental task of trying to figure out and define exactly what a "user" is.

The data is huge.
If I "read between the lines" in your post, I assume that even more data may wnat to be collected than the fields you listed and that this information needs to be collected for each visitor. In this case, neither a flat-file method nor database method will be adequate for anything other than a site with very little traffic. If a high-volume of traffic is expected or desired, then the only reliable solution is a dedicated or virtual server where you have control over the apache daemon. Then, you would want to modify (re-write) the apache logging module to collect and log the information needed. This approach provides the most reliable and efficient solution. You should also consider third-party solution providers that provide these services--companies such as web side story or media metrix.