Regexp wizards, start your engines...


Jacob Stetser
06-17-1999, 04:38 PM
Ok, I'm getting better at regular expressions, but I'm not the best yet...

I'm trying to write a module for my personal home page (heh, that's the original full name of PHP!) that tells me what today's active topics are.

Basically, it calls up the page referenced by the Today's Active Topics link at the top of the page and regexps the links out, and then I want it to make the links clickable. Here's the code:


<?php
$hostname      = "www.aota.net";
$port          = 80;
$uri           = "/cgi-bin/search.cgi?action=simplesearch&SearchDate=0&ForumChoice=ALL";
$header_passed = 0;
$link_prefix   = "&nbsp;&nbsp;o ";

    $fpread = fsockopen("$hostname", $port, &$errno, &$errstr);
    if(!$fpread) {
        echo "$errstr ($errno)<br>\n";
    } else {
        fputs($fpread,"GET $uri HTTP/1.1\nHOST: $hostname\n\n");
        while(!feof($fpread)) {
            $line=fgets($fpread, 255);
            // loop while there are links in the line
            while(ereg("HREF=\"([^\"]*)\">([^</A>]*)</A>", $line, $match)) {
                if($match[1] != "search.cgi?action=intro" OR $match[1] != "Ultimate.cgi?action=intro&BypassCookie=true") {
                    $ao_content.=$link_prefix."<A HREF=\"".$match[1]."\">$match[2]</a><br>\n";
                }
                $replace=ereg_replace("\?", "\?", $match[0]);
                $line=ereg_replace($replace, "", $line);
            }
        }
    }
    fclose($fpread);
print($ao_content);
?>
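
A side note on the code above, offered as a sketch rather than a tested fix. Two details look like they would keep it from doing what is intended. First, `[^</A>]*` is a character class, so it stops at the first `<`, `/`, `A` or `>` rather than at the string `</A>`; a link whose text contains a capital A never matches. Second, `$match[1] != X OR $match[1] != Y` is true for every value, so nothing gets filtered; `AND` is what excludes both links. The snippet uses `preg_match` because the `ereg` family was removed in PHP 7.0. The sample line and the two `passes_*` helper names are made up for illustration.

```php
<?php
// Sample line, invented for illustration (not copied from the real page).
$line = '<A HREF="http://www.example.com/topic.html">Ask a Wizard</A>';

// Character-class version: [^</A>]* cannot consume the "A" in "Ask",
// so the pattern fails before it ever reaches </A>.
$classMatched = preg_match('~HREF="([^"]*)">([^</A>]*)</A>~', $line, $m1);

// Stopping only at '<' captures the whole link text.
$textMatched = preg_match('~HREF="([^"]*)">([^<]*)</A>~', $line, $m2);

// Hypothetical helper mirroring the original condition: always true.
function passes_or_filter(string $href): bool {
    return $href != 'search.cgi?action=intro'
        || $href != 'Ultimate.cgi?action=intro&BypassCookie=true';
}

// Hypothetical helper with AND: false for either navigation link.
function passes_and_filter(string $href): bool {
    return $href != 'search.cgi?action=intro'
        && $href != 'Ultimate.cgi?action=intro&BypassCookie=true';
}
```

With this sample, the first pattern finds nothing and the second captures `Ask a Wizard`; `passes_or_filter('search.cgi?action=intro')` still returns true, while the AND version returns false.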


The Today's Active Topics page is at http://www.aota.net/cgi-bin/search.cgi?action=simplesearch&SearchDate=0&ForumChoice=ALL
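
On the fetching side, a hedged sketch. HTTP header lines are terminated by CRLF (`\r\n`), and an HTTP/1.1 server may keep the connection open or send a chunked body, either of which can trip up a plain `feof()`/`fgets()` loop. Asking for HTTP/1.0 with `Connection: close` sidesteps both. `$header_passed` in the code above is set but never used, so skipping the response headers was presumably intended. `build_get_request` and `strip_headers` are hypothetical helper names, not PHP built-ins.

```php
<?php
// Hypothetical helper: builds the request text that would be written with fputs().
function build_get_request(string $host, string $uri): string {
    return "GET $uri HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "Connection: close\r\n"
         . "\r\n";
}

// Hypothetical helper: returns the body of a raw response,
// i.e. everything after the first blank line (empty string if there is none).
function strip_headers(string $response): string {
    $parts = explode("\r\n\r\n", $response, 2);
    return $parts[1] ?? '';
}
```

The idea would be to read the whole response into one string, pass it through `strip_headers()`, and only then run the link regexp over the body.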

Now, can anyone tell me the correct regexp to weasel out only the Topic titles and links?

Pleeeease?
------------------
icongarden.com/?fq (http://icongarden.com/?fq)
icongarden: making good ideas grow.

Justin
06-17-1999, 05:12 PM
<?

# Path to the UBB non-cgi directory
$UBB = "http://www.aota.net/ubb";

# URL for search
$SearchURL = "http://www.aota.net/cgi-bin/search.cgi?action=simplesearch&SearchDate=0&ForumChoice=ALL";

# First, get the page into a string
$page = join(file($SearchURL), "");

# Pull all hyperlinks from the page looking like a thread
while (eregi("href=\"($UBB/Forum./HTML/([0-9]{6})\.html)\">([^/<]+)</a>", $page, $junk)) {

    # Delete the link from the page (and any dupes)
    $page = eregi_replace ($junk[1], "", $page);
    print "<li><a href=\"$junk[1]\">$junk[3]</a></li>\n";
}

?>
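
For anyone reading this on a current PHP version: `eregi` and `eregi_replace` were removed in PHP 7.0, so the loop above no longer runs as written. A minimal sketch of the same idea with `preg_match_all` follows. It is an adaptation, not the tested code from this post. `extract_topics` is a hypothetical helper name, the sample HTML is invented, and the title class is loosened from `[^/<]+` to `[^<]+` so that titles containing a slash would also match.

```php
<?php
// Hypothetical helper: returns [url => title] for every link that looks like a UBB thread.
function extract_topics(string $html, string $ubb): array {
    // preg_quote() escapes the dots (and the delimiter) in the base URL.
    $pattern = '~href="(' . preg_quote($ubb, '~')
             . '/Forum./HTML/[0-9]{6}\.html)">([^<]+)</a>~i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $topics = [];
    foreach ($matches as $m) {
        $topics[$m[1]] = $m[2];   // keying by URL drops duplicate links
    }
    return $topics;
}

// Invented sample input: one thread link twice, plus a navigation link.
$sample = '<a href="http://www.aota.net/ubb/Forum1/HTML/000123.html">First topic</a>'
        . '<A HREF="http://www.aota.net/ubb/Forum1/HTML/000123.html">First topic</A>'
        . '<a href="http://www.aota.net/cgi-bin/search.cgi?action=intro">search</a>';

$topics = extract_topics($sample, 'http://www.aota.net/ubb');
foreach ($topics as $url => $title) {
    echo '<li><a href="' . htmlspecialchars($url) . '">'
       . htmlspecialchars($title) . "</a></li>\n";
}
```

Collecting every match in one pass removes the need to delete each link from the page before searching again, which is what the `eregi_replace` call was doing.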


HTH

------------------
Justin Nelson
FutureQuest Support

Jacob Stetser
06-17-1999, 05:17 PM
You make it all look so easy, Justin ;)

Thanks a bunch!

Justin
06-17-1999, 05:24 PM
PS - the above HAS been tested exactly as it is and does in fact work :)


usbnuts
01-04-2001, 02:42 AM
What does this thing do?