Cloaking for SEs...


auteur
02-25-1999, 12:06 PM
Well, I've been searching and researching this topic for myself and for work...so I thought I'd share a bit of my findings with FQ. :)

It has been determined that doorway pages are definitely the way to go to optimize your visibility on the search engines. But there are a few cons to doorway pages and to listing high on the internet to consider. The con of doorway pages is that your visitors have another click to get to your site, and the con of listing high [yes, I did say there's a con for listing high] is that everyone wants your position and is willing to copy your code and use it to get there. This then boots your site out of the engine's database entirely, because the identical page creates an appearance of intentional spamming, so the engine gets rid of one page.

What to do?

Well, cloaking solves both problems. You create the doorway pages and when the search engine visits your site, it sees the doorway page created specifically for it. When visitors click on your link in the search engine they go directly to your main page, not your doorway. AND possible thieves will only see the code for your main page and therefore have no access to the code that gets you on top.

Downside to cloaking: cost. I've found the range to be $800 - $1,000.

I don't own this software right now, due to cost. I don't resell any cloaking programs. I've just been researching it, and because I have been finding more and more information on it, I thought I'd share the info with you. I realize the places I hang out to get information aren't always where everyone else goes :) but...I'm positive you will be hearing about this sooner or later.

------------------
Elizabeth M. Miller
Getting You the Attention You Deserve!
www.123marketing.com (http://www.123marketing.com)

Justin
02-25-1999, 12:34 PM
Isn't that basically a page / script that redirects you right to the index page? If so, that wouldn't be hard to create. But there is one thing: Altavista (and others), I think, don't like being redirected. However, a script could easily just include the index page...

I was thinking (again). What if you had a large number of *.shtml pages, with only one line:

<!--#include file="index.html" -->

So basically they're replicas, with less work, and there's no redirecting. To make them each different, put meta tags with different keywords and descriptions in each one.

Then submit them each to the engines. They'd never know, plus you'd have that many more links to the rest of your site to spider. No one can copy and paste an SSI from your page, either.

Ok, it's just an idea. I know these things are considered spamming, but if done carefully enough (I'm sure the $800 script is slick) it could be quite effective.

Justin

Justin
02-25-1999, 12:37 PM
Just to add, a while back when I was on Geocities, I used a redirect service. http://vdjstudio.cjb.net redirected to http://www.geocities.com/SiliconValley/Station/6751 - a lot easier to remember (not to mention type).

However, when I submitted it to AV, it instantly indexed the page, and it said that it had indexed the Geocities one, not the redirect. I think it knew...

auteur
02-25-1999, 12:45 PM
Justin... NOPE! Not a redirect at all. It's all behind the scenes. The visitors are never redirected, they go directly to the main page right from the search engine. It's an IP or agent watch for when the search engines hit your server/logs ... the robot will see the doorway created for that specific engine and will then be indexed. I have a meeting to shoot to right now. When I get back I'll scrounge up some URLs for you to check out. :)

auteur
02-25-1999, 02:12 PM
Okay, I'm outta the meeting and at home for lunch...busy day...it's one of those what can go wrong does but what can go right does too...---eeewwww--- :)

Here are some URLs to check out for more info:

Beyond Engineering CGI Food Script System
www.bey.com/ipdelivery/ (http://www.bey.com/ipdelivery/)

Search Engine Cloak
www.position-it.com/secloak/ (http://www.position-it.com/secloak/)
***This one has a trial download.

Swrac Feeder
www.make-it-online.com/products.htm (http://www.make-it-online.com/products.htm)

Stealth Meta Tag Script
www.outrank.com/stealth.shtml (http://www.outrank.com/stealth.shtml)

I've been searching for a cheaper way to do this, but haven't found anything. I even tried a barter offer of promotion for product...no can do. It's not priced for smaller sites to purchase. But of course if you become a reseller, then you can sell at a discount after purchasing the initial product. Frustrating, but that's what happens when big business takes over the net. :)



Justin
02-26-1999, 01:35 AM
Ok, in other words, it's a specially made page just for them - I think. Sort of like protecting your images by requiring a certain client domain and / or referer - and I suppose you could tailor one for each engine. I like that, but it is way too expensive. Isn't there a more affordable alternative that's almost as good? :)

Justin

Jacob Stetser
02-26-1999, 01:42 AM
Downside is right!

I've researched a few of these programs and the least expensive, but also least feature-laden, costs $99.

I wish there were a powerful freeware/shareware Cloaking program. Anyone want to help program one?

Ditto on a powerful advertising server. Those get real expensive too when you get up to the extended features.

Sigh :(

auteur
03-06-1999, 05:16 PM
There are two ways to do this: agent name or IP. I don't think the index has to be a cgi, though. I've not gone further than what I had before. If you're a member of searchenginewatch.com then head here for some insight: www.searchenginewatch.com/subscribers/delivery.html (http://www.searchenginewatch.com/subscribers/delivery.html)

If not, here's an excerpt:

The drawback to agent name delivery for many people is the fact that it is possible for someone to trick your server into believing they are a search engine spider. This means that your code is not 100 percent safe from prying eyes. However, agent name delivery is a simple and easy way for many people to get closer to the goal of code security.

IP delivery is much harder than agent delivery. It will likely involve some custom programming to rig your computer to deliver pages based on an IP address. Moreover, IP addresses can often change. You can expect to be monitoring search engines on a regular basis to find any changes, if you decide to use IP delivery.

Hope this helps. :) I'd love to have a group effort on this. Oh, and the IP is the way to hide the code. With the amount of code stealing going on, I'd want to have a way to hide the source from visitors. :)

Justin
03-07-1999, 12:14 AM
I have to wonder how much of a difference it can really make to have agent specific pages. I'm not sure it would really generate that many more hits, considering the amount of work involved... am I just that new?? :) It just seems like that is a lot of work to trick the search engines.

I would imagine, if I had $800 - $1000 to spend on promotion for my site, I'd think that would be better spent on getting ads put up or site design - not a cgi. This is only my opinion, of course, and I'm not knocking the idea. I like the idea of building your own cgi to accomplish this, I just don't see spending that much money on something like that...

A lot of people would consider that spamming - I'm not sure where I stand on that right now, but it's certainly not the most honest way to generate traffic. I don't mean this to be against anyone, again, just my opinion, but if I had a thousand bucks to spend just to generate more traffic, I'd use it to buy a lot of coffee and just work on the site itself, and spend more time submitting to the search engines manually :)

I haven't gotten into Yahoo yet, but otherwise I've been satisfied with the SE's. AltaVista brings me quite a few hits, and the others have me listed. I guess I'm just not too worried about getting a million hits a day - with this particular site anyway. I make my money from downloads of my program, not hits to my site, and most of my downloads come from Download.com, etc, which has done me better than any SE could. My ad banners are just an extra buck or two every now and then :) I guess if that was my main reason for having a site, I'd be more interested in the SE's (and I would hope I'd do a better job designing the site, too :) )

On a side note, Windows keeps changing the registry back. I keep altering it and it puts it back :( It has a mind of its own...

------------------
Justin Nelson, SFE Software
www.vdj.net (http://www.vdj.net)

[This message has been edited by Justin (edited 03-06-99).]

jenili
03-07-1999, 01:06 AM
auteur wrote: "The visitors are never redirected, they go directly to the main page right from the search engine. It's an IP or agent watch for when the search engines hit your server/logs ... the robot will see the doorway created for that specific engine and will then be indexed."


A good idea. Folks, in its base form this is *very* simple to do. Just make your home page a CGI application (name it index.cgi and add a line to your .htaccess file to use index.cgi as a home page name, oh, and another line to enable CGI as a file type in that directory -- caution, this has some implications.) The CGI application, in a really simple form, would be something like this.
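----

#!/usr/bin/perl
use strict;
use warnings;

# User-agent string sent by the client (empty if the header is missing).
my $ua = $ENV{'HTTP_USER_AGENT'} || '';
my $file = '';

# Placeholder: substitute the robot's actual user-agent string.
if ($ua eq 'Yahoo user-agent string')
{$file = "yahoopage.html";}

# (repeat above for each user-agent)

# Everyone else gets the regular home page.
if ($file eq "")
{$file = "index.html";}

# A CGI has to print a header before the page body.
print "Content-type: text/html\n\n";
open (UAPAGE, '<', $file) or die "Can't open $file: $!";
my @page = <UAPAGE>;
close (UAPAGE);
print @page;

----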

So what are the user-agent strings for Yahoo and all the rest? See
http://info.webcrawler.com/mak/projects/robots/active.html

'Course, if you want to do this for a *lot* of user-agents, you're going to get sick of all those if statements. You'll want to create a hash table instead, with the user-agent string and the customized page to deliver.
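
A minimal sketch of that hash-table idea, written in Python rather than the Perl above. The user-agent strings and page names are invented for illustration; real ones would have to come from a robots list such as WebCrawler's. It does an exact match on the agent string and falls back to the normal home page.

```python
# Hypothetical user-agent strings and page names, for illustration only.
AGENT_PAGES = {
    "ExampleBot/1.0": "examplebot.html",
    "SampleSpider/2.1": "samplespider.html",
}

# Page served to every client that is not in the table.
DEFAULT_PAGE = "index.html"


def page_for_agent(user_agent):
    """Return the file to serve for an exact user-agent match."""
    # A missing header arrives as None; treat it like an unknown agent.
    return AGENT_PAGES.get(user_agent or "", DEFAULT_PAGE)
```

Adding an engine then means adding one table entry instead of another if statement.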

Then you'll get tired of editing the hash table directly, so you'll write yourself a little Web interface to edit it for you.

Then you'll get tired of typing in all those user-agent strings by hand, so you'll create a table from WebCrawler with *all* the user-agent strings (see robot.name, robot.user-agent, and robot.status in their machine-readable text file), and rewrite your Web interface to let you pick from their names when you indicate the page for 'em. You might even want to write a little weekly cron job (can we cron on FQ?) that goes out and adds new entries from WebCrawler's file to your table.

Then you'll want to do it for more than one index page, so you'll add another field to your hash table for file or directory names, modify your CGI to check the URL requested, and either include the CGI in all those pages with SSI or just change each one of those pages to a copy of the CGI (I vote for the former, which would be much cleaner, but you may not have privs to do that on your Web host).
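
One way that extra field could look, again as a hedged Python sketch: the table is keyed first by the requested path and then by user-agent. All paths, agent strings, and file names here are hypothetical.

```python
# Hypothetical table: requested path -> {user-agent -> customized page}.
CLOAKED_PAGES = {
    "/": {"ExampleBot/1.0": "home-examplebot.html"},
    "/products/": {"ExampleBot/1.0": "products-examplebot.html"},
}

# Hypothetical regular page for each path, shown to ordinary visitors.
REGULAR_PAGES = {
    "/": "index.html",
    "/products/": "products/index.html",
}


def page_for_request(path, user_agent):
    """Pick a file from the requested path and the user-agent.

    Returns None when the path is not in either table.
    """
    per_agent = CLOAKED_PAGES.get(path, {})
    return per_agent.get(user_agent or "", REGULAR_PAGES.get(path))
```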

Then you'll want to play around with a Web interface for creating the customized pages for each search engine. It might take keywords and description and put them into the appropriate format for each page. Maybe you'll want to run a cron job that touches all those pages too, even moving stuff around in them to keep them fresh. This would be the trickiest part, figuring out how to generate a desirable page for each engine you're interested in, and it would probably be less effective than creating your own highly tweaked pages for the biggies. But it might save you some time, so you might do it. I know there are lots of URLs on this topic; here's one I know of:
http://www.searchenginewatch.com/
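
As a rough illustration of the "take keywords and description and put them into a page" step, here is a small Python sketch. It emits one generic head section; the per-engine formats mentioned above are not known here, so the function name and layout are assumptions.

```python
from html import escape


def build_doorway_head(title, description, keywords):
    """Return an HTML <head> with title, description and keywords.

    Assumption: a single generic layout, not tuned to any engine.
    """
    keyword_list = ", ".join(keywords)
    return (
        "<head>\n"
        f"<title>{escape(title)}</title>\n"
        f'<meta name="description" content="{escape(description, quote=True)}">\n'
        f'<meta name="keywords" content="{escape(keyword_list, quote=True)}">\n'
        "</head>"
    )
```

Escaping the values keeps a stray quote or ampersand in the text from breaking the tags.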

Then you'll want to combine your two Web interfaces so you can just enter the keywords/descriptions/text and check off the search engines you want to cater to, and it'll do all the rest for you (creating the custom pages and editing the hash table).

By this time, you'll have some other ideas about what else you want to do, especially if you've dowloaded trials of any of the commercial cloaking products and wished for some of their features.

And then you'll have one of those $800-1000 packages (or better), and you'll start to think about publishing it in an open-source venue or marketing it at a reasonable price. But hopefully you'll share with the FQ community first (and maybe hand it over to Deb and Andrew, who could even include it as a CNC feature like the counters and sendmail forms if they chose to).

If any FQ-using folks are interested in collaborating on a tool like this, I am too. My time's really limited right now but should free up a bit in May or June. My biggest concern about releasing it would be people using it to dishonest ends. I like Cloakbot's license agreement that says your license ends immediately if you use it to spamdex, but really, how do they enforce something like that?

OBTW, the claim that nobody else can steal your position-grabbing code when you use this technique is total BS if you're using user-agent strings. All they have to do is change their user-agent string and hit your site. On Windows, that means editing the registry. It's easy, though I grant fewer people will be able to do it than would otherwise be able to snarf your code. If you're using IPs instead of user-agent strings, you have a better chance of not getting snarfed. Not impossible to snarf you, but much more difficult and probably not worth the effort for someone who can pull it off.
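
To show how little it takes to claim a different user-agent, here is a short Python sketch (an assumption on my part; the registry edit above is the route described in this thread). It only builds the request and does not send it. The URL and the agent string are placeholders.

```python
import urllib.request


def build_spoofed_request(url, user_agent):
    """Build (but do not send) a request that claims to be `user_agent`."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL and agent string, for illustration only.
req = build_spoofed_request("http://www.example.com/", "ExampleBot/1.0")
```

Passing `req` to `urllib.request.urlopen` would fetch whatever page the server hands to that agent, which is why agent-name delivery alone does not protect the code.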

jeni

gwlubin
03-07-1999, 01:39 AM
dear elizabeth,

thanks for raising this subject and for the valuable info.

it's too much for me to take in right now, but i have a feeling it'll come in very handy in the future.

g

Justin
03-07-1999, 01:44 AM
Wow - impressive! But I want to add that I've heard some search engines stop on the word CGI - and they have many triggers to find whether a page is static or dynamic. So you'd have to call the page something like .chtml or something, setting up an AddHandler or whatever...

I'm not sure if all engines do this, but I have heard that somewhere before - or does the page come up to the user with a different name? Or can you make that happen?

PS - thanks for the User Agent Registry tip :) I'm now surfing as:

Godzilla/4.0 (compatible; M$IE 4.01; Windoze 98)

hehe - too cool :D


Jacob Stetser
03-07-1999, 03:09 PM
I'd be interested in helping out in any way I can.. I have a membership at Searchenginewatch.com so I can get the IPs and user-agent stuff, I believe.

Justin, this sort of thing is really only spamming if you use it to hide the regular old spamdexing techniques, which the search engines get you for anyhow. The reason some people consider it spamming is because they can't see your meta tags (and copy them ;) ) so they automatically assume you're doing something unfair, especially if you're high in the rankings.

As for blocking cgi, using the .htaccess to make index.cgi your default page goes around that.. because the search engine only sees http://www.yourdomain.com/ instead of http://www.yourdomain.com/index.cgi.

There are other ways of working around that limitation - why not write a PHP based cloaking script? It would use far less of the system resources, isn't blocked by search engines (as long as things like ? don't appear in the URL), and would probably be easier to integrate into FQ :)

A final thought, I think the best way to go about this is to keep the script separate from pages and call via SSI or PHP includes.

I'd love to be a part of this.. oh, Auteur, btw, about that Links script, I've actually done a bit of work on it, but I didn't hear back from you last week. If no-one else has already finished it for you, let me know and I'll show you what I've done.


Jake

Justin
03-07-1999, 03:39 PM
Well, I'm not saying that it's necessarily wrong. I am thinking of a less elaborate method myself. I never thought about php though - how simple. Just a simple if-else statement, outputting the proper HTML / meta tags, etc.

I was only thinking that it sounds a lot like when a food critic is on his way to your restaurant, you make sure the place is clean, and that his/her favorite dish is being served tonight.... you know? Not wrong, really, just a tad misleading.

But I also realize that all of the other restaurants are doing the same and more, like making sure his/her favorite table is available, his/her favorite music is playing on the jukebox, etc, and if my restaurant doesn't measure up, I won't get a good review :)

All in all, the more I think about it, the more I like it :)


auteur
03-07-1999, 03:42 PM
I still don't think the index page has to necessarily be a cgi. But, since I'm not an expert I'm not going to pour these words in a concrete sign. :)

It is not spamming if the same rules as for doorway pages are followed: it must pertain to the content, don't keyword-stuff hidden text, etc. Oh, and no stealing someone else's code. :)

Jake,
I was going to give it the college try yesterday but there was a thread in the server forum on troubles that Benson was having with the script. So, he said the owner of the script was going to update it and it'd be no good to install the script until the update is in place. He also said he'd do it for me too. Thanks for your help. I thought I'd emailed you and I hadn't heard back so I figured you must have been busy. Which I completely understand. :)

So it's on hold for now.

As for the cloaking, instead of guessing: one of the places from one of my posts above has a trial download. Being new to cgi, it does me no good to try to figure it out, but I'm sure the program itself will give more information about how it works. :)



auteur
03-07-1999, 03:48 PM
Hey Justin, we're posting at the same time. I meant to respond to something you've said earlier...

As to the search engine game and its relevancy: I guess I'm a little biased since I am paid to be specialized in the visibility of the CBS SportsLine site on the engines. But>>>

Research shows that 53% of internet searchers polled will find a website via search engines over any other vehicle, such as other sites, banners, etc...

It's extremely important to anyone who has a low to no budget for internet marketing to get as high as possible, since they can't make the big deals that we do at work with the big websites for impressions, partnership programs and backend bartering.

And if you have a free content only site with banner ads that support the site, it's extremely important to keep the unique visitor count up to get quality advertisers paying to be located on your site.

BUT ... like I said, I am a little biased on the subject. :) I live for internet promotion.. heeheehee

jenili
03-07-1999, 05:23 PM
Justin wrote: "So you'd have to call the page something like .chtml or something, setting up an AddHandler or whatever..."
auteur wrote: "I still don't think the index page has to necessarily be a cgi."


No, in our FQ environment, it certainly doesn't. First line of my .htaccess file:
AddHandler server-parsed .html

which means, treat all HTML documents as if they were .shtml. I just tested, and on FQ you can SSI a CGI no problem. So you're home free as far as that goes. The search engine would think it was dealing with a flat HTML file. If that didn't work for you, and if you were going full-out with this stuff, you might do something nutty in .htaccess like
Action search-spoofer secloak.cgi
AddHandler search-spoofer .html

which would make secloak.cgi handle all .html documents in your space (well, I haven't tested that, but that's what the Apache dox seem to say).

Elizabeth, I don't see anyone here guessing about how it's done -- just some discussion about what would be the best way for us to do it in our environment. There's always more than one way to do stuff in Web applications, and "how" is the most fun to think about.... :D

WRT IP address vs. user-agent, yeah, that would be a lot harder to fake (I didn't say impossible, but it's not something *I* can do without some extensive research :) ). Unfortunately, I imagine biggies like Yahoo could have a *lot* of totally distinct IP addresses, and I also imagine they could change for load-balancing or other purposes. If we could find a good, exhaustive, oft-refreshed list of SE IPs out there, IP filtering would be easy enough -- just check REMOTE_ADDR instead of HTTP_USER_AGENT.
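
A rough Python sketch of that REMOTE_ADDR check, assuming you already have a list of crawler networks. The ranges below are reserved documentation addresses standing in for real search-engine IPs, which this thread does not supply.

```python
import ipaddress

# Hypothetical crawler networks. These are reserved documentation
# ranges, not real search-engine addresses.
CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.16/28"),
]


def is_known_crawler(remote_addr):
    """Return True if remote_addr falls inside a listed network."""
    try:
        addr = ipaddress.ip_address(remote_addr or "")
    except ValueError:
        # Missing or malformed address: treat as an ordinary visitor.
        return False
    return any(addr in net for net in CRAWLER_NETWORKS)
```

Matching on networks rather than single addresses copes a little better with an engine that crawls from many machines, though the list still has to be kept fresh.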

I think there are some other good reasons to do something like this, besides search engine positioning. One example: On our latest redesign project, my team was actually sweating to shave 100ths of a K (as in, yes, 10s of bytes) off of image sizes to scrounge download time. At that point, the savings of not including metadata in the document itself, but including it in a search-engine-specific page, could become very real (to the tune of 1/2 K or more). Not to mention that our local copy of Ultraseek is tuned differently from Infoseek itself, and we might want to include some special metadata for *our* SE that we don't want other SEs to see (e.g., titles that don't list our organization's name). We may handle content differently once you're there than we would want to handle it to an external SE.

And some SEs seem to consider very legitimate design techniques (e.g., small fonts) spamdexing. It'd be nice to be able to deliver a page optimized for a particular SE's algorithm without compromising the for-people design of the page, assuming of course that you do so honestly and ethically and all that jazz.

I think this would be fun. I can bring CGI background to the table, and the algorithm is pretty clear to me. If anyone moves past talking and into doing over the next couple of months... count me in, as long as it's an altruistic type of project (read: no conflicts over marketing and intellectual property, and we give the results gratis to our tireless hosts). :p

Justin, on your UA getting rewritten (I assume every time you run IE), it does that to me too. If you want to change it permanently I think you have to edit the IE executable with debug (a bit scary) -- used to have a URL on doing this -- will look for it next week.

jeni

auteur
03-08-1999, 08:49 AM
Jenili, welcome to the discussion.. :)

I'm not a techie so my confusion on cgi may get in the way every now and then. I'd love to get beyond the discussion and put this program into action.

I'll add my search engine knowledge for the doorway pages that have to be created.

I'm already in the process of finding the right template for each engine for work.

BTW, this technique --as with the doorway pages-- shouldn't be used for Yahoo! as it is not an engine but a directory. This wouldn't help in any way with them since humans visit the sites and manually add them to their directory. A site's visibility on Yahoo! is completely based on the verbiage used in the initial submission process.


mopacfan
07-03-1999, 05:41 PM
Between IPush and Cloakbot, which one do you think is the better product? They are both $150. The Cloakbot website is much less professionally designed, but the features are better represented. IPush has a nice, but spartan website. The user interface is unattractive. I'd like to get everyone's thoughts.

Sincerely,

Michael

Rich
07-03-1999, 07:20 PM
Just a word of caution:

Search engines recognize doorway pages as an ethical way to index your site IF the doorway pages are relevant and are not just copies of other pages. One appropriate use example would be to create a doorway page for different products or services, etc.

Cloaking is NOT recognized by search engines as an ethical tool to index your sites. Use cloaking tools and other redirect methods with caution. Many search engines prohibit their use and will ban sites that they find using them.

[The above information was reported in a recent newsletter from First Place Software and was based on discussions with search engine personnel.]

Regarding cgi or other dynamically generated pages: you will find that these are extremely difficult to get indexed. My timezoneconverter.com site uses almost all dynamically generated pages due to the nature of the Converter utilities. I have found it very difficult to get anything indexed other than the home page, which is static. Doorway pages are an effective way to advertise and index dynamic sites.

Rich

Justin
07-04-1999, 01:35 AM
Or you could use PHP or embedded Perl to make dynamic pages that appear to be static pages :) www.HostFacts.com (http://www.HostFacts.com) is nearly 100% dynamic, but doesn't look much different until you get into searching and other stuff like that :)

------------------
Justin Nelson
FutureQuest Support

auteur
07-04-1999, 02:29 PM
Actually that is not totally correct. Cloaking is acceptable, but has the same implications that doorway pages have...must be relevant, must not be a copy of another's pages and must not use spamdexing techniques.

Not all search engines like doorway pages, i.e. Infoseek. So, it depends on who you ask.

As for dynamic pages, I agree. Static pages get indexed much faster and have a better chance of getting a high ranking.

I don't know much on the subject of PHP, so I don't know how well that would or would not work.

My P.O.V.


Justin
07-04-1999, 03:53 PM
I know most search engines will stop on any of the following in a URL:

/cgi-bin/
.cgi
?
=
&

These are all telltale signs that it's going to be dynamic content - so even PHP would have the same problems if the file is followed by a query. Which is why I am considering making my site split arguments via a slash instead of ? - so, for example, viewing a hosting package:

Now: http://www.hostfacts.com/hosts/show.php3?ID=1

Better: http://www.hostfacts.com/hosts/show/1

Or even better (Amazon does this): http://www.hostfacts.com/hosts/000001.html

Where it would convert 000001.html to show.php3?ID=1 :) Then I could create a static page (one that looks static anyway) listing all hosting plans, along with a URL to the plan's listing, allowing an SE to index it :)
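
The mapping itself is small. Here is a hedged sketch in Python (the site described here runs PHP, and how the server hands the request over is not shown in this thread); the function name is made up for illustration.

```python
import re

# Matches static-looking paths such as /hosts/000001.html.
_STATIC_PATH = re.compile(r"/hosts/(\d+)\.html")


def rewrite_static_url(path):
    """Map /hosts/000001.html to /hosts/show.php3?ID=1.

    Returns None for any path that does not fit the pattern.
    """
    match = _STATIC_PATH.fullmatch(path)
    if match is None:
        return None
    # int() drops the zero padding used in the static-looking name.
    return "/hosts/show.php3?ID=%d" % int(match.group(1))
```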

I'll post to let you know how it works - BTW, Jake gave me this idea a while back and I forgot about it until reading this thread again. Thanks, Jake :)

Justin
07-04-1999, 05:47 PM
Ok, I have created a page of links to a bunch of innocent-looking, static-like, plain-jane HTML pages. What these do is take a URL like /hosts/66.html and show the contents of /hosts/show.php3?ID=66. Normally show.php3 sends no-cache headers, stating basically that it is expired, was modified today, etc - but if the REQUEST_URI contains "html" it doesn't send those headers, in order for the search engines to believe it is static content.
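
The header decision can be sketched like this, in Python rather than the PHP actually used. The exact headers show.php3 sends are not listed in this post, so the three no-cache headers below are an assumption.

```python
def response_headers(request_uri):
    """Return (name, value) header pairs for a request.

    No-cache headers are skipped when the URI contains "html",
    so the page looks like static content.
    """
    headers = [("Content-Type", "text/html")]
    if "html" not in request_uri:
        # Assumed set of no-cache headers; the originals are not shown.
        headers.append(("Cache-Control", "no-cache"))
        headers.append(("Pragma", "no-cache"))
        headers.append(("Expires", "0"))
    return headers
```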

I also removed the no-cache headers from the links page. This page contains links to each hosting plan in the database, links to the search engine showing all packages by any particular host (and the search outputs html links too), and links to all of my news items, also rewritten to html.

http://www.hostfacts.com/hosts/SE/

This looks static enough, it's small and has no banners etc - it's a search engine only page. All of the links are .html, rewritten to the proper script when clicked. So far it works awesome :) Now the engines will hopefully eventually spider that page and index all of my "static" content :)

We'll see how it works out. Thanks again, Jake, for the great idea - I don't think I would have thought this scheme up on my own (although I admit I did it a little differently than you showed me because I deleted the message by accident hehe ;))

infinitevoid
05-21-2000, 01:39 AM
gone
[This message has been edited by infinitevoid (edited 04-03-01@2:30 pm)]