Dealing with Crawlers
Search Engine Optimization: The Good, The Bad, & The Ugly
Thursday, October 21st, 2010

CONTACT: 2 Saint Genny Street, Kafr Abdou, Suite 103 - 105, Alexandria 21311, Egypt.
Phone: +2 (03) 546-7622. URL: www.eSpace.com.eg
What to Discuss?
- Dealing with the Crawlers: the good & the bad robots
- What is the robots.txt file? The definition
- Structure of the robots.txt file for SEO purposes: the syntax
  - Standard: User-agent, Disallow
  - Nonstandard: Crawl-delay, Allow, Sitemap
  - Extended: Request-rate, Visit-time, Comment
- Effective use of the robots.txt file: best practices
- Be aware of rel="nofollow": comment spammers
- User-generated spam: how to avoid it
Dealing with the Crawlers
The good & the bad robots
1. What is the robots.txt file?

robots.txt is a plain text file placed in the root directory of a site. It is used to tell search engines which sections of the site you do not want them to crawl and index. Having a robots.txt file is optional.

- Make sure you place the robots.txt file in the root (main) directory.
- Restrict crawling where it is not needed.
- Watch out for common robot traps: forms, logins, session IDs, frames.
2. Structure of robots.txt

Creating a robots.txt file is easy, as its structure is simple: basically, it is a list of user agents and of the files and directories to be excluded from crawling and indexing. The directives fall into three groups:

- Standard: User-agent, Disallow
- Nonstandard: Crawl-delay, Allow, Sitemap
- Extended standard: Request-rate, Visit-time, Comment
2.a Standard: User-agent

To set a value for all crawlers, use:
    User-agent: *

To set a value for a specific search engine robot, use:
    User-agent: BotName

A complete, updated list of bots can be found at http://www.user-agents.org/

2.a Standard: Disallow

To tell all robots not to visit a specific file:
    User-agent: *
    Disallow: /dir_name/file.html

To tell a specific robot not to visit a specific directory:
    User-agent: BotName
    Disallow: /dir_name/

To tell a specific robot not to visit a specific file:
    User-agent: BotName
    Disallow: /dir_name/file.html

To allow all robots to visit all files:
    User-agent: *
    Disallow:

To keep all robots out:
    User-agent: *
    Disallow: /

To tell all robots not to visit specific directories:
    User-agent: *
    Disallow: /users/login/
2.b Nonstandard: Crawl-delay, Allow, Sitemap

Crawl-delay directive:
    User-agent: *
    Crawl-delay: 10

Allow directive:
    Allow: /dir_name1/file_name.html
    Disallow: /dir_name1/

Sitemap directive:
    Sitemap: http://www.domain.com/sitemap.xml
    Sitemap: http://www.domain.com/dir/s/names-sitemap.xml.gz
2.c Extended standard: Request-rate, Visit-time, Comment

Request-rate directive (here, at most one page every five seconds):
    User-agent: *
    Request-rate: 1/5

Visit-time directive (here, crawl only between 13:00 and 20:30):
    User-agent: *
    Visit-time: 1300-2030

Comment directive:
    User-agent: YahooSeeker/1.1
    Comment: because Yahoo sucks :P
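Of the extended directives, Python's urllib.robotparser recognizes Request-rate (returning a requests/seconds pair) but ignores Visit-time and Comment, which is a reminder that crawler support for the extended standard is spotty:

```python
from urllib import robotparser

rules = """\
User-agent: *
Request-rate: 1/5
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

rate = rp.request_rate("*")
print(rate.requests, rate.seconds)  # 1 5
```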
3. Effective use of robots.txt

- Restrict crawling where it is not needed with robots.txt.
- Use more secure methods for sensitive content; robots.txt only asks well-behaved crawlers to stay away, it does not protect anything.
- Avoid allowing search-result-like pages to be crawled.
- Avoid allowing URLs created as a result of proxy services to be crawled.
- Create a separate robots.txt file for each subdomain.
4. rel="nofollow"

    <a href="http://www.domain.com" rel="nofollow">Spam</a>

If your site has a blog with public commenting turned on, links within those comments could pass your reputation to pages you may not be comfortable vouching for. Blog comment areas are highly susceptible to comment spam. Nofollowing these user-added links ensures that you are not giving your page's hard-earned reputation to a spammy site.
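As a minimal illustration of the idea above, a site could audit user-generated HTML for links that lack rel="nofollow" before publishing it. This sketch (class name and sample links are hypothetical) uses Python's standard html.parser:

```python
from html.parser import HTMLParser

class NofollowAuditor(HTMLParser):
    """Collects hrefs of <a> tags that lack rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.unprotected = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if "nofollow" not in rel:
            self.unprotected.append(attrs.get("href"))

auditor = NofollowAuditor()
auditor.feed('<a href="http://spam.example" rel="nofollow">x</a>'
             '<a href="http://other.example">y</a>')
print(auditor.unprotected)  # ['http://other.example']
```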
5. User-generated spam: how to avoid it

- Use anti-spam tools.
- Turn on comment moderation.
- Block comment pages using robots.txt or meta tags.
- Think twice before enabling a guestbook or comments.
- Use a blacklist to prevent repetitive spamming attempts.
- Add a "report spam" feature to user profiles and friend invitations.
- Monitor your site for spammy pages.
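The blacklist idea above can be sketched as a simple comment filter; the term list and function name here are hypothetical examples, and a real deployment would combine this with moderation and anti-spam tools:

```python
# Hypothetical blacklist filter for user-submitted comments.
SPAM_TERMS = {"cheap pills", "free casino"}  # example terms, not a real list

def is_spam(comment: str) -> bool:
    """Return True if the comment contains any blacklisted term."""
    text = comment.lower()
    return any(term in text for term in SPAM_TERMS)

print(is_spam("Buy CHEAP pills now!"))    # True
print(is_spam("Great article, thanks."))  # False
```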
Upcoming Sessions

How to get on Google's first page:
- On-Site Factors
- On-Page Factors
- Off-Page Factors

SEO for Mobile Phones:
- Notify Google about mobile sites
- Guide mobile users accurately

Promotions and Analysis:
- Promote your website in the right way
- Make use of free webmaster tools
THANK YOU