Comment pages on information sites are similar to filter pages: they also need URL rules in robots.txt that block the dynamic pages and prevent duplicate indexing.
Shopping malls, B2B sites, and other large sites often run into the condition-filtering problem: letting visitors filter by product specification and brand generates a large number of similar pages. If this is not handled effectively, much of the site's near-identical content gets indexed repeatedly. The problem can be reduced by blocking some URLs, or by switching the filter to an Ajax form. Neither works as well as using the robots.txt protocol directly, though. The recommended approach is still to get the static URL rules right first, and then use robots.txt to prohibit crawling of the dynamic pages.
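As a rough illustration of the rule described above, the sketch below checks URL paths against a wildcard Disallow pattern of the kind major search engines document for robots.txt (`*` matches any run of characters, a trailing `$` anchors the end). The pattern `/*?*`, the sample paths, and the helper names `pattern_to_regex` and `is_blocked` are all assumptions made for this example, not rules from the original article. Python's built-in `urllib.robotparser` only does prefix matching, so the wildcard is translated by hand here. The sketch also ignores Allow rules and rule precedence.

```python
import re

# Hypothetical rule: block every URL that carries a query string,
# which is how filter pages (brand, specification, ...) usually look
# once the normal pages use static URLs.
DISALLOW_PATTERNS = ["/*?*"]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url_path: str, patterns=DISALLOW_PATTERNS) -> bool:
    """Return True if any Disallow pattern matches from the start of the path."""
    return any(pattern_to_regex(p).match(url_path) for p in patterns)


# Hypothetical sample paths: one dynamic filter page, one static page.
for path in ("/list.php?brand=3&spec=5", "/product/123.html"):
    print(path, "blocked" if is_blocked(path) else "allowed")
```

With static URLs in place for real content, a single pattern like this separates the filter pages from the pages that should be indexed, which is why the static URL rules need to come first.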
First, as many people know, prohibiting spiders from crawling dynamic pages reduces duplicate indexing across the whole site. The benefit is that weight stays concentrated on the content pages, which are less likely to have their weight diluted by duplicate indexing. For ordinary sites this is a general-purpose technique, but for large sites such as malls, information sites, and Q&A sites, this kind of regulation matters a great deal.
Two: guide spiders to the important pages and improve crawl efficiency
Comment pages on information sites
One: prohibit crawling of dynamic pages or certain other pages to reduce duplicate indexing
This technique mainly concerns the site map and tag aggregation pages.
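One common way to point spiders at a site map is a `Sitemap:` line in robots.txt. Whether this is exactly what the author had in mind is an assumption, and the URL and the Disallow path below are placeholders. The sketch uses Python's standard `urllib.robotparser` (its `site_maps()` method needs Python 3.8 or later) to confirm that the line is picked up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the sitemap URL and /tmp/ path are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```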
B2B, recruitment, and Witkey sites have the same problem. On all of them, robots.txt can be used to regulate spider crawling effectively and avoid duplicate indexing.
Condition-filter pages on malls, B2B sites, and other large sites
Without further ado, let's get started. We all know that the robots.txt robot protocol is a set of rules for regulating spider crawling. Routinely it is used to prohibit spiders from crawling directories such as data and tmp, and modules such as members, orders, and inventory. Beyond these conventional uses, though, the robots.txt protocol offers many small techniques that can be combined to make our site optimization work go better.
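The conventional usage can be sketched as follows. The directory names are illustrative guesses based on the modules listed above; substitute whatever paths your own site actually uses. The sketch feeds a small robots.txt to Python's standard `urllib.robotparser` and asks whether a generic spider may fetch a few sample paths, which is a convenient way to check simple prefix rules before publishing them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical directory names; real sites will use their own paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /data/
Disallow: /tmp/
Disallow: /member/
Disallow: /order/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical sample paths: two private ones and one public content page.
for path in ("/data/backup.sql", "/member/profile", "/news/1024.html"):
    print(path, "allowed" if parser.can_fetch("*", path) else "blocked")
```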
I just wrote an article looking back at the small mistakes new webmasters are prone to during optimization. It mentioned the robots.txt protocol along the way, and I could not help wanting to discuss some techniques for using robots.txt with you. Website optimization is long-term work, and it is also interactive work: a webmaster who works behind closed doors will not be a good webmaster, so I hope we can exchange ideas. Please point out anything I have gotten wrong.
3. Other similar cases