SEO Data Analysis for Medium to Large Sites

I come from a data analysis and data mining background. Out of professional habit, before starting any SEO work I always like to analyze the data to see whether some kind of pattern emerges.

Website SEO Analysis

Before starting SEO on a new website, I first analyze the site's content, namely the number of detail pages that can serve as main landing pages. Then I analyze which types of landing page carry the site's SEO traffic. For example, a news site may have the following data (this data is for illustrative purposes only):

 

| Channel  | Pages      | Share of content | Share of SEO traffic | Traffic share / content share |
|----------|------------|------------------|----------------------|-------------------------------|
| Blog     | 20,000,000 | 54.5%            | 10%                  | 0.18                          |
| News     | 10,000,000 | 27.0%            | 40%                  | 1.48                          |
| Finance  | 5,000,000  | 13.3%            | 15%                  | 1.13                          |
| Sports   | 1,500,000  | 4.0%             | 32%                  | 8.00                          |
| Military | 500,000    | 1.2%             | 3%                   | 2.50                          |

As the data above shows, the blog channel has a very large amount of content but brings in disproportionately little traffic. This suggests the blog could be a potential source of traffic with considerable room for optimization (I say "could" because the final conclusion also depends on the characteristics of the industry, the quality and nature of the content, and other factors). Sports brings in the most traffic per page, which suggests that investing more in content for that channel should give a relatively high return. In terms of content volume, blog, news, and finance are the largest channels (together about 95% of all content), so if time and energy are limited, SEO work can focus on these three.
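
The ratio used in this comparison can be reproduced with a few lines of Python. This is only a sketch using the illustrative figures above; the function name is my own, and because it works from the raw page counts rather than the rounded percentages, some results differ slightly from the table.

```python
# Illustrative figures from the table above: (pages, share of SEO traffic in %).
channels = {
    "Blog": (20_000_000, 10),
    "News": (10_000_000, 40),
    "Finance": (5_000_000, 15),
    "Sports": (1_500_000, 32),
    "Military": (500_000, 3),
}


def traffic_to_content_ratio(data):
    """Return {channel: traffic share / content share}, from raw page counts."""
    total_pages = sum(pages for pages, _ in data.values())
    ratios = {}
    for name, (pages, traffic_pct) in data.items():
        content_pct = 100 * pages / total_pages
        ratios[name] = traffic_pct / content_pct
    return ratios


ratios = traffic_to_content_ratio(channels)
# Print channels from the most to the least traffic per unit of content.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {ratio:5.2f}")
```

A ratio well above 1 means the channel earns more traffic than its share of content would suggest; a ratio well below 1 flags a channel worth investigating.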

Important Steps in Developing an SEO Strategy:

Considering these factors, we can use the following SEO strategy:

  1. Build on the SEO success of the news channel and keep cultivating it (news is currently the main source of SEO traffic, which indicates a good foundation, so results should come relatively easily);
  2. Explore the potential of the blog;
  3. Increase the amount of content in the sports channel;
  4. Keep cultivating the finance channel (if time and energy allow).

As this shows, a simple data analysis of the whole site's content and SEO traffic gives a better understanding of the site's overall status, which makes it possible to develop a more reliable overall strategy.

In the specific implementation stage, data analysis also plays a very important role.

Basic Principles Of Search Engines:

The basic principle of a search engine is this: it first crawls content from the Internet, then builds an index of that content (such as an inverted index) and scores its quality. When a user searches, the search engine first finds the relevant content in the index based on the query, then sorts it by relevance, quality score, and other factors, and returns it to the user. The following discusses the importance of data analysis in three areas: crawling by crawlers (spiders/bots), indexing of pages (entering the index database), and getting traffic.
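
To make the idea of an inverted index concrete, here is a toy sketch in Python. It is in no way how a real search engine works internally: the documents, the quality scores, and the ranking formula (fraction of query terms matched multiplied by a quality score) are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, quality, query):
    """Return ids of documents matching any query term, best first.

    Toy ranking: (fraction of query terms matched) * (quality score).
    """
    terms = query.lower().split()
    hits = defaultdict(int)
    for term in terms:
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    scored = {d: (n / len(terms)) * quality.get(d, 0.0) for d, n in hits.items()}
    # Sort by score descending, then by id so the order is deterministic.
    return sorted(scored, key=lambda d: (-scored[d], d))


# Hypothetical documents and quality scores.
docs = {
    1: "stock market news today",
    2: "football match news",
    3: "stock prices fall",
}
quality = {1: 0.9, 2: 0.5, 3: 0.7}

index = build_inverted_index(docs)
print(search(index, quality, "stock news"))
```

The point to notice is that the lookup never scans the documents themselves; it only reads the posting sets for the query terms, which is why a page that never entered the index cannot be found.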

1. Crawling by crawlers (spiders/bots). To get more traffic from search engines (in China, mainly Baidu), you need to make sure crawlers crawl as much as possible of the content you want crawled. The time a crawler allocates to a site is fixed. Many personal blogs and small business sites have very limited content (usually under a few thousand pages), so crawlers have enough time each day to crawl all of it. Medium to large sites generally have more than 10 million pages, and the new content generated each day may exceed a few hundred thousand pages, so crawlers cannot crawl everything. At that point it becomes very important to develop strategies that guide the crawlers. But how do we develop a strategy, and what kind of strategy? We must first understand crawler behavior. Every crawler visit leaves an entry in the access log. By processing and analyzing the access logs, we can find out how many pages the crawlers crawl, which channels they crawl, how much time they spend on each channel, and whether that matches the traffic each channel brings. For example, the results of such an analysis might look like the following:

| Channel               | Pages crawled |
|-----------------------|---------------|
| Blog (/blog/)         | 300,000       |
| News (/news/)         | 800,000       |
| Finance (/finance/)   | 200,000       |
| Sports (/sports/)     | 200,000       |
| Military (/mil/)      | 50,000        |
| Registration (/reg/)  | 300,000       |
| Pictures (/pic/)      | 200,000       |

Note: We assume here that these channels all sit in directories under the main domain and share the crawlers' daily crawl quota.
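
A per-channel crawl count like the one above can be produced from raw access logs with a short script. The sketch below assumes the logs are in the common "combined" format and that crawlers can be recognized by a token in the user-agent string; the sample log lines and the token list are made up for illustration.

```python
import re
from collections import Counter

# Combined Log Format:
# ip - - [time] "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Assumption: these user-agent tokens identify the crawlers we care about.
CRAWLER_TOKENS = ("Baiduspider", "Googlebot")


def channel_of(path):
    """Return the first directory of a path, e.g. '/news/a.html' -> '/news/'."""
    parts = path.split("/")
    return f"/{parts[1]}/" if len(parts) > 2 and parts[1] else "/"


def crawl_counts(lines):
    """Count crawler requests per top-level directory."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and any(token in m["agent"] for token in CRAWLER_TOKENS):
            counts[channel_of(m["path"])] += 1
    return counts


# Hypothetical log lines.
sample = [
    '192.0.2.1 - - [10/Oct/2013:13:55:36 +0800] "GET /news/a.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0)"',
    '192.0.2.1 - - [10/Oct/2013:13:55:37 +0800] "GET /news/b.html HTTP/1.1" '
    '200 4096 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0)"',
    '192.0.2.2 - - [10/Oct/2013:13:55:38 +0800] "GET /reg/ HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.3 - - [10/Oct/2013:13:55:39 +0800] "GET /blog/x.html HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0 (Windows NT 6.1) Firefox/24.0"',
]

for channel, n in crawl_counts(sample).most_common():
    print(channel, n)
```

User-agent strings can be spoofed, so a production system would normally also verify that a request really comes from the search engine before counting it.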

From this data we can see that the "registration" and "pictures" sections consume a lot of crawler time while bringing in essentially no traffic. You can use robots.txt or nofollow to keep crawlers away from them and leave the precious crawl time to other sections. Some sections bring very little traffic yet often consume a lot of crawler resources, such as the picture sections and on-site microblogs of many sites. For these, you can also consider banning crawling entirely via robots.txt.
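
As a sketch of what such a robots.txt might contain, the snippet below embeds the rules in a Python string and checks them with the standard library's `urllib.robotparser`. The directory names come from the example above; `example.com` and the helper name are placeholders of mine.

```python
from urllib.robotparser import RobotFileParser

# Rules that keep all crawlers out of the registration and picture sections.
ROBOTS_TXT = """\
User-agent: *
Disallow: /reg/
Disallow: /pic/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, agent="Baiduspider"):
    """Return True if the rules above allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


for url in (
    "https://example.com/news/a.html",
    "https://example.com/reg/signup",
    "https://example.com/pic/1.jpg",
):
    print(url, is_crawlable(url))
```

Checking the rules this way before deploying them helps avoid accidentally blocking a section that does bring traffic.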

This is just a simple example. Much more can be learned from the crawler logs. For example, you can identify unnecessary redirects (301, 302), which often appear when URL rules change, because site developers usually care only that the feature works and not whether unnecessary redirects occur. You can also discover dead links (404 pages). From the response time and page size recorded in the crawler logs, you can identify pages that are too slow and need to be optimized. The best approach is to build a crawler log statistics system that produces a daily report and raises an alarm when a metric crosses a set threshold.
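
A very small version of such a daily report with alarm thresholds might look like the sketch below. It assumes the crawler hits have already been parsed from the log into records, and that the server logs the response time (this is not part of every log format); the field names, the sample data, and the threshold values are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hit:
    path: str
    status: int
    seconds: float  # assumption: the server logs the response time


def daily_report(hits, slow_after=2.0):
    """Count redirects, dead links, and slow responses among crawler hits."""
    report = Counter(total=len(hits))
    for hit in hits:
        if hit.status in (301, 302):
            report["redirects"] += 1
        elif hit.status == 404:
            report["not_found"] += 1
        if hit.seconds > slow_after:
            report["slow"] += 1
    return report


def alerts(report, max_share):
    """Return the metrics whose share of all hits exceeds its threshold."""
    total = report["total"]
    if total == 0:
        return []
    return [name for name, limit in max_share.items() if report[name] / total > limit]


# Hypothetical crawler hits for one day.
hits = [
    Hit("/news/a.html", 200, 0.3),
    Hit("/old-url", 301, 0.1),
    Hit("/gone.html", 404, 0.2),
    Hit("/blog/x.html", 200, 3.5),
]

report = daily_report(hits)
print(dict(report))
print(alerts(report, {"redirects": 0.1, "not_found": 0.3, "slow": 0.2}))
```

Expressing thresholds as a share of all crawler hits, rather than as absolute counts, keeps the alarms meaningful when the crawl volume changes from day to day.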

2. Page indexing. Only a page that has been indexed can get traffic from search engines. For news sites, getting new pages indexed matters even more, so knowing the indexing status of each page, especially newly generated pages, is very valuable. A page being crawled does not mean it will be indexed, so we also need a system for monitoring indexing.
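
The core figure of such a monitoring system is the indexing rate per channel. The sketch below assumes the indexed / not indexed result for each sampled page has already been collected by some other means; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def index_rate_by_channel(checks):
    """checks: iterable of (channel, is_indexed) pairs.

    Returns {channel: fraction of checked pages that are indexed}.
    """
    seen = defaultdict(int)
    indexed = defaultdict(int)
    for channel, is_indexed in checks:
        seen[channel] += 1
        indexed[channel] += bool(is_indexed)
    return {channel: indexed[channel] / seen[channel] for channel in seen}


# Hypothetical check results for newly published pages.
checks = [
    ("news", True),
    ("news", False),
    ("news", True),
    ("blog", True),
]

for channel, rate in index_rate_by_channel(checks).items():
    print(f"{channel}: {rate:.0%} indexed")
```

Tracking this rate per day for newly generated pages shows quickly when a channel stops getting indexed.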

To determine whether a page is indexed, just search for the page's URL in the search engine. If the page appears in the search results, it has been indexed; otherwise it has not. With this method you can check whether the site's content is indexed. If it is not, you need to consider other ways to improve the probability of indexing. Methods to consider include:

(1)   Submit a sitemap through the search engine's webmaster platform;

(2)   Use the crawler log system to find out which pages are crawled most frequently (typically list pages or RSS pages), and embed links to the unindexed content, in batches, on those frequently crawled pages.
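
For method (1), a sitemap in the standard sitemaps.org XML format can be generated with Python's standard library. This is a minimal sketch: the URLs and dates are placeholders, and how the file is submitted depends on the webmaster platform.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """entries: iterable of (url, lastmod as 'YYYY-MM-DD').

    Returns the sitemap as UTF-8 encoded bytes, including the XML declaration.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Placeholder URLs for pages that have not been indexed yet.
pages = [
    ("https://example.com/news/a.html", "2013-10-10"),
    ("https://example.com/news/b.html?id=1&lang=en", "2013-10-11"),
]

sitemap = build_sitemap(pages)
print(sitemap.decode("utf-8"))
```

Building the XML with a library rather than string concatenation means characters such as `&` in URLs are escaped correctly.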

3. Getting traffic. Once pages are indexed, the next thing to analyze is the traffic they bring.

I believe most SEOs already care about traffic data and dig much deeper into it, so here I only emphasize two figures. One is the number of unique landing pages and the cumulative distribution of the traffic those pages bring. The other is the number of unique queries and the cumulative distribution of the traffic those queries bring. From these two figures you can see whether the site's traffic comes mainly from popular (head) terms or from the long tail. If it comes mainly from popular terms, the SEO strategy should invest resources in the popular landing pages that carry that traffic: tilt external and internal link resources toward them and spend more human editorial effort on them. If the traffic comes mainly from long-tail terms, resources may need to be allocated more evenly, while finding ways to expand the amount of long-tail content. When developing the SEO strategy, consider whether a change affects more pages, and do not put too much effort into a handful of popular pages. At the same time, pay more attention to growing the number of unique landing pages and unique queries.
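
The cumulative distribution mentioned above is easy to compute once you have the visit count for each landing page (or each query). The sketch below uses made-up numbers, and both function names are my own. It reports how many of the top items are needed to reach a given share of traffic, which is one simple way to tell a head-heavy site from a long-tail one.

```python
def cumulative_share(visits):
    """Return the cumulative traffic share after each item,
    with items ordered from most to least visited."""
    ordered = sorted(visits, reverse=True)
    total = sum(ordered)
    if total == 0:
        return []
    shares, running = [], 0
    for count in ordered:
        running += count
        shares.append(running / total)
    return shares


def items_needed(visits, target=0.8):
    """Smallest number of top items whose combined traffic reaches `target`."""
    for i, share in enumerate(cumulative_share(visits), start=1):
        if share >= target:
            return i
    return len(visits)


# Hypothetical visit counts per landing page.
head_heavy = [900, 50, 20, 10, 10, 5, 5]   # one page dominates
long_tail = [10] * 100                      # traffic spread evenly

print(items_needed(head_heavy, 0.8), "of", len(head_heavy), "pages")
print(items_needed(long_tail, 0.5), "of", len(long_tail), "pages")
```

If a handful of pages already covers most of the traffic, the site is head-heavy; if it takes a large fraction of all pages, the traffic is long-tail and resources should be spread more evenly.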

In short, quantify every step of SEO so that you can better understand your own website. With an in-depth understanding of your site, you can not only make sure the overall SEO strategy is correct, but also know exactly which pages have problems at which stage (crawling, indexing, or ranking), so that problems are discovered and solved faster.
