A Finland Swede in Bavaria

Thursday, 14 May 2009

Home-Made Blog Statistics from WordPress MU

For long, I had wanted to get some statistics on the number of blog posts I do per month and blog on blogs.arno.fi. For even longer, I had had abstinence symptoms from not doing almost any coding at all this century. Today, I combined the desire for statistics with the desire for some semi-coding and got some statistics done. And this is how I did it.

First, I logged into MySQL on the server where my WordPress MU resides. And I did a "use wordpress" to start using the Wordpress MU database. A first SHOW TABLES showed a whopping 153 tables. Wow! That comes from me (and my family members) having 18 blogs, each of which requires 8 tables. The remainder, 153 - 8*18, is 9 tables. Here are the tables, and what I found out about them while eagerly SELECTing from them:

wp_blogs: The key WordPress table, listing the blogs



mysql> select blog_id,path,registered,last_updated from wp_blogs;
+---------+-----------------+---------------------+---------------------+
| blog_id | path | registered | last_updated |
+---------+-----------------+---------------------+---------------------+
| 1 | / | 2008-09-20 15:15:54 | 2008-10-19 14:39:15 |
| 2 | /fussball/ | 2008-09-20 15:19:12 | 2008-10-07 22:07:09 |
| 4 | /efib/ | 2008-09-20 15:41:14 | 2009-03-16 09:51:48 |
| 5 | /fib/ | 2008-10-13 10:44:16 | 2009-05-11 21:34:25 |
| 6 | /dolce_vita/ | 2008-10-25 00:54:01 | 2009-05-03 08:18:41 |
| 7 | /fandorin/ | 2008-10-25 02:35:46 | 2009-04-16 11:48:50 |
| 8 | /isit/ | 2008-10-27 14:39:57 | 2009-05-12 16:42:24 |
| 9 | /linnea/ | 2008-10-31 20:26:30 | 2008-11-16 19:13:43 |
| 10 | /poquito/ | 2008-11-03 22:15:21 | 2008-11-09 22:21:48 |
| 11 | /vaba_lava/ | 2008-11-17 08:33:04 | 2008-11-17 10:44:24 |
| 12 | /labrit/ | 2008-11-17 10:02:50 | 2008-11-19 09:10:03 |
| 13 | /laisvas_zodis/ | 2008-11-17 12:22:51 | 2008-11-18 06:48:00 |
| 14 | /alman_degilim/ | 2008-11-18 10:41:46 | 2008-11-20 21:25:48 |
| 15 | /sushi/ | 2008-11-18 23:51:00 | 2008-11-22 13:28:29 |
| 16 | /furuvik/ | 2008-12-07 14:19:10 | 2009-05-04 14:13:53 |
| 17 | /kajsql/ | 2009-02-11 10:25:26 | 2009-05-06 16:28:08 |
| 18 | /guanxi/ | 2009-05-06 18:26:24 | 2009-05-06 18:50:29 |
| 19 | /yilingyi/ | 2009-05-06 18:30:02 | 2009-05-06 20:00:25 |
+---------+-----------------+---------------------+---------------------+
18 rows in set (0.00 sec)


Conclusion: blog_id and path are the main fields

The other eight top-level tables



mysql> show tables where Tables_in_wordpress > "wp_a";
+---------------------+
| Tables_in_wordpress |
+---------------------+
| wp_blog_versions |
| wp_blogs |
| wp_registration_log |
| wp_signups |
| wp_site |
| wp_sitecategories |
| wp_sitemeta |
| wp_usermeta |
| wp_users |
+---------------------+
9 rows in set (0.00 sec)


Browsing through the other tables, they seem to be best left for browsing and editing through the user interface of WordPress itself.

The eight blog-level tables in WordPress MU


The tables on the blog level are named wp_1_blahbah, wp_2_blahblah, wp_3_blahblah and so on. I can't say that this is a better table design than merely entering blog_id as a further key in each of the tables, but hey, I'm a happy WordPress user and not out to redesign their database structure. Besides, they may well have solved some scaling issues by adding eight new tables for each new blog. However, the complexity of the database structure will affect my job of doing statistics. And looking at life from a positive standpoint, it will increase the amount of programming joy I get out of my desire to do relevant blogging statistics.

There are, as said, eight tables for each blog. Let's pick my blog no 8, /isit/ (this very blog, "It's Some Interesting Topic").

mysql> show tables where Tables_in_wordpress like "wp_8_%";
+-------------------------+
| Tables_in_wordpress |
+-------------------------+
| wp_8_comments |
| wp_8_links |
| wp_8_options |
| wp_8_postmeta |
| wp_8_posts |
| wp_8_term_relationships |
| wp_8_term_taxonomy |
| wp_8_terms |
+-------------------------+
8 rows in set (0.01 sec)


The two most interesting ones here are wp_n_posts and wp_n_comments. The three tables named around "terms" are about tagging and automatically updated through WordPress. The wp_n_options (blog level options) and wp_n_links (blogrolls etc.) tables are relevant, but adequately browsable and editable through the WordPress user interface. The same goes for the field settings in wp_n_postmeta.

The wp_n_posts table


The wp_n_posts table has 24 fields. Browsing at them for a while, here are the key ones

  • ID: Primary key (1, 2, 3 ...)

  • post_date: Publishing date of the post. Crucial for statistics. Remember, you can backdate posts or write posts in advance.

  • post_modified: Editing date of the post. This is when you last touched the post.

  • post_type: This is usually "post", for normal, published blog posts. For what WordPress calls pages, it's "page". But there are also plenty of instances with "revision" (for intermediate versions of the blog post, if you save and edit it multiple times) and "attachment" for instance if you upload a .jpg file. Consequently, where post_type in ("page","post") becomes an essential part of many blog statistics SELECTs.

  • post_title: The header of the blog entry

  • comment_count: Number of approved comments

  • comment_status: Either "closed" or "open" depending on whether comments are allowed or not. I have combatted spam by disallowing comments for particularly infested entries, using the normal WordPress UI. I could bulk reallow comments on all closed entries with a simple UPDATE on comment_status, once I've installed better spam prevention.

  • guid: the complete URL to a post, such as http://blogs.arno.fi/isit/?p=211 (which then will be converted to http://blogs.arno.fi/isit/2009/03/16/a-digital-native-and-my-first-nokia/ by WordPress once you surf to it).

  • post_name: contains the "a-digital-native-and-my-first-nokia" part of the URL (which is neatly editable from within the WordPress UI).

  • post_status: it's "inherit" or "publish" depending on whether the post is live or not -- but I prefer using post_type to identify real entries from earlier versions


My first attempt at statistics


This knowledge enables compact statistics, such as:

mysql> select date_format(post_date,"%Y-%m") as Month, count(*) as Blogs
-> from wp_8_posts
-> where post_type in ("page","post")
-> group by 1;
+---------+-------+
| Month | Blogs |
+---------+-------+
| 2008-10 | 11 |
| 2008-11 | 15 |
| 2008-12 | 3 |
| 2009-01 | 4 |
| 2009-02 | 3 |
| 2009-03 | 3 |
| 2009-05 | 2 |
+---------+-------+
7 rows in set (0.00 sec)


However, what I'd really like to have is that very same piece of statistics, but for all my blogs in one report. I don't want to run eighteen reports, one for each blog.

And that's where my coding desire can be satisfied.

My idea is to create a new, aggregate table for statistics, where I insert one row for each blog entry in each of the eighteen blogs.

Let me first create the table, based on the above blog (numbered 8 and called "/isit/") with its 41 blog entries. It's a simple CREATE as documented as the second example on http://dev.mysql.com/doc/refman/5.1/en/create-table.html -- in my case like this:

mysql> CREATE TABLE kaj_wp_stats
-> SELECT 8 as blog_id,"/isit/" as path,
-> ID,post_type,post_date,post_title,comment_count,post_modified,guid
-> FROM wp_8_posts
-> WHERE post_type IN ("page","post");
Query OK, 41 rows affected (0.00 sec)
Records: 41 Duplicates: 0 Warnings: 0


I specifically mentioned 8 as blog_id and "/isit/" as path in order to later be able to separate blog entries by blog.

Then I take the very same SELECT statement and use it for another blog, but substitute the CREATE with the equivalent INSERT statement:

mysql> INSERT INTO kaj_wp_stats
-> SELECT 4 as blog_id,"/fib/" as path,
-> ID,post_type,post_date,post_title,comment_count,post_modified,guid
-> FROM wp_4_posts
-> WHERE post_type IN ("page","post");
Query OK, 27 rows affected (0.00 sec)
Records: 27 Duplicates: 0 Warnings: 0


Note the changes: "INSERT INTO", "4", "fib" and "wp_4_posts".

Generating SELECTs with SELECTs


Now, I could of course do a lot of error-prone manual labour and insert the following 16 blogs in the same way. But I'm lazy and I desire some coding, so instead, what I do is that I create the corresponding code with a SELECT statement from the wp_blogs file. This is my meta-code:

select concat("INSERT INTO kaj_wp_stats SELECT ",blog_id," as blog_id,'",path,
"' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_",
blog_id,"_posts WHERE post_type in ('page','post');") from wp_blogs;


In this metacode, the lower-case "select" and "from" is what is executed. The upper case INSERTs and SELECTs end up in the output, which looks like this:

INSERT INTO kaj_wp_stats SELECT 1 as blog_id,'/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_1_posts WHERE post_type in ('page','post');
INSERT INTO kaj_wp_stats SELECT 2 as blog_id,'/fussball/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_2_posts WHERE post_type in ('page','post');
INSERT INTO kaj_wp_stats SELECT 4 as blog_id,'/efib/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_4_posts WHERE post_type in ('page','post');
INSERT INTO kaj_wp_stats SELECT 5 as blog_id,'/fib/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_5_posts WHERE post_type in ('page','post');
INSERT INTO kaj_wp_stats SELECT 6 as blog_id,'/dolce_vita/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_6_posts WHERE post_type in ('page','post');
INSERT INTO kaj_wp_stats SELECT 7 as blog_id,'/fandorin/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_7_posts WHERE post_type in ('page','post');
INSERT INTO kaj_wp_stats SELECT 8 as blog_id,'/isit/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_8_posts WHERE post_type in ('page','post');
INSERT INTO kaj_wp_stats SELECT 9 as blog_id,'/linnea/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_9_posts WHERE post_type in ('page','post');
INSERT INTO kaj_wp_stats SELECT 10 as blog_id,'/poquito/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_10_posts WHERE post_type in ('page','post');
INSERT INTO kaj_wp_stats SELECT 11 as blog_id,'/vaba_lava/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_11_posts WHERE post_type in ('page','post');
INSERT INTO kaj_wp_stats SELECT 12 as blog_id,'/labrit/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_12_posts WHERE post_type in ('page','post');
INSERT INTO kaj_wp_stats SELECT 13 as blog_id,'/laisvas_zodis/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_13_posts WHERE post_type in ('page','post');
INSERT INTO kaj_wp_stats SELECT 14 as blog_id,'/alman_degilim/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_14_posts WHERE post_type in ('page','post');
INSERT INTO kaj_wp_stats SELECT 15 as blog_id,'/sushi/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_15_posts WHERE post_type in ('page','post');
INSERT INTO kaj_wp_stats SELECT 16 as blog_id,'/furuvik/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_16_posts WHERE post_type in ('page','post');
INSERT INTO kaj_wp_stats SELECT 17 as blog_id,'/kajsql/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_17_posts WHERE post_type in ('page','post');
INSERT INTO kaj_wp_stats SELECT 18 as blog_id,'/guanxi/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_18_posts WHERE post_type in ('page','post');
INSERT INTO kaj_wp_stats SELECT 19 as blog_id,'/yilingyi/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_19_posts WHERE post_type in ('page','post');


I then clean up the kaj_wp_stats table by removing it:

mysql> DROP TABLE kaj_wp_stats;
Query OK, 0 rows affected (0.00 sec)


Now I manually substitute the first INSERT INTO with a CREATE TABLE, and run the select-generated SELECTs.

However, I get Warnings: for each of the INSERT statements -- which is due to the unfortunate fact that the first blog has a path "/", which turns into a varchar(1) into which of course none of the other paths fit. Hence I manually add

ALTER TABLE kaj_wp_stats MODIFY path varchar(50);


right after the first CREATE TABLE statement, and then everything runs fine. Here's a shortened version of it all:

mysql> DROP TABLE kaj_wp_stats;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE kaj_wp_stats SELECT 1 as blog_id,'/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_1_posts WHERE post_type in ('page','post');
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0

mysql> ALTER TABLE kaj_wp_stats MODIFY path varchar(50);
Query OK, 1 row affected (0.02 sec)
Records: 1 Duplicates: 0 Warnings: 0

:
:

mysql> INSERT INTO kaj_wp_stats SELECT 5 as blog_id,'/fib/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_5_posts WHERE post_type in ('page','post');
Query OK, 85 rows affected (0.00 sec)
Records: 85 Duplicates: 0 Warnings: 0

:
:

mysql> INSERT INTO kaj_wp_stats SELECT 8 as blog_id,'/isit/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_8_posts WHERE post_type in ('page','post');
Query OK, 41 rows affected (0.00 sec)
Records: 41 Duplicates: 0 Warnings: 0

:
:

mysql> INSERT INTO kaj_wp_stats SELECT 19 as blog_id,'/yilingyi/' as path,ID,post_type,post_date,post_title,comment_count,post_modified,guid from wp_19_posts WHERE post_type in ('page','post');
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0


Finally, the statistics I desire!


Now, I can do statistics such as this:

mysql> select blog_id,path,count(*) as blogs from kaj_wp_stats group by blog_id,path;
+---------+-----------------+-------+
| blog_id | path | blogs |
+---------+-----------------+-------+
| 1 | / | 1 |
| 2 | /fussball/ | 6 |
| 4 | /efib/ | 27 |
| 5 | /fib/ | 85 |
| 6 | /dolce_vita/ | 10 |
| 7 | /fandorin/ | 8 |
| 8 | /isit/ | 41 |
| 9 | /linnea/ | 13 |
| 10 | /poquito/ | 9 |
| 11 | /vaba_lava/ | 2 |
| 12 | /labrit/ | 1 |
| 13 | /laisvas_zodis/ | 1 |
| 14 | /alman_degilim/ | 4 |
| 15 | /sushi/ | 2 |
| 16 | /furuvik/ | 38 |
| 17 | /kajsql/ | 2 |
| 18 | /guanxi/ | 1 |
| 19 | /yilingyi/ | 2 |
+---------+-----------------+-------+
18 rows in set (0.00 sec)


Or this

mysql> select date_format(post_date,"%Y-%m") as Month,
-> count(*) as Blogs,
-> sum(if(path="/isit/",1,0)) as isit,
-> sum(if(path="/fib/",1,0)) as fib,
-> sum(if(path="/efib/",1,0)) as efib,
-> sum(if(path="/furuvik/",1,0)) as furuvik,
-> sum(if(path="/dolce_vita/",1,0)) as dolce_vita,
-> sum(if(path="/fandorin/",1,0)) as fandorin,
-> sum(if(path="/poquito/",1,0)) as poquito
-> from kaj_wp_stats
-> where post_date > "2007-12-31"
-> group by 1;
+---------+-------+------+------+------+---------+------------+----------+---------+
| Month | Blogs | isit | fib | efib | furuvik | dolce_vita | fandorin | poquito |
+---------+-------+------+------+------+---------+------------+----------+---------+
| 2008-02 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2008-03 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2008-04 | 3 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 2008-05 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2008-06 | 5 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 2008-08 | 3 | 0 | 2 | 0 | 1 | 0 | 0 | 0 |
| 2008-09 | 8 | 0 | 4 | 0 | 0 | 0 | 0 | 0 |
| 2008-10 | 50 | 11 | 11 | 9 | 0 | 6 | 3 | 0 |
| 2008-11 | 59 | 15 | 14 | 3 | 4 | 1 | 3 | 9 |
| 2008-12 | 23 | 3 | 9 | 5 | 3 | 2 | 1 | 0 |
| 2009-01 | 29 | 4 | 7 | 6 | 12 | 0 | 0 | 0 |
| 2009-02 | 23 | 3 | 12 | 1 | 6 | 0 | 0 | 0 |
| 2009-03 | 23 | 3 | 11 | 3 | 6 | 0 | 0 | 0 |
| 2009-04 | 4 | 0 | 4 | 0 | 0 | 0 | 0 | 0 |
| 2009-05 | 12 | 2 | 2 | 0 | 5 | 0 | 0 | 0 |
| 2013-06 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
+---------+-------+------+------+------+---------+------------+----------+---------+
16 rows in set (0.00 sec)


Mission accomplished! The statistics are there, and I had fun "coding" with MySQL.

1 comment:

  1. [...] a blog entry on http://blogs.arno.fi/isit/2009/05/14/home-made-blog-statistics-from-wordpress-mu/ I describe what I [...]

    ReplyDelete