Pesky cache table

Sometimes that cache table just doesn’t clear. And when it doesn’t, look out. It’ll grow and grow until your website crashes into a glorious, user-denied hellscape. Basically, it’ll just crash. I ran into this recently, and while diagnosing it I came up with a little script to keep on top of things.

First, a spot of SQL

SELECT table_name AS "Tables",
round(((data_length + index_length) / 1024 / 1024), 2) "Size in MB"
FROM information_schema.TABLES
WHERE table_schema = "DATABASE"
ORDER BY (data_length + index_length) DESC LIMIT 10;

Nothing too fancy here, just a list of the ten biggest tables.

Next, we need a way to run that SQL on the command line. Well, it is a Drupal site, so drush is the answer.

drush @prod sql-cli < table_size.sql

Again, nothing too fancy. This opens an SQL command-line session against the production alias and pipes in the SQL above. It gives me:

Tables                   Size in MB
cache                    6119.59
cache_menu               38.73
webform_submitted_data   31.09
search_index             16.03
cache_field              12.59
cache_metatag            5.58
field_revision_body      3.98
field_data_body          3.91
cache_form               3.53
cache_views              3.53

This is all fine and dandy, but I need something to tie it all together. Bash to the rescue!

drush @prod sql-cli < table_size.sql 2>&1 | awk '{ print $1 " " $2 }' | while read output; do
  echo $output >/dev/null
  table=$(echo $output | awk '{ print $1 }')
  size=$(echo $output | awk '{ print $2 }')

  #echo $table $size

  if [ "$size" != "Size" ] && [ "${size%.*}" -ge 6000 ]; then
    #echo "Alert: $table table is $size mb"
    # < /dev/null keeps mail from swallowing the rest of the loop's input as the message body
    mail -s "Alert: $table table is $size mb" MY_EMAIL_ADDRESS < /dev/null
  fi
done

OK, let’s break this down.

  • Starting off we are reading that SQL output and pushing it into an output variable.
  • Using awk we get the table name and size.
  • There are some #echo lines in there I used for testing. Leave them in and uncomment for your own fun and profit.
  • From here we test first that the size is not the header "Size", and then that its integer part (the ${size%.*} bit strips the decimal) is at least 6000. For production I increased this to 10000 but did not change it here. Set it to your own liking.
  • If the size is indeed larger than your set number, fire off an email alert.
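If you want to sanity-check that threshold logic without hitting the database, you can pipe some canned output through the same awk and comparison steps (the table names and sizes below are invented):

```shell
#!/bin/sh
# Fake drush output, piped through the same header-skip and
# integer-part comparison used in the real script.
printf 'Tables Size in MB\ncache 6119.59\ncache_menu 38.73\n' |
awk '{ print $1 " " $2 }' | while read output; do
  table=$(echo $output | awk '{ print $1 }')
  size=$(echo $output | awk '{ print $2 }')
  # Skip the header row, then compare the integer part against 6000.
  if [ "$size" != "Size" ] && [ "${size%.*}" -ge 6000 ]; then
    echo "Alert: $table table is $size mb"
  fi
done
```

Only the cache row trips the alert.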

Could this be more robust? Sure. But the office is closing early for the holiday weekend and I wanted to get this off and running.

Have a great holiday everyone!


I’m watching you

I do enjoy Red Hat’s OpenShift service. It’s fast and easy to get up and running, and with it I’m able to run my own version of analytics. Why would I do this when Google Analytics is a perfectly cromulent choice? Well, sometimes it’s not. Case in point: I have been posting on my writers’ group blog on a weekly basis and found myself jonesing for traffic data. Blogger is owned by Google, and Google loves Analytics, so I thought I had found a happy place. Turns out no.

Yes, they tell me the basics, who and from where, but for me the prime metric to watch is Bounce Rate (BR): that magical percentage showing how many visitors come, then go after a single page. A large number indicates your keyword or title is misleading, which leads to mistrust in future posts. I would like a number around 40%. Realistically, for a blog post, it’ll probably be around 60%, which would be frighteningly high for a normal landing page.
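For the record, the arithmetic behind that percentage is nothing exotic: single-page visits divided by total visits, times 100. A quick sketch with invented numbers:

```shell
#!/bin/sh
# Invented numbers: 60 single-page visits out of 150 total visits.
bounces=60
visits=150
awk -v b="$bounces" -v v="$visits" 'BEGIN { printf "%.1f%%\n", b / v * 100 }'
```

That prints 40.0%, right around the rate I’d like to see.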

JavaScript and some PHP to the rescue!

Into each blog post I slide a snippet of JavaScript.

 *  Call the jQuery library
<script src="//"></script>

 *  Set up an interval to fire off every second for the first 2 minutes.
 *  Send data to my server for collection.
<script type="text/javascript">
var ticks = 0;
var timer = setInterval(function() {
    //$("p").append(+new Date);
    var created = +new Date;
    $.post("", { visitor: myIP(), site: "blog post title" });
    // Stop after 120 ticks (2 minutes); anything later is not a bounce.
    if (++ticks >= 120) clearInterval(timer);
}, 1000);

 *  Grab the visitor's IP address
function myIP() {
    var xmlhttp;
    if (window.XMLHttpRequest) xmlhttp = new XMLHttpRequest();
    else xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
    xmlhttp.open("GET", " ", false);
    xmlhttp.send();
    var hostipInfo = xmlhttp.responseText.split("\n");
    for (var i = 0; i < hostipInfo.length; i++) {
        var ipAddress = hostipInfo[i].split(":");
        if (ipAddress[0] == "IP") return ipAddress[1];
    }
    return false;
}
</script>
When I was first trying out this script I didn’t add the 120-second cutoff. Bad idea. Someone in San Jose opened the page and kept it open for two days. That was a doh! moment for me, and I realized anything past 2 minutes didn’t matter. It wasn’t a bounce!

For the server side I opted to use OpenShift’s PHP 5.4 cartridge and MySQL. There are two parts to the server: the POST API and the means to present the data. To receive the POST from the JavaScript I have a small script waiting to catch and save the data, named api.php:


<?php
// MySQL credentials, pulled from the environment.
// DB_USER and DB_PASS assume the standard OpenShift MySQL environment variables.
define('DB_HOST', getenv('OPENSHIFT_MYSQL_DB_HOST'));
define('DB_PORT', getenv('OPENSHIFT_MYSQL_DB_PORT'));
define('DB_USER', getenv('OPENSHIFT_MYSQL_DB_USERNAME'));
define('DB_PASS', getenv('OPENSHIFT_MYSQL_DB_PASSWORD'));
define('DB_NAME', getenv('OPENSHIFT_GEAR_NAME'));

// MySQL connection
$dsn = 'mysql:dbname='.DB_NAME.';host='.DB_HOST.';port='.DB_PORT;
$dbh = new PDO($dsn, DB_USER, DB_PASS);

// Look for the POST data and assign to variables
if ( isset($_POST) ) {
    foreach ( $_POST as $key => $value ) {
        if ( $key == 'site' ) {
            $site = $value;
        }
        if ( $key == 'visitor' ) {
            $visitor = $value;
        }
    }

    // Populate the table with visitor, timestamp and blog post title
    $now = time();
    $sql = $dbh->prepare('INSERT INTO log (line_created, date_created, tracking) VALUES (:line_created, :now, :tracking)');
    $sql->execute(array(':line_created' => $visitor, ':now' => $now, ':tracking' => $site));
}


Fairly straightforward capturing of data into a table I call log:


+--------------+--------------+------+-----+---------+-------+
| Field        | Type         | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| line_created | varchar(255) | YES  |     | NULL    |       |
| date_created | varchar(255) | YES  |     | NULL    |       |
| tracking     | varchar(255) | YES  |     | NULL    |       |
+--------------+--------------+------+-----+---------+-------+
As this is all just proof of concept I do not have audit tracking or security measures in place…yet.

Next post: Displaying all this data


If you found this helpful, a couple of mBTC’s are always welcome 🙂