MySQL using SELECT IN (x, y, z) vs one at a time

Finally deployed the Zentester heatmapping re-write today and wanted to share a 3 things I’ve learnt on the way on optimization.

1. Do “WHERE IN ()” instead of “WHERE =” Wherever possible – The biggest thing I discovered was that when selecting individual rows from a database by doing something like:

SELECT * FROM database WHERE id = 5
SELECT * FROM database WHERE id = 6
									

It’s much much faster to select them all at once rather than doing it one row at a time. How much faster? In one section of the heatmapping I was grabbing visitor id’s from ipaddresses from the visitors database like so:

SELECT id FROM visitors WHERE ipaddress LIKE 'xxx.xxx.xxx.xxx'
									

The ipaddress column was a primary key but even then every time I did this query it took approximately 0.1 seconds. I combined all of these into one huge SQL query looking something like this:

SELECT id FROM visitors WHERE ipaddress IN ('xxx.xxx.xxx.xxx', 'yyy.yyy.yyy.yyy', 'zzz.zzz.zzz.zz')
									

Loading 1000 different rows in this way took just 0.7 seconds. Which equated to a 150 x speed increase over the old approach.

2. PHP Arrays are really fast – I used to think PHP Associative arrays were slow and It’d be better to just grab everything from MySQL one at a time rather than caching it in php. Oh how wrong I was. I changed the overall code structure of my data processing from the following (psudo code):

while (data) {
    $visitorData = SELECT * FROM visitors WHERE ipaddress IN data->ipaddress
    $pageData = SELECT * FROM pagedata WHERE page IN data->page
    $siteData = SELECT * FROM sitedata WHERE site IN data->site
    //Process and insert data into another database
}
									


To this:

while (data) {
    $ipAddressArray[] = $data->ipaddress;
    $pageArray[] = $data->page
    $siteArray[] = $data->site
}

$ipData = SELECT * FROM visitors WHERE ipaddress IN ($ipAddressArray) 
while ($ipData) {
    $ipAddressMap[$ipData->ipaddress] = $ipData->visitorid;
}

$pageData = SELECT * FROM pages WHERE page IN ($pageArray)
while ($pageData) {
    $pageMap[$pageData->page] = $pageData->id;
}

$siteData = SELECT * FROM sites WHERE site IN ($siteArray)
while ($siteData) {
    $siteMap[$siteData->site] = $siteData->id;
}

while (data) {
    //process data using the maps instead of constantly querying the database
}
									


Instead of looping through all the data to be processed and processing it one at a time It’s figuring out up front every single piece of data that’s going to be needed from the database and grabbing it in just a few queries.

Then all this data is stored in associate php arrays (which actually hash the keys of the array so you can make the array keys literally anything, in one case ip addresses were used as the key). Then everything is processed grabbing the data out of these php associative arrays rather than pulling it from the database every single time.

I’m processing approximately 10,000 rows of data at a time and after implementing this change it went from taking 5 – 10 minutes to process down to 10 – 15 seconds.

3. Batch your inserts – I also got a small speed increase by combining all my inserts into one large statement, though don’t make this statement too large or MySQL will run out of memory when you try to run it (I discovered about 5000 inserts at once works best for me). Batching inserting took the insert time from about 60 seconds down to about 10, so it wasn’t as huge of an increase as the batched SELECT but it is noticeable especially when you’re dealing with lots of data.

Netbeans is awesome

If you still can’t find that ‘perfect’ IDE give Netbeans a try. I was recently introduced it and am still blown away by the amount of functionality and features it has. It’s similar to eclipse but I always found eclipse to be too sluggish to use on windows (everything seemed to take half a second to do which adds up and gets frustrating very fast).

The main things I like

  • Plugins for things like CoffeeScript, SCSS, Maven etc which make life a lot easier.
  • Shortcuts for just about everything, so if you’re a keyboard ninja you can get things done really fast.
  • It has everything else you’d expect in an IDE (code completion, jumping to definitions etc).

I’ve been using PHPDesigner for about 2 years now and although it’s great for developing in PHP it doesn’t have the best support for other languages such as Javascript, Coffeescript and Ruby.

If you’re a Windows coder give it a whirl, it’s open source and free.

Coffeescript Classes and Public / Private functions

So I’ve seen some coders confused by how they should declare their functions in Coffeescript. Here’s some notes I took while learning on how to make the equivilant of Public and Private functions for your class in Coffeescript.

Public Function - This is your bread and butter function, it is what most coders use by default with coffeescript and is accessible from outside the class.  Notice how it uses a colon after the function name instead of an equals sign.

class SomeClass
    functionName: ->
        alert('hello world');
									

Private function - This will create a private function as it makes it a variable inside the class. So the function is a closure and can only be accessed by functions of that class. Notice how it uses an equal sign instead of a colon.

class SomeClass
    functionName = ->
        alert('hello world');
									

There are two catches with Private Functions. The first is because they are a completely seperate function the @ (this) callback will not point to the main object (even though it is called from another internal function). If you wish to use @ in a private function whenever you call the private function it you must use .call(this) which will load the function but use the main class object as the @ callback inside it. For example:

class Dog
    dogName: "fido"
    constructor: (@dogName) ->

    doStuff: ->
        alert('the dog is walking');
        sayHello.call(this);

    sayHello = ->
        alert("Hi! I'm "+@dogName);

ralph = new Dog("ralph");
ralph.doStuff();

peter = new Dog("peter");
peter.doStuff();
									

If you run this example yourself you’ll notice both dogs have separate names and have the private function sayName. The sayName function cannot be called from outside the class (if you try and call it you’ll get an error) which is exactly how private functions should work.

The second catch is that this private function is shared for every dog. This isn’t really an issue for private functions as whenever you use @ the @ variables are specific to each dog. Though it is an issue if you wish to create private variables for your dog, as if you create them in this way they will be shared between every dog which is most likely not what you want.

If you have any questions or queries let me know via the contact form.

Seeing the future

The best startups are the ones that can see trends and start working on their product before the trend even makes it into the early adopter market. Google did this well, they saw the information in the world was increasing expontentially and realized people need a better way to find things.

There’s a new startup on the block that has seen the future. It’s called zaarly (and there are a few others in this space I believe, but it’s hard to track any down), and it’s going to explode in the next 5 to 10 years, because they see where the world is trending.

The income divide between upper and middle class is growing, everyone knows that. But what most have missed is that the upper class is becoming more and more time poor and cash rich, while the lower classes are becoming more time rich and cash poor.

Do you see the opportunity here?

Traditional freelance systems and craigslist are great if you have a big task you want someone to handle, or want something in a few days. But most of our days are not taken up by big tasks, and we all want things done right this instant. This is what Zaarly delivers,  making it quick and easy to get these little things done faster than ever before.

Zipments is another awesome startup idea along the same lines, focusing on peer to peer couriering.

The second cool thing all these startups are doing is making the world a better place. They’re providing a real way for the huge number of unemployed people to earn some cash and not sit around bored all day, and weaken the stranglehold big corporations currently have on the world. As the income gap increases more and more peer to peer services like these are going to help make the world a better and much more enjoyable place for everyone which is friggin awesome.

Resurrection

It’s been a while. Dreamhost deleted all my sites when downgrading my service. Luckily the only site I cared about still hosted with them is this blog.

Here I’ll be sharing thoughts on things that interests me, probably about coding, business, life improvement or learning new things. In fact it’ll probably be a lot like Tim Ferris’s blog, but not as popular (yet).

It’ll be full of geeky stuff like nodejs, hacking (ycombinator style) and why zaarly is the coolest idea around (seriously, this shit is going to go crazy in 5 – 10 years, you’ll see).

I’ll also try and explain some of the crazy ideas that waft around in my brain, mostly psychology related, sometimes business related and somehow they’ll always relate back to women.

I’ll try and keep my posts as short and to the point as possible, as I tend to waffle way too much some times (look I just did it there!)

That enough for now, enjoy.



all posts



2012
January
11MySQL using SELECT IN (x, y, z) vs one at a time


2011
December
18Netbeans is awesome
17Coffeescript Classes and Public / Private functions
6Seeing the future
6Resurrection

subscribe