Thursday, May 27, 2010

InfiniDB Alpha 1.1.2 on 64-bit Windows and MySQL Workbench

This week has been rough on my computers and I had to reinstall Vista on my main Windows system. But being a 'glass half full' guy, I took it as an opportunity to load the new alphas for both InfiniDB and MySQl Workbench.

I downloaded the InfiniDB software from http://infinidb.org/downloads/cat_view/40-binary-release/137-112-alpha-binary-releases and Workbench from http://dev.mysql.com/downloads/workbench/ respectively. In stalled them in that order and started the database and then Workbench.

I was expecting to have to fiddle with configuring one or the other to get them to work together. But I received a delightful surprise. Both worked out of the chute with no customization. Congratulations to the folks writing the install scripts at both companies!

Tuesday, May 4, 2010

Getting started with InfiniDB Part I

My new job with Calpont has me scrambling to learn all the facets of the InfiniDB storage engine. For those of you new to the scene, InfiniDB is column-orientated as opposed to row-orientated and is designed for data warehouses and business intelligence. Most MySQL DBAs are probably not used to thinking sideways and hopefully this series of postings will change that.

To make things interesting, I am using real data from BP's Statistical Review of World Energy 2009. The data is a historical information on various forms of energy. And it is free to download and comes in Excel workbook format. I wanted data that was unfamiliar to me and in a format that would be commonly used in a BI setting.

The first step is to obtain and install the InfiniDB community software. Since the install instructions are extremely clear, I will not go over those steps.

Next I exported the data from the Crude oil prices 1861 to 2008 worksheet into a CSV file. Now you have choices on how to load the data into your table. The first is the good old MySQL LOAD DATA INFILE statement.
mysql>USE energy;
LOAD DATA INFILE '/tmp/oil.csv'
INTO TABLE oil 
FIELDS TERMINATED BY ',';

Or you can use the supplied bulk loading tool. In this example the table is simple and there are only 148 rows. LOAD DATA INFLE works fine for small sets. I copied CSV file into the directory InfiniDB used for the bulk loading tool -- /usr/local/Calpont/data/bulk/data/import --, changed the commas to pipes (s/,/|/g) preferred by the bulk loader, and prepare it for the columns tructure:

# /usr/local/Calpont/bin/colxml energy -t oil -j04

Bulkload root directory : /usr/local/Calpont/data/bulk
job description file : Job_04.xml
2010-05-04 12:42:21 (5364) INFO : successfully load job file /usr/local/Calpont/data/bulk/job/Job_04.xml
2010-05-04 12:42:21 (5364) INFO : PreProcessing check starts
2010-05-04 12:42:21 (5364) INFO : PreProcessing check completed
2010-05-04 12:42:21 (5364) INFO : preProcess completed, total run time : 0 seconds
2010-05-04 12:42:21 (5364) INFO : No of Read Threads Spawned = 1
2010-05-04 12:42:21 (5364) INFO : No of Parse Threads Spawned = 3
2010-05-04 12:42:21 (5364) INFO : For table energy.oil: 148 rows processed and 148 rows inserted.
2010-05-04 12:42:22 (5364) INFO : Bulk load completed, total run time : 1 seconds

[root@localhost import]# 
Now to load the data:
 /usr/local/Calpont/bin/cpimport -j 04
Running colxml with the following parameters:
2010-05-04 12:42:13 (5312) INFO : 
 Schema: energy
 Tables: oil  
 Load Files: 
 -b 0
 -c 1048576
 -d |
 -e 10
 -f CSV
 -j 04
 -n 
 -p /usr/local/Calpont/data/bulk/job/
 -r 5
 -s 
 -u
 -w 10485760
 -x tbl

File completed for tables:
 energy.oil
 
Normal exit.
Almost forgot to show you the table!
CREATE TABLE oil (`year` INT(4), price_historical DECIMAL(5,2), price_current(5,2)) ENGINE=InfiniDB;

So now we can ask for the historical average price of a barrel of oil.
mysql>SELECT AVG(price_historical) AS 'Historical Avg Price' FROM oil\g

+----------------------+
| Historical Avg Price |
+----------------------+
|             7.772297 |
+----------------------+
Many of you reading this are probably muttering 'I could have done this with any storage engine except Blackhole! and you are right. The beauty behind the column orientated approach does not appear until you are working with large sets of data. In the next post we will get into large sets of data and compare speeds with other storage engines.

Monday, May 3, 2010

High Availability MySQL Cookbook

MySQL is an easy database to get running initially but it can be tricky to run in demanding environments. High Availability MySQL Cookbook is a relatively thin book packed with information.

The first four chapters are on the care and feeding of a MySQL 7.x Cluster. It starts simply with an initial configuration, covers backups & replication, covers user defined partitioning, and covers troubleshooting node failures. The examples are short, to the point, and devoid of any unneeded filler.

Next come chapters on replication including the very tricky Multi Master Replication. Used shared storage is covered in a separate chapter from DRBD. The book finishes strongly with performance tuning.

The good points -- the book is concise and not padded with filler. The bad is that it is too short. Considering how well the author covers the topics that may not be too big a of a problem. For a novice to intermediate level MySQL DBA, this is a valuable guide on making your databases highly available. Long time MySQL DBAs will find nothing new but might appreciate the clear and concise explanations presented.