Planet Drizzle

Linux Conf AU 2010

Posted by: Eric Day, February 04, 2010 12:54 AM

I was really excited when I had my Gearman talk accepted to Linux Conf AU 2010 because I had never been out that far in the Pacific (only Hawaii). Of course it wasn’t in Australia this year, and instead in Wellington, New Zealand. My wife came too, and we also made a vacation out of the down times we had around the conference. It turned out Brian couldn’t make it this year so Monty, Stewart, and I gave the Drizzle talk. It was great to see some familiar faces, including Mark Atwood, Giuseppe Maxia, Josh Berkus, and Selena Deckelmann. Josh actually ended up being on the same flight out, so we got to catch up while going through New Zealand customs at 5am after a 13 hour flight. :)

New Zealand is an amazing place. We flew in and out of Auckland and took the train to Wellington. The train ride mostly consisted of grazing sheep once out of the metro areas, did you know there are more sheep than people in NZ? Beyond the sheep, there were great views along the way, especially in the middle near the larger mountains and volcanoes. We stopped for a day to hike the Tongariro Alpine Crossing. It was sunny when we started, but it it was raining with 40mph winds at the top, so we didn’t get to see as much as we hoped. There were still beautiful views on each side though.

The conference was very well run, thanks to anyone who had a hand in it! The speakers dinner was at this great museum nearby on the waterfront and included live Maori singing and dancing. The vegan options were tasty, and I got to meet a few interesting folks there (like the folks from Dreamwidth, a LiveJournal-like blogging service). Some notable sessions during the conference were “The World’s Worst Inventions” by Paul Fenwick, “Anti-features” Keynote by Benjamin Mako Hill, “The Hydras GCC Static Analysis Plugins” by Taras Glek, and “Simplicity Through Optimization” by Paul McKenney. There were many other great sessions, and some I wish I could have attended.

I’m certainly going to try to go again next year, which if you didn’t hear will be in Brisbane!

Fun with Table Functions

February 03, 2010 08:33 PM

I just about have all of the INFORMATION_SCHEMA replaced with Table Functions!

The big wins:
  • One Execution path (less bugs)
  • Simple interface, which means more langauges
  • Zero materialization happening
  • Less Code. This allows us to remove a lot of code (and single shot passes for particular use cases).

    The data dictionary operates entirely off the proto system, so what you see is what we have. We use the table names stored within the proto so no translation ever happens. This is pretty handy for filesystems which do not preserve case (and we don't have to do anything to support them any longer).

    You can also type "SELECT * FROM DATA_DICTIONARY.SCHEMAS".

    There is no longer a "SCHEMATA" tables, just SCHEMAS. Want INDEXES? SELECT FROM INDEXES.

    We will also be able to split up the tables that are from the SQL ANSI standard one from the ones that are not. The advantage is that we can provide an ANSI compliant INFORMATION_SCHEMA, and still have plenty of room to add additional tables as we see fit (yes, like the tables we provide today which house the information on the active Memcached cluster information).

    drizzle> use data_dictionary;
    Database changed
    
    drizzle> select * from plugins;
    +-------------------------------------------+-----------------------+-----------+-------------+
    | PLUGIN_NAME                               | PLUGIN_TYPE           | IS_ACTIVE | MODULE_NAME |
    +-------------------------------------------+-----------------------+-----------+-------------+
    | ARCHIVE                                   | StorageEngine         | TRUE      | TRUE        | 
    | ascii                                     | Function              | TRUE      | TRUE        | 
    | benchmark                                 | Function              | TRUE      | TRUE        | 
    | BLACKHOLE                                 | StorageEngine         | TRUE      | TRUE        | 
    | char_length                               | Function              | TRUE      | TRUE        | 
    | character_length                          | Function              | TRUE      | TRUE        | 
    | CHARACTER_SETS                            | TableFunction         | TRUE      | TRUE        | 
    | COLLATIONS                                | TableFunction         | TRUE      | TRUE        | 
    | COLUMNS                                   | TableFunction         | TRUE      | TRUE        | 
    | compress                                  | Function              | TRUE      | TRUE        | 
    | connection_id                             | Function              | TRUE      | TRUE        | 
    | console                                   | Listen                | TRUE      | TRUE        | 
    | crc32                                     | Function              | TRUE      | TRUE        | 
    | CSV                                       | StorageEngine         | TRUE      | TRUE        | 
    | default_replicator                        | TransactionReplicator | TRUE      | TRUE        | 
    | drizzle_protocol                          | Listen                | TRUE      | TRUE        | 
    | Error_message_stderr                      | ErrorMessage          | TRUE      | TRUE        | 
    | FunctionEngine                            | StorageEngine         | TRUE      | TRUE        | 
    | GLOBAL_STATEMENTS                         | TableFunction         | TRUE      | TRUE        | 
    | GLOBAL_STATUS                             | TableFunction         | TRUE      | TRUE        | 
    | GLOBAL_VARIABLES                          | TableFunction         | TRUE      | TRUE        | 
    | hello_world                               | Function              | TRUE      | TRUE        | 
    | INDEX_PARTS                               | TableFunction         | TRUE      | TRUE        | 
    | INDEXES                                   | TableFunction         | TRUE      | TRUE        | 
    
    
  • Drizzle build 1273 source tarball has been released

    Posted by: lbieber, January 28, 2010 08:40 PM

    Drizzle source tarball based on build 1273 has been released. This marks completion of our Bell milestone. We made a lot of great progress and improvements during the Bell milestone, one of the main goals was to make sure data loss does not occur so that active testing can now occur. While Bell has been completed, upgrading will continue to require dump/load , so hopefully we will nail that down soon and let you know when it is ready.

    Stay tuned for more information on goals and features for the our next major milestone which will be code named Cherry.

    The Drizzle download file and change log can be found here

    Tungsten 1.2.2 Release is Out - Faster, More Stable, More Fun

    Posted by: Robert Hodges (noreply@blogger.com), January 28, 2010 07:01 AM

    Release 1.2.2 of Tungsten Clustering is available on SourceForge as well as through the Continuent website.  The release contains mostly bug fixes in the open source version but there are also two very significant improvements of interest to all users.
    • The manager and monitoring capabilities of Tungsten are completely integrated on the same group communications channel.  This fixes a number of problems that caused data sources not to show up properly in older versions.  
    • We are officially supporting a new Tungsten Connector capability for MySQL called pass-through mode, which allows us to proxy connections by transferring network blocks directly rather than translating native request protocol to JDBC calls.  Our tests show that it speeds up throughput by as much as 200% in some cases. 
    The commercial version has additional features like PostgreSQL warm standby clustering, add-on rules to manage master virtual IP addresses and other niceties.   If you are serious about replication and clustering it is worth a look.

    This is a good time to give a couple of reminders for Tungsten users.  First, Tungsten is distributed as a single build that integrates replication, management, monitoring, and connectivity.   The old Tungsten Replicator and Myosotis builds are going away.   Second, we have a single set of docs on the Continuent website that covers both open source and commercial distributions.

    With that, enjoy the new release.  If you are using the open source edition, please post your experiences in the Tungsten community forums or write a blog article.  We would love to hear from you.

    P.s., We have added Drizzle support thanks to a patch from Marcus Eriksson but it's not in 1.2.2.  For that you need to build directly from the SVN trunk.  Drizzle support will be out in binary builds as part of Tungsten version 1.3.
    Notes on HEAP/MyISAM Index Key Handling on WRITE

    Posted by: Toru Maesaka, January 26, 2010 08:57 AM

    Disclaimer: This post is based on HEAP/MyISAM’s sourcecode in Drizzle.

    Here are my brief notes on investigating how index keys are generated in HEAP and MyISAM. I lurked through these because I’ve started preparing for decent index support in BlitzDB. I also wrote this to assist my biological memory for later grepping (I have terrible memory for names). I’m only going to cover key generation on write in this post. Otherwise this post is going to be massive.

    HEAP Engine

    The index structure of HEAP can be either BTREE or HASH (in MySQL doc terms). Like other engines HEAP has a structure for keeping Key definition (parts, type, logic and etc). This structure is called HP_KEYDEF and it contains function pointers for write, delete, and getting the length of the key. These function pointers are assigned to at table creation or when the table is opened. The assigned function depends on the data structure of the index and it can be either of the following:

    BTREE

    • hp_rb_write_key()
    • hp_rb_delete_key()

    HASH

    • hp_write_key()
    • hp_delete_key()

    As for get_key_length(), either of the following functions are used for both data structures.

    • hp_rb_var_key_length()
    • hp_rb_null_key_length()
    • hp_rb_key_length()

    When writing a row to the tree, HEAP writes to the index using a key generated by hp_rb_make_key(). Note that it does not use this for the hash index. The generated key is populated inside ‘recbuffer’ in HEAP’s handler object (HP_INFO structure).

    From my understanding, it loops through the key segments (I suspect it is similar the internal KEY_PART_INFO structure) and appropriately copies each key field value to the output buffer. By meaning “appropriately” it respects the characteristics of the data type when packing the buffer. For example, for a variable length field, it will only copy the actual data and not the max possible size of it. The final byte that is copied to the buffer is the address of the chunk where the record lives.

    MyISAM Engine

    The upper layer of key handling in MyISAM looks somewhat similar to HEAP so you can really tell that it was written by the same people. Things are nicely wrapped together by the MYISAM_SHARE structure so it’s relatively easy to follow. BlitzDB has a class called BlitzShare for the same purpose (This is based off Archive Engine’s ArchiveShare class).

    Like HEAP, MyISAM has a structure for individual key definition called MI_KEYDEF (it’s defined in myisam.h). There are more function pointers in this structure than HEAP.

    • bin_search()
    • get_key()
    • pack_key()
    • store_key()
    • ck_insert()
    • ck_delete()

    In Drizzle, _mi_ck_write() is assigned to ck_insert() which is the entry point to writing a MyISAM index. The key that MyISAM uses to write to the index is generated by _mi_make_key(). Like HEAP, it will loop through the key segments and pack the relevant fields accordingly to the characteristic of the data type. The output buffer belongs to MyISAM’s hander (lastkey2).

    From Here

    I’ve actually written a naive key generator for BlitzDB already based on Drizzle/MySQL’s internal KEY_PART_INFO array. It seems to be working on EXACT MATCH but I still need to implement an index scanner which looks much harder to pull off than a table scanner. What I’m really worried about is supporting composite indexes (namely reading/searching on it) but hopefully I’ll understand how this area of the storage system works soon.

    Replicating transactions directly to RabbitMQ

    Posted by: Marcus Eriksson (krummas@gmail.com), January 25, 2010 03:43 PM

    Previously RabbitReplication tailed the transaction log provided by Drizzle and then the Java application sent the protobuf serialized transaction to RabbitMQ. Now it is possible to skip the transaction log file and send the transaction directly to the RabbitMQ server without the extra step of storing it in a file first.

    The code is available at https://code.launchpad.net/~krummas/drizzle/rabbitmq_log and to build it with rabbitmq support you need to install librabbitmq which is a bit tricky;

    Installing librabbitmq

    1. Install mercurial
    2. Branch the librabbitmq code: hg clone http://hg.rabbitmq.com/rabbitmq-c/ 
    3. Branch the rabbitmq codegen repo into a subdirectory called codegen in the rabbitmq-c directory:
      1. cd rabbitmq-c
      2. hg clone http://hg.rabbitmq.com/rabbitmq-codegen/ codegen
    4. Run autoconf like this: autoreconf -i 
    5. Run the configure script
    6. make
    7. make install
    Build Drizzle
    When librabbitmq is installed, build drizzle like this:

    1. bzr branch lp:~krummas/drizzle/rabbitmq-log
    2. config/autorun.sh
    3. ./configure --with-rabbitmq-log-plugin
    4. make
    5. make install
    and it is done! 

    Start Drizzle with RabbitMQ support
    First, you can run drizzled with a --help flag to see the options available, they are all prefixed with --rabbitmq-log-XYZ. 

    The default values for the parameters makes drizzle connect to localhost as "guest" and replicate to an exchange called ReplicationExchange. Start it like this to replicate changes to a rabbitmq on localhost:
    $ sbin/drizzled --default-replicator-enable --rabbitmq-log-enable

    The other available options are described in --help
    From Scratch, Get set, Go!

    January 22, 2010 09:37 PM

    [brian@gaz fix_is]$ ./drizzled/drizzled --console-enable 
    InnoDB: The InnoDB memory heap is disabled
    InnoDB: Mutexes and rw_locks use GCC atomic builtins.
    100122 13:12:00  InnoDB: highest supported file format is Barracuda.
    100122 13:12:00 InnoDB Plugin 1.0.4 started; log sequence number 44600
    Listening on 0.0.0.0:3306
    Listening on :::3306
    Listening on 0.0.0.0:4427
    Listening on :::4427
    ./drizzled/drizzled: Forcing close of thread 0 user: ''
    ./drizzled/drizzled: ready for connections.
    Version: '2010.01.1273' Source distribution (fix_is)
    
    drizzled> use data_dictionary;
    OK
    
    drizzled> show tables;
    Tables_in_data_dictionary	
    MODULES	
    PLUGINS	
    PROCESSLIST	
    
    drizzled> desc PLUGINS;
    Field	Type	Null	Key	Default	Extra	
    
    PATH ./data_dictionary/plugins
    ID	bigint	YES		NULL		
    USER	varchar(16)	YES		NULL		
    HOST	varchar(64)	YES		NULL		
    DB	varchar(64)	YES		NULL		
    COMMAND	varchar(16)	YES		NULL		
    TIME	bigint	YES		NULL		
    STATE	varchar(64)	YES		NULL		
    INFO	varchar(100)	YES		NULL		
    drizzled> 
    
    
    
    Using libnotify for error messages

    Posted by: mordred@inaugust.com (Monty Taylor), January 21, 2010 12:34 AM

    Any good piece of infrastructure software should be able to pop up little windows on your desktop. :)

    Yesterday at LCA in Stewart's Hacking Drizzle talk, it occurred to me that error messages should pop up little windows on my desktop. So I wrote an error message plugin which uses libnotifymm to send Gnome pop-up window messages.

    I don't always get to post screenshots, so here you go:

    Screenshot of libnotify error message 

    Machines Plus Minds - Welcome

    Posted by: Mark Atwood (me@mark.atwood.name), January 20, 2010 08:37 PM

    Welcome!

    I write about the techne that is the amazing machine we call "The Net".

    At present, I am mostly interested in Memcached, the Drizzle DB fork of MySQL, NoSQL, and in open standards, but I will be writing about other stuff as well.

    I used to do my public geek blogging in my personal LiveJournal, but for various reasons it makes sense to separate my public writings about technology, my other public writings, and writings that are of interest only to my closer friends and family.


    My day job title is "Director of Community Development" for Gear6, which means that my main paid interest is memcached and the memcached community.  I read and write technical articles, research nosql databases and other web-scale open source software, and go to conferences to attend and to speak.


    You have already lost

    Posted by: Stewart Smith, January 18, 2010 02:43 AM

    When the following code introduces a valgrind warning… you are in a world of pain and loss:

    === modified file 'drizzled/field/blob.h'
    --- drizzled/field/blob.h	2009-12-21 08:16:13 +0000
    +++ drizzled/field/blob.h	2010-01-18 01:36:48 +0000
    @@ -32,6 +32,7 @@
      */
     class Field_blob :public Field_str {
     protected:
    +  uint32_t assassass;
       uint32_t packlength;
       String value;				// For temporaries
     public:
    Drizzle to Infinispan replication and a small code walkthrough

    Posted by: Marcus Eriksson (krummas@gmail.com), January 16, 2010 08:39 PM

    This post aims to explain how to build your own key/value-store applier in RabbitReplication by walking through the new Infinispan support as an example.

    Infinispan
    Infinispan is a "distributed, highly available data grid platform" and it exposes a REST interface where it is possible to manipulate the data in infinispan. For example it is a simple HTTP PUT method to store new data and a HTTP DELETE does exactly what you expect. The data is stored under resources, for RabbitReplication the data is stored under /<schema.table>/<key>. This means you can view the data in infinispan using a browser.


    When I implemented the Infinispan support, I simply dropped the .war file in the webapps directory of a jetty installation and started it. Since Infinispan has a REST interface, the client library can be any HTTP client, I picked the Jersey REST client since it is incredibly easy to use.


    Customizing RabbitReplication
    RabbitReplication uses Guice internally for dependency injection, so, to use another KeyValue store you need to create your own Module for configuration. I'll show an example below.

    To add support for a new key/value store, you need to implement an interface, org.drizzle.replication.transformer.objectstores.KeyValueStoreClient (here). It is a quite straight-forward add/get/remove interface. Look at the infinispan implementation (here) for an example. Note the @Inject on the constructor, it tells guice that the WebResource parameter should be injected when it constructs the object. You will need to put the rabbitreplication.jar file on your classpath when building your stuff.


    Guice is configured in modules where you bind() an interface to an implementation, so, to configure guice to use a new KeyValueStore, we need to bind() the KeyValueStoreClient interface to the new implementation. Look at the infinispan module (here) for an example how to do it, the method annotated with @Provides is called by guice when it needs to create a KeyValueStoreClient.

    To tell RabbitReplication to use your new module, you simply edit your configuration file, and set where the new Module is located, see this example. If you need to configure your new client, simply add the properties in the config file, guice will bind every property in the file to @Named(...) strings, check the Infinispan module provider method for an example how to use it.

    Now, build your code into a jar, drop the jar in the lib/ directory of rabbitreplication and start RabbitReplication like this; bin/start.sh config/someconf.properties - the config file should have the custom_module set to the name of your new module.

    Downloading and installing
    Best way to use rabbitreplication is still to branch the code from launchpad (lp:rabbitreplication) and then write ant in the base directory, this will create a .zip and a .tgz in the dist directory. You can also download the binaries here.

    • Unpack the distribution file
    • Copy a config file from .sample to .properties in the config directory and edit the file to your liking. objectslave.properties.sample is the sample you want to look at if you want to try out Infinispan.
    • Start rabbitreplication by executing bin/start.sh config/yourconf.properties
    Look at the previous posts on rabbitreplication to find out how to start a master etc.
    Further thoughts on BlitzDB’s Index Handling

    Posted by: Toru Maesaka, January 15, 2010 09:05 AM

    I’ve been thinking quite a bit about collation handling in BlitzDB for the last couple of days. The more I think about it, the more stuck I’ve been getting with BlitzDB’s index design. I’m actually so frustrated with myself at the moment that I want to hit my head against a wall or something.

    So, I’m writing this entry to clear up my mind. Heh, my blog is slowly becoming BlitzDB’s design document draft. This should hopefully be good though since by blogging it, people can tell me whether I’m moving towards a stupid direction or not.

    Collation Importance

    When writing database software that is intended for International use, it is important to handle textual data by respecting collation order. It is arguable that most people are only interested in English lexicographic ordering but unfortunately the world is not so standard.

    Internal Primary Key Handling

    I want to motivate people to actively define a PRIMARY KEY with BlitzDB. I plan to make this attractive by providing the best performance when PK is defined. In December 2009, my answer to this was to write the PK value as the key for the data dictionary (where actual rows are stored in BlitzDB). This allows BlitzDB to do a direct lookup on the data dictionary for PK based lookup, instead of consulting the B+Tree index. I’m still fond of this lookup optimization approach but it introduces problems too.

    Problem 1. Consider the following textual keys: “key” and “KEY”. They obviously have different binary representations but in certain cases they can be logically equivalent. Because the data dictionary is a hash database, this is a problem. The solution that instantly pops up is to normalize the key before writing or reading it. This however, causes a problem in cases where the two keys are inequivalent. Perhaps Drizzle/MySQL provides an internal normalization function that respects this. I still need to study this area of the storage subsystem.

    Problem 2. Directly writing a PK to the data dictionary means fast lookup but because of the data structure, it’s not possible to fetch the next “logical” key, meaning I can’t implement index scanning on PK as it is. Quick solution for users is to create an index on the PK column (this would create a separate B+Tree for it) but this is not so friendly because it requires the user to have prior knowledge of all this. So, my plan is to provide the best of both worlds. I’ll elaborate on how I’m planning on tackling this problem next.

    Current Primary Key Read/Write Behavior

    In general, keys of BlitzDB’s data dictionary is a unique 8 byte integer. The idea is that BlitzDB writes this unique ID along with the key to the B+Tree Index so that it can later identify that row. The difference with PK is that, if a PK is present in a table, BlitzDB will not generate an internal unique ID and use PK for the data dictionary’s key instead. BlitzDB won’t create a B+Tree index for PK at the time I wrote this blog entry.

    Next Step

    Create a B+Tree index for PK anyway. BlitzDB will still use the PK value as the key for data dictionary if it exists. For PK based lookup requests, BlitzDB will look directly at the data dictionary and for PK based requests that involve index scanning, BlitzDB will look at the B+Tree index.

    This approach can consume more space when textual data is used for keys but I think it’s worth it. At the same time, you can save space if you use use types that are smaller than 8 bytes for PK. For example, using a 4 byte integer would reduce BlitzDB’s key space by 50%.

    Hmm, I think my mind has cleared a little.

    Drizzle, BlitzDB and HTON_STATS_RECORDS_IS_EXACT

    Posted by: Toru Maesaka, January 13, 2010 01:23 PM

    Recently I enabled HTON_STATS_RECORDS_IS_EXACT in BlitzDB to let the optimizer know that BlitzDB can instantaneously return the number of rows in a specified table. As a result, the Drizzle kernel can directly call the Cursor::info() function to get the row count. To users, it means that SELECT COUNT statements can be executed in O(1). So it’s a great thing in general.

    Something Broke

    After I enabled HTON_STATS_RECORDS_IS_EXACT, I noticed that issuing SELECT statement on a table with 1 row would no longer return a resultset. Weird indeed! after investigating with GDB, I noticed that rnd_next() is only called once instead of twice on a table with 1 row (second time is to find EOF) when HTON_STATS_RECORDS_IS_EXACT is enabled. This makes sense because the kernel knows that there is only 1 row and therefore it doesn’t need to keep scanning for EOF. However, this made me scratch my head since this shouldn’t break BlitzDB’s table scanner.

    Remedy

    Logically, I was confident that BlitzDB’s table scanner was functioning properly so I decided to look at what was going on beyond the engine API. Turns out that join_read_system() in sql_select.cc looks at the table->status value and decides that it’s an error if 0 isn’t assigned to it. What’d you know? I realized that I wasn’t assigning anything to the status variable. It’s more that I didn’t know that I was meant to update an internal structure. You’d think that engine developers aren’t meant to touch those. It’s not mentioned in the Engine Documentation at MySQL Forge either. Nevertheless, the important thing is that it works now. Oh and SELECT COUNT is fast now too.

    Eye Opener

    This experience among other occasions where I had to read the kernel’s source made me think that it would be nice to provide an intensive up to date documentation on how to develop storage engines for Drizzle in the future (when the API becomes stable). Needless to say, this would be co-ordinated within the Drizzle community. I’m not a license person but it should hopefully be provided with a freely available license too.

    Drizzle build 1263 and libdrizzle 0.7 source tarballs have been released

    Posted by: lbieber, January 12, 2010 10:27 PM

    For Drizzle build 1263 this release includes continuing general code clean up, bug fixes and improvements as we get closer to our Bell milestone. The Drizzle download file and change log can be found here

    For libdrizzle version 0.7 this release includes:

  • Added test coverage reports using lcov
  • Updated autoconf build system
  • Updated RPM packaging
  • The libdrizzle download file and change log can be found here
  • Documentation for libdrizzle can be found at API and Development
  • Moving On

    Posted by: Eric Day, January 11, 2010 07:06 PM

    Friday was my last day at Sun Microsystems, and today is the first day at my new job (location coming soon). I’ve had a great time at Sun, and thank them for all the opportunities given to me there. I’ll be doing mostly the same work at the new gig, working on projects like Drizzle, but with a slightly different focus. For the most part my day-to-day won’t change much.

    Right now I’m focusing on libdrizzle again and am implementing the prepared statement API, cleaning up the MySQL protocol support a little, and also implementing the new Drizzle client/server protocol. I’ll continue to work on Gearman as well, especially where it is relevant to Drizzle. I also need to start blogging again with specific topics in the projects I’m working on, I’ve been fairly quiet lately.

    I’ll be in New Zealand next week at Linux Conf AU (yes, it’s not in AU this year). I have a talk on Gearman, and it looks like I’ll also be helping out with the Drizzle talk. It will be really nice to escape the Portland, OR winter for a bit. :)

    Same Bat-time, Same Bat-channel

    Posted by: mordred@inaugust.com (Monty Taylor), January 11, 2010 06:31 PM

    Friday was my last day working for Sun Microsystems, which means it was also the end of the job that started when I was hired by MySQL, Inc a little over four years ago. For those of you who didn't know me before that, it should be noted that the other than periods of time when I have started companies and worked for myself, the next longest I've ever been with one company was 10 months. I mention that as one of the few quantifiable metrics I could cite in describing what a wonderful experience it has been to work for MySQL and then Sun. The people I've worked with have been nothing short of stellar.

    I will be continuing my work as a core Drizzle developer, so this blog post notwithstanding, I doubt most people will notice the change. The only real outward difference I can think of is that now that I'm not a Sun employee, I am free to comment on the merger and on what some people see as the need to save MySQL from Oracle. I don't know if I will, but at least now I can.

    I would say I will miss everyone, but I imagine I will continue to see most people with about the same frequency as before. One of the wonderful things about the Open Source world is that changes in business and legal entities don't often have very much effect on the coding that we do day in and day out... so let's keep up the good work everybody! See you on IRC...

    The future of open source SQL databases (as I see it)

    Posted by: mike, January 10, 2010 06:58 AM

    With the whole MySQL/Oracle issue going on, I find myself looking into the future and how I see it. As far as I'm concerned, MySQL will start to lose it's popularity as the landscape changes. As far as I am concerned, there will be two key players in the MySQL replacement market, those being Drizzle and MariaDB.

    I am not just saying Drizzle just because I help out with the project in various ways, however, that should be a good sign that I believe in it if I am willing to put any effort into it. With people behind it like Brian Aker, Eric Day, Monty Taylor, Stewart Smith, Jay Pipes, you've got a coding powerhouse that could solve the cancer issue if it was up to software development to fix it. These guys work around the clock and have been refactoring and re-examining everything inside of MySQL. What's going to be left ideally is a superfast microkernel that supports plugins for everything - leveraging the best options out there for replication, messaging, storage engines, etc. Growing apart from the monolithic huge distribution model that MySQL currently follows.

    The second key player is MariaDB. Another fork off of MySQL, led by Monty Widenius himself and with other MySQL key players behind it, there is no doubt it will continue Monty's legacy as being able to spin success out of a tiny little open source product. I believe it will stay more traditional in-line with MySQL, but will provide more advanced functionality and scalability as it is developed further.

    I won't get into other options like PostgreSQL as I don't follow the rest of the community there much.

    Also, we'll see more NoSQL (did we ever bottom out on a better term for that?) options. CouchDB and MongoDB (both of which from a 50,000 foot view look identical from a usage model) and options like Cassandra will also become important and your data needs will become the decision maker for going with a SQL or a NoSQL database. Both of which offer advantages. However, I see Drizzle as making huge strides in leveling the playing field (or attempting to) with it's replication work to make it as scalable as NoSQL databases seem to be with their ability to scale out and replicate changes easily (which to me are their main selling point right now...)

    Anyway, this is from a user perspective, not a developer perspective, and from what I've seen from #drizzle on freenode, a few SQL and open source conferences, blog talk and my own gut feelings.

    I should make a note that I still use MySQL and will probably continue for some time. Neither Drizzle nor MariaDB are production-friendly yet. However, I believe 2010 should see the first "production capable" release of Drizzle (not sure of MariaDB.)

    It is an exciting time though as we're starting to be presented with more options by the day, in fact there are so many various NoSQL databases now, key/value stores, and even a few more SQL databases that it's too hard to keep track of them anymore. There's a lot of code being written and with this whole Oracle possibly inheriting MySQL depending on the EU's judgement, it could ultimately help usher in some of these smaller projects into the spotlight quicker depending on what Oracle does with MySQL...

    Multi threaded replication appliers

    Posted by: Marcus Eriksson (krummas@gmail.com), January 08, 2010 08:58 PM

    Lately I've been working on a transaction reducer to be able to multi thread the applier. Basic idea is to reduce the transaction to only affect one row with one statement, when that is the case, we can have a thread pool doing the actual persisting of the statements (of course it has some drawbacks as well, more about those later). This approach is particularly interesting for NoSQL appliers.

    Reducing transactions
    The transaction log in drizzle contains a list of statements. Each statement contains a list of records, where each record contains information about what changed on one row in the master. One example transaction could look like something like this:


    transaction_context {
      server_id: 1
      transaction_id: 10
      start_timestamp: 1262812100381445
      end_timestamp: 1262812153799963
    }
    statement {
      type: INSERT
      start_timestamp: 1262812100381446
      end_timestamp: 1262812134317464
      insert_header {
        table_metadata {
          schema_name: "unittests"
          table_name: "test1"
        }
        field_metadata {
          type: INTEGER
          name: "id"
        }
        field_metadata {
          type: VARCHAR
          name: "test"
        }
        field_metadata {
          type: VARCHAR
          name: "ignored"
        }
      }
      insert_data {
        segment_id: 1
        end_segment: true
        record {
          insert_value: "78"
          insert_value: "a"
          insert_value: "b"
        }
        record {
          insert_value: "79"
          insert_value: "a"
          insert_value: "b"
        }
    ...
        record {
          insert_value: "87"
          insert_value: "a"
          insert_value: "b"
        }
      }
    }
    statement {
      type: UPDATE
      start_timestamp: 1262812134317466
      end_timestamp: 1262812151669387
      update_header {
        table_metadata {
          schema_name: "unittests"
          table_name: "test1"
        }
        key_field_metadata {
          type: INTEGER
          name: "id"
        }
        set_field_metadata {
          type: VARCHAR
          name: "test"
        }
      }
      update_data {
        segment_id: 1
        end_segment: true
        record {
          key_value: "85"
          after_value: "test"
        }
      }
    }
    statement {
      type: DELETE
      start_timestamp: 1262812151669389
      end_timestamp: 1262812153799963
      delete_header {
        table_metadata {
          schema_name: "unittests"
          table_name: "test1"
        }
        key_field_metadata {
          type: INTEGER
          name: "id"
        }
      }
      delete_data {
        segment_id: 1
        end_segment: true
        record {
          key_value: "81"
        }
      }
    }

    Or in SQL:
    BEGIN;
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (78, "a","b");
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (79, "a","b");
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (80, "a","b");
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (81, "a","b");
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (82, "a","b");
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (83, "a","b");
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (84, "a","b");
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (85, "a","b");
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (86, "a","b");
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (87, "a","b");
    UPDATE unittests.table1 set test = "test" WHERE id = 85;
    DELETE FROM unittests.table1 WHERE id = 81;
    COMMIT;


    I.e. a number of inserted rows, one updated and one deleted. If we could exploit this and make sure that every record in the transaction only affects one row then each row is totally independent from all other records in the transaction and we could then have a pool of threads applying the transaction.

    So, if i reduce the above transaction like this:

    TransactionReducer reducer = new DrizzleTransactionReducer();
    TransactionMessage.Transaction txn = getNextTransaction();
    TransactionMessage.Transaction t = reducer.reduce(txn);

    I get this transaction:


    transaction_context {
      server_id: 1
      transaction_id: 10
      start_timestamp: 1262812100381445
      end_timestamp: 1262812153799963
    }
    statement {
      type: INSERT
      start_timestamp: 1262812100381446
      end_timestamp: 1262812134317464
      insert_header {
        table_metadata {
          schema_name: "unittests"
          table_name: "test1"
        }
        field_metadata {
          type: INTEGER
          name: "id"
        }
        field_metadata {
          type: VARCHAR
          name: "test"
        }
        field_metadata {
          type: VARCHAR
          name: "ignored"
        }
      }
      insert_data {
        segment_id: 1
        end_segment: true
        record {
          insert_value: "78"
          insert_value: "a"
          insert_value: "b"
        }
        record {
          insert_value: "79"
          insert_value: "a"
          insert_value: "b"
        }
        record {
          insert_value: "80"
          insert_value: "a"
          insert_value: "b"
        }
        record {
          insert_value: "82"
          insert_value: "a"
          insert_value: "b"
        }
        record {
          insert_value: "83"
          insert_value: "a"
          insert_value: "b"
        }
        record {
          insert_value: "84"
          insert_value: "a"
          insert_value: "b"
        }
        record {
          insert_value: "85"
          insert_value: "test"
          insert_value: "b"
        }
        record {
          insert_value: "86"
          insert_value: "a"
          insert_value: "b"
        }
        record {
          insert_value: "87"
          insert_value: "a"
          insert_value: "b"
        }
      }
    }
    Or, in SQL:
    BEGIN;
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (78, "a","b");
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (79, "a","b");
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (80, "a","b");
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (82, "a","b");
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (83, "a","b");
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (84, "a","b");
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (85, "test","b");
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (86, "a","b");
    INSERT INTO unittests.table1 (id, test, ignored) VALUES (87, "a","b");
    COMMIT;

    I.e. a list of only inserts. This means we can apply the transaction using several threads. Another benefit is that we reduce the total number of records to insert meaning better performance.

    Updates and deletes that affect rows outside the transaction are also reduced, for example if, within one transaction, one external row is updated twice, then deleted, only the delete statement will be executed.

    Drawbacks
    There are, of course some drawbacks, for example, if this is used when applying to a database, the applier will not be able to apply the transaction as a transaction since there is no way of sharing transaction context between several client threads.

    Using it
    To try it out, branch this repository: lp:~krummas/+junk/transactionreducer and look at the tests file. Should be straight forward to add more tests to see how it handles them.

    It will also be available in RabbitReplication as a configuration option on the slave. The size of the thread pool and number of client connections will also be configurable.

    Performance
    In theory it should be faster to apply fewer statements using more threads, and the time spent reducing the transaction should easily be less then the time spent doing network I/O etc. I've done a few non-scientific benchmarks using a multi threaded cassandra applier and it takes approximately half the time applying reduced transactions using the multi threaded applier. I will make some proper benchmarks when everything is in place in RabbitReplication.
    Describing Drizzle's Development Process

    Posted by: nospam@example.com (Jay Pipes), January 06, 2010 04:17 PM

    Yesterday, I was working on a survey that Selena Deckelmann put together for open source databases. She will be presenting the results at Linux.conf.au this month.

    One of the questions on the survey was this:

    How would you describe your development process?

    followed by these answer choices:

    • Individuals request features
    • Large/small group empowered to make decisions
    • Benevolent dictator
    • Other, please specify:____________

    I thought a bit about the question and then answered the following in the "Other, please specify:" area:

    Bit of a mix between all three above.

    The more I think about it, the more I really do feel that Drizzle's development process is indeed a mixture of individuals, groups, and a Benevolent dictator. And I think it works pretty well. :-) Here's some of the reasons why I believe our development process is effective in enabling contributions by being a mix of the above three styles.

    Who's the Benevolent Dictator of Drizzle?

    First, let me get the BDFL question out of the way. We've made a big deal in the Drizzle community and mailing lists that anyone and everyone is encouraged to participate in the development process — so why would I say that Drizzle has a benevolent dictator?

    Well, although he would probably disagree with the tile of BDFL, Brian Aker does have some dictator-like abilities with regards to the development process, and rightfully so. Brian came up with many of the concepts that Drizzle aspires to be, and Brian has more experience working on the code base than any other contributor.

    After having worked closely with Brian now for 18 months or so, I can definitively say that Brian's brain works in a very, well, interesting way. Those of us who work with him understand that sometimes his brain works so fast, his typing fingers struggle to keep up, resulting in something I call "Krowspeak". It's kinda funny sometimes trying to translate :-)

    With this wonderfully unique noodle, Brian tends to knock out large chunks of code at a time, and often he wants to push these chunks of code into our build and regression system and into trunk to see the results of his work quickly. Sometimes, this can cause other branches to get out of sync and get merge conflicts, and Brian will inform branch owners of the conflicts and work with them to resolve them.

    So, regarding dictator-like development processes, I suppose we have Brian acting as the merge dictator because he's got a lot of experience and understands best how both his code and other's code integrates. We tried a little while back having myself and Monty Taylor be merge captains, but that distribution of merge work actually created a number of other problems and we've since gone back to Brian being the merge captain by himself, with Lee, Monty, and myself improving our automated build and regression system to help Brian with the repetitive work.

    That said, what Brian does not do is make decisions in a dictator-like way. Decisions about the code style, reviews, features, syntax changes, etc are made on the mailing list by consensus vote. If a consensus is not reached, generally, no change is made which would depend on the decision. Brian does not influence the direction of the software or the source code style any more than anyone else on the mailing list which expresses an opinion about an issue; and for this, I greatly respect his wisdom to seek consensus in an open and community-oriented way.

    Groups Empowered to Make Decisions

    I'm assuming that what Selena's "large/small group empowered to make decisions" answer meant was what is sometimes called "Cabal Leadership" of a project. In other words, there is some group which steers the project and makes decisions about the project which affect the rest of the project's contributors.

    Drizzle has at least one such group, the Sun Microsystems Drizzle Team, which is composed of Brian, Monty Taylor, Lee Bieber, Eric Day, Stewart Smith, and myself. One might call us the core committers for Drizzle.

    However, while the Sun Drizzle team certainly is empowered to guide development, it is no different than any other group of developers that choose to contribute to Drizzle. There isn't a "what the Sun Drizzle team decides" rule in effect. Our "power" in the development process is no greater or less than any other group of contributors. We act merely as a team of individuals who work on the Drizzle code and advocate for the project's goals.

    Individuals Empowered to Make Decisions

    One thing I've been impressed with in the past 18 months is how the Drizzle community has embraced the opinions and work of individual contributors. I believe Toru Maesaka, Andrew Hutchings, Diego Medina and Padraig O'Sullivan were among the first individuals to begin actively contributing to Drizzle. Since then, dozens of others have joined the developer and advocate community, with each individual carving out a piece of the source code or community activities that they want to work on.

    I have learned much from all these individuals over the last year or so, and I've tried my best to share knowledge and encourage others to do the same. Our IRC channel and mailing list are active places of discussion. Our code reviews are always completely open to the public for comments and discussed transparently on Launchpad, and this code review process has been a great mixing bowl of opinion, discussion, learning and debate. I love it.

    More and more we have developers showing up and taking ownership of a bug, a blueprint, or just a part of the code that interests them. And nobody stands in their way and says "Oh, no, you shouldn't work on that because <insert another contributor's name> owns that code." Instead, what you will more likely see on the lists or on IRC is a response like "hey, that's awesome! be sure to chat with <insert another contributor's name>. They are interested in that code, too, and you should share ideas!" This is incredibly refreshing to see.

    In short, the Drizzle developer process is a nice mix of empowered individuals and groups, and a dash of dictatorship just to keep things moving efficiently. It's open, transparent, and fun to work on Drizzle. Come join us :-)

    2010, Life

    January 05, 2010 09:42 PM

    Things that happened in 2009:

  • We got a major chunk of MySQL refactored into Drizzle. Thanks to Jay we have a new replication system that we are shaking out. Monty Taylor has created an impressive bit of porting kung-fu that has made porting cake. Eric has the the Listener plugin work in shape which makes NULL plugins, drizzled console, and the future "whatever a server side script is" cake to work on. We have a BSD based libdrizzle so the "well if you link like this..." license questions all go away, along with any historical FUD that has been created around them. Lee, Jay, and community have put together an impressive regression system which is far beyond anything we ever had for MySQL. Stewart killed FRM which has always been one of our "we could do this, if we could replace FRM...". We have full time developers now in multiple companies and contributions continue to come in.

  • It took three years but I completed my "lets see if we can be a bit healthier if we lose some weight". I started off in 2005 at about 182 and nowadays I stick to about 150. What changed? I cut out eating bread in large part and I switched to eating when I felt like I needed to eat. I get a lot of "you look healthier" or "were you sick?". It is amazing how people see weight loss. People's reactions can really be across the board. For me though? I rarely have issues with my asthma now and I have a lot more energy in general.

  • I spent more time in Seattle then I have since 2005. I am still traveling, but I made a point of really trying to live in Seattle and be here more. This spring a friend commented on my being around by saying "It has really been great to see you at events again". Comments like that made it completely worth the effort I have been putting in to be here.

  • Completed the "puppy acquisition". I am always happiest when I have a dog in my life. I've been watching Rosalynd get old and while I know she has a few more years in her, I really wanted to keep a constant flow of dogs in my life.

  • We built the Worm Tube. For Burning Man this year I worked out a 2800 square foot structure that weighs under 500 pounds and cost less then $800 to build.

  • Libmemcached is alive and well. Its user base continues to grow and we did a lot of exciting things. It is now distributed in all major Linux distributions. We have a new library that wraps the Memcached protocol. Using it you can create add Memcached compatible existing services to existing projects. Schooner provided a new memslap in December that greatly enhances testing.

    What about 2010?

  • Put more effort into personal relationships. Something I walked away from Burning Man this year was a need to enhance the personal relationships I have. Focus on what matters and put more effort into the core of my social group/family/etc. There are some awesome people in my life and I should be spending a considerable amount of my time nurturing the relationships with them.

  • Is Drizzle production yet? Whenever I ask an audience if anyone is using Drizzle in production I get a few hands up in the air. I asked one group "what does production mean to you?". I got back an answer of "will we lose data?". My answer was, "if you trust Innodb today, then you are fine". We have our bugs and there are issues at hand but I feel like the gap is narrowing on that question. This year we will shake out the replication system, and it should be interesting to see what all happens there. We have seen plugins being written to replicate to many of the NoSQL solutions and I believe the flexibility that we have put into the system will solve a lot of the mixed shop problems that we run into. A database is not an island unto itself, it is just another piece of infrastructure and it needs to play well with other pieces. I believe this year we will see a Windows Binary. I would like to say we would double the number of major contributors we have in multiple companies, but I can't keep track any more of where all people come from. I should find this out so that I can create a metric :)

  • For Libmemcached we will see a restructuring of its guts to do more asynchronous work. We have a few years under our belts now of knowledge and we can improve on performance. I am going to be adding support for range queries, which is specified in the protocol. I am hoping to see at least one vendor adopt the protocol library. I believe we should finally support Windows directly out of the box. I have patches now that will allow us to do more centralized management of Memcached clusters. I am looking forward to pushing those. Some of this work will be done in the current version, but I expect to work on some of this in a version 2.0. We hit trillions of transactions some time ago, so claiming 2.0 seems humble enough :)

  • Raise puppy. This is not a small amount of work.

  • Build the Monorail :)

  • Spend about the same amount of time in Seattle. I want to take a couple of trips overseas this year but I want to spend less time in general away, especially weekends.

  • Be a better advocate for Open Source, both for the projects and the business models. There is a lot of FUD that has been spread over the years and I want to spend a bit of time being more vocal squashing it. We have seen coalition's built around cripple source concepts, license BS, developer model hijinks, etc... I spent a couple of hours in one country just explaining the differences between open source and free software to some government administrators after a certain luminary confused the hell out of them. There is no advantage too damning one license at the expense of another.

  • Advocate for open source databases. One thing I have been doing in the last year has been trying to help people pick a database. There are a lot of good open source databases right now and none of them are a one size fit all solution. It has been refreshing to make suggestions or provide answers based on need. When we worked out the original FAQ for Drizzle we made a point of mentioning different databases, I was very happy about that.
  • 2009 and looking forward to 2010

    Posted by: Marcus Eriksson (krummas@gmail.com), January 01, 2010 01:30 PM

    So, this is yet another 2009 retrospect with some goals for 2010, I'll do it in list form so someone might actually browse it;

    2009:
    • Main event of 2009 was that i got kid nr 2, Teo.
    • Ran 1000K despite injuries
    • Started Drizzle JDBC.
    • Got excited about programming again and realized I need a new job
    • Got Drizzle JDBC to version 0.6, not many users yet, so can't say much about the quality (or, one could look at it from another angle, it is bug free! *cough*)
    • Went to JavaOne, great stuff, probably the last one.
    • Started a very long parental leave!
    • Started working a lot on replication-related stuff:
    • "Learned" Haskell and started looking at Erlang.
    • Read some great books; Java Concurrency In Practice, Real World Haskell, Effective java (like every year) ...
    2010 goals:
    • Continue my long, sweet parental leave (living in Sweden has its benefits)
    • Run 1500K
    • Get myself a new job, main requirements:
      • Most importantly, has a high paced startup-feel to it, I want to build stuff, not have meetings about the stuff we could build
      • Uses open source products
      • Contributes stuff back to open source communities (or, best of all, has some open source products of their own)
    • Make RabbitReplication into a proper product
      • Create web page
      • Make regular releases with good documentation
      • Build support for more column/key-value stores
      • Document how to extend it
      • Clean up the code
    • Get Drizzle-JDBC to "1.0" with someone using it in production (given drizzle makes it there). I'm guessing that when Drizzle itself is production ready, the user (and bug-) count will increase.
    • Learn at least one new programming language (Erlang, I'm looking at you), and build something with it.
    • Go to at least one conference
    • Blog more, November and December frequencies have been ok, mainly because I actually had time to build stuff worth blogging about.
    • Read more books (current reading queue: Erlang programming, SICP, Reread Distributed Systems - Concepts and design, Java Generics and Collections)
    • Invent a clock with 30 hrs a day to actually manage all the above goals
    Drizzle now running dbt2 benchmark

    Posted by: lbieber, December 31, 2009 05:19 PM

    We recently added support for running the dbt2 benchmark as part of our drizzle automation suite. dbt2 is an OLTP transactional performance test. It simulates a wholesale parts supplier where several workers access a database, update customer information and check on parts inventories. We currently are using the defaults (10 warehouses, 5 minute test runs and running various number of connections for each run up to 1024. The initial runs exposed a race condition in our TemporalFormat::match() code at 1024 connections.

    We now have a very nice suite of tools to help us with tracking performance and scalability of Drizzle, besides dbt2 we also are running sysbench, sqlbench, crash-me and randgen for all of our builds in the staging branch of Drizzle. Of course we are always looking for more, so if you have any suggestions on other benchmarks or tools to add, please let us know.

    If you want to receive all of the various benchmark results you can subscribe to the mailing list.

    Note also we are in the process of getting our changes to dbt2 merged to the dbt2 trunk, it should be there very soon.

    Happy New Year to all!

    -Lee

    Drizzle replication to WebSockets

    Posted by: Marcus Eriksson (krummas@gmail.com), December 28, 2009 10:03 PM

    I just pushed a WebSocket applier to RabbitReplication, and yes, it is as crazy as it sounds. It works pretty much like all appliers - it consumes drizzle transactions from  RabbitMQ, converts them into objects by inspecting annotations, marshalls the object to JSON, and then stores the JSON string. In this case it stores it to a set of websockets. RabbitReplication is deployed as a war file to Jetty 7.01 which supports websockets.


    I set the demo up on my server at home (in Sweden) on a DSL line, so it might be slow, but it should show the idea, all operations are instant when latency is low (if anyone wants to host it at a better place, please let me know). Of course, it requires a WebSocket capable browser and the only one I know of is Google Chrome.

    It works like this:

    1. INSERT is executed from the "drizzle client" webapp - totally separate webapp that uses drizzle jdbc to insert/update/delete data.
    2. Drizzle stores the transaction in the database and in the transaction log.
    3. Master extractor extracts the transaction and publishes it to RabbitMQ
    4. Slave applier consumes the transaction from RabbitMQ
    5. Applier transforms the transaction to JSON
    6. Applier writes the JSON to a set of websockets
    7. Javascript voodoo is performed to make it visible
    Possible real usecases
    The demo app just shows what is possible, but a real use case could be that someone has a drizzle backed forum and want to add some real time post-updates to some front page somewhere. This would be real easy, simply start a new slave configured for WebSocket application (of course RabbitReplication is already used for other replication needs :) ), convert the JSON to something that makes sense and they are set! If someone has a cool usecase, please let me know and i'll build a more realistic demo app!
    new drizzle low-hanging-fruit milestones

    Posted by: mordred@inaugust.com (Monty Taylor), December 24, 2009 11:16 PM

    I've got some code in lp:drizzle/staging right now that's on its way (barring major catastrophes) to trunk. It's not code that does anything sexy as far as the actual running server is concerned. It's a code cleanup branch.

    Anyway - short story being - everything from mysys and mystrings that is actually part of public APIs has been moved into drizzled/ proper. Everything else has been moved into drizzled/internal. None of the headers from drizzled/internal are installed... so none of the headers in drizzled/ should be using any of them. Combine this with the past week's removal of both server_includes.h and global.h, and we're getting pretty close to having fully consumable headers.

    Which brings me to:

    In doing this, I noticed a bunch of things that either need to be fixed, still need to be deleted, or need to be put behind a namespace so that including our headers doesn't strangely and unexpectedly b0rk someone's application. To that end, I've added a bunch of new low-hanging-fruit blueprints. As always, you can look at the full list on Launchpad but I'm including links to the new ones here.

    New Drizzle Low Hanging Fruit:

    http://blueprints.launchpad.net/drizzle/+spec/code-cleanup-consolidate-charset-headers

    http://blueprints.launchpad.net/drizzle/+spec/code-cleanup-merge-decimal

    http://blueprints.launchpad.net/drizzle/+spec/code-cleanup-merge-error

    http://blueprints.launchpad.net/drizzle/+spec/code-cleanup-move-sql-alloc

    http://blueprints.launchpad.net/drizzle/+spec/code-cleanup-types-into-namespace

    http://blueprints.launchpad.net/drizzle/+spec/code-cleanup-mem-root-namespace

    http://blueprints.launchpad.net/drizzle/+spec/code-cleanup-internal-namespace

    http://blueprints.launchpad.net/drizzle/+spec/code-cleanup-remove-myisam-depends

    http://blueprints.launchpad.net/drizzle/+spec/code-cleanup-remove-my-files

    http://blueprints.launchpad.net/drizzle/+spec/code-cleanup-remove-sql-list

    End of Year Progress on BlitzDB

    Posted by: Toru Maesaka, December 24, 2009 09:47 AM

    FURTHER UPDATE: Further thoughts on BlitzDB’s Index Handling

    My open source friends might have noticed that I’ve been working quite a bit on BlitzDB lately. To tell the truth, I had a hidden goal to get Version-1 done by Christmas. Unfortunately it doesn’t look like I can reach that goal. However, looking at the brightside I got a lot done in the past few weeks so allow me to “journal” it in this blog post.

    Agony of Knowing

    The more I understood Drizzle’s storage mechanism and Tokyo Cabinet’s internals, the more I disliked what I previously had. This led me to spending quite a bit of time rewriting BlitzDB’s codebase. I was using pthread’s rwlock for concurrency control but I decided to design and write BlitzDB’s own lock mechanism to get the best out of TC (in terms of concurrency). I also rewrote the entire table scan code which is something you’d hope won’t be executed that often (people should use indexes!) but needless to say, it’s an important component of a relational storage engine so I’ve put in a lot of effort there.

    Rewriting the Table Scanner

    In the process of rewriting the table scanner, Jay Pipes’ gave me a fantastic advise on using Drizzle’s internal atomic type (drizzled::atomics). He gave me this advise because he noticed that my atomic ID generator was securing atomicity with pthread’s mutex. It is debatable that this mutex was only enabled for only few CPU instructions but the philosophy of using the most efficient method on the platform where BlitzDB is to be run was appealing enough for me to use drizzled::atomics. Mikio did some experiments on this and found that in a competitive/congested environment, using the compiler’s builtin function can gain you 3x throughput.

    Hacking on Index Support

    I’ve finally started hacking on index support and I just finished supporting basic operations on a primary key. By design, BlitzDB’s index is a dense clustered b+tree but in the first release I am going to limit PK to only be a HASH index. This is because I want BlitzDB to treat all PKs as direct keys inside the data dictionary (hash database where the actual rows are stored). So in other words, I want people to use PK for “needle in a haystack” like queries only. An example of a needle in a haystack like query is:

    SELECT * FROM TABLE WHERE primary_key_column = whatever;

    Saying that, I don’t like to force people to do things the way I like so I plan on providing best of both worlds by supporting both data structures for PKs in Version-2:

    CREATE TABLE t1 (id int, PRIMARY KEY(id) USING btree) ENGINE=blitzdb;
    CREATE TABLE t1 (id int, PRIMARY KEY(id) USING hash) ENGINE=blitzdb;

    BlitzDB’s default configuration will use PK as a “direct” data dictionary index. If you wish to do range queries on PK, the solution is to create a index on the PK column.

    Primary Key lookup Performance

    So, how does my implementation perform? Here’s a quick benchmark with a test-run that randomly fetches 100 thousand rows from a BlitzDB table with 1 million rows. This is the table I used:

    CREATE TABLE t1 (id int PRIMARY KEY, a int, b int) ENGINE=blitzdb;

    and the query looks like this:

    SELECT * FROM t1 WHERE id = random_number_under_one_million;

    The hardware I used is the following commodity server: Intel Quad Xeon E5345 (2×4MB L2 cache), 8GB Memory, 500GB SATA II. Unfortunately I could not prepare a standalone client server today so both the server and the test program were run on the same machine. Yeah… this sucks so I can’t claim that this benchmark is 100% creditable.

    Here is the result I obtained from skyload. Please only view it as a guideline to BlitzDB’s lookup performance. I’ll do a proper benchmark with the Drizzle Community and publish it after I get Version-1 released.

    [ READ LOAD EMULATION RESULT ]
      SQL File               : 100k_select.sql
      Concurrent Connections : 1
      Task Completion Time   : 5.88856 secs
      Number of Queries:     : 100000
      Number of Test Runs:   : 1
     
    [ READ LOAD EMULATION RESULT ]
      SQL File               : 100k_select.sql
      Concurrent Connections : 2
      Task Completion Time   : 6.94474 secs
      Number of Queries:     : 100000
      Number of Test Runs:   : 1
     
    [ READ LOAD EMULATION RESULT ]
      SQL File               : 100k_select.sql
      Concurrent Connections : 4
      Task Completion Time   : 7.04455 secs
      Number of Queries:     : 100000
      Number of Test Runs:   : 1

    As you can see, “needle in a haystack” queries can be executed pretty efficiently in BlitzDB. Looking at the first result, we can observe that it took an average of 0.058 milliseconds to process a query.

    Future Plans

    Admittedly, primary key support isn’t completely done so I’ll continue working on it. After that, I will start hacking on b+tree indexes and write more tests as I go. Once I support at least two indexes, I’ll ask the Drizzle Community to consider merging BlitzDB into Drizzle’s trunk. This is my goal for BlitzDB at the moment.

    I also happen to own blitzdb.com so I’m planning on putting user documentation (including tutorial) and architectural notes there. This is currently not so high on my TODO list so I suspect it won’t happen until I get Version-1 released. All I can say about the release schedule at the moment is, “before the MySQL conference in april”.

    So, that’s all I have to summarize for now. Thanks for reading this far. Merry Christmas and have a Happy New Year. Don’t trip on ice :)

    Getting Drizzle Started -or- “Know thy Path!”

    Posted by: Kent Bozlinski, December 23, 2009 05:12 PM

    After being gone a while for the day job I have finally gotten around to setting up my storage file locations and permissions in Drizzle. Stuart Smith helped me get past some problems, mostly associated with not knowing basic syntax, but I also ran into one or two that are due to changes ongoing in Drizzle. After spending a lot of time not knowing what I didn’t know (one of the quirks of teaching yourself), a little chat finally straightened me out.

    So after Drizzle is installed on Ubuntu you obviously have to have a place to store your data. To avoid all the time we spent figuring out what was going on, just pay attention to the last lines after you run your “make install” so you know where the damn thing is.

    Run your sudo command then add your user group where drizzle is the name of the group you want to have access to the drizzle server:

    $ sudo groupadd drizzle

    then create a new user within that group, replacing USER with your user name.

    $ sudo useradd -g drizzle USER

    After running this procedure a couple times I found that on one install the directory was created for me, on another I had to create it. Either way, make sure you have a new directory for drizzle and then a data directory in that:

    $ sudo mkdir /home/USER/drizzle
    $ sudo mkdir /home/USER/drizzle/data

    Now set up the permission on that directory for your user:

    $ sudo chown -R drizzle:USER /home/drizzle/data

    Finally start up the Drizzle Server with:

    $ sudo -u USER /usr/local/sbin/drizzled --datadir=/home/USER/drizzle/data

    You should get a bunch of lines of code saying the Drizzle server and InnoDB are now running and what ports it is listening too. Now you can start up the Drizzle Client:

    sudo -u USER /usr/local/bin/drizzle

    and you should see:

    Welcome to the Drizzle client.. Commands end with ; or \g.
    Your Drizzle connection id is 1
    Server version: 2009.12.1251 Source distribution (drizzle)
    Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
    drizzle>

    I found that if I installed this on a server that already had MySQL running on it, the start up command would return an error where the InnoDB (a storage engine drizzle uses) was already using the port that Drizzle wanted to use. I had to shut down MySQL on port 3306 to get it up and running. Mark Atwood helped me get past that issue.

    Everything seemed to work much cleaner installing on a clean system but that I guess is to be expected. I don’t know all the ways Drizzle will overlap and conflict with other programs and I expect MySQL is the first and most frequent one I will notice.

    Drizzle build 1251 source tarball has been released

    Posted by: lbieber, December 22, 2009 09:23 PM

    For Drizzle build 1251 this release includes:

  • Support for building out of tree plugins
  • Now using ICC compiler on some build machines to clean up even more warnings!
  • The Drizzle download file and change log can be found here
  • Better replication from drizzle to cassandra

    Posted by: Marcus Eriksson (krummas@gmail.com), December 16, 2009 07:47 PM



    Introduction
    This article describes how one of the replication appliers work in rabbitreplication, namely the HashTransformer which transforms each INSERT/UPDATE into a hashmap which is then stored in a column-family based storage, currently Cassandra. For a better overview of RabbitReplication, go check out earlier posts on the subject here: http://developian.blogspot.com


    Configuration
    This example replicates changes done to a table called test1 in the schema called unittests. RabbitReplication is configured to only replicate the columns id and test (yes, good example, I know...). The column id is used as a key. The following slave configuration is used for this use case:


    replication_role = hashslave


    rabbitmq.host = 10.100.100.50
    rabbitmq.queuename = ReplicationQueue
    rabbitmq.exchangename = ReplicationExchange
    rabbitmq.routingkey = ReplicationRoutingKey
    rabbitmq.password =
    rabbitmq.username =
    rabbitmq.virtualhost =
    hashstore.host = localhost:9160
    hashstore.type = cassandra


    hashreplicator.replicate.unittests.test1 = id,test
    hashreplicator.key.unittests.test1 = id
    hashreplicator.keycolseparator.unittests.test1 = .


    The hashreplicator rows are the interresting ones, they describe what columns to replicate, what columns are the primary key and what separator to use between the columns when the key is multi column.


    Example
    Replicating an insert:
    drizzle> use unittests
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A


    Database changed
    drizzle> desc test1;
    +---------+-------------+------+-----+---------+-------+
    | Field   | Type        | Null | Key | Default | Extra |
    +---------+-------------+------+-----+---------+-------+
    | id      | int         | NO   | PRI | NULL    |       | 
    | test    | varchar(10) | YES  |     | NULL    |       | 
    | ignored | varchar(10) | YES  |     | NULL    |       | 
    +---------+-------------+------+-----+---------+-------+
    3 rows in set (0.02 sec)


    drizzle> insert into test1 (id, test) values (300, "firstins");
    Query OK, 1 row affected (0 sec)


    Results in the following on the Cassandra side:
    cassandra> get unittests.test1['300']
      (column=test, value=firstins; timestamp=1260985298391)
      (column=id, value=300; timestamp=1260985298385)
    Returned 2 rows.


    Update:
    drizzle> update test1 set test = "updated" where id = 300;
    Query OK, 1 row affected (0 sec)
    Rows matched: 1  Changed: 1  Warnings: 0


    Gives this in cassandra:
    cassandra> get unittests.test1['300']
      (column=test, value=updated; timestamp=1260985526210)
      (column=id, value=300; timestamp=1260985298385)
    Returned 2 rows.


    Note that the timestamp for the id column is not updated (only changes are updated, not entire rows).


    Delete:
    drizzle> delete from test1 where id = 300;
    Query OK, 1 row affected (0 sec)
    And in Cassandra:
    cassandra> get unittests.test1['300']
    Returned 0 rows.


    That is it, go to http://launchpad.net/rabbitreplication to check out the code, report bugs or suggest features!
    A MySQL Community Member Opinion of Oracle Buying Sun

    Posted by: Sheeri Cabral, December 16, 2009 03:53 PM

    The bottom line: As both a community member of MySQL, and a service provider, I am not worried about Oracle buying Sun and acquiring MySQL in the process. There is no validity to the argument that Oracle will slow down or stop MySQL development — it is not possible, with various forks already in heavy development, and it is not probable, because Oracle has owned the InnoDB codebase for 4 years and has not slowed that development down.

    My bias

    I use MySQL, and want to see it continue to be developed. I work for The Pythian Group, providing DBA services to clients running MySQL. Together with my MySQL colleagues at The Pythian Group, the services provided run the gamut from rotating logs, monitoring, performance tuning, designing and implementing and optimizing database architectures and schemas and queries and debugging problems throughout the full stack. The only service we do not provide is code patches.

    Some of our clients use MySQL Enterprise (and used the binaries when they were different), others use Google patches and the Percona fork of MySQL, and many stay with the official MySQL binary, or popular distributions’ packaging of the official, documented MySQL source code.

    If anything, my bias runs more towards “I want to see MySQL continue to be developed” because patching is the only service Pythian does *not* provide.

    Does Oracle lose money because of MySQL?

    The statement “There is overlap between the niches that Oracle and MySQL fills” is true. The Pythian Group provides system administration sercvices, and DBA services for Oracle and SQL Server in addition to MySQL. We have worked with many clients who want to switch from Oracle to MySQL, in order to save money.

    However, in most cases, it is either cost-prohibitive or impossible to switch from Oracle to MySQL. Because migrating often requires significant amounts of effort, many organizations decide to keep the current applications on Oracle and consider starting new projects on MySQL — particularly small transaction, high-volume applications, including web 2.0 applications (for example, a Facebook application).

    Is Oracle losing business there? Perhaps. Many companies just do not have the money required to develop technology using Oracle. Microsoft has combated this problem by offering free software (including their SQL Server) and services to small businesses (one such program is here). Many companies choose Postgres or MySQL by default because it is free, or because it is already in use unofficially in their organization.

    On the flip side, MySQL loses plenty of business to Oracle and Postgres for lacking some features, or having features that are not well-developed enough. For one client, not having a MySQL equivalent of SQL Loader was enough to stop them from converting from Oracle to MySQL. Other clients have a difficult time figuring out what is the lesser of two evils — Oracle’s well-developed partitioning feature costs $40,000 per server while MySQL’s partitioning is free, but has only basic functionality.

    Why I feel Oracle will not slow down MySQL development

    If Oracle wanted to slow down MySQL development, they could have put barriers in place when they bought Innobase in 2005. Four years ago, there were no popular forks of and patches to MySQL. The fact is that when Oracle bought Innobase in 2005, there was no alternative to using InnoDB for high-speed, high-concurrency, and high-volume ACID-compliant transactional needs. Thus, if the bottom line was the issue, Oracle would have slowed down InnoDB development or closed the source years ago.

    In fact, Oracle actually makes money from MySQL, because Innobase (which Oracle owns) sells the InnoDB Hot Backup program, the most popular hot backup program for InnoDB and MySQL (the free Xtrabackup has started to gain market share, but has not surpassed the official hot backup program yet).

    I have had several occasions to talk with Ken Jacobs, who oversees InnoDB. Every time I have talked to him in the past 5 years, he has expressed a commitment to developing InnoDB. There is absolutely no reason to believe that Oracle would put resources into developing InnoDB for 4 years, then turn around and throw away that hard work by somehow slowing MySQL development.

    Particularly now that there are InnoDB alternatives — commercial and free. Why would Oracle put money and business into the hands of other companies? If Oracle puts more resources into developing MySQL, they can reap the benefits — including producing more storage engine plugins, that could be free or commercial. Another important fact: MySQL — first as owned by MySQL AB, and then Sun Microsystems — attempted to develop an alternative to InnoDB after Oracle bought Innobase. To date, Falcon has failed to provide anything other than an alpha release.

    Why I am wary when others think Oracle buying MySQL is a bad idea

    When others spread the fear, uncertainty and doubt that Oracle will somehow kill MySQL, I consider the source. Many people weigh heavily on the fact that Monty Widenius, founder of MySQL, is doing everything in his power to avoid Oracle acquiring MySQL. However, I do not put much weight into his opinion — right now Monty owns a company that has created a MySQL fork, and he wants rights to be able to sell embedded and non-GPL’d versions of his MySQL fork.

    Many of the publicly available and popular patch sets (Google/Facebook) and forks (Drizzle/Percona) came about because MySQL — back when it was owned by Monty — was not able to accept patches from community sources quickly enough for the community. Even today, with Sun owning MySQL, a feature patch can take years to get back into the source code, due to this legacy Monty left behind.

    Conclusion

    While it is theoretically possible that Oracle could decide to slow the growth of MySQL, it is not probable — if Oracle wanted to damage MySQL, Oracle would have caused a lot more damage a long time ago. The FUD about Oracle slowing development MySQL are not valid, and not true. The motivations behind those spreading this FUD are monetary and selfish. As a community member, I have seen Oracle put plenty of time, money and effort into developing InnoDB. I look forward to even more of Oracle’s resources being used to develop MySQL further.

    Storage Engine Tests in Drizzle. Organized!

    Posted by: Toru Maesaka, December 13, 2009 03:33 PM

    Good news to storage engine developers. In Drizzle, you can now place your engine specific test files (.test and .result) in your engine’s directory. Here’s an example in BlitzDB:

    First, let’s look inside BlitzDB’s directory.

    $ ls -l blitzdb/
    Total 60
    -rw-r--r-- 1 maesaka maesaka   649 2009-12-13 20:51 AUTHORS
    -rw-r--r-- 1 maesaka maesaka  5878 2009-12-13 20:51 blitzdata.cc
    -rw-r--r-- 1 maesaka maesaka  3347 2009-12-13 20:51 blitzlock.cc
    -rw-r--r-- 1 maesaka maesaka 18146 2009-12-13 20:51 ha_blitz.cc
    -rw-r--r-- 1 maesaka maesaka  8360 2009-12-13 20:51 ha_blitz.h
    -rw-r--r-- 1 maesaka maesaka   289 2009-12-13 20:51 plugin.ac
    -rw-r--r-- 1 maesaka maesaka   261 2009-12-13 23:51 plugin.ini
    drwxr-xr-x 4 maesaka maesaka  4096 2009-12-13 23:51 tests

    Notice the final line? that’s where the tests are kept. So, let’s look inside it.

    $ ls -l blitzdb/tests/
    Total 8
    drwxr-xr-x 2 maesaka maesaka 4096 2009-12-13 23:51 r
    drwxr-xr-x 2 maesaka maesaka 4096 2009-12-13 23:51 t

    As you can see, there are two directories. By now, storage engine developers would have caught on to what’s going on. The r/ directory is where the .result files are kept and t/ is where the .test files are kept. This is exactly the same layout as what we’re used to working on (“src/tests/t/” and “src/tests/r/”).

    $ ls -l blitzdb/tests/t/
    Total 8
    -rw-r--r-- 1 maesaka maesaka   21 2009-12-13 23:51 blitzdb-master.opt
    -rw-r--r-- 1 maesaka maesaka 1964 2009-12-13 23:51 blitzdb.test

    The .opt file is used to make sure that the server is started with your storage engine enabled. You simply write the startup option inside the .opt file. Here’s what mine looks like at the moment (there’s only a single line in it).

    $ less blitzdb/tests/t/blitzdb-master.opt
    --plugin_add=blitzdb
    blitzdb/tests/t/blitzdb-master.opt (END)

    Next step is actually running it. You simply specify your engine name with the --suite option to dtr and you’re done! Unfortunately the symlink permission for dtr seems broken on my repository so I’ll directly call test-run.pl in this example.

    $ ./test-run.pl --suite=blitzdb
    Logging: ./test-run.pl --suite=blitzdb
    MySQL Version 2009.12.1245
    Use of uninitialized value in scalar assignment at ./test-run.pl line 1416.
    Using MTR_BUILD_THREAD      = -69.4
    Using MASTER_MYPORT         = 9306
    Using MASTER_MYPORT1        = 9307
    Using SLAVE_MYPORT          = 9308
    Using SLAVE_MYPORT1         = 9309
    Using SLAVE_MYPORT2         = 9310
    Using MC_PORT               = 9316
    Killing Possible Leftover Processes
    Removing Stale Files
    Creating Directories
    =======================================================
    DEFAULT STORAGE ENGINE: innodb
    TEST                           RESULT         TIME (ms)
    -------------------------------------------------------
     
    blitzdb.blitzdb                [ pass ]             63
    -------------------------------------------------------
    Stopping All Servers
    All 1 tests were successful.
    The servers were restarted 1 times
    Spent 0.063 of 2 seconds executing testcases

    That’s it! I really like this change since it makes sense for engine-specific tests to belong inside the storage engine’s directory. It makes conceptual sense and it’s a good step towards differentiating the database kernel and the storage engine, which Monty Taylor is actively hacking on. Hopefully he’ll blog more about these changes soon.

    The Table protobuf message format

    Posted by: Stewart Smith, December 11, 2009 10:00 PM

    If you’ve ever opened up drizzled/message/table.proto in the Drizzle source tree you will have seen what’s in the table message: the structure that describes a database table in Drizzle. Previously I’ve talked about the Table message more generally, giving a fair bit of history of the FRM file and how we’ve replaced it with both the Table protobuf message and an infrastructure inside Drizzle so that Storage Engines own their own metadata.

    Yesterday I talked about the Schema protobuf message format in more detail, and this time I’m talking about the Table protobuf message in a similar amount.

    The first time we were loading (then only part of) the table definition out of a protobuf message was way back in January 2009 (I blogged about it too). It was an adventure untangling all sorts of things to get to a much nicer place (where we are now). The code in the server is not perfect… I’ll be the first to admit that some of it is rather strange, but that’s mostly all behind the scenes for people interested in the protobuf Table message!

    The Table message has several embedded messages in it too. We need to have information on the Storage Engine, Fields and Indexes (and each of those can have other properties). It is much more complex than the simple Schema message.

    Let’s have a look at the basic structure of the Table message:

    message Table {
    
        /* *SNIP* (Here goes the definitions for TableType, StorageEngine, Field, Index, ForeignKeyConstrain, TableOptions and TableStats) */
    
      required string name = 1;
      required TableType type = 5;
      required StorageEngine engine = 2;
      repeated Field field = 3;
      repeated Index indexes = 4;
    
      repeated ForeignKeyConstraint fk_constraint = 8;
      optional TableOptions options = 9;
      optional TableStats stats = 10;
    }

    (We’ve skipped the definitions for the embedded messages for now)

    This seems all pretty logical; a table has a name, a type, is in a Storage Engine, has Fields, may have Indexes, may have foreign key constraints, it has some options and statistics (the statistics may go away at some point “soon”).

    Let’s have a look at the TableType message definition:

      enum TableType {
        STANDARD = 0;
        TEMPORARY = 1;
        INTERNAL = 2;
      }

    It’s pretty simple, the table type is either a standard table (what you get from CREATE TABLE), a temporary table (what you get from CREATE TEMPORARY TABLE) or an INTERNAL table (what you get when Drizzle uses a temporary table during query execution).

    Next, the StorageEngine message:

      message StorageEngine {
    
        message EngineOption {
          enum EngineOptionType {
            BOOL = 0;
            INTEGER = 1;
            STRING = 2;
          }
    
          required string option_name = 1;
          required string option_value = 2;
          required EngineOptionType option_type = 3;
        }
    
        required string name = 1;
        repeated EngineOption option = 2;
      }

    The main part is the “name” member, which is just the name of the storage engine (e.g. “PBXT”,  ”INNODB”, “ARCHIVE”). We do however have support specified in the StorageEngine message for engine specific options (in key value form). Expect these to be used more in the near future.

    Specifying Fields is probably the most complex part of the table message. The Field message looks like this (with many embedded messages):

    message Field {
        required string name = 1;
        required FieldType type = 2;
        optional FieldFormatType format = 3;
        optional FieldOptions options = 4;
        optional FieldConstraints constraints = 5;
        optional NumericFieldOptions numeric_options = 6;
        optional StringFieldOptions string_options = 7;
    
        optional string comment = 16; /* Reserve 0-15 for frequently accessed attributes */
        optional SetFieldOptions set_options = 17;
        optional TimestampFieldOptions timestamp_options = 18;
    }

    So… what does this all mean? Well, Fields have a type, they’re stored in a format, there’s options attached to them, there may be constraints as well as field type specific options.

    The different field types should be fairly familiar by now:

        enum FieldType {
          DOUBLE = 0;
          VARCHAR = 1;
          BLOB = 2;
          ENUM = 3;
          INTEGER = 4;
          BIGINT = 5;
          DECIMAL = 6;
          DATE = 7;
          TIME = 8;
          TIMESTAMP = 9;
          DATETIME = 10;
        }

    We also allow fields in different formats. Currently, these are default, fixed and dynamic. The idea is you can tell the engine (or the engine can tell you) how it’s storing the field. This is currently here as a nicety and the users for this are few and far between.

        enum FieldFormatType {
          DefaultFormat= 0;
          FixedFormat= 1;
          DynamicFormat= 2;
        }

    The FieldOptions get interesting though:

        message FieldOptions {
          optional string default_value = 1;
          optional string update_value = 2;
          optional bool default_null = 3 [default = false];
          optional bytes default_bin_value = 4;
        }

    You’ll no doubt be intrigued by the existence of both “default_value” and “default_bin_value”. Ordinarily, using a string to contain a textual representation of the default value (e.g. “foo” or “42″) is fine. However, for BLOB columns, you can have defaults that aren’t representable in a text string, you need binary data (e.g. the default value contains ‘\0′).

    For TIMESTAMP columns, we continue to support DEFAULT NOW() and the ability to update the timestamp column on UPDATE. How is this represented in the table message? Well… default_value will be “NOW()” and update_value will be “NOW()”. It is intended that in the future it will be possible to have arbitrary SQL expressions for these. This does, of course, require support in the Drizzle server.

    The default_null bool should be rather obvious :)

    Well… that’s enough for today. Next time: more of the Field message!

    Log Buffer #172: a Carnival of the Vanities for DBAs

    Posted by: David Edwards, December 11, 2009 06:56 PM

    One week and a whole lot of snow later, it is time for the 173rd edition of Log Buffer, the weekly review of database blogs. MySQL goes first this week.

    MySQL

    On the MySQL Performance Blog, Peter Zaitsev and his readers discuss the question, how many partitions can you have? In Peter’s opinion, “ . . . be careful with number of partitions you use. Creating unused partitions for future use may cost you.”

    Also, Peter’s colleague Aleksandr Kuzminsky announces the release of xtrabackup-1.0, an “open source online (non-blockable) backup solution for InnoDB and XtraDB engines.”

    On SELECT mysqlgenie FROM lamp; Nicklas Westerlund published the first part of a series on IO benchmarking for MySQL, showing the results of his tests with sysbench.

    Nick will also be tackling next’s week’s Log Buffer.

    Harrison Fisk’s MySQL Thoughts this week included this item about an Ext4 with MySQL binary logs oddity. “ . . . we were able to pin it down to ext4 and how it delays data writes for a very long time (30 minutes).”

    As long as we’re in /var, here’s Eric Bergen, attempting to unwind the tangled web of pid file creation, which he started doing having discovered a pid file creation race condition in mysqld_safe.

    Here on the Pythian Blog, Sheeri K. Cabral reports that active support for MySQL 5.0 ends soon.

    Lenz Grimmer announces the call for papers for the “MySQL and Friends” developer room at FOSDEM 2010.

    PostgreSQL

    Dave Page, PostgreSQL hacker and core team member, brings word that PostgreSQL at FOSDEM 2010 – the call for talks has been issued. Dave also has a post clarifying the PostgreSQL release support policy.

    Dan Langille, coordinator of the PGCon conference, shares bsdtalk’s interview with himself on BSDCan/PGCon 2010.

    SQL Server

    Linchi Shea exposes the performance impact of UNION ALL views, ANSI_PADDING, and bad query plans. Writes Linchi, “Whether or not you specify it explicitly, ANSI_PADDING setting is there when you create a table, and can have an impact on the performance of some queries. If you are not careful, it can even hurt performance big time!” Examples and script follow.

    In a post is called, For want of a nail, Michael Zilberstein writes, “I’ll describe here how small bug can ‘kill’ [a] strong server. The story begins when for some unknown reason [the] ‘rows’ column in [the] sys.partitions DMV begins to show 4.6 billion rows . . . ”

    How many rows? asks Kalen Delaney, inquiring into how best to answer that question.

    Here’s Adam Machanic with the release of Who is Active? v9.57, a “fast, comprehensive DMV collection” to answer this question: What’s Really Happening on Your Server?

    Adam also provides a roundup of T-SQL Tuesday #001 (Date/Time Tricks).

    In a similar vein, Dave Ballantyne gives a lesson in age calculation with SQL Server, sharing what he says is, “ . . . by far the simplest and accurate method that I know.”

    Oracle

    Let’s start the Oracle news with some reviews of the UKOUG Conference, which closed last weekend. Here are adhd ocd dba’s notes from UKOUG, and Marco Gralike’s UKOUG 2009 wrap-up: “The wifi was a minor bummer, but the rest was up on high standard as always. Good quality presentations as should be expected.”

    Martin Widlake introduces, “ . . . a tuning method with partition exclusion where you infer a relationship between the value you want to limit a query on and the partition key. It takes a while to explain the theory so I am going to give it a name, in the vain hope it catches on . . .  I’m going to call it Correlated Partition Exclusion.”

    From Igor’s Oracle Lab, Gary Myers brings the CREATE ANY TRIGGER three card trick, a security-related post.

    The Oracle Performance and Backup Blog offers the first part of a series on Oracle RAC One Node, which is, “about a installation and configuration of Oracle RAC 11gR2 in One Node configuration on VMWare.”

    Retuning to the My Oracle Support outcry, Hans Forbrich looks at MOS one week later, giving a break-down of improvements what remains wrong.

    John Hallas gives a lesson on using glance to see Oracle process usage on HPUX.

    To close things, here is Drupal guy Dries Buytaert on NoSQL and SQL. Dries writes, “Have a look at this video of Brian Aker’s great 10 minute lightning talk about NoSQL. NoSQL is a database movement which promotes non-relational data stores that do not need a fixed schema.  . . . NoSQL is an ‘and’ and not a ‘versus’. Plus, I expect the gap to close as there are a couple of interesting projects under way that bring some of the NoSQL advantages to the SQL world. One of them is Brian Aker’s own Drizzle project . . .  [Have] a look Brian’s NoSQL presentation. It’s funny!”

    Also funny—Geert JM Vanderkelen’s Clusterious defined.

    That is all. As always, I invite you to comment with your favourite DB blogs from the week gone by. Please return next week for Nick Westerlund’s LB#173. Till then!

    The Schema protobuf message: Drizzle’s metadata on a schema

    Posted by: Stewart Smith, December 10, 2009 10:00 PM

    I’ve previously talked about table metadata in Drizzle and how we use the table protobuf message to describe a table (see Drizzle FRM Replacement and others). The model in Drizzle is that the engine is responsible for its metadata. For schemas (you may be thinking ‘database’ but we’re moving to the Schema terminology in Drizzle) we also have a small amount of metadata.

    The protobuf message is specified in drizzled/message/schema.proto and is incredibly short. In fact, here it is in its entirety:

    1
    package drizzled.message;
    2
    option optimize_for = SPEED;
    3
    4
    message Schema {
    5
      required string name = 1;
    6
      optional string collation = 2;
    7
    }

    We don’t keep an awful lot of metadata about schemas. A Schema has a name and it has a default collation.

    You can also read the db.opt file directly using the provided (and very simple) schema_reader utility.

    In the near future, we could have CREATE DATABASE and CREATE SCHEMA replicated via this protobuf message. This would make it extremely easy to parse for utilities parsing the replication stream.

    We’ll also (rather shortly) have key,value pairs for options to CREATE SCHEMA/CREATE DATABASE. More on that later :)

    Cassandra support in rabbitreplication

    Posted by: Marcus Eriksson (krummas@gmail.com), December 10, 2009 08:55 PM

    Just pushed support for replicating into cassandra to http://launchpad.net/rabbitreplication

    The following format is used:
    KeySpace = schema name from the transaction
    ColumnFamily = table name from the transaction
    Column name = "object" since we only store objects
    Key = the key generated from the object to store, either by using the @Id annotation or by implementing the KeyAware interface

    In the CLI you would type something like this to get the data (drizzle schema name is unittests and table is test1):


    cassandra> get unittests.test1['1']['object']
    ==> (name=object, value={"name":"updated","ssn":1}; timestamp=1260472768425)
    Drizzle FRM replacement: the table proto

    Posted by: Stewart Smith, December 09, 2009 10:50 AM

    Drizzle originally inherited the FRM file from MySQL (which inherited it from UNIREG). The FRM file stores metadata about a table; what columns it has, what type those columns are, what indexes, any default values, comments etc are all stored in the FRM. In the days of MyISAM, this worked relatively well. The row data was stored in table.MYD, indexes on top of it in table.MYI and information about the format of the row was
    in table.FRM. Since MyISAM itself wasn’t crash safe, it didn’t really matter if creating/deleting the FRM file along with the table was either.

    As more sophisticated engines were introduced (e.g. InnoDB) that had their own data dictionary, there started to be more of a problem. There were now two places storing information about a table: the FRM file and the data dictionary specific to the engine. Even if the data dictionary of the storage engine was crash safe, the FRM file was not plugged into that, so you could end up in a situation where the storage engine
    recovered from a crash okay, but the FRM was incorrect for what the engine recovered to. This would always require manual intervention to find out what went wrong and then fix it (in some rather unusual ways).

    When the MySQL Cluster (NDB) engine was introduced, a new set of problems arose. Now the MySQL server was connecting to an existing database, where tables could be created on other nodes connected to the cluster. You now not only had the problems of crash recovery, but the problems of keeping the FRM files in sync across many nodes, requiring
    all sorts of interesting solutions that, for the most part, do work.

    The “obvious” solution to some of these problems would be for an engine to write out an FRM file itself. This is much easier said than done. The file format was never created to be read and written by multiple pieces of software, the code that did the reading and writing inside the server was not reusable elsewhere and the only documentation (that
    wasn’t a decent chunk of the MySQL source tree) is the rather incomplete definition in the MySQL Internals wiki (http://forge.mysql.com/wiki/MySQL_Internals_File_Formats) – not nearly enough to write a correct FRM file as the specifics are very, very odd.

    Our goals for reworking the metadata system in Drizzle were: to allow engines to own their own metadata (removing any opportunity to have inconsistencies between the engine and the ‘FRM’) and for engines without their own data dictionary, to replace the FRM file format with something simple and well documented.

    One option was to use SQL as the standard storage format, but it is rather non-trivial and expensive to parse – especially if we were to use it as the preferred way of talking table definitions with storage engines. We had been looking at the protobuf library
    (http://code.google.com/p/protobuf/) ever since its first release and it has a number of very nice characteristics: a description language of a data structure that is then used to generate APIs for reading and writing it in a number of programming languages and a standard (documented) way to serialize the data structure.

    After a bit of discussion, we arrived at a good outline for the table definition proto. The current one can always be found in the Drizzle source tree at drizzled/message/table.proto. The current format is very close to final (i.e. one that we’ll suppport upgrades from).

    The process of modifying the Drizzle code base so that it would write (and read) a file format different to the FRM isn’t worth going too much into here although there were some interesting hurdles to overcome. An interesting one was the FRM file contains a binary image of the default row for the table (which is in the row format that the server uses); we now store the default value for each column in the proto and generate the default row when we read the proto. Another interesting one was removing and refactoring “pack_flag” – the details of which should only be extracted from Jay or Stewart with a liberal application of fine ale.

    The end result is that we now have storage engines that are completely responsible for their own metadata. One example is the ARCHIVE engine. In the CREATE TABLE code path, the ARCHIVE storage engine gets the table definition in an object that represents the table proto. It can examine the parameters it needs to and then either store the proto directly, or convert it into its own format. Since ARCHIVE is simple, it just stores
    the table proto in a serialised form (using a standard function provided by the protobuf library) and stores it in the .ARZ data file for the table. This instantly makes the ARCHIVE storage engine crash safe for CREATE and DROP table as there is only 1 file on disk, so no two files to get out of sync.

    If an engine does not have its own data dictionary, it can still use the default implementation which just stores the serialised table proto in a file on disk.

    We can also now use this interface to move INFORMATION_SCHEMA into its own storage engine. This means we can remove a lot of special case code throughout the server for INFORMATION_SCHEMA and instead just have a INFORMATION_SCHEMA storage engine that says it has the following tables in the INFORMATION_SCHEMA database. Because the table definition is now in a documented format with a standard API, this becomes a relatively trivial exercise.

    What we’re all looking forward to is when the InnoDB data dictionary is linked into the new interface and we can have a truly crash safe database server.

    Another wonderful side effect is since we now have a standard data structure for representing a table definition, we can integrate this with the replication system. In the “near” future, we can represent a CREATE TABLE in the replication stream as a table proto and not the raw SQL. If you were wanting to apply the replication stream to a different database server, you then only have to write a table proto to SQL
    converter. If the target database system doesn’t do SQL at all, you could generate API calls to create the table.

    So we now have a rather flexible system in place, with the code implementing it being increasingly simple and possible to be “obviously correct”.

    Things that easily fall out of this work that people have written about:
    - CREATE TABLE LIKE with ENGINE clause
    http://krow.livejournal.com/671235.html
    - table_raw_reader – looking at the raw representation of table metadata
    http://www.flamingspork.com/blog/2009/10/01/table_raw_reader-reading-the-table-proto-from-disk-and-examining-everything/
    - Table discovery
    http://www.flamingspork.com/blog/2009/07/29/table-discovery-for-drizzle-take-2-now-merged/

    Some more info:
    http://krow.livejournal.com/642329.html

    Drizzle build 1240 and libdrizzle 0.6 source tarballs have been released

    Posted by: lbieber, December 08, 2009 06:49 PM

    For Drizzle build 1240 this release includes:

  • Dynamic loading of plugins at server start up
    • The majority of the existing plugins have been set to be loaded dynamically rather than compiled statically with the server
    • Non-default plugin modules can be loaded with a command line option, —plugin-add, which takes a comma-separated list of plugin module names, e.g.: —plugin-add=csv,memcached_functions
    • —plugin-add will load the default list of plugins in addition to the plugins requested
    • —plugin-load which takes a similar list but overrides the entire list of plugins to be loaded
    • In the source tree plugin test cases are now kept with the plugins themselves, so a provided plugin can be self-contained and also provide test cases
  • HEAP has been renamed to MEMORY when specifying the storage engine
  • The Drizzle download file and change log can be found here
  • For libdrizzle version 0.6 this release includes:

  • Various cleanup and fixes around custom socket event handling and listening code
  • Updated autoconf build system
  • Updated RPM packaging
  • The libdrizzle download file and change log can be found here
  • Documentation for libdrizzle can be found at API and Development
  • Drizzle persistence in Project Voldemort

    Posted by: Marcus Eriksson (krummas@gmail.com), December 07, 2009 08:56 PM

    I just built drizzle support into Project Voldemort:
    http://github.com/krummas/voldemort - just add drizzle in your stores.xml and it should work

    Basically just a cutnpaste of the mysql code, uses drizzle-jdbc (which is included in the git repo).

    To try it out, check out the code from github, execute "ant release" in the base dir and you get binaries in the dist directory.
    Replication from drizzle to memcached / project voldemort

    Posted by: Marcus Eriksson (krummas@gmail.com), December 06, 2009 10:32 PM

    The last few days I've been working on a way to replicate changes from drizzle into a key value store, currently project voldemort and memcached. It is built in my rabbit replication project which means that the transactions are transfered over a message bus (rabbitmq currently). The picture below describes an example of how the involved components could be set up (not likely that you want both memcached and project voldemort though):




    Current feature list of rabbitreplication:

    • Replication from drizzle into drizzle (or any database with a JDBC driver) / memcached / project voldemort.
    • Map inserts and updates onto java objects using annotated classes (see below for example).
    • Two different ways of marshalling objects, JSON ond Java object serialization
    • Full control over how the key is generated (just implement the KeyAware interface in your target object)
    • Simple interface to build new marshallers. 
    • Simple interface to build new object stores.
    • Simple interface to build new transports. (Will blog these extension points later)
    Example:
    The class below will catch any statements on the table unittests.test1 and take the column "id" and set it on the ssn field, and it will take the "test" column and set it on the name field. It will use the field annotated with @Id as key in the store and use the JSONMarshaller to marshal the object.




    @Entity(schema = "unittests", table = "test1", marshaller = JSONMarshaller.class)
    public class ExampleRepl {
        @Id
        @Column("id")
        private int ssn;


        @Column("test")
        private String name;
    /*...*/



    }



    Add this to your config to use it:
    managed_classes = org.drizzle.managedclasses.ExampleRepl, ...


    Then you just start your slave like this: 
    java -jar replication.jar objectslave.properties


    You need to put your managed classes on the classpath (drop them in the lib dir)


    (See earlier posts about rabbitreplication on how to get started)


    Todo:
    • Clean up configuration, quite messy right now
    • Write blogposts about how to roll your own transport/marshalling/key-value store implementations
    • Increase test suite and set up hudson for continuous integration
    • Write proper usage documentation
    • Build more backends, marshallers and transports, evolve apis
    • Write a MySQL binlog master (needs to transform mysql binlog into drizzle's protobuf based log, not even sure it is possible)
    • Create a way to not have to write code on the slave (pin tables to a hash and store it)
    • ...
    Getting involved
    • Get the code, bzr branch lp:rabbitreplication
    • Use it, give me feedback (krummas@gmail.com) <- most important!
    Download
    http://marcus.no-ip.biz/rabbitrepl.zip (yes, i will soon set up a proper download page).




    NoSQL: Not Going Anywhere For a While?

    Posted by: sogrady, December 01, 2009 09:12 PM

    \

    Everyone seems to have their own problems with the NoSQL term, so here’s mine: it doesn’t mean anything. Not that terms have to be specific to be useful: is the term database really that descriptive?

    But the challenge with NoSQL is that the name implies that it means something, and that’s enough for folks new to the space to form opinions on the matter. For better, maybe, but mostly for worse.

    The reason NoSQL exists is simple: the long time assumption that if persistence is the question, a relational database is the answer. As far back as March 2005, I’ve been skeptical of the sustainability of that assumption and expecting increased acceptance of non-relational datastores. The predicted adoption took a wee bit longer than I anticipated, but it’s here now. This outcome was inevitable.

    Not because relational databases are inherently flawed and poised to go the way of the dinosaur: they’re going to be around as long as I’m in this business. Adoption was inevitable because, just as in every other walk of life, there are different tools for different jobs in the technology world. Which brings us to the issue at hand: when different jobs refers to any workload where a relational database is a less than ideal solution, your bucket is too big. As we see, daily, from the inquiries.

    Lumped into the “NoSQL” bucket right now are tools as diverse as column databases, distributed databases, distributed filesystems, document databases, key value stores and even graph oriented databases. Even those categories blur: what’s the difference between a distributed database and a column database?

    Exactly.

    What do they have in common? Not a lot. Wikipedia implies that it’s about big data, and indeed some of the NoSQL stores scale remarkably well. But some don’t, intentionally. For all the talk of avoiding joins, eventual consistency, non-ACID compliance and such, the real common denominator for NoSQL stores is that they are generally not row/table oriented and they are mostly SQL ignorant. Except that, as Brian Aker points out, these distinctions may be nothing more than semantics. And as if that wasn’t complicated enough, SQL-like features are periodically being reintroduced via projects such as Pig.

    Whatever your feelings on whether or not NoSQL is actually about SQL – Michael Stonebraker certainly doesn’t think so – defining an entire category of software by what it doesn’t do rather than what it does seems like a problem.

    Which is part of the challenge for projects like Cassandra, CouchDB, Hadoop, HBase, HyperTable, InfiniDB, Memcache, MongoDB, Redis, Riak, Tokyo Cabinet/Tyrant, Voldemort et al. And part of the challenge, frankly, for folks that do what we do.

    The good news is that the need for such tools is very real. As we’ve seen with projects like Drizzle, which was forked specifically because the design trajectory was not meeting the needs of a certain class of customer. Flawed though it may be, the NoSQL term is being applied to a real and accelerating pattern of adoption, and we’re seeing spikes in interest across the board. Which is why I anticipate strong, though not mainstream, uptake of the tools in 2010.

    I wish we had a better designation than NoSQL, but I know better than to try to push that rock up a hill. Besides, some smart folks are only too happy to leave SQL behind. Looking further out, as we see heavier adoption within individual NoSQL software types, the unhelpful umbrella term may yet be retired, but until it does expect NoSQL to be a trending topic in 2010. Love it or hate it, the term isn’t going anywhere for a while. Even if it doesn’t mean anything.

    Disclosure: Basho, a commerical backer of Riak, Cloudera, a commercial backer of Hadoop, IBM a commercial backer of CouchDB, Hadoop and Casssandra are RedMonk customers.

    by-nc-sa
    OpenSQLCamp Videos online!

    Posted by: Sheeri Cabral, December 01, 2009 07:17 PM

    OpenSQLCamp was a huge success! I took videos of most of the sessions (we only had 3 video cameras, and 4 rooms, and 2 sessions were not recorded). Unfortunately, I was busy doing administrative stuff for opensqlcamp for the opening keynote and first 15 minutes of the session organizing, and when I got to the planning board, it was already full….so I was not able to give a session.

  • Drizzle Client Rewrite – Clark Boylan leads the requirements and design discussion for rewriting the Drizzle Client
  • Drizzle Plugin Hacking
  • Drizzle is Not MySQL with Changes – Brian Aker gives a talk on Drizzle, and how Drizzle is not MySQL.
  • Fractal Trees – by Bradley Kuszmaul of Tokutek
  • Geographic Operators in SQL (PostGIS) by Webb Sprague
  • Goal-Driven Performance – Peter Zaitsev speaks about how to achieve goal-driven performance.
  • Graph Engine for MySQL – by Antony Curtis
  • Intro to Cassandra
  • An Intro to CouchDB: What caught Ubuntu’s eye by Mike Miller (Cloudant)
  • Maria – MySQL founder Monty Widenius talks about Maria, and the past, present and future of MySQL.


  • Memcached functions in MySQL
  • mk-query-digest – Baron Schwartz talks about mk-query-digest, which parses logs and can capture live MySQL action to analyze, transform, filter, review and report on queries.
  • MongoDB
  • SQL For the Insane – Dan Colish presents SQL For the Insane – crazy SQL tricks presented at the November 2009 OpenSQLCamp in Portland, Oregon.
  • SQL vs. NoSQL Panel moderated by Ronald Bradford and including:
    • Brian Aker – Drizzle
    • Monty Widenius – MariaDB
    • Selena Deckelmann – PostgreSQL

    • Eric Evans – Cassandra
    • Mike Dirolf – MongoDB
    • Mike Miller – CouchDB
  • PBMS, BLOB, S3 Storage and Backup – Barry Leslie of Primebase talks about PBMS, BLOB, S3 Storage and backup with the PBMS Daemon.
  • Operations Aspects to Running DBs in the Cloud – Joe Williams from Cloudant talks about: automation (config management, dynamically adding nodes) performance (adding nodes to increase performance, disk tuning) and availability (what if a node goes down?)
  • Speak HTTP to your Database – John David Duncan speaks about lessons learned from mod_ndb
  • Storage Engine API – Bradley Kuszmaul of Tokutek describes his ideas for a storage engine API for MySQL.
  • Using and Optimizing Databases on Flash by Peter Zaitsev
  • Xtrabackup tricks – Peter Zaitsev talks about Xtrabackup tricks.
  • Lightning Talks (same as previously posted, nothing new below):

  • All the lightning talks below in one continuous video (just over 54 minutes total)

  • Tips on Drizzle Development and Valgrind

    Posted by: Toru Maesaka, December 01, 2009 02:05 PM

    In brief, valgrind is a framework of awesome tools that does an amazing job at detecting memory errors. It will catch silly (often unexpected) mistakes and memory leaks that you’ve made in your code. IMHO, it’s a must have tool for open source hackers that work with Linux. If you develop a plugin or a storage engine for Drizzle/MySQL, you often end up wanting to test your program for memory errors. Actually, it’s not a “want”, it’s a MUST.

    Conveniently by supplying a simple startup option, Drizzle and MySQL’s test runner will run the daemon process on valgrind’s virtual machine. I’m not sure about MySQL since I’ve never developed anything for it but at least with Drizzle you can run a test case independently by supplying the desired test name to the test runner.

     $ ./dtr your_test_file_name --valgrind

    So, with BlitzDB this is what I do to isolate the test runner to only run my tests:

     $ ./dtr blitzdb.test --valgrind

    Very simple.

    The minor complication here is that the test runner will not output the valgrind report to the console and instead it writes the output to a file. So where is this file? the answer is, it’s written to the daemon’s error log which is located in the source tree:

    $ less drizzle_src/tests/var/log/master.err
    CURRENT_TEST: main.blitzdb
    ==24563== Memcheck, a memory error detector
    ==24563== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
    ...

    Here’s another tip. If you ever wondered where the files that were generated in the test (like table and index files) are stored, they are stored inside the source tree as well. Here’s an example on my machine:

    $ ll drizzle_src/tests/var/master-data/
    total 20528
    -rw-rw---- 1 tmaesaka tmaesaka 10485760 2009-12-01 22:06 ibdata1
    -rw-rw---- 1 tmaesaka tmaesaka  5242880 2009-12-01 22:06 ib_logfile0
    -rw-rw---- 1 tmaesaka tmaesaka  5242880 2009-12-01 22:06 ib_logfile1
    drwxr-xr-x 2 tmaesaka tmaesaka     4096 2009-12-01 22:06 mysql
    drwxr-xr-x 2 tmaesaka tmaesaka     4096 2009-12-01 22:06 test

    So, with all that in mind, happy hacking :)

    Update of RabbitMQ Replicator

    Posted by: Marcus Eriksson (krummas@gmail.com), November 30, 2009 09:52 PM

    The little Drizzle RabbitMQ replicator I built last week got quite popular (and even  users!) so I decided to work a bit more on it. These are the changes:

    • Moved it to a real project on launchpad (http://launchpad.net/rabbitreplication
    • Store transaction log position on master so that a transaction is never sent twice over the wire.
    • Major internal refactorings, start using google guice for IoC
    • Make the transport pluggable, it should now be easy to write your own transport and swap out rabbitmq, JMS anyone?
    • Alot more configurations for rabbitmq, see example config
    User Guide:
    1. Download binaries here (or check out code: bzr branch lp:rabbitreplication)
    2. Unzip on master
    3. Edit master.properties to reflect your environment (hope the config vars are self-explanatory, let me know if there are any problems)
    4. Start the master: java -jar replication.jar master.properties
    5. Unzip on slave
    6. Edit slave.properties
    7. Start the slave: java -jar replication.jar slave.properties
    8. Watch changes replicate, report issues to me (krummas@gmail.com)
    Todo/Ideas:
    I'm thinking about using this code to build a framework for replicating changes from Drizzle into other storage forms, for example Hadoop/hbase or Cassandra. My thinking is that it could be useful for moving data that is very hot into a faster storage without changing your application too much. This might need business logic implemented on the "slave" (saying for example that this column should be stored and this should be ignored), and that is what I'm thinking could be built quite nicely using a framework (a DSL could be handy). 

    Please let me know if this has been done somewhere or if it is a stupid idea!
    Presentation: Drizzle is not MySQL with Changes by Brian Aker

    Posted by: Sheeri Cabral, November 25, 2009 10:42 PM

    Another video from the recent OpenSQLCamp in Portland, Oregon….Earlier today I uploaded the Lightning Talk Videos. Due to the holiday I am unsure when the rest of the videos will be ready. When they are, I will do one blog post featuring them all.

    However, I have had several requests for this specific video, so here is Brian Aker speaking about Drizzle.

    The slides are up at http://www.slideshare.net/brianaker/drizzle-opensql-camp, and here’s the video:

    (Note, I will not do a post for each video…..but since this one is up and ready, I figured I’d do it before I leave for the holiday).

    Drizzle Storage Engine Dev: Determining Query Type

    Posted by: Toru Maesaka, November 25, 2009 10:42 AM

    Determining what kind of SQL query is requested at the handler level is pretty important for BlitzDB since the strategy is to obtain the most suitable lock for a given request. Unfortunately there is no intuitive way to get this information. So, I took a peek into InnoDB’s sourcecode and found my solution (open source saves the day as usual).

    Solution

    In Drizzle, there is a function called session_sql_command(Session *session) which returns an integer that corresponds to one of the command type constants (which are accessible from the engine). Ideally I would like to call this function from anywhere in the engine but since it requires a session object as an argument, I could only call it from store_lock().

    My solution was to add a variable in the handler class and assign the appropriate value to it from store_lock(). This turned out to be okay since store_lock() is called before any other API functions but the concern here is that store_lock() is planned for removal in the future.

    Now I can do things like:

    ha_blitz::rnd_init(bool drizzled_will_scan) {
      if (sql_command_type == SQLCOM_UPDATE)
        /* get the most suitable lock type for this task */
      else if (sql_command_type == SQLCOM_SELECT)
        /* get the most suitable lock type for this task */
      ...
    }

    Personal Request to Drizzle

    Although I would like to see store_lock() disappear from the storage engine API, I would like storage engines (technically worker threads) to have ability to gather meta information on the query before any real work is done.

    My request is for store_lock() to become something along the line of gather_information() where it gives the handler (or worker threads) a chance to gather information about the query. Needless to say, drizzled must call this function before any other API calls are made.

    Drizzle is Running on my Jaunty Box(possibly the strangest sentence ever)

    Posted by: Kent Bozlinski, November 25, 2009 09:29 AM

    I fancy myself an astute computer user but some of the ways to get things running in Linux are challenging for a born-and-raised windows user like myself. You don’t just click on a button to install Drizzle. I ran into a few blocks, that can mostly be attributed to me tripping over my self.

    For the noobie,  make sure you run all of this from the root user which means you will see a $ sign after your user name. for example, my command line starts with: kent@server:-$.

    Jauder Ho had a great walk-through that I am going to steal and re-post here, with a couple alterations only for the sake of clarity and a minor change due to dependencies. His text is in red:

    The Drizzle wiki was useful in getting started but did not include all of the available packages.

    First, the basic dependencies:

    sudo apt-get install libpcre3-dev libevent-dev autoconf automake
    bison libtool ncurses-dev libreadline-dev libz-dev g++
    libssl-dev uuid-dev libpam0g libpam0g-dev gperf
    

    Next, the libevent on 8.04 is too old so a newer version needs to be installed. Did this on 9.04 too just to make sure it is up-to-date.

    sudo apt-get install libevent-dev

    Google’s Protocol Buffers as well as libdrizzle need to be installed and the packages are available on the drizzle-developers PPA on Launchpad.

    Google  Protocol Buffers is a bit more of an issue. I got an error later on that said protobuf was out of date. You need to go here to get a concise walkthrough to update your software sources on Ubuntu Desktop.

    ***Amendment*** if you are running Ubuntu Server (which is probably the ultimate goal) you need to access your repositories by running the following command: sudo nano /etc/apt/sources.list. Then you can add the lines as the sources walk-through suggests***

    Add the PPA to your software sources, add the key and then run the following:

    sudo apt-get install libprotobuf-dev protobuf-compiler

    Then:

    sudo aptitude install libdrizzle-dev

    I was beating my head against a wall for a while trying to install libdrizzle. I kept trying to run sudo apt-get install on libdrizzle-dev but for some reason that didn’t work. Then I did sudo aptitude install and it worked fine. If anyone can tell me why, that would be great. Aside from that Jauder’s walk through worked fine.

    *** Amendment, some people mentioned that I forgot to let you know to install Bazaar to compile Drizzle as well as make. Those commands are:”sudo apt-get install bzr” and “sudo apt-get install make“***

    Now it’s time to finally download, compile, and install Drizzle. Run the following commands:

    bzr branch lp:drizzle
    cd drizzle
    ./config/autorun.sh
    ./configure
    make
    sudo make install
    

    You should now have Drizzle installed on Ubuntu 9.04 Jaunty Jackalope. If you have an problems or see a way to make anything more concise or clear please tell me.

    I did find a few things in the wiki that could be made more clear for us noobies. I will get on that.

    I actually spent a lot of time doing this the hard way before I found Jauder’s succinct walk-through and will elaborate in the next post.  Next time I will be setting up my user accounts and beginning testing.

    Pluggable Database Client Tool

    Posted by: Eric Day, November 23, 2009 10:11 PM

    A few weeks ago I wrote about a student group who will be working with the Drizzle community to build a new database client tool. While the tool will be the primary replacement for the Drizzle client tool, we hope it will be generic (using the Python DB API) so it will work with others like MySQL and PostgreSQL. We’ve had a number of great discussions, including a session at OpenSQL camp last weekend. I wanted to toss out a few ideas of how such a tool could be structured to allow for maximum extensibility.

    One possibility is to borrow from typical Unix shells and DSP processing systems where you have a number of modules with I/O interfaces and data exchange formats between each module. Each module provides a specific signature so you know what other modules it can plug into. Here is a simple example:

    Simple

    drizzle> SELECT TABLE_SCHEMA,TABLE_NAME FROM INFORMATION_SCHEMA.TABLES LIMIT 3;
    +--------------------+---------------------------------------+
    | TABLE_SCHEMA       | TABLE_NAME                            |
    +--------------------+---------------------------------------+
    | information_schema | CHARACTER_SETS                        |
    | information_schema | COLLATIONS                            |
    | information_schema | COLLATION_CHARACTER_SET_APPLICABILITY |
    +--------------------+---------------------------------------+
    3 rows in set (0 sec)
    

    The client tool would start with a single active module, the Command Line reader that provides an interactive prompt. When a command is read, it creates a pipeline including a Query Executer module then a Console Output module (these are all the sane “default” modules to use). Once the pipeline is created, it then provides a Command message to the Query Executer, which talks to the appropriate Drizzle database server, and then starts providing column headers and rows in a Data Set message format to the Console output plugin. Initially this may seem a bit over engineered for such a simple use case, but let’s explore the flexibility this provides.

    File Reader/Writer

    drizzle> grep foo  commands > results
    

    In this example, the client tool is started with the input module being a file reader. This module will read commands from the file (or stdin if things are piped into stdin) and feed each command to the Query Executer module. Since we specified an optional “grep foo” on the command line, this will initialize the ‘grep’ module (it may be more efficient to put this filtering into the WHERE clause, but ignore that for now). The result set will therefore be piped into the grep module, and then a possibly modified result set is pipelined out of that and into the File Writer module which will write the data set to ‘results’. Now imagine being able to write your own plugins, similar to the grep plugin, that can easily plug right into the pipeline just by using the common name. Each plugin can have any number of options and arguments, allowing you to create flexible modules to do client-side data set processing.

    CSV Processor

    drizzle> csv input.csv | INSERT INTO x VALUES ($1, $3, $5);
    

    In this case, the command ‘csv’ causes a module to be initialized that will read the file input.csv file. The fields are separated out and formatted into a Data Set exchange message that is then fed into a query generator. The argument to the query generator is the INSERT command from the command line, and a number of INSERT commands would then be generated from the CSV input. These commands would then be piped into the query executer module and eventually results would be displayed on the console. Any other modules could also be plugged in to work on either the input or output data sets (filtering, rewriting, …).

    Multi-Server/Result Aggregator

    drizzle> \servers 10.0.0.1,10.0.0.2,10.0.0.3
    drizzle> SELECT TABLE_SCHEMA,TABLE_NAME FROM INFORMATION_SCHEMA.TABLES LIMIT 3; | merge -c
    +-------+--------------------+---------------------------------------+
    | COUNT | TABLE_SCHEMA       | TABLE_NAME                            |
    +-------+--------------------+---------------------------------------+
    | 3     | information_schema | CHARACTER_SETS                        |
    | 3     | information_schema | COLLATIONS                            |
    | 3     | information_schema | COLLATION_CHARACTER_SET_APPLICABILITY |
    +-------+--------------------+---------------------------------------+
    3 rows in set (0 sec)
    

    This example takes a list of servers, executes a command on all of them, aggregates the results, and displays the final result to the console. This specific command may not be all that useful, but similar aggregate data sets could be very useful when you manage a number of servers or have sharded data sets across multiple servers. In the command above, the ‘merge -c’ command merges all the resulting data sets into a single data set, and also inserts a ‘count’ column showing how many of each row appeared during the merge (think of the uniq -c unix command line tool).

    If you took this concept a step further, you could read result sets from one command and pipe them into a another command generator that could talk to one or more servers. This allows the tool to become a generic “data set router” for various purposes. Rather than using a command line interface, you could replace the Command Line and Console Output plugins with a simple GUI input/out. The entire system could also be used internally within your applications to provide a richer set of functionality that the normal simple DB API provides. Building such a system up from a series of equal modules (ie, there are no first class modules) would allow great flexibility for users and developers to customize to their environment.

    Drizzle Replication using RabbitMQ as a transport

    Posted by: Marcus Eriksson (krummas@gmail.com), November 23, 2009 09:38 PM

    Having spent a bit of time learning how the transaction log in drizzle works (and it is incredibly easy to work with), I got an idea to use RabbitMQ as a transport. RabbitMQ is an implementation of the AMQP standard in Erlang, so it must be awesome.

    It works like this; on the master, a java app is simply tailing the transaction log, sending all new transactions, in raw format, to a rabbitmq server.

    The slave(s) are connected to the messaging server and are guaranteed to get the raw messages. When the slave gets the message, it transforms it to a JDBC prepared statement and executes every statement as a batch operation. The reason I use prepared statements and batch operations is that I get a lot for free from the JDBC driver, for example correct escaping of strings etc, and I can also enable the rewrite batch handler feature to get a great performance boost.

    Another great thing we get for free by using rabbitmq is the fail-safety, if an exception is thrown in the slave, the message is not acknowledged and it will be retried later.

    Being able to write something like this in a few hours really shows how powerful the drizzle replication system is. It will be one of the killer features.

    There is one issue i really need to fix, namely that the master does not keep track of which transactions it has sent over the wire, so it will resend all transactions in the log every time it is restarted. Of course, since this was written in a short time, there are probably lots of other issues as well. If you want a real replication solution, go check out Tungsten Replicator.

    If anyone wants to contribute, the code is on launchpad: https://code.launchpad.net/~krummas/+junk/rabbitmq-replication

    If you simply want to try it out, get the binaries here: http://marcus.no-ip.biz/rabbitrepl.zip To start it, you do java -jar replication.jar master.properties - just make sure you edit the .properties files before starting.
    Drizzle source tarball 1223 has been released

    Posted by: lbieber, November 23, 2009 09:14 PM

    Drizzle source tarball based on build 1223 has been released. This release contains some changes from Eric for the start of changes to the Drizzle protocol. You must now use –drizzle-protocol-port=N when starting drizzled instead of –port=N. See his emails for more details

    The download file and change log can be found here.

    Log Buffer #170: a Carnival of the Vanities for DBAs

    Posted by: David Edwards, November 20, 2009 05:44 PM

    This is the 170th edition of Log Buffer, the weekly review of database blogs. Welcome. Let’s kick off this week with a double-helping of . . . 

    SQL Server

    There are lots of good technical posts this week. The SSIS Junkie has some observations and a straw poll on sort transform arbitration. He writes, “This post was prompted by a thread on the MSDN SSIS forum today where the poster was asking how he could replicate the behaviour of SSIS’s Sort transform using T-SQL, specifically he wanted to know how the Sort transform chooses what data to pass through when the ‘Remove Duplicates’ option is checked.”

    Another poll, courtesy of Tibor Karaszi: do you perform log backup for the model database?

    Eric Johnson has a lesson in looping through rows in a table in SSIS 2008, which begins, “When writing code against a SQL Server, as we usually are doing in SSIS Packages, you often need to iterate over all the rows in a table. This can be done using an SSIS Foreach Loop Container, but the how is not obvious.”

    Simple-Talk’s Tony Davis wonders, Do Scalar UDFs give SQL Server a Bad Name? “Many developers seem to regard SQL Server as if it were a science-fiction alien planet where unsuspecting crew-members in blue jumpers occasionally die horribly; everything is suddenly unsafe, and potentially malicious: nothing really works properly and so any serious code should be kept well away from it. Is this developer ignorance, or is their fear justified?”

    From Merrill Aldrich comes a trick question — part quattro. Spoiler: TPH is an evil trap. As a commenter says: “Very interesting, I’ve never heard this vehement an argument against TPH before.”

    Thinking outside the box lets us in on how to tell if you are running on a virtual environment, with a handy little bit of code.

    Kalen Delaney elucidates UPDATE Locks, “ . . . a hybrid of SHARED and EXCLUSIVE locks. [Contrary] to what you might think, UPDATE locks are not just acquired for UPDATE operations.”

    Stephen Forte shows how SQL Server R2 Does SQL Azure.

    Now that PASS 2009 in Seattle, Washington has passed, it’s time to fondly look back on it. Kendal Van Dyke shares his experiences and some photos in Looking Back – PASS Summit 2009 Day 4.

    The Rambling DBA, Jonathan Kehayias does so with the benefits of attending PASS realized: “ . . . as a testament to the value of attending this conference the very first session I attended, diagnosed and provided information for a problem that has existed in one of my servers for many weeks but was impossible to diagnose unless you knew what you were looking at.”

    Thomas LaRock, SQL Rockstar concurs: “If you want to grow your skills, then you need to connect, learn, and share with others. And there is no better place to do that than at PASS.”

    Not that PASS was the only game in town. Just down the coast a bit, Baron Schwartz gives his recap of Portland OpenSQL Camp 2009.

    Selena Marie Deckelmann reports that OpenSQLCamp was awesome!

    PostgreSQL

    Pavel Stehule has some news of a longtime plpgsql misfeature removed. He writes, “plpgsql is good language – simple, robust with good error diagnostic. But it had one bizarre behave. plpgsql connects two worlds – procedural ADA like code and SQL statements. Usually there are not problems. But there are one exception – collision of identifiers.”

    In the latest in his Waiting for 8.5 series, Hubert Lubacziewski introduces and tests something new: TRIGGERS on columns.

    MySQL

    On code.openark.org, Shlomi Noach is surprised by questions or queries. He writes, “I was used to checking for the ‘questions’ global status variables . . .  So, for example, I could run com_select/questions to learn the SELECT ratio out of all queries.  . . .  Apparently, as of 5.0.72-5.0.76 & 5.1.31 this has changed. A new status variable was introduced, called ‘queries’.” What’s the difference? Is this good or bad? Shlomi and his readers kick it around.

    Venu Anuganti also was surprised, in his case by InnoDB Tablespace Corruption: “When . . . InnoDB crashes, it automatically recovers during the next start by rolling back/forward based on what was pending and un-flushed/un-committed changes at the time of crash.  . . .  [On] one of the servers; we ran out of disk space . . . on data directory . . . and server was running for few hours in this mode . . .  [It] became un-available and not responding after a while. Only option left was to kill the server process and its PID along with cleaning the stuff to get the space back. After I (re)started the server . . .  the tablespace is corrupted.”

    From Arnold Daniels comes a version of versioning MySQL data, which Arnold introduces thus: “ . . . You’re probably using a versioning control system . . . to safeguard your data. Advantages of using a VCS are that you can walk to the individual changes for a document, see who made each change and revert back to specific revision if needed. These are features which would also be nice for data stored in a database. With the use of triggers we can implement versioning for data stored in a MySQL db.” Example code follows.

    On the MySQL Perfomance Blog, Morgan Tocker qualifies his earlier piece on why you don’t want to shard: “What I didn’t mention was that if you’ve established that you will need to eventually shard, is it better to just get it out of the way early? My answer is almost always no. That is to say I disagree with a statement I’ve been hearing recently; ’shard early, shard often’

    Morgan also mentions that interviews for InfiniDB and TokuDB are next: “I’d like to announce that Robert Dempsey (InfiniDB storage engine) and Bradley C. Kuszmaul (TokuDB storage engine) have also accepted an interview. If you have any questions about either storage engine, please post them here by Friday 20th November.”

    Here’s a new blog to watch, particularly if you’re new to MySQL or Drizzle: Kent Bozlinski’s Learning Drizzle. Kent says, “I’m not really scared of rain after living in Seattle for seven years. I am a little scared of sticking my neck out and writing about something which (for the moment) I know almost nothing about.” The post is Drizzle is Scary (A Little).

    Oracle

    Maybe you’ve already heard about the fabulous unpopularity of the new My Oracle Support. Daniel Fink comes back with some data and commentary on just that, with his My Oracle Support Survey Results.

    Richard Foote asks, An index only performs how much work???, the result of looking into exactly why index rebuilds can improve performance so significantly.

    Kerry Osborne gives a lesson on fixing bad index hints in SQL Profiles (automatically). He says, “With 10g and 11g, it appears the goal [or Outlines] has swung away from the “locking” concept and towards allowing the optimizer more flexibility.  . . . I must say that I find this decision to be irritating at best.  . . .  One of the main offenders in this regard is the use of a new format available for index hints as of 10g.”

    Inside the Oracle Optimizer covers similar turf in answering to the question, What should I do with old hints in my workload?, or more specifically, “When moving from 10g to 11g, should hints in existing SQL be removed?”

    Tom Kyte wants your opinions on comparative window functions: “ . . . they could be getting better in the near future.  . . . analytics [could be allowed] to access the current row value to be compared against any other row value in a defined window.  . . .  I’ve already supplied them with my feedback (which started with “this is an awesome idea”) – and you can too – by posting it here. They’ll be checking back to see what you say.”

    If you like staying on top of fresh things, perhaps John Piwowar’s method of retrieving Oracle patches with wget would also appeal to you.

    And that is all for now. If you think I’ve missed a worthwhile DB blog from this week, please mention it in a comment.

    Log Buffer will be back in a week’s time. See you then!

    BlitzDB and Tokyo Cabinet Concurrency Model

    Posted by: Toru Maesaka, November 19, 2009 01:29 PM

    Yesterday I sat in front of a whiteboard for few hours with Mikio, the author of Tokyo Cabinet discussing/debating what the optimal concurrency model would be for BlitzDB. I think we came to a pretty good conclusion so I’m going to note it on this entry. But before I step any further, allow me to go over Tokyo Cabinet’s concurrency model.

    Tokyo Cabinet’s Concurrency Model

    Tokyo Cabinet is often quoted as “single writer, multi reader” but this is not quite true. At the time of this blog entry, this statement holds true for TC’s B+Tree database but TC’s hash database can actually allow multiple writers to update and/or delete records concurrently.

    If you look at the entry point of tchdbput(), you will notice that it is actually obtaining a reader’s lock (in terms of rwlock). TCHDB then hashes the provided key and obtains the bucket index number where the record of interest belongs to. Given the bucket/block to work on, TC then looks at the 8 most significant bits of the hash value and attempts to obtain a granular update lock from slots of 256 mutexes (2 ^ 8 = 256). So, things are still concurrent at this stage though there are some chances of collision that would block a thread.

    If a record already exists, TC will go on and happily update that block but if the record is new (as in the key doesn’t exist), TC will lock the tail block of the database and write the new record there. So, only writing a new record is treated as a single writer and the rest can be processed concurrently. This is why I said it’s not quite true.

    BlitzDB’s Concurrency Model

    Taken the above into mind, this is what BlitzDB’s concurrency model looks like:

    1. SELECT queries can run concurrently.
    2. SELECT queries are blocked when UPDATE and/or REPLACE queries are being processed.
    3. UPDATE, REPLACE, DELETE queries can run concurrently.
    4. INSERT is never disrupted by BlitzDB and scheduled by TC.

    In an ideal world, I would allow Drizzle’s worker threads to _directly_ interact with TC and let TC handle thread synchronization. This would make my life fantastically easy but unfortunately life isn’t so easy.

    For example, if a record is deleted while BlitzDB’s table scan is occurring, the table scanner will stop scanning at the position where the deleted key existed. I would not have this problem if I used TC’s native iterator but my table scan implementation uses TC’s hidden API that won’t babysit me in this regard. In return I can gain maximum concurrent read throughput from TC which was a tradeoff I happily accepted.

    So, there are several little gotchas like this which forces me to implement concurrency control in BlitzDB. Here’s how I’m planning on doing it (with demo code!).

    Implementation (with demo code)

    In the past I’ve gone through several experimental stages with BlitzDB where I used pthread’s rwlock to control concurrency. Short answer to the result is, “IT WORKS!”. However it was not taking full advantage of TC’s concurrency model.

    For example I did not want to protect UPDATE queries with a writer’s lock since it would block other UPDATE/DELETE queries. So why not protect it with a reader’s lock? The issue here is that any query that can change the state of the table cannot be processed while a scanner is running (which btw is protected by a reader’s lock). Furthermore, a non-index based update/delete means that the scanner _is_ running so there’s a problem there too.

    What I need is a scheduler that can allow multiple INSERT/UPDATE/REPLACE/DELETE queries to run when the scanner is not running. On the other hand the scheduler must allow multiple scanners to run when an UPDATE/REPLACE/DELETE queries aren’t being processed _BUT_ let INSERT queries come through to TC.

    Implementing the above is probably possible by using multiple mutexes but it would bring complexity to the codebase and possible deadlocks that can be difficult to debug. So we decided to learn from pthread’s rwlock implementation and write an original lock mechanism similar to rwlock but something that allows us to write our own rules for scheduling.

    Here’s my first attempt at a standalone sandbox of the model:

    You can compile and run it to get a grasp of how threads are coordinated:

    $ wget http://torum.net/code/cc/blitzlock.cc
    $ g++ -Wall -pedantic blitzlock.cc -lpthread && ./a.out

    If you got the program running and wondering what the output means, think of the “updater” as a thread that performs either UPDATE, REPLACE or DELETE.

    There are much more that I’d love to go on about but I think I’ve bloated this entry enough so I will save my urge for another day :)

    Nov/Dec 2009 Open Source Database Magazine Released!

    Posted by: Keith Murphy, November 19, 2009 12:54 AM

    Exciting news!

    The next issue of Open Source Database Magazine is now available. This information-packed issue has over 60 pages of information including:

    • Firebird’s Road Trip and What’s New with 2.5
    • Coding Corner: Trees – Where’s the Performance?
    • PostgreSQL’s tsvector: Secret Sauce for Search Engines
    • The Lab: The XtraBackup Program for MySQL – Part Two
    • Drizzle – A Lightweight Database for the Web
    • Kontrollbase: Enterprise grade MySQL monitoring and analytics
    • Creating a Twitter Mashup with MongoDB
    • Introducing LucidDB

    Plus the usual news and views. All of this for $4.95.  It is simply the biggest and the best issue we have ever released. Ready to sign up? Head over here to register and then you can download the new issue. Curious as to what this is all about? I just posted an online addendum to the Drizzle article in our free content section that will give you a taste of what you can expect.

    Thanks to the contributors. You all did a great job and I appreciate it!

    Sorry the website isn’t currently as polished as the magazine. While it doesn’t look pretty, it is functioning. And now that the magazine is out I can turn my attention to it again.  If you experience any problems let me know at editor AT osdbzine.net.

    thanks,

    keith

    Macro Support in new Drizzle Client Console?

    Posted by: nospam@example.com (Jay Pipes), November 18, 2009 02:36 PM

    Hi all!

    I've been reading through the requested features for the new client on the wiki here:

    I think all the stuff on that link is excellent so far. I'd also like to request a feature that I think will be a really cool timesaver for DBAs and developers using Drizzle.

    Macro Support

    Remember, "way back when" you used Microsoft Excel and were able to start recording your actions, then when you stopped recording, Excel would store a "macro" of your actions that you could subsequently replay?

    I think this would be incredibly useful for folks who do repetitive work in the console.

    Sure, I know, I know...the first reaction folks will say is "but HEY, you guys removed stored procedures!" Yeah, yeah... but the feature I'm proposing here is different from stored procedures in the following ways:

    1. It's entirely client-side. There is no server-side storage/cache, processing, parsing, or anything.
    2. It's not limited to a small subset of SQL that stored procedures (at least in MySQL) are currently limited to. Anything the new client can do would be able to go into a macro.
    3. Since the client is in Python, the macros are themselves re-writable in a scripting language. This gives the recorded macros incredible flexibility.
    4. No fussing with SQL stored procedure permissions at runtime (you know, the silly INVOKER/DEFINER crap)
    5. Ability to interact with result sets in the macro. Just try doing that easily in a SQL stored procedure. Using CURSORs is incredibly clunk and ugly. Applying a Python function or closure/lambda on each of a result set is elegant and easy.

    Imagine the following rough example interface...

    drizzle> RECORD MACRO "sales_report_with_email" (to_email);
    macro recording started.
    
    drizzle> mode python;
    in python mode.
    
    python> import datetime
    python> today= datetime.datetime.now().isoformat()
    python> filename= "%s-%s-%s" % ("sales", to_email, today)
    python> Ctrl-D
    
    drizzle> SELECT * FROM sales
             WHERE manager = @to_email; > csv(@filename);
    drizzle> mode python;
    In python mode.
    
    python> report_txt= open(filename, "r+b").read()
    python> import smtplib
    python> mailserver = smtplib.SMTP('localhost')
    python> mailserver.sendmail('theboss@company.com', to_email, report_txt)
    python> mailserver.quit()
    python> print "Mail sent to %s\n" % to_email
    python> Ctrl-D
    
    drizzle> STOP MACRO;
    Macro "sales_report_with_email" saved.
    
    drizzle> macro("sales_report_with_email", "myboss@company.com");
    Mail sent to myboss@company.com
    

    Pretty powerful, eh?

    If you follow the flow above, you will notice the only real trick to solve is passing the macro's arguments into the console's variable array, and from the console's variable array into the Python interpreter's variable scope. But this is a fairly simple problem to solve...

    Thoughts? Suggestions? If you've got comments, please feel free to share here, or on the Drizzle Discussion mailing list, or even update the wiki pages posted above. Thanks! :-)

    Tungsten Replicator and Drizzle howto

    Posted by: Marcus Eriksson (krummas@gmail.com), November 17, 2009 09:08 PM

    Last week I got a few hours to spend on making an extractor for Tungsten Replicator which works against the Drizzle transaction log. This post aims to explain how you use it to replicate changes between drizzle instances.

    To get a better understanding of the drizzle replication system, please go read the article on Jay Pipes blog, here.

    Get the code
    First, you need to check the code out from the tungsten sourceforge repo, like this:
    svn co https://tungsten.svn.sourceforge.net/svnroot/tungsten/trunk

    Then you need to download my patch, here. Unzip it in the trunk/replicator directory and apply it like this:
    patch -p0 < drizzle_support.patch

    This patch includes all dependencies and the applier I wrote about in the last post.

    Build it
    Now you need to build tungsten, change working dir to the replicator directory and write
    ant

    If you got test errors, you can run ant allExceptJunit to skip the tests (there are some environment configurations needed to get the test suite running).

    The artifacts end up in the build/ directory.

    Get and build Drizzle
    Follow the instructions on the drizzle wiki, http://drizzle.org/wiki/Compiling, just make sure that you pass --with-transaction-log-plugin when you ./configure drizzle.

    Start Drizzle
    Follow the instructions on the drizzle wiki: http://drizzle.org/wiki/Starting_drizzled and add the parameters --transaction-log-enable --default-replicator-enable when starting drizzled, otherwise you wont get a transaction log.

    Set up Tungsten Replicator
    Using the binaries built before, read the instructions for MySQL on the Continuent page: http://www.continuent.com/community/tungsten-replicator/documentation

    Then we need to do some changes to the configuration files, for the master (extractor), use this configuration file as a template. Note that you must change the path to the transaction log. On the slave (applier), use this config file

    Now you should be good to go, start your drizzle and tungsten instances and watch the changes replicate.

    Note that almost all the involved components (drizzle, drizzle-jdbc, the drizzle extractor and applier) are not recommended for production use yet.

    As always, if you have any questions, shoot me an email (krummas@gmail.com) or ping me on #drizzle @ freenode
    Learning Drizzle

    Posted by: Stewart Smith, November 13, 2009 03:41 AM

    Wanting to learn Drizzle? My friend Kent is doing just that, and documenting everything as he goes. It’s a great time to learn, as we head up to the bell milestone (which will be awesome) and big user visible changes are certainly slowing.

    So, head on over to Learning Drizzle and learn Drizzle with Kent. Should be awesome.

    Equipment and stuff…

    Posted by: Kent Bozlinski, November 13, 2009 01:36 AM

    Last week I was at a lecture Brian Aker gave to the Seattle MySQL community. During his talk I got a feel for how the team is developing Drizzle, where they stand on various operating systems, and what they are testing on.

    Drizzle, when it’s completed, will be able to run on virtually any type of system, though it is specifically designed for web servers and cloud computing. At the beginning I will be doing almost everything on a 1U 64-bit AMD Athlon 2800+ rack mounted server with two gigs of Ram. Not a beast of a machine by any measure but certainly fine for our purposes, and running Ubuntu Server on it will mean it’s pretty zippy. I also have a couple of windows machines and a server running Open-Solaris which I will be playing with. Drizzle is being tested on astonishingly powerful machines, some of which are running 64 or more processing cores and hundreds of Gigs of Ram. In other words, I will not be pushing the limits with the software just yet.

    I am fan of Linux, specifically the Ubuntu distribution, because it is easy to get the feel after switching from Windows, even for the non-hacker like myself. It is not quite the type of OS I would set my grandma up with but it’s pretty straight forward.

    Installing software on Ubuntu (I’m using 9.04) can be incredibly easy and beats a trip to the store to spend a few hundred bucks any day. While the Graphical interface in Ubuntu is pretty simple, in reality we will be using the command line for most of what we do with Drizzle. I suspect most people interested in anything resembling a database will be relatively familiar with command line, but for those of you who are new to that as well I will be making the commands very specific. When in doubt, copy and paste; but please don’t forget to change things like <username> to your own username. It won’t work too well otherwise.

    Another piece of the puzzle we will need is a sample database. If you have one you want to use that’s great, but I was pointed in the direction of the Sakila sample database, often used for testing and practice with MySQL. After I get a feel for things I will actually use some free US environmental databases I downloaded for a project and see about real world implementation.

    The next post will actually include the installation of Drizzle! I promise! I learned a few new things about compiling and dependencies, and made a couple stumbles.

    libdrizzle 0.5 source tarball has been released

    Posted by: lbieber, November 11, 2009 05:00 AM

    libdrizzle 0.5 source tarball has been released. This release contains the following updates:

  • Added new logging functions
  • Added more test cases
  • Cleaned up API, added more API and Development documentation
  • Updated autoconf build system
  • The download file and change log can be found here.

    Sneak Peek - Drizzle Transaction Log and INFORMATION_SCHEMA

    Posted by: nospam@example.com (Jay Pipes), November 10, 2009 10:30 PM

    I've been coding up a storm in the last couple days and have just about completed coding on three new INFORMATION_SCHEMA views which allow anyone to query the new Drizzle transaction log for information about its contents. I've also finished a new UDF for Drizzle called PRINT_TRANSACTION_MESSAGE() that prints out the Transaction message's contents in a easy-to-read format.

    I don't have time for a full walk-through blog entry about it, so I'll just paste some output below and let y'all take a looksie. A later blog entry will feature lots of source code explaining how you, too, can easily add INFORMATION_SCHEMA views to your Drizzle plugins.

    Below is the results of the following sequence of actions:

    • Start up a Drizzle server with the transaction log enabled, checksumming enabled, and the default replicator enabled.
    • Open a Drizzle client
    • Create a sample table, insert some data into it, do an update to that table, then drop the table
    • Query the INFORMATION_SCHEMA views and take a look at the transaction messages and information the transaction log now contains

    Enjoy! :-)

    jpipes@serialcoder:~/repos/drizzle/replication-group-commit/tests$ ./dtr --mysqld="--default-replicator-enable"\
     --mysqld="--transaction-log-enable"\
     --mysqld="--transaction-log-enable-checksum"\
     --start-and-exit
    <snip>
    Servers started, exiting
    jpipes@serialcoder:~/repos/drizzle/replication-group-commit/tests$ ../client/drizzle --port=9306
    Welcome to the Drizzle client..  Commands end with ; or \g.
    Your Drizzle connection id is 2
    Server version: 2009.11.1181 Source distribution (replication-group-commit)
    
    Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
    
    drizzle> use test
    Database changed
    drizzle> CREATE TABLE t1 (   id INT NOT NULL PRIMARY KEY , padding VARCHAR(200) NOT NULL );
    Query OK, 0 rows affected (0.01 sec)
    
    drizzle> INSERT INTO t1 VALUES (1, "I love testing.");
    Query OK, 1 row affected (0.01 sec)
    
    drizzle> INSERT INTO t1 VALUES (2, "I hate testing.");
    Query OK, 1 row affected (0.01 sec)
    
    drizzle> UPDATE t1 SET padding="I love it when a plan comes together" WHERE id = 2;
    Query OK, 1 row affected (0.01 sec)
    Rows matched: 1  Changed: 1  Warnings: 0
    
    drizzle> DROP TABLE t1;
    Query OK, 0 rows affected (0.17 sec)
    
    drizzle> SELECT * FROM INFORMATION_SCHEMA.TRANSACTION_LOG\G
    *************************** 1. row ***************************
             FILE_NAME: transaction.log
           FILE_LENGTH: 639
       NUM_LOG_ENTRIES: 5
      NUM_TRANSACTIONS: 5
    MIN_TRANSACTION_ID: 0
    MAX_TRANSACTION_ID: 9
     MIN_END_TIMESTAMP: 1257888458463696
     MAX_END_TIMESTAMP: 1257888473929116
    1 row in set (0 sec)
    
    drizzle> SELECT * FROM INFORMATION_SCHEMA.TRANSACTION_LOG_ENTRIES;
    +--------------+-------------+--------------+
    | ENTRY_OFFSET | ENTRY_TYPE  | ENTRY_LENGTH |
    +--------------+-------------+--------------+
    |            0 | TRANSACTION |          141 | 
    |          141 | TRANSACTION |          121 | 
    |          262 | TRANSACTION |          121 | 
    |          383 | TRANSACTION |          181 | 
    |          564 | TRANSACTION |           75 | 
    +--------------+-------------+--------------+
    5 rows in set (0 sec)
    
    drizzle> SELECT * FROM INFORMATION_SCHEMA.TRANSACTION_LOG_TRANSACTIONS;
    +--------------+----------------+-----------+------------------+------------------+----------------+------------+
    | ENTRY_OFFSET | TRANSACTION_ID | SERVER_ID | START_TIMESTAMP  | END_TIMESTAMP    | NUM_STATEMENTS | CHECKSUM   |
    +--------------+----------------+-----------+------------------+------------------+----------------+------------+
    |            0 |              0 |         1 | 1257888458463668 | 1257888458463696 |              1 | 3275955647 | 
    |          141 |              7 |         1 | 1257888462222183 | 1257888462226990 |              1 |  407829420 | 
    |          262 |              8 |         1 | 1257888465371330 | 1257888465378423 |              1 | 4073072174 | 
    |          383 |              9 |         1 | 1257888470209443 | 1257888470215165 |              1 |   92884681 | 
    |          564 |              9 |         1 | 1257888473929111 | 1257888473929116 |              1 | 2850269133 | 
    +--------------+----------------+-----------+------------------+------------------+----------------+------------+
    5 rows in set (0 sec)
    
    drizzle> SELECT PRINT_TRANSACTION_MESSAGE("transaction.log", ENTRY_OFFSET) as trx 
           > FROM INFORMATION_SCHEMA.TRANSACTION_LOG_ENTRIES\G
    *************************** 1. row ***************************
    trx: transaction_context {
      server_id: 1
      transaction_id: 0
      start_timestamp: 1257888458463668
      end_timestamp: 1257888458463696
    }
    statement {
      type: RAW_SQL
      start_timestamp: 1257888458463676
      end_timestamp: 1257888458463694
      sql: "CREATE TABLE t1 (   id INT NOT NULL PRIMARY KEY , padding VARCHAR(200) NOT NULL )"
    }
    
    *************************** 2. row ***************************
    trx: transaction_context {
      server_id: 1
      transaction_id: 7
      start_timestamp: 1257888462222183
      end_timestamp: 1257888462226990
    }
    statement {
      type: INSERT
      start_timestamp: 1257888462222185
      end_timestamp: 1257888462226989
      insert_header {
        table_metadata {
          schema_name: "test"
          table_name: "t1"
        }
        field_metadata {
          type: INTEGER
          name: "id"
        }
        field_metadata {
          type: VARCHAR
          name: "padding"
        }
      }
      insert_data {
        segment_id: 1
        end_segment: true
        record {
          insert_value: "1"
          insert_value: "I love testing."
        }
      }
    }
    
    *************************** 3. row ***************************
    trx: transaction_context {
      server_id: 1
      transaction_id: 8
      start_timestamp: 1257888465371330
      end_timestamp: 1257888465378423
    }
    statement {
      type: INSERT
      start_timestamp: 1257888465371332
      end_timestamp: 1257888465378422
      insert_header {
        table_metadata {
          schema_name: "test"
          table_name: "t1"
        }
        field_metadata {
          type: INTEGER
          name: "id"
        }
        field_metadata {
          type: VARCHAR
          name: "padding"
        }
      }
      insert_data {
        segment_id: 1
        end_segment: true
        record {
          insert_value: "2"
          insert_value: "I hate testing."
        }
      }
    }
    
    *************************** 4. row ***************************
    trx: transaction_context {
      server_id: 1
      transaction_id: 9
      start_timestamp: 1257888470209443
      end_timestamp: 1257888470215165
    }
    statement {
      type: UPDATE
      start_timestamp: 1257888470209446
      end_timestamp: 1257888470215163
      update_header {
        table_metadata {
          schema_name: "test"
          table_name: "t1"
        }
        key_field_metadata {
          type: INTEGER
          name: "id"
        }
        set_field_metadata {
          type: VARCHAR
          name: "padding"
        }
      }
      update_data {
        segment_id: 1
        end_segment: true
        record {
          key_value: "2"
          key_value: "I love it when a plan comes together"
          after_value: "I love it when a plan comes together"
        }
      }
    }
    
    *************************** 5. row ***************************
    trx: transaction_context {
      server_id: 1
      transaction_id: 9
      start_timestamp: 1257888473929111
      end_timestamp: 1257888473929116
    }
    statement {
      type: RAW_SQL
      start_timestamp: 1257888473929113
      end_timestamp: 1257888473929115
      sql: "DROP TABLE `t1`"
    }
    
    5 rows in set (0.06 sec)
    

    FYI, if you look closely, you'll see some odd things — namely that there is a transaction with an ID of zero. I'm aware of this and am working on fixing it :-) Like I said, I'm almost done coding...

    S3 Storage Engine with Memcached in Drizzle

    Posted by: posulliv, November 10, 2009 01:57 AM

    Previously, I had ported Brian’s memcached engine to Drizzle and rencently, I’ve been doing some work with Amazon’s S3 for school. Thus, I decided to have a look at Mark’s S3 storage engine for MySQL. Over the last 2 days, I created a new version of the S3 storage engine for Drizzle with the option to use Memcached as a write-through cache for the S3 backend store. I see this work more as showing the cool things we can do in Drizzle and how quickly we can get prototypes up and running. I don’t even know if this is a good idea or anything but its cool to be able to store all data in S3.

    First, lets see how to create a table with this engine. The one constraint on tables created with this engine is that they need to have a primary key specified on the table. Each table that is created in this engine is represented as a bucket in S3. So whenever you create a table with this engine, you create a bucket in S3. So lets try creating a table:

    drizzle> create database demo;
    Query OK, 1 row affected (0 sec)
    
    drizzle> use demo;
    Database changed
    drizzle> create table padara (
        -> a int primary key,
        -> b varchar(255),
        -> c varchar(255)) engine=mcaws;
    ERROR 1005 (HY000): Can't create table 'demo.padara' (errno: 1005)
    drizzle>
    

    Lets get some more information on why that table creation failed:

    drizzle> show warnings;
    +-------+------+-------------------------------------------------------------------------------------+
    | Level | Code | Message                                                                             |
    +-------+------+-------------------------------------------------------------------------------------+
    | Error | 1005 | Amazon S3 Connection Pool has not been created (Did you specify your credentials?)
     |
    | Error | 1005 | Can't create table 'demo.padara' (errno: 1005)                                      |
    +-------+------+-------------------------------------------------------------------------------------+
    2 rows in set (0 sec)
    
    drizzle>

    As you see, we need to specify our Amazon AWS access credentials before we can utilize this store engine. For the moment, I have the following system variables associated with this plugin:

    drizzle> show variables like '%AWS%';
    +-----------------------+-------+
    | Variable_name         | Value |
    +-----------------------+-------+
    | mcaws_accesskey       |       |
    | mcaws_mcservers       |       |
    | mcaws_secretaccesskey |       |
    +-----------------------+-------+
    3 rows in set (0 sec)
    
    drizzle>

    So I set the AWS access credentials by setting the appropriate system variables (this has to be done before tables can be created with this engine and in this order):

    drizzle> set global mcaws_accesskey = 'YOUR_ACCESS_KEY';
    Query OK, 0 rows affected (0 sec)
    
    drizzle> set global mcaws_secretaccesskey = 'YOUR_SECRET_ACCESS_KEY';
    Query OK, 0 rows affected (0 sec)
    
    drizzle> show variables like '%AWS%';
    +-----------------------+------------------------------------------+
    | Variable_name         | Value                                    |
    +-----------------------+------------------------------------------+
    | mcaws_accesskey       | YOUR_ACCESS_KEY                     |
    | mcaws_mcservers       |                                          |
    | mcaws_secretaccesskey | YOUR_SECRET_ACCESS_KEY |
    +-----------------------+------------------------------------------+
    3 rows in set (0 sec)
    
    drizzle>

    Before creating the table, lets look at what buckets are associated with my S3 account. I’m going to use the S3Fox firefox plugin for this (there is multiple other things you could use). Here are the buckets in my S3 account right now:

    I just have the one bucket for now. Now, I create a table using the S3 engine after specifying my AWS credentials:

    drizzle> create table padara (
        -> a int primary key,
        -> b varchar(255),
        -> c varchar(255)) engine=mcaws;
    Query OK, 0 rows affected (0.31 sec)
    
    drizzle>

    and when I look at my buckets in S3, I should see a new bucket representing the new table I created:

    As can be seen, the bucket name is the database name concatenated with the table name - ‘databasetable’. Next, lets insert some rows in the table and then see what objects are in the bucket:

    drizzle> insert into padara
        -> values (1, 'padraig', 'sullivan');
    Query OK, 1 row affected (0.07 sec)
    
    drizzle> insert into padara
        -> values (2, 'domhnall', 'sullivan');
    Query OK, 1 row affected (0.08 sec)
    
    drizzle> insert into padara
        -> values (3, 'tomas', 'sullivan');
    Query OK, 1 row affected (0.14 sec)
    
    drizzle>

    Now we can query the table. Queries on the table need to specify a primary key value in the WHERE clause for now so we will just be returning one row (I’ll be looking into range queries pretty soon):

    drizzle> select *
        -> from padara
        -> where a = 2;
    +---+----------+----------+
    | a | b        | c        |
    +---+----------+----------+
    | 2 | domhnall | sullivan |
    +---+----------+----------+
    1 row in set (5 sec)
    
    drizzle>

    That’s basically the simple S3 engine. It works just like a regular storage engine except the data is stored on S3. Of course, the latency involved in interacting with S3 for every request can be quite limiting. For example, the simple query above took 5 seconds to retrieve the data. Thus, I added support for using memcached as a write-through cache for this engine. All we need to do is specify the memcached servers to use in the appropriate system variable:

    drizzle> set global mcaws_mcservers = 'localhost:19191';
    Query OK, 0 rows affected (0 sec)
    
    drizzle>

    Now, whenever we query a table created in this engine, we will check for the data in memcached first and if we miss in the cache, only then do we go to S3 for the data. When inserting new data, we insert it in both memcached and S3. Using memcached for this engine is totally optional. It can simply be used as a way to store data in S3 through the engine interface but I thought it might prove to be a useful option for an engine like this.

    I wanted to show how clean the code to implement the functionality to do this in the plugin is. This goes to show the benefit of the great build system Monty Taylor has put a lot of work in to in Drizzle. I can easily utilize external libraries in my plugin - in this case libmemcached and libaws. The code below first checks for data in memcached and if it is not present there, retrieves the data from S3 and updates memcached before returning to the engine.

    So thats about it for now. In the future, there are a few things I plan on working on for this engine:

    • removing the need to have a table represented as a bucket in S3 (this design makes the code much simpler for now)
    • increasing the size of the objects transferred from/to S3 - make the unit of transfer between the engine a page instead of a row as it is now
    • create I_S tables for monitoring S3 usage
    • add support for range queries
    • remove the need for a table to have a primary key

    If you are interested in downloading the branch and playing with it, you can get it and build it by:

    $ bzr branch lp:~posulliv/drizzle/aws-mc-engine
    $ cd aws-mc-engine
    $ ./config/autorun.sh && ./configure && make

    libmemcached and libaws are prequisites that you will need installed before compiling this plugin.

    If anyone has any feedback or suggestions on what to do with this, that would be awesome. I really have no idea what to do with it!

    Drizzle source tarball 1208 has been released

    Posted by: lbieber, November 09, 2009 11:14 PM

    Drizzle source tarball based on build 1208 has been released. This update continues to provide a lot of code clean up and refactoring as well as several more updates to the Replication infrastructure from Jay. Make sure and take a look at Jay’s recent blogs about his updates.

    The download file and change log can be found here.