Category Archives: Uncategorized

Reverse Sharding (unsharding).

Sharding seems to be all the rage these days (as well as NoSQL). Unfortunately, reverse sharding is never addressed. What happens when you’ve sharded your application into 10 different servers and usage goes down? You’re stuck paying for 10 servers!

Understandably, everyone is an optimist in this field, but realistic business practices require cost savings which can only be realized via unsharding.  Be aware that unsharding, is not the same as merging.  If you’ve sharded by uid, and you can effectively dump all that data back into the main db, then lucky you, skip this article.   Unfortunately, sharding in the field sometimes consists of binary duplication of data from the original db, so doing a direct sql merge is not always possible and doing it programatically with huge databases will result in high loads and a very lengthy merge.

I will show you a quick and dirty way to unshard, while still retaining individual shards, and, most imporantly, retaining the ability to re-shard again.  This little tutorial assumes you’re an old hand at setting up db replication.  If not, brush up on it NOW.

We’ll start with a scenario where an application has been sharded once.  It doesn’t really matter how it was sharded, by pk or some other way.  In fact, the tables don’t even have to match!

The players:  MariaDB ServerA , MariaDB ServerB

Each db server contains the db ‘myapp’.  Your goal is to unshard the db unto ServerB (it’s arbitrary where it goes..).

On ServerA:

time mysqldump –single-transaction –master-data=2 myapp > ./myapp_sa.sql

<… some time goes by …>

copy ‘myapp_sa.sql’ to ServerB.

On ServerB:

mysqladmin create myapp_sa

time mysql myapp_sa < ./myapp_sa.sql

<… more time goes by …>

At this stage you’ve dumped and imported the db’s.  Now comes the crucial part, the most sensitive part.  The part the begs the question, have you made backups?  Have you punk?  Huh?  I hope you have, because if you make a typo, you’re SOL without backups.

On ServerB edit /etc/my.cnf and make sure the following are added:

log-slave-updates

replicate-rewrite-db=myapp->myapp_sa
replicate-do-db=myapp_sa

You will need to restart MariaDB/Mysql at this point.   Remember, you have NOT enabled replication from ServerA to ServerB at this point.  HAVE NOT, as in DO NOT.   If you skip any steps, do so at your own peril.

Now that’s you’ve restarted MariaDB, login and check to make sure everything is halal:

MariaDB [(none)]> show variables like ‘log_slave_updates’\G
*************************** 1. row ***************************
Variable_name: log_slave_updates
Value: ON
1 row in set (0.00 sec)

What’s that?  “show variables like ‘repli%’\G” doesn’t bring up the replication variables?  That’s correct.  Apparently, the rewrite occurs at a level below Mariadb, so the db doesn’t actually see these variables.  There’s no way to check to see if it’s working.  If you’re unsure, restart your db.  Make SURE that you have not misspelled the db names.  If you have, you’ll corrupt your primary shard!

Now that you’ve restarted MariaDB, enable replication from ServerA -> ServerB.  Let them catch up (you should be able to see via innotop that queries are going into the _sa table), or you can mysqlbinlog the master log to see what’s what.  Log_slave_updates is not crucial btw.  It assumes you have a slave server on ServerB, as well you should (for snapshots, etc.  More on this in a later post).

When they catch up, point the application’s shard to ServerB.myapp_sa instead of ServerA.myapp and you’re golden.

Step 2. Now get rid of ServerA!

Step 4. Profit!

Enjoy your cost savings!

Advertisements

Alas poor Mysql, I knew thee well.

My work often requires me to squeeze blood out of stones so to speak, on the Amazon cloud. I’ve been doing AWS with Rightscale for over a year now and it seems a good fit. We’re pushing massive amounts of data and are always looking for ways trim things down.

One area in particular where I was discussing with the Rightscale folks was MySQL. Apparently, they rolled out some alpha 5.0 and 5.1 templates with Percona patches. This was great since it was a definite speed boost. But they had no 32bit version, and that was a real pain. I spoke to Ryan over at Percona and he suggested I give the OurDelta MariaDB builds a try.

For those of you who don’t really know what it is, head over to http://askmonty.org/wiki/MariaDB and read up on it. But all you really need to know is that it’s a direct, drop-in replacement for mysql (including binary compatibility with innodb aka XtraDB). It’s like mysql on crack, but without all the side effects, stigma and jail time.

To make a long story short.. we’re almost 100% on MariaDB now and haven’t looked back. I grabbed one of the 5.1 percona templates from Rightscale. Tore out their mysql version, put in MariaDB (both 64bit and 32bit), added our own optimizations and viola! We went from running our db’s on Extra Large Memory instances, to Large Memory instances, with snaps being on Computational mediums instead of Large Memory instances.

The main issue with the XL/L memory instances is that you’re pretty much stuck with the same io anyway. With LVM all the ephemeral stores are striped and dm0 will get io saturated on an XL waaay before the ephemeral volumes. This is why with i/o bound mysql issues, it won’t help you to move up instances. You need to reduce io and shard. Which brings me to MariaDB. By reducing down to a Large, it’s now more economical to shard. With 2 large instances you’re getting slightly more ram then an XL and twice the i/o if you shard your data.

Now the OurDelta folks are pretty awesome, and the fact that Percona supports their builds is pretty awesome as well, considering MariaDB is being worked on by Monty. I predict a steady decline of mysql in Oracle’s fine hands and MariaDB filling that gap quite nicely.

So you want to try MariaDB you say? Perfect! My Linux of choice is CentOS on the server. Ubu.. what, where, how? 😉
You’ll want to head on over to http://ourdelta.org/centos. If you want the latest MariaDB though, don’t follow their instructions. It’s been months and they haven’t updated their links yet. That’s cool though, because you’ve found my blog, and I’ll tell you where to get the goods from:

32bit MariaDB 5.1.42 can be found here:
http://master.ourdelta.org/yum/CentOS-MariaDB51/5/i386/RPMS/

and 64bit here:
http://master.ourdelta.org/yum/CentOS-MariaDB51/5/x86_64/RPMS/

Their latest builds should be direct replacements of mysql, but you may want to remove your mysql install first nevertheless.

You can even drop a mariadb 5.1.42 on top of a mysql 5.0 binary tree for a quick upgrade. (Make damn sure you backup first, and make sure the ibdata file sizes match). run mysql_upgrade, restart, and you’re ready to roll.

If you peruse the Percona websites, you’ll find optimization goodies and other info. You’ll definitely want to read this http://www.percona.com/docs/wiki/percona-xtradb:start, and afterwards head on over to http://www.mysqlperformanceblog.com. Percona people and users post a lot of details on http://www.mysqlperformanceblog.com and optimization options. It’s an invaluable resource.

Being able to halve our instance costs is pretty sweet. MariaDB’s performance is out of this world.

Oh Maria!

NoSQL? Not yet.

Have been doing research on NoSQL db’s. Installed CouchDB, MongoDB, Riak, etc. Here are some comparisons from all the reading I’ve done. Not everything has a reference, most are things I gleaned from various postings as well as personal experience.

The crappy chart below is exported from Google Docs.

DB Comparison

MongoDB CouchDB Riak MckoiDDB Cassandra

Url http://www.mongodb.org http://couchdb.apache.org http://riak.basho.com http://www.mckoi.com http://incubator.apache.org/cassandra

Insert Speed (Bulk Insert) Very fast(3.5M docs is 20-30min) Very Slow1500 documents/sec [1] Unknown ?

Select Speed Fast http overhead fast ?

Datasize <400G ok <2.5G ok Backend related ?

Language C Erlang Erlang Java Java

Negatives

BIGINT broken, use as string Slow/Unoptimised `Enterprise’ backend not OSS/Free. Almost no info/resources available Beta/AlphaStill in incubator status

DB repair may take hours due to copy/recopy of all data files. Must create ‘views’ to use. Views may take hours to generate on a lot of data. See thoughts below on ETS & DETS. Where are the community pages, groups, etc?

No master<->master replication not officially supported. REST Only Pretty much only a Java API

Needs a stack to run

Positives

Fast+Native Driver Automatically distributed, eventually consistant Highly distributed Low latency queries

At least 4x faster then Mysql. Support multi-master defacto. Pluggable backend

Can fall-back to REST

.

Refences

.

.

http://journal.uggedal.com

.

http://www.atypical.net/archive/2009/05/12/couchdb-090-bulk-document-post-performance

.

http://johnpwood.net/2009/06/30/couchdb-databases-and-documents/

.

http://userprimary.net/user/2007/12/16/a-quick-look-at-couchdb-performance/

.

http://groups.google.com/group/mongodb-user

.

http://groups.google.com/group/mongodb-user/browse_thread/thread/e50dc69c33c11fd4

.

http://groups.google.com/group/mongodb-user/browse_thread/thread/fe05de77d93905f4

.

http://www.reddit.com/r/programming/comments/7f7m9/why_couchdb_sucks/

.

http://blog.boxedice.com/2009/07/25/choosing-a-non-relational-database-why-we-migrated-from-mysql-to-mongodb/

.

http://community.basho.com/Basho_Riak_EnterpriseDS_Fact_Sheet.pdf

.

http://swik.net/scalability+Database

.

.

Thoughts

.

An Erlang bridge from Riak to a MongoDB would be nice and would take care of the master<->master issue.

.

I have not seen performance numbers for Riak.. and in fact.. the info I googled was mysteriously missing from Basho’s site.

.

ETS & DETS SUCK