powered by STREAMPAD
Click to launch FredWilson.FM music player

« Another Good Reason To Get HD Radio | Main | Lifechanger - Voicemail Transcription »

Looking For MySQL Gurus

One of the many forms of growing pains our companies face are database performance issues. This is particularly a problem with many of the "web 2.0" companies that get started on a shoestring and once they start to scale up, they get hit with downtime, slow loads, etc. And it's usually a database issue at the core that's creating the problem. And it's usually a MySQL database that they are using.

So after seeing this happen to many of our companies, I've decided it's time that we had some database experts we can call on to help. MySQL experience is particularly helpful.

As nine out of our ten portfolio companies are in NYC, a local person would be really great.

If you think you can help or know someone you'd like to recommend, please send us an email.

Comments (23) | Posted March 1, 2007 in Venture Capital and Technology

Comments

If you have SQL perfomance issues then you need to profile your database requests. Usually you'll find that you're missing an index on say, an email address, which you're looking up 100,000 times a day on a half-million records and without an index it's doing a table scan each and every time.

And make sure your web people know what they're doing. All too often small companies use a self-taught PHP/NET/whatever developer who learned "everything" he needed to know in a couple of weeks by visiting a web site or two and by playing with the code... and who knows nothing about setting up proper table relationships and indices and foreign keys and triggers and stored procedures and...

Posted by: Michael Long | Mar 1, 2007 8:31:18 PM

While what Michael said is true, with mysql especially, experience is far more important than whether someone is "self-taught" or trained. There are many edge cases in which you have to veer from the "right" path, in order to get the best performance out of your database.

Table types, too many/too few indexes, my.cnf settings, excessive joins, etc. And there are lots of little differences between 3.x, 4.x, and 5.x.

Anyway, anyone can feel free to hit me up for questions, I've been using mysql in some pretty intense applications since '99. I'm a little out of day-to-day optimization mode, but happy to share whatever I know.

Posted by: Daryn | Mar 1, 2007 8:57:03 PM

Of course the best thing you can do for a company that is hoping a MySQL guru can save them from their performance woes as they scale up is... move them to Postgres.

Posted by: Nic Wolff | Mar 1, 2007 9:09:18 PM

MySQL's query caching could be much better. Replacing it with a generic object memory cache like memcached is a sure fire way to increase performance with database driven applications.

Posted by: Raj | Mar 1, 2007 9:12:21 PM

The comments on this post make me very very glad this site is answertips-enabled :)

Posted by: Andy Swan | Mar 1, 2007 9:27:02 PM

This is a case where your hosting provider can be invaluable if you choose the right one. Rackspace has been a huge help for us. Our entire team is extremely MySQL proficient but some of these guys still school us (Sergio Murillo in particular) when it comes to scaling database processes. And we don't have to "hire" any of them (but we would.)

Posted by: aaron | Mar 1, 2007 10:02:15 PM

/me votes +1 for memcached and +1 for Michael Long's profiling suggestions.

Generally you can fix scaling problems with some amount of profiling and caching, either caching at the database level, or higher up. There are exceptions to this, but your problem will have to be pretty special not to benefit from either of those tactics.

Posted by: Charles | Mar 2, 2007 3:29:01 AM

One area that may be worth looking at is page caching. The payback can be far greater than database optimisation. A lot of sites with dynamic pages do not send out any caching information to the browser, nor do they use a proxy, like squid, to sit in front of the application servers, to cache pages.

In my experience it is quite easy to get speed uplifts, and reductions on database loads, by many factors.

What I have done in the past is write a simple plug-in to the web page that sends the proxy the correct cache information. The plug-in understands what database tables have changed and are likely to change.

A very good example of this approach is Wikipedia.

Often with trying to optimise the database one has to rewrite the application. With the page caching technique the application does not need to be rewritten and solution is very scaleable.

It also enables the firm to easily geographical spread server farms in future to reduce latency for foreign markets. As the squid proxies can be moved to the edge.

I have built these types of solutions, mainly for mobile applications where page caching is extremely important, because of the latency of the connection and speed of the mobile application. I also have built a similar system for a flight booking system.

This technique mainly works with Database reads, but most applications read the database many time more than they write.

With database writes there are other solutions if they are needed.

Get in touch with me if you want me to look at the sites, and see if this sort of approach would be effective. There is some need for experience as both squid and the browsers do not always handle caching in the most logical way.

Posted by: James | Mar 2, 2007 6:10:29 AM

We're also having a hell of a time finding a DB "Guru", especially one willing to focus on MySQL for the time being. Problem is that people see themselves as DBAs, Architects, or Engineers. What we need for our mobile media startup here in Boston is someone who capable of performing ALL those functions, who also happens to be platform agnostic. When we have been lucky enough to encounter someone qualified to wear all these hats, they generally see themselves as Oracle specialists. Our ideal candidate would be a generalist who could oversee a future transition to Oracle after our pilot program(s).

There's also a cultural issue... it's harder to find DB gurus who understand/appreciate the startup culture.

The waters are no warmer up here in Boston. Good luck.

Posted by: Waldron Faulkner | Mar 2, 2007 11:20:19 AM


I'm also a fan of de-normalizing data in today's world of inexpensive storage resources. It's low hanging fruit in terms of performance gains.

Posted by: Raj | Mar 2, 2007 12:49:09 PM

2nd the caching- using something like memcached can massively reduce the load on the database in many applications, in addition to the view caching described by James above. A MySQL consultant can help with setting up clusters, physcial distribution, memory utilization, etc. but don't mess with that until you have the database design and caching in the application set up correctly.

Posted by: matt m | Mar 2, 2007 3:38:18 PM

Here's a few easy ways to improve performance of MySQL for large scale sites:

1. Check out the peformance tuning tips on free web seminars by MySQL at http://www.mysql.com/news-and-events/on-demand-webinars/

2. Read some of the free articles on performance tuning by MySQL at http://dev.mysql.com/tech-resources/articles/

3. Engage the MySQL consultants at www.mysql.com. Typically they are able to double performance of most sites in a matter of days. More info at http://www.mysql.com/consulting/

4. Use MySQL Enterprise monitoring & advisory service. This is a rules-based system that enables you to monitor MySQL performance and identify possible bottlenecks based on best practices rules. More info at http://www.mysql.com/enterprise

5. Attend the MySQL Conference, April 23-26, Santa Clara
Ok, you'd have to travel from NY, but there are plenty of tutorials and sessions on performance tuning, management, monitoring including presentations from Google, Yahoo, Alcatel and others. More info at http://www.mysqlconf.com.

I hope this is helpful.

--Zack

Posted by: ZUrlocker | Mar 2, 2007 5:38:47 PM

Nic: "Of course the best thing you can do for a company that is hoping a MySQL guru can save them from their performance woes as they scale up is... move them to Postgres."

A poorly coded app will perform even worse on Postgres than on MySQL. Postgres has a lot of advantages, but they're not generally 'out of the box' ones.. you have to know what you're doing. And if these supposedly tech people can't understand how even a simplistic database like MySQL works, there's more work to do first ;-)

Posted by: Peter Cooper | Mar 3, 2007 1:11:51 AM

@Raj memcache can be very effective but needs more of a redesign than page caching views. For the non technical memcache puts a cache between the program and the database and is used by Wikimedia, Facebook, Digg, etc... The challenge with memcache is that the program has to decide what to put into the cache, and what to expire. So requires some effort on design and programming to get to work.

  1. If I was having mysql load problems I would first look at simple optimizations of the database. Not a redesign or denormalising the tables. I would just look at indexes, and storage
  2. Look at page caching, as mentioned above in a previous post
  3. Then look at memcache

@Waldron Faulkner I would just get your application out to market. If the database hits performance issues then I would follow the list I outlined above. A good DBA who really understands getting performance out a database costs allot of money, and you probably only need them for a short time period. A good dba can optimise your database and help on the design of the database structure in a week or so. The challenge is that if the application has already been written it can be a massive task to rewrite the whole program to use the database effectively.

The challenge with Web 2.0 is most of the applications use a Relational database. Relational Database are great because of the simplicity and the wide number of people who understand the basics. The problem is that Relational Databases have performance issues. Often a non Relational database is far better option. If Google used a relational design, queries would take for ever. All memcache is doing is putting a simple flat database in front of MySql.

Posted by: James | Mar 3, 2007 7:01:45 AM

You should really consider a remote DBA service. We have had great luck in using an expert DBA to tune our databases, yet like most companies, we do noth have enough work for a full time dba. I suspect a fulltime person at a small company would get bored, or not be of a high enough caliber to do the great job you need. If you would like I can make a few suggestions. Your portfolio companies should spend thier precious capital on full time developers and a part time expert dba.

Posted by: Dan Cornish | Mar 3, 2007 8:50:15 PM

I actually talked a bit with Andrew about this, after finding out about outside.in on his blog and making the comment that it looked like a great service but wasn't responding quickly.

A lot can be gained by rewriting the application, checking the database for proper indexing, and caching.

If you can cache full pages, use a service such as Panther Express, which is like Akamai but less expensive, and also a NY-based startup. When using full page caching, if you have any dynamic elements on the page that vary user-by-user, you have to rewrite those using Ajax to issue another request back to the server to get the user-specific HTML. In other words, you lose most of the benefits of server sessions. I'd also consider pre-caching key pages on the server, updating them as new blog posts come in. You can combine both of these methods, with Panther Express handling 95% of the site's traffic, but also pre-caching the most-changed pages right on the server so that when Pather Express issues a refresh request, the server can serve it instantly.

In MySQL, make sure all the columns are properly indexed, take up as few bytes as possible, avoid varying length rows, and consider denormalization based on the query profiling data. Make sure identical queries use absolutely identical SQL text. This can be a problem in web apps that assemble their queries dynamically, and results in cache misses when the queries could have been split up.

A more long-term issue is that this site is archiving all of the blog entries, so the tables will grow huge, to tens of millions of blog posts. These tables will take a long time to perform queries or joins on. One idea worth exploring is to integrate a fulltext search engine like Lucene on queries where it has better performance. Indeed.com is basically doing the same thing with jobs as outside.in with blog posts, indexing them by "what", "when" and "where", and their site is ligntning fast. I'd check out how they've implemented Lucene, and see how this can be applied here.

Posted by: Lee Semel | Mar 5, 2007 9:30:17 AM

The obvious choice here is to get away from the bogus software (in this case MySql) and play with the real tools (in this case MS Sql). It might cost a bit more up front, but you get what you pay for.

Posted by: Brad | Mar 5, 2007 2:05:44 PM

I was about to post a pro-MS SQL message for some light relief then I noticed Brads so I guess it's redundant and I try not to flame bait unless I'm in a really silly mood ;)

On my resume is a 6 million+ page impression/day site (all dynamic, data driven pages) sitting on a single 64 bit SQL Server instance so I can attest to its performance.

However the hard lessons I learnt along the way about making sure you optimize queries, indecies and really look at locking and other contention issues are equally applicable.

Server optimization, smart caching strategies and attention to detail are also important factors.

Too often the DB guru is an overlooked and under-rewarded hero because unlike the UX designers people only notice their efforts when something goes wrong!

Posted by: offbeatmammal | Mar 6, 2007 2:06:06 AM

Just to add to the debate between Offbeatmamal and Brad there is a challenge with all relational databases, mainly because they are often been used for what they are not designed for. The reason that so much of the advice in this post points to memcache is that it is a very simple flat database, and therefore very fast. Relational databases are very good at storing relational information, securely, and robustly, but this comes with a penalty namely performance.

Db2, Oracle, and Ms Sql are all probably better database than MySql, not due to number of queries that they can handle per minute for a given CPU, but because of there robustness in dealing with data, and also there replication and ability to scale over many machines. My guess is that in terms of reads per minute MySql would win the race. But MySql lacks many of the features of the other databases. All of there performance would be a magnitude slower than a flat database like Berkley db. If Oracle was so fast why did Oracle buy Berkley!

Do I use Relational Database – yes - because Relational Databases are very fast to write applications for. It may be the case that an Object or Flat Database would make more sense in a pure design sense. But having just come off a project using an Object database I can attest that just having the tools that come with most Relational Database’s speeds up development by a massive magnitude.

Posted by: James | Mar 6, 2007 6:23:17 AM

MySQL Guru in the boston area: Justin Buser justin@sohoxdesign.com

Posted by: Luke Archer | Mar 6, 2007 11:25:11 AM

@James- (sorry about taking this a little off topic) Although I spent more years working in relational databases than I care to count (from DB/2 on an IBM 43xx and s390 mainframes, through Sybase and Oracle on various platforms, mySql and of course MS SQL) one of the smartest databases I ever came across was Datacom/DB from a company called ADR (bought and mothballed by CA).

Even now (especially for web apps) there were a lot of things that the inverse hierarchical approach offered that SQL still doesn't make easy or optimise well.

I agree wholeheartedly that there are possibly better solutions out there, but SQL has a huge base of skilled resource, and as we've seen from the posts above a lot of experience on making it work better - which makes it a perfect platform for a startup to get up to speed on. Of course if they strike lucky and have to cope with growth then they've got to worry about the issues of scale and performance, but hopefully have the funding to pick a mix of technologies that's possibly more appropriate for their needs.

I would have to say though that MS SQL 2005 64bit running on Win2003 is probably going to beat MySQL in a simple reading race - but I don't have any facts to back that claim up ;)

Posted by: OffBeatMammal | Mar 6, 2007 2:53:12 PM

@Offbeatmamal I totally agree with you that there better solutions out there, but for the startup relational is the way to go. Get the application out the door, and optimize later.

My experience: is as a programmer not a DBA. But I have worked on database from DB2, Oracle, Sql Server, and then everything from Object DB’s, Flat Tables, Berkley DB… etc on machines ranging from S390 to mobiles. The DB2 database was meant to be the largest database in number of records at the time (2000).

Quite interesting that nobody pointed out the series on radar.oreilly.com web 2.0 and databases

The point Ian Wilkes of Linden Labs makes is "I think the biggest lesson we learned is that databases need to be treated as a commodity. Standardized, interchangeable parts are far better in the long run than highly-optimized, special-purpose gear."

I would be surprised if MS Sql Server outperforms MySql as normally when MS products outperform open source products MS makes a big song and dance about it!

Posted by: James | Mar 6, 2007 8:33:22 PM

We use a company called Pythian for remote MySQL DBA services and have had great luck with them to date. They might be overkill for a very early stage startup but this is a great function to outsource during growth phase.

Posted by: Stephen Bronstein | Mar 18, 2007 4:18:46 AM

Post a comment

This weblog only allows comments from registered users. To comment, please Sign In.