Wednesday, August 7, 2013

Don't forget about IOPS on RDS

A lot of things are slightly different when you run your application on cloud services like AWS. Take database servers: increased load can always change the performance of DB look-ups and writes. Larger tables can lead to slow queries if the tables are not indexed properly, a lot of writes may cause unexpected locking, etc.

However, if you use Amazon's AWS, there is another important factor you should not lose sight of: IOPS. This measures the number of I/O operations per second between the database server and its storage, which is attached as network storage.

It's somewhat unclear if and how Amazon throttles throughput when you don't reserve IOPS. You might be at the mercy of other RDS instances running on the same machine, and probably of some kind of rate limit that AWS implicitly imposes.

IOPS are handled like pretty much everything on AWS: you pay for what you use. The exception is reserved IOPS (Provisioned IOPS in AWS terms): there you pay for the reserved amount, even if you don't use it.

When one of our database servers recently started misbehaving, although pretty much all the reads were by primary key and the writes were mainly inserts, we checked read/write IOPS on the AWS console and saw an interesting pattern: the DB regularly peaked on writes with very few reads and then reversed, doing a lot of reads and very few writes, which reduced the overall throughput considerably. So if you hit some limit, the I/O operations are not evenly split between reads and writes; the split appears to be rather random, with high peaks and lows. This unpredictable behavior led to very strange outcomes. On another DB with the same issue, complex queries sometimes ran really fast while simple updates by primary key took on the order of minutes, only for the opposite to happen minutes later.
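The see-saw pattern described above can be checked for in exported metrics rather than by eye. The following is only a rough sketch: the `Sample` type, the `saturated_flips` function, and the `near` and `dominance` thresholds are made up for illustration, and the ceiling is whatever limit you believe the instance is hitting. It counts how often the dominant side flips between reads and writes while the combined IOPS sit near that ceiling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One monitoring interval (hypothetical shape, e.g. one minute)."""
    read_iops: float
    write_iops: float


def saturated_flips(samples: List[Sample], ceiling: float,
                    near: float = 0.9, dominance: float = 0.8) -> int:
    """Count read/write dominance flips among samples close to the ceiling.

    A sample counts only if reads + writes reach `near` * ceiling and one
    side makes up at least `dominance` of the total. Both thresholds are
    illustrative guesses, not AWS-defined values.
    """
    flips = 0
    previous: Optional[str] = None
    for s in samples:
        total = s.read_iops + s.write_iops
        if total < near * ceiling:
            continue  # not close to the limit, ignore this interval
        if s.read_iops >= dominance * total:
            side = "read"
        elif s.write_iops >= dominance * total:
            side = "write"
        else:
            continue  # reads and writes roughly balanced
        if previous is not None and side != previous:
            flips += 1
        previous = side
    return flips
```

A handful of flips over a short window, with the total pinned near the ceiling, would match what we saw on the console; a count of zero suggests looking elsewhere first.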

Most of the erratic behavior went away once we reserved more IOPS. Since these are prepaid resources, one should monitor closely while adjusting reserved IOPS. Going too big might be an issue, as reserved IOPS are quite expensive and cannot be reduced: you'd have to create a new instance with fewer IOPS reserved and transfer the data. With very large DBs that can (and did in our case) become a major issue as well.
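For the monitoring part, the same numbers shown on the console are available from CloudWatch as the `ReadIOPS` and `WriteIOPS` metrics in the `AWS/RDS` namespace. Below is a minimal sketch using boto3, assuming credentials and region are already configured; the function names and the default time window are my own choices, not anything prescribed by AWS. It compares the highest combined read+write IOPS against the reserved amount.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def fetch_iops(instance_id: str, metric: str,
               hours: int = 3, period: int = 60) -> Dict[datetime, float]:
    """Fetch per-period averages for "ReadIOPS" or "WriteIOPS" of one instance."""
    import boto3  # imported lazily so peak_utilization works without AWS access

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName=metric,
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=period,  # seconds per datapoint
        Statistics=["Average"],
    )
    return {p["Timestamp"]: p["Average"] for p in response["Datapoints"]}


def peak_utilization(reads: Dict, writes: Dict, reserved_iops: float) -> float:
    """Highest combined read+write IOPS as a fraction of the reserved amount.

    Only timestamps present in both series are considered, since CloudWatch
    may omit datapoints for some periods.
    """
    shared = reads.keys() & writes.keys()
    if not shared:
        return 0.0
    return max(reads[t] + writes[t] for t in shared) / reserved_iops
```

With a placeholder instance name, usage would look like `peak_utilization(fetch_iops("mydb", "ReadIOPS"), fetch_iops("mydb", "WriteIOPS"), 1000)`. A result that stays close to 1.0 hints that the reservation is the bottleneck, while a consistently low value suggests you are paying for IOPS you don't use.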

All this doesn't mean you shouldn't first check what queries you're running when you see performance problems. But if you see unexplainable or erratic behavior on RDS, having a look at IOPS usage and settings may help solve the mystery.
