Currently a one-man band, but I will be hiring soon!
I offer analysis and hands-on services in the following fields:
- Extreme availability (Unix clustering, Oracle DB availability)
- Database Backup/Recovery
- Storage architecture
- Storage benchmarking
Latest Blog entry:
Benchmarking - to impress or not impress; is not even a question
I love benchmarking. Benchmarking is what I do well.
There is something shimmering about running tests on a system, trying to find out what it can do. After all, who is to say that something is fast or slow? Who defines “fast”, and how bad is “slow”?
But there is a catch: the word “benchmarking” is badly misused and misunderstood. The goal when performing a benchmark is not to produce impressive numbers (that is called performance tuning). A benchmark shows you the metrics that an isolated system can produce with a predefined, fixed set of parameters. It is, of course, always satisfying to show metrics which are impressive. I mean, who keeps statistics on the losing team in last year’s series of your favorite sport?
The graph below is not impressive at all, showing write performance of about 30 MB/s. Had it been 15–20 years ago, many companies would have paid good money to reach these numbers, but these figures were measured just recently, on a not particularly impressive hardware stack.
So, what does it show?
- Server: AsRock ION 330
- CPU: Intel Atom 330@1.6GHz
- RAM: 4GB
- Storage: 1TB IOMega USB Drive, 8MB Cache
And this is how I produced the numbers:
1) Start collecting I/O data
    nohup iostat -d -t -k -x 10 > iostat.out &
2) Do something to produce an I/O load
    cp -r /export/stuff/new/Favorite_tv_series.S08E1* /export/stuff/tv_serier/temp/
From here, we are set to go. We have collected data on a given system while running a well-defined, reproducible workload. But what about the data? Can just anyone graph it nicely and read what it means? Certainly not. As Kevin Closson once said in a very good talk I attended: “The solution is simple, but it is not easy”.
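Copying a particular set of files is only repeatable for whoever owns those files. For readers who want a load they can rerun on their own hardware, one option is to write a fixed amount of data with `dd`. This is a minimal sketch, not the method used for the numbers in this post: the function name `write_load`, the temporary file name and the 4 MiB demo size are all assumptions made for illustration, and zeros can give flattering results on storage that compresses or deduplicates.

```shell
#!/bin/sh
# Hypothetical helper: write SIZE_MB mebibytes of zeros into DIR, flush, clean up.
# Prints the number of bytes written so the run can be logged next to iostat.out.
# Usage: write_load DIR SIZE_MB
write_load() {
    out="$1/bench_load.$$"                  # temporary file on the device under test
    dd if=/dev/zero of="$out" bs=1048576 count="$2" 2>/dev/null || return 1
    sync                                    # push cached data out to the device
    bytes=$(wc -c < "$out")
    rm -f "$out"
    echo $((bytes))                         # arithmetic expansion strips any padding
}

# Small demo run; point DIR at the device under test and raise the size for real use
write_load "${TMPDIR:-/tmp}" 4
```

Because the amount of data is fixed, two runs on the same system can be compared directly, which is the property a benchmark workload needs.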
The output of iostat data looks like this:
    2010-07-22T12:28:28+0200
    Device:   rrqm/s  wrqm/s    r/s    w/s  rkB/s  wkB/s avgrq-sz avgqu-sz   await  svctm  %util
    sda         0.40    0.50   0.70   0.70   6.80   4.80    16.57     0.07   47.86   7.86   1.10
    sda1        0.40    0.50   0.70   0.70   6.80   4.80    16.57     0.07   47.86   7.86   1.10
    sda2        0.00    0.00   0.00   0.00   0.00   0.00     0.00     0.00    0.00   0.00   0.00
    sda5        0.00    0.00   0.00   0.00   0.00   0.00     0.00     0.00    0.00   0.00   0.00
    dm-0        0.00    0.00   1.10   1.20   6.80   4.80    10.09     0.09   39.57   4.78   1.10
    dm-1        0.00    0.00   0.00   0.00   0.00   0.00     0.00     0.00    0.00   0.00   0.00
    sdb         0.00    0.00   0.00   0.00   0.00   0.00     0.00     0.00    0.00   0.00   0.00
    sdb1        0.00    0.00   0.00   0.00   0.00   0.00     0.00     0.00    0.00   0.00   0.00
    sdc         0.00    3.60   0.00  15.90   0.00  78.00     9.81     1.99  125.22   2.45   3.90
    sdc1        0.00    3.60   0.00  15.90   0.00  78.00     9.81     1.99  125.22   2.45   3.90

    ...

    2010-07-23T10:37:49+0200
    Device:   rrqm/s  wrqm/s    r/s    w/s  rkB/s  wkB/s avgrq-sz avgqu-sz   await  svctm  %util
    sda         0.00    0.30   0.00   0.50   0.00   3.20    12.80     0.01   30.00  10.00   0.50
    sda1        0.00    0.30   0.00   0.50   0.00   3.20    12.80     0.01   30.00  10.00   0.50
    sda2        0.00    0.00   0.00   0.00   0.00   0.00     0.00     0.00    0.00   0.00   0.00
    sda5        0.00    0.00   0.00   0.00   0.00   0.00     0.00     0.00    0.00   0.00   0.00
    dm-0        0.00    0.00   0.00   0.80   0.00   3.20     8.00     0.01   18.75   6.25   0.50
    dm-1        0.00    0.00   0.00   0.00   0.00   0.00     0.00     0.00    0.00   0.00   0.00
    sdb         0.00    0.00   0.00   0.00   0.00   0.00     0.00     0.00    0.00   0.00   0.00
    sdb1        0.00    0.00   0.00   0.00   0.00   0.00     0.00     0.00    0.00   0.00   0.00
    sdc         0.00    0.00   0.00   0.00   0.00   0.00     0.00     0.00    0.00   0.00   0.00
    sdc1        0.00    0.00   0.00   0.00   0.00   0.00     0.00     0.00    0.00   0.00   0.00
You have to be quite darn good to graph something like this. Even though I consider myself a black belt in scripting, or perhaps because of it, I would not try to graph this data as it stands. You first have to transform it into something your favorite graph plotting tool (read: gnuplot) can handle. I like semicolon-separated files, and I like ISO-style date formats. This is what I did:
    device=sdc
    cat iostat.out | \
    awk -v DEVICE="$device" '
    #--- timestamp line: turn 2010-07-22T12:28:28+0200 into 2010-07-22 12:28:28
    $1 ~ /^2010/ {
        ts=$1;
        gsub(/T/," ",ts);
        gsub(/\+0200/,"",ts);
    }

    #--- data line for the selected device
    $0 !~ /^$/ && $1 !~ /^2010/ && $0 !~ /Device/ && $1 == DEVICE {
        #--- set semicolon as output field separator
        OFS=";";
        #--- recalculate $0 with OFS
        $1=$1;
        print ts, $0
    }' > a.out
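The filter above is tied to the year 2010 and to the +0200 offset. A variant that matches on the shape of the timestamp instead could look like the sketch below. It assumes that `iostat -t` keeps printing timestamps in the ISO form seen in the sample output; the function name `iostat_to_csv` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: emit "timestamp;device;metrics..." lines for one device.
# Assumes timestamp lines look like 2010-07-22T12:28:28+0200 (as in the sample).
# Usage: iostat_to_csv DEVICE < iostat.out
iostat_to_csv() {
    awk -v DEVICE="$1" '
    # Timestamp line: YYYY-MM-DDThh:mm:ss, optionally followed by a UTC offset
    $1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T/ {
        ts = $1
        sub(/T/, " ", ts)             # ISO separator becomes a space
        sub(/[+-][0-9]+$/, "", ts)    # drop the offset, whatever its value
        next
    }
    # Data line for the requested device (exact match, so sdc1 is not included)
    $1 == DEVICE {
        OFS = ";"
        $1 = $1                       # force awk to rebuild $0 using OFS
        print ts, $0
    }'
}

# Demo on one sample in the same layout as the iostat output
iostat_to_csv sdc <<'EOF'
2010-07-22T12:28:28+0200
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.40 0.50 0.70 0.70 6.80 4.80 16.57 0.07 47.86 7.86 1.10
sdc 0.00 3.60 0.00 15.90 0.00 78.00 9.81 1.99 125.22 2.45 3.90
sdc1 0.00 3.60 0.00 15.90 0.00 78.00 9.81 1.99 125.22 2.45 3.90
EOF
```

Reading from standard input keeps the helper usable in a pipeline, for example `iostat_to_csv sdc < iostat.out > a.out`.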
This is what the data looks like now. Notice that I also filtered it to keep only “sdc”:
    2010-07-22 12:34:08;sdc;0.00;0.00;0.00;0.00;0.00;0.00;0.00;0.00;0.00;0.00;0.00
    2010-07-22 12:34:18;sdc;0.00;0.00;0.10;0.00;0.40;0.00;8.00;0.00;20.00;20.00;0.20
    2010-07-22 12:34:28;sdc;0.00;5001.40;0.00;164.30;0.00;19068.80;232.12;40.23;218.65;4.08;67.00
    2010-07-22 12:34:38;sdc;0.00;6382.60;0.20;235.20;0.80;26948.80;228.97;109.58;472.74;4.00;94.20
    2010-07-22 12:34:48;sdc;0.00;7032.70;0.20;250.30;0.80;29018.80;231.69;112.57;445.85;3.99;100.00
    2010-07-22 12:34:58;sdc;0.00;5140.10;0.10;184.80;0.40;21379.20;231.26;47.91;265.46;4.05;74.80
    2010-07-22 12:35:08;sdc;0.00;4776.10;0.20;204.80;0.80;21086.00;205.72;58.86;297.32;3.77;77.30
    2010-07-22 12:35:18;sdc;0.00;4133.80;0.10;148.50;0.40;16593.20;223.33;44.39;288.05;3.84;57.10
    2010-07-22 12:35:28;sdc;0.00;4056.70;0.10;149.90;0.40;17242.40;229.90;41.07;273.85;3.96;59.40
    2010-07-22 12:35:38;sdc;0.00;4093.00;0.10;155.70;0.40;17115.20;219.71;40.54;269.99;3.87;60.30
    2010-07-22 12:35:48;sdc;0.00;4284.60;0.00;161.30;0.00;17783.20;220.50;41.93;258.28;3.89;62.80
    2010-07-22 12:35:58;sdc;0.00;0.00;0.00;0.80;0.00;3.60;9.00;0.01;350.00;1.25;0.10
    2010-07-22 12:36:08;sdc;0.00;0.00;0.00;0.00;0.00;0.00;0.00;0.00;0.00;0.00;0.00
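A quick numeric summary of these rows can serve as a sanity check on whatever gets plotted later. The sketch below assumes the layout of the listing (timestamp, device, then the iostat columns, which puts wkB/s in field 8) and treats one kB as 1024 bytes. The function name `summarize` and the 100 kB/s idle threshold are illustrative choices, not part of the original workflow.

```shell
#!/bin/sh
# Hypothetical helper: peak and busy-time average of one column, read from stdin.
# Usage: summarize [COLUMN] [IDLE_THRESHOLD_KB] < a.out
summarize() {
    awk -F';' -v col="${1:-8}" -v idle="${2:-100}" '
    {
        v = $col + 0                          # force a numeric value
        if (v > max)  { max = v; at = $1 }    # remember the peak sample
        if (v > idle) { sum += v; n++ }       # leave idle samples out of the mean
    }
    END {
        if (n == 0) { print "no busy samples"; exit }
        printf "peak: %.1f MB/s at %s\n", max / 1024, at
        printf "busy average: %.1f MB/s over %d samples\n", sum / n / 1024, n
    }'
}

# Demo on three of the rows from the listing
summarize <<'EOF'
2010-07-22 12:34:18;sdc;0.00;0.00;0.10;0.00;0.40;0.00;8.00;0.00;20.00;20.00;0.20
2010-07-22 12:34:48;sdc;0.00;7032.70;0.20;250.30;0.80;29018.80;231.69;112.57;445.85;3.99;100.00
2010-07-22 12:35:18;sdc;0.00;4133.80;0.10;148.50;0.40;16593.20;223.33;44.39;288.05;3.84;57.10
EOF
```

The highest value in the listing is 29018.80 kB/s, which works out to roughly 28 MB/s and matches the figure of about 30 MB/s quoted for the graph.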
The interesting data (you know, the column with the highest values) is in the 8th column, kB written per second. So the only thing left to do is to run it through gnuplot. To keep things simple, I show the hardcoded version of the script here. In my day-to-day work I just don’t have the time to rewrite gnuplot scripts every time I use them, so I have written a wrapper around the whole thing so that I can reuse it.
    malu@ml-sst7-0001:/home/malu/public_html/temp/quick_display/iostat $ cat quickplot.gplot
    reset
    set terminal png size 800,400
    set xdata time
    set timefmt "%Y-%m-%d %H:%M:%S"
    set output "pretty_picture.png"
    #--- time range must be in same format as data file
    set yrange [-1000:40000]
    set xlabel "Date-Time"
    set ylabel "kB/s"
    set title "ml-sst7-0001 - kB write - from date this and that _to_ date this and that"
    set datafile separator ";"
    set grid
    set grid front
    set key right

    filename="a.out"

    #--- for shading offset (below the plotted line)
    a=41000/20

    #--- Plot the data 6 times with different shades of gray, black and green
    #--- to get the illusion of having a green line with a black frame (last two plots)
    #--- and a shade (the first 4 plots)
    plot ["2010-07-22 12:33:00":"2010-07-22 12:40:00"] \
        filename u 1:($8 - a) w l lc rgb "#eeeeee" lw 9 t '' ,\
        filename u 1:($8 - a) w l lc rgb "#dddddd" lw 7 t '', \
        filename u 1:($8 - a) w l lc rgb "#cccccc" lw 5 t '', \
        filename u 1:($8 - a) w l lc rgb "#bbbbbb" lw 3 t '', \
        filename u 1:8 w l lc rgb "#555555" lw 3 t '', \
        filename u 1:8 w l lc rgb "#00ff00" lw 1 t "kB/s"
And… to make the magic happen, make sure a.out is in the same directory, and fire off gnuplot:
    cat quickplot.gplot | gnuplot
That’s it, that’s how I produced the graph above.
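The reusable wrapper mentioned earlier is not shown in this post. As a rough idea of what such a wrapper might look like, the sketch below generates a simplified gnuplot script (a single line, without the shading trick) from a few parameters. Everything about it is an assumption made for illustration: the function name `make_plot_script`, the argument order and the line color.

```shell
#!/bin/sh
# Hypothetical wrapper: print a gnuplot script for one column of a
# semicolon-separated file whose first field is "YYYY-MM-DD hh:mm:ss".
# Usage: make_plot_script DATAFILE COLUMN TITLE FROM TO OUTPUT_PNG
make_plot_script() {
    # Unquoted here-document: $1..$6 are replaced by the function arguments
    cat <<EOF
reset
set terminal png size 800,400
set output "$6"
set xdata time
set timefmt "%Y-%m-%d %H:%M:%S"
set datafile separator ";"
set xlabel "Date-Time"
set ylabel "kB/s"
set title "$3"
set grid
plot ["$4":"$5"] "$1" using 1:$2 with lines lc rgb "#00aa00" lw 2 title "kB/s"
EOF
}

# Print the generated script; append "| gnuplot" to render the PNG
make_plot_script a.out 8 "sdc - kB written per second" \
    "2010-07-22 12:33:00" "2010-07-22 12:40:00" pretty_picture.png
```

Generating the script as text keeps the wrapper easy to inspect: the output can be read, saved to a file, or piped straight into gnuplot.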
Go ahead, define your own set of tests, find out how to collect the metrics, transform the collected data into something useful, plot it, write a presentation, make a decent load of money from it. People are eager to read benchmark papers to confirm or disprove an idea, and they are very often willing to pay you a fair fee for it as well.
Older entries:
- Benchmarking - to impress or not impress; is not even a question - July 23, 2010
- Starting all over - where did my contacts go? - June 14, 2010
- New beginning - April 29, 2010