Thorsten von Eicken-2
|
I've been trying to get rrdcached and collectd to work together under
load and am running into a number of issues.

I'm using:
- collectd 4.7.4
- rrdtool trunk rev 1889 (had trouble compiling collectd with newer
  versions)
- rrdcached stats snapshot:
    9 Statistics follow
    QueueLength: 0
    UpdatesReceived: 91123595
    FlushesReceived: 83983
    UpdatesWritten: 210810
    DataSetsWritten: 82474667
    TreeNodesNumber: 25925
    TreeDepth: 17
    JournalBytes: 6846161193
    JournalRotate: 4
- approx 3k updates written to rrdcached per second
- approx 200-300KB written to journal per second
- approx 2k-3k data sets written per second
- rrdcached params: -w 3600 -z 3600 -f 7200 -t 10
- disk I/O is not an issue
- rrdcached memory usage is not an issue (grows to 0.8GB then stays
  totally flat), no swapping
- running collectd, rrdcached, and custom graphing app on same
  dual-core server, verified that flushing for graphing is working
  properly

First issue is that over time the data in the rrd files lags behind the
data arriving into collectd from the network. After 12 hours I see approx
a 5 minute lag. I've seen it go to >1.5 hrs after a bunch of days. The
symptoms are that data in the rrd files continues to advance at the
normal rate (20 second interval in our case) but just in the past. The
fact that the delay is steady leads me to believe that it's a program
bug (I've seen delays due to I/O overload in the past and different rrds
then show different lags and jump forward when they finally get some
disk attention). I've done some tests looking at the last_updated in the
rrd and looking at what the rrdcached daemon returns to a PENDING
command for the same file and I'm sure the flushing works. The daemon
just doesn't receive newer updates. The journal is also in sync with all
this. If I restart collectd, then the lag pretty quickly vanishes. So
either collectd has some queue with a bug, or data is queued in the
socket between collectd and rrdcached. I get the same delay whether I
use a unix sock or a tcp sock and the amount of data "queued" is such
that it's not in system buffers (the rrdcached journal is written at
200kB/sec and I believe that's the same rate at which rrdcached receives
data).

The second issue, which may possibly cause the first one, is that the cpu
consumed by rrdcached is way too high. After running for about an hour
it consumes a full cpu (~90% user + ~10% system). It could be that
that's causing the above lag, dunno.

I/O is not a problem as I mentioned, it's pure CPU. I've compiled
rrdcached with -pg to get gprof output, but haven't been successful. I
commented out install_signal_handlers (left the USR2 to be able to
terminate gracefully) and ran with -g, but the gprof output shows only
~2 minutes of CPU time profiled when the daemon accumulated
>250mins. Here's the top of the output:

Each sample counts as 0.01 seconds.
  %   cumulative   self               self     total
 time   seconds   seconds     calls  s/call   s/call  name
55.12     62.39    62.39  280843249    0.00     0.00  buffer_get_field
11.33     75.22    12.83   93607575    0.00     0.00  send_response
10.32     86.91    11.68   93464852    0.00     0.00  handle_request_update
 5.36     92.97     6.06                              connection_thread_main
 4.03     97.53     4.57   93683555    0.00     0.00  handle_request
 3.46    101.46     3.92   93484712    0.00     0.00  check_file_access
 3.29    105.18     3.72  176583057    0.00     0.00  next_cmd
 1.33    106.69     1.51   93686967    0.00     0.00  find_command
 1.23    108.08     1.40   88419974    0.00     0.00  journal_write
 1.00    109.22     1.14   93672403    0.00     0.00  has_privilege

It looks to me like that's mostly the journal replay stuff and very little
more. If someone has tips on how to get real profiling output, I'm all ears.

The journal replay is too slow. When I terminate the daemon it leaves
several GB of journal files behind. Reading those in takes the better
part of an hour, during which the daemon is unresponsive.
Most of the time is in buffer_get_field. (Note: in the most common cases
buffer_get_field copies each field in-place, character by
character. Seems to me that a simple if statement could avoid the
writes.)

By the way, I find the unix socket stuff undebuggable. I switched to TCP
sockets because I can telnet to the socket and find out what the daemon
is doing. (For example, when nothing seems to work for almost an hour
when I start the daemon because it's replaying logs there is no
information about what's going on anywhere.) I'm saying this because
everyone recommends the unix sockets for security reasons. It's unusable
IMHO.

I think this is very close to being an extremely high performance RRD
monitoring system, but it's not quite there yet. I'd appreciate any
pointers on what to pursue. I hope that the above descriptions will ring
some bells in those of you that wrote some of the code. I'm available to
test things out and collect more info. Unfortunately I have only little
time to dig into the code myself, sigh.

Thanks,
Thorsten

_______________________________________________
rrd-developers mailing list
[hidden email]
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers
|
kevin brintnall
|
Hi Thorsten,
On Fri, Oct 09, 2009 at 11:46:26AM -0700, Thorsten von Eicken wrote:
> I've been trying to get rrdcached and collectd to work together under
> load and am running into a number of issues.
> I'm using:
> - collectd 4.7.4
> - rrdtool trunk rev 1889 (had trouble compiling collectd with newer
> versions)
> - rrdcached stats snapshot:
> 9 Statistics follow
> QueueLength: 0
> UpdatesReceived: 91123595
> FlushesReceived: 83983
> UpdatesWritten: 210810
> DataSetsWritten: 82474667

It looks like your RRD files must have a very large number of DS? Almost
400?

> TreeNodesNumber: 25925
> TreeDepth: 17
> JournalBytes: 6846161193
> JournalRotate: 4
> - approx 3k updates written to rrdcached per second
> - approx 200-300KB written to journal per second
> - approx 2k-3k data sets written per second
> - rrdcached params: -w 3600 -z 3600 -f 7200 -t 10
> - disk I/O is not an issue
> - rrdcached memory usage is not an issue (grows to 0.8GB then stays
> totally flat), no swapping
> - running collectd, rrdcached, and custom graphing app on same
> dual-core server, verified that flushing for graphing is working properly
>
> First issue is that over time the data in the rrd files lags behind the
> data arriving into collectd from the network. After 12 hours I see approx
> a 5 minute lag. I've seen it go to >1.5 hrs after a bunch of days. The
> symptoms are that data in the rrd files continues to advance at the
> normal rate (20 second interval in our case) but just in the past. The
> fact that the delay is steady leads me to believe that it's a program
> bug (I've seen delays due to I/O overload in the past and different rrds
> then show different lags and jump forward when they finally get some
> disk attention). I've done some tests looking at the last_updated in the
> rrd and looking at what the rrdcached daemon returns to a PENDING
> command for the same file and I'm sure the flushing works. The daemon
> just doesn't receive newer updates. The journal is also in sync with all
> this. If I restart collectd, then the lag pretty quickly vanishes. So
> either collectd has some queue with a bug, or data is queued in the
> socket between collectd and rrdcached.

Thorsten,

This sounds like collectd not sending updates to rrdcached. If they are
not in the journal, then rrdcached has not received them.

> The second issue, which may possibly cause the first one, is that the cpu
> consumed by rrdcached is way too high. After running for about an hour
> it consumes a full cpu (~90% user + ~10% system). It could be that
> that's causing the above lag, dunno.

If the updates are all being sent by librrd, then each file update
requires one read() and one write() system call. The small read/write
syscalls may increase CPU utilization. "BATCH" mode was introduced to
deal with this. There is no way to use BATCH from within librrd; it is
only available to users willing to speak the wire protocol directly.

In my environment, I am bursting to 3.4k updates/sec (800 avg). rrdcached
uses very little CPU during these bursts (<0.2%). I am writing directly
to the daemon using "BATCH".

> I/O is not a problem as I mentioned, it's pure CPU. I've compiled
> rrdcached with -pg to get gprof output, but haven't been successful. I
> commented out install_signal_handlers (left the USR2 to be able to
> terminate gracefully) and ran with -g, but the gprof output shows only
> ~2 minutes of CPU time profiled when the daemon accumulated
> >250mins. Here's the top of the output:
>
> Each sample counts as 0.01 seconds.
>   %   cumulative   self               self     total
>  time   seconds   seconds     calls  s/call   s/call  name
> 55.12     62.39    62.39  280843249    0.00     0.00  buffer_get_field
> 11.33     75.22    12.83   93607575    0.00     0.00  send_response
> 10.32     86.91    11.68   93464852    0.00     0.00  handle_request_update
>  5.36     92.97     6.06                              connection_thread_main
>  4.03     97.53     4.57   93683555    0.00     0.00  handle_request
>  3.46    101.46     3.92   93484712    0.00     0.00  check_file_access
>  3.29    105.18     3.72  176583057    0.00     0.00  next_cmd
>  1.33    106.69     1.51   93686967    0.00     0.00  find_command
>  1.23    108.08     1.40   88419974    0.00     0.00  journal_write
>  1.00    109.22     1.14   93672403    0.00     0.00  has_privilege
> It looks to me like that's mostly the journal replay stuff and very little
> more. If someone has tips on how to get real profiling output, I'm all ears.

I've never had this sort of problem... What is your `uname -a` ?

> The journal replay is too slow. When I terminate the daemon it leaves
> several GB of journal files behind. Reading those in takes the better
> part of an hour, during which the daemon is unresponsive.

That seems awfully long. Mine is able to replay 83M entries (about 8GB)
in 7 minutes.

> Most of the time is in buffer_get_field. (Note: in the most common cases
> buffer_get_field copies each field in-place, character by
> character. Seems to me that a simple if statement could avoid the
> writes.)

Agreed that the buffer_get_field implementation is not optimal. From what
I can tell, it copies this way for three reasons:

 (1) desire to provide null terminated string to caller
 (2) do not want to modify original string (in case need to write it to
     journal)
 (3) allow escaped characters (presumably space)

(1) with (2) implies that we have to copy, rather than use the source
string.

(3) implies that we have to handle \ specially, but not necessarily that
we have to copy character by character. I would expect '\' to appear
infrequently. The only case I can imagine is if the RRD file path
contains spaces.

It's possible to implement it with strncpy, but performance depends on
whether the optimizations in strncpy offset the need to pre-scan the
string for delimiters.

If we were OK with modifying the string in place, we could do it without
copying... Except, if we want to then write the original string to the
journal we have to keep a copy. More journal entries are written than
read, so I would avoid this.

> By the way, I find the unix socket stuff undebuggable. I switched to TCP
> sockets because I can telnet to the socket and find out what the daemon
> is doing. (For example, when nothing seems to work for almost an hour
> when I start the daemon because it's replaying logs there is no
> information about what's going on anywhere.) I'm saying this because
> everyone recommends the unix sockets for security reasons. It's unusable
> IMHO.

It is possible to use Unix sockets and TCP sockets simultaneously.

FYI netcat works just fine for talking on a unix socket:

    nc -U /wherever/sock

YMMV, but I am able to get 2x as many updates when using a Unix socket on
FreeBSD 7.x.

-- 
kevin brintnall =~ /[hidden email]/
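Kevin's point about BATCH is that many UPDATE commands travel in one write instead of one write/read round trip each. A minimal sketch of building such a request in C follows. It assumes the framing described in the rrdcached manual page as I understand it (a "BATCH" line, one command per line, terminated by a line containing only a dot). The function name build_batch and its signature are hypothetical and not part of librrd, and the sketch ignores the daemon's responses and any escaping of spaces in file names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of librrd): format n UPDATE commands as
 * a single BATCH request in buf.
 * Assumed framing: "BATCH\n", one command per line, then ".\n".
 * Returns the number of bytes written (excluding the NUL terminator),
 * or -1 if buf is too small.
 */
int build_batch(char *buf, size_t size,
                const char **files, const char **values, size_t n)
{
    size_t off = 0;
    size_t i;
    int len;

    assert(buf != NULL && files != NULL && values != NULL);

    len = snprintf(buf, size, "BATCH\n");
    if (len < 0 || (size_t)len >= size)
        return -1;
    off = (size_t)len;

    for (i = 0; i < n; i++) {
        /* One command per line; file names containing spaces would
         * need escaping, which this sketch does not do. */
        len = snprintf(buf + off, size - off, "UPDATE %s %s\n",
                       files[i], values[i]);
        if (len < 0 || (size_t)len >= size - off)
            return -1;
        off += (size_t)len;
    }

    /* A line holding only a dot ends the batch. */
    len = snprintf(buf + off, size - off, ".\n");
    if (len < 0 || (size_t)len >= size - off)
        return -1;
    off += (size_t)len;

    return (int)off;
}
```

The resulting buffer can be sent with a single write() on the daemon socket. A real client would still have to read the daemon's replies, including the per-command error lines it reports after the closing dot.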
||||||||||||||||
|
Thorsten von Eicken-2
|
Kevin, thanks for your thoughts.
kevin brintnall wrote:
> Hi Thorsten,
>
> On Fri, Oct 09, 2009 at 11:46:26AM -0700, Thorsten von Eicken wrote:
>> I've been trying to get rrdcached and collectd to work together under
>> load and am running into a number of issues.
>> I'm using:
>> - collectd 4.7.4
>> - rrdtool trunk rev 1889 (had trouble compiling collectd with newer
>> versions)
>> - rrdcached stats snapshot:
>> 9 Statistics follow
>> QueueLength: 0
>> UpdatesReceived: 91123595
>> FlushesReceived: 83983
>> UpdatesWritten: 210810
>> DataSetsWritten: 82474667
>
> It looks like your RRD files must have a very large number of DS? Almost
> 400?

No, in fact most have 1 (that's how collectd likes it). You may be
looking at the received/written ratio which is skewed by journal replay
and the very long cache period I configure (1 hour).

>> TreeNodesNumber: 25925
>> TreeDepth: 17
>> JournalBytes: 6846161193
>> JournalRotate: 4
>> - approx 3k updates written to rrdcached per second
>> - approx 200-300KB written to journal per second
>> - approx 2k-3k data sets written per second
>> - rrdcached params: -w 3600 -z 3600 -f 7200 -t 10
>> - disk I/O is not an issue
>> - rrdcached memory usage is not an issue (grows to 0.8GB then stays
>> totally flat), no swapping
>> - running collectd, rrdcached, and custom graphing app on same
>> dual-core server, verified that flushing for graphing is working properly
>>
>> First issue is that over time the data in the rrd files lags behind the
>> data arriving into collectd from the network. After 12 hours I see approx
>> a 5 minute lag. I've seen it go to >1.5 hrs after a bunch of days. The
>> symptoms are that data in the rrd files continues to advance at the
>> normal rate (20 second interval in our case) but just in the past. The
>> fact that the delay is steady leads me to believe that it's a program
>> bug (I've seen delays due to I/O overload in the past and different rrds
>> then show different lags and jump forward when they finally get some
>> disk attention). I've done some tests looking at the last_updated in the
>> rrd and looking at what the rrdcached daemon returns to a PENDING
>> command for the same file and I'm sure the flushing works. The daemon
>> just doesn't receive newer updates. The journal is also in sync with all
>> this. If I restart collectd, then the lag pretty quickly vanishes. So
>> either collectd has some queue with a bug, or data is queued in the
>> socket between collectd and rrdcached.
>
> Thorsten,
>
> This sounds like collectd not sending updates to rrdcached. If they are
> not in the journal, then rrdcached has not received them.

Yes, the question is whether it's collectd's fault or rrdcached's fault..

>> The second issue, which may possibly cause the first one, is that the cpu
>> consumed by rrdcached is way too high. After running for about an hour
>> it consumes a full cpu (~90% user + ~10% system). It could be that
>> that's causing the above lag, dunno.
>
> If the updates are all being sent by librrd, then each file update
> requires one read() and one write() system call. The small read/write
> syscalls may increase CPU utilization. "BATCH" mode was introduced to
> deal with this.

Collectd doesn't use BATCH, but then, I'm using rrdcached to "batch"
updates into fewer writes. The caching is set to 1 hour, each rrd is
updated every 20 seconds. That seems to work fine.

> In my environment, I am bursting to 3.4k updates/sec (800 avg). rrdcached
> uses very little CPU during these bursts (<0.2%). I am writing directly
> to the daemon using "BATCH".

Yeah, as I mentioned above, I'm very steady at 3k updates received per
sec.

>> I/O is not a problem as I mentioned, it's pure CPU. I've compiled
>> rrdcached with -pg to get gprof output, but haven't been successful. I
>> commented out install_signal_handlers (left the USR2 to be able to
>> terminate gracefully) and ran with -g, but the gprof output shows only
>> ~2 minutes of CPU time profiled when the daemon accumulated
>> >250mins. Here's the top of the output:
>> Each sample counts as 0.01 seconds.
>>   %   cumulative   self               self     total
>>  time   seconds   seconds     calls  s/call   s/call  name
>> 55.12     62.39    62.39  280843249    0.00     0.00  buffer_get_field
>> 11.33     75.22    12.83   93607575    0.00     0.00  send_response
>> 10.32     86.91    11.68   93464852    0.00     0.00  handle_request_update
>>  5.36     92.97     6.06                              connection_thread_main
>>  4.03     97.53     4.57   93683555    0.00     0.00  handle_request
>>  3.46    101.46     3.92   93484712    0.00     0.00  check_file_access
>>  3.29    105.18     3.72  176583057    0.00     0.00  next_cmd
>>  1.33    106.69     1.51   93686967    0.00     0.00  find_command
>>  1.23    108.08     1.40   88419974    0.00     0.00  journal_write
>>  1.00    109.22     1.14   93672403    0.00     0.00  has_privilege
>> It looks to me like that's mostly the journal replay stuff and very little
>> more. If someone has tips on how to get real profiling output, I'm all ears.
>
> I've never had this sort of problem... What is your `uname -a` ?

Linux sketchy1.rightscale.com 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15
12:34:28 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

>> The journal replay is too slow. When I terminate the daemon it leaves
>> several GB of journal files behind. Reading those in takes the better
>> part of an hour, during which the daemon is unresponsive.
>
> That seems awfully long. Mine is able to replay 83M entries (about 8GB)
> in 7 minutes.

One thing I noticed is that it starts out quite fast and gradually slows
down. I may be able to run a more controlled experiment. The more I
think about it, I have the feeling some rrdcached data structure is
getting slower and slower as it grows. How big is your process? As I
mentioned, mine is 0.8GB.

>> Most of the time is in buffer_get_field. (Note: in the most common cases
>> buffer_get_field copies each field in-place, character by
>> character. Seems to me that a simple if statement could avoid the
>> writes.)
>
> Agreed that the buffer_get_field implementation is not optimal. From what
> I can tell, it copies this way for three reasons:
>
> (1) desire to provide null terminated string to caller
> (2) do not want to modify original string (in case need to write it to journal)
> (3) allow escaped characters (presumably space)

Mhh, I don't think my C has gotten that rusty. Please look at the code.
Here are significant snippets:

    buffer = *buffer_ret;
    field = *buffer_ret;
    field[field_size] = buffer[buffer_pos];

That's in-place modification as far as I can tell. In fact, in most
cases (no \ escape chars) it actually reads and writes the same byte in
the above assignment, so it doesn't actually copy anything.

>> By the way, I find the unix socket stuff undebuggable. I switched to TCP
>> sockets because I can telnet to the socket and find out what the daemon
>> is doing. (For example, when nothing seems to work for almost an hour
>> when I start the daemon because it's replaying logs there is no
>> information about what's going on anywhere.) I'm saying this because
>> everyone recommends the unix sockets for security reasons. It's unusable
>> IMHO.
>
> FYI netcat works just fine for talking on a unix socket: nc -U /wherever/sock

Ah, I must have missed that somehow. Will try, thanks!

Thorsten
||||||||||||||||
|
kevin brintnall
|
On Fri, Oct 09, 2009 at 04:41:55PM -0700, Thorsten von Eicken wrote:
> > It looks like your RRD files must have a very large number of DS? Almost
> > 400?
>
> No, in fact most have 1 (that's how collectd likes it). You may be
> looking at the received/written ratio which is skewed by journal replay
> and the very long cache period I configure (1 hour).

You're right.. that's an indicator of how many values are cached, not how
many DS's there are... It's been a while since I looked at the stats
code.

> Yes, the question is whether it's collectd's fault or rrdcached's fault..

The protocol is such that the client should wait for the server response
to continue... there is no notion of "pipelined" operation, except BATCH
mode.

> >> The journal replay is too slow. When I terminate the daemon it leaves
> >> several GB of journal files behind. Reading those in takes the better
> >> part of an hour, during which the daemon is unresponsive.
> >
> > That seems awfully long. Mine is able to replay 83M entries (about 8GB)
> > in 7 minutes.
>
> One thing I noticed is that it starts out quite fast and gradually slows
> down. I may be able to run a more controlled experiment. The more I
> think about it, I have the feeling some rrdcached data structure is
> getting slower and slower as it grows. How big is your process? As I
> mentioned, mine is 0.8GB.

Mine is steady at 1.2GB.

rrdcached will realloc() the cache_item_t.values on every update. If your
realloc() implementation is slow, that could cause the decaying
performance.

> >> Most of the time is in buffer_get_field. (Note: in the most common cases
> >> buffer_get_field copies each field in-place, character by
> >> character. Seems to me that a simple if statement could avoid the
> >> writes.)
> >
> > Agreed that the buffer_get_field implementation is not optimal. From what
> > I can tell, it copies this way for three reasons:
> >
> > (1) desire to provide null terminated string to caller
> > (2) do not want to modify original string (in case need to write it to journal)
> > (3) allow escaped characters (presumably space)
>
> Mhh, I don't think my C has gotten that rusty. Please look at the code.
> Here are significant snippets:
> buffer = *buffer_ret;
> field = *buffer_ret;
> field[field_size] = buffer[buffer_pos];
> That's in-place modification as far as I can tell. In fact, in most
> cases (no \ escape chars) it actually reads and writes the same byte in
> the above assignment, so it doesn't actually copy anything.

Yes, it looks like you're right.. it does indeed modify in-place. In that
case, it should be relatively easy to optimize.. let me think on it.

-- 
kevin brintnall =~ /[hidden email]/
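Kevin's remark about calling realloc() on every update points at a standard remedy: grow the array geometrically so that most appends need no reallocation. The sketch below shows that technique on a hypothetical value_list_t. It is not rrdcached's cache_item_t, and the names and the starting capacity of 8 are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-file cache entry; not cache_item_t. */
typedef struct {
    char  **values;    /* pending update strings */
    size_t  count;     /* slots in use */
    size_t  capacity;  /* slots allocated */
} value_list_t;

/*
 * Append a copy of value. The array doubles when full, so n appends
 * trigger only about log2(n) realloc() calls instead of n.
 * Returns 0 on success, -1 on allocation failure.
 */
int value_list_append(value_list_t *vl, const char *value)
{
    char *copy;

    assert(vl != NULL && value != NULL);

    if (vl->count == vl->capacity) {
        size_t new_cap = (vl->capacity == 0) ? 8 : vl->capacity * 2;
        char **tmp = realloc(vl->values, new_cap * sizeof(*tmp));
        if (tmp == NULL)
            return -1;          /* old array is still valid */
        vl->values = tmp;
        vl->capacity = new_cap;
    }

    copy = malloc(strlen(value) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, value);
    vl->values[vl->count++] = copy;
    return 0;
}

/* Free all stored values and reset the list to its empty state. */
void value_list_clear(value_list_t *vl)
{
    size_t i;

    assert(vl != NULL);

    for (i = 0; i < vl->count; i++)
        free(vl->values[i]);
    free(vl->values);
    vl->values = NULL;
    vl->count = 0;
    vl->capacity = 0;
}
```

With a 20 second interval and -w 3600, a file accumulates roughly 180 values between writes. Doubling from 8 slots reaches that size after five or six reallocations rather than 180. Whether this matters in practice depends on the realloc() implementation, as Kevin notes.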
||||||||||||||||
|
Thorsten von Eicken-2
|
kevin brintnall wrote:
>> Yes, the question is whether it's collectd's fault or rrdcached's fault..
>
> The protocol is such that the client should wait for the server response
> to continue... there is no notion of "pipelined" operation, except BATCH
> mode.

What I mean is that as you state, the two are "locked together", so if
one slows down, the other slows equally. Since rrdcached sits at 100%
cpu I suspect it's the one determining the rate of processing inputs.

>> That's in-place modification as far as I can tell. In fact, in most
>> cases (no \ escape chars) it actually reads and writes the same byte in
>> the above assignment, so it doesn't actually copy anything.
>
> Yes, it looks like you're right.. it does indeed modify in-place. In that
> case, it should be relatively easy to optimize.. let me think on it.

I found an explanation for the gprof oddity: multithreading. gprof
doesn't deal with that out of the box. See
<http://sam.zoy.org/writings/programming/gprof.html> for one write-up.
I'm now trying the fix suggested on that page. Let's locate the issue
before optimizing...

Cheers,
Thorsten
||||||||||||||||
|
Sebastian Harl
|
In reply to this post
by Thorsten von Eicken-2
Hi Thorsten,
On Fri, Oct 09, 2009 at 11:46:26AM -0700, Thorsten von Eicken wrote:
> I've compiled rrdcached with -pg to get gprof
> output, but haven't been successful.

Just a short note on that: Most of the actual work in RRDCacheD is done
in threads. gprof is not able to handle that very well (basically not at
all).

> It looks to me like that's mostly the journal replay stuff and very little
> more.

Journal replay is done during startup in the main thread, so gprof is
able to grab information about that but probably misses about everything
else.

You might want to have a look at callgrind, which is one of the valgrind
tools that may be used for call graph tracing and which supports
multi-threaded programs as well.

Cheers,
Sebastian

-- 
Sebastian "tokkee" Harl +++ GnuPG-ID: 0x8501C7FC +++ http://tokkee.org/

Those who would give up Essential Liberty to purchase a little Temporary
Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
||||||||||||||||
|
Florian Forster-2
|
In reply to this post
by Thorsten von Eicken-2
Hi Thorsten,
On Fri, Oct 09, 2009 at 11:46:26AM -0700, Thorsten von Eicken wrote:
> - rrdtool trunk rev 1889 (had trouble compiling collectd with newer
> versions)

this is a bug introduced in RRDtool in revision 1906: src/rrd_client.h
will do the following check / include:

  #ifdef HAVE_CONFIG_H
  #include "../rrd_config.h"
  #endif

Of course “HAVE_CONFIG_H” is defined when building collectd - it's using
the autotools, too. The (now globally available) config file will try to
include "../rrd_config.h" and fail because it is not there.

I don't think this inclusion is necessary when building RRDtool:
Basically the same check / inclusion is done within src/rrd_tool.h. All
we have to do is to make sure all .c-files within RRDtool include this
header before including src/rrd_client.h. The only file doing this in
the wrong order is src/rrd_client.c (I just checked).

With the attached patch against revision 1934 I was able to build the
rrdcached plugin of collectd without problems.

Regards,
—octo
-- 
Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/

commit 0299e04d2ed41324a38f6d3d90e1614a0b0e3f0d
Author: Florian Forster <[hidden email]>
Date:   Sat Oct 10 12:16:21 2009 +0200

    src/rrd_client.[ch]: Fix build issues introduced in revision 1906.

diff --git a/src/rrd_client.c b/src/rrd_client.c
index a4f1ba9..b018ee0 100644
--- a/src/rrd_client.c
+++ b/src/rrd_client.c
@@ -21,8 +21,8 @@
  **/
 
 #include "rrd.h"
-#include "rrd_client.h"
 #include "rrd_tool.h"
+#include "rrd_client.h"
 
 #include <stdlib.h>
 #include <string.h>
diff --git a/src/rrd_client.h b/src/rrd_client.h
index 787c2b6..6c48dec 100644
--- a/src/rrd_client.h
+++ b/src/rrd_client.h
@@ -22,14 +22,6 @@
 #ifndef __RRD_CLIENT_H
 #define __RRD_CLIENT_H 1
 
-#if defined(_WIN32) && !defined(__CYGWIN__) && !defined(__CYGWIN32__) && !defined(HAVE_CONFIG_H)
-#include "../win32/config.h"
-#else
-#ifdef HAVE_CONFIG_H
-#include "../rrd_config.h"
-#endif
-#endif
-
 #ifndef WIN32
 # ifdef HAVE_STDINT_H
 #  include <stdint.h>
||||||||||||||||
|
Florian Forster-2
|
In reply to this post
by Thorsten von Eicken-2
Hi Thorsten,
On Fri, Oct 09, 2009 at 04:41:55PM -0700, Thorsten von Eicken wrote:
> > This sounds like collectd not sending updates to rrdcached. If they
> > are not in the journal, then rrdcached has not received them.
>
> Yes, the question is whether it's collectd's fault or rrdcached's
> fault..

assuming you're receiving those values via collectd's Network plugin,
this is what's going on:

 * There are two threads handling incoming network traffic. The first
   reads the packets from a network socket and appends them to a linked
   list.
 * The second thread parses the data and dispatches the included data to
   the daemon, resulting in roughly 20 “value lists” per packet.
 * The dispatch thread will call “rrdc_update” within the rrdcached
   plugin, resulting in a single update instruction being sent to
   RRDCacheD. The call returns after a status has been returned by the
   daemon. (This is what Kevin meant with the BATCH operation, where
   this call would return immediately without waiting for a status to be
   returned.)

If RRDCacheD takes too long to answer, the dispatch thread will wait
there and not dequeue any more values from that queue of received and
unparsed packets. If this is the case, you should see some (linear?)
memory growth of the collectd process.

You can also try to forcibly quit collectd (kill -9) and immediately
restart collectd. If the data by which the RRD files were lagging behind
is simply lost, this is an indication of the data being within collectd
and waiting to be sent to RRDCacheD.

(It's not yet possible to “watch” the length of this queue directly.
I'll add some measurements to the Network plugin so we can see what's
going on eventually …)

> Yeah, as I mentioned above, I'm very steady at 3k updates received per
> sec.

Are you talking about network packets or separate updates here?
Depending on your data every packet can contain about 20–30 separate
values, so the difference is significant ;)

> > > I/O is not a problem as I mentioned, it's pure CPU. I've compiled
> > > rrdcached with -pg to get gprof output, but haven't been
> > > successful.

*The* data structure within RRDCacheD that is *supposed* to grow as more
data is to be cached is “cache_tree”. So *the* call that is supposed to
be the limiting factor is this line within “handle_request_update”:

  ci = g_tree_lookup (cache_tree, file);

(I'm talking about normal operation, of course. Replaying the journal is
special.)

> >>   %   cumulative   self               self     total
> >>  time   seconds   seconds     calls  s/call   s/call  name
> >> 55.12     62.39    62.39  280843249    0.00     0.00  buffer_get_field

If the CPU really was busy for 250 minutes (and not stuck doing I/O),
then only about 62.39/15000 (~0.4 %) of the time was spent in
“buffer_get_field”. It might be possible to optimize that function
further, but I don't think it's worth it. The real bottleneck is
probably somewhere else.

One possible way to make this faster is to use something like:

  ptr = buffer + strcspn (buffer, "\\ ");
  if (*ptr != '\\')
  {
    /* Normal case: Field without backslash */
  }
  else
  {
    /* More complex escape sequence handling */
  }

I would be surprised if this was the cause of those performance issues,
though. Looking at the code it looks like the schoolbook case for branch
prediction, something modern CPUs are *very* good at …

Overall I have the feeling that the update command is slower than
expected – at least this would explain your issues. It'd be best if you
could try to get some reliable profiling data. Without it, optimization
does not make much sense :/

Regards,
—octo
-- 
Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/
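Florian's strcspn idea can be spelled out as a small in-place field splitter: scan once for a space or backslash, take a fast path with no per-character writes when no backslash is found, and fall back to unescaping only when one is. This is a sketch of that technique under stated assumptions, not the rrdcached implementation. The function name next_field and its interface are hypothetical, and the escape handling (a backslash makes the next character literal) is an assumption based on the discussion in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not the rrdcached source: split the next
 * space-delimited field off *buf, modifying the buffer in place.
 * Returns the NUL-terminated field and advances *buf past it, or
 * returns NULL when no field is left.
 */
char *next_field(char **buf)
{
    char *field, *src, *dst;
    size_t n;

    assert(buf != NULL && *buf != NULL);

    field = *buf;
    field += strspn(field, " ");        /* skip leading spaces */
    if (*field == '\0')
        return NULL;

    /* Length of the run containing neither a space nor a backslash. */
    n = strcspn(field, "\\ ");
    if (field[n] != '\\') {
        /* Fast path: no escape; the run ends at ' ' or at '\0'. */
        if (field[n] == ' ') {
            field[n] = '\0';
            *buf = field + n + 1;
        } else {
            *buf = field + n;
        }
        return field;
    }

    /* Slow path: a backslash was found; copy while unescaping. */
    src = dst = field + n;
    while (*src != '\0' && *src != ' ') {
        if (*src == '\\' && src[1] != '\0')
            src++;                      /* drop '\', keep next char */
        *dst++ = *src++;
    }
    if (*src == ' ')
        src++;                          /* consume the delimiter */
    *dst = '\0';
    *buf = src;
    return field;
}
```

On a line such as "update my\ file.rrd 1255113986:42", the first and third fields take the fast path and only the file name goes through the copying loop. As Florian says, whether this is worth doing should be decided from profiling data, not from the code alone.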
||||||||||||||||
|
Thorsten von Eicken-2
|
Florian Forster wrote:
> Hi Thorsten,
>
> On Fri, Oct 09, 2009 at 04:41:55PM -0700, Thorsten von Eicken wrote:

Yes, this description fits. When rrdcached hits 100% cpu then collectd's
memory size starts increasing linearly.

I also made progress in diagnosing rrdcached's performance issues. I had
10 queue threads before (-t 10) and I now reduced it to 2 (-t 2). It now
behaves a lot better, so I suspect there was a lot of lock contention
going on. I don't see any performance impact and in fact a "FLUSHALL"
seems to go faster than with 10 threads. But there are still some funny
effects: the flushall created a queue of length 23000. It went down to
~5000 in 3 minutes, at which point it saturated the disk controller cache
and proceeded at a somewhat slower pace. Maybe a single queue thread
could do it too. The overall cpu load now swings between 20-40%,
depending on the flush rate (-w 3600 -z 3600 -f 7200).

BTW: I tried callgrind, haha, it's way too slow.

Thorsten
||||||||||||||||
|
oetiker
|
In reply to this post
by Florian Forster-2
Hi Florian,
Yesterday Florian Forster wrote:

> With the attached patch against revision 1934 I was able to build the
> rrdcached plugin of collectd without problems.

the patch is in

tobi

>
> Regards,
> —octo
>

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch [hidden email] ++41 62 775 9902 / sb: -9900
||||||||||||||||
|
Thorsten von Eicken-2
|
In reply to this post
by Florian Forster-2
Long list of observations and thoughts below...

Florian Forster wrote:
> Hi Thorsten,
> On Fri, Oct 09, 2009 at 04:41:55PM -0700, Thorsten von Eicken wrote:

The linear memory growth is very clear. However, there are a number of things that still bug me:

- collectd+rrdcached were running steady, processing ~25'000 tree nodes with ~2'500 updates per second (rrdcached's UpdatesReceived stats counter). I then threw another ~30'000 tree nodes with ~3'000 updates per second at it (this is all real traffic, not a simulation). Due to the way we deal with the creation of the required new rrds, this caused very heavy disk activity for a while, slowing down collectd and rrdcached, so collectd started buffering for ~15 minutes, during which time it grew from ~40MB to just under 300MB. All good and expected so far. It then stayed steady at that size and, judging by the rrdcached UpdatesReceived, it must have been able to clear its backlog. Then I threw yet another 30'000 tree nodes and corresponding updates at it. At that point, collectd immediately started to grow again, linearly, to over 600MB. Given that it has more traffic coming at it, I expect it to grow larger buffers than previously, but what bothered me is that it started to grow immediately. It's as if the previous 250MB of buffers hadn't been freed (in the malloc sense; I understand that the process size isn't going to shrink). Could it be that there is a bug?

- if rrdcached is restarted, collectd doesn't reconnect. I know this is the case for TCP sockets, but I'm pretty sure I observed it using the unix socket too. This is a problem because restarting collectd loses the data it has buffered while rrdcached was down.

- the -z parameter is nice, but not quite there yet. I'm running with -w 3600 -z 3600 and the situation after the first hour is not pretty, with a ton of flushes followed by a lull and a repeat after another hour. It takes about 4 hours before everything stabilizes and becomes smooth. I'm wondering whether it would be difficult to change to an adaptive rate system, where given a -w 3600 and the current number of dirty tree nodes rrdcached computes the rate at which it needs to flush to disk and then does that. If you think about it, within one collection interval (20s in my case) it would know the total set of RRDs (tree nodes) and they all would be dirty. In my case it would periodically compute the ratio (e.g. 25'000 tree nodes to flush over 3600 seconds = 6.9 flushes per second) and would start flushing the oldest dirty nodes immediately, even though they've been dirty for much less than 3600 seconds. Of course rrdcached would need to re-evaluate the flush rate periodically, but if it keeps a running counter of dirty tree nodes that should be pretty easy. All this should put the daemon into a steady state from the very beginning.

- running with 80-90k tree nodes for a while ended up bringing rrdcached to its knees. What I observe is that over time rrdcached uses more and more cpu and starts seeing page faults. Eventually, rrdcached slows to a crawl and neither keeps up with the input (so collectd starts growing) nor manages to maintain its write rate. The page faults are interesting because no swap space is used (it stays at 64k usage, which is the initial state). The only explanation I've come up with is that at the point where the "working set" of all the RRDs exceeds the amount of memory available (I have 8GB) everything starts degrading. At that point, rrdcached fights against the buffer cache and starts seeing page faults. Its write threads also slow down because now the disk is not just being written but also read (I can see that happening). I assume that once it page-faults the whole process slows down, meaning that not just the queue threads but also the connection threads start slowing down, which then causes collectd to start buffering data and grow -- it grew to >2GB for me! That now puts more pressure on memory and we're in a downward spiral. It's not yet clear to me whether the disk used for RRDs is maxed out when this process starts (eventually it does max out), so I don't know whether I'm hitting a hard disk I/O limit or whether I just spiral into it by successively reducing the amount of buffer cache available. I suspect it would be possible to push the system further if the various rrdcached threads could be decoupled better. Also, being able to put an upper bound on collectd memory would be smart 'cause it's clear that at some point the growth becomes self-defeating. It could randomly drop samples when it hits the limit and that would probably lead to an overall happier outcome.

- I'm wondering how we could overcome the RRD working set issue. Even with rrdcached and long cache periods (e.g. I use 1 hour) it seems that the system slows to a crawl if the RRD working set exceeds memory. One idea that came to mind is to use the caching in rrdcached to convert the random small writes that are typical for RRDs to more of a sequential access pattern. If we could tweak the RRD creation and the cache write-back algorithm such that RRDs are always accessed in the same order, and we manage to get the RRDs allocated on disk in that order, then we could use the cache to essentially do one sweep through the disk per cache flush period (e.g. per hour in my case). Of course on-demand flushes and other things would interrupt this sweep, but the bulk of accesses could end up being more or less sequential. I believe that doing the cache write-back in a specific order is not too difficult; what I'm not sure of is how to make it such that the RRD files get allocated on disk in that order too. Any thoughts?

Cheers,
Thorsten
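The adaptive flush rate described in the -z item above can be sketched as follows. This is only an illustration of the arithmetic, not something rrdcached implements as far as this thread shows: the function names, the once-per-tick scheduling, and the fractional carry are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: flushes per second needed so that `dirty_nodes`
 * files are each written once within `write_interval_s` seconds
 * (the -w value).
 */
double flush_rate(size_t dirty_nodes, unsigned write_interval_s)
{
    assert(write_interval_s > 0);
    return (double) dirty_nodes / (double) write_interval_s;
}

/*
 * Hypothetical helper: how many of the oldest dirty nodes to flush during
 * one scheduler tick of `tick_s` seconds. The fractional remainder is kept
 * in *carry so that the long-run average matches flush_rate().
 */
size_t flushes_this_tick(size_t dirty_nodes, unsigned write_interval_s,
                         double tick_s, double *carry)
{
    double due = flush_rate(dirty_nodes, write_interval_s) * tick_s + *carry;
    size_t whole = (size_t) due;

    if (whole > dirty_nodes) {
        /* Never ask for more flushes than there are dirty nodes. */
        whole = dirty_nodes;
        *carry = 0.0;
    } else {
        *carry = due - (double) whole;
    }
    return whole;
}
```

With 25'000 dirty nodes and -w 3600 this gives the roughly 6.9 flushes per second mentioned above: a one-second tick flushes 6 nodes, then 7, and so on. Recomputing the rate each tick from a running dirty-node counter is what would let the daemon adapt as files are added.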
||||||||||||||||
|
Thorsten von Eicken-2
|
Thorsten von Eicken wrote:
> - I'm wondering how we could overcome the RRD working set issue. Even
> with rrdcached and long cache periods (e.g. I use 1 hour) it seems
> that the system slows to a crawl if the RRD working set exceeds
> memory. One idea that came to mind is to use the caching in rrdcached
> to convert the random small writes that are typical for RRDs to more
> of a sequential access pattern. If we could tweak the RRD creation
> and the cache write-back algorithm such that RRDs are always accessed
> in the same order, and we manage to get the RRDs allocated on disk in
> that order, then we could use the cache to essentially do one sweep
> through the disk per cache flush period (e.g. per hour in my case).
> Of course on-demand flushes and other things would interrupt this
> sweep, but the bulk of accesses could end up being more or less
> sequential. I believe that doing the cache write-back in a specific
> order is not too difficult; what I'm not sure of is how to make it
> such that the RRD files get allocated on disk in that order too.
> Any thoughts?

One further thought: instead of trying to allocate RRDs sequentially, if there is a way to query/detect where each RRD file is allocated on disk, then rrdcached could sort the dirty tree nodes by disk location and write them in that order. I don't know whether Linux (or FreeBSD) has a way to query disk location or to at least infer it.

TvE
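The "sort the dirty tree nodes by disk location" idea could look roughly like the sketch below. The struct, its field names, and the function name are hypothetical, and where the sort key comes from is deliberately left open: on Linux the FIBMAP and FIEMAP ioctls can report a file's logical block or extent addresses, but whether those track physical position well enough on a given set-up is an assumption that would need measuring.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dirty cache entry; not an rrdcached type. */
struct dirty_node {
    const char *file;              /* path of the RRD file */
    unsigned long long disk_key;   /* assumed: approximate on-disk position,
                                      e.g. the first block of the file */
};

/* qsort comparator: ascending disk_key, written to avoid integer overflow. */
static int cmp_disk_key(const void *a, const void *b)
{
    const struct dirty_node *x = a;
    const struct dirty_node *y = b;

    return (x->disk_key > y->disk_key) - (x->disk_key < y->disk_key);
}

/* Order the dirty nodes so that one flush pass sweeps the disk once. */
void sort_for_sweep(struct dirty_node *nodes, size_t n)
{
    assert(nodes != NULL || n == 0);
    if (n > 1)
        qsort(nodes, n, sizeof nodes[0], cmp_disk_key);
}
```

Even a rough key would only need to be computed when a file first enters the cache, so the cost of the ordering itself is a single sort per flush pass.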
||||||||||||||||
|
Benny Baumann
|
On 12.10.2009 at 20:33, Thorsten von Eicken wrote:
> Thorsten von Eicken wrote:
>> - I'm wondering how we could overcome the RRD working set issue. Even
>> with rrdcached and long cache periods (e.g. I use 1 hour) it seems
>> that the system slows to a crawl if the RRD working set exceeds
>> memory. One idea that came to mind is to use the caching in rrdcached
>> to convert the random small writes that are typical for RRDs to more
>> of a sequential access pattern. If we could tweak the RRD creation
>> and the cache write-back algorithm such that RRDs are always accessed
>> in the same order, and we manage to get the RRDs allocated on disk in
>> that order, then we could use the cache to essentially do one sweep
>> through the disk per cache flush period (e.g. per hour in my case).
>> Of course on-demand flushes and other things would interrupt this
>> sweep, but the bulk of accesses could end up being more or less
>> sequential. I believe that doing the cache write-back in a specific
>> order is not too difficult; what I'm not sure of is how to make it
>> such that the RRD files get allocated on disk in that order too.
>> Any thoughts?
>>
> One further thought: instead of trying to allocate RRDs sequentially,
> if there is a way to query/detect where each RRD file is allocated on
> disk, then rrdcached could sort the dirty tree nodes by disk location
> and write them in that order. I don't know whether Linux (or FreeBSD)
> has a way to query disk location or to at least infer it.
>
> TvE

Even though Linux and Windows (and I guess most other OSes) allow you to query the "logical" disc position, the physical location may be completely unrelated to this, as modern hard drives may reallocate certain sectors if they decide that one particular sector cannot be read/written properly. Thus trying to infer the physical location of data will not be accurate. To take this even further, the volume you write onto doesn't even have to exist physically: e.g. you might have a RAID or LVM, in which case one logical sector corresponds to more than one physical location, or, as in the case of a RAM disc, none at all (at least no permanent one).

To make the picture complete, there's yet another factor that makes this nothing you want to do in software: modern hard drives usually "plan" their read and write requests automatically already. So when write accesses occur right after each other, the hard drive will already figure out the best way to write them, unless you enforce synchronous request completion with e.g. O_DIRECT or the like.

Regards,
BenBE.
||||||||||||||||
|
Thorsten von Eicken-2
|
Benny Baumann wrote:
> On 12.10.2009 at 20:33, Thorsten von Eicken wrote:

Ben, I understand what you are saying. The question I have is not whether it's always possible and meaningful to do this; the question is whether it's possible to arrange this with the right set-up. Also, I wouldn't be looking for 100% accuracy: even if it were only roughly right, it could be a significant improvement.

TvE
||||||||||||||||
|
Thorsten von Eicken-2
|
In reply to this post
by Thorsten von Eicken-2
Florian Forster wrote:
> Hi Thorsten,
>
> I'm having a bit of a hard time replying to this message because it (and
> the previous one) were sent as HTML-only. Could you maybe switch to
> multipart or plain text messages? Thanks :)

Oops, sorry, will do better.

>> It's as if the previous 250MB of buffers hadn't been freed (in the
>> malloc sense, I understand that the process size isn't going to
>> shrink). Could it be that there is a bug?
>
> We're talking about the resident set size (RSS) here, right? Because
> *that* ought to decrease.

Yes, RSS.

>> - if rrdcached is restarted, collectd doesn't reconnect.
>
> The collectd plugin calls “rrdc_connect” before each update. The
> semantic of that function is to check whether a valid connection to the
> daemon exists and try to reconnect if necessary. If anything goes wrong
> with sending / receiving data, other functions will simply close /
> invalidate the connection and it is supposed to be opened in the next
> iteration.
>
> If the connection is not reestablished, my guess is that the socket
> descriptor is not properly invalidated. I'll have to look further into
> this though.

can troubleshoot if you see it reconnect properly on your box.

>> I'm running with -w 3600 -z 3600 and the situation after the first
>> hour is not pretty with a ton of flushes followed by a lull and a
>> repeat after another hour.
>
> That's unexpected (at least for me). With those settings I would have
> expected the first hour to be memory only (i.e. no disk activity at all)
> and after that basically uniformly distributed writes for an hour. Two
> hours after start I'd expect a drop in writes which increases for an
> hour and has its peak at three hours after start.

what I observe.

>> I suspect it would be possible to push the system further if the
>> various rrdcached threads could be decoupled better.
>
> Do you have anything specific in mind? As far as I can tell the various
> threads are pretty much as decoupled as they can safely be.

I did have something in mind, but now I'm not sure my hypothesis was correct...

>> Also, being able to put an upper bound on collectd memory would be
>> smart 'cause it's clear that at some point the growth becomes
>> self-defeating.
>
> Sounds like a reasonable idea. Any idea which values to drop? The
> oldest, the newest, either (chosen randomly), both?

I've converged on an XFF value of 0.9, 'cause otherwise it's too easy to lose a lot of data if there is any flakiness in the collection. So I would prefer totally random dropping of values irrespective of age. That'll uniformly lower the resolution across the board. Visually imperceptible until it starts dropping significant amounts. I'm sure others have different ideas.

>> - I'm wondering how we could overcome the RRD working set issue.
>
> Let's assume every RRD file has only one data source and you have
> 100,000 files. Then the total data cached should be:
>
> 8 Byte * 100,000 files * 3600 / 20 seconds ⇒ 144 MByte
>
> This should be possible *somehow* …

dimension is time, the other the data sources. You write in time order and you read in data source order.

>> One idea that came to mind is to use the caching in rrdcached to
>> convert the random small writes that are typical for RRDs to more of a
>> sequential access pattern.
>
> Well, the problem is that currently RRD files look like this on disk:
>
> [a0,a1,a2,a3,…,an] [b0,b1,b2,b3,…,bn] [c0,c1,c2,c3,…,cn]
>
> To get a sequential access pattern, we'd have to reorder this to:
>
> a0,b0,c0 a1,b1,c1 a2,b2,c2 a3,b3,c3 … an,bn,cn
>
> I think the only way to achieve this is to have all that data in one
> file. The huge problem here is adding new data: If we need to add
> d[0,…,n] to the set above, almost *all* data has to be moved. And we're
> not even touching several RRAs with differing resolutions. I think to
> get this RRDtool / RRDCacheD would have to be turned into something much
> more like a database system and less like a frontend for writing
> separate files.

RRD file format. If you do one pass updating all your RRDs you end up writing 1/Nth of the disk blocks, where N has to do with the RRAs being updated vs. the total stored data for an RRD. If you do this pass over your RRDs in random order, the disk will do random seeks between read-modify-writes, with some possible ordering thanks to the elevator algorithm and such. Now imagine instead that you could update the RRDs in the order in which they're stored on disk. Depending on the cylinder size vs. RRD size you'd get away with fewer seeks, and with predominantly short seeks. This is not "sequential access" strictly speaking, but it should be a whole lot faster than random seeks across the entire disk.

I just restarted everything afresh to get a clean set of data. It's already not looking pretty. Here's the set-up:
- /usr/bin/rrdcached -w 3600 -z 3600 -f 7200 -t 2 -b /rrds -B -j /rrds/journal -p /var/run/rrdcached/rrdcached.pid -l 127.0.0.1:3033
- ~55k tree nodes, collected every 20 seconds
- see the rrdcached-1*.png in http://www.voneicken.com/dl/rrd/

What I see:
- the system ran with half the load for 5 minutes at start-up before I added the "second half"
- the input is constant (see network rx pkts in last graph in 1c.png)
- rrdcached has ok cpu load for the first 15 minutes, then it really ramps up to using over half a cpu
- the connection thread seems to be affected because the "receive update" and "journal bytes" rates start to degrade
- note that the journal files are on a separate set of disks from the RRDs, and that set of disks is always pretty unloaded
- note that so far we haven't hit the end of the first hour, so no flushes to disk yet
- collectd keeps on growing after the first 15 minutes; it's clear that the degradation in "receive update" is due to rrdcached and collectd has to start buffering (note how the first 15 minutes were nice and flat)

Conclusions so far:
- it's interesting that the connection thread can't keep up with collectd sending stuff. I hadn't seen that before because I had always increased the load after flushes had occurred, so there were more moving parts to suspect
- it's also interesting that the connection thread can keep up fine for the first 15 minutes. Note that the tree depth goes to 19 levels right after the full traffic hits, so I don't see a correlation there. This also means that it's not string parsing that is the problem, as that would show up immediately and not with 10 minutes of delay.

As I'm finishing writing this email, rrdcached has started to flush to disk. So far nothing interesting is happening (other than I/O). The connection thread performance (or rather lack thereof) is virtually unchanged. I'll grab a fresh set of graphs as rrdtool-2*.png

Without being able to run any decent profiler I'm a bit stumped. I tried to change main the other day to run the listen loop in a separate thread and to run one queue loop in main, so I could get gprof stats for a queue loop. That almost worked -- I had trouble getting the pgm to exit cleanly to give me stats. Maybe a similar hack to a connection thread could work. Mhh, sounds more difficult since these are forked off the listen thread, ughh. Ideas?

Thorsten
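Florian's back-of-the-envelope cache size quoted in the message above can be reproduced with a few lines of C. This is only a sketch of that arithmetic: the function name and parameters are made up for illustration, and it counts nothing but the raw 8-byte values, so any per-entry overhead inside the daemon is an unmodelled assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes of raw values held in the cache when every
 * file gets one value per `step_s` seconds and values are kept for
 * `cache_s` seconds before being written out.
 */
uint64_t cached_bytes(uint64_t files, uint64_t bytes_per_value,
                      uint64_t cache_s, uint64_t step_s)
{
    assert(step_s > 0);
    return bytes_per_value * files * (cache_s / step_s);
}
```

For 100,000 single-data-source files, a 3600 second cache period and a 20 second step, cached_bytes(100000, 8, 3600, 20) gives 144,000,000 bytes, i.e. the 144 MByte from the estimate. The ~55k tree nodes of the set-up described above would come to about 79 MByte by the same formula.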
||||||||||||||||
|
Benny Baumann
|
In reply to this post
by Thorsten von Eicken-2
On 13.10.2009 at 08:50, Thorsten von Eicken wrote:
> Benny Baumann wrote:
>> On 12.10.2009 at 20:33, Thorsten von Eicken wrote:
>>> Thorsten von Eicken wrote:
>>>
>>> One further thought: instead of trying to allocate RRDs sequentially,
>>> if there is a way to query/detect where each RRD file is allocated on
>>> disk, then rrdcached could sort the dirty tree nodes by disk location
>>> and write them in that order. I don't know whether Linux (or FreeBSD)
>>> has a way to query disk location or to at least infer it.
>>>
>>> TvE
>>>
>> Even though Linux and Windows (and I guess most other OSes) allow you to
>> query the "logical" disc position, the physical location may be completely
>> unrelated to this, as modern hard drives may reallocate certain sectors if
>> they decide that one particular sector cannot be read/written properly.
>> Thus trying to infer the physical location of data will not be accurate.
>> To take this even further, the volume you write onto doesn't even have to
>> exist physically: e.g. you might have a RAID or LVM, in which case one
>> logical sector corresponds to more than one physical location, or, as in
>> the case of a RAM disc, none at all (at least no permanent one).
>>
>> To make the picture complete, there's yet another factor that makes this
>> nothing you want to do in software: modern hard drives usually "plan"
>> their read and write requests automatically already. So when write
>> accesses occur right after each other, the hard drive will already
>> figure out the best way to write them, unless you enforce synchronous
>> request completion with e.g. O_DIRECT or the like.
>>
> Ben, I understand what you are saying. The question I have is not
> whether it's always possible and meaningful to do this, the question
> is whether it's possible to arrange this with the right set-up. Also,
> I wouldn't be looking for 100% accuracy, even if it was only roughly
> it could be a significant improvement.
> TvE

difficult process. The following parameters might have an influence (I tried to order them from low-level to high-level, YMMV):
- The type of storage used, like magnetic, flash, optical, ... (Influence: the way data is accessed or modified)
- The encoding used to store the data (Influence: determines if two pieces of data nearby might get reached)
- Physical organization of the media (Influence: where is physical sector 0)
- Sector reallocation (Influence: remapping of data locations)
- Logical structure of the media (Influence: where is logical sector 0 and how to map from physical to logical sectors?)
- Virtual resizing of the drive, e.g. hiding parts of the drive from view (Influence: parts of the drive become invisible to the casual observer)
- Partitioning of the media for the OS (Influence: offset for sector mapping)
- RAID / LVM (Influence: virtualizing of the storage)
- Volume file system (Influence: remapping of partition sectors to logical file system sectors/clusters)
- File allocation (Influence: mapping a logical file offset to a volume cluster)

Taking all or at least some of these into account, humbly said, it doesn't make sense to try guessing the disc location: you not only would have to care about thousands of combinations of those factors and do a lot more, and more complex, work, but you would certainly guess wrong most of the time. And guessing wrong will in the best case change nothing, but in the worst case will harm your performance twice: once for guessing at all and a second time for guessing wrong.

The better alternative IMHO would be analyzing where things waste valuable time. I'm not sure about the internals of the write threads, but usually you have one "task" thread preparing a queue of stuff to write (or multiple if needed) and many (I'll come to this in a second) worker threads to flush things to disc.

Now for the number: 2 threads can be better than ten, but needn't be, as this depends on some things: 1. the number of CPUs in the system, 2. the drive speed, 3. data buffering. To get a feeling for this, consider compiling a large program on a multi-core CPU: you'll not get everything out of your hardware when working with one make instance. That's for sure. Two will be better, but needn't be the optimum. Although compiling isn't as I/O-bound as flushing data to disk, there's a point where the processes that read data from disk (for compiling) need just as much time as things need to compile. That's why for compiling you won't do one job per CPU but usually 2 or 3: one process per CPU does I/O, the other does CPU work.

So to apply this to the I/O-heavy work: use two threads per CPU (one prepares the data, the other writes) and encourage the OS to use a lot of buffer space in RAM, i.e. avoid using direct disc I/O with O_DIRECT, as O_DIRECT forces the OS to wait for the operation to complete. Instead use as many asynchronous calls as possible and do e.g. select calls to find which succeeded and which failed. So one I/O thread can track say 16* I/O operations at a time, discard those that completed successfully, retry those that failed and leave the kernel working for you.

One way to optimize things here is to collect requests for one file per thread, i.e. if there are 2 requests on descriptor 7, push them to the same thread if possible instead of distributing them among threads. But since async operations cannot be parallelized on the same file (at least on Windows IIRC) you'd need to do something like this anyway. This saves some trouble with synchronizing your threads when using that file descriptor and thus improves performance a bit.

I hope this little description helps a bit with finding a suitable solution.

*arbitrarily chosen, no empirical data
||||||||||||||||
|
Bernhard Reutner-Fischer
|
In reply to this post
by Thorsten von Eicken-2
On Tue, Oct 13, 2009 at 03:52:15AM -0700, Thorsten von Eicken wrote:
>Florian Forster wrote:
>> Hi Thorsten,
>>
>> I'm having a bit of a hard time replying to this message because it (and
>> the previous one) were sent as HTML-only. Could you maybe switch to
>> multipart or plain text messages? Thanks :)
>>
>Oops, sorry, will do better.

much better, thanks a lot.

[]

>As I'm finishing to write this email rrdcached started to flush to disk.
>So far nothing interesting happening (other than I/O). The connection
>thread performance (or rather lack thereof) is virtually unchanged. I'll
>grab a fresh set of graphs as rrdtool-2*.png

Do you, by chance, have figures for the just-rrdtool (with mmap) "baseline"? Just curious..

Cheers,
Bernhard
||||||||||||||||
|
Thorsten von Eicken-2
|
Bernhard Reutner-Fischer wrote:
> On Tue, Oct 13, 2009 at 03:52:15AM -0700, Thorsten von Eicken wrote:
>> As I'm finishing to write this email rrdcached started to flush to disk.
>> So far nothing interesting happening (other than I/O). The connection
>> thread performance (or rather lack thereof) is virtually unchanged. I'll
>> grab a fresh set of graphs as rrdtool-2*.png
>
> Do you, by chance, have figures for the just-rrdtool (with mmap)
> "baseline"?
> Just curious..

Bare collectd with rrdtool is instant death. Trying to do 5000 random writes per second just doesn't fly on this hardware, by an order of magnitude. What I do have is collectd with rrdtool running on a tmpfs backed by the same disk storage. This works well but has pretty bad behavior when the working set gets close to the memory available. Also, if the set of RRDs changes over time (e.g. new servers replace old ones) then you start getting into fragmentation issues. I can post some graphs of what a server with this looks like later today.

Thorsten
||||||||||||||||
|
kevin brintnall
|
In reply to this post
by Thorsten von Eicken-2
On Mon, Oct 12, 2009 at 11:33:40AM -0700, Thorsten von Eicken wrote:
> Thorsten von Eicken wrote:
>
> - I'm wondering how we could overcome the RRD working set issue. Even
> with rrdcached and long cache periods (e.g. I use 1 hour) it seems that
> the system slows to a crawl if the RRD working set exceeds memory. One
> idea that came to mind is to use the caching in rrdcached to convert
> the random small writes that are typical for RRDs to more of a
> sequential access pattern. If we could tweak the RRD creation and the
> cache write-back algorithm such that RRDs are always accessed in the
> same order, and we manage to get the RRDs allocated on disk in that
> order, then we could use the cache to essentially do one sweep through
> the disk per cache flush period (e.g. per hour in my case). Of course
> on-demand flushes and other things would interrupt this sweep, but the
> bulk of accesses could end up being more or less sequential. I believe
> that doing the cache write-back in a specific order is not too
> difficult; what I'm not sure of is how to make it such that the RRD
> files get allocated on disk in that order too. Any thoughts?

Thorsten,

Read back about a year in the archives. Daniel Pocock <[hidden email]> started some work on "striping" RRD files together, i.e. you would have one large disk file that contained multiple RRD databases, where the rows were clustered on disk by time. I'm not sure how far he got on it.

--
kevin brintnall =~ /[hidden email]/
||||||||||||||||
|
Thorsten von Eicken-3
|
kevin brintnall wrote:
> On Mon, Oct 12, 2009 at 11:33:40AM -0700, Thorsten von Eicken wrote:

Yeah, that sounds like a really exciting approach if you have a lot of time! I believe it's not too difficult if you don't implement RRDtool's notion of aggregation. But if you do... good luck... I was looking for creative ways to get some disk sweeping without turning RRDtool upside down. Ha, maybe I should just use a FAT filesystem, that would make it easy to get a file -> block mapping :-).

TvE
||||||||||||||||