|
|
|
Traian Stanev
|
For what it's worth, patching the provider to addref the parent connection from the feature reader fixes the problem. This is on top of Friday's trunk MapGuide and FDO code, without any other extra fixes. I do think it is a bug not to addref the connection from the reader, when it in fact depends on it to be alive, in order to get stuff from it at a later time. I am pretty sure the addref fixes the problem, because the test I am using is fairly brutal and produces the crash/deadlock in less than 5 seconds every time I run it with unpatched GDAL provider. With the change, I ran it multiple times to completion. I am unhappy with the fix though, because it leaves CPU usage by mgserver at ~5%, even though I am doing something like 16 simultaneous requests... So then I decided to remove the global mutex from inside the provider -- that didn't make a difference in CPU usage, and the test still passed without problems. So that mutex can go. Then I messed with the connection pooling settings in serverconfig.ini -- those didn't make any difference. I tried them all for the GDAL provider and the test still passed with 5% CPU usage. I don't think the process is blocked for I/O because the TIFF I am using is not huge and I also generated an overlay pyramid (which I highly recommend to anyone by the way, it makes a huge performance difference for TIFFs). Traian > -----Original Message----- > From: [hidden email] [mailto:mapguide- > [hidden email]] On Behalf Of Haris Kurtagic > Sent: Saturday, March 28, 2009 4:43 AM > To: MapGuide Internals Mail List > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes, > ticket #462 > > Regarding MG raster problems ( or any other provider ) the main reason > is connection manager. > If you would set connection pool size 1 , for sdf provider MG will not > work properly. > > The patch Jason submitted, is workaround for issue. It will not cause > GDAL to work slower, because provider itself is basically > single-threaded now. > That was done trough work we were doing years or so ago. I was looking > at provider back then and realized that making gdal calls guarded will > make it more stable. Frank then submitted patch with guard on every > single gdal library call. > > I will try to explain what is happing now: > 1. Map with more than one gdal layer > 2. There are two calls in server code execute query and stylize > 3. One thread will open connection in execute query and went to stylize > procedure > 4. Another thread will come into execute query and because it could be > another layer it will try to open new connection > 5. Connection manager will check pool size ( 1 in gdal case ) and > because provider didn't inc. reference count to connection (not way to > solve, but anyhow provider needs to increment ) it will delete previous > connection and create new one > 6. Access violation in first thread while stylization because of > deleted > connection > > > In shorter version things to change: > 1. FDO GDAL provider is not increasing reference count on FDO > connection > when used in other classes like georasterband , which is later used in > query (wrong) > 2. connection manager is managing connection pool by using fdo > connection reference count ( wrong ) > 3. it looks like working for other providers, because connection pool > size is 200 and other providers are incrementing connection pointer in > select command for example, but if there would be provider with select > command with no reference count to connection, it will crash too > 4. Server code is making assumption that pool size for single threaded > is 1 (not correct), we need to make code change to allow for making > calls to provider single threaded > > Haris > > -----Original Message----- > From: [hidden email] > [mailto:[hidden email]] On Behalf Of Bruce > Dechant > Sent: Friday, March 27, 2009 4:55 PM > To: MapGuide Internals Mail List > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, ticket > #462 > > I totally agree - the locking needs to be in the provider. > > Thanks, > Bruce > > > -----Original Message----- > From: [hidden email] > [mailto:[hidden email]] On Behalf Of Traian > Stanev > Sent: Friday, March 27, 2009 9:40 AM > To: MapGuide Internals Mail List > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, ticket > #462 > > > It does look like a sledgehammer approach -- what if some day there is > a > raster provider that works ok and doesn't need the locks? If anything, > the lock should be in the FDO provider. > > Traian > > > -----Original Message----- > From: [hidden email] > [mailto:[hidden email]] On Behalf Of Walt > Welton-Lair > Sent: Friday, March 27, 2009 11:11 AM > To: MapGuide Internals Mail List > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, ticket > #462 > > The memory patch (mapguide_raster_unalloc.5.patch) looks good, and I'll > submit that. > > The stability patch (mapguide_raster_stability.patch) looks reasonable, > but I'd like another 1 or 2 people to also look at it. It's > significant > since it limits MapGuide to processing one raster stylization request > at > a time, but I guess that's better than having a dead lock. Anyone know > where the thread-unsafe code actually is? Is it in the provider itself? > Can the mutex be moved closer to the actual thread-unsafe code? > > Walt > > -----Original Message----- > From: [hidden email] > [mailto:[hidden email]] On Behalf Of Jason > Birch > Sent: Friday, March 27, 2009 3:10 AM > To: [hidden email] > Subject: [mapguide-internals] PATCH: Raster stability fixes, ticket > #462 > > Hi, > > Haris has been working on the raster stability problem against MapGuide > 2.0.2. I was having the same problems, and was able to duplicate on > 2.1. I have worked with Haris to create patches against trunk that > address a couple of the problems and increase stability immensely. > > Could someone please review the patches attached to: > > http://trac.osgeo.org/mapguide/ticket/462 > > and commit them to trunk if appropriate? > > My feeling is that the connection management stuff may need some > additional work, since limiting the connection pool to 1 for > single-threaded providers doesn't appear to have the desired effect. > However, getting both of these patches into 2.1 would at least improve > stability for Raster users. > > Jason_______________________________________________ > mapguide-internals mailing list > [hidden email] > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > _______________________________________________ > mapguide-internals mailing list > [hidden email] > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > _______________________________________________ > mapguide-internals mailing list > [hidden email] > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > _______________________________________________ > mapguide-internals mailing list > [hidden email] > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > _______________________________________________ > mapguide-internals mailing list > [hidden email] > http://lists.osgeo.org/mailman/listinfo/mapguide-internals _______________________________________________ fdo-internals mailing list [hidden email] http://lists.osgeo.org/mailman/listinfo/fdo-internals |
||||||||||||||||
|
Traian Stanev
|
I attached the patch for the GDAL provider to ticket 462.
Traian > -----Original Message----- > From: [hidden email] [mailto:mapguide- > [hidden email]] On Behalf Of Traian Stanev > Sent: Sunday, March 29, 2009 1:22 AM > To: MapGuide Internals Mail List > Cc: FDO Internals Mail List > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes, > ticket #462 > > > For what it's worth, patching the provider to addref the parent > connection from the feature reader fixes the problem. This is on top of > Friday's trunk MapGuide and FDO code, without any other extra fixes. I > do think it is a bug not to addref the connection from the reader, when > it in fact depends on it to be alive, in order to get stuff from it at > a later time. > > I am pretty sure the addref fixes the problem, because the test I am > using is fairly brutal and produces the crash/deadlock in less than 5 > seconds every time I run it with unpatched GDAL provider. With the > change, I ran it multiple times to completion. > > I am unhappy with the fix though, because it leaves CPU usage by > mgserver at ~5%, even though I am doing something like 16 simultaneous > requests... So then I decided to remove the global mutex from inside > the provider -- that didn't make a difference in CPU usage, and the > test still passed without problems. So that mutex can go. Then I messed > with the connection pooling settings in serverconfig.ini -- those > didn't make any difference. I tried them all for the GDAL provider and > the test still passed with 5% CPU usage. I don't think the process is > blocked for I/O because the TIFF I am using is not huge and I also > generated an overlay pyramid (which I highly recommend to anyone by the > way, it makes a huge performance difference for TIFFs). > > > > > Traian > > > > -----Original Message----- > > From: [hidden email] [mailto:mapguide- > > [hidden email]] On Behalf Of Haris Kurtagic > > Sent: Saturday, March 28, 2009 4:43 AM > > To: MapGuide Internals Mail List > > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes, > > ticket #462 > > > > Regarding MG raster problems ( or any other provider ) the main > reason > > is connection manager. > > If you would set connection pool size 1 , for sdf provider MG will > not > > work properly. > > > > The patch Jason submitted, is workaround for issue. It will not cause > > GDAL to work slower, because provider itself is basically > > single-threaded now. > > That was done trough work we were doing years or so ago. I was > looking > > at provider back then and realized that making gdal calls guarded > will > > make it more stable. Frank then submitted patch with guard on every > > single gdal library call. > > > > I will try to explain what is happing now: > > 1. Map with more than one gdal layer > > 2. There are two calls in server code execute query and stylize 3. > One > > thread will open connection in execute query and went to stylize > > procedure 4. Another thread will come into execute query and because > > it could be another layer it will try to open new connection 5. > > Connection manager will check pool size ( 1 in gdal case ) and > because > > provider didn't inc. reference count to connection (not way to solve, > > but anyhow provider needs to increment ) it will delete previous > > connection and create new one 6. Access violation in first thread > > while stylization because of deleted connection > > > > > > In shorter version things to change: > > 1. FDO GDAL provider is not increasing reference count on FDO > > connection when used in other classes like georasterband , which is > > later used in query (wrong) 2. connection manager is managing > > connection pool by using fdo connection reference count ( wrong ) 3. > > it looks like working for other providers, because connection pool > > size is 200 and other providers are incrementing connection pointer > in > > select command for example, but if there would be provider with > select > > command with no reference count to connection, it will crash too 4. > > Server code is making assumption that pool size for single threaded > is > > 1 (not correct), we need to make code change to allow for making > calls > > to provider single threaded > > > > Haris > > > > -----Original Message----- > > From: [hidden email] > > [mailto:[hidden email]] On Behalf Of > Bruce > > Dechant > > Sent: Friday, March 27, 2009 4:55 PM > > To: MapGuide Internals Mail List > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, > > ticket > > #462 > > > > I totally agree - the locking needs to be in the provider. > > > > Thanks, > > Bruce > > > > > > -----Original Message----- > > From: [hidden email] > > [mailto:[hidden email]] On Behalf Of > > Traian Stanev > > Sent: Friday, March 27, 2009 9:40 AM > > To: MapGuide Internals Mail List > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, > > ticket > > #462 > > > > > > It does look like a sledgehammer approach -- what if some day there > is > > a raster provider that works ok and doesn't need the locks? If > > anything, the lock should be in the FDO provider. > > > > Traian > > > > > > -----Original Message----- > > From: [hidden email] > > [mailto:[hidden email]] On Behalf Of Walt > > Welton-Lair > > Sent: Friday, March 27, 2009 11:11 AM > > To: MapGuide Internals Mail List > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, > > ticket > > #462 > > > > The memory patch (mapguide_raster_unalloc.5.patch) looks good, and > > I'll submit that. > > > > The stability patch (mapguide_raster_stability.patch) looks > > reasonable, but I'd like another 1 or 2 people to also look at it. > > It's significant since it limits MapGuide to processing one raster > > stylization request at a time, but I guess that's better than having > a > > dead lock. Anyone know where the thread-unsafe code actually is? Is > > it in the provider itself? > > Can the mutex be moved closer to the actual thread-unsafe code? > > > > Walt > > > > -----Original Message----- > > From: [hidden email] > > [mailto:[hidden email]] On Behalf Of > Jason > > Birch > > Sent: Friday, March 27, 2009 3:10 AM > > To: [hidden email] > > Subject: [mapguide-internals] PATCH: Raster stability fixes, ticket > > #462 > > > > Hi, > > > > Haris has been working on the raster stability problem against > > MapGuide 2.0.2. I was having the same problems, and was able to > > duplicate on 2.1. I have worked with Haris to create patches against > > trunk that address a couple of the problems and increase stability > immensely. > > > > Could someone please review the patches attached to: > > > > http://trac.osgeo.org/mapguide/ticket/462 > > > > and commit them to trunk if appropriate? > > > > My feeling is that the connection management stuff may need some > > additional work, since limiting the connection pool to 1 for > > single-threaded providers doesn't appear to have the desired effect. > > However, getting both of these patches into 2.1 would at least > improve > > stability for Raster users. > > > > Jason_______________________________________________ > > mapguide-internals mailing list > > [hidden email] > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > _______________________________________________ > > mapguide-internals mailing list > > [hidden email] > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > _______________________________________________ > > mapguide-internals mailing list > > [hidden email] > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > _______________________________________________ > > mapguide-internals mailing list > > [hidden email] > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > _______________________________________________ > > mapguide-internals mailing list > > [hidden email] > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals fdo-internals mailing list [hidden email] http://lists.osgeo.org/mailman/listinfo/fdo-internals |
||||||||||||||||
|
Traian Stanev
|
In reply to this post
by Traian Stanev
OK, if I now enable connection pooling for GDAL and *completely remove* the global raster lock from MappingUtil.cpp, I get CPU at 100% and the test still passes! And it's fassssst... YMMV. Traian > -----Original Message----- > From: [hidden email] [mailto:mapguide- > [hidden email]] On Behalf Of Traian Stanev > Sent: Sunday, March 29, 2009 1:22 AM > To: MapGuide Internals Mail List > Cc: FDO Internals Mail List > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes, > ticket #462 > > > For what it's worth, patching the provider to addref the parent > connection from the feature reader fixes the problem. This is on top of > Friday's trunk MapGuide and FDO code, without any other extra fixes. I > do think it is a bug not to addref the connection from the reader, when > it in fact depends on it to be alive, in order to get stuff from it at > a later time. > > I am pretty sure the addref fixes the problem, because the test I am > using is fairly brutal and produces the crash/deadlock in less than 5 > seconds every time I run it with unpatched GDAL provider. With the > change, I ran it multiple times to completion. > > I am unhappy with the fix though, because it leaves CPU usage by > mgserver at ~5%, even though I am doing something like 16 simultaneous > requests... So then I decided to remove the global mutex from inside > the provider -- that didn't make a difference in CPU usage, and the > test still passed without problems. So that mutex can go. Then I messed > with the connection pooling settings in serverconfig.ini -- those > didn't make any difference. I tried them all for the GDAL provider and > the test still passed with 5% CPU usage. I don't think the process is > blocked for I/O because the TIFF I am using is not huge and I also > generated an overlay pyramid (which I highly recommend to anyone by the > way, it makes a huge performance difference for TIFFs). > > > > > Traian > > > > -----Original Message----- > > From: [hidden email] [mailto:mapguide- > > [hidden email]] On Behalf Of Haris Kurtagic > > Sent: Saturday, March 28, 2009 4:43 AM > > To: MapGuide Internals Mail List > > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes, > > ticket #462 > > > > Regarding MG raster problems ( or any other provider ) the main > reason > > is connection manager. > > If you would set connection pool size 1 , for sdf provider MG will > not > > work properly. > > > > The patch Jason submitted, is workaround for issue. It will not cause > > GDAL to work slower, because provider itself is basically > > single-threaded now. > > That was done trough work we were doing years or so ago. I was > looking > > at provider back then and realized that making gdal calls guarded > will > > make it more stable. Frank then submitted patch with guard on every > > single gdal library call. > > > > I will try to explain what is happing now: > > 1. Map with more than one gdal layer > > 2. There are two calls in server code execute query and stylize 3. > One > > thread will open connection in execute query and went to stylize > > procedure 4. Another thread will come into execute query and because > > it could be another layer it will try to open new connection 5. > > Connection manager will check pool size ( 1 in gdal case ) and > because > > provider didn't inc. reference count to connection (not way to solve, > > but anyhow provider needs to increment ) it will delete previous > > connection and create new one 6. Access violation in first thread > > while stylization because of deleted connection > > > > > > In shorter version things to change: > > 1. FDO GDAL provider is not increasing reference count on FDO > > connection when used in other classes like georasterband , which is > > later used in query (wrong) 2. connection manager is managing > > connection pool by using fdo connection reference count ( wrong ) 3. > > it looks like working for other providers, because connection pool > > size is 200 and other providers are incrementing connection pointer > in > > select command for example, but if there would be provider with > select > > command with no reference count to connection, it will crash too 4. > > Server code is making assumption that pool size for single threaded > is > > 1 (not correct), we need to make code change to allow for making > calls > > to provider single threaded > > > > Haris > > > > -----Original Message----- > > From: [hidden email] > > [mailto:[hidden email]] On Behalf Of > Bruce > > Dechant > > Sent: Friday, March 27, 2009 4:55 PM > > To: MapGuide Internals Mail List > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, > > ticket > > #462 > > > > I totally agree - the locking needs to be in the provider. > > > > Thanks, > > Bruce > > > > > > -----Original Message----- > > From: [hidden email] > > [mailto:[hidden email]] On Behalf Of > > Traian Stanev > > Sent: Friday, March 27, 2009 9:40 AM > > To: MapGuide Internals Mail List > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, > > ticket > > #462 > > > > > > It does look like a sledgehammer approach -- what if some day there > is > > a raster provider that works ok and doesn't need the locks? If > > anything, the lock should be in the FDO provider. > > > > Traian > > > > > > -----Original Message----- > > From: [hidden email] > > [mailto:[hidden email]] On Behalf Of Walt > > Welton-Lair > > Sent: Friday, March 27, 2009 11:11 AM > > To: MapGuide Internals Mail List > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, > > ticket > > #462 > > > > The memory patch (mapguide_raster_unalloc.5.patch) looks good, and > > I'll submit that. > > > > The stability patch (mapguide_raster_stability.patch) looks > > reasonable, but I'd like another 1 or 2 people to also look at it. > > It's significant since it limits MapGuide to processing one raster > > stylization request at a time, but I guess that's better than having > a > > dead lock. Anyone know where the thread-unsafe code actually is? Is > > it in the provider itself? > > Can the mutex be moved closer to the actual thread-unsafe code? > > > > Walt > > > > -----Original Message----- > > From: [hidden email] > > [mailto:[hidden email]] On Behalf Of > Jason > > Birch > > Sent: Friday, March 27, 2009 3:10 AM > > To: [hidden email] > > Subject: [mapguide-internals] PATCH: Raster stability fixes, ticket > > #462 > > > > Hi, > > > > Haris has been working on the raster stability problem against > > MapGuide 2.0.2. I was having the same problems, and was able to > > duplicate on 2.1. I have worked with Haris to create patches against > > trunk that address a couple of the problems and increase stability > immensely. > > > > Could someone please review the patches attached to: > > > > http://trac.osgeo.org/mapguide/ticket/462 > > > > and commit them to trunk if appropriate? > > > > My feeling is that the connection management stuff may need some > > additional work, since limiting the connection pool to 1 for > > single-threaded providers doesn't appear to have the desired effect. > > However, getting both of these patches into 2.1 would at least > improve > > stability for Raster users. > > > > Jason_______________________________________________ > > mapguide-internals mailing list > > [hidden email] > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > _______________________________________________ > > mapguide-internals mailing list > > [hidden email] > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > _______________________________________________ > > mapguide-internals mailing list > > [hidden email] > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > _______________________________________________ > > mapguide-internals mailing list > > [hidden email] > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > _______________________________________________ > > mapguide-internals mailing list > > [hidden email] > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals fdo-internals mailing list [hidden email] http://lists.osgeo.org/mailman/listinfo/fdo-internals |
||||||||||||||||
|
Traian Stanev
|
In reply to this post
by Traian Stanev
I attached a new patch for the provider to the ticket 462, which speeds up GDAL FDO connection opening by 95%, in addition to the refcounting fix, and the global mutex removal. The global mutex is still needed in two places where I did not disable it. This change allows you to completely disable MG connection pooling for the GDAL provider (like for SHP and SDF) and still have very good raster speed, since opening a connection is fast now. Also, the global raster mutex can be removed from the MG code. I will continue testing with this configuration, but so far it looks stable and fast. Obviously it is impossible to prove that a problem no longer exists, when threads are involved, but I will throw some more difficult tests at it -- so far I have only tested with a TIFF-based layer only. The TIFF is 450 megs or so. Traian > -----Original Message----- > From: [hidden email] [mailto:mapguide- > [hidden email]] On Behalf Of Traian Stanev > Sent: Sunday, March 29, 2009 1:29 AM > To: MapGuide Internals Mail List > Cc: FDO Internals Mail List > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes, > ticket #462 > > I attached the patch for the GDAL provider to ticket 462. > > Traian > > > -----Original Message----- > > From: [hidden email] [mailto:mapguide- > > [hidden email]] On Behalf Of Traian Stanev > > Sent: Sunday, March 29, 2009 1:22 AM > > To: MapGuide Internals Mail List > > Cc: FDO Internals Mail List > > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes, > > ticket #462 > > > > > > For what it's worth, patching the provider to addref the parent > > connection from the feature reader fixes the problem. This is on top > of > > Friday's trunk MapGuide and FDO code, without any other extra fixes. > I > > do think it is a bug not to addref the connection from the reader, > when > > it in fact depends on it to be alive, in order to get stuff from it > at > > a later time. > > > > I am pretty sure the addref fixes the problem, because the test I am > > using is fairly brutal and produces the crash/deadlock in less than 5 > > seconds every time I run it with unpatched GDAL provider. With the > > change, I ran it multiple times to completion. > > > > I am unhappy with the fix though, because it leaves CPU usage by > > mgserver at ~5%, even though I am doing something like 16 > simultaneous > > requests... So then I decided to remove the global mutex from inside > > the provider -- that didn't make a difference in CPU usage, and the > > test still passed without problems. So that mutex can go. Then I > messed > > with the connection pooling settings in serverconfig.ini -- those > > didn't make any difference. I tried them all for the GDAL provider > and > > the test still passed with 5% CPU usage. I don't think the process is > > blocked for I/O because the TIFF I am using is not huge and I also > > generated an overlay pyramid (which I highly recommend to anyone by > the > > way, it makes a huge performance difference for TIFFs). > > > > > > > > > > Traian > > > > > > > -----Original Message----- > > > From: [hidden email] [mailto:mapguide- > > > [hidden email]] On Behalf Of Haris Kurtagic > > > Sent: Saturday, March 28, 2009 4:43 AM > > > To: MapGuide Internals Mail List > > > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes, > > > ticket #462 > > > > > > Regarding MG raster problems ( or any other provider ) the main > > reason > > > is connection manager. > > > If you would set connection pool size 1 , for sdf provider MG will > > not > > > work properly. > > > > > > The patch Jason submitted, is workaround for issue. It will not > cause > > > GDAL to work slower, because provider itself is basically > > > single-threaded now. > > > That was done trough work we were doing years or so ago. I was > > looking > > > at provider back then and realized that making gdal calls guarded > > will > > > make it more stable. Frank then submitted patch with guard on every > > > single gdal library call. > > > > > > I will try to explain what is happing now: > > > 1. Map with more than one gdal layer > > > 2. There are two calls in server code execute query and stylize 3. > > One > > > thread will open connection in execute query and went to stylize > > > procedure 4. Another thread will come into execute query and > because > > > it could be another layer it will try to open new connection 5. > > > Connection manager will check pool size ( 1 in gdal case ) and > > because > > > provider didn't inc. reference count to connection (not way to > solve, > > > but anyhow provider needs to increment ) it will delete previous > > > connection and create new one 6. Access violation in first thread > > > while stylization because of deleted connection > > > > > > > > > In shorter version things to change: > > > 1. FDO GDAL provider is not increasing reference count on FDO > > > connection when used in other classes like georasterband , which is > > > later used in query (wrong) 2. connection manager is managing > > > connection pool by using fdo connection reference count ( wrong ) 3. > > > it looks like working for other providers, because connection pool > > > size is 200 and other providers are incrementing connection pointer > > in > > > select command for example, but if there would be provider with > > select > > > command with no reference count to connection, it will crash too 4. > > > Server code is making assumption that pool size for single threaded > > is > > > 1 (not correct), we need to make code change to allow for making > > calls > > > to provider single threaded > > > > > > Haris > > > > > > -----Original Message----- > > > From: [hidden email] > > > [mailto:[hidden email]] On Behalf Of > > Bruce > > > Dechant > > > Sent: Friday, March 27, 2009 4:55 PM > > > To: MapGuide Internals Mail List > > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, > > > ticket > > > #462 > > > > > > I totally agree - the locking needs to be in the provider. > > > > > > Thanks, > > > Bruce > > > > > > > > > -----Original Message----- > > > From: [hidden email] > > > [mailto:[hidden email]] On Behalf Of > > > Traian Stanev > > > Sent: Friday, March 27, 2009 9:40 AM > > > To: MapGuide Internals Mail List > > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, > > > ticket > > > #462 > > > > > > > > > It does look like a sledgehammer approach -- what if some day there > > is > > > a raster provider that works ok and doesn't need the locks? If > > > anything, the lock should be in the FDO provider. > > > > > > Traian > > > > > > > > > -----Original Message----- > > > From: [hidden email] > > > [mailto:[hidden email]] On Behalf Of > Walt > > > Welton-Lair > > > Sent: Friday, March 27, 2009 11:11 AM > > > To: MapGuide Internals Mail List > > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, > > > ticket > > > #462 > > > > > > The memory patch (mapguide_raster_unalloc.5.patch) looks good, and > > > I'll submit that. > > > > > > The stability patch (mapguide_raster_stability.patch) looks > > > reasonable, but I'd like another 1 or 2 people to also look at it. > > > It's significant since it limits MapGuide to processing one raster > > > stylization request at a time, but I guess that's better than > having > > a > > > dead lock. Anyone know where the thread-unsafe code actually is? > Is > > > it in the provider itself? > > > Can the mutex be moved closer to the actual thread-unsafe code? > > > > > > Walt > > > > > > -----Original Message----- > > > From: [hidden email] > > > [mailto:[hidden email]] On Behalf Of > > Jason > > > Birch > > > Sent: Friday, March 27, 2009 3:10 AM > > > To: [hidden email] > > > Subject: [mapguide-internals] PATCH: Raster stability fixes, ticket > > > #462 > > > > > > Hi, > > > > > > Haris has been working on the raster stability problem against > > > MapGuide 2.0.2. I was having the same problems, and was able to > > > duplicate on 2.1. I have worked with Haris to create patches > against > > > trunk that address a couple of the problems and increase stability > > immensely. > > > > > > Could someone please review the patches attached to: > > > > > > http://trac.osgeo.org/mapguide/ticket/462 > > > > > > and commit them to trunk if appropriate? > > > > > > My feeling is that the connection management stuff may need some > > > additional work, since limiting the connection pool to 1 for > > > single-threaded providers doesn't appear to have the desired effect. > > > However, getting both of these patches into 2.1 would at least > > improve > > > stability for Raster users. > > > > > > Jason_______________________________________________ > > > mapguide-internals mailing list > > > [hidden email] > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > > _______________________________________________ > > > mapguide-internals mailing list > > > [hidden email] > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > > _______________________________________________ > > > mapguide-internals mailing list > > > [hidden email] > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > > _______________________________________________ > > > mapguide-internals mailing list > > > [hidden email] > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > > _______________________________________________ > > > mapguide-internals mailing list > > > [hidden email] > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > _______________________________________________ > mapguide-internals mailing list > [hidden email] > http://lists.osgeo.org/mailman/listinfo/mapguide-internals fdo-internals mailing list [hidden email] http://lists.osgeo.org/mailman/listinfo/fdo-internals |
||||||||||||||||
|
Steve Dang
|
I'm not surprised if it is related to some FDO pointer reference counting problems (i.e. all FdoIDisposable derived classes are not thread safe on reference counting). I have run load tests against the Autodesk Raster provider using huge data set before. I will try to load test the GDAL provider when I have a chance.
Thanks, Steve. ________________________________________ From: [hidden email] [[hidden email]] On Behalf Of Traian Stanev Sent: Sunday, March 29, 2009 11:17 AM To: MapGuide Internals Mail List Cc: FDO Internals Mail List Subject: [fdo-internals] RE: [mapguide-internals] RE: PATCH: Raster stability fixes, ticket #462 I attached a new patch for the provider to the ticket 462, which speeds up GDAL FDO connection opening by 95%, in addition to the refcounting fix, and the global mutex removal. The global mutex is still needed in two places where I did not disable it. This change allows you to completely disable MG connection pooling for the GDAL provider (like for SHP and SDF) and still have very good raster speed, since opening a connection is fast now. Also, the global raster mutex can be removed from the MG code. I will continue testing with this configuration, but so far it looks stable and fast. Obviously it is impossible to prove that a problem no longer exists, when threads are involved, but I will throw some more difficult tests at it -- so far I have only tested with a TIFF-based layer only. The TIFF is 450 megs or so. Traian > -----Original Message----- > From: [hidden email] [mailto:mapguide- > [hidden email]] On Behalf Of Traian Stanev > Sent: Sunday, March 29, 2009 1:29 AM > To: MapGuide Internals Mail List > Cc: FDO Internals Mail List > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes, > ticket #462 > > I attached the patch for the GDAL provider to ticket 462. > > Traian > > > -----Original Message----- > > From: [hidden email] [mailto:mapguide- > > [hidden email]] On Behalf Of Traian Stanev > > Sent: Sunday, March 29, 2009 1:22 AM > > To: MapGuide Internals Mail List > > Cc: FDO Internals Mail List > > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes, > > ticket #462 > > > > > > For what it's worth, patching the provider to addref the parent > > connection from the feature reader fixes the problem. This is on top > of > > Friday's trunk MapGuide and FDO code, without any other extra fixes. > I > > do think it is a bug not to addref the connection from the reader, > when > > it in fact depends on it to be alive, in order to get stuff from it > at > > a later time. > > > > I am pretty sure the addref fixes the problem, because the test I am > > using is fairly brutal and produces the crash/deadlock in less than 5 > > seconds every time I run it with unpatched GDAL provider. With the > > change, I ran it multiple times to completion. > > > > I am unhappy with the fix though, because it leaves CPU usage by > > mgserver at ~5%, even though I am doing something like 16 > simultaneous > > requests... So then I decided to remove the global mutex from inside > > the provider -- that didn't make a difference in CPU usage, and the > > test still passed without problems. So that mutex can go. Then I > messed > > with the connection pooling settings in serverconfig.ini -- those > > didn't make any difference. I tried them all for the GDAL provider > and > > the test still passed with 5% CPU usage. I don't think the process is > > blocked for I/O because the TIFF I am using is not huge and I also > > generated an overlay pyramid (which I highly recommend to anyone by > the > > way, it makes a huge performance difference for TIFFs). > > > > > > > > > > Traian > > > > > > > -----Original Message----- > > > From: [hidden email] [mailto:mapguide- > > > [hidden email]] On Behalf Of Haris Kurtagic > > > Sent: Saturday, March 28, 2009 4:43 AM > > > To: MapGuide Internals Mail List > > > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes, > > > ticket #462 > > > > > > Regarding MG raster problems ( or any other provider ) the main > > reason > > > is connection manager. > > > If you would set connection pool size 1 , for sdf provider MG will > > not > > > work properly. > > > > > > The patch Jason submitted, is workaround for issue. It will not > cause > > > GDAL to work slower, because provider itself is basically > > > single-threaded now. > > > That was done trough work we were doing years or so ago. I was > > looking > > > at provider back then and realized that making gdal calls guarded > > will > > > make it more stable. Frank then submitted patch with guard on every > > > single gdal library call. > > > > > > I will try to explain what is happing now: > > > 1. Map with more than one gdal layer > > > 2. There are two calls in server code execute query and stylize 3. > > One > > > thread will open connection in execute query and went to stylize > > > procedure 4. Another thread will come into execute query and > because > > > it could be another layer it will try to open new connection 5. > > > Connection manager will check pool size ( 1 in gdal case ) and > > because > > > provider didn't inc. reference count to connection (not way to > solve, > > > but anyhow provider needs to increment ) it will delete previous > > > connection and create new one 6. Access violation in first thread > > > while stylization because of deleted connection > > > > > > > > > In shorter version things to change: > > > 1. FDO GDAL provider is not increasing reference count on FDO > > > connection when used in other classes like georasterband , which is > > > later used in query (wrong) 2. connection manager is managing > > > connection pool by using fdo connection reference count ( wrong ) 3. > > > it looks like working for other providers, because connection pool > > > size is 200 and other providers are incrementing connection pointer > > in > > > select command for example, but if there would be provider with > > select > > > command with no reference count to connection, it will crash too 4. > > > Server code is making assumption that pool size for single threaded > > is > > > 1 (not correct), we need to make code change to allow for making > > calls > > > to provider single threaded > > > > > > Haris > > > > > > -----Original Message----- > > > From: [hidden email] > > > [mailto:[hidden email]] On Behalf Of > > Bruce > > > Dechant > > > Sent: Friday, March 27, 2009 4:55 PM > > > To: MapGuide Internals Mail List > > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, > > > ticket > > > #462 > > > > > > I totally agree - the locking needs to be in the provider. > > > > > > Thanks, > > > Bruce > > > > > > > > > -----Original Message----- > > > From: [hidden email] > > > [mailto:[hidden email]] On Behalf Of > > > Traian Stanev > > > Sent: Friday, March 27, 2009 9:40 AM > > > To: MapGuide Internals Mail List > > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, > > > ticket > > > #462 > > > > > > > > > It does look like a sledgehammer approach -- what if some day there > > is > > > a raster provider that works ok and doesn't need the locks? If > > > anything, the lock should be in the FDO provider. > > > > > > Traian > > > > > > > > > -----Original Message----- > > > From: [hidden email] > > > [mailto:[hidden email]] On Behalf Of > Walt > > > Welton-Lair > > > Sent: Friday, March 27, 2009 11:11 AM > > > To: MapGuide Internals Mail List > > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, > > > ticket > > > #462 > > > > > > The memory patch (mapguide_raster_unalloc.5.patch) looks good, and > > > I'll submit that. > > > > > > The stability patch (mapguide_raster_stability.patch) looks > > > reasonable, but I'd like another 1 or 2 people to also look at it. > > > It's significant since it limits MapGuide to processing one raster > > > stylization request at a time, but I guess that's better than > having > > a > > > dead lock. Anyone know where the thread-unsafe code actually is? > Is > > > it in the provider itself? > > > Can the mutex be moved closer to the actual thread-unsafe code? > > > > > > Walt > > > > > > -----Original Message----- > > > From: [hidden email] > > > [mailto:[hidden email]] On Behalf Of > > Jason > > > Birch > > > Sent: Friday, March 27, 2009 3:10 AM > > > To: [hidden email] > > > Subject: [mapguide-internals] PATCH: Raster stability fixes, ticket > > > #462 > > > > > > Hi, > > > > > > Haris has been working on the raster stability problem against > > > MapGuide 2.0.2. I was having the same problems, and was able to > > > duplicate on 2.1. I have worked with Haris to create patches > against > > > trunk that address a couple of the problems and increase stability > > immensely. > > > > > > Could someone please review the patches attached to: > > > > > > http://trac.osgeo.org/mapguide/ticket/462 > > > > > > and commit them to trunk if appropriate? > > > > > > My feeling is that the connection management stuff may need some > > > additional work, since limiting the connection pool to 1 for > > > single-threaded providers doesn't appear to have the desired effect. > > > However, getting both of these patches into 2.1 would at least > > improve > > > stability for Raster users. > > > > > > Jason_______________________________________________ > > > mapguide-internals mailing list > > > [hidden email] > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > > _______________________________________________ > > > mapguide-internals mailing list > > > [hidden email] > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > > _______________________________________________ > > > mapguide-internals mailing list > > > [hidden email] > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > > _______________________________________________ > > > mapguide-internals mailing list > > > [hidden email] > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > > _______________________________________________ > > > mapguide-internals mailing list > > > [hidden email] > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > _______________________________________________ > mapguide-internals mailing list > [hidden email] > http://lists.osgeo.org/mailman/listinfo/mapguide-internals fdo-internals mailing list [hidden email] http://lists.osgeo.org/mailman/listinfo/fdo-internals_______________________________________________ fdo-internals mailing list [hidden email] http://lists.osgeo.org/mailman/listinfo/fdo-internals |
||||||||||||||||
|
Traian Stanev
|
Hi Steve,
The problem Haris found does not have much to do with thread safety -- you could write code to produce the problem within a single thread. It has to do with a raster feature reader using a weak reference to the FDO connection when it needs to be a strong one (i.e. it should addref the connection). If you release the connection and then try to use the reader, it will crash. Traian > -----Original Message----- > From: [hidden email] [mailto:fdo-internals- > [hidden email]] On Behalf Of Steve Dang > Sent: Sunday, March 29, 2009 2:22 PM > To: FDO Internals Mail List; MapGuide Internals Mail List > Subject: RE: [fdo-internals] RE: [mapguide-internals] RE: PATCH: Raster > stability fixes, ticket #462 > > I'm not surprised if it is related to some FDO pointer reference > counting problems (i.e. all FdoIDisposable derived classes are not > thread safe on reference counting). I have run load tests against the > Autodesk Raster provider using huge data set before. I will try to load > test the GDAL provider when I have a chance. > > Thanks, > Steve. > ________________________________________ > From: [hidden email] [fdo-internals- > [hidden email]] On Behalf Of Traian Stanev > Sent: Sunday, March 29, 2009 11:17 AM > To: MapGuide Internals Mail List > Cc: FDO Internals Mail List > Subject: [fdo-internals] RE: [mapguide-internals] RE: PATCH: Raster > stability fixes, ticket #462 > > I attached a new patch for the provider to the ticket 462, which speeds > up GDAL FDO connection opening by 95%, in addition to the refcounting > fix, and the global mutex removal. The global mutex is still needed in > two places where I did not disable it. > > This change allows you to completely disable MG connection pooling for > the GDAL provider (like for SHP and SDF) and still have very good > raster speed, since opening a connection is fast now. Also, the global > raster mutex can be removed from the MG code. > > I will continue testing with this configuration, but so far it looks > stable and fast. Obviously it is impossible to prove that a problem no > longer exists, when threads are involved, but I will throw some more > difficult tests at it -- so far I have only tested with a TIFF-based > layer only. The TIFF is 450 megs or so. > > Traian > > > > -----Original Message----- > > From: [hidden email] [mailto:mapguide- > > [hidden email]] On Behalf Of Traian Stanev > > Sent: Sunday, March 29, 2009 1:29 AM > > To: MapGuide Internals Mail List > > Cc: FDO Internals Mail List > > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes, > > ticket #462 > > > > I attached the patch for the GDAL provider to ticket 462. > > > > Traian > > > > > -----Original Message----- > > > From: [hidden email] [mailto:mapguide- > > > [hidden email]] On Behalf Of Traian Stanev > > > Sent: Sunday, March 29, 2009 1:22 AM > > > To: MapGuide Internals Mail List > > > Cc: FDO Internals Mail List > > > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes, > > > ticket #462 > > > > > > > > > For what it's worth, patching the provider to addref the parent > > > connection from the feature reader fixes the problem. This is on > top > > of > > > Friday's trunk MapGuide and FDO code, without any other extra fixes. > > I > > > do think it is a bug not to addref the connection from the reader, > > when > > > it in fact depends on it to be alive, in order to get stuff from it > > at > > > a later time. > > > > > > I am pretty sure the addref fixes the problem, because the test I > am > > > using is fairly brutal and produces the crash/deadlock in less than > 5 > > > seconds every time I run it with unpatched GDAL provider. With the > > > change, I ran it multiple times to completion. > > > > > > I am unhappy with the fix though, because it leaves CPU usage by > > > mgserver at ~5%, even though I am doing something like 16 > > simultaneous > > > requests... So then I decided to remove the global mutex from > inside > > > the provider -- that didn't make a difference in CPU usage, and the > > > test still passed without problems. So that mutex can go. Then I > > messed > > > with the connection pooling settings in serverconfig.ini -- those > > > didn't make any difference. I tried them all for the GDAL provider > > and > > > the test still passed with 5% CPU usage. I don't think the process > is > > > blocked for I/O because the TIFF I am using is not huge and I also > > > generated an overlay pyramid (which I highly recommend to anyone by > > the > > > way, it makes a huge performance difference for TIFFs). > > > > > > > > > > > > > > > Traian > > > > > > > > > > -----Original Message----- > > > > From: [hidden email] > [mailto:mapguide- > > > > [hidden email]] On Behalf Of Haris Kurtagic > > > > Sent: Saturday, March 28, 2009 4:43 AM > > > > To: MapGuide Internals Mail List > > > > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability > fixes, > > > > ticket #462 > > > > > > > > Regarding MG raster problems ( or any other provider ) the main > > > reason > > > > is connection manager. > > > > If you would set connection pool size 1 , for sdf provider MG > will > > > not > > > > work properly. > > > > > > > > The patch Jason submitted, is workaround for issue. It will not > > cause > > > > GDAL to work slower, because provider itself is basically > > > > single-threaded now. > > > > That was done trough work we were doing years or so ago. I was > > > looking > > > > at provider back then and realized that making gdal calls guarded > > > will > > > > make it more stable. Frank then submitted patch with guard on > every > > > > single gdal library call. > > > > > > > > I will try to explain what is happing now: > > > > 1. Map with more than one gdal layer > > > > 2. There are two calls in server code execute query and stylize 3. > > > One > > > > thread will open connection in execute query and went to stylize > > > > procedure 4. Another thread will come into execute query and > > because > > > > it could be another layer it will try to open new connection 5. > > > > Connection manager will check pool size ( 1 in gdal case ) and > > > because > > > > provider didn't inc. reference count to connection (not way to > > solve, > > > > but anyhow provider needs to increment ) it will delete previous > > > > connection and create new one 6. Access violation in first thread > > > > while stylization because of deleted connection > > > > > > > > > > > > In shorter version things to change: > > > > 1. FDO GDAL provider is not increasing reference count on FDO > > > > connection when used in other classes like georasterband , which > is > > > > later used in query (wrong) 2. connection manager is managing > > > > connection pool by using fdo connection reference count ( wrong ) > 3. > > > > it looks like working for other providers, because connection > pool > > > > size is 200 and other providers are incrementing connection > pointer > > > in > > > > select command for example, but if there would be provider with > > > select > > > > command with no reference count to connection, it will crash too > 4. > > > > Server code is making assumption that pool size for single > threaded > > > is > > > > 1 (not correct), we need to make code change to allow for making > > > calls > > > > to provider single threaded > > > > > > > > Haris > > > > > > > > -----Original Message----- > > > > From: [hidden email] > > > > [mailto:[hidden email]] On Behalf Of > > > Bruce > > > > Dechant > > > > Sent: Friday, March 27, 2009 4:55 PM > > > > To: MapGuide Internals Mail List > > > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, > > > > ticket > > > > #462 > > > > > > > > I totally agree - the locking needs to be in the provider. > > > > > > > > Thanks, > > > > Bruce > > > > > > > > > > > > -----Original Message----- > > > > From: [hidden email] > > > > [mailto:[hidden email]] On Behalf Of > > > > Traian Stanev > > > > Sent: Friday, March 27, 2009 9:40 AM > > > > To: MapGuide Internals Mail List > > > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, > > > > ticket > > > > #462 > > > > > > > > > > > > It does look like a sledgehammer approach -- what if some day > there > > > is > > > > a raster provider that works ok and doesn't need the locks? If > > > > anything, the lock should be in the FDO provider. > > > > > > > > Traian > > > > > > > > > > > > -----Original Message----- > > > > From: [hidden email] > > > > [mailto:[hidden email]] On Behalf Of > > Walt > > > > Welton-Lair > > > > Sent: Friday, March 27, 2009 11:11 AM > > > > To: MapGuide Internals Mail List > > > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, > > > > ticket > > > > #462 > > > > > > > > The memory patch (mapguide_raster_unalloc.5.patch) looks good, > and > > > > I'll submit that. > > > > > > > > The stability patch (mapguide_raster_stability.patch) looks > > > > reasonable, but I'd like another 1 or 2 people to also look at it. > > > > It's significant since it limits MapGuide to processing one > raster > > > > stylization request at a time, but I guess that's better than > > having > > > a > > > > dead lock. Anyone know where the thread-unsafe code actually is? > > Is > > > > it in the provider itself? > > > > Can the mutex be moved closer to the actual thread-unsafe code? > > > > > > > > Walt > > > > > > > > -----Original Message----- > > > > From: [hidden email] > > > > [mailto:[hidden email]] On Behalf Of > > > Jason > > > > Birch > > > > Sent: Friday, March 27, 2009 3:10 AM > > > > To: [hidden email] > > > > Subject: [mapguide-internals] PATCH: Raster stability fixes, > ticket > > > > #462 > > > > > > > > Hi, > > > > > > > > Haris has been working on the raster stability problem against > > > > MapGuide 2.0.2. I was having the same problems, and was able to > > > > duplicate on 2.1. I have worked with Haris to create patches > > against > > > > trunk that address a couple of the problems and increase > stability > > > immensely. > > > > > > > > Could someone please review the patches attached to: > > > > > > > > http://trac.osgeo.org/mapguide/ticket/462 > > > > > > > > and commit them to trunk if appropriate? > > > > > > > > My feeling is that the connection management stuff may need some > > > > additional work, since limiting the connection pool to 1 for > > > > single-threaded providers doesn't appear to have the desired > effect. > > > > However, getting both of these patches into 2.1 would at least > > > improve > > > > stability for Raster users. > > > > > > > > Jason_______________________________________________ > > > > mapguide-internals mailing list > > > > [hidden email] > > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > > > _______________________________________________ > > > > mapguide-internals mailing list > > > > [hidden email] > > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > > > _______________________________________________ > > > > mapguide-internals mailing list > > > > [hidden email] > > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > > > _______________________________________________ > > > > mapguide-internals mailing list > > > > [hidden email] > > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > > > _______________________________________________ > > > > mapguide-internals mailing list > > > > [hidden email] > > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > _______________________________________________ > > mapguide-internals mailing list > > [hidden email] > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > _______________________________________________ > fdo-internals mailing list > [hidden email] > http://lists.osgeo.org/mailman/listinfo/fdo- > internals_______________________________________________ > fdo-internals mailing list > [hidden email] > http://lists.osgeo.org/mailman/listinfo/fdo-internals fdo-internals mailing list [hidden email] http://lists.osgeo.org/mailman/listinfo/fdo-internals |
||||||||||||||||
|
Steve Dang
|
Sounds like the same known problem about using a weak reference to the FDO connection. I've seen a few of these documented/addressed in the Server code.
Steve. ________________________________________ From: [hidden email] [[hidden email]] On Behalf Of Traian Stanev Sent: Sunday, March 29, 2009 12:46 PM To: FDO Internals Mail List; MapGuide Internals Mail List Subject: RE: [fdo-internals] RE: [mapguide-internals] RE: PATCH: Raster stability fixes, ticket #462 Hi Steve, The problem Haris found does not have much to do with thread safety -- you could write code to produce the problem within a single thread. It has to do with a raster feature reader using a weak reference to the FDO connection when it needs to be a strong one (i.e. it should addref the connection). If you release the connection and then try to use the reader, it will crash. Traian > -----Original Message----- > From: [hidden email] [mailto:fdo-internals- > [hidden email]] On Behalf Of Steve Dang > Sent: Sunday, March 29, 2009 2:22 PM > To: FDO Internals Mail List; MapGuide Internals Mail List > Subject: RE: [fdo-internals] RE: [mapguide-internals] RE: PATCH: Raster > stability fixes, ticket #462 > > I'm not surprised if it is related to some FDO pointer reference > counting problems (i.e. all FdoIDisposable derived classes are not > thread safe on reference counting). I have run load tests against the > Autodesk Raster provider using huge data set before. I will try to load > test the GDAL provider when I have a chance. > > Thanks, > Steve. > ________________________________________ > From: [hidden email] [fdo-internals- > [hidden email]] On Behalf Of Traian Stanev > Sent: Sunday, March 29, 2009 11:17 AM > To: MapGuide Internals Mail List > Cc: FDO Internals Mail List > Subject: [fdo-internals] RE: [mapguide-internals] RE: PATCH: Raster > stability fixes, ticket #462 > > I attached a new patch for the provider to the ticket 462, which speeds > up GDAL FDO connection opening by 95%, in addition to the refcounting > fix, and the global mutex removal. The global mutex is still needed in > two places where I did not disable it. > > This change allows you to completely disable MG connection pooling for > the GDAL provider (like for SHP and SDF) and still have very good > raster speed, since opening a connection is fast now. Also, the global > raster mutex can be removed from the MG code. > > I will continue testing with this configuration, but so far it looks > stable and fast. Obviously it is impossible to prove that a problem no > longer exists, when threads are involved, but I will throw some more > difficult tests at it -- so far I have only tested with a TIFF-based > layer only. The TIFF is 450 megs or so. > > Traian > > > > -----Original Message----- > > From: [hidden email] [mailto:mapguide- > > [hidden email]] On Behalf Of Traian Stanev > > Sent: Sunday, March 29, 2009 1:29 AM > > To: MapGuide Internals Mail List > > Cc: FDO Internals Mail List > > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes, > > ticket #462 > > > > I attached the patch for the GDAL provider to ticket 462. > > > > Traian > > > > > -----Original Message----- > > > From: [hidden email] [mailto:mapguide- > > > [hidden email]] On Behalf Of Traian Stanev > > > Sent: Sunday, March 29, 2009 1:22 AM > > > To: MapGuide Internals Mail List > > > Cc: FDO Internals Mail List > > > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes, > > > ticket #462 > > > > > > > > > For what it's worth, patching the provider to addref the parent > > > connection from the feature reader fixes the problem. This is on > top > > of > > > Friday's trunk MapGuide and FDO code, without any other extra fixes. > > I > > > do think it is a bug not to addref the connection from the reader, > > when > > > it in fact depends on it to be alive, in order to get stuff from it > > at > > > a later time. > > > > > > I am pretty sure the addref fixes the problem, because the test I > am > > > using is fairly brutal and produces the crash/deadlock in less than > 5 > > > seconds every time I run it with unpatched GDAL provider. With the > > > change, I ran it multiple times to completion. > > > > > > I am unhappy with the fix though, because it leaves CPU usage by > > > mgserver at ~5%, even though I am doing something like 16 > > simultaneous > > > requests... So then I decided to remove the global mutex from > inside > > > the provider -- that didn't make a difference in CPU usage, and the > > > test still passed without problems. So that mutex can go. Then I > > messed > > > with the connection pooling settings in serverconfig.ini -- those > > > didn't make any difference. I tried them all for the GDAL provider > > and > > > the test still passed with 5% CPU usage. I don't think the process > is > > > blocked for I/O because the TIFF I am using is not huge and I also > > > generated an overlay pyramid (which I highly recommend to anyone by > > the > > > way, it makes a huge performance difference for TIFFs). > > > > > > > > > > > > > > > Traian > > > > > > > > > > -----Original Message----- > > > > From: [hidden email] > [mailto:mapguide- > > > > [hidden email]] On Behalf Of Haris Kurtagic > > > > Sent: Saturday, March 28, 2009 4:43 AM > > > > To: MapGuide Internals Mail List > > > > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability > fixes, > > > > ticket #462 > > > > > > > > Regarding MG raster problems ( or any other provider ) the main > > > reason > > > > is connection manager. > > > > If you would set connection pool size 1 , for sdf provider MG > will > > > not > > > > work properly. > > > > > > > > The patch Jason submitted, is workaround for issue. It will not > > cause > > > > GDAL to work slower, because provider itself is basically > > > > single-threaded now. > > > > That was done trough work we were doing years or so ago. I was > > > looking > > > > at provider back then and realized that making gdal calls guarded > > > will > > > > make it more stable. Frank then submitted patch with guard on > every > > > > single gdal library call. > > > > > > > > I will try to explain what is happing now: > > > > 1. Map with more than one gdal layer > > > > 2. There are two calls in server code execute query and stylize 3. > > > One > > > > thread will open connection in execute query and went to stylize > > > > procedure 4. Another thread will come into execute query and > > because > > > > it could be another layer it will try to open new connection 5. > > > > Connection manager will check pool size ( 1 in gdal case ) and > > > because > > > > provider didn't inc. reference count to connection (not way to > > solve, > > > > but anyhow provider needs to increment ) it will delete previous > > > > connection and create new one 6. Access violation in first thread > > > > while stylization because of deleted connection > > > > > > > > > > > > In shorter version things to change: > > > > 1. FDO GDAL provider is not increasing reference count on FDO > > > > connection when used in other classes like georasterband , which > is > > > > later used in query (wrong) 2. connection manager is managing > > > > connection pool by using fdo connection reference count ( wrong ) > 3. > > > > it looks like working for other providers, because connection > pool > > > > size is 200 and other providers are incrementing connection > pointer > > > in > > > > select command for example, but if there would be provider with > > > select > > > > command with no reference count to connection, it will crash too > 4. > > > > Server code is making assumption that pool size for single > threaded > > > is > > > > 1 (not correct), we need to make code change to allow for making > > > calls > > > > to provider single threaded > > > > > > > > Haris > > > > > > > > -----Original Message----- > > > > From: [hidden email] > > > > [mailto:[hidden email]] On Behalf Of > > > Bruce > > > > Dechant > > > > Sent: Friday, March 27, 2009 4:55 PM > > > > To: MapGuide Internals Mail List > > > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, > > > > ticket > > > > #462 > > > > > > > > I totally agree - the locking needs to be in the provider. > > > > > > > > Thanks, > > > > Bruce > > > > > > > > > > > > -----Original Message----- > > > > From: [hidden email] > > > > [mailto:[hidden email]] On Behalf Of > > > > Traian Stanev > > > > Sent: Friday, March 27, 2009 9:40 AM > > > > To: MapGuide Internals Mail List > > > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, > > > > ticket > > > > #462 > > > > > > > > > > > > It does look like a sledgehammer approach -- what if some day > there > > > is > > > > a raster provider that works ok and doesn't need the locks? If > > > > anything, the lock should be in the FDO provider. > > > > > > > > Traian > > > > > > > > > > > > -----Original Message----- > > > > From: [hidden email] > > > > [mailto:[hidden email]] On Behalf Of > > Walt > > > > Welton-Lair > > > > Sent: Friday, March 27, 2009 11:11 AM > > > > To: MapGuide Internals Mail List > > > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes, > > > > ticket > > > > #462 > > > > > > > > The memory patch (mapguide_raster_unalloc.5.patch) looks good, > and > > > > I'll submit that. > > > > > > > > The stability patch (mapguide_raster_stability.patch) looks > > > > reasonable, but I'd like another 1 or 2 people to also look at it. > > > > It's significant since it limits MapGuide to processing one > raster > > > > stylization request at a time, but I guess that's better than > > having > > > a > > > > dead lock. Anyone know where the thread-unsafe code actually is? > > Is > > > > it in the provider itself? > > > > Can the mutex be moved closer to the actual thread-unsafe code? > > > > > > > > Walt > > > > > > > > -----Original Message----- > > > > From: [hidden email] > > > > [mailto:[hidden email]] On Behalf Of > > > Jason > > > > Birch > > > > Sent: Friday, March 27, 2009 3:10 AM > > > > To: [hidden email] > > > > Subject: [mapguide-internals] PATCH: Raster stability fixes, > ticket > > > > #462 > > > > > > > > Hi, > > > > > > > > Haris has been working on the raster stability problem against > > > > MapGuide 2.0.2. I was having the same problems, and was able to > > > > duplicate on 2.1. I have worked with Haris to create patches > > against > > > > trunk that address a couple of the problems and increase > stability > > > immensely. > > > > > > > > Could someone please review the patches attached to: > > > > > > > > http://trac.osgeo.org/mapguide/ticket/462 > > > > > > > > and commit them to trunk if appropriate? > > > > > > > > My feeling is that the connection management stuff may need some > > > > additional work, since limiting the connection pool to 1 for > > > > single-threaded providers doesn't appear to have the desired > effect. > > > > However, getting both of these patches into 2.1 would at least > > > improve > > > > stability for Raster users. > > > > > > > > Jason_______________________________________________ > > > > mapguide-internals mailing list > > > > [hidden email] > > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > > > _______________________________________________ > > > > mapguide-internals mailing list > > > > [hidden email] > > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > > > _______________________________________________ > > > > mapguide-internals mailing list > > > > [hidden email] > > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > > > _______________________________________________ > > > > mapguide-internals mailing list > > > > [hidden email] > > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > > > _______________________________________________ > > > > mapguide-internals mailing list > > > > [hidden email] > > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > > _______________________________________________ > > mapguide-internals mailing list > > [hidden email] > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals > _______________________________________________ > fdo-internals mailing list > [hidden email] > http://lists.osgeo.org/mailman/listinfo/fdo- > internals_______________________________________________ > fdo-internals mailing list > [hidden email] > http://lists.osgeo.org/mailman/listinfo/fdo-internals fdo-internals mailing list [hidden email] http://lists.osgeo.org/mailman/listinfo/fdo-internals_______________________________________________ fdo-internals mailing list [hidden email] http://lists.osgeo.org/mailman/listinfo/fdo-internals |
||||||||||||||||
| Free Embeddable Forum Powered by Nabble | Help |