GT3 security performance evaluation

Olle Mulmo, PDC
 

I have created a small test service with GT3 and then turned on security. What do I see?

You see exactly what the people in the LCG Grid Technology Area at CERN showcased in their DummyService tests: response times increase by a factor of 10. In addition, it takes 5 seconds or more to interact with the service factory and create a new service.
 

5 seconds? Surely, this can't be?

The short answer is, unfortunately, yes. Many factors contribute, though: I have identified several of them and will try to explain my findings as thoroughly as possible.

NOTE: Actual timings mean little on their own, as they only apply to the computer you run on. Below, you will find the timings I measured on my instrumented GT3 installation, running on my particular hardware with my particular version of the JVM. The important thing to take away from the figures is their RELATIVE sizes.

Timings are denoted in milliseconds throughout this document. The prefixes K and M denote kibi and mebi, not kilo and mega.
 

So the service creation takes several seconds? It must be an overly complex operation?

Actually, the service creation process itself takes about 50 msec. It's negligible.

What hits you and accounts for the many seconds is the initialization of the underlying tooling in your client: initializing the Axis handlers and the XML security libraries account for the major part of this time.

In addition, there are many other one-time operations happening, such as initializing the secure random number generator, loading your proxy certificate and trusted CA certificates from disk, loading the WSDL that defines the factory service port type, and finally establishing a security context with the server when using GSI-SecureConversation (which you typically do). None of this is repeated when you create a second service instance in the same security context.
 

OK, but there's still a factor 10 penalty on my method invocations regardless of this. Is it the complicated and expensive crypto operations, perhaps?

What I tried to showcase (but failed miserably) in an earlier email is that the XML security implementation is broken in the sense that it generates a huge overhead: parsing the surrounding XML is 2-3 times more expensive than the crypto operation itself.

The DummyService tests make use of HMAC, a lightweight keyed-hash message authentication code that ensures message integrity. Another alternative is full-blown encryption. The table below displays characteristics for both, with the unsecured case (plain) shown as well.

Security  Payload (b)  SOAP size (b)  T(engine)  T(roundtrip)
plain              16            481          0            15
plain           16000          16465        <10            23
plain          160000         160465         20           106

HMAC               16           1390         40           120
HMAC            16000          17490         70           250
HMAC           160000         161400        360          1100

Encr               16           1570         30            75
Encr            16000          23200         80           220
Encr           160000         218050        500          1380

To summarize, we do see a performance hit when using the XML security library, but it's far from a factor 10.
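As a rough sanity check of that claim, one can time the raw HMAC computation over payloads of the same sizes as in the table (the key and payloads below are made up for illustration). On typical hardware even the 160 KB case completes far below the T(engine) figures above, supporting the point that the engine time is dominated by XML handling, not crypto.

```python
import hashlib
import hmac
import time

key = b"0123456789abcdef"   # illustrative key, not taken from the tests

for size in (16, 16_000, 160_000):
    payload = b"x" * size
    t0 = time.perf_counter()
    tag = hmac.new(key, payload, hashlib.sha1).digest()
    elapsed_ms = (time.perf_counter() - t0) * 1000
    assert len(tag) == 20   # an HMAC-SHA1 tag is 20 bytes
    print(f"{size:>7} bytes: {elapsed_ms:.3f} msec")
```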
 

How can you say that? Clearly, I see figures differing by more than a factor 10 in that table!

Yes, but remember that the load on the client is the same as the load on the server! If you look at the size-16 case, you get a penalty of 40 msec on each end in the HMAC case. The enveloping of the encrypted or signed data enlarges the message size, but I wouldn't attribute too much of the overhead to slower network transfer and XML (de-)serialization.

I think this is where the DummyService tests went WRONG: they put many clients on the same machine, expecting to create a huge load on the server. In fact, the load on the client machine was just as high, and this may explain why the CPU utilization on the server was not at 100% when it reached "max".

Notice also that the poor implementation of XML security hits you 4-fold on the roundtrip time, as you need to encrypt and decrypt on both the client and the server side, in sequence. Thus, for every millisecond of overhead that we can save by improving this library, we gain four on the roundtrip time.
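The arithmetic behind that 4-fold claim can be spelled out in a toy model (numbers purely illustrative): the request is encrypted by the client and decrypted by the server, and the response is encrypted by the server and decrypted by the client, so the per-message crypto/XML cost enters the roundtrip four times.

```python
def roundtrip_ms(base_ms, crypto_op_ms):
    """Toy model: four sequential crypto operations per roundtrip."""
    return base_ms + 4 * crypto_op_ms

# Shaving 1 msec off a single crypto operation saves 4 msec per roundtrip:
assert roundtrip_ms(20, 10) - roundtrip_ms(20, 9) == 4
```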
 

But these figures are still in the millisecond range: how do you explain my factor-of-10 slowdown?

The table does not include the overhead for establishing a shared secret used to encrypt the data, and this is what hits you.

The handshake between the client and the server is an exchange of three message roundtrips in total. In my case, the sizes of the SOAP messages sent were, in bytes ({request, response}):
 

{787,3275} {2606,787} {670,686}

This sequence of messages takes roughly 150 msec to complete.

BUT, this is for establishing a context only. In the DummyService case, the client performs a credential delegation to the service instance as well; this costs an additional roundtrip. The exchange pattern and corresponding message sizes in my case were:
 

{787,3275} {2606,787} {670,1131} {1322,686}

A handshake with delegation implies much more work for the involved parties, as it involves the creation and validation of an RSA key pair: the four roundtrips take roughly 400 msec to complete.
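Totalling the message sizes quoted above for the delegation handshake gives an idea of how much extra traffic those four roundtrips generate:

```python
# {request, response} SOAP message sizes in bytes, from the figures above
exchange = [(787, 3275), (2606, 787), (670, 1131), (1322, 686)]

requests = sum(req for req, _ in exchange)
responses = sum(rsp for _, rsp in exchange)
print(requests, responses, requests + responses)  # → 5385 5879 11264
```

Over 11 KB of handshake traffic, before a single application byte is exchanged.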
 

But I already had my secure context established -- I did that when talking to the factory!

No: you establish a new context for each new port type reference that you make use of.

In the case of the secure DummyService client (source code), the message pattern works out as follows.

In total, this results in 4 roundtrips in the non-secure case (the lines that start with 'invoke'), and an additional 7 in the secure case, totalling 11 roundtrips.

Notice the BUG in the client program: the service instance is destroyed without any security! Furthermore, since the DummyService does not extend the GridService port type, the client needs to grab a new reference to a GridService port type from the same service locator. That new reference would need 3 additional roundtrips to establish a new security context with this port type before invoking destroy(). (One can argue heavily that all of this is nothing but a good example of stupid shortcomings in the tooling.)
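The roundtrip bookkeeping above can be tallied explicitly (a sketch of the counting, not a measurement; the secure-destroy figure is the hypothetical 3-roundtrip handshake just described):

```python
non_secure_invokes = 4     # the plain 'invoke' roundtrips
secure_extra = 7           # context establishment and delegation overhead
total = non_secure_invokes + secure_extra
assert total == 11         # roundtrips in the secure DummyService client

# Destroying the instance securely via a fresh GridService port type
# reference would add yet another context-establishment handshake:
secure_destroy_extra = 3
assert total + secure_destroy_extra == 14
```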
 

Huh? I thought I would understand more by reading all this?

The following is a bit misleading and not strictly correct, but it makes for a good conceptual summary of what happens: when using security, you pay a factor-3 performance penalty on every message you invoke. Furthermore, you invoke 3 times as many messages. Multiply the two together, and you have your slowdown of roughly a factor of 10.
 

Anything else?

Clearly, the tooling does not handle the simple use case very well. The initialization overheads are way off the chart, and invoking a single method securely increases the overhead by a factor 10.

The tooling works much better in scenarios where the initialization cost is negligible, for instance when you perform hundreds of invocations using the same security context: the overhead can then be measured in additional tens of milliseconds for small payloads.
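As a back-of-the-envelope illustration (the 5000 msec setup and 50 msec per-call figures are rough stand-ins for the timings discussed above, not measurements), the average cost per call drops quickly as the context is reused:

```python
def avg_ms_per_call(setup_ms, per_call_ms, n_calls):
    """Average cost per invocation once one-time setup is amortized."""
    return (setup_ms + n_calls * per_call_ms) / n_calls

assert avg_ms_per_call(5000, 50, 1) == 5050     # invoke-once-then-die
assert avg_ms_per_call(5000, 50, 100) == 100    # setup nearly vanishes
```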

On a related note, Java HotSpot works quite well on GT3: while performing hundreds of invocations on the same service instance, you will gradually see improvements on both the client and the server side that eventually cut as much as 40% off the roundtrip time.
 

Conclusions?

Not really. Clearly, we need a solution for the invoke-once-then-die usage scenario, which the current tooling is simply not built for. It will take some serious thinking about how to go about fixing this.

In the meantime, I suggest we concentrate on the XML security library and its duplicate parsing and internal use of XPath queries: every millisecond saved there will cut 4 off the roundtrip time.


mulmo@pdc.kth.se
