Improving startup time: sanity check

Erik Huelsmann

Improving startup time: sanity check

Last weekend, we experimented with better autoloading. It turned out
to strip roughly 0.4 seconds from a cold startup time of 1.7s, an
improvement of roughly 25%.

However, the reason we started out with the startup time improvements
in the first place was the ABCL startup time on Google App Engine. It
turns out that our CPU usage during startup hasn't really decreased
much (as per their benchmark indicator - they can't really give an
actual figure).

So, I asked for advice on #appengine (on freenode). Their reaction was
"we can't imagine the startup time being related to the size of the
JAR" even though Peter Graves calculated a 34% ratio between ABCL and
Clojure jar sizes and a 35% ratio between startup times - that looks
like a linear match. Their reaction continued "you're probably just
doing too much work during the init() phase."

The init() phase is where the ABCL environment gets loaded and all
function objects get created.

Let's assume for a second they're right. In that case we must assume
it's not I/O holding us up: it's the work the CPU must do to get us up
and running. If that's true, profiling the application should tell us
something about the bottlenecks we're running into. I happen to have
done quite a number of such profiles in the course of last week. The
conclusion which stands out is that ABCL - during the startup process
- spends ~ 40% of its time finding class constructors: the main
component of creating function objects.

This brought me to the conclusion that our startup process could be
much faster if we delayed function object creation until the function
is actually used, instead of creating function objects whenever their
siblings are requested to be loaded.

The idea is to create another Autoload derivative which will be
"installed" in the appropriate places and which, when invoked, loads
the actual class from the byte array. I'm hoping this will spread the
"initialization load" more evenly. The performance hit will only be on
the first call to the function: after the class has been loaded from
the byte array, the autoload object will remove itself from the
function call chain.
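
To make the proposal concrete, here is a minimal sketch of such a
self-removing placeholder. It is not ABCL code: LispFunction,
FunctionCell and LazyStub are hypothetical names, and a Supplier stands
in for the real work of defining the class from its byte array and
instantiating it.

```java
import java.util.function.Supplier;

// Small demo: the loader runs once, on the first call only.
final class LazyStubDemo {
    public static void main(String[] args) {
        int[] loads = {0};
        FunctionCell cell = new FunctionCell();
        cell.install(new LazyStub(cell, () -> {
            loads[0]++;                       // counts how often we "load"
            return arg -> "called with " + arg;
        }));
        System.out.println(cell.call("a"));
        System.out.println(cell.call("b"));
        System.out.println("loads: " + loads[0]);
    }
}

// Hypothetical stand-in for a Lisp function object (assumption).
interface LispFunction {
    Object call(Object arg);
}

// Hypothetical slot that callers go through, e.g. a symbol's function cell.
final class FunctionCell {
    private volatile LispFunction function;

    void install(LispFunction f) { function = f; }
    LispFunction current() { return function; }
    Object call(Object arg) { return function.call(arg); }
}

// Placeholder that defers the expensive work until the first call.
final class LazyStub implements LispFunction {
    private final FunctionCell cell;
    private final Supplier<LispFunction> loader;

    LazyStub(FunctionCell cell, Supplier<LispFunction> loader) {
        this.cell = cell;
        this.loader = loader;
    }

    @Override
    public Object call(Object arg) {
        // Assumption: in ABCL this step would define the class from its
        // byte array and instantiate it; the Supplier stands in for that.
        LispFunction real = loader.get();
        cell.install(real);   // the stub drops out of the call chain
        return real.call(arg);
    }
}
```

Two threads making the first call at the same time could both run the
loader in this sketch; a real implementation would have to decide
whether that is acceptable.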

So, how about it? Comments most welcome!


Bye,

Erik.

------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
armedbear-j-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/armedbear-j-devel
Alessio Stalla

Re: Improving startup time: sanity check

On Wed, Oct 28, 2009 at 9:15 PM, Erik Huelsmann <[hidden email]> wrote:

> Last weekend, we experimented with better autoloading. It turned out
> to strip roughly .4 seconds from a cold startup time of 1.7s, making
> it a 25% improvement.
>
> However, the reason we started out with the startup time improvements
> in the first place was the ABCL startup time on Google App Engine. It
> turns out that our CPU usage during startup hasn't really decreased
> much (as per their benchmark indicator - they can't really give an
> actual figure).
>
> So, I asked for advice on #appengine (on freenode). Their reaction was
> "we can't imagine the startup time being related to the size of the
> JAR" even though Peter Graves calculated a 34% ratio between ABCL and
> Clojure jar sizes and a 35% ratio between startup times - that looks
> like a linear match. Their reaction continued "you're probably just
> doing too much work during the init() phase."
>
> The init() phase is where the ABCL environment gets loaded and all
> function objects get created.
>
> Let's assume for a second they're right. In that case we must assume
> it's not I/O holding us up: it's the work the CPU must do to get us up
> and running. If that's true, profiling the application should tell us
> something about the bottlenecks we're running into. I happen to have
> done quite a number of such profiles in the course of last week. The
> conclusion which stands out is that ABCL - during the startup process
> - spends ~ 40% of its time finding class constructors: the main
> component of creating function objects.
>
> This brought me to the conclusion that our startup process could be
> much faster, if we decided to delay function object creation until the
> function is actually used: we would eliminate the need to construct
> function objects until they're used instead of creating them when
> their siblings are requested to be loaded.
>
> The idea is to create another Autoload derivative which will be
> "installed" in the appropriate places which, when invoked, loads the
> actual class from the byte array. I'm hoping this will cause a more
> equally spread "initialization load". The performance hit will only be
> the first call to the function: after it has been converted from the
> byte array, the autoload object will remove itself from the function
> call chain.
>
> So, how about it? Comments most welcome!
I have mixed feelings about the idea. I think it's clever, but I also
think we (I, at least) need more data to know whether it will actually
be beneficial.

If the goal is speeding up startup time in a context like AppEngine -
where not only Lisp, but the whole user application will be loaded
from scratch from time to time - then it is critical to know how many
Lisp functions a generic application uses on average (both directly
and indirectly). If it turns out that, say, 50% of Lisp is commonly
used, then no matter how clever an autoloading scheme you implement,
you'll cut loading times only by roughly 50% at best.

If getting constructors through reflection is really the bottleneck,
and if we determine that using new instead of reflection is
significantly faster (from a quick test of mine, it seems it *really*
is [1]), then it might be sensible to avoid reflection altogether and
devise another scheme. For example, the compiler-generated class X
could contain in its static initialization block the equivalent of
something like

Lisp.someThreadLocal.set(new X())

and loadCompiledFunction (or whatever it is called) could just fetch
the instance from the threadlocal; not very elegant, but if it speeds
things up...
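
A minimal sketch of that hand-off, under stated assumptions:
InstanceHandoff, SLOT and CompiledX are hypothetical names, the class
is loaded by name with Class.forName rather than from a byte array, and
loadCompiledFunction here only borrows the name mentioned above, not
ABCL's actual signature.

```java
// Small demo of the hand-off.
final class HandoffDemo {
    public static void main(String[] args) {
        Object f = InstanceHandoff.loadCompiledFunction(CompiledX.class.getName());
        System.out.println(((CompiledX) f).call());
    }
}

// Hypothetical stand-in for the "Lisp.someThreadLocal" idea.
final class InstanceHandoff {
    static final ThreadLocal<Object> SLOT = new ThreadLocal<>();

    static Object loadCompiledFunction(String className) {
        SLOT.remove();
        try {
            // initialize=true forces the static initializer to run in
            // this thread (if the class is not yet initialized).
            Class.forName(className, true,
                          InstanceHandoff.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        Object instance = SLOT.get();  // fetched without any reflection
        SLOT.remove();                 // do not leak the instance
        return instance;
    }
}

// Stand-in for a compiler-generated class: it instantiates itself with
// plain "new" and publishes the instance from its static initializer.
final class CompiledX {
    static {
        InstanceHandoff.SLOT.set(new CompiledX());
    }

    Object call() { return "X called"; }
}
```

The static initializer runs only once per class, so a second load of an
already initialized class returns null in this sketch. That fits a
scheme where every compiled function class is loaded exactly once, but
it is a limitation to keep in mind.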

Alessio

[1] this is the astounding result of a few runs of 50000 iterations
each (test files attached):
REFLECTION: 16262373155
NEW: 84267527
% SLOWER: 19298

REFLECTION: 15917190176
NEW: 103681915
% SLOWER: 15351

REFLECTION: 15838714133
NEW: 77235481
% SLOWER: 20507

(times in ns) i.e. reflection as we use it is roughly 150-200 times
slower than new, and that's on a very simple class with no superclasses
and a single constructor! The test might be wrong as I wrote it
quickly and it's quite tricky. It uses the very same classloader as
abcl, though (copy-pasted).

[Test.java]

import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;


public class Test {

        public Test() {}
       
        public static void test(byte[] testClass, long[] times) {
                long start = System.nanoTime();
                try {
                        Class c = new JavaClassLoader().loadClassFromByteArray("Test", testClass);
                        c.newInstance();
                } catch (Exception e) {
                        e.printStackTrace();
                }
                long time = System.nanoTime() - start;
                times[0] += time;
               
                start = System.nanoTime();
                new Test();
                time = System.nanoTime() - start;
                times[1] += time;
        }
       
        public static void main(String[] args) {
                try {
                        ByteArrayOutputStream bos = new ByteArrayOutputStream();
                        BufferedInputStream bin = new BufferedInputStream(Test.class.getResourceAsStream("Test.class"));
                        try {
                                int r = bin.read();
                                while(r != -1) {
                                        bos.write(r);
                                        r = bin.read();
                                }
                        } catch(IOException e) {
                                e.printStackTrace();
                                System.exit(1);
                        }
                        byte[] testClassBytes = bos.toByteArray();
                        long[] times = new long[] { 0, 0 };
                        long ntimes = 50000;
                        for(long i = 0; i < ntimes; i++) {
                                Class testClass = new URLClassLoader(new URL[] { Test.class.getResource("Test.class") }).loadClass("Test");
                                Method testMethod = testClass.getMethod("test", byte[].class, long[].class);
                                testMethod.invoke(null, testClassBytes, times);
                        }
                        System.out.println("REFLECTION: " + times[0]);
                        System.out.println("NEW: " + times[1]);
                        System.out.println("% SLOWER: " + (times[0] * 100) / times[1]);
                } catch (Exception e) {
                        e.printStackTrace();
                }
        }
       
}


[JavaClassLoader.java]

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class JavaClassLoader extends ClassLoader {

    private static JavaClassLoader persistentInstance;

    private static Set<String> packages = Collections.synchronizedSet(new HashSet<String>());

    public JavaClassLoader()
    {
        super(JavaClassLoader.class.getClassLoader());
    }

    public static JavaClassLoader getPersistentInstance()
    {
        return getPersistentInstance(null);
    }

    public static JavaClassLoader getPersistentInstance(String packageName)
    {
        if (persistentInstance == null)
            persistentInstance = new JavaClassLoader();
        definePackage(packageName);
        return persistentInstance;
    }

    private static void definePackage(String packageName)
    {
        if (packageName != null && !packages.contains(packageName)) {
            persistentInstance.definePackage(packageName,"","1.0","","","1.0","",null);
            packages.add(packageName);
        }
    }

    public Class<?> loadClassFromByteArray(byte[] classbytes) {
        return loadClassFromByteArray(null, classbytes);
    }

    public Class<?> loadClassFromByteArray(String className,
                                                byte[] classbytes)
    {
        try {
            long length = classbytes.length;
            if (length < Integer.MAX_VALUE) {
                Class<?> c =
                    defineClass(className, classbytes, 0, (int) length);
                if (c != null) {
                    resolveClass(c);
                    return c;
                }
            }
        }
        catch (LinkageError e) {
            throw e;
        }
        catch (Throwable t) {
            t.printStackTrace();
        }
        return null;
    }

    public Class<?> loadClassFromByteArray(String className, byte[] bytes,
                                                int offset, int length)
    {
        try {
            Class<?> c = defineClass(className, bytes, offset, length);
            if (c != null) {
                resolveClass(c);
                return c;
            }
        }
        catch (Throwable t) {
            t.printStackTrace();
        }
        return null;
    }
}


Alessio Stalla

Re: Improving startup time: sanity check

On Wed, Oct 28, 2009 at 11:20 PM, Alessio Stalla
<[hidden email]> wrote:
> The test might be wrong

It was, indeed, wrong. It measured class loading time too for
reflection, but that can't be eliminated, no matter what instantiation
technique you use.

New results on 50000 iterations:

REFLECTION: 9660480097
NEW: 88353767
% SLOWER: 10933

REFLECTION: 9602410291
NEW: 78459331
% SLOWER: 12238

REFLECTION: 9925574038
NEW: 86923015
% SLOWER: 11418

~100 times slower is still impressive.

Ale

[Test.java]

import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;


public class Test {

        public Test() {}
       
        public static void test(Class testClass, long[] times) {
                long start = System.nanoTime();
                try {
                        testClass.newInstance();
                } catch (Exception e) {
                        e.printStackTrace();
                }
                long time = System.nanoTime() - start;
                times[0] += time;
               
                start = System.nanoTime();
                new Test();
                time = System.nanoTime() - start;
                times[1] += time;
        }
       
        public static void main(String[] args) {
                try {
                        ByteArrayOutputStream bos = new ByteArrayOutputStream();
                        BufferedInputStream bin = new BufferedInputStream(Test.class.getResourceAsStream("Test.class"));
                        try {
                                int r = bin.read();
                                while(r != -1) {
                                        bos.write(r);
                                        r = bin.read();
                                }
                        } catch(IOException e) {
                                e.printStackTrace();
                                System.exit(1);
                        }
                        byte[] testClassBytes = bos.toByteArray();
                        long[] times = new long[] { 0, 0 };
                        long ntimes = 50000;
                        for(long i = 0; i < ntimes; i++) {
                                Class testClass1 = new URLClassLoader(new URL[] { Test.class.getResource("Test.class") }).loadClass("Test");
                                Method testMethod = testClass1.getMethod("test", Class.class, long[].class);
                               
                                Class testClass2 = new JavaClassLoader().loadClassFromByteArray("Test", testClassBytes);
                                testMethod.invoke(null, testClass2, times);
                        }
                        System.out.println("REFLECTION: " + times[0]);
                        System.out.println("NEW: " + times[1]);
                        System.out.println("% SLOWER: " + (times[0] * 100) / times[1]);
                } catch (Exception e) {
                        e.printStackTrace();
                }
        }
       
}


Alessio Stalla

Re: Improving startup time: sanity check

On Wed, Oct 28, 2009 at 11:34 PM, Alessio Stalla
<[hidden email]> wrote:

> On Wed, Oct 28, 2009 at 11:20 PM, Alessio Stalla
> <[hidden email]> wrote:
>> The test might be wrong
>
> It was, indeed, wrong. It measured class loading time too for
> reflection, but that can't be eliminated, no matter what instantiation
> technique you use.
>
> New results on 50000 iterations:
>
> REFLECTION: 9660480097
> NEW: 88353767
> % SLOWER: 10933
>
> REFLECTION: 9602410291
> NEW: 78459331
> % SLOWER: 12238
>
> REFLECTION: 9925574038
> NEW: 86923015
> % SLOWER: 11418
>
> ~100 times slower is still impressive.

Incidentally, if those figures are right, it means that loading a
class from a byte array is very costly too, and Erik's idea is likely
to improve performance significantly if the application never uses a
relatively large part of Lisp.
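
To put rough numbers on that, the sketch below divides the totals
posted in this thread by the 50000 iterations. It takes the first run
of each test, which is an arbitrary choice, and since the two totals
come from different runs the difference is only a ballpark estimate of
the class definition cost.

```java
// Back-of-the-envelope arithmetic on the totals posted in this thread.
final class PerIterationCost {
    static final long ITERATIONS = 50_000;

    // Average cost of one iteration in nanoseconds (integer division).
    static long perIterationNanos(long totalNanos) {
        return totalNanos / ITERATIONS;
    }

    public static void main(String[] args) {
        long defineAndReflect = 16_262_373_155L; // first test, run 1
        long reflectOnly      = 9_660_480_097L;  // corrected test, run 1
        long plainNew         = 88_353_767L;     // corrected test, run 1

        System.out.println("define + newInstance: "
                + perIterationNanos(defineAndReflect) + " ns");
        System.out.println("newInstance only:     "
                + perIterationNanos(reflectOnly) + " ns");
        // Estimate only: the two totals come from different runs.
        System.out.println("define (difference):  "
                + perIterationNanos(defineAndReflect - reflectOnly) + " ns");
        System.out.println("plain new:            "
                + perIterationNanos(plainNew) + " ns");
    }
}
```

By this estimate, defining the class from the byte array costs on the
order of 130 microseconds per class, the same order of magnitude as
the reflective instantiation itself.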
