123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200 |
- ===============================================================
- libbcc: A Versatile Bitcode Execution Engine for Mobile Devices
- ===============================================================
- Introduction
- ------------
- libbcc is an LLVM bitcode execution engine that compiles the bitcode
- to an in-memory executable. libbcc is versatile because:
- * it implements both AOT (Ahead-of-Time) and JIT (Just-in-Time)
- compilation.
- * Android devices demand fast start-up time, small size, and high
- performance *at the same time*. libbcc attempts to address these
- design constraints.
- * it supports on-device linking. Each device vendor can supply his or
- her own runtime bitcode library (lib*.bc) that differentiates his or
- her system. Specialization becomes ecosystem-friendly.
- libbcc provides:
- * a *just-in-time bitcode compiler*, which translates the LLVM bitcode
- into machine code
- * a *caching mechanism*, which can:
- * after each compilation, serialize the in-memory executable into a
- cache file. Note that the compilation is triggered by a cache
- miss.
- * load from the cache file upon cache-hit.
- Highlights of libbcc are:
- * libbcc supports bitcode from various language frontends, such as
- Renderscript, GLSL (pixelflinger2).
- * libbcc strives to balance between library size, launch time and
- steady-state performance:
- * The size of libbcc is aggressively reduced for mobile devices. We
- customize and improve upon the default Execution Engine from
- upstream. Otherwise, libbcc's execution engine can easily become
- at least 2 times bigger.
- * To reduce launch time, we support caching of
- binaries. Just-in-Time compilation are oftentimes Just-too-Late,
- if the given apps are performance-sensitive. Thus, we implemented
- AOT to get the best of both worlds: Fast launch time and high
- steady-state performance.
- AOT is also important for projects such as NDK on LLVM with
- portability enhancement. Launch time reduction after we
- implemented AOT is signficant::
- Apps libbcc without AOT libbcc with AOT
- launch time in libbcc launch time in libbcc
- App_1 1218ms 9ms
- App_2 842ms 4ms
- Wallpaper:
- MagicSmoke 182ms 3ms
- Halo 127ms 3ms
- Balls 149ms 3ms
- SceneGraph 146ms 90ms
- Model 104ms 4ms
- Fountain 57ms 3ms
- AOT also masks the launching time overhead of on-device linking
- and helps it become reality.
- * For steady-state performance, we enable VFP3 and aggressive
- optimizations.
- * Currently we disable Lazy JITting.
- API
- ---
- **Basic:**
- * **bccCreateScript** - Create new bcc script
- * **bccRegisterSymbolCallback** - Register the callback function for external
- symbol lookup
- * **bccReadBC** - Set the source bitcode for compilation
- * **bccReadModule** - Set the llvm::Module for compilation
- * **bccLinkBC** - Set the library bitcode for linking
- * **bccPrepareExecutable** - *deprecated* - Use bccPrepareExecutableEx instead
- * **bccPrepareExecutableEx** - Create the in-memory executable by either
- just-in-time compilation or cache loading
- * **bccGetFuncAddr** - Get the entry address of the function
- * **bccDisposeScript** - Destroy bcc script and release the resources
- * **bccGetError** - *deprecated* - Don't use this
- **Reflection:**
- * **bccGetExportVarCount** - Get the count of exported variables
- * **bccGetExportVarList** - Get the addresses of exported variables
- * **bccGetExportFuncCount** - Get the count of exported functions
- * **bccGetExportFuncList** - Get the addresses of exported functions
- * **bccGetPragmaCount** - Get the count of pragmas
- * **bccGetPragmaList** - Get the pragmas
- **Debug:**
- * **bccGetFuncCount** - Get the count of functions (including non-exported)
- * **bccGetFuncInfoList** - Get the function information (name, base, size)
- Cache File Format
- -----------------
- A cache file (denoted as \*.oBCC) for libbcc consists of several sections:
- header, string pool, dependencies table, relocation table, exported
- variable list, exported function list, pragma list, function information
- table, and bcc context. Every section should be aligned to a word size.
- Here is the brief description of each sections:
- * **Header** (MCO_Header) - The header of a cache file. It contains the
- magic word, version, machine integer type information (the endianness,
- the size of off_t, size_t, and ptr_t), and the size
- and offset of other sections. The header section is guaranteed
- to be at the beginning of the cache file.
- * **String Pool** (MCO_StringPool) - A collection of serialized variable
- length strings. The strp_index in the other part of the cache file
- represents the index of such string in this string pool.
- * **Dependencies Table** (MCO_DependencyTable) - The dependencies table.
- This table stores the resource name (or file path), the resource
- type (rather in APK or on the file system), and the SHA1 checksum.
- * **Relocation Table** (MCO_RelocationTable) - *not enabled*
- * **Exported Variable List** (MCO_ExportVarList) -
- The list of the addresses of exported variables.
- * **Exported Function List** (MCO_ExportFuncList) -
- The list of the addresses of exported functions.
- * **Pragma List** (MCO_PragmaList) - The list of pragma key-value pair.
- * **Function Information Table** (MCO_FuncTable) - This is a table of
- function information, such as function name, function entry address,
- and function binary size. Besides, the table should be ordered by
- function name.
- * **Context** - The context of the in-memory executable, including
- the code and the data. The offset of context should aligned to
- a page size, so that we can mmap the context directly into memory.
- For furthur information, you may read `bcc_cache.h <include/bcc/bcc_cache.h>`_,
- `CacheReader.cpp <lib/bcc/CacheReader.cpp>`_, and
- `CacheWriter.cpp <lib/bcc/CacheWriter.cpp>`_ for details.
- JIT'ed Code Calling Conventions
- -------------------------------
- 1. Calls from Execution Environment or from/to within script:
- On ARM, the first 4 arguments will go into r0, r1, r2, and r3, in that order.
- The remaining (if any) will go through stack.
- For ext_vec_types such as float2, a set of registers will be used. In the case
- of float2, a register pair will be used. Specifically, if float2 is the first
- argument in the function prototype, float2.x will go into r0, and float2.y,
- r1.
- Note: stack will be aligned to the coarsest-grained argument. In the case of
- float2 above as an argument, parameter stack will be aligned to an 8-byte
- boundary (if the sizes of other arguments are no greater than 8.)
- 2. Calls from/to a separate compilation unit: (E.g., calls to Execution
- Environment if those runtime library callees are not compiled using LLVM.)
- On ARM, we use hardfp. Note that double will be placed in a register pair.
|