Avoiding retpolines with static calls
Indirect calls happen when the address of a function to be called is not known at compile time; instead, that address is stored in a pointer variable and used at run time. These indirect calls, as it turns out, are readily exploited by speculative-execution attacks. Retpolines defeat these attacks by turning an indirect call into a rather more complex (and expensive) code sequence that cannot be executed speculatively.
Retpolines solved the problem, but they also slow down the kernel, so developers have been keenly interested in finding ways to avoid them. A number of approaches have been tried, a few of which were covered here in late 2018. While some of those techniques have been merged, static calls have remained outside of the mainline. They have recently returned in the form of this patch set posted by Peter Zijlstra; it contains the work of others as well, in particular Josh Poimboeuf, who posted the original static-call implementation.
An indirect call works from a location in writable memory where the destination of the jump can be found. Changing the destination of the call is a matter of storing a new address in that location. Static calls, instead, use a location in executable memory containing a jump instruction that points to the target function. Actually executing a static call requires "calling" to this special location, which will immediately jump to the real target. The static-call location is, in other words, a classic code trampoline. Since both jumps are direct — the target address is found directly in the executable code itself — no retpolines are needed and execution is fast.
Static calls must be declared before they can be used; there are two macros that can do that:
    #include <linux/static_call.h>

    DEFINE_STATIC_CALL(name, target);
    DECLARE_STATIC_CALL(name, target);
DEFINE_STATIC_CALL() creates a new static call with the given name that initially points at the function target(). DECLARE_STATIC_CALL(), instead, declares the existence of a static call that is defined elsewhere; in that case, target() is only used for type checking the calls.
Actually calling a static call is done with:
    static_call(name)(args...);

where name is the name used to define the call. This will cause a jump through the trampoline to the target function; if that function returns a value, static_call() will also return that value.
The target of a static call can be changed with:
    static_call_update(name, target2);

where target2() is the new target for the static call. Changing the target of a static call requires patching the code of the running kernel, which is an expensive operation. That implies that static calls are only appropriate for settings where the target will change rarely.
One such setting can be found in the patch set: tracepoints. Activating a tracepoint itself requires code patching. Once that is done, the kernel responds to a hit on a tracepoint by iterating through a linked list of callback functions that have been attached there. In almost every case, though, there will only be one such function. This patch in the series optimizes that case by using a static call for the single-function case. Since the intent behind tracepoints is to minimize their overhead to the greatest extent possible, use of static calls makes sense there.
This patch set also contains a further optimization not found in the original. Jumping through the trampoline is much faster than using a retpoline, but it is still one more jump than is strictly necessary. So this patch causes static calls to store the target address directly into the call site(s), eliminating the need for the trampoline entirely. Doing so may require changing multiple call sites, but most static calls are unlikely to have many of those. It also requires support in the objtool tool to locate those call sites during the kernel build process.
The end result of this work appears to be a significant reduction in the cost of the Spectre mitigations when using tracepoints — a slowdown of just over 4% drops to about 1.6%. It has been through a number of revisions, as well as some improvements to the underlying text-patching code, and appears to be about ready. Chances are that static calls will go upstream in the near future.
Posted Mar 27, 2020 4:53 UTC (Fri)
by alison (subscriber, #63752)
Posted Mar 27, 2020 6:50 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
This won't work if you make calls, for example, into a filesystem driver from the VFS layer.
Posted Mar 27, 2020 15:21 UTC (Fri)
by alison (subscriber, #63752)
In other words, if the calling code wants one function from library A and another from library B, we just change the code.
> This won't work if you make calls, for example, into a filesystem driver from the VFS layer.
That's pretty much the case I was considering: resolution of a base class pointer. In your example, the VFS has a default implementation of whatever action, and individual filesystems may override it. Compiler Explorer shows explicitly that the base class pointer resolution is in fact an indirect branch. AFAICT, this new mechanism offers no hope to eliminate this branch. Some type of "reflection" would be needed to resolve the problem, perhaps with _THIS_IP_ and _RET_IP_ in include/linux/kernel.h.
Posted Mar 27, 2020 17:48 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
> That's pretty much the case I was considering: resolution of a base class pointer.
This actually can work faster than indirects if you only have a couple of implementations (which is normally the case).
Posted Mar 29, 2020 10:07 UTC (Sun)
by Homer512 (subscriber, #85295)
Posted Mar 29, 2020 14:28 UTC (Sun)
by anton (subscriber, #25547)
Inline caching corresponds pretty much with the "further optimization" of the article: it works with an inline call, and guards it with an inline check that the object has the correct class (in VFS, that the file has the correct file system type); it was useful if the class rarely changes for a call site.
Polymorphic inline caching has several inline calls, selected by several inline checks. It was useful if a call site gets passed objects of a few classes alternatingly, and may be appropriate for the VFS case.
These techniques fell out of fashion when correctly predicted indirect branches became as fast as direct branches and indirect-branch prediction became pretty good. Now, with Spectre they may see a renaissance (hopefully a short one curtailed by the introduction of properly fixed hardware).
Posted Mar 29, 2020 18:30 UTC (Sun)
by NYKevin (subscriber, #129325)
As I currently understand things, this is at best optimistic.
Spectre is not a simple hardware flaw that you can fix by rearranging a few transistors. It is a design flaw in speculative execution itself. Spectre has been endemic to every general-purpose microprocessor that has been manufactured in the past twenty or more years (and probably most special-purpose microprocessors too). To "fix" it, you would have to completely redesign, not just one or two architectures, but the entire concept of what a modern microprocessor looks like.
I am open to being corrected, of course. It's entirely possible that some manufacturers have gone down that "redesign everything from the ground up" road that I described, or perhaps they found a simpler path forward. But my current understanding is that workarounds, usually at higher layers of the system, are all we've got right now and for the foreseeable future.
Posted Mar 29, 2020 18:47 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
Posted Mar 29, 2020 20:56 UTC (Sun)
by anton (subscriber, #25547)
Another avenue is to make it hard to train the predictor for a Spectre v2 exploit, e.g., by scrambling the input and/or the output of the predictor with a per-thread secret that changes now and then. You can combine this avenue with the other one.
Posted Mar 29, 2020 21:01 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
Posted Mar 30, 2020 1:52 UTC (Mon)
by NYKevin (subscriber, #129325)
The problem is that performance is a form of observable behavior. So you have to completely disentangle performance variations from every piece of data in main memory, cache, and registers, or else you have to exhaustively prove that the code under execution has access to that data, in every security model that the system cares about. The latter would extend to, for example, determining that the code under execution lives inside of a Javascript sandbox and should not be allowed to access the state of the rest of the web browser. I don't see how the processor can be reasonably expected to do that, so we're back to the first option: The processor's performance must not vary in response to any data at all. But then you can't have a branch predictor.
I would really like someone to convince me that I'm wrong, because if true, this is rather terrifying. But as far as I can see, there is no catch-all mitigation for this problem.
Posted Mar 30, 2020 2:32 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
Posted Apr 2, 2020 8:35 UTC (Thu)
by wahern (subscriber, #37304)
I'd be surprised if the bulk of instances were package isolated. The Xeon Platinum 8175M used for M5 instances has 24 cores per package, 48 threads, and therefore 48 vCPUs. AFAIU, EC2 doesn't share cores, but a vCPU is still the equivalent of a logical SMT-based core, so any instance type using less than 47 vCPUs would be leaving at least an entire core unused. AWS offers 2-, 8-, 16-, 32-, 48-, 64-, and 96-vCPU M5 instances. I'd bet a large number and probably a majority of customers are utilizing 2- to 32-vCPU instances, that they invariably share packages, and thus share L3 cache. And I'd also bet that 64-vCPU instances share one of their packages with other instances.
Posted Apr 2, 2020 16:37 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
> I'd keep in mind for high-value assets absent details from AWS.
Posted Mar 30, 2020 9:50 UTC (Mon)
by anton (subscriber, #25547)
If you refer to the scrambling of the branch predictor inputs and outputs, note that the secret changes regularly (with an interval designed such that only a small part of the secret can be extracted).
Concerning attacks from within the same thread (as in JavaScript engines), assuming good-enough scrambling, the attacker cannot predict which other branches a branch is aliased with in the branch predictor, and also cannot get the prediction to branch to the gadget the attacker wants to be speculatively executed.
Concerning observable behaviour, lots of things are theoretically observable (e.g., through power consumption or electromagnetic emissions), and have been used practically on other components (e.g., emissions from CRTs or screen cables to watch what you are watching), but the danger is that you get into a "the sky is falling" mode of thinking that does not differentiate how realistically possible an attack is. I have seen hardware and software providers blamed for waiting for the demonstration of an attack before they start acting, but OTOH, if they acted on any far-fetched theoretical weakness (not even attack scenario) that somebody worries about, they would not have a product.
Posted Apr 2, 2020 9:32 UTC (Thu)
by ras (subscriber, #33059)
The more general problem is privilege separation. This is just a case of lower-privilege code being able to deduce what higher-privilege code left in its cache. I wonder how long it will be before someone figures out how to exploit spectre via a javascript JIT, or wasm. The only solution is hardware protecting the browser memory from the javascript, but the hardware doesn't provide anything lightweight enough. If we did have something lightweight enough micro kernels would be more viable.
Posted Apr 2, 2020 12:04 UTC (Thu)
by anton (subscriber, #25547)
If the access becomes non-speculative (the vast majority of cases in most code), the caches will be updated a number of cycles later. The performance impact of that is minimal.
If the access is on a mispredicted path, the caches will not be updated. If the line is then not accessed before it would be evicted, this is a win. If the line is accessed, we incur the cost of another access to an outer level or main memory. A somewhat common case is when you have a short if, and then a load after it has ended; if the if is mispredicted, and the load misses the cache, the speculative execution will first perform an outer access, and then it will be reexecuted on the corrected path and will perform the outer access again; the cost here is the smaller of the cache miss cost and the branch misprediction cost. And the combination is relatively rare.
Overall, not updating caches on speculative memory accesses reduces the usefulness of caches by very little, if at all (there are also cases where it increases the usefulness of caches).
A Spectre-based exploit using JavaScript has been published pretty early. No, MMU-style memory protection is not the only solution, Spectre-safe hardware as outlined above would also be a solution.
Posted Apr 2, 2020 11:58 UTC (Thu)
by davecb (subscriber, #1574)
I suspect that the cpu family used in the T5 from (the late, lamented) Sun Microsystems dodged this: its claim to fame was running other, non-blocked code whenever a thread was blocked on a memory fetch, as described in https://en.wikipedia.org/wiki/UltraSPARC_T1
Posted Apr 2, 2020 12:27 UTC (Thu)
by anton (subscriber, #25547)
The UltraSPARC T1 through SPARC T3 are in-order, but the SPARC T4, T5, M7, and M8 are OoO. Performing other threads on a cache miss does not protect against Spectre: the cache or other microarchitectural state can still be updated from a speculative load.
Recent in-order CPUs include Cortex A53 and A55, and Bonnell, the first generation of Intel's Atom line.
Posted Apr 2, 2020 12:47 UTC (Thu)
by davecb (subscriber, #1574)
Interestingly, the kind of side-channel attack that Spectre and friends are variants of was studied back in the mainframe days, well before anyone got around to inventing them (;-))
--dave
Posted Apr 2, 2020 13:27 UTC (Thu)
by farnz (subscriber, #17727)
Note, though, that even in-order CPUs can be affected. The core of Spectre family attacks is that the CPU speculatively executes code that changes micro-architectural state, and thus any CPU with speculation can be affected. There exist demonstrations that suggest that the Cortex A53, for example, is partially affected by Spectre, but the resulting side channel is too limited to be practical with the currently known exploits.
In theory, even a branch predictor can carry enough state to be attacked!
Posted Apr 2, 2020 15:12 UTC (Thu)
by davecb (subscriber, #1574)
Posted Apr 2, 2020 15:37 UTC (Thu)
by anton (subscriber, #25547)
A Spectre-style attack consists of a load of the secret, and some action that provides a side channel, with a data flow from the load to the action (otherwise you don't side-channel the secret). An OoO processor can do that, but an in-order processor as outlined above cannot AFAICS. If you can provide a reference to the A53 demonstration, I would be grateful.
Posted Apr 2, 2020 18:18 UTC (Thu)
by farnz (subscriber, #17727)
Even something as apparently trivial as using the branch predictor to decide which instructions to fetch is enough to make a timing difference that Spectre-type exploits can predict. "All" you need to do is ensure that the branch predictor correctly predicts for a single value of the secret and mispredicts for all others (or vice-versa), and that I have a timer that lets me determine whether or not the branch predictor correctly predicted (made easier if I can control L1I$ contents), and I have a side-channel. Not a very good side channel, but enough that I can extract secrets slowly.
And there's no point having a branch predictor if you then insert a pipeline bubble that means that there's no timing difference between a correct prediction and a mispredict; it's that timing difference that creates the side-channel, though, so you need to somehow prevent the attacker from measuring a timing difference there to be completely Spectre-free.
The demonstration was something in-house, using exactly the components I've described above; it was a slow enough channel to be effectively useless, but it existed.
Posted Apr 3, 2020 7:59 UTC (Fri)
by anton (subscriber, #25547)
But for a Spectre-type attack the branch predictor needs to be updated based on the speculatively loaded value. This does not happen in in-order processors (because the execution does not get that far), and it will not happen in a Spectre-resistant OoO CPU (e.g., by only updating the branch predictor based on committed branches).
Note that not all side-channel attacks are Spectre attacks. We can prevent some side channels through hardware design. E.g., there are Rowhammer-immune DRAMs; I also dimly remember some CPU-resource side channel (IIRC a limitation on the cache ports) that was present in one generation of Intel CPUs, but not in the next generation, which had more resources. Closing other side channels in the non-speculative case appears to be too expensive, e.g., cache and branch predictor side channels, as you explain.
There are established software mitigations for protecting critical secrets (branchless code, data-independent memory accesses for code that handles these secrets) such as encryption keys from non-Spectre/Meltdown side channels. But these do not help against Spectre, because Spectre can use all code in the process to extract the secret.
From your description, you read about a non-Spectre side-channel attack through the branch predictor of the A53.
Posted Apr 3, 2020 14:30 UTC (Fri)
by excors (subscriber, #95769)
I don't see why an in-order processor couldn't do that, in principle. They're still going to do branch prediction, and speculatively push some instructions from the predicted target into the start of the pipeline. Once they detect a mispredict they'll flush the pipeline and start again with the correct target. As long as the flush happens before the mispredicted instructions reach a pipeline stage that writes to memory or to registers, that should be safe (ignoring Spectre). But if the mispredicted instructions read from a memory address that depends on a register (which may contain secret data), and the read modifies the TLB state or cache state, then it would be vulnerable to Spectre-like attacks.
(I'm thinking of an artificial case like "insecure_debug_mode = 0; int x = secret; if (insecure_debug_mode) x = array[x];", where mispredicting the 'if' means the very next instruction will leak the secret data via the cache. It doesn't need a long sequence of speculative instructions. That could be a case where the programmer has carefully analysed the expected code path to avoid data-dependent memory accesses etc so their code is safe from side-channels attacks according to the advertised behaviour of the processor, but the processor is speculatively executing instructions that the programmer did not expect to be executed, so I think that counts as a Spectre vulnerability. As I see it, the fundamental issue with Spectre is that it doesn't matter how careful you are about side-channels when writing code, because the processor can (speculatively) execute an arbitrarily different piece of code that is almost impossible for you to analyse.)
In practice, in-order processors seem to typically have short enough pipelines that the misprediction is detected within a few cycles, before the mispredicted instructions have got far enough through the pipeline to have any side effects. But that seems more by luck than by design, and maybe some aren't quite lucky enough. OoO processors aren't more dangerous because they're OoO, they're more dangerous because they typically have much longer pipelines and mispredicted instructions can progress far enough to have many side effects.
Posted Apr 3, 2020 16:10 UTC (Fri)
by anton (subscriber, #25547)
The length of the pipeline of an OoO processor is not the decisive factor. Unlike an in-order processor, speculatively executed instructions on an OoO do not just enter the pipeline, they produce results that other speculatively executed instructions can use as inputs. Essentially the in-order front end (instruction fetch and decoding) is decoupled from the in-order commit part (where the results become architecturally visible) by the OoO engine, and the front end can run far ahead (by hundreds of instructions and many cycles) of the commit part.
Posted Mar 30, 2020 9:32 UTC (Mon)
by pbonzini (subscriber, #60935)
> These techniques fell out of fashion when correctly predicted indirect branches became as fast as direct branches and indirect-branch prediction became pretty good. Now, with Spectre they may see a renaissance (hopefully a short one curtailed by the introduction of properly fixed hardware).
Inline caching and polymorphic inline caching are not only useful to avoid indirect branches, but also (and especially) to avoid method lookups in dynamic languages. The code for an inline cache would be more like this
Posted Mar 30, 2020 17:35 UTC (Mon)
by anton (subscriber, #25547)
But why did they patch the code rather than using separate variables as your C code does? The paper states
Posted Mar 29, 2020 0:39 UTC (Sun)
by ndesaulniers (subscriber, #110768)
Posted Mar 27, 2020 7:36 UTC (Fri)
by josh (subscriber, #17465)
(This would also allow eliminating a type of callback entirely if no non-NULL function is ever stored in that function pointer. That could optimize various kinds of function pointers that don't always get used in a given struct of pointers.)
Posted Mar 27, 2020 9:07 UTC (Fri)
by epa (subscriber, #39769)
Posted Mar 27, 2020 9:31 UTC (Fri)
by gus3 (guest, #61103)
Posted Mar 27, 2020 9:39 UTC (Fri)
by epa (subscriber, #39769)
Posted Mar 28, 2020 10:41 UTC (Sat)
by ncm (guest, #165)
Fetishism around object-oriented conventions is worse than pointless. Virtual calling is a language mechanism that has various uses, implementing polymorphic runtime binding being one among them.
Sometimes virtual calls perform better than calling through a function pointer, because there is less choice. But sometimes the function pointer call is faster, particularly if the pointer rarely or never changes. On modern cores, both are hardly slower than a direct call except in pathological cases.
A virtual call, by offering fewer choices for what could happen, is often easier for a reader to understand than a call through a function pointer. So, it is often a good idea to cobble up a base class just to provide a virtual call, even where there is logically no meaningful object involved. Having cobbled one up, though, it often becomes useful to park things there. Having parked things there, it may be worth formalizing a class with invariants. Or it might not.
Posted Mar 28, 2020 19:12 UTC (Sat)
by nivedita76 (subscriber, #121790)
Posted Mar 29, 2020 3:18 UTC (Sun)
by ncm (guest, #165)
But of course C++ code is hard to get into the kernel. Linus's deliberate choice to use C++ keywords in kernel headers is one barrier to overcome.
Posted Mar 29, 2020 4:06 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
Posted Mar 29, 2020 10:41 UTC (Sun)
by ballombe (subscriber, #9523)
Posted Mar 29, 2020 12:28 UTC (Sun)
by mpr22 (subscriber, #60784)
Posted Apr 3, 2020 18:07 UTC (Fri)
by adobriyan (subscriber, #30858)
let foo: u32 = 0;
Posted Apr 4, 2020 17:18 UTC (Sat)
by nix (subscriber, #2304)
Posted Apr 6, 2020 15:03 UTC (Mon)
by mathstuf (subscriber, #69389)
Posted Mar 29, 2020 15:11 UTC (Sun)
by nivedita76 (subscriber, #121790)
Posted Mar 29, 2020 15:07 UTC (Sun)
by nivedita76 (subscriber, #121790)
Avoiding retpolines with static calls
Yep.
This can also be done, but not with this approach. Basically, you rewrite indirect calls into "if" chains:
    if (this->class == ClassA) return ClassA_Method(this, ...);
    if (this->class == ClassB) return ClassB_Method(this, ...);
Avoiding retpolines with static calls
Actually the literature on implementing object-oriented languages discusses optimizations called "inline caching" and "polymorphic inline caching", which are targeted at exactly this problem.
Avoiding retpolines with static calls
Yes. E.g., do not update the caches from speculative loads. Instead, keep the load results in a private buffer and only update the caches if the loads become non-speculative, or when they are committed.
Avoiding retpolines with static calls
Pretty much all major cloud computing providers assign CPUs exclusively to customers, except maybe for the cheapest offerings (like T2/T3 instances on AWS).
Avoiding retpolines with static calls
All major cloud providers also have dedicated instances that won't be shared across customers. In case of AWS you can even run your own T2/T3 instances on top of the dedicated hardware nodes.
It's not clear what attack scenario you have in mind. Can you give an example?
Avoiding retpolines with static calls
The question is how much a proper hardware mitigation costs compared to the current mitigations. So what are the costs of not updating caches on speculative memory accesses?
Avoiding retpolines with static calls
Generally, in-order implementations are not affected, and existing out-of-order implementations are affected; not because it's not possible to make Spectre-immune OoO implementations, but because they did not think about how to do it when they designed these CPUs (Spectre was discovered later).
Avoiding retpolines with static calls
Yes, in-order CPUs perform branch prediction and use it to run the first predicted instructions through the early stages of the pipeline, but normally they do not do anything that depends on the result of one of the predicted instructions; they would need some kind of register-renaming functionality for that, and if, as a hardware designer, you go there, you can just as well do a full OoO design.
Avoiding retpolines with static calls
Yes, the branch predictor can be used as the side channel; in a Spectre-type attack the action part would then be a branch.
Avoiding retpolines with static calls
So you are thinking about an architecturally visible (i.e., non-speculative) secret, and using speculation only for the side channel. One difference from the classical Spectre attacks is that, like for non-speculative side-channel attacks, software developers only need to inspect the code that deals with the secret and avoid such ifs there; but yes, it's an additional rule they have to observe in such code.
Avoiding retpolines with static calls
    if (class != inline_cache->last_class) {
        inline_cache->last_class = class;
        inline_cache->last_method = hashtable_lookup(class, method_name);
    }
    inline_cache->last_method(...);

where possibly inline_cache->last_method is not an indirect call but a patched (self-modifying) direct call.
Yes, inline caching was originally introduced in a Smalltalk implementation, which had a hash-table-based cache as an intermediate-speed path and the slow path was not even a hash table, but a linear search through the chain of superclasses (I don't remember what was used for looking up methods within a class). I think that one reason for that was that the RAM sizes of the time discouraged speeding up the method lookup with complete hash tables or full class*methods matrices.
Avoiding retpolines with static calls
Note that this is a kind of dynamic code modification, which is generally condemned in modern practice. The n[ative]-method address can just as well be placed out-of-line and accessed indirectly; code modification is more efficient, and we are using it in a well-confined way.
Why was it more efficient? Apart from the relative speed of direct vs. indirect calls at the hardware of the time there were the following factors:
One other difference from the C code you posted is that the actual check for the correct class was in the called method:
The entry code of an n-code method checks the stored receiver class from the point of call against the actual receiver class. If they do not match, relinking must occur, just as if the call had not yet been linked.
Avoiding retpolines with static calls
Stroustrup's deliberate choice to break C compatibility by adding new keywords to C++ is one barrier to overcome.