|  | Commit message (Collapse) | Author | Age | Files | Lines | 
|---|
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The CatchObjOffset is relative to the end of the EH registration node
for 32-bit x86 WinEH targets.  A special sentinel value, 0, is used to
indicate that no catch object should be initialized.
This means that a catch object allocated immediately before the
registration node would be assigned a CatchObjOffset of 0, leading the
runtime to believe that a catch object should not be initialized.
To handle this, allocate the registration node prior to any other frame
object.  This will ensure that catch objects will not be allocated
before the registration node.
This fixes PR26757.
Differential Revision: http://reviews.llvm.org/D17689
llvm-svn: 262294 | 
| | 
| 
| 
| 
| 
| 
| 
| | locality and code size from SP/FP offset encoding.
Differential Revision: http://reviews.llvm.org/D15393
llvm-svn: 260917 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | allocation
Reviewers: mcrosier
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16733
llvm-svn: 259367 | 
| | 
| 
| 
| | llvm-svn: 257804 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | For CoreCLR on Windows, stack probes must be emitted as inline sequences that probe successive stack pages
between the current stack limit and the desired new stack pointer location. This implements support for
the inline expansion on x64.
For in-body alloca probes, expansion is done during instruction lowering. For prolog probes, a stub call
is initially emitted during prolog creation, and expanded after epilog generation, to avoid complications
that arise when introducing new machine basic blocks during prolog and epilog creation.
Added a new test case, modified an existing one to exclude non-x64 coreclr (for now).
Add test case
Fix tests
llvm-svn: 252578 | 
| | 
| 
| 
| | llvm-svn: 249903 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The __CxxFrameHandler3 tables for 32-bit are supposed to hold stack
offsets relative to EBP, not ESP. I blindly updated the win-catchpad.ll
test case, and immediately noticed that 32-bit catching stopped working.
While I'm at it, move the frame index to frame offset WinEH table logic
out of PEI.  PEI shouldn't have to know about WinEHFuncInfo. I realized
we can calculate frame index offsets just fine from the table printer.
llvm-svn: 249618 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | There was an off-by-one bug in ip2state tables which manifested when one
call immediately preceded the try-range of the next. The return address
of the previous call would appear to be within the try range of the next
scope, resulting in extra destructors or catches running.
We also computed the wrong offset for catch parameter stack objects. The
offset should be from RSP, not from RBP.
llvm-svn: 249578 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | HHVM calling convention, hhvmcc, is used by HHVM JIT for
functions in translated cache. We currently support LLVM back end to
generate code for X86-64 and may support other architectures in the
future.
In HHVM calling convention any GP register could be used to pass and
return values, with the exception of R12 which is reserved for
thread-local area and is callee-saved. Other than R12, we always
pass RBX and RBP as args, which are our virtual machine's stack pointer
and frame pointer respectively.
When we enter translation cache via hhvmcc function, we expect
the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed
to standard ABI alignment. This affects stack object alignment and stack
adjustments for function calls.
One extra calling convention, hhvm_ccc, is used to call C++ helpers from
HHVM's translation cache. It is almost identical to standard C calling
convention with an exception of first argument which is passed in RBP
(before we use RDI, RSI, etc.)
Differential Revision: http://reviews.llvm.org/D12681
llvm-svn: 248832 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The algorithm would not modify the live-in list of blocks below the save
block point which is correct unless it happens to be a restore point at
the same time.
Also fixes the benign issue of live-in registers being added twice in
some cases.
The testcase is based on a test submitted by Kit Barton.
Differential Revision: http://reviews.llvm.org/D13176
llvm-svn: 248620 | 
| | 
| 
| 
| | llvm-svn: 248617 | 
| | 
| 
| 
| 
| 
| | Differential Revision: http://reviews.llvm.org/D12924
llvm-svn: 248084 | 
| | 
| 
| 
| | llvm-svn: 247901 | 
| | 
| 
| 
| 
| 
| | It never really worked, and the new code is working better every day.
llvm-svn: 247860 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | Clang now passes the adjectives as an argument to catchpad.
Getting the CatchObj working is simply a matter of threading another
static alloca through codegen, first as an alloca, then as a frame
index, and finally as a frame offset.
llvm-svn: 247844 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary: This fixes a variety of typos in docs, code and headers.
Subscribers: jholewinski, sanjoy, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D12626
llvm-svn: 247495 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
32-bit funclets have short prologues that allocate enough stack for the
largest call in the whole function. The runtime saves CSRs for the
funclet. It doesn't restore CSRs after we finally transfer control back
to the parent funciton via a CATCHRET, but that's a separate issue.
32-bit funclets also have to adjust the incoming EBP value, which is
what llvm.x86.seh.recoverframe does in the old model.
64-bit funclets need to spill CSRs as normal. For simplicity, this just
spills the same set of CSRs as the parent function, rather than trying
to compute different CSR sets for the parent function and each funclet.
64-bit funclets also allocate enough stack space for the largest
outgoing call frame, like 32-bit.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12546
llvm-svn: 247092 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | We have a detailed def/use lists for every physical register in
MachineRegisterInfo anyway, so there is little use in maintaining an
additional bitset of which ones are used.
Removing it frees us from extra book keeping. This simplifies
VirtRegMap.
Differential Revision: http://reviews.llvm.org/D10911
llvm-svn: 242173 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This changes TargetFrameLowering::processFunctionBeforeCalleeSavedScan():
- Rename the function to determineCalleeSaves()
- Pass a bitset of callee saved registers by reference, thus avoiding
  the function-global PhysRegUsed bitset in MachineRegisterInfo.
- Without PhysRegUsed the implementation is fine tuned to not save
  physcial registers which are only read but never modified.
Related to rdar://21539507
Differential Revision: http://reviews.llvm.org/D10909
llvm-svn: 242165 | 
| | 
| 
| 
| 
| 
| 
| | Although this is not incorrect to insert such code, it is useless
and it hurts the binary size.
llvm-svn: 241946 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | This was previously returning int. However there are no negative opcode
numbers and more importantly this was needlessly different from
MCInstrDesc::getOpcode() (which even is the value returned here) and
SDValue::getOpcode()/SDNode::getOpcode().
llvm-svn: 237611 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.
As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.
** Context **
Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.
** Motivating example **
Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b)  {
 %tmp = alloca i32, align 4
 %tmp2 = icmp slt i32 %a, %b
 br i1 %tmp2, label %true, label %false
true:
 store i32 %a, i32* %tmp, align 4
 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
 br label %false
false:
 %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
 ret i32 %tmp.0
}
On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f:                                     ; @f
; BB#0:
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
LBB0_2:                                 ; %false
  mov  sp, x29
  ldp x29, x30, [sp], #16
  ret
With shrink-wrapping we could generate:
_f:                                     ; @f
; BB#0:
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
  add sp, x29, #16            ; =16
  ldp x29, x30, [sp], #16
LBB0_2:                                 ; %false
  ret
Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.
** Proposed Solution **
This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.
Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.
The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.
Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.
** Design Decisions **
1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
  pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.
Differential Revision: http://reviews.llvm.org/D9210
<rdar://problem/3201744>
llvm-svn: 236507 | 
| | 
| 
| 
| 
| 
| 
| 
| | Generate tables in the .xdata section representing what actions to take
when an exception is thrown.  This currently fills in state for
cleanups, catch handlers are still unfinished.
llvm-svn: 233636 | 
| | 
| 
| 
| | llvm-svn: 232722 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | Turns out it's pretty straightforward and simplifies the implementation.
Reviewers: andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D8051
llvm-svn: 231386 | 
| | 
| 
| 
| 
| 
| 
| | There is no need to open-code the alignment calculation, we have a
handy RoundUpToAlignment function which "Does The Right Thing (TM)".
llvm-svn: 230392 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a 
reserved call frame), and perform rudimentary call folding. It still doesn't 
have a heuristic, so it is enabled only for optsize/minsize, with stack 
alignment <= 8, where it ought to be a fairly clear win.
(Re-commit of r227728)
Differential Revision: http://reviews.llvm.org/D6789
llvm-svn: 227752 | 
| | 
| 
| 
| | llvm-svn: 227746 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a 
reserved call frame), and perform rudimentary call folding. It still doesn't 
have a heuristic, so it is enabled only for optsize/minsize, with stack 
alignment <= 8, where it ought to be a fairly clear win.
Differential Revision: http://reviews.llvm.org/D6789
llvm-svn: 227728 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | These intrinsics allow multiple functions to share a single stack
allocation from one function's call frame. The function with the
allocation may only perform one allocation, and it must be in the entry
block.
Functions accessing the allocation call llvm.recoverframeallocation with
the function whose frame they are accessing and a frame pointer from an
active call frame of that function.
These intrinsics are very difficult to inline correctly, so the
intention is that they be introduced rarely, or at least very late
during EH preparation.
Reviewers: echristo, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D6493
llvm-svn: 225746 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | PEI tries to keep track of how much starting or ending a call sequence adjusts the stack pointer by, so that it can resolve frame-index references. Currently, it takes a very simplistic view of how SP adjustments are done - both FrameStartOpcode and FrameDestroyOpcode adjust it exactly by the amount written in its first argument.
This view is in fact incorrect for some targets (e.g. due to stack re-alignment, or because it may want to adjust the stack pointer in multiple steps). However, that doesn't cause breakage, because most targets (the only in-tree exception appears to be 32-bit ARM) rely on being able to simplify the call frame pseudo-instructions earlier, so this code is never hit. 
Moving the computation into TargetInstrInfo allows targets to override the way the adjustment is computed if they need to have a non-zero SPAdj.
Differential Revision: http://reviews.llvm.org/D6863
llvm-svn: 225437 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | x86-64 Backend
This is the second patch in a small series.  This patch contains the MachineInstruction and x86-64 backend pieces required to lower Statepoints.  It does not include the code to actually generate the STATEPOINT machine instruction and as a result, the entire patch is currently dead code.  I will be submitting the SelectionDAG parts within the next 24-48 hours.  Since those pieces are by far the most complicated, I wanted to minimize the size of that patch.  That patch will include the tests which exercise the functionality in this patch.  The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683.
The STATEPOINT psuedo node is generated after all gc values are explicitly spilled to stack slots.  The purpose of this node is to wrap an actual call instruction while recording the spill locations of the meta arguments used for garbage collection and other purposes.  The STATEPOINT is modeled as modifing all of those locations to prevent backend optimizations from forwarding the value from before the STATEPOINT to after the STATEPOINT.  (Doing so would break relocation semantics for collectors which wish to relocate roots.)
The implementation of STATEPOINT is closely modeled on PATCHPOINT.  Eventually, much of the code in this patch will be removed.  The long term plan is to merge the functionality provided by statepoints and patchpoints.  Merging their implementations in the backend is likely to be a good starting point.
Reviewed by: atrick, ributzka
llvm-svn: 223085 | 
| | 
| 
| 
| 
| 
| | MachineFunction rather than TargetMachine.
llvm-svn: 219670 | 
| | 
| 
| 
| | llvm-svn: 216351 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.
Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.
llvm-svn: 214838 | 
| | 
| 
| 
| 
| 
| 
| 
| | to use register units instead of registers.
reviewed by Jakob Stoklund Olesen.
llvm-svn: 214798 | 
| | 
| 
| 
| 
| 
| | information and update all callers. No functional change.
llvm-svn: 214781 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | ignore SEH pseudo ops in X86 JIT emitter.
--
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI.  It handles all corner cases (I hope), including stack
realignment.
Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.
Patch by Vadim Chugunov!
Reviewed By: rnk
Differential Revision: http://reviews.llvm.org/D4081
llvm-svn: 211691 | 
| | 
| 
| 
| 
| 
| | It broke Legacy JIT Tests on x86_64-{mingw32|msvc}, aka Windows x64.
llvm-svn: 211480 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI.  It handles all corner cases (I hope), including stack
realignment.
Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.
Patch by Vadim Chugunov!
Reviewed By: rnk
Differential Revision: http://reviews.llvm.org/D4081
llvm-svn: 211399 | 
| | 
| 
| 
| | llvm-svn: 210401 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | define below all header includes in the lib/CodeGen/... tree. While the
current modules implementation doesn't check for this kind of ODR
violation yet, it is likely to grow support for it in the future. It
also removes one layer of macro pollution across all the included
headers.
Other sub-trees will follow.
llvm-svn: 206837 | 
| | 
| 
| 
| 
| 
| | instead of comparing to nullptr.
llvm-svn: 206142 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | This removes the -segmented-stacks command line flag in favor of a
per-function "split-stack" attribute.
Patch by Luqman Aden and Alex Crichton!
llvm-svn: 205997 | 
| | 
| 
| 
| | llvm-svn: 205610 | 
| | 
| 
| 
| 
| 
| | Remove the old functions.
llvm-svn: 202636 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | stackmaps and patchpoints into target-specific code.
The lowering of the frame index for stackmaps and patchpoints requires some
target-specific magic and should therefore be handled in the target-specific
eliminateFrameIndex method.
This is related to <rdar://problem/16106219>
llvm-svn: 201904 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This changes the PrologueEpilogInserter and LocalStackSlotAllocation passes to
follow the extended stack layout rules for sspstrong and sspreq.
The sspstrong layout rules are:
 1. Large arrays and structures containing large arrays (>= ssp-buffer-size)
are closest to the stack protector.
 2. Small arrays and structures containing small arrays (< ssp-buffer-size) are
2nd closest to the protector.
 3. Variables that have had their address taken are 3rd closest to the
protector.
Differential Revision: http://llvm-reviews.chandlerc.com/D2546
llvm-svn: 200601 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | subsequent changes are easier to review. About to fix some layering
issues, and wanted to separate out the necessary churn.
Also comment and sink the include of "Windows.h" in three .inc files to
match the usage in Memory.inc.
llvm-svn: 198685 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | in PEI a nd LocalStackSlot passes.
This changes the MachineFrameInfo API to use the new SSPLayoutKind information
produced by the StackProtector pass (instead of a boolean flag) and updates a
few pass dependencies (to preserve the SSP analysis).
The stack layout follows the same approach used prior to this change - i.e.,
only LargeArray stack objects will be placed near the canary and everything
else will be laid out normally.  After this change, structures containing large
arrays will also be placed near the canary - a case previously missed by the
old implementation.
Out of tree targets will need to update their usage of
MachineFrameInfo::CreateStackObject to remove the MayNeedSP argument. 
The next patch will implement the rules for sspstrong and sspreq.  The end goal
is to support ssp-strong stack layout rules.
WIP.
Differential Revision: http://llvm-reviews.chandlerc.com/D2158
llvm-svn: 197653 |