Differential Revision: http://reviews.llvm.org/D14495
llvm-svn: 252621
|
This is one of the problems noted in PR25016:
https://llvm.org/bugs/show_bug.cgi?id=25016
and:
http://lists.llvm.org/pipermail/llvm-dev/2015-October/090998.html
The spilling problem is independent and not addressed by this patch.
The MachineCombiner was doing reassociations that don't improve the critical path, or that even make it worse.
This is caused by inclusion of the "slack" factor when calculating the critical path of the original
code sequence. If we don't add that, then we have a more conservative cost comparison of the old code
sequence vs. a new sequence. The more liberal calculation must be preserved, however, for the AArch64
MULADD patterns because benchmark regressions were observed without that.
The two failing test cases now have identical asm that does what we want:
a + b + c + d ---> (a + b) + (c + d)
Differential Revision: http://reviews.llvm.org/D13417
llvm-svn: 252616
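For illustration, a minimal C++ sketch of the reassociation in question (assuming fast-math so the FP adds may be reassociated); the serial form has a dependency chain of three adds, the paired form only two:
float serial(float a, float b, float c, float d)   { return ((a + b) + c) + d; }
float balanced(float a, float b, float c, float d) { return (a + b) + (c + d); }
// serial: each add waits on the previous one (critical path = 3 adds)
// balanced: (a+b) and (c+d) can issue in parallel (critical path = 2 adds)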
|
Added fixes for stage2 failures: CMOV is not commutable; commuting the operands results in the condition being flipped! d'oh!
Original commit message:
If we have a CMOV, OR and AND combination such as:
if (x & CN)
y |= CM;
And:
* CN is a single bit;
* All bits covered by CM are known zero in y;
Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction).
llvm-svn: 252606
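For illustration, a worked instance of the pattern in C++ with hypothetical constants CN = 1<<3 and CM = 1<<7 (both single bits, and bit 7 of y assumed known zero), plus the branch-free form that a BFI-style bit move computes:
unsigned before(unsigned x, unsigned y) {
  if (x & (1u << 3))        // CN: a single bit
    y |= (1u << 7);         // CM: bits known zero in y
  return y;
}
unsigned after(unsigned x, unsigned y) {
  return y | (((x >> 3) & 1u) << 7);   // move bit 3 of x into bit 7 of y
}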
|
This is a fix for PR24059.
When we are hoisting an instruction above some condition, it may turn out
that metadata on this instruction was control-dependent on the condition.
This metadata becomes invalid and we need to drop it.
This patch should cover the most obvious places of speculative execution (which
I found by grepping for isSafeToSpeculativelyExecute). I think there are more
cases, but at least this change covers the severe ones.
Differential Revision: http://reviews.llvm.org/D14398
llvm-svn: 252604
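A small C++ illustration of the hazard (assuming the frontend attached metadata such as a nonnull or range fact to the load because it sits under the guard):
int use(int *p) {
  int v = 0;
  if (p != nullptr)
    v = *p;   // facts implied by the guard are control-dependent on it
  return v;   // hoisting the load above the check must drop such metadata
}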
|
This is needed for targets which do not support big-endian with the default
triple.
llvm-svn: 252603
|
For big-endian targets, when we merge two halfword loads into a word load, the
order of the halfwords in the loaded value is reversed compared to
little-endian, so the load-store optimiser needs to swap the destination
registers.
This does not affect merging of two word loads, as we use ldp, which treats the
memory as two separate 32-bit words.
llvm-svn: 252597
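A scalar C++ illustration of the endianness issue (memcpy stands in for the merged load):
#include <cstdint>
#include <cstring>
uint16_t firstHalf(const uint16_t *p) {
  uint32_t w;
  std::memcpy(&w, p, sizeof(w));   // the merged 32-bit load
  // little-endian: (uint16_t)w == p[0];  big-endian: (uint16_t)w == p[1],
  // which is why the destination registers must be swapped on big-endian.
  return (uint16_t)w;
}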
|
Differential Revision: http://reviews.llvm.org/D14499
llvm-svn: 252595
|
instructions.
Differential Revision: http://reviews.llvm.org/D14492
llvm-svn: 252592
|
llvm-svn: 252579
|
For CoreCLR on Windows, stack probes must be emitted as inline sequences that probe successive stack pages
between the current stack limit and the desired new stack pointer location. This implements support for
the inline expansion on x64.
For in-body alloca probes, expansion is done during instruction lowering. For prolog probes, a stub call
is initially emitted during prolog creation, and expanded after epilog generation, to avoid complications
that arise when introducing new machine basic blocks during prolog and epilog creation.
Added a new test case, modified an existing one to exclude non-x64 coreclr (for
now), and fixed the tests.
llvm-svn: 252578
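A conceptual C++ sketch of what an inline probe sequence does (illustration only, not the emitted code; a 4 KiB page size is assumed):
#include <cstddef>
void probePages(char *oldSP, char *newSP) {   // stack grows down: newSP < oldSP
  const std::size_t PageSize = 4096;
  for (char *p = oldSP - PageSize; p >= newSP; p -= PageSize)
    *(volatile char *)p = 0;                  // touch one byte in each page
}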
|
llvm-svn: 252574
|
AArch64 has the ability to use the top 8-bits of an "address" for extra
information, with the memory subsystem automatically masking them off for loads
and stores. When that's happening, we can sometimes skip masks on memory
operations in the compiler.
However, this requires the host OS and support stack to preserve those bits so
it can't be enabled everywhere. In principle iOS 8.0 and above take the
required precautions, but we'll put it under a flag for now.
llvm-svn: 252573
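A C++ sketch of the tagged-pointer pattern this enables (assumes an AArch64 environment where the top byte is preserved, as described above):
#include <cstdint>
int loadTagged(int *p, uint8_t tag) {
  uintptr_t tagged = (uintptr_t)p | ((uintptr_t)tag << 56);
  // Portable form: strip the tag before dereferencing.
  int *clean = (int *)(tagged & ~(0xffull << 56));
  return *clean;   // with top-byte-ignore, the masking step could be skipped
}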
|
darwin’s nm(1).
Also a small fix to match printing of Mach-O objects with -format posix.
llvm-svn: 252567
|
Lower LLVM's 'unreachable' terminator to ISD::TRAP, and lower ISD::TRAP to
wasm's 'unreachable' expression.
WebAssembly type-checks expressions, but a noreturn function with a
return type that doesn't match the context will cause a check
failure. So we lower LLVM 'unreachable' to ISD::TRAP and then lower that
to WebAssembly's 'unreachable' expression, which typechecks in any
context and causes a trap if executed.
Differential Revision: http://reviews.llvm.org/D14515
llvm-svn: 252566
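A minimal C++ example of where such an 'unreachable' comes from (the frontend emits it after a call to a noreturn function):
#include <cstdlib>
int f() {
  std::abort();   // noreturn; the IR after the call is 'unreachable',
}                 // which now lowers via ISD::TRAP to wasm's 'unreachable'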
|
llvm-svn: 252561
|
This fixes a bug in ARMAsmPrinter::EmitUnwindingInstruction where
llvm_unreachable was reached because t2ADDri wasn't handled.
Test case provided by Tim Northover.
rdar://problem/23270609
http://reviews.llvm.org/D14518
llvm-svn: 252557
|
llvm-svn: 252555
|
MachineInstr::print() with SkipOpers==true does not produce a
linebreak, so we have to do that in MachineVerifier::report().
llvm-svn: 252551
|
Instead, emit a CATCHPAD node which will get selected to a target
specific sequence.
llvm-svn: 252528
|
llvm-svn: 252519
|
The motivation for this patch starts with the epic fail example in PR18007:
https://llvm.org/bugs/show_bug.cgi?id=18007
...unfortunately, this patch makes no difference for that case, but it solves some
simpler cases. We'll get there some day. :)
The current 'or' matching code was using computeKnownBits() via
isBaseWithConstantOffset() -> MaskedValueIsZero(), but that's an unnecessarily limited use.
We can do more by copying the logic in ValueTracking's haveNoCommonBitsSet(), so we can
treat the 'or' as if it was an 'add'.
There's a TODO comment here because we should lift the bit-checking logic into a helper
function, so it's not duplicated in DAGCombiner.
An example of the better LEA matching:
leal (%rdi,%rdi), %eax
andl $1, %esi
orl %esi, %eax
Becomes:
andl $1, %esi
leal (%rsi,%rdi,2), %eax
Differential Revision: http://reviews.llvm.org/D13956
llvm-svn: 252515
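The source-level shape of that example, as a C++ sketch: the low bit of d+d is known zero and the other operand is masked to that single bit, so the 'or' has no bits in common with it and can be treated as an 'add', which is what LEA can fold:
unsigned leaPattern(unsigned d, unsigned s) {
  return (d + d) | (s & 1);   // equivalent to (d + d) + (s & 1)
}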
|
For some reason we'd never run MachineVerifier on WinEH code, and you
explicitly have to ask for it with llc. I added it to a few test cases
to get some coverage.
Fixes PR25461.
llvm-svn: 252512
|
additional fixes for determinism problems
Differential Revision: http://reviews.llvm.org/D14454
llvm-svn: 252508
|
same basic block.
Summary: Call instructions that are from the same line and the same basic block need to have separate discriminators to distinguish between different callsites.
Reviewers: davidxl, dnovillo, dblaikie
Subscribers: dblaikie, probinson, llvm-commits
Differential Revision: http://reviews.llvm.org/D14464
llvm-svn: 252492
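A small C++ example of the situation being fixed: two callsites on one source line and in one basic block, which a sample profile can only tell apart via distinct discriminators:
int g(int);
int h(int x, int y) { return g(x) + g(y); }   // same line, two distinct callsites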
|
When GlobalOpt splits an internal, global variable with an aggregate type, it
should propagate the externally_initialized flag to the newly created globals.
This makes the pass safe for our downstream use of this flag, while still
allowing some useful optimisations (such as removing dead parts of the split
aggregate) to be performed.
Differential Revision: http://reviews.llvm.org/D13382
llvm-svn: 252490
|
Implemented as many of Michael's suggestions as were possible:
* clang-format the added code while it is still fresh.
* tried to change Value* to Instruction* in many places in computeMinimumValueSizes - unfortunately there are several places where Constants need to be handled so this wasn't possible.
* Reduce the pass list on loop-vectorization-factors.ll.
* Fix a bug where we were querying MinBWs for I->getOperand(0) but using MinBWs[I].
llvm-svn: 252469
|
Summary:
LAA currently generates a set of SCEV predicates that must be checked by users.
In the case of Loop Distribute/Loop Load Elimination, no such predicates could have
been emitted, since we don't allow stride versioning. However, in the future there
could be SCEV predicates that will need to be checked.
This change adds support for SCEV predicate versioning in Loop Distribute, Loop
Load Elimination and the loop versioning infrastructure.
Reviewers: anemet
Subscribers: mssimpso, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D14240
llvm-svn: 252467
|
Summary:
This matches the sum-of-absdiff patterns emitted by the vectoriser using log2 shuffles.
Relies on D14207 to be able to match the `extract_subvector(..., 0)`.
Reviewers: t.p.northover, jmolloy
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14208
llvm-svn: 252465
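The kind of C++ source loop whose vectorised form produces this sum-of-absolute-differences reduction:
#include <cstdint>
#include <cstdlib>
int sad(const uint8_t *a, const uint8_t *b, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += std::abs(a[i] - b[i]);   // reduced with a log2 shuffle sequence
  return sum;
}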
|
"GCC requires the freestanding environment provide memcpy, memmove, memset
and memcmp": https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/Standards.html
Hence, for GNUEABI targets, LLVM should not convert 'memops' to their
'__aeabi_memops' equivalents; this conversion violates the GCC contract.
The -meabi flag controls whether or not LLVM will modify 'memops' in GNUEABI
targets.
Without -meabi: use the triple default EABI.
With -meabi=default: use the triple default EABI.
With -meabi=gnu: use 'memops'.
With -meabi=4 or -meabi=5: use '__aeabi_memops'.
With -meabi set to an unknown value: same as -meabi=default.
Patch by Vinicius Tinti.
llvm-svn: 252462
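For illustration, a caller that exercises the affected lowering; whether the backend keeps the call as plain memcpy or rewrites it to the AEABI helper now follows the -meabi setting described above:
#include <cstring>
void copyBlock(void *dst, const void *src, std::size_t n) {
  std::memcpy(dst, src, n);   // lowered to memcpy or __aeabi_memcpy per -meabi
}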
|
This reverts commit r252057, as it broke ARM self-hosting buildbots, probably
due to a code-gen fault.
llvm-svn: 252460
|
We don't currently have any runtime library functions for operations on
f16 values (other than conversions to and from f32 and f64), so we
should always promote it to f32, even if that is not a legal type. In
that case, the f32 values would be softened to f32 library calls.
SoftenFloatRes_FP_EXTEND now needs to check the promoted operand's type,
as it may be a no-op or require a different library call.
getCopyFromParts and getCopyToParts now need to cope with a
floating-point value stored in a larger integer part, as is the case for
any target that needs to store an f16 value in a 32-bit integer
register.
Differential Revision: http://reviews.llvm.org/D12856
llvm-svn: 252459
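A sketch using Clang's __fp16 extension (assuming an ARM-style target where __fp16 is a storage-only type): the arithmetic happens on values promoted to float, and on a soft-float target that float add becomes a library call.
__fp16 addHalf(__fp16 a, __fp16 b) {
  return a + b;   // a and b are extended to f32, added, then truncated back
}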
|
llvm-svn: 252450
|
parsing tests. General updating of the code emission.
llvm-svn: 252443
|
llvm-svn: 252423
|
Under most circumstances, if SCEV can simplify X-Y to a constant, then it can
also simplify Y-X to a constant. However, there is no guarantee that this is
always true, and consensus is not to consider that a correctness bug in SCEV
(although it is undesirable).
PPCLoopPreIncPrep gathers pointers used to access memory (via loads, stores and
prefetches) into buckets, where in each bucket the relative pointer offsets are
constant. We used to keep each bucket as a multimap, where SCEV's subtraction
operation was used to define the ordering predicate. Instead, use a fixed SCEV
base expression for each bucket, record the constant offsets from that base
expression, and adjust it later, if desirable, once all pointers have been
collected.
Doing it this way should be more compile-time efficient than the previous
scheme (in addition to making the implementation less sensitive to SCEV
simplification quirks).
Fixes PR25170.
llvm-svn: 252417
|
We cannot really insert fixup code into a PHI's predecessor.
This fixes PR25445.
llvm-svn: 252416
|
The TailDuplication machine pass ran across a malformed CFG: a PHI node
referred to its predecessor's predecessor instead of its predecessor.
This occurred because we split the edge in X86ISelLowering when we
processed the CATCHRET but forgot to do something about the PHI nodes.
This fixes PR25444.
llvm-svn: 252413
|
Summary:
Teach the FunctionAttrs to do the right thing for IR with operand
bundles.
Reviewers: reames, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14408
llvm-svn: 252387
|
Summary:
This change fixes an iterator wraparound bug in
`determinePointerReadAttrs`.
Ideally, ++'ing off the `end()` of an iplist should result in a failed
assert, but currently iplist seems to silently wrap to the head of the
list on `end()++`. This is why the bad behavior is difficult to
demonstrate.
Reviewers: chandlerc, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14350
llvm-svn: 252386
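A C++ sketch of the bug class (container and predicate are invented for illustration): the body advances the iterator a second time without re-checking end(), and with iplist that silently wraps to the head instead of asserting.
#include <list>
void visit(std::list<int> &uses) {
  for (auto it = uses.begin(); it != uses.end(); ++it) {
    if (*it < 0)
      ++it;   // BUG when *it is the last element: increments past end()
  }
}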
|
Summary:
The CLR's personality routine passes these in rdx/edx, not rax/eax.
Make getExceptionPointerRegister a virtual method parameterized by
personality function to allow making this distinction.
Similarly make getExceptionSelectorRegister a virtual method parameterized
by personality function, for symmetry.
Reviewers: pgavlin, majnemer, rnk
Subscribers: jyknight, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D14344
llvm-svn: 252383
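A rough C++ sketch of the shape of the change (names and signature are approximations, not the verbatim LLVM hook): the register choice is now a function of the personality routine.
enum class ExcReg { RAX, RDX };
ExcReg exceptionPointerRegisterFor(bool isCoreCLRPersonality) {
  return isCoreCLRPersonality ? ExcReg::RDX : ExcReg::RAX;   // CLR uses rdx/edx
}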
|
FoldPHIArgZextsIntoPHI cannot insert an instruction after the PHI if
there is an EHPad in the BB. Doing so would result in an instruction
inserted after a terminator.
llvm-svn: 252377
|
We tried to insert a cast of a phi in a block whose terminator is an
EHPad. This is invalid. Do not attempt the transform in these
circumstances.
llvm-svn: 252370
|
This marker prevents optimization passes from adding 'tail' or
'musttail' markers to a call. It is used to prevent tail call
optimization from being performed on the call.
rdar://problem/22667622
Differential Revision: http://reviews.llvm.org/D12923
llvm-svn: 252368
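For illustration, the source-level way to request this marker via Clang's not_tail_called attribute (attribute availability depends on the Clang version in use):
__attribute__((not_tail_called)) int callee(int);
int caller(int x) {
  return callee(x);   // stays a normal call; the marker blocks tail-call optimization
}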
|
We used to try to constant-fold them to i32 immediates.
Given that fast-isel doesn't otherwise support vNi1, when selecting
the result users, we'd fallback to SDAG anyway.
However, if the users were in another block, we'd insert broken
cross-class copies (GPR32 to FPR64).
Give up, let SDAG agree with itself on a vNi1 legalization strategy.
llvm-svn: 252364
|
When matching non-LSB-extracting truncating broadcasts, we now insert
the necessary SRL. If the scalar resulted from a load, the SRL will be
folded into it, creating a narrower load at an offset.
However, i16 loads aren't Desirable, so we get i16->i32 zextloads.
We already catch i16 aextloads; catch these as well.
llvm-svn: 252363
|
Now that we recognize this, we can support it instead of bailing out.
That is, we can fold:
(v8i16 (shufflevector
(v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))),
<1,1,...,1>))
into:
(v8i16 (vbroadcast (i16 (trunc (srl Y, 16)))))
llvm-svn: 252362
|
We used to incorrectly assume that the offset we're extracting from
was a multiple of the element size. So, we'd fold:
(v8i16 (shufflevector
(v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))),
<1,1,...,1>))
into:
(v8i16 (vbroadcast (i16 (trunc Y))))
whereas we should have extracted the higher bits from X.
Instead, bail out if the assumption doesn't hold.
llvm-svn: 252361
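A scalar C++ model of the lane aliasing involved (little-endian lane order assumed): odd i16 lanes are the high halves of the i32 lanes, so broadcasting one needs a shift before the truncate rather than a plain truncate.
#include <cstdint>
uint32_t w = 0x11112222;                   // one i32 lane of the build_vector
uint16_t evenLane = (uint16_t)w;           // even i16 lane: 0x2222 (plain trunc)
uint16_t oddLane  = (uint16_t)(w >> 16);   // odd i16 lane: 0x1111 (needs srl 16)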
|
extload
Reviewers: resistor, arsenm
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13805
llvm-svn: 252349
|
llvm-svn: 252345
|
llvm-svn: 252344