| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
| |
llvm-svn: 306288
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Csmith discovered that this function can be called with a zero argument,
in which case an assert for this triggered.
This patch also adds a guard before the other call to this function since
it was missing, although the test only covers the case where it was
discovered.
Reduced test case attached as CodeGen/SystemZ/int-cmp-54.ll.
Review: Ulrich Weigand
llvm-svn: 306287
|
|
|
|
| |
llvm-svn: 306265
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should not be treated as a different version of
private_segment_buffer. These are distinct things with
different uses and register classes, and treating them as one requires the
function argument info to have more context about the
function's type and environment.
Also add missing test coverage for the intrinsic, and
emit an error for HSA. This also uncovers that the intrinsic
is broken unless there happen to be stack objects.
llvm-svn: 306264
|
|
|
|
|
|
| |
The 'scalar' simd bitops were dropped a while ago
llvm-svn: 306248
|
|
|
|
| |
llvm-svn: 306247
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Support vector type G_EXTRACT selection. For now G_EXTRACT is marked as legal for any type, so there is nothing to do in the legalizer.
Split from https://reviews.llvm.org/D33665
Reviewers: qcolombet, t.p.northover, zvi, guyblank
Reviewed By: guyblank
Subscribers: guyblank, rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D33957
llvm-svn: 306240
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cost of an interleaved access was only implemented for AVX512. For other
X86 targets an overly conservative Base cost was returned, resulting in
avoiding vectorization where it is actually profitable to vectorize.
This patch starts to add costs for AVX2 for most prominent cases of
interleaved accesses (stride 3,4 chars, for now).
Note 1: Improvements of up to ~4x were observed in some of EEMBC's rgb
workloads. There is also a known issue of 15-30% degradations on some of these
workloads, associated with an interleaved access followed by type
promotion/widening; the resulting shuffle sequence is currently inefficient and
will be improved by a series of patches that extend the X86InterleavedAccess pass
(such as D34601 and more to follow).
Note 2: The costs in this patch do not reflect port pressure penalties which can
be very dominant in the case of interleaved accesses since most of the shuffle
operations are restricted to a single port. Further tuning, that may incorporate
these considerations, will be done on top of the upcoming improved shuffle
sequences (that is, along with the abovementioned work to extend
X86InterleavedAccess pass).
Differential Revision: https://reviews.llvm.org/D34023
llvm-svn: 306238
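For readers unfamiliar with the term, the following is a minimal sketch of the kind of loop that produces a stride-3 interleaved access over chars. The function name `rgbToGray` and the luma weights are assumptions chosen for illustration; they are not taken from the patch or from the EEMBC workloads, and whether the vectorizer handles such a loop profitably depends on the target and cost model.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example loop, not taken from the patch or from EEMBC.
// Each iteration reads three adjacent bytes (R, G, B), so the loads form a
// stride-3 interleaved access group over 8-bit elements.
void rgbToGray(const std::uint8_t *Rgb, std::uint8_t *Gray,
               std::size_t NumPixels) {
  for (std::size_t I = 0; I < NumPixels; ++I) {
    unsigned R = Rgb[3 * I + 0];
    unsigned G = Rgb[3 * I + 1];
    unsigned B = Rgb[3 * I + 2];
    // Fixed-point luma approximation; the weights sum to 256, so the
    // shifted result always fits in 8 bits.
    Gray[I] = static_cast<std::uint8_t>((77 * R + 150 * G + 29 * B) >> 8);
  }
}
```

Note that the bytes are widened to `unsigned` before the arithmetic, which is the "interleaved access followed by type promotion" shape that the message lists as a known weak spot.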
|
|
|
|
| |
llvm-svn: 306211
|
|
|
|
| |
llvm-svn: 306202
|
|
|
|
|
|
|
|
|
|
|
| |
The intention of processFixupValue is not to redefine the semantics of
MCExpr. It is odd enough that an expression lowers to a PCRel MCExpr or
not depending on what it looks like. At least it is a local hack now.
I left a fix for anyone trying to figure out what producers should be
producing a different expression.
llvm-svn: 306200
|
|
|
|
| |
llvm-svn: 306190
|
|
|
|
| |
llvm-svn: 306189
|
|
|
|
| |
llvm-svn: 306178
|
|
|
|
|
|
|
|
|
|
|
|
| |
processFixupValue is called on every relaxation iteration. applyFixup
is only called once at the very end. applyFixup is then the correct
place to do last minute changes and value checks.
While here, do proper range checks again for fixup_arm_thumb_bl. We
used to do it, but dropped because of thumb2. We now do it again, but
use the thumb2 range.
llvm-svn: 306177
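As a rough illustration of such a range check, here is a minimal sketch. It assumes the Thumb-2 BL encoding carries a 25-bit signed, halfword-aligned offset (about +/-16 MiB); the helper name `isThumb2BLOffsetInRange` is hypothetical, and the sketch does not reproduce the actual check in the ARM backend, which may also account for the PC bias.

```cpp
#include <cstdint>

// Assumption: Thumb-2 BL encodes a 25-bit signed offset whose low bit is
// always zero, giving a reach of -16 MiB .. +16 MiB - 2.
constexpr std::int64_t Thumb2BLMin = -(std::int64_t{1} << 24);
constexpr std::int64_t Thumb2BLMax = (std::int64_t{1} << 24) - 2;

// Hypothetical helper; the name and signature are illustrative only.
constexpr bool isThumb2BLOffsetInRange(std::int64_t Offset) {
  return Offset >= Thumb2BLMin && Offset <= Thumb2BLMax && (Offset % 2) == 0;
}

static_assert(isThumb2BLOffsetInRange(0), "zero offset is encodable");
static_assert(!isThumb2BLOffsetInRange(std::int64_t{1} << 24),
              "one halfword past the maximum is rejected");
```

Doing a check like this once, at the point where the final value is written, avoids rejecting offsets that are only temporarily out of range during relaxation.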
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After fixing a failing test in the lld test suite (r306173),
reland r306095.
Original commit message:
[mips] Fix register positions in the aui/daui instructions
Swapped the position of the rt and rs registers in the aui/daui
instructions for mips32r6 and mips64r6. With this change, the format of
the generated instructions complies with specifications and GCC.
Patch by Milos Stojanovic.
llvm-svn: 306174
|
|
|
|
|
|
| |
regexes. NFC.
llvm-svn: 306170
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Without this patch some types have incorrect size and/or alignment
according to the MSP430 EABI.
Reviewers: asl, awygle
Reviewed By: asl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34561
llvm-svn: 306159
|
|
|
|
|
|
| |
This breaks passing of aligned function arguments.
llvm-svn: 306145
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch contains a pass that transforms CBZ/CBNZ/TBZ/TBNZ instructions into a
conditional branch (Bcc), when the NZCV flags can be set for "free". This is
preferred on targets that have more flexibility when scheduling Bcc
instructions as compared to CBZ/CBNZ/TBZ/TBNZ (assuming all other variables are
equal). This can reduce register pressure and is also the default behavior for
GCC.
A few examples:
  add w8, w0, w1        ->  cmn w0, w1          ; CMN is an alias of ADDS.
  cbz w8, .LBB0_2       ->  b.eq .LBB0_2        ; single def/use of w8 removed.

  add w8, w0, w1        ->  adds w8, w0, w1     ; w8 has multiple uses.
  cbz w8, .LBB1_2       ->  b.eq .LBB1_2

  sub w8, w0, w1        ->  subs w8, w0, w1     ; w8 has multiple uses.
  tbz w8, #31, .LBB6_2  ->  b.ge .LBB6_2
In looking at all current sub-target machine descriptions, this transformation
appears to be either positive or neutral.
Differential Revision: https://reviews.llvm.org/D34220.
llvm-svn: 306144
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit r306010 adjusted the condition as follows:
    -  if (Is64Bit) {
    +  if (!STI.isTargetWin32()) {
The intent was to preserve the behavior on all Windows platforms
but extend the behavior on 64-bit Windows platforms to every
other one. (Before r306010, emitStackProbeCall only ever executed
when emitting code for Windows triples.)
Unfortunately,
    if (Is64Bit && STI.isOSWindows())
is not the same as
    if (!STI.isTargetWin32())
because of the way isTargetWin32() is defined:
    bool isTargetWin32() const {
      return !In64BitMode && (isTargetCygMing() ||
                              isTargetKnownWindowsMSVC());
    }
In practice this broke the JIT tests on 32-bit Windows, which did not
satisfy the new condition:
LLVM :: ExecutionEngine/MCJIT/2003-01-15-AlignmentTest.ll
LLVM :: ExecutionEngine/MCJIT/2003-08-15-AllocaAssertion.ll
LLVM :: ExecutionEngine/MCJIT/2003-08-23-RegisterAllocatePhysReg.ll
LLVM :: ExecutionEngine/MCJIT/test-loadstore.ll
LLVM :: ExecutionEngine/OrcMCJIT/2003-01-15-AlignmentTest.ll
LLVM :: ExecutionEngine/OrcMCJIT/2003-08-15-AllocaAssertion.ll
LLVM :: ExecutionEngine/OrcMCJIT/2003-08-23-RegisterAllocatePhysReg.ll
LLVM :: ExecutionEngine/OrcMCJIT/test-loadstore.ll
because %esp was not updated correctly. The failures are only visible
on a MSVC 2017 Debug build, for which we do not have bots.
llvm-svn: 306142
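The mismatch between the two conditions can be checked with a small stand-alone model. This is only a sketch: `FakeSubtarget` is a hypothetical stand-in that reduces the predicates quoted above to plain booleans, and the sample configuration (a 32-bit Windows target that is neither Cygwin/MinGW nor a known MSVC environment) is an assumption chosen to make the difference visible.

```cpp
// Hypothetical stand-in for the real subtarget; only the predicates quoted
// in the commit message are modelled, as plain booleans.
struct FakeSubtarget {
  bool In64BitMode;
  bool OSWindows;
  bool CygMing;
  bool KnownWindowsMSVC;

  bool isOSWindows() const { return OSWindows; }
  bool isTargetCygMing() const { return CygMing; }
  bool isTargetKnownWindowsMSVC() const { return KnownWindowsMSVC; }
  // Same shape as the definition quoted in the commit message.
  bool isTargetWin32() const {
    return !In64BitMode && (isTargetCygMing() || isTargetKnownWindowsMSVC());
  }
};

// The condition the message describes as the Windows-only behavior.
bool conditionWithIs64BitAndWindows(const FakeSubtarget &STI) {
  return STI.In64BitMode && STI.isOSWindows();
}

// The condition introduced by r306010.
bool conditionFromR306010(const FakeSubtarget &STI) {
  return !STI.isTargetWin32();
}
```

For `FakeSubtarget{false, true, false, false}` the first condition is false while the second is true, so a 32-bit Windows target of that shape takes the new path even though it did not before.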
|
|
|
|
|
|
|
| |
It causes an extra pass of the machine verifier to be added to the pass
manager, and causes test/CodeGen/Generic/llc-start-stop.ll to fail.
llvm-svn: 306140
|
|
|
|
|
|
|
|
|
|
|
| |
I'm not sure yet why this wouldn't fail in the simple case,
but clearly I used the wrong value type with:
https://reviews.llvm.org/rL306040
...and the bug manifests with:
https://bugs.llvm.org/show_bug.cgi?id=33560
llvm-svn: 306139
|
|
|
|
| |
llvm-svn: 306124
|
|
|
|
| |
llvm-svn: 306121
|
|
|
|
|
|
|
|
| |
It was trying to do too many things. The basic lumping together of values for
legalization purposes is now handled by G_MERGE_VALUES. More complex things
involving gaps and odd sizes are handled by G_INSERT sequences.
llvm-svn: 306120
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts the use of TargetLowering::prepareVolatileOrAtomicLoad
introduced by r196905. Nothing in the semantics of the "volatile"
keyword or the definition of the z/Architecture actually requires
that volatile loads are preceded by a serialization operation, and
no other compiler on the platform actually implements this.
Since we've now seen a use case where this additional serialization
causes noticeable performance degradation, this patch removes it.
The patch still leaves in the serialization before atomic loads,
which is now implemented directly in lowerATOMIC_LOAD. (This also
seems overkill, but that can be addressed separately.)
llvm-svn: 306117
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D34349
llvm-svn: 306112
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The isBarrier/isTerminator flags have been removed from the SystemZ trap
instructions, so that tests do not fail with EXPENSIVE_CHECKS. This was just
an issue at -O0 and did not affect code output on benchmarks.
(As Eli pointed out: "targets are split over whether they consider their
'trap' a terminator; x86, AArch64, and NVPTX don't, but ARM, MIPS, PPC, and
SystemZ do. We should probably try to be consistent here." This is still the
case, although SystemZ has switched sides.)
SystemZ now returns true in isMachineVerifierClean() :-)
These Generic tests have been modified so that they can be run with or without
EXPENSIVE_CHECKS: CodeGen/Generic/llc-start-stop.ll and
CodeGen/Generic/print-machineinstrs.ll
Review: Ulrich Weigand, Simon Pilgrim, Eli Friedman
https://bugs.llvm.org/show_bug.cgi?id=33047
https://reviews.llvm.org/D34143
llvm-svn: 306106
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ELF/mips-plt-r6.s in lld-test is failing. Reverting the change.
Original commit message:
[mips] Fix register positions in the aui/daui instructions
Swapped the position of the rt and rs registers in the aui/daui
instructions for mips32r6 and mips64r6. With this change, the format of
the generated instructions complies with specifications and GCC.
Patch by Milos Stojanovic.
llvm-svn: 306099
|
|
|
|
|
|
|
|
|
|
|
|
| |
Swapped the position of the rt and rs registers in the aui/daui instructions
for mips32r6 and mips64r6. With this change, the format of the generated
instructions complies with specifications and GCC.
Patch by Milos Stojanovic.
Differential Revision: https://reviews.llvm.org/D33988
llvm-svn: 306095
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this change, it was always the first element of a vector that got splatted, since the lower 6 bits of vshf.d $wd were always zero for little endian.
Additionally, masking is now performed for the vshf from which splat.d is created.
Vshf has the property that if an element of its first operand has either bit 6 or 7 set, the corresponding destination element is set to zero.
The index was initially masked with 63 to avoid this property, which resulted in the generation of and.v + vshf.d in all cases.
Masking with one results in generating a single splati.d instruction when possible.
Differential Revision: https://reviews.llvm.org/D32216
llvm-svn: 306090
|
|
|
|
|
|
|
|
|
|
| |
X86_64 COFF only has support for 32 bit pcrel relocations. Produce an
error on all others.
Note that gnu as has extended the relocation values to support
this. It is not clear if we should support the gnu extension.
llvm-svn: 306082
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Details: There was a use but it was in the assert which was not
exercised during product build.
Reviewers: Andrew Kaylor
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32658
llvm-svn: 306073
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is very similar to the transform in:
https://reviews.llvm.org/rL306040
...but in this case, we use cmp X, 1 to set the carry bit as needed.
Again, we can show that all of these are logically equivalent (although
InstCombine currently canonicalizes to a form not seen here), and if
we believe IACA, then this is the smallest/fastest code. Eg, with SNB:
| Num Of | Ports pressure in cycles | |
| Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | |
---------------------------------------------------------------------
| 1 | 1.0 | | | | | | | cmp edi, 0x1
| 2 | | 1.0 | | | | 1.0 | CP | sbb eax, eax
The larger motivation is to clean up all select-of-constants combining/lowering
because we're missing some common cases.
llvm-svn: 306072
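The arithmetic behind the two-instruction sequence can be modelled directly. This is a sketch under stated assumptions: the flag semantics below are the architectural ones for `cmp` and `sbb`, the function names are made up for illustration, and the source pattern is assumed to be a select that yields -1 when X is zero and 0 otherwise, which is what `cmp edi, 1` followed by `sbb eax, eax` computes.

```cpp
#include <cstdint>

// Carry flag after `cmp x, 1`: set when x < 1 as unsigned, i.e. only x == 0.
constexpr bool carryAfterCmpOne(std::uint32_t X) { return X < 1u; }

// `sbb eax, eax` computes eax - eax - CF, i.e. 0 or 0xFFFFFFFF.
constexpr std::uint32_t sbbSelf(bool Carry) {
  return 0u - static_cast<std::uint32_t>(Carry);
}

// cmp edi, 1 ; sbb eax, eax
constexpr std::uint32_t viaCmpSbb(std::uint32_t X) {
  return sbbSelf(carryAfterCmpOne(X));
}

// Assumed source-level form: all-ones when X is zero, zero otherwise.
constexpr std::uint32_t viaSelect(std::uint32_t X) {
  return X == 0 ? 0xFFFFFFFFu : 0u;
}

static_assert(viaCmpSbb(0) == viaSelect(0), "X == 0 gives all-ones");
static_assert(viaCmpSbb(1) == viaSelect(1), "X == 1 gives zero");
static_assert(viaCmpSbb(0x80000000u) == viaSelect(0x80000000u),
              "high bit set gives zero");
```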
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: RKSimon, DavidKreitzer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32658
llvm-svn: 306068
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Previously -fast-isel getelementptr would constant-fold non-constant i8
load/stores.
Reviewers: sunfish
Subscribers: jfb, dschuff, sbc100, llvm-commits
Differential Revision: https://reviews.llvm.org/D34044
llvm-svn: 306060
|
|
|
|
|
|
|
| |
The feeder instruction will be moved to right before the compare, so
the updating code should not be looking for kills past the compare.
llvm-svn: 306059
|
|
|
|
|
|
| |
Remove the previous, manual shuffling of the kill flags.
llvm-svn: 306054
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
These intrinsics aren't used by clang and haven't been for a while.
There's some really terrible codegen in the 32-bit target for avx512bw due to i64 not being legal. But, as noted, clang did not use these intrinsics even before this patch, so this codegen reflects clang's behavior today.
Reviewers: spatel, RKSimon, zvi, igorb
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34389
llvm-svn: 306047
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our handling of select-of-constants is lumpy in IR (https://reviews.llvm.org/D24480),
lumpy in DAGCombiner, and lumpy in X86ISelLowering. That's why we only had the 'sbb'
codegen in 1 out of the 4 tests. This is a step towards smoothing that out.
First, show that all of these IR forms are equivalent:
http://rise4fun.com/Alive/mx
Second, show that the 'sbb' version is faster/smaller. IACA output for SandyBridge
(later Intel and AMD chips are similar based on Agner's tables):
This is the "obvious" x86 codegen (what gcc appears to produce currently):
| Num Of | Ports pressure in cycles | |
| Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | |
---------------------------------------------------------------------
| 1* | | | | | | | | xor eax, eax
| 1 | 1.0 | | | | | | CP | test edi, edi
| 1 | | | | | | 1.0 | CP | setnz al
| 1 | | 1.0 | | | | | CP | neg eax
This is the adc version:
| 1* | | | | | | | | xor eax, eax
| 1 | 1.0 | | | | | | CP | cmp edi, 0x1
| 2 | | 1.0 | | | | 1.0 | CP | adc eax, 0xffffffff
And this is sbb:
| 1 | 1.0 | | | | | | | neg edi
| 2 | | 1.0 | | | | 1.0 | CP | sbb eax, eax
If IACA is trustworthy, then sbb became a single uop in Broadwell, so this will be
clearly better than the alternatives going forward.
llvm-svn: 306040
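The three instruction sequences listed above can be modelled in a few lines to check that they agree. This is only a sketch: the function names are hypothetical, the flag behavior is the architectural definition of `cmp`, `neg`, `adc` and `sbb`, and the common result is assumed to be "all-ones when the input is non-zero, zero otherwise".

```cpp
#include <cstdint>

// xor eax, eax ; test edi, edi ; setnz al ; neg eax
constexpr std::uint32_t viaSetnzNeg(std::uint32_t X) {
  return 0u - (X != 0 ? 1u : 0u);
}

// xor eax, eax ; cmp edi, 1 ; adc eax, 0xffffffff
// CF is set by the cmp only when X == 0; the sum wraps modulo 2^32.
constexpr std::uint32_t viaAdc(std::uint32_t X) {
  return 0u + 0xFFFFFFFFu + (X < 1u ? 1u : 0u);
}

// neg edi ; sbb eax, eax
// neg sets CF unless its operand is zero; sbb then yields -CF.
constexpr std::uint32_t viaNegSbb(std::uint32_t X) {
  return 0u - (X != 0 ? 1u : 0u);
}

static_assert(viaSetnzNeg(0) == 0u && viaAdc(0) == 0u && viaNegSbb(0) == 0u,
              "zero input gives zero");
static_assert(viaSetnzNeg(7) == 0xFFFFFFFFu && viaAdc(7) == 0xFFFFFFFFu &&
                  viaNegSbb(7) == 0xFFFFFFFFu,
              "non-zero input gives all-ones");
```

In this model the setnz/neg and neg/sbb sequences reduce to the same expression; the difference the message cares about is the instruction and uop count, not the result.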
|
|
|
|
|
|
|
| |
This refactors a bit of duplicated code and fixes an assertion failure
on ELF.
llvm-svn: 306035
|
|
|
|
|
|
|
| |
The variable was unused in non-debug builds (it is only used in an assert),
causing a compile-time warning and eventual build failure.
llvm-svn: 306034
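A minimal sketch of the general problem and one common way around it follows. The function `firstElement` and the variable `NumValues` are hypothetical, and the `(void)` cast is just one widely used idiom; the message does not say which fix this commit actually applied.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the patch.
int firstElement(const std::vector<int> &Values) {
  const std::size_t NumValues = Values.size();
  // In a release (NDEBUG) build the assert below expands to nothing, so
  // NumValues would otherwise be unused and trigger -Wunused-variable,
  // which becomes a build failure under -Werror.
  (void)NumValues;
  assert(NumValues > 0 && "expected a non-empty vector");
  return Values.front();
}
```

An alternative is to move the expression into the assert itself, so that no named variable exists in release builds.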
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An intrinsic already existed for llvm.SI.tbuffer.store.
tbuffer.load was needed as well, so the intrinsic is re-implemented as llvm.amdgcn.tbuffer.*
Added CodeGen tests for the 2 new variants.
Left the original llvm.SI.tbuffer.store implementation in place to avoid issues with existing code.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr
Differential Revision: https://reviews.llvm.org/D30687
llvm-svn: 306031
|
|
|
|
| |
llvm-svn: 306012
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds prologue code emission for stack probe function
calls.
Reviewed By: majnemer
Differential Revision: https://reviews.llvm.org/D34387
llvm-svn: 306010
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The ARM ELF ABI requires the linker to do interworking for wide
conditional branches from Thumb code to ARM code.
That was pointed out by @peter.smith in the comments for D33436.
Reviewers: rafael, peter.smith, echristo
Reviewed By: peter.smith
Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits, peter.smith
Differential Revision: https://reviews.llvm.org/D34447
llvm-svn: 306009
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows $AT to be used as a register name in assembly files.
Currently only $at is recognized as a valid register name.
Patch by Stanislav Ocovaj.
Differential Revision: https://reviews.llvm.org/D34348
llvm-svn: 306007
|
|
|
|
|
|
|
| |
Reserve an extra scavenging stack slot if the offset field in
store-immediate instructions may overflow.
llvm-svn: 306004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
encoding
Summary:
Although these instructions are listed in VOP2, they are treated as VOP3 in the specs. They should not support SDWA.
There are no real instructions for them, but there are pseudo instructions.
Reviewers: arsenm, vpykhtin, cfang
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D34403
llvm-svn: 305999
|