Summary:
Backends can use setInsertFencesForAtomic to signal to the middle-end that
monotonic is the only memory ordering they can accept for
stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger
ordering to fences + monotonic accesses is currently living in
SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it
for several reasons:
- There is lots of redundancy to avoid: extremely similar logic already
exists in AtomicExpand.
- The current code in SelectionDAGBuilder does not use any target hooks; it
does the same transformation for every backend that requires it.
- As a result it is plain *unsound*, as it was apparently designed for ARM.
It happens to mostly work for the other targets because they are extremely
conservative, but Power for example had to switch to AtomicExpand to be
able to use lwsync safely (see r218331).
- Because it produces IR-level fences, it cannot be made sound! This is noted
in the C++11 standard (section 29.3, page 1140):
```
Fences cannot, in general, be used to restore sequential consistency for atomic
operations with weaker ordering semantics.
```
It can also be seen in the following example (called IRIW in the literature):
```
atomic<int> x(0), y(0);
int r1, r2, r3, r4;
Thread 0:
x.store(1);
Thread 1:
y.store(1);
Thread 2:
r1 = x.load();
r2 = y.load();
Thread 3:
r3 = y.load();
r4 = x.load();
```
r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst.
But if they are lowered to monotonic accesses, no amount of fences can prevent it.
This patch does three things (I could cut it into parts, but then some of them
would not be tested/testable, please tell me if you would prefer that):
- it provides a default implementation for emitLeadingFence/emitTrailingFence in
terms of IR-level fences, which mimics the original logic of SelectionDAGBuilder.
As we saw above, this is unsound, but the best that can be done without knowing
the targets well (and there is a comment warning about this risk).
- it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default
implementation (that exactly replicates the logic of SelectionDAGBuilder, so no
functional change)
- it finally erases this logic from SelectionDAGBuilder, as it is dead code.
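The placement that the default expansion is described as mimicking (the ARM-inspired logic from SelectionDAGBuilder) can be modelled roughly as below. This is a sketch under assumptions, not LLVM code: Ordering, FencePlan and planFences are hypothetical names, and the rule used (a leading fence before a write that is at least release, a trailing fence after any access that is at least acquire) is an assumed reading of that ARM-style mapping, not a copy of emitLeadingFence/emitTrailingFence.

```cpp
// Hypothetical model (not LLVM's API) of where fences go when an atomic
// access with a strong ordering is lowered to a monotonic access.
enum class Ordering { Monotonic, Acquire, Release, AcqRel, SeqCst };

struct FencePlan {
  bool leading;   // fence emitted before the monotonic access
  bool trailing;  // fence emitted after the monotonic access
};

constexpr bool hasReleaseSemantics(Ordering o) {
  return o == Ordering::Release || o == Ordering::AcqRel ||
         o == Ordering::SeqCst;
}

constexpr bool hasAcquireSemantics(Ordering o) {
  return o == Ordering::Acquire || o == Ordering::AcqRel ||
         o == Ordering::SeqCst;
}

// Assumed ARM-style rule: a leading fence keeps earlier accesses before a
// write with release (or stronger) ordering; a trailing fence keeps later
// accesses after any access with acquire (or stronger) ordering.
constexpr FencePlan planFences(Ordering o, bool writesMemory) {
  return FencePlan{writesMemory && hasReleaseSemantics(o),
                   hasAcquireSemantics(o)};
}
```

Under this rule a seq_cst store gets fences on both sides and a seq_cst load only after it; per the IRIW argument above, that placement still cannot restore sequential consistency once the accesses themselves are monotonic.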
Ideally, each target would define its own override for emitLeading/TrailingFence
using target-specific fences, but I do not know the Sparc/Mips/XCore memory model
well enough to do this, and they appear to be dealing fine with the ARM-inspired
default expansion for now (probably because they are overly conservative, as
Power was). If anyone wants to compile fences more aggressively on these
platforms, the long comment should make it clear why they should first override
emitLeading/TrailingFence.
Test Plan: make check-all, no functional change
Reviewers: jfb, t.p.northover
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D5474
llvm-svn: 219957
These haven't been necessary since selecting SALU
instructions in non-entry blocks was enabled.
llvm-svn: 219956
This was resulting in invalid simplifications of sdiv
llvm-svn: 219953
These days -std-compile-opts was just a silly alias for -O3.
llvm-svn: 219951
When the constant divisor was larger than 32 bits, the optimized code
generated by the AArch64 backend was wrong, because the shift was
defined as a shift of a 32-bit constant '(1<<Lg2(divisor))' and we would
lose the upper 32 bits.
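To make the width issue concrete, here is a sketch of the usual add/select/arithmetic-shift idea for signed division by a power of two, written as C++ rather than AArch64 code. sdivPow2 is a hypothetical helper, not the backend's code; the point is only that the bias 2^k - 1 has to be formed from a 64-bit one.

```cpp
#include <cstdint>

// Hypothetical helper: signed division of x by 2^k (0 < k < 64), rounding
// toward zero like sdiv. Illustrates the idea only, not the emitted code.
constexpr std::int64_t sdivPow2(std::int64_t x, unsigned k) {
  // The bias 2^k - 1 must be built from a 64-bit one. The buggy form,
  // (1 << k) with a 32-bit int, loses the upper 32 bits (and shifting a
  // 32-bit int by k >= 32 is undefined in C++).
  const std::int64_t bias =
      static_cast<std::int64_t>((std::uint64_t{1} << k) - 1);
  // Negative dividends are biased so the arithmetic shift rounds toward
  // zero instead of toward negative infinity.
  const std::int64_t adjusted = x < 0 ? x + bias : x;
  // Arithmetic shift for negative values (guaranteed since C++20 and what
  // mainstream compilers already do).
  return adjusted >> k;
}
```

For example, sdivPow2(-7, 1) computes (-7 + 1) >> 1 = -3, matching -7 / 2.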
This fixes rdar://problem/18678801.
llvm-svn: 219934
nodes.
Summary:
In order to support big endian targets for the BuildPairF64 nodes we
just need to swap the low/high pair registers. Additionally, for the
ExtractElementF64 nodes we have to calculate the correct stack offset
with respect to the node's register/operand that we want to extract.
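For the ExtractElementF64 part, the offset rule can be sketched as follows. This assumes the f64 sits in an 8-byte stack slot at offset 0 and that element 0 denotes the low 32 bits; both assumptions and the helper name are illustrative, not taken from the Mips backend.

```cpp
// Hypothetical helper: byte offset, inside an 8-byte stack slot that holds
// an f64, of the 32-bit half selected by `element` (0 = low word,
// 1 = high word).
constexpr unsigned halfOffset(unsigned element, bool bigEndian) {
  // Little-endian stores the low word at the lower address; big-endian
  // stores the high word there, so the element index is flipped.
  return (bigEndian ? 1u - element : element) * 4u;
}
```

The same flip is what swapping the low/high pair registers achieves for BuildPairF64.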
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5753
llvm-svn: 219931
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5751
llvm-svn: 219927
llvm-svn: 219925
llvm-svn: 219879
In AVX512f we support 64x2 and 32x8 inserts via matching them to 32x4 and 64x4
respectively. These are matched by "Alt" Pat<>'s (Alt stands for alternative
VTs).
Since DQ has native support for these instructions, I peeled off the non-"Alt"
part of the base class into vinsert_for_size_no_alt. The DQ instructions are
derived from this multiclass. The "Alt" Pat<>'s are disabled with DQ.
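The matching strategy can be modelled as a small selection function. InsertForm and selectInsert are hypothetical, and the model captures only the rule stated above: without DQ, a 64x2 insert is handled as 32x4 and a 32x8 insert as 64x4 (the same bits viewed with an alternative element type), while with DQ the native form is used.

```cpp
// Hypothetical model of which VINSERT form is used for a subvector insert.
struct InsertForm {
  unsigned eltBits;
  unsigned numElts;
};

constexpr bool operator==(InsertForm a, InsertForm b) {
  return a.eltBits == b.eltBits && a.numElts == b.numElts;
}

// eltBits * numElts is the width of the inserted subvector (128 or 256).
constexpr InsertForm selectInsert(InsertForm requested, bool hasDQ) {
  return hasDQ ? requested  // DQ has native 32x8 and 64x2 forms
         : requested.eltBits * requested.numElts == 128
             ? InsertForm{32, 4}   // AVX512F only: 128-bit inserts use 32x4
             : InsertForm{64, 4};  // AVX512F only: 256-bit inserts use 64x4
}
```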
Fixes <rdar://problem/18426089>
llvm-svn: 219874
The new attributes are NumElts and the CD8TupleForm. This prepares the code
to enable x8 and x2 inserts.
NFC, no change in X86.td.expanded except for the new attributes.
llvm-svn: 219871
It's the W bit that selects between a 32- or 64-bit element type, not the opcode.
The opcode selects the width of the insert (128 or 256 bits).
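The rule can be sketched as a tiny encoder for the floating-point forms. The opcode values (0x18 for the 128-bit VINSERTF32X4/VINSERTF64X2 forms, 0x1A for the 256-bit VINSERTF32X8/VINSERTF64X4 forms) are taken from the Intel SDM rather than from this commit and should be checked there; encodeVInsertF is a hypothetical name.

```cpp
#include <cstdint>

// Sketch of the VINSERTF* encoding rule (opcode values are assumptions
// taken from the Intel SDM; integer VINSERTI* forms are not modelled).
struct InsertEncoding {
  std::uint8_t opcode;  // selects the width of the inserted subvector
  bool evexW;           // selects the element type: 0 = 32-bit, 1 = 64-bit
};

constexpr InsertEncoding encodeVInsertF(unsigned insertBits, unsigned eltBits) {
  return InsertEncoding{
      static_cast<std::uint8_t>(insertBits == 128 ? 0x18 : 0x1A),
      eltBits == 64};
}
```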
llvm-svn: 219870
Zero-width BFEs are combined away already, so there's no point in
handling them.
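For reference, an unsigned bitfield extract returns `width` bits of the source starting at `offset`, so a zero-width extract is always 0 and folding it away elsewhere leaves nothing to handle here. A minimal model with a hypothetical name, assuming offset < 32, width < 32 and offset + width <= 32 (how hardware treats out-of-range operands is not modelled):

```cpp
#include <cstdint>

// Hypothetical model of an unsigned 32-bit bitfield extract under the
// operand-range assumptions stated above.
constexpr std::uint32_t bfeU32(std::uint32_t src, unsigned offset,
                               unsigned width) {
  // A zero width yields an empty mask, so the result is always 0.
  return (src >> offset) & ((std::uint32_t{1} << width) - 1u);
}
```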
llvm-svn: 219868
llvm-svn: 219867
The SelectDS1Addr1Offset complex pattern always tries to store constant
lds pointers in the offset operand and store a zero value in the addr operand.
Since the addr operand does not accept immediates, the zero value
needs to first be copied to a register.
This newly created zero value will not go through normal instruction
selection, so we need to manually insert a V_MOV_B32_e32 in the complex
pattern.
This bug was hidden by the fact that if there was another zero value
in the DAG that had not been selected yet, then the CSE done by the DAG
would use the unselected node for the addr operand rather than the one
that was just created. This would lead to the zero value being selected
and the DAG automatically inserting a V_MOV_B32_e32 instruction.
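The split performed by the complex pattern can be sketched as follows: a constant LDS pointer becomes a zero base plus an immediate offset. The 16-bit unsigned offset limit, the fallback, and the names are assumptions for illustration only.

```cpp
#include <cstdint>

// Hypothetical model of splitting a constant LDS pointer into the addr
// and offset operands of a DS instruction (16-bit offset field assumed).
struct DSAddress {
  std::uint32_t base;    // value that must live in the addr register
  std::uint32_t offset;  // immediate offset field
};

constexpr DSAddress splitConstantAddress(std::uint32_t ptr) {
  // Put the whole constant in the immediate when it fits; otherwise keep
  // it in the base with a zero offset. Either way `base` is a constant the
  // addr operand cannot take as an immediate, so it must be copied into a
  // register (the V_MOV_B32_e32 the pattern has to insert itself).
  return ptr <= 0xFFFFu ? DSAddress{0u, ptr} : DSAddress{ptr, 0u};
}
```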
llvm-svn: 219848
This original fix for the build break was correct. LLVM_ATTRIBUTE_USED
removes the warning message because it keeps the function in the object
file. LLVM_ATTRIBUTE_UNUSED indicates that it may or may not be used
depending on build settings.
llvm-svn: 219846
llvm-svn: 219837
Fixes the build break when -Wunused-function is enabled.
llvm-svn: 219833
This is mostly a copy of the existing FastISel GEP code, but we have to
duplicate it for AArch64, because otherwise we would bail out even for simple
cases. This is because the standard fastEmit functions don't cover MUL at all
and ADD is lowered very inefficiently.
The original commit had a bug in the add emit logic, which has been fixed.
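What a GEP lowering has to compute is base + sum(idx * stride); with non-constant indices that means one MUL and one ADD per index, which is why missing MUL coverage made FastISel bail out. The sketch below models only that address arithmetic for array-style indices; GepIndex and gepAddress are hypothetical, and struct-field offsets are left out.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of one array-style GEP index: idx elements of
// `stride` bytes each (stride = allocation size of the element type).
struct GepIndex {
  std::int64_t idx;
  std::int64_t stride;
};

// base + sum(idx * stride): for non-constant indices this is one MUL and
// one ADD per index, the operations FastISel has to be able to emit.
template <std::size_t N>
constexpr std::uint64_t gepAddress(std::uint64_t base,
                                   const GepIndex (&indices)[N]) {
  std::uint64_t addr = base;
  for (std::size_t i = 0; i < N; ++i)
    // Negative products wrap modulo 2^64, which is the intended address math.
    addr += static_cast<std::uint64_t>(indices[i].idx * indices[i].stride);
  return addr;
}
```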
llvm-svn: 219831
function. NFC.
Simplify add with immediate emission by factoring it out into a helper function.
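A helper that emits an add with an immediate has to decide how the constant can be encoded. On AArch64, ADD/SUB (immediate) take a 12-bit unsigned value optionally shifted left by 12. The classification below is a sketch of that decision with hypothetical names, not the helper from this commit.

```cpp
#include <cstdint>

// Hypothetical classification of how "add xd, xn, imm" could be emitted.
enum class AddImmKind {
  Imm12,          // add xd, xn, #imm
  Imm12Lsl12,     // add xd, xn, #(imm >> 12), lsl #12
  SubImm12,       // sub xd, xn, #(-imm)
  SubImm12Lsl12,  // sub xd, xn, #((-imm) >> 12), lsl #12
  Materialize     // the constant must first be moved into a register
};

constexpr AddImmKind classifyMagnitude(std::uint64_t u, bool negated) {
  return u < 4096 ? (negated ? AddImmKind::SubImm12 : AddImmKind::Imm12)
         : ((u & 0xFFF) == 0 && u < (std::uint64_t{1} << 24))
             ? (negated ? AddImmKind::SubImm12Lsl12 : AddImmKind::Imm12Lsl12)
             : AddImmKind::Materialize;
}

constexpr AddImmKind classifyAddImm(std::int64_t imm) {
  // Negative immediates become a SUB of the magnitude; computing it in
  // unsigned arithmetic avoids overflow for INT64_MIN.
  return imm < 0 ? classifyMagnitude(0 - static_cast<std::uint64_t>(imm), true)
                 : classifyMagnitude(static_cast<std::uint64_t>(imm), false);
}
```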
llvm-svn: 219830
This adds the MCInstPrinter to the LLVMHexagonDesc library and removes
the dependency LLVMHexagonAsmPrinter had on LLVMHexagonDesc. This is
a prerequisite needed by the disassembler.
Phabricator Revision: http://reviews.llvm.org/D5734
llvm-svn: 219826
llvm-svn: 219823
SimplifyDemandedBits would break the other uses of the operand.
llvm-svn: 219819
llvm-svn: 219799
The .note.GNU-stack section is not SystemZ/X86 specific.
llvm-svn: 219796
llvm-svn: 219778
llvm-svn: 219777
This breaks our internal build bots. Reverting it to get the bots green again.
llvm-svn: 219776
Early attempts to support AAPCS bare metal MachO targets based the decision on
the CPU being compiled for. This was not a particularly great idea and we've
got a better option now, but this check remained.
No functional change for any target we care about.
llvm-svn: 219767
llvm-svn: 219750
This is a follow up to commit r219742. It removes the CCInMI variable
and accesses the CC in CSINC directly. In the case of a conditional
branch accessing the CC with CCInMI was wrong.
llvm-svn: 219748
Peephole optimization that generates a single conditional branch
for csinc-branch sequences like in the examples below. This is
possible when the csinc sets or clears a register based on a condition
code and the branch checks that register. In addition, the condition
code must not be modified between the csinc and the original branch.
Examples:
1. Convert csinc w9, wzr, wzr, <CC>; tbnz w9, #0, 0x44
to b.<invCC>
2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44
to b.<CC>
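The equivalence behind both examples can be checked with a small model: csinc wd, wn, wm, cc computes cc ? wn : wm + 1, so with both sources wzr, bit 0 of w9 is set exactly when the condition is false. The functions below are hypothetical and model only that bit and the branch decisions.

```cpp
// Hypothetical model of the csinc + test-bit branch sequences above.
// csinc wd, wzr, wzr, cc computes cc ? 0 : 0 + 1.
constexpr unsigned csincZeroZero(bool cc) { return cc ? 0u : 1u; }

// tbnz wt, #0, target branches when bit 0 is set; tbz when it is clear.
constexpr bool tbnzTaken(unsigned wt) { return (wt & 1u) != 0; }
constexpr bool tbzTaken(unsigned wt) { return (wt & 1u) == 0; }

// The single-branch replacements: b.<invCC> is taken when cc is false,
// b.<CC> when cc is true.
constexpr bool bInvCCTaken(bool cc) { return !cc; }
constexpr bool bCCTaken(bool cc) { return cc; }
```

Checking both values of cc covers every case; the requirement that the flags stay unchanged between the csinc and the branch is what makes cc mean the same thing at both points.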
rdar://problem/18506500
llvm-svn: 219742
Patch to provide shuffle decodes and asm comments for the SSE2/AVX2 pslldq/psrldq byte-shift instructions.
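The decode itself is simple: each 128-bit lane is shifted independently by imm bytes and vacated bytes become zero. The sketch returns, for each result byte, the source byte index or -1 for a zero byte; the -1 convention and the function names are assumptions, in the spirit of a shuffle mask with a zero sentinel.

```cpp
// Hypothetical per-byte decode of PSLLDQ/PSRLDQ (and their 256-bit AVX2
// forms, which shift each 128-bit lane independently). Returns the source
// byte index feeding result byte i, or -1 when that byte is zero-filled.
constexpr int kLaneBytes = 16;

constexpr int pslldqSource(int i, int imm) {
  // Shift left: result byte i comes from byte i - imm of the same lane.
  return (i % kLaneBytes) >= imm ? i - imm : -1;
}

constexpr int psrldqSource(int i, int imm) {
  // Shift right: result byte i comes from byte i + imm of the same lane.
  return (i % kLaneBytes) + imm < kLaneBytes ? i + imm : -1;
}
```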
Differential Revision: http://reviews.llvm.org/D5598
llvm-svn: 219738
Thumb1 has legitimate reasons for preferring 32-bit alignment of types
i1/i8/i16, since the 16-bit encoding of "add rD, sp, #imm" requires #imm to be
a multiple of 4. However, this is a trade-off between code size and RAM usage;
the DataLayout string is not the best place to represent it even if desired.
So this patch removes the extra Thumb requirements, hopefully making ARM and
Thumb completely compatible in this respect.
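The constraint comes from the 16-bit encoding of "add rD, sp, #imm", whose immediate is an 8-bit field scaled by 4, so offsets from 0 to 1020 in steps of 4 (range taken from the ARM ARM; treat it as an assumption here). A hypothetical check:

```cpp
// Hypothetical check for whether a stack-slot offset can be materialized
// with the 16-bit Thumb1 "add rD, sp, #imm" encoding: imm = imm8 * 4.
constexpr bool fitsThumb1AddSpImm(unsigned imm) {
  return imm % 4 == 0 && imm / 4 <= 255;
}
```

An i8 or i16 local placed at offset 6 therefore cannot use this encoding, which is the code-size side of the trade-off; padding it to offset 8 costs RAM instead.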
llvm-svn: 219734
There's no hard requirement on LLVM to align local variable to 32-bits, so the
Thumb1 frame handling needs to be able to deal with variables that are only
naturally aligned without falling over.
llvm-svn: 219733
This is mostly a copy of the existing FastISel GEP code, but on AArch64 we bail
out even for simple cases, because the standard fastEmit functions don't cover
MUL and ADD is lowered inefficiently.
llvm-svn: 219726
Differential Revision: http://reviews.llvm.org/D5741
llvm-svn: 219725
Before, ARM and Thumb mode code had different preferred alignments, which could
lead to some rather unexpected results. There's justification for reducing it
from the default 64 bits (wasted space), but I don't think there is for going
below 32 bits.
There's no actual ABI change here, just to reassure people.
llvm-svn: 219719
Sign-/zero-extend folding depended on both the load and the integer extend
being selected by FastISel. This cannot always be guaranteed, and SelectionDAG
might interfere. This commit adds additional checks to load and integer extend
lowering to catch this.
Related to rdar://problem/18495928.
llvm-svn: 219716
This effectively reverts revert r219707, after fixing the test to work with the
new function name format and renamed intrinsic.
Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219710
This reverts commit r219705.
CodeGen/R600/work-item-intrinsics.ll was failing on linux.
llvm-svn: 219707
v2: Add SI lowering
Add test
v3: Place work dimensions after the kernel arguments.
v4: Calculate offset while lowering arguments
v5: rebase
v6: change prefix to AMDGPU
Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219705
Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219704
Summary:
In order to facilitate use of common code, checking by reviewers of other fast-isel ports, and hopefully to eventually move most of Mips and other fast-isel ports into target independent code, I've tried to get the two implementations to line up.
There is no functional code change; methods were only moved within the file so that they appear in the same order as in AArch64.
Test Plan: No functional change.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits, aemerson, rfuhler
Differential Revision: http://reviews.llvm.org/D5692
llvm-svn: 219703
Use 0 as the base address for a constant address, so if
we have a constant address we can save moves and form
read2/write2s.
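A shared zero base helps because two DS accesses can only be merged into a read2/write2 when they use the same base register, with both offsets encoded relative to it. The sketch assumes 32-bit elements and read2 offset fields that are 8 bits wide in units of 4 bytes; those encoding details and the names are assumptions.

```cpp
#include <cstdint>

// Hypothetical check for merging two 32-bit DS loads into one read2,
// assuming each read2 offset field is 8 bits wide in units of 4 bytes.
struct DSLoad {
  std::uint32_t baseReg;  // register id holding the base address
  std::uint32_t offset;   // byte offset from that base
};

constexpr bool offsetFitsRead2(std::uint32_t byteOffset) {
  return byteOffset % 4 == 0 && byteOffset / 4 <= 255;
}

constexpr bool canFormRead2(DSLoad a, DSLoad b) {
  // The merged instruction has a single addr operand, so both loads must
  // share a base; for constant addresses a common zero base makes that true.
  return a.baseReg == b.baseReg && offsetFitsRead2(a.offset) &&
         offsetFitsRead2(b.offset);
}
```

If each constant address instead lives in its own register, the pair cannot be merged even when the addresses are adjacent.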
llvm-svn: 219698
Added encoding tests.
llvm-svn: 219686
Added encoding tests.
llvm-svn: 219685
workaround
llvm-svn: 219684
indirecting through the TargetMachine.
llvm-svn: 219674
the subtarget.
llvm-svn: 219673