path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* Erase fence insertion from SelectionDAGBuilder.cpp (NFC) | Robin Morisset | 2014-10-16 | 3 | -0/+15
| Summary:
| Backends can use setInsertFencesForAtomic to signal to the middle-end that monotonic is the only memory ordering they can accept for stores/loads/rmws/cmpxchg. The code that lowers accesses with a stronger ordering to fences + monotonic accesses currently lives in SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it for several reasons:
| - There is lots of redundancy to avoid: extremely similar logic already exists in AtomicExpand.
| - The current code in SelectionDAGBuilder does not use any target hooks; it does the same transformation for every backend that requires it.
| - As a result it is plain *unsound*, as it was apparently designed for ARM. It happens to mostly work for the other targets because they are extremely conservative, but Power for example had to switch to AtomicExpand to be able to use lwsync safely (see r218331).
| - Because it produces IR-level fences, it cannot be made sound! This is noted in the C++11 standard (section 29.3, page 1140):
| ```
| Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering semantics.
| ```
| It can also be seen in the following example (called IRIW in the literature):
| ```
| atomic<int> x = y = 0;
| int r1, r2, r3, r4;
| Thread 0: x.store(1);
| Thread 1: y.store(1);
| Thread 2: r1 = x.load(); r2 = y.load();
| Thread 3: r3 = y.load(); r4 = x.load();
| ```
| r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst. But if they are lowered to monotonic accesses, no amount of fences can prevent it.
|
| This patch does three things (I could cut it into parts, but then some of them would not be tested/testable; please tell me if you would prefer that):
| - It provides a default implementation of emitLeadingFence/emitTrailingFence in terms of IR-level fences that mimics the original logic of SelectionDAGBuilder. As we saw above, this is unsound, but it is the best that can be done without knowing the targets well (and there is a comment warning about this risk).
| - It then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default implementation (which exactly replicates the logic of SelectionDAGBuilder, so no functional change).
| - It finally erases this logic from SelectionDAGBuilder, as it is dead code.
|
| Ideally, each target would define its own override for emitLeading/TrailingFence using target-specific fences, but I do not know the Sparc/Mips/XCore memory models well enough to do this, and they appear to be dealing fine with the ARM-inspired default expansion for now (probably because they are overly conservative, as Power was). If anyone wants to compile fences more aggressively on these platforms, the long comment should make it clear why they should first override emitLeading/TrailingFence.
|
| Test Plan: make check-all, no functional change
| Reviewers: jfb, t.p.northover
| Subscribers: aemerson, llvm-commits
| Differential Revision: http://reviews.llvm.org/D5474
| llvm-svn: 219957
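The IRIW litmus test from the commit message above can be sketched in portable C++ as follows. This is only an illustration of the source-level test, not code from the patch; the names `IriwResult`, `runIriwOnce`, and `isForbidden` are hypothetical. All atomic accesses use the default `seq_cst` ordering, under which the outcome r1 = r3 = 1, r2 = r4 = 0 is not allowed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Values observed by the two reader threads in one IRIW run.
struct IriwResult {
  int r1, r2, r3, r4;
};

// Runs the IRIW litmus test once. Every store and load below uses the
// default memory order of std::atomic, which is seq_cst.
IriwResult runIriwOnce() {
  std::atomic<int> x{0}, y{0};
  IriwResult r{0, 0, 0, 0};
  std::thread t0([&] { x.store(1); });
  std::thread t1([&] { y.store(1); });
  // Each reader writes only its own pair of fields, so there is no data race.
  std::thread t2([&] { r.r1 = x.load(); r.r2 = y.load(); });
  std::thread t3([&] { r.r3 = y.load(); r.r4 = x.load(); });
  t0.join();
  t1.join();
  t2.join();
  t3.join();
  return r;
}

// True for the outcome in which the two readers disagree on the order
// of the two independent writes.
bool isForbidden(const IriwResult &r) {
  return r.r1 == 1 && r.r3 == 1 && r.r2 == 0 && r.r4 == 0;
}
```

Calling `runIriwOnce()` in a loop and checking `isForbidden` on each result is one way to exercise the test. Not observing the outcome in a finite number of runs does not prove a lowering is sound, so a harness like this can only expose a bug, never rule one out.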
* R600/SI: Remove unnecessary VALU patterns | Matt Arsenault | 2014-10-16 | 1 | -41/+0
| These haven't been necessary since selecting SALU instructions in non-entry blocks was enabled.
| llvm-svn: 219956
* R600: Fix nonsensical implementation of computeKnownBits for BFE | Matt Arsenault | 2014-10-16 | 1 | -5/+1
| This was resulting in invalid simplifications of sdiv.
| llvm-svn: 219953
* Delete -std-compile-opts. | Rafael Espindola | 2014-10-16 | 1 | -22/+22
| These days -std-compile-opts was just a silly alias for -O3.
| llvm-svn: 219951
* [AArch64] Fix miscompile of sdiv-by-power-of-2. | Juergen Ributzka | 2014-10-16 | 2 | -4/+3
| When the constant divisor was larger than 32 bits, the AArch64 backend would emit the wrong code, because the shift was defined as a shift of a 32-bit constant '(1<<Lg2(divisor))' and we would lose the upper 32 bits.
| This fixes rdar://problem/18678801.
| llvm-svn: 219934
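To illustrate the class of bug described above, here is a minimal sketch of signed division by a power of two, once with a 64-bit constant and once with the constant truncated to 32 bits. It is not the backend code; `sdivPow2` and `sdivPow2Truncated` are hypothetical names, and the sketch assumes `lg2` is below 63 and that right-shifting a negative value is an arithmetic shift (guaranteed since C++20, and the behavior of mainstream compilers before that).

```cpp
#include <cassert>
#include <cstdint>

// Signed division by 2^lg2, rounding toward zero: negative dividends get a
// bias of (2^lg2 - 1) added before the arithmetic shift right.
int64_t sdivPow2(int64_t x, unsigned lg2) {
  uint64_t pow2 = uint64_t{1} << lg2; // 64-bit constant keeps bits above 31
  int64_t bias = (x < 0) ? static_cast<int64_t>(pow2 - 1) : 0;
  return (x + bias) >> lg2;
}

// Same computation, but the power-of-two constant is squeezed into 32 bits
// first. For lg2 >= 32 the constant becomes 0 and the bias becomes -1.
int64_t sdivPow2Truncated(int64_t x, unsigned lg2) {
  uint32_t pow2 = static_cast<uint32_t>(uint64_t{1} << lg2); // upper bits lost
  int64_t bias = (x < 0) ? static_cast<int64_t>(pow2) - 1 : 0;
  return (x + bias) >> lg2;
}
```

With a divisor of 2^33, `sdivPow2(-1, 33)` yields 0 as expected, while the truncated variant yields -1. For divisors that fit in 32 bits the two functions agree, which is why a bug of this kind can go unnoticed.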
* [mips] Account for endianess when expanding BuildPairF64/ExtractElementF64 nodes. | Vasileios Kalintiris | 2014-10-16 | 1 | -1/+4
| Summary:
| In order to support big-endian targets for the BuildPairF64 nodes we just need to swap the low/high pair registers. Additionally, for the ExtractElementF64 nodes we have to calculate the correct stack offset with respect to the node's register/operand that we want to extract.
| Reviewers: dsanders
| Reviewed By: dsanders
| Subscribers: llvm-commits
| Differential Revision: http://reviews.llvm.org/D5753
| llvm-svn: 219931
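The stack-offset reasoning above can be sketched as follows. This is an illustration under stated assumptions, not the Mips backend code: the helper names are hypothetical, and the sketch assumes that element 0 denotes the low 32-bit word of the 64-bit value and element 1 the high word.

```cpp
#include <cassert>
#include <cstdint>

// Combines two 32-bit halves into the 64-bit bit pattern of an f64.
uint64_t buildPair64(uint32_t lo, uint32_t hi) {
  return (static_cast<uint64_t>(hi) << 32) | lo;
}

// Byte offset, within an 8-byte stack slot, of the requested 32-bit half.
// Little-endian memory stores the low word first; big-endian stores the
// high word first, so the offsets swap.
unsigned halfOffsetInSlot(unsigned elementIndex, bool isLittleEndian) {
  bool wantsLowWord = (elementIndex == 0);
  return (wantsLowWord == isLittleEndian) ? 0u : 4u;
}
```

On a little-endian target the low word sits at offset 0 and the high word at offset 4; on a big-endian target the two offsets are exchanged, which matches the swap the commit message describes for the register pair.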
* [mips] Marked the DI/EI instruction aliases as MIPS32r2 | Vasileios Kalintiris | 2014-10-16 | 1 | -2/+2
| Reviewers: dsanders
| Reviewed By: dsanders
| Subscribers: llvm-commits
| Differential Revision: http://reviews.llvm.org/D5751
| llvm-svn: 219927
* Test commit access: remove extra new line at the end of file | Vasileios Kalintiris | 2014-10-16 | 1 | -1/+0
| llvm-svn: 219925
* R600: Remove dead function | Matt Arsenault | 2014-10-16 | 2 | -15/+0
| llvm-svn: 219879
* [AVX512] Add DQ subvector inserts | Adam Nemet | 2014-10-15 | 2 | -11/+33
| In AVX512f we support 64x2 and 32x8 inserts via matching them to 32x4 and 64x4 respectively. These are matched by "Alt" Pat<>'s (Alt stands for alternative VTs).
| Since DQ has native support for these instructions, I peeled off the non-"Alt" part of the baseclass into vinsert_for_size_no_alt. The DQ instructions are derived from this multiclass. The "Alt" Pat<>'s are disabled with DQ.
| Fixes <rdar://problem/18426089>
| llvm-svn: 219874
* [AVX512] Two new attributes in X86VectorVTInfo for subvector insert | Adam Nemet | 2014-10-15 | 2 | -4/+14
| The new attributes are NumElts and the CD8TupleForm. This prepares the code to enable x8 and x2 inserts.
| NFC, no change in X86.td.expanded except for the new attributes.
| llvm-svn: 219871
* [AVX512] Rename arg from Opcode32/64 to Opcode128/256 in vinsert_for_size | Adam Nemet | 2014-10-15 | 1 | -4/+4
| It's the W bit that selects between the 32- and 64-bit element type, not the opcode. The opcode selects the width of the insert (128 or 256).
| llvm-svn: 219870
* R600: Remove unnecessary part of computeKnownBitsForTargetNode | Matt Arsenault | 2014-10-15 | 1 | -5/+0
| Zero-width BFEs are combined away already, so there's no point in handling them.
| llvm-svn: 219868
* Move variable down to use | Matt Arsenault | 2014-10-15 | 1 | -4/+4
| llvm-svn: 219867
* R600/SI: Fix bug where immediates were being used in DS addr operands | Tom Stellard | 2014-10-15 | 1 | -1/+4
| The SelectDS1Addr1Offset complex pattern always tries to store constant lds pointers in the offset operand and store a zero value in the addr operand. Since the addr operand does not accept immediates, the zero value needs to first be copied to a register.
| This newly created zero value will not go through normal instruction selection, so we need to manually insert a V_MOV_B32_e32 in the complex pattern.
| This bug was hidden by the fact that if there was another zero value in the DAG that had not been selected yet, then the CSE done by the DAG would use the unselected node for the addr operand rather than the one that was just created. This would lead to the zero value being selected and the DAG automatically inserting a V_MOV_B32_e32 instruction.
| llvm-svn: 219848
* Wrong attribute. LLVM_ATTRIBUTE_UNUSED not LLVM_ATTRIBUTE_USED | Sid Manning | 2014-10-15 | 1 | -1/+1
| The original fix for the build break was correct. LLVM_ATTRIBUTE_USED removes the warning message because it keeps the function in the object file. LLVM_ATTRIBUTE_UNUSED indicates that it may or may not be used depending on build settings.
| llvm-svn: 219846
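The situation behind these three commits (a helper referenced only from an assert) can be sketched as below. The sketch uses the standard C++17 `[[maybe_unused]]` attribute as a stand-in for LLVM_ATTRIBUTE_UNUSED, which is an assumption made for portability; `isValidIndex` and `getElement` are hypothetical names, not functions from the Hexagon backend.

```cpp
#include <cassert>

// Only referenced from the assert below. In NDEBUG builds the assert expands
// to nothing, so without the attribute -Wunused-function would warn here.
// The attribute just silences the warning; it does not force the function
// to be kept in the object file.
[[maybe_unused]] static bool isValidIndex(int idx, int size) {
  return idx >= 0 && idx < size;
}

// Returns data[idx]; the bounds check exists only in assert-enabled builds.
int getElement(const int *data, int size, int idx) {
  assert(isValidIndex(idx, size) && "index out of range");
  return data[idx];
}
```

An attribute that forces emission (the "used" flavor) would also silence the warning, but at the cost of keeping dead code in release builds, which is the distinction the commit message draws.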
* Wrong attribute. LLVM_ATTRIBUTE_USED not LLVM_ATTRIBUTE_UNUSED | Sid Manning | 2014-10-15 | 1 | -1/+1
| llvm-svn: 219837
* Add LLVM_ATTRIBUTE_UNUSED to function currently just used in an assert | Sid Manning | 2014-10-15 | 1 | -0/+2
| Fixes break when -Wunused-function is used.
| llvm-svn: 219833
* Reapply "[FastISel][AArch64] Add custom lowering for GEPs." | Juergen Ributzka | 2014-10-15 | 1 | -0/+76
| This is mostly a copy of the existing FastISel GEP code, but we have to duplicate it for AArch64, because otherwise we would bail out even for simple cases. This is because the standard fastEmit functions don't cover MUL at all and ADD is lowered very inefficiently.
| The original commit had a bug in the add emit logic, which has been fixed.
| llvm-svn: 219831
* [FastISel][AArch64] Factor out add with immediate emission into a helper function. NFC. | Juergen Ributzka | 2014-10-15 | 1 | -13/+28
| Simplify add with immediate emission by factoring it out into a helper function.
| llvm-svn: 219830
* Enable the instruction printer in HexagonMCTargetDesc | Sid Manning | 2014-10-15 | 4 | -4/+64
| This adds the MCInstPrinter to the LLVMHexagonDesc library and removes the dependency LLVMHexagonAsmPrinter had on LLVMHexagonDesc. This is a prerequisite needed by the disassembler.
| Phabricator Revision: http://reviews.llvm.org/D5734
| llvm-svn: 219826
* R600/SI: Also try to use 0 base for misaligned 8-byte DS loads. | Matt Arsenault | 2014-10-15 | 1 | -0/+17
| llvm-svn: 219823
* R600: Fix miscompiles when BFE has multiple uses | Matt Arsenault | 2014-10-15 | 1 | -7/+10
| SimplifyDemandedBits would break the other uses of the operand.
| llvm-svn: 219819
* Simplify handling of --noexecstack by using getNonexecutableStackSection. | Rafael Espindola | 2014-10-15 | 16 | -67/+36
| llvm-svn: 219799
* Move getNonexecutableStackSection up to the base ELF class. | Rafael Espindola | 2014-10-15 | 6 | -23/+0
| The .note.GNU-stack section is not SystemZ/X86 specific.
| llvm-svn: 219796
* R600: Use existing variable | Matt Arsenault | 2014-10-15 | 1 | -1/+1
| llvm-svn: 219778
* R600: Remove outdated comment | Matt Arsenault | 2014-10-15 | 1 | -3/+0
| llvm-svn: 219777
* Revert "[FastISel][AArch64] Add custom lowering for GEPs." | Juergen Ributzka | 2014-10-15 | 1 | -85/+0
| This breaks our internal build bots. Reverting it to get the bots green again.
| llvm-svn: 219776
* ARM: drop check for triple that's no longer used. | Tim Northover | 2014-10-15 | 1 | -3/+2
| Early attempts to support AAPCS bare metal MachO targets based the decision on the CPU being compiled for. This was not a particularly great idea and we've got a better option now, but this check remained.
| No functional change for any target we care about.
| llvm-svn: 219767
* Remove unused variable. | Eric Christopher | 2014-10-15 | 1 | -1/+0
| llvm-svn: 219750
* [AArch64] Wrong CC access in CSINC-conditional branch sequence | Gerolf Hoflehner | 2014-10-14 | 1 | -5/+1
| This is a follow-up to commit r219742. It removes the CCInMI variable and accesses the CC in CSINC directly. In the case of a conditional branch, accessing the CC with CCInMI was wrong.
| llvm-svn: 219748
* [AAarch64] Optimize CSINC-branch sequence | Gerolf Hoflehner | 2014-10-14 | 2 | -29/+137
| Peephole optimization that generates a single conditional branch for csinc-branch sequences like the examples below. This is possible when the csinc sets or clears a register based on a condition code and the branch checks that register. Also, the condition code must not be modified between the csinc and the original branch.
| Examples:
| 1. Convert csinc w9, wzr, wzr, <CC>; tbnz w9, #0, 0x44 to b.<invCC>
| 2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44 to b.<CC>
| rdar://problem/18506500
| llvm-svn: 219742
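The equivalence behind the two examples above can be checked with a small model of the instructions. This is a sketch of the instruction semantics only, not the peephole pass; all function names are hypothetical. It relies on `csinc Wd, Wn, Wm, cond` computing `cond ? Wn : Wm + 1`, with WZR reading as zero.

```cpp
#include <cassert>
#include <cstdint>

// csinc Wd, WZR, WZR, cond: yields 0 when cond holds, otherwise 0 + 1.
uint32_t csincZeroZero(bool cond) { return cond ? 0u : 0u + 1u; }

// tbnz branches when the tested bit is set; tbz branches when it is clear.
bool tbnzTaken(uint32_t reg, unsigned bit) { return ((reg >> bit) & 1u) != 0; }
bool tbzTaken(uint32_t reg, unsigned bit) { return !tbnzTaken(reg, bit); }

// Original two-instruction sequences from the examples.
bool csincThenTbnz(bool cc) { return tbnzTaken(csincZeroZero(cc), 0); }
bool csincThenTbz(bool cc) { return tbzTaken(csincZeroZero(cc), 0); }

// Folded single branches: b.<invCC> and b.<CC>.
bool branchOnInvertedCC(bool cc) { return !cc; }
bool branchOnCC(bool cc) { return cc; }
```

For both values of the condition, `csincThenTbnz` agrees with `branchOnInvertedCC` and `csincThenTbz` agrees with `branchOnCC`. The model deliberately ignores the side condition from the commit message that the flags must be unchanged between the csinc and the branch.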
* [X86][SSE] pslldq/psrldq shuffle mask decodes | Simon Pilgrim | 2014-10-14 | 3 | -0/+71
| Patch to provide shuffle decodes and asm comments for the pslldq/psrldq SSE2/AVX2 byte shift instructions.
| Differential Revision: http://reviews.llvm.org/D5598
| llvm-svn: 219738
* ARM: remove ARM/Thumb distinction for preferred alignment. | Tim Northover | 2014-10-14 | 1 | -5/+0
| Thumb1 has legitimate reasons for preferring 32-bit alignment of types i1/i8/i16, since the 16-bit encoding of "add rD, sp, #imm" requires #imm to be a multiple of 4. However, this is a trade-off between code size and RAM usage; the DataLayout string is not the best place to represent it even if desired.
| So this patch removes the extra Thumb requirements, hopefully making ARM and Thumb completely compatible in this respect.
| llvm-svn: 219734
* ARM: allow misaligned local variables in Thumb1 mode. | Tim Northover | 2014-10-14 | 1 | -3/+1
| There's no hard requirement on LLVM to align local variables to 32 bits, so the Thumb1 frame handling needs to be able to deal with variables that are only naturally aligned without falling over.
| llvm-svn: 219733
* [FastISel][AArch64] Add custom lowering for GEPs. | Juergen Ributzka | 2014-10-14 | 1 | -0/+85
| This is mostly a copy of the existing FastISel GEP code, but on AArch64 we bail out even for simple cases, because the standard fastEmit functions don't cover MUL and ADD is lowered inefficiently.
| llvm-svn: 219726
* [x86 asm] allow fwait alias in both At&t and Intel modes (PR21208) | Hans Wennborg | 2014-10-14 | 1 | -1/+1
| Differential Revision: http://reviews.llvm.org/D5741
| llvm-svn: 219725
* ARM: set preferred aggregate alignment to 32 universally. | Tim Northover | 2014-10-14 | 1 | -4/+3
| Before, ARM and Thumb mode code had different preferred alignments, which could lead to some rather unexpected results. There's justification for reducing it from the default 64 bits (wasted space), but I don't think there is for going below 32 bits.
| There's no actual ABI change here, just to reassure people.
| llvm-svn: 219719
* [FastISel][AArch64] Fix sign-/zero-extend folding when SelectionDAG is involved. | Juergen Ributzka | 2014-10-14 | 1 | -39/+190
| Sign-/zero-extend folding depended on the load and the integer extend both being selected by FastISel. This cannot always be guaranteed and SelectionDAG might interfere. This commit adds additional checks to load and integer extend lowering to catch this.
| Related to rdar://problem/18495928.
| llvm-svn: 219716
* Reapply "R600: Add new intrinsic to read work dimensions" | Jan Vesely | 2014-10-14 | 3 | -5/+20
| This effectively reverts revert 219707, after fixing the test to work with the new function name format and renamed intrinsic.
| Reviewed-by: Tom Stellard <tom@stellard.net>
| Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
| llvm-svn: 219710
* Revert "R600: Add new intrinsic to read work dimensions" | Rafael Espindola | 2014-10-14 | 3 | -20/+5
| This reverts commit r219705.
| CodeGen/R600/work-item-intrinsics.ll was failing on linux.
| llvm-svn: 219707
* R600: Add new intrinsic to read work dimensions | Jan Vesely | 2014-10-14 | 3 | -5/+20
| v2: Add SI lowering
|     Add test
| v3: Place work dimensions after the kernel arguments.
| v4: Calculate offset while lowering arguments
| v5: rebase
| v6: change prefix to AMDGPU
| Reviewed-by: Tom Stellard <tom@stellard.net>
| Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
| llvm-svn: 219705
* R600: FMA is VecALU only instruction | Jan Vesely | 2014-10-14 | 1 | -1/+1
| Reviewed-by: Tom Stellard <tom@stellard.net>
| Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
| llvm-svn: 219704
* Finish getting Mips fast-isel to match up with AArch64 fast-isel | Reed Kotler | 2014-10-14 | 1 | -402/+396
| Summary:
| In order to facilitate use of common code, checking by reviewers of other fast-isel ports, and hopefully to eventually move most of Mips and other fast-isel ports into target-independent code, I've tried to get the two implementations to line up.
| There is no functional code change. Methods were just moved within the file to be in the same order as in AArch64.
| Test Plan: No functional change.
| Reviewers: dsanders
| Reviewed By: dsanders
| Subscribers: llvm-commits, aemerson, rfuhler
| Differential Revision: http://reviews.llvm.org/D5692
| llvm-svn: 219703
* R600/SI: Use DS offsets for constant addresses | Matt Arsenault | 2014-10-14 | 1 | -0/+12
| Use 0 as the base address for a constant address, so if we have a constant address we can save moves and form read2/write2s.
| llvm-svn: 219698
* [AVX512] Extended avx512_binop_rm to DQ/VL subsets. | Robert Khasanov | 2014-10-14 | 1 | -0/+2
| Added encoding tests.
| llvm-svn: 219686
* [AVX512] Extended avx512_binop_rm to BW/VL subsets. | Robert Khasanov | 2014-10-14 | 1 | -67/+125
| Added encoding tests.
| llvm-svn: 219685
* [AArch64] Fix crash with empty/pseudo-only blocks in A53 erratum (835769) workaround | Bradley Smith | 2014-10-14 | 1 | -15/+26
| llvm-svn: 219684
* Grab the subtarget info off of the MachineFunction rather than | Eric Christopher | 2014-10-14 | 1 | -1/+1
| indirecting through the TargetMachine.
| llvm-svn: 219674
* Use the triple to figure out if this is a darwin target, not | Eric Christopher | 2014-10-14 | 1 | -1/+1
| the subtarget.
| llvm-svn: 219673