summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [llvm-tblgen] Avoid StringMatcher for GCC and MS builtin namesReid Kleckner2016-01-271-6/+1
| | | | | | | | | | | | | | | This brings the compile time of Function.cpp from ~40s down to ~4s for me locally. It also shaves off about 400KB of object file size in a release+asserts build. I also realized that the AMDGPU backend does not have any GCC builtin names to match, so the extra lookup was a no-op. I removed it to silence a zero-length string table array warning. There should be no functional change here. This change really ends the story of PR11951. llvm-svn: 258897
* [llvm-tblgen] Stop emitting the intrinsic name matching codeReid Kleckner2016-01-261-17/+20
| | | | | | | | | The AMDGPU backend was the last user of the old StringMatcher recognition code. Move it over to the new lookupLLVMIntrinsicName funciton, which is now improved to handle all of the interesting edge cases exposed by AMDGPU intrinsic names. llvm-svn: 258875
* [WebAssembly] Omit no-op adds for non-mem uses of FrameIndexDerek Schuff2016-01-263-10/+17
| | | | | | Differential Revision: http://reviews.llvm.org/D16554 llvm-svn: 258872
* [x86] make the subtarget member a const reference, not a pointer ; NFCISanjay Patel2016-01-262-731/+731
| | | | | | It's passed in as a reference; it's not optional; it's not a pointer. llvm-svn: 258867
* [X86] Add support for zeroed shuffle elements to getShuffleScalarEltSimon Pilgrim2016-01-261-2/+6
| | | | | | Enable handling of SM_SentinelZero shuffle elements to getShuffleScalarElt. Improves VZEXT_LOAD matches in EltsFromConsecutiveLoads. llvm-svn: 258865
* Remove autoconf supportChris Bieneman2016-01-2682-1409/+0
| | | | | | | | | | | | | | | | Summary: This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html "I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened." - Obi Wan Kenobi Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D16471 llvm-svn: 258861
* [WebAssembly] Remove check for FrameIndex operands in WebAssemblyPeepholeDerek Schuff2016-01-261-14/+9
| | | | | | | This pass runs after FrameIndex elimination, so it should never see FI operands. NFC llvm-svn: 258860
* [x86] add materializeVectorConstant() helper function; NFCSanjay Patel2016-01-261-15/+29
| | | | | | LowerBUILD_VECTOR is still over 300 lines long, but it's a start... llvm-svn: 258858
* WebAssembly NFC: update error messageJF Bastien2016-01-261-1/+2
| | | | | | I forgot to update this one in my previous patch. llvm-svn: 258853
* WebAssembly: don't optimize memcpy/memmove/memcpy to frame indexJF Bastien2016-01-262-20/+31
| | | | | | r258781 optimized memcpy/memmove/memcpy so the intrinsic call can return its first argument, but missed the frame index case. Teach it to ignore that case so C code doesn't assert out in these cases. llvm-svn: 258851
* Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.Cong Hou2016-01-262-48/+113
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, AnalyzeBranch() fails non-equality comparison between floating points on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this function can modify the branch by reversing the conditional jump and removing unconditional jump if there is a proper fall-through. However, in the case of non-equality comparison between floating points, this can turn the branch "unanalyzable". Consider the following case: jne.BB1 jp.BB1 jmp.BB2 .BB1: ... .BB2: ... AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be removed: jne.BB1 jnp.BB2 .BB1: ... .BB2: ... However, AnalyzeBranch() cannot analyze this branch anymore as there are two conditional jumps with different targets. This may disable some optimizations like block-placement: in this case the fall-through behavior is enforced even if the fall-through block is very cold, which is suboptimal. Actually this optimization is also done in block-placement pass, which means we can remove this optimization from AnalyzeBranch(). However, currently X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there is no defined negation conditions for them. In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them. Here only the second conditional jump is reversed. This is valid as we only need them to do this "unconditional jump removal" optimization. Differential Revision: http://reviews.llvm.org/D11393 llvm-svn: 258847
* [x86] simplify getOnesVector() ; NFCISanjay Patel2016-01-261-20/+11
| | | | | | | Let DAG.getConstant() handle the splatting; there's no need to repeat that logic here. llvm-svn: 258833
* Fix Clang-tidy modernize-use-nullptr and modernize-use-override warnings; ↵Eugene Zelenko2016-01-261-10/+8
| | | | | | | | other minor fixes. Differential revision: reviews.llvm.org/D16568 llvm-svn: 258831
* Update wasm target for r258819.Benjamin Kramer2016-01-261-1/+1
| | | | llvm-svn: 258827
* Reflect the MC/MCDisassembler split on the include/ level.Benjamin Kramer2016-01-2613-13/+13
| | | | | | No functional change, just moving code around. llvm-svn: 258818
* [WebAssembly] Fix a typo in a comment.Dan Gohman2016-01-261-1/+1
| | | | llvm-svn: 258810
* [X86][SSE] Add zero element and general 64-bit VZEXT_LOAD support to ↵Simon Pilgrim2016-01-261-56/+87
| | | | | | | | | | | | | | EltsFromConsecutiveLoads This patch adds support for trailing zero elements to VZEXT_LOAD loads (and checks that no zero elts occur within the consecutive load). It also generalizes the 64-bit VZEXT_LOAD load matching to work for loads other than 2x32-bit loads. After this patch it will also be easier to add support for other basic load patterns like 32-bit VZEXT_LOAD loads, PMOVZX and subvector load insertion. Differential Revision: http://reviews.llvm.org/D16217 llvm-svn: 258798
* [X86] Mark LDS/LES as not being allowed in 64-bit mode.Craig Topper2016-01-261-4/+8
| | | | | | Their opcodes are used as part of the VEX prefix in 64-bit mode. Clearly the disassembler implicitly decoded them as AVX instructions in 64-bit mode, but I think the AsmParser would have encoded them. llvm-svn: 258793
* AMDGPU: Move AMDGPU intrinsics only used by R600Matt Arsenault2016-01-262-10/+13
| | | | llvm-svn: 258790
* AMDGPU: Tidy minor td file issuesMatt Arsenault2016-01-264-247/+249
| | | | | | | | | | Make comments and indentation more consistent. Rearrange a few things to be in a more consistent order, such as organizing subtarget features from those describing an actual device property, and those used as options. llvm-svn: 258789
* AMDGPU: Make v32i8/v64i8 illegal typesMatt Arsenault2016-01-264-21/+13
| | | | | | | | Old intrinsics were forcing these, but they have now all been removed. This fixes large i8 vector operations generally being broken. llvm-svn: 258788
* AMDGPU: Remove old sample intrinsicsMatt Arsenault2016-01-264-61/+0
| | | | | | | | | | | I did my best to try to update all the uses in tests that just happened to use the old ones to the newer intrinsics. I'm not sure I got all of the immediate operand conversions correct, since the value seems to have been ignored by the old pattern but I don't think it really matters. llvm-svn: 258787
* AMDGPU: Add new amdgcn intrinsics for cube instructionsMatt Arsenault2016-01-262-5/+9
| | | | | | | More cleanup to try to get all intrinsics using the correct amdgcn prefix that are as close to the instruction as possible. llvm-svn: 258786
* AMDGPU: Implement read_register and write_register intrinsicsMatt Arsenault2016-01-262-0/+50
| | | | | | | | | | | | | | Some of the special intrinsics now that now correspond to a instruction also have special setting of some registers, e.g. llvm.SI.sendmsg sets m0 as well as use s_sendmsg. Using these explicit register intrinsics may be a better option. Reading the exec mask and others may be useful for debugging. For this I'm not sure this is entirely correct because we would want this to be convergent, although it's possible this is already treated sufficently conservatively. llvm-svn: 258785
* AMDGPU: Restore AMDGPU prefixed rsq intrinsic for nowMatt Arsenault2016-01-264-6/+13
| | | | | | Also move into backend intrinsics to discourage use of the old name. llvm-svn: 258783
* [WebAssembly] Optimize memcpy/memmove/memcpy calls.Dan Gohman2016-01-263-41/+127
| | | | | | | | These calls return their first argument, but because LLVM uses an intrinsic with a void return type, they can't use the returned attribute. Generalize the store results pass to optimize these calls too. llvm-svn: 258781
* [WebAssembly] Remove a completed entry from the README.txt.Dan Gohman2016-01-261-4/+0
| | | | llvm-svn: 258780
* [WebAssembly] Implement unaligned loads and stores.Dan Gohman2016-01-2616-246/+489
| | | | | | Differential Revision: http://reviews.llvm.org/D16534 llvm-svn: 258779
* Sort intrinsics by LLVM intrinsic name, rather than tablegen def nameReid Kleckner2016-01-261-92/+92
| | | | | | | | | | | | Step one towards using a simple binary search to lookup intrinsic IDs instead of our crazy table generated switch+memcmp+startswith code that makes Function.cpp take about a minute to compile. See PR24785 and PR11951 for why we should do this. The X86 backend contains tables that need to be sorted on intrinsic ID, so reorder those. llvm-svn: 258757
* X86ISelLowering: Fix cmov(cmov) special lowering bugMatthias Braun2016-01-251-1/+2
| | | | | | | | | | | | | | | | There's a special case in EmitLoweredSelect() that produces an improved lowering for cmov(cmov) patterns. However this special lowering is currently broken if the inner cmov has multiple users so this patch stops using it in this case. If you wonder why this wasn't fixed by continuing to use the special lowering and inserting a 2nd PHI for the inner cmov: I believe this would incur additional copies/register pressure so the special lowering does not improve upon the normal one anymore in this case. This fixes http://llvm.org/PR26256 (= rdar://24329747) llvm-svn: 258729
* [X86][AVX] Add commutation support for VPERM2X128 instructions Simon Pilgrim2016-01-252-0/+16
| | | | | | | | Its main use is to allow memory folding of the 1st operand Differential Revision: http://reviews.llvm.org/D16521 llvm-svn: 258726
* [WebAssembly] Fix unbalanced register stack code in the case of late DCE.Dan Gohman2016-01-251-3/+3
| | | | | | | Instructions can be DCE'd after the RegStackify pass. If the instruction which would be the pop for what would be a push is removed, don't use a push. llvm-svn: 258694
* [WebAssembly] Minor code formatting cleanups. NFC.Dan Gohman2016-01-253-9/+10
| | | | llvm-svn: 258692
* [AVX512] Adding PTESTNMB/D/W/Q instructionMichael Zuckerman2016-01-251-0/+12
| | | | | | Differential Revision: http://reviews.llvm.org/D16520 llvm-svn: 258688
* [AVX512] Adding PTESTMB/W/D/Q instruction Michael Zuckerman2016-01-251-0/+10
| | | | | | Differential Revision: http://reviews.llvm.org/D16519 llvm-svn: 258686
* [ARM] Add DSP build attribute and extension targetingBradley Smith2016-01-251-0/+3
| | | | | | | | This patch was originally committed as r257885, but was reverted due to windows failures. The cause of these failures has been fixed under r258677, hence re-committing the original patch. llvm-svn: 258683
* [ARM] Add new system registers to ARMv8-M Baseline/MainlineBradley Smith2016-01-254-7/+112
| | | | | | | | This patch was originally committed as r257884, but was reverted due to windows failures. The cause of these failures has been fixed under r258677, hence re-committing the original patch. llvm-svn: 258682
* [ARM] Add ARMv8-M security extension instructions to ARMv8-M Baseline/MainlineBradley Smith2016-01-258-1/+95
| | | | | | | | This patch was originally committed as r257883, but was reverted due to windows failures. The cause of these failures has been fixed under r258677, hence re-committing the original patch. llvm-svn: 258681
* [X86][IFMA] adding intrinsics and encoding for multiply and add of unsigned ↵Asaf Badouh2016-01-255-0/+79
| | | | | | | | | | | 52bit integer VPMADD52LUQ - Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword Accumulators VPMADD52HUQ - Packed Multiply of Unsigned 52-bit Unsigned Integers and Add High 52-bit Products to 64-bit Accumulators Differential Revision: http://reviews.llvm.org/D16407 llvm-svn: 258680
* [ARM] Add ARMv8.2-A FP16 scalar instructionsOliver Stannard2016-01-2512-6/+784
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was originally committed as r255762, but reverted as it broke windows bots. Re-commitiing the exact same patch, as the underlying cause was fixed by r258677. ARMv8.2-A adds 16-bit floating point versions of all existing VFP floating-point instructions. This is an optional extension, so all of these instructions require the FeatureFullFP16 subtarget feature. The assembly for these instructions uses S registers (AArch32 does not have H registers), but the instructions have ".f16" type specifiers rather than ".f32" or ".f64". The top 16 bits of each source register are ignored, and the top 16 bits of the destination register are set to zero. These instructions are mostly the same as the 32- and 64-bit versions, but they use coprocessor 9 rather than 10 and 11. Two new instructions, VMOVX and VINS, have been added to allow packing and extracting two 16-bit floats stored in the top and bottom halves of an S register. New fixup kinds have been added for the PC-relative load and store instructions, but no ELF relocations have been added as they have a range of 512 bytes. Differential Revision: http://reviews.llvm.org/D15038 llvm-svn: 258678
* [TableGen] Fix sort order of asm operand classesOliver Stannard2016-01-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | This is a fix for https://llvm.org/bugs/show_bug.cgi?id=22796. The previous implementation of ClassInfo::operator< allowed cycles of classes such that x < y < z < x, meaning that a list of them cannot be correctly sorted, and the sort order could differ with different standard libraries. The original implementation sorted classes by ValueName if they were otherwise equal. This isn't strictly necessary, but some backends seem to accidentally rely on it. If I reverse this comparison I get 8 test failures spread across the AArch64, Mips and X86 backends, so I have left it in until those backends can be fixed. There was one case in the X86 backend where the observable behaviour of the assembler is changed by this patch. This was because some of the memory asm operands were not marked as children of X86MemAsmOperand. Differential Revision: http://reviews.llvm.org/D16141 llvm-svn: 258677
* Silence a -Wparentheses warning; NFC.Junmo Park2016-01-251-2/+2
| | | | llvm-svn: 258676
* AVX1 : Enable vector masked_load/store to AVX1.Igor Breger2016-01-252-110/+41
| | | | | | | | Use AVX1 FP instructions (vmaskmovps/pd) in place of the AVX2 int instructions (vpmaskmovd/q). Differential Revision: http://reviews.llvm.org/D16528 llvm-svn: 258675
* [AVX512] [CMPPS ][ CMPPD ] Adding full Comparison Predicate names Michael Zuckerman2016-01-251-0/+14
| | | | | | | | | | | X86AsmParser.cpp is missing full comparison predicate names for CMPPD and CMPPS Instructions. X86AsmParser.cpp defines only the short names of the Comparison predicate that you can find in the following pdf: https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf Page 5-61 table 5-3 Differential Revision: http://reviews.llvm.org/D16518 llvm-svn: 258671
* Added Skylake client to X86 targets and featuresElena Demikhovsky2016-01-244-140/+129
| | | | | | | | | | | | | Changes in X86.td: I set features of Intel processors in incremental form: IVB = SNB + X HSW = IVB + X .. I added Skylake client processor and defined it's features FeatureADX was missing on KNL Added some new features to appropriate processors SMAP, IFMA, PREFETCHWT1, VMFUNC and others Differential Revision: http://reviews.llvm.org/D16357 llvm-svn: 258659
* AVX512: VMOVDQU8/16/32/64 (load) intrinsic implementation.Igor Breger2016-01-244-8/+42
| | | | | | Differential Revision: http://reviews.llvm.org/D16137 llvm-svn: 258657
* [X86][SSE] Generalised TRUNC -> PACKSS/PACKUS code. NFC.Simon Pilgrim2016-01-231-16/+11
| | | | | | Generalised mask generation / subvector extraction to use the input/output types directly instead of an if/else through all the currently accepted types. llvm-svn: 258645
* [CUDA] Die gracefully when trying to output an LLVM alias.Justin Lebar2016-01-231-0/+5
| | | | | | | | | | | | | | Summary: Previously, we would just output "foo = bar" in the assembly, and then ptxas would choke. Now we die before emitting any invalid code. Reviewers: echristo Subscribers: jholewinski, llvm-commits, jhen, tra Differential Revision: http://reviews.llvm.org/D16490 llvm-svn: 258638
* [CUDA] Make empty parameter lists in nvptx function decls easier to read.Justin Lebar2016-01-231-0/+5
| | | | | | | | | | | | | | | | | | | | | | | Summary: Before: .func (.param .b32 func_retval0) _ZL21__nvvm_reflect_anchorv( ) { After: .func (.param .b32 func_retval0) _ZL21__nvvm_reflect_anchorv() { Reviewers: bkramer Subscribers: llvm-commits, tra, jhen, echristo, jholewinski Differential Revision: http://reviews.llvm.org/D16512 llvm-svn: 258637
* Silence a -Wparentheses warning; NFC.Aaron Ballman2016-01-231-1/+1
| | | | llvm-svn: 258626
OpenPOWER on IntegriCloud