path: root/llvm/test/CodeGen
Commit message  Author  Age  Files  Lines
* [AVX512] Add patterns for VEXTRACT v16i16->v8i16 and v32i8->v16i8. Disable ↵Craig Topper2016-05-214-19/+25
| | | | | | AVX2 versions of vector extract when AVX512VL is enabled. llvm-svn: 270318
* [AVX512] Disable AVX2 VPERMD, VPERMQ, VPERMPS, and VPERMPD patterns when ↵Craig Topper2016-05-211-4/+4
| | | | | | AVX512VL is enabled. Also add shuffle comment printing for AVX512VL VPERMPD/VPERMQ to keep some tests that now use these instructions instead of the AVX2 ones. llvm-svn: 270317
* [AVX512] Disable AVX/AVX2 VBROADCASTSS/VBROADCASTSD patterns when AVX512VL ↵Craig Topper2016-05-211-3/+3
| | | | | | is enabled. llvm-svn: 270316
* [AVX512] Use update_llc_test_checks to update some tests so we can see all ↵Craig Topper2016-05-213-4415/+7909
| | | | | | the instruction encodings and ensure everything is with EVEX. llvm-svn: 270315
* [AVX512] Fix test cases I missed in r270311.Craig Topper2016-05-211-4/+4
| | | | llvm-svn: 270313
* AMDGPU: Define priorities for register classesMatt Arsenault2016-05-219-40/+43
| | | | | | | | | | Allocating larger register classes first should give better allocation results (and more importantly for myself, make the lit tests more stable with respect to scheduler changes). Patch by Matthias Braun llvm-svn: 270312
* AMDGPU: Cleanup lowering actionsMatt Arsenault2016-05-212-32/+131
| | | | | | | | These are kind of a mess and hard to follow, particularly for loads and stores. Fix various redundant, unnecessary and dead settings. llvm-svn: 270307
* AMDGPU: Fix high bits after division optimizationMatt Arsenault2016-05-216-41/+329
| | | | | | | This is essentially doing a 24-bit signed division with FP. We need to truncate to the N bit result. llvm-svn: 270305
* AMDGPU: Fix verifier error when spilling SGPRsMatt Arsenault2016-05-211-2/+2
| | | | | | | | | | | The current SGPR spilling test does not stress this because it is using s_buffer_load instructions to increase SGPR pressure and spill, but their output operands have the same SReg_32_XM0 constraint. This fixes an error when the SReg_32 output from most instructions is spilled. llvm-svn: 270301
* AMDGPU: Handle cbranch vccz/vccnzMatt Arsenault2016-05-213-8/+12
| | | | llvm-svn: 270297
* AMDGPU: Implement ReverseBranchConditionMatt Arsenault2016-05-211-6/+2
| | | | llvm-svn: 270296
* AMDGPU: Implement AnalyzeBranchMatt Arsenault2016-05-219-41/+84
| | | | | | Original patch by Tom Stellard llvm-svn: 270295
* [WebAssembly] Optimize away return instructions using fallthroughs.Dan Gohman2016-05-2141-49/+90
| | | | | | | | | This saves a small amount of code size, and is a first small step toward passing values on the stack across block boundaries. Differential Review: http://reviews.llvm.org/D20450 llvm-svn: 270294
* LiveIntervalAnalysis: Rework constructMainRangeFromSubranges()Matthias Braun2016-05-201-0/+32
| | | | | | | | | | | | | | | | | | | | | We now use LiveRangeCalc::extendToUses() instead of a specially designed algorithm in constructMainRangeFromSubranges(): - The original motivation for constructMainRangeFromSubranges() was differences between the main liverange and subranges because of hidden dead definitions. This case however cannot happen anymore with the DetectDeadLaneMasks pass in place. - It simplifies the code. - This fixes a longstanding bug where we did not properly create new SSA values on merging control flow (the MachineVerifier missed most of these cases). - Move constructMainRangeFromSubranges() to LiveIntervalAnalysis and LiveRangeCalc to better match the implementation/available helper functions. This re-applies r269016. The fixes from r270290 and r270259 should avoid the machine verifier problems this time. llvm-svn: 270291
* MachineVerifier: subregs do not require defs/valnos on every pathMatthias Braun2016-05-201-0/+26
| | | | | | | | | | | It is fine for subregister ranges to be undefined on some CFG paths as we may have a "vregX:other_subreg<read-undef> =" def on that path. We do not (and should not) have live segments for the subregister ranges. The MachineVerifier should not complain about this. This is a slight variant of http://llvm.org/PR27705 llvm-svn: 270290
* [PowerPC] Add a testcase for TCO on string rvo functionTim Shen2016-05-201-0/+44
| | | | | | Differential Revision: http://reviews.llvm.org/D20311 llvm-svn: 270287
* [lanai] Change reloc to use PIC_ by default and cleanup.Jacques Pienaar2016-05-201-8/+3
| | | | | | | * Change reloc to PIC_; * Cleanup (clang-format & modify test); llvm-svn: 270282
* LiveIntervalAnalysis: Fix missing defs in renameDisconnectedComponents().Matthias Braun2016-05-201-0/+33
| | | | | | | | | | | | | | Fix renameDisconnectedComponents() creating vreg uses that can be reached from function begin without having a definition (or explicit live-in). Fix this by inserting IMPLICIT_DEF instruction before control-flow joins as necessary. Removes an assert from MachineScheduler because we may now get additional IMPLICIT_DEF when preparing the scheduling policy. This fixes the underlying problem of http://llvm.org/PR27705 llvm-svn: 270259
* [AArch64] Disable narrow load merge by defaultJun Bum Lim2016-05-201-3/+3
| | | | | | | | | | | | | | Summary: As this optimization converts two loads into one load with two shift instructions, it could potentially hurt performance if a loop is arithmetic operation intensive. Reviewers: t.p.northover, mcrosier, jmolloy Subscribers: evandro, jmolloy, aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D20172 llvm-svn: 270251
* [X86][AVX] Generalized matching for target shuffle combinesSimon Pilgrim2016-05-202-34/+15
| | | | | | | | | | | | This patch is a first step towards a more extendible method of matching combined target shuffle masks. Initially this just pulls out the existing basic mask matches and adds support for some 256/512 bit equivalents. Future patterns will require a number of features to be added but I wanted to keep this patch simple. I hope we can avoid duplication between shuffle lowering and combining and share more complex pattern match functions in future commits. Differential Revision: http://reviews.llvm.org/D19198 llvm-svn: 270230
* [X86][AVX] Sync with clang/test/CodeGen/avx-builtins.cSimon Pilgrim2016-05-201-211/+3303
| | | | llvm-svn: 270229
* Refactor X86 symbol access classification.Rafael Espindola2016-05-201-1/+2
| | | | | | | | | | | | This refactors the logic in X86 to avoid code duplication. It also splits it in two steps: it first decides if a symbol is local to the DSO and then uses that information to decide how to access it. The first part is implemented by shouldAssumeDSOLocal. It is not in any way specific to X86. In a followup patch I intend to move it to somewhere common and reuse it in other backends. llvm-svn: 270209
* Simplify handling of hidden stubs on PowerPC.Rafael Espindola2016-05-201-5/+3
| | | | | | | We now handle them just like non hidden ones. This was already the case on x86 (r207518) and arm (r207517). llvm-svn: 270205
* [Sparc] Enable more inline assembly constraints.Chris Dewhurst2016-05-201-0/+12
| | | | | | | | Note: This is specifically to allow GCC's test pr44707 to pass. Trivial change, not put for differential revision. Test included. llvm-svn: 270192
* [X86] Run the AVX/AVX2 intrinsic tests in AVX512VL mode too just to make ↵Craig Topper2016-05-203-1901/+4168
| | | | | | sure we don't break any older intrinsics. llvm-svn: 270183
* Revert accidental commit of a test command line addition.Craig Topper2016-05-201-1/+0
| | | | llvm-svn: 270175
* [X86] Fix some AVX patterns to only be disabled if VLX and BWI are ↵Craig Topper2016-05-201-0/+1
| | | | | | supported. Without this we get isel failures on the avx-intrinsics-x86.ll test in AVX512VL. llvm-svn: 270174
* [ARM, AArch64] Match additional patterns to ldN instructionsMatthew Simpson2016-05-194-0/+196
| | | | | | | | | | | | | | When matching an interleaved load to an ldN pattern, the interleaved access pass checks that all users of the load are shuffles. If the load is used by an instruction other than a shuffle, the pass gives up and an ldN is not generated. This patch considers users of the load that are extractelement instructions. It attempts to modify the extracts to use one of the available shuffles rather than the load. After the transformation, the load is only used by shuffles and will then be matched with an ldN pattern. Differential Revision: http://reviews.llvm.org/D20250 llvm-svn: 270142
* X86: Don't reset the stack after calls that don't return (PR27117)Hans Wennborg2016-05-191-0/+48
| | | | | | | | | Since the calls don't return, the instruction afterwards will never run, and is just taking up unnecessary space in the binary. Differential Revision: http://reviews.llvm.org/D20406 llvm-svn: 270109
* [x86] add tests for urem loweringSanjay Patel2016-05-191-0/+80
| | | | llvm-svn: 270096
* [X86][SSE] Added fast-isel tests to sync with clang/test/CodeGen/sse-builtins.cSimon Pilgrim2016-05-192-0/+2315
| | | | llvm-svn: 270081
* [X86][SSE2] Fixed shuffle of results in _mm_cmpnge_sd/_mm_cmpngt_sd testsSimon Pilgrim2016-05-191-8/+16
| | | | llvm-svn: 270080
* [AArch64] Generate a BFXIL from 'or (and X, Mask0Imm),(and Y, Mask1Imm)'.Chad Rosier2016-05-191-0/+67
| | | | | | | | | | | | | | | | | | Mask0Imm and ~Mask1Imm must be equivalent and one of the MaskImms is a shifted mask (e.g., 0x000ffff0). Both 'and's must have a single use. This changes code like: and w8, w0, #0xffff000f; and w9, w1, #0x0000fff0; orr w0, w9, w8 into: lsr w8, w1, #4; bfi w0, w8, #4, #12. llvm-svn: 270063
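The equivalence claimed in the commit message above can be checked with a small model. This is only a sketch: `is_shifted_mask`, `or_of_ands`, `bfi`, and `lsr_then_bfi` are hypothetical helper names for illustration, not LLVM functions, and the model assumes 32-bit registers as in the `w` register example.

```python
MASK32 = 0xFFFFFFFF


def is_shifted_mask(value):
    """True if value is a single contiguous run of one bits (e.g. 0x0000fff0)."""
    if value == 0:
        return False
    filled = value | (value - 1)          # fill the zero bits below the run
    return (filled + 1) & filled == 0     # filled must now be of the form 2^n - 1


def or_of_ands(x, y, mask0, mask1):
    """The original pattern: or (and X, Mask0Imm), (and Y, Mask1Imm)."""
    return ((x & mask0) | (y & mask1)) & MASK32


def bfi(dst, src, lsb, width):
    """Model of a bitfield insert: low `width` bits of src go into dst at bit `lsb`."""
    field = ((1 << width) - 1) << lsb
    return ((dst & ~field) | ((src << lsb) & field)) & MASK32


def lsr_then_bfi(x, y, lsb, width):
    """The replacement sequence from the example: lsr followed by bfi."""
    return bfi(x, y >> lsb, lsb, width)


if __name__ == "__main__":
    x, y = 0x12345678, 0x9ABCDEF0
    print(hex(or_of_ands(x, y, 0xFFFF000F, 0x0000FFF0)))
    print(hex(lsr_then_bfi(x, y, 4, 12)))
```

Both calls should agree for any pair of 32-bit inputs, because the two masks are complements of each other and 0x0000fff0 is a shifted mask covering bits 4 through 15.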
* [ARM] Add cdp intrinsic tests.Ranjeet Singh2016-05-193-8/+34
| | | | | | | | | | - Renamed intrinsics.ll to intrinsics-coprocessor.ll as all the tests were testing coprocessor instructions, also made the test checks match the full instruction. Differential Revision: http://reviews.llvm.org/D20393 llvm-svn: 270057
* [X86][SSE2] Added _mm_move_* testsSimon Pilgrim2016-05-191-0/+31
| | | | llvm-svn: 270046
* [X86][SSE2] Added _mm_cast* and _mm_set* testsSimon Pilgrim2016-05-191-0/+720
| | | | llvm-svn: 270041
* [mips][mips16] Fix ZERO is not a CPU16Regs register error from the machine ↵Daniel Sanders2016-05-191-1/+2
| | | | | | | | | | | | | | verifier. Summary: Partially fixes PR27458 Reviewers: sdardis Subscribers: dsanders, llvm-commits, sdardis Differential Revision: http://reviews.llvm.org/D20330 llvm-svn: 270037
* [X86] Enable RRL part of the LEA optimization pass for -O2.Andrey Turetskiy2016-05-191-20/+39
| | | | | | | | Enable the "Remove Redundant LEAs" part of the LEA optimization pass for -O2. This gives a 6.4% performance improvement on Broadwell on the nnet benchmark from Coremark-pro. There is no significant effect on other benchmarks (Geekbench, Spec2000, Spec2006). Differential Revision: http://reviews.llvm.org/D19659 llvm-svn: 270036
* [WebAssembly] Make several CHECK lines less fragile using regexes and CHECK-DAG.Dan Gohman2016-05-192-31/+31
| | | | llvm-svn: 270011
* AMDGPU: Fix promote alloca for pointer loadsMatt Arsenault2016-05-182-3/+43
| | | | | | | If the load has a pointer type, we don't want to change its type. llvm-svn: 270000
* Delete Reloc::Default.Rafael Espindola2016-05-182-2/+1
| | | | | | | | | | | | Having an enum member named Default is quite confusing: Is it distinct from the others? This patch removes that member and instead uses Optional<Reloc> in places where we have a user input that still hasn't been mapped to the default value, which now clearly has to be one of the remaining 3 options. llvm-svn: 269988
* When looking for a spill slot in reg scavenger, find one that matches RCKrzysztof Parzyszek2016-05-181-0/+100
| | | | | | | | | | | | When looking for an available spill slot, the register scavenger would stop after finding the first one with no register assigned to it. That slot may have size and alignment that do not meet the requirements of the register that is to be spilled. Instead, find an available slot that is the closest in size and alignment to one that is needed to spill a register from RC. Differential Revision: http://reviews.llvm.org/D20295 llvm-svn: 269969
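The difference between "first free slot" and "closest fitting slot" described above can be sketched as follows. The `Slot` type, the `pick_slot` function, and the cost formula (sum of size and alignment excess) are assumptions made for illustration; the actual scoring used by the register scavenger may differ.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Slot:
    size: int            # slot size in bytes
    align: int           # slot alignment in bytes
    in_use: bool = False


def pick_slot(slots: Sequence[Slot], need_size: int, need_align: int) -> Optional[int]:
    """Return the index of the free slot closest in size/alignment, or None."""
    best_index = None
    best_cost = None
    for index, slot in enumerate(slots):
        if slot.in_use:
            continue
        # A slot that is too small or under-aligned cannot hold the register.
        if slot.size < need_size or slot.align < need_align:
            continue
        # Hypothetical closeness measure: total excess over the requirement.
        cost = (slot.size - need_size) + (slot.align - need_align)
        if best_cost is None or cost < best_cost:
            best_index, best_cost = index, cost
    return best_index


if __name__ == "__main__":
    slots = [Slot(4, 4), Slot(16, 16), Slot(8, 8), Slot(8, 8, in_use=True)]
    # A first-free search would stop at index 0, which is too small for 8 bytes.
    print(pick_slot(slots, need_size=8, need_align=8))
```

In this example a first-free search would return the 4-byte slot at index 0, while the closest-fit search skips it and prefers the exact 8-byte slot over the larger 16-byte one.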
* [X86][SSE2] Added fast-isel tests to sync with ↵Simon Pilgrim2016-05-182-0/+3116
| | | | | | clang/test/CodeGen/sse2-builtins.c llvm-svn: 269966
* AMDGPU: Other sizes of popcnt are fastMatt Arsenault2016-05-181-0/+52
| | | | | | | We can chain bcnt instructions together, so any width popcnt is pretty fast. llvm-svn: 269950
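A rough model of the chaining idea: assuming a bit-count instruction that counts the set bits of one 32-bit source and adds an accumulator operand, a wider popcount is one such instruction per 32-bit piece, each feeding its result into the next. The names `bcnt_u32` and `popcount_wide` are illustrative only.

```python
def bcnt_u32(src, acc):
    """Model of a bit count with accumulate: popcount(src[31:0]) + acc."""
    return bin(src & 0xFFFFFFFF).count("1") + acc


def popcount_wide(value, width):
    """Popcount of a `width`-bit value built by chaining 32-bit bit counts."""
    value &= (1 << width) - 1          # ignore bits above the requested width
    total = 0
    for shift in range(0, width, 32):  # one chained bit count per 32-bit piece
        total = bcnt_u32(value >> shift, total)
    return total


if __name__ == "__main__":
    print(popcount_wide(0xFFFFFFFF00000001, 64))
```

Under this assumption a 64-bit popcount costs two chained instructions and a 128-bit one costs four, with no separate add instructions in between.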
* Re-commit r269828 "X86: Avoid using _chkstk when lowering WIN_ALLOCA ↵Hans Wennborg2016-05-188-16/+171
| | | | | | | | | | | | instructions" with an additional fix to make RegAllocFast ignore undef physreg uses. It would previously get confused about the "push %eax" instruction's use of eax. That method for adjusting the stack pointer is used in X86FrameLowering::emitSPUpdate as well, but since that runs after register-allocation, we didn't run into the RegAllocFast issue before. llvm-svn: 269949
* AMDGPU: Fix assert when erroring on a callMatt Arsenault2016-05-181-3/+15
| | | | | | | For some reason an assert is now hit when a valid chain is not returned, so return the entry chain. llvm-svn: 269948
* AMDGPU: Handle alloca promoting with null operandsMatt Arsenault2016-05-183-0/+91
| | | | | | | If the second pointer in a multi-pointer instruction is a constant, we can replace the type. llvm-svn: 269945
* AMDGPU: Fix a few slightly broken testsMatt Arsenault2016-05-188-43/+49
| | | | | | | Fix minor bugs and uses of undef which break when pointer related optimization passes are run. llvm-svn: 269944
* [Hexagon] Recognize "q" and "v" in inline-asm as register constraintsKrzysztof Parzyszek2016-05-181-0/+19
| | | | llvm-svn: 269933
* [WebAssembly] Don't expand divisions by constants.Dan Gohman2016-05-181-0/+62
| | | | | | | Don't expand divisions by constants if it would require multiple instructions. The current assumption is that engines will perform the desired optimizations. llvm-svn: 269930
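For context, the kind of expansion being avoided usually replaces a division by a constant with a multiplication by a precomputed reciprocal followed by a shift. The sketch below shows this for unsigned 32-bit division by 3, using the well-known constant 0xAAAAAAAB (the ceiling of 2^33 / 3); the function name is illustrative and this is not the code LLVM emits verbatim.

```python
def udiv3_expanded(x):
    """Unsigned 32-bit x / 3 as a widening multiply and a shift."""
    assert 0 <= x <= 0xFFFFFFFF
    product = x * 0xAAAAAAAB       # 64-bit product of x and ceil(2**33 / 3)
    return product >> 33           # keep the bits above position 33


if __name__ == "__main__":
    for x in (0, 7, 100, 0xFFFFFFFF):
        print(x, udiv3_expanded(x), x // 3)
```

Even in this simple case the expansion needs a widening multiply and a shift, which is more than the single division instruction it replaces; per the commit message, the assumption is that engines will perform such optimizations themselves.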