summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* [AArch64] Tie source and destination operands for AESMC/AESIMC. Florian Hahn2017-07-291-83/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Most CPUs implementing AES fusion require instruction pairs of the form AESE Vn, _ AESMC Vn, Vn and AESD Vn, _ AESIMC Vn, Vn The constraint is added to AES(I)MC instructions which use the result of an AES(E|D) instruction by using AES(I)MCTrr pseudo instructions, which constraint source and destination registers to be the same. A nice side effect of this change is that now all possible pairs are scheduled back-to-back on the exynos-m1 for the misched-fusion-aes.ll test case. I had to update aes_load_store. The version I added initially was very reduced and with the new constraint, AESE/AESMC could not be scheduled back-to-back. I updated the test to be more realistic and still expose the same scheduling problem as the initial test case. Reviewers: t.p.northover, rengolin, evandro, kristof.beyls, silviu.baranga Reviewed By: t.p.northover, evandro Subscribers: aemerson, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D35299 llvm-svn: 309495
* [AArch64] Use 8 bytes as preferred function alignment on Cortex-A53.Florian Hahn2017-07-291-1/+1
| | | | | | | | | | | | | | | | | | Summary: This change gives a 0.25% speedup on execution time, a 0.82% improvement in benchmark scores and a 0.20% increase in binary size on a Cortex-A53. These numbers are the geomean results on a wide range of benchmarks from the test-suite and a range of proprietary suites. Reviewers: t.p.northover, aadg, silviu.baranga, mcrosier, rengolin Reviewed By: rengolin Subscribers: grimar, davide, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D35568 llvm-svn: 309494
* [SelectionDAG][X86] CombineBT - more aggressively determine demanded bitsSimon Pilgrim2017-07-291-5/+1
| | | | | | | | | | | | This patch is in 2 parts: 1 - replace combineBT's use of SimplifyDemandedBits (hasOneUse only) with SelectionDAG::GetDemandedBits to more aggressively determine the lower bits used by BT. 2 - update SelectionDAG::GetDemandedBits to support ANY_EXTEND - if the demanded bits are only in the non-extended portion, then peek through and demand from the source value and then ANY_EXTEND that if we found a match. Differential Revision: https://reviews.llvm.org/D35896 llvm-svn: 309486
* AMDGPU: Make areMemAccessesTriviallyDisjoint more aware of segment flatMatt Arsenault2017-07-291-36/+89
| | | | | | | Checking the encoding is insufficient since now there can be global or scratch instructions. llvm-svn: 309472
* AMDGPU: Teach isLegalAddressingMode about global_* instructionsMatt Arsenault2017-07-291-10/+75
| | | | | | | | Also refine the flat check to respect flat-for-global feature, and constant fallback should check global handling, not specifically MUBUF. llvm-svn: 309471
* AMDGPU: Start selecting global instructionsMatt Arsenault2017-07-2927-638/+1004
| | | | llvm-svn: 309470
* Added tests for i8 interleaved-load-pattern of stride=4, VF=(8, 16, 32).Farhana Aleen2017-07-281-0/+495
| | | | llvm-svn: 309447
* Remove the obsolete offset parameter from @llvm.dbg.valueAdrian Prantl2017-07-2814-82/+82
| | | | | | | | | | | | There is no situation where this rarely-used argument cannot be substituted with a DIExpression and removing it allows us to simplify the DWARF backend. Note that this patch does not yet remove any of the newly dead code. rdar://problem/33580047 Differential Revision: https://reviews.llvm.org/D35951 llvm-svn: 309426
* Fix conditional tail call branch folding when both edges are the sameReid Kleckner2017-07-281-0/+139
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The conditional tail call logic did the wrong thing when both destinations of a conditional branch were the same: BB#1: derived from LLVM BB %entry Live Ins: %EFLAGS Predecessors according to CFG: BB#0 JE_1 <BB#5>, %EFLAGS<imp-use,kill> JMP_1 <BB#5> BB#5: derived from LLVM BB %sw.epilog Predecessors according to CFG: BB#1 TCRETURNdi64 <ga:@mergeable_conditional_tailcall>, 0, ... We would fold the JE_1 to a TCRETURNdi64cc, and then remove our BB#5 successor. Then BB#5 would be deleted as it had no predecessors, leaving a dangling "JMP_1 <BB#5>" reference behind to cause assertions later. This patch checks that both conditional branch destinations are different before doing the transform. The standard branch folding logic is able to remove both the JMP_1 and the JE_1, and for my test case we end up forming a better conditional tail call later. Fixes PR33980 llvm-svn: 309422
* AMDGPU: Look through a bitcast user of an out argumentMatt Arsenault2017-07-282-2/+277
| | | | | | | | | | | | | | This allows handling of a lot more of the interesting cases in Blender. Most of the large functions unlikely to be inlined have this pattern. This is a special case for what clang emits for OpenCL 3 element vectors. Annoyingly, these are emitted as <3 x elt>* pointers, but accessed as <4 x elt>* operations. This also needs to handle cases where a struct containing a single vector is used. llvm-svn: 309419
* AMDGPU: Add pass to replace out argumentsMatt Arsenault2017-07-282-0/+585
| | | | | | | | | | | | | | | | | | | | | | | It is better to return arguments directly in registers if we are making a call rather than introducing expensive stack usage. In one of sample compile from one of Blender's many kernel variants, this fires on about ~20 different functions. Future improvements may be to recognize simple cases where the pointer is indexing a small array. This also fails when the store to the out argument is in a separate block from the return, which happens in a few of the Blender functions. This should also probably be using MemorySSA which might help with that. I'm not sure this is correct as a FunctionPass, but MemoryDependenceAnalysis seems to not work with a ModulePass. I'm also not sure where it should run.I think it should run before DeadArgumentElimination, so maybe either EP_CGSCCOptimizerLate or EP_ScalarOptimizerLate. llvm-svn: 309416
* GlobalISel: map 128-bit values to an FPR by default.Tim Northover2017-07-282-18/+21
| | | | | | | Eventually we may want to allow a pair of GPRs but absolutely nothing in the entire world is ready for that yet. llvm-svn: 309404
* AMDGPU: Annotate implicitarg.ptr usageMatt Arsenault2017-07-282-11/+58
| | | | | | | | | | | We need to pass something to functions for this to work. It isn't derivable just from the kernarg segment pointer because the implicit arguments are placed after the kernel arguments. Also fixes missing test for the intrinsic. llvm-svn: 309398
* [ARM] Add the option to directly access TLS pointerStrahinja Petrovic2017-07-281-0/+22
| | | | | | | | | This patch enables choice for accessing thread local storage pointer (like '-mtp' in gcc). Differential Revision: https://reviews.llvm.org/D34408 llvm-svn: 309381
* [X86] Add test case for PR33290Simon Pilgrim2017-07-281-0/+51
| | | | llvm-svn: 309375
* [X86][AVX] Cleanup shuffle combine tests - remove old prefixes.Simon Pilgrim2017-07-281-212/+0
| | | | llvm-svn: 309374
* [ARM] Add test to check pcs of ARM ABI runtime floating point helpersPeter Smith2017-07-281-0/+1195
| | | | | | | | | | | | | | | | | The ARM Runtime ABI document (IHI0043) defines the AEABI floating point helper functions in section 4.1.2 The floating-point helper functions. The functions listed in this section must always use the base AAPCS calling convention. This test generates calls to all the helper functions that llvm supports and checks that the base AAPCS calling convention has been used. We test the equivalent of -mfloat-abi=soft, -mfloat-abi=softfp, -mfloat-abi=hardfp with an FPU that supports single and double precision, and one that only supports double precision. Differential Revision: https://reviews.llvm.org/D35904 llvm-svn: 309371
* ARMFrameLowering: Only set ExtraCSSpill for actually unused registers.Matthias Braun2017-07-281-0/+60
| | | | | | | | | | The code assumed that unclobbered/unspilled callee saved registers are unused in the function. This is not true for callee saved registers that are also used to pass parameters such as swiftself. rdar://33401922 llvm-svn: 309350
* [X86] Fix latent bug in sibcall eligibility logicReid Kleckner2017-07-281-0/+42
| | | | | | | | | | | | | | | | | | | The X86 tail call eligibility logic was correct when it was written, but the addition of inalloca and argument copy elision broke its assumptions. It was assuming that fixed stack objects were immutable. Currently, we aim to emit a tail call if no arguments have to be re-arranged in memory. This code would trace the outgoing argument values back to check if they are loads from an incoming stack object. If the stack argument is immutable, then we won't need to store it back to the stack when we tail call. Fortunately, stack objects track their mutability, so we can just make the obvious check to fix the bug. This was http://crbug.com/749826 llvm-svn: 309343
* [AArch64] Fix legality info passed to demanded bits for TBI opt.Ahmed Bougacha2017-07-271-0/+11
| | | | | | | | | | | | | | | The (seldom-used) TBI-aware optimization had a typo lying dormant since it was first introduced, in r252573: when asking for demanded bits, it told TLI that it was running after legalize, where the opposite was true. This is an important piece of information, that the demanded bits analysis uses to make assumptions about the node. r301019 added such an assumption, which was broken by the TBI combine. Instead, pass the correct flags to TLO. llvm-svn: 309323
* Change prefix in vector-shuffle-combining-avx.patch to reduce test size.Dinar Temirbulatov2017-07-271-511/+6
| | | | llvm-svn: 309315
* [SelectionDAG] Improve DAGTypeLegalizer::convertMask assertion (PR33960)Simon Pilgrim2017-07-271-0/+39
| | | | | | Improve DAGTypeLegalizer::convertMask's isSETCCorConvertedSETCC assertion to properly check for any mixture of SETCC or BUILD_VECTOR of constants, or a logical mask op of them. llvm-svn: 309302
* [X86] SET0 to use XMM registers where possible PR26018 PR32862Dinar Temirbulatov2017-07-2784-620/+1511
| | | | | | Differential Revision: https://reviews.llvm.org/D35839 llvm-svn: 309298
* Added cost of ZEROALL and ZEROUPPER instrs in btver2 cpu.Andrew V. Tischenko2017-07-271-2/+2
| | | | | | Differential Revision https://reviews.llvm.org/D35834 llvm-svn: 309269
* [X86][AVX] Regenerate shuffle tests with broadcast comments.Simon Pilgrim2017-07-271-2/+2
| | | | llvm-svn: 309266
* [X86] Adding test cases for LEA factorization (PR32755 / D35014)Simon Pilgrim2017-07-273-0/+156
| | | | | | Differential Revision: https://reviews.llvm.org/D35886 llvm-svn: 309262
* [PowerPC] enable optimizeCompareInstr for branch with static branch hintHiroshi Inoue2017-07-271-0/+24
| | | | | | | | | | In optimizeCompareInstr, a compare instruction is eliminated by using a record form instruction if possible. If the branch instruction that uses the result of the compare has a static branch hint, the optimization does not happen. This patch makes this optimization happen regardless of the branch hint by splitting branch hint and branch condition before checking the predicate to identify the possible optimizations. Differential Revision: https://reviews.llvm.org/D35801 llvm-svn: 309255
* [AMDGPU] Optimize SI_IF lowering for simple if regionsStanislav Mekhanoshin2017-07-2613-32/+6
| | | | | | | | | | | | | Currently SI_IF results in a s_and_saveexec_b64 followed by s_xor_b64. The xor is used to extract only the changed bits. In case of a simple if region where the only use of that value is in the SI_END_CF to restore the old exec mask, we can omit the xor and perform an or of the exec mask with the original exec value saved by the s_and_saveexec_b64. Differential Revision: https://reviews.llvm.org/D35861 llvm-svn: 309185
* AMDGPU : Widen extending scalar loads to 32-bits.Wei Ding2017-07-262-1/+193
| | | | | | Differential Revision: http://reviews.llvm.org/D35146 llvm-svn: 309178
* AMDGPU: Fix using SMRD instructions for argument loads in functionsMatt Arsenault2017-07-261-7/+4
| | | | | | These are not actually uniform values except in kernels. llvm-svn: 309172
* AMDGPU/GlobalISel: Mark 32-bit G_OR as legalTom Stellard2017-07-261-0/+21
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D35127 llvm-svn: 309165
* This patch returns proper value to indicate the case when instruction ↵Andrew V. Tischenko2017-07-267-114/+114
| | | | | | | | throughput can't be calculated. Differential revision https://reviews.llvm.org/D35831 llvm-svn: 309156
* [X86][AVX512] Regenerated and cleaned up extension tests.Simon Pilgrim2017-07-261-184/+184
| | | | llvm-svn: 309139
* [X86] Regenerate setcc testsSimon Pilgrim2017-07-262-6/+9
| | | | llvm-svn: 309138
* [X86][AVX512] Regenerate shuffle tests with broadcast comments.Simon Pilgrim2017-07-262-5/+5
| | | | llvm-svn: 309137
* [X86] Regenerate memset testsSimon Pilgrim2017-07-261-31/+78
| | | | llvm-svn: 309136
* [X86] Add combineBT test failure because bits have multiple uses.Simon Pilgrim2017-07-261-0/+64
| | | | llvm-svn: 309124
* DAGCombiner: Extend reduceBuildVecToTrunc to handle non-zero offsetZvi Rackover2017-07-263-1136/+259
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Adding support for combining power2-strided build_vector's where the first build_vectori's operand is extracted from a non-zero index. Example: v4i32 build_vector((extract_elt V, 1), (extract_elt V, 3), (extract_elt V, 5), (extract_elt V, 7)) --> v4i32 truncate (bitcast (shuffle<1,u,3,u,5,u,7,u> V, u) to v4i64) Reviewers: delena, RKSimon, guyblank Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35700 llvm-svn: 309108
* [X86] Regenerated BT testsSimon Pilgrim2017-07-262-304/+646
| | | | | | Test on 32/64 bit targets where appropriate llvm-svn: 309107
* [X86] Add urem vector test for non-uniform pow2 constantsSimon Pilgrim2017-07-261-4/+17
| | | | llvm-svn: 309104
* [X86] Regenerated urem pow2 tests on 32/64 bit targetsSimon Pilgrim2017-07-261-36/+79
| | | | llvm-svn: 309103
* [X86] Regenerated umul overflow tests on 32/64 bit targetsSimon Pilgrim2017-07-261-14/+48
| | | | llvm-svn: 309102
* [ARM] GlobalISel: Map G_GLOBAL_VALUE to GPRDiana Picus2017-07-261-0/+19
| | | | | | A G_GLOBAL_VALUE is basically a pointer, so it should live in the GPR. llvm-svn: 309101
* [X86][AVX] Regenerated and cleaned up AVX1 intrinsic tests.Simon Pilgrim2017-07-267-718/+989
| | | | | | Cleaned up triple settings, added 32-bit/64-bit targets where useful, added broadcast comments llvm-svn: 309100
* [X86][AVX2] Regenerated and cleaned up broadcast tests.Simon Pilgrim2017-07-262-40/+42
| | | | llvm-svn: 309099
* [X86][AVX512] Regenerated and added 32-bit targets to select testsSimon Pilgrim2017-07-261-106/+256
| | | | llvm-svn: 309098
* [X86][AVX] Regenerated and cleaned up masked gather/scatter tests.Simon Pilgrim2017-07-261-32/+8
| | | | | | Remove unused KNL checks and triple settings, added broadcast comments llvm-svn: 309097
* [X86][AVX] Regenerate lzcnt test.Simon Pilgrim2017-07-261-36/+36
| | | | | | Tidied up triples and checks. llvm-svn: 309095
* [X86][FMA] Regenerate test with broadcast comments.Simon Pilgrim2017-07-262-13/+13
| | | | llvm-svn: 309093
* [ARM] GlobalISel: Mark G_GLOBAL_VALUE as legalDiana Picus2017-07-261-0/+26
| | | | llvm-svn: 309090
OpenPOWER on IntegriCloud