summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AArch64
Commit message (Collapse)AuthorAgeFilesLines
...
* [AArch64] Skip debug ops with regsOverlap in AArch64 LD/ST opt.Florian Hahn2019-12-111-1/+1
| | | | This fixes a crash when debug instructions are in between 2 stores.
* Revert "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores"Kerry McLaughlin2019-12-113-98/+2
| | | | | | | | This reverts commit 3f5bf35f868d1e33cd02a5825d33ed4675be8cb1 as it was causing build failures in llvm-clang-x86_64-expensive-checks: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/392 http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1045
* [AArch64] Teach Load/Store optimizier to rename store operands for pairing.Florian Hahn2019-12-111-8/+322
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some cases, we can rename a store operand, in order to enable pairing of stores. For store pairs, that cannot be merged because the first tored register is defined in between the second store, we try to find suitable rename register. First, we check if we can rename the given register: 1. The first store register must be killed at the store, which means we do not have to rename instructions after the first store. 2. We scan backwards from the first store, to find the definition of the stored register and check all uses in between are renamable. Along they way, we collect the minimal register classes of the uses for overlapping (sub/super)registers. Second, we try to find an available register from the minimal physical register class of the original register. A suitable register must not be 1. defined before FirstMI 2. between the previous definition of the register to rename 3. a callee saved register. We use KILL flags to clear defined registers while scanning from the beginning to the end of the block. This triggers quite often, here are the top changes for MultiSource, SPEC2000, SPEC2006 compiled with -O3 for iOS: Metric: aarch64-ldst-opt.NumPairCreated Program base patch diff test-suite...nch/fourinarow/fourinarow.test 2.00 39.00 1850.0% test-suite...s/ASC_Sequoia/IRSmk/IRSmk.test 46.00 80.00 73.9% test-suite...chmarks/Olden/power/power.test 70.00 96.00 37.1% test-suite...cations/hexxagon/hexxagon.test 29.00 39.00 34.5% test-suite...nchmarks/McCat/05-eks/eks.test 100.00 132.00 32.0% test-suite.../Trimaran/enc-rc4/enc-rc4.test 46.00 59.00 28.3% test-suite...T2006/473.astar/473.astar.test 160.00 200.00 25.0% test-suite.../Trimaran/enc-md5/enc-md5.test 8.00 10.00 25.0% test-suite...telecomm-gsm/telecomm-gsm.test 113.00 139.00 23.0% test-suite...ediabench/gsm/toast/toast.test 113.00 139.00 23.0% test-suite...Source/Benchmarks/sim/sim.test 91.00 111.00 22.0% test-suite...C/CFP2000/179.art/179.art.test 41.00 49.00 19.5% test-suite...peg2/mpeg2dec/mpeg2decode.test 245.00 279.00 13.9% test-suite...marks/Olden/health/health.test 16.00 18.00 12.5% test-suite...ks/Prolangs-C/cdecl/cdecl.test 90.00 101.00 12.2% test-suite...fice-ispell/office-ispell.test 91.00 100.00 9.9% test-suite...oxyApps-C/miniGMG/miniGMG.test 430.00 465.00 8.1% test-suite...lowfish/security-blowfish.test 39.00 42.00 7.7% test-suite.../Applications/spiff/spiff.test 42.00 45.00 7.1% test-suite...arks/mafft/pairlocalalign.test 2473.00 2646.00 7.0% test-suite.../VersaBench/ecbdes/ecbdes.test 29.00 31.00 6.9% test-suite...nch/beamformer/beamformer.test 220.00 235.00 6.8% test-suite...CFP2000/177.mesa/177.mesa.test 2110.00 2252.00 6.7% test-suite...ve-susan/automotive-susan.test 109.00 116.00 6.4% test-suite...s-C/unix-smail/unix-smail.test 65.00 69.00 6.2% test-suite...CI_Purple/SMG2000/smg2000.test 1194.00 1265.00 5.9% test-suite.../Benchmarks/nbench/nbench.test 472.00 500.00 5.9% test-suite...oxyApps-C/miniAMR/miniAMR.test 248.00 262.00 5.6% test-suite...quoia/CrystalMk/CrystalMk.test 18.00 19.00 5.6% test-suite...rks/tramp3d-v4/tramp3d-v4.test 7331.00 7710.00 5.2% test-suite.../Benchmarks/Bullet/bullet.test 5651.00 5938.00 5.1% test-suite...ternal/HMMER/hmmcalibrate.test 750.00 788.00 5.1% test-suite...T2006/456.hmmer/456.hmmer.test 764.00 802.00 5.0% test-suite...ications/JM/ldecod/ldecod.test 1028.00 1079.00 5.0% test-suite...CFP2006/444.namd/444.namd.test 1368.00 1434.00 4.8% test-suite...marks/7zip/7zip-benchmark.test 4471.00 4685.00 4.8% test-suite...6/464.h264ref/464.h264ref.test 3122.00 3271.00 4.8% test-suite...pplications/oggenc/oggenc.test 1497.00 1565.00 4.5% test-suite...T2000/300.twolf/300.twolf.test 742.00 774.00 4.3% test-suite.../Prolangs-C/loader/loader.test 24.00 25.00 4.2% test-suite...0.perlbench/400.perlbench.test 1983.00 2058.00 3.8% test-suite...ications/JM/lencod/lencod.test 4612.00 4785.00 3.8% test-suite...yApps-C++/PENNANT/PENNANT.test 995.00 1032.00 3.7% test-suite...arks/VersaBench/dbms/dbms.test 54.00 56.00 3.7% Reviewers: efriedma, thegameg, samparker, dmgreen, paquette, evandro Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D70450
* [AArch64][SVE] Add DAG combine rules for gather loads and sext/zextAndrzej Warzynski2019-12-113-48/+204
| | | | | | | | | | | | | | | | Summary: These changes allow us to support sign-extending gather loads with the exisiting intrinsics (i.e. @llvm.aarch64.sve.ld1.gather.*). Reviewers: sdesmalen, huntergr, kmclaughlin, efriedma, rengolin, rovka, dancgr, mgudim Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential revision: https://reviews.llvm.org/D70812
* Revert "Reland [AArch64][MachineOutliner] Return address signing for ↵Oliver Stannard2019-12-111-288/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | outlined functions" This reverts commit cec2d5c17457722113580251c8a045fa9aca9b1b. Reverting because this is still creating outlined functions with return address signing instructions with mismatches SP values. For example: int *volatile v; void foo(int x) { int a[x]; v = &a[0]; v = &a[0]; v = &a[0]; v = &a[0]; v = &a[0]; v = &a[0]; } void bar(int x) { int a[x]; v = 0; v = &a[0]; v = &a[0]; v = &a[0]; v = &a[0]; v = &a[0]; } This generates these two outlined functions, both of which modify SP between the paciasp and retaa instructions: $ clang --target=aarch64-arm-none-eabi -march=armv8.3-a -c test2.c -o - -S -Oz -mbranch-protection=pac-ret+leaf ... OUTLINED_FUNCTION_0: // @OUTLINED_FUNCTION_0 .cfi_sections .debug_frame .cfi_startproc // %bb.0: paciasp .cfi_negate_ra_state mov w8, w0 lsl x8, x8, #2 add x8, x8, #15 // =15 mov x9, sp and x8, x8, #0x7fffffff0 sub x8, x9, x8 mov x29, sp mov sp, x8 adrp x9, v retaa ... OUTLINED_FUNCTION_1: // @OUTLINED_FUNCTION_1 .cfi_startproc // %bb.0: paciasp .cfi_negate_ra_state str x8, [x9, :lo12:v] str x8, [x9, :lo12:v] str x8, [x9, :lo12:v] str x8, [x9, :lo12:v] str x8, [x9, :lo12:v] mov sp, x29 retaa
* [AArch64][SVE] Implement intrinsics for non-temporal loads & storesKerry McLaughlin2019-12-113-2/+98
| | | | | | | | | | | | | | | | | | | | Summary: Adds the following intrinsics: - llvm.aarch64.sve.ldnt1 - llvm.aarch64.sve.stnt1 This patch creates masked loads and stores with the MONonTemporal flag set when used with the intrinsics above. Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71000
* [AArch64] Fix issues with large arrays on stackKiran Chandramohan2019-12-105-27/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch fixes a few issues when large arrays are allocated on the stack. Currently, clang has inconsistent behaviour, for debug builds there is an assertion failure when the array size on stack is around 2GB but there is no assertion when the stack is around 8GB. For release builds there is no assertion, the compilation succeeds but generates incorrect code. The incorrect code generated is due to using int/unsigned int instead of their 64-bit counterparts. This patch, 1) Removes the assertion in frame legality check. 2) Converts int/unsigned int in some places to the 64-bit variants. This helps in generating correct code and removes the inconsistent behaviour. 3) Adds a test which runs without optimisations. Reviewers: sdesmalen, efriedma, fhahn, aemerson Reviewed By: efriedma Subscribers: eli.friedman, fpetrogalli, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70496
* [AArch64][SVE] Add wide compare immediate patternsCullen Rhodes2019-12-101-0/+101
| | | | | | | | | | | | | | | | | | | | Summary: Recognize wide compares where the wide operand is a splat of a scalar value in the appropriate range and convert to the immediate variant of the instruction. Patch by Graham Hunter Reviewers: sdesmalen, efriedma, dancgr, rovka, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71009
* [AArch64][SVE] Implement SPLAT_VECTOR for i1 vectors.Eli Friedman2019-12-091-13/+18
| | | | | | | | | | The generated sequence with whilelo is unintuitive, but it's the best I could come up with given the limited number of SVE instructions that interact with scalar registers. The other sequence I was considering was something like dup+cmpne, but an extra scalar instruction seems better than an extra vector instruction. Differential Revision: https://reviews.llvm.org/D71160
* [ARM] Teach the Arm cost model that a Shift can be folded into other ↵David Green2019-12-092-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | instructions This attempts to teach the cost model in Arm that code such as: %s = shl i32 %a, 3 %a = and i32 %s, %b Can under Arm or Thumb2 become: and r0, r1, r2, lsl #3 So the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check if they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which if available is added as an extra parameter much like getCastInstrCost. We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instruction that the shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp. Differential Revision: https://reviews.llvm.org/D70966
* [DebugInfo] Make describeLoadedValue() reg awareDavid Stenberg2019-12-092-11/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Currently the describeLoadedValue() hook is assumed to describe the value of the instruction's first explicit define. The hook will not be called for instructions with more than one explicit define. This commit adds a register parameter to the describeLoadedValue() hook, and invokes the hook for all registers in the worklist. This will allow us to for example describe instructions which produce more than two parameters' values; e.g. Hexagon's various combine instructions. This also fixes situations in our downstream target where we may pass smaller parameters in the high part of a register. If such a parameter's value is produced by a larger copy instruction, we can't describe the call site value using the super-register, and we instead need to know which sub-register that should be used. This also allows us to handle cases like this: $ebx = [...] $rdi = MOVSX64rr32 $ebx $esi = MOV32rr $edi CALL64pcrel32 @call The hook will first be invoked for the MOV32rr instruction, which will say that @call's second parameter (passed in $esi) is described by $edi. As $edi is not preserved it will be added to the worklist. When we get to the MOVSX64rr32 instruction, we need to describe two values; the sign-extended value of $ebx -> $rdi for the first parameter, and $ebx -> $edi for the second parameter, which is now possible. This commit modifies the dbgcall-site-lea-interpretation.mir test case. In the test case, the values of some 32-bit parameters were produced with LEA64r. Perhaps we can in general cases handle such by emitting expressions that AND out the lower 32-bits, but I have not been able to land in a case where a LEA64r is used for a 32-bit parameter instead of LEA64_32 from C code. I have not found a case where it would be useful to describe parameters using implicit defines, so in this patch the hook is still only invoked for explicit defines of forwarding registers. Reviewers: djtodoro, NikolaPrica, aprantl, vsk Reviewed By: djtodoro, vsk Subscribers: ormris, hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D70431
* Revert "[DebugInfo] Make describeLoadedValue() reg aware"David Stenberg2019-12-092-69/+11
| | | | | This reverts commit 3cd93a4efcdeabeb20cb7bec9fbddcb540d337a1. I'll recommit with a well-formatted arcanist commit message.
* [DebugInfo] Make describeLoadedValue() reg awareDavid Stenberg2019-12-092-11/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the describeLoadedValue() hook is assumed to describe the value of the instruction's first explicit define. The hook will not be called for instructions with more than one explicit define. This commit adds a register parameter to the describeLoadedValue() hook, and invokes the hook for all registers in the worklist. This will allow us to for example describe instructions which produce more than two parameters' values; e.g. Hexagon's various combine instructions. This also fixes a case in our downstream target where we may pass smaller parameters in the high part of a register. If such a parameter's value is produced by a larger copy instruction, we can't describe the call site value using the super-register, and we instead need to know which sub-register that should be used. This also allows us to handle cases like this: $ebx = [...] $rdi = MOVSX64rr32 $ebx $esi = MOV32rr $edi CALL64pcrel32 @call The hook will first be invoked for the MOV32rr instruction, which will say that @call's second parameter (passed in $esi) is described by $edi. As $edi is not preserved it will be added to the worklist. When we get to the MOVSX64rr32 instruction, we need to describe two values; the sign-extended value of $ebx -> $rdi for the first parameter, and $ebx -> $edi for the second parameter, which is now possible. This commit modifies the dbgcall-site-lea-interpretation.mir test case. In the test case, the values of some 32-bit parameters were produced with LEA64r. Perhaps we can in general cases handle such by emitting expressions that AND out the lower 32-bits, but I have not been able to land in a case where a LEA64r is used for a 32-bit parameter instead of LEA64_32 from C code. I have not found a case where it would be useful to describe parameters using implicit defines, so in this patch the hook is still only invoked for explicit defines of forwarding registers.
* [AArch64][GlobalISel] Add missing default statement to a switch in the selector.Amara Emerson2019-12-061-0/+3
|
* Move variable only used in an assert into the assert itself.Sterling Augustine2019-12-061-2/+1
| | | | This prevents unused variable warnings from breaking the build.
* [AArch64][GlobalISel] Add support for selection of vector G_SHL with immediates.Amara Emerson2019-12-061-5/+71
| | | | | | Only implemented for the type combinations already supported for G_SHL. Differential Revision: https://reviews.llvm.org/D71153
* [AArch64] Fix a bug with jump table generationCullen Rhodes2019-12-062-4/+27
| | | | | | | | | | | | | | | | | | | | Summary: When trying to calculate the offsets for the jump table entries we fail to take into account the block alignment, which could be greater than 4 bytes. This led to cases where the jump table offset was too big to fit in a byte. Reviewers: t.p.northover, sdesmalen, ostannard Reviewed By: ostannard Subscribers: ostannard, kristof.beyls, hiraditya, llvm-commits Committed on behalf of David Sherwood (david-arm) Tags: #llvm Differential Revision: https://reviews.llvm.org/D70533
* [AArch64][SVE2] Implement while comparison intrinsicsCullen Rhodes2019-12-061-10/+9
| | | | | | | | | | | | | | | | | Summary: Adds the following intrinsics: * whilege, whilegt, whilehi, whilehs Reviewers: sdesmalen, rovka, dancgr, efriedma, rengolin, huntergr Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70909
* [AArch64][SVE] Implement integer compare intrinsicsCullen Rhodes2019-12-063-34/+168
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Adds intrinsics for the following: * cmphs, cmphi * cmpge, cmpgt * cmpeq, cmpne * cmplt, cmple * cmplo, cmpls Includes a minor change to `TLI.getMemValueType` that fixes a crash due to the scalable flag being dropped. Reviewers: sdesmalen, efriedma, rengolin, rovka, dancgr, huntergr Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70889
* [AArch64] Fix MUL/SUB fusingSanne Wouda2019-12-051-20/+90
| | | | | | | | | | | | | | | | | Summary: When MUL is the first operand to SUB, we can't use MLS because the accumulator should be negated. Emit a NEG of the accumulator and an MLA instead, similar to what we do for FMUL / FSUB fusing. Reviewers: dmgreen, SjoerdMeijer, fhahn, Gerolf, mstorsjo, asbirlea Reviewed By: asbirlea Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71067
* [AArch64][SVE] Integer reduction instructions pattern/intrinsics.Danilo Carvalho Grael2019-12-055-14/+106
| | | | | | | | Added pattern matching/intrinsics for the following SVE instructions: -- saddv, uaddv -- smaxv, sminv, umaxv, uminv -- orv, eorv, andv
* [AArch64][SVE] Implement element count intrinsicsCullen Rhodes2019-12-052-7/+16
| | | | | | | | | | | | | | | | | | | | | Summary: Adds intrinsics for the following: * cntb * cnth * cntw * cntd * cntp Reviewers: sdesmalen, huntergr, dancgr, rengolin, efriedma, rovka Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70967
* [AArch64][SVE] Add intrinsics and patterns for logical predicate instructionsDanilo Carvalho Grael2019-12-042-17/+27
| | | | | | | Add instrinics and patters for the following logical predicate instructions: -- and, ands, bic, bics, eor, eors -- sel -- orr, orrs, orn, orns, nor, nors, nand, nads
* Reland [AArch64][MachineOutliner] Return address signing for outlined functionsDavid Tellenbach2019-12-051-8/+288
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Reland after fixing an ASan failure by stopping outlining early if the constraints for return address signing removed too many outlining candidates. During AArch64 frame lowering instructions to enable return address signing are inserted into functions if needed. Functions generated during machine outlining don't run through target frame lowering and hence are missing such instructions. This patch introduces the following changes: 1. If not all functions that potentially participate in function outlining agree on their return address signing scope and their return address signing key, outlining is disabled for these functions. 2. If not all functions that potentially participate in function outlining agree on their support for v8.3A features, outlining is disabled for these functions. 3. If an outlining candidate would outline instructions that modify sp in a way that invalidates return address signing, outlining is disabled for that particular candidate. 4. If all candidate functions agree on the signing scope, signing key and their support for v8.3 features, the outlined function behaves as if it had the same scope and key attributes and as if it would provide the same v8.3A support as the original functions. Reviewers: ostannard, paquette Reviewed By: ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70635
* Revert "Reland [AArch64][MachineOutliner] Return address signing for ↵Sterling Augustine2019-12-041-284/+8
| | | | | | | | | outlined functions" This reverts commit 02760b750b2ffcc0e2f5d78ecb137c80930c42c3. The original commit is not asan clean. http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/37147/steps/check-llvm%20asan/logs/stdio
* Reland [AArch64][MachineOutliner] Return address signing for outlined functionsDavid Tellenbach2019-12-041-8/+284
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Reland after fixing a bug that allowed outlining of SP modifying instructions that invalidated return address signing. During AArch64 frame lowering instructions to enable return address signing are inserted into functions if needed. Functions generated during machine outlining don't run through target frame lowering and hence are missing such instructions. This patch introduces the following changes: 1. If not all functions that potentially participate in function outlining agree on their return address signing scope and their return address signing key, outlining is disabled for these functions. 2. If not all functions that potentially participate in function outlining agree on their support for v8.3A features, outlining is disabled for these functions. 3. If an outlining candidate would outline instructions that modify sp in a way that invalidates return address signing, outlining is disabled for that particular candidate. 4. If all candidate functions agree on the signing scope, signing key and their support for v8.3 features, the outlined function behaves as if it had the same scope and key attributes and as if it would provide the same v8.3A support as the original functions. Reviewers: ostannard, paquette Reviewed By: ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70635
* [SVE][AArch64] Adding patterns for while intrinsics.Mikhail Gudim2019-12-042-28/+40
|
* [AArch64][SVE] Implement reversal intrinsicsCullen Rhodes2019-12-042-8/+37
| | | | | | | | | | | | | | | | | | | | | | | Summary: Adds intrinsics for the following: * rbit * revb * revh * revw Patterns are also defined to map the 'llvm.bswap.*' intrinsic to the SVE revb instruction. Reviewers: sdesmalen, huntergr, dancgr, rengolin, efriedma, rovka Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70960
* [AArch64TTI] Compute imm materialization cost for AArch64 intrinsicsFlorian Hahn2019-12-041-0/+6
| | | | | | | | | | | | | | | | Currently, getIntImmCost returns TCC_Free for almost all intrinsics. For most AArch64 specific intrinsics however, it looks like integer constants cannot be folded into most of them (at least the ones I checked). Unless we know that we can fold integer operands with the intrinsic, we handle more cases correctly by returning the cost to materialize the immediate than return TCC_Free. Reviewers: SjoerdMeijer, dmgreen, t.p.northover, ributzka Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D70669
* [AArch64] Fix over-eager fusing of NEON SIMD MUL/ADDSanne Wouda2019-12-032-8/+362
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The ISel pattern for SIMD MLA is a bit too eager: it replaces the ADD with an MLA even when the MUL cannot be eliminated, e.g. when it has another use. An MLA is usually has a higher latency than an ADD (and there are fewer pipes available that can execute it), so trading an MLA for an ADD is not great. ISel is not taking the number of uses of the MUL result into account, nor any other factors such as the length of the critical path or other resource pressure. The MachineCombiner is able to make these judgments so this patch ports the ISel pattern for MUL/ADD fusing to the MachineCombiner. Similarly for MUL/SUB -> MLS, as well as the indexed variants. The change has no impact on SPEC CPU© intrate nor fprate. Reviewers: dmgreen, SjoerdMeijer, fhahn, Gerolf Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70673
* [Aarch64][SVE] Add intrinsics for gather loads (vector + imm)Sander de Smalen2019-12-034-29/+47
| | | | | | | | | | | | | | | | | | This patch adds intrinsics for SVE gather loads from memory addresses generated by a vector base plus immediate index: * @llvm.aarch64.sve.ld1.gather.imm This intrinsics maps 1-1 to the corresponding SVE instruction (example for half-words): * ld1h { z0.d }, p0/z, [z0.d, #16] Committed on behalf of Andrzej Warzynski (andwar) Reviewers: sdesmalen, huntergr, kmclaughlin, eli.friedman, rengolin, rovka, dancgr, mgudim, efriedma Reviewed By: sdesmalen Tags: #llvm Differential Revision: https://reviews.llvm.org/D70806
* [Aarch64][SVE] Add intrinsics for gather loads with 32-bits offsetsSander de Smalen2019-12-034-46/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds intrinsics for SVE gather loads for which the offsets are 32-bits wide and are: * unscaled * @llvm.aarch64.sve.ld1.gather.sxtw * @llvm.aarch64.sve.ld1.gather.uxtw * scaled (offsets become indices) * @llvm.arch64.sve.ld1.gather.sxtw.index * @llvm.arch64.sve.ld1.gather.uxtw.index The offsets are either zero (uxtw) or sign (sxtw) extended to 64 bits. These intrinsics map 1-1 to the corresponding SVE instructions (examples for half-words): * unscaled * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw] * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw] * scaled * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw #1] * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw #1] Committed on behalf of Andrzej Warzynski (andwar) Reviewers: sdesmalen, kmclaughlin, eli.friedman, rengolin, rovka, huntergr, dancgr, mgudim, efriedma Reviewed By: sdesmalen Tags: #llvm Differential Revision: https://reviews.llvm.org/D70782
* [AArch64][SVE2] Implement remaining SVE2 floating-point intrinsicsKerry McLaughlin2019-12-032-17/+42
| | | | | | | | | | | | | | | | | | | Summary: Adds the following intrinsics: - faddp - fmaxp, fminp, fmaxnmp & fminnmp - fmlalb, fmlalt, fmlslb & fmlslt - flogb Reviewers: huntergr, sdesmalen, dancgr, efriedma Reviewed By: sdesmalen Subscribers: efriedma, tschuett, kristof.beyls, hiraditya, cameron.mcinally, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70253
* [AArch64][SVE] Add intrinsics for gather loads with 64-bit offsetsSander de Smalen2019-12-036-29/+158
| | | | | | | | | | | | | | | | | | | | This patch adds the following intrinsics for gather loads with 64-bit offsets: * @llvm.aarch64.sve.ld1.gather (unscaled offset) * @llvm.aarch64.sve.ld1.gather.index (scaled offset) These intrinsics map 1-1 to the following AArch64 instructions respectively (examples for half-words): * ld1h { z0.d }, p0/z, [x0, z0.d] * ld1h { z0.d }, p0/z, [x0, z0.d, lsl #1] Committing on behalf of Andrzej Warzynski (andwar) Reviewers: sdesmalen, huntergr, rovka, mgudim, dancgr, rengolin, efriedma Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D70542
* [AArch64][SVE] Implement shift intrinsicsKerry McLaughlin2019-12-035-21/+69
| | | | | | | | | | | | | | | | | | | | Summary: Adds the following intrinsics: - asr & asrd - insr - lsl & lsr This patch also adds a new AArch64ISD node (INSR) to represent the int_aarch64_sve_insr intrinsic. Reviewers: huntergr, sdesmalen, dancgr, mgudim, rengolin, efriedma Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70437
* [MIBundles] Move analyzePhysReg out of MIBundleOperands iterator (NFC).Florian Hahn2019-12-021-2/+1
| | | | | | | | | | | analyzePhysReg does not really fit into the iterator and moving it makes it easier to change the base iterator. Reviewers: evandro, t.p.northover, paquette, MatzeB, arsenm, qcolombet Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D70559
* [ARM][AArch64] Complex addition Neon intrinsics for Armv8.3-AVictor Campos2019-12-021-0/+23
| | | | | | | | | | | | | | | | | | | Summary: Add support for vcadd_* family of intrinsics. This set of intrinsics is available in Armv8.3-A. The fp16 versions require the FP16 extension, which has been available (opt-in) since Armv8.2-A. Reviewers: t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70862
* Revert 651f07908a1 "[AArch64] Don't combine callee-save and local stack ↵Hans Wennborg2019-11-301-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | adjustment when optimizing for size" This caused asserts (and perhaps also miscompiles) while building for Windows on AArch64. See the discussion on D68530 for details and reproducer. Reverting until this can be investigated and fixed. > For arm64, D18619 introduced the ability to combine bumping the stack pointer > upfront in case it needs to be bumped for both the callee-save area as well as > the local stack area. > > That diff already remarks that "This change can cause an increase in > instructions", but argues that even when that happens, it should be still be a > performance benefit because the number of micro-ops is reduced. > > We have observed that this code-size increase can be significant in practice. > This diff disables combining stack bumping for methods that are marked as > optimize-for-size. > > Example of a prologue with the behavior before this diff (combining stack bumping when possible): > sub sp, sp, #0x40 > stp d9, d8, [sp, #0x10] > stp x20, x19, [sp, #0x20] > stp x29, x30, [sp, #0x30] > add x29, sp, #0x30 > [... compute x8 somehow ...] > stp x0, x8, [sp] > > And after this diff, if the method is marked as optimize-for-size: > stp d9, d8, [sp, #-0x30]! > stp x20, x19, [sp, #0x10] > stp x29, x30, [sp, #0x20] > add x29, sp, #0x20 > [... compute x8 somehow ...] > stp x0, x8, [sp, #-0x10]! > > Note that without combining the stack bump there are two auto-decrements, > nicely folded into the stp instructions, whereas otherwise there is a single > sub sp, ... instruction, but not folded. > > Patch by Nikolai Tillmann! > > Differential Revision: https://reviews.llvm.org/D68530
* [AArch64][v8.3a] Don't emit LDRA '[xN]!' alias in disassembly.Simon Tatham2019-11-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Summary: In rG643ac6c0420b, the syntax `ldraa x1, [x0]!` was added as an alias for `ldraa x1, [x0, #0]!`. That syntax is less obvious in meaning, and also will not be accepted by assemblers that haven't been updated yet. So it would be better not to emit it as the preferred disassembly for that instruction. This change lowers the EmitPriority of the new alias so that the more explicit syntax `[x0, #0]!` is preferred by the disassembler. The new syntax is still accepted by the assembler. Reviewers: ab, ostannard Reviewed By: ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70813
* AArch64: support the Apple NEON syntax for v8.2 crypto instructions.Tim Northover2019-11-271-11/+15
| | | | Very simple change, just adding the extra syntax variant.
* [Codegen][ARM] Add addressing modes from masked loads and storesDavid Green2019-11-261-11/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | MVE has a basic symmetry between it's normal loads/store operations and the masked variants. This means that masked loads and stores can use pre-inc and post-inc addressing modes, just like the standard loads and stores already do. To enable that, this patch adds all the relevant infrastructure for treating masked loads/stores addressing modes in the same way as normal loads/stores. This involves: - Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra Offset operand that is added after the PtrBase. - Extending the IndexedModeActions from 8bits to 16bits to store the legality of masked operations as well as normal ones. This array is fairly small, so doubling the size still won't make it very large. Offset masked loads can then be controlled with setIndexedMaskedLoadAction, similar to standard loads. - The same methods that combine to indexed loads, such as CombineToPostIndexedLoadStore, are adjusted to handle masked loads in the same way. - The ARM backend is then adjusted to make use of these indexed masked loads/stores. - The X86 backend is adjusted to hopefully be no functional changes. Differential Revision: https://reviews.llvm.org/D70176
* [AArch64][SVE] Implement floating-point conversion intrinsicsKerry McLaughlin2019-11-262-41/+64
| | | | | | | | | | | | | | | | | | | | Summary: Adds intrinsics for the following: - fcvt - fcvtzs & fcvtzu - scvtf & ucvtf - fcvtlt, fcvtnt - fcvtx & fcvtxnt Reviewers: huntergr, sdesmalen, dancgr, mgudim, efriedma Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70180
* [NFC] [AArch64] Fix wrong documentation for IsStoreRegOffsetOpDavid Tellenbach2019-11-231-1/+1
|
* [AArch64] Add the pipeline model for Exynos M5Evandro Menezes2019-11-222-1/+1014
| | | | Add the scheduling and cost models for Exynos M5.
* [AArch64] [FrameLowering] Allow conditional insertion of CFI instructionDavid Tellenbach2019-11-221-7/+7
| | | | | | | | | | | | | | | | | | Summary: The insertion of most CFI instructions during AArch64 frame lowering can be disabled (e.g. using the function attribute `nounwind`). This patch enables conditional insertion for one more CFI instruction. Reviewers: t.p.northover, ostannard Reviewed By: ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70129
* [PGO][PGSO] DAG.shouldOptForSize part.Hiroshi Yamauchi2019-11-212-9/+5
| | | | | | | | | | | | | | | Summary: (Split of off D67120) SelectionDAG::shouldOptForSize changes for profile guided size optimization. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70095
* [cmake] Explicitly mark libraries defined in lib/ as "Component Libraries"Tom Stellard2019-11-215-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Most libraries are defined in the lib/ directory but there are also a few libraries defined in tools/ e.g. libLLVM, libLTO. I'm defining "Component Libraries" as libraries defined in lib/ that may be included in libLLVM.so. Explicitly marking the libraries in lib/ as component libraries allows us to remove some fragile checks that attempt to differentiate between lib/ libraries and tools/ libraires: 1. In tools/llvm-shlib, because llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of all libraries defined in the whole project, there was custom code needed to filter out libraries defined in tools/, none of which should be included in libLLVM.so. This code assumed that any library defined as static was from lib/ and everything else should be excluded. With this change, llvm_map_components_to_libnames(LIB_NAMES, "all") only returns libraries that have been added to the LLVM_COMPONENT_LIBS global cmake property, so this custom filtering logic can be removed. Doing this also fixes the build with BUILD_SHARED_LIBS=ON and LLVM_BUILD_LLVM_DYLIB=ON. 2. There was some code in llvm_add_library that assumed that libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or ARG_LINK_COMPONENTS set. This is only true because libraries defined lib lib/ use LLVMBuild.txt and don't set these values. This code has been fixed now to check if the library has been explicitly marked as a component library, which should now make it easier to remove LLVMBuild at some point in the future. I have tested this patch on Windows, MacOS and Linux with release builds and the following combinations of CMake options: - "" (No options) - -DLLVM_BUILD_LLVM_DYLIB=ON - -DLLVM_LINK_LLVM_DYLIB=ON - -DBUILD_SHARED_LIBS=ON - -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON - -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON Reviewers: beanz, smeenai, compnerd, phosek Reviewed By: beanz Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70179
* Revert "[AArch64] Add the pipeline model for Exynos M5"Eric Christopher2019-11-202-1014/+1
| | | | | | as it's causing test failures in llvm-mca. This reverts commit 9bdfee2a3bd13d405ce1592930182f23849d2897.
* [AArch64] Add the pipeline model for Exynos M5Evandro Menezes2019-11-202-1/+1014
| | | | Add the scheduling and cost models for Exynos M5.
* ExecutionEngine: add preliminary support for COFF ARM64Adam Kallai2019-11-201-1/+4
| | | | Differential Revision: https://reviews.llvm.org/D69434
OpenPOWER on IntegriCloud