path: root/llvm/lib/Target
...
* AMDGPU: Split block for si_end_cf (Matt Arsenault, 2019-04-03, 5 files changed, -17/+128)
  Relying on no spill or other code being inserted before this was precarious. It relied on code diligently checking isBasicBlockPrologue, which is likely to be forgotten. Ideally this could be done earlier, but this doesn't work because of phis: no other instruction can be placed before them, so we have to accept the position being incorrect during SSA. This avoids regressions in the fast register allocator rewrite from inverting the direction. llvm-svn: 357634
* [X86] Extend boolean arguments to inline-asm according to getBooleanType (Krzysztof Parzyszek, 2019-04-03, 1 file changed, -2/+7)
  Differential Revision: https://reviews.llvm.org/D60208 llvm-svn: 357615
* [X86][AVX] combineHorizontalPredicateResult - split any/allof v16i16/v32i8 reduction on AVX1 (Simon Pilgrim, 2019-04-03, 1 file changed, -1/+8)
  Perform the 2 x 128-bit lo/hi OR/AND on the vectors before calling PMOVMSKB on the 128-bit result. llvm-svn: 357611
* [X86][AVX] combineHorizontalPredicateResult - support v16i16/v32i8 reduction on AVX1 (Simon Pilgrim, 2019-04-03, 1 file changed, -6/+3)
  Use the getPMOVMSKB helper, which splits v32i8 MOVMSK calls on pre-AVX2 targets. llvm-svn: 357608
* [AArch64][GlobalISel] Legalize G_FEXP2 (Jessica Paquette, 2019-04-03, 1 file changed, -1/+1)
  Same as G_FEXP. Add a test, and update legalizer-info-validation.mir and f16-instructions.ll. Differential Revision: https://reviews.llvm.org/D60165 llvm-svn: 357605
* Test commit: Remove double variable assignment (Lewis Revill, 2019-04-03, 1 file changed, -1/+1)
  llvm-svn: 357601
* [SystemZ] Improve codegen for certain SADDO-immediate cases (Ulrich Weigand, 2019-04-03, 2 files changed, -0/+28)
  When performing an add-with-overflow with an immediate in the range -2G ... -4G, code currently loads the immediate into a register, which generally takes two instructions. In this particular case, it is preferable to load the negated immediate into a register instead, which always requires only one instruction, and then perform a subtract. llvm-svn: 357597
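  An illustrative C pattern for this case (the constant and function name are assumptions for illustration, not taken from the commit): a signed add-with-overflow whose immediate falls in the -2G ... -4G range, so the negated constant fits in a single register-load instruction.

    #include <stdbool.h>

    /* Signed add-with-overflow against a constant in the -2G ... -4G range.
     * Loading -3000000000 into a register generally takes two instructions,
     * while loading +3000000000 takes one, after which the add can be
     * performed as a subtract. */
    bool add_big_negative(long long x, long long *sum) {
        return __builtin_add_overflow(x, -3000000000LL, sum);
    }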
* [MIPS GlobalISel] Select floating point arithmetic operations (Petar Avramovic, 2019-04-03, 2 files changed, -5/+20)
  Select 32 and 64 bit floating point add, sub, mul and div for MIPS32. Differential Revision: https://reviews.llvm.org/D60191 llvm-svn: 357584
* [AArch64] Update v8.5a MTE LDG/STG instructions (Javed Absar, 2019-04-03, 1 file changed, -12/+12)
  The latest MTE specification adds register Xt to the STG instruction family: STG [Xn, #offset] -> STG Xt, [Xn, #offset]. The tag written to memory is taken from Xt rather than Xn. The LDG instruction was also changed to read the return address from Xt: LDG Xt, [Xn, #offset]. This patch includes those changes and tests. Specification is at: https://developer.arm.com/docs/ddi0596/c Differential Revision: https://reviews.llvm.org/D60188 llvm-svn: 357583
* [mips] Remove unused FGRH32 register class. NFC (Simon Atanasyan, 2019-04-03, 2 files changed, -32/+0)
  If we need this class in the future we will easily restore it. Differential Revision: http://reviews.llvm.org/D60132 llvm-svn: 357570
* [X86] Make the post machine scheduler macrofusion-aware. (Clement Courbet, 2019-04-03, 1 file changed, -0/+7)
  Summary: Given that X86 does not use this currently, this is an NFC. I'll experiment with enabling and will report numbers. Reviewers: andreadb, lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60185 llvm-svn: 357568
* AMDGPU: Assume ECC is enabled by default if supported (Matt Arsenault, 2019-04-03, 4 files changed, -6/+32)
  The test should really be checking for the property directly in the code object headers, but there are problems with this. I don't see this directly represented in the text form, and for the binary emission this is depending on a function level subtarget feature to emit a global flag. llvm-svn: 357558
* [WebAssembly] Remove unneeded target operand flags (Sam Clegg, 2019-04-03, 7 files changed, -50/+32)
  This change is in preparation for the addition of new target operand flags for new relocation types. Having a symbol type as part of the flag set makes it harder to use, and AFAICT these are serving no purpose. Differential Revision: https://reviews.llvm.org/D60014 llvm-svn: 357548
* AMDGPU: Remove unnecessary subtarget get (Matt Arsenault, 2019-04-03, 1 file changed, -1/+0)
  llvm-svn: 357542
* AMDGPU: Fix names for generation features (Matt Arsenault, 2019-04-03, 4 files changed, -10/+17)
  We should overall stop using these, but the uppercase name didn't work. Any feature string is converted to lowercase, so these could never be found in the table. llvm-svn: 357541
* [X86] Mark the default case of the X86InstrInfo::convertToThreeAddress switch as unreachable (Craig Topper, 2019-04-02, 1 file changed, -1/+1)
  This function should only be called with instructions that are really convertible. And all convertible instructions need to be handled by the switch. So nothing should use the default. llvm-svn: 357529
* [X86] Check MI.isConvertibleTo3Addr() before calling convertToThreeAddress in X86FixupLEAs (Craig Topper, 2019-04-02, 1 file changed, -0/+6)
  X86FixupLEAs just assumes convertToThreeAddress will return nullptr for any instruction that isn't convertible. But the code in convertToThreeAddress for X86 assumes that any instruction coming in has at least 2 operands and that the second one is a register. Those properties aren't guaranteed for all instructions. We should check the instruction property first. llvm-svn: 357528
* [AArch64][GlobalISel] Select llvm.aarch64.stlxr(i64, i64*) (Jessica Paquette, 2019-04-02, 1 file changed, -8/+68)
  This adds partial instruction selection support for llvm.aarch64.stlxr. It also factors out selection for G_INTRINSIC_W_SIDE_EFFECTS into its own function. The new function removes the restriction that the intrinsic ID on the G_INTRINSIC_W_SIDE_EFFECTS be on operand 0. Also add a test, and add a GISel line to arm64-ldxr-stxr.ll. Differential Revision: https://reviews.llvm.org/D60100 llvm-svn: 357518
* [X86] Allow FixupLEAs to form INC/DEC under OptSize not just MinSize (Craig Topper, 2019-04-02, 1 file changed, -1/+1)
  This matches our usual INC/DEC heuristic used during isel. llvm-svn: 357497
* [PowerPC] Fix reversed bit issue in DCMX mask for "xvtstdcdp" and "xvtstdcsp" P9 implementation (Stefan Pintilie, 2019-04-02, 1 file changed, -2/+2)
  Did experiments on a Power 9 machine and checked the outputs for the NaN and Infinity+ cases with the corresponding DCMX bit set. Confirmed the DCMX mask bits for NaN and Infinity+ are reversed. This patch fixes the issue. Patch by Victor Huang. Differential Revision: https://reviews.llvm.org/D59384 llvm-svn: 357494
* [BPF] Replace fstream and sstream with line_iterator (Fangrui Song, 2019-04-02, 1 file changed, -11/+10)
  Summary: This makes libLLVMBPFCodeGen.so 1128 bytes smaller for my build. Reviewers: yonghong-song Reviewed By: yonghong-song Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60117 llvm-svn: 357489
* [SystemZ] Improve instruction selection of 64 bit shifts and rotates. (Jonas Paulsson, 2019-04-02, 2 files changed, -0/+20)
  For shift and rotate instructions that only use the last 6 bits of the shift amount, a shift amount of (x*64-s) can be substituted with (-s). This saves one instruction and a register:
    lhi  %r1, 64
    sr   %r1, %r3
    sllg %r2, %r2, 0(%r1)
  =>
    lcr  %r1, %r3
    sllg %r2, %r2, 0(%r1)
  Review: Ulrich Weigand llvm-svn: 357481
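  A hedged C sketch of source that produces such a shift amount (an assumed example for illustration, not taken from the commit):

    /* The shift amount has the form (64 - n).  Because SLLG only uses the
     * low 6 bits of the amount, (64 - n) and (-n) select the same shift, so
     * the "load 64 + subtract" pair can become a single load-complement. */
    unsigned long shift_left_by_64_minus_n(unsigned long x, unsigned long n) {
        return x << (64 - n);
    }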
* [mips] Remove the override of the `isMachineVerifierClean()` (Simon Atanasyan, 2019-04-02, 1 file changed, -4/+0)
  All issues found by the machine verifier in the MIPS target have been fixed. llvm-svn: 357473
* [mips] Use AltOrders to prevent using odd FP-registers (Simon Atanasyan, 2019-04-02, 3 files changed, -18/+24)
  To disable the use of odd floating-point registers (O32 ABI and the -mno-odd-spreg command line option), such registers and their super-registers are added to the set of reserved registers. In general, this works, but there is at least one problem: with the machine verifier pass enabled, some floating-point tests fail because the live ranges of reserved register units are not empty, and the verification pass fails with a "Live segment doesn't end at a valid instruction" error message. Patch D35985 tries to solve the problem by explicitly removing register units, but that solution did not get approval. I would like to use another approach to prevent using odd floating-point registers: define `AltOrders` and `AltOrderSelect` for the MIPS floating-point register classes, where the `AltOrders` contain a reduced set of registers. At first glance, this solution does not break any test cases and allows enabling machine instruction verification for all MIPS test cases. Differential Revision: http://reviews.llvm.org/D59799 llvm-svn: 357472
* [RISCV] Support assembling @plt symbol operands (Alex Bradbury, 2019-04-02, 8 files changed, -6/+34)
  This patch allows symbols appended with @plt to parse and assemble with the R_RISCV_CALL_PLT relocation. Differential Revision: https://reviews.llvm.org/D55335 Patch by Lewis Revill. llvm-svn: 357470
* Enforce StackID definition in PEI (Sander de Smalen, 2019-04-02, 2 files changed, -2/+7)
  There are various places in LLVM where the definition of StackID is not properly honoured, for example in PEI where objects with a StackID > 0 are allocated on the default stack (StackID 0). This patch enforces that PEI only considers allocating objects to StackID 0. Reviewers: arsenm, thegameg, MatzeB Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D60062 llvm-svn: 357460
* [X86] Use unsigned type for opcodes throughout X86FixupLEAs. (Craig Topper, 2019-04-02, 1 file changed, -12/+13)
  All of the interfaces related to opcode in MachineInstr and MCInstrInfo refer to opcodes as unsigned. llvm-svn: 357444
* [ARM] Optimize expressions like "return x != 0;" for Thumb1. (Eli Friedman, 2019-04-02, 1 file changed, -11/+13)
  There's an existing optimization for x != C, but somehow it was missing a special case for 0. While I'm here, also cleaned up the code/comments a bit: the second value produced by the MERGE_VALUES was actually dead, since a CMOV only produces one result. Differential Revision: https://reviews.llvm.org/D59616 llvm-svn: 357437
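  For reference, the shape of the expression this covers (a minimal sketch; the function name is made up):

    /* 'x != C' already had a Thumb1-friendly lowering for nonzero constants C;
     * this change adds the missing C == 0 case. */
    int is_nonzero(unsigned int x) {
        return x != 0;
    }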
* [ARM] Don't try to create "push {r12, lr}" in Thumb1 at -Oz. (Eli Friedman, 2019-04-01, 1 file changed, -0/+2)
  It's a little tricky to make this issue show up because prologue/epilogue emission normally likes to push at least two registers... but it doesn't when lr is force-spilled due to function length. Not sure if that really makes sense, but I decided not to touch it for now. Differential Revision: https://reviews.llvm.org/D59385 llvm-svn: 357436
* [AArch64][GlobalISel] Select STRQui for stores into v2s64 instead of scalarizing (Jessica Paquette, 2019-04-01, 1 file changed, -1/+2)
  This improves selection for vector stores into v2s64s. Before, we just scalarized them; now we can use a STRQui instead. Differential Revision: https://reviews.llvm.org/D60083 llvm-svn: 357432
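  A hedged intrinsics-level illustration of such a store (an assumed example, not taken from the patch or its tests):

    #include <arm_neon.h>

    /* A 2 x i64 (128-bit) vector store.  With this change GlobalISel selects
     * a single STR Qn, [Xn] (STRQui) for it rather than splitting it into
     * two scalar stores. */
    void store_v2i64(int64_t *p, int64x2_t v) {
        vst1q_s64(p, v);
    }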
* [X86] Classify the AVX512 rounding control operand as X86::OPERAND_ROUNDING_CONTROL instead of MCOI::OPERAND_IMMEDIATE (Craig Topper, 2019-04-01, 3 files changed, -2/+9)
  Add an assert on legal values of rounding control in the encoder and remove an explicit mask. This should allow llvm-exegesis to intelligently constrain the rounding mode. The mask in the encoder shouldn't be necessary any more. We used to allow codegen to use 8-11 for the rounding mode while the assembler would use 0-3 to mean the same thing, so we masked here and in the printer. Codegen now matches the assembler and the printer was updated, but I forgot to update the encoder. llvm-svn: 357419
* [NVPTX] Fix the codegen for llvm.round. (Bixia Zheng, 2019-04-01, 3 files changed, -10/+107)
  Summary: Previously, we translated llvm.round to PTX cvt.rni, which rounds to the even integer when the source is equidistant between two integers. This is not correct, as llvm.round should round away from zero. This change replaces llvm.round with a round-away-from-zero implementation through target specific custom lowering. Modify a few affected tests to not check for cvt.rni. Instead, we check for the use of a few constants used in implementing round. We are also adding CUDA runnable tests to check the values produced by llvm.round to test-suites/External/CUDA. Reviewers: tra Subscribers: jholewinski, sanjoy, jlebar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59947 llvm-svn: 357407
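  To make the semantic difference concrete, here is one textbook away-from-zero formulation in C (a sketch only; the commit's actual lowering is built from target-specific nodes and constants, not this library code):

    #include <math.h>

    /* llvm.round must round halfway cases away from zero: round(0.5) == 1.0
     * and round(-0.5) == -1.0, whereas PTX cvt.rni rounds them to the nearest
     * even value (0.0 and -0.0).  This sketch ignores edge cases where adding
     * 0.5 itself rounds (values just below 0.5). */
    double round_away_from_zero(double x) {
        return copysign(floor(fabs(x) + 0.5), x);
    }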
* [AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure. (Neil Henning, 2019-04-01, 10 files changed, -434/+278)
  This change incorporates an effort by Connor Abbot to change how we deal with WWM operations potentially trashing valid values in inactive lanes. Previously, the SIFixWWMLiveness pass would work out which registers were being trashed within WWM regions, and ensure that the register allocator did not have any values it was depending on resident in those registers if the WWM section would trash them. This worked perfectly well, but would sometimes cause severe register pressure when the WWM section resided before divergent control flow (or at least that is where I mostly observed it). This fix instead runs through the WWM sections and pre-allocates some registers for WWM. It then reserves these registers so that the register allocator cannot use them. This results in a significant register saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just this change!). Differential Revision: https://reviews.llvm.org/D59295 llvm-svn: 357400
* [AArch64] Add v8.5-a Memory Tagging STZGM instruction (David Spickett, 2019-04-01, 1 file changed, -0/+4)
  This instruction writes a block of allocation tags and stores zero to the associated data locations. It differs from STGM by 1 bit and has the same arguments. The specification can be found here: https://developer.arm.com/docs/ddi0596/c Differential Revision: https://reviews.llvm.org/D60065 llvm-svn: 357397
* [RISCV] Attach VK_RISCV_CALL to symbols upon creation (Alex Bradbury, 2019-04-01, 6 files changed, -9/+57)
  This patch replaces the addition of VK_RISCV_CALL in RISCVMCCodeEmitter by creating the RISCVMCExpr when tail/call are parsed, or in the codegen case when the callee symbols are created. This required adding a new CallSymbol operand to allow only adding VK_RISCV_CALL to tail/call instructions. This patch will allow further expansion of parsing and codegen to easily include PLT symbols, which must generate the R_RISCV_CALL_PLT relocation. Differential Revision: https://reviews.llvm.org/D55560 Patch by Lewis Revill. llvm-svn: 357396
* [AArch64] Add v8.5-a Memory Tagging STGM/LDGM instructions (David Spickett, 2019-04-01, 4 files changed, -43/+6)
  The STGV/LDGV instructions were replaced with STGM/LDGM. The encodings remain the same, but there is no longer writeback, so there are no unpredictable encodings to check for. The specification can be found here: https://developer.arm.com/docs/ddi0596/c Differential Revision: https://reviews.llvm.org/D60064 llvm-svn: 357395
* [RISCV] Generate address sequences suitable for mcmodel=medium (Alex Bradbury, 2019-04-01, 6 files changed, -35/+110)
  This patch adds an implementation of a PC-relative addressing sequence to be used when -mcmodel=medium is specified. With absolute addressing, a 'medium' code model may cause addresses to be out of range. This is because while 'medium' implies a 2 GiB addressing range, this 2 GiB can be at any offset, as opposed to 'small', which implies the first 2 GiB only. Note that LLVM/Clang currently specifies code models differently to GCC, where small and medium imply the same functionality as GCC's medlow and medany respectively. Differential Revision: https://reviews.llvm.org/D54143 Patch by Lewis Revill. llvm-svn: 357393
* [AArch64] Add v8.5-a Memory Tagging GMID_EL1 register (David Spickett, 2019-04-01, 1 file changed, -0/+1)
  The latest version of the MTE spec added a system register 'GMID_EL1'. It contains the block size used by the LDGM and STGM instructions and is read only. The specification can be found here: https://developer.arm.com/docs/ddi0596/c llvm-svn: 357392
* X86: Fix override warning (Matt Arsenault, 2019-04-01, 1 file changed, -2/+2)
  llvm-svn: 357388
* [X86] Make post-ra scheduling macrofusion-aware. (Clement Courbet, 2019-04-01, 2 files changed, -0/+9)
  Subscribers: MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59688 llvm-svn: 357384
* [RISCV] Add seto pattern expansion (Luis Marques, 2019-04-01, 3 files changed, -3/+11)
  Adds a `seto` pattern expansion. Without it the lowerings of `fcmp one` and `fcmp ord` would be inefficient due to an unoptimized double negation. Differential Revision: https://reviews.llvm.org/D59699 llvm-svn: 357378
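  As a reference point, an ordered comparison expressed in C (an assumed illustration; the commit itself only touches the SelectionDAG patterns):

    /* SETO / 'fcmp ord' is true when neither operand is a NaN.  With a direct
     * seto expansion this can typically be lowered as two feq instructions
     * plus an AND, instead of computing the unordered result and negating it. */
    int both_ordered(double a, double b) {
        return !__builtin_isnan(a) && !__builtin_isnan(b);
    }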
* [X86] Use ISD::INTRINSIC_VOID in getTgtMemIntrinsic for truncating stores and scatter intrinsics (Craig Topper, 2019-04-01, 1 file changed, -1/+3)
  This is the appropriate opcode for only having a chain output. Though I'm not sure it matters much. llvm-svn: 357375
* [RISCV] Don't evaluatePCRelLo if a relocation will be forced (e.g. due to linker relaxation) (Alex Bradbury, 2019-04-01, 2 files changed, -2/+21)
  A pcrel_lo will point to the associated pcrel_hi fixup, which in turn points to the real target. RISCVMCExpr::evaluatePCRelLo will work around this indirection in order to allow the fixup to be evaluated properly. However, if relocations are forced (e.g. because linker relaxation is enabled) then its evaluation is undesired and will result in a relocation with the wrong target. This patch modifies evaluatePCRelLo so it will not try to evaluate if the fixup will be forced as a relocation. A new helper method is added to RISCVAsmBackend to query this. Differential Revision: https://reviews.llvm.org/D59686 llvm-svn: 357374
* [x86] allow movmsk with 2-element reductions (Sanjay Patel, 2019-03-31, 1 file changed, -2/+5)
  One motivation for making this change is that the lack of using movmsk is likely a main source of perf difference between clang and gcc on the C-Ray benchmark as shown here: https://www.phoronix.com/scan.php?page=article&item=gcc-clang-2019&num=5 ...but this change alone isn't enough to solve that problem.
  The 'all-of' examples show what is likely the worst case trade-off: we end up with an extra instruction (or 2 if we count the 'xor' register clearing). The 'any-of' examples look clearly better using movmsk because we've traded 2 vector instructions for 2 scalar instructions, and movmsk may have better timing than the generic 'movq'.
  If we examine the llvm-mca output for these cases, it appears that even though the 'all-of' movmsk variant looks worse on paper, it would perform better on both Haswell and Jaguar.
    $ llvm-mca -mcpu=haswell no_movmsk.s -timeline
    Iterations:        100
    Instructions:      400
    Total Cycles:      504
    Total uOps:        400
    Dispatch Width:    4
    uOps Per Cycle:    0.79
    IPC:               0.79
    Block RThroughput: 1.0

    $ llvm-mca -mcpu=haswell movmsk.s -timeline
    Iterations:        100
    Instructions:      600
    Total Cycles:      358
    Total uOps:        600
    Dispatch Width:    4
    uOps Per Cycle:    1.68
    IPC:               1.68
    Block RThroughput: 1.5

    $ llvm-mca -mcpu=btver2 no_movmsk.s -timeline
    Iterations:        100
    Instructions:      400
    Total Cycles:      407
    Total uOps:        400
    Dispatch Width:    2
    uOps Per Cycle:    0.98
    IPC:               0.98
    Block RThroughput: 2.0

    $ llvm-mca -mcpu=btver2 movmsk.s -timeline
    Iterations:        100
    Instructions:      600
    Total Cycles:      311
    Total uOps:        600
    Dispatch Width:    2
    uOps Per Cycle:    1.93
    IPC:               1.93
    Block RThroughput: 3.0
  Finally, there may be CPUs where movmsk is horribly slow (old AMD small cores?), but if that's true, then we're also almost certainly making the wrong transform already for reductions with >2 elements, so that should be fixed independently. Differential Revision: https://reviews.llvm.org/D59997 llvm-svn: 357367
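  A hedged C example of a 2-element 'all-of' reduction (assumed for illustration; it relies on the vectorizer turning the loop into a 2 x i64 compare plus an i1 reduction, and is not taken from the patch):

    #include <stdint.h>

    /* "All lanes equal" over two 64-bit elements.  Once vectorized, the final
     * i1 reduction is the kind of pattern that can now use MOVMSK instead of
     * extracting both lanes. */
    int all_equal2(const int64_t *a, const int64_t *b) {
        int all = 1;
        for (int i = 0; i < 2; ++i)
            all &= (a[i] == b[i]);
        return all;
    }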
* fix typo: "\t" => " " (Liang Zou, 2019-03-31, 2 files changed, -2/+2)
  Reviewers: llvm.org, Jim Reviewed By: Jim Subscribers: arsenm, jvesely, nhaehnle, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59983 llvm-svn: 357365
* [X86] Teach isel for RMW binops to handle negate (Craig Topper, 2019-03-30, 1 file changed, -2/+15)
  Negate updates flags like a subtract. We should be able to use the flags from the RMW form of negate when we have (store (X86ISD::SUB 0, load A), A). Differential Revision: https://reviews.llvm.org/D60007 llvm-svn: 357353
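  In C terms, the store-fused pattern looks roughly like this (illustrative only; the function name and the exact emitted instruction are assumptions):

    /* In-place negation: the DAG is (store (sub 0, (load p)), p), which may
     * be selected as a read-modify-write negate such as negl (%rdi).  Because
     * negate updates flags like a subtract, a later compare of the result
     * against zero may reuse those flags. */
    void negate_in_place(int *p) {
        *p = -*p;
    }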
* [RISCV] Add codegen support for ilp32f, ilp32d, lp64f, and lp64d ("hard float") ABIs (Alex Bradbury, 2019-03-30, 3 files changed, -17/+128)
  This patch adds support for the RISC-V hard float ABIs, building on top of rL355771, which added basic target-abi parsing and MC layer support. It also builds on some re-organisations and expansion of the upstream ABI and calling convention tests which were recently committed directly upstream. A number of aspects of the RISC-V hard float ABIs require frontend support (e.g. flattening of structs and passing int+fp for fp+fp structs in a pair of registers), and will be addressed in a Clang patch. As can be seen from the tests, it would be worthwhile extending RISCVMergeBaseOffsets to handle constant pool as well as global accesses. Differential Revision: https://reviews.llvm.org/D59357 llvm-svn: 357352
* [X86][SSE] detectAVGPattern - Match zext(or(x,y)) 'add like' patterns (PR41316) (Simon Pilgrim, 2019-03-30, 1 file changed, -10/+21)
  Fixes PR41316, where the expanded PAVG intrinsic had one of its ADDs turned into an OR due to its operands having no conflicting bits. llvm-svn: 357351
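  For context, a hedged C sketch of the rounding-average pattern detectAVGPattern matches (an assumed example; the actual regression in PR41316 came from the expanded PAVG intrinsic):

    #include <stdint.h>

    /* Rounding byte average, normally matched into PAVGB.  The '+ 1' is the
     * add that DAG combine can turn into an OR when the operands' known bits
     * do not overlap, which is why the matcher now also accepts an
     * 'add like' OR. */
    void average_round_up(uint8_t *dst, const uint8_t *a, const uint8_t *b, int n) {
        for (int i = 0; i < n; ++i)
            dst[i] = (uint8_t)(((unsigned)a[i] + b[i] + 1) >> 1);
    }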
* [X86][SSE] detectAVGPattern - begin generalizing ADD matches (Simon Pilgrim, 2019-03-30, 1 file changed, -4/+15)
  Move the ADD matching into a helper - first NFC stage towards supporting 'ADD like' cases such as in PR41316. llvm-svn: 357349
* [WebAssembly] Fix unwind destination mismatches in CFG stackify (Heejin Ahn, 2019-03-30, 2 files changed, -19/+516)
  Summary: Linearizing the control flow by placing `try`/`end_try` markers can create mismatches in unwind destinations. This patch resolves these mismatches by wrapping those instructions that have an incorrect unwind destination with a nested `try`/`catch`/`end_try` and branching to the right destination within the new catch block. Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, chrib, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D48345 llvm-svn: 357343