summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU][MC] Corrected parsing of NAME:VALUE modifiersDmitry Preobrazhensky2019-05-171-33/+17
| | | | | | | | | | See bug 41298: https://bugs.llvm.org/show_bug.cgi?id=41298 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D61009 llvm-svn: 361045
* [AMDGPU][MC] Enabled labels with s_call_b64 and s_cbranch_i_forkDmitry Preobrazhensky2019-05-171-2/+2
| | | | | | | | | | See https://bugs.llvm.org/show_bug.cgi?id=41888 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D62016 llvm-svn: 361040
* [X86][AVX] Remove LowerCTTZ's AVX1 custom vector handling.Simon Pilgrim2019-05-171-7/+0
| | | | | | We can now rely on generic expansion to handle this. llvm-svn: 361038
* [X86][AVX] isNOT - add extract_subvector(xor X, -1) -> extract_subvector(X) ↵Simon Pilgrim2019-05-171-0/+9
| | | | | | | | fold. Prep work for the removal of the remaining x86 CTTZ vector lowering. llvm-svn: 361035
* [AMDGPU][MC] Enabled expressions for most operands which accept integer valuesDmitry Preobrazhensky2019-05-172-65/+110
| | | | | | | | | | See bug 40873: https://bugs.llvm.org/show_bug.cgi?id=40873 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60768 llvm-svn: 361031
* AMDGPU: Fix unused variable warnings in release buildsMatt Arsenault2019-05-171-12/+9
| | | | llvm-svn: 361030
* AMDGPU/GlobalISel: Legalize G_FCEILMatt Arsenault2019-05-172-2/+37
| | | | llvm-svn: 361028
* AMDGPU/GlobalISel: Legalize G_INTRINSIC_TRUNCMatt Arsenault2019-05-172-4/+70
| | | | llvm-svn: 361027
* AMDGPU/GlobalISel: Legalize G_FRINTMatt Arsenault2019-05-172-0/+44
| | | | llvm-svn: 361026
* AMDGPU/GlobalISel: Legalize G_FCOPYSIGNMatt Arsenault2019-05-171-0/+4
| | | | llvm-svn: 361025
* AMDGPU/GlobalISel: RegBankSelect for llvm.amdgcn.s.buffer.loadMatt Arsenault2019-05-171-0/+44
| | | | llvm-svn: 361023
* AMDGPU/GlobalISel: Use subreg index instead of extra unmergeMatt Arsenault2019-05-171-8/+2
| | | | | | | This saves instructions and extra steps, but I'm not sure about introducing subregister indexes at this point. llvm-svn: 361022
* AMDGPU/GlobalISel: Use waterfall loop for buffer_loadMatt Arsenault2019-05-172-36/+302
| | | | | | | This adds support for more complex waterfall loops that need to handle operands > 32-bits, and multiple operands. llvm-svn: 361021
* [X86] Pull out IsNOT helper. NFCI.Simon Pilgrim2019-05-171-8/+16
| | | | | | Return the input value for the NOT pattern: (xor X, -1) -> X llvm-svn: 361012
* [AMDGPU] detect WaW hazards when moving/merging load/store instructionsRhys Perry2019-05-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: In order to combine memory operations efficiently, the load/store optimizer might move some instructions around. It's usually safe to move instructions down past the merged instruction because the pass checks if memory operations can be re-ordered. Though, the current logic doesn't handle Write-after-Write hazards. This fixes a reflection issue with Monster Hunter World and DXVK. v2: - rebased on top of master - clean up the test case - handle WaW hazards correctly Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=40130 Original patch by Samuel Pitoiset. Reviewers: tpr, arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: ronlieb, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D61313 llvm-svn: 361008
* [AArch64][SVE2] Asm: add saturating multiply-add long instructionsCullen Rhodes2019-05-171-0/+12
| | | | | | | | | | | | | | | | | Summary: Patch adds support for indexed and unpredicated vectors forms of the following instructions: * SQDMLALB, SQDMLALT, SQDMLSLB, SQDMLSLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D61997 llvm-svn: 361005
* [AArch64][SVE2] Asm: add integer multiply-add long instructionsCullen Rhodes2019-05-172-0/+49
| | | | | | | | | | | | | | | | | Summary: Patch adds support for indexed and unpredicated vectors forms of the following instructions: * SMLALB, SMLALT, UMLALB, UMLALT, SMLSLB, SMLSLT, UMLSLB, UMLSLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D61951 llvm-svn: 361003
* [AArch64][SVE2] Asm: add integer multiply long instructionsCullen Rhodes2019-05-172-0/+64
| | | | | | | | | | | | | | | | | Summary: Patch adds support for indexed and unpredicated vectors forms of the following instructions: * SMULLB, SMULLT, UMULLB, UMULLT, SQDMULLB, SQDMULLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D61936 llvm-svn: 361002
* [X86] Add FeatureFastScalarShiftMasks and FeatureFastVectorShiftMasks to the ↵Craig Topper2019-05-171-0/+2
| | | | | | | | | ignore list for inlining compatibility. These are tuning flags and won't cause any codegen issue if we inline a function with a different value. llvm-svn: 360992
* [PowerPC] Support .reloc *, R_PPC{,64}_NONE, *Fangrui Song2019-05-172-28/+49
| | | | | | | | This can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained. llvm-svn: 360990
* [MC][PowerPC] Clean up PPCAsmBackendFangrui Song2019-05-171-25/+17
| | | | | | | | Replace the member variable Target with Triple Use Triple instead of TheTarget.getName() to dispatch on 32-bit/64-bit. Delete redundant parameters llvm-svn: 360986
* [X86] Support .reloc *, R_{386,X86_64}_NONE, *Fangrui Song2019-05-172-9/+51
| | | | | | | | | | | | | | This can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained. See R_MIPS_NONE (D13659), R_ARM_NONE (D61992), R_AARCH64_NONE (D61973) for similar changes. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D62014 llvm-svn: 360983
* [AArch64] Support .reloc *, R_AARCH64_NONE, *Fangrui Song2019-05-172-2/+18
| | | | | | | | | | | | | Summary: This can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D61973 llvm-svn: 360981
* [ARM] Support .reloc *, R_ARM_NONE, *Fangrui Song2019-05-176-9/+24
| | | | | | | | | | | | | | | | R_ARM_NONE can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained. Add a generic MCFixupKind FK_NONE as this kind of no-op relocation is ubiquitous on ELF and COFF, and probably available on many other binary formats. See D62014. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D61992 llvm-svn: 360980
* [SystemZ] Bugfix in SystemZTargetLowering::combineIntDIVREM()Jonas Paulsson2019-05-171-1/+1
| | | | | | | | | | Make sure to not unroll a vector division/remainder (with a constant splat divisor) after type legalization, since the scalar type may then be illegal. Review: Ulrich Weigand https://reviews.llvm.org/D62036 llvm-svn: 360965
* [X86][AsmParser] Add mnemonics missed in r360954.David L. Jones2019-05-171-1/+2
| | | | | | | | These are valid Jcc, but aren't based on the EFLAGS condition codes (Intel 64 and IA-32 Architetcures Software Developer's Manual Vol. 1, Appendix B). These are covered in clang/test, but not llvm/test. llvm-svn: 360960
* [X86][AsmParser] Ignore "short" even harder in Intel syntax ASM.David L. Jones2019-05-161-5/+34
| | | | | | | | | | | | | | | | | | | | | | | | In Intel syntax, it's not uncommon to see a "short" modifier on Jcc conditional jumps, which indicates the offset should be a "short jump" (8-bit immediate offset from EIP, -128 to +127). This patch expands to all recognized Jcc condition codes, and removes the inline restriction. Clang already ignores "jmp short" in inline assembly. However, only "jmp" and a couple of Jcc are actually checked, and only inline (i.e., not when using the integrated assembler for asm sources). A quick search through asm-containing libraries at hand shows a pretty broad range of Jcc conditions spelled with "short." GAS ignores the "short" modifier, and instead uses an encoding based on the given immediate. MS inline seems to do the same, and I suspect MASM does, too. NASM will yield an error if presented with an out-of-range immediate value. Example of GCC 9.1 and MSVC v19.20, "jmp short" with offsets that do and do not fit within 8 bits: https://gcc.godbolt.org/z/aFZmjY Differential Revision: https://reviews.llvm.org/D61990 llvm-svn: 360954
* [X86][AsmParser] Rename "ConditionCode" variable to "ConditionPredicate".David L. Jones2019-05-161-9/+9
| | | | | | | | This better matches the verbiage in Intel documentation, and should help avoid confusion between these two different kinds of values, both of which are parsed from mnemonics. llvm-svn: 360953
* [X86] Deduplicate symbol lowering logic, NFCReid Kleckner2019-05-162-88/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This refactors four pieces of code that create SDNodes for references to symbols: - normal global address lowering (LEA, MOV, etc) - callee global address lowering (CALL) - external symbol address lowering (LEA, MOV, etc) - external symbol address lowering (CALL) Each of these pieces of code need to: - classify the reference - lower the symbol - emit a RIP wrapper if needed - emit a load if needed - add offsets if needed I think handling them all in one place will make the code easier to maintain in the future. Reviewers: craig.topper, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61690 llvm-svn: 360952
* [X86] Use 0x9 instead of 0x1 as the immediate in some masked floor pattern. ↵Craig Topper2019-05-161-4/+4
| | | | | | | | | Similarly change 0x2 to 0xA for ceil. This suppresses exceptions which is what we should be doing for ceil and floor. We already use the correct immediate in patterns without masking. llvm-svn: 360915
* AMDGPU: Introduce TokenFactor for ABI register copies in call sequenceMatt Arsenault2019-05-161-0/+7
| | | | | | | The call was missing chain dependencies on the pre-call copies. I don't think this was causing any real issues however. llvm-svn: 360906
* AMDGPU: Assume xnack is enabled by defaultMatt Arsenault2019-05-163-2/+28
| | | | | | | | | | | This is the conservatively correct default. It is always safe to assume xnack is enabled, but not the converse. Introduce a feature to blacklist targets where xnack can never be meaningfully enabled. I'm not sure the targets this is applied to is 100% correct. llvm-svn: 360903
* [AArch64] Handle ISD::LROUND and ISD::LLROUNDAdhemerval Zanella2019-05-162-0/+11
| | | | | | | This patch optimizes ISD::LROUND and ISD::LLROUND to fcvtas instruction. It currently only handles the scalar version. llvm-svn: 360894
* [CodeGen] Add lround/llround builtinsAdhemerval Zanella2019-05-161-0/+2
| | | | | | | | | | | | | This patch add the ISD::LROUND and ISD::LLROUND along with new intrinsics. The changes are straightforward as for other floating-point rounding functions, with just some adjustments required to handle the return value being an interger. The idea is to optimize lround/llround generation for AArch64 in a subsequent patch. Current semantic is just route it to libm symbol. llvm-svn: 360889
* AMDGPU/GlobalISel: Correct regbank for 1-bit and/or/xorMatt Arsenault2019-05-161-1/+1
| | | | | | Bool values should use the scc/vcc regbank since r350611. llvm-svn: 360877
* [AArch64][SVE2] Asm: implement CMLA/SQRDCMLAH instructionsCullen Rhodes2019-05-162-0/+39
| | | | | | | | | | | | | | | Summary: This patch adds support for the indexed and unpredicated vectors forms of the CMLA and SQRDCMLAH instructions. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D61906 llvm-svn: 360871
* [AArch64][SVE2] Asm: implement CDOT instructionCullen Rhodes2019-05-162-0/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The complex DOT instructions perform a dot-product on quadtuplets from two source vectors and the resuling wide real or wide imaginary is accumulated into the destination register. The instructions come in two forms: Vector form, e.g. cdot z0.s, z1.b, z2.b, #90 - complex dot product on four 8-bit quad-tuplets, accumulating results in 32-bit elements. The complex numbers in the second source vector are rotated by 90 degrees. cdot z0.d, z1.h, z2.h, #180 - complex dot product on four 16-bit quad-tuplets, accumulating results in 64-bit elements. The complex numbers in the second source vector are rotated by 180 degrees. Indexed form, e.g. cdot z0.s, z1.b, z2.b[3], #0 - complex dot product on four 8-bit quad-tuplets, with specified quadtuplet from second source vector, accumulating results in 32-bit elements. cdot z0.d, z1.h, z2.h[1], #0 - complex dot product on four 16-bit quad-tuplets, with specified quadtuplet from second source vector, accumulating results in 64-bit elements. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer, rovka Differential Revision: https://reviews.llvm.org/D61903 llvm-svn: 360870
* [AArch64][SVE2] Asm: add unpredicated integer multiply instructionsCullen Rhodes2019-05-162-0/+86
| | | | | | | | | | | | | | | | | | | | | Summary: Add support for the following instructions: * MUL (indexed and unpredicated vectors forms) * SQDMULH (indexed and unpredicated vectors forms) * SQRDMULH (indexed and unpredicated vectors forms) * SMULH (unpredicated, predicated form added in SVE) * UMULH (unpredicated, predicated form added in SVE) * PMUL (unpredicated) The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer, rovka Differential Revision: https://reviews.llvm.org/D61902 llvm-svn: 360867
* Add Triple::isPPC64()Fangrui Song2019-05-162-3/+2
| | | | llvm-svn: 360864
* [X86] Delay creating index register negations during address matching until ↵Craig Topper2019-05-151-7/+15
| | | | | | | | | | after we know for sure the match will succeed If we're trying to match an LEA, its possible the LEA match will be deemed unprofitable. In which case the negation we created in matchAddress would be left dangling in the SelectionDAG. This could artificially increase use counts for other nodes in the DAG. Though I don't have an example of that. But it just seems like bad form to have dangling nodes in isel. Differential Revision: https://reviews.llvm.org/D61047 llvm-svn: 360823
* [mips] Use range-based `for` loops. NFCSimon Atanasyan2019-05-151-20/+17
| | | | llvm-svn: 360817
* [AArch64] only indicate CFI on Windows if we emitted CFIMandeep Singh Grang2019-05-153-37/+80
| | | | | | | | | | | | | | | | | | | | | | | Summary: Otherwise, we emit directives for CFI without any actual CFI opcodes to go with them, which causes tools to malfunction. The technique is similar to what the x86 backend already does. Fixes https://bugs.llvm.org/show_bug.cgi?id=40876 Patch by: froydnj (Nathan Froyd) Reviewers: mstorsjo, eli.friedman, rnk, mgrang, ssijaric Reviewed By: rnk Subscribers: javed.absar, kristof.beyls, llvm-commits, dmajor Tags: #llvm Differential Revision: https://reviews.llvm.org/D61960 llvm-svn: 360816
* [X86] Strengthen type constraints on some specialized X86 ISD opcodes that ↵Craig Topper2019-05-151-5/+17
| | | | | | | | | | | | don't have any flexibility. NFC These particular instructions only operate on 128-bit vectors and have no wider equivalents. And the element size is always known. One could argue that MOVSS/MOVSD could be merged, but that's probably disruptive to code in X86ISelLowering and probably low value. llvm-svn: 360815
* Uncomment LLVM_FALLTHROUGH.Pete Couperus2019-05-151-1/+1
| | | | llvm-svn: 360798
* [AMDGPU] Increases available SGPR for Calling ConventionRyan Taylor2019-05-152-4/+22
| | | | | | | | | | | | | | | | Summary: SGPR in CC can be either hw initialized or set by other chained shaders and so this increases the SGPR count availalbe to CC to 105. Change-Id: I3dfadc750fe4a3e2bd07117a2899fd13f3e2fef3 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61261 llvm-svn: 360778
* [ARM] Don't use the Machine Scheduler for cortex-m at minsizeDavid Green2019-05-152-1/+8
| | | | | | | | | | | | | | | | | | The new cortex-m schedule in rL360768 helps performance, but can increase the amount of high-registers used. This, on average, ends up increasing the codesize by a fair amount (because less instructions are converted from T2 to T1). On cortex-m at -Oz, where we are quite size-paranoid, it is better to use the existing DAG scheduler with the RegPressure scheduling preference (at least until the issues around T2 vs T1 instructions can be improved). I have also made sure that the Sched::RegPressure dag scheduler is always chosen for MinSize. The test shows one case where we increase the number of registers used. Differential Revision: https://reviews.llvm.org/D61882 llvm-svn: 360769
* [ARM] Cortex-M4 scheduleDavid Green2019-05-156-65/+176
| | | | | | | | | | | | | | | | | | | | This patch adds a simple Cortex-M4 schedule, renaming the existing M3 schedule to M4 and filling in the latencies as-per the Cortex-M4 TRM: https://developer.arm.com/docs/ddi0439/latest Most of these are 1, with the important exception being loads taking 2 cycles. A few others are also higher, but I don't believe they make a large difference. I've repurposed the M3 schedule as the latencies are mostly the same between the two cores, with the M4 having more FP and DSP instructions. We also turn on MISched and UseAA for the cores that now use this. It also adds some schedule Write's to various instruction to make things simpler. Differential Revision: https://reviews.llvm.org/D54142 llvm-svn: 360768
* [mips] LLVM and GAS now use same instructions for CFA Definition. NFCISimon Atanasyan2019-05-151-1/+1
| | | | | | | | | | | | | LLVM previously used `DW_CFA_def_cfa` instruction in .eh_frame to set the register and offset for current CFA rule. We change it to `DW_CFA_def_cfa_register` which is the same one used by GAS that only changes the register but keeping the old offset. Patch by Mirko Brkusanin. Differential Revision: https://reviews.llvm.org/D61899 llvm-svn: 360765
* [X86] Use OR32mi8Locked instead of LOCK_OR32mi8 in emitLockedStackOp.Craig Topper2019-05-152-5/+3
| | | | | | | | | | | They encode the same way, but OR32mi8Locked sets hasUnmodeledSideEffects set which should be stronger than the mayLoad/mayStore on LOCK_OR32mi8. I think this makes sense since we are using it as a fence. This also seems to hide the operation from the speculative load hardening pass so I've reverted r360511. llvm-svn: 360747
* [NFC] Reuse a helper function to eliminate duplicate codePhilip Reames2019-05-151-79/+67
| | | | llvm-svn: 360740
OpenPOWER on IntegriCloud