path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [X86] Add an additional isel pattern to CVTDQ2PDrm/VCVTDQ2PDrm to enable load folding without the peephole pass. | Craig Topper | 2017-10-14 | 1 | -2/+6
    This pattern is already used in AVX512VL version of these instructions. Though AVX512VL version is missing other patterns.
    llvm-svn: 315794
* [AArch64][RegisterBankInfo] Use the statically computed mappings for COPY | Quentin Colombet | 2017-10-14 | 1 | -4/+32
    We used to rely on the generic implementation to get the mappings for COPYs. The generic implementation relies on table lookup and dynamically allocated objects to get the valid mappings.
    Given we already know how to map G_BITCAST and have the static mappings for them, use that code path for COPY as well. This is much more efficient.
    Improve the compile time of RegBankSelect by up to 20%.
    Note: When we eventually generate all the mappings via TableGen, we won't have to do that dance to shave compile time. The intent of this change was to make sure that moving to static structure really pays off.
    NFC.
    llvm-svn: 315781
* Revert r315763: "[Hexagon] Rangify some loops, NFC" | Krzysztof Parzyszek | 2017-10-13 | 2 | -26/+44
    Broke some builds (using libstdc++).
    llvm-svn: 315769
* [X86] Use X86ISD::VBROADCAST in place of v2f64 X86ISD::MOVDDUP when AVX2 is available | Craig Topper | 2017-10-13 | 3 | -17/+27
    This is particularly important for AVX512VL where we are better able to recognize the VBROADCAST loads to fold with other operations.
    For AVX512VL we now use X86ISD::VBROADCAST for all of the patterns and remove the 128-bit X86ISD::VMOVDDUP. We may be able to use this for AVX1 as well which would allow us to remove more isel patterns.
    I also had to add X86ISD::VBROADCAST as a node to call combineShuffle for so that we treat it similar to X86ISD::MOVDDUP.
    Differential Revision: https://reviews.llvm.org/D38836
    llvm-svn: 315768
* [Hexagon] Rangify some loops, NFC | Krzysztof Parzyszek | 2017-10-13 | 2 | -44/+26
    llvm-svn: 315763
* [globalisel][tablegen] Add support for fpimm and import of APInt/APFloat based ImmLeaf. | Daniel Sanders | 2017-10-13 | 2 | -8/+13
    Summary: There's only a tablegen testcase for IntImmLeaf and not a CodeGen one because the relevant rules are rejected for other reasons at the moment. On AArch64, it's because there's an SDNodeXForm attached to the operand. On X86, it's because the rule either emits multiple instructions or has another predicate using PatFrag which cannot easily be supported at the same time.
    Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
    Reviewed By: qcolombet
    Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
    Differential Revision: https://reviews.llvm.org/D36569
    llvm-svn: 315761
* AMDGPU: Implement hasBitPreservingFPLogic | Matt Arsenault | 2017-10-13 | 2 | -0/+6
    llvm-svn: 315754
* [Hexagon] Avoid unused variable warnings in release builds. | Benjamin Kramer | 2017-10-13 | 1 | -0/+4
    No functionality change intended.
    llvm-svn: 315749
* AMDGPU: Look for src mods before fp_extend | Matt Arsenault | 2017-10-13 | 1 | -1/+17
    When selecting modifiers for mad_mix instructions, look at fneg/fabs that occur before the conversion.
    llvm-svn: 315748
* [aarch64] Support APInt and APFloat in ImmLeaf subclasses and make AArch64 use them. | Daniel Sanders | 2017-10-13 | 1 | -16/+15
    Summary: The purpose of this patch is to expose more information about ImmLeaf-like PatLeaf's so that GlobalISel can learn to import them.
    Previously, ImmLeaf could only be used to test int64_t's produced by sign-extending an APInt. Other tests on immediates had to use the generic PatLeaf and extract the constant using C++.
    With this patch, tablegen will know how to generate predicates for APInt, and APFloat. This will allow it to 'do the right thing' for both SelectionDAG and GlobalISel which require different methods of extracting the immediate from the IR.
    This is NFC for SelectionDAG since the new code is equivalent to the previous code. It's also NFC for FastISel because FastIselShouldIgnore is 1 for the ImmLeaf subclasses. Enabling FastIselShouldIgnore == 0 for these new subclasses will require a significant re-factor of FastISel.
    For GlobalISel, it's currently NFC because the relevant code to import the affected rules is not yet present. This will be added in a later patch.
    Depends on D36086
    Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
    Reviewed By: qcolombet
    Subscribers: bjope, aemerson, rengolin, javed.absar, igorb, llvm-commits, kristof.beyls
    Differential Revision: https://reviews.llvm.org/D36534
    llvm-svn: 315747
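    The old ImmLeaf restriction described in this message (predicates only see an int64_t obtained by sign-extending the immediate) can be illustrated with a small standalone sketch. It does not use LLVM; `signExtend` and `isSImm9` are hypothetical names chosen for illustration, and `signExtend` only imitates the kind of value `APInt::getSExtValue()` would return for a value of the given bit width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM code): sign-extend the low `bits` bits of
// `raw` to 64 bits, imitating the int64_t an ImmLeaf predicate used to see.
int64_t signExtend(uint64_t raw, unsigned bits) {
  assert(bits >= 1 && bits <= 64 && "width out of range");
  if (bits == 64)
    return static_cast<int64_t>(raw);
  uint64_t mask = (uint64_t{1} << bits) - 1;
  uint64_t value = raw & mask;
  uint64_t signBit = uint64_t{1} << (bits - 1);
  // XOR then subtract propagates the sign bit into the upper bits.
  return static_cast<int64_t>((value ^ signBit) - signBit);
}

// Hypothetical ImmLeaf-style predicate: does the immediate fit in a signed
// 9-bit field?
bool isSImm9(int64_t imm) { return imm >= -256 && imm <= 255; }
```

    A predicate written this way never sees the original bit width, which is one reason a predicate that works on the APInt (or APFloat) itself carries more information.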
* AMDGPU: Implement isFPExtFoldable | Matt Arsenault | 2017-10-13 | 2 | -0/+12
    This helps match v_mad_mix* in some cases.
    llvm-svn: 315744
* DAG: Add opcode and source type to isFPExtFree | Matt Arsenault | 2017-10-13 | 2 | -3/+4
    This is only currently used for mad/fma transforms. This is the only case where it should be used for AMDGPU, so add an opcode to be sure.
    llvm-svn: 315740
* [Hexagon] Minimize number of repeated constant extenders | Krzysztof Parzyszek | 2017-10-13 | 3 | -0/+1863
    Each constant extender requires an extra instruction, which adds to the code size and also reduces the number of available slots in an instruction packet. In most cases, the value of a repeated constant extender could be loaded into a register, and the instructions using the extender could be replaced with their counterparts that use that register instead.
    This patch adds a pass that tries to reduce the number of constant extenders, including extenders which differ only in an immediate offset known at compile time, e.g. @global and @global+12.
    llvm-svn: 315735
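    The idea in this message can be modelled as a simple counting exercise. The sketch below is not the pass itself: `ExtUse` and `extendersAfterSharing` are hypothetical names, and it assumes that every rewritten use can reach its value as register plus offset without an extender of its own, which the real pass would have to check per instruction.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One use of an extended constant: a base symbol plus a compile-time offset,
// e.g. {"global", 12} for @global+12. Names are hypothetical.
struct ExtUse {
  std::string base;
  int offset;
};

// Counts the extenders that remain if each base symbol is materialized at
// most once. A base used once keeps its single extender; a base used several
// times costs one extender for the register load, and its uses are assumed
// to be rewritten to register+offset forms that need none.
std::size_t extendersAfterSharing(const std::vector<ExtUse> &uses) {
  std::map<std::string, std::size_t> usesPerBase;
  for (const ExtUse &use : uses)
    ++usesPerBase[use.base];
  // Under the assumption above, exactly one extender survives per base.
  return usesPerBase.size();
}
```

    With uses of @global, @global+12 and @other, the model goes from three extenders to two, because the two @global uses share one register.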
* [X86] Add initial skeleton support for knm cpu | Craig Topper | 2017-10-13 | 1 | -5/+16
    This adds Intel's Knights Mill CPU to valid CPU names for the backend. For now it's an alias of "knl", but ultimately we need to support AVX5124FMAPS and AVX5124VNNIW instruction sets for it.
    Differential Revision: https://reviews.llvm.org/D38811
    llvm-svn: 315722
* [X86] Fix some inconsistent formatting in the processor feature lists. | Craig Topper | 2017-10-13 | 1 | -4/+4
    llvm-svn: 315696
* [X86] Add ProcIntelBDW to BroadwellProc class not BDWFeatures class. | Craig Topper | 2017-10-13 | 1 | -4/+5
    This isn't a property we want inherited.
    llvm-svn: 315695
* [Hexagon] Add patterns for cmpb/cmph with immediate arguments | Krzysztof Parzyszek | 2017-10-13 | 1 | -0/+46
    Patch by Sumanth Gundapaneni.
    llvm-svn: 315692
* [X86] Stop creating CMOV nodes with a second MVT::Glue result | Craig Topper | 2017-10-13 | 1 | -24/+9
    Summary: We seem to create CMOV nodes inconsistently, some with a Glue result and some without. But I can't find any cases that use the Glue result. So I've tried to remove all the places that did this.
    Reviewers: RKSimon, spatel, zvi
    Reviewed By: RKSimon
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D38664
    llvm-svn: 315686
* [X86] Remove patterns that select unmasked vbroadcastf2x32/vbroadcasti2x32. Prefer vbroadcastsd/vpbroadcastq instead. | Craig Topper | 2017-10-13 | 1 | -8/+20
    There's no advantage to using these instructions when they aren't masked. This enables some additional execution domain switching without needing to update the table.
    llvm-svn: 315674
* Revert "TargetMachine: Merge TargetMachine and LLVMTargetMachine" | Matthias Braun | 2017-10-12 | 42 | -72/+546
    Reverting to investigate layering effects of MCJIT not linking libCodeGen but using TargetMachine::getNameWithPrefix() breaking the lldb bots.
    This reverts commit r315633.
    llvm-svn: 315637
* TargetMachine: Merge TargetMachine and LLVMTargetMachine | Matthias Braun | 2017-10-12 | 42 | -546/+72
    Merge LLVMTargetMachine into TargetMachine.
    - There is no in-tree target anymore that just implements TargetMachine but not LLVMTargetMachine.
    - It should still be possible to stub out all the various functions in case a target does not want to use lib/CodeGen
    - This simplifies the code and avoids methods ending up in the wrong interface.
    Differential Revision: https://reviews.llvm.org/D38489
    llvm-svn: 315633
* [X86] Add CLWB intrinsic. llvm part | Craig Topper | 2017-10-12 | 1 | -2/+2
    llvm-svn: 315613
* Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ. | Wei Ding | 2017-10-12 | 5 | -38/+78
    Differential Revision: http://reviews.llvm.org/D37348
    llvm-svn: 315610
* AMDGPU/NFC: Move AMDGPU specific note types to ELF.h | Konstantin Zhuravlyov | 2017-10-12 | 3 | -10/+6
    Differential Revision: https://reviews.llvm.org/D38747
    llvm-svn: 315608
* [NVPTX] Implemented wmma intrinsics and instructions. | Artem Belevich | 2017-10-12 | 4 | -0/+845
    WMMA = "Warp Level Matrix Multiply-Accumulate". These are the new instructions introduced in PTX6.0 and available on sm_70 GPUs.
    Differential Revision: https://reviews.llvm.org/D38645
    llvm-svn: 315601
* [codeview] Don't emit FPO data in funclet prologues | Reid Kleckner | 2017-10-12 | 2 | -6/+3
    Attempt 3 to work around bugs in FPO data with funclets.
    llvm-svn: 315600
* AMDGPU: Fix warnings introduced in r315526 | Konstantin Zhuravlyov | 2017-10-12 | 2 | -5/+5
    llvm-svn: 315596
* [PowerPC] Add profitablilty check for conversion to mtctr loops | Lei Huang | 2017-10-12 | 1 | -1/+32
    Add profitability checks for modifying counted loops to use the mtctr instruction.
    The latency of mtctr is only justified if there are more than 4 comparisons that will be removed as a result. Usually counted loops are formed relatively early and before unrolling, so most low trip count loops often don't survive. However we want to ensure that if they do, we do not mistakenly update them to mtctr loops.
    Use CodeMetrics to ensure we are only doing this for small loops with small trip counts.
    Differential Revision: https://reviews.llvm.org/D38212
    llvm-svn: 315592
* [AMDGPU] For amdpal, widen interpolation mode workaround | Tim Renouf | 2017-10-12 | 1 | -8/+25
    Summary: The interpolation mode workaround ensures that at least one interpolation mode is enabled in PSInputAddr. It does not also check PSInputEna on the basis that the user might enable bits in that depending on run-time state.
    However, for amdpal os type, the user does not enable some bits after compilation based on run-time states; the register values being generated here are the final ones set in the hardware. Therefore, apply the workaround to PSInputAddr and PSInputEnable together.
    (The case where a bit is set in PSInputAddr but not in PSInputEnable is where the frontend set up an input arg for a particular interpolation mode, but nothing uses that input arg. Really we should have an earlier pass that removes such an arg.)
    Reviewers: arsenm, nhaehnle, dstuttard
    Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits
    Differential Revision: https://reviews.llvm.org/D37758
    llvm-svn: 315591
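    The difference between the two checks can be sketched as a small bit-mask model. This is only an illustration of the rule stated in the message: the mask value, the fallback bit and the names `PSInputs` and `applyInterpWorkaround` are assumptions made for the example and are not taken from the AMDGPU backend.

```cpp
#include <cstdint>

// Hypothetical constants: which bits count as interpolation modes, and which
// one gets forced on when none is active. The real values live in the backend.
constexpr uint32_t kInterpModeMask = 0x7F;
constexpr uint32_t kFallbackModeBit = 0x1;

struct PSInputs {
  uint32_t addr;   // models PSInputAddr
  uint32_t enable; // models PSInputEnable
};

// For amdpal the register values are final, so a mode only counts when it is
// set in both registers. Otherwise only the address register is checked,
// since the enable bits may still change at run time.
PSInputs applyInterpWorkaround(PSInputs in, bool isAmdPal) {
  uint32_t effective = isAmdPal ? (in.addr & in.enable) : in.addr;
  if ((effective & kInterpModeMask) == 0) {
    in.addr |= kFallbackModeBit;
    in.enable |= kFallbackModeBit;
  }
  return in;
}
```

    In this model an input with a mode allocated but not enabled is left alone in the generic case and gets the fallback mode under amdpal.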
* [dump] Remove NDEBUG from test to enable dump methods [NFC] | Don Hinton | 2017-10-12 | 11 | -13/+13
    Summary: Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP.
    Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods.
    Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so it'll be picked up by public headers.
    Differential Revision: https://reviews.llvm.org/D38406
    llvm-svn: 315590
* [x86] replace isEqualTo with == for efficiency | Sanjay Patel | 2017-10-12 | 1 | -4/+4
    This is a follow-up suggested in D37534. Patch by Yulia Koval.
    llvm-svn: 315589
* [X86][SSE] Pull out repeated INSERT_VECTOR_ELT code from LowerBUILD_VECTOR v16i8/v8i16 insertion. NFCI. | Simon Pilgrim | 2017-10-12 | 1 | -57/+51
    llvm-svn: 315587
* Speculative build fix 2 | Reid Kleckner | 2017-10-12 | 1 | -1/+1
    llvm-svn: 315542
* Revert r307036 because of PR34919. | Wei Mi | 2017-10-12 | 1 | -13/+0
    llvm-svn: 315540
* Speculative build fix, apparently I built llc without my patch applied to test it | Reid Kleckner | 2017-10-12 | 1 | -1/+1
    llvm-svn: 315539
* [codeview] Disable FPO in functions using EH funclets | Reid Kleckner | 2017-10-12 | 2 | -0/+5
    Funclets are emitted by WinException which doesn't have access to X86TargetStreamer so it's hard to make a quick fix for this.
    llvm-svn: 315538
* Fix AMDGPU build issue | Reid Kleckner | 2017-10-11 | 1 | -1/+1
    llvm-svn: 315535
* [X86] Sink X86AsmPrinter ctor into .cpp file, NFC | Reid Kleckner | 2017-10-11 | 2 | -3/+5
    I keep adding and removing code here, so let's sink it.
    llvm-svn: 315534
* [MC] Have MCObjectStreamer take its MCAsmBackend argument via unique_ptr. | Lang Hames | 2017-10-11 | 24 | -110/+154
    MCObjectStreamer owns its MCCodeEmitter -- this fixes the types to reflect that, and allows us to remove the last instance of MCObjectStreamer's weird "holding ownership via someone else's reference" trick.
    llvm-svn: 315531
* AMDGPU/NFC: Minor clean ups in HSA metadata | Konstantin Zhuravlyov | 2017-10-11 | 7 | -125/+110
    - Use HSA metadata streamer directly from AMDGPUAsmPrinter
    - Make naming consistent with PAL metadata
    Differential Revision: https://reviews.llvm.org/D38746
    llvm-svn: 315526
* AMDGPU/NFC: Minor clean ups in PAL metadata | Konstantin Zhuravlyov | 2017-10-11 | 6 | -90/+79
    - Move PAL metadata definitions to AMDGPUMetadata
    - Make naming consistent with HSA metadata
    Differential Revision: https://reviews.llvm.org/D38745
    llvm-svn: 315523
* AMDGPU/NFC: Rename code object metadata as HSA metadata | Konstantin Zhuravlyov | 2017-10-11 | 7 | -77/+76
    - Rename AMDGPUCodeObjectMetadata to AMDGPUMetadata (PAL metadata will be included in this file in the follow up change)
    - Rename AMDGPUCodeObjectMetadataStreamer to AMDGPUHSAMetadataStreamer
    - Introduce HSAMD namespace
    - Other minor name changes in function and test names
    llvm-svn: 315522
* [codeview] Implement FPO data assembler directives | Reid Kleckner | 2017-10-11 | 12 | -45/+704
    Summary: This adds a set of new directives that describe 32-bit x86 prologues. The directives are limited and do not expose the full complexity of codeview FPO data. They are merely a convenience for the compiler to generate more readable assembly so we don't need to generate tons of labels in CodeGen. If our prologue emission changes in the future, we can change the set of available directives to suit our needs. These are modelled after the .seh_ directives, which use a different format that interacts with exception handling.
    The directives are:
      .cv_fpo_proc _foo
      .cv_fpo_pushreg ebp/ebx/etc
      .cv_fpo_setframe ebp/esi/etc
      .cv_fpo_stackalloc 200
      .cv_fpo_endprologue
      .cv_fpo_endproc
      .cv_fpo_data _foo
    I tried to follow the implementation of ARM EHABI CFI directives by sinking most directives out of MCStreamer and into X86TargetStreamer. This helps avoid polluting non-X86 code with WinCOFF specific logic.
    I used cdb to confirm that this can show locals in parent CSRs in a few cases, most importantly the one where we use ESI as a frame pointer, i.e. the one in http://crbug.com/756153#c28
    Once we have cdb integration in debuginfo-tests, we can add integration tests there.
    Reviewers: majnemer, hans
    Subscribers: aemerson, mgorny, kristof.beyls, llvm-commits, hiraditya
    Differential Revision: https://reviews.llvm.org/D38776
    llvm-svn: 315513
* [Hexagon] Make sure that new-value jump is packetized with producer | Krzysztof Parzyszek | 2017-10-11 | 1 | -9/+15
    llvm-svn: 315510
* [PowerPC] Utilize DQ-Form instructions for spill/restore and fix FrameIndex elimination to only use `lis/addi` if necessary. | Lei Huang | 2017-10-11 | 2 | -9/+14
    Currently we produce a bunch of unnecessary code when emitting the prologue/epilogue for spills/restores. Namely, if the load from stack slot/store to stack slot instruction is an X-Form instruction, we will always produce an LIS/ORI sequence for the stack offset.
    Furthermore, we have not exploited the P9 vector D-Form loads/stores for this purpose.
    This patch addresses both issues.
    Specifying the D-Form load as the instruction to use for stack spills/reloads should be safe because:
    1. The stack should be aligned according to the ABI
    2. If the stack isn't aligned, PPCRegisterInfo::eliminateFrameIndex() will check for the offset being a multiple of 16 and will convert it to an X-Form instruction if it isn't.
    Differential Revision : https://reviews.llvm.org/D38758
    llvm-svn: 315500
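    The fallback rule in point 2 of this message amounts to a divisibility test on the stack offset. The sketch below models only that test; `AddrForm` and `chooseSpillForm` are hypothetical names, and displacement range limits, which a real implementation would also need to respect, are not modelled.

```cpp
#include <cstdint>

// Hypothetical model of the two addressing forms discussed in the message.
enum class AddrForm { DForm, XForm };

// Keep the D-Form spill/reload when the stack offset is a multiple of 16;
// otherwise fall back to the X-Form instruction, as the message describes
// for PPCRegisterInfo::eliminateFrameIndex().
AddrForm chooseSpillForm(int64_t offset) {
  return (offset % 16 == 0) ? AddrForm::DForm : AddrForm::XForm;
}
```

    For example, an offset of 32 keeps the D-Form in this model, while an offset of 24 falls back to the X-Form.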
* [x86] avoid infinite loop from SoftenFloatOperand (PR34866) | Sanjay Patel | 2017-10-11 | 1 | -0/+5
    Legalization of fp128 assumes things that we should have asserts for, so that's another potential improvement.
    Differential Revision: https://reviews.llvm.org/D38771
    llvm-svn: 315485
* [Hexagon] Handle non-immediate operands to A2_addi in getIncrementValue | Krzysztof Parzyszek | 2017-10-11 | 1 | -4/+6
    llvm-svn: 315472
* Spelling mistake in comment. NFCI. | Simon Pilgrim | 2017-10-11 | 1 | -1/+1
    llvm-svn: 315471
* [X86] Remove MVT::i1 handling code from LowerTRUNCATE | Craig Topper | 2017-10-11 | 1 | -8/+0
    Summary: I don't think this is necessary with i1 being illegal now.
    Reviewers: RKSimon, zvi, guyblank
    Reviewed By: RKSimon
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D38784
    llvm-svn: 315469
* [Pipeliner] Fix offset value for instrs dependent on post-inc load/stores | Krzysztof Parzyszek | 2017-10-11 | 1 | -7/+8
    The software pipeliner and the packetizer try to break the dependence between the post-increment instruction and the dependent memory instructions by changing the base register and the offset value. However, in some cases, the existing logic didn't work properly and created an incorrect offset value.
    Patch by Jyotsna Verma.
    llvm-svn: 315468