summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* [Hexagon] Add a few more lit testsKrzysztof Parzyszek2018-03-1918-1/+1652
| | | | llvm-svn: 327884
* [X86] Remove OUT32rr/OUT8rr/OUT32ri/OUT8ri from Sandybridge scheduler model.Craig Topper2018-03-191-8/+8
| | | | | | PR35590 was already filed for this information being wrong. It's probably better to default to WriteSystem behavior instead of using something completely wrong. llvm-svn: 327882
* [X86] Add JCXZ/JECXZ to Sandybridge/Haswell/Broadwell/Skylake scheduler models.Craig Topper2018-03-192-16/+16
| | | | | | | | JRCXZ was already present, but not the others. We never codegen this instruction so this doesn't affect much just trying to get them all into a single generated scheduler class in the output. llvm-svn: 327881
* [X86] Correct regular expression in Zen scheduler model that was excluding ↵Craig Topper2018-03-192-2/+2
| | | | | | | | JECXZ instruction. The regex was looking for JECXZ_32 or JECXZ_64, but their is just one instruction called JECXZ. They used to exist as separate instructions, but were merged over 3 years ago. llvm-svn: 327880
* [PowerPC][Power9]Legalize and emit code for quad-precision add/div/mul/subLei Huang2018-03-191-0/+73
| | | | | | | | | | | | | Legalize and emit code for quad-precision floating point operations: * xsaddqp * xssubqp * xsdivqp * xsmulqp Differential Revision: https://reviews.llvm.org/D44506 llvm-svn: 327878
* [PowerPC] Make AddrSpaceCast noopNemanja Ivanovic2018-03-191-0/+22
| | | | | | | | | | | PowerPC targets do not use address spaces. As a result, we can get selection failures with address space casts. This patch makes those casts noops. Patch by Valentin Churavy. Differential revision: https://reviews.llvm.org/D43781 llvm-svn: 327877
* [X86] Remove sse41 specific code from lowering v16i8 multiplyCraig Topper2018-03-197-261/+230
| | | | | | With the SRAs removed from the SSE2 code in D44267, then there doesn't appear to be any advantage to the sse41 code. The punpcklbw instruction and pmovsx seem to have the same latency and throughput on most CPUs. And the SSE41 code requires moving the upper 64-bits into the lower 64-bit before the sign extend can be done. The unpckhbw in sse2 code can do better than that. llvm-svn: 327869
* [X86] Make the multiply and divide itineraries more consistent.Craig Topper2018-03-191-2/+2
| | | | | | | | | | Sometimes we used the same itinerary for MEM and REG forms, but that seems inconsistent with our usual usage. We also used the MUL8 itinerary for MULX32/64 which was also weird. The test changes are because we were using IIC_IMUL32_RR and IIC_IMUL64_RR instead of IIC_IMUL32_REG/IIC_IMUL64_REG for the 32 and 64 bit multiplies that produce double width result. llvm-svn: 327866
* Revert [MachineLICM] This reverts commit rL327856Zaara Syeda2018-03-193-129/+4
| | | | | | Failing build bots. Revert the commit now. llvm-svn: 327864
* [CodeGen] Avoid handling DBG_VALUE in the LivePhysRegs ↵Matt Davis2018-03-191-0/+213
| | | | | | | | | | | | | | | | | | | | | | | | | | (addUses,removeDefs,stepForward) Summary: This patch prevents DBG_VALUE instructions from influencing LivePhysRegs::stepBackwards and stepForwards. In at least one case, specifically branch folding, the stepBackwards logic was having an influence on code generation. The result was that certain code compiled with '-g -O2' would differ from that compiled with '-O2' alone. It seems that the original logic, accounting for DBG_VALUE, was influencing the placement of an IMPLICIT_DEF which had a later impact on how blocks were processed in branch folding. Reviewers: kparzysz, MatzeB Reviewed By: kparzysz Subscribers: bjope, llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D43850 llvm-svn: 327862
* [MachineLICM] Add functions to MachineLICM to hoist invariant storesZaara Syeda2018-03-193-4/+129
| | | | | | | | | | | | | | | | This patch adds functions to allow MachineLICM to hoist invariant stores. Currently, MachineLICM does not hoist any store instructions, however when storing the same value to a constant spot on the stack, the store instruction should be considered invariant and be hoisted. The function isInvariantStore iterates each operand of the store instruction and checks that each register operand satisfies isCallerPreservedPhysReg. The store may be fed by a copy, which is hoisted by isCopyFeedingInvariantStore. This patch also adds the PowerPC changes needed to consider the stack register as caller preserved. Differential Revision: https://reviews.llvm.org/D40196 llvm-svn: 327856
* [X86] Generalize schedule classes to support multiple stagesSimon Pilgrim2018-03-191-1/+1
| | | | | | | | | | | | Currently the WriteResPair style multi-classes take a single pipeline stage and latency, this patch generalizes this to make it easier to create complex schedules with ResourceCycles and NumMicroOps be overriden from their defaults. This has already been done for the Jaguar scheduler to remove a number of custom schedule classes and adding it to the other x86 targets will make it much tidier as we add additional classes in the future to try and replace so many custom cases. I've converted some instructions but a lot of the models need a bit of cleanup after the patch has been committed - memory latencies not being consistent, the class not actually being used when we could remove some/all customs, etc. I'd prefer to keep this as NFC as possible so later patches can be smaller and target specific. Differential Revision: https://reviews.llvm.org/D44612 llvm-svn: 327855
* [x86] put nops into the WriteNop class and customize for JaguarSanjay Patel2018-03-193-12/+12
| | | | | | | | | | | | 1. Given that we already have a classification bucket with 'nop' in the name, that's where 'nop' belongs. Right now, it's only used for prefix bytes and 'pause'. 2. Make the latency of this class '1' for Jaguar to tell the scheduler (and presumably llvm-mca) how to model the resource requirements better even though a nop has no dependencies. Differential Revision: https://reviews.llvm.org/D44608 llvm-svn: 327853
* AMDGPU/GlobalISel: RegBankSelect for basic int opsMatt Arsenault2018-03-193-0/+201
| | | | llvm-svn: 327843
* AMDGPU: Don't leave dead illegal VGPR->SGPR copiesMatt Arsenault2018-03-192-4/+39
| | | | | | | | | Normally DCE kills these, but at -O0 these get left behind leaving suspicious looking illegal copies. Replace with IMPLICIT_DEF to avoid iterator issues. llvm-svn: 327842
* [MergeICmps] Re-land 324317 "Enable the MergeICmps Pass by default."Clement Courbet2018-03-193-41/+16
| | | | | | Now that PR36557 is fixed. llvm-svn: 327840
* [ARM] Support for v4f16 and v8f16 vectorsSjoerd Meijer2018-03-192-0/+60
| | | | | | | | | | | | This is the groundwork for adding the Armv8.2-A FP16 vector intrinsics, which uses v4f16 and v8f16 vector operands and return values. All the moving parts are tested with two intrinsics, a 1-operand v8f16 and a 2-operand v4f16 intrinsic. In a follow-up patch the rest of the intrinsics and tests will be added. Differential Revision: https://reviews.llvm.org/D44538 llvm-svn: 327839
* [SystemZ] Bugfix of CC liveness in emitMemMemWrapper (CLC).Jonas Paulsson2018-03-191-0/+20
| | | | | | | | | If DoneMBB becomes empty it must have CC added to its live-in list, since it will fall-through into EndMBB. This happens when the CLC loop does the complete range. Review: Ulrich Weigand llvm-svn: 327834
* [RISCV] Peephole optimisation for load/store of global values or constant ↵Alex Bradbury2018-03-196-67/+36
| | | | | | | | | | | addresses (load (add base, off), 0) -> (load base, off) (store val, (add base, off)) -> (store val, base, off) This is similar to an equivalent peephole optimisation in PPCISelDAGToDAG. llvm-svn: 327831
* [X86] Add ADD16i16/ADD32i32/ADD64i32 and similar to the scheduler models to ↵Craig Topper2018-03-191-40/+40
| | | | | | | | match ADD8i8. Also move ADC8i8 and SBB8i8 in the Sandy Bridge model to the same class as ADC8ri and SBB8ri. That seems more accurate since its the 8i8 is just the register forced to AL instead of coming from modrm. llvm-svn: 327820
* [AVR] Lower i128 divisions to runtime library callsDylan McKay2018-03-191-0/+52
| | | | | | | | | | | This patch adds i128 division support by instruction LLVM to lower 128-bit divisions to the __udivmodti4 and __divmodti4 rtlib functions. This also adds test for 64-bit division and 128-bit division. Patch by Peter Nimmervoll. llvm-svn: 327814
* [X86][Btver2] Fix crc32 schedule costsSimon Pilgrim2018-03-181-10/+10
| | | | | | The default is currently FAdd for some reason llvm-svn: 327807
* [X86] Fix a bunch of overlapping regular expressions in the scheduler models.Craig Topper2018-03-185-213/+213
| | | | llvm-svn: 327787
* [X86] Remove MMX_MASKMOVQ64 and VMASKMOVDQU from scheduler models.Craig Topper2018-03-181-2/+2
| | | | | | | | | | | | | | The information was so wildly inaccurate and incomplete its better to just remove it. MMX_MASKMOVQ64 showed up twice in several scheduler models. In Haswell and Broadwell they were on adjacent lines. On Skylake the copies had different information. MMX_MASKMOVQ and MASKMOVDQU were completely missing. MMX_MASKMOVQ64 was listed on Haswell/Broadwell as 1 cycle on port 1 despite it being a store instruction. Filed PR36780 to track fixing this right. llvm-svn: 327783
* Revert "[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172""Nirav Dave2018-03-1715-323/+383
| | | | | | as it times out building test-suite on PPC. llvm-svn: 327778
* [DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"Nirav Dave2018-03-1715-383/+323
| | | | | | | Reland ISel cycle checking improvements after simplifying and reducing node id invariant traversal. llvm-svn: 327777
* AMDGPU/GlobalISel: Cleanup constant legalityMatt Arsenault2018-03-171-56/+22
| | | | llvm-svn: 327774
* AMDGPU/GlobalISel: Basic G_GEP legalityMatt Arsenault2018-03-171-0/+92
| | | | llvm-svn: 327773
* AMDGPU/GlobalISel: Basic legality for load/storeMatt Arsenault2018-03-172-0/+253
| | | | llvm-svn: 327772
* [X86] Added support for nocf_check attribute for indirect Branch TrackingOren Ben Simhon2018-03-172-21/+88
| | | | | | | | | | | | | | | X86 Supports Indirect Branch Tracking (IBT) as part of Control-Flow Enforcement Technology (CET). IBT instruments ENDBR instructions used to specify valid targets of indirect call / jmp. The `nocf_check` attribute has two roles in the context of X86 IBT technology: 1. Appertains to a function - do not add ENDBR instruction at the beginning of the function. 2. Appertains to a function pointer - do not track the target function of this pointer by adding nocf_check prefix to the indirect-call instruction. This patch implements `nocf_check` context for Indirect Branch Tracking. It also auto generates `nocf_check` prefixes before indirect branchs to jump tables that are guarded by range checks. Differential Revision: https://reviews.llvm.org/D41879 llvm-svn: 327767
* [SystemZ] Add 'REQUIRES: asserts' to test case using debug output.Jonas Paulsson2018-03-171-0/+1
| | | | llvm-svn: 327766
* [SystemZ] computeKnownBitsForTargetNode() / ComputeNumSignBitsForTargetNode()Jonas Paulsson2018-03-177-2/+1265
| | | | | | | | | | | Improve/implement these methods to improve DAG combining. This mainly concerns intrinsics. Some constant operands to SystemZISD nodes have been marked Opaque to avoid transforming back and forth between generic and target nodes infinitely. Review: Ulrich Weigand llvm-svn: 327765
* [SelectionDAG] Handle big endian target BITCAST in computeKnownBits()Jonas Paulsson2018-03-171-0/+29
| | | | | | | | | | | | | | | The BITCAST handling in computeKnownBits() previously only worked for little endian. This patch reverses the iteration over elements for a big endian target which allows this to work in this case also. SystemZ test case. Review: Eli Friedman https://reviews.llvm.org/D44249 llvm-svn: 327764
* [MachineOutliner] Make KILLs invisibleJessica Paquette2018-03-161-0/+3
| | | | | | | | | At the point the outliner runs, KILLs don't impact anything, but they're still considered unique instructions. This commit makes them invisible like DebugValues so that they can still be outlined without impacting outlining decisions. llvm-svn: 327760
* [Hexagon] Avoid bank conflicts in post-RA schedulerKrzysztof Parzyszek2018-03-161-0/+156
| | | | | | | | | | Avoid scheduling two loads in such a way that they would end up in the same packet. If there is a load in a packet, try to schedule a non-load next. Patch by Brendon Cahoon. llvm-svn: 327742
* [Hexagon] Add lit testcases for atomic intrinsicsKrzysztof Parzyszek2018-03-163-0/+177
| | | | | | Patch by Ben Craig. llvm-svn: 327737
* [IR] Avoid the need to prefix MS C++ symbols with '\01'Reid Kleckner2018-03-161-0/+60
| | | | | | | | | | | | | | | | | | | | Now the Windows mangling modes ('w' and 'x') do not do any mangling for symbols starting with '?'. This means that clang can stop adding the hideous '\01' leading escape. This means LLVM debug logs are less likely to contain ASCII escape characters and it will be easier to copy and paste MS symbol names from IR. Finally. For non-Windows platforms, names starting with '?' still get IR mangling, so once clang stops escaping MS C++ names, we will get extra '_' prefixing on MachO. That's fine, since it is currently impossible to construct a triple that uses the MS C++ ABI in clang and emits macho object files. Differential Revision: https://reviews.llvm.org/D7775 llvm-svn: 327734
* [AMDGPU] Supported ds_write_b128 generation.Farhana Aleen2018-03-166-17/+43
| | | | | | | | | | | | | | Summary: This is a follow-on patch of https://reviews.llvm.org/D44210 Author: FarhanaAleen Reviewed By: msearles Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44319 llvm-svn: 327726
* [X86] Post process the DAG after isel to remove vector moves that were added ↵Craig Topper2018-03-164-5/+0
| | | | | | | | | | | | to zero upper bits. We previously avoided inserting these moves during isel in a few cases which is implemented using a whitelist of opcodes. But it's too difficult to generate a perfect list of opcodes to whitelist. Especially with AVX512F without AVX512VL using 512 bit vectors to implement some 128/256 bit operations. Since isel is done bottoms up, we'd have to check the VT and opcode and subtarget in order to determine whether an EXTRACT_SUBREG would be generated for some operations. So instead of doing that, this patch adds a post processing step that detects when the moves are unnecesssary after isel. At that point any EXTRACT_SUBREGs would have already been created and appear in the DAG. So then we just need to ensure the input to the move isn't one. Differential Revision: https://reviews.llvm.org/D44289 llvm-svn: 327724
* [AMDGPU][MC][GFX8][GFX9][DISASSEMBLER] Added "_e32" suffix to 32-bit VINTRP ↵Dmitry Preobrazhensky2018-03-161-43/+43
| | | | | | | | | | | opcodes See bug 36751: https://bugs.llvm.org/show_bug.cgi?id=36751 Differential Revision: https://reviews.llvm.org/D44529 Reviewers: artem.tamazov, arsenm llvm-svn: 327723
* [Hexagon] Fix zero-extending non-HVX bool vectorsKrzysztof Parzyszek2018-03-161-0/+69
| | | | llvm-svn: 327712
* [X86][Btver2] Add correct mul/imul schedule costsSimon Pilgrim2018-03-161-4/+4
| | | | | | Integer multiply is performed on the JMul function unit and i64 requires double pumping llvm-svn: 327707
* [X86][Btver2] Add correct lzcnt/tzcnt/popcnt schedule costsSimon Pilgrim2018-03-163-18/+18
| | | | | | Don't use WriteIMul defaults llvm-svn: 327706
* [ARM] FP16 codegen support for VSELSjoerd Meijer2018-03-161-1/+39
| | | | | | | | | This implements lowering of SELECT_CC for f16s, which enables codegen of VSEL with f16 types. Differential Revision: https://reviews.llvm.org/D44518 llvm-svn: 327695
* [SelectionDAG][ARM][X86] Teach PromoteIntRes_SETCC to do a better job ↵Craig Topper2018-03-153-317/+419
| | | | | | | | | | | | | | | | picking the result type for the setcc. Previously if getSetccResultType returned an illegal type we just fell back to using the default promoted type. This appears to have been to handle the case where for vectors getSetccResultType returns the input type, but the input type itself isn't legal and will need to be promoted. Without the legality check we would never reach a legal type. But just picking the promoted type to be the setcc type can create strange setccs where the result type is 128 bits and the operand type is 256 bits. If for example the result type was promoted to v8i16 from v8i1, but the input type was promoted from v8i23 to v8i32. We currently handle this with custom lowering code in X86. This legality check also caused us reject the getSetccResultType when the input type needed to be widened or split. Even though that result wouldn't have caused legalization to get stuck. This patch tries to fix this by detecting the getSetccResultType needs to be promoted. If its input type also needs to be promoted we'll try a ask for a new setcc result type based on its eventual promoted value. Otherwise we fall back to default type to promote to. For any other illegal values we might get back from the initial call to getSetccResultType we just keep and allow it to be re-legalized later via splitting or widening or scalarizing. llvm-svn: 327683
* [X86] Make sure we use FSUB instruction as the reference for operand order ↵Craig Topper2018-03-151-5/+19
| | | | | | | | in isAddSubOrSubAdd when recognizing subadd The FADD part of the addsub/subadd pattern can have its operands commuted, but when checking for fsubadd we were using the fadd as reference and commuting the fsub node. llvm-svn: 327660
* [X86] Add test case showing bad fmsubadd creation due to bad commuting.Craig Topper2018-03-151-0/+19
| | | | | | The code that creates fmsubadd from shuffle vector has some code to allow commuting the operands of the fadd node. This code was originally created when we only recognized fmaddsub. When fmsubadd support was added this code was not updated and is now commuting the fsub operands instead. llvm-svn: 327659
* [PPC] Avoid non-simple MVT in STBRX optimizationGuozhi Wei2018-03-151-0/+18
| | | | | | | | | | PR35402 triggered this case. It bswap and stores a 48bit value, current STBRX optimization transforms it into STBRX. Unfortunately 48bit is not a simple MVT, there is no PPC instruction to support it, and it can't be automatically expanded by llvm, so caused a crash. This patch detects the non-simple MVT and returns early. Differential Revision: https://reviews.llvm.org/D44500 llvm-svn: 327651
* [PowerPC] Optimize TLS initial-exec sequence to use X-Form loads/storesZaara Syeda2018-03-151-0/+169
| | | | | | | | | This patch adds new load/store instructions for integer scalar types which can be used for X-Form when fed by add with an @tls relocation. Differential Revision: https://reviews.llvm.org/D43315 llvm-svn: 327635
* [X86][Btver2] Remove JAny resource, and map system/microcoded instructions ↵Simon Pilgrim2018-03-155-123/+123
| | | | | | | | to JALU pipes Simplifies throughput to the issue width (1/2) instead of permitting any pipe (1/6) llvm-svn: 327632
OpenPOWER on IntegriCloud