summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
...
* [TargetLowering] Add SimplifyDemandedBitsForTargetNode callbackSimon Pilgrim2018-10-241-0/+24
| | | | | | | | Add a SimplifyDemandedBitsForTargetNode callback to handle target nodes. Differential Revision: https://reviews.llvm.org/D53643 llvm-svn: 345179
* [CodeGen] skip lifetime end marker in isInTailCallPositionRobert Lougher2018-10-241-0/+4
| | | | | | | | | A lifetime end intrinsic between a tail call and the return should not prevent the call from being tail call optimized. Differential Revision: https://reviews.llvm.org/D53519 llvm-svn: 345163
* [LegalizeDAG] ExpandLegalINT_TO_FP - cleanup UINT_TO_FP i64 -> f32 expansion.Simon Pilgrim2018-10-241-11/+12
| | | | | | | | Use SrcVT/DestVT types and correct shift type. Part of prep work for D52965 llvm-svn: 345158
* [DEBUGINFO, NVPTX] Try to pack bytes data into a single string.Alexey Bataev2018-10-241-2/+1
| | | | | | | | | | | | | | | | | | | | Summary: If the target does not support `.asciz` and `.ascii` directives, the strings are represented as bytes and each byte is placed on the new line as a separate byte directive `.b8 <data>`. NVPTX target allows to represent the vector of the data of the same type as a vector, where values are separated using `,` symbol: `.b8 <data1>,<data2>,...`. This allows to reduce the size of the final PTX file. Ptxas tool includes ptx files into the resulting binary object, so reducing the size of the PTX file is important. Reviewers: tra, jlebar, echristo Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D45822 llvm-svn: 345142
* SelectionDAG: Reuse bigger sized constants in memset expansion.Matthias Braun2018-10-232-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | When implementing memset's today we often see this pattern: $x0 = MOV 0xXYXYXYXYXYXYXYXY store $x0, ... $w1 = MOV 0xXYXYXYXY store $w1, ... We first create a 64bit constant in a 64bit register with all bytes the same and then create a 32bit constant with all bytes the same in a 32bit register. In many targets we could just access the lower byte of the 64bit register instead. - Ideally this would be handled by the ConstantHoist pass but it runs too early when memset isn't expanded yet. - The memset expansion code already had this optimization implemented, however SelectionDAG constantfolding would constantfold the "trunc(bigconstnat)" pattern to "smallconstant". - This patch makes the memset expansion mark the constant as Opaque and stop DAGCombiner from constant folding in this situation. (Similar to how ConstantHoisting marks things as Opaque to avoid folding ADD/SUB/etc.) Differential Revision: https://reviews.llvm.org/D53181 llvm-svn: 345102
* Fix typo in verifier error messageMatt Arsenault2018-10-231-1/+1
| | | | llvm-svn: 345083
* CGP: Clear data structures at the end of a loop iteration instead of the ↵Peter Collingbourne2018-10-231-5/+5
| | | | | | | | | | | | beginning. Clearing LargeOffsetGEPMap at the end fixes a bug where if a large offset GEP is in a dead basic block, we fail an assertion when trying to delete the block due to the asserting VH in LargeOffsetGEPMap. Differential Revision: https://reviews.llvm.org/D53464 llvm-svn: 345082
* [LegalizeDAG] Share Vector/Scalar CTPOP ExpansionSimon Pilgrim2018-10-232-58/+51
| | | | | | | | As suggested on D53258, this patch move the CTPOP expansion code from SelectionDAGLegalize to TargetLowering to allow it to be reused by the VectorLegalizer. Proper vector support will be added by D53258. llvm-svn: 345066
* [LegalizeDAG] Share Vector/Scalar CTLZ ExpansionSimon Pilgrim2018-10-233-49/+62
| | | | | | | | As suggested on D53258, this patch shares common CTLZ expansion code between VectorLegalizer and SelectionDAGLegalize by putting it in TargetLowering. Extension to D53474 llvm-svn: 345060
* [SelectionDAG] use 'match' to simplify code; NFCSanjay Patel2018-10-231-8/+8
| | | | | | | | | Vector types are not possible here because this code only starts matching from the scalar bool value of a conditional branch, but this is another step towards completely removing the fake binop queries for not/neg/fneg. llvm-svn: 345041
* [LegalizeDAG] Remove unused variableBenjamin Kramer2018-10-231-2/+0
| | | | llvm-svn: 345040
* [LegalizeDAG] Share Vector/Scalar CTTZ ExpansionSimon Pilgrim2018-10-233-47/+64
| | | | | | | | | | As suggested on D53258, this patch demonstrates sharing common CTTZ expansion code between VectorLegalizer and SelectionDAGLegalize by putting it in TargetLowering. I intend to move CTLZ and (scalar) CTPOP over as well and then update D53258 accordingly. Differential Revision: https://reviews.llvm.org/D53474 llvm-svn: 345039
* Revert "[MachinePipeliner] Split MachinePipeliner code into header and cpp ↵Aleksandr Urakov2018-10-231-7/+598
| | | | | | | | | files" This reverts commit 40760b733d9eef841c897338af5e9d81b12551bf. It seems that the commit is a cuse of the build failure. llvm-svn: 345032
* [MachinePipeliner] Split MachinePipeliner code into header and cpp filesLama Saba2018-10-231-598/+7
| | | | | | | | Split MachinePipeliner code into header and cpp files to allow inheritance from SwingSchedulerDAG Differential Revision: https://reviews.llvm.org/D53477 llvm-svn: 345008
* [Intrinsic] Unigned Saturation Addition IntrinsicLeonard Chan2018-10-229-37/+62
| | | | | | | | | | | | Add an intrinsic that takes 2 integers and perform unsigned saturation addition on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D53340 llvm-svn: 344971
* Recommit r344877 "[X86] Stop promoting integer loads to vXi64"Craig Topper2018-10-221-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile. Original commit message: Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to rem I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping. I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the lo I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D53306 llvm-svn: 344965
* [DWARF] Use a function-local offset for AT_call_return_pcVedant Kumar2018-10-225-9/+38
| | | | | | | | | | | | | | | | | | | Logs provided by @stella.stamenova indicate that on Linux, lldb adds a spurious slide offset to the return PC it loads from AT_call_return_pc attributes (see the list thread: "[PATCH] D50478: Add support for artificial tail call frames"). This patch side-steps the issue by getting rid of the load address calculation in lldb's CallEdge::GetReturnPCAddress. The idea is to have the DWARF writer emit function-local offsets to the instruction after a call. I.e. return-pc = label-after-call-insn - function-entry. LLDB can simply add this offset to the base address of a function to get the return PC. Differential Revision: https://reviews.llvm.org/D53469 llvm-svn: 344960
* Reapply "[MachineCopyPropagation] Reimplement CopyTracker in terms of ↵Justin Bogner2018-10-221-58/+69
| | | | | | | | | | | | | | | | | | | register units" Recommits r342942, which was reverted in r343189, with a fix for an issue where we would propagate unsafely if we defined only the upper part of a register. Original message: Change the copy tracker to keep a single map of register units instead of 3 maps of registers. This gives a very significant compile time performance improvement to the pass. I measured a 30-40% decrease in time spent in MCP on x86 and AArch64 and much more significant improvements on out of tree targets with more registers. llvm-svn: 344942
* DAG: Change behavior of fminnum/fmaxnum nodesMatt Arsenault2018-10-228-3/+89
| | | | | | | | | | | Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit. llvm-svn: 344914
* [DAGCombiner] reduce insert+bitcast+extract vector ops to truncate (PR39016)Sanjay Patel2018-10-211-4/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | This is a late backend subset of the IR transform added with: D52439 We can confirm that the conversion to a 'trunc' is correct by running: $ opt -instcombine -data-layout="e" (assuming the IR transforms are correct; change "e" to "E" for big-endian) As discussed in PR39016: https://bugs.llvm.org/show_bug.cgi?id=39016 ...the pattern may emerge during legalization, so that's we are waiting for an insertelement to become a scalar_to_vector in the pattern matching here. The DAG allows for fun variations that are not possible in IR. Result types for extracts and scalar_to_vector don't necessarily match input types, so that means we have to be a bit more careful in the transform (see code comments). The tests show that we don't handle cases that require a shift (as we did in the IR version). I've left that as a potential follow-up because I'm not sure if that's a real concern at this late stage. Differential Revision: https://reviews.llvm.org/D53201 llvm-svn: 344872
* DebugInfo: Use base address specifiers more aggressivelyDavid Blaikie2018-10-201-1/+1
| | | | | | | | | | Using a base address specifier even for a single-element range is a size win for object files (7 words versus 8 words - more significant savings if the debug info is compressed (since it's 3 words of uncompressable reloc + 4 compressable words compared to 6 uncompressable reloc + 2 compressable words) - does trade off executable size increase though. llvm-svn: 344841
* DebugInfo: Use DW_OP_addrx in DWARFv5David Blaikie2018-10-201-4/+11
| | | | | | Reuse addresses in the address pool, even in non-split cases. llvm-svn: 344838
* DebugInfo: Implement debug_rnglists.dwoDavid Blaikie2018-10-202-3/+29
| | | | | | | Save space/relocations in .o files by keeping dwo ranges in the dwo file rather than the .o file. llvm-svn: 344837
* DebugInfo: Use address pool forms in debug_rnglistsDavid Blaikie2018-10-208-107/+123
| | | | | | Save no relocations by reusing addresses from the address pool. llvm-svn: 344836
* DebugInfo: Use debug_addr for non-dwo addresses in DWARF 5David Blaikie2018-10-205-14/+22
| | | | | | | | Putting addresses in the address pool, even with non-fission, can reduce relocations - reusing the addresses from debug_info and debug_rnglists (the latter coming soon) llvm-svn: 344834
* [MachineCSE][GlobalISel] Making sure MachineCSE works mid-GlobalISel (again)Roman Tereshin2018-10-202-34/+38
| | | | | | | | | | | | | | | | | | | | | | | | Change of approach, it looks like it's a much better idea to deal with the vregs that have LLTs and reg classes both properly, than trying to avoid creating those across all GlobalISel passes and all targets. The change mostly touches MachineRegisterInfo::constrainRegClass, which is apparently only used by MachineCSE. The changes are NFC for any pipeline but one that contains MachineCSE mid-GlobalISel. NOTE on isCallerPreservedOrConstPhysReg change in MachineCSE: There is no test covering it as the only way to insert a new pass (MachineCSE) from a command line I know of is llc's -run-pass option, which only works with MIR, but MIRParser freezes reserved registers upon MachineFunctions creation, making it impossible to reproduce the state that exposes the issue. Reviwed By: aditya_nandakumar Differential Revision: https://reviews.llvm.org/D53144 llvm-svn: 344822
* [GISel]: Allow PHIs to be DCEdAditya Nandakumar2018-10-191-1/+1
| | | | | | | | | | | https://reviews.llvm.org/D53304 Currently dead phis are not cleaned up during DCE. This patch allows dead PHI and G_PHI insts to be deleted. Reviewed by: dsanders llvm-svn: 344811
* Fix a use-after-RAUW bug in large GEP splittingKrzysztof Pszeniczny2018-10-191-3/+14
| | | | | | | | | | | | | | | | | | | Summary: Large GEP splitting, introduced in rL332015, uses a `DenseMap<AssertingVH<Value>, ...>`. This causes an assertion to fail (in debug builds) or undefined behaviour to occur (in release builds) when a value is RAUWed. This manifested itself in the 7zip benchmark from the llvm test suite built on ARM with `-fstrict-vtable-pointers` enabled while RAUWing invariant group launders and splits in CodeGenPrepare. This patch merges the large offsets of the argument and the result of an invariant.group strip/launder intrinsic before RAUWing. Reviewers: Prazek, javed.absar, haicheng, efriedma Reviewed By: Prazek, efriedma Subscribers: kristof.beyls, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D51936 llvm-svn: 344802
* Use llvm::{all,any,none}_of instead std::{all,any,none}_of. NFCFangrui Song2018-10-191-4/+3
| | | | llvm-svn: 344774
* [Pipeliner] copyToPhi DAG Mutation to improve scheduling.Sumanth Gundapaneni2018-10-181-1/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | In a loop, create artificial dependences between the source of a COPY/REG_SEQUENCE to the use in next iteration. Eg: SRC ----Data Dep--> COPY COPY ---Anti Dep--> PHI (implies, to be used in next iteration) PHI ----Data Dep--> USE This patches creates USE ----Artificial Dep---> SRC This will effectively schedule the COPY late to eliminate additional copies. Before this patch, the schedule can be SRC, COPY, USE : The COPY is used in next iteration and it needs to be preserved. After this patch, the schedule can be USE, SRC, COPY : The COPY is used in next iteration and the live interval is reduced. Differential Revision: https://reviews.llvm.org/D53303 llvm-svn: 344748
* Revert "[WebAssembly] LSDA info generation"Krasimir Georgiev2018-10-1611-234/+58
| | | | | | | | This reverts commit r344575. Newly introduced test eh-lsda.ll.test fails with use-after-free under ASAN build. llvm-svn: 344639
* [Intrinsic] Signed Saturation Addition IntrinsicLeonard Chan2018-10-169-0/+105
| | | | | | | | | | | Add an intrinsic that takes 2 integers and perform saturation addition on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D53053 llvm-svn: 344629
* [LegalizeDAG] ExpandLegalINT_TO_FP - cleanup UINT_TO_FP i64 -> f64 expansion.Simon Pilgrim2018-10-161-20/+17
| | | | | | | | Use SrcVT/DestVT types, correct shift type and AND instead of ZERO_EXTEND_IN_REG. Part of prep work for D52965 llvm-svn: 344602
* [WebAssembly] LSDA info generationHeejin Ahn2018-10-1611-58/+234
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This adds support for LSDA (exception table) generation for wasm EH. Wasm EH mostly follows the structure of Itanium-style exception tables, with one exception: a call site table entry in wasm EH corresponds to not a call site but a landing pad. In wasm EH, the VM is responsible for stack unwinding. After an exception occurs and the stack is unwound, the control flow is transferred to wasm 'catch' instruction by the VM, after which the personality function is called from the compiler-generated code. (Refer to WasmEHPrepare pass for more information on this part.) This patch: - Changes wasm.landingpad.index intrinsic to take a token argument, to make this 1:1 match with a catchpad instruction - Stores landingpad index info and catch type info MachineFunction in before instruction selection - Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an exception table - Adds WasmException class with overridden methods for table generation - Adds support for LSDA section in Wasm object writer Reviewers: dschuff, sbc100, rnk Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52748 llvm-svn: 344575
* [SelectionDAG] allow FP binops in SimplifyDemandedVectorEltsSanjay Patel2018-10-151-1/+6
| | | | | | | | | | | | This is intended to make the backend on par with functionality that was added to the IR version of SimplifyDemandedVectorElts in: rL343727 ...and the original motivation is that we need to improve demanded-vector-elements in several ways to avoid problems that would be exposed in D51553. Differential Revision: https://reviews.llvm.org/D52912 llvm-svn: 344541
* [DAGCombiner] allow undef elts in vector fmul matchingSanjay Patel2018-10-151-1/+1
| | | | llvm-svn: 344534
* [DAGCombiner] refactor folds for fadd (fmul X, -2.0), Y; NFCISanjay Patel2018-10-151-16/+18
| | | | | | The transform doesn't work if the vector constant has undef elements. llvm-svn: 344532
* [DAGCombiner] allow undef elts in vector fma matchingSanjay Patel2018-10-151-21/+22
| | | | llvm-svn: 344528
* [DAGCombiner] allow undef elts in vector fma matchingSanjay Patel2018-10-151-9/+10
| | | | llvm-svn: 344525
* [TI removal] Make variables declared as `TerminatorInst` and initializedChandler Carruth2018-10-155-6/+6
| | | | | | | | | | | | | by `getTerminator()` calls instead be declared as `Instruction`. This is the biggest remaining chunk of the usage of `getTerminator()` that insists on the narrow type and so is an easy batch of updates. Several files saw more extensive updates where this would cascade to requiring API updates within the file to use `Instruction` instead of `TerminatorInst`. All of these were trivial in nature (pervasively using `Instruction` instead just worked). llvm-svn: 344502
* [TwoAddressInstructionPass] Replace subregister uses when processing tied ↵Bjorn Pettersson2018-10-151-8/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | operands Summary: TwoAddressInstruction pass typically rewrites %1:short = foo %0.sub_lo:long as %1:short = COPY %0.sub_lo:long %1:short = foo %1:short when having tied operands. If there are extra un-tied operands that uses the same reg and subreg, such as the second and third inputs to fie here: %1:short = fie %0.sub_lo:long, %0.sub_hi:long, %0.sub_lo:long then there was a bug which replaced the register %0 also for the un-tied operand, but without changing the subregister indices. So we used to get: %1:short = COPY %0.sub_lo:long %1:short = fie %1, %1.sub_hi:short, %1.sub_lo:short With this fix we instead get: %1:short = COPY %0.sub_lo:long %1:short = fie %1, %0.sub_hi:long, %1 Reviewers: arsenm, JesperAntonsson, kparzysz, MatzeB Reviewed By: MatzeB Subscribers: bjope, kparzysz, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D36224 llvm-svn: 344492
* [LegalizeDAG] Don't bother with final MUL+SRL stage for byte CTPOP. Simon Pilgrim2018-10-141-3/+4
| | | | | | | | The final stage of CTPOP expansion (v = (v * 0x01010101...) >> (Len - 8)) is completely pointless for the byte (Len = 8) case as it reduces to (v = (v * 0x01...) >> 0), but annoyingly this doesn't always get optimized away. Found while investigating generic vector CTPOP expansion (PR32655). llvm-svn: 344477
* Pull out repeated variables from SelectionDAGLegalize::ExpandBitCount.Simon Pilgrim2018-10-131-8/+2
| | | | | | The CTPOP case has been changed from VT.getSizeInBits to VT.getScalarSizeInBits - but this fits in with future work for vector support (PR32655) and doesn't affect any current (scalar) uses. llvm-svn: 344461
* [LegalizeTypes] Prevent an assertion from PromoteIntRes_BSWAP and ↵Craig Topper2018-10-131-8/+20
| | | | | | | | | | | | | | | | | | | | | PromoteIntRes_BITREVERSE if the shift amount is too large for the VT returned by getShiftAmountTy Summary: getShiftAmountTy for X86 returns MVT::i8. If a BSWAP or BITREVERSE is created that requires promotion and the difference between the original VT and the promoted VT is more than 255 then we won't able to create the constant. This patch adds a check to replace the result from getShiftAmountTy to MVT::i32 if the difference won't fit. This should get legalized later when the shift is ultimately expanded since its clearly an illegal type that we're only promoting to make it a power of 2 bit width. Alternatively we could base the decision completely on the largest shift amount the promoted VT could use. Vectors should be immune here because getShiftAmountTy always returns the incoming VT for vectors. Only the scalar shift amount can be changed by the targets. Reviewers: eli.friedman, RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53232 llvm-svn: 344460
* [X86][SSE] Remove most of vector CTTZ custom lowering and use LegalizeDAG ↵Simon Pilgrim2018-10-131-2/+2
| | | | | | | | | | instead. There is one remnant - AVX1 custom splitting of 256-bit vectors - which is due to a regression where the X86ISD::ANDNP is still performed as a YMM. I've also tightened the CTLZ or CTPOP lowering in SelectionDAGLegalize::ExpandBitCount to require a legal CTLZ - it doesn't affect existing users and fixes an issue with AVX512 codegen. llvm-svn: 344457
* [X86][SSE] Begin removing vector CTTZ custom lowering and use LegalizeDAG ↵Simon Pilgrim2018-10-132-4/+16
| | | | | | | | instead. Adds CTTZ vector legalization support and begins the removal of the X86/SSE custom lowering. llvm-svn: 344453
* [Intrinsic] Add llvm.minimum and llvm.maximum instrinsic functionsThomas Lively2018-10-131-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | Summary: These new intrinsics have the semantics of the `minimum` and `maximum` operations specified by the latest draft of IEEE 754-2018. Unlike llvm.minnum and llvm.maxnum, these new intrinsics propagate NaNs and always treat -0.0 as less than 0.0. `minimum` and `maximum` lower directly to the existing `fminnan` and `fmaxnan` ISel DAG nodes. It is safe to reuse these DAG nodes because before this patch were only emitted in situations where there were known to be no NaN arguments or where NaN propagation was correct and there were known to be no zero arguments. I know of only four backends that lower fminnan and fmaxnan: WebAssembly, ARM, AArch64, and SystemZ, and each of these lowers fminnan and fmaxnan to instructions that are compatible with the IEEE 754-2018 semantics. Reviewers: aheejin, dschuff, sunfish, javed.absar Subscribers: kristof.beyls, dexonsmith, kristina, llvm-commits Differential Revision: https://reviews.llvm.org/D52764 llvm-svn: 344437
* [LegalizeVectorTypes] Use TLI.getVectorIdxTy instead of DAG.getIntPtrConstant.Craig Topper2018-10-121-2/+3
| | | | | | There's no guarantee that vector indices should use pointer types. So use the correct query method. llvm-svn: 344428
* [LegalizeVectorTypes] When widening the result of a bitcast from a scalar ↵Craig Topper2018-10-121-14/+12
| | | | | | | | | | type, use a scalar_to_vector to turn the scalar into a vector intead of a build vector full of mostly undefs. This is more consistent with what we usually do and matches some code X86 custom emits in some cases that I think I can cleanup. The MIPS test change just looks to be an instruction ordering change. llvm-svn: 344422
* Revert BTF commit series.Eli Friedman2018-10-127-662/+3
| | | | | | | | | | The initial patch was not reviewed, and does not have any tests; it should not have been merged. This reverts 344395, 344390, 344387, 344385, 344381, 344376, and 344366. llvm-svn: 344405
OpenPOWER on IntegriCloud