summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [DAGCombiner] fix miscompile in translating (X & undef) to shuffleSanjay Patel2020-01-031-1/+3
| | | | | See PR42982 for more context: https://bugs.llvm.org/show_bug.cgi?id=42982
* [LegalizeVectorOps] Pass the post-UpdateNodeOperands version of Op to ↵Craig Topper2020-01-031-11/+14
| | | | | | | | | | ExpandLoad/ExpandStore UpdateNodeOperands might CSE to another existing node. So we should make sure we're legalizing that node otherwise we might fail to hook up the operands properly. I've moved the result registration up to the caller to avoid having to pass both Result and Op into the functions where it might be confusing which is which. This address 2 other issues pointed out in D71861. Differential Revision: https://reviews.llvm.org/D72021
* [X86] Improve for v2i32->v2f64 uint_to_fpCraig Topper2020-01-031-36/+14
| | | | | | | | | | | | | | This uses an alternative implementation of this conversion derived from our v2i32->v2f32 handling. We can zero extend the v2i32 to v2i64, or it with the bit representation of 2.0^52 which will give us 2.0^52 plus the 32-bit integer since double's mantissa is 52 bits. Then we just need to subtract 2.0^52 as a double and let the floating point unit normalize the remaining bits into a valid double. This is less instructions then our previous code, but does require a port 5 shuffle for the zero extend or unpack. Differential Revision: https://reviews.llvm.org/D71945
* Move tail call disabling code to target independent codeReid Kleckner2020-01-0310-43/+34
| | | | | | | | | | | | | | | | | When the "disable-tail-calls" attribute was added, checks were added for it in various backends. Now this code has proliferated, and it is something the target is responsible for checking. Move that responsibility back to the ISels (fast, global, and SD). There's no major functionality change, except for targets that never implemented this check. This LLVM attribute was originally added in d9699bc7bdf0362173fcd256690f61a4d47429c2 (2015). Reviewers: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D72118
* [NFC][InstCombine] '(Op1 & С) - Op1' -> '-(Op1 & ~C)' fold (PR44427)Roman Lebedev2020-01-031-0/+9
| | | | | | | | | | | | | | | | | | | This decreases use count of Op1, potentially allows us to further hoist said 'neg' later on, and results in marginally better X86 codegen. Name: (Op1 & С) - Op1 -> -(Op1 & ~C) %o = and i64 %Op1, C1 %r = sub i64 %o, %Op1 => %n = and i64 %Op1, ~C1 %r = sub i64 0, %n https://rise4fun.com/Alive/rwgA https://godbolt.org/z/R_RMfM https://bugs.llvm.org/show_bug.cgi?id=44427
* [DWARF] Don't assume optional always has a value.Jonas Devlieghere2020-01-031-1/+4
| | | | | | | | When getting the file name form the line table prologue we assume that a valid string form value can always be extracted as a string. If you look at the implementation of DWARFormValue this is not necessarily true. I hit this assertion from LLDB when I create a "dummy" DWARFContext that was missing the string section.
* [NFC][InstCombine] '(X & (- Y)) - X' -> '- (X & (Y - 1))' fold (PR44448)Roman Lebedev2020-01-031-0/+10
| | | | | | | | | | | | | | | | | | | | | Name: (X & (- Y)) - X -> - (X & (Y - 1)) (PR44448) %negy = sub i8 0, %y %unbiasedx = and i8 %negy, %x %r = sub i8 %unbiasedx, %x => %ymask = add i8 %y, -1 %xmasked = and i8 %ymask, %x %r = sub i8 0, %xmasked https://rise4fun.com/Alive/OIpla This decreases use count of %x, may allow us to later hoist said negation even further, and results in marginally nicer X86 codegen. See https://bugs.llvm.org/show_bug.cgi?id=44448 https://reviews.llvm.org/D71499
* [Attributor][FIX] Allow dead users of rewritten functionJohannes Doerfert2020-01-031-13/+15
| | | | | | | If we replace a function with a new one because we rewrite the signature, dead users may still refer to the old version. With this patch we reuse the code that deals with dead functions, which the old versions are, to avoid problems.
* [Attributor][NFC] Unify the way we delete dead functionsJohannes Doerfert2020-01-031-13/+14
|
* [Attributor][FIX] Don't crash on ptr2int/int2ptr instructionsJohannes Doerfert2020-01-031-1/+2
| | | | | An integer isn't allowed in getAlignmentForValue so we need to stop at a ptr2int instruction during exploration.
* [Attributor][FIX] Do not derive nonnull and dereferenceable w/o accessJohannes Doerfert2020-01-031-14/+0
| | | | | | | An inbounds GEP results in poison if the value is not "inbounds", not in UB. We accidentally derived nonnull and dereferenceable from these inbounds GEPs even in the absence of accesses that would make the poison to UB.
* [Attributor][FIX] Return CHANGED once a pessimistic fixpoint is reached.Johannes Doerfert2020-01-031-1/+2
|
* AMDGPU/GlobalISel: Fix off by one in operand indexMatt Arsenault2020-01-031-4/+4
| | | | This should be looking at the RHS of the add for a constant.
* [DAGCombiner][X86][AArch64] Generalize `A-(A&B)`->`A&(~B)` fold (PR44448)Roman Lebedev2020-01-031-20/+9
| | | | | | | | | | | | | | | | | | | | | | | The fold 'A - (A & (B - 1))' -> 'A & (0 - B)' added in 8dab0a4a7d691f2704f1079538e0ef29548db159 is too specific. It should/can just be 'A - (A & B)' -> 'A & (~B)' Even if we don't manage to fold `~` into B, we have likely formed `ANDN` node. Also, this way there's less similar-but-duplicate folds. Name: X - (X & Y) -> X & (~Y) %o = and i32 %X, %Y %r = sub i32 %X, %o => %n = xor i32 %Y, -1 %r = and i32 %X, %n https://rise4fun.com/Alive/kOUl See https://bugs.llvm.org/show_bug.cgi?id=44448 https://reviews.llvm.org/D71499
* [DAGCombiner] `~(add X, -1)` -> `neg X` foldRoman Lebedev2020-01-031-0/+7
| | | | | | | | | | | | | | | The fold 'A - (A & (B - 1))' -> 'A & (0 - B)' added in 8dab0a4a7d691f2704f1079538e0ef29548db159 is too specific. It should just be 'A - (A & B)' -> 'A & (~B)', but we currently fail to sink that '~' into `(B - 1)`. Name: ~(X - 1) -> (0 - X) %o = add i32 %X, -1 %r = xor i32 %o, -1 => %r = sub i32 0, %X https://rise4fun.com/Alive/rjU
* [DAGCombine][X86][Thumb2/LowOverheadLoops] `A - (A & C)` -> `A & (~C)` fold ↵Roman Lebedev2020-01-031-0/+10
| | | | | | | | | | | | | | | | | | | | | (PR44448) While we do manage to fold integer-typed IR in middle-end, we can't do that for the main motivational case of pointers. There is @llvm.ptrmask() intrinsic which may or may not be helpful, but i'm not sure it is fully considered canonical yet, not everything is fully aware of it likely. Name: PR44448 ptr - (ptr & C) -> ptr & (~C) %bias = and i32 %ptr, C %r = sub i32 %ptr, %bias => %r = and i32 %ptr, ~C See https://bugs.llvm.org/show_bug.cgi?id=44448 https://reviews.llvm.org/D71499
* [NFC][DAGCombine] Clarify comment for 'A - (A & (B - 1))' foldRoman Lebedev2020-01-031-1/+1
|
* Fix for a dangling point bug in DeadStoreElimination passAnkit2020-01-031-17/+39
| | | | | | | | | | | | | | The patch makes sure that the LastThrowing pointer does not point to any instruction deleted by call to DeleteDeadInstruction. While iterating through the instructions the pass maintains a pointer to the lastThrowing Instruction. A call to deleteDeadInstruction deletes a dead store and other instructions feeding the original dead instruction which also become dead. The instruction pointed by the lastThrowing pointer could also be deleted by the call to DeleteDeadInstruction and thus it becomes a dangling pointer. Because of this, we see an error in the next iteration. In the patch, we maintain a list of throwing instructions encountered previously and use the last non deleted throwing instruction from the container. Reviewers: fhahn, bcahoon, efriedma Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D65326
* [InstCombine] replace undef elements in vector constant when doing icmp ↵Sanjay Patel2020-01-031-0/+17
| | | | | | | | | | | | | | folds (PR44383) As shown in P44383: https://bugs.llvm.org/show_bug.cgi?id=44383 ...we can't safely propagate a vector constant through this icmp fold if that vector constant contains undefined elements. We know that each defined element of the constant is safe though, so find the first of those and replicate it into the formerly undef lanes. Differential Revision: https://reviews.llvm.org/D72101
* Fix typo "psuedo" in commentsJay Foad2020-01-035-6/+6
|
* [DebugInfo] Remove redundant checks for past-the-end of prologueJames Henderson2020-01-031-24/+0
| | | | | | | | | | | | | | The V5 directory and filename tables had checks in to make sure we hadn't read past the end of the line table prologue. Since previous changes to the data extractor class ensure we never read past the end, these checks are now redundant, so this patch removes them. There is still a check to show that the whole prologue remains within the prologue length. Reviewed By: JDevlieghere Differential Revision: https://reviews.llvm.org/D71768
* [DAGCombine][X86][AArch64] 'A - (A & (B - 1))' -> 'A & (0 - B)' fold (PR44448)Roman Lebedev2020-01-031-0/+15
| | | | | | | | | | | | | | | | | | | | | | | While we do manage to fold integer-typed IR in middle-end, we can't do that for the main motivational case of pointers. There is @llvm.ptrmask() intrinsic which may or may not be helpful, but i'm not sure it is fully considered canonical yet, not everything is fully aware of it likely. https://rise4fun.com/Alive/ZVdp Name: ptr - (ptr & (alignment-1)) -> ptr & (0 - alignment) %mask = add i64 %alignment, -1 %bias = and i64 %ptr, %mask %r = sub i64 %ptr, %bias => %highbitmask = sub i64 0, %alignment %r = and i64 %ptr, %highbitmask See https://bugs.llvm.org/show_bug.cgi?id=44448 https://reviews.llvm.org/D71499
* [ARM][NFC] Move tail predication checksSam Parker2020-01-031-69/+76
| | | | | Extract the tail predication validation checks out into their own LowOverHeadLoop method.
* [X86] Reorder X86any* PatFrags to put the strict node first so that chain ↵Craig Topper2020-01-034-10/+10
| | | | | | | | | | | property will be inferred for the instruction by the tablegen backend. Also use X86any_vfpround instead of X86vfpround in some instruction definitions so the strict version can be used to infer the chain property. Without these changes we don't propagate strict FP chain through isel for some instructions.
* [X86] Re-enable lowerUINT_TO_FP_vXi32 under fast-math by using an FSUB ↵Craig Topper2020-01-021-15/+9
| | | | | | | | | | | | | | | | | | | | | | | | instead of an FADD. Summary: We previously disabled this under fast math due to aggressive reassociation by the machine combiner. But I think we can work around this by using a FSUB instead of FADD for the first operation. This matches the similar algorithm we do for uint_to_fp i64->f64 in TargetLowering::expandUINT_TO_FP. If reassociation hasn't been a problem for that, hopefully its not a problem here. Reviewers: RKSimon, spatel, scanon Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71968
* [DAGCombine] Initialize the default operation action for SIGN_EXTEND_INREG ↵QingShan Zhang2020-01-035-0/+24
| | | | | | | | | | | for vector type as 'expand' instead of 'legal' For now, we didn't set the default operation action for SIGN_EXTEND_INREG for vector type, which is 0 by default, that is legal. However, most target didn't have native instructions to support this opcode. It should be set as expand by default, as what we did for ANY_EXTEND_VECTOR_INREG. Differential Revision: https://reviews.llvm.org/D70000
* [X86] Enable strict FP by default and remove option ↵Wang, Pengfei2020-01-031-0/+3
| | | | -disable-strictnode-mutation. NFCI.
* Revert "[Attributor] AAValueConstantRange: Value range analysis using ↵Hideto Ueno2020-01-031-499/+7
| | | | | | constant range" This reverts commit e9963034314edf49a12ea5e29f694d8f9f52734a.
* [PowerPC]: Fix predicate handling with SPEJustin Hibbits2020-01-021-9/+21
| | | | | SPE floating-point compare instructions only update the GT bit in the CR field. All predicates must therefore be reduced to GT/LE.
* [X86] Optimization of inserting vxi1 sub vector into vXi1 vectorWang, Pengfei2020-01-031-2/+20
| | | | | | | | | | | | | | | | | Summary: After bugfix the undef value case here, we used more operations to implement inserting vxi1 sub vector into vXi1 vector, I optimize it by use less operations. The history information at https://reviews.llvm.org/D68311 Reviewers: craig.topper, LuoYuanke, yubing, annita.zhang, pengfei, LiuChen3, RKSimon Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D71917
* [PowerPC][AIX] Enable sret arguments.Sean Fertile2020-01-021-3/+0
| | | | | | Removes the fatal error for sret arguments and adds lit testing. Differential Revision: https://reviews.llvm.org/D71504
* [PDB] Print the most redundant type record indices with /summaryReid Kleckner2020-01-021-15/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: I used this information to motivate splitting up the Intrinsic::ID enum (5d986953c8b917bacfaa1f800fc1e242559f76be) and adding a key method to clang::Sema (586f65d31f32ca6bc8cfdb8a4f61bee5057bf6c8) which saved a fair amount of object file size. Example output for clang.pdb: Top 10 types responsible for the most TPI input bytes: index total bytes count size 0x3890: 8,671,220 = 1,805 * 4,804 0xE13BE: 5,634,720 = 252 * 22,360 0x6874C: 5,181,600 = 408 * 12,700 0x2A1F: 4,520,528 = 1,574 * 2,872 0x64BFF: 4,024,020 = 469 * 8,580 0x1123: 4,012,020 = 2,157 * 1,860 0x6952: 3,753,792 = 912 * 4,116 0xC16F: 3,630,888 = 633 * 5,736 0x69DD: 3,601,160 = 985 * 3,656 0x678D: 3,577,904 = 319 * 11,216 In this case, we can see that record 0x3890 is responsible for ~8MB of total object file size for objects in clang. The user can then use llvm-pdbutil to find out what the record is: $ llvm-pdbutil dump -types -type-index 0x3890 Types (TPI Stream) ============================================================ Showing 1 records. 0x3890 | LF_FIELDLIST [size = 4804] - LF_STMEMBER [name = `WORDTYPE_MAX`, type = 0x1001, attrs = public] - LF_MEMBER [name = `U`, Type = 0x37F0, offset = 0, attrs = private] - LF_MEMBER [name = `BitWidth`, Type = 0x0075 (unsigned), offset = 8, attrs = private] - LF_METHOD [name = `APInt`, # overloads = 8, overload list = 0x3805] ... In this case, we can see that these are members of the APInt class, which is emitted in 1805 object files. The next largest type is ASTContext: $ llvm-pdbutil dump -types -type-index 0xE13BE bin/clang.pdb 0xE13BE | LF_FIELDLIST [size = 22360] - LF_BCLASS type = 0x653EA, offset = 0, attrs = public - LF_MEMBER [name = `Types`, Type = 0x653EB, offset = 8, attrs = private] - LF_MEMBER [name = `ExtQualNodes`, Type = 0x653EC, offset = 24, attrs = private] - LF_MEMBER [name = `ComplexTypes`, Type = 0x653ED, offset = 48, attrs = private] - LF_MEMBER [name = `PointerTypes`, Type = 0x653EE, offset = 72, attrs = private] ... ASTContext only appears 252 times, but the list of members is long, and must be repeated everywhere it is used. This was the output before I split Intrinsic::ID: Top 10 types responsible for the most TPI input: 0x686C: 69,823,920 = 1,070 * 65,256 0x686D: 69,819,640 = 1,070 * 65,252 0x686E: 69,819,640 = 1,070 * 65,252 0x686B: 16,371,000 = 1,070 * 15,300 ... These records were all lists of intrinsic enums. Reviewers: MaskRay, ruiu Subscribers: mgrang, zturner, thakis, hans, akhuang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71437
* AMDGPU/GlobalISel: Remove manual G_FENCE selectionMatt Arsenault2020-01-021-5/+0
| | | | | The tablegen emitter now handles the immediate operand correctly, so let the generatedd matcher works.
* DAG: Use TargetConstant for FENCE operandsMatt Arsenault2020-01-028-20/+20
|
* [SystemZ] Create brcl 0,0 instead of brcl 0,3 in EmitNop for 6 bytes.Jonas Paulsson2020-01-021-1/+1
| | | | | | | For consistency with GCC, the target label is moved to the brcl itself instead of the next instruction. Review: Ulrich Weigand
* [X86] Move STRICT_ ISD nodes into the new section of X86ISelLowering.h where ↵Craig Topper2020-01-021-4/+17
| | | | STRICT nodes are collected after D71841
* [PowerPC] Only legalize FNEARBYINT with unsafe fp mathNemanja Ivanovic2020-01-021-2/+7
| | | | | | | Commit 0f0330a78709 legalized these nodes on PPC without consideration of unsafe math which means that we get inexact exceptions raised for nearbyint. Since this doesn't conform to the standard, switch this legalization to depend on unsafe fp math.
* X86: remove unused variableSaleem Abdulrasool2020-01-021-1/+0
| | | | | Remove the now unused-variable from aa17d31edb00c66461093b5a7cd2f4a35dc143e9. This breaks `-Werror` builds.
* build: reduce CMake handling for zlibSaleem Abdulrasool2020-01-023-6/+6
| | | | | | | | | | | | | Rather than handling zlib handling manually, use `find_package` from CMake to find zlib properly. Use this to normalize the `LLVM_ENABLE_ZLIB`, `HAVE_ZLIB`, `HAVE_ZLIB_H`. Furthermore, require zlib if `LLVM_ENABLE_ZLIB` is set to `YES`, which requires the distributor to explicitly select whether zlib is enabled or not. This simplifies the CMake handling and usage in the rest of the tooling. This restores 68a235d07f9e7049c7eb0c8091f37e385327ac28, e6c7ed6d2164a0659fd9f6ee44f1375d301e3cad. The problem with the windows bot is a need for clearing the cache.
* [X86] Remove FP0-6 operands from call instructions in FPStackifier pass. ↵Craig Topper2020-01-021-9/+11
| | | | | | | | | | | | | | | Only count defs as returns. All FP0-6 operands should be removed by the FP stackifier. By removing these we fix the machine verifier error in PR39437. I've also made it so that only defs are counted for STReturns which removes what I think were extra stack cleanup instructions. And I've removed the regcall assert because it was checking the attributes of the caller, but here we're concerned with the attributes of the callee. But I don't know how to get that information from this level.
* [DebugInfo][NFC] Use function_ref consistently in debug line parsingJames Henderson2020-01-022-3/+3
| | | | | | | | | | This patch fixes an inconsistency where we were using std::function in some places and function_ref in others to pass around the error handling callback. Reviewed by: MaskRay Differential Revision: https://reviews.llvm.org/D71762
* [SelectionDAG] Simplify SelectionDAGBuilder::visitInlineAsmFangrui Song2020-01-021-3/+1
|
* Revert "build: reduce CMake handling for zlib"James Henderson2020-01-023-4/+7
| | | | | | | This reverts commit 68a235d07f9e7049c7eb0c8091f37e385327ac28. This commit broke the clang-x64-windows-msvc build bot and a follow-up commit did not fix it. Reverting to fix the bot.
* Revert "build: make `LLVM_ENABLE_ZLIB` a tri-bool for users"James Henderson2020-01-021-4/+1
| | | | | | | This reverts commit e6c7ed6d2164a0659fd9f6ee44f1375d301e3cad. This commit was an attempt to fix the build bots, but it still left the clang-x64-windows-msvc bot in a broken state.
* [FPEnv] Default NoFPExcept SDNodeFlag to falseUlrich Weigand2020-01-027-26/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NoFPExcept bit in SDNodeFlags currently defaults to true, unlike all other such flags. This is a problem, because it implies that all code that transforms SDNodes without copying flags can introduce a correctness bug, not just a missed optimization. This patch changes the default to false. This makes it necessary to move setting the (No)FPExcept flag for constrained intrinsics from the visitConstrainedIntrinsic routine to the generic visit routine at the place where the other flags are set, or else the intersectFlagsWith call would erase the NoFPExcept flag again. In order to avoid making non-strict FP code worse, whenever SelectionDAGISel::SelectCodeCommon matches on a set of orignal nodes none of which can raise FP exceptions, it will preserve this property on all results nodes generated, by setting the NoFPExcept flag on those result nodes that would otherwise be considered as raising an FP exception. To check whether or not an SD node should be considered as raising an FP exception, the following logic applies: - For machine nodes, check the mayRaiseFPException property of the underlying MI instruction - For regular nodes, check isStrictFPOpcode - For target nodes, check a newly introduced isTargetStrictFPOpcode The latter is implemented by reserving a range of target opcodes, similarly to how memory opcodes are identified. (Note that there a bit of a quirk in identifying target nodes that are both memory nodes and strict FP nodes. To simplify the logic, right now all target memory nodes are automatically also considered strict FP nodes -- this could be fixed by adding one more range.) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D71841
* [InstCombine] remove uses before deleting instructions (PR43723)Sanjay Patel2020-01-021-3/+4
| | | | | | | | | | | | | | | | | | | | | This is a less ambitious alternative to previous attempts to fix this bug with: rG56b2aee1875a rGef02831f0a4e rG56b2aee1875a ...because those all failed bot testing with use-after-free or other problems. The original crashing/assert problem is still showing up on various fuzzers, so I've added a new minimal test based on another one of those failures. Instead of trying to manage and coordinate the logic in isAllocSiteRemovable() with the deletion loops, just loosen the existing code that handles casts and GEP by replacing with undef to allow other opcodes. That means that no instructions with uses should assert on deletion, and there are hopefully no non-obvious sanitizer bugs induced.
* Remove unneeded extra variable realArgIdx. NFC.Jay Foad2020-01-022-10/+8
|
* [NFC] Add explicit instantiation to releaseNodeQiu Chaofan2020-01-021-0/+5
| | | | | | Resolve a build failure about undefined symbols introduced by f9f78cf. Differential Revision: https://reviews.llvm.org/D72069
* [AArch64][SVE] Gather loads: pass 32 bit unpacked offsets as nxv2i32Andrzej Warzynski2020-01-021-14/+21
| | | | | | | | | | | | | | | | | | | Summary: Currently 32 bit unpacked offsets are passed as nxv2i64. However, as pointed out in https://reviews.llvm.org/D71074, using nxv2i32 instead would improve consistency with: * how other arguments are treated * how scatter stores are implemented This patch makes sure that 32 bit unpacked offsets are passes as nxv2i32 instead of nxv2i64. Reviewers: sdesmalen, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71724
* [NFC] Make the type of X86AlignBranchBoundary compatibleShengchen Kan2020-01-021-1/+1
| | | | | | Change the type of X86AlignBranchBoundary from cl::opt<uint64_t> to cl::opt<unsigned> since the template class cl::opt is only instantiated with type unsigned, int, std::string, char and bool.
OpenPOWER on IntegriCloud