summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [DAGCombiner][X86][AArch64] Generalize `A-(A&B)`->`A&(~B)` fold (PR44448)Roman Lebedev2020-01-031-20/+9
| | | | | | | | | | | | | | | | | | | | | | | The fold 'A - (A & (B - 1))' -> 'A & (0 - B)' added in 8dab0a4a7d691f2704f1079538e0ef29548db159 is too specific. It should/can just be 'A - (A & B)' -> 'A & (~B)' Even if we don't manage to fold `~` into B, we have likely formed `ANDN` node. Also, this way there's less similar-but-duplicate folds. Name: X - (X & Y) -> X & (~Y) %o = and i32 %X, %Y %r = sub i32 %X, %o => %n = xor i32 %Y, -1 %r = and i32 %X, %n https://rise4fun.com/Alive/kOUl See https://bugs.llvm.org/show_bug.cgi?id=44448 https://reviews.llvm.org/D71499
* [DAGCombiner] `~(add X, -1)` -> `neg X` foldRoman Lebedev2020-01-031-0/+7
| | | | | | | | | | | | | | | The fold 'A - (A & (B - 1))' -> 'A & (0 - B)' added in 8dab0a4a7d691f2704f1079538e0ef29548db159 is too specific. It should just be 'A - (A & B)' -> 'A & (~B)', but we currently fail to sink that '~' into `(B - 1)`. Name: ~(X - 1) -> (0 - X) %o = add i32 %X, -1 %r = xor i32 %o, -1 => %r = sub i32 0, %X https://rise4fun.com/Alive/rjU
* [DAGCombine][X86][Thumb2/LowOverheadLoops] `A - (A & C)` -> `A & (~C)` fold ↵Roman Lebedev2020-01-031-0/+10
| | | | | | | | | | | | | | | | | | | | | (PR44448) While we do manage to fold integer-typed IR in middle-end, we can't do that for the main motivational case of pointers. There is @llvm.ptrmask() intrinsic which may or may not be helpful, but i'm not sure it is fully considered canonical yet, not everything is fully aware of it likely. Name: PR44448 ptr - (ptr & C) -> ptr & (~C) %bias = and i32 %ptr, C %r = sub i32 %ptr, %bias => %r = and i32 %ptr, ~C See https://bugs.llvm.org/show_bug.cgi?id=44448 https://reviews.llvm.org/D71499
* [NFC][DAGCombine] Clarify comment for 'A - (A & (B - 1))' foldRoman Lebedev2020-01-031-1/+1
|
* Fix for a dangling point bug in DeadStoreElimination passAnkit2020-01-031-17/+39
| | | | | | | | | | | | | | The patch makes sure that the LastThrowing pointer does not point to any instruction deleted by call to DeleteDeadInstruction. While iterating through the instructions the pass maintains a pointer to the lastThrowing Instruction. A call to deleteDeadInstruction deletes a dead store and other instructions feeding the original dead instruction which also become dead. The instruction pointed by the lastThrowing pointer could also be deleted by the call to DeleteDeadInstruction and thus it becomes a dangling pointer. Because of this, we see an error in the next iteration. In the patch, we maintain a list of throwing instructions encountered previously and use the last non deleted throwing instruction from the container. Reviewers: fhahn, bcahoon, efriedma Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D65326
* [InstCombine] replace undef elements in vector constant when doing icmp ↵Sanjay Patel2020-01-031-0/+17
| | | | | | | | | | | | | | folds (PR44383) As shown in P44383: https://bugs.llvm.org/show_bug.cgi?id=44383 ...we can't safely propagate a vector constant through this icmp fold if that vector constant contains undefined elements. We know that each defined element of the constant is safe though, so find the first of those and replicate it into the formerly undef lanes. Differential Revision: https://reviews.llvm.org/D72101
* Fix typo "psuedo" in commentsJay Foad2020-01-035-6/+6
|
* [DebugInfo] Remove redundant checks for past-the-end of prologueJames Henderson2020-01-031-24/+0
| | | | | | | | | | | | | | The V5 directory and filename tables had checks in to make sure we hadn't read past the end of the line table prologue. Since previous changes to the data extractor class ensure we never read past the end, these checks are now redundant, so this patch removes them. There is still a check to show that the whole prologue remains within the prologue length. Reviewed By: JDevlieghere Differential Revision: https://reviews.llvm.org/D71768
* [DAGCombine][X86][AArch64] 'A - (A & (B - 1))' -> 'A & (0 - B)' fold (PR44448)Roman Lebedev2020-01-031-0/+15
| | | | | | | | | | | | | | | | | | | | | | | While we do manage to fold integer-typed IR in middle-end, we can't do that for the main motivational case of pointers. There is @llvm.ptrmask() intrinsic which may or may not be helpful, but i'm not sure it is fully considered canonical yet, not everything is fully aware of it likely. https://rise4fun.com/Alive/ZVdp Name: ptr - (ptr & (alignment-1)) -> ptr & (0 - alignment) %mask = add i64 %alignment, -1 %bias = and i64 %ptr, %mask %r = sub i64 %ptr, %bias => %highbitmask = sub i64 0, %alignment %r = and i64 %ptr, %highbitmask See https://bugs.llvm.org/show_bug.cgi?id=44448 https://reviews.llvm.org/D71499
* [ARM][NFC] Move tail predication checksSam Parker2020-01-031-69/+76
| | | | | Extract the tail predication validation checks out into their own LowOverHeadLoop method.
* [X86] Reorder X86any* PatFrags to put the strict node first so that chain ↵Craig Topper2020-01-034-10/+10
| | | | | | | | | | | property will be inferred for the instruction by the tablegen backend. Also use X86any_vfpround instead of X86vfpround in some instruction definitions so the strict version can be used to infer the chain property. Without these changes we don't propagate strict FP chain through isel for some instructions.
* [X86] Re-enable lowerUINT_TO_FP_vXi32 under fast-math by using an FSUB ↵Craig Topper2020-01-021-15/+9
| | | | | | | | | | | | | | | | | | | | | | | | instead of an FADD. Summary: We previously disabled this under fast math due to aggressive reassociation by the machine combiner. But I think we can work around this by using a FSUB instead of FADD for the first operation. This matches the similar algorithm we do for uint_to_fp i64->f64 in TargetLowering::expandUINT_TO_FP. If reassociation hasn't been a problem for that, hopefully its not a problem here. Reviewers: RKSimon, spatel, scanon Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71968
* [DAGCombine] Initialize the default operation action for SIGN_EXTEND_INREG ↵QingShan Zhang2020-01-035-0/+24
| | | | | | | | | | | for vector type as 'expand' instead of 'legal' For now, we didn't set the default operation action for SIGN_EXTEND_INREG for vector type, which is 0 by default, that is legal. However, most target didn't have native instructions to support this opcode. It should be set as expand by default, as what we did for ANY_EXTEND_VECTOR_INREG. Differential Revision: https://reviews.llvm.org/D70000
* [X86] Enable strict FP by default and remove option ↵Wang, Pengfei2020-01-031-0/+3
| | | | -disable-strictnode-mutation. NFCI.
* Revert "[Attributor] AAValueConstantRange: Value range analysis using ↵Hideto Ueno2020-01-031-499/+7
| | | | | | constant range" This reverts commit e9963034314edf49a12ea5e29f694d8f9f52734a.
* [PowerPC]: Fix predicate handling with SPEJustin Hibbits2020-01-021-9/+21
| | | | | SPE floating-point compare instructions only update the GT bit in the CR field. All predicates must therefore be reduced to GT/LE.
* [X86] Optimization of inserting vxi1 sub vector into vXi1 vectorWang, Pengfei2020-01-031-2/+20
| | | | | | | | | | | | | | | | | Summary: After bugfix the undef value case here, we used more operations to implement inserting vxi1 sub vector into vXi1 vector, I optimize it by use less operations. The history information at https://reviews.llvm.org/D68311 Reviewers: craig.topper, LuoYuanke, yubing, annita.zhang, pengfei, LiuChen3, RKSimon Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D71917
* [PowerPC][AIX] Enable sret arguments.Sean Fertile2020-01-021-3/+0
| | | | | | Removes the fatal error for sret arguments and adds lit testing. Differential Revision: https://reviews.llvm.org/D71504
* [PDB] Print the most redundant type record indices with /summaryReid Kleckner2020-01-021-15/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: I used this information to motivate splitting up the Intrinsic::ID enum (5d986953c8b917bacfaa1f800fc1e242559f76be) and adding a key method to clang::Sema (586f65d31f32ca6bc8cfdb8a4f61bee5057bf6c8) which saved a fair amount of object file size. Example output for clang.pdb: Top 10 types responsible for the most TPI input bytes: index total bytes count size 0x3890: 8,671,220 = 1,805 * 4,804 0xE13BE: 5,634,720 = 252 * 22,360 0x6874C: 5,181,600 = 408 * 12,700 0x2A1F: 4,520,528 = 1,574 * 2,872 0x64BFF: 4,024,020 = 469 * 8,580 0x1123: 4,012,020 = 2,157 * 1,860 0x6952: 3,753,792 = 912 * 4,116 0xC16F: 3,630,888 = 633 * 5,736 0x69DD: 3,601,160 = 985 * 3,656 0x678D: 3,577,904 = 319 * 11,216 In this case, we can see that record 0x3890 is responsible for ~8MB of total object file size for objects in clang. The user can then use llvm-pdbutil to find out what the record is: $ llvm-pdbutil dump -types -type-index 0x3890 Types (TPI Stream) ============================================================ Showing 1 records. 0x3890 | LF_FIELDLIST [size = 4804] - LF_STMEMBER [name = `WORDTYPE_MAX`, type = 0x1001, attrs = public] - LF_MEMBER [name = `U`, Type = 0x37F0, offset = 0, attrs = private] - LF_MEMBER [name = `BitWidth`, Type = 0x0075 (unsigned), offset = 8, attrs = private] - LF_METHOD [name = `APInt`, # overloads = 8, overload list = 0x3805] ... In this case, we can see that these are members of the APInt class, which is emitted in 1805 object files. The next largest type is ASTContext: $ llvm-pdbutil dump -types -type-index 0xE13BE bin/clang.pdb 0xE13BE | LF_FIELDLIST [size = 22360] - LF_BCLASS type = 0x653EA, offset = 0, attrs = public - LF_MEMBER [name = `Types`, Type = 0x653EB, offset = 8, attrs = private] - LF_MEMBER [name = `ExtQualNodes`, Type = 0x653EC, offset = 24, attrs = private] - LF_MEMBER [name = `ComplexTypes`, Type = 0x653ED, offset = 48, attrs = private] - LF_MEMBER [name = `PointerTypes`, Type = 0x653EE, offset = 72, attrs = private] ... ASTContext only appears 252 times, but the list of members is long, and must be repeated everywhere it is used. This was the output before I split Intrinsic::ID: Top 10 types responsible for the most TPI input: 0x686C: 69,823,920 = 1,070 * 65,256 0x686D: 69,819,640 = 1,070 * 65,252 0x686E: 69,819,640 = 1,070 * 65,252 0x686B: 16,371,000 = 1,070 * 15,300 ... These records were all lists of intrinsic enums. Reviewers: MaskRay, ruiu Subscribers: mgrang, zturner, thakis, hans, akhuang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71437
* AMDGPU/GlobalISel: Remove manual G_FENCE selectionMatt Arsenault2020-01-021-5/+0
| | | | | The tablegen emitter now handles the immediate operand correctly, so let the generatedd matcher works.
* DAG: Use TargetConstant for FENCE operandsMatt Arsenault2020-01-028-20/+20
|
* [SystemZ] Create brcl 0,0 instead of brcl 0,3 in EmitNop for 6 bytes.Jonas Paulsson2020-01-021-1/+1
| | | | | | | For consistency with GCC, the target label is moved to the brcl itself instead of the next instruction. Review: Ulrich Weigand
* [X86] Move STRICT_ ISD nodes into the new section of X86ISelLowering.h where ↵Craig Topper2020-01-021-4/+17
| | | | STRICT nodes are collected after D71841
* [PowerPC] Only legalize FNEARBYINT with unsafe fp mathNemanja Ivanovic2020-01-021-2/+7
| | | | | | | Commit 0f0330a78709 legalized these nodes on PPC without consideration of unsafe math which means that we get inexact exceptions raised for nearbyint. Since this doesn't conform to the standard, switch this legalization to depend on unsafe fp math.
* X86: remove unused variableSaleem Abdulrasool2020-01-021-1/+0
| | | | | Remove the now unused-variable from aa17d31edb00c66461093b5a7cd2f4a35dc143e9. This breaks `-Werror` builds.
* build: reduce CMake handling for zlibSaleem Abdulrasool2020-01-023-6/+6
| | | | | | | | | | | | | Rather than handling zlib handling manually, use `find_package` from CMake to find zlib properly. Use this to normalize the `LLVM_ENABLE_ZLIB`, `HAVE_ZLIB`, `HAVE_ZLIB_H`. Furthermore, require zlib if `LLVM_ENABLE_ZLIB` is set to `YES`, which requires the distributor to explicitly select whether zlib is enabled or not. This simplifies the CMake handling and usage in the rest of the tooling. This restores 68a235d07f9e7049c7eb0c8091f37e385327ac28, e6c7ed6d2164a0659fd9f6ee44f1375d301e3cad. The problem with the windows bot is a need for clearing the cache.
* [X86] Remove FP0-6 operands from call instructions in FPStackifier pass. ↵Craig Topper2020-01-021-9/+11
| | | | | | | | | | | | | | | Only count defs as returns. All FP0-6 operands should be removed by the FP stackifier. By removing these we fix the machine verifier error in PR39437. I've also made it so that only defs are counted for STReturns which removes what I think were extra stack cleanup instructions. And I've removed the regcall assert because it was checking the attributes of the caller, but here we're concerned with the attributes of the callee. But I don't know how to get that information from this level.
* [DebugInfo][NFC] Use function_ref consistently in debug line parsingJames Henderson2020-01-022-3/+3
| | | | | | | | | | This patch fixes an inconsistency where we were using std::function in some places and function_ref in others to pass around the error handling callback. Reviewed by: MaskRay Differential Revision: https://reviews.llvm.org/D71762
* [SelectionDAG] Simplify SelectionDAGBuilder::visitInlineAsmFangrui Song2020-01-021-3/+1
|
* Revert "build: reduce CMake handling for zlib"James Henderson2020-01-023-4/+7
| | | | | | | This reverts commit 68a235d07f9e7049c7eb0c8091f37e385327ac28. This commit broke the clang-x64-windows-msvc build bot and a follow-up commit did not fix it. Reverting to fix the bot.
* Revert "build: make `LLVM_ENABLE_ZLIB` a tri-bool for users"James Henderson2020-01-021-4/+1
| | | | | | | This reverts commit e6c7ed6d2164a0659fd9f6ee44f1375d301e3cad. This commit was an attempt to fix the build bots, but it still left the clang-x64-windows-msvc bot in a broken state.
* [FPEnv] Default NoFPExcept SDNodeFlag to falseUlrich Weigand2020-01-027-26/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NoFPExcept bit in SDNodeFlags currently defaults to true, unlike all other such flags. This is a problem, because it implies that all code that transforms SDNodes without copying flags can introduce a correctness bug, not just a missed optimization. This patch changes the default to false. This makes it necessary to move setting the (No)FPExcept flag for constrained intrinsics from the visitConstrainedIntrinsic routine to the generic visit routine at the place where the other flags are set, or else the intersectFlagsWith call would erase the NoFPExcept flag again. In order to avoid making non-strict FP code worse, whenever SelectionDAGISel::SelectCodeCommon matches on a set of orignal nodes none of which can raise FP exceptions, it will preserve this property on all results nodes generated, by setting the NoFPExcept flag on those result nodes that would otherwise be considered as raising an FP exception. To check whether or not an SD node should be considered as raising an FP exception, the following logic applies: - For machine nodes, check the mayRaiseFPException property of the underlying MI instruction - For regular nodes, check isStrictFPOpcode - For target nodes, check a newly introduced isTargetStrictFPOpcode The latter is implemented by reserving a range of target opcodes, similarly to how memory opcodes are identified. (Note that there a bit of a quirk in identifying target nodes that are both memory nodes and strict FP nodes. To simplify the logic, right now all target memory nodes are automatically also considered strict FP nodes -- this could be fixed by adding one more range.) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D71841
* [InstCombine] remove uses before deleting instructions (PR43723)Sanjay Patel2020-01-021-3/+4
| | | | | | | | | | | | | | | | | | | | | This is a less ambitious alternative to previous attempts to fix this bug with: rG56b2aee1875a rGef02831f0a4e rG56b2aee1875a ...because those all failed bot testing with use-after-free or other problems. The original crashing/assert problem is still showing up on various fuzzers, so I've added a new minimal test based on another one of those failures. Instead of trying to manage and coordinate the logic in isAllocSiteRemovable() with the deletion loops, just loosen the existing code that handles casts and GEP by replacing with undef to allow other opcodes. That means that no instructions with uses should assert on deletion, and there are hopefully no non-obvious sanitizer bugs induced.
* Remove unneeded extra variable realArgIdx. NFC.Jay Foad2020-01-022-10/+8
|
* [NFC] Add explicit instantiation to releaseNodeQiu Chaofan2020-01-021-0/+5
| | | | | | Resolve a build failure about undefined symbols introduced by f9f78cf. Differential Revision: https://reviews.llvm.org/D72069
* [AArch64][SVE] Gather loads: pass 32 bit unpacked offsets as nxv2i32Andrzej Warzynski2020-01-021-14/+21
| | | | | | | | | | | | | | | | | | | Summary: Currently 32 bit unpacked offsets are passed as nxv2i64. However, as pointed out in https://reviews.llvm.org/D71074, using nxv2i32 instead would improve consistency with: * how other arguments are treated * how scatter stores are implemented This patch makes sure that 32 bit unpacked offsets are passes as nxv2i32 instead of nxv2i64. Reviewers: sdesmalen, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71724
* [NFC] Make the type of X86AlignBranchBoundary compatibleShengchen Kan2020-01-021-1/+1
| | | | | | Change the type of X86AlignBranchBoundary from cl::opt<uint64_t> to cl::opt<unsigned> since the template class cl::opt is only instantiated with type unsigned, int, std::string, char and bool.
* [Coroutines] const-ify internal helpers (NFC)Brian Gesiak2020-01-013-11/+13
| | | | | Several helpers internal to llvm/Transforms/Coroutines do not use 'const' for parameters that are not modified. Add const where possible.
* [RegisterClassInfo] Use SmallVector::assign instead of resize to make sure ↵Craig Topper2020-01-011-1/+1
| | | | | | | | | | | | we erase previous contents from all entries of the vector. resize only writes to elements that get added. Any elements that already existed maintain their previous value. In this case we're trying to erase cached information so we should use assign which will write to every element. Found while trying to add new tests to an existing X86 test and noticed register allocation changing in other functions.
* [Coroutines] Rename "legacy" passes (NFC)Brian Gesiak2020-01-016-51/+52
| | | | | | | | | | | | | | | | | | A series of patches beginning with https://reviews.llvm.org/D71898 propose to add an implementation of the coroutine passes to the new pass manager. As part of these changes, the coroutine passes that implement the legacy pass manager interface are renamed, to `<PassName>Legacy`. This mirrors similar changes that have been made to many other passes in LLVM as they've been transitioned to support both old and new pass managers. This commit splits out the renaming portion of that patch and commits it in advance as an NFC (no functional change intended) commit. It renames: * `CoroEarly` => `CoroEarlyLegacy` * `CoroSplit` => `CoroSplitLegacy` * `CoroElide` => `CoroElideLegacy` * `CoroCleanup` => `CoroCleanupLegacy`
* build: make `LLVM_ENABLE_ZLIB` a tri-bool for usersSaleem Abdulrasool2020-01-011-1/+4
| | | | | | Treat the flag `LLVM_ENABLE_ZLIB` as a tri-bool, `FORCE_ON` being `ON`, and `ON` being an auto-detect. This is needed as many of the builders enable the flag without having zlib available.
* build: reduce CMake handling for zlibSaleem Abdulrasool2020-01-013-7/+4
| | | | | | | | | Rather than handling zlib handling manually, use `find_package` from CMake to find zlib properly. Use this to normalize the `LLVM_ENABLE_ZLIB`, `HAVE_ZLIB`, `HAVE_ZLIB_H`. Furthermore, require zlib if `LLVM_ENABLE_ZLIB` is set to `YES`, which requires the distributor to explicitly select whether zlib is enabled or not. This simplifies the CMake handling and usage in the rest of the tooling.
* [polly][Support] Un-break polly testsAlexandre Ganea2020-01-011-1/+1
| | | | | | Previously, the polly unit tests were stuck in a infinite loop. There was an edge case in StringRef::count() introduced by 9f6b13e5cce96066d7262d224c971d93c2724795, where an empty 'Str' would cause the function to never exit. Also fixed usage in polly.
* [InstCombine] Preserve inbounds when merging with zero-index GEP (PR44423)Nikita Popov2020-01-011-2/+11
| | | | | | | | This addresses https://bugs.llvm.org/show_bug.cgi?id=44423. If one of the GEPs is inbounds and the other is zero-index, we can also preserve inbounds. Differential Revision: https://reviews.llvm.org/D72060
* [InstCombine] Fix incorrect inbounds on GEP of GEP (PR44425)Nikita Popov2020-01-011-0/+1
| | | | | | | | This fixes https://bugs.llvm.org/show_bug.cgi?id=44425. We need to drop inbounds if one of the GEPs is not inbounds. This was already done when creating a new GEP, but not when modifying in place. Differential Revision: https://reviews.llvm.org/D72059
* [MachineScheduler] improve reuse of 'releaseNode'methodLorenzo Casalino2020-01-011-17/+21
| | | | | | | | | | | | | | | | The 'SchedBoundary::releaseNode' is merely invoked for releasing the Top/Bottom root nodes. However, 'SchedBoundary::releasePending' uses its same logic to check if the Pending queue has any releasable SUnit. It is possible to slightly modify the body of the two, allowing re-use of the former ('releaseNode') in the latter. Patch by Lorenzo Casalino <lorenzo.casalino93@gmail.com> Reviewers: MatzeB, fhahn, atrick Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D65506
* [X86] Call SimplifyMultipleUseDemandedBits from combineVSelectToBLENDV if ↵Craig Topper2020-01-011-24/+42
| | | | | | | | the condition is used by something other than select conditions. We might be able to bypass some nodes on the condition path. Differential Revision: https://reviews.llvm.org/D71984
* [NFC] Fixes -Wrange-loop-analysis warningsMark de Wever2020-01-0120-34/+35
| | | | | | This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Differential Revision: https://reviews.llvm.org/D71857
* add strict float for round operationLiu, Chen32020-01-016-41/+86
| | | | Differential Revision: https://reviews.llvm.org/D72026
* [MC][TargetMachine] Delete MCTargetOptions::MCPIECopyRelocationsFangrui Song2020-01-012-10/+9
| | | | | | | | | | | | clang/lib/CodeGen/CodeGenModule performs the -mpie-copy-relocations check and sets dso_local on applicable global variables. We don't need to duplicate the work in TargetMachine shouldAssumeDSOLocal. Verified that -mpie-copy-relocations can still emit PC relative relocations for external variable accesses. clang -target x86_64 -fpie -mpie-copy-relocations -c => R_X86_64_PC32 clang -target aarch64 -fpie -mpie-copy-relocations -c => R_AARCH64_ADR_PREL_PG_HI21+R_AARCH64_LDST64_ABS_LO12_NC
OpenPOWER on IntegriCloud