path: root/llvm/lib
Commit message | Author | Age | Files | Lines
...
* [LoopUnroll+LoopUnswitch] do not transform loops containing callbr | Nick Desaulniers | 2019-07-15 | 1 | -1/+4
  Summary: There is currently a correctness issue when unrolling loops containing callbr's where their indirect targets are being updated correctly to the newly created labels, but their operands are not. This manifests in unrolled loops where the second and subsequent copies of callbr instructions have blockaddresses of the label from the first instance of the unrolled loop, which would result in nonsensical runtime control flow.
  For now, conservatively do not unroll the loop. In the future, I think we can pursue unrolling such loops provided we transform the cloned callbr's operands correctly. Such a transform and its legalities are being discussed in: https://reviews.llvm.org/D64101
  Link: https://bugs.llvm.org/show_bug.cgi?id=42489
  Link: https://groups.google.com/forum/#!topic/clang-built-linux/z-hRWP9KqPI
  Reviewers: fhahn, hfinkel, efriedma
  Reviewed By: fhahn, hfinkel, efriedma
  Subscribers: efriedma, hiraditya, zzheng, dmgreen, llvm-commits, pirama, kees, nathanchance, E5ten, craig.topper, chandlerc, glider, void, srhines
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64368
  llvm-svn: 366130
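The failure mode described in this commit message can be modelled with a toy representation. The following is a hedged Python sketch, not LLVM code: `CallBr`, `remap`, `clone_buggy` and `clone_fixed` are hypothetical names, and a callbr is reduced to a list of indirect target labels plus a list of blockaddress operands.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CallBr:
    """Toy stand-in for a callbr; not LLVM's real data structure."""
    indirect_targets: List[str]        # labels control may transfer to
    blockaddress_operands: List[str]   # labels passed to the asm as operands


def remap(labels: List[str], label_map: Dict[str, str]) -> List[str]:
    # Replace each label with its clone; labels outside the map are kept.
    return [label_map.get(label, label) for label in labels]


def clone_buggy(cb: CallBr, label_map: Dict[str, str]) -> CallBr:
    # Models the reported issue: targets are remapped, operands are not.
    return CallBr(remap(cb.indirect_targets, label_map),
                  list(cb.blockaddress_operands))


def clone_fixed(cb: CallBr, label_map: Dict[str, str]) -> CallBr:
    # Models a possible future transform: remap targets and operands together.
    return CallBr(remap(cb.indirect_targets, label_map),
                  remap(cb.blockaddress_operands, label_map))


original = CallBr(["retry"], ["retry"])
label_map = {"retry": "retry.1"}  # labels created for the second unrolled copy

buggy = clone_buggy(original, label_map)
fixed = clone_fixed(original, label_map)
```

In the buggy clone the branch target is `retry.1` while the operand still names `retry`, which is the mismatch the message describes. The commit itself does not implement `clone_fixed`; it simply declines to unroll such loops.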
* AMDGPU/GlobalISel: Allow scalar s1 and/or/xor | Matt Arsenault | 2019-07-15 | 1 | -6/+91
  If a 1-bit value is in a 32-bit VGPR, the scalar opcodes set SCC to whether the result is 0. If the inputs are SCC, these can be copied to a 32-bit SGPR to produce an SCC result.
  llvm-svn: 366125
* ARM MTE stack sanitizer. | Evgeniy Stepanov | 2019-07-15 | 9 | -1/+20
  Add "memtag" sanitizer that detects and mitigates stack memory issues using armv8.5 Memory Tagging Extension.
  It is similar in principle to HWASan, which is a software implementation of the same idea, but there are enough differences to warrant a new sanitizer type IMHO. It is also expected to have very different performance properties.
  The new sanitizer does not have a runtime library (it may grow one later, along with a "debugging" mode). Similar to SafeStack and StackProtector, the instrumentation pass (in a follow up change) will be inserted in all cases, but will only affect functions marked with the new sanitize_memtag attribute.
  Reviewers: pcc, hctim, vitalybuka, ostannard
  Subscribers: srhines, mehdi_amini, javed.absar, kristof.beyls, hiraditya, cryptoad, steven_wu, dexonsmith, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D64169
  llvm-svn: 366123
* AMDGPU/GlobalISel: Select G_AND/G_OR/G_XOR | Matt Arsenault | 2019-07-15 | 2 | -0/+66
  llvm-svn: 366121
* AMDGPU/GlobalISel: Don't constrain source register of VCC copies | Matt Arsenault | 2019-07-15 | 1 | -0/+20
  This is a hack until I come up with a better way of dealing with the pseudo-register banks used for boolean values. If the use instruction constrains the register, the selector for the def instruction won't see that the bank was VCC. A 1-bit SReg_32 could ambiguously have been SCCRegBank or VCCRegBank in wave32.
  This is necessary to successfully select branches with an and/or/xor condition.
  llvm-svn: 366120
* AMDGPU/GlobalISel: Fix selecting vcc->vcc bank copies | Matt Arsenault | 2019-07-15 | 1 | -10/+12
  The extra test change is correct, although how it arrives there is a bug that needs work. With wave32, the test for isVCC ambiguously reports true for an SCC or VCC source. A new allocatable pseudo register class for SCC may be necessary.
  llvm-svn: 366119
* AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC | Matt Arsenault | 2019-07-15 | 1 | -0/+4
  llvm-svn: 366118
* AMDGPU/GlobalISel: Fix handling of sgpr (not scc bank) s1 to VCC | Matt Arsenault | 2019-07-15 | 1 | -17/+23
  This was emitting a copy from a 32-bit register to a 64-bit.
  llvm-svn: 366117
* AMDGPU/GlobalISel: Custom legalize G_INSERT_VECTOR_ELT | Matt Arsenault | 2019-07-15 | 2 | -1/+33
  llvm-svn: 366116
* AMDGPU/GlobalISel: Custom legalize G_EXTRACT_VECTOR_ELT | Matt Arsenault | 2019-07-15 | 2 | -1/+36
  Turn the constant cases into G_EXTRACTs.
  llvm-svn: 366115
* AMDGPU/GlobalISel: Fix G_ICMP for wave32 | Matt Arsenault | 2019-07-15 | 1 | -2/+2
  llvm-svn: 366114
* GlobalISel: Implement narrowScalar for vector extract/insert indexes | Matt Arsenault | 2019-07-15 | 1 | -0/+11
  llvm-svn: 366113
* [FileCheck] Store line numbers as optional values | Thomas Preud'homme | 2019-07-15 | 1 | -10/+12
  Summary: Processing of command-line variable definitions and the logic around implicit not directives both reuse parsing code that expects a line number to be defined. So far, a special line number of 0 was used for those users of the parsing code where a line number does not make sense. This commit instead represents line numbers as Optional values so that they can be None for those cases.
  Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk
  Subscribers: JonChesterfield, rogfer01, hfinkel, kristina, rnk, tra, arichardson, grimar, dblaikie, probinson, llvm-commits, hiraditya
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64639
  llvm-svn: 366109
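The change from a sentinel line number to an optional one can be illustrated with a small sketch. This is a hedged Python analogue, not FileCheck's C++ code; `describe_location` and the `"<command line>"` label are hypothetical.

```python
from typing import Optional


def describe_location(line_number: Optional[int]) -> str:
    """Format a location; None means that no line number applies."""
    if line_number is None:
        # E.g. a variable defined on the command line rather than in a file.
        return "<command line>"
    return f"line {line_number}"
```

With an optional value, the "no line number" case is a distinct state that callers have to handle, rather than an ordinary-looking integer such as 0 that could be printed or compared by accident.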
* Expand comment about how StringsToBuckets was computed, and add more entries | Nico Weber | 2019-07-15 | 1 | -1/+14
  The construction was explained in https://reviews.llvm.org/D44810?id=139526#inline-391999 but reading the code shouldn't require hunting down old reviews to understand it.
  The precomputed list was missing an entry for the empty list case, and one entry at the very end. (The current last entry is the last one where 3 * BucketCount fits in a signed int, but the reference implementation uses unsigneds as far as I can tell, so there's room for one more entry.)
  No behavior change for inputs seen in practice.
  Differential Revision: https://reviews.llvm.org/D64738
  llvm-svn: 366107
* [ARM] MVE vector for 64bit types | David Green | 2019-07-15 | 2 | -1/+19
  We need to make sure that we are sensibly dealing with vectors of types v2i64 and v2f64, even if most of the time we cannot generate native operations for them. This mostly adds a lot of testing, plus fixes up a couple of the issues found. And, or and xor can be legal for v2i64, and shifts combining needs a slight fixup.
  Differential Revision: https://reviews.llvm.org/D64316
  llvm-svn: 366106
* [WebAssembly] Assembler: recognize .init_array as data section. | Wouter van Oortmerssen | 2019-07-15 | 1 | -0/+3
  Reviewers: sbc100
  Subscribers: dschuff, jgravelle-google, aheejin, sunfish, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64602
  llvm-svn: 366104
* AMDGPU/GlobalISel: Widen vector extracts | Matt Arsenault | 2019-07-15 | 1 | -5/+8
  llvm-svn: 366103
* AMDGPU/GlobalISel: Handle llvm.amdgcn.if.break | Matt Arsenault | 2019-07-15 | 2 | -0/+32
  llvm-svn: 366102
* AMDGPU/GlobalISel: Select llvm.amdgcn.end.cf | Matt Arsenault | 2019-07-15 | 2 | -0/+19
  llvm-svn: 366099
* [x86] try to keep FP casted+truncated+extracted vector element out of GPRs | Sanjay Patel | 2019-07-15 | 1 | -0/+39
  inttofp (trunc (extelt X, 0)) --> inttofp (extelt (bitcast X), 0)
  We have pseudo-vectorization of scalar int to FP casts, so this tries to make that more likely by replacing a truncate with a bitcast. I didn't see any test diffs starting from 'uitofp', so I left that as a TODO. We can't only match the shorter trunc+extract pattern because there's an opposing transform somewhere, so we infinite loop. Waiting to try this during lowering is another possibility.
  A motivating case is shown in PR39975 and included in the test diffs here: https://bugs.llvm.org/show_bug.cgi?id=39975
  Differential Revision: https://reviews.llvm.org/D64710
  llvm-svn: 366098
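The equivalence behind this fold can be checked numerically. The following hedged Python sketch models a v2i64 vector as two signed 64-bit integers and assumes a little-endian lane layout, as on x86; the function names are hypothetical and nothing here is LLVM code.

```python
import struct


def trunc_of_extract(vec_i64):
    """trunc (extelt X, 0): take 64-bit lane 0 and keep its low 32 bits."""
    low = vec_i64[0] & 0xFFFFFFFF
    # Reinterpret the low 32 bits as a signed i32.
    return low - (1 << 32) if low & 0x80000000 else low


def extract_of_bitcast(vec_i64):
    """extelt (bitcast X), 0: view the same bytes as v4i32 and take lane 0."""
    raw = struct.pack("<2q", *vec_i64)    # v2i64, little-endian lanes
    as_i32 = struct.unpack("<4i", raw)    # v4i32 view of the same 16 bytes
    return as_i32[0]


def signed_int_to_fp(value):
    # Stand-in for the signed int-to-FP conversion applied afterwards.
    return float(value)
```

Both paths feed the same 32-bit integer to the conversion, so the result is unchanged while the value can stay in a vector register instead of passing through a general-purpose register.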
* [llvm-lib] Add a dependency to intrinsics_gen to the LLVMLibDriver build | Stella Stamenova | 2019-07-15 | 1 | -0/+3
  Summary: Occasionally the build of LLVMLibDriver will fail because Attributes.inc has not been generated yet. Add an explicit dependency, so that we can guarantee that the file has been generated before LLVMLibDriver is built.
  ##[error]llvm\include\llvm\IR\Attributes.h(73,0): Error C1083: Cannot open include file: 'llvm/IR/Attributes.inc': No such file or directory
  llvm\include\llvm/IR/Attributes.h(73): fatal error C1083: Cannot open include file: 'llvm/IR/Attributes.inc': No such file or directory [LLVMLibDriver.vcxproj]
  Reviewers: asmith
  Subscribers: mgorny, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64357
  llvm-svn: 366097
* [X86] Return UNDEF from LowerScalarImmediateShift when the shift amount is out of range. | Craig Topper | 2019-07-15 | 1 | -2/+5
  I think we only turn out of range shifts to undef when all elements are out of range or the shift amount is a splat out of range. I'm not sure which, I didn't check.
  During lowering we can split a shift where some elements are out of range into multiple shifts. This can create a new shift with a splat shift amount that is out of range. This patch returns undef for this case.
  Fixes PR42615.
  Differential Revision: https://reviews.llvm.org/D64699
  llvm-svn: 366096
* AMDGPU: Add 24-bit mul intrinsics | Matt Arsenault | 2019-07-15 | 2 | -0/+132
  Insert these during codegenprepare.
  This works around a DAG issue where generic combines eliminate the and asserting the high bits are zero, which then exposes an unknown read source to the mul combine. It isn't worth the hassle of trying to insert an AssertZext or something to try to deal with it.
  llvm-svn: 366094
* [AMDGPU] Copy missing predicate from pseudo to real | Stanislav Mekhanoshin | 2019-07-15 | 1 | -0/+1
  NFC at the moment, needed for future commit.
  Differential Revision: https://reviews.llvm.org/D64761
  llvm-svn: 366092
* [FunctionAttrs] Remove readonly and writeonly assertion | Johannes Doerfert | 2019-07-15 | 1 | -2/+5
  There are scenarios where mutually recursive functions may cause the SCC to contain both read only and write only functions. This removes an assertion when adding read attributes which caused a crash with the provided test case, and instead just doesn't add the attributes.
  Patch by Luke Lau <luke.lau@intel.com>
  Differential Revision: https://reviews.llvm.org/D60761
  llvm-svn: 366090
* [ARM] Minor formatting in ARMInstrMVE.td. NFC | David Green | 2019-07-15 | 1 | -34/+34
  llvm-svn: 366089
* AMDGPU/GlobalISel: Select easy cases for G_BUILD_VECTOR | Matt Arsenault | 2019-07-15 | 1 | -0/+4
  llvm-svn: 366087
* AMDGPU/GlobalISel: RegBankSelect for G_CONCAT_VECTORS | Matt Arsenault | 2019-07-15 | 1 | -1/+2
  llvm-svn: 366086
* [AMDGPU] fixed scheduler crash in gfx908 | Stanislav Mekhanoshin | 2019-07-15 | 1 | -2/+2
  For some reason the scheduler can send down an SUnit without an instruction.
  Differential Revision: https://reviews.llvm.org/D64709
  llvm-svn: 366074
* [AMDGPU][MC][GFX9][GFX10] Added support of GET_DOORBELL message | Dmitry Preobrazhensky | 2019-07-15 | 3 | -4/+10
  Reviewers: artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D64729
  llvm-svn: 366071
* [AMDGPU][MC] Corrected encoding of src0 for DS_GWS_* instructions | Dmitry Preobrazhensky | 2019-07-15 | 1 | -3/+5
  See bug 42599: https://bugs.llvm.org/show_bug.cgi?id=42599
  Reviewers: artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D64716
  llvm-svn: 366067
* [X86] isTargetShuffleEquivalent - assert the expected mask is correctly formed. NFCI. | Simon Pilgrim | 2019-07-15 | 1 | -0/+2
  While we don't make any assumptions about the actual mask, assert that the expected mask only contains valid mask element values.
  llvm-svn: 366066
* [mips] Remove "else-after-return". NFC | Simon Atanasyan | 2019-07-15 | 1 | -1/+1
  llvm-svn: 366064
* [ARM] MVE Vector Shifts | David Green | 2019-07-15 | 4 | -60/+117
  This adds basic lowering for MVE shifts. There are many shifts in MVE, but the instructions handled here are:
    VSHL (imm)
    VSHRu (imm)
    VSHRs (imm)
    VSHL (vector)
    VSHL (register)
  MVE, like NEON before it, doesn't have shift right by a vector (or register). We instead have to negate the amount and shift in the opposite direction. This means we have to convert any SHR's into a form of SHL (that is still signed or unsigned) with a negated condition and selecting from there. MVE still does have shifting by an immediate for SHL, ASR and LSR.
  This adds lowering for these and for register forms, which work well for shift lefts but may require an extra fold of neg(vdup(x)) -> vdup(neg(x)) to potentially work optimally for right shifts.
  Differential Revision: https://reviews.llvm.org/D64212
  llvm-svn: 366056
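The "negate the amount and shift in the opposite direction" idea can be sketched lane by lane. This is a hedged Python toy with hypothetical function names; it assumes unsigned lanes and shift amounts whose magnitude is smaller than the lane width, and it does not model the exact behaviour of the VSHL instruction.

```python
def shl_lane(value: int, amount: int, bits: int = 32) -> int:
    """Toy shift-left on an unsigned lane; a negative amount shifts right."""
    mask = (1 << bits) - 1
    value &= mask
    if amount >= 0:
        return (value << amount) & mask
    return value >> (-amount)


def lshr_via_shl(values, amounts, bits=32):
    # A logical right shift by a vector of amounts, expressed as a
    # left shift by the negated per-lane amounts.
    return [shl_lane(v, -a, bits) for v, a in zip(values, amounts)]
```

For example, `lshr_via_shl([16, 255, 1], [2, 4, 0])` shifts each lane right by its own amount using only the left-shift primitive.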
* [ARM] Move Shifts after Bits. NFC | David Green | 2019-07-15 | 1 | -564/+565
  This just moves the shift instruction definitions further down the ARMInstrMVE.td file, to make positioning patterns slightly more natural.
  llvm-svn: 366054
* [ARM] Adjust how NEON shifts are lowered | David Green | 2019-07-15 | 3 | -218/+287
  This adjusts the way that we lower NEON shifts to use a DAG target node, not via a neon intrinsic. This is useful for handling MVE shift operations in the same way. It also renames some of the immediate shift nodes for consistency, and moves some of the processing of immediate shifts into LowerShift allowing it to capture more cases.
  Differential Revision: https://reviews.llvm.org/D64426
  llvm-svn: 366051
* [Loop Peeling] Fix the bug with IDom setting for exit loops | Serguei Katkov | 2019-07-15 | 1 | -3/+18
  It is possible that a loop exit has two predecessors in the loop body. In this case, after peeling, the iDom of the exit should be a clone of the iDom of the original exit, but not a clone of a block coming to this exit.
  Reviewers: reames, fhahn
  Reviewed By: reames
  Subscribers: hiraditya, zzheng, llvm-commits
  Differential Revision: https://reviews.llvm.org/D64618
  llvm-svn: 366050
* [LoopVectorize] Pass unfiltered list of arguments to getIntrinsicInstCost. | Florian Hahn | 2019-07-15 | 1 | -5/+2
  We do not compute the scalarization overhead in getVectorIntrinsicCost and TTI::getIntrinsicInstrCost requires the full arguments list.
  llvm-svn: 366049
* [Loop Peeling] Enable peeling for loops with multiple exits | Serguei Katkov | 2019-07-15 | 2 | -1/+22
  This CL enables peeling of loops with multiple exits where one exit is from the latch and the others are basic blocks with a call to deopt.
  The peeling is enabled under a flag which is false by default.
  Reviewers: reames, mkuper, iajbar, fhahn
  Reviewed By: reames
  Subscribers: xbolva00, hiraditya, zzheng, llvm-commits
  Differential Revision: https://reviews.llvm.org/D63923
  llvm-svn: 366048
* [Attributor] Deduce "nonnull" attribute | Hideto Ueno | 2019-07-15 | 1 | -0/+284
  Summary: Porting nonnull attribute to attributor.
  Reviewers: jdoerfert, sstefan1
  Reviewed By: jdoerfert
  Subscribers: xbolva00, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63604
  llvm-svn: 366043
* [LoopUtils] Extend the scope of getLoopEstimatedTripCount | Serguei Katkov | 2019-07-15 | 1 | -6/+14
  With this patch the getLoopEstimatedTripCount function also accepts loops that have more than one exit, provided that all exits except the latch block end with a call to deopt.
  These side exits should not impact the estimated trip count.
  Reviewers: reames, mkuper, danielcdh
  Reviewed By: reames
  Subscribers: fhahn, lebedev.ri, hiraditya, llvm-commits
  Differential Revision: https://reviews.llvm.org/D64553
  llvm-svn: 366042
* Remove set but unused variable. | Bill Wendling | 2019-07-15 | 1 | -5/+1
  llvm-svn: 366041
* [LoopInfo] Introduce getUniqueNonLatchExitBlocks utility function | Serguei Katkov | 2019-07-15 | 1 | -13/+7
  Extract the code from LoopUnrollRuntime into a utility function to re-use it in D63923.
  Reviewers: reames, mkuper
  Reviewed By: reames
  Subscribers: fhahn, hiraditya, zzheng, dmgreen, llvm-commits
  Differential Revision: https://reviews.llvm.org/D64548
  llvm-svn: 366040
* [PowerPC] Support fp128 libcalls | Fangrui Song | 2019-07-15 | 1 | -0/+28
  On PowerPC, IEEE 754 quadruple-precision libcall names use "kf" instead of "tf". In libgcc, libgcc/config/rs6000/float128-sed converts TF names to KF names. This patch implements its 24 substitution rules.
  Reviewed By: hfinkel
  Differential Revision: https://reviews.llvm.org/D64282
  llvm-svn: 366039
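The renaming scheme can be pictured as a lookup table from generic "tf" names to PowerPC "kf" names. This is a hedged Python sketch: the two pairs shown are examples of the naming pattern assumed for illustration, the patch's actual 24 rules are not reproduced, and `ppc_fp128_libcall_name` is a hypothetical helper.

```python
# Illustrative subset only (assumed); the patch's 24 rules are not listed here.
TF_TO_KF = {
    "__addtf3": "__addkf3",
    "__multf3": "__mulkf3",
}


def ppc_fp128_libcall_name(generic_name: str) -> str:
    # Fall back to the generic name when no rename rule applies.
    return TF_TO_KF.get(generic_name, generic_name)
```

A table of explicit pairs, rather than a blanket "tf" to "kf" string replacement, avoids renaming routines that are outside the rule set.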
* [ValueTracking] Look through constant Int2Ptr/Ptr2Int expressions | Johannes Doerfert | 2019-07-15 | 1 | -0/+9
  Summary: This is analogous to the int2ptr/ptr2int instruction handling introduced in D54956.
  Reviewers: fhahn, efriedma, spatel, nlopes, sanjoy, lebedev.ri
  Subscribers: hiraditya, bollu, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64708
  llvm-svn: 366036
* [X86] Separate the memory size of vzext_load/vextract_store from the element size of the result type. Use them to improve the codegen of v2f32 loads/stores with sse1 only. | Craig Topper | 2019-07-15 | 4 | -119/+164
  Summary: SSE1 only supports v4f32. But does have instructions like movlps/movhps that load/store 64-bits of memory.
  This patch breaks the connection between the node VT of the vzext_load/vextract_store patterns and the memory VT. Enabling a v4f32 node with a 64-bit memory VT. I've used i64 as the memory VT here. I've written the PatFrag predicate to just check the store size not the specific VT. I think the VT will only matter for CSE purposes. We could use v2f32, but if we want to start using these operations in more places a simple integer type might make the most sense.
  I'd like to maybe use this same thing for SSE2 and later as well, but that will need more work to be supported by EltsFromConsecutiveLoads to avoid regressing lit tests. I'd maybe also like to combine bitcasts with these load/store nodes now that the types are disconnected. And I'd also like to consider canonicalizing (scalar_to_vector + load) to vzext_load.
  If you want I can split the mechanical tablegen stuff where I added the 32/64 off from the sse1 change.
  Reviewers: spatel, RKSimon
  Reviewed By: RKSimon
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64528
  llvm-svn: 366034
* [TargetParser][ARM] Account dependencies when processing target features | Alexandros Lamprineas | 2019-07-14 | 1 | -6/+20
  Teaches ARM::appendArchExtFeatures to account for dependencies when processing target features: i.e. when you say -march=armv8.1-m.main+mve.fp+nofp it means mve.fp should get discarded too. (Split from D63936)
  Differential Revision: https://reviews.llvm.org/D64048
  llvm-svn: 366031
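The `+mve.fp+nofp` example can be mimicked with a small dependency table. This is a hedged Python sketch, not the logic of ARM::appendArchExtFeatures: the `REQUIRES` table, the assumption that enabling an extension also enables what it requires, and the `apply_extensions` helper are all hypothetical, and only one level of dependency is handled.

```python
# Hypothetical dependency table: extension -> extensions it requires.
REQUIRES = {"mve.fp": {"fp"}}


def apply_extensions(modifiers):
    """Process 'ext' / 'noext' modifiers from left to right."""
    enabled = set()
    for mod in modifiers:
        if mod.startswith("no"):
            removed = mod[2:]
            enabled.discard(removed)
            # Also drop every extension that requires the removed one.
            enabled = {e for e in enabled
                       if removed not in REQUIRES.get(e, set())}
        else:
            enabled.add(mod)
            enabled |= REQUIRES.get(mod, set())
    return enabled
```

With this model, `apply_extensions(["mve.fp", "nofp"])` leaves nothing enabled, mirroring the statement that `mve.fp` is discarded once `fp` is turned off.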
* [LV] Exclude loop-invariant inputs from scalar cost computation. | Florian Hahn | 2019-07-14 | 1 | -22/+42
  Loop invariant operands do not need to be scalarized, as we are using the values outside the loop. We should ignore them when computing the scalarization overhead.
  Fixes PR41294
  Reviewers: hsaito, rengolin, dcaballe, Ayal
  Reviewed By: Ayal
  Differential Revision: https://reviews.llvm.org/D59995
  llvm-svn: 366030
* [clang][Driver][ARM] Favor -mfpu over default CPU features | Alexandros Lamprineas | 2019-07-14 | 1 | -24/+6
  When processing the command line options march, mcpu and mfpu, we store the implied target features on a vector. The change D62998 introduced a temporary vector, where the processed features get accumulated. When calling DecodeARMFeaturesFromCPU, which sets the default features for the specified CPU, we certainly don't want to override the features that have been explicitly specified on the command line. Therefore, the default features should appear first in the final vector. This problem became evident once I added the missing (unhandled) target features in ARM::getExtensionFeatures.
  Differential Revision: https://reviews.llvm.org/D63936
  llvm-svn: 366027
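Why the order of the feature vector matters can be shown with a toy resolver. This hedged Python sketch assumes that a later `+feature` or `-feature` entry overrides an earlier one for the same feature; `resolve_features` and the feature lists are hypothetical and are not taken from the driver.

```python
def resolve_features(features):
    """Later '+f' / '-f' entries override earlier ones for the same feature."""
    state = {}
    for entry in features:
        sign, name = entry[0], entry[1:]
        state[name] = (sign == "+")
    return state


cpu_defaults = ["+vfp4", "+neon"]   # hypothetical defaults implied by a CPU
command_line = ["-neon"]            # hypothetical result of an explicit -mfpu

defaults_first = resolve_features(cpu_defaults + command_line)
defaults_last = resolve_features(command_line + cpu_defaults)
```

Placing the defaults first lets the explicit choice win (`neon` ends up disabled), while placing them last silently re-enables it.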
* Recommit "[BitcodeReader] Validate OpNum, before accessing Record array." | Florian Hahn | 2019-07-14 | 1 | -0/+4
  This recommits r365750 (git commit 8b222ecf2769ee133691f208f6166ce118c4a164)
  Original message:
  Currently invalid bitcode files can cause a crash, when OpNum exceeds the number of elements in Record, like in the attached bitcode file.
  The test case was generated by clusterfuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=15698
  Reviewers: t.p.northover, thegameg, jfb
  Reviewed By: jfb
  Differential Revision: https://reviews.llvm.org/D64507
  llvm-svn: 365750
  llvm-svn: 366018
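The pattern of validating an index before using it can be sketched as follows. This is a hedged Python analogue with a hypothetical `read_operand` helper; the real reader is C++ and reports an error rather than returning `None`.

```python
def read_operand(record, op_num):
    """Return record[op_num], or None if the index is out of range."""
    # Validate the index first: malformed input may refer to more
    # operands than the record actually holds.
    if op_num < 0 or op_num >= len(record):
        return None  # stands in for reporting an invalid-record error
    return record[op_num]
```

Checking before the access turns a crash on malformed input into an ordinary, recoverable failure.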