path: root/llvm/lib
Commit message · Author · Date · Files · Lines (-/+)
* [x86] explicitly set cost of integer add/sub · Sanjay Patel · 2019-01-06 · 1 file · -0/+8
  There are no test changes here in the existing cost model regression tests because integer add/sub have a default legal cost of 1 already. This would break, however, if we custom lower those ops because the default cost model assumes that custom-lowered ops are more expensive. This is similar to the change in rL350403. See discussion in D56011 for more details. When we enhance that patch to handle integer ops, we need this cost model change to avoid unintended diffs here from the custom lowering.
  llvm-svn: 350496
* Resubmit rL345008 "Split MachinePipeliner code into header and cpp files" · Lama Saba · 2019-01-06 · 1 file · -595/+5
  Resubmitted in rL345290 and reverted in rL350345 due to failures in http://green.lab.llvm.org/green/job/lldb-cmake/. Resubmitting after a workaround to the lldb-cmake failure was committed in rL350346; more info in https://reviews.llvm.org/D56084.
  llvm-svn: 350493
* [LegalizeVectorOps] Add FSHL/FSHR to the list of vector operations that should be handled. · Craig Topper · 2019-01-06 · 1 file · -0/+2
  The FSHL/FSHR nodes are handled in the expand function, but they also need to be listed in the code that queries for the operation action.
  llvm-svn: 350490
* [X86][AsmParser] Don't allow X86::DX in CheckBaseRegAndIndexRegAndScale. · Craig Topper · 2019-01-05 · 1 file · -2/+1
  This was here because the out and in instructions allow '(%dx)' even though it's not a memory reference. To handle this we build a special operand for the DX register reference before we get to the call to CheckBaseRegAndIndexRegAndScale, so we no longer need this special case.
  llvm-svn: 350483
* [X86] Use two pmovmskbs in combineBitcastvxi1 for (i64 (bitcast (v64i1 (truncate (v64i8))))) on KNL. · Craig Topper · 2019-01-05 · 1 file · -2/+21
  llvm-svn: 350481
* [X86] Allow combineBitcastvxi1 to use pmovmskb on avx512 targets if the input is a truncate from v16i8/v32i8. · Craig Topper · 2019-01-05 · 1 file · -1/+9
  This is especially helpful on targets without avx512bw since we don't have a good way to convert from v16i8/v32i8 to v16i1/v32i1 for the truncate anyway. If we're just going to convert it to a GPR we might as well use pmovmskb to accomplish both.
  llvm-svn: 350480
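  A minimal IR sketch of the shape this combine targets (the function name and types here are illustrative, not taken from the patch):
  ```
  define i32 @mask_from_bytes(<32 x i8> %v) {
    ; One bit per byte. Without avx512bw there is no cheap v32i8 -> v32i1
    ; conversion, so selecting pmovmskb covers the truncate and the move
    ; to a GPR together. pmovmskb reads each byte's sign bit, so the
    ; lowering presumably has to move bit 0 into the sign bit first.
    %t = trunc <32 x i8> %v to <32 x i1>
    %b = bitcast <32 x i1> %t to i32
    ret i32 %b
  }
  ```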
* Added single use check to ShrinkDemandedConstant · Stanislav Mekhanoshin · 2019-01-05 · 1 file · -0/+3
  Fixes the cvt_f32_ubyte combine: performCvtF32UByteNCombine() could shrink the source node to its demanded bits only even when there were other uses.
  Differential Revision: https://reviews.llvm.org/D56289
  llvm-svn: 350475
* [X86] Allow LowerTRUNCATE to use PACKUS/PACKSS for v16i16->v16i8 truncate when -mprefer-vector-width=256 is in effect and BWI is not available. · Craig Topper · 2019-01-05 · 1 file · -1/+3
  llvm-svn: 350473
* [InstCombine] Relax cttz/ctlz with select on zero · Nikita Popov · 2019-01-05 · 1 file · -8/+15
  The cttz/ctlz intrinsics have a parameter specifying whether the result is undefined for zero. cttz(x, false) can be relaxed to cttz(x, true) if x is known non-zero, and in fact such an optimization is already performed. However, this currently doesn't work if x is non-zero as a result of a select rather than an explicit branch.
  This patch adds handling for this case, thus allowing x != 0 ? cttz(x, false) : y to simplify to x != 0 ? cttz(x, true) : y.
  Differential Revision: https://reviews.llvm.org/D55786
  llvm-svn: 350463
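  In IR terms, the pattern looks like this (a hedged sketch; the function name is illustrative):
  ```
  define i32 @src(i32 %x, i32 %y) {
    %nz = icmp ne i32 %x, 0
    ; is_zero_undef is false here because %x may be zero in general
    %tz = call i32 @llvm.cttz.i32(i32 %x, i1 false)
    %r  = select i1 %nz, i32 %tz, i32 %y
    ret i32 %r
  }

  declare i32 @llvm.cttz.i32(i32, i1)
  ```
  Since %tz is only selected when %nz proves %x non-zero, the call can be rewritten with the flag set to true, which typically lowers to a bare tzcnt/bsf with no zero guard.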
* [Inliner] Optimize shouldBeDeferred · Easwaran Raman · 2019-01-05 · 1 file · -5/+16
  This has some minor optimizations to shouldBeDeferred. This is not strictly NFC because the early exit inside the loop assumes TotalSecondaryCost is monotonically non-decreasing, which is not true if the threshold used by CostAnalyzer is negative. AFAICT the thresholds do not go below 0 for the default values of the various options we use.
  llvm-svn: 350456
* [X86] Require second operand of X86vshiftuniform to be an integer. NFC · Craig Topper · 2019-01-05 · 1 file · -1/+1
  We don't need to require the first operand to be an integer because we already said it was the same type as the result, which we also constrained to an integer.
  llvm-svn: 350455
* Revert "Revert "[hwasan] Android: Switch from TLS_SLOT_TSAN(8) to ↵Evgeniy Stepanov2019-01-051-1/+3
| | | | | | | | TLS_SLOT_SANITIZER(6)"" This reapplies commit r348983. llvm-svn: 350448
* [PGO] Use SourceFileName rather than module name in PGOFuncName · Rong Xu · 2019-01-04 · 1 file · -5/+6
  In LTO or ThinLTO mode (through the linker plugin), the module names are temp file names, which differ between compilations. Using SourceFileName avoids the issue. This should not change any functionality for current PGO, as all the current callers of getPGOFuncName() run before LTO.
  llvm-svn: 350442
* [X86] Fix warning; NFC · Nikita Popov · 2019-01-04 · 1 file · -1/+1
  llvm-svn: 350437
* Update the pr_datasz of the .note.gnu.property section. · Vyacheslav Zakharin · 2019-01-04 · 1 file · -3/+3
  Patch by Xiang Zhang.
  Differential Revision: https://reviews.llvm.org/D56080
  llvm-svn: 350436
* [BDCE] Remove dead uses of arguments · Nikita Popov · 2019-01-04 · 2 files · -43/+48
  In addition to finding dead uses of instructions, also find dead uses of function arguments, and replace them with zero as well.
  I'm changing the way the known bits are computed here to remove the coupling between the transfer function and the algorithm. It previously relied on the first op being visited first and computing known bits -- unless the first op is not an instruction, in which case they're computed on the second op. I could have adjusted this to check for "instruction or argument", but I think it's better to avoid the repeated calculation with an explicit flag.
  Differential Revision: https://reviews.llvm.org/D56247
  llvm-svn: 350435
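  A deliberately tiny sketch of a dead argument use (illustrative only; in real code the dead bits usually come from a longer chain of instructions rather than a single constant):
  ```
  define i32 @f(i32 %a) {
    ; Every bit of this use of %a is dead: or'ing with all-ones fixes the
    ; result regardless of %a, so the use can be replaced with zero.
    %v = or i32 %a, -1
    ret i32 %v
  }
  ```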
* [AArch64] Adjust the cost model for Exynos M3 · Evandro Menezes · 2019-01-04 · 1 file · -38/+39
  Improve the modeling of ASIMD loads and stores.
  llvm-svn: 350434
* [X86] Add INSERT_SUBVECTOR to ComputeNumSignBits · Craig Topper · 2019-01-04 · 1 file · -1/+35
  This adds support for calculating the sign bits of insert_subvector. I based it on the computeKnownBits implementation. My motivating case is propagating sign-bit information across basic blocks on AVX targets, where concatenating using insert_subvector is common.
  Differential Revision: https://reviews.llvm.org/D56283
  llvm-svn: 350432
* hwasan: Implement lazy thread initialization for the interceptor ABI. · Peter Collingbourne · 2019-01-04 · 1 file · -2/+27
  The problem is similar to D55986 but for threads: a process with the interceptor hwasan library loaded might have some threads started by instrumented libraries and some by uninstrumented libraries, and we need to be able to run instrumented code on the latter.
  The solution is to perform per-thread initialization lazily. If a function needs to access shadow memory or add itself to the per-thread ring buffer, its prologue checks to see whether the value in the sanitizer TLS slot is null, and if so it calls __hwasan_thread_enter and reloads from the TLS slot. The runtime does the same thing if it needs to access this data structure.
  This change means that the code generator needs to know whether we are targeting the interceptor runtime, since we don't want to pay the cost of lazy initialization when targeting a platform with native hwasan support. A flag -fsanitize-hwaddress-abi={interceptor,platform} has been introduced for selecting the runtime ABI to target. The default ABI is set to interceptor since it's assumed that it will be more common that users will be compiling application code than platform code.
  Because we can no longer assume that the TLS slot is initialized, the pthread_create interceptor is no longer necessary, so it has been removed.
  Ideally, lazy initialization should only cost one instruction in the hot path, but at present the call may cause us to spill arguments to the stack, which means more instructions in the hot path (or theoretically in the cold path if the spills are moved with shrink wrapping). With an appropriately chosen calling convention for the per-thread initialization function (TODO) the hot path should always need just one instruction and the cold path should need two instructions with no spilling required.
  Differential Revision: https://reviews.llvm.org/D56038
  llvm-svn: 350429
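  The lazy check has roughly this shape (an IR-flavored sketch under stated assumptions: @read_sanitizer_tls_slot is a hypothetical stand-in for the inline TLS-slot read the real prologue emits):
  ```
  define void @instrumented_function() {
  entry:
    %slot = call i64 @read_sanitizer_tls_slot()
    %uninit = icmp eq i64 %slot, 0
    br i1 %uninit, label %init, label %cont

  init:
    ; cold path: set up this thread's shadow/ring-buffer state
    call void @__hwasan_thread_enter()
    br label %cont

  cont:
    ; hot path continues; the slot can now be reloaded and trusted
    ret void
  }

  declare i64 @read_sanitizer_tls_slot()
  declare void @__hwasan_thread_enter()
  ```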
* [ThinLTO] Handle chains of aliases · Teresa Johnson · 2019-01-04 · 6 files · -7/+121
  At -O0, globalopt is not run during the compile step, and we can have a chain of an alias having an immediate aliasee of another alias. The summaries are constructed assuming aliases in a canonical form (flattened chains), and as a result only the base object but no intermediate aliases were preserved. Fix by adding a pass that canonicalizes aliases, which ensures each alias is a direct alias of the base object.
  Reviewers: pcc, davidxl
  Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
  Differential Revision: https://reviews.llvm.org/D54507
  llvm-svn: 350423
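  A hedged sketch of the shape being canonicalized (names are illustrative):
  ```
  ; before: @a2's immediate aliasee is itself an alias
  @obj = global i32 0
  @a1 = alias i32, i32* @obj
  @a2 = alias i32, i32* @a1

  ; after canonicalization, each alias points directly at the base object:
  ;   @a2 = alias i32, i32* @obj
  ```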
* [x86] lower extracted fadd/fsub to horizontal vector math; 2nd try · Sanjay Patel · 2019-01-04 · 1 file · -0/+63
  The 1st try for this was at rL350369, but it caused IR-level diffs because our cost models differentiate custom vs. legal/promote lowering. So that was reverted at rL350373. The cost models were fixed independently at rL350403, so this is effectively the same patch as last time.
  Original commit message:
  This would show up if we fix horizontal reductions to narrow as they go along, but it's an improvement for size and/or Jaguar (fast-hops) independent of that. We need to do this late to not interfere with other pattern matching of larger horizontal sequences. We can extend this to integer ops in a follow-up patch.
  Differential Revision: https://reviews.llvm.org/D56011
  llvm-svn: 350421
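  A sketch of the kind of extract-of-fadd this targets (illustrative IR; the exact patterns matched live in the X86 lowering code):
  ```
  define float @hadd_lanes_01(<4 x float> %x) {
    ; computes %x[0] + %x[1], i.e. one lane of a horizontal add
    %shuf = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
    %sum  = fadd <4 x float> %x, %shuf
    %r    = extractelement <4 x float> %sum, i32 0
    ret float %r
  }
  ```
  On fast-hops targets such as Jaguar this can now select haddps directly instead of a shuffle plus a full-width addps.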
* [CodeExtractor] Do not extract unsafe lifetime markers · Vedant Kumar · 2019-01-04 · 1 file · -10/+91
  Lifetime markers which reference inputs to the extraction region are not safe to extract. Example ('rhs' will be extracted):
  ```
  entry:
        +------------+
        | x = alloca |
        | y = alloca |
        +------------+
         /           \
  lhs:                    rhs:
  +-------------------+   +-------------------+
  | lifetime_start(x) |   | lifetime_start(x) |
  | use(x)            |   | lifetime_start(y) |
  | lifetime_end(x)   |   | use(x, y)         |
  | lifetime_start(y) |   | lifetime_end(y)   |
  | use(y)            |   | lifetime_end(x)   |
  | lifetime_end(y)   |   +-------------------+
  +-------------------+
  ```
  Prior to extraction, the stack coloring pass sees that the slots for 'x' and 'y' are in-use at the same time. After extraction, the coloring pass infers that 'x' and 'y' are *not* in-use concurrently, because markers from 'rhs' are no longer available to help decide otherwise.
  This leads to a miscompile, because the stack slots actually are in-use concurrently in the extracted function.
  Fix this by moving lifetime start/end markers for memory regions defined in the calling function around the call to the extracted function.
  Fixes llvm.org/PR39671 (rdar://45939472).
  Differential Revision: https://reviews.llvm.org/D55967
  llvm-svn: 350420
* [InstCombine] reduce raw IR narrowing rotate patterns to funnel shift · Sanjay Patel · 2019-01-04 · 1 file · -16/+8
  Similar to rL350199 - there are no known analysis/codegen holes for funnel shift intrinsics now, so we can canonicalize the 6+ regular instructions to funnel shift to improve vectorization, inlining, unrolling, etc.
  llvm-svn: 350419
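  For reference, a plain rotate written out in raw IR and its funnel-shift canonical form; this is a sketch of the general pattern family, with the truncation of the narrowing variants omitted for brevity:
  ```
  define i32 @rotl(i32 %x, i32 %s) {
    %neg  = sub i32 0, %s
    %lamt = and i32 %s, 31
    %ramt = and i32 %neg, 31
    %hi   = shl i32 %x, %lamt
    %lo   = lshr i32 %x, %ramt
    %rot  = or i32 %hi, %lo      ; 6 instructions for one rotate
    ret i32 %rot
  }

  ; canonical form: a single intrinsic call
  ;   %rot = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 %s)
  ```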
* [LICM] Adjust how moving the re-hoist point works · John Brawn · 2019-01-04 · 1 file · -3/+4
  In some cases the order that we hoist instructions in means that when rehoisting (which uses the same order as hoisting) we can rehoist to a block A, then a block B, then block A again. This currently causes an assertion failure, as it expects that when changing the hoist point it only ever moves to a block that dominates the hoist point being moved from.
  Fix this by moving the re-hoist point when it doesn't dominate the dominator of the hoisted instruction, or in other words when it wouldn't dominate the uses of the instruction being rehoisted.
  Differential Revision: https://reviews.llvm.org/D55266
  llvm-svn: 350408
* Undo r350355 "[X86] Remove terrible DX Register parsing hack in parse operand. NFCI." · Nirav Dave · 2019-01-04 · 2 files · -5/+18
  Add missing test case and update comments.
  llvm-svn: 350406
* [CostModel][X86] Fix SSE1 FADD/FSUB costs · Simon Pilgrim · 2019-01-04 · 1 file · -0/+12
  Noticed in D56011 - handle the case where scalar fp ops are quicker on P3 than on P4.
  Add the other costs so that we're not relying on the default "is legal/custom" cost logic.
  llvm-svn: 350403
* Revert patches 348835 and 348571 because they're causing code size performance regressions. · Ranjeet Singh · 2019-01-04 · 1 file · -5/+4
  llvm-svn: 350402
* [X86] Add VPSLLI/VPSRLI ((X >>u C1) << C2) SimplifyDemandedBits combine · Simon Pilgrim · 2019-01-04 · 1 file · -3/+26
  Repeat of the generic SimplifyDemandedBits shift combine.
  llvm-svn: 350399
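  The generic identity being replicated for the X86-specific VPSLLI/VPSRLI nodes, written in IR for illustration:
  ```
  ; with equal shift amounts, the shift pair just clears the low bits:
  define <4 x i32> @clear_low4(<4 x i32> %x) {
    %a = lshr <4 x i32> %x, <i32 4, i32 4, i32 4, i32 4>
    %b = shl  <4 x i32> %a, <i32 4, i32 4, i32 4, i32 4>
    ; equivalent to: and <4 x i32> %x, <i32 -16, i32 -16, i32 -16, i32 -16>
    ret <4 x i32> %b
  }
  ```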
* [MCA] Improved handling of in-order issue/dispatch resources. · Andrea Di Biagio · 2019-01-04 · 3 files · -21/+15
  Added field 'MustIssueImmediately' to the instruction descriptor of instructions that only consume in-order issue/dispatch processor resources. This speeds up queries from the hardware Scheduler, and gives an average ~5% speedup on a release build.
  No functional change intended.
  llvm-svn: 350397
* [ValueTracking] Fix a misuse of APInt in GetPointerBaseWithConstantOffset · Florian Hahn · 2019-01-04 · 1 file · -1/+8
  GetPointerBaseWithConstantOffset includes this code, where ByteOffset and GEPOffset are both of type llvm::APInt:
    ByteOffset += GEPOffset.getSExtValue();
  The problem with this line is that getSExtValue() returns an int64_t, but the += matches an overload for uint64_t, so the resulting APInt is no longer considered to be signed. That in turn causes assertion failures later on if the relevant pointer type is > 64 bits in width and the GEPOffset was negative.
  Changing it to
    ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth());
  resolves the issue and explicitly performs the sign extension or truncation. Additionally, instead of asserting later if the result is > 64 bits, it breaks out of the loop in that case.
  See also https://reviews.llvm.org/D24729 and https://reviews.llvm.org/D24772.
  This commit must be merged after D38662 in order for the test to pass.
  Patch by Michael Ferguson <mpfergu@gmail.com>.
  Reviewers: reames, sanjoy, hfinkel
  Reviewed By: hfinkel
  Differential Revision: https://reviews.llvm.org/D38501
  llvm-svn: 350395
* [MCA] Store extra information about processor resources in the ResourceManager. · Andrea Di Biagio · 2019-01-04 · 1 file · -14/+32
  Method ResourceManager::use() is responsible for updating the internal state of used processor resources, as well as notifying resource groups that contain used resources.
  Before this patch, method 'use()' didn't know how to quickly obtain the set of groups that contain a particular resource unit. It had to discover groups by performing a potentially slow search (done by iterating over the set of processor resource descriptors). With this patch, the relationship between resource units and groups is stored in the ResourceManager. That means method 'use()' no longer has to search for groups. This gives an average speedup of ~4-5% on a release build.
  This patch also adds extra code comments in ResourceManager.h to better describe the resource mask layout, and how resource indices are computed from resource masks.
  llvm-svn: 350387
* [WebAssembly] Split the checking from the sorting logic. · Richard Trieu · 2019-01-04 · 1 file · -2/+13
  Move the check for -1 and identical values outside the vector sorting code. Compare functions need to be able to compare identical elements to be conforming.
  llvm-svn: 350379
* [memcpyopt] Remove a few unnecessary isVolatile() checks. NFC · Xin Tong · 2019-01-04 · 1 file · -6/+4
  We already checked for isSimple() on the store.
  llvm-svn: 350378
* [X86] Add post-isel peephole to fold KAND+KORTEST into KTEST if only the zero flag is used. · Craig Topper · 2019-01-04 · 1 file · -0/+35
  Doing this late so we will prefer to fold the AND into a masked comparison first. That can be better for the live range of the mask register.
  Differential Revision: https://reviews.llvm.org/D56246
  llvm-svn: 350374
* revert r350369: [x86] lower extracted fadd/fsub to horizontal vector math · Sanjay Patel · 2019-01-04 · 1 file · -63/+0
  There are non-codegen tests that need to be updated with this code change.
  llvm-svn: 350373
* [x86] lower extracted fadd/fsub to horizontal vector math · Sanjay Patel · 2019-01-03 · 1 file · -0/+63
  This would show up if we fix horizontal reductions to narrow as they go along, but it's an improvement for size and/or Jaguar (fast-hops) independent of that. We need to do this late to not interfere with other pattern matching of larger horizontal sequences. We can extend this to integer ops in a follow-up patch.
  Differential Revision: https://reviews.llvm.org/D56011
  llvm-svn: 350369
* [WebAssembly] Optimize Irreducible Control Flow · Heejin Ahn · 2019-01-03 · 1 file · -157/+276
  Summary: Irreducible control flow is not that rare, e.g. it happens in malloc and 3 other places in the libc portions linked in to a hello world program. This patch improves how we handle that code: it emits a br_table to dispatch to only the minimal necessary number of blocks. This reduces the size of malloc by 33%, and makes it comparable in size to asm2wasm's malloc output.
  Added some tests, and verified this passes the emscripten-wasm tests run on the waterfall (binaryen2, wasmobj2, other).
  Reviewers: aheejin, sunfish
  Subscribers: mgrang, jgravelle-google, sbc100, dschuff, llvm-commits
  Differential Revision: https://reviews.llvm.org/D55467
  Patch by Alon Zakai (kripken)
  llvm-svn: 350367
* [WebAssembly] Fixed disassembler not knowing about new brlist operand · Wouter van Oortmerssen · 2019-01-03 · 3 files · -1/+21
  Summary: The previously introduced new operand type for br_table didn't have a disassembler implementation, causing an assert.
  Reviewers: dschuff, aheejin
  Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
  Differential Revision: https://reviews.llvm.org/D56227
  llvm-svn: 350366
* [WebAssembly] Made InstPrinter more robust · Wouter van Oortmerssen · 2019-01-03 · 3 files · -59/+68
  Summary: Instead of asserting on certain kinds of malformed instructions, it now still prints, but adds an annotation indicating the problem, and/or indicates invalid_type etc. We're using the InstPrinter from many contexts that can't always guarantee values are within range (e.g. the disassembler), where having output is more valuable than asserting.
  Reviewers: dschuff, aheejin
  Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
  Differential Revision: https://reviews.llvm.org/D56223
  llvm-svn: 350365
* [X86] Remove terrible DX Register parsing hack in parse operand. NFCI. · Nirav Dave · 2019-01-03 · 2 files · -18/+5
  Fold the special-case hack for (%dx) operand parsing into the related hack for out*/in* instruction parsing.
  llvm-svn: 350355
* [DAGCombiner][x86] scalarize binop followed by extractelement · Sanjay Patel · 2019-01-03 · 3 files · -5/+61
  As noted in PR39973 (https://bugs.llvm.org/show_bug.cgi?id=39973) and D55558, this is a partial implementation of a fold that we do as an IR canonicalization in instcombine:
    // extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)
  We want to have this in the DAG too because as we can see in some of the test diffs (reductions), the pattern may not be visible in IR. Given that this is already an IR canonicalization, any backend that would prefer a vector op over a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's a realistic expectation though).
  The transform is limited with a TLI hook because there's an existing transform in CodeGenPrepare that tries to do the opposite transform.
  Differential Revision: https://reviews.llvm.org/D55722
  llvm-svn: 350354
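  In concrete IR terms (shown for readability; the new fold itself operates on the equivalent DAG nodes):
  ```
  define i32 @before(<4 x i32> %x, <4 x i32> %y) {
    %v = add <4 x i32> %x, %y
    %e = extractelement <4 x i32> %v, i32 0
    ret i32 %e
  }

  define i32 @after(<4 x i32> %x, <4 x i32> %y) {
    %x0 = extractelement <4 x i32> %x, i32 0
    %y0 = extractelement <4 x i32> %y, i32 0
    %e  = add i32 %x0, %y0
    ret i32 %e
  }
  ```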
* [AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression. · Alexander Timofeev · 2019-01-03 · 1 file · -3/+7
  Detailed description: SIFoldOperands::foldInstOperand iterates over the operand uses, calling a function that changes def-use iterators on the way. As a result, the loop exits immediately when the def-use iterator is changed. Hence, the operand is folded to the very first use instruction only. This makes the VGPR live along the whole basic block and increases register pressure significantly. The performance drop observed in the SHOC DeviceMemory test is caused by this bug.
  Proposed fix: collect the uses into a separate container for further processing in another loop.
  Testing: make check-llvm, SHOC performance test.
  Reviewers: rampitec, ronlieb
  Differential Revision: https://reviews.llvm.org/D56161
  llvm-svn: 350350
* [UnrollRuntime] Move the DomTree verification under expensive checks · Anna Thomas · 2019-01-03 · 1 file · -1/+1
  Suggested by Hal as done in r349871.
  llvm-svn: 350349
* Revert "Resubmit rL345008 "Split MachinePipeliner code into header and cpp ↵Stefan Granitz2019-01-031-5/+595
| | | | | | | | files"" This reverts commit r350290. llvm-svn: 350345
* [MCStreamer] Use report_fatal_error in EmitRawTextImpl · Kristina Brooks · 2019-01-03 · 1 file · -7/+8
  Use report_fatal_error in MCStreamer::EmitRawTextImpl instead of using errs() and explain the rationale behind it not being llvm_unreachable() to save confusion for any future maintainers.
  Differential Revision: https://reviews.llvm.org/D56245
  llvm-svn: 350342
* [UnrollRuntime] Add DomTree verification under debug mode · Anna Thomas · 2019-01-03 · 1 file · -0/+6
  NFC: This adds the dom tree verification under debug mode at a point just before we start unrolling the loop. This allows us to verify the dom tree at a state where it is much smaller, before the unrolling actually happens. This also means we do not need to run -verify-dom-info every time to see if the DT is in a valid state when we transform the loop for runtime unrolling.
  llvm-svn: 350334
* [AArch64] Add new scheduling predicates · Evandro Menezes · 2019-01-03 · 1 file · -31/+86
  Add new scheduling predicates to identify the ASIMD loads and stores using the post-indexed addressing mode.
  llvm-svn: 350332
* [MCA] Improve code comment and reuse a helper function in ResourceManager. NFCI · Andrea Di Biagio · 2019-01-03 · 1 file · -9/+10
  llvm-svn: 350322
* [RISCV][MC] Accept %lo and %pcrel_lo on operands to li · Alex Bradbury · 2019-01-03 · 2 files · -6/+20
  This matches GNU assembler behaviour.
  llvm-svn: 350321
* [NewPM] Port Msan · Philip Pfaffe · 2019-01-03 · 5 files · -96/+146
  Summary: Keeping msan a function pass requires replacing the module level initialization: that means, don't define a ctor function which calls __msan_init, instead just declare the init function at the first access, and add that to the global ctors list.
  Changes:
  - Pull the actual sanitizer and the wrapper pass apart.
  - Add a newpm msan pass. The function pass inserts calls to runtime library functions, for which it inserts declarations as necessary.
  - Update tests.
  Caveats:
  - There is one test that I dropped, because it specifically tested the definition of the ctor.
  Reviewers: chandlerc, fedor.sergeev, leonardchan, vitalybuka
  Subscribers: sdardis, nemanjai, javed.absar, hiraditya, kbarton, bollu, atanasyan, jsji
  Differential Revision: https://reviews.llvm.org/D55647
  llvm-svn: 350305