summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* GlobalISel: Fix fewerElementsVector for ctlz with different result typeMatt Arsenault2019-02-202-3/+7
| | | | | | Also complete the set of related operations. llvm-svn: 354480
* GlobalISel: Implement moreElementsVector for g_insert resultsMatt Arsenault2019-02-202-14/+32
| | | | llvm-svn: 354477
* Re-land the refactoring part of r354244 "[DAGCombiner] Eliminate dead stores ↵Clement Courbet2019-02-202-35/+80
| | | | | | | | to stack." This is an NFC. llvm-svn: 354476
* [Hexagon] Split vector pairs for ISD::SIGN_EXTEND and ISD::ZERO_EXTENDKrzysztof Parzyszek2019-02-202-0/+7
| | | | llvm-svn: 354473
* [MCA][ResourceManager] Add a table that maps processor resource indices to ↵Andrea Di Biagio2019-02-201-21/+24
| | | | | | | | | | | | processor resource identifiers. This patch adds a lookup table to speed up resource queries in the ResourceManager. This patch also moves helper function 'getResourceStateIndex()' from ResourceManager.cpp to Support.h, so that we can reuse that logic in the SummaryView (and potentially other views in llvm-mca). No functional change intended. llvm-svn: 354470
* [InstSimplify] use any-zero matcher for fcmp foldsSanjay Patel2019-02-201-22/+25
| | | | | | | | | | | | | The m_APFloat matcher does not work with anything but strict splat vector constants, so we could miss these folds and then trigger an assertion in instcombine: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13201 The previous attempt at this in rL354406 had a logic bug that actually triggered a regression test failure, but I failed to notice it the first time. llvm-svn: 354467
* [MIPS MSA] Avoid some DAG combines for vector shiftsPetar Avramovic2019-02-202-0/+9
| | | | | | | | | | DAG combiner combines two shifts into shift + and with bitmask. Avoid such combines for vectors since leaving two vector shifts as they are produces better end results. Differential Revision: https://reviews.llvm.org/D58225 llvm-svn: 354461
* [Codegen] Remove dead flags on Physical Defs in machine cseDavid Green2019-02-201-19/+24
| | | | | | | | | | We may leave behind incorrect dead flags on instructions that are CSE'd. Make sure we remove the dead flags on physical registers to prevent other incorrect code motion. Differential Revision: https://reviews.llvm.org/D58115 llvm-svn: 354443
* [RegAllocGreedy] Take last chance recoloring into account in split and assignMikael Holmen2019-02-201-12/+16
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is a follow-up to r353988 where tryEvict was extended to take last chance recoloring into account. Now we do the same thing for trySplit and tryAssign. Now we always pass a "FixedRegisters" argument to canEvictInterference and tryEvict so it doesn't need to have a default value anymore. The need for this was found long ago in an out-of-tree target. Unfortunately I don't have a reproducer for an in-tree target. Reviewers: qcolombet, rudkx Reviewed By: qcolombet, rudkx Subscribers: rudkx, MatzeB, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58376 llvm-svn: 354439
* [NFC] add/modify wrapper function for findRegisterDefOperand().Chen Zheng2019-02-203-3/+4
| | | | llvm-svn: 354438
* [X86] Remove FeatureSlowIncDec from Sandy Bridge and later Intel Core CPUsCraig Topper2019-02-201-1/+0
| | | | | | | | | | | | | | | | | | | Summary: Inc and Dec were at one point slow on Intel CPUs due to their tendency to cause partial flag stalls on P6 derived CPU cores. This is because these instructions are defined to preserve the carry flag. This partial flag stall issue persisted until Sandy Bridge when flag merging was changed to be handled as a data dependency instead of as a stall until retirement. Sandy Bridge and later CPUs rename the C flag separately from OSPAZ so there is no flag merge needed on INC/DEC to preserve the C flag. Given these improvements I don't know why INC/DEC was ever considered slow on Sandy Bridge. If anything they should have been disabled on the earlier CPUs instead. Note after this patch, INC/DEC are still considered slow on Silvermont, Goldmont, Knights Landing and our generic "x86-64" CPU. Reviewers: spatel, RKSimon, chandlerc Reviewed By: chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D58412 llvm-svn: 354436
* Temporarily Revert "[X86][SLP] Enable SLP vectorization for 128-bit ↵Eric Christopher2019-02-202-8/+0
| | | | | | | | | | | | | | | | horizontal X86 instructions (add, sub)" As this has broken the lto bootstrap build for 3 days and is showing a significant regression on the Dither_benchmark results (from the LLVM benchmark suite) -- specifically, on the BENCHMARK_FLOYD_DITHER_128, BENCHMARK_FLOYD_DITHER_256, and BENCHMARK_FLOYD_DITHER_512; the others are unchanged. These have regressed by about 28% on Skylake, 34% on Haswell, and over 40% on Sandybridge. This reverts commit r353923. llvm-svn: 354434
* [RISCV] Implement pseudo instructions for load/store from a symbol address.Kito Cheng2019-02-205-14/+137
| | | | | | | | | | | | | Summary: Those pseudo-instructions are making load/store instructions able to load/store from/to a symbol, and its always using PC-relative addressing to generating a symbol address. Reviewers: asb, apazos, rogfer01, jrtc27 Differential Revision: https://reviews.llvm.org/D50496 llvm-svn: 354430
* [PowerPC] exploit P9 instruction maddld.Chen Zheng2019-02-202-4/+12
| | | | | | Differential Revision: https://reviews.llvm.org/D58364 llvm-svn: 354427
* [WebAssembly] Generalize section ordering constraintsThomas Lively2019-02-201-8/+61
| | | | | | | | | | | | | | | | | | | | | Summary: Changes from using a total ordering of known sections to using a dependency graph approach. This allows our tools to accept and process binaries that are compliant with the spec and tool conventions that would have been previously rejected. It also means our own tools can do less work to enforce an artificially imposed ordering. Using a general mechanism means fewer special cases and exceptions in the ordering logic. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58312 llvm-svn: 354426
* [WebAssembly] Refactor atomic operation definitions (NFC)Heejin Ahn2019-02-201-205/+226
| | | | | | | | | | | | | | | | | | | Summary: - Make `ATOMIC_I`, `ATOMIC_NRI`, `AtomicLoad`, `AtomicStore` classes and make other operations inherit from them - Factor the common opcode prefix '0xfe' out from the opcodes into the common class - Reorder instructions in the order of increasing opcodes Reviewers: tlively Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58338 llvm-svn: 354421
* [WebAssembly] Fix load/store name detection for atomic instructionsHeejin Ahn2019-02-202-7/+6
| | | | | | | | | | | | | | | | | | | | Summary: Fixed a bug in the routine in AsmParser that determines whether the current instruction is a load or a store. Atomic instructions' prefixes are not `atomic_` but `atomic.`, and all atomic instructions are also memory instructions. Also fixed the printing format of atomic instructions to match other memory instructions and added encoding tests for atomic instructions. Reviewers: aardappel, tlively Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58337 llvm-svn: 354419
* [WebAssembly] Fixed disassembler not knowing about OPERAND_EVENTWouter van Oortmerssen2019-02-201-0/+1
| | | | | | | | | | | | Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58414 llvm-svn: 354416
* [GVN] Small tweaks to comments, style, and missed vector handlingPhilip Reames2019-02-201-5/+5
| | | | | | Noticed these while doing a final sweep of the code to make sure I hadn't missed anything in my last couple of patches. The (minor) missed optimization was noticed because of the stylistic fix to avoid an overly specific cast. llvm-svn: 354412
* Revert "[InstSimplify] use any-zero matcher for fcmp folds"Sanjay Patel2019-02-201-25/+22
| | | | | | | This reverts commit 058bb8351351d56d2a4e8a772570231f9e5305e5. Forgot to update another test affected by this change. llvm-svn: 354408
* [GVN] Fix last crasher w/non-integral pointersPhilip Reames2019-02-201-2/+18
| | | | | | | | | | Same case as for memset and memcpy, but this time for clobbering stores and loads. We still can't allow coercion to or from non-integrals, regardless of the transform. Now that I'm done the whole little sequence, it seems apparent that we'd entirely missed reasoning about clobbers in the original GVN support for non-integral pointers. My appologies, I thought we'd upstreamed all of this, but it turns out we were still carrying a downstream hack which hid all of these issues. My chanks to Cherry Zhang for helping debug. llvm-svn: 354407
* [InstSimplify] use any-zero matcher for fcmp foldsSanjay Patel2019-02-201-22/+25
| | | | | | | | | The m_APFloat matcher does not work with anything but strict splat vector constants, so we could miss these folds and then trigger an assertion in instcombine: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13201 llvm-svn: 354406
* [GVN] Fix a crash bug w/non-integral pointers and memtransfersPhilip Reames2019-02-191-0/+5
| | | | | | Problem is very similiar to the one fixed for memsets in r354399, we try to coerce a value to non-integral type, and then crash while try to do so. Since we shouldn't be doing such coercions to start with, easy fix. From inspection, I see two other cases which look to be similiar and will follow up with most test cases and fixes if confirmed. llvm-svn: 354403
* [GVN] Fix a non-integral pointer bug w/vector typesPhilip Reames2019-02-191-2/+2
| | | | | | GVN generally doesn't forward structs or array types, but it *will* forward vector types to non-vectors and vice versa. As demonstrated in tests, we need to inhibit the same set of transforms for vector of non-integral pointers as for non-integral pointers themselves. llvm-svn: 354401
* [GVN] Fix a crash bug around non-integral pointersPhilip Reames2019-02-191-3/+16
| | | | | | If we encountered a location where we tried to forward the value of a memset to a load of a non-integral pointer, we crashed. Such a forward is not legal in general, but we can forward null pointers. Test for both cases are included. llvm-svn: 354399
* [WebAssembly] Update MC for bulk memoryThomas Lively2019-02-197-21/+97
| | | | | | | | | | | | | | | | | | Summary: Rename MemoryIndex to InitFlags and implement logic for determining data segment layout in ObjectYAML and MC. Also adds a "passive" flag for the .section assembler directive although this cannot be assembled yet because the assembler does not support data sections. Reviewers: sbc100, aardappel, aheejin, dschuff Subscribers: jgravelle-google, hiraditya, sunfish, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57938 llvm-svn: 354397
* [X86] Mark FP32_TO_INT16_IN_MEM/FP32_TO_INT32_IN_MEM/FP32_TO_INT64_IN_MEM as ↵Craig Topper2019-02-191-1/+3
| | | | | | | | | | | | | | clobbering EFLAGS to prevent mis-scheduling during conversion from SelectionDAG to MIR. After r354178, these instruction expand to a sequence that uses an OR instruction. That OR clobbers EFLAGS so we need to state that to avoid accidentally using the clobbered flags. Our tests show the bug, but I didn't notice because the SETcc instructions didn't move after r354178 since it used to be safe to do the fp->int conversion first. We should probably convert this whole sequence to SelectionDAG instead of a custom inserter to avoid mistakes like this. Fixes PR40779 llvm-svn: 354395
* [InstCombine] reduce even more unsigned saturated add with 'not' opSanjay Patel2019-02-191-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to use the sum in the icmp to allow matching with m_UAddWithOverflow and eliminate the 'not'. This is discussed in D51929 and is another step towards solving PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613 Name: uaddsat, -1 fval %notx = xor i32 %x, -1 %a = add i32 %x, %y %c = icmp ugt i32 %notx, %y %r = select i1 %c, i32 %a, i32 -1 => %a = add i32 %x, %y %c2 = icmp ugt i32 %y, %a %r = select i1 %c2, i32 -1, i32 %a Name: uaddsat, -1 fval + ult %notx = xor i32 %x, -1 %a = add i32 %x, %y %c = icmp ult i32 %y, %notx %r = select i1 %c, i32 %a, i32 -1 => %a = add i32 %x, %y %c2 = icmp ugt i32 %y, %a %r = select i1 %c2, i32 -1, i32 %a https://rise4fun.com/Alive/nTp llvm-svn: 354393
* [InstCombine] rearrange saturated add folds; NFCSanjay Patel2019-02-191-12/+14
| | | | | | | | | | This is no-functional-change-intended, but that was also true when it was part of rL354276, and I managed to lose 2 predicates for the fold with constant...causing much bot distress. So this time I'm adding a couple of negative tests to avoid that. llvm-svn: 354384
* PowerPC: Fix typos in commentsJinsong Ji2019-02-191-2/+2
| | | | llvm-svn: 354382
* [ConstantFold] Fix misfolding fcmp of a ConstantExpr NaN with itself.Andrew Scheidecker2019-02-191-3/+10
| | | | | | | | | | | The code incorrectly inferred that the relationship of a constant expression to itself is FCMP_OEQ (ordered and equal), when it's actually FCMP_UEQ (unordered *or* equal). This change corrects that, and adds some more limited folds that can be done in this case. Differential revision: https://reviews.llvm.org/D51216 llvm-svn: 354381
* [ConstantFold] Fix misfolding of icmp with a bitcast FP second operand.Andrew Scheidecker2019-02-191-2/+4
| | | | | | | | | In the process of trying to eliminate the bitcast, this was producing a malformed icmp with FP operands. Differential revision: https://reviews.llvm.org/D51215 llvm-svn: 354380
* [llvm-cov] Add support for gcov --hash-filenames optionVedant Kumar2019-02-191-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | The patch adds support for --hash-filenames to llvm-cov. This option adds md5 hash of the source path to the name of the generated .gcov file. The option is crucial for cases where you have multiple files with the same name but can't use --preserve-paths as resulting filenames exceed the limit. from gcov(1): ``` -x --hash-filenames By default, gcov uses the full pathname of the source files to to create an output filename. This can lead to long filenames that can overflow filesystem limits. This option creates names of the form source-file##md5.gcov, where the source-file component is the final filename part and the md5 component is calculated from the full mangled name that would have been used otherwise. ``` Patch by Igor Ignatev! Differential Revision: https://reviews.llvm.org/D58370 llvm-svn: 354379
* [X86] Don't consider functions ABI compatible for ArgumentPromotion pass if ↵Craig Topper2019-02-192-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | they view 512-bit vectors differently. The use of the -mprefer-vector-width=256 command line option mixed with functions using vector intrinsics can create situations where one function thinks 512 vectors are legal, but another fucntion does not. If a 512 bit vector is passed between them via a pointer, its possible ArgumentPromotion might try to pass by value instead. This will result in type legalization for the two functions handling the 512 bit vector differently leading to runtime failures. Had the 512 bit vector been passed by value from clang codegen, both functions would have been tagged with a min-legal-vector-width=512 function attribute. That would make them be legalized the same way. I observed this issue in 32-bit mode where a union containing a 512 bit vector was being passed by a function that used intrinsics to one that did not. The caller ended up passing in zmm0 and the callee tried to read it from ymm0 and ymm1. The fix implemented here is just to consider it a mismatch if two functions would handle 512 bit differently without looking at the types that are being considered. This is the easist and safest fix, but it can be improved in the future. Differential Revision: https://reviews.llvm.org/D58390 llvm-svn: 354376
* Annotate timeline in Instruments with passes and other timed regions.Daniel Sanders2019-02-193-0/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Instruments is a useful tool for finding performance issues in LLVM but it can be difficult to identify regions of interest on the timeline that we can use to filter the profiler or allocations instrument. Xcode 10 and the latest macOS/iOS/etc. added support for the os_signpost() API which allows us to annotate the timeline with information that's meaningful to LLVM. This patch causes timer start and end events to emit signposts. When used with -time-passes, this causes the passes to be annotated on the Instruments timeline. In addition to visually showing the duration of passes on the timeline, it also allows us to filter the profile and allocations instrument down to an individual pass allowing us to find the issues within that pass without being drowned out by the noise from other parts of the compiler. Using this in conjunction with the Time Profiler (in high frequency mode) and the Allocations instrument is how I found the SparseBitVector that should have been a BitVector and the DenseMap that could be replaced by a sorted vector a couple months ago. I added NamedRegionTimers to TableGen and used the resulting annotations to identify the slow portions of the Register Info Emitter. Some of these were placed according to educated guesses while others were placed according to hot functions from a previous profile. From there I filtered the profile to a slow portion and the aforementioned issues stood out in the profile. To use this feature enable LLVM_SUPPORT_XCODE_SIGNPOSTS in CMake and run the compiler under Instruments with -time-passes like so: instruments -t 'Time Profiler' bin/llc -time-passes -o - input.ll' Then open the resulting trace in Instruments. There was a talk at WWDC 2018 that explained the feature which can be found at https://developer.apple.com/videos/play/wwdc2018/405/ if you'd like to know more about it. Reviewers: bogner Reviewed By: bogner Subscribers: jdoerfert, mgorny, kristina, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D52954 llvm-svn: 354365
* [libObject][NFC] Use sys::path::convert_to_slash.Jordan Rupprecht2019-02-191-5/+1
| | | | | | | | | | | | | | | | Summary: As suggested in rL353995 Reviewers: compnerd Reviewed By: compnerd Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58298 llvm-svn: 354364
* [X86][SSE] Generalize X86ISD::BLENDI support to more value typesSimon Pilgrim2019-02-192-60/+63
| | | | | | | | | | | | | | | | | | D42042 introduced the ability for the ExecutionDomainFixPass to more easily change between BLENDPD/BLENDPS/PBLENDW as the domains required. With this ability, we can avoid most bitcasts/scaling in the DAG that was occurring with X86ISD::BLENDI lowering/combining, blend with the vXi32/vXi64 vectors directly and use isel patterns to lower to the float vector equivalent vectors. This helps the shuffle combining and SimplifyDemandedVectorElts be more aggressive as we lose track of fewer UNDEF elements than when we go up/down through bitcasts. I've introduced a basic blend(bitcast(x),bitcast(y)) -> bitcast(blend(x,y)) fold, there are more generalizations I can do there (e.g. widening/scaling and handling the tricky v16i16 repeated mask case). The vector-reduce-smin/smax regressions will be fixed in a future improvement to SimplifyDemandedBits to peek through bitcasts and support X86ISD::BLENDV. Reapplied after reversion at rL353699 - AVX2 isel fix was applied at rL354358, additional test at rL354360/rL354361 Differential Revision: https://reviews.llvm.org/D57888 llvm-svn: 354363
* [SDAG] Use shift amount type in MULO promotion; NFCNikita Popov2019-02-191-2/+4
| | | | | | | | | | Directly use the correct shift amount type if it is possible, and future-proof the code against vectors. The added test makes sure that bitwidths that do not fit into the shift amount type do not assert. Split out from D57997. llvm-svn: 354359
* [X86][AVX2] Hide VPBLENDD instructions behind AVX2 predicateSimon Pilgrim2019-02-191-0/+2
| | | | | | This was the cause of the regression in D57888 - the commuted load pattern wasn't hidden by the predicate so once we enabled v4i32 blends on SSE41+ targets then isel was incorrectly matched against AVX2+ instructions. llvm-svn: 354358
* [X86] Bugfix for nullptr check by klocworkCraig Topper2019-02-192-4/+5
| | | | | | | | | | klocwork critical issues in CG files: Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D58363 llvm-svn: 354357
* X86AsmParser AVX-512: Return error instead of hitting assertCraig Topper2019-02-191-0/+2
| | | | | | | | | | When parsing a sequence of tokens beginning with {, it will hit an assert and crash if the token afterwards is not an identifier. Instead of this, return a more verbose error as seen elsewhere in the function. Patch by Brandon Jones (BrandonTJones) Differential Revision: https://reviews.llvm.org/D57375 llvm-svn: 354356
* [X86] Filter out tuning feature flags and a few ISA feature flags when ↵Craig Topper2019-02-192-4/+57
| | | | | | | | | | | | | | | | checking for function inline compatibility. Tuning flags don't have any effect on the available instructions so aren't a good reason to prevent inlining. There are also some ISA flags that don't have any intrinsics our ABI requirements that we can exclude. I've put only the most basic ones like cmpxchg16b and lahfsahf. These are interesting because they aren't present in all 64-bit CPUs, but we have codegen workarounds when they aren't present. Loosening these checks can help with scenarios where a caller has a more specific CPU than a callee. The default tuning flags on our generic 'x86-64' CPU can currently make it inline compatible with other CPUs. I've also added an example test for 'nocona' and 'prescott' where 'nocona' is just a 64-bit capable version of 'prescott' but in 32-bit mode they should be completely compatible. I've based the implementation here of the similar code in AMDGPU. Differential Revision: https://reviews.llvm.org/D58371 llvm-svn: 354355
* GlobalISel: Implement moreElementsVector for selectMatt Arsenault2019-02-192-18/+21
| | | | llvm-svn: 354354
* GlobalISel: Implement moreElementsVector for G_EXTRACT sourceMatt Arsenault2019-02-192-0/+8
| | | | llvm-svn: 354348
* [X86][AVX] Update VBROADCAST folds to always use v2i64 X86vzloadSimon Pilgrim2019-02-192-3/+3
| | | | | | | | The VBROADCAST combines and SimplifyDemandedVectorElts improvements mean that we now more consistently use shorter (128-bit) X86vzload input operands. Follow up to D58053 llvm-svn: 354346
* GlobalISel: Implement moreElementsVector for bit opsMatt Arsenault2019-02-192-0/+60
| | | | llvm-svn: 354345
* [yaml2obj][obj2yaml] Remove section type range markers from allowed mappings ↵James Henderson2019-02-191-3/+1
| | | | | | | | | | | | | | | | | | | | | and support hex values yaml2obj/obj2yaml previously supported SHT_LOOS, SHT_HIOS, and SHT_LOPROC for section types. These are simply values that delineate a range and don't really make sense as valid values. For example if a section has type value 0x70000000, obj2yaml shouldn't print this value as SHT_LOPROC. Additionally, this was missing the three other range markers (SHT_HIPROC, SHT_LOUSER and SHT_HIUSER). This change removes these three range markers. It also adds support for specifying the type as an integer, to allow section types that LLVM doesn't know about. Reviewed by: grimar Differential Revision: https://reviews.llvm.org/D58383 llvm-svn: 354344
* Cast from SDValue directly instead of superfluous getNode(). NFCI.Simon Pilgrim2019-02-191-2/+2
| | | | llvm-svn: 354343
* GlobalISel: Verify g_insertMatt Arsenault2019-02-191-0/+24
| | | | llvm-svn: 354342
* [X86][AVX] EltsFromConsecutiveLoads - Add BROADCAST lowering supportSimon Pilgrim2019-02-191-3/+70
| | | | | | | | | | This patch adds scalar/subvector BROADCAST handling to EltsFromConsecutiveLoads. It mainly shows codegen changes to 32-bit code which failed to handle i64 loads, although 64-bit code is also using this new path to more efficiently combine to a broadcast load. Differential Revision: https://reviews.llvm.org/D58053 llvm-svn: 354340
OpenPOWER on IntegriCloud