path: root/llvm/lib
...
* [X86] Use 128-bit blends instead of vmovss/vmovsd for 512-bit vzmovl patterns to match AVX. (Craig Topper, 2018-07-15, 1 file, -12/+39)
  llvm-svn: 337135
* [X86] Use 128-bit ops for 256-bit vzmovl patterns. (Craig Topper, 2018-07-15, 1 file, -10/+17)
  128-bit ops implicitly zero the upper bits. This should address the comment about domain crossing for the integer version without AVX2, since we can use a 128-bit VBLENDW without AVX2. The only bad thing I see here is that we failed to reuse a vxorps in some of the tests, but I think that's already a known issue.
  llvm-svn: 337134
* [DAGCombiner] fix typo in comment; NFC (Sanjay Patel, 2018-07-15, 1 file, -1/+1)
  llvm-svn: 337132
* [InstCombine] Corrections in comments for division transformation (NFC) (Sanjay Patel, 2018-07-15, 1 file, -3/+3)
  The actual code seems to be correct, but the comments were misleading.
  Patch by Aaron Puchert!
  Differential Revision: https://reviews.llvm.org/D49276
  llvm-svn: 337131
* [DAGCombiner] extend(ifpositive(X)) -> shift-right (not X) (Sanjay Patel, 2018-07-15, 1 file, -0/+37)
  This is almost the same as an existing IR canonicalization in instcombine, so I'm assuming this is a good early generic DAG combine too.
  The motivation comes from reduced bit-hacking for select-of-constants in IR after rL331486. We want to restore that functionality in the DAG as noted in the commit comments for that change and the llvm-dev discussion here:
  http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html
  The PPC and AArch tests show that those targets are already doing something similar. x86 will be neutral in the minimal case and generally better when this pattern is extended with other ops as shown in the signbit-shift.ll tests.
  Note the asymmetry: we don't include the (extend (ifneg X)) transform because it already exists in SimplifySelectCC(), and that is verified in the later unchanged tests in the signbit-shift.ll files. Without the 'not' op, the general transform to use a shift is always a win because that's a single instruction.
  Alive proofs: https://rise4fun.com/Alive/ysli
    Name: if pos, get -1
    %c = icmp sgt i16 %x, -1
    %r = sext i1 %c to i16
    =>
    %n = xor i16 %x, -1
    %r = ashr i16 %n, 15

    Name: if pos, get 1
    %c = icmp sgt i16 %x, -1
    %r = zext i1 %c to i16
    =>
    %n = xor i16 %x, -1
    %r = lshr i16 %n, 15
  Differential Revision: https://reviews.llvm.org/D48970
  llvm-svn: 337130
* [InstSimplify] fold minnum/maxnum with NaN arg (Sanjay Patel, 2018-07-15, 1 file, -0/+8)
  This fold is repeated/misplaced in instcombine, but I'm not sure if it's safe to remove that yet because some other folds appear to be asserting that the transform has occurred within instcombine itself.
  This isn't the best fix for PR37776, but it probably hides the bug with the given code example:
  https://bugs.llvm.org/show_bug.cgi?id=37776
  We have another test to demonstrate the more general bug.
  llvm-svn: 337127
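  As an illustration of the kind of input this fold handles, here is a minimal IR sketch (a hypothetical function, not taken from the commit's tests); llvm.minnum follows IEEE-754 minNum semantics, so a quiet-NaN argument is ignored and InstSimplify can return the other operand directly:

    define float @minnum_with_nan_arg(float %x) {
      ; 0x7FF8000000000000 is a quiet NaN, so minnum(NaN, %x) simplifies to %x
      %r = call float @llvm.minnum.f32(float 0x7FF8000000000000, float %x)
      ret float %r
    }
    declare float @llvm.minnum.f32(float, float)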
* [llvm-mca][BtVer2] teach how to identify false dependencies on partially written registers. (Andrea Di Biagio, 2018-07-15, 1 file, -1/+8)
  The goal of this patch is to improve the throughput analysis in llvm-mca for the case where instructions perform partial register writes.
  On x86, partial register writes are quite difficult to model, mainly because different processors tend to implement different register merging schemes in hardware. When the code contains partial register writes, the IPC (instructions per cycle) estimated by llvm-mca tends to diverge quite significantly from the observed IPC (using perf).
  Modern AMD processors (at least, from Bulldozer onwards) don't rename partial registers. Quoting Agner Fog's microarchitecture.pdf:
  "The processor always keeps the different parts of an integer register together. For example, AL and AH are not treated as independent by the out-of-order execution mechanism. An instruction that writes to part of a register will therefore have a false dependence on any previous write to the same register or any part of it."
  This patch is a first important step towards improving the analysis of partial register updates. It changes the semantics of RegisterFile descriptors in tablegen, and teaches llvm-mca how to identify false dependences in the presence of partial register writes (for more details, see the new code comments in include/Target/TargetSchedule.h, class RegisterFile).
  This patch doesn't address the case where a write to a part of a register is followed by a read from the whole register. On Intel chips, high8 registers (AH/BH/CH/DH) can be stored in separate physical registers. However, a later (dirty) read of the full register (example: AX/EAX) triggers a merge uOp, which adds extra latency (and potentially affects the pipe usage). This is a very interesting article on the subject with a very informative answer from Peter Cordes:
  https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to
  In future, the definition of RegisterFile can be extended with extra information that may be used to identify delays caused by merge opcodes triggered by a dirty read of a partial write.
  Differential Revision: https://reviews.llvm.org/D49196
  llvm-svn: 337123
* [AVR] Document some public functions (Dylan McKay, 2018-07-15, 1 file, -0/+2)
  llvm-svn: 337122
* [X86] Add some optsize patterns for 256-bit X86vzmovl. (Craig Topper, 2018-07-15, 1 file, -0/+19)
  These patterns use VMOVSS/SD. Without optsize we use BLENDI instead.
  llvm-svn: 337119
* [NFC][InstCombine] foldICmpWithLowBitMaskedVal(): update comments. (Roman Lebedev, 2018-07-14, 1 file, -2/+3)
  All predicates are handled. There do not seem to be any other possible folds here. There are some more folds possible with an inverted mask though.
  llvm-svn: 337112
* [InstCombine] Fold x & (-1 >> y) s< x to x s> (-1 >> y) (Roman Lebedev, 2018-07-14, 1 file, -0/+6)
  https://bugs.llvm.org/show_bug.cgi?id=38123
  https://rise4fun.com/Alive/I3O
  This pattern is not commutative! We must make sure not to fold the commuted version!
  llvm-svn: 337111
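  In IR terms the rewrite looks roughly like the following hypothetical sketch (illustrative function names, not the commit's test cases); the value of -1 >> y is a low-bit mask, which is what makes the signed comparison against the masked value foldable:

    define i1 @before(i8 %x, i8 %y) {
      %mask = lshr i8 -1, %y        ; -1 >> y, a low-bit mask
      %masked = and i8 %x, %mask
      %r = icmp slt i8 %masked, %x  ; x & (-1 >> y) s< x
      ret i1 %r
    }

    define i1 @after(i8 %x, i8 %y) {
      %mask = lshr i8 -1, %y
      %r = icmp sgt i8 %x, %mask    ; x s> (-1 >> y)
      ret i1 %r
    }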
* [InstCombine] Fold x & (-1 >> y) s>= x to x s<= (-1 >> y) (Roman Lebedev, 2018-07-14, 1 file, -0/+6)
  https://bugs.llvm.org/show_bug.cgi?id=38123
  https://rise4fun.com/Alive/I3O
  This pattern is not commutative! We must make sure not to fold the commuted version!
  llvm-svn: 337109
* [InstCombine] Fold x s<= x & (-1 >> y) to x s<= (-1 >> y) (Roman Lebedev, 2018-07-14, 1 file, -0/+6)
  https://bugs.llvm.org/show_bug.cgi?id=38123
  https://rise4fun.com/Alive/I3O
  This pattern is not commutative! We must make sure not to fold the commuted version!
  llvm-svn: 337107
* [InstCombine] Fold x s> x & (-1 >> y) to x s> (-1 >> y) (Roman Lebedev, 2018-07-14, 1 file, -0/+6)
  https://bugs.llvm.org/show_bug.cgi?id=38123
  https://rise4fun.com/Alive/I3O
  This pattern is not commutative! We must make sure not to fold the commuted version!
  llvm-svn: 337105
* [InstCombine] Fold x u<= x & C to x u<= C (Roman Lebedev, 2018-07-14, 1 file, -0/+5)
  https://bugs.llvm.org/show_bug.cgi?id=38123
  https://rise4fun.com/Alive/Fqp
  This pattern is not commutative. But InstSimplify will already have taken care of the 'commutative' variant.
  llvm-svn: 337102
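  A hedged sketch of this unsigned constant variant (a hypothetical example; it assumes C is a low-bit mask such as 15, matching the (-1 >> y) shape handled by foldICmpWithLowBitMaskedVal()):

    define i1 @before(i8 %x) {
      %masked = and i8 %x, 15       ; C = 0b00001111
      %r = icmp ule i8 %x, %masked  ; x u<= x & C
      ret i1 %r
    }

    define i1 @after(i8 %x) {
      %r = icmp ule i8 %x, 15       ; x u<= C
      ret i1 %r
    }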
* [InstCombine] Fold x u> x & C to x u> C (Roman Lebedev, 2018-07-14, 1 file, -0/+5)
  https://bugs.llvm.org/show_bug.cgi?id=38123
  https://rise4fun.com/Alive/JvS
  This pattern is not commutative. But InstSimplify will already have taken care of the 'commutative' variant.
  llvm-svn: 337100
* [InstCombine] Fold x & (-1 >> y) u< x to x u> (-1 >> y) (Roman Lebedev, 2018-07-14, 1 file, -0/+5)
  https://bugs.llvm.org/show_bug.cgi?id=38123
  https://rise4fun.com/Alive/ocb
  This pattern is not commutative. But InstSimplify will already have taken care of the 'commutative' variant.
  llvm-svn: 337098
* [InstCombine] Fold x & (-1 >> y) u>= x to x u<= (-1 >> y) (Roman Lebedev, 2018-07-14, 1 file, -0/+5)
  https://bugs.llvm.org/show_bug.cgi?id=38123
  https://rise4fun.com/Alive/azI
  This pattern is not commutative. But InstSimplify will already have taken care of the 'commutative' variant.
  llvm-svn: 337096
* [MachineOutliner] Check the last instruction from the sequence when updating liveness (Francis Visoiu Mistrih, 2018-07-14, 1 file, -1/+1)
  The MachineOutliner was doing an std::for_each from the call (inserted before the outlined sequence) to the iterator at the end of the sequence.
  std::for_each needs the iterator past the end, so the last instruction was not taken into account when propagating the liveness information.
  This fixes the machine verifier issue in machine-outliner-disubprogram.ll.
  Differential Revision: https://reviews.llvm.org/D49295
  llvm-svn: 337090
* [x86/SLH] Fix an issue where we wouldn't harden any loads if we found no conditions. (Chandler Carruth, 2018-07-14, 1 file, -3/+3)
  This is only valid to do if we're hardening calls and rets with LFENCE, which results in an LFENCE guarding the entire entry block for us.
  llvm-svn: 337089
* [X86] Fix a subtle bug in the custom execution domain fixing for blends. (Craig Topper, 2018-07-14, 1 file, -2/+2)
  The code tried to find the immediate by using getNumOperands() on the MachineInstr, but there might be implicit-defs after the immediate that get counted. Instead use getNumOperands() from the instruction description, which will only count the operands that are defined in the td file.
  llvm-svn: 337088
* Give llvm-lib rudimentary help output. (Nico Weber, 2018-07-14, 2 files, -1/+11)
  https://reviews.llvm.org/D49318
  llvm-svn: 337084
* [X86] Prefer blendi over movss/sd when avx512 is enabled unless optimizing for size. (Craig Topper, 2018-07-14, 2 files, -11/+16)
  AVX512 doesn't have an immediate controlled blend instruction. But blend throughput is still better than movss/sd on SKX.
  This commit changes AVX512 to use the AVX blend instructions instead of MOVSS/MOVSD. This constrains the register allocation since it won't be able to use XMM16-31, but hopefully the increased throughput and reduced port 5 pressure make up for that.
  llvm-svn: 337083
* Revert "[ThinLTO] Ensure we always select the same function copy to import"Teresa Johnson2018-07-142-90/+71
| | | | | | | This reverts commits r337050 and r337059. Caused failure in reverse-iteration bot that needs more investigation. llvm-svn: 337081
* Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering"Evgeniy Stepanov2018-07-1413-214/+193
| | | | | | | | | | | | | | | | | | | | | | This reverts commit r337021. WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7 #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3 #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3 #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18 #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23 #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5 #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5 #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3 #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26 [...] Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv' #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192 llvm-svn: 337079
* [x86/SLH] Add an assert to catch if we ever end up trying to harden post-load a register that isn't valid for use with OR or SHRX. (Chandler Carruth, 2018-07-14, 1 file, -0/+8)
  llvm-svn: 337078
* Re-apply "[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428)."Tim Shen2018-07-131-7/+20
| | | | llvm-svn: 337075
* [Hexagon] Avoid introducing calls into coalesced range of HVX vector pairs (Krzysztof Parzyszek, 2018-07-13, 2 files, -0/+54)
  If an HVX vector register is to be coalesced into a vector pair, make sure that the vector pair will not have a function call in its live range, unless it already had one. All HVX vector registers are volatile, so any vector register live across a function call will have to be spilled. If a vector needs to be spilled, and it's coalesced into a vector pair, then the whole pair will need to be spilled (even if only a part of it is live), taking extra stack space.
  llvm-svn: 337073
* [LSR] If no Use is interesting, early return. (Tim Shen, 2018-07-13, 1 file, -1/+3)
  Summary: By looking at the callers of getUse(), we can see that even though IVUsers may offer uses, they may not be interesting to LSR. It's possible that none of them is interesting.
  Reviewers: sanjoy
  Subscribers: jlebar, hiraditya, bixia, llvm-commits
  Differential Revision: https://reviews.llvm.org/D49049
  llvm-svn: 337072
* [X86][SLH] Remove PDEP and PEXT from isDataInvariantLoad (Craig Topper, 2018-07-13, 1 file, -4/+0)
  Ryzen has something like an 18 cycle latency on these based on Agner's data. AMD's own xls is blank. So it seems like there might be something tricky here.
  Agner's data for Intel CPUs indicates these are a single uop there.
  Probably safest to remove them. We never generate them without an intrinsic so this should be ok.
  Differential Revision: https://reviews.llvm.org/D49315
  llvm-svn: 337067
* [X86][SLH] Add VEX and EVEX conversion instructions to isDataInvariantLoad (Craig Topper, 2018-07-13, 1 file, -13/+19)
  - Drop the intrinsic versions of conversion instructions. These should be handled when we do vectors. They shouldn't show up in scalar code.
  - Add the float<->double conversions which were missing.
  - Add the AVX512 and AVX versions of the conversion instructions, including the unsigned integer conversions unique to AVX512.
  Differential Revision: https://reviews.llvm.org/D49313
  llvm-svn: 337066
* [X86][SLH] Regroup the instructions in isDataInvariantLoad a little. NFC (Craig Topper, 2018-07-13, 1 file, -36/+43)
  - Move BSF/BSR to the same group as TZCNT/LZCNT/POPCNT.
  - Split some of the bit manipulation instructions away from TZCNT/LZCNT/POPCNT. These are things like 'x & (x - 1)' which are composed of a few simple arithmetic operations. These aren't nearly as complicated/surprising as counting bits.
  - Move BEXTR/BZHI into their own group. They aren't like a simple arithmetic op or the bit manipulation instructions. They're more like a shift+and.
  Differential Revision: https://reviews.llvm.org/D49312
  llvm-svn: 337065
* Fix comments which mixed up 'before' and 'after', NFC (Vedant Kumar, 2018-07-13, 1 file, -2/+2)
  llvm-svn: 337061
* [X86] Use the correct types in some recently added isel patterns. (Craig Topper, 2018-07-13, 1 file, -2/+2)
  These were supposed to be integer types since we are selecting integer instructions.
  Found while preparing to remove these patterns for another patch.
  llvm-svn: 337057
* AMDGPU/GlobalISel: Implement select() for 32-bit @llvm.minnum and @llvm.maxnum (Tom Stellard, 2018-07-13, 2 files, -0/+19)
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D46172
  llvm-svn: 337056
* [X86][FastISel] Support uitofp with avx512. (Craig Topper, 2018-07-13, 1 file, -8/+26)
  llvm-svn: 337055
* [LTO] Fix linking with an alias defined using another alias. (Eli Friedman, 2018-07-13, 1 file, -1/+1)
  When we're linking an alias which will be defined later, we need to build a GlobalAlias, or else we'll crash later in IRLinker::linkGlobalValueBody.
  clang sometimes constructs aliases like this for C++ destructors.
  Differential Revision: https://reviews.llvm.org/D49316
  llvm-svn: 337053
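  A minimal IR sketch of the shape involved (hypothetical names, assuming typed pointers as in this release): an alias whose aliasee is itself an alias, similar in spirit to what clang emits when one C++ destructor variant is aliased to another:

    @impl = global i32 0
    @base = alias i32, i32* @impl      ; ordinary alias of a definition
    @complete = alias i32, i32* @base  ; alias defined using another alias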
* [X86] Correct comment of TEST elimination in BSF/TZCNT (Fangrui Song, 2018-07-13, 1 file, -2/+2)
  llvm-svn: 337052
* [ThinLTO] Ensure we always select the same function copy to import (Teresa Johnson, 2018-07-13, 2 files, -71/+90)
  In order to always import the same copy of a linkonce function, even when encountering it with different thresholds (a higher one then a lower one), keep track of the summary we decided to import. This ensures that the backend only gets a single definition to import for each GUID, so that it doesn't need to choose one.
  Move the largest threshold the GUID was considered for import into the current module out of the ImportMap (which is part of a larger map maintained across the whole index), and into a new map just maintained for the current module we are computing imports for. This saves some memory since we no longer have the thresholds maintained across the whole index (and throughout the in-process backends when doing a normal non-distributed ThinLTO build), at the cost of some additional information being maintained for each invocation of ComputeImportForModule (the selected summary pointer for each import).
  There is an additional map lookup for each callee being considered for importing; however, this was able to subsume a map lookup in the Worklist iteration that invokes computeImportForFunction. We also are able to avoid calling selectCallee if we already failed to import at the same or higher threshold.
  I compared the run time and peak memory for the SPEC2006 471.omnetpp benchmark (running in-process ThinLTO backends), as well as for a large internal benchmark with a distributed ThinLTO build (so just looking at the thin link time/memory). Across a number of runs with and without this change there was no significant change in the time and memory. (I tried a few other variations of the change but they also didn't improve time or peak memory.)
  Reviewers: davidxl
  Subscribers: mehdi_amini, inglorion, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48670
  llvm-svn: 337050
* AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.exp (Tom Stellard, 2018-07-13, 2 files, -0/+71)
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45882
  llvm-svn: 337046
* [X86][FastISel] Add EVEX support to sitofp handling. (Craig Topper, 2018-07-13, 1 file, -7/+16)
  llvm-svn: 337045
* [X86] Try fixing r336768 (Fangrui Song, 2018-07-13, 1 file, -1/+1)
  llvm-svn: 337043
* [LowerTypeTests] Limit when icall jumptable entries are emitted (Vlad Tsyrklevich, 2018-07-13, 1 file, -6/+43)
  Summary: Currently LowerTypeTests emits jumptable entries for all live external and address-taken functions; however, we could limit the number of functions that we emit entries for significantly.
  For Cross-DSO CFI, we continue to emit jumptable entries for all exported definitions. In the non-Cross-DSO CFI case, we only need to emit jumptable entries for live functions that are address-taken in live functions. This ignores exported functions and functions that are only address taken in dead functions. This change uses ThinLTO summary data (now emitted for all modules during ThinLTO builds) to determine address-taken and liveness info.
  The logic for emitting jumptable entries is more conservative in the regular LTO case because we don't have summary data in the case of monolithic LTO builds; however, once summaries are emitted for all LTO builds we can unify the Thin/monolithic LTO logic to only use summaries to determine the liveness of address-taking functions.
  This change is a partial fix for PR37474. It reduces the build size for nacl_helper by ~2-3%; the reduction is due to nacl_helper compiling in lots of unused code, and unused functions that are address taken in dead functions no longer being considered live due to emitted jumptable references. The reduction for chromium is ~0.1-0.2%.
  Reviewers: pcc, eugenis, javed.absar
  Reviewed By: pcc
  Subscribers: aheejin, dexonsmith, dschuff, mehdi_amini, eraman, steven_wu, llvm-commits, kcc
  Differential Revision: https://reviews.llvm.org/D47652
  llvm-svn: 337038
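  For context, a hedged sketch of the IR ingredients LowerTypeTests operates on (hypothetical type id and function names); whether an address-taken function such as @callee receives a jumptable entry is what this change ties to liveness information from the summary:

    define void @callee() !type !0 {
      ret void
    }

    define i1 @check(i8* %fp) {
      ; CFI check: is %fp a valid address for functions of type "typeid1"?
      %ok = call i1 @llvm.type.test(i8* %fp, metadata !"typeid1")
      ret i1 %ok
    }

    declare i1 @llvm.type.test(i8*, metadata) #0

    attributes #0 = { nounwind readnone }
    !0 = !{i64 0, !"typeid1"}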
* [dwarfdump] Add pretty printer for accelerator table based on Atom. (Jonas Devlieghere, 2018-07-13, 2 files, -3/+20)
  For instance, when dumping .apple_types, the second atom represents the DW_TAG. In addition to printing the raw value, we now also pretty print the value if the ATOM tells us how.
  llvm-svn: 337026
* AMDGPU: Properly handle shader inputs with split arguments (Matt Arsenault, 2018-07-13, 1 file, -12/+27)
  This needs to refer to arguments by their original argument index, not the argument split index, which depends on what the type splitting decides to do.
  Also avoid incrementing PSInputNum for each split piece.
  llvm-svn: 337022
* AMDGPU: Fix handling of alignment padding in DAG argument lowering (Matt Arsenault, 2018-07-13, 13 files, -193/+214)
  This was completely broken if there was ever a struct argument, as this information is thrown away during the argument analysis.
  The offsets as passed in to LowerFormalArguments are not useful, as they partially depend on the legalized result register type, and they don't consider the alignment in the first place.
  Ignore the Ins array, and instead figure out from the raw IR type what we need to do. This seems to fix the padding computation if the DAG lowering is forced (and stops breaking arguments following padded arguments if the arguments were only partially lowered in the IR).
  llvm-svn: 337021
* Revert "CallGraphSCCPass: iterate over all functions."Evgeniy Stepanov2018-07-131-71/+39
| | | | | | | | | | | | This reverts commit r336419: use-after-free on CallGraph::FunctionMap elements due to the use of a stale iterator in CGPassManager::runOnModule. The iterator may be invalidated if a pass removes a function, ex.: llvm::LegacyInlinerBase::inlineCalls inlineCallsImpl llvm::CallGraph::removeFunctionFromModule llvm-svn: 337018
* [dwarfdump] Pretty print DW_AT_APPLE_runtime_class (Jonas Devlieghere, 2018-07-13, 1 file, -0/+2)
  Instead of printing DW_AT_APPLE_runtime_class (0x10) we now print DW_AT_APPLE_runtime_class (DW_LANG_ObjC).
  llvm-svn: 337011
* [AArch64] Armv8.4-A: LDAPR & STLR with immediate offset instructions (cont'd) (Sjoerd Meijer, 2018-07-13, 3 files, -22/+35)
  Follow-up of rL336913: fix base class description. Thanks to Ahmed Bougacha for pointing this out.
  Differential Revision: https://reviews.llvm.org/D49284
  llvm-svn: 337009
* [PowerPC] Materialize more constants with CR-field set in late peephole (Nemanja Ivanovic, 2018-07-13, 1 file, -5/+28)
  Revision r322373 fixed a bug in how we materialize constants when the CR-field needs to be set.
  However, the fix is overly conservative. It will only do the transform if AND-ing the input with the new constant produces the same new constant. This is of course correct, but not necessarily required. If there are no further uses of the constant, the constant can be changed. If there are no uses of the GPR result, the final result of the materialization isn't important other than it needs to compare to zero correctly (lt, gt, eq).
  Differential Revision: https://reviews.llvm.org/D42109
  llvm-svn: 337008