path: root/llvm/test/CodeGen
Commit message | Author | Age | Files | Lines
* [DAGCombine] Don't combine sext with extload if sextload is not supported and extload has multiple users | Guozhi Wei | 2017-10-27 | 1 | -0/+26
  In DAGCombiner::visitSIGN_EXTEND_INREG, sext can be combined with extload even if sextload is not supported by the target. If sext is the only user of extload, this makes little difference: no harm, no benefit. If extload has more than one user, however, the combined sextload may block extload from combining with other zexts, which causes extra zext instructions to be generated, as demonstrated by the attached test case.
  This patch adds the constraint that when sextload is not supported by the target, sext can only be combined with extload if it is the only user of extload.
  Differential Revision: https://reviews.llvm.org/D39108
  llvm-svn: 316802
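The new guard can be sketched as a plain predicate. This is a simplified model, not DAGCombiner code: `sextload_legal` and `extload_uses` are hypothetical stand-ins for the target legality query and the extload's use count, and the other preconditions of the real combine (memory type, volatility, legalization phase) are deliberately left out.

```python
def can_fold_sext_into_extload(sextload_legal: bool, extload_uses: int) -> bool:
    """Simplified model of the guard on folding sext_inreg(extload) into a sextload."""
    if sextload_legal:
        # The target supports sextload, so other users of the extload
        # do not block the fold in this model.
        return True
    # Without a native sextload, only fold when no other user (e.g. a zext)
    # still wants to combine with the original extload.
    return extload_uses == 1


print(can_fold_sext_into_extload(sextload_legal=False, extload_uses=2))  # False
```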
* Handle undefined weak hidden symbols on all architectures. | Rafael Espindola | 2017-10-27 | 2 | -2/+15
  We were handling the non-hidden case in lib/Target/TargetMachine.cpp, but the hidden case was handled in architecture-dependent code, and only X86_64 and AArch64 were covered.
  While it is true that some code sequences in some ABIs might be able to produce the correct value at runtime, that doesn't seem to be the common case.
  I left the AArch64 code in place since it also forces a GOT access for non-PIC code. It is not clear whether that is needed, but it is probably better to change that in another commit.
  llvm-svn: 316799
* [X86] Add fast-isel tests for integer shifts. | Craig Topper | 2017-10-27 | 1 | -0/+383
  We definitely had no coverage of i16, and i32/i64 are only tested by larger tests.
  llvm-svn: 316796
* Improve clamp recognition in ValueTracking. | Artur Gainullin | 2017-10-27 | 1 | -0/+9
  Summary: ValueTracking did not recognize all variations of clamp. Handling of selects with swapped true and false values was added to fix this. The first version of the patch was reverted because it caused a miscompile in the NVPTX target; corresponding test cases have been added.
  Reviewers: spatel, majnemer, efriedma, reames
  Subscribers: llvm-commits, jholewinski
  Differential Revision: https://reviews.llvm.org/D39240
  llvm-svn: 316795
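A toy matcher for one clamp shape, `x < lo ? lo : smin(x, hi)`, shows how swapping the select arms (and inverting the predicate) lets a single check cover both orientations. The tuple encoding is hypothetical and is not LLVM's PatternMatch API; the real matcher handles more forms.

```python
# Toy IR: a select is (pred, x, c, true_val, false_val), meaning
# "x pred c ? true_val : false_val"; ("smin", x, hi) is a signed minimum.
INVERSE = {"<": ">=", ">=": "<", ">": "<=", "<=": ">"}


def canonicalize(sel):
    """Put the compared constant in the true arm by swapping the arms and
    inverting the predicate: (x < c ? a : b) == (x >= c ? b : a)."""
    pred, x, c, tv, fv = sel
    if fv == c and tv != c:
        return (INVERSE[pred], x, c, fv, tv)
    return sel


def match_clamp(sel):
    """Recognize x < lo ? lo : smin(x, hi) with lo < hi in either arm order.
    Returns (x, lo, hi) on success, otherwise None."""
    pred, x, lo, tv, fv = canonicalize(sel)
    if pred != "<" or tv != lo:
        return None
    if isinstance(fv, tuple) and fv[0] == "smin" and fv[1] == x and lo < fv[2]:
        return (x, lo, fv[2])
    return None


direct = ("<", "x", 0, 0, ("smin", "x", 255))
swapped = (">=", "x", 0, ("smin", "x", 255), 0)
print(match_clamp(direct), match_clamp(swapped))  # ('x', 0, 255) ('x', 0, 255)
```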
* [X86] Add avx512vl command line to fast-isel-nontemporal.ll | Craig Topper | 2017-10-27 | 1 | -0/+27
  llvm-svn: 316789
* [Hexagon] Fix an incorrect assertion in HexagonConstExtenders.cpp | Krzysztof Parzyszek | 2017-10-27 | 1 | -0/+45
  The assertion made sure that an instruction has fewer operands than required; attempting to then access an operand that is out of range is bound to fail.
  llvm-svn: 316785
* [X86][SSE] Add tests for inserting all-bits (-1) into a vector | Simon Pilgrim | 2017-10-27 | 1 | -0/+504
  We should be able to do this by re-materializing an all-bits vector and then blending with it.
  llvm-svn: 316779
* [CodeGen][ExpandMemCmp][NFC] Simplify load sequence generation. | Clement Courbet | 2017-10-27 | 1 | -53/+82
  llvm-svn: 316763
* DAG: Fold fma (fneg x), K, y -> fma x, -K, y | Matt Arsenault | 2017-10-27 | 3 | -4/+50
  llvm-svn: 316753
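The fold is exact because IEEE negation is exact and rounding is sign-symmetric, so (-x)*K and x*(-K) are the same value. A minimal sketch on a hypothetical tuple IR follows; `evaluate` approximates fma as `a * b + c` (two roundings), which is enough to show the sign identity.

```python
def fold_fneg_fma(node):
    """Rewrite ("fma", ("fneg", x), K, y) into ("fma", x, -K, y) when K is a
    float constant, absorbing the negation into the constant operand."""
    if (node[0] == "fma" and isinstance(node[1], tuple) and node[1][0] == "fneg"
            and isinstance(node[2], float)):
        return ("fma", node[1][1], -node[2], node[3])
    return node


def evaluate(node, env):
    """Evaluate the toy expression tree using variable bindings from env."""
    if isinstance(node, float):
        return node
    if isinstance(node, str):
        return env[node]
    if node[0] == "fneg":
        return -evaluate(node[1], env)
    a, b, c = (evaluate(n, env) for n in node[1:])
    return a * b + c


print(fold_fneg_fma(("fma", ("fneg", "x"), 3.0, "y")))  # ('fma', 'x', -3.0, 'y')
```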
* [CodeGen][ExpandMemcmp][NFC] Make tests more complete. | Clement Courbet | 2017-10-27 | 1 | -0/+14
  llvm-svn: 316749
* Add subclass data to the FoldingSetNode for MemIntrinsicSDNodes. | Sean Fertile | 2017-10-27 | 1 | -0/+37
  Without the subclass data on MemIntrinsicSDNodes, it was possible to try to fold two nodes with the same operands but differing MMO flags. This would trip an assertion when trying to refine the alignment between the two MachineMemOperands.
  Differential Revision: https://reviews.llvm.org/D38898
  llvm-svn: 316737
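Why the key matters can be sketched with a toy CSE table: if the lookup key ignores the memory-operand flags, two nodes that differ only in those flags collide and get folded. The opcode and flag names below are made up for illustration and are not SelectionDAG identifiers.

```python
def cse_key(opcode, operands, mmo_flags, include_subclass_data):
    """Lookup key for a toy CSE table of DAG nodes. Without the flags in the
    key, nodes differing only in their memory-operand flags collide."""
    key = (opcode, tuple(operands))
    if include_subclass_data:
        key += (mmo_flags,)
    return key


def count_distinct_nodes(nodes, include_subclass_data):
    """Number of nodes that remain after CSE with the given key scheme."""
    return len({cse_key(op, ops, flags, include_subclass_data) for op, ops, flags in nodes})


nodes = [
    ("mem_intrinsic", ["chain", "ptr"], "volatile"),
    ("mem_intrinsic", ["chain", "ptr"], "nontemporal"),
]
print(count_distinct_nodes(nodes, False), count_distinct_nodes(nodes, True))  # 1 2
```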
* [ARM] Honor -mfloat-abi for libcall calling convention | Eli Friedman | 2017-10-26 | 1 | -22/+35
  As far as I can tell, this matches gcc: -mfloat-abi determines the calling convention for all functions except those explicitly defined as soft-float in the ARM RTABI.
  This change only affects cases where the user specifies -mfloat-abi to override the default calling convention derived from the target triple.
  Fixes https://bugs.llvm.org//show_bug.cgi?id=34530.
  Differential Revision: https://reviews.llvm.org/D38299
  llvm-svn: 316708
* [X86] Improve handling of UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG to support 64-bit extensions. | Craig Topper | 2017-10-26 | 1 | -3/+1
  If the extend type is 64 bits, emit a 32-bit -> 64-bit extend after the UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG operation. This gives a shorter encoding for the second extend in the sext case, and allows us to remove the second extend completely in the zext case.
  This also adds known-bits and num-sign-bits support for UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG.
  Differential Revision: https://reviews.llvm.org/D38275
  llvm-svn: 316702
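Why the zext case needs no second extend can be shown by modeling x86-64 register writes: writing a 32-bit register clears bits 63:32, so an 8 -> 32 zero extension already yields the 64-bit result, while the sign-extending case still needs an explicit 32 -> 64 step. This sketch models register values only, not the DAG nodes; the instruction names in the comments are examples, not necessarily what LLVM emits.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def sext(value, from_bits):
    """Sign-extend the low from_bits bits of value to an unbounded Python int."""
    value &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    return (value ^ sign) - sign


def write_32bit_reg(value):
    """Writing a 32-bit register on x86-64 clears bits 63:32 of the full register."""
    return value & MASK32


def zext8_to_64(ah):
    # e.g. movzx into a 32-bit register; the implicit upper-half clear
    # performs the 32 -> 64 step, so no second extend is emitted.
    return write_32bit_reg(ah & 0xFF)


def sext8_to_64(ah):
    # e.g. movsx into a 32-bit register, then an explicit 32 -> 64 sign extend.
    r32 = write_32bit_reg(sext(ah, 8))
    return sext(r32, 32) & MASK64


print(hex(zext8_to_64(0x80)), hex(sext8_to_64(0x80)))  # 0x80 0xffffffffffffff80
```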
* [x86] use an insert op to put one variable element into a constant of vectors | Sanjay Patel | 2017-10-26 | 2 | -552/+144
  Instead of loading (a potential ton of) scalar constants, load those as a vector and then insert into it.
  Differential Revision: https://reviews.llvm.org/D38756
  llvm-svn: 316685
* AMDGPU: Commit missing fence-barrier test | Konstantin Zhuravlyov | 2017-10-26 | 1 | -0/+197
  This should have been committed with the memory model implementation.
  llvm-svn: 316680
* Represent runtime preemption in the IR. | Sean Fertile | 2017-10-26 | 4 | -0/+1013
  Currently we do not represent runtime preemption in the IR, which has several drawbacks:
  1) The semantics of GlobalValues differ depending on the object file format you are targeting (as well as the relocation model and -fPIE value).
  2) We have no way of disabling inlining of runtime-interposable functions, since in the IR we only know if a function is link-time interposable. Because of this, LLVM cannot support ELF interposition semantics.
  3) In LTO builds of executables we will have extra knowledge that a symbol resolved to a local definition and can't be preempted, but have no way to propagate that knowledge through the compiler.
  This patch adds preemptability specifiers to the IR with the following meaning:
  dso_local --> means the compiler may assume the symbol will resolve to a definition within the current linkage unit and the symbol may be accessed directly even if the definition is not within this compilation unit.
  dso_preemptable --> means that the compiler must assume the GlobalValue may be replaced with a definition from outside the current linkage unit at runtime.
  To ease the transition, dso_preemptable is treated as a 'default' in that low-level codegen will still do the same checks it did previously to see if a symbol should be accessed indirectly. Eventually, when IR producers emit the specifiers on all GlobalValues, we can change dso_preemptable to mean 'always access indirectly' and remove the current logic.
  Differential Revision: https://reviews.llvm.org/D20217
  llvm-svn: 316668
* AMDGPU: Handle s_buffer_load_dword hazard on SI | Marek Olsak | 2017-10-26 | 1 | -0/+17
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D39171
  llvm-svn: 316666
* [mips] Fix PR35071 | Simon Dardis | 2017-10-26 | 1 | -0/+73
  PR35071 exposed the fact that MipsInstrInfo::removeBranch did not walk past debug instructions when removing branches for the control flow optimizer, which led to duplicated conditional branches. If the target of the branch was a removable block, only the conditional branch in the terminating position would have its MBB operands updated, leaving the first branch with a dangling MBB operand.
  The MIPS long branch pass would then trigger an assertion when attempting to examine the instruction with the dangling MBB operand.
  This resolves PR35071.
  Thanks to Alex Richardson for reporting the issue!
  Reviewers: atanasyan
  Differential Revision: https://reviews.llvm.org/D39288
  llvm-svn: 316654
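The fixed walk can be sketched over a block modeled as a list of instruction strings. This is a hypothetical model, not MipsInstrInfo::removeBranch itself (delay slots and MBB operands are ignored): stopping at the debug instruction would leave the earlier conditional branch behind, which is the duplicated-branch situation described above.

```python
def remove_branches(block, is_branch, is_debug):
    """Remove up to two terminating branches from the end of `block`,
    stepping over debug instructions. Mutates `block` in place and
    returns the number of branches removed."""
    removed = 0
    i = len(block) - 1
    while i >= 0 and removed < 2:
        if is_debug(block[i]):
            i -= 1  # skip debug entries instead of stopping at them
            continue
        if not is_branch(block[i]):
            break
        del block[i]
        removed += 1
        i -= 1
    return removed


BRANCHES = {"b", "bnez", "beq"}
block = ["addiu $1, $1, 1", "bnez $1, bb1", "DBG_VALUE", "b bb2"]
n = remove_branches(block, lambda s: s.split()[0] in BRANCHES, lambda s: s.startswith("DBG_"))
print(n, block)  # 2 ['addiu $1, $1, 1', 'DBG_VALUE']
```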
* [PowerPC] Use record-form instruction for Less-or-Equal -1 and Greater-or-Equal 1 | Hiroshi Inoue | 2017-10-26 | 1 | -0/+36
  Currently a record-form instruction is used for comparison of "greater than -1" and "less than 1" by modifying the predicate (e.g. LT 1 into LE 0) in addition to the naive case of comparison against 0.
  This patch also enables emitting a record-form instruction for "less than or equal to -1" (i.e. "less than 0") and "greater than or equal to 1" (i.e. "greater than 0") to increase the optimization opportunities.
  Differential Revision: https://reviews.llvm.org/D38941
  llvm-svn: 316647
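The four integer equivalences behind these rewrites can be captured in a small table and checked exhaustively over a range. The predicate spellings are illustrative and are not PowerPC condition-code names.

```python
import operator

PREDICATES = {"lt": operator.lt, "le": operator.le, "gt": operator.gt, "ge": operator.ge}

# x pred c  <=>  x new_pred 0, valid for integers only.
ZERO_FORMS = {
    ("lt", 1): "le",   # x < 1   <=> x <= 0  (already handled before the patch)
    ("gt", -1): "ge",  # x > -1  <=> x >= 0  (already handled before the patch)
    ("le", -1): "lt",  # x <= -1 <=> x < 0   (added by the patch)
    ("ge", 1): "gt",   # x >= 1  <=> x > 0   (added by the patch)
}


def compare_with_zero(pred, c):
    """Return a predicate p such that (x pred c) == (x p 0) for all integers x,
    or None if no rewrite is known."""
    if c == 0:
        return pred
    return ZERO_FORMS.get((pred, c))


print(compare_with_zero("le", -1))  # lt
```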
* Fix CodeGen/AMDGPU/fcanonicalize-elimination.ll on FreeBSD 11.0 | Alexander Richardson | 2017-10-25 | 1 | -0/+4
  Summary: On FreeBSD 11.0 the FileCheck NOT string "1.0" will be matched by `.amd_amdgpu_isa "amdgcn-unknown-freebsd11.0--gfx802"` at the end of the file. Add a CHECK for that directive to avoid failing the test.
  Reviewers: rampitec, kzhuravl
  Reviewed By: rampitec, kzhuravl
  Subscribers: emaste, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, krytarowski
  Differential Revision: https://reviews.llvm.org/D39306
  llvm-svn: 316616
* [Hexagon] Account for negative offset when limiting max deviation | Krzysztof Parzyszek | 2017-10-25 | 1 | -0/+43
  In getOffsetRange, Max can be set to 0 to force the extender replacement to be at or below the original value. This would cause the new offset to be non-negative, which is preferred for memory instructions (to reduce the likelihood of it getting constant-extended due to predication). The problem happens when the range is shifted by an offset (present in the instruction being examined) and the offset is negative. The entire range for the allowable deviation will then be strictly negative. This creates a problem, since 0 is assumed to be a valid deviation.
  llvm-svn: 316601
* AMDGPU: Cleanup memory legalizer load/store tests | Konstantin Zhuravlyov | 2017-10-25 | 4 | -378/+375
  llvm-svn: 316590
* AMDGPU/NFC: Rename memory legalizer tests: | Konstantin Zhuravlyov | 2017-10-25 | 2 | -0/+0
  - memory-legalizer-atomic-load.ll -> memory-legalizer-load.ll
  - memory-legalizer-atomic-store.ll -> memory-legalizer-store.ll
  llvm-svn: 316586
* [inlineasm] Fix crash when number of matched input constraint operands overflows signed char | Daniil Fukalov | 2017-10-25 | 1 | -0/+12
  When the number of output constraint operands that have matching input operands doesn't fit in a signed char, TargetLowering::ParseConstraints() can try to access ConstraintOperands (a std::vector) with a negative index.
  Reviewers: rampitec, arsenm
  Differential Review: https://reviews.llvm.org/D39125
  llvm-svn: 316574
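The overflow itself is easy to model: storing an index of 128 or more into a `signed char` keeps only the low 8 bits, read as two's complement, so the index turns negative. A negative int converted to the vector's unsigned size type then becomes a huge index. The helper below is a hypothetical illustration, not LLVM code.

```python
def as_signed_char(n):
    """Keep the low 8 bits of n and read them as a two's-complement signed char.
    This is the modular conversion C++20 mandates (earlier standards left it
    implementation-defined, but common compilers behaved this way)."""
    n &= 0xFF
    return n - 0x100 if n >= 0x80 else n


print([as_signed_char(k) for k in (127, 128, 200)])  # [127, -128, -56]
```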
* [ARM GlobalISel] Remove redundant testcases. NFC | Diana Picus | 2017-10-25 | 1 | -53/+0
  Remove the G_FADD testcases from arm-legalizer.mir; they are covered by arm-legalizer-fp.mir (I probably forgot to delete them when I created that test).
  llvm-svn: 316573
* [ARM GlobalISel] Update test after r316479. NFC | Diana Picus | 2017-10-25 | 1 | -58/+11
  No need to check register classes in the register block anymore, since we can now much more conveniently check them at their def.
  llvm-svn: 316572
* [ARM GlobalISel] Fix call opcodes | Diana Picus | 2017-10-25 | 7 | -159/+163
  We were generating BLX for all the calls, which was incorrect in most cases. Update ARMCallLowering to generate BL for direct calls, and BLX, BX_CALL or BMOVPCRX_CALL for indirect calls.
  llvm-svn: 316570
* [ARM GlobalISel] Split test into 3. NFC | Diana Picus | 2017-10-25 | 3 | -499/+502
  Separate the test cases that deal with calls from the rest of the IR Translator tests. We split into 2 different files, one for testing parameter and result lowering, and one for testing the various different kinds of calls that can occur (BL, BLX, BX_CALL etc).
  llvm-svn: 316569
* Re-land "[CodeGen][ExpandMemcmp][NFC] Allow memcmp to expand to vector loads (1)" | Clement Courbet | 2017-10-25 | 1 | -0/+7
  Compute the actual decomposition only after deciding whether to expand or not. Otherwise, it's easy to make the compiler OOM with `memcpy(dst, src, 0xffffffffffffffff);`, which typically happens if someone mistakenly passes a negative value.
  Add a test.
  This reverts commit f8fc02fbd4ab33383c010d33675acf9763d0bd44.
  llvm-svn: 316567
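The ordering fix can be sketched as a greedy load plan that counts loads arithmetically before materializing anything, so an absurd size is rejected in constant time. The load sizes and load budget are made-up defaults; the real pass gets these from the target, and this is not the ExpandMemCmp code.

```python
def plan_memcmp_loads(size, load_sizes=(8, 4, 2, 1), max_loads=8):
    """Greedy decomposition of a `size`-byte memcmp into loads.

    The load count is computed first with divmod-style arithmetic, so a size
    such as 2**64 - 1 is rejected without building a gigantic list.
    Returns a list of (offset, load_size) pairs, or None to skip expansion.
    """
    num_loads, remaining = 0, size
    for ls in load_sizes:
        num_loads += remaining // ls
        remaining %= ls
    if num_loads > max_loads:
        return None  # decide not to expand before doing any real work
    plan, offset, remaining = [], 0, size
    for ls in load_sizes:
        while remaining >= ls:
            plan.append((offset, ls))
            offset += ls
            remaining -= ls
    return plan


print(plan_memcmp_loads(15))  # [(0, 8), (8, 4), (12, 2), (14, 1)]
```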
* [ARM] Swap cmp operands for automatic shifts | Sam Parker | 2017-10-25 | 2 | -52/+154
  Swap the compare operands if the lhs is a shift and the rhs isn't, since in ARM and Thumb-2 the compare can perform the shift on its second operand.
  Differential Revision: https://reviews.llvm.org/D39004
  llvm-svn: 316562
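Swapping the operands is only correct if the predicate is mirrored too, since (a < b) is (b > a). A toy version on hypothetical tuple operands follows; only signed predicates are shown, and the names are not ARM condition codes.

```python
# Mirror a predicate when its operands are exchanged: (a < b) == (b > a).
SWAPPED = {"eq": "eq", "ne": "ne", "lt": "gt", "gt": "lt", "le": "ge", "ge": "le"}
SHIFTS = {"lsl", "lsr", "asr", "ror"}


def is_shift(operand):
    return isinstance(operand, tuple) and operand[0] in SHIFTS


def canonicalize_cmp(pred, lhs, rhs):
    """Move a lone shift into the second operand, where a CMP can apply it as a
    shifted register, mirroring the predicate so the comparison keeps its meaning."""
    if is_shift(lhs) and not is_shift(rhs):
        return SWAPPED[pred], rhs, lhs
    return pred, lhs, rhs


print(canonicalize_cmp("lt", ("lsl", "r1", 2), "r0"))  # ('gt', 'r0', ('lsl', 'r1', 2))
```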
* [AArch64] Add support for dllimport of values and functions | Martin Storsjo | 2017-10-25 | 1 | -0/+54
  Previously, the dllimport attribute did the right thing in terms of treating it as a pointer to a value, but this makes sure the names get mangled properly, and calls to such functions load the function from the __imp_ pointer.
  This is based on SVN r212431 and r212430 where the same was implemented for ARM.
  Differential Revision: https://reviews.llvm.org/D38530
  llvm-svn: 316555
* DAG: Fix creating select with wrong condition type | Matt Arsenault | 2017-10-25 | 2 | -29/+62
  This code added in r297930 assumed that it could create a select with a condition type that is just an integer bitcast of the selected type. For AMDGPU any vselect is going to be scalarized (although the vector types are legal), and all select conditions must be i1 (the same as getSetCCResultType).
  This logic doesn't really make sense to me, but there's never really been a consistent policy in what the select condition mask type is supposed to be. Try to extend the logic for skipping the transform for condition types that aren't setccs. It doesn't seem quite right to me though, but checking conditions that seem more sensible (like whether the vselect is going to be expanded) doesn't work since this seems to depend on that also.
  llvm-svn: 316554
* [NVPTX] allow address space inference for volatile loads/stores. | Artem Belevich | 2017-10-24 | 1 | -0/+97
  If a particular target supports volatile memory access operations, we can avoid casting to the generic address space (AS). Currently this is only enabled in NVPTX, for loads and stores that access the global and shared AS.
  Differential Revision: https://reviews.llvm.org/D39026
  llvm-svn: 316495
* [X86][Broadwell] Added the instruction scheduling information for the Broadwell CPU. | Gadi Haber | 2017-10-24 | 19 | -1372/+1372
  Adding the scheduling information for the Broadwell (BDW) CPU target.
  This patch adds the instruction scheduling information for the Broadwell (BDW) architecture target by adding the file X86SchedBroadwell.td located under the X86 Target. We used the scheduling information retrieved from the Broadwell architects in order to create the file. The scheduling information includes the latency, number of micro-ops and ports used by each BDW instruction.
  The patch continues the scheduling replacement and insertion effort started with the SandyBridge (SNB) target in r310792, the Haswell (HSW) target in r311879, the SkylakeClient (SKL) target in rL313613 + rL315978 and the SkylakeServer (SKX) in rL315175.
  Performance fluctuations may be expected due to code alignment effects.
  Reviewers: zvi, RKSimon, craig.topper
  Differential Revision: https://reviews.llvm.org/D39054
  Change-Id: If6f799e5ff60e1091c8d43b05ea78c53581bae01
  llvm-svn: 316492
* MIR: Print the register class or bank in vreg defs | Justin Bogner | 2017-10-24 | 245 | -6937/+5380
  This updates the MIRPrinter to include the regclass when printing virtual register defs, which is already valid syntax for the parser. That is, given 64 bit %0 and %1 in a "gpr" regbank,
    %1(s64) = COPY %0(s64)
  would now be written as
    %1:gpr(s64) = COPY %0(s64)
  While this change alone introduces a bit of redundancy with the registers block, it allows us to update the tests to be more concise and understandable and brings us closer to being able to remove the registers block completely.
  Note: We generally only print the class in defs, but there is one exception. If there are uses without any defs whatsoever, we'll print the class on all uses. I'm not completely convinced this comes up in meaningful machine IR, but for now the MIRParser and MachineVerifier both accept that kind of stuff, so we don't want to have a situation where we can print something we can't parse.
  llvm-svn: 316479
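The printing rule can be sketched as a small formatter: the class or bank goes on defs, and on uses only when the register has no def at all. `print_vreg` and its parameters are hypothetical and are not MIRPrinter functions; the output reproduces the example from the commit message.

```python
def print_vreg(reg_id, llt, reg_class, is_def, has_defs):
    """Format a virtual register operand: print the class/bank on defs,
    and on uses only when the register has no def (so it stays parseable)."""
    text = f"%{reg_id}"
    if reg_class and (is_def or not has_defs):
        text += f":{reg_class}"
    if llt:
        text += f"({llt})"
    return text


dst = print_vreg(1, "s64", "gpr", is_def=True, has_defs=True)
src = print_vreg(0, "s64", "gpr", is_def=False, has_defs=True)
print(f"{dst} = COPY {src}")  # %1:gpr(s64) = COPY %0(s64)
```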
* [PowerPC] Try to simplify a Swap if it feeds a Splat | Stefan Pintilie | 2017-10-24 | 2 | -2/+136
  If we have the situation where a Swap feeds a Splat, we can sometimes change the index on the Splat and then remove the Swap instruction.
  Fixed the test case that was failing and recommitted after pulling the original commit.
  Original revision is here: https://reviews.llvm.org/D39009
  llvm-svn: 316478
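The index adjustment can be illustrated under an assumption the commit does not spell out: the Swap exchanges the two doublewords of a vector of four words (as XXSWAPD does) and the Splat replicates one word (as XXSPLTW does). With element numbering taken as plain list indices, ignoring endianness, splatting word i of the swapped vector equals splatting word (i + 2) % 4 of the original.

```python
def xxswapd(v):
    """Swap the two 64-bit halves of a vector modeled as four 32-bit words."""
    return v[2:] + v[:2]


def splat_word(v, i):
    """Replicate word i of v into all four lanes."""
    return [v[i]] * 4


def splat_index_without_swap(i):
    """Word index to splat from the unswapped input so the swap can be dropped."""
    return (i + 2) % 4


v = [10, 11, 12, 13]
print(splat_word(xxswapd(v), 1), splat_word(v, splat_index_without_swap(1)))  # [13, 13, 13, 13] [13, 13, 13, 13]
```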
* [X86][AVX] ComputeNumSignBitsForTargetNode - add support for X86ISD::VTRUNC | Simon Pilgrim | 2017-10-24 | 2 | -15/+5
  llvm-svn: 316462
* [SelectionDAG] Add VSELECT support to ComputeNumSignBits | Simon Pilgrim | 2017-10-24 | 1 | -2/+2
  llvm-svn: 316457
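For a VSELECT, each lane of the result is taken from one of the two value operands, so the result is guaranteed at least the smaller of their sign-bit counts. A minimal scalar sketch of that bound, using hypothetical helper names rather than the SelectionDAG API:

```python
def num_sign_bits(value, width):
    """Count the leading bits equal to the sign bit in a width-bit
    two's-complement value (always at least 1)."""
    bits = value & ((1 << width) - 1)
    sign = (bits >> (width - 1)) & 1
    count = 0
    for pos in range(width - 1, -1, -1):
        if ((bits >> pos) & 1) != sign:
            break
        count += 1
    return count


def select_sign_bits(true_bits, false_bits):
    """Either operand may be chosen per lane, so only the minimum is guaranteed."""
    return min(true_bits, false_bits)


print(select_sign_bits(num_sign_bits(-1, 8), num_sign_bits(3, 8)))  # 6
```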
* [X86] truncateVectorCompareWithPACKSS - use PACKSSDW/PACKSSWB instead of just PACKSSWB. | Simon Pilgrim | 2017-10-24 | 12 | -287/+283
  By using the widest type possible for PACKSS truncation, we have a better chance of being able to peek through bitcasts, and other combines driven by ComputeNumSignBits improve as well.
  llvm-svn: 316448
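Why PACKSS can truncate vector compare results exactly: every lane of a compare is 0 or -1, and signed saturation leaves those values unchanged, so chaining a 32 -> 16 pack with a 16 -> 8 pack preserves the mask. This sketch only models the per-element saturation, not the x86 lane ordering of the real instructions.

```python
def saturate(value, bits):
    """Clamp value to the signed range of a bits-wide integer."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def packss(a, b, out_bits):
    """Concatenate two vectors and saturate each element to out_bits, the
    per-element behavior of PACKSSDW (out_bits=16) and PACKSSWB (out_bits=8)."""
    return [saturate(x, out_bits) for x in a + b]


# Two i32 compare results: every lane is all-zeros (0) or all-ones (-1).
mask_a = [0, -1, -1, 0]
mask_b = [-1, 0, 0, -1]
as_i16 = packss(mask_a, mask_b, 16)  # PACKSSDW step
as_i8 = packss(as_i16, [0] * 8, 8)   # PACKSSWB step
print(as_i8[:8])  # [0, -1, -1, 0, -1, 0, 0, -1]
```

A lane without enough sign bits (for example 300) would saturate to 127 instead, which is why the transform relies on ComputeNumSignBits.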
* [x86] add more vector ISA variants for memcmp expansion; NFC | Sanjay Patel | 2017-10-24 | 1 | -4/+62
  ...because every swiss cheese has different holes.
  llvm-svn: 316446
* Update f16c instruction scheduling on btver2. | Andrew V. Tischenko | 2017-10-24 | 1 | -33/+33
  Differential Revision: https://reviews.llvm.org/D39051
  llvm-svn: 316435
* X86CallFrameOptimization: Recognize 'store 0/-1 using and/or' idioms | Zvi Rackover | 2017-10-24 | 3 | -177/+259
  Summary: r264440 added or/and patterns for storing -1 or 0 with the intention of decreasing code size. However, X86CallFrameOptimization does not recognize these memory accesses, so it will not replace them with pushes when profitable.
  This patch fixes this problem by teaching X86CallFrameOptimization these store 0/-1 idioms. An alternative fix would be to prevent the 'store 0/-1 idioms' patterns from firing when accessing the stack. This would save the need to teach the pass about these idioms. However, because X86CallFrameOptimization does not always fire, we may end up in cases where neither X86CallFrameOptimization nor the patterns for 'store 0/-1 idioms' fire.
  Fixes pr34863
  Reviewers: DavidKreitzer, guyblank, aymanmus
  Reviewed By: aymanmus
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D38738
  llvm-svn: 316431
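The idioms work because `and mem, 0` and `or mem, -1` leave a value in memory that does not depend on the old contents, so a pass can treat them as plain constant stores. A minimal sketch; the opcode strings are hypothetical placeholders, not X86 MachineInstr opcode names.

```python
MASK32 = 0xFFFF_FFFF


def stored_constant(opcode, imm):
    """Return the constant a read-modify-write `op mem, imm` always leaves in
    memory, independent of the old contents, or None if it depends on them."""
    if opcode == "and" and imm == 0:
        return 0   # x & 0 == 0 for every x
    if opcode == "or" and imm == -1:
        return -1  # x | -1 == -1 (all ones) for every x
    return None


print(stored_constant("and", 0), stored_constant("or", -1), stored_constant("and", 5))  # 0 -1 None
```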
* AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1) | Marek Olsak | 2017-10-24 | 2 | -1/+242
  Summary: Kill the thread if operand 0 == false. llvm.amdgcn.wqm.vote can be applied to the operand.
  Also allow kill in all shader stages.
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D38544
  llvm-svn: 316427
* AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic | Marek Olsak | 2017-10-24 | 1 | -0/+52
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye
  Differential Revision: https://reviews.llvm.org/D38543
  llvm-svn: 316426
* X86: Fix X86CallFrameOptimization to search for the COPY StackPointer | Zvi Rackover | 2017-10-24 | 3 | -41/+71
  SelectionDAG inserts a copy of ESP into a virtual register. X86CallFrameOptimization assumed that the COPY, if present, is always right after the call-frame setup instruction (ADJCALLSTACKDOWN). This was a wrong assumption, as the COPY can be located anywhere between the call-frame setup instruction and its first use. If the COPY happened to be located somewhere other than where X86CallFrameOptimization assumed, visiting it while processing the call chain would lead to a conservative bail-out.
  The fix is quite straightforward: scan ahead for the stack-pointer copy and make note of it so it can be ignored while processing the call chain.
  Fixes pr34903
  Differential Revision: https://reviews.llvm.org/D38730
  llvm-svn: 316416
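The scan-ahead can be sketched over a hypothetical list of instruction tuples. As a simplification, the search here stops at the call rather than at the copy's first use; the old assumption corresponds to checking only the instruction right after the setup.

```python
def find_sp_copy(instrs, setup_idx):
    """Return the index of the first `vreg = COPY $esp` after the call-frame
    setup at setup_idx, searching forward until the call, or None."""
    for i in range(setup_idx + 1, len(instrs)):
        op = instrs[i]
        if op[0] == "CALL":
            break
        if op[0] == "COPY" and op[2] == "$esp":
            return i
    return None


instrs = [
    ("ADJCALLSTACKDOWN",),
    ("MOV32mi", "$esp", 0),   # an argument store lands before the copy
    ("COPY", "%0", "$esp"),   # not immediately after the setup instruction
    ("MOV32mr", "%0", "%1"),
    ("CALL", "f"),
]
print(find_sp_copy(instrs, 0))  # 2
```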
* [MC] Adding code padding for performance stability - infrastructure. NFC. | Omer Paparo Bivas | 2017-10-24 | 1 | -1/+5
  Infrastructure designed for padding code with nop instructions in key places such that a performance improvement will be achieved. The infrastructure is implemented such that the padding is done in the Assembler after the layout is done and all IPs and alignments are known.
  This patch by itself is NFC. Future patches will make use of this infrastructure to implement required policies for code padding.
  Reviewers: aaboud zvi craig.topper gadi.haber
  Differential revision: https://reviews.llvm.org/D34393
  Change-Id: I92110d0c0a757080a8405636914a93ef6f8ad00e
  llvm-svn: 316413
* X86: Register the X86CallFrameOptimization pass | Zvi Rackover | 2017-10-24 | 1 | -0/+125
  Summary: The motivation of this change is to enable .mir testing for this pass. Added one test case to cover the functionality; this same case will be improved by a future patch.
  Reviewers: igorb, guyblank, DavidKreitzer
  Reviewed By: guyblank, DavidKreitzer
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D38729
  llvm-svn: 316412
* [MachineOutliner] Add optimisation remarks for successful outlining | Jessica Paquette | 2017-10-23 | 1 | -9/+59
  This commit adds optimisation remarks for outlining which fire when a function is successfully outlined.
  To do this, OutlinedFunctions must now contain references to their Candidates. Since the Candidates must still be sorted and worked on separately, this is done by working on everything in terms of shared_ptrs to Candidates. This is good; it means that we can easily move everything to outlining in terms of the OutlinedFunctions rather than the individual Candidates. This is far more intuitive than what's currently there!
  (Remarks are output when a function is created for some group of Candidates. In a later commit, all of the outlining logic should be rewritten so that we loop over OutlinedFunctions rather than over Candidates.)
  llvm-svn: 316396
* [GISel][ARM]: Fix illegal Generic copies in tests | Aditya Nandakumar | 2017-10-23 | 4 | -227/+382
  This is in preparation for a verifier check that makes sure copies are of the same size (when generic virtual registers are involved).
  llvm-svn: 316388
* [GISel][AArch64]: Fix illegal Generic copies in tests | Aditya Nandakumar | 2017-10-23 | 16 | -77/+168
  This is in preparation for a verifier check that makes sure copies are of the same size (when generic virtual registers are involved).
  llvm-svn: 316387
OpenPOWER on IntegriCloud