|  | Commit message (Collapse) | Author | Age | Files | Lines | 
|---|
| | 
| 
| 
| 
| 
| | Differential Revision: https://reviews.llvm.org/D61045
llvm-svn: 359117 | 
| | 
| 
| 
| 
| 
| | Differential Revision: https://reviews.llvm.org/D61041
llvm-svn: 359113 | 
| | 
| 
| 
| 
| 
| 
| 
| | Reviewers: artem.tamazov, arsenm
Differential Revision: https://reviews.llvm.org/D60767
llvm-svn: 359096 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | If the types don't match, we can't just remove the shuffle.
There may be some other opportunity for optimization here,
but this should prevent the crashing seen in:
https://bugs.llvm.org/show_bug.cgi?id=41414
llvm-svn: 359095 | 
| | 
| 
| 
| 
| 
| | Prep work toward fixing PR40758
llvm-svn: 359088 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Both the input Value pointer and the returned Value
pointers in GetUnderlyingObjects are now declared as
const.
It turned out that all current (in-tree) uses of
GetUnderlyingObjects were trivial to update, being
satisfied with have those Value pointers declared
as const. Actually, in the past several of the users
had to use const_cast, just because of ValueTracking
not providing a version of GetUnderlyingObjects with
"const" Value pointers. With this patch we get rid
of those const casts.
Reviewers: hfinkel, materi, jkorous
Reviewed By: jkorous
Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61038
llvm-svn: 359072 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | copy the subtarget from the MachineFunction instead of recalculating it.
Summary:
The MachineFunction should have been created with the correct subtarget. As
long as there is no way to change it, MipsTargetMachine can just capture it
directly from the MachineFunction without calling getSubtargetImpl again.
While there, const correct the Subtarget pointer to avoid a const_cast.
I believe the Mips16Subtarget and NoMips16Subtarget members are never used, but
I'll leave there removal for a separate patch.
Reviewers: echristo, atanasyan
Reviewed By: atanasyan
Subscribers: sdardis, arichardson, hiraditya, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60936
llvm-svn: 359071 | 
| | 
| 
| 
| 
| 
| 
| | Add selection support for G_INTRINSIC_ROUND, add a selection test, and add
check lines to arm64-vfloatintrinsics.ll and f16-instructions.ll.
llvm-svn: 359046 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | Add G_INTRINSIC_ROUND to isPreISelGenericFloatingPointOpcode to ensure that its
input and output are assigned the correct register bank.
Add a regbankselect test to verify that we get what we expect here.
llvm-svn: 359044 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Always convert switches to br_tables unless there is only one case,
which is equivalent to a simple branch. This reduces code size for wasm,
and we defer possible jump table optimizations to the VM.
Addresses PR41502.
Reviewers: kripken, sunfish
Subscribers: dschuff, sbc100, jgravelle-google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60966
llvm-svn: 359038 | 
| | 
| 
| 
| 
| 
| 
| | Add it to the same rule as G_FCEIL etc. Add a legalizer test, and add a missing
switch case to AArch64LegalizerInfo.cpp.
llvm-svn: 359033 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Apparently FileCheck wasn't actually matching the fallback check lines in
arm64-vfloatintrinsics.ll properly. So, there were selection fallbacks for
G_INTRINSIC_TRUNC there.
Actually hook it up into AArch64InstructionSelector.cpp and write a proper
selection test.
I guess I'll figure out the FileCheck magic to make the fallback checks work
properly in arm64-vfloatintrinsics.ll.
llvm-svn: 359030 | 
| | 
| 
| 
| 
| 
| 
| 
| | Add it to isPreISelGenericFloatingPointOpcode, and add a regbankselect test.
Update arm64-vfloatintrinsics.ll now that we can select it.
llvm-svn: 359022 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | Same patch as G_FCEIL etc.
Add the missing switch case in widenScalar, add G_INTRINSIC_TRUNC to the correct
rule in AArch64LegalizerInfo.cpp, and add a test.
llvm-svn: 359021 | 
| | 
| 
| 
| 
| 
| 
| 
| | The second argument is flags, not subreg.
Differential Revision: https://reviews.llvm.org/D61031
llvm-svn: 359017 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | Same as G_FCEIL, G_FABS, etc. Just move it into that rule.
Add a legalizer test for G_FMA, which we didn't have before and update
arm64-vfloatintrinsics.ll.
llvm-svn: 359015 | 
| | 
| 
| 
| 
| 
| 
| 
| | Noticed an unnecessary fallback in arm64-vmul caused by this.
Also add a regbankselect test for G_FMA.
llvm-svn: 359013 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Circling back to a leftover bit from PR39859:
https://bugs.llvm.org/show_bug.cgi?id=39859#c1
...we have this counter-intuitive (based on the test diffs) opportunity to use 'psubus'.
This appears to be the better perf option for both Haswell and Jaguar based on llvm-mca.
We already do this transform for the SETULT predicate, so this makes the code more
symmetrical too. If we have pminub/pminuw, we prefer those, so this should not affect
anything but pre-SSE4.1 subtargets.
  $ cat before.s
	movdqa	-16(%rip), %xmm2    ## xmm2 = [32768,32768,32768,32768,32768,32768,32768,32768]
	pxor	%xmm0, %xmm2
	pcmpgtw	-32(%rip), %xmm2 ## xmm2 = [255,255,255,255,255,255,255,255]
	pand	%xmm2, %xmm0
	pandn	%xmm1, %xmm2
	por	%xmm2, %xmm0
  $ cat after.s
	movdqa	-16(%rip), %xmm2    ## xmm2 = [256,256,256,256,256,256,256,256]
	psubusw	%xmm0, %xmm2
	pxor	%xmm3, %xmm3
	pcmpeqw	%xmm2, %xmm3
	pand	%xmm3, %xmm0
	pandn	%xmm1, %xmm3
	por	%xmm3, %xmm0
  $ llvm-mca before.s -mcpu=haswell
  Iterations:        100
  Instructions:      600
  Total Cycles:      909
  Total uOps:        700
  Dispatch Width:    4
  uOps Per Cycle:    0.77
  IPC:               0.66
  Block RThroughput: 1.8
  $ llvm-mca after.s -mcpu=haswell
  Iterations:        100
  Instructions:      700
  Total Cycles:      409
  Total uOps:        700
  Dispatch Width:    4
  uOps Per Cycle:    1.71
  IPC:               1.71
  Block RThroughput: 1.8
Differential Revision: https://reviews.llvm.org/D60838
llvm-svn: 358999 | 
| | 
| 
| 
| 
| 
| 
| 
| | 64bit mode must use 64bit registers, otherwise assumptions about the top
half of the registers are made. Problem found by Takeshi Nakayama in
NetBSD.
llvm-svn: 358998 | 
| | 
| 
| 
| 
| 
| | While touching the code, simplify if feasible.
llvm-svn: 358996 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | This patch adds support for parsing and assembling the %tls_ie_pcrel_hi
and %tls_gd_pcrel_hi modifiers.
Differential Revision: https://reviews.llvm.org/D55342
llvm-svn: 358994 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | Essentially complete a proper rebase of the V3 metadata change over
https://reviews.llvm.org/D49096.
Minimize the diff between the V2 and V3 variants of the relevant lit
tests, and clean up some trailing whitespace.
llvm-svn: 358992 | 
| | 
| 
| 
| 
| 
| | Create collectConcatOps helper that returns all the subvector ops for CONCAT_VECTORS or a INSERT_SUBVECTOR series.
llvm-svn: 358989 | 
| | 
| 
| 
| 
| 
| 
| 
| | The manual says that Thumb2 add/sub instructions are only allowed to modify sp
if the first source is also sp. This is slightly different from the usual rGPR
restriction since it's context-sensitive, so implement it in C++.
llvm-svn: 358987 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
When an LCSSA phi survives through instruction selection, the pass
ends up removing that phi entirely because it is dominated by the
logic that does the lanemask merging.
This then used to trigger an assertion when processing a dependent
phi instruction.
Change-Id: Id4949719f8298062fe476a25718acccc109113b6
Reviewers: llvm-commits
Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, tpr, dstuttard, rtaylor, arsenm
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60999
llvm-svn: 358983 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Converting InlineCost interface and its internals into CallBase usage.
Inliners themselves are still not converted.
Reviewed By: reames
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60636
llvm-svn: 358982 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The check for creating CBZ in constant island pass recently obtained the
ability to search backwards to find a Cmp instruction. The code in IfCvt should
mirror this to allow more conversions to the smaller form. The common code has
been pulled out into a separate function to be shared between the two places.
Differential Revision: https://reviews.llvm.org/D60090
llvm-svn: 358977 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | Ifcvt can replicate instructions as it converts them to be predicated. This
stops that from happening on thumb2 targets at minsize where an extra IT
instruction is likely needed.
Differential Revision: https://reviews.llvm.org/D60089
llvm-svn: 358974 | 
| | 
| 
| 
| | llvm-svn: 358969 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This patch provides intrinsics support for Memory Tagging Extension (MTE),
which was introduced with the Armv8.5-a architecture.
The intrinsics are described in detail in the latest
ACLE Q1 2019 documentation: https://developer.arm.com/docs/101028/latest
Reviewed by: David Spickett
Differential Revision: https://reviews.llvm.org/D60486
llvm-svn: 358963 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Add missing D and Q lane VLDSTLane lowering
for fp16 elements.
Reviewers: efriedma, kosarev, SjoerdMeijer, ostannard
Reviewed By: efriedma
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60874
llvm-svn: 358962 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This change partially reverts https://reviews.llvm.org/D54647 in favor
of bailing out during computeAddress instead.
This catches the condition earlier and handles more cases.
Differential Revision: https://reviews.llvm.org/D60986
llvm-svn: 358948 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | This was supposed to be NFC, but the change in SDLoc
definitions causes instruction scheduling changes.
There's nothing x86-specific in this code, and it can
likely be used from DAGCombiner's simplifyVBinOp().
llvm-svn: 358930 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
- Only apply packed literal `op_sel_hi` skipping on operands requiring
  packed literals. Even an instruction is `packed`, it may have operand
  requiring non-packed literal, such as `v_dot2_f32_f16`.
Reviewers: rampitec, arsenm, kzhuravl
Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60978
llvm-svn: 358922 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | These are inserted after branch relaxation, and for some reason it's
decided to put them in the long branch expansion block. It's probably
not great to rely on the source block address, so this should probably
be switched to being PC relative instead of relying on the block
address
llvm-svn: 358909 | 
| | 
| 
| 
| | llvm-svn: 358894 | 
| | 
| 
| 
| 
| 
| 
| | Effectively reverts r356956. The check for isFullCopy was excessive,
but there still needs to be a check that this is a copy.
llvm-svn: 358890 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156
Reviewers: artem.tamazov, arsenm
Differential Revision: https://reviews.llvm.org/D60624
llvm-svn: 358888 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.
The AMDGPU backend needed an extra  (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions.
The X86 changes are all definite wins.
Differential Revision: https://reviews.llvm.org/D60462
llvm-svn: 358887 | 
| | 
| 
| 
| 
| 
| | not enabled. Same for 256 bit and AVX.
llvm-svn: 358872 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This does two main things, firstly adding some at least basic addressing modes
for i64 types, and secondly treats floats and doubles sensibly when there is no
fpu. The floating point change can help codesize in some cases, especially with
D60294.
Most backends seems to not consider the exact VT in isLegalAddressingMode,
instead switching on type size. That is now what this does when the target does
not have an fpu (as the float data will be loaded using LDR's). i64's currently
use the address range of an LDRD (even though they may be legalised and loaded
with an LDR). This is at least better than marking them all as illegal
addressing modes.
I have not attempted to do much with vectors yet. That will need changing once
MVE is added.
Differential Revision: https://reviews.llvm.org/D60677
llvm-svn: 358845 | 
| | 
| 
| 
| 
| 
| | instructions.
llvm-svn: 358844 | 
| | 
| 
| 
| 
| 
| | fpclass only has a single use.
llvm-svn: 358841 | 
| | 
| 
| 
| 
| 
| 
| | The last attempt fixed gcc and consumer-typeset, but Obsequi seems to fail with
a different issue.
llvm-svn: 358829 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
If you pass two 1024 bit vectors in IR with AVX2 on Windows 64. Both vectors will be split in four 256 bit pieces. The four pieces of the first argument will be passed indirectly using 4 gprs. The second argument will get passed via pointers in memory.
The PartOffsets stored for the second argument are all in terms of its original 1024 bit size. So the PartOffsets for each piece are 32 bytes apart. So if we consider it for copy elision we'll only load an 8 byte pointer, but we'll move the address 32 bytes. The stack object size we create for the first part is probably wrong too.
This issue was encountered by ISPC. I'm working on getting a reduce test case, but wanted to go ahead and get feedback on the fix.
Reviewers: rnk
Reviewed By: rnk
Subscribers: dbabokin, llvm-commits, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60801
llvm-svn: 358817 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | Fix for https://bugs.llvm.org/show_bug.cgi?id=41477. On the x32 ABI
with stack probing a dynamic alloca will result in a WIN_ALLOCA_32
with a 32-bit size. The current implementation tries to copy it into
RAX, resulting in a physreg copy error. Fix this by copying to EAX
instead. Also fix incorrect opcodes or registers used in subs.
llvm-svn: 358807 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | the original AND can represented by MOVZX.
The MOVZX doesn't require an immediate to be encoded at all. Though it does use
a 2 byte opcode so its the same size as a 1 byte immediate. But it has a
separate source and dest register so can help avoid copies.
llvm-svn: 358805 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | (C1 >> C2), C2) if the AND could match a movzx.
There's one slight regression in here because we don't check that the immediate
already allowed movzx before the shift. I'll fix that next.
llvm-svn: 358804 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | and stores""
We were shifting the wrong component of a split load when trying to combine them
back into a single value.
llvm-svn: 358800 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | Exactly the same as G_FCEIL, G_FABS, etc.
Add tests for the fp16/nofp16 behaviour, update arm64-vfloatintrinsics, etc.
Differential Revision: https://reviews.llvm.org/D60895
llvm-svn: 358799 |