The operand layout for the atomic opcodes differs slightly from that of
the usual MUBUF loads and stores.
This should only fix it on SI/CI. VI is still broken because it still
emits the addr64 replacement.
llvm-svn: 252140
|
vaddr comes before srsrc in every other MUBUF instruction, and that is
the order in which they are printed.
llvm-svn: 252139
|
Summary:
The CLR's personality routine passes the pointer to the establisher frame
in RCX, not RDX.
Reviewers: pgavlin, majnemer, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14343
llvm-svn: 252135
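
As an illustration of the distinction (hypothetical helper and register
stand-ins, not the upstream code):

    enum class Personality { MSVC_CXX, CoreCLR };

    // Pick the register that carries the establisher-frame pointer on
    // funclet entry. The constants are stand-ins for X86::RDX / X86::RCX.
    unsigned establisherFrameReg(Personality P) {
      const unsigned RDX = 2, RCX = 1;
      return P == Personality::CoreCLR ? RCX : RDX;  // CLR uses RCX
    }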
|
This brings back the pre-r252090 behavior for out-of-range symbols.
Should bring some ARM bots back.
llvm-svn: 252119
|
llvm-svn: 252116
|
The generic infrastructure already did a lot of work to decide if the
fixup value is known or not. It doesn't make sense to reimplement a very
basic case: same fragment.
llvm-svn: 252090
|
Win64 has some strict requirements for the epilogue. As a result, we disable
shrink-wrapping for Win64 unless the block that gets the epilogue is already an
exit block.
Fixes PR24193.
llvm-svn: 252088
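
A sketch of the restriction under assumed names (illustrative; the real
check lives in the shrink-wrapping code):

    // Only allow shrink-wrapping on Win64 when the block chosen for the
    // epilogue is already an exit (return) block.
    bool mayShrinkWrap(bool IsWin64, bool EpilogueBlockIsReturn) {
      if (IsWin64 && !EpilogueBlockIsReturn)
        return false;  // would violate Win64's epilogue requirements
      return true;
    }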
|
llvm-svn: 252078
|
This patch improves the memory folding of the inserted float element for the (V)INSERTPS instruction.
The existing implementation occurs in the DAGCombiner and relies on narrowing a whole vector load into a scalar load (which is then converted back into a vector) to (hopefully) allow folding to occur later on. Not only has this proven problematic for debug builds, it also prevents other memory folds (notably stack reloads) from happening.
This patch removes the old implementation and moves the folding code to the X86 foldMemoryOperand handler. A new private 'special case' function, foldMemoryOperandCustom, has been added to deal with memory folding of instructions that can't just use the lookup tables; (V)INSERTPS is the first of several that could be done.
It also tweaks the memory operand folding code with an additional pointer offset that allows existing memory addresses to be modified, in this case to convert the vector address to the explicit address of the scalar element that will be inserted.
Unlike the previous implementation, we now set the insertion source index to zero. Although this is ignored for the (V)INSERTPSrm version, anything that relied on shuffle decodes (such as unfolding of insertps loads) was incorrectly calculating the source address; I've added a test for this at insertps-unfold-load-bug.ll.
Differential Revision: http://reviews.llvm.org/D13988
llvm-svn: 252074
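
The pointer-offset tweak can be pictured in scalar terms; a minimal
model (illustrative only, not the folding code itself):

    #include <cstddef>

    // Rather than loading the whole vector and shuffling the wanted lane
    // into place, fold to a scalar load whose address points directly at
    // the element: base + lane * sizeof(float). The insertion source
    // index can then be zero.
    float loadElement(const float *vecBase, std::size_t lane) {
      return vecBase[lane];
    }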
|
Summary:
This is intended to make a later change simpler.
Note: adding this bounds checking required fixing `X86FastISel`. As
far as I can tell I've preserved the original behavior, but a careful
review would be appreciated.
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14304
llvm-svn: 252073
|
scalar FMA intrinsics.
Patch by Slava Klochkov
The key difference between the FMA* and FMA*_Int opcodes is that FMA*_Int opcodes are handled more conservatively: it is illegal to commute the 1st operand of FMA*_Int instructions, as the upper bits of the scalar FMA intrinsic result must be taken from the 1st operand, and such a commute would change those upper bits and invalidate the intrinsic's result.
Reviewers: Quentin Colombet, Elena Demikhovsky
Differential Revision: http://reviews.llvm.org/D13710
llvm-svn: 252060
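
The constraint is visible in the corresponding C intrinsic; a minimal
illustration (compile with -mfma):

    #include <immintrin.h>

    // _mm_fmadd_ss computes a[0]*b[0]+c[0] in element 0, but elements
    // 1..3 of the result are copied from the FIRST operand, so commuting
    // 'a' with another operand would change those upper bits.
    __m128 fmaddScalar(__m128 a, __m128 b, __m128 c) {
      return _mm_fmadd_ss(a, b, c);
    }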
|
If we have a CMOV, OR and AND combination such as:
if (x & CN)
y |= CM;
And:
* CN is a single bit;
* All bits covered by CM are known zero in y;
Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for Thumb will be no worse if CM is three bits (due to the extra IT instruction).
llvm-svn: 252057
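
In scalar C the matched pattern looks like this (constants chosen to
satisfy the conditions above; illustrative only):

    // CN = 0x10 is a single bit; CM = 0x0c covers bits assumed to be
    // known zero in y at this point.
    unsigned setFlags(unsigned x, unsigned y) {
      if (x & 0x10)
        y |= 0x0c;
      return y;  // now selectable as BFIs instead of TST+OR+CMOV
    }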
|
Differential Revision: http://reviews.llvm.org/D14109
llvm-svn: 252043
|
The x86 "sitofp i64 to double" DAG combine, in 32-bit mode, lowers sitofp
directly to X86ISD::FILD (or FILD_FLAG). This should not be done in soft-float mode.
llvm-svn: 252042
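
A sketch of the guard with assumed names (the actual check sits inside
the X86 DAG combine):

    // Only form FILD for the 32-bit sitofp path when x87 instructions
    // may be emitted at all.
    bool shouldLowerSitofpToFILD(bool Is32Bit, bool UseSoftFloat) {
      return Is32Bit && !UseSoftFloat;
    }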
|
A profile of an LTO link of Chrome revealed that we were spending roughly
30-50% of execution time in the function Constant::getRelocationInfo(),
which is called from TargetLoweringObjectFile::getKindForGlobal() and in turn
from TargetMachine::getNameWithPrefix().
It turns out that we only need the result of getKindForGlobal() when
targeting Mach-O, so this change moves the relevant part of the logic to
TargetLoweringObjectFileMachO.
NFCI.
Differential Revision: http://reviews.llvm.org/D14168
llvm-svn: 252014
|
The printed name and the parsed assembler name weren't the same.
I'm not sure which name SC prints these as, but I think it's this one.
llvm-svn: 252010
|
If the requested SGPR was not actually aligned, it was
accepted and rounded down instead of rejected.
Also fix an assert if the range is an invalid size.
llvm-svn: 252009
|
Trying to use one past the end would assert.
llvm-svn: 252008
|
llvm-svn: 252003
|
Summary:
Add support for wasm's select operator, and lower LLVM's select DAG node
to it.
Reviewers: sunfish
Subscribers: dschuff, llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D14295
llvm-svn: 252002
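
A source-level shape that can exercise the new lowering (illustrative;
a branchless value choice maps onto wasm's select operator):

    // Becomes an ISD::SELECT in the DAG; may now lower to wasm "select"
    // rather than control flow.
    int pick(int cond, int a, int b) {
      return cond ? a : b;
    }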
|
llvm-svn: 252000
|
There are actually 104, so 2 were missing.
More assembler tests with high register number tuples
will be included in later patches.
llvm-svn: 251999
|
Add more comments etc.
llvm-svn: 251996
|
llvm-svn: 251995
|
llvm-svn: 251994
|
XOP has the VPCMOV instruction, which performs the common vector bit-select operation OR( AND( SRC1, SRC3 ), AND( SRC2, ~SRC3 ) ).
This patch adds tablegen pattern matching for this instruction.
Differential Revision: http://reviews.llvm.org/D8841
llvm-svn: 251975
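
Per element the operation is an ordinary bit select; a scalar C++ model
for reference:

    #include <cstdint>

    // One 64-bit lane of VPCMOV: each bit of src3 chooses between the
    // corresponding bits of src1 and src2.
    uint64_t vpcmovLane(uint64_t src1, uint64_t src2, uint64_t src3) {
      return (src1 & src3) | (src2 & ~src3);
    }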
|
When push instructions are being used to pass function arguments on
the stack, and either EH or debugging is enabled, we need to generate
.cfi_adjust_cfa_offset directives appropriately. For (synchronous) EH, it is
enough for the CFA offset to be correct at every call site, while
for debugging we want it to be correct after every push.
Darwin does not support this well, so don't use pushes whenever it
would be required.
Differential Revision: http://reviews.llvm.org/D13767
llvm-svn: 251904
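
An illustration of the directive pattern for a 32-bit call (the assembly
in the comment is typical output, not verbatim):

    extern "C" void callee(int a, int b);

    void caller() {
      // When arguments are pushed, each push moves the stack pointer,
      // so the unwind info must be adjusted after every one, roughly:
      //   pushl $2
      //   .cfi_adjust_cfa_offset 4
      //   pushl $1
      //   .cfi_adjust_cfa_offset 4
      //   calll callee
      callee(1, 2);
    }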
|
llvm-svn: 251903
|
llvm-svn: 251888
|
ScheduleDAGInstrs doesn't behave differently before or after register
allocation. The IsPostRA flag was only used in a method of
MachineSchedulerBase that behaved differently in
MachineScheduler/PostMachineScheduler. Change this to let
MachineScheduler/PostMachineScheduler just pass in a parameter to that
function.
The order of the LiveIntervals* and bool RemoveKillFlags parameters has
been switched to make out-of-tree code fail instead of unintentionally
passing a value intended for the IsPostRA flag to the (previously
following and default-initialized) RemoveKillFlags.
Differential Revision: http://reviews.llvm.org/D14245
llvm-svn: 251883
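
A simplified sketch of why the swap breaks stale callers at compile time
(assumed signature, not the verbatim upstream constructor):

    struct LiveIntervals;

    struct ScheduleDAGInstrs {
      // Before (simplified): ...(bool IsPostRA, bool RemoveKillFlags,
      //                         LiveIntervals *LIS). After removing
      // IsPostRA, putting the pointer first means an old call site that
      // still passes a bool no longer converts implicitly; it fails to
      // compile instead of silently feeding RemoveKillFlags.
      ScheduleDAGInstrs(LiveIntervals *LIS, bool RemoveKillFlags = false);
    };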
|
llvm-svn: 251867
|
This was causing a variety of test failures when v2i64
is added as a legal type.
SIFixSGPRCopies should correctly handle the case of vector inputs
to a scalar reg_sequence, so this isn't necessary anymore. This
was hiding some deficiencies in how reg_sequence is handled later,
but this shouldn't be a problem anymore since the register class
copy of a reg_sequence is now done before the reg_sequence.
llvm-svn: 251860
|
llvm-svn: 251859
|
A few times now I've found myself pointlessly debugging problems caused
by running graphics tests with an HSA triple, so stop this from
happening again.
llvm-svn: 251858
|
Make the REG_SEQUENCE be a VGPR, and do the register class
copy first.
llvm-svn: 251855
|
Replace some hacky code with the proper way to get at this data.
No functional change.
llvm-svn: 251848
|
llvm-svn: 251814
|
This revision introduced an issue that only affects the bootstrapped
compiler when it is printing the ASM. It turns out that the new code path
taken due to legalizing a scalar_to_vector of i64 -> v2i64 exposes a
missing check in a micro-optimization that changes a load followed by a
scalar_to_vector into a load-and-splat instruction on PPC.
llvm-svn: 251798
|
VBROADCASTF32x2 instructions.
Differential Revision: http://reviews.llvm.org/D14216
llvm-svn: 251781
|
intrinsics. Nothing upstream prevented illegal values from getting here.
llvm-svn: 251780
|
llvm-svn: 251778
|
llvm-svn: 251777
|
unreachable in their default case.
llvm-svn: 251776
|
llvm-svn: 251775
|
llvm-svn: 251774
|
llvm-svn: 251772
|
llvm-svn: 251769
|
Optimized <8 x i32> to <8 x i16>,
<4 x i64> to <4 x i32>, and
<16 x i16> to <16 x i8>.
All these operations now use the AVX512F set (KNL). Before this change
they were implemented with the AVX2 set.
Differential Revision: http://reviews.llvm.org/D14108
llvm-svn: 251764
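
A minimal reproduction of one of these truncations using Clang vector
extensions (assumes Clang; illustrative):

    typedef int   v8si __attribute__((vector_size(32)));
    typedef short v8hi __attribute__((vector_size(16)));

    // <8 x i32> -> <8 x i16>: with AVX512F this can lower to a single
    // vpmovdw (operating on a widened 512-bit register on KNL) instead
    // of an AVX2 shuffle sequence.
    v8hi truncate8i32(v8si v) {
      return __builtin_convertvector(v, v8hi);
    }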
|
already known to be a vector. This should result in slightly less code. NFC
llvm-svn: 251751
|
the type is simple. NFC
llvm-svn: 251745