| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
are the same
This is limited to a set of patterns based on the example in PR34111:
https://bugs.llvm.org/show_bug.cgi?id=34111
...but as I was investigating this, I see that horizontal patterns can go wrong in many,
many other ways that would not be handled by this patch. Each data type may even go
different in the DAG after starting with the same basic IR pattern, so even proper IR
canonicalization won't fix it all.
Differential Revision: https://reviews.llvm.org/D37357
llvm-svn: 312379
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D37392
llvm-svn: 312364
|
| |
|
|
| |
llvm-svn: 312349
|
| |
|
|
|
|
|
| |
Doesn't include the tied operand necessary for the loads,
but is enough for the assembler to work.
llvm-svn: 312347
|
| |
|
|
|
|
|
|
| |
Summary: See https://github.com/WebAssembly/tool-conventions/blob/master/Linking.md
Differential Revision: https://reviews.llvm.org/D37385
llvm-svn: 312342
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This fixes a bug that was exposed on gfx9 in various
GL45-CTS.shaders.loops.*_iterations.select_iteration_count_fragment tests,
e.g. GL45-CTS.shaders.loops.do_while_uniform_iterations.select_iteration_count_fragment
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D36193
llvm-svn: 312337
|
| |
|
|
|
|
|
| |
In the ROPI relocation model, read-only variables are accessed relative
to the PC. We use the (MOV|LDRLIT)_ga_pcrel pseudoinstructions for this.
llvm-svn: 312323
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds 2-operand assembly aliases for these instructions:
add r0, r1 => add r0, r0, r1
sub r0, r1 => sub r0, r0, r1
Previously this syntax was only accepted for Thumb2 targets, where the
wide versions of the instructions were used.
This patch allows the 2-operand syntax to be used for Thumb1 targets,
and selects the narrow encoding when it is used for Thumb2 targets.
Differential revision: https://reviews.llvm.org/D37377
llvm-svn: 312321
|
| |
|
|
|
|
|
| |
This exposes the isReadOnly(GlobalValue *) in the ARMTargetLowering so
we can make use of it in GlobalISel as well.
llvm-svn: 312320
|
| |
|
|
|
|
|
|
| |
Previously we generated a register only pattern for each of the 3 instruction forms, but they are all identical as far as isel is concerned. So drop the others and just keep the 213 version.
This removes 2968 bytes from the isel table.
llvm-svn: 312313
|
| |
|
|
| |
llvm-svn: 312312
|
| |
|
|
| |
llvm-svn: 312311
|
| |
|
|
|
|
| |
opportunities.
llvm-svn: 312310
|
| |
|
|
| |
llvm-svn: 312309
|
| |
|
|
|
|
|
|
| |
for FMA instrinsics.
The instructions are already defined as writing a VR128 register.
llvm-svn: 312308
|
| |
|
|
| |
llvm-svn: 312297
|
| |
|
|
|
|
| |
warnings; other minor fixes. Also affected in files (NFC).
llvm-svn: 312289
|
| |
|
|
|
|
|
|
|
| |
Not all of these will be able to be used by atomics because tablegen, but it
still seems like a good change by itself.
Differential Revision: https://reviews.llvm.org/D37345
llvm-svn: 312287
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
we're really using the carry flag from the add.
Prior to this patch we had a DAG combine that tried to bypass an X86ISD::ADD with -1 being added to the carry flag of some previous operation. We would then pass the carry flag directly to user.
But this is only safe if the user is looking for the carry flag and not the zero flag.
So we need to only do this combine in a context where we know what flag the consumer is using.
Fixes PR34381.
Differential Revision: https://reviews.llvm.org/D37317
llvm-svn: 312285
|
| |
|
|
|
|
|
|
|
|
| |
build_vector is a more useful canonical form when
pattern matching packed operations, so turn shift
into high element into a build_vector.
Should show no change for now.
llvm-svn: 312282
|
| |
|
|
|
|
|
|
| |
synthetic references in .text"
Breaks builds internally. Will forward repo instructions to author.
llvm-svn: 312243
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch enables the following:
1) Regex based Instruction itineraries for integer instructions.
2) The instructions are grouped as per the nature of the instructions
(move, arithmetic, logic, Misc, Control Transfer).
3) FP instructions and their itineraries are added which includes values
for SSE4A, BMI, BMI2 and SHA instructions.
Patch by Ganesh Gopalasubramanian
Reviewers: RKSimon, craig.topper
Subscribers: vprasad, shivaram, ddibyend, andreadb, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D36617
llvm-svn: 312237
|
| |
|
|
| |
llvm-svn: 312234
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
New instructions are added to AArch32 and AArch64 to aid
floating-point multiplication and addition of complex numbers,
where the complex numbers are packed in a vector register as a
pair of elements. The Imaginary part of the number is placed in the
more significant element, and the Real part of the number is placed
in the less significant element.
Differential Revision: https://reviews.llvm.org/D36792
llvm-svn: 312228
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Replace the UsePostRAScheduler SubtargetFeature with
DisablePostRAScheduler, which is then used by Swift and Cyclone.
This patch maintains enabling PostRA scheduling for other Thumb2
capable cores and/or for functions which are being compiled in Arm
mode.
Differential Revision: https://reviews.llvm.org/D37055
llvm-svn: 312226
|
| |
|
|
|
|
|
|
|
|
| |
The IDSAR6 system register has been introduced to identify the
v8.3-a Javascript data type conversion and v8.2-a dot product
support.
Differential Revision: https://reviews.llvm.org/D37068
llvm-svn: 312225
|
| |
|
|
|
|
|
|
|
|
| |
This is similar to what was done for ARM in SVN r269574; the code
and the test are straight copypaste to the corresponding AArch64
code and test directory.
Differential revision: https://reviews.llvm.org/D37204
llvm-svn: 312223
|
| |
|
|
|
|
|
|
| |
From comments and code review it wasn't intended to be enabled by default yet.
This reverts commit r311588.
llvm-svn: 312214
|
| |
|
|
|
|
| |
Also refine for f16 and rcp cases.
llvm-svn: 312213
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The majority of the time spent in the pass checking
for the register reads. Rather than searching all of
the defined registers for uses in each instruction,
use a set of defined registers and check the operands
of the instruction.
This process still is algorithmically not great,
but with the additional trick of skipping the analysis
for addresses with one use, this brings one slow
testcase into a reasonable range.
llvm-svn: 312206
|
| |
|
|
|
|
| |
an illegal type.
llvm-svn: 312190
|
| |
|
|
|
|
| |
It's smaller. No functionality change.
llvm-svn: 312180
|
| |
|
|
|
|
|
|
|
|
|
|
| |
These aren't really packed instructions, so the default
op_sel_hi should be 0 since this indicates a conversion.
The operand types are scalar values that behave similar
to an f16 scalar that may be converted to f32.
Doesn't change the default printing for op_sel_hi, just
the parsing.
llvm-svn: 312179
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Remove a check for `ARMSubtarget::isTargetDarwin` when determining
whether to use Swift error registers, so that Swift errors work
properly on non-Darwin ARM32 targets (specifically Android).
Before this patch, generated code would save and restores ARM register r8 at
the entry and returns of a function that throws. As r8 is used as a virtual
return value for the object being thrown, this gets overwritten by the restore,
and calling code is unable to catch the error. In turn this caused Swift code
that used `do`/`try`/`catch` to work improperly on Android ARM32 targets.
Addresses Swift bug report https://bugs.swift.org/browse/SR-5438.
Patch by John Holdsworth.
Reviewers: manmanren, rjmccall, aschwaighofer
Reviewed By: aschwaighofer
Subscribers: srhines, aschwaighofer, aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D35835
llvm-svn: 312164
|
| |
|
|
|
|
| |
This is no longer necessary now that i1 is illegal.
llvm-svn: 312146
|
| |
|
|
|
|
|
|
|
|
| |
Summary:
This tracks the WebAssembly threads feature proposal at
https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md
Differential Revision: https://reviews.llvm.org/D37300
llvm-svn: 312145
|
| |
|
|
|
|
|
|
|
|
|
|
| |
unless we're matching a masked op or broadcast
Selecting 32-bit element logical ops without a select or broadcast requires matching a bitconvert on the inputs to the and. But that's a weird thing to rely on. It's entirely possible that one of the inputs doesn't have a bitcast and one does.
Since there's no functional difference, just remove the extra patterns and save some isel table size.
Differential Revision: https://reviews.llvm.org/D36854
llvm-svn: 312138
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Support variadic function call. Port the implementation from X86FastISel.
Reviewers: zvi, guyblank, oren_ben_simhon
Reviewed By: guyblank
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D37261
llvm-svn: 312130
|
| |
|
|
| |
llvm-svn: 312120
|
| |
|
|
| |
llvm-svn: 312119
|
| |
|
|
|
|
|
|
|
|
| |
This patch supports one more pattern for bbit0 and bbit1
instructions, CBranchBitNum class is expanded so it can
take 32 bit immidate.
Differential Revision: https://reviews.llvm.org/D36222
llvm-svn: 312111
|
| |
|
|
|
|
|
|
|
|
| |
Support for scalars was committed in r311154, this adds support for allowing
v4f16 vector types (thus avoiding conversions from/to single precision for
these types).
Differential Revision: https://reviews.llvm.org/D37145
llvm-svn: 312104
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vbroadcastf32x2/vbroadcasti32x2
Summary:
This patch adjusts the patterns to make the result type of the broadcast node vXf64/vXi64. Then adds a bitcast to vXi32 after that. Intrinsic lowering was also adjusted to generate this new pattern.
Fixes PR34357
We should probably just drop the intrinsic entirely and use native IR, but I'll leave that for a future patch.
Any idea what instruction we should be lowering the floating point 128-bit result version of this pattern to? There's a 128-bit v2i32 integer broadcast but not an fp one.
Reviewers: aymanmus, zvi, igorb
Reviewed By: aymanmus
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37286
llvm-svn: 312101
|
| |
|
|
|
|
|
|
|
|
| |
a 512-bit register
This enables the use of a smaller encoding by using a VEX instruction when possible.
Differential Revision: https://reviews.llvm.org/D37092
llvm-svn: 312100
|
| |
|
|
|
|
|
|
| |
Currently we start applying this on Haswell and newer. I don't believe anything changed in the Haswell architecture to make this the right cutoff point. The partial flag handling around this has been roughly the same since Sandybridge.
Differential Revision: https://reviews.llvm.org/D37250
llvm-svn: 312099
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
basing it on the AVX flag
Summary:
Currently we determine if macro fusion is supported based on the AVX flag as a proxy for the processor being Sandy Bridge".
This is really strange as now AMD supports AVX. It also means if user explicitly disables AVX we disable macro fusion.
This patch adds an explicit macro fusion feature. I've also enabled for the generic 64-bit CPU (which doesn't have AVX)
This is probably another candidate for being in the MI layer, but for now I at least wanted to correct the overloading of the AVX feature.
Reviewers: spatel, chandlerc, RKSimon, zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37280
llvm-svn: 312097
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The merge is only possible if the base address register is the
same for the two instructions. If there is only the one use,
there's no point in doing an expensive forward scan checking
for memory interference looking for a merge candidate.
This gives a signficant improvement in one extreme testcase.
The code to do the scan is still algorithmically terrible,
so this is still the slowest pass in that example.
llvm-svn: 312096
|
| |
|
|
|
|
|
|
|
|
| |
If denorms are not flushed we can use max instead of multiplication
by 1. For double that is simply faster, while for float and half
it is shorter, because mul uses constant bus and VOP3.
Differential Revision: https://reviews.llvm.org/D36856
llvm-svn: 312095
|
| |
|
|
| |
llvm-svn: 312087
|
| |
|
|
|
|
| |
We don't have an intrinsic implemented for this instruction yet, but it looked odd that we were missing the accessor method from the subtarget.
llvm-svn: 312064
|