| Commit message | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This section is used for debug information and has no need to be
in memory at runtime. With this patch, LLVM now emits the same flags as
the GNU assembler. This patch also fixes an error when compiling
the Linux kernel. The error is that there are relocations within the
.pdr section in a VDSO.
Reviewers: vkalintiris, dsanders
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D17199
llvm-svn: 260879
|
| |
|
|
|
|
|
|
|
|
| |
are changed to 16 bits.
If KMOVB is not supported (it requires AVX512DQ), only KMOVW can be used, so the store size should be 2 bytes.
Differential Revision: http://reviews.llvm.org/D17138
llvm-svn: 260878
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
r180893 added an indirect include of llvm/Config/Targets.def to
llvm/Support/CodeGen.h, which in turn is included by things like
llvm/IR/Module.h. After a full build of LLVM and Clang, ninja had to
rebuild 1274 files after reconfiguring.
This commit strips CodeGen.h back down to just a pile of enums and moves
the expensive includes over to CodeGenCWrappers.h (which is only
included in two places). This gets ninja down to 88 files if you
reconfigure with, e.g., -DLLVM_TARGETS_TO_BUILD=X86.
llvm-svn: 260835
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations.
On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle.
This patch has several benefits:
* Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling.
* Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure).
* Matching the repeating shuffle makes use of a lot of existing shuffle lowering.
There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll): what was previously a 128-bit shuffle + subvector splat is now converted to a subvector splat + (2 instruction) 256-bit shuffle. I intend to fix this in a followup patch for review.
Differential Revision: http://reviews.llvm.org/D16537
llvm-svn: 260834
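The "repeating shuffle" idea above can be illustrated with a small sketch. This is a simplified single-input model written for this note, not the code of is128BitLaneRepeatedShuffleMask; the function name and the use of -1 for an undefined mask element are assumptions.

```python
def repeated_lane_mask(mask, lane_elts):
    """Return the per-lane pattern if every lane of `mask` applies the same
    in-lane shuffle, otherwise None.

    mask      -- shuffle mask for a single input, -1 marks an undefined element
    lane_elts -- elements per 128-bit lane (e.g. 4 for 32-bit elements)
    """
    repeated = [-1] * lane_elts
    for i, m in enumerate(mask):
        if m < 0:
            continue  # undefined elements match anything
        if m // lane_elts != i // lane_elts:
            return None  # element is taken from a different lane
        local = m % lane_elts  # index of the source element within its lane
        slot = i % lane_elts   # position of the result within its lane
        if repeated[slot] < 0:
            repeated[slot] = local
        elif repeated[slot] != local:
            return None  # lanes disagree on the pattern
    return repeated
```

For an 8 x 32-bit vector, repeated_lane_mask([1, 0, 3, 2, 5, 4, 7, 6], 4) yields the in-lane pattern [1, 0, 3, 2]. A mask such as [4, 5, 6, 7, 0, 1, 2, 3] is rejected by this check because it crosses lanes; that is the kind of case where a separate 128-bit lane permutation is needed.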
|
| |
|
|
|
|
| |
features from one processor to another. This exposed extra features to the -mattr command line that we shouldn't. Replace with just inherited listconcats.
llvm-svn: 260832
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
As shown in:
https://llvm.org/bugs/show_bug.cgi?id=23203
...we currently die because lowering believes that mfence is allowed without SSE2 on x86-64,
but the instruction def doesn't know that.
I don't know if allowing mfence without SSE is right, but if not, at least now it's consistently wrong. :)
Differential Revision: http://reviews.llvm.org/D17219
llvm-svn: 260828
|
| |
|
|
|
|
|
| |
GCC 4.7.2-4 does not seem to have "emplace" in its implementation of map.
This should fix the build failure on polly-amd64-linux.
llvm-svn: 260816
|
| |
|
|
|
|
| |
SlotInfo() instead of member initializers.
llvm-svn: 260812
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Tests for the new scalarize all private access options will be
included with a future commit.
The only functional change is to make the split/scalarize behavior
for private accesses of vectors with more than 4 elements consistent
with the flat/global handling. This makes the spilling worse
in the two changed tests.
llvm-svn: 260804
|
| |
|
|
|
|
|
| |
This intrinsic will be used to expose dpp functionality to higher-level
languages. It will map to the dpp version of v_mov_b32.
llvm-svn: 260792
|
| |
|
|
| |
llvm-svn: 260784
|
| |
|
|
|
|
|
| |
These provide direct access to the hardware instructions without
the unit conversion that the llvm.sin/llvm.cos lowering requires.
llvm-svn: 260782
|
| |
|
|
|
|
| |
Also fixes missing f32 test.
llvm-svn: 260780
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: nhaustov, cfang, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17159
llvm-svn: 260774
|
| |
|
|
| |
llvm-svn: 260773
|
| |
|
|
| |
llvm-svn: 260766
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D16603
llvm-svn: 260765
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D16837
llvm-svn: 260764
|
| |
|
|
|
|
|
|
|
| |
ops.
Computed gotos and RETURNADDR may never be supported; we can do
FRAMEADDR in the future.
llvm-svn: 260759
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace spills to memory with spills to registers, if possible. This
applies mostly to predicate registers (both scalar and vector), since
they are very limited in number. A spill of a predicate register may
happen even if there is a general-purpose register available. In cases
like this the stack spill/reload may be eliminated completely.
This optimization will consider all stack objects, regardless of where
they came from and try to match the live range of the stack slot with
a dead range of a register from an appropriate register class.
llvm-svn: 260758
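The matching step described above can be sketched as an interval test. This is a toy model under stated assumptions: ranges are half-open (start, end) pairs of instruction indices, the candidate registers are assumed to already belong to an appropriate register class, and find_free_register is a hypothetical name, not a function in the Hexagon backend.

```python
def find_free_register(slot_range, dead_ranges):
    """Pick a register whose dead range fully covers the stack slot's live range.

    slot_range  -- (start, end) during which the spilled value must be kept
    dead_ranges -- dict mapping register name to a list of (start, end)
                   ranges in which that register holds no live value
    Returns the first suitable register name, or None if the spill must
    stay in memory.
    """
    start, end = slot_range
    for reg, ranges in dead_ranges.items():
        if any(ds <= start and end <= de for ds, de in ranges):
            return reg
    return None
```

When a register is found, the store to the slot and the reload from it can in principle become register copies; when none is found, the stack spill is kept.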
|
| |
|
|
| |
llvm-svn: 260753
|
| |
|
|
| |
llvm-svn: 260750
|
| |
|
|
| |
llvm-svn: 260748
|
| |
|
|
| |
llvm-svn: 260740
|
| |
|
|
|
|
| |
This should have landed in r260686.
llvm-svn: 260739
|
| |
|
|
| |
llvm-svn: 260737
|
| |
|
|
| |
llvm-svn: 260725
|
| |
|
|
| |
llvm-svn: 260698
|
| |
|
|
|
|
|
|
|
| |
Rewrite the code to handle all pseudo-instructions in a single pass.
This temporarily reverts spill slot optimization that used general-
purpose registers to hold values of spilled predicate registers.
llvm-svn: 260696
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
assembler
Historically, AMD's internal sp3 assembler has used the flat_store* addr, data
format. To match existing code and to enable reuse, change LLVM
definitions to match. Also update MC and CodeGen tests.
Differential Revision: http://reviews.llvm.org/D16927
Patch by: Nikolay Haustov
llvm-svn: 260694
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It is possible that the loop condition is a boolean constant (an infinite loop,
for example), so we should handle a constant condition when annotating a loop. This
patch adds support for annotating a constant condition.
Reviewers: tstellarAMD, arsenm
Subscribers: llvm-commits, arsenm
Differential Revision: http://reviews.llvm.org/D15093
llvm-svn: 260692
|
| |
|
|
|
|
| |
This code is dead. The expansion is now done in HexagonFrameLowering.
llvm-svn: 260691
|
| |
|
|
|
|
|
|
|
| |
We can generate the actual instructions from the intrinsics without the
need for pseudo-instructions. Also, since the intrinsics have a side
effect in the form of a store, attempt to optimize away loads from the
store location.
llvm-svn: 260690
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Before this change, callee-save registers would be rounded up to even
pairs of GPRs and FPRs. This change eliminates these extra padding
load/stores, though it does keep the stack allocation the same size
unless both the GPR and FPR sets have an odd size, in which case one
full pair stack slot (16 bytes) is saved.
This optimization cannot currently be done for MachO targets since they
rely on a fast-path .debug_frame equivalent that can only encode
callee-save registers as pairs.
Reviewers: t.p.northover, rengolin, mcrosier, jmolloy
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17000
llvm-svn: 260689
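The size arithmetic in the summary above can be checked with a small model. It assumes 8 bytes per saved register and a 16-byte aligned save area; both constants and the function names are assumptions made for illustration, not values taken from the patch.

```python
SLOT_BYTES = 8    # assumed size of one saved 64-bit register
STACK_ALIGN = 16  # assumed alignment of the callee-save area


def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return (value + multiple - 1) // multiple * multiple


def csr_area_before(num_gprs, num_fprs):
    """Old scheme: each register set is padded to an even count on its own."""
    return SLOT_BYTES * (round_up(num_gprs, 2) + round_up(num_fprs, 2))


def csr_area_after(num_gprs, num_fprs):
    """New scheme: only the combined area is rounded up to the alignment."""
    return round_up(SLOT_BYTES * (num_gprs + num_fprs), STACK_ALIGN)
```

With 3 GPRs and 3 FPRs the model gives 64 bytes before and 48 bytes after, which is the 16-byte saving mentioned above. With 3 GPRs and 2 FPRs both schemes give 48 bytes, so only the padding load/store disappears.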
|
| |
|
|
|
|
|
| |
Create a virtual register that will hold the actual address and use it
with the offset of 0 in the place of the original FI.
llvm-svn: 260688
|
| |
|
|
|
|
| |
Machine model description by Dave Estes <cestes@codeaurora.org>.
llvm-svn: 260686
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change merges adjacent 32 bit zero stores into a 64 bit zero store.
e.g.,
str wzr, [x0]
str wzr, [x0, #4]
becomes
str xzr, [x0]
Therefore, four adjacent 32 bit zero stores will be a single stp.
e.g.,
str wzr, [x0]
str wzr, [x0, #4]
str wzr, [x0, #8]
str wzr, [x0, #12]
becomes
stp xzr, xzr, [x0]
Reviewers: mcrosier, jmolloy, gberry, t.p.northover
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16933
llvm-svn: 260682
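The two examples above can be reproduced with a toy model of the merging. It only looks at byte offsets of zero stores against one base register and ignores every other legality check the real pass would need (intervening instructions, addressing-mode ranges, and so on); merge_zero_stores and addr are hypothetical helper names.

```python
def addr(base, offset):
    """Format an address operand the way the examples above do."""
    return f"[{base}]" if offset == 0 else f"[{base}, #{offset}]"


def merge_zero_stores(offsets, base="x0"):
    """Merge 32-bit zero stores at the given byte offsets.

    Pass 1 turns two stores at off and off+4 into one 64-bit store.
    Pass 2 turns two 64-bit stores at off and off+8 into one stp.
    Returns the resulting instructions as strings.
    """
    offs = sorted(offsets)

    # Pass 1: pair adjacent 32-bit stores into 64-bit stores.
    wide = []
    i = 0
    while i < len(offs):
        if i + 1 < len(offs) and offs[i + 1] == offs[i] + 4:
            wide.append(("xzr", offs[i]))
            i += 2
        else:
            wide.append(("wzr", offs[i]))
            i += 1

    # Pass 2: pair adjacent 64-bit stores into a store-pair.
    out = []
    i = 0
    while i < len(wide):
        reg, off = wide[i]
        if reg == "xzr" and i + 1 < len(wide) and wide[i + 1] == ("xzr", off + 8):
            out.append(f"stp xzr, xzr, {addr(base, off)}")
            i += 2
        else:
            out.append(f"str {reg}, {addr(base, off)}")
            i += 1
    return out
```

Calling merge_zero_stores([0, 4]) gives a single "str xzr, [x0]", and merge_zero_stores([0, 4, 8, 12]) gives a single "stp xzr, xzr, [x0]", matching the two cases in the summary. Stores that are not adjacent, such as offsets 0 and 8, are left as they are.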
|
| |
|
|
|
|
|
|
|
|
|
| |
The DataLayout can calculate alignment of vectors based on the alignment
of the element type and the number of elements. In fact, it is the product
of these two values. The problem is that for vectors of N x i1, this will
return an alignment of N bytes, since the alignment of i1 is 8 bits. The
vector types of vNi1 should be aligned to N bits instead. Provide explicit
alignment for HVX vectors to avoid such complications.
llvm-svn: 260678
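The mismatch described above is simple arithmetic. The sketch below follows the rule as the message states it (element alignment times element count); it does not call or reproduce the actual DataLayout code, and both function names are hypothetical.

```python
def product_alignment_bits(num_elts, elt_align_bits):
    """Vector alignment as described above: element alignment times count."""
    return num_elts * elt_align_bits


def packed_i1_alignment_bits(num_elts):
    """Alignment wanted for a vNi1 predicate vector: one bit per element."""
    return num_elts


# Example: a vector of 512 x i1, with i1 aligned to 8 bits.
computed_bytes = product_alignment_bits(512, 8) // 8
wanted_bytes = packed_i1_alignment_bits(512) // 8
```

For 512 x i1 the product rule yields 512 bytes, while the packed layout only needs 64 bytes, which is why an explicit alignment is the simpler fix.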
|
| |
|
|
|
|
| |
Found by msan.
llvm-svn: 260676
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was hardcoded to the static private size, which would
miss the offset and any additional size once we have
dynamic sizing.
Also stops always initializing flat_scratch even when unused.
In the future we should stop emitting this unless flat instructions
are used to access private memory. For example this will initialize
it almost always on VI because flat is used for global access.
llvm-svn: 260658
|
| |
|
|
|
| |
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260657
|
| |
|
|
|
|
|
|
|
| |
Introduce a subtarget feature for this, and leave the default with
the current behavior which assumes up to 16-byte loads/stores can
be used. The field also seems to have the ability to be set to 2 bytes,
but I'm not sure what that would be used for.
llvm-svn: 260651
|
| |
|
|
|
|
|
| |
I don't think this was causing any real problems, so I'm not sure
how to test for this.
llvm-svn: 260646
|
| |
|
|
| |
llvm-svn: 260645
|
| |
|
|
| |
llvm-svn: 260644
|
| |
|
|
|
|
|
|
|
|
|
| |
Let DAG.getConstant() handle the splatting; there's no need
to repeat that logic here.
See also:
http://reviews.llvm.org/rL258833
http://reviews.llvm.org/rL260582
llvm-svn: 260609
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is just a trivial implementation:
- Support only arguments passed in registers.
- Support only "plain" arguments, i.e., no sext/zext attribute.
At this point, it is possible to play with the IRTranslator on AArch64:
llc -mtriple arm64-<vendor>-<os> -print-machineinstrs <input.ll> -o - -global-isel
For now, we only support the translation of programs with adds and returns.
Follow-up patches are on their way to add a test case (the MIRParser is
not ready as it is).
llvm-svn: 260600
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It's possible to have resource descriptors and samplers stored in
VGPRs, either by a VMEM instruction or, in the case of samplers, by
floating-point calculations. When this happens, we need to use
v_readfirstlane to copy these values back to SGPRs.
Reviewers: mareko, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17102
llvm-svn: 260599
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When we split SMRD instructions into two MUBUFs we were adding the users
of the newly created MUBUFs to the VALU worklist. However, the only
user these instructions had was the REG_SEQUENCE that was inserted
by splitSMRD when the original SMRD instruction was split.
We need to make sure to add the users of the original SMRD to the VALU
worklist before it is split.
I have a test case, but it requires one other bug fix, so it will be
added in a later commit.
Reviewers: mareko, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17101
llvm-svn: 260588
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: sunfish, jfb
Subscribers: jfb, dschuff
Differential Revision: http://reviews.llvm.org/D17156
llvm-svn: 260585
|