| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
This new syntax is built around putting each instruction on its own line
in a "mnemonic op, op, op" like syntax. It also uses conventional data
section directives like ".byte" and so on rather than requiring everything
to be in hierarchical S-expression format. This is a more natural syntax
for a ".s" file format from the perspective of LLVM MC and related tools,
while remaining easy to translate into other forms as needed.
llvm-svn: 249364
|
|
|
|
|
|
|
|
|
|
|
|
| |
The CATCHRET operand did not match the MachineFunction's CFG. This
mismatch happened because FrameLowering created a new MachineBasicBlock
and updated the CFG but forgot to update the CATCHRET operand.
Let's make sure this doesn't happen again by strengthing the funclet
membership analysis: it can now reason about the membership of all basic
blocks, not just those inside of funclets.
llvm-svn: 249344
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We are currently only using these aliases for VOPC instructions,
but this helper will make it easier to use them everywhere.
These aliases allow for the automatic matching of instructions
with forced 32-bit encoding. Eventually, we should be able to remove
the custom C++ logic we have for this in the assembler.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D13396
llvm-svn: 249330
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were previously codegen'ing memcpy as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach the
register allocator to allocate the registers in ascending order. This never got
implemented, and up to now we've been stuck with very poor codegen.
A much simpler approach for achieving better codegen is to create MEMCPY pseudo
instructions, attach scratch virtual registers to them and then, post register
allocation, expand the MEMCPYs into LDM/STM pairs using the scratch registers.
The register allocator will have picked arbitrary registers which we sort when
expanding the MEMCPY. This approach also avoids the need to repeatedly calculate
offsets which ultimately ought to be eliminated pre-RA in order to decrease
register pressure.
Fixes PR9199 and PR23768.
[This is based on Peter Collingbourne's r238473 which was reverted.]
Differential Revision: http://reviews.llvm.org/D13239
Change-Id: I727543c2e94136e0f80b8e22d5642d7b9ee5b458
Author: Peter Collingbourne <peter@pcc.me.uk>
llvm-svn: 249322
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D11219
llvm-svn: 249317
|
|
|
|
|
|
|
|
|
| |
"msr pan, #imm", while only 1-bit immediate values should be valid.
Changed encoding and decoding for msr pstate instructions.
Differential Revision: http://reviews.llvm.org/D13011
llvm-svn: 249313
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
allow simple expressions.
Summary:
An instruction like "(d)la $5, symbol+8" previously would have crashed the
assembler as it contains an expression. This is now fixed.
A few tests cases have also been changed to reflect these changes, however
these should only be syntax changes. Some new test cases have also been
added.
Patch by Scott Egerton.
Reviewers: vkalintiris, dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12760
llvm-svn: 249311
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This extends the work done in r233995 so that now getFragment (in addition to
getSection) also works for variable symbols.
With that the existing logic to decide if a-b can be computed works even if
a or b are variables. Given that, the expression evaluation can avoid expanding
variables as aggressively and that in turn lets the relocation code see the
original variable.
In order for this to work with the asm streamer, there is now a dummy fragment
per section. It is used to assign a section to a symbol when no other fragment
exists.
This patch is a joint work by Maxim Ostapenko andy myself.
llvm-svn: 249303
|
|
|
|
| |
llvm-svn: 249262
|
|
|
|
|
|
|
|
| |
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D12690
llvm-svn: 249261
|
|
|
|
| |
llvm-svn: 249253
|
|
|
|
|
|
| |
The custom lowering in LowerExtendedLoad is doing the equivalent shuffle, so make use of existing lowering code to reduce duplication.
llvm-svn: 249243
|
|
|
|
|
|
|
| |
This is a temporary assembly syntax that will likely evolve along with
broader upcoming syntax changes.
llvm-svn: 249225
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D13395
llvm-svn: 249221
|
|
|
|
| |
llvm-svn: 249218
|
|
|
|
| |
llvm-svn: 249187
|
|
|
|
| |
llvm-svn: 249184
|
|
|
|
| |
llvm-svn: 249178
|
|
|
|
| |
llvm-svn: 249171
|
|
|
|
|
|
|
| |
Make sure we aren't accidentally not setting
these in the instruction definitions.
llvm-svn: 249170
|
|
|
|
| |
llvm-svn: 249165
|
|
|
|
|
|
|
|
|
| |
We previously stopped producing Thumb2 relaxations when they weren't supported,
but only diagnosed the case where an actual relocation was produced. We should
also tell people if local symbols aren't going to work rather than silently
overflowing.
llvm-svn: 249164
|
|
|
|
|
|
|
| |
Since we're using tLDRpci to access it, the constant pool's address must be 0
(mod 4).
llvm-svn: 249163
|
|
|
|
| |
llvm-svn: 249153
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
between 128/256-bit vector types."
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.
Example:
%1 = bitcast <4 x i31> %V to <2 x i64>
Before (-fast-isel -fast-isel-abort=1):
FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>
Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.
Originally reviewed here: http://reviews.llvm.org/D13347
llvm-svn: 249147
|
|
|
|
|
|
|
|
|
| |
128/256-bit vector types.
r249121 caused a Clang test failure (avx2-buitins.c).
Revert r249121 while I keep investigating on the reason why that test failed.
llvm-svn: 249124
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D13235
llvm-svn: 249123
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vector types.
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.
Example:
%1 = bitcast <4 x i31> %V to <2 x i64>
Before (-fast-isel -fast-isel-abort=1):
FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>
Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.
Differential Revision: http://reviews.llvm.org/D13347
llvm-svn: 249121
|
|
|
|
| |
llvm-svn: 249091
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace LiveInterval usage with LiveVariables. LiveIntervals
computes far more information than is needed for this pass
which just needs to find if an SGPR is live out of the
defining block.
LiveIntervals are not usually available that early, requiring
computing them twice which is very expensive. The extra run of
LiveIntervals/LiveVariables/SlotIndexes was costing in total
about 5% of compile time.
Continuing to use LiveIntervals is problematic. It seems
there is an option (early-live-intervals) to run the analysis
about where it should go to avoid recomputing LiveVariables,
but it seems to be completely broken with subreg liveness
enabled. There are also problems from trying to recompute
LiveIntervals since this seems to undo LiveVariables
and clearing kill flags, causing TwoAddressInstructions
to make bad decisions.
Insert the pass right after live variables and preserve it.
The tricky case to worry about might be phis since
LiveVariables doesn't count a register as live out if
in the successor block it is only used in a phi,
but I don't think this is a concern right now
because SIFixSGPRCopies replaces SGPR phis.
llvm-svn: 249087
|
|
|
|
|
|
|
|
| |
for "set" pseudo op in PIC mode.
Differential Revision: http://reviews.llvm.org/D13173
llvm-svn: 249086
|
|
|
|
| |
llvm-svn: 249082
|
|
|
|
|
|
|
| |
There's no point in checking VReg_1 because all uses
of it should already have been removed by SILowerI1Copies.
llvm-svn: 249081
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was the slowest target custom pass and was spending 80%
of the time in getMinimalPhysRegClass which was called
for every register operand.
Try to use the statically known register class when possible from
the instruction's MCOperandInfo. There are a few pseudo instructions
which are not well behaved with unknown register classes which still
require the expensive physical register class search.
There are a few other possibilities for making this even faster,
such as not inspecting implicit operands. For now those are checked
because it is technically possible to have a scalar load into
exec or vcc which can be implicitly used.
llvm-svn: 249079
|
|
|
|
|
|
|
|
|
|
|
|
| |
We emit denormalized tables, where every range of invokes in the same
state gets a complete list of EH action entries. This is significantly
simpler than trying to infer the correct nested scoping structure from
the MI. Fortunately, for SEH, the nesting structure is really just a
size optimization.
With this, some basic __try / __except examples work.
llvm-svn: 249078
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Instead of asserting when the kernel metadata is different than we expect,
we should just skip lowering that function. This fixes assertion
failures with OpenCL argument metadata from older LLVM releases.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D13356
llvm-svn: 249073
|
|
|
|
|
|
|
|
|
| |
Catchret transfers control from a catch funclet to an earlier funclet.
However, it is not completely clear which funclet the catchret target is
part of. Make this clear by stapling the catchret target's funclet
membership onto the CATCHRET SDAG node.
llvm-svn: 249052
|
|
|
|
|
|
|
| |
Support for pairing unscaled loads and stores has been enabled since the
original ARM64 port. This feature is no longer experimental, AFAICT.
llvm-svn: 249049
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add generic instructions for load complement, load negative and load positive
for fp32 and fp64, and let isel prefer them. They do not clobber CC, and so
give scheduler more freedom. SystemZElimCompare pass will convert them when it
can to the CC-setting variants.
Regression tests updated to expect the new opcodes in places where the old ones
where used. New test case SystemZ/fp-cmp-05.ll checks that
SystemZCompareElim.cpp can handle the new opcodes.
README.txt updated (bullet removed).
Note that fp128 is not yet handled, because it is relatively rare, and is a
bit trickier, because of the fact that l.dfr would operate on the sign bit of
one of the subregisters of a fp128, but we would not want to copy the other
sub-reg in case src and dst regs are not the same.
Reviewed by Ulrich Weigand.
llvm-svn: 249046
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: Add test (Matt).
Fix capitalization of isEOP (Matt).
Move pattern to class parameter (Matt).
Make the instruction available to Cayman (Matt).
Change name from MEM_RAT WRITE_TYPED to MEM_RAT STORE_TYPED.
Patch by: Zoltan Gilian
llvm-svn: 249042
|
|
|
|
|
|
|
|
| |
v2: Fix brace placement and capitalization (Matt).
Patch by: Zoltan Gilian
llvm-svn: 249041
|
|
|
|
|
|
| |
It broke; LLVM :: CodeGen__Generic__2009-11-16-BadKillsCrash.ll
llvm-svn: 249032
|
|
|
|
|
|
|
|
|
|
|
| |
CPU features
Provide assembler support for STCK, STCKF, STCKE, and STFLE.
Author: joncmu
Differential Revision: http://reviews.llvm.org/D13299
llvm-svn: 249015
|
|
|
|
| |
llvm-svn: 249011
|
|
|
|
| |
llvm-svn: 249008
|
|
|
|
| |
llvm-svn: 249007
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D10337
llvm-svn: 249004
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D13240
llvm-svn: 249002
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D12451
llvm-svn: 248978
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The custom code produces incorrect results if later reassociated.
Since r221657, on x86, vNi32 uitofp is lowered using an optimized
sequence:
movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...]
pand %xmm0, %xmm1
por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...]
psrld $16, %xmm0
por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...]
addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...]
addps %xmm1, %xmm0
Since r240361, the machine combiner opportunistically reassociates
2-instruction sequences (with -ffast-math). In the new code sequence,
the ADDPS' are eligible. In isolation, for simple examples (without
reassociable users), this makes no performance difference (the goal
being to enable reassociation of longer chains).
In the trivial example (just one uitofp), the reassociation doesn't
happen, because (I think) it would require the emission of a separate
movaps for a constantpool load (instead of folding it into addps).
However, when we have multiple uitofp sequences, and the constantpool
loads are CSE'd earlier, the machine combiner can do the reassociation.
When the ADDPS' are reassociated, the resulting sequence isn't correct
anymore, as we'd be adding large (2**39) constants with comparatively
smaller values (~2**23). Given that two of the three inputs are powers
of 2 larger than 2**16, and that ulp(2**39) == 2**(39-24) == 2**15,
the reassociated chain will produce 0 for any input in [0, 2**14[.
In my testing, it also produces wrong results for 99.5% of [0, 2**32[.
Avoid this by disabling the new lowering when -ffast-math. It does
mean that we'll get slower code than without it, but at least we
won't get egregiously incorrect code.
One might argue that, considering -ffast-math is all but meaningless,
uitofp producing wrong results isn't a compiler bug. But it really is.
Fixes PR24512.
...though this is really more of a workaround.
Ideally, we'd have some sort of Machine FMF, but that's a problem
that's not worth tackling until we do more with machine IR.
llvm-svn: 248965
|