| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
------------------------------------------------------------------------
r339895 | niravd | 2018-08-16 18:31:14 +0200 (Thu, 16 Aug 2018) | 13 lines
[MC][X86] Enhance X86 Register expression handling to more closely match GCC.
Allow the comparison of x86 registers in the evaluation of assembler
directives. This generalizes and simplifies the extension from r334022
to catch another case found in the Linux kernel.
Reviewers: rnk, void
Reviewed By: rnk
Subscribers: hiraditya, nickdesaulniers, llvm-commits
Differential Revision: https://reviews.llvm.org/D50795
------------------------------------------------------------------------
------------------------------------------------------------------------
r339896 | d0k | 2018-08-16 18:50:23 +0200 (Thu, 16 Aug 2018) | 1 line
[MC] Remove unused variable
------------------------------------------------------------------------
llvm-svn: 340329
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch teaches llvm-mca how to identify dependency breaking instructions on
btver2.
An example of dependency breaking instructions is the zero-idiom XOR (example:
`XOR %eax, %eax`), which always generates zero regardless of the actual value of
the input register operands.
Dependency breaking instructions don't have to wait on their input register
operands before executing. This is because the computation is not dependent on
the inputs.
Not all dependency breaking idioms are also zero-latency instructions. For
example, `CMPEQ %xmm1, %xmm1` is independent on
the value of XMM1, and it generates a vector of all-ones.
That instruction is not eliminated at register renaming stage, and its opcode is
issued to a pipeline for execution. So, the latency is not zero.
This patch adds a new method named isDependencyBreaking() to the MCInstrAnalysis
interface. That method takes as input an instruction (i.e. MCInst) and a
MCSubtargetInfo.
The default implementation of isDependencyBreaking() conservatively returns
false for all instructions. Targets may override the default behavior for
specific CPUs, and return a value which better matches the subtarget behavior.
In future, we should teach to Tablegen how to automatically generate the body of
isDependencyBreaking from scheduling predicate definitions. This would allow us
to expose the knowledge about dependency breaking instructions to the machine
schedulers (and, potentially, other codegen passes).
Differential Revision: https://reviews.llvm.org/D49310
llvm-svn: 338372
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes the latency/throughput of LEA instructions in the BtVer2
scheduling model.
On Jaguar, A 3-operands LEA has a latency of 2cy, and a reciprocal throughput of
1. That is because it uses one cycle of SAGU followed by 1cy of ALU1. An LEA
with a "Scale" operand is also slow, and it has the same latency profile as the
3-operands LEA. An LEA16r has a latency of 3cy, and a throughput of 0.5 (i.e.
RThrouhgput of 2.0).
This patch adds a new TIIPredicate named IsThreeOperandsLEAFn to X86Schedule.td.
The tablegen backend (for instruction-info) expands that definition into this
(file X86GenInstrInfo.inc):
```
static bool isThreeOperandsLEA(const MachineInstr &MI) {
return (
(
MI.getOpcode() == X86::LEA32r
|| MI.getOpcode() == X86::LEA64r
|| MI.getOpcode() == X86::LEA64_32r
|| MI.getOpcode() == X86::LEA16r
)
&& MI.getOperand(1).isReg()
&& MI.getOperand(1).getReg() != 0
&& MI.getOperand(3).isReg()
&& MI.getOperand(3).getReg() != 0
&& (
(
MI.getOperand(4).isImm()
&& MI.getOperand(4).getImm() != 0
)
|| (MI.getOperand(4).isGlobal())
)
);
}
```
A similar method is generated in the X86_MC namespace, and included into
X86MCTargetDesc.cpp (the declaration lives in X86MCTargetDesc.h).
Back to the BtVer2 scheduling model:
A new scheduling predicate named JSlowLEAPredicate now checks if either the
instruction is a three-operands LEA, or it is an LEA with a Scale value
different than 1.
A variant scheduling class uses that new predicate to correctly select the
appropriate latency profile.
Differential Revision: https://reviews.llvm.org/D49436
llvm-svn: 337469
|
|
|
|
|
|
|
|
|
|
| |
search.
I separated out the rounding and broadcast groups into their own tables because it made the ordering in the main table easier.
Further splitting of the tables might make it possible to directly index using bits from the TSFlags, but its probably not worth it right now.
llvm-svn: 336075
|
|
|
|
|
|
| |
some test CHECK lines.
llvm-svn: 335414
|
|
|
|
|
|
|
|
| |
zero displacement.
(%bp) can't be encoded without a displacement. The encoding is instead used for displacement alone. So a 1 byte displacement of 0 must be used. But if there is an index register we can encode without a displacement.
llvm-svn: 335379
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the upper portion of a super-register.
This patch teaches llvm-mca how to identify register writes that implicitly zero
the upper portion of a super-register.
On X86-64, a general purpose register is implemented in hardware as a 64-bit
register. Quoting the Intel 64 Software Developer's Manual: "an update to the
lower 32 bits of a 64 bit integer register is architecturally defined to zero
extend the upper 32 bits". Also, a write to an XMM register performed by an AVX
instruction implicitly zeroes the upper 128 bits of the aliasing YMM register.
This patch adds a new method named clearsSuperRegisters to the MCInstrAnalysis
interface to help identify instructions that implicitly clear the upper portion
of a super-register. The rest of the patch teaches llvm-mca how to use that new
method to obtain the information, and update the register dependencies
accordingly.
I compared the kernels from tests clear-super-register-1.s and
clear-super-register-2.s against the output from perf on btver2. Previously
there was a large discrepancy between the estimated IPC and the measured IPC.
Now the differences are mostly in the noise.
Differential Revision: https://reviews.llvm.org/D48225
llvm-svn: 335113
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
R_X86_64_GOTPC32 instead of R_X86_64_PC32
Summary:
This is similar to D46319 (ARM). x86-64 psABI p40 gives an example:
leaq _GLOBAL_OFFSET_TABLE(%rip), %r15 # GOTPC32 reloc
GNU as creates R_X86_64_GOTPC32. However, MC currently emits R_X86_64_PC32.
Reviewers: javed.absar, echristo
Subscribers: kristof.beyls, llvm-commits, peter.smith, grimar
Differential Revision: https://reviews.llvm.org/D47507
llvm-svn: 334515
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On targets like Arm some relaxations may only be performed when certain
architectural features are available. As functions can be compiled with
differing levels of architectural support we must make a judgement on
whether we can relax based on the MCSubtargetInfo for the function. This
change passes through the MCSubtargetInfo for the function to
fixupNeedsRelaxation so that the decision on whether to relax can be made
per function. In this patch, only the ARM backend makes use of this
information. We must also pass the MCSubtargetInfo to applyFixup because
some fixups skip error checking on the assumption that relaxation has
occurred, to prevent code-generation errors applyFixup must see the same
MCSubtargetInfo as fixupNeedsRelaxation.
Differential Revision: https://reviews.llvm.org/D44928
llvm-svn: 334078
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Allow extended parsing of variable assembler assignment syntax and modify X86 to permit
VAR = register assignment. As we emit these as .set directives when possible, we inline
such expressions in output assembly.
Fixes PR37425.
Reviewers: rnk, void, echristo
Reviewed By: rnk
Subscribers: nickdesaulniers, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D47545
llvm-svn: 334022
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D47563
llvm-svn: 333676
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds CVReg to CodeView register names to prevent a duplicate symbol with
CR3 defined in termios.h, as suggested by Zachary on the mailing list.
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123372.html
Differential revision: https://reviews.llvm.org/D47478
rdar://39863705
llvm-svn: 333421
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
writer. NFCI.
With this we gain a little flexibility in how the generic object
writer is created.
Part of PR37466.
Differential Revision: https://reviews.llvm.org/D47045
llvm-svn: 332868
|
|
|
|
| |
llvm-svn: 332866
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MCObjectWriter. NFCI.
To make this work I needed to add an endianness field to MCAsmBackend
so that writeNopData() implementations know which endianness to use.
Part of PR37466.
Differential Revision: https://reviews.llvm.org/D47035
llvm-svn: 332857
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NFCI.
The idea is that a client that wants split dwarf would create a
specific kind of object writer that creates two files, and use it to
create the streamer.
Part of PR37466.
Differential Revision: https://reviews.llvm.org/D47050
llvm-svn: 332749
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46290
llvm-svn: 331272
|
|
|
|
|
|
| |
OpSizeFixed.
llvm-svn: 330532
|
|
|
|
|
|
|
|
| |
XADD instructions.
I don't think we emit any of these from codegen except for using XCHG16ar as 2 byte NOP.
llvm-svn: 330298
|
|
|
|
|
|
| |
TSFlag doesn't need to disambiguate NoPrfx from PS. So shift the encodings so PS is NoPrfx|0x4.
llvm-svn: 329049
|
|
|
|
| |
llvm-svn: 328907
|
|
|
|
|
|
| |
The 3DNow instructions are encoded a little weird, but we can still represent it as an opcode map.
llvm-svn: 328410
|
|
|
|
|
|
| |
Should be NFC since nothing used the enum value. The instruction descriptions are generated from tablegen which had the correct value.
llvm-svn: 328398
|
|
|
|
|
|
| |
and update comments to be more clear about what it does. NFC
llvm-svn: 328136
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
X86 Supports Indirect Branch Tracking (IBT) as part of Control-Flow Enforcement Technology (CET).
IBT instruments ENDBR instructions used to specify valid targets of indirect call / jmp.
The `nocf_check` attribute has two roles in the context of X86 IBT technology:
1. Appertains to a function - do not add ENDBR instruction at the beginning of the function.
2. Appertains to a function pointer - do not track the target function of this pointer by adding nocf_check prefix to the indirect-call instruction.
This patch implements `nocf_check` context for Indirect Branch Tracking.
It also auto generates `nocf_check` prefixes before indirect branchs to jump tables that are guarded by range checks.
Differential Revision: https://reviews.llvm.org/D41879
llvm-svn: 327767
|
|
|
|
|
|
|
|
|
|
|
| |
For instructions like call foo and jmp foo patch changes
relocation produced from R_X86_64_PC32 to R_X86_64_PLT32.
Relocation can be used as a marker for 32-bit PC-relative branches.
Linker will reduce PLT32 relocation to PC32 if function is defined locally.
Differential revision: https://reviews.llvm.org/D43383
llvm-svn: 325569
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the rare case where the input contains rip-relative addressing with
immediate displacements, *and* the instruction ends with an immediate,
we encode the instruction in the wrong way:
movl $12345678, 0x400(%rdi) // all good, no rip-relative addr
movl %eax, 0x400(%rip) // all good, no immediate at the end of the instruction
movl $12345678, 0x400(%rip) // fails, encodes address as 0x3fc(%rip)
Offset is a label:
movl $12345678, foo(%rip)
we want to account for the size of the immediate (in this case,
$12345678, 4 bytes).
Offset is an immediate:
movl $12345678, 0x400(%rip)
we should not account for the size of the immediate, assuming the
immediate offset is what the user wanted.
Differential Revision: https://reviews.llvm.org/D43050
llvm-svn: 324772
|
|
|
|
|
|
|
|
|
|
|
|
| |
10-byte NOPs (PR22965)
We currently emit up to 15-byte NOPs on all targets (apart from Silvermont), which stalls performance on some targets with decoders that struggle with 2 or 3 more '66' prefixes.
This patch flags recent AMD targets (btver1/znver1) to still emit 15-byte NOPs and bdver* targets to emit 11-byte NOPs. All other targets now emit 10-byte NOPs apart from SilverMont CPUs which still emit 7-byte NOPS.
Differential Revision: https://reviews.llvm.org/D42616
llvm-svn: 323693
|
|
|
|
|
|
|
|
|
|
| |
the MCAsmBackend constructor
After D41349, we can no get a MCSubtargetInfo into the MCAsmBackend constructor. This allows us to get NOPL from a subtarget feature rather than a CPU name blacklist.
Differential Revision: https://reviews.llvm.org/D41721
llvm-svn: 322227
|
|
|
|
|
|
|
| |
Besides the unsightly print-out, it was causing some buildbots to fail,
e.g. http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/9311
llvm-svn: 321711
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently it's not possible to access MCSubtargetInfo from a TgtMCAsmBackend.
D20830 threaded an MCSubtargetInfo reference through
MCAsmBackend::relaxInstruction, but this isn't the only function that would
benefit from access. This patch removes the Triple and CPUString arguments
from createMCAsmBackend and replaces them with MCSubtargetInfo.
This patch just changes the interface without making any intentional
functional changes. Once in, several cleanups are possible:
* Get rid of the awkward MCSubtargetInfo handling in ARMAsmBackend
* Support 16-bit instructions when valid in MipsAsmBackend::writeNopData
* Get rid of the CPU string parsing in X86AsmBackend and just use a SubtargetFeature for HasNopl
* Emit 16-bit nops in RISCVAsmBackend::writeNopData if the compressed instruction set extension is enabled (see D41221)
This change initially exposed PR35686, which has since been resolved in r321026.
Differential Revision: https://reviews.llvm.org/D41349
llvm-svn: 321692
|
|
|
|
|
|
| |
This recommits the change from r321026. I have a fix for the lld test now.
llvm-svn: 321038
|
|
|
|
|
|
|
|
| |
empty CPU string." while I investigate how to fix an lld test failure.
Looks like lld also needs to pass a -mcpu in some of its tests
llvm-svn: 321033
|
|
|
|
|
|
|
|
|
|
| |
Update tests to force a CPU with NOPL
Empty string should be equivalent to "generic" which doesn't allow NOPL. Force tests to use specificy 'pentiumpro' to guarantee NOPL.
Fixes PR35686
llvm-svn: 321026
|
|
|
|
|
|
| |
Hopefully r320864 has fixed the offending case that failed the assert.
llvm-svn: 320898
|
|
|
|
|
|
| |
It seems to be failing real code which is concerning, but we were silently getting away with it. I'll investigate further.
llvm-svn: 320850
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D40960
llvm-svn: 320837
|
|
|
|
|
|
|
|
|
|
| |
assembler in 32-bit mode.
There was a top level "let Predicates =" in the .td file that was overriding the Requires on each instruction.
I've added an assert to the code emitter to catch more cases like this. I'm sure this isn't the only place where the right predicates aren't being applied. This assert already found that we don't block btq/btsq/btrq in 32-bit mode.
llvm-svn: 320830
|
|
|
|
| |
llvm-svn: 320636
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These command line options are not intended for public use, and often
don't even make sense in the context of a particular tool anyway. About
90% of them are already hidden, but when people add new options they
forget to hide them, so if you were to make a brand new tool today, link
against one of LLVM's libraries, and run tool -help you would get a
bunch of junk that doesn't make sense for the tool you're writing.
This patch hides these options. The real solution is to not have
libraries defining command line options, but that's a much larger effort
and not something I'm prepared to take on.
Differential Revision: https://reviews.llvm.org/D40674
llvm-svn: 319505
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Segment moves to memory are always 16-bit. Remove invalid 32 and 64
bit variants.
Recommiting with missing clang inline assembly test change.
Fixes PR34478.
Reviewers: rnk, craig.topper
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D39847
llvm-svn: 318797
|
|
|
|
|
|
| |
r318678 caused the Clang test CodeGen/ms-inline-asm.c to start failing.
llvm-svn: 318710
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Segment moves to memory are always 16-bit. Remove invalid 32 and 64
bit variants.
Fixes PR34478.
Reviewers: rnk, craig.topper
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D39847
llvm-svn: 318678
|
|
|
|
|
|
| |
We support 2 spelling for silvermont and we should accept both here.
llvm-svn: 318023
|
|
|
|
|
|
| |
PR7709, PR17697, PR19251, PR32809 and PR21640. There could be other bugs closed by this patch.
llvm-svn: 315899
|
|
|
|
|
|
|
|
| |
MCObjectStreamer owns its MCCodeEmitter -- this fixes the types to reflect that,
and allows us to remove the last instance of MCObjectStreamer's weird "holding
ownership via someone else's reference" trick.
llvm-svn: 315531
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This adds a set of new directives that describe 32-bit x86 prologues.
The directives are limited and do not expose the full complexity of
codeview FPO data. They are merely a convenience for the compiler to
generate more readable assembly so we don't need to generate tons of
labels in CodeGen. If our prologue emission changes in the future, we
can change the set of available directives to suit our needs. These are
modelled after the .seh_ directives, which use a different format that
interacts with exception handling.
The directives are:
.cv_fpo_proc _foo
.cv_fpo_pushreg ebp/ebx/etc
.cv_fpo_setframe ebp/esi/etc
.cv_fpo_stackalloc 200
.cv_fpo_endprologue
.cv_fpo_endproc
.cv_fpo_data _foo
I tried to follow the implementation of ARM EHABI CFI directives by
sinking most directives out of MCStreamer and into X86TargetStreamer.
This helps avoid polluting non-X86 code with WinCOFF specific logic.
I used cdb to confirm that this can show locals in parent CSRs in a few
cases, most importantly the one where we use ESI as a frame pointer,
i.e. the one in http://crbug.com/756153#c28
Once we have cdb integration in debuginfo-tests, we can add integration
tests there.
Reviewers: majnemer, hans
Subscribers: aemerson, mgorny, kristof.beyls, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D38776
llvm-svn: 315513
|
|
|
|
|
|
|
|
| |
MCObjectStreamer owns its MCAsmBackend -- this fixes the types to reflect that,
and allows us to remove another instance of MCObjectStreamer's weird "holding
ownership via someone else's reference" trick.
llvm-svn: 315410
|
|
|
|
|
|
|
|
|
|
| |
functions.
This makes the ownership of the resulting MCObjectWriter clear, and allows us
to remove one instance of MCObjectStreamer's bizarre "holding ownership via
someone else's reference" trick.
llvm-svn: 315327
|
|
|
|
|
|
|
|
|
|
| |
This makes the .seh_ directives slightly more usable from standalone
assembly files.
This removes a large number of report_fatal_errors and recovers from the
error by ignoring the directive.
llvm-svn: 315262
|