| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen
for Sandy Bridge and Ivy Bridge. There is no functionality change intended for
those chips. Previously, the absence of AVX2 was being used as a proxy to detect
this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2
that do not have the 32-byte unaligned access slowdown.
Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).
Differential Revision: http://reviews.llvm.org/D6355
llvm-svn: 222544
|
|
|
|
|
|
|
|
| |
positive numbers
Differential Revision: http://reviews.llvm.org/D5938
llvm-svn: 222521
|
|
|
|
|
|
|
|
| |
instructions.
This was missed in r221706.
llvm-svn: 221731
|
|
|
|
|
|
|
|
|
|
| |
This removes calls to isMaterializable in the following cases:
* It was redundant with a call to isDeclaration now that isDeclaration returns
the correct answer for materializable functions.
* It was followed by a call to Materialize. Just call Materialize and check EC.
llvm-svn: 221050
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.
For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).
This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..
See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900
Differential Revision: http://reviews.llvm.org/D5658
llvm-svn: 220570
|
|
|
|
| |
llvm-svn: 218804
|
|
|
|
| |
llvm-svn: 217071
|
|
|
|
|
|
|
|
| |
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reinstates commits r215111, 215115, 215116, 215117, 215136.
llvm-svn: 216982
|
|
|
|
|
|
|
|
| |
Added FeatureSMAP.
Broadwell ISA includes Haswell ISA + ADX + RDSEED + SMAP
llvm-svn: 216161
|
|
|
|
| |
llvm-svn: 215667
|
|
|
|
| |
llvm-svn: 215279
|
|
|
|
|
|
| |
created.
llvm-svn: 215271
|
|
|
|
|
|
|
|
|
|
|
| |
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reverts commits r215111, 215115, 215116, 215117, 215136.
llvm-svn: 215154
|
|
|
|
|
|
|
|
|
| |
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.
Thanks to Lang Hames for making MCJIT a good replacement!
llvm-svn: 215111
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows assembling the two new instructions, encls and enclu for the
SKX processor model.
Note the diffs are a bigger than what might think, but to fit the new
MRM_CF and MRM_D7 in things in the right places things had to be
renumbered and shuffled down causing a bit more diffs.
rdar://16228228
llvm-svn: 214460
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enabling HasAVX512{DQ,BW,VL} predicates.
Adding VK2, VK4, VK32, VK64 masked register classes.
Adding new types (v64i8, v32i16) to VR512.
Extending calling conventions for new types (v64i8, v32i16)
Patch by Zinovy Nis <zinovy.y.nis@intel.com>
Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>
llvm-svn: 213545
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Refactoring; no functional changes intended
Removed PostRAScheduler bits from subtargets (X86, ARM).
Added PostRAScheduler bit to MCSchedModel class.
This bit is set by a CPU's scheduling model (if it exists).
Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses.
Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!).
Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling.
Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values.
Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86:
a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget.
b. MIPS overrides the CPU's postRA settings by enabling postRA for everything.
c. PPC overrides the CPU's postRA settings by enabling postRA for everything.
d. X86 is the only target that actually has postRA specified via sched model info.
Differential Revision: http://reviews.llvm.org/D4217
llvm-svn: 213101
|
|
|
|
|
|
| |
so that we can use initializer lists for the X86Subtarget.
llvm-svn: 210614
|
|
|
|
| |
llvm-svn: 210606
|
|
|
|
| |
llvm-svn: 210596
|
|
|
|
| |
llvm-svn: 210516
|
|
|
|
|
|
| |
from the x86 target machine. Should be no functional change.
llvm-svn: 210479
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to Intel Software Optimization Manual
on Silvermont INC or DEC instructions require
an additional uop to merge the flags.
As a result, a branch instruction depending
on an INC or a DEC instruction incurs a 1 cycle penalty.
Differential Revision: http://reviews.llvm.org/D3990
llvm-svn: 210466
|
|
|
|
| |
llvm-svn: 209342
|
|
|
|
|
|
|
| |
a subtarget hook to enable. Unconditionally add to the pass pipeline
for targets that might want to use it. No functional change.
llvm-svn: 209340
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to Intel Software Optimization Manual on Silvermont in some cases LEA
is better to be replaced with ADD instructions:
"The rule of thumb for ADDs and LEAs is that it is justified to use LEA
with a valid index and/or displacement for non-destructive destination purposes
(especially useful for stack offset cases), or to use a SCALE.
Otherwise, ADD(s) are preferable."
Differential Revision: http://reviews.llvm.org/D3826
llvm-svn: 209198
|
|
|
|
| |
llvm-svn: 208248
|
|
|
|
| |
llvm-svn: 207197
|
|
|
|
|
|
|
| |
definition below all of the header #include lines, lib/Target/...
edition.
llvm-svn: 206842
|
|
|
|
|
|
|
|
|
|
|
|
| |
system headers above the includes of generated '.inc' files that
actually contain code. In a few targets this was already done pretty
consistently, but it wasn't done *really* consistently anywhere. It is
strictly cleaner IMO and necessary in a bunch of places where the
DEBUG_TYPE is referenced from the generated code. Consistency with the
necessary places trumps. Hopefully the build bots are OK with the
movement of intrin.h...
llvm-svn: 206838
|
|
|
|
|
|
|
|
| |
This logic is properly in the realm of whatever is creating the
TargetMachine. This makes plain 'llc foo.ll' consistent across
heterogenous machines.
llvm-svn: 206094
|
|
|
|
|
|
|
|
|
|
| |
WinCOFF cannot form PC relative relocations to support absolute
MCValues. We should reenable this once WinCOFF supports emission of
IMAGE_REL_I386_REL32 relocations.
This fixes PR19272.
llvm-svn: 205058
|
|
|
|
| |
llvm-svn: 199648
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.
Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:
define available_externally dllimport void @f() {}
@Var = dllexport global i32 1, align 4
Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.
llvm-svn: 199218
|
|
|
|
|
|
|
|
| |
Revert this for now until I fix an issue in Clang with it.
This reverts commit r199204.
llvm-svn: 199207
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.
Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:
define available_externally dllimport void @f() {}
@Var = dllexport global i32 1, align 4
Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.
llvm-svn: 199204
|
|
|
|
| |
llvm-svn: 198720
|
|
|
|
|
|
|
|
|
|
|
| |
This is not really expected to work right yet. Mostly because we will
still emit the OpSize (0x66) prefix in all the wrong places, along with
a number of other corner cases. Those will all be fixed in the subsequent
commits.
Patch from David Woodhouse.
llvm-svn: 198584
|
|
|
|
|
|
| |
Patch by Adam Strzelecki
llvm-svn: 195632
|
|
|
|
|
|
|
|
|
|
| |
on certain architectures. While generating SHLD/SHRD instructions is acceptable when optimizing for size, optimizing for speed on these platforms should be implemented using alternative sequences of instructions composed of add, adc, shr, shl, or and lea which are directPath instructions. These alternative instructions not only have a lower latency but they also increase the decode bandwidth by allowing simultaneous decoding of a third directPath instruction.
AMD's processors family K7, K8, K10, K12, K15 and K16 are known to have SHLD/SHRD instructions with very poor latency. Optimization guides for these processors recommend using an alternative sequence of instructions. For these AMD's processors, I disabled folding (or (x << c) | (y >> (64 - c))) when we are not optimizing for size.
It might be beneficial to disable this folding for some of the Intel's processors. However, since I couldn't find specific recommendations regarding using SHLD/SHRD instructions on Intel's processors, I haven't disabled this peephole for Intel.
llvm-svn: 195383
|
|
|
|
|
|
|
|
|
|
| |
Adding TBM feature to bdver2 processor; piledriver supports this instruction set
according to the following document:
http://developer.amd.com/wordpress/media/2012/10/New-Bulldozer-and-Piledriver-Instructions.pdf
Phabricator code review is located here: http://llvm-reviews.chandlerc.com/D1692
llvm-svn: 191324
|
|
|
|
|
|
| |
they've already been enabled. The extra call ends up clearing the bit in FeatureBits since its a 'toggle'. Can't prove that anything was broken because of this since I don't think the FeatureBits for these are used.
llvm-svn: 190920
|
|
|
|
|
|
| |
InitMCProcessorInfo right after detecting them. Instead add a new function that only updates the scheduling model and call that.
llvm-svn: 190919
|
|
|
|
|
|
|
|
|
|
|
| |
Implements Instruction scheduler latencies for Silvermont,
using latencies from the Intel Silvermont Optimization Guide.
Auto detects SLM.
Turns on post RA scheduler when generating code for SLM.
llvm-svn: 190717
|
|
|
|
| |
llvm-svn: 190659
|
|
|
|
|
|
|
|
|
| |
Add basic assembly/disassembly support for the first Intel SHA
instruction 'sha1rnds4'. Also includes feature flag, and test cases.
Support for the remaining instructions will follow in a separate patch.
llvm-svn: 190611
|
|
|
|
|
|
| |
av512pf, avx-512-cdi -> avx512cd, avx-512-eri->avx512er. This matches better with official docs and what gcc patches appearto be using. I didn't touch the has* functions or the feature flag names to avoid change the td and lowering file while commits are still happening.
llvm-svn: 188859
|
|
|
|
| |
llvm-svn: 188746
|
|
|
|
| |
llvm-svn: 188745
|
|
|
|
|
|
|
| |
Added 512-bit operands printing.
Added instruction formats for KNL instructions.
llvm-svn: 187324
|