summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86
Commit message (Collapse)AuthorAgeFilesLines
* [X86] Allow merging of immediates within a basic block for code size savingsMichael Kuperstein2015-08-113-7/+117
| | | | | | | | | | | First step in preventing immediates that occur more than once within a single basic block from being pulled into their users, in order to prevent unnecessary large instruction encoding .Currently enabled only when optimizing for size. Patch by: zia.ansari@intel.com Differential Revision: http://reviews.llvm.org/D11363 llvm-svn: 244601
* [X86] Add SAL mnemonics for Intel syntaxMarina Yatsina2015-08-111-0/+1
| | | | | | | | SAL and SHL instructions perform the same operation Differential Revision: http://reviews.llvm.org/D11882 llvm-svn: 244588
* [X86] Fix REPE, REPZ, REPNZ for intel syntaxMarina Yatsina2015-08-111-3/+3
| | | | | | | | | REPE, REPZ, REPNZ, REPNE should have mnemonics for Intel syntax as well. Currently using these instructions causes compilation errors for Intel syntax. Differential Revision: http://reviews.llvm.org/D11794 llvm-svn: 244584
* [X86] Fix imul alias for intel syntaxMarina Yatsina2015-08-111-6/+6
| | | | | | | | | The "imul reg, imm" alias is not defined for intel syntax. In intel syntax there is no w/l/q suffix for the imul instruction. Differential Revision: http://reviews.llvm.org/D11887 llvm-svn: 244582
* [X86] When optimizing for minsize, use POP for small post-call stack clean-upMichael Kuperstein2015-08-112-1/+73
| | | | | | | | | | | | | | | | When optimizing for size, replace "addl $4, %esp" and "addl $8, %esp" following a call by one or two pops, respectively. We don't try to do it in general, but only when the stack adjustment immediately follows a call - which is the most common case. That allows taking a short-cut when trying to find a free register to pop into, instead of a full-blown liveness check. If the adjustment immediately follows a call, then every register the call clobbers but doesn't define should be dead at that point, and can be used. Differential Revision: http://reviews.llvm.org/D11749 llvm-svn: 244578
* Explicitly clear the MI operand list when getInstruction() is called. Call ↵Cameron Esfahani2015-08-111-0/+1
| | | | | | | | | | | | | | MI.clear() within MCD::OPC_Decode case and inside of translateInstruction() for the X86 target. Remove now unnecessary MI.clear() from ARMDisassembler. Summary: Explicitly clear the MI operand list when getInstruction() is called. Reviewers: hfinkel, t.p.northover, hvarga, kparzysz, jyknight, qcolombet, uweigand Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11665 llvm-svn: 244557
* x86: Emit LAHF/SAHF instead of PUSHF/POPFJF Bastien2015-08-101-26/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NaCl's sandbox doesn't allow PUSHF/POPF out of security concerns (priviledged emulators have forgotten to mask system bits in the past, and EFLAGS's DF bit is a constant source of hilarity). Commit r220529 fixed PR20376 by saving cmpxchg's flags result using EFLAGS, this commit now generated LAHF/SAHF instead, for all of x86 (not just NaCl) because it leads to an overall performance gain over PUSHF/POPF. As with the previous patch this code generation is pretty bad because it occurs very later, after register allocation, and in many cases it rematerializes flags which were already available (e.g. already in a register through SETE). Fortunately it's somewhat rare that this code needs to fire. I did [[ https://github.com/jfbastien/benchmark-x86-flags | a bit of benchmarking ]], the results on an Intel Haswell E5-2690 CPU at 2.9GHz are: | Time per call (ms) | Runtime (ms) | Benchmark | | 0.000012514 | 6257 | sete.i386 | | 0.000012810 | 6405 | sete.i386-fast | | 0.000010456 | 5228 | sete.x86-64 | | 0.000010496 | 5248 | sete.x86-64-fast | | 0.000012906 | 6453 | lahf-sahf.i386 | | 0.000013236 | 6618 | lahf-sahf.i386-fast | | 0.000010580 | 5290 | lahf-sahf.x86-64 | | 0.000010304 | 5152 | lahf-sahf.x86-64-fast | | 0.000028056 | 14028 | pushf-popf.i386 | | 0.000027160 | 13580 | pushf-popf.i386-fast | | 0.000023810 | 11905 | pushf-popf.x86-64 | | 0.000026468 | 13234 | pushf-popf.x86-64-fast | Clearly `PUSHF`/`POPF` are suboptimal. It doesn't really seems to be worth teaching LLVM about individual flags, at least not for this purpose. Reviewers: rnk, jvoung, t.p.northover Subscribers: llvm-commits Differential revision: http://reviews.llvm.org/D6629 llvm-svn: 244503
* fix minsize detection: minsize attribute implies optimizing for sizeSanjay Patel2015-08-101-5/+2
| | | | llvm-svn: 244499
* [InstCombine] Move SSE2/AVX2 arithmetic vector shift folding to instcombinerSimon Pilgrim2015-08-101-44/+0
| | | | | | | | As discussed in D11760, this patch moves the (V)PSRA(WD) arithmetic shift-by-constant folding to InstCombine to match the logical shift implementations. Differential Revision: http://reviews.llvm.org/D11886 llvm-svn: 244495
* fix minsize detection: minsize attribute implies optimizing for sizeSanjay Patel2015-08-101-3/+1
| | | | llvm-svn: 244464
* fix minsize detection: minsize attribute implies optimizing for sizeSanjay Patel2015-08-101-4/+1
| | | | llvm-svn: 244463
* fix minsize detection: minsize attribute implies optimizing for sizeSanjay Patel2015-08-101-2/+1
| | | | llvm-svn: 244460
* fix minsize detection: minsize attribute implies optimizing for sizeSanjay Patel2015-08-101-3/+1
| | | | llvm-svn: 244458
* Test commit to verify commit accessMarina Yatsina2015-08-101-0/+1
| | | | llvm-svn: 244438
* X86: remove a dead store (NFC)Saleem Abdulrasool2015-08-091-2/+2
| | | | | | | | The SP was always unconditionally assigned to later, but initialised early. This delays the initialisation, and avoids the dead store. Identified by clang static analysis. No functional change intended. llvm-svn: 244423
* [x86] enable machine combiner reassociations for 128-bit vector ↵Sanjay Patel2015-08-081-0/+4
| | | | | | single/double adds llvm-svn: 244403
* Fix some comment typos.Benjamin Kramer2015-08-081-6/+6
| | | | llvm-svn: 244402
* [X86] Add ADX and RDSEED to Skylake processor.Craig Topper2015-08-081-1/+2
| | | | llvm-svn: 244396
* Add SlowBTMem to Sandy Bridge and newer Intel CPUs. Reading through Agner ↵Craig Topper2015-08-081-5/+9
| | | | | | Fog's table suggests there have been no improvements to these processors relative to Westmere for bit test instructions. llvm-svn: 244395
* Removing tailing whitespacesMichael Liao2015-08-061-62/+62
| | | | llvm-svn: 244203
* [X86] Improve EmitLoweredSelect for contiguous CMOV pseudo instructions.Michael Kuperstein2015-08-061-33/+159
| | | | | | | | | | | | This change improves EmitLoweredSelect() so that multiple contiguous CMOV pseudo instructions with the same (or exactly opposite) conditions get lowered using a single new basic-block. This eliminates unnecessary extra basic-blocks (and CFG merge points) when contiguous CMOVs are being lowered. Patch by: kevin.b.smith@intel.com Differential Revision: http://reviews.llvm.org/D11428 llvm-svn: 244202
* MIR Serialization: Initial serialization of the machine operand target flags.Alex Lorenz2015-08-062-0/+41
| | | | | | | | | | | | This commit implements the initial serialization of the machine operand target flags. It extends the 'TargetInstrInfo' class to add two new methods that help to provide text based serialization for the target flags. This commit can serialize only the X86 target flags, and the target flags for the other targets will be serialized in the follow-up commits. Reviewers: Duncan P. N. Exon Smith llvm-svn: 244185
* x86: NFC remove needless InstrCompiler castJF Bastien2015-08-051-15/+15
| | | | | | | | | | Summary: The casts from String to PatFrag weren't needed if we instead provided an SDNode. This fix was suggested by @pete in D11382. Subscribers: pete, llvm-commits Differential Revision: http://reviews.llvm.org/D11788 llvm-svn: 244167
* x86 atomic: optimize a.store(reg op a.load(acquire), release)JF Bastien2015-08-054-19/+114
| | | | | | | | | | | | Summary: PR24191 finds that the expected memory-register operations aren't generated when relaxed { load ; modify ; store } is used. This is similar to PR17281 which was addressed in D4796, but only for memory-immediate operations (and for memory orderings up to acquire and release). This patch also handles some floating-point operations. Reviewers: reames, kcc, dvyukov, nadav, morisset, chandlerc, t.p.northover, pete Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11382 llvm-svn: 244128
* Revert "Fix MO's analyzePhysReg, it was confusing sub- and super-registers. ↵JF Bastien2015-08-051-48/+26
| | | | | | | | Problem pointed out by Michael Hordijk." I mistakenly committed the patch for D6629, and was trying to commit another. Reverting until it gets proper signoff. llvm-svn: 244121
* Fix MO's analyzePhysReg, it was confusing sub- and super-registers. Problem ↵JF Bastien2015-08-051-26/+48
| | | | | | pointed out by Michael Hordijk. llvm-svn: 244120
* [TTI] Make the cost APIs in TargetTransformInfo consistently use 'int'Chandler Carruth2015-08-052-83/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | rather than 'unsigned' for their costs. For something like costs in particular there is a natural "negative" value, that of savings or saved cost. As a consequence, there is a lot of code that subtracts or creates negative values based on cost, all of which is prone to awkwardness or bugs when dealing with an unsigned type. Similarly, we *never* want these values to wrap, as that would cause Very Bad code generation (likely percieved as an infinite loop as we try to emit over 2^32 instructions or some such insanity). All around 'int' seems a much better fit for these basic metrics. I've added asserts to ensure that at least the TTI interface never returns negative numbers here. If we ever have a use case for negative numbers, we can remove this, but this way a bug where someone used '-1' to produce a 'very large' cost will be caught by the assert. This passes all tests, and is also UBSan clean. No functional change intended. Differential Revision: http://reviews.llvm.org/D11741 llvm-svn: 244080
* wrap OptSize and MinSize attributes for easier and consistent access (NFCI)Sanjay Patel2015-08-045-13/+10
| | | | | | | | | | | | | | | | | Create wrapper methods in the Function class for the OptimizeForSize and MinSize attributes. We want to hide the logic of "or'ing" them together when optimizing just for size (-Os). Currently, we are not consistent about this and rely on a front-end to always set OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here that should be added as follow-on patches with regression tests. This patch is NFC-intended: it just replaces existing direct accesses of the attributes by the equivalent wrapper call. Differential Revision: http://reviews.llvm.org/D11734 llvm-svn: 243994
* [x86] machine combiner reassociation: mark EFLAGS operand as 'dead'Sanjay Patel2015-08-041-4/+43
| | | | | | | | | | | | | | In the commentary for D11660, I wasn't sure if it was alright to create new integer machine instructions without also creating the implicit EFLAGS operand. From what I can see, the implicit operand is always created by the MachineInstrBuilder based on the instruction type, so we don't have to do that explicitly. However, in reviewing the debug output, I noticed that the operand was not marked as 'dead'. The machine combiner should do that to preserve future optimization opportunities that may be checking for that dead EFLAGS operand themselves. Differential Revision: http://reviews.llvm.org/D11696 llvm-svn: 243990
* De-constify pointers to Type since they can't be modified. NFCCraig Topper2015-08-012-4/+4
| | | | | | This was already done in most places a while ago. This just fixes the ones that crept in over time. llvm-svn: 243842
* x86: check hasOpaqueSPAdjustment in canRealignStackJF Bastien2015-07-311-4/+6
| | | | | | | | | | | | | | | Summary: @rnk pointed out in [1] that x86's canRealignStack logic should match that in CantUseSP from hasBasePointer. [1]: http://reviews.llvm.org/D11160?id=29713#inline-89350 Reviewers: rnk Subscribers: rnk, llvm-commits Differential Revision: http://reviews.llvm.org/D11377 llvm-svn: 243772
* [x86] reassociate integer multiplies using machine combiner passSanjay Patel2015-07-311-10/+30
| | | | | | | | | | | | | Add i16, i32, i64 imul machine instructions to the list of reassociation candidates. A new bit of logic is needed to handle integer instructions: they have an implicit EFLAGS operand, so we have to make sure it's dead in order to do any reassociation with integer ops. Differential Revision: http://reviews.llvm.org/D11660 llvm-svn: 243756
* fix memcpy/memset/memmove lowering when optimizing for sizeSanjay Patel2015-07-301-5/+3
| | | | | | | | | | | | | | | | | | | | Fixing MinSize attribute handling was discussed in D11363. This is a prerequisite patch to doing that. The handling of OptSize when lowering mem* functions was broken on Darwin because it wants to ignore -Os for these cases, but the existing logic also made it ignore -Oz (MinSize). The Linux change demonstrates a widespread problem. The backend doesn't usually recognize the MinSize attribute by itself; it assumes that if the MinSize attribute exists, then the OptSize attribute must also exist. Fixing this more generally will be a follow-on patch or two. Differential Revision: http://reviews.llvm.org/D11568 llvm-svn: 243693
* [X86] Recognize "flags" as an identifier, not a register in Intel-syntax ↵Michael Kuperstein2015-07-301-0/+5
| | | | | | | | | inline asm Patch by: marina.yatsina@intel.com Differential Revision: http://reviews.llvm.org/D11512 llvm-svn: 243630
* push fast-math check for machine-combiner reassociations into ↵Sanjay Patel2015-07-301-7/+4
| | | | | | | | instruction-type check; NFC This makes it simpler to add instruction types that don't depend on fast-math. llvm-svn: 243596
* Fix typo "fuction" noticed in comments in AssumptionCache.h, and also all ↵Nick Lewycky2015-07-292-2/+2
| | | | | | | | the other files that have the same typo. All comments, no functionality change! (Merely a "fuctionality" change.) Bonus change to remove emacs major mode marker from SystemZMachineFunctionInfo.cpp because emacs already knows it's C++ from the extension. Also fix typo "appeary" in AMDGPUMCAsmInfo.h. llvm-svn: 243585
* Rename hasCompatibleFunctionAttributes->areInlineCompatible basedEric Christopher2015-07-292-4/+4
| | | | | | | on suggestions. Currently the function is only used for inline purposes and this is more descriptive for the use. llvm-svn: 243578
* [X86][SSE] Keep 32-bit target i64 vector shifts on SSE unit.Simon Pilgrim2015-07-291-15/+31
| | | | | | | | This patch improves the 32-bit target i64 constant matching to detect the shuffle vector splats that are introduced by i64 vector shift vectorization (D8416). Differential Revision: http://reviews.llvm.org/D11327 llvm-svn: 243577
* [X86][SSE] Vectorize i64 ASHR operationsSimon Pilgrim2015-07-292-4/+18
| | | | | | | | This patch vectorizes the v2i64/v4i64 ASHR shift operations - the last remaining integer vector shifts that are still being transferred to/from the scalar unit to be completed. Differential Revision: http://reviews.llvm.org/D11439 llvm-svn: 243569
* Revert "[PeepholeOptimizer] Look through PHIs to find additional register ↵Bruno Cardoso Lopes2015-07-291-2/+1
| | | | | | | | | | sources" Reported to Broke some internal tests: PR24303 This reverts commit r243486. llvm-svn: 243540
* fix TLI's combineRepeatedFPDivisors interface to return the minimum user ↵Sanjay Patel2015-07-282-3/+3
| | | | | | | | | | | | | | | threshold This fix was suggested as part of D11345 and is part of fixing PR24141. With this change, we can avoid walking the uses of a divisor node if the target doesn't want the combineRepeatedFPDivisors transform in the first place. There is no NFC-intended other than that. Differential Revision: http://reviews.llvm.org/D11531 llvm-svn: 243498
* [PeepholeOptimizer] Look through PHIs to find additional register sourcesBruno Cardoso Lopes2015-07-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reapply 243271 with more fixes; although we are not handling multiple sources with coalescable copies, we were not properly skipping this case. - Teaches the ValueTracker in the PeepholeOptimizer to look through PHI instructions. - Add findNextSourceAndRewritePHI method to lookup into multiple sources returnted by the ValueTracker and rewrite PHIs with new sources. With these changes we can find more register sources and rewrite more copies to allow coaslescing of bitcast instructions. Hence, we eliminate unnecessary VR64 <-> GR64 copies in x86, but it could be extended to other archs by marking "isBitcast" on target specific instructions. The x86 example follows: A: psllq %mm1, %mm0 movd %mm0, %r9 jmp C B: por %mm1, %mm0 movd %mm0, %r9 jmp C C: movd %r9, %mm0 pshufw $238, %mm0, %mm0 Becomes: A: psllq %mm1, %mm0 jmp C B: por %mm1, %mm0 jmp C C: pshufw $238, %mm0, %mm0 Differential Revision: http://reviews.llvm.org/D11197 rdar://problem/20404526 llvm-svn: 243486
* Implement target independent TLS compatible with glibc's emutls.c.Chih-Hung Hsieh2015-07-281-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'common' section TLS is not implemented. Current C/C++ TLS variables are not placed in common section. DWARF debug info to get the address of TLS variables is not generated yet. clang and driver changes in http://reviews.llvm.org/D10524 Added -femulated-tls flag to select the emulated TLS model, which will be used for old targets like Android that do not support ELF TLS models. Added TargetLowering::LowerToTLSEmulatedModel as a target-independent function to convert a SDNode of TLS variable address to a function call to __emutls_get_address. Added into lib/Target/*/*ISelLowering.cpp to call LowerToTLSEmulatedModel for TLSModel::Emulated. Although all targets supporting ELF TLS models are enhanced, emulated TLS model has been tested only for Android ELF targets. Modified AsmPrinter.cpp to print the emutls_v.* and emutls_t.* variables for emulated TLS variables. Modified DwarfCompileUnit.cpp to skip some DIE for emulated TLS variabls. TODO: Add proper DIE for emulated TLS variables. Added new unit tests with emulated TLS. Differential Revision: http://reviews.llvm.org/D10522 llvm-svn: 243438
* [X86] Remove mergeSPUpdatesUp()Michael Kuperstein2015-07-281-25/+1
| | | | | | | | | | | X86FrameLowering has both a mergeSPUpdates() that accepts a direction, and an mergeSPUpdatesUp(), which seem to do the same thing, except for a slightly different interface. Removed the less general function. NFC. Differential Revision: http://reviews.llvm.org/D11510 llvm-svn: 243396
* [X86][SSE] Use bitmasks instead of shuffles where possible.Simon Pilgrim2015-07-281-0/+8
| | | | | | | | | | VPAND is a lot faster than VPSHUFB and VPBLENDVB - this patch ensures we attempt to lower to a basic bitmask before lowering to the slower byte shuffle/blend instructions. Split off from D11518. Differential Revision: http://reviews.llvm.org/D11541 llvm-svn: 243395
* AVX512: Implemented encoding and intrinsics for VGETEXPSS/D instructionsIgor Breger2015-07-283-1/+8
| | | | | | | | Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11528 llvm-svn: 243390
* fix invalid load folding with SSE/AVX FP logical instructions (PR22371)Sanjay Patel2015-07-283-46/+59
| | | | | | | | | | | | | | | | | | This is a follow-up to the FIXME that was added with D7474 ( http://reviews.llvm.org/rL229531 ). I thought this load folding bug had been made hard-to-hit, but it turns out to be very easy when targeting 32-bit x86 and causes a miscompile/crash in Wine: https://bugs.winehq.org/show_bug.cgi?id=38826 https://llvm.org/bugs/show_bug.cgi?id=22371#c25 The quick fix is to simply remove the scalar FP logical instructions from the load folding table in X86InstrInfo, but that causes us to miss load folds that should be possible when lowering fabs, fneg, fcopysign. So the majority of this patch is altering those lowerings to use *vector* FP logical instructions (because that's all x86 gives us anyway). That lets us do the load folding legally. Differential Revision: http://reviews.llvm.org/D11477 llvm-svn: 243361
* [llvm-mc] Pushing plumbing through for --fatal-warnings flag.Colin LeMahieu2015-07-271-1/+1
| | | | llvm-svn: 243334
* remove unnecessary forward declaration; NFCSanjay Patel2015-07-271-16/+12
| | | | llvm-svn: 243328
* don't repeat function names in comments; NFCSanjay Patel2015-07-271-61/+49
| | | | llvm-svn: 243327
OpenPOWER on IntegriCloud