path: root/llvm/lib/CodeGen
Commit message (Author, Age, Files, Lines)
* Remove DwarfDebug::FirstCU as it has no use (David Blaikie, 2014-10-24, 2 files, -17/+5)
| | | | | | | It was only being used as a flag to identify the lack of debug info from within endModule - use the section labels for that instead. llvm-svn: 220575
* Use rsqrt (X86) to speed up reciprocal square root calcs (Sanjay Patel, 2014-10-24, 1 file, -40/+77)
| | | | | | | | | | | | | | | | | | | | | This is a first step for generating SSE rsqrt instructions for reciprocal square root calcs when fast-math is allowed. For now, be conservative and only enable this for AMD btver2 where performance improves significantly - for example, 29% on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c (if we convert the data type to single-precision float). This patch adds a two constant version of the Newton-Raphson refinement algorithm to DAGCombiner that can be selected by any target via a parameter returned by getRsqrtEstimate(). See PR20900 for more details: http://llvm.org/bugs/show_bug.cgi?id=20900 Differential Revision: http://reviews.llvm.org/D5658 llvm-svn: 220570
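To make the refinement step concrete, here is a minimal C++ sketch of one Newton-Raphson iteration for 1/sqrt(x). It shows the textbook form next to an algebraically equivalent form that uses two constants (-0.5 and -3.0). The function names are hypothetical and the code is an illustration only, not the DAGCombiner implementation; the caller is assumed to supply a rough estimate, such as the one an SSE rsqrt instruction would return.

```cpp
#include <cassert>
#include <cmath>

// Textbook Newton-Raphson step for y = 1/sqrt(x):
//   est' = est * (1.5 - 0.5 * x * est * est)
inline float refineRsqrtTextbook(float x, float est) {
  return est * (1.5f - 0.5f * x * est * est);
}

// The same step rearranged so that it needs two constants, -0.5 and -3.0:
//   est' = (-0.5 * est) * (x * est * est + (-3.0))
// (hypothetical helper; expanding the product gives the textbook form)
inline float refineRsqrtTwoConst(float x, float est) {
  assert(x > 0.0f && "reciprocal square root needs a positive input");
  return (-0.5f * est) * (x * est * est + -3.0f);
}
```

For example, starting from an estimate of 0.505 for x = 4 (exact answer 0.5, so about 1% off), a single step brings the result to within 1e-3 of 0.5. Each step roughly squares the relative error, which is why one iteration after a hardware estimate is often enough for single precision.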
* Added reset of LexicalScope in LiveDebugVariables reset function. (Marcello Maggioni, 2014-10-24, 1 file, -0/+1)
| | | | llvm-svn: 220545
* Fix PR21189 -- Emit symbol subsection required to debug LLVM-built binaries with VS2012+ (Timur Iskhodzhanov, 2014-10-24, 1 file, -9/+47)
| | | | | | | | Reviewed at http://reviews.llvm.org/D5772 llvm-svn: 220544
* DebugInfo: Remove DwarfDebug::addScopeVariable now that it's just a trivial wrapper (David Blaikie, 2014-10-24, 4 files, -13/+6)
| | | | | | llvm-svn: 220542
* [SelectionDAG] Teach the vector scalarizer about FP conversions. (Ahmed Bougacha, 2014-10-23, 1 file, -0/+4)
| | | | | | | | | | | | | | | | | This adds support for legalization of instructions of the form: [fp_conv] <1 x i1> %op to <1 x double> where fp_conv is one of fpto[us]i, [us]itofp. This used to assert because they were simply missing from the vector operand scalarizer. A similar problem arose in r190830, with trunc instead. Fixes PR20778. Differential Revision: http://reviews.llvm.org/D5810 llvm-svn: 220533
* Update comment and fix typos in assert message. (NFC) (Ahmed Bougacha, 2014-10-23, 1 file, -3/+3)
| | | | llvm-svn: 220531
* ScheduleDAG: record PhysReg dependencies represented by CopyFromReg nodes (Tim Northover, 2014-10-23, 3 files, -19/+36)
| | | | | | | | | | | | | | | x86's CMPXCHG -> EFLAGS consumer wasn't being recorded as a real EFLAGS dependency because it was represented by a pair of CopyFromReg(EFLAGS) -> CopyToReg(EFLAGS) nodes. ScheduleDAG was expecting the source to be an implicit-def on the instruction, where the result numbers in the DAG and the Uses list in TableGen matched up precisely. The Copy notation seems much more robust, so this patch extends ScheduleDAG rather than refactoring x86. Should fix PR20376. llvm-svn: 220529
* DebugInfo: Remove DwarfDebug::CurrentFnArguments since we have to handle argument ordering of other arguments (abstract arguments) in the same way and already have code for that too. (David Blaikie, 2014-10-23, 5 files, -51/+8)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | While refactoring this code I was confused by both the name I had introduced (addNonArgumentVariable... but it has all this logic to handle argument numbering and keep things in order?) and by the redundancy. Seems when I fixed the misordered inlined argument handling, I didn't realize it was mostly redundant with the argument ordering code (which I may've also written, I'm not sure). So let's just rely on the more general case. The only oddity in output this produces is that it means when we emit all the variables for the current function, we don't track when we've finished the argument variables and are about to start the local variables and insert DW_AT_unspecified_parameters (for varargs functions) there. Instead it ends up after the local variables, scopes, etc. But this isn't invalid and doesn't cause DWARF consumers problems that I know of... so we'll just go with that because it makes the code nice & simple. (though, let's see what the buildbots have to say about this - *crosses fingers*) There will be some cleanup commits to follow to remove the now trivial wrappers, etc. llvm-svn: 220527
* DebugInfo: Sink DwarfDebug::addNonArgumentScopeVariable into DwarfFile. (David Blaikie, 2014-10-23, 4 files, -35/+35)
| | | | llvm-svn: 220520
* DebugInfo: Remove DwarfDebug::addCurrentFnArgument declaration now that it's moved to DwarfFile. (David Blaikie, 2014-10-23, 1 file, -4/+0)
| | | | | | llvm-svn: 220515
* DebugInfo: Simplify/tidy/correct global variable decl/def emission handling. (David Blaikie, 2014-10-23, 1 file, -51/+26)
| | | | | | | | | | | | | | | | | | | | | | | This fixes a bug (introduced by fixing the IR emitted from Clang where the definition of a static member would be scoped within the class, rather than within its lexical decl context) where the definition of a static variable would be placed inside a class. It also improves source fidelity by scoping static class member definitions inside the lexical decl context in which they are written (eg: namespace n { class foo { static int i; }; int foo::i; } - the definition of 'i' will be within the namespace 'n' in the DWARF output now). Lastly, and the original goal, this reduces debug info size slightly (and makes debug info easier to read, etc) by placing the definitions of non-member global variables within their namespace, rather than using a separate namespace-scoped declaration along with a definition at global scope. Based on patches and discussion with Frédéric. llvm-svn: 220497
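The one-line example in the message above can be spelled out as compilable C++. This is only a sketch of the source shape being discussed: the `public:` specifier, the initializer value, and the helper function are additions made for illustration, and the comments about DWARF placement restate the commit message rather than verified debugger output.

```cpp
#include <cassert>

namespace n {
class foo {
public:          // added for this sketch so the member can be read from outside
  static int i;  // declaration: lives inside class foo
};

// Definition: written at namespace scope, lexically inside 'n'.
// Per the commit message, the DWARF definition is now scoped within
// namespace 'n' instead of being placed inside the class.
int foo::i = 7;  // the initializer value is an arbitrary assumption
} // namespace n

// Hypothetical helper that only checks the definition is reachable.
inline void checkStaticMember() {
  assert(n::foo::i == 7);
}
```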
* Remove explicit (void) use of DwarfFile::DD that was accidentally left in r220452. (David Blaikie, 2014-10-23, 1 file, -3/+1)
| | | | | | | | Caught in post-commit review by Frédéric. llvm-svn: 220487
* [DebugInfo] Sink DwarfDebug::addCurrentFnArgument down into DwarfFile. (David Blaikie, 2014-10-23, 3 files, -24/+28)
| | | | | | | | Variable handling will be sunk into DwarfFile so that abstract variables and the like can be shared across multiple CUs (to handle cross-CU inlining, for example). llvm-svn: 220453
* [DebugInfo] Add DwarfDebug& to DwarfFile. (David Blaikie, 2014-10-23, 3 files, -10/+17)
| | | | | | | | Use the DwarfDebug in one function that previously took it as a parameter, and lay the foundation for using this for other operations coming soon. llvm-svn: 220452
* [DebugInfo] Remove LexicalScopes::isCurrentFunctionScope and CSE a use of LexicalScopes::getCurrentFunctionScope (David Blaikie, 2014-10-23, 2 files, -13/+19)
| | | | | | | | | | Now that we're sure the only root (non-abstract) scope is the current function scope, there's no need for isCurrentFunctionScope, the property can be tested directly instead. llvm-svn: 220451
* Strength reduce constant-sized vectors into arrays. No functionality change. (Benjamin Kramer, 2014-10-22, 1 file, -2/+2)
| | | | llvm-svn: 220412
* Fix typo (Matt Arsenault, 2014-10-22, 1 file, -1/+1)
| | | | llvm-svn: 220353
* Add minnum / maxnum codegen (Matt Arsenault, 2014-10-21, 11 files, -0/+188)
| | | | llvm-svn: 220342
* Pacify bots and simplify r220321 (Arnaud A. de Grandmaison, 2014-10-21, 1 file, -1/+1)
| | | | llvm-svn: 220335
* [PBQP] Teach PassConfig to tell if the default register allocator is used. (Arnaud A. de Grandmaison, 2014-10-21, 1 file, -0/+6)
| | | | | | | | This enables targets to adapt their pass pipeline to the register allocator in use. For example, with the AArch64 backend, using PBQP with the cortex-a57, the FPLoadBalancing pass is no longer necessary. llvm-svn: 220321
* [PBQP] Fix coalescing benefits (Arnaud A. de Grandmaison, 2014-10-21, 1 file, -2/+2)
| | | | | | As coalescing registers is a benefit, the cost should be improved (i.e. made smaller) when coalescing is possible. llvm-svn: 220302
* Fix a bit of confusion about .set and produce more readable assembly. (Rafael Espindola, 2014-10-21, 1 file, -34/+24)
| | | | | | | | | | | | | | | Every target we support has support for assembly that looks like a = b - c .long a What is special about MachO is that the above combination suppresses the production of a relocation. With this change we avoid producing the intermediary labels when they don't add any value. llvm-svn: 220256
* Make AsmPrinter::EmitLabelOffsetDifference a static helper and simplify. (Rafael Espindola, 2014-10-21, 2 files, -33/+27)
| | | | | | It had exactly one caller in a position where we know hasSetDirective is true. llvm-svn: 220250
* Introduce enum values for previously defined metadata types. (NFC) (Philip Reames, 2014-10-21, 2 files, -5/+5)
| | | | | | | | | | | Our metadata scheme lazily assigns IDs to string metadata, but we have a mechanism to preassign them as well. Using a preassigned ID is helpful since we get compile time type checking, and avoid some (minimal) string construction and comparison. This change adds enum values for three existing metadata types: + MD_nontemporal = 9, // "nontemporal" + MD_mem_parallel_loop_access = 10, // "llvm.mem.parallel_loop_access" + MD_nonnull = 11 // "nonnull" I went through and updated various uses as well. I made no attempt to get all uses; I focused on the ones which were easily grepable and easy to translate. For example, there were several items in LoopInfo.cpp I chose not to update. llvm-svn: 220248
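The trade-off between lazily assigned and preassigned IDs can be sketched with a toy registry. Everything below is hypothetical (the `KindRegistry` class, its `getKindID` member, and the choice of 12 as the first lazily assigned ID are assumptions made for this sketch); only the three enum values and their strings come from the commit message. It is not LLVM's actual implementation.

```cpp
#include <cassert>
#include <map>
#include <string>

// Preassigned IDs, with the values quoted in the commit message.
enum FixedKind : unsigned {
  MD_nontemporal = 9,               // "nontemporal"
  MD_mem_parallel_loop_access = 10, // "llvm.mem.parallel_loop_access"
  MD_nonnull = 11                   // "nonnull"
};

// Hypothetical registry: known names map to their fixed IDs, any other
// name gets the next free ID the first time it is looked up.
class KindRegistry {
  std::map<std::string, unsigned> Ids;
  unsigned NextId = 12; // assumption: first ID after the preassigned ones

public:
  KindRegistry() {
    Ids["nontemporal"] = MD_nontemporal;
    Ids["llvm.mem.parallel_loop_access"] = MD_mem_parallel_loop_access;
    Ids["nonnull"] = MD_nonnull;
  }

  // String lookup: costs a string construction and comparison per call.
  unsigned getKindID(const std::string &Name) {
    std::map<std::string, unsigned>::iterator It = Ids.find(Name);
    if (It != Ids.end())
      return It->second;
    unsigned Id = NextId++;
    Ids[Name] = Id;
    assert(Ids.find(Name) != Ids.end() && "lazy ID must be recorded");
    return Id;
  }
};
```

A caller that writes `MD_nonnull` gets a compile-time constant and a compile error on a misspelling, whereas a caller that writes `getKindID("nonull")` silently receives a fresh ID. That is the type-checking benefit the message refers to.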
* [PBQP] Replace the interference-constraints algorithm with a faster version (Lang Hames, 2014-10-18, 1 file, -16/+115)
| | | | | | | | loosely based on linear scan. On x86-64 this is good for a ~2% drop in compile time on the nightly test suite. llvm-svn: 220143
* Check for dynamic alloca's when selecting lifetime intrinsics. (Pete Cooper, 2014-10-17, 1 file, -1/+7)
| | | | | | | | | | | | | | | | | | TL;DR: Indexing maps with [] creates missing entries. The long version: When selecting lifetime intrinsics, we index the *static* alloca map with the AllocaInst we find for that lifetime. Trouble is, we don't first check to see if this is a dynamic alloca. On the attached example, this causes a dynamic alloca to create an entry in the static map, and returns 0 (the default) as the frame index for that lifetime. 0 was used for the frame index of the stack protector, which given that it now has a lifetime, is coloured, and merged with other stack slots. PEI would later trigger an assert because it expects the stack protector to not be dead. This fix ensures that we only get frame indices for static allocas, ie, those in the map. Dynamic ones are effectively dropped, which is suboptimal, but at least isn't completely broken. rdar://problem/18672951 llvm-svn: 220099
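The pitfall named in the TL;DR can be reproduced in a few lines. The sketch below uses `std::map` and integer keys as stand-ins for the real alloca map and instruction pointers, and all function names are hypothetical; it illustrates the general hazard and the "look up first" fix, not the actual patch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

using AllocaId = int;                            // stand-in for an alloca instruction
using StaticAllocaMap = std::map<AllocaId, int>; // alloca -> frame index

// Buggy pattern: operator[] inserts a value-initialized entry (0) when the
// key is missing, so an unknown alloca silently gets frame index 0.
inline int frameIndexUnchecked(StaticAllocaMap &Map, AllocaId A) {
  return Map[A];
}

// Safer pattern: find() never inserts; the caller learns the key is absent.
inline bool frameIndexChecked(const StaticAllocaMap &Map, AllocaId A,
                              int &FrameIndex) {
  StaticAllocaMap::const_iterator It = Map.find(A);
  if (It == Map.end())
    return false; // not a static alloca: leave FrameIndex untouched
  FrameIndex = It->second;
  return true;
}

// Map size after asking for a missing key with operator[].
inline std::size_t sizeAfterUncheckedLookup() {
  StaticAllocaMap Map;
  Map[100] = 3;                           // one genuine static alloca
  int FI = frameIndexUnchecked(Map, 200); // 200 plays the dynamic alloca
  assert(FI == 0 && "missing key yields the default value");
  (void)FI;
  return Map.size();
}

// Map size after asking for the same missing key with find().
inline std::size_t sizeAfterCheckedLookup() {
  StaticAllocaMap Map;
  Map[100] = 3;
  int FI = -1;
  bool Found = frameIndexChecked(Map, 200, FI);
  assert(!Found && FI == -1);
  (void)Found;
  return Map.size();
}
```

The unchecked lookup leaves two entries in the map and returns 0, which in the scenario described above collides with the stack protector's frame index. The checked lookup leaves the map with its single genuine entry.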
* [Stackmaps] Enable invoking the patchpoint intrinsic. (Juergen Ributzka, 2014-10-17, 2 files, -51/+64)
| | | | | | | | | | | Patch by Kevin Modzelewski Reviewers: atrick, ributzka Reviewed By: ributzka Subscribers: llvm-commits, reames Differential Revision: http://reviews.llvm.org/D5634 llvm-svn: 220055
* SelectionDAG: Add sext_inreg optimizations (Jan Vesely, 2014-10-17, 1 file, -0/+22)
| | | | | | | | | | v2: use dyn_cast fixup comments v3: use cast Reviewed-by: Matt Arsenault <arsenm2@gmail.com> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 220044
* Reduce code duplication between patchpoint and non-patchpoint lowering. NFC. (Juergen Ributzka, 2014-10-16, 2 files, -44/+58)
| | | | | | | | | | | | This is in preparation for another patch that makes patchpoints invokable. Reviewers: atrick, ributzka Reviewed By: ributzka Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5657 llvm-svn: 219967
* Erase fence insertion from SelectionDAGBuilder.cpp (NFC) (Robin Morisset, 2014-10-16, 1 file, -67/+20)
| Summary:
| Backends can use setInsertFencesForAtomic to signal to the middle-end that monotonic is the only memory ordering they can accept for stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger ordering to fences + monotonic accesses is currently living in SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it for several reasons:
| - There is lots of redundancy to avoid: extremely similar logic already exists in AtomicExpand.
| - The current code in SelectionDAGBuilder does not use any target-hooks, it does the same transformation for every backend that requires it
| - As a result it is plain *unsound*, as it was apparently designed for ARM. It happens to mostly work for the other targets because they are extremely conservative, but Power for example had to switch to AtomicExpand to be able to use lwsync safely (see r218331).
| - Because it produces IR-level fences, it cannot be made sound! This is noted in the C++11 standard (section 29.3, page 1140):
| ```
| Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering semantics.
| ```
| It can also be seen by the following example (called IRIW in the literature):
| ```
| atomic<int> x = y = 0;
| int r1, r2, r3, r4;
| Thread 0: x.store(1);
| Thread 1: y.store(1);
| Thread 2: r1 = x.load(); r2 = y.load();
| Thread 3: r3 = y.load(); r4 = x.load();
| ```
| r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst. But if they are lowered to monotonic accesses, no amount of fences can prevent it.
|
| This patch does three things (I could cut it into parts, but then some of them would not be tested/testable, please tell me if you would prefer that):
| - it provides a default implementation for emitLeadingFence/emitTrailingFence in terms of IR-level fences, that mimic the original logic of SelectionDAGBuilder. As we saw above, this is unsound, but the best that can be done without knowing the targets well (and there is a comment warning about this risk).
| - it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default implementation (that exactly replicates the logic of SelectionDAGBuilder, so no functional change)
| - it finally erases this logic from SelectionDAGBuilder as it is dead-code.
|
| Ideally, each target would define its own override for emitLeading/TrailingFence using target-specific fences, but I do not know the Sparc/Mips/XCore memory model well enough to do this, and they appear to be dealing fine with the ARM-inspired default expansion for now (probably because they are overly conservative, as Power was). If anyone wants to compile fences more aggressively on these platforms, the long comment should make it clear why he should first override emitLeading/TrailingFence.
|
| Test Plan: make check-all, no functional change
| Reviewers: jfb, t.p.northover
| Subscribers: aemerson, llvm-commits
| Differential Revision: http://reviews.llvm.org/D5474
| llvm-svn: 219957
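The claim that the IRIW outcome is impossible under seq_cst can be checked mechanically. Sequential consistency means every execution behaves like some interleaving of the threads' operations in program order, so the sketch below enumerates all such interleavings of the four-thread example and counts how often r1 = r3 = 1 and r2 = r4 = 0 occurs. It is a single-threaded model written for illustration; the `State` struct and the function names are hypothetical, and it says nothing about what real hardware does with weaker orderings.

```cpp
#include <array>
#include <cassert>

// Shared variables, result registers, and a program counter per thread.
struct State {
  int x = 0, y = 0;
  int r1 = -1, r2 = -1, r3 = -1, r4 = -1;
  std::array<int, 4> pc{{0, 0, 0, 0}};
};

// Number of operations in threads 0..3 of the IRIW example.
const std::array<int, 4> kProgramLength = {{1, 1, 2, 2}};

// Execute the next operation of thread t against the single shared memory.
inline void step(State &s, int t) {
  int i = s.pc[t]++;
  assert(i < kProgramLength[t] && "thread already finished");
  switch (t) {
  case 0: s.x = 1; break;                                   // x.store(1)
  case 1: s.y = 1; break;                                   // y.store(1)
  case 2: if (i == 0) s.r1 = s.x; else s.r2 = s.y; break;   // r1 = x; r2 = y
  default: if (i == 0) s.r3 = s.y; else s.r4 = s.x; break;  // r3 = y; r4 = x
  }
}

// Depth-first walk over every interleaving that respects program order.
inline void explore(State s, int &total, int &forbidden) {
  bool finished = true;
  for (int t = 0; t < 4; ++t) {
    if (s.pc[t] == kProgramLength[t])
      continue;
    finished = false;
    State next = s;
    step(next, t);
    explore(next, total, forbidden);
  }
  if (finished) {
    ++total;
    if (s.r1 == 1 && s.r2 == 0 && s.r3 == 1 && s.r4 == 0)
      ++forbidden;
  }
}

inline int countInterleavings() {
  int total = 0, forbidden = 0;
  explore(State(), total, forbidden);
  return total;
}

inline int countForbiddenOutcomes() {
  int total = 0, forbidden = 0;
  explore(State(), total, forbidden);
  return forbidden;
}
```

There are 6!/(1!·1!·2!·2!) = 180 interleavings and none of them produces the outcome in question: threads 2 and 3 would have to observe the two stores in opposite orders, which a single total order cannot provide. With monotonic accesses no such total order is required, which is the gap the commit message says fences alone cannot close.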
* Avoid caching the MachineFunction, we don't use it outside of (Eric Christopher, 2014-10-15, 1 file, -9/+7)
| | | | | | runOnMachineFunction. llvm-svn: 219847
* Simplify handling of --noexecstack by using getNonexecutableStackSection. (Rafael Espindola, 2014-10-15, 2 files, -7/+9)
| | | | llvm-svn: 219799
* [MachineSink] Use the real post dominator tree (Jingyue Wu, 2014-10-15, 1 file, -21/+14)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in isPostDominatedBy, use the real MachinePostDominatorTree and MachineLoopInfo. The old heuristics caused instructions to sink unnecessarily, and might create register pressure. This is the second try of the fix. The first one (D4814) caused a performance regression due to failing to sink instructions out of loops (PR21115). This patch fixes PR21115 by sinking an instruction from a deeper loop to a shallower one regardless of whether the target block post-dominates the source. Thanks Alexey Volkov for reporting PR21115! Test Plan: Added a NVPTX codegen test to verify that our change prevents the backend from over-sinking. It also shows the unnecessary register pressure caused by over-sinking. Added an X86 test to verify we can sink instructions out of loops regardless of the dominance relationship. This test is reduced from Alexey's test in PR21115. Updated an affected test in X86. Also ran SPEC CINT2006 and llvm-test-suite for compilation time and runtime performance. Results are attached separately in the review thread. Reviewers: Jiangning, resistor, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, bruno, volkalexey, llvm-commits, meheff, eliben, jholewinski Differential Revision: http://reviews.llvm.org/D5633 llvm-svn: 219773
* [AArch64] Optimize CSINC-branch sequence (Gerolf Hoflehner, 2014-10-14, 1 file, -0/+12)
| | | | | | | | | | | | | | | | | | | | | Peephole optimization that generates a single conditional branch for csinc-branch sequences like in the examples below. This is possible when the csinc sets or clears a register based on a condition code and the branch checks that register. Also the condition code may not be modified between the csinc and the original branch. Examples: 1. Convert csinc w9, wzr, wzr, <CC>; tbnz w9, #0, 0x44 to b.<invCC> 2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44 to b.<CC> rdar://problem/18506500 llvm-svn: 219742
* Remove unused member variable. (Rafael Espindola, 2014-10-14, 2 files, -5/+3)
| | | | | | Fixes pr20904. llvm-svn: 219706
* DebugInfo: Ensure that all debug location scope chains from instructions within a function, lead to the function itself. (David Blaikie, 2014-10-14, 1 file, -2/+7)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let me tell you a tale... Originally committed in r211723 after discovering a nasty case of weird scoping due to inlining, this was reverted in r211724 after it fired in ASan/compiler-rt. (minor diversion where I accidentally committed/reverted again in r211871/r211873) After further testing and fixing bugs in ArgumentPromotion (r211872) and Inlining (r212065) it was recommitted in r212085. Reverted in r212089 after the sanitizer buildbots still showed problems. Fixed another bug in ArgumentPromotion (r212128) found by this assertion. Recommitted in r212205, reverted in r212226 after it crashed some more on sanitizer buildbots. Fix clang some more in r212761. Recommitted in r212776, reverted in r212793. ASan failures. Recommitted in r213391, reverted in r213432, trying to reproduce flakey ASan build failure. Fixed bugs in r213805 (ArgPromo + DebugInfo), r213952 (LiveDebugVariables strips dbg_value intrinsics in functions not described by debug info). Recommitted in r214761, reverted in r214999, flakey failure on Windows buildbot. Fixed DeadArgElimination + DebugInfo bug in r219210. Recommitted in r219215, reverted in r219512, failure on ObjC++ atomic properties in the test-suite on Darwin. Fixed ObjC++ atomic properties issue in Clang in r219690. [This commit is provided 'as is' with no hope that this is the last time I commit this change either expressed or implied] llvm-svn: 219702
* Revert "Fix stuff... again." (David Blaikie, 2014-10-14, 1 file, -7/+2)
| | | | | | | | Accidental commit. This reverts commit r219693. llvm-svn: 219695
* Revert some parts of r196288 that were confusing and untested. (David Blaikie, 2014-10-14, 1 file, -8/+2)
| | | | | | | If we figure out why they should be here, let's add some testing of some kind so we can better demonstrate why it's needed. llvm-svn: 219694
* Fix stuff... again. (David Blaikie, 2014-10-14, 1 file, -2/+7)
| | | | llvm-svn: 219693
* Remove unnecessary TargetMachine.h includes. (Eric Christopher, 2014-10-14, 25 files, -26/+1)
| | | | llvm-svn: 219672
* Grab the subtarget and subtarget dependent variables off of (Eric Christopher, 2014-10-14, 4 files, -21/+10)
| | | | | | MachineFunction rather than TargetMachine. llvm-svn: 219671
* Grab the subtarget and subtarget dependent variables off of (Eric Christopher, 2014-10-14, 2 files, -9/+6)
| | | | | | MachineFunction rather than TargetMachine. llvm-svn: 219670
* Instead of the TargetMachine cache the MachineFunction (Eric Christopher, 2014-10-14, 1 file, -14/+13)
| | | | | | | | and TargetRegisterInfo in the peephole optimizer. This makes it easier to grab subtarget dependent variables off of the MachineFunction rather than the TargetMachine. llvm-svn: 219669
* Access subtarget specific variables off of the MachineFunction's (Eric Christopher, 2014-10-14, 2 files, -6/+4)
| | | | | | cached subtarget and not the TargetMachine. llvm-svn: 219668
* Access the subtarget off of the MachineFunction via the DAG (Eric Christopher, 2014-10-14, 1 file, -9/+7)
| | | | | | | | scheduler or via the SelectionDAG if available. Otherwise grab the subtarget off of the MachineFunction by going up the parent chain. llvm-svn: 219666
* Remove the use and member variable of the TargetMachine from (Eric Christopher, 2014-10-14, 1 file, -6/+4)
| | | | | | MachineLICM as we can get the same data off of the MachineFunction. llvm-svn: 219663
* Have MachineInstrBundle use the MachineFunction for subtarget (Eric Christopher, 2014-10-14, 1 file, -5/+5)
| | | | | | access rather than the TargetMachine. llvm-svn: 219662
* Access the subtarget off of the MachineFunction rather than (Eric Christopher, 2014-10-14, 1 file, -4/+2)
| | | | | | through the TargetMachine. llvm-svn: 219661
* Remove the TargetMachine from DFAPacketizer since it was only (Eric Christopher, 2014-10-14, 1 file, -2/+2)
| | | | | | | being used to grab subtarget specific things that we can grab from the MachineFunction anyhow. llvm-svn: 219650