summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* [FastISel] Remove an performance debugging assert.Juergen Ributzka2014-08-151-1/+0
| | | | | | | | As Jim pointed out this assert isn't really needed to test for correctness, because the code right afterwards does the same check and falls-back to SelectionDAG - as intended. llvm-svn: 215735
* Delete dead code. NFC.Rafael Espindola2014-08-152-5/+0
| | | | llvm-svn: 215720
* Revert several FastISel commits to track down a buildbot error.Juergen Ributzka2014-08-141-21/+15
| | | | | | | | | | | | This reverts: r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants." r215594 "[FastISel][X86] Use XOR to materialize the "0" value." r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization." r215591 "[FastISel][AArch64] Make use of the zero register when possible." r215588 "[FastISel] Let the target decide first if it wants to materialize a constant." r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI." llvm-svn: 215673
* optimize vector fneg of bitcasted integer valueSanjay Patel2014-08-141-9/+14
| | | | | | | | | | This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops. This patch is very similar to a fabs patch committed at r214892. Differential Revision: http://reviews.llvm.org/D4852 llvm-svn: 215646
* [SDAG] Fix a bug in the DAG combiner where we would fail to return theChandler Carruth2014-08-141-5/+1
| | | | | | | | | | | | | | | | | | | | | | | input node after manually adding it to the worklist and using CombineTo. Once we use CombineTo the input node may have been deleted. Despite this being *completely confusing* and somewhat broken, the only way to "correctly" return from a DAG combine after potentially deleting the input node is to return *that exact node*.... But really, this code should just never have used CombineTo. It won't do what it wants (returning the node as mentioned above just causes the combine to infloop). The correct way to combine away a casted load to a load of the correct type is to RAUW the chain directly and then return the loaded value to replace the actual value node. I managed to find this with the vector shuffle fuzzer even though it clearly has nothing at all to do with vector shuffles and rather those happen to trigger a load of a constant pool that hits this combine *just right*. I've included the test as it is small and a nice stress test that the infrastructure isn't asserting. llvm-svn: 215622
* [SDAG] Fix a case where we would iteratively legalize a node duringChandler Carruth2014-08-141-0/+2
| | | | | | | | | | | | | | | | | | combining by replacing it with something else but not re-process the node afterward to remove it. In a truly remarkable stroke of bad luck, this would (in the test case attached) end up getting some other node combined into it without ever getting re-processed. By adding it back on to the worklist, in addition to deleting the dead nodes more quickly we also ensure that if it *stops* being dead for any reason it makes it back through the legalizer. Without this, the test case will end up failing during instruction selection due to an and node with a type we don't have an instruction pattern for. It took many million runs of the shuffle fuzz tester to find this. llvm-svn: 215611
* [FastISel] Let the target decide first if it wants to materialize a constant.Juergen Ributzka2014-08-131-15/+21
| | | | | | | | | | | | | | | | | | | | | | | | This changes the order in which FastISel tries to materialize a constant. Originally it would try to use a simple target-independent approach, which can lead to the generation of inefficient code. On X86 this would result in the use of movabsq to materialize any 64bit integer constant - even for simple and small values such as 0 and 1. Also some very funny floating-point materialization could be observed too. On AArch64 it would materialize the constant 0 in a register even the architecture has an actual "zero" register. On ARM it would generate unnecessary mov instructions or not use mvn. This change simply changes the order and always asks the target first if it likes to materialize the constant. This doesn't fix all the issues mentioned above, but it enables the targets to implement such optimizations. Related to <rdar://problem/17420988>. llvm-svn: 215588
* [MachineCombiner] Removal of dangling DBG_VALUES after combining [20598]Gerolf Hoflehner2014-08-131-1/+1
| | | | | | | | This is a cleaner solution to the problem described in r215431. When instructions are combined a dangling DBG_VALUE is removed. This resolves bug 20598. llvm-svn: 215587
* [Cleanup] Utility function to erase instruction and mark DBG_ValuesGerolf Hoflehner2014-08-132-12/+24
| | | | | | | | | | New function to erase a machine instruction and mark DBG_VALUE for removal. A DBG_VALUE is marked for removal when it references an operand defined in the instruction. Use the new function to cleanup code in dead machine instruction removal pass. llvm-svn: 215580
* [MachineDominatorTree] Provide a method to inform a MachineDominatorTree that aQuentin Colombet2014-08-132-25/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | critical edge has been split. The MachineDominatorTree will when lazy update the underlying dominance properties when require. ** Context ** This is a follow-up of r215410. Each time a critical edge is split this invalidates the dominator tree information. Thus, subsequent queries of that interface will be slow until the underlying information is actually recomputed (costly). ** Problem ** Prior to this patch, splitting a critical edge needed to query the dominator tree to update the dominator information. Therefore, splitting a bunch of critical edges will likely produce poor performance as each query to the dominator tree will use the slow query path. This happens a lot in passes like MachineSink and PHIElimination. ** Proposed Solution ** Splitting a critical edge is a local modification of the CFG. Moreover, as soon as a critical edge is split, it is not critical anymore and thus cannot be a candidate for critical edge splitting anymore. In other words, the predecessor and successor of a basic block inserted on a critical edge cannot be inserted by critical edge splitting. Using these observations, we can pile up the splitting of critical edge and apply then at once before updating the DT information. The core of this patch moves the update of the MachineDominatorTree information from MachineBasicBlock::SplitCriticalEdge to a lazy MachineDominatorTree. ** Performance ** Thanks to this patch, the motivating example compiles in 4- minutes instead of 6+ minutes. No test case added as the motivating example as nothing special but being huge! The binaries are strictly identical for all the llvm test-suite + SPECs with and without this patch for both Os and O3. Regarding compile time, I observed only noise, although on average I saw a small improvement. <rdar://problem/17894619> llvm-svn: 215576
* Canonicalize header guards into a common format.Benjamin Kramer2014-08-1336-74/+74
| | | | | | | | | | Add header guards to files that were missing guards. Remove #endif comments as they don't seem common in LLVM (we can easily add them back if we decide they're useful) Changes made by clang-tidy with minor tweaks. llvm-svn: 215558
* [DAGCombiner] Improved target independent vector shuffle combine rule.Andrea Di Biagio2014-08-131-10/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch improves the existing algorithm in DAGCombiner that attempts to fold shuffles according to rule: shuffle(shuffle(x, y, M1), undef, M2) -> shuffle(y, undef, M3) Before this change, there were cases where the DAGCombiner conservatively avoided folding shuffles even if the resulting mask would have been legal. That is because the algorithm wrongly assumed that commuting an illegal shuffle mask would always produce an illegal mask. With this change, we now correctly compute the commuted shuffle mask before calling method 'isShuffleMaskLegal' on it. On X86, this improves for example the codegen for the following function: define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) { %1 = shufflevector <4 x i32> %B, <4 x i32> %A, <4 x i32> <i32 1, i32 2, i32 6, i32 7> %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 2, i32 3> ret <4 x i32> %2 } Before this change the X86 backend (-mcpu=corei7) generated the following assembly code for function @test: shufps $-23, %xmm0, %xmm1 # xmm1 = xmm1[1,2],xmm0[2,3] movhlps %xmm1, %xmm1 # xmm1 = xmm1[1,1] movaps %xmm1, %xmm0 Now we produce: movhlps %xmm0, %xmm0 # xmm0 = xmm0[1,1] Added extra test cases in combine-vec-shuffle-2.ll to verify that we correctly fold according to the above-mentioned rule. llvm-svn: 215555
* [PowerPC] Implement PPCTargetLowering::getTgtMemIntrinsicHal Finkel2014-08-132-3/+5
| | | | | | | | | | | | | | | | | This implements PPCTargetLowering::getTgtMemIntrinsic for Altivec load/store intrinsics. As with the construction of the MachineMemOperands for the intrinsic calls used for unaligned load/store lowering, the only slight complication is that we need to represent a larger memory range than the loaded/stored value-type size (because the address is rounded down to an aligned address, and we need to conservatively represent the entire possible range of the actual access). This required adding an extra size field to TargetLowering::IntrinsicInfo, and this was done in a way that required no modifications to other targets (the size defaults to the store size of the provided memory data type). This fixes test/CodeGen/PowerPC/unal-altivec-wint.ll (so it can be un-XFAILed). llvm-svn: 215512
* Remove a condition that can never be true, as wittnessed by the assertAdrian Prantl2014-08-121-3/+2
| | | | | | above. llvm-svn: 215477
* Fix a parentheses warning introduced in r215394.Quentin Colombet2014-08-121-2/+2
| | | | llvm-svn: 215459
* Have MachineRegisterInfo take and store the MachineFunction itEric Christopher2014-08-122-3/+3
| | | | | | | | was created for rather than the TargetMachine since we only needed the TM for the subtarget and we can get that from the MF. llvm-svn: 215432
* DebugLocEntry: Restore the comparison predicate from before theAdrian Prantl2014-08-121-1/+4
| | | | | | | | | refactoring in 215384. This way it can unique multiple entries describing the same piece even if they don't have the exact same location. (The same piece may get merged in and be added from OpenRanges). There ought to be a more elegant solution for this, though. llvm-svn: 215418
* Revert "Partially revert r214761 that asserted that all concrete debug info ↵David Blaikie2014-08-121-3/+1
| | | | | | | | | | | variables had DIEs, due to a failure on Darwin." I believe this was addressed by r215157 and r215227, so let's have another go at the bots, etc. This reverts commit r214880. llvm-svn: 215412
* [MachineSink] Improve the compile time by preserving the dominance informationQuentin Colombet2014-08-111-39/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | as long as possible. ** Context ** Each time the dominance information is modified, the dominator tree analysis switches in a slow query mode. After a few queries without any modification on the dominator tree, it performs an expensive update of its internal structure to provide fast queries again. ** Problem ** Prior to this patch, the MachineSink pass was splitting the critical edges on demand while relying heavy on the dominator tree information. In some cases, this leads to pathological behavior where: - We end up in the slow query mode right after splitting an edge. - We update the dominance information. - We break the dominance information again, thus ending up in the slow query mode and so on. ** Proposed Solution ** To mitigate this effect, this patch postpones all the splitting of the edges at the end of each iteration of the main loop. The benefits are: - The dominance information is valid for the life time of an iteration. - This simplifies the code as we do not have to special treat instructions that are sunk on critical edges. Indeed, the related block will be available through the next iteration. The downside is that when edges splitting is required, this incurs an additional iteration of the main loop compared to the previous scheme. ** Performance ** Thanks to this patch, the motivating example compiles in 6+ minutes instead of 10+ minutes. No test case added as the motivating example as nothing special but being huge! I have measured only noise for both the compile time and the runtime on the llvm test-suite + SPECs with Os and O3. Note: The current implementation of MachineBasicBlock::SplitCriticalEdge also uses the dominance information and therefore, hits this problem. A subsequent patch will address that. <rdar://problem/17894619> llvm-svn: 215410
* [x86] Fold extract_vector_elt of a load into the Load's address computation.Michael J. Spencer2014-08-111-90/+125
| | | | llvm-svn: 215409
* Add a couple of convenience accessors to DebugLocEntry::Value to furtherAdrian Prantl2014-08-112-13/+12
| | | | | | simplify common usage patterns. llvm-svn: 215407
* Make these DebugLocEntry::Value comparison operators friend functionsAdrian Prantl2014-08-111-24/+34
| | | | | | as suggested by dblaikie in a comment on r215384. llvm-svn: 215403
* Add isRegSequence property.Quentin Colombet2014-08-111-0/+25
| | | | | | | | | | | This patch adds a new property: isRegSequence and the related target hooks: TargetIntrInfo::getRegSequenceInputs and TargetInstrInfo::getRegSequenceLikeInputs to specify that a target specific instruction is a (kind of) REG_SEQUENCE. <rdar://problem/12702965> llvm-svn: 215394
* Debug info: Remove an obsolete constructor from DebugLocEntry.Adrian Prantl2014-08-111-2/+2
| | | | llvm-svn: 215387
* Debug info: Modify DebugLocEntry::addValue to take multiple values so itAdrian Prantl2014-08-112-8/+9
| | | | | | only has to sort/unique values once per batch. llvm-svn: 215386
* Debug info: Further simplify the implementation of buildLocationList byAdrian Prantl2014-08-111-6/+6
| | | | | | getting rid of the redundant DIVariable in the OpenRanges pair. llvm-svn: 215385
* Debug Info: Move the sorting and uniqueing of pieces from emitLocPieces()Adrian Prantl2014-08-112-18/+22
| | | | | | | into buildLocationList(). By keeping the list of Values sorted, DebugLocEntry::Merge can also merge multi-piece entries. llvm-svn: 215384
* Debug info: Refactor DebugLocEntry's Merge function to makeAdrian Prantl2014-08-112-14/+37
| | | | | | | | | | | | buildLocationLists easier to read. The previous implementation conflated the merging of individual pieces and the merging of entire DebugLocEntries. By splitting this functionality into two separate functions the intention of the code should be clearer. llvm-svn: 215383
* PeepholeOptimizer: make parameter ref to SmallPtrSetImplHans Wennborg2014-08-111-2/+2
| | | | | | | This makes the function type independent of the in-line size of LocalMIs. llvm-svn: 215356
* Make this SmallVector size a power of two as suggested by ChandlerHans Wennborg2014-08-111-1/+1
| | | | llvm-svn: 215355
* In Machine CSE pass, the source register of a COPY machine instruction canJiangning Liu2014-08-111-11/+19
| | | | | | | | be propagated to all its users, and this propagation could increase the probability of finding common subexpressions. If the COPY has only one user, the COPY itself can be removed. llvm-svn: 215344
* Re-commit "Increase the size of this SmallVector in PeepholeOptimizer." ↵Hans Wennborg2014-08-111-3/+3
| | | | | | | | | (r215340) This time, also update the function that receives a reference to the SmallPtrSet as a parameter. llvm-svn: 215342
* Revert "Increase the size of this SmallVector in PeepholeOptimizer." (r215340)Hans Wennborg2014-08-111-1/+1
| | | | | | | | | | | | | That broke the build: /data/buildslave/clang-amd64-freebsd/src-llvm/lib/CodeGen/PeepholeOptimizer.cpp:729:46: error: non-const lvalue reference to type 'SmallPtrSet<[...], 8>' cannot bind to a value of unrelated type 'SmallPtrSet<[...], 16>' Changed |= optimizeExtInstr(MI, MBB, LocalMIs); ^~~~~~~~ /data/buildslave/clang-amd64-freebsd/src-llvm/lib/CodeGen/PeepholeOptimizer.cpp:265:49: note: passing argument to parameter 'LocalMIs' here SmallPtrSet<MachineInstr*, 8> &LocalMIs) { ^ llvm-svn: 215341
* Increase the size of this SmallVector in PeepholeOptimizer.Hans Wennborg2014-08-111-1/+1
| | | | | | During a Clang build, the median size of this was 9 llvm-svn: 215340
* Increase the size of SpillPlacement::BlockFrequencies.Hans Wennborg2014-08-111-2/+2
| | | | | | This SmallVector's median size during a Clang build was 7. llvm-svn: 215338
* Increase the size of this SmallVector in CloneNodeWithValues.Hans Wennborg2014-08-111-1/+1
| | | | | | In a Clang bootstrap, the size of this vector was always 6. llvm-svn: 215335
* Increase the size of DwarfAccelTable::TableHeaderData::Atoms.Hans Wennborg2014-08-111-1/+1
| | | | | | During a Clang bootstrap, it seems this SmallVector always contains 3 elements. llvm-svn: 215334
* Add support for scalarizing cttz_zero_undefPetar Jovanovic2014-08-101-0/+1
| | | | | | | | | Follow up to r214266. Add missing case in ScalarizeVectorResult() for cttz_zero_undef. Differential Revision: http://reviews.llvm.org/D4813 llvm-svn: 215330
* CodeGen: switch to a range based for loopSaleem Abdulrasool2014-08-091-3/+4
| | | | | | Use a range based for loop instead of manual iteration. NFC. llvm-svn: 215287
* DebugInfo: Recommit (reverted in r215217, originally committed in r215157) ↵David Blaikie2014-08-081-0/+1
| | | | | | | | | the assertion that no argument variable is overwritten by subsequent argument variables. This turned up a bug in clang where arguments were emitted with duplicate argument numbers (see r215227). llvm-svn: 215228
* Added a TLI hook to signal that the target does not have or does not care aboutPedro Artigas2014-08-082-5/+11
| | | | | | | | floating point exceptions, added use of flag to fold potentially exception raising floating point math in selection DAG. No functionality change, as targets have to explicitly ask for this behavior and none does today. llvm-svn: 215222
* DebugInfo: Remove assertion (added in r215157) that's firing on a blocksDavid Blaikie2014-08-081-1/+0
| | | | | | test in the test-suite while I investigate further. llvm-svn: 215217
* Add missing Interpreter intrinsic lowering for sin, cos and ceilJosh Klontz2014-08-081-0/+12
| | | | llvm-svn: 215209
* [pr19635] Revert most of r170537, and add new testcase.Patrik Hagglund2014-08-081-1/+1
| | | | | | | | Patch provided by Andrey Kuharev. Sorry, r170537 was obviously wrong. llvm-svn: 215190
* [stack protector] Look through bitcasts to get global variableAkira Hatanaka2014-08-071-9/+19
| | | | | | | | | | | | | | | | __stack_chk_guard. Handle the case where the pointer operand of the load instruction that loads the stack guard is not a global variable but instead a bitcast. %StackGuard = load i8** bitcast (i64** @__stack_chk_guard to i8**) call void @llvm.stackprotector(i8* %StackGuard, i8** %StackGuardSlot) Original test case provided by Ana Pazos. This fixes PR20558. llvm-svn: 215167
* DebugInfo: Fix overwriting/loss of inlined arguments to recursively inlined ↵David Blaikie2014-08-071-4/+2
| | | | | | | | | | | | | | | | | | | | | functions. Due to an unnecessary special case, inlined arguments that happened to be from the same function as they were inlined into were misclassified as non-inline arguments and would overwrite the non-inlined arguments. Assert that we never overwrite a function's arguments, and stop misclassifying inlined arguments as non-inline arguments to fix this issue. Excuse the rather crappy test case - handcrafted IR might do better, or someone who understands better how to tickle the inliner to create a recursive inlining situation like this (though it may also be necessary to tickle the variable in a particular way to cause it to be recorded in the MMI side table and go down this particular path for location information). llvm-svn: 215157
* Temporarily Revert "Nuke the old JIT." as it's not quite ready toEric Christopher2014-08-076-4/+42
| | | | | | | | | | | be deleted. This will be reapplied as soon as possible and before the 3.6 branch date at any rate. Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reverts commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 215154
* Debugging Utility - optional ability for dumping critical path lengthGerolf Hoflehner2014-08-071-2/+16
| | | | llvm-svn: 215153
* MachineCombiner Pass for selecting faster instruction sequence on AArch64Gerolf Hoflehner2014-08-071-1/+3
| | | | | | | | | | Re-commit of r214832,r21469 with a work-around that avoids the previous problem with gcc build compilers The work-around is to use SmallVector instead of ArrayRef of basic blocks in preservesResourceLen()/MachineCombiner.cpp llvm-svn: 215151
* test commit: remove trailing whitespace.Frederic Riss2014-08-071-2/+2
| | | | llvm-svn: 215138
OpenPOWER on IntegriCloud