summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [AArch64, fast-isel] Fall back to SelectionDAG to select tail calls.Akira Hatanaka2014-08-131-0/+5
| | | | | | | | | | Certain functions such as objc_autoreleaseReturnValue have to be called as tail-calls even at -O0. Since normal fast-isel doesn't emit calls as tail calls, we have to fall back to SelectionDAG to select calls that are marked as tail. <rdar://problem/17991614> llvm-svn: 215600
* [FastISel][AArch64] Add support for more addressing modes.Juergen Ributzka2014-08-131-168/+289
| | | | | | | | | | | | | | | | | FastISel didn't take much advantage of the different addressing modes available to it on AArch64. This commit allows the ComputeAddress method to recognize more addressing modes that allows shifts and sign-/zero-extensions to be folded into the memory operation itself. For Example: lsl x1, x1, #3 --> ldr x0, [x0, x1, lsl #3] ldr x0, [x0, x1] sxtw x1, w1 lsl x1, x1, #3 --> ldr x0, [x0, x1, sxtw #3] ldr x0, [x0, x1] llvm-svn: 215597
* [FastISel][X86] Add large code model support for materializing ↵Juergen Ributzka2014-08-131-1/+17
| | | | | | | | | | | | | | floating-point constants. In the large code model for X86 floating-point constants are placed in the constant pool and materialized by loading from it. Since the constant pool could be far away, a PC relative load might not work. Therefore we first materialize the address of the constant pool with a movabsq and then load from there the floating-point value. Fixes <rdar://problem/17674628>. llvm-svn: 215595
* [FastISel][X86] Use XOR to materialize the "0" value.Juergen Ributzka2014-08-131-0/+23
| | | | llvm-svn: 215594
* [FastISel][X86] Emit more efficient instructions for integer constant ↵Juergen Ributzka2014-08-131-1/+28
| | | | | | | | | | | | | materialization. This mostly affects the i64 value type, which always resulted in an 15byte mobavsq instruction to materialize any constant. The custom code checks the value of the immediate and tries to use a different and smaller mov instruction when possible. This fixes <rdar://problem/17420988>. llvm-svn: 215593
* [FastISel][AArch64] Make use of the zero register when possible.Juergen Ributzka2014-08-131-1/+13
| | | | | | | | | | This change materializes now the value "0" from the zero register. The zero register can be folded by several instruction, so no materialization is need at all. Fixes <rdar://problem/17924413>. llvm-svn: 215591
* [FastISel] Let the target decide first if it wants to materialize a constant.Juergen Ributzka2014-08-131-15/+21
| | | | | | | | | | | | | | | | | | | | | | | | This changes the order in which FastISel tries to materialize a constant. Originally it would try to use a simple target-independent approach, which can lead to the generation of inefficient code. On X86 this would result in the use of movabsq to materialize any 64bit integer constant - even for simple and small values such as 0 and 1. Also some very funny floating-point materialization could be observed too. On AArch64 it would materialize the constant 0 in a register even the architecture has an actual "zero" register. On ARM it would generate unnecessary mov instructions or not use mvn. This change simply changes the order and always asks the target first if it likes to materialize the constant. This doesn't fix all the issues mentioned above, but it enables the targets to implement such optimizations. Related to <rdar://problem/17420988>. llvm-svn: 215588
* [MachineCombiner] Removal of dangling DBG_VALUES after combining [20598]Gerolf Hoflehner2014-08-132-3/+2
| | | | | | | | This is a cleaner solution to the problem described in r215431. When instructions are combined a dangling DBG_VALUE is removed. This resolves bug 20598. llvm-svn: 215587
* [FastISel][X86] Refactor constant materialization. NFCI.Juergen Ributzka2014-08-131-54/+67
| | | | | | | Split the constant materialization code into three separate helper functions for Integer-, Floating-Point-, and GlobalValue-Constants. llvm-svn: 215586
* [FastISel][ARM] Use MOVT/MOVW if the subtarget requests it.Juergen Ributzka2014-08-131-0/+3
| | | | | | | | This change is also in preparation for a future change to make sure that the constant materialization uses MOVT/MOVW when available and not a load from the constant pool. llvm-svn: 215584
* [FastISel][ARM] Fix a bug in the integer materialization code.Juergen Ributzka2014-08-131-1/+3
| | | | | | | | | | | | getRegClassFor returns the incorrect register class when in Thumb2 mode. This fix simply manually selects the register class as in the code just a few lines above. There is no test case for this code, because the code is currently unreachable. This will be changed in a future commit and existing test cases will exercise this code. llvm-svn: 215583
* [FastISel][AArch64] Cleanup constant materialization code. NFCI.Juergen Ributzka2014-08-131-26/+30
| | | | | | Cleanup and prepare constant materialization code for future commits. llvm-svn: 215582
* [Cleanup] Utility function to erase instruction and mark DBG_ValuesGerolf Hoflehner2014-08-132-12/+24
| | | | | | | | | | New function to erase a machine instruction and mark DBG_VALUE for removal. A DBG_VALUE is marked for removal when it references an operand defined in the instruction. Use the new function to cleanup code in dead machine instruction removal pass. llvm-svn: 215580
* [MachineDominatorTree] Provide a method to inform a MachineDominatorTree that aQuentin Colombet2014-08-132-25/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | critical edge has been split. The MachineDominatorTree will when lazy update the underlying dominance properties when require. ** Context ** This is a follow-up of r215410. Each time a critical edge is split this invalidates the dominator tree information. Thus, subsequent queries of that interface will be slow until the underlying information is actually recomputed (costly). ** Problem ** Prior to this patch, splitting a critical edge needed to query the dominator tree to update the dominator information. Therefore, splitting a bunch of critical edges will likely produce poor performance as each query to the dominator tree will use the slow query path. This happens a lot in passes like MachineSink and PHIElimination. ** Proposed Solution ** Splitting a critical edge is a local modification of the CFG. Moreover, as soon as a critical edge is split, it is not critical anymore and thus cannot be a candidate for critical edge splitting anymore. In other words, the predecessor and successor of a basic block inserted on a critical edge cannot be inserted by critical edge splitting. Using these observations, we can pile up the splitting of critical edge and apply then at once before updating the DT information. The core of this patch moves the update of the MachineDominatorTree information from MachineBasicBlock::SplitCriticalEdge to a lazy MachineDominatorTree. ** Performance ** Thanks to this patch, the motivating example compiles in 4- minutes instead of 6+ minutes. No test case added as the motivating example as nothing special but being huge! The binaries are strictly identical for all the llvm test-suite + SPECs with and without this patch for both Os and O3. Regarding compile time, I observed only noise, although on average I saw a small improvement. <rdar://problem/17894619> llvm-svn: 215576
* utils: Fix segfault in flattencfgJan Vesely2014-08-131-4/+5
| | | | | | | | | | | | | | v2: continue iterating through the rest of the bb use for loop v3: initialize FlattenCFG pass in ScalarOps add test v4: split off initializing flattencfg to a separate patch add comment Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 215574
* Initialize FlattenCFG passJan Vesely2014-08-131-0/+1
| | | | | Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 215573
* R600: Correctly set the src value offset for scalarized kernel argsMatt Arsenault2014-08-131-11/+29
| | | | | | | | | | This for some reason fixes v1i64 kernel arguments on pre-SI. This currently breaks some other cases in the kernel-args.ll test for R600, but I'm not particularly confident in the new output. VTX_READ_* are not used for some of the scalarized cases, and the code reading from the constant buffer doesn't make much sense to me. llvm-svn: 215564
* Canonicalize header guards into a common format.Benjamin Kramer2014-08-13373-819/+841
| | | | | | | | | | Add header guards to files that were missing guards. Remove #endif comments as they don't seem common in LLVM (we can easily add them back if we decide they're useful) Changes made by clang-tidy with minor tweaks. llvm-svn: 215558
* [DAGCombiner] Improved target independent vector shuffle combine rule.Andrea Di Biagio2014-08-131-10/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch improves the existing algorithm in DAGCombiner that attempts to fold shuffles according to rule: shuffle(shuffle(x, y, M1), undef, M2) -> shuffle(y, undef, M3) Before this change, there were cases where the DAGCombiner conservatively avoided folding shuffles even if the resulting mask would have been legal. That is because the algorithm wrongly assumed that commuting an illegal shuffle mask would always produce an illegal mask. With this change, we now correctly compute the commuted shuffle mask before calling method 'isShuffleMaskLegal' on it. On X86, this improves for example the codegen for the following function: define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) { %1 = shufflevector <4 x i32> %B, <4 x i32> %A, <4 x i32> <i32 1, i32 2, i32 6, i32 7> %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 2, i32 3> ret <4 x i32> %2 } Before this change the X86 backend (-mcpu=corei7) generated the following assembly code for function @test: shufps $-23, %xmm0, %xmm1 # xmm1 = xmm1[1,2],xmm0[2,3] movhlps %xmm1, %xmm1 # xmm1 = xmm1[1,1] movaps %xmm1, %xmm0 Now we produce: movhlps %xmm0, %xmm0 # xmm0 = xmm0[1,1] Added extra test cases in combine-vec-shuffle-2.ll to verify that we correctly fold according to the above-mentioned rule. llvm-svn: 215555
* [mips] Refactor calls to setCanHaveModuleDir.Toma Tabacu2014-08-132-133/+55
| | | | | | | | | | | | | | | | | Summary: Moved some calls to setCanHaveModuleDir to the MipsTargetStreamer base class and removed the resulting empty functions from the MipsTargetELFStreamer class. Also fixed a missing call to setCanHaveModuleDir in MipsTargetELFStreamer::emitDirectiveSetMicroMips. Reviewers: dsanders Reviewed By: dsanders Subscribers: tomatabacu Differential Revision: http://reviews.llvm.org/D4781 llvm-svn: 215542
* [optnone] Make the optnone attribute effective at suppressing functionChandler Carruth2014-08-131-7/+13
| | | | | | | | | | | | | attribute and function argument attribute synthesizing and propagating. As with the other uses of this attribute, the goal remains a best-effort (no guarantees) attempt to not optimize the function or assume things about the function when optimizing. This is particularly useful for compiler testing, bisecting miscompiles, triaging things, etc. I was hitting specific issues using optnone to isolate test code from a test driver for my fuzz testing, and this is one step of fixing that. llvm-svn: 215538
* Silence a -Wparenthesis warning with these asserts. NFC.Aaron Ballman2014-08-131-2/+2
| | | | llvm-svn: 215537
* [SKX] Extended non-temporal load/store instructions for AVX512VL subsets.Robert Khasanov2014-08-133-34/+67
| | | | | | | | | Added avx512_movnt_vl multiclass for handling 256/128-bit forms of instruction. Added encoding and lowering tests. Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 215536
* Re-commit: [mips] Implement .ent, .end, .frame, .mask and .fmask.Daniel Sanders2014-08-133-22/+263
| | | | | | | | | | | Patch by Matheus Almeida and Toma Tabacu The lld test failure on the previous attempt to commit was caused by the addition of the .pdr section causing the offsets it was checking to change. This has been fixed by removing the .ent/.end directives from that test since they weren't really needed. llvm-svn: 215535
* Revert r215415 which causse MSan to crash on a great deal of C++ code.Chandler Carruth2014-08-131-10/+0
| | | | | | I've followed up on the original commit as well. llvm-svn: 215532
* AVX-512: Fixed a bug in shufflevector lowering.Elena Demikhovsky2014-08-131-1/+2
| | | | | | | PALIGNR instruction does not exist in AVX-512F set. Added a test. llvm-svn: 215526
* InstCombine: Combine (xor (or %a, %b) (xor %a, %b)) to (add %a, %b)Karthik Bhat2014-08-131-0/+12
| | | | | | | | | | | | | Correctness proof of the transform using CVC3- $ cat t.cvc A, B : BITVECTOR(32); QUERY BVXOR(A | B, BVXOR(A,B) ) = A & B; $ cvc3 t.cvc Valid. llvm-svn: 215524
* [NVPTX] Remove MemIntrinsicSDNode/MemSDNode duplicate checkingHal Finkel2014-08-131-7/+0
| | | | | | | | | | | As of r214452, isa<MemSDNode> will return true for nodes for which isa<MemIntrinsicSDNode> will return true (classof now respects the actual class hierarchy). So we no longer need to check for both MemIntrinsicSDNode and MemSDNode separately. No functionality change intended. llvm-svn: 215523
* [x86] Rewrite a core part of the new vector shuffle lowering to handleChandler Carruth2014-08-131-22/+122
| | | | | | | | | | | | | | | | | | | | one pesky test case correctly. This test case caused the old code to infloop occilating between solving the low-half and the high-half. The 'side balancing' part of single-input v8 shuffle lowering didn't handle the one pattern which can cause it to occilate. Fortunately the fuzz testing found this case. Unfortuately it was *terrible* to handle. I'm really sorry for the amount and density of the code here, I'd love suggestions on how to simplify it. I feel like there *must* be a simpler form here, but after a lot of days I've not found it. This is the only one I've found that even works. I've added the one pesky test case along with some nice comments explaining the core problem that we have to solve here. So far this has survived approximately 32k test cases. More strenuous fuzzing commencing. llvm-svn: 215519
* [PowerPC] Implement PPCTargetLowering::getTgtMemIntrinsicHal Finkel2014-08-134-3/+85
| | | | | | | | | | | | | | | | | This implements PPCTargetLowering::getTgtMemIntrinsic for Altivec load/store intrinsics. As with the construction of the MachineMemOperands for the intrinsic calls used for unaligned load/store lowering, the only slight complication is that we need to represent a larger memory range than the loaded/stored value-type size (because the address is rounded down to an aligned address, and we need to conservatively represent the entire possible range of the actual access). This required adding an extra size field to TargetLowering::IntrinsicInfo, and this was done in a way that required no modifications to other targets (the size defaults to the store size of the provided memory data type). This fixes test/CodeGen/PowerPC/unal-altivec-wint.ll (so it can be un-XFAILed). llvm-svn: 215512
* Remove a condition that can never be true, as wittnessed by the assertAdrian Prantl2014-08-121-3/+2
| | | | | | above. llvm-svn: 215477
* [AVX512] Handle valign masking intrinsic via C++ loweringAdam Nemet2014-08-122-16/+16
| | | | | | | | | | | | | | I think that this will scale better in most cases than adding a Pat<> for each mapping from the intrinsic DAG to the intruction (i.e. rri, rrik, rrikz). We can just lower to the SDNode and have the resulting DAG be matches by the DAG patterns. Alternatively (long term), we could keep the Pat<>s but generate them via the new AVX512_masking multiclass. The difficulty is that in order to formulate that we would have to concatenate DAGs. Currently this is only supported if the operators of the input DAGs are identical. llvm-svn: 215473
* Allwo bitcast + struct GEP transform to work with addrspacecastMatt Arsenault2014-08-121-3/+20
| | | | llvm-svn: 215467
* R600: Use optimized 24bit path in udivremJan Vesely2014-08-122-18/+39
| | | | | | | | | v2: drop enum keyword use correct extension mode don't bother computing the sign in unsinged case Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 215462
* R600: Remove unused code.Jan Vesely2014-08-122-174/+0
| | | | | Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 215461
* R600: Use i24 optimized path for SREMJan Vesely2014-08-122-8/+28
| | | | | | | | | v2: add tests rename LowerSDIV24 to LowerSDIVREM24 handle the rem part in this function Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 215460
* Fix a parentheses warning introduced in r215394.Quentin Colombet2014-08-121-2/+2
| | | | llvm-svn: 215459
* Don't upgrade global constructors when reading bitcodeDuncan P. N. Exon Smith2014-08-122-55/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An optional third field was added to `llvm.global_ctors` (and `llvm.global_dtors`) in r209015. Most of the code has been changed to deal with both versions of the variables. Users of the C API might create either version, the helper functions in LLVM create the two-field version, and clang now creates the three-field version. However, the BitcodeReader was changed to always upgrade to the three-field version. This created an unnecessary inconsistency in the IR before/after serializing to bitcode. This commit resolves the inconsistency by making the third field truly optional (and not upgrading in the bitcode reader). Since `llvm-link` was relying on this upgrade code, rather than deleting it I've moved it into `ModuleLinker`, where it upgrades these arrays as necessary to resolve inconsistencies between modules. The ideal resolution would be to remove the 2-field version and make the third field required. I filed PR20506 to track that. I changed `test/Bitcode/upgrade-global-ctors.ll` to a negative test and duplicated the `llvm-link` check in `test/Linker/global_ctors.ll` to check both upgrade directions. Since I came across this as part of PR5680 (serializing use-list order), I've also added the missing `verify-uselistorder` RUN line to `test/Bitcode/metadata-2.ll`. llvm-svn: 215457
* fixed typosSanjay Patel2014-08-121-5/+5
| | | | llvm-svn: 215451
* Reverted my "Testing commit access" commit.Toma Tabacu2014-08-121-0/+1
| | | | llvm-svn: 215441
* Testing commit access.Toma Tabacu2014-08-121-1/+0
| | | | llvm-svn: 215440
* Have MachineRegisterInfo take and store the MachineFunction itEric Christopher2014-08-122-3/+3
| | | | | | | | was created for rather than the TargetMachine since we only needed the TM for the subtarget and we can get that from the MF. llvm-svn: 215432
* [MachineCombiner] Fix for ICE bug 20598Gerolf Hoflehner2014-08-121-1/+2
| | | | | | | | | | | | | | | | | | The combiner ignored DBG nodes when checking the uses of a virtual register. It combined a sequence like %vreg1 = madd %vreg2, %vreg3,... DBG_VALUE (%vreg1 ...) %vreg4 = add %vreg1,... to %vreg4 = madd %vreg2, %vreg3 leaving behind a dangling DBG_VALUE with a definition. This triggered an assertion in the MachineTraceMetrics.cpp module. llvm-svn: 215431
* IR: Print a newline when dumping TypesJustin Bogner2014-08-123-4/+4
| | | | | | | | | | | | Type::dump() doesn't print a newline, which makes for a poor experience in a debugger. This looks like it was an ommission considering Value::dump() two lines above, so I've changed Type to add a newline as well. Of the two in-tree callers, one added a newline anyway, and I've updated the other one to use Type::print instead. llvm-svn: 215421
* [LLVM-C] Expose User::getOperandUse as LLVMGetOperandUse.Peter Zotov2014-08-121-0/+5
| | | | | | Patch by Gabriel Radanne <drupyog@zoho.com> llvm-svn: 215419
* DebugLocEntry: Restore the comparison predicate from before theAdrian Prantl2014-08-121-1/+4
| | | | | | | | | refactoring in 215384. This way it can unique multiple entries describing the same piece even if they don't have the exact same location. (The same piece may get merged in and be added from OpenRanges). There ought to be a more elegant solution for this, though. llvm-svn: 215418
* msan: Handle musttail callsReid Kleckner2014-08-121-0/+10
| | | | | | | | | | | | | | | | First, avoid calling setTailCall(false) on musttail calls. The funciton prototypes should be "congruent", so the shadow layout should be exactly the same. Second, avoid inserting instrumentation after a musttail call to propagate the return value shadow. We don't need to propagate the result of a tail call, it should already be in the right place. Reviewed By: eugenis Differential Revision: http://reviews.llvm.org/D4331 llvm-svn: 215415
* Move helper for getting a terminating musttail call to BasicBlockReid Kleckner2014-08-122-30/+36
| | | | | | | | | | | No functional change. To be used in future commits that need to look for such instructions. Reviewed By: rafael Differential Revision: http://reviews.llvm.org/D4504 llvm-svn: 215413
* Revert "Partially revert r214761 that asserted that all concrete debug info ↵David Blaikie2014-08-121-3/+1
| | | | | | | | | | | variables had DIEs, due to a failure on Darwin." I believe this was addressed by r215157 and r215227, so let's have another go at the bots, etc. This reverts commit r214880. llvm-svn: 215412
* [MachineSink] Improve the compile time by preserving the dominance informationQuentin Colombet2014-08-111-39/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | as long as possible. ** Context ** Each time the dominance information is modified, the dominator tree analysis switches in a slow query mode. After a few queries without any modification on the dominator tree, it performs an expensive update of its internal structure to provide fast queries again. ** Problem ** Prior to this patch, the MachineSink pass was splitting the critical edges on demand while relying heavy on the dominator tree information. In some cases, this leads to pathological behavior where: - We end up in the slow query mode right after splitting an edge. - We update the dominance information. - We break the dominance information again, thus ending up in the slow query mode and so on. ** Proposed Solution ** To mitigate this effect, this patch postpones all the splitting of the edges at the end of each iteration of the main loop. The benefits are: - The dominance information is valid for the life time of an iteration. - This simplifies the code as we do not have to special treat instructions that are sunk on critical edges. Indeed, the related block will be available through the next iteration. The downside is that when edges splitting is required, this incurs an additional iteration of the main loop compared to the previous scheme. ** Performance ** Thanks to this patch, the motivating example compiles in 6+ minutes instead of 10+ minutes. No test case added as the motivating example as nothing special but being huge! I have measured only noise for both the compile time and the runtime on the llvm test-suite + SPECs with Os and O3. Note: The current implementation of MachineBasicBlock::SplitCriticalEdge also uses the dominance information and therefore, hits this problem. A subsequent patch will address that. <rdar://problem/17894619> llvm-svn: 215410
OpenPOWER on IntegriCloud