path: root/llvm/lib
Commit message | Author | Age | Files | Lines
...
* [X86] Improve comments for r214888 (Adam Nemet, 2014-08-05; 1 file, -8/+14)
  A rebase somehow ate my comments. This restores them. llvm-svn: 214903
* R600/SI: Use register class instead of list of registers (Matt Arsenault, 2014-08-05; 1 file, -1/+1)
  I'm not sure if this has any consequence or not. llvm-svn: 214902
* R600/SI: Add exec_lo and exec_hi subregisters. (Matt Arsenault, 2014-08-05; 1 file, -2/+10)
  This allows accessing an SReg subregister with a normal subregister index, instead of getting a machine verifier error. Also be sure to include all of these subregisters in SReg_32. This fixes inferring SGPR instead of SReg when finding a super register class. llvm-svn: 214901
* BitcodeReader: Fix non-determinism in use-list order (Duncan P. N. Exon Smith, 2014-08-05; 2 files, -3/+15)
  `BasicBlockFwdRefs` (and `BlockAddrFwdRefs` before it) was being emptied in a non-deterministic order. When predicting use-list order I've worked around this another way, but even when parsing lazily (and we can't recreate use-list order) use-lists should be deterministic. Make them so by using a side-queue of functions with forward-referenced blocks that gets visited in order. llvm-svn: 214899
* Remove dead zero store to calloc initialized memory (Philip Reames, 2014-08-05; 1 file, -15/+35)
  Optimize the following IR:
    %1 = tail call noalias i8* @calloc(i64 1, i64 4)
    %2 = bitcast i8* %1 to i32*
    ; This store is dead and should be removed
    store i32 0, i32* %2, align 4
  Memory returned by calloc is guaranteed to be zero initialized. If the value being stored is the constant zero (and the store is not otherwise observable across threads), we can delete the store. If the store is to an out of bounds address, it is undefined and thus also removable.
  Reviewed By: nicholas
  Differential Revision: http://reviews.llvm.org/D3942
  llvm-svn: 214897
* Revert r214881 because it broke lots of build-bots (Jonathan Roelofs, 2014-08-05; 3 files, -71/+20)
  llvm-svn: 214893
* Optimize vector fabs of bitcasted constant integer values. (Sanjay Patel, 2014-08-05; 1 file, -9/+15)
  Allow vector fabs operations on bitcasted constant integer values to be optimized in the same way that we already optimize scalar fabs. So for code like this:
    %bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
    %fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast)
    %ret = bitcast <2 x float> %fabs to i64
  Instead of generating something like this:
    movabsq (constant pool load of mask for sign bits)
    vmovq   (move from integer register to vector/fp register)
    vandps  (mask off sign bits)
    vmovq   (move vector/fp register back to integer return register)
  We should generate:
    mov     (put constant value in return register)
  I have also removed a redundant clause in the first 'if' statement:
    N0.getOperand(0).getValueType().isInteger()
  is the same thing as:
    IntVT.isInteger()
  Testcases for x86 and ARM added to existing files that deal with vector fabs. One existing testcase for x86 removed because it is no longer ideal.
  For more background, please see http://reviews.llvm.org/D4770 and http://llvm.org/bugs/show_bug.cgi?id=20354.
  Differential Revision: http://reviews.llvm.org/D4785
  llvm-svn: 214892
* [AVX512] Add masking variant and intrinsics for valignd/q (Adam Nemet, 2014-08-05; 1 file, -5/+34)
  This is similar to what I did with the two-source permutation recently. (It's almost too similar, so we should consider generating the masking variants with some tablegen help.) Both encoding and intrinsic tests are added as well. For the latter, this is the IR that the intrinsic test on the clang side generates. Part of <rdar://problem/17688758> llvm-svn: 214890
* [X86] Increase X86_MAX_OPERANDS from 5 to 6 (Adam Nemet, 2014-08-05; 1 file, -1/+1)
  This controls the number of operands in the disassembler's x86OperandSets table. The entries describe how the operand is encoded and its type. Not too surprisingly, 5 operands is insufficient for AVX512. Consider VALIGNDrrik in the next patch. These are its operand specifiers:
    { /* 328 */
      { ENCODING_DUP,       TYPE_DUP1 },
      { ENCODING_REG,       TYPE_XMM512 },
      { ENCODING_WRITEMASK, TYPE_VK8 },
      { ENCODING_VVVV,      TYPE_XMM512 },
      { ENCODING_RM_CD64,   TYPE_XMM512 },
      { ENCODING_IB,        TYPE_IMM8 },
    },
  llvm-svn: 214889
* [X86] Add lowering to VALIGN (Adam Nemet, 2014-08-05; 2 files, -18/+51)
  This was previously part of lowering to PALIGNR with some special-casing to make interlane shifting work. Since AVX512F has interlane alignr (valignd/q) and AVX512BW has vpalignr, we need to support both of these *at the same time*, e.g. for SKX. This patch breaks out the common code and then adds support to check both of these lowering options from LowerVECTOR_SHUFFLE. I also added some FIXMEs where I think the AVX512BW and AVX512VL additions should probably go. llvm-svn: 214888
* [X86] Separate DAG node for valign and palignr (Adam Nemet, 2014-08-05; 3 files, -0/+5)
  They have different semantics (valign is interlane while palignr is intralane), and palignr is still needed even in the AVX512 context. According to the latest spec, AVX512BW provides these. llvm-svn: 214887
* [AVX512] alignr: Use suffix rather than name argument to multiclass (Adam Nemet, 2014-08-05; 1 file, -5/+5)
  Again no functional change. This prepares for the suffix to be used with the intrinsic matching. llvm-svn: 214886
* [AVX512] Pull everything alignr-related into the multiclass (Adam Nemet, 2014-08-05; 1 file, -13/+12)
  The packed integer pattern becomes the DAG pattern for rri and the packed float, another Pat<> inside the multiclass. No functional change. llvm-svn: 214885
* Wrap long lines (Adam Nemet, 2014-08-05; 1 file, -4/+6)
  llvm-svn: 214884
* Fix return sequence on armv4 thumb (Jonathan Roelofs, 2014-08-05; 3 files, -20/+71)
  POP on armv4t cannot be used to change thumb state (unlike later non-m-class architectures), therefore we need a different return sequence that uses 'bx' instead:
    POP {r3}
    ADD sp, #offset
    BX r3
  This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead:
    MOV ip, r3
    POP {r3}
    ADD sp, #offset
    MOV lr, r3
    MOV r3, ip
    BX lr
  http://reviews.llvm.org/D4748
  llvm-svn: 214881
* Partially revert r214761 that asserted that all concrete debug info variables had DIEs, due to a failure on Darwin (David Blaikie, 2014-08-05; 1 file, -1/+2)
  I'll work on a reduction and fix after this. llvm-svn: 214880
* Add accessors for the PPC 403 bank registers. (Joerg Sonnenberger, 2014-08-05; 1 file, -0/+9)
  llvm-svn: 214875
* Specify that the thumb setend and blx <immed> instructions are not valid on an m-class target (Keith Walker, 2014-08-05; 1 file, -2/+2)
  llvm-svn: 214871
* Define stc2/stc2l/ldc2/ldc2l as thumb2 instructions (Keith Walker, 2014-08-05; 1 file, -4/+4)
  llvm-svn: 214868
* Accessors for SSR2 and SSR3 on PPC 403. (Joerg Sonnenberger, 2014-08-05; 1 file, -0/+6)
  llvm-svn: 214867
* R600/SI: Update MUBUF assembly string to match AMD proprietary compiler (Tom Stellard, 2014-08-05; 3 files, -21/+97)
  llvm-svn: 214866
* R600/SI: Avoid generating REGISTER_LOAD instructions. (Tom Stellard, 2014-08-05; 1 file, -1/+2)
  SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code path for 8-bit and 16-bit private loads. llvm-svn: 214865
* Add dci/ici instructions for PPC 476 and friends. (Joerg Sonnenberger, 2014-08-05; 1 file, -0/+16)
  llvm-svn: 214864
* Add mftblo and mftbhi for PPC 4xx. (Joerg Sonnenberger, 2014-08-05; 1 file, -0/+5)
  llvm-svn: 214863
* Add lswi / stswi for assembler use, with a warning to not add patterns for them (Joerg Sonnenberger, 2014-08-05; 1 file, -0/+10)
  llvm-svn: 214862
* AArch64: Add support for instruction prefetch intrinsic (Yi Kong, 2014-08-05; 1 file, -2/+2)
  Instruction prefetch was not implemented for AArch64; it was incorrectly translated into a data prefetch instruction.
  Differential Revision: http://reviews.llvm.org/D4777
  llvm-svn: 214860
* Teach the SLP Vectorizer that keeping some values live over a callsite can have a cost (James Molloy, 2014-08-05; 3 files, -0/+93)
  Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account. llvm-svn: 214859
* [x86] Reformat some code I moved around in a prior commit but left poorly formatted (Chandler Carruth, 2014-08-05; 1 file, -3/+3)
  Sorry about that. llvm-svn: 214853
* Allow binary AND for tblgen math. (Joerg Sonnenberger, 2014-08-05; 4 files, -2/+9)
  llvm-svn: 214851
* [x86] Fix a crash and wrong-code bug in the new vector lowering, all found by a single test reduced out of a failure on llvm-stress (Chandler Carruth, 2014-08-05; 1 file, -18/+21)
  The start of the problem (and the crash) came when we tried to use a non-used slot in the move-to half of the move-mask as the target for two bad-half inputs. While, if lucky, this will be the first of a pair of slots which we can place the bad-half inputs into, it isn't actually guaranteed. This really isn't surprising; not sure what I was thinking. The correct way to find the two unused slots is to look for one of the *used* slots. We know it isn't that pair, and we can use some modular arithmetic to find the other pair by masking off the odd bit and adding 2 modulo 4. With this, we reliably found a viable pair of slots for the bad-half inputs.
  Sadly, that wasn't enough. We also had a wrong-code bug that surfaced when I reduced the test case for this, where we would use the same slot twice for the two bad inputs. This is because both of the bad inputs could be in odd slots originally, and thus the mod-2 mapping would actually be the same. The whole point of the weird indexing into the pair of empty slots was to try to leverage when the end result needed the two bad-half inputs to be paired in a dword, and pre-pair them in the correct orientation. This is less important with the powerful combining we're now doing, and also easier and more reliable to achieve by noting that we add the bad-half inputs in order. Thus, if they are in a dword pair, the low part of that will be the first input in the sequence. Always putting that in the low element will just do the right thing in addition to computing the correct result.
  Test case added. =]
  llvm-svn: 214849
* [FastISel][AArch64] Fix previous commit r214844 (Don't perform sign-/zero-extension for function arguments that have already been sign-/zero-extended.) (Juergen Ributzka, 2014-08-05; 1 file, -6/+4)
  The original code would fail for unsupported value types like i1, i8, and i16. This fix changes the code to only create a sub-register copy for i64 value types; all other types (i1/i8/i16/i32) just use the source register without any modifications. getRegClassFor() is now guarded by the i64 value type check, which guarantees that we always request a register for a valid value type. llvm-svn: 214848
* [FastISel][AArch64] Implement the FastLowerArguments hook. (Juergen Ributzka, 2014-08-05; 1 file, -0/+103)
  This implements basic argument lowering for AArch64 in FastISel. It only handles a small subset of the C calling convention. It supports simple arguments that can be passed in GPR and FPR registers. This should cover most of the trivial cases without falling back to SelectionDAG. This fixes <rdar://problem/17890986>. llvm-svn: 214846
* Revert "r214832 - MachineCombiner Pass for selecting faster instruction" (Kevin Qin, 2014-08-05; 5 files, -536/+13)
  It broke compilation of most benchmarks and internal tests, as clang crashed with a segmentation fault or assertion failure. llvm-svn: 214845
* [FastISel][AArch64] Don't perform sign-/zero-extension for function arguments that have already been sign-/zero-extended (Juergen Ributzka, 2014-08-05; 1 file, -2/+24)
  llvm-svn: 214844
* Provide convenient access to the zext/sext attributes of function arguments. NFC. (Juergen Ributzka, 2014-08-05; 1 file, -0/+14)
  llvm-svn: 214843
* Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier, and have the DAG use that to do the same lookup (Eric Christopher, 2014-08-05; 134 files, -739/+482)
  This can be used in the future for TargetMachine based caching lookups from the MachineFunction easily. Update the MIPS subtarget switching machinery to update this pointer at the same time it runs. llvm-svn: 214838
* MachineCombiner Pass for selecting faster instruction sequence on AArch64 (Gerolf Hoflehner, 2014-08-05; 5 files, -13/+536)
  Re-commit of r214669 without changes to test cases LLVM::CodeGen/AArch64/arm64-neon-mul-div.ll and LLVM::CodeGen/AArch64/dp-3source.ll. This resolves the reported compfails of the original commit. llvm-svn: 214832
* Add TCR register access (Joerg Sonnenberger, 2014-08-04; 1 file, -0/+3)
  llvm-svn: 214826
* Add PPC 603's tlbld and tlbli instructions. (Joerg Sonnenberger, 2014-08-04; 1 file, -0/+5)
  llvm-svn: 214825
* Allow CP10/CP11 operations on ARMv5/v6 (Renato Golin, 2014-08-04; 1 file, -3/+7)
  Those registers are VFP/NEON, and vector instructions should be used instead, but old cores rely on those co-processors to enable VFP unwinding. This change was prompted by the libc++abi's unwinding routine and is also present in much legacy low-level bare-metal code that we ought to compile/assemble. Fixes bug PR20025 and allows PR20529 to proceed with a fix in libc++abi. llvm-svn: 214802
* [PPC64LE] Fix wrong IR for vec_sld and vec_vsldoi (Bill Schmidt, 2014-08-04; 1 file, -30/+12)
  My original LE implementation of the vsldoi instruction, with its altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect shufflevector operations in the LLVM IR. Correct code is generated because the back end handles the incorrect shufflevector in a consistent manner. This patch and a companion patch for Clang correct this problem by removing the fixup from altivec.h and the corresponding fixup from the PowerPC back end. Several test cases are also modified to reflect the now-correct LLVM IR. llvm-svn: 214800
* Enable Darwin vararg parameters support in assembler macros. (Kevin Enderby, 2014-08-04; 1 file, -1/+1)
  Duplicate the vararg tests for linux and add a test which mixes vararg arguments with darwin positional parameters.
  Patch by: Janne Grunau <j@jannau.net>
  llvm-svn: 214799
* Changed the liveness tracking in the RegisterScavenger to use register units instead of registers (Pedro Artigas, 2014-08-04; 6 files, -74/+86)
  Reviewed by Jakob Stoklund Olesen.
  llvm-svn: 214798
* Add simplified aliases for access to DCCR, ICCR, DEAR and ESR (Joerg Sonnenberger, 2014-08-04; 1 file, -0/+12)
  llvm-svn: 214797
* [FastISel][AArch64] Fix shift lowering for i8 and i16 value types. (Juergen Ributzka, 2014-08-04; 1 file, -15/+13)
  This fix changes the parameters #r and #s that are passed to the UBFM/SBFM instruction to get the zero/sign-extension for free. The original problem was that the shift left would use the 32-bit shift even for i8/i16 value types, which could leave the upper bits set with "garbage" values. The arithmetic shift right on the other side would use the wrong MSB as sign-bit to determine what bits to shift into the value. This fixes <rdar://problem/17907720>. llvm-svn: 214788
* [SDAG] Fix a really, really terrible bug in the DAG combiner. (Chandler Carruth, 2014-08-04; 1 file, -2/+2)
  This code is completely wrong. It is also dead; if it were to *ever* run, it would crash. Fortunately, after my work on the combiner, it is at least *possible* to reach the code, and llvm-stress has found a test case. Thanks to Patrick for reporting. It would be really good if anyone who remembers how this code works and what it was intended to do could add some more obvious test coverage instead of my completely contrived and reduced test case. My test case was so brittle I left a bread crumb comment in it to help the next person to stumble on it and not know what it was actually testing for. llvm-svn: 214785
* tlbre / tlbwe / tlbsx / tlbsx. variants for the PPC 4xx CPUs. (Joerg Sonnenberger, 2014-08-04; 2 files, -0/+39)
  llvm-svn: 214784
* Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers (Eric Christopher, 2014-08-04; 289 files, -1666/+2147)
  No functional change. llvm-svn: 214781
* [AArch64] Extend the number of scalar instructions supported in the AdvSIMD scalar integer instruction pass (Chad Rosier, 2014-08-04; 1 file, -0/+6)
  This is a patch I had lying around from a few months ago. The pass is currently disabled by default, so nothing too interesting. llvm-svn: 214779
* Fix failure to invoke exception handler on Win64 (Reid Kleckner, 2014-08-04; 3 files, -0/+48)
  When the last instruction prior to a function epilogue is a call, we need to emit a nop so that the return address is not in the epilogue IP range. This is consistent with MSVC's behavior, and may be a workaround for a bug in the Win64 unwinder.
  Differential Revision: http://reviews.llvm.org/D4751
  Patch by Vadim Chugunov!
  llvm-svn: 214775