path: root/llvm/lib
* Use register iterators that include self to reduce code duplication in CriticalAntiDepBreaker
  Sanjay Patel | 2014-08-06 | 1 file changed, -25/+6

  This patch addresses two FIXME comments that I added to CriticalAntiDepBreaker
  while fixing PR20020. Initialize an MCSubRegIterator and an MCRegAliasIterator
  to include the self reg. Assuming that works as advertised, there should be no
  functional difference with this patch, just less code.

  Also, remove the associated asserts - we're setting those values just before,
  so the asserts don't do anything meaningful.

  Differential Revision: http://reviews.llvm.org/D4566

  llvm-svn: 214973
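  A minimal sketch of the iterator pattern this change adopts (the variable
  names here are illustrative, not the actual CriticalAntiDepBreaker code):

      // One loop that visits Reg and all of its sub-registers, instead of
      // handling Reg separately and then iterating only the sub-registers.
      for (MCSubRegIterator SubRegs(Reg, TRI, /*IncludeSelf=*/true);
           SubRegs.isValid(); ++SubRegs)
        KillIndices[*SubRegs] = KillIdx;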
* [AVX512] Added load/store instructions to Register2Memory opcode tables.
  Robert Khasanov | 2014-08-06 | 1 file changed, -2/+14

  Added lowering tests for load/store.

  Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>

  llvm-svn: 214972
* [AArch64] Add a testcase for r214957.
  James Molloy | 2014-08-06 | 1 file changed, -1/+8

  llvm-svn: 214965
* Add a new option -run-slp-after-loop-vectorization.
  James Molloy | 2014-08-06 | 1 file changed, -15/+44

  This swaps the order of the loop vectorizer and the SLP/BB vectorizers. It is
  disabled by default so we can do performance testing - ideally we want to
  change to having the loop vectorizer running first, and the SLP vectorizer
  using its leftovers instead of the other way around.

  llvm-svn: 214963
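  A sketch of how such a flag is typically declared with LLVM's command-line
  machinery (the exact attributes and description string are assumptions, not
  taken from the patch):

      static cl::opt<bool> RunSLPAfterLoopVectorization(
          "run-slp-after-loop-vectorization", cl::init(false), cl::Hidden,
          cl::desc("Run the SLP (and BB) vectorizers after the loop "
                   "vectorizer instead of before it"));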
* ARM: do not generate BLX instructions on Cortex-M CPUs.
  Tim Northover | 2014-08-06 | 2 files changed, -3/+3

  Particularly on MachO, we were generating "blx _dest" instructions on M-class
  CPUs, which don't actually exist. They happen to get fixed up by the linker
  into valid "bl _dest" instructions (which is why such a massive issue has
  remained largely undetected), but we shouldn't rely on that.

  llvm-svn: 214959
* ARM-MachO: materialize callee address correctly on v4t.
  Tim Northover | 2014-08-06 | 1 file changed, -1/+4

  llvm-svn: 214958
* [AArch64] Conditional selects are expensive on out-of-order cores.
  James Molloy | 2014-08-06 | 1 file changed, -0/+4

  Specifically Cortex-A57. This probably applies to Cyclone too but I haven't
  enabled it for that as I can't test it. This gives ~4% improvement on
  SPEC 174.vpr, and ~1% in 471.omnetpp.

  llvm-svn: 214957
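  One plausible shape for such a tuning, using the generic hook that tells the
  optimizer to prefer branches over conditional moves (the exact hook and
  subtarget check here are assumptions, not confirmed by this log):

      // In the target lowering constructor: on this core, a predictable
      // branch is cheaper than a conditional select.
      if (Subtarget->isCortexA57())
        PredictableSelectIsExpensive = true;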
* [x86] Fix two independent miscompiles in the process of getting the same test case to actually generate correct code.
  Chandler Carruth | 2014-08-06 | 1 file changed, -39/+77

  The primary miscompile fixed here is that we weren't correctly handling
  in-place elements in one half of a single-input v8i16 shuffle when moving a
  dword of elements from that half to the other half. Sometimes, we would
  clobber the in-place elements in forming the dword to move across halves.

  The fix to this involves forcibly marking the in-place inputs even when there
  is no need to gather them into a dword, and much more carefully re-arranging
  the elements when grouping them into a dword to move across halves. With
  these two changes we would generate correct shuffles for the test case, but
  found another miscompile. There are also some random perturbations of the
  generated shuffle pattern in SSE2. It looks like a wash; more instructions in
  some cases, fewer in others.

  The second miscompile would corrupt the results into nonsense. This is a
  buggy pattern in one of the added DAG combines. Mapping elements through a
  PSHUFD when pairing redundant half-shuffles is *much* harder than this code
  makes it out to be -- it requires reasoning about *all* of where the input is
  used in the PSHUFD, not just one part of where it is used. Plus, we can't
  combine a half shuffle *into* a PSHUFD, but the code didn't guard against it.
  I think this was just a bad idea and I've just removed that aspect of the
  combine. No tests regress as a consequence, so this seems OK.

  llvm-svn: 214954
* [x86] Switch to a formulation of a for loop that is much more obviously not corrupting the mask by mutating it more times than intended.
  Chandler Carruth | 2014-08-06 | 1 file changed, -3/+4

  No functionality changed (the results were non-overlapping so the old version
  "worked" but was non-obvious).

  llvm-svn: 214953
* [X86] Fixes commit r214890 to match the posted patch
  Adam Nemet | 2014-08-06 | 1 file changed, -3/+3

  This was another fallout from my local rebase where something went wrong :(

  llvm-svn: 214951
* Correct comment
  Matt Arsenault | 2014-08-06 | 1 file changed, -1/+1

  llvm-svn: 214945
* [dfsan] Try not to create too many additional basic blocks in functions which already have a large number of blocks.
  Peter Collingbourne | 2014-08-06 | 1 file changed, -20/+46

  Works around a performance issue with the greedy register allocator.

  llvm-svn: 214944
* R600: Increase nearby load scheduling threshold.
  Matt Arsenault | 2014-08-06 | 1 file changed, -9/+20

  This partially fixes weird-looking load scheduling in the memcpy test. The
  load clustering doesn't seem particularly smart, but this method seems to be
  partially deprecated, so it might not be worth trying to fix.

  llvm-svn: 214943
* R600/SI: Implement areLoadsFromSameBasePtr
  Matt Arsenault | 2014-08-06 | 2 files changed, -0/+102

  This currently has a noticeable effect on the kernel argument loads. LDS and
  global loads are more problematic, I think because of how copies are
  currently inserted to ensure that the address is a VGPR.

  llvm-svn: 214942
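  For reference, the TargetInstrInfo hook being implemented has this shape (a
  sketch of the override declaration; the SI-specific body is not shown here):

      // Returns true if both nodes load from the same base pointer; if so,
      // reports each load's offset from that base so loads can be clustered.
      bool areLoadsFromSameBasePtr(SDNode *Load1, SDNode *Load2,
                                   int64_t &Offset1,
                                   int64_t &Offset2) const override;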
* [X86][SchedModel] Fixed some incorrect scheduling models found by code inspection.
  Quentin Colombet | 2014-08-06 | 2 files changed, -25/+48

  Source: Agner Fog's Instruction tables.

  Related to <rdar://problem/15607571>

  llvm-svn: 214940
* DebugInfo: Assert that any CU for which debug_loc lists are emitted has at least one range.
  David Blaikie | 2014-08-06 | 1 file changed, -0/+1

  This was coming in weird debug info that had variables (and hence debug_locs)
  but was in GMLT mode (because it was missing the 13th field of the
  compile_unit metadata), so no ranges were constructed. We should always have
  at least one range for any CU with a debug_loc in it - because the range
  should cover the debug_loc. The assertion just ensures that the "!= 1" range
  case inside the subsequent loop doesn't get entered for the case where there
  are no ranges at all, which should never reach here in the first place.

  llvm-svn: 214939
* R600/SI: Add definitions for ds_read2st64_ / ds_write2st64_
  Matt Arsenault | 2014-08-05 | 1 file changed, -3/+4

  llvm-svn: 214936
* Fix typos in comments and doc
  JF Bastien | 2014-08-05 | 4 files changed, -5/+5

  Committing http://reviews.llvm.org/D4798 for Robin Morisset (morisset@google.com)

  llvm-svn: 214934
* DebugInfo: Move the reference to the CU from the location list entry to the list itself, since it is constant across an entire list.
  David Blaikie | 2014-08-05 | 4 files changed, -19/+14

  This simplifies construction and usage while making the data structure
  smaller. It was a holdover from the days when we didn't have a separate
  DebugLocList and all we had was a flat list of DebugLocEntries.

  llvm-svn: 214933
* Remove a virtual function from TargetMachine. NFC.
  Rafael Espindola | 2014-08-05 | 5 files changed, -8/+10

  llvm-svn: 214929
* Re-apply r214881: Fix return sequence on armv4 thumb
  Jonathan Roelofs | 2014-08-05 | 3 files changed, -20/+71

  This reverts r214893, re-applying r214881 with the test case relaxed a bit to
  satiate the build bots.

  POP on armv4t cannot be used to change thumb state (unlike later non-m-class
  architectures), therefore we need a different return sequence that uses 'bx'
  instead:

      POP {r3}
      ADD sp, #offset
      BX r3

  This patch also fixes an issue where the return value in r3 would get
  clobbered for functions that return 128 bits of data. In that case, we
  generate this sequence instead:

      MOV ip, r3
      POP {r3}
      ADD sp, #offset
      MOV lr, r3
      MOV r3, ip
      BX lr

  http://reviews.llvm.org/D4748

  llvm-svn: 214928
* [PowerPC] Swap arguments and adjust shift count for vsldoi on little endian
  Bill Schmidt | 2014-08-05 | 3 files changed, -16/+41

  Commits r213915 and r214718 fix recognition of shuffle masks for vmrg* and
  vpku*um instructions for a little-endian target, by swapping the input
  arguments. The vsldoi instruction requires similar treatment, and also needs
  its shift count adjusted for little endian.

  Reviewed by Ulrich Weigand.

  This is a bug fix candidate for release 3.5 (and hopefully the last of those
  for PowerPC).

  llvm-svn: 214923
* Don't internalize all but main by default.
  Rafael Espindola | 2014-08-05 | 2 files changed, -12/+3

  This is mostly a cleanup, but it changes a fairly old behavior.

  Every "real" LTO user was already disabling the silly internalize pass and
  creating the internalize pass itself. The difference with this patch is for
  "opt -std-link-opts" and the C api.

  Now to get a usable behavior out of opt one doesn't need the funny looking
  command line:

      opt -internalize -disable-internalize -internalize-public-api-list=foo,bar -std-link-opts

  llvm-svn: 214919
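  For context, a sketch of what those "real" LTO users were already doing (the
  pass-manager variable and the export list here are illustrative):

      // Internalize everything except the symbols that must stay visible.
      const char *ExportList[] = {"main"};
      PM.add(createInternalizePass(ExportList));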
* [x86] Fix a crasher due to shuffles which cancel each other out and add a test case.
  Chandler Carruth | 2014-08-05 | 1 file changed, -6/+11

  We also miscompile this test case, which is showing a serious flaw in the
  single-input v8i16 shuffle code. I've left the specific instruction checks
  FIXME-ed out until I can address the bug in the single-input code, but I
  wanted to separate out a significant functionality change to produce correct
  code from a very simple and targeted crasher fix.

  The miscompile problem stems from keeping track of inputs by value rather
  than by index. As a consequence of doing this, we can't reliably update those
  inputs because they might swap and we can't detect this without copying the
  mask.

  The blend code now uses indices for the input lists, and this seems strictly
  better. It also should make it easier to sort things and do other cleanups.
  I think the time has come to simplify The Great Lambda here.

  llvm-svn: 214914
* Remove dead code in condition
  Duncan P. N. Exon Smith | 2014-08-05 | 1 file changed, -2/+2

  Whether or not it's appropriate, labels have been first-class types since
  r51511.

  llvm-svn: 214908
* X86CodeEmitter.cpp: Add SEH_Epilogue to ignored list for legacy JIT, corresponding to r214775.
  NAKAMURA Takumi | 2014-08-05 | 1 file changed, -0/+1

  llvm-svn: 214905
* [X86] Improve comments for r214888
  Adam Nemet | 2014-08-05 | 1 file changed, -8/+14

  A rebase somehow ate my comments. This restores them.

  llvm-svn: 214903
* R600/SI: Use register class instead of list of registers
  Matt Arsenault | 2014-08-05 | 1 file changed, -1/+1

  I'm not sure if this has any consequence or not.

  llvm-svn: 214902
* R600/SI: Add exec_lo and exec_hi subregisters.
  Matt Arsenault | 2014-08-05 | 1 file changed, -2/+10

  This allows accessing an SReg subregister with a normal subregister index,
  instead of getting a machine verifier error.

  Also be sure to include all of these subregisters in SReg_32. This fixes
  inferring SGPR instead of SReg when finding a super register class.

  llvm-svn: 214901
* BitcodeReader: Fix non-determinism in use-list order
  Duncan P. N. Exon Smith | 2014-08-05 | 2 files changed, -3/+15

  `BasicBlockFwdRefs` (and `BlockAddrFwdRefs` before it) was being emptied in a
  non-deterministic order. When predicting use-list order I've worked around
  this another way, but even when parsing lazily (and we can't recreate
  use-list order) use-lists should be deterministic. Make them so by using a
  side-queue of functions with forward-referenced blocks that gets visited in
  order.

  llvm-svn: 214899
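  A sketch of the fix's shape (member names are assumptions): keep the
  unordered map for lookups, but pair it with an order-preserving queue and
  drain the queue instead of iterating the map.

      // Fast lookup of forward-referenced blocks; iteration order varies.
      DenseMap<Function *, std::vector<BasicBlock *>> BasicBlockFwdRefs;
      // Functions in first-reference order; visited in exactly this order.
      std::deque<Function *> BasicBlockFwdRefQueue;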
* Remove dead zero store to calloc initialized memory
  Philip Reames | 2014-08-05 | 1 file changed, -15/+35

  Optimize the following IR:

      %1 = tail call noalias i8* @calloc(i64 1, i64 4)
      %2 = bitcast i8* %1 to i32*
      ; This store is dead and should be removed
      store i32 0, i32* %2, align 4

  Memory returned by calloc is guaranteed to be zero initialized. If the value
  being stored is the constant zero (and the store is not otherwise observable
  across threads), we can delete the store. If the store is to an out of bounds
  address, it is undefined and thus also removable.

  Reviewed By: nicholas

  Differential Revision: http://reviews.llvm.org/D3942

  llvm-svn: 214897
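  A hedged sketch of the core check (the helper names are from
  llvm/Analysis/MemoryBuiltins.h and llvm/Analysis/ValueTracking.h; the real
  patch also proves nothing clobbers the memory between the calloc and the
  store):

      // Is this a store of constant zero into memory freshly returned by
      // calloc? If so, the store is redundant.
      static bool isZeroStoreToCalloc(StoreInst *SI,
                                      const TargetLibraryInfo *TLI) {
        Constant *C = dyn_cast<Constant>(SI->getValueOperand());
        return C && C->isNullValue() &&
               isCallocLikeFn(GetUnderlyingObject(SI->getPointerOperand()),
                              TLI);
      }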
* Revert r214881 because it broke lots of build-bots
  Jonathan Roelofs | 2014-08-05 | 3 files changed, -71/+20

  llvm-svn: 214893
* Optimize vector fabs of bitcasted constant integer values.
  Sanjay Patel | 2014-08-05 | 1 file changed, -9/+15

  Allow vector fabs operations on bitcasted constant integer values to be
  optimized in the same way that we already optimize scalar fabs.

  So for code like this:

      %bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
      %fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast)
      %ret = bitcast <2 x float> %fabs to i64

  Instead of generating something like this:

      movabsq (constant pool load of mask for sign bits)
      vmovq   (move from integer register to vector/fp register)
      vandps  (mask off sign bits)
      vmovq   (move vector/fp register back to integer return register)

  We should generate:

      mov     (put constant value in return register)

  I have also removed a redundant clause in the first 'if' statement:

      N0.getOperand(0).getValueType().isInteger()

  is the same thing as:

      IntVT.isInteger()

  Testcases for x86 and ARM added to existing files that deal with vector fabs.
  One existing testcase for x86 removed because it is no longer ideal.

  For more background, please see:
  http://reviews.llvm.org/D4770

  And:
  http://llvm.org/bugs/show_bug.cgi?id=20354

  Differential Revision: http://reviews.llvm.org/D4785

  llvm-svn: 214892
* [AVX512] Add masking variant and intrinsics for valignd/q
  Adam Nemet | 2014-08-05 | 1 file changed, -5/+34

  This is similar to what I did with the two-source permutation recently. (It's
  almost too similar; we should consider generating the masking variants with
  some tablegen help.)

  Both encoding and intrinsic tests are added as well. For the latter, this is
  the IR that the intrinsic test on the clang side generates.

  Part of <rdar://problem/17688758>

  llvm-svn: 214890
* [X86] Increase X86_MAX_OPERANDS from 5 to 6
  Adam Nemet | 2014-08-05 | 1 file changed, -1/+1

  This controls the number of operands in the disassembler's x86OperandSets
  table. The entries describe how the operand is encoded and its type.

  Not too surprisingly, 5 operands is insufficient for AVX512. Consider
  VALIGNDrrik in the next patch. These are its operand specifiers:

      { /* 328 */
        { ENCODING_DUP, TYPE_DUP1 },
        { ENCODING_REG, TYPE_XMM512 },
        { ENCODING_WRITEMASK, TYPE_VK8 },
        { ENCODING_VVVV, TYPE_XMM512 },
        { ENCODING_RM_CD64, TYPE_XMM512 },
        { ENCODING_IB, TYPE_IMM8 },
      },

  llvm-svn: 214889
* [X86] Add lowering to VALIGN
  Adam Nemet | 2014-08-05 | 2 files changed, -18/+51

  This was previously part of lowering to PALIGNR with some special-casing to
  make interlane shifting work. Since AVX512F has interlane alignr (valignd/q)
  and AVX512BW has vpalignr, we need to support both of these *at the same
  time*, e.g. for SKX.

  This patch breaks out the common code and then adds support to check both of
  these lowering options from LowerVECTOR_SHUFFLE.

  I also added some FIXMEs where I think the AVX512BW and AVX512VL additions
  should probably go.

  llvm-svn: 214888
* [X86] Separate DAG node for valign and palignr
  Adam Nemet | 2014-08-05 | 3 files changed, -0/+5

  They have different semantics (valign is interlane while palignr is
  intralane) and palignr is still needed even in the AVX512 context. According
  to the latest spec, AVX512BW provides these.

  llvm-svn: 214887
* [AVX512] alignr: Use suffix rather than name argument to multiclass
  Adam Nemet | 2014-08-05 | 1 file changed, -5/+5

  Again no functional change. This prepares for the suffix to be used with the
  intrinsic matching.

  llvm-svn: 214886
* [AVX512] Pull everything alignr-related into the multiclass
  Adam Nemet | 2014-08-05 | 1 file changed, -13/+12

  The packed integer pattern becomes the DAG pattern for rri and the packed
  float, another Pat<> inside the multiclass.

  No functional change.

  llvm-svn: 214885
* Wrap long lines
  Adam Nemet | 2014-08-05 | 1 file changed, -4/+6

  llvm-svn: 214884
* Fix return sequence on armv4 thumb
  Jonathan Roelofs | 2014-08-05 | 3 files changed, -20/+71

  POP on armv4t cannot be used to change thumb state (unlike later non-m-class
  architectures), therefore we need a different return sequence that uses 'bx'
  instead:

      POP {r3}
      ADD sp, #offset
      BX r3

  This patch also fixes an issue where the return value in r3 would get
  clobbered for functions that return 128 bits of data. In that case, we
  generate this sequence instead:

      MOV ip, r3
      POP {r3}
      ADD sp, #offset
      MOV lr, r3
      MOV r3, ip
      BX lr

  http://reviews.llvm.org/D4748

  llvm-svn: 214881
* Partially revert r214761 that asserted that all concrete debug info variables had DIEs, due to a failure on Darwin.
  David Blaikie | 2014-08-05 | 1 file changed, -1/+2

  I'll work on a reduction and fix after this.

  llvm-svn: 214880
* Add accessors for the PPC 403 bank registers.
  Joerg Sonnenberger | 2014-08-05 | 1 file changed, -0/+9

  llvm-svn: 214875
* Specify that the thumb setend and blx <immed> instructions are not valid on an m-class target
  Keith Walker | 2014-08-05 | 1 file changed, -2/+2

  llvm-svn: 214871
* Define stc2/stc2l/ldc2/ldc2l as thumb2 instructions
  Keith Walker | 2014-08-05 | 1 file changed, -4/+4

  llvm-svn: 214868
* Accessors for SSR2 and SSR3 on PPC 403.
  Joerg Sonnenberger | 2014-08-05 | 1 file changed, -0/+6

  llvm-svn: 214867
* R600/SI: Update MUBUF assembly string to match AMD proprietary compiler
  Tom Stellard | 2014-08-05 | 3 files changed, -21/+97

  llvm-svn: 214866
* R600/SI: Avoid generating REGISTER_LOAD instructions.
  Tom Stellard | 2014-08-05 | 1 file changed, -1/+2

  SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code path
  for 8-bit and 16-bit private loads.

  llvm-svn: 214865
* Add dci/ici instructions for PPC 476 and friends.
  Joerg Sonnenberger | 2014-08-05 | 1 file changed, -0/+16

  llvm-svn: 214864
* Add mftblo and mftbhi for PPC 4xx.
  Joerg Sonnenberger | 2014-08-05 | 1 file changed, -0/+5

  llvm-svn: 214863