summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* Converting the ErrorHandlerMutex to a ManagedStatic to avoid the static ↵Chris Bieneman2014-10-031-4/+5
| | | | | | constructor and destructor. llvm-svn: 219028
* [x86] Adjust the patterns for lowering X86vzmovl nodes which don'tChandler Carruth2014-10-032-47/+54
| | | | | | | | | | | | | | perform a load to use blendps rather than movss when it is available. For non-loads, blendps is *much* faster. It can execute on two ports in Sandy Bridge and Ivy Bridge, and *three* ports on Haswell. This fixes one of the "regressions" from aggressively taking the "insertion" path in the new vector shuffle lowering. This does highlight one problem with blendps -- it isn't commuted as heavily as it should be. That's future work though. llvm-svn: 219022
* PR21145: Teach LLVM about C++14 sized deallocation functions.Richard Smith2014-10-032-1/+9
| | | | | | | | C++14 adds new builtin signatures for 'operator delete'. This change allows new/delete pairs to be removed in C++14 onwards, as they were in C++11 and before. llvm-svn: 219014
* Revert "Revert "DI: Fold constant arguments into a single MDString""Duncan P. N. Exon Smith2014-10-037-616/+542
| | | | | | | | | | | | | | | | | | | | | | This reverts commit r218918, effectively reapplying r218914 after fixing an Ocaml bindings test and an Asan crash. The root cause of the latter was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a PR to investigate who requires the loose check (and why). Original commit message follows. -- This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 219010
* [ISel] Keep matching state consistent when folding during X86 address matchAdam Nemet2014-10-032-0/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the X86 backend, matching an address is initiated by the 'addr' complex pattern and its friends. During this process we may reassociate and-of-shift into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the shift into the scale of the address. However as demonstrated by the testcase, this can trigger CSE of not only the shift and the AND which the code is prepared for but also the underlying load node. In the testcase this node is sitting in the RecordedNode and MatchScope data structures of the matcher and becomes a deleted node upon CSE. Returning from the complex pattern function, we try to access it again hitting an assert because the node is no longer a load even though this was checked before. Now obviously changing the DAG this late is bending the rules but I think it makes sense somewhat. Outside of addresses we prefer and-of-shift because it may lead to smaller immediates (FoldMaskAndShiftToScale is an even better example because it create a non-canonical node). We currently don't recognize addresses during DAGCombiner where arguably this canonicalization should be performed. On the other hand, having this in the matcher allows us to cover all the cases where an address can be used in an instruction. I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible for initiating the recursive CSE on users (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it is not strictly necessary since the shift is hooked into the visited user. Of course it's safer to keep the DAG consistent at all times (e.g. for accurate number of uses, etc.). So rather than changing the fundamentals, I've decided to continue along the previous patches and detect the CSE. This patch installs a very targeted DAGUpdateListener for the duration of a complex-pattern match and updates the matching state accordingly. (Previous patches used HandleSDNode to detect the CSE but that's not practical here). The listener is only installed on X86. I tested that there is no measurable overhead due to this while running through the spec2k BC files with llc. The only thing we pay for is the creation of the listener. The callback never ever triggers in spec2k since this is a corner case. Fixes rdar://problem/18206171 llvm-svn: 219009
* R600: Align functions to 256 bytesTom Stellard2014-10-032-3/+12
| | | | llvm-svn: 219002
* Eliminate some deep std::vector copies. NFC.Benjamin Kramer2014-10-0313-45/+21
| | | | llvm-svn: 218999
* MCParser: Modernize memory handling.Benjamin Kramer2014-10-031-37/+22
| | | | | | NFC. llvm-svn: 218998
* llvm-readobj: print out the fields of the COFF delay-import tableRui Ueyama2014-10-031-0/+6
| | | | llvm-svn: 218996
* [Power] Use lwsync for non-seq_cst fencesRobin Morisset2014-10-031-1/+8
| | | | | | | | | | | | | | | | Summary: hwsync is only required for seq_cst fences, acquire and release one can use the cheaper lwsync. Test Plan: Added some cases to atomics.ll + make check-all Reviewers: jfb, wschmidt Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5317 llvm-svn: 218995
* MipsAsmParser.cpp: fix VS2012 buildHans Wennborg2014-10-031-1/+1
| | | | llvm-svn: 218991
* HexagonMCCodeEmitter.h: deleted member functions are not supported in VS2012Hans Wennborg2014-10-031-2/+2
| | | | llvm-svn: 218990
* [mips] Print warning when using register names not available in N32/64Daniel Sanders2014-10-032-0/+34
| | | | | | | | | | | | | | | | | | | Summary: The register names t4-t7 are not available in the N32 and N64 ABIs. This patch prints a warning, when those names are used in N32/64, along with a fix-it with the correct register names. Patch by Vasileios Kalintiris Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5272 llvm-svn: 218989
* Fix build break on HexagonSid Manning2014-10-031-1/+2
| | | | | | Differential Revision: http://reviews.llvm.org/D5600 llvm-svn: 218987
* Adding skeleton for unit testing Hexagon Code EmissionSid Manning2014-10-036-8/+173
| | | | | | | | | | | Adding and modifying CMakeLists.txt files to run unit tests under unittests/Target/* if the directory exists. Adding basic unit test to check that code emitter object can be retrieved. Differential Revision: http://reviews.llvm.org/D5523 Change by: Colin LeMahieu llvm-svn: 218986
* [x86] Teach the new vector shuffle lowering to aggressively form MOVSSChandler Carruth2014-10-032-5/+37
| | | | | | | | | | | | | | | | | | | | | | | | and MOVSD nodes for single element vector inserts. This is particularly important because a number of patterns in the backend detect these patterns and leverage them to simplify things. It also fixes quite a few of the insertion bad code examples. However, it regresses a specific area: when available, blendps and blendpd are *dramatically* faster than movss and movsd respectively. But it doesn't really work to form the blend logic first because the blends *aren't* as crazy efficient when the data is coming from memory anyways, and thus will have a movss or movsd regardless. Also, doing that would block a bunch of the patterns that this is designed to hit. So my plan is to go into the patterns for lowering MOVSS and MOVSD and lower them via blends when available. However that's a pretty invasive restructuring so it will need to be a follow-up patch. I have already gone into the patterns to lower MOVSS and MOVSD from memory using MOVLPD, etc. Without that, several of the test cases I already have regress. llvm-svn: 218985
* Revert 202433 - Provide a target override for the latest regalloc heuristicRenato Golin2014-10-033-8/+1
| | | | | | | | | | | That commit was introduced in order to help investigate a problem in ARM codegen breaking from commit 202304 (Add a limit to the heuristic that register allocates instructions in local order). Recent analisys indicated that the problem no longer exists, so I'm reverting this change. See PR18996. llvm-svn: 218981
* [x86] Refactor the element insertion logic in the new vector shuffleChandler Carruth2014-10-031-19/+21
| | | | | | | | | | | lowering to handle the potential mirroring of 2-element vectors (because we can't reliably sort them one way) in the caller rather than in the insertion logic. This will simplify things considerably as more ways to fail to match the insertion are added because now we have a nice try and retry point. llvm-svn: 218980
* [x86] Significantly improve the ability of the new vector shuffleChandler Carruth2014-10-031-26/+30
| | | | | | | | | | | | | | | | lowering to match VZEXT_MOVL patterns. I hadn't realized that these had sufficient pattern smarts in the backend to lower zext-ing from the low element of a vector without it being a scalar_to_vector node. They do, and this is how to match a bunch of patterns for movq, movss, etc. There is a weird propensity to end up using pshufd to place the element afterward even though it means domain crossing (or rather, to use xorps+movss to zext the element rather than movq) but that's an orthogonal problem with VZEXT_MOVL that someone should probably look at. llvm-svn: 218977
* [x86] Unbreak SSE1 with the new vector shuffle lowering. We can't widenChandler Carruth2014-10-031-4/+8
| | | | | | | | | element types to form illegal vector types. I've added a special SSE1 test case here that makes sure we don't break this going forward. llvm-svn: 218974
* Revert r215343.James Molloy2014-10-031-25/+1
| | | | | | This was contentious and needs invesigation. llvm-svn: 218971
* [BasicAA] Revert r218714 - Make better use of zext and sign information.Lang Hames2014-10-031-29/+2
| | | | | | | | | This patch broke 447.dealII on Darwin. I'm currently working on a reduced test-case, but reverting for now to keep the bots happy. <rdar://problem/18530107> llvm-svn: 218944
* constify TargetMachine parameter.Eric Christopher2014-10-034-5/+6
| | | | llvm-svn: 218934
* llvm-readobj: print COFF delay-load import tableRui Ueyama2014-10-031-13/+90
| | | | | | | | | This patch adds another iterator to access the delay-load import table and use it from llvm-readobj. http://reviews.llvm.org/D5594 llvm-svn: 218933
* constify TargetMachine argument.Eric Christopher2014-10-034-4/+4
| | | | llvm-svn: 218930
* We can grab the options struct from the TargetMachine, no need toEric Christopher2014-10-033-5/+4
| | | | | | pass it down in the constructor. llvm-svn: 218929
* [AVX512] Pull pattern for subvector insert into the instruction definitionAdam Nemet2014-10-021-8/+4
| | | | | | | | | | No functional change intended. Very similar to the change I made for subvector extract in r218480. test/CodeGen/X86/avx512-insert-extract.ll covers this. llvm-svn: 218928
* [AVX512] Refactor subvector insertsAdam Nemet2014-10-021-102/+55
| | | | | | | | | | No functional change. Very similar to the extract refactoring I did in r218478. Compared X86.td.expanded before and after. llvm-svn: 218927
* [AVX512] Fix i256mem->f256mem typo in VINSERTF64x4rmAdam Nemet2014-10-021-1/+1
| | | | | | | Just like in the case of extracts, the refactoring is uncovering some typos in the code. llvm-svn: 218926
* [PowerPC] Modern Book-E cores support syncHal Finkel2014-10-024-17/+24
| | | | | | | | | | | | | Older Book-E cores, such as the PPC 440, support only msync (which has the same encoding as sync 0), but not any of the other sync forms. Newer Book-E cores, however, do support sync, and for performance reasons we should allow the use of the more-general form. This refactors msync use into its own feature group so that it applies by default only to older Book-E cores (of the relevant cores, we only have definitions for the PPC440/450 currently). llvm-svn: 218923
* [Power] Improve the expansion of atomic loads/storesRobin Morisset2014-10-023-4/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Atomic loads and store of up to the native size (32 bits, or 64 for PPC64) can be lowered to a simple load or store instruction (as the synchronization is already handled by AtomicExpand, and the atomicity is guaranteed thanks to the alignment requirements of atomic accesses). This is exactly what this patch does. Previously, these were implemented by complex load-linked/store-conditional loops.. an obvious performance problem. For example, this patch turns ``` define void @store_i8_unordered(i8* %mem) { store atomic i8 42, i8* %mem unordered, align 1 ret void } ``` from ``` _store_i8_unordered: ; @store_i8_unordered ; BB#0: rlwinm r2, r3, 3, 27, 28 li r4, 42 xori r5, r2, 24 rlwinm r2, r3, 0, 0, 29 li r3, 255 slw r4, r4, r5 slw r3, r3, r5 and r4, r4, r3 LBB4_1: ; =>This Inner Loop Header: Depth=1 lwarx r5, 0, r2 andc r5, r5, r3 or r5, r4, r5 stwcx. r5, 0, r2 bne cr0, LBB4_1 ; BB#2: blr ``` into ``` _store_i8_unordered: ; @store_i8_unordered ; BB#0: li r2, 42 stb r2, 0(r3) blr ``` which looks like a pretty clear win to me. Test Plan: fixed the tests + new test for indexed accesses + make check-all Reviewers: jfb, wschmidt, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5587 llvm-svn: 218922
* Fix the threshold added in r186434 (a re-apply of r185393) and updaatedChandler Carruth2014-10-022-31/+27
| | | | | | | | | | | | | | | | | to be a ManagedStatic in r218163 to not be a global variable written and read to from within the innards of SpillPlacement. This will fix a really scary race condition for anyone that has two copies of LLVM running spill placement concurrently. Yikes! This will also fix a really significant compile time hit that r218163 caused because the spill placement threshold read is actually in the *very* hot path of this code. The memory fence on each read was showing up as huge compile time regressions when spilling is responsible for most of the compile time. For example, optimizing sanitized code showed over 50% compile time regressions here. =/ llvm-svn: 218921
* [Stackmaps] Make ithe frame-pointer required for stackmaps.Juergen Ributzka2014-10-022-2/+4
| | | | | | | | | Do not eliminate the frame pointer if there is a stackmap or patchpoint in the function. All stackmap references should be FP relative. This fixes PR21107. llvm-svn: 218920
* Revert "DI: Fold constant arguments into a single MDString"Duncan P. N. Exon Smith2014-10-027-528/+608
| | | | | | This reverts commit r218914 while I investigate some bots. llvm-svn: 218918
* llvm-readobj: print COFF imported symbolsRui Ueyama2014-10-021-0/+90
| | | | | | | | This patch defines a new iterator for the imported symbols. Make a change to COFFDumper to use that iterator to print out imported symbols and its ordinals. llvm-svn: 218915
* DI: Fold constant arguments into a single MDStringDuncan P. N. Exon Smith2014-10-027-608/+528
| | | | | | | | | | | | | This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 218914
* [x86] Teach the new vector shuffle lowering to widen floating pointChandler Carruth2014-10-022-8/+19
| | | | | | | | | | | | | | | | | | | elements as well as integer elements in order to form simpler shuffle patterns. This is the primary reason why we were failing to match some of the 2-and-2 floating point shuffles such as PR21140. Even after fixing this we need to support some extra patterns in the backend in order to match the resulting X86ISD::UNPCKL nodes into the correct instructions. This commit should fix PR21140 and includes more comprehensive testing of insertion patterns in v4 shuffles. Not all of the added tests are beautiful. For example, we don't have clever instructions to insert-via-load in the integer domain. There are also some places where we aren't sufficiently cunning with our use of movq and movd, but that's future work. llvm-svn: 218911
* LTO: Document the Boolean argument from r218784Duncan P. N. Exon Smith2014-10-021-1/+2
| | | | llvm-svn: 218907
* Optimize square root squared (PR21126).Sanjay Patel2014-10-021-0/+5
| | | | | | | | | | | When unsafe-fp-math is enabled, we can turn sqrt(X) * sqrt(X) into X. This can happen in the real world when calculating x ** 3/2. This occurs in test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c. Differential Revision: http://reviews.llvm.org/D5584 llvm-svn: 218906
* InstrProf: Avoid linear search in a hot loopJustin Bogner2014-10-021-5/+6
| | | | | | | | | | Every time we were adding or removing an expression when generating a coverage mapping we were doing a linear search to try and deduplicate the list. The indices in the list are important, so we can't just replace it by a DenseMap entirely, but an auxilliary DenseMap for fast lookup massively improves the performance issues I was seeing here. llvm-svn: 218892
* This patch adds a new flag "-coff-imports" to llvm-readobj.Rui Ueyama2014-10-021-5/+18
| | | | | | | | | | | | | | When the flag is given, the command prints out the COFF import table. Currently only the import table directory will be printed. I'm going to make another patch to print out the imported symbols. The implementation of import directory entry iterator in COFFObjectFile.cpp was buggy. This patch fixes that too. http://reviews.llvm.org/D5569 llvm-svn: 218891
* Reapply "InstrProf: Don't keep a large sparse list around just to zero it"Justin Bogner2014-10-021-24/+43
| | | | | | | | | | When I was preparing r218879 for commit, I removed an early return that I decided was just noise. It wasn't. This is r218879 no-crash edition. This reverts commit r218881, reapplying r218879. llvm-svn: 218887
* Remove an extra whitespace.Adrian Prantl2014-10-021-1/+1
| | | | llvm-svn: 218886
* Pretty-printer: Paper over an ambiguity between line table entriesAdrian Prantl2014-10-021-1/+3
| | | | | | | | and tagged mdnodes. fixes http://llvm.org/bugs/show_bug.cgi?id=21131 llvm-svn: 218885
* Revert "InstrProf: Don't keep a large sparse list around just to zero it"Justin Bogner2014-10-021-38/+24
| | | | | | | | This seems to be crashing on some buildbots. Reverting to investigate. This reverts commit r218879. llvm-svn: 218881
* InstrProf: Don't keep a large sparse list around just to zero itJustin Bogner2014-10-021-24/+38
| | | | | | | | | | | | | | | | | | | | The Terms vector here represented a polynomial of of all possible counters, and is used to simplify expressions when generating coverage mapping. There are a few problems with this: 1. Keeping the vector as a member is wasteful, since we clear it every time we use it. 2. Most expressions refer to a subset of the counters, so we end up iterating over a large number of zeros doing nothing a lot of the time. This updates the user of the vector to store the terms locally, and uses a sort and combine approach so that we only operate on counters that are actually used in a given expression. For small cases this makes very little difference, but in cases with a very large number of counted regions this is a significant performance fix. llvm-svn: 218879
* Use the local variable that other clauses around here are already using.Sanjay Patel2014-10-021-1/+1
| | | | llvm-svn: 218876
* Remove duplicate function names from comments. NFC.Sanjay Patel2014-10-021-43/+35
| | | | llvm-svn: 218875
* [NVPTX] Remove dead code.Tilmann Scheller2014-10-021-9/+3
| | | | | | Found by the Clang static analyzer. llvm-svn: 218874
* Support padding unaligned data in .text.Joerg Sonnenberger2014-10-021-1/+6
| | | | llvm-svn: 218870
OpenPOWER on IntegriCloud