path: root/llvm
Commit message | Author | Age | Files | Lines
...
* R600/SI: Implement VGPR register spilling for compute at -O0 v3 | Tom Stellard | 2014-09-24 | 8 | -48/+332
  VGPRs are spilled to LDS. This still needs more testing, but we need to at least enable it at -O0, because the fast register allocator spills all registers that are live at the end of blocks and without this some future commits will break the flat-address-space.ll test.
  v2: Only calculate thread id once
  v3: Move insertion of spill instructions to SIRegisterInfo::eliminateFrameIndex()
  llvm-svn: 218348
* [x86] Teach the new vector shuffle lowering to lower v8i32 shuffles with | Chandler Carruth | 2014-09-24 | 2 | -272/+646
  the native AVX2 instructions. Note that the test case is really frustrating here because VPERMD requires the mask to be in the register input and we don't produce a comment looking through that to the constant pool. I'm going to attempt to improve this in a subsequent commit, but not sure if I will succeed.
  llvm-svn: 218347
* [x86] Fix a really terrible bug in the repeated 128-bit-lane shuffle | Chandler Carruth | 2014-09-24 | 2 | -18/+41
  detection. It was incorrectly handling undef lanes by actually treating an undef lane in the first 128-bit lane as a *numeric* shuffle value. Fortunately, this almost always DTRT and disabled detecting repeated patterns. But not always. =/ This patch introduces a much more principled approach and fixes the miscompiles I spotted by inspection previously.
  llvm-svn: 218346
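The undef handling described in this commit can be illustrated with a small standalone model. This is a sketch only, not the code from the patch: the function name `repeatedLaneMask`, the `kUndef` constant, and the single-input restriction are assumptions made for illustration. The idea is that an undef entry constrains nothing, so it is skipped instead of being compared as if it were a number.

```cpp
#include <vector>

// Hypothetical sentinel for an undef mask entry (assumption for this sketch).
constexpr int kUndef = -1;

// Returns true if every lane of `mask` applies the same in-lane pattern,
// and writes that pattern to `repeated`. Single-input shuffles only:
// entries are kUndef or an index in [0, mask.size()).
// On failure the contents of `repeated` are unspecified.
bool repeatedLaneMask(const std::vector<int> &mask, int laneSize,
                      std::vector<int> &repeated) {
  const int size = static_cast<int>(mask.size());
  if (laneSize <= 0 || size % laneSize != 0)
    return false;

  repeated.assign(laneSize, kUndef);
  for (int i = 0; i < size; ++i) {
    const int m = mask[i];
    if (m == kUndef)
      continue; // Undef matches anything; never treat it as a number.
    if (m < 0 || m >= size)
      return false; // Out of range for a single-input shuffle.
    if (m / laneSize != i / laneSize)
      return false; // Element would cross a lane boundary.

    const int local = m % laneSize;
    int &slot = repeated[i % laneSize];
    if (slot == kUndef)
      slot = local; // First defined value seen for this position.
    else if (slot != local)
      return false; // Lanes disagree on this position.
  }
  return true;
}
```

With a lane size of 4, the mask `{-1, -1, -1, -1, 7, 6, 5, 4}` is accepted with the repeated pattern `{3, 2, 1, 0}`: the first lane is entirely undef, so the second lane alone determines the pattern.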
* Fix swift-atomics testcase | Robin Morisset | 2014-09-23 | 1 | -0/+6
  This testcase was not testing what it meant: because there were only two checks for dmb {{ish}} in the second function, it could have missed a bug where one of the three required dmb {{ish}} became dmb {{ishst}}. As I was fixing it, I also added CHECK-LABELs to make it a bit less brittle.
  llvm-svn: 218341
* [x86] Teach the new vector shuffle lowering to lower v4i64 vector | Chandler Carruth | 2014-09-23 | 3 | -139/+311
  shuffles using the AVX2 instructions. This is the first step of cutting in real AVX2 support. Note that I have spotted at least one bug in the test cases already, but I suspect it was already present and just is getting surfaced. Will investigate next.
  llvm-svn: 218338
* GlobalOpt: Preserve comdats of unoptimized initializers | Reid Kleckner | 2014-09-23 | 2 | -45/+63
  Rather than slurping in and splatting out the whole ctor list, preserve the existing array entries without trying to understand them. Only remove the entries that we know we can optimize away. This way we don't need to wire through priority and comdats or anything else we might add.
  Fixes a linker issue where the .init_array or .ctors entry would point to discarded initialization code if the comdat group from the TU with the faulty global_ctors entry was dropped.
  llvm-svn: 218337
* AArch64: allow constant expressions for shifted reg literals | Jim Grosbach | 2014-09-23 | 2 | -6/+19
  e.g., add w1, w2, w3, lsl #(2 - 1)
  This sort of thing comes up in pre-processed assembly playing macro games. Still validate that it's an assembly time constant. The early exit error check was just a bit overzealous and disallowed a left paren.
  rdar://18430542
  llvm-svn: 218336
* [x86] Teach the rest of the 'target shuffle' machinery about blends and | Chandler Carruth | 2014-09-23 | 4 | -13/+42
  add VPBLENDD to the InstPrinter's comment generation so we get nice comments everywhere. Now that we have the nice comments, I can see the bug introduced by a silly typo in the commit that enabled VPBLENDD, and have fixed it. Yay tests that are easy to inspect.
  llvm-svn: 218335
* R600/SI: Clean up checks for legality of immediate operands | Tom Stellard | 2014-09-23 | 8 | -67/+149
  There are new register classes VCSrc_* which represent operands that can take an SGPR, VGPR or inline constant. The VSrc_* class is now used to represent operands that can take an SGPR, VGPR, or a 32-bit immediate. This allows us to have more accurate checks for legality of immediates, since before we had no way to distinguish between operands that supported any 32-bit immediate and operands which could only support inline constants.
  llvm-svn: 218334
* [X86] Make wide loads be managed by AtomicExpand | Robin Morisset | 2014-09-23 | 3 | -32/+41
  Summary: AtomicExpand already had logic for expanding wide loads and stores on LL/SC architectures, and for expanding wide stores on CmpXchg architectures, but not for wide loads on CmpXchg architectures. This patch fills this hole, and makes use of this new feature in the X86 backend. Only one functional change: we now lose the SynchScope attribute. It is regrettable, but I have another patch that I will submit soon that will solve this for all of AtomicExpand (it seemed better to split it apart as it is a different concern).
  Test Plan: make check-all (lots of tests for this functionality already exist)
  Reviewers: jfb
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5404
  llvm-svn: 218332
* [Power] Use AtomicExpandPass for fence insertion, and use lwsync where appropriate | Robin Morisset | 2014-09-23 | 6 | -3/+178
  Summary: This patch makes use of AtomicExpandPass in Power for inserting fences around atomic as part of an effort to remove fence insertion from SelectionDAGBuilder. As a big bonus, it lets us use sync 1 (lightweight sync, often used by the mnemonic lwsync) instead of sync 0 (heavyweight sync) in many cases. I also added a test, as there was no test for the barriers emitted by the Power backend for atomic loads and stores.
  Test Plan: new test + make check-all
  Reviewers: jfb
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5180
  llvm-svn: 218331
* Add AtomicExpandPass::bracketInstWithFences, and use it whenever getInsertFencesForAtomic would trigger in SelectionDAGBuilder | Robin Morisset | 2014-09-23 | 4 | -56/+96
  Summary: The goal is to eventually remove all the code related to getInsertFencesForAtomic in SelectionDAGBuilder as it is wrong (designed for ARM, not really portable, works mostly by accident because the backends are overly conservative), and repeats the same logic that goes in emitLeading/TrailingFence. In this patch, I make AtomicExpandPass insert the fences as it knows better where to put them. Because this requires getting the fences and not just passing an IRBuilder around, I had to change the return type of emitLeading/TrailingFence. This code only triggers on ARM for now. Because it is earlier in the pipeline than SelectionDAGBuilder, it triggers and lowers atomic accesses to atomic so SelectionDAGBuilder does not add barriers anymore on ARM. If this patch is accepted I plan to implement emitLeading/TrailingFence for all backends that setInsertFencesForAtomic(true), which will allow both making them less conservative and simplifying SelectionDAGBuilder once they are all using this interface. This should not cause any functional change so the existing tests are used and not modified.
  Test Plan: make check-all, benefits from existing tests of atomics on ARM
  Reviewers: jfb, t.p.northover
  Subscribers: aemerson, llvm-commits
  Differential Revision: http://reviews.llvm.org/D5179
  llvm-svn: 218329
* [MCJIT] Fix some more RuntimeDyld debugging output format specifiers. | Lang Hames | 2014-09-23 | 1 | -3/+3
  llvm-svn: 218328
* [MCJIT] Remove PPCRelocations.h - it's no longer used. | Lang Hames | 2014-09-23 | 1 | -56/+0
  This was overlooked in r218320, which removed the relocation headers for other targets. Thanks to Ulrich Weigand for catching it.
  llvm-svn: 218327
* Just add a fixme about a possibly faster implementation of some atomic loads on some ARM processors | Robin Morisset | 2014-09-23 | 1 | -0/+3
  llvm-svn: 218326
* Fix typo | Matt Arsenault | 2014-09-23 | 1 | -2/+3
  llvm-svn: 218324
* [x86] Teach the new shuffle lowering's blend functionality to use AVX2's | Chandler Carruth | 2014-09-23 | 3 | -28/+47
  VPBLENDD where appropriate even on 128-bit vectors. According to Agner's tables, this instruction is significantly higher throughput (can execute on any port) on Haswell chips so we should aggressively try to form it when available. Sadly, this loses our delightful shuffle comments. I'll add those back for VPBLENDD next.
  llvm-svn: 218322
* [MCJIT] Nuke MachineRelocation and MachineCodeEmitter. Now that the old JIT is | Lang Hames | 2014-09-23 | 8 | -902/+0
  gone they're no longer needed.
  llvm-svn: 218320
* [docs] Fixed a typo in Atomics.rst | Jingyue Wu | 2014-09-23 | 1 | -4/+4
  llvm-svn: 218319
* [MCJIT] Remove a few more references to JITMemoryManager that survived r218316. | Lang Hames | 2014-09-23 | 4 | -10/+1
  llvm-svn: 218318
* [MCJIT] Remove #include of JITMemoryManager that accidentally survived r218316. | Lang Hames | 2014-09-23 | 1 | -1/+0
  llvm-svn: 218317
* [MCJIT] Delete the JITMemoryManager and associated APIs. | Lang Hames | 2014-09-23 | 10 | -1153/+15
  This patch removes the old JIT memory manager (which does not provide any useful functionality now that the old JIT is gone), and migrates the few remaining clients over to SectionMemoryManager.
  http://llvm.org/PR20848
  llvm-svn: 218316
* Use SDValue bool operator to reduce code. No functional change. | Sanjay Patel | 2014-09-23 | 1 | -9/+6
  llvm-svn: 218314
* Fix segfault in AArch64 backend with -g and -mbig-endian | Oliver Stannard | 2014-09-23 | 2 | -2/+24
  Fix a null pointer dereference when trying to swap the endianness of fixups in the .eh_frame section in the AArch64 backend.
  llvm-svn: 218311
* Rework r218304, "ExecutionEngineTests: Call llvm_shutdown() on exit for ManagedStatic introduced in r218151." | NAKAMURA Takumi | 2014-09-23 | 1 | -2/+3
  r218304 caused crash on msvc builder.
  llvm-svn: 218308
* valgrind/x86_64-pc-linux-gnu.supp: We don't care if sed leaks. | NAKAMURA Takumi | 2014-09-23 | 1 | -0/+6
  llvm-svn: 218307
* Fix a small typo in the test comment | Timur Iskhodzhanov | 2014-09-23 | 1 | -4/+4
  llvm-svn: 218306
* Loop instead of individual def's for each GPR. | Sid Manning | 2014-09-23 | 1 | -32/+3
  Differential Revision: http://reviews.llvm.org/D5450
  llvm-svn: 218305
* ExecutionEngineTests: Call llvm_shutdown() on exit for ManagedStatic introduced in r218151. | NAKAMURA Takumi | 2014-09-23 | 1 | -0/+3
  llvm-svn: 218304
* Rebuild the inputs for the codeview-linetables.test with VS2013 | Timur Iskhodzhanov | 2014-09-23 | 9 | -4/+38
  Also provide reproducible instructions
  llvm-svn: 218303
* Do not destroy external linkage when deleting function body | Petar Jovanovic | 2014-09-23 | 2 | -1/+25
  The function deleteBody() converts the linkage to external and thus destroys original linkage type value. Lack of correct linkage type causes wrong relocations to be emitted later. Calling dropAllReferences() instead of deleteBody() will fix the issue.
  Differential Revision: http://reviews.llvm.org/D5415
  llvm-svn: 218302
* [x86] Teach the vector comment parsing and printing to correctly handle | Chandler Carruth | 2014-09-23 | 7 | -70/+127
  undef in the shuffle mask. This shows up when we're printing comments during lowering and we still have an IR-level constant hanging around that models undef.
  A nice consequence of this is *much* prettier test cases where the undef lanes actually show up as undef rather than as a particular set of values. This also allows us to print shuffle comments in cases that use undef such as the recently added variable VPERMILPS lowering. Now those test cases have nice shuffle comments attached with their details.
  The shuffle lowering for PSHUFB has been augmented to use undef, and the shuffle combining has been augmented to comprehend it.
  llvm-svn: 218301
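As a rough illustration of what "undef lanes actually show up as undef" means for a printed comment, the sketch below formats a single-input shuffle mask and prints `u` for undef entries. The function name `formatShuffleMask`, the use of a negative value as the undef sentinel, and the exact output layout are assumptions for this example; they are not taken from the InstPrinter code touched by the commit.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Formats a single-input shuffle mask as "src[a,b,c,...]".
// Any negative entry is treated as undef and printed as "u"
// (sentinel choice is an assumption for this sketch).
std::string formatShuffleMask(const std::string &src,
                              const std::vector<int> &mask) {
  std::string out = src + "[";
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (i != 0)
      out += ",";
    out += mask[i] < 0 ? std::string("u") : std::to_string(mask[i]);
  }
  out += "]";
  return out;
}
```

For example, `formatShuffleMask("xmm0", {0, -1, 2, 3})` yields `xmm0[0,u,2,3]`, so a reader of the test output can see which lane is unconstrained instead of an arbitrary index.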
* [x86] Teach the AVX1 path of the new vector shuffle lowering one more | Chandler Carruth | 2014-09-23 | 8 | -121/+116
  trick that I missed. VPERMILPS has a non-immediate memory operand mode that allows it to do asymmetric shuffles in the two 128-bit lanes. Use this rather than two shuffles and a blend.
  However, it turns out the variable shuffle path to VPERMILPS (and VPERMILPD, although that one offers no functional difference from the immediate operand other than variability) wasn't even plumbed through codegen. Do such plumbing so that we can reasonably emit a variable-masked VPERMILP instruction. Also plumb basic comment parsing and printing through so that the tests are reasonable.
  There are still a few tests which don't show the shuffle pattern. These are tests with undef lanes. I'll teach the shuffle decoding and printing to handle undef mask entries in a follow-up. I've looked at the masks and they seem reasonable.
  llvm-svn: 218300
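To make the "asymmetric shuffles in the two 128-bit lanes" point concrete, here is a scalar model of the variable-control form of VPERMILPS on a 256-bit vector of eight floats. It is a sketch of the instruction's documented semantics, not LLVM code; the function name `permilpsVar` is made up for this example. Each destination element picks a source element from its own 128-bit lane using the low two bits of the matching control element, so the two lanes can use different patterns.

```cpp
#include <array>
#include <cstdint>

// Scalar model of 256-bit VPERMILPS with a variable control vector.
// dst[i] takes the element of src in the same 128-bit lane (4 floats)
// selected by the low 2 bits of ctrl[i].
std::array<float, 8> permilpsVar(const std::array<float, 8> &src,
                                 const std::array<std::uint32_t, 8> &ctrl) {
  std::array<float, 8> dst{};
  for (unsigned i = 0; i < 8; ++i) {
    const unsigned laneBase = i & ~3u;     // 0 for lane 0, 4 for lane 1
    const unsigned select = ctrl[i] & 3u;  // in-lane element index
    dst[i] = src[laneBase + select];
  }
  return dst;
}
```

With source elements 0 through 7 and control `{3, 2, 1, 0, 0, 0, 1, 1}`, the low lane is reversed while the high lane becomes `{4, 4, 5, 5}`. The immediate form applies one 8-bit pattern to both lanes, which is why a case like this would otherwise need two shuffles and a blend.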
* Ensure bitcode encoding stays stable. | Michael Kuperstein | 2014-09-23 | 12 | -56/+388
  This includes constants, attributes, and some additional instructions not covered by previous tests. Work was done by lama.saba@intel.com.
  llvm-svn: 218297
* [ADT/IntrusiveRefCntPtr] Give friend access to IntrusiveRefCntPtr<X> so the relevant move constructor can access 'Obj'. | Argyrios Kyrtzidis | 2014-09-23 | 1 | -0/+3
  llvm-svn: 218295
* Windows/DynamicLibrary.inc: Remove 'extern "C"' in ELM_Callback. | NAKAMURA Takumi | 2014-09-23 | 1 | -1/+1
  'extern "C" static' is not accepted by g++-4.7. Rather than tweak it, I just removed 'extern "C"', since it doesn't affect the ABI.
  llvm-svn: 218290
* tighten up checks | Sanjay Patel | 2014-09-22 | 1 | -12/+12
  We manage to generate all of the matching instructions (and a lot more) via the reciprocal optimization function - even if we completely remove the square root optimization. With CHECK-NEXT, we assure that we're executing the expected square root optimization paths and not generating extra insts.
  llvm-svn: 218284
* Converting terminalHasColors mutex to a global ManagedStatic to avoid the static destructor. | Chris Bieneman | 2014-09-22 | 1 | -2/+4
  llvm-svn: 218283
* [x86] Rename X86ISD::VPERMILP to X86ISD::VPERMILPI (and the same for the | Chandler Carruth | 2014-09-22 | 5 | -30/+30
  td pattern). Currently we only model the immediate operand variation of VPERMILPS and VPERMILPD, we should make that clear in the pseudos used. Will be adding support for the variable mask variant in my next commit.
  llvm-svn: 218282
* Fix a "typo" from my previous commit. | Kaelyn Takata | 2014-09-22 | 1 | -1/+1
  llvm-svn: 218281
* Silence unused variable warnings in the new stub functions that occur | Kaelyn Takata | 2014-09-22 | 1 | -1/+3
  when assertions are disabled.
  llvm-svn: 218280
* remove unnecessary labels; NFC | Sanjay Patel | 2014-09-22 | 1 | -11/+0
  llvm-svn: 218278
* [x86] Stub out the integer lowering of 256-bit vectors with AVX2 | Chandler Carruth | 2014-09-22 | 1 | -4/+87
  support. No interesting functionality yet, but this will let me implement one vector type at a time.
  llvm-svn: 218277
* In this callback ModuleName includes the file path. | Yaron Keren | 2014-09-22 | 1 | -26/+5
  Comparing ModuleName to the file names listed will always fail. I wonder how this code ever worked and what its purpose was. Why exclude the msvc runtime DLLs but not exclude all Windows system DLLs? Anyhow, it does not function as intended.
  clang-formatted as well.
  llvm-svn: 218276
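The failure mode described here, comparing a full path against a bare file name, can be shown in a few lines. This is an illustration only and not what the commit does (the commit message does not describe stripping the path). The helper `baseName` and the example path are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the component after the last '\' or '/'.
std::string baseName(const std::string &path) {
  const std::size_t pos = path.find_last_of("\\/");
  return pos == std::string::npos ? path : path.substr(pos + 1);
}

// A direct equality test against a bare file name can never succeed
// when the module name carries a directory prefix.
bool matchesByFullName(const std::string &moduleName,
                       const std::string &fileName) {
  return moduleName == fileName;
}

// Comparing only the final path component would match.
bool matchesByBaseName(const std::string &moduleName,
                       const std::string &fileName) {
  return baseName(moduleName) == fileName;
}
```

With a module name such as `C:\Windows\System32\msvcr120.dll` (a made-up example), `matchesByFullName` against `msvcr120.dll` is false while `matchesByBaseName` is true. This sketch is case-sensitive, which a real Windows file-name comparison would likely not be.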
* [FastISel][AArch64] Also allow folding of sign-/zero-extend and shift-left for booleans (i1). | Juergen Ributzka | 2014-09-22 | 2 | -2/+51
  Shift-left immediate with sign-/zero-extensions also works for boolean values. Update the assert and the test cases to reflect that fact.
  This should fix a bug found by Chad.
  llvm-svn: 218275
* ms-inline-asm: Fix parsing label names inside bracket expressions | Ehsan Akhgari | 2014-09-22 | 3 | -20/+17
  Summary: This fixes a couple of issues. One is ensuring that AOK_Label rewrite rules have a lower priority than AOK_Skip rules, as AOK_Skip needs to be able to skip the brackets properly. The other part of the fix ensures that we don't overwrite Identifier when looking up the identifier, and that we use the locally available information to generate the AOK_Label rewrite in ParseIntelIdentifier. Doing that in CreateMemForInlineAsm would be problematic since the Start location there may point to the beginning of a bracket expression, and not necessarily the beginning of an identifier. This also means that we don't need to carry around the InternalName field, which helps simplify the code.
  Test Plan: This will be tested on the clang side.
  Reviewers: rnk
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5445
  llvm-svn: 218270
* MC: ReadOnlyWithRel section kinds should map to rdata in COFF | David Majnemer | 2014-09-22 | 2 | -4/+7
  Don't consider ReadOnlyWithRel as a writable section in COFF, they really belong in .rdata.
  llvm-svn: 218268
* [x86] Introduce tests covering the gamut of 256-bit vector shuffling. | Chandler Carruth | 2014-09-22 | 3 | -0/+3641
  These are just test cases, no actual code yet. This establishes the baseline fallback strategy we're starting from on AVX2 and the expected lowering we use on AVX1.
  Also, these test cases are very much generated. I've manually crafted the specific pattern set that I'm hoping will be useful at exercising the lowering code, but I've not (and could not) manually verify *all* of these. I've spot checked and they seem legit to me.
  As with the rest of vector shuffling, at a certain point the only really useful way to check the correctness of this stuff is through fuzz testing.
  llvm-svn: 218267
* Make MCAsmParserSemaCallback::LookupInlineAsmLabel a pure virtual function | Ehsan Akhgari | 2014-09-22 | 1 | -2/+1
  Summary: r218229 made this function return a dummy nullptr in order to avoid API breakage between clang/llvm.
  Reviewers: rnk
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5432
  llvm-svn: 218266
* Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2). | Sanjay Patel | 2014-09-22 | 3 | -7/+174
  We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors. This patch should preserve all existing behavior with regular optimization levels, but also use splats whenever possible when optimizing for *size* on any CPU with AVX or AVX2.
  The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save at least 8 bytes (up to 31 bytes) of constant pool data.
  Differential Revision: http://reviews.llvm.org/D5347
  llvm-svn: 218263
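The size tradeoff quoted above can be checked with simple arithmetic. The sketch below assumes that the constant pool entry shrinks from the full vector width to a single element when a broadcast is used; that assumption is consistent with the 8-byte and 31-byte figures in the message but is not spelled out there. The function names are made up for this example.

```cpp
// Bytes of constant pool data saved when a splat vector of `vectorBytes`
// is replaced by one element of `elementBytes` plus a broadcast
// (assumption: the pool then holds only the single element).
constexpr int poolBytesSaved(int vectorBytes, int elementBytes) {
  return vectorBytes - elementBytes;
}

// Net size change after paying `extraInstrBytes` for the longer
// broadcast encoding (the message quotes up to 5 extra bytes).
constexpr int netBytesSaved(int vectorBytes, int elementBytes,
                            int extraInstrBytes) {
  return poolBytesSaved(vectorBytes, elementBytes) - extraInstrBytes;
}

// Lower bound from the message: 128-bit vector, 64-bit element.
static_assert(poolBytesSaved(16, 8) == 8, "minimum pool saving");
// Upper bound from the message: 256-bit vector, 8-bit element.
static_assert(poolBytesSaved(32, 1) == 31, "maximum pool saving");
```

Under these assumptions, even the least favorable case with the full 5-byte encoding penalty still comes out ahead by 3 bytes, and the most favorable one by 26 bytes.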