summaryrefslogtreecommitdiffstats
path: root/llvm/test
Commit message (Collapse)AuthorAgeFilesLines
* [ARM] GlobalISel: Allow i8 and i16 addsDiana Picus2016-12-193-5/+122
| | | | | | | | | Teach the instruction selector and legalizer that it's ok to have adds with 8 or 16-bit integers. This is the second part of https://reviews.llvm.org/D27704 llvm-svn: 290105
* [ARM] GlobalISel: Select i8 and i16 copiesDiana Picus2016-12-191-3/+60
| | | | | | | | | Teach the instruction selector that it's ok to copy small values from physical registers. First part of https://reviews.llvm.org/D27704 llvm-svn: 290104
* [ARM] GlobalISel: Lower more than 4 argumentsDiana Picus2016-12-192-0/+28
| | | | | | | | | | This adds support for lowering more than 4 arguments (although still i32 only). It uses the handleAssignments / ValueHandler infrastructure extracted from the AArch64 backend in r288658. Differential Revision: https://reviews.llvm.org/D27195 llvm-svn: 290098
* AMDGPU: [AMDGPU] Assembler: add .hsa_code_object_metadata directive for ↵Sam Kolton2016-12-192-1/+57
| | | | | | | | | | | | | | | | | | | | | | | | functime metadata V2.0 Summary: Added pair of directives .hsa_code_object_metadata/.end_hsa_code_object_metadata. Between them user can put YAML string that would be directly put to the generated note. E.g.: ''' .hsa_code_object_metadata { amd.MDVersion: [ 2, 0 ] } .end_hsa_code_object_metadata ''' Based on D25046 Reviewers: vpykhtin, nhaustov, yaxunl, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, mgorny, tony-tye Differential Revision: https://reviews.llvm.org/D27619 llvm-svn: 290097
* [ARM] GlobalISel: Support loading from the stackDiana Picus2016-12-192-0/+62
| | | | | | | | | | Add support for selecting simple G_LOAD and G_FRAME_INDEX instructions (32-bit scalars only). This will be useful for functions that need to pass arguments on the stack. First part of https://reviews.llvm.org/D27195. llvm-svn: 290096
* [XRay] Fix assertion failure on empty machine basic blocks (PR 31424)Dean Michael Berris2016-12-192-0/+36
| | | | | | | | | | | | | | | | | | | | The original version of the code in XRayInstrumentation.cpp assumed that functions may not have empty machine basic blocks (or that the first one couldn't be). This change addresses that by special-casing that specific situation. We provide two .mir test-cases to make sure we're handling this appropriately. Fixes llvm.org/PR31424. Reviewers: chandlerc Subscribers: varno, llvm-commits Differential Revision: https://reviews.llvm.org/D27913 llvm-svn: 290091
* Add files I seem to have dropped in my revert (r290086).Daniel Jasper2016-12-191-0/+22
| | | | | | Sorry! llvm-svn: 290087
* Revert @llvm.assume with operator bundles (r289755-r289757)Daniel Jasper2016-12-1912-42/+42
| | | | | | | This creates non-linear behavior in the inliner (see more details in r289755's commit thread). llvm-svn: 290086
* [FileCheck] Fix --strict-whitespace --match-full-lines -- add test-caseTom de Vries2016-12-181-0/+14
| | | | | | | Add test-case that was missing in "[FileCheck] Fix --strict-whitespace --match-full-lines" commit. llvm-svn: 290070
* [InstCombine] use commutative matchers for patterns with commutative operatorsSanjay Patel2016-12-183-51/+13
| | | | | | | | | | | | | | | | | | | | | | | | Background/motivation - I was circling back around to: https://llvm.org/bugs/show_bug.cgi?id=28296 I made a simple patch for that and noticed some regressions, so added test cases for those with rL281055, and this is hopefully the minimal fix for just those cases. But as you can see from the surrounding untouched folds, we are missing commuted patterns all over the place, and of course there are no regression tests to cover any of those cases. We could sprinkle "m_c_" dust all over this file and catch most of the missing folds, but then we still wouldn't have test coverage, and we'd still miss some fraction of commuted patterns because they require adjustments to the match order. I'm aware of the concern about the potential compile-time performance impact of adding matches like this (currently being discussed on llvm-dev), but I don't think there's any evidence yet to suggest that handling commutative pattern matching more thoroughly is not a worthwhile goal of InstCombine. Differential Revision: https://reviews.llvm.org/D24419 llvm-svn: 290067
* Revert r289955 and r289962. This is causing lots of ASAN failures for us.Daniel Jasper2016-12-181-41/+0
| | | | | | | | Not sure whether it causes and ASAN false positive or whether it actually leads to incorrect code or whether it even exposes bad code. Hans, I'll get you instructions to reproduce this. llvm-svn: 290066
* [X86] [AVX512] Minor fix in encoding of scalar EVEX instructions. NFC.Michael Zuckerman2016-12-182-36/+36
| | | | | | | | | | | | Commit on behalf of Gadi Haber Removed EVEX_V512 prefix from scalar EVEX instructions since HW ignores L'L bits anyway (LIG). 4 instructions are modified. The changed encodings are validated with XED. Rviewers: delena, igorb Differential revision: https://reviews.llvm.org/D27802 llvm-svn: 290065
* [X86][SSE] Add support for combining target shuffles to SHUFPS.Simon Pilgrim2016-12-1811-113/+93
| | | | | | As discussed on D27692, the next step will be to allow cross-domain shuffles once the combined shuffle depth passes a certain point. llvm-svn: 290064
* [X86][SSE][AVX-512] Convert FAND/FOR/FXOR/FANDN nodes to integer operations ↵Craig Topper2016-12-184-45/+58
| | | | | | | | | | | | if they are available. This will allow a bunch of patterns to be removed. These nodes are only emitted for lowering FABS/FNEG/FNABS/FCOPYSIGN. Ideally we just wouldn't create these nodes if SSE2 or higher is available, but it was simple to just convert them in DAG combine. For SSE2, AVX, and AVX512 with DQI this is no functional change as the execution domain fixing pass ensures the right domain is selected regardless of the ISD opcode. For AVX-512 without DQI we end up using integer instructions since the floating point versions aren't available. But we were already doing that for any logical operations in code that didn't come from FABS/FNEG/FNABS/FCOPYSIGN so this seems no worse. And we get the benefit of being able to fold broadcasts now. llvm-svn: 290060
* [AVX-512] Use EVEX encoded XOR instruction for zeroing scalar registers when ↵Craig Topper2016-12-181-1/+25
| | | | | | | | DQI and VLX instructions are available. This can give the register allocator more registers to use. llvm-svn: 290057
* [AVX-512] Make sure VLX is also enabled before using EVEX encoded logic ops ↵Craig Topper2016-12-181-1/+1
| | | | | | for scalars. I missed this in r290049. llvm-svn: 290055
* AMDGPU: Fix broken check prefix in testMatt Arsenault2016-12-171-10/+7
| | | | llvm-svn: 290050
* [AVX-512] Use EVEX encoded logic operations for scalar types when they are ↵Craig Topper2016-12-171-4/+4
| | | | | | available. This gives the register allocator more registers to work with. llvm-svn: 290049
* [AVX-512] Update scalar logic test to show missed opportunity to use EVEX ↵Craig Topper2016-12-171-19/+40
| | | | | | encoded logic instructions to get more registers to use. llvm-svn: 290048
* Revert "AArch64CollectLOH: Rewrite as block-local analysis."Matthias Braun2016-12-174-194/+9
| | | | | | | | It is still breaking Chrome. http://llvm.org/PR31361 This reverts commit r290026. llvm-svn: 290047
* Move test to correct directoryMatthias Braun2016-12-171-0/+0
| | | | | | See also test/CodeGen/MIR/README llvm-svn: 290032
* Revert "[GVNHoist] Move GVNHoist to function simplification part of pipeline."Evgeniy Stepanov2016-12-171-38/+0
| | | | | | | | This reverts r289696, which caused TSan perf regression. See PR31382. llvm-svn: 290030
* AArch64CollectLOH: Rewrite as block-local analysis.Matthias Braun2016-12-174-9/+194
| | | | | | | | | | | | | | | | | Re-apply r288561: Liveness tracking should be correct now after r290014. Previously this pass was using up to 5% compile time in some cases which is a bit much for what it is doing. The pass featured a full blown data-flow analysis which in the default configuration was restricted to a single block. This rewrites the pass under the assumption that we only ever work on a single block. This is done in a single pass maintaining a state machine per general purpose register to catch LOH patterns. Differential Revision: https://reviews.llvm.org/D27329 llvm-svn: 290026
* [sancov] skip dead files from computationsMike Aizatsky2016-12-174-8/+32
| | | | | | Differential Revision: https://reviews.llvm.org/D27863 llvm-svn: 290017
* AArch64: Enable post-ra liveness updatesMatthias Braun2016-12-162-6/+6
| | | | | | Differential Revision: https://reviews.llvm.org/D27559 llvm-svn: 290014
* Allow "line 0" to be the first explicit debug location in a function.Paul Robinson2016-12-161-1/+6
| | | | | | Feedback on r289468 from Adrian Prantl. llvm-svn: 290012
* Fix a bugs with using some Mach-O command line flags like "-arch armv7m".Kevin Enderby2016-12-163-2/+7
| | | | | | | | | | | | | | | | | | The Mach-O command line flag like "-arch armv7m" does not match the arch name part of its llvm Triple which is "thumbv7m-apple-darwin”. I think the best way to fix this is to have llvm::object::MachOObjectFile::getArchTriple() optionally return the name of the Mach-O arch flag that would be used with -arch that matches the CPUType and CPUSubType. Then change llvm::object::MachOUniversalBinary::ObjectForArch::getArchTypeName() to use that and change it to getArchFlagName() as the type name is really part of the Triple and the -arch flag name is a Mach-O thing for a specific Triple with a specific Mcpu value. rdar://29663637 llvm-svn: 290001
* Resubmit "[CodeView] Hook CodeViewRecordIO for reading/writing symbols."Zachary Turner2016-12-162-22/+7
| | | | | | | The original patch was broken due to some undefined behavior as well as warnings that were triggering -Werror. llvm-svn: 290000
* [ThinLTO] Import composite types as declarationsTeresa Johnson2016-12-162-0/+75
| | | | | | | | | | | | | | | | | | | | | | | | Summary: When reading the metadata bitcode, create a type declaration when possible for composite types when we are importing. Doing this in the bitcode reader saves memory. Also it works naturally in the case when the type ODR map contains a definition for the same composite type because it was used in the importing module (buildODRType will automatically use the existing definition and not create a type declaration). For Chromium built with -g2, this reduces the aggregate size of the generated native object files by 66% (from 31G to 10G). It reduced the time through the ThinLTO link and backend phases by about 20% on my machine. Reviewers: mehdi_amini, dblaikie, aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27775 llvm-svn: 289993
* Preserve loop metadata when folding branches to a common destination.Michael Kuperstein2016-12-161-0/+28
| | | | | | Differential Revision: https://reviews.llvm.org/D27830 llvm-svn: 289992
* [CodeGenPrep] Skip merging empty case blocksJun Bum Lim2016-12-167-14/+215
| | | | | | | | | | | | | | This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block and unit test failures in AVR and WebAssembly : Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 llvm-svn: 289988
* Revert "[IR] Remove the DIExpression field from DIGlobalVariable."Adrian Prantl2016-12-16166-566/+418
| | | | | | | | | | | | | | | | | This reverts commit 289920 (again). I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable has not DIExpression. Unfortunately it is not possible to safely upgrade these variables without adding a flag to the bitcode record indicating which version they are. My plan of record is to roll the planned follow-up patch that adds a unit: field to DIGlobalVariable into this patch before recomitting. This way we only need one Bitcode upgrade for both changes (with a version flag in the bitcode record to safely distinguish the record formats). Sorry for the churn! llvm-svn: 289982
* Revert "[CodeView] Hook CodeViewRecordIO for reading/writing symbols."Zachary Turner2016-12-162-7/+22
| | | | | | | This reverts commit r289978, which is failing due to some rebase/merge issues. llvm-svn: 289981
* [CodeView] Hook CodeViewRecordIO for reading/writing symbols.Zachary Turner2016-12-162-22/+7
| | | | | | | | | This is the 3rd of 3 patches to get reading and writing of CodeView symbol and type records to use a single codepath. Differential Revision: https://reviews.llvm.org/D26427 llvm-svn: 289978
* Strip invalid TBAA when reading bitcodeMehdi Amini2016-12-161-2/+6
| | | | | | | | This ensures backward compatibility on bitcode loading. Differential Revision: https://reviews.llvm.org/D27839 llvm-svn: 289977
* Reapply "[LV] Enable vectorization of loops with conditional stores by default"Matthew Simpson2016-12-165-6/+6
| | | | | | | | This patch reapplies r289863. The original patch was reverted because it exposed a bug causing the loop vectorizer to crash in the Python runtime on PPC. The underlying issue was fixed with r289958. llvm-svn: 289975
* Fix CodeGenPrepare::stripInvariantGroupMetadataSanjoy Das2016-12-161-3/+4
| | | | | | | | | | | | `dropUnknownNonDebugMetadata` takes a list of "known" metadata IDs. The only reason it worked at all is that `getMetadataID` returns something unrelated -- it returns the subclass ID of the receiver (which is used in `dyn_cast` etc.). That does not numerically match `LLVMContext::MD_invariant_group` and ends up dropping `invariant_group` along with every other metadata that does not numerically match `LLVMContext::MD_invariant_group`. llvm-svn: 289973
* [ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently.Eli Friedman2016-12-162-6/+223
| | | | | | | | | | | | | | | | | | | | Currently, there are substantial problems forming vld1_dup even if the VDUP survives legalization. The lack of an actual node leads to terrible results: not only can we not form post-increment vld1_dup instructions, but we form scalar pre-increment and post-increment loads which force the loaded value into a GPR. This patch fixes that by combining the vdup+load into an ARMISD node before DAGCombine messes it up. Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable). Recommiting with fix to avoid forming vld1dup if the type of the load doesn't match the type of the vdup (see https://llvm.org/bugs/show_bug.cgi?id=31404). Differential Revision: https://reviews.llvm.org/D27694 llvm-svn: 289972
* AMDGPU: Fix name for v_ashrrev_i16Matt Arsenault2016-12-166-12/+12
| | | | llvm-svn: 289967
* Revert "dwarfdump: Support/process relocations on a CU's abbrev_off"David Blaikie2016-12-162-8/+0
| | | | | | | | | Reverting because this breaks lld's gdb_index support - it's probably double counting the abbrev relocation offset. This reverts commit r289954. llvm-svn: 289961
* Revert "[CodeGenPrep] Skip merging empty case blocks"Jun Bum Lim2016-12-165-212/+11
| | | | | | This reverts commit r289951. llvm-svn: 289960
* [InstCombine] auto-generate checks; NFCSanjay Patel2016-12-161-2/+15
| | | | llvm-svn: 289959
* [LV] Don't attempt to type-shrink scalarized instructionsMatthew Simpson2016-12-161-0/+53
| | | | | | | | | | After r288909, instructions feeding predicated instructions may be scalarized if profitable. Since these instructions will remain scalar, we shouldn't attempt to type-shrink them. We should only truncate vector types to their minimal bit widths. This bug was exposed by enabling the vectorization of loops containing conditional stores by default. llvm-svn: 289958
* Pass sample pgo flags to thinlto.Dehao Chen2016-12-162-0/+27
| | | | | | | | | | | | Summary: ThinLTO needs to invoke SampleProfileLoader pass during link time in order to annotate profile correctly after module importing. Reviewers: davidxl, mehdi_amini, tejohnson Subscribers: pcc, davide, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D27790 llvm-svn: 289957
* [X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, ↵Hans Wennborg2016-12-161-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | -C), COND) (PR31367) atomic_load_add returns the value before addition, but sets EFLAGS based on the result of the addition. That means it's setting the flags based on effectively subtracting C from the value at x, which is also what the outer cmp does. This targets a pattern that occurs frequently with reference counting pointers: void decrement(long volatile *ptr) { if (_InterlockedDecrement(ptr) == 0) release(); } Clang would previously compile it (for 32-bit at -Os) as: 00000000 <?decrement@@YAXPCJ@Z>: 0: 8b 44 24 04 mov 0x4(%esp),%eax 4: 31 c9 xor %ecx,%ecx 6: 49 dec %ecx 7: f0 0f c1 08 lock xadd %ecx,(%eax) b: 83 f9 01 cmp $0x1,%ecx e: 0f 84 00 00 00 00 je 14 <?decrement@@YAXPCJ@Z+0x14> 14: c3 ret and with this patch it becomes: 00000000 <?decrement@@YAXPCJ@Z>: 0: 8b 44 24 04 mov 0x4(%esp),%eax 4: f0 ff 08 lock decl (%eax) 7: 0f 84 00 00 00 00 je d <?decrement@@YAXPCJ@Z+0xd> d: c3 ret (Equivalent variants with _InterlockedExchangeAdd, std::atomic<>'s fetch_add or pre-decrement operator generate the same code.) Differential Revision: https://reviews.llvm.org/D27781 llvm-svn: 289955
* dwarfdump: Support/process relocations on a CU's abbrev_offDavid Blaikie2016-12-162-0/+8
| | | | | | | | | | Input can be produced by ld -r, for example (a normal LLVM workflow never hits this - LLVM only ever produces a single abbrev table in an object (shared by multiple CUs), so the reloc's always 0, and when it's linked together the relocation's resolved so it doesn't need to be handled) llvm-svn: 289954
* [CodeGenPrep] Skip merging empty case blocksJun Bum Lim2016-12-165-11/+212
| | | | | | | | | | | | | | This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block: Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 llvm-svn: 289951
* [X86][AVX512] use a single shufps for 512-bit vectors when it can save ↵Simon Pilgrim2016-12-161-8/+3
| | | | | | | | | | | | instructions This is the 512-bit counterpart to the 128-bit transform checked in here: https://reviews.llvm.org/rL289837 This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 llvm-svn: 289946
* [X86][AVX512] Add tests showing missed opportunity to efficiently lower ↵Simon Pilgrim2016-12-161-0/+32
| | | | | | v16i32 to VSHUFPS (PR27885) llvm-svn: 289945
* Speculatively revert r289925, see PR31407Nico Weber2016-12-1613-70/+31
| | | | llvm-svn: 289944
OpenPOWER on IntegriCloud