summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
* [AArch64] Optimize floating point materializationAdhemerval Zanella2019-03-187-21/+96
| | | | | | | | | | | | | | | | | | This patch follows some ideas from r352866 to optimize the floating point materialization even further. It changes isFPImmLegal to considere up to 2 mov instruction or up to 5 in case subtarget has fused literals. The rationale is the cost is the same for mov+fmov vs. adrp+ldr; but the mov+fmov sequence is always better because of the reduced d-cache pressure. The timings are still the same if you consider movw+movk+fmov vs. adrp+ldr will be fused (although one instruction longer). Reviewers: efriedma Differential Revision: https://reviews.llvm.org/D58460 llvm-svn: 356390
* [TargetLowering] Add code size information on isFPImmLegal. NFCAdhemerval Zanella2019-03-1819-56/+97
| | | | | | | | | | | This allows better code size for aarch64 floating point materialization in a future patch. Reviewers: evandro Differential Revision: https://reviews.llvm.org/D58690 llvm-svn: 356389
* [OPENMP] Set scheduling for doacross loops as schedule, 1.Alexey Bataev2019-03-184-4/+22
| | | | | | | The default scheduling for doacross loops is changed from static to static, 1. llvm-svn: 356388
* [AArch64] Refactor floating point materialization. NFCAdhemerval Zanella2019-03-184-463/+485
| | | | | | | | | | | | | | | It splits the login of actual instruction emission away from the logic that figures out the appropriate sequence on AArch64ExpandPseudo::expandMOVImm. The new function AArch64_IMM::expandMOVImm, which return the list of the instructions to materialize the immediate constant, is implemented on a separated unit because it will be used in a subsequent patch to optimize floating point materialization. Reviewers: efriedma Differential Revision: https://reviews.llvm.org/D58915 llvm-svn: 356387
* [libc++][NFC] Promote CMake comment to an actual option descriptionLouis Dionne2019-03-181-3/+3
| | | | llvm-svn: 356386
* [AMDGPU] Add the missing clang change of the experimental buffer fat pointerMichael Liao2019-03-183-4/+5
| | | | llvm-svn: 356385
* [X86] Remove the _alt forms of (V)CMP instructions. Use a combination of ↵Craig Topper2019-03-1845-652/+788
| | | | | | | | | | custom printing and custom parsing to achieve the same result and more Similar to previous change done for VPCOM and VPCMP Differential Revision: https://reviews.llvm.org/D59468 llvm-svn: 356384
* [InstCombine] add/adjust test for NaN checks; NFCSanjay Patel2019-03-182-22/+84
| | | | llvm-svn: 356383
* [DAG] Cleanup unused node in SimplifySelectCC.Nirav Dave2019-03-182-11/+9
| | | | | | | | | | | | | | | | Delete temporarily constructed node uses for analysis after it's use, holding onto original input nodes. Ideally this would be rewritten without making nodes, but this appears relatively complex. Reviewers: spatel, RKSimon, craig.topper Subscribers: jdoerfert, hiraditya, deadalnix, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57921 llvm-svn: 356382
* [MVT] Fix typos in comment. NFC.Michael Liao2019-03-181-4/+4
| | | | llvm-svn: 356381
* lld-link: Run conflict-mangled.test on all systemsNico Weber2019-03-181-1/+0
| | | | | | | | | | | | | It seems to pass fine on my Mac, and it running it only on Windows made me miss it in r355959 and required r355959. When the test was added in r288992 we still used Win-only UnDecorateSymbolName() for demangling. Now we use LLVM's microsoftDemangle() which is cross-platform. Differential Revision: https://reviews.llvm.org/D59497 llvm-svn: 356380
* Skip TestVSCode_setFunctionBreakpoints on linuxPavel Labath2019-03-181-0/+1
| | | | | | Test hangs under heavy load. llvm-svn: 356379
* Fix some "variable 'foo' set but not used" warningsPavel Labath2019-03-182-6/+2
| | | | | | gcc-8 diagnoses these. llvm-svn: 356378
* Fix libstdc++ data formatters for python3Pavel Labath2019-03-181-2/+3
| | | | | | | Use floor-division for consistentcy across python versions. This fixes a couple of libstdc++ data formatter tests. llvm-svn: 356377
* [libc++] Add a test for PR40977Louis Dionne2019-03-181-0/+34
| | | | | | | Even though the header makes the exact same check since https://llvm.org/D59063, the headers could conceivably change in the future and introduce a bug. llvm-svn: 356376
* [ELF] Emit weak-undef symbols in .dynsym of a PIE binary only if linked ↵Siva Chandra2019-03-186-4/+44
| | | | | | | | | | | | | | against shared libs. Reviewers: espindola Subscribers: emaste, arichardson, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59275 llvm-svn: 356374
* [AMDGPU] Add an experimental buffer fat pointer address space.Neil Henning2019-03-189-26/+96
| | | | | | | | | | | | Add an experimental buffer fat pointer address space that is currently unhandled in the backend. This commit reserves address space 7 as a non-integral pointer repsenting the 160-bit fat pointer (128-bit buffer descriptor + 32-bit offset) that is heavily used in graphics workloads using the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D58957 llvm-svn: 356373
* [InstCombine] allow general vector constants for funnel shift to shift ↵Sanjay Patel2019-03-182-24/+17
| | | | | | | | | | | | | transforms Follow-up to: rL356338 rL356369 We can calculate an arbitrary vector constant minus the bitwidth, so there's no need to limit this transform to scalars and splats. llvm-svn: 356372
* [llvm-objcopy] - Calculate the string table section sizes correctly.George Rimar2019-03-186-14/+63
| | | | | | | | | | | | | | This fixes the https://bugs.llvm.org/show_bug.cgi?id=40980. Previously if string optimization occurred as a result of StringTableBuilder's finalize() method, the size wasn't updated. This hopefully also makes the interaction between sections during finalization processes a bit more clear. Differential revision: https://reviews.llvm.org/D59488 llvm-svn: 356371
* Fix TestCommandScriptImmediateOutput for python3Pavel Labath2019-03-181-3/+3
| | | | | | s/iteritems/items llvm-svn: 356370
* [InstCombine] extend rotate-left-by-constant canonicalization to funnel shiftSanjay Patel2019-03-182-14/+15
| | | | | | | | | | | Follow-up to: rL356338 Rotates are a special case of funnel shift where the 2 input operands are the same value, but that does not need to be a restriction for the canonicalization when the shift amount is a constant. llvm-svn: 356369
* [SystemZ] Remove icmp undef from reduced testsSimon Pilgrim2019-03-182-12/+12
| | | | | | | | Pre-commit for D59363 (Add icmp UNDEF handling to SelectionDAG::FoldSetCC) Approved by @uweigand (Ulrich Weigand) llvm-svn: 356368
* [InstCombine] add funnel shift tests with arbitrary constants; NFCSanjay Patel2019-03-181-8/+44
| | | | llvm-svn: 356367
* [pp-trace] Delete -ignore and add a new option -callbacksFangrui Song2019-03-1813-64/+91
| | | | | | | | | | | | | | | | | | | | | | | | Summary: -ignore specifies a list of PP callbacks to ignore. It cannot express a whitelist, which may be more useful than a blacklist. Add a new option -callbacks to replace it. -ignore= (default) => -callbacks='*' (default) -ignore=FileChanged,FileSkipped => -callbacks='*,-FileChanged,-FileSkipped' -callbacks='Macro*' : print only MacroDefined,MacroExpands,MacroUndefined,... Reviewers: juliehockett, aaron.ballman, alexfh, ioeric Reviewed By: aaron.ballman Subscribers: nemanjai, kbarton, jsji, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D59296 llvm-svn: 356366
* [llvm-exegesis] Separate tool options into three categories.Roman Lebedev2019-03-181-18/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Results in much nicer -help output: ``` $ ./bin/llvm-exegesis -help USAGE: llvm-exegesis [options] OPTIONS: Color Options: -color - Use colors in output (default=autodetect) General options: -enable-cse-in-irtranslator - Should enable CSE in irtranslator -enable-cse-in-legalizer - Should enable CSE in Legalizer Generic Options: -help - Display available options (-help-hidden for more) -help-list - Display list of available options (-help-list-hidden for more) -version - Display the version of this program llvm-exegesis analysis options: -analysis-clustering-epsilon=<number> - dbscan epsilon for benchmark point clustering -analysis-clusters-output-file=<string> - -analysis-display-unstable-clusters - if there is more than one benchmark for an opcode, said benchmarks may end up not being clustered into the same cluster if the measured performance characteristics are different. by default all such opcodes are filtered out. this flag will instead show only such unstable opcodes -analysis-inconsistencies-output-file=<string> - -analysis-inconsistency-epsilon=<number> - epsilon for detection of when the cluster is different from the LLVM schedule profile values -analysis-numpoints=<uint> - minimum number of points in an analysis cluster llvm-exegesis benchmark options: -ignore-invalid-sched-class - ignore instructions that do not define a sched class -mode=<value> - the mode to run =latency - Instruction Latency =inverse_throughput - Instruction Inverse Throughput =uops - Uop Decomposition =analysis - Analysis -num-repetitions=<uint> - number of time to repeat the asm snippet -opcode-index=<int> - opcode to measure, by index -opcode-name=<string> - comma-separated list of opcodes to measure, by name -snippets-file=<string> - code snippets to measure llvm-exegesis options: -benchmarks-file=<string> - File to read (analysis mode) or write (latency/uops/inverse_throughput modes) benchmark results. “-” uses stdin/stdout. -mcpu=<string> - cpu name to use for pfm counters, leave empty to autodetect ``` llvm-svn: 356364
* [DebugInfo] Ignore bitcasts when lowering stack arg dbg.valuesDavid Stenberg2019-03-182-2/+57
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Look past bitcasts when looking for parameter debug values that are described by frame-index loads in `EmitFuncArgumentDbgValue()`. In the attached test case we would be left with an undef `DBG_VALUE` for the parameter without this patch. A similar fix was done for parameters passed in registers in D13005. This fixes PR40777. Reviewers: aprantl, vsk, jmorse Reviewed By: aprantl Subscribers: bjope, javed.absar, jdoerfert, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D58831 llvm-svn: 356363
* Fix "type qualifiers ignored on cast result type" warningsPavel Labath2019-03-182-13/+10
| | | | | | These warnings start to get emitted with gcc-8. llvm-svn: 356362
* Reinitialize UnwindTable when the SymbolFile changesPavel Labath2019-03-185-4/+42
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is a preparatory step to enable adding of unwind plans by symbol file plugins. Although at the surface it seems that currently symbol files have nothing to do with unwinding, this isn't entirely correct even now. The mere act of adding a symbol file can have the effect of making more sections (typically .debug_frame) available to the unwinding machinery, so that it can have more unwind strategies to choose from. Up until now, we've had a bug, which went largely unnoticed, where unwind info in the manually added symbols files (target symbols add) was being ignored during unwinding. Reinitializing the UnwindTable fixes that bug too. Reviewers: clayborg, jasonmolenda, alexshap Subscribers: jdoerfert, lldb-commits Differential Revision: https://reviews.llvm.org/D58347 llvm-svn: 356361
* [AArch64] Fix bug 35094 atomicrmw on Armv8.1-A+lseChristof Douma2019-03-182-37/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes https://bugs.llvm.org/show_bug.cgi?id=35094 The Dead register definition pass should leave alone the atomicrmw instructions on AArch64 (LTE extension). The reason is the following statement in the Arm ARM: "The ST<OP> instructions, and LD<OP> instructions where the destination register is WZR or XZR, are not regarded as doing a read for the purpose of a DMB LD barrier." A good example was given in the gcc thread by Will Deacon (linked in the bugzilla ticket 35094): P0 (atomic_int* y,atomic_int* x) { atomic_store_explicit(x,1,memory_order_relaxed); atomic_thread_fence(memory_order_release); atomic_store_explicit(y,1,memory_order_relaxed); } P1 (atomic_int* y,atomic_int* x) { atomic_fetch_add_explicit(y,1,memory_order_relaxed); // STADD atomic_thread_fence(memory_order_acquire); int r0 = atomic_load_explicit(x,memory_order_relaxed); } P2 (atomic_int* y) { int r1 = atomic_load_explicit(y,memory_order_relaxed); } My understanding is that it is forbidden for r0 == 0 and r1 == 2 after this test has executed. However, if the relaxed add in P1 compiles to STADD and the subsequent acquire fence is compiled as DMB LD, then we don't have any ordering guarantees in P1 and the forbidden result could be observed. Change-Id: I419f9f9df947716932038e1100c18d10a96408d0 llvm-svn: 356360
* [X86] Hopefully fix a tautological compare warning in printVecCompareInstr.Craig Topper2019-03-182-2/+2
| | | | llvm-svn: 356359
* [RISCV] Add ImmArg to intrinsicsAlex Bradbury2019-03-181-6/+6
| | | | llvm-svn: 356358
* [X86] Add ADD8ri_DB and ADD8rr_DB to the autogenerated load folding table.Craig Topper2019-03-181-0/+3
| | | | | | | | | These were added in r355423. We only use the autogenerated table to assist with the maintenance of the manual table. These entries are alreayd in the manual table. llvm-svn: 356357
* [X86] Make ADD*_DB post-RA pseudos and expand them in expandPostRAPseudo.Craig Topper2019-03-184-32/+27
| | | | | | | | | These are used to help convert OR->LEA when needed to avoid avoid a copy. They aren't need after register allocation. Happens to remove an ugly goto from X86MCCodeEmitter.cpp llvm-svn: 356356
* [X86] Add tab character to the custom printing of VPCMP and VPCOM instructions.Craig Topper2019-03-182-0/+4
| | | | | | All the other instructions are printed with a preceeding tab. llvm-svn: 356355
* Add testcase from bug 41079Matt Arsenault2019-03-171-0/+17
| | | | llvm-svn: 356354
* Remove immarg from llvm.expectMatt Arsenault2019-03-174-15/+5
| | | | | | | | | The LangRef claimed this was required to be a constant, but this appears to be wrong. Fixes bug 41079. llvm-svn: 356353
* [X86] Merge printf32mem/printi32mem into a single printdwordmem. Do the same ↵Craig Topper2019-03-176-97/+60
| | | | | | | | | | | | for all other printing functions. The only thing the print methods currently need to know is the string to print for the memory size in intel syntax. This patch merges the functions based on this string. If we ever need something else in the future, its easy to split them back out. This reduces the number of cases in the assembly printers. It shrinks the intel printer to only use 7 bytes per instruction instead of 8. llvm-svn: 356352
* [CodeGen] Defined MVTs v3i32, v3f32, v5i32, v5f32Tim Renouf2019-03-175-161/+197
| | | | | | | | | AMDGPU would like to use these MVTs. Differential Revision: https://reviews.llvm.org/D58901 Change-Id: I6125fea810d7cc62a4b4de3d9904255a1233ae4e llvm-svn: 356351
* [CodeGen] Prepare for introduction of v3 and v5 MVTsTim Renouf2019-03-175-2/+44
| | | | | | | | | | | | | | | | | | AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This commit does not add them, but makes preparatory changes: * Exclude non-legal non-power-of-2 vector types from ComputeRegisterProp mechanism in TargetLoweringBase::getTypeConversion. * Cope with SETCC and VSELECT for odd-width i1 vector when the other vectors are legal type. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58899 Change-Id: Ib5f23377dbef511be3a936211a0b9f94e46331f8 llvm-svn: 356350
* [ARM] Check that CPSR does not have other usesDavid Green2019-03-172-1/+46
| | | | | | | Fix up rL356335 by checking that CPSR is not read between the compare and the branch. llvm-svn: 356349
* RegAllocFast: Add hint to debug printingMatt Arsenault2019-03-171-1/+2
| | | | llvm-svn: 356348
* AMDGPU: Partially fix default device for HSAMatt Arsenault2019-03-177-10/+35
| | | | | | | | | | | | | | | | | | There are a few different issues, mostly stemming from using generation based checks for anything instead of subtarget features. Stop adding flat-address-space as a feature for HSA, as it should only be a device property. This was incorrectly allowing flat instructions to select for SI. Increase the default generation for HSA to avoid the encoding error when emitting objects. This has some other side effects from various checks which probably should be separate subtarget features (in the cost model and for dealing with the DS offset folding issue). Partial fix for bug 41070. It should probably be an error to try using amdhsa without flat support. llvm-svn: 356347
* [ConstantRange] Add assertion for KnownBits validity; NFCNikita Popov2019-03-171-0/+2
| | | | | | Following the suggestion in D59475. llvm-svn: 356346
* [ValueTracking] Use ConstantRange overflow check for signed add; NFCNikita Popov2019-03-171-48/+8
| | | | | | | | | | | | | | | | This is the same change as rL356290, but for signed add. It replaces the existing ripple logic with the overflow logic in ConstantRange. This is NFC in that it should return NeverOverflow in exactly the same cases as the previous implementation. However, it does make computeOverflowForSignedAdd() more powerful by now also determining AlwaysOverflows conditions. As none of its consumers handle this yet, this has no impact on optimization. Making use of AlwaysOverflows in with.overflow folding will be handled as a followup. Differential Revision: https://reviews.llvm.org/D59450 llvm-svn: 356345
* [X86] Remove the _alt forms of AVX512 VPCMP instructions. Use a combination ↵Craig Topper2019-03-1711-227/+349
| | | | | | | | | | of custom printing and custom parsing to achieve the same result and more Similar to the previous patch for VPCOM. Differential Revision: https://reviews.llvm.org/D59398 llvm-svn: 356344
* [X86] Remove the _alt forms of XOP VPCOM instructions. Use a combination of ↵Craig Topper2019-03-1713-135/+213
| | | | | | | | | | | | | | | | | | | | custom printing and custom parsing to achieve the same result and more Previously we had a regular form of the instruction used when the immediate was 0-7. And _alt form that allowed the full 8 bit immediate. Codegen would always use the 0-7 form since the immediate was always checked to be in range. Assembly parsing would use the 0-7 form when a mnemonic like vpcomtrueb was used. If the immediate was specified directly the _alt form was used. The disassembler would prefer to use the 0-7 form instruction when the immediate was in range and the _alt form otherwise. This way disassembly would print the most readable form when possible. The assembly parsing for things like vpcomtrueb relied on splitting the mnemonic into 3 pieces. A "vpcom" prefix, an immediate representing the "true", and a suffix of "b". The tablegenerated printing code would similarly print a "vpcom" prefix, decode the immediate into a string, and then print "b". The _alt form on the other hand parsed and printed like any other instruction with no specialness. With this patch we drop to one form and solve the disassembly printing issue by doing custom printing when the immediate is 0-7. The parsing code has been tweaked to turn "vpcomtrueb" into "vpcomb" and then the immediate for the "true" is inserted either before or after the other operands depending on at&t or intel syntax. I'd rather not do the custom printing, but I tried using an InstAlias for each possible mnemonic for all 8 immediates for all 16 combinations of element size, signedness, and memory/register. The code emitted into printAliasInstr ended up checking the number of operands, the register class of each operand, and the immediate for all 256 aliases. This was repeated for both the at&t and intel printer. Despite a lot of common checks between all of the aliases, when compiled with clang at least this commonality was not well optimized. Nor do all the checks seem necessary. Since I want to do a similar thing for vcmpps/pd/ss/sd which have 32 immediate values and 3 encoding flavors, 3 register sizes, etc. This didn't seem to scale well for clang binary size. So custom printing seemed a better trade off. I also considered just using the InstAlias for the matching and not the printing. But that seemed like it would add a lot of extra rows to the matcher table. Especially given that the 32 immediates for vpcmpps have 46 strings associated with them. Differential Revision: https://reviews.llvm.org/D59398 llvm-svn: 356343
* [AMDGPU] Prepare for introduction of v3 and v5 MVTsTim Renouf2019-03-1713-14/+526
| | | | | | | | | | | | | | | | | | | AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This commit does not add them, but makes preparatory changes: * Fixed assumptions of power-of-2 vector type in kernel arg handling, and added v5 kernel arg tests and v3/v5 shader arg tests. * Added v5 tests for cost analysis. * Added vec3/vec5 arg test cases. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58928 Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd llvm-svn: 356342
* [ARM] Fixed an assumption of power-of-2 vector MVTTim Renouf2019-03-171-6/+6
| | | | | | | | | | | | I am about to introduce some non-power-of-2 width vector MVTs. This commit fixes a power-of-2 assumption that my forthcoming change would otherwise break, as shown by test/CodeGen/ARM/vcvt_combine.ll and vdiv_combine.ll. Differential Revision: https://reviews.llvm.org/D58927 Change-Id: I56a282e365d3874ab0621e5bdef98a612f702317 llvm-svn: 356341
* [AMDGPU] Regenerate some f16/i16 tests.Simon Pilgrim2019-03-173-231/+1224
| | | | | | Prep work for D51589 llvm-svn: 356340
* [ConstantRange] Add fromKnownBits() methodNikita Popov2019-03-174-11/+98
| | | | | | | | | | | | | | Following the suggestion in D59450, I'm moving the code for constructing a ConstantRange from KnownBits out of ValueTracking, which also allows us to test this code independently. I'm adding this method to ConstantRange rather than KnownBits (which would have been a bit nicer API wise) to avoid creating a dependency from Support to IR, where ConstantRange lives. Differential Revision: https://reviews.llvm.org/D59475 llvm-svn: 356339
OpenPOWER on IntegriCloud