path: root/llvm/test/CodeGen
Commit message | Author | Age | Files | Lines
* [AVR] Remove unneeded XFAILs from the Generic CodeGen tests | Dylan McKay | 2019-01-20 | 5 files | -9/+0
    These have been in place for quite a while now. Several bugs have since been
    fixed, and these tests now pass.
    llvm-svn: 351679
* [AVR] Fix codegen bug in 16-bit loads | Dylan McKay | 2019-01-20 | 5 files | -32/+45
    Prior to this patch, the AVR::LDWRdPtr instruction was always lowered to
    instructions of this pattern:

        ld $GPR8, [PTR:XYZ]+
        ld $GPR8, [PTR]+1

    This has a problem; the [PTR] is incremented in-place once, but never
    decremented. Future uses of the same pointer will use the now clobbered
    value, leading to the pointer being incorrect by an offset of one.

    This patch modifies the expansion code of the LDWRdPtr pseudo instruction so
    that the pointer variable is not silently clobbered in future uses in the
    same live range.

    Bug first reported by Keshav Kini.

    Patch by Kaushik Phatak.
    llvm-svn: 351673
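    As a rough, hypothetical illustration of the pattern involved (not a test
    taken from this commit): two 16-bit loads through the same pointer in one
    live range, where the second load must not observe the pointer left
    incremented by the first expansion.

        ; Illustrative IR only; the real regression tests live in test/CodeGen/AVR.
        define i16 @load_same_ptr_twice(i16* %p) {
          %a = load i16, i16* %p    ; expanded via AVR::LDWRdPtr to a pair of ld's
          %b = load i16, i16* %p    ; must see the original, unclobbered pointer
          %s = add i16 %a, %b
          ret i16 %s
        }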
* Revert "[AVR] Fix codegen bug in 16-bit loads" | Dylan McKay | 2019-01-20 | 5 files | -45/+32
    This reverts commit r351544. In that commit, I had mistakenly misattributed
    the issue submitter as the patch author, Kaushik Phatak. The patch will be
    recommitted immediately with the correct attribution.
    llvm-svn: 351672
* Revert r351584: "GlobalISel: Verify g_zextload and g_sextload" | Amara Emerson | 2019-01-19 | 1 file | -23/+0
    This new assertion triggered on the AArch64 GlobalISel bots. Reverting while
    it's being investigated.
    llvm-svn: 351617
* AMDGPU/GlobalISel: Legalize more types for select | Matt Arsenault | 2019-01-18 | 2 files | -18/+174
    llvm-svn: 351599
* AMDGPU/GlobalISel: Legalize illegal g_constant | Matt Arsenault | 2019-01-18 | 2 files | -22/+96
    llvm-svn: 351596
* GlobalISel: Verify G_BITCAST | Matt Arsenault | 2019-01-18 | 1 file | -4/+4
    llvm-svn: 351594
* [x86] add more movmsk tests; NFC | Sanjay Patel | 2019-01-18 | 1 file | -11/+313
    The existing tests already show a sub-optimal transform, but this should make
    it clear that we can't just match an 'and' op when creating movmsk
    instructions.
    llvm-svn: 351590
* GlobalISel: Verify g_zextload and g_sextload | Matt Arsenault | 2019-01-18 | 1 file | -0/+23
    llvm-svn: 351584
* [X86] Lower avx2/avx512f gather intrinsics to X86MaskedGatherSDNode instead of going directly to MachineSDNode | Craig Topper | 2019-01-18 | 2 files | -6/+3
    This sends these intrinsics through isel in a much more normal way. This
    should allow addressing mode matching in isel to make better use of the
    displacement field.

    Differential Revision: https://reviews.llvm.org/D56827
    llvm-svn: 351570
* [AMDGPU][MC][GFX8+][DISASSEMBLER] Corrected 1/2pi value for 64-bit operands | Dmitry Preobrazhensky | 2019-01-18 | 1 file | -1/+1
    See bug 39332: https://bugs.llvm.org/show_bug.cgi?id=39332

    Reviewers: artem.tamazov, arsenm

    Differential Revision: https://reviews.llvm.org/D56794
    llvm-svn: 351555
* [AVR] Fix codegen bug in 16-bit loads | Dylan McKay | 2019-01-18 | 5 files | -32/+45
    Prior to this patch, the AVR::LDWRdPtr instruction was always lowered to
    instructions of this pattern:

        ld $GPR8, [PTR:XYZ]+
        ld $GPR8, [PTR]+1

    This has a problem; the [PTR] is incremented in-place once, but never
    decremented. Future uses of the same pointer will use the now clobbered
    value, leading to the pointer being incorrect by an offset of one.

    This patch modifies the expansion code of the LDWRdPtr pseudo instruction so
    that the pointer variable is not silently clobbered in future uses in the
    same live range.

    Patch by Keshav Kini.
    llvm-svn: 351544
* [ScheduleDAGRRList] Do not preschedule a node that has an ADJCALLSTACKDOWN parent | Shiva Chen | 2019-01-18 | 1 file | -0/+32
    We should not preschedule a node that has an ADJCALLSTACKDOWN parent;
    otherwise, when scheduling bottom-up, ADJCALLSTACKDOWN and ADJCALLSTACKUP may
    hold the CallResource for too long and prevent other calls from being
    scheduled. If there is no other available node to schedule, the scheduler
    will try to rename the register by creating a copy to avoid the conflict,
    which fails because CallResource is not a real physical register.
    llvm-svn: 351527
* [CodeGen] Fix bugs in LiveDebugVariables when debug labels are generated. | Hsiangkai Wang | 2019-01-18 | 1 file | -0/+141
    Remove DBG_LABELs in LiveDebugVariables and generate them in VirtRegRewriter.

    This bug is reported in
    https://bugs.chromium.org/p/chromium/issues/detail?id=898152.

    Differential Revision: https://reviews.llvm.org/D54465
    llvm-svn: 351525
* [AVR] Expand 8/16-bit multiplication to libcalls on MCUs that don't have hardware MUL | Dylan McKay | 2019-01-18 | 4 files | -2/+32
    This change modifies the LLVM ISel lowering settings so that 8-bit/16-bit
    multiplication is expanded to calls into the compiler runtime library if the
    MCU being targeted does not support multiplication in hardware.

    Before this, MUL instructions would be generated on CPUs like the ATtiny85,
    triggering a CPU reset due to an illegal instruction at runtime.

    First raised in https://github.com/avr-rust/rust/issues/124.
    llvm-svn: 351523
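    A minimal sketch of the kind of IR affected; the libcall symbol is assumed
    to follow the usual compiler-rt/libgcc naming (e.g. __mulhi3 for the 16-bit
    case), which the commit message does not spell out.

        ; With an AVR MCU lacking hardware MUL (e.g. attiny85), this multiply is
        ; now expanded to a runtime-library call instead of an illegal MUL.
        define i16 @mul16(i16 %a, i16 %b) {
          %r = mul i16 %a, %b
          ret i16 %r
        }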
* [X86] Add test cases showing failure to fold a global variable address into the gather addressing mode when using the target specific intrinsics. NFC | Craig Topper | 2019-01-18 | 2 files | -0/+42
    llvm-svn: 351522
* [X86] Change avx512-gather-scatter-intrin.ll to use x86_64-unknown-unknown instead of x86_64-apple-darwin. NFC | Craig Topper | 2019-01-18 | 1 file | -53/+53
    Will help with an upcoming patch.
    llvm-svn: 351521
* [WebAssembly] Add languages from debug info to producers section | Thomas Lively | 2019-01-18 | 1 file | -0/+13
    Reviewers: aheejin, dschuff, sbc100

    Subscribers: aprantl, jgravelle-google, hiraditya, sunfish

    Differential Revision: https://reviews.llvm.org/D56889
    llvm-svn: 351507
* AMDGPU: Convert tests away from llvm.SI.load.const | Matt Arsenault | 2019-01-17 | 8 files | -282/+282
    llvm-svn: 351494
* [mips] Emit .reloc R_{MICRO}MIPS_JALR along with j(al)r(c) $25 | Vladimir Stefanovic | 2019-01-17 | 12 files | -95/+258
    The callee address is added as an optional operand (MCSymbol) in
    AdjustInstrPostInstrSelection() and is then used by the asm printer to
    insert:

        .reloc tmplabel, R_MIPS_JALR, symbol
        tmplabel:

    Controlled with '-mips-jalr-reloc'; the default is true.

    Differential revision: https://reviews.llvm.org/D56694
    llvm-svn: 351485
* Fix the buildbot failure introduced by r351404 | Sanjin Sijaric | 2019-01-17 | 1 file | -1/+1
    EXPENSIVE_CHECKS buildbots are failing due to r351404. Add x1 as live in to
    the funclet basic block for SEH funclets, as well as -verify-machineinstrs
    to the test case that triggered the failure.
    llvm-svn: 351472
* [X86][SSE] Add PR40340 test case | Simon Pilgrim | 2019-01-17 | 1 file | -0/+19
    llvm-svn: 351430
* [X86] Add AVX512 test to insertps | Simon Pilgrim | 2019-01-17 | 1 file | -3/+4
    Pre-commit for PR40340
    llvm-svn: 351429
* Allow FP types for atomicrmw xchg | Matt Arsenault | 2019-01-17 | 10 files | -0/+151
    llvm-svn: 351427
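    A minimal sketch of the newly accepted IR form (illustrative, not one of the
    added tests):

        define float @xchg_f32(float* %ptr, float %val) {
          ; atomicrmw xchg may now take floating-point operands directly.
          %old = atomicrmw xchg float* %ptr, float %val seq_cst
          ret float %old
        }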
* [ARM GlobalISel] Allow calls to varargs functions | Diana Picus | 2019-01-17 | 2 files | -7/+86
    Allow varargs functions to be called, both in arm and thumb mode. This boils
    down to choosing the correct calling convention, which we can easily test by
    making sure arm_aapcscc is used instead of arm_aapcs_vfpcc when the callee is
    variadic.
    llvm-svn: 351424
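    A small sketch of the kind of call now handled (hypothetical function
    names); the point is that the variadic callee is lowered with arm_aapcscc
    rather than arm_aapcs_vfpcc.

        declare void @take_varargs(i32, ...)

        define void @caller(i32 %x, double %d) {
          call void (i32, ...) @take_varargs(i32 %x, double %d)
          ret void
        }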
* [RISCV] Add codegen support for RV64A | Alex Bradbury | 2019-01-17 | 3 files | -0/+3911
    In order to support codegen for RV64A, this patch:

    * Introduces masked atomics intrinsics for atomicrmw operations and cmpxchg
      that use the i64 type. These are ultimately lowered to masked operations
      using lr.w/sc.w, but we need to use these alternate intrinsics for RV64
      because i32 is not legal
    * Modifies RISCVExpandPseudoInsts.cpp to handle PseudoAtomicLoadNand64 and
      PseudoCmpXchg64
    * Modifies the AtomicExpandPass hooks in RISCVTargetLowering to sext/trunc as
      needed for RV64 and to select the i64 intrinsic IDs when necessary
    * Adds appropriate patterns to RISCVInstrInfoA.td
    * Updates test/CodeGen/RISCV/atomic-*.ll to show RV64A support

    This ends up being a fairly mechanical change, as the logic for RV32A is
    effectively reused.

    Differential Revision: https://reviews.llvm.org/D53233
    llvm-svn: 351422
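    As an illustration (not taken from the patch itself), a sub-word atomic such
    as the following has no direct RV64A encoding and is expanded to a masked
    lr.w/sc.w loop; on RV64 the masked intrinsics operate on i64 because i32 is
    not a legal type there.

        define i8 @nand_i8(i8* %p, i8 %v) {
          %old = atomicrmw nand i8* %p, i8 %v seq_cst
          ret i8 %old
        }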
* [ARM64][Windows] Share unwind codes between epilogues | Sanjin Sijaric | 2019-01-17 | 2 files | -3/+228
    There are cases where we have multiple epilogues that have the exact same
    unwind code sequence. In that case, the epilogues can share the same unwind
    codes in the .xdata section. This should get us past the assert "SEH unwind
    data splitting not yet implemented" in many cases.

    We still need to add support for generating multiple .pdata/.xdata sections
    for those functions that need to be split into fragments.

    Differential Revision: https://reviews.llvm.org/D56813
    llvm-svn: 351421
* [WebAssembly] Parse llvm.ident into producers section | Thomas Lively | 2019-01-17 | 1 file | -0/+13
    llvm-svn: 351413
* Revert "[WebAssembly] Parse llvm.ident into producers section" | Thomas Lively | 2019-01-17 | 1 file | -13/+0
    This reverts commit eccdbba3a02a33e13b5262e92200a33e2ead873d.
    llvm-svn: 351410
* [SEH] [ARM64] Retrieve the frame pointer from SEH funclets | Sanjin Sijaric | 2019-01-17 | 1 file | -0/+121
    The Windows ARM64 runtime passes the establisher frame to funclets as the
    first argument.
    llvm-svn: 351404
* [WebAssembly] Parse llvm.ident into producers section | Thomas Lively | 2019-01-16 | 1 file | -0/+13
    Summary: Everything before the word "version" is the tool, and everything
    after the word "version" is the version.

    Reviewers: aheejin, dschuff

    Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

    Differential Revision: https://reviews.llvm.org/D56742
    llvm-svn: 351399
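    For example, given module metadata like the following (hypothetical version
    string), the rule above records the tool "clang" with version
    "8.0.0 (trunk)":

        !llvm.ident = !{!0}
        !0 = !{!"clang version 8.0.0 (trunk)"}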
* [X86] Add X86ISD::VSHLV and X86ISD::VSRLV nodes for psllv and psrlv | Craig Topper | 2019-01-16 | 4 files | -104/+252
    Previously we used ISD::SHL and ISD::SRL to represent these in SelectionDAG.
    ISD::SHL/SRL interpret an out-of-range shift amount as undefined behavior and
    will constant fold to undef, while the intrinsics are defined to return 0 for
    out-of-range shift amounts. A previous patch added a special node for VPSRAV
    to produce all sign bits.

    This was previously believed safe because undefs frequently get turned into
    0, either from the constant pool or a desire to not have a false register
    dependency. But undef is treated specially in some optimizations. For
    example, it is ignored in detection of vector splats. So if the ISD::SHL/SRL
    can be constant folded and all of the elements with in-bounds shift amounts
    are the same, we might fold it to a single-element broadcast from the
    constant pool. This would not put 0s in the elements with out-of-bounds shift
    amounts.

    We do have an existing InstCombine optimization to use shl/lshr when the
    shift amounts are all constant and in bounds. That should prevent some loss
    of constant folding from this change.

    Patch by zhutianyang and Craig Topper

    Differential Revision: https://reviews.llvm.org/D56695
    llvm-svn: 351381
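    A sketch of the semantic difference, using the AVX2 variable-shift intrinsic
    (illustrative only): lanes with out-of-range amounts must come back as 0,
    which ISD::SHL cannot guarantee once those lanes constant fold to undef.

        declare <4 x i32> @llvm.x86.avx2.psllv.d(<4 x i32>, <4 x i32>)

        define <4 x i32> @shl_var(<4 x i32> %a) {
          ; Lanes 2 and 3 use shift amounts of 32 and 64 and must yield 0.
          %r = call <4 x i32> @llvm.x86.avx2.psllv.d(<4 x i32> %a, <4 x i32> <i32 1, i32 31, i32 32, i32 64>)
          ret <4 x i32> %r
        }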
* AMDGPU: Adjust the chain for loads writing to the HI part of a register. | Changpeng Fang | 2019-01-16 | 1 file | -0/+141
    Summary: For these loads that write to the HI part of a register, we should
    chain them to the op that writes to the LO part of the register to maintain
    the appropriate order.

    Reviewers: rampitec, arsenm

    Differential Revision: https://reviews.llvm.org/D56454
    llvm-svn: 351379
* [X86] Add a one use check to the setcc inversion code in combineVSelectWithAllOnesOrZeros | Craig Topper | 2019-01-16 | 1 file | -14/+11
    If we're going to generate a new inverted setcc, we should make sure we will
    be able to remove the old setcc.

    Differential Revision: https://reviews.llvm.org/D56765
    llvm-svn: 351378
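    A hedged sketch of the shape involved: the combine inverts a compare feeding
    a vector select whose arms are all-zeros/all-ones, and an extra use of the
    compare means the original setcc would survive alongside the newly inverted
    one.

        define <4 x i32> @vsel(<4 x i32> %x, <4 x i32> %y, <4 x i1>* %sink) {
          %c = icmp eq <4 x i32> %x, zeroinitializer
          store <4 x i1> %c, <4 x i1>* %sink   ; second use of the compare
          %r = select <4 x i1> %c, <4 x i32> zeroinitializer, <4 x i32> %y
          ret <4 x i32> %r
        }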
* [X86] Add test case for D56765. NFC | Craig Topper | 2019-01-16 | 1 file | -0/+36
    llvm-svn: 351377
* [X86] Add additional saturating add/sub vector tests; NFC | Nikita Popov | 2019-01-16 | 3 files | -581/+7174
    Additional tests for vNi32 and vNi64. I've added these for usub.sat before;
    this covers uadd.sat, ssub.sat and sadd.sat.
    llvm-svn: 351375
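    For reference, the intrinsics being exercised look like this (a minimal
    v4i32 sample, not one of the added tests verbatim):

        declare <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32>, <4 x i32>)

        define <4 x i32> @uadd_sat_v4i32(<4 x i32> %a, <4 x i32> %b) {
          %r = call <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
          ret <4 x i32> %r
        }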
* [COFF, ARM64] Implement support for SEH extensions __try/__except/__finally | Mandeep Singh Grang | 2019-01-16 | 2 files | -0/+97
    Summary: This patch supports MS SEH extensions __try/__except/__finally. The
    intrinsics localescape and localrecover are responsible for communicating
    escaped static allocas from the try block to the handler.

    We need to preserve frame pointers for SEH. So we create a new
    function/property HasLocalEscape.

    Reviewers: rnk, compnerd, mstorsjo, TomTan, efriedma, ssijaric

    Reviewed By: rnk, efriedma

    Subscribers: smeenai, jrmuizel, alex, majnemer, ssijaric, ehsan, dmajor,
    kristina, javed.absar, kristof.beyls, chrib, llvm-commits

    Differential Revision: https://reviews.llvm.org/D53540
    llvm-svn: 351370
* [X86][BtVer2] Update latency of horizontal operations. | Andrea Di Biagio | 2019-01-16 | 4 files | -60/+60
    On Jaguar, horizontal adds/subs have local forwarding disabled. That means we
    pay a compulsory extra cycle of write-back stage, and the value is not
    available until the end of that stage.

    This patch changes the latency of horizontal operations by adding an extra
    cycle. With this patch, latency numbers now match what is reported by perf.

    I plan to send another patch to also 'fix' the latency of shuffle operations
    (on Jaguar, local forwarding is disabled for vector shuffles too).

    Differential Revision: https://reviews.llvm.org/D56777
    llvm-svn: 351366
* [X86] Regenerate test | Simon Pilgrim | 2019-01-16 | 1 file | -3/+3
    Split check-prefixes to support a future commit
    llvm-svn: 351362
* [x86] add tests for extracted scalar casts (PR39974); NFC | Sanjay Patel | 2019-01-16 | 1 file | -0/+205
    https://bugs.llvm.org/show_bug.cgi?id=39974
    llvm-svn: 351354
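    The pattern of interest is a scalar conversion fed by an element extracted
    from a vector; a guessed minimal example (see the PR for the real cases):

        define double @extract_then_cast(<4 x i32> %v) {
          %e = extractelement <4 x i32> %v, i32 0
          %c = sitofp i32 %e to double
          ret double %c
        }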
* AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap | Marek Olsak | 2019-01-16 | 2 files | -0/+141
    Reviewers: arsenm, nhaehnle

    Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye,
    llvm-commits

    Differential Revision: https://reviews.llvm.org/D52944
    llvm-svn: 351351
* [x86] lower shuffle of extracts to AVX2 vperm instructions | Sanjay Patel | 2019-01-16 | 1 file | -90/+75
    I was trying to prevent shuffle regressions while matching more horizontal
    ops and ended up here:

        shuf (extract X, 0), (extract X, 4), Mask --> extract (shuf X, undef, Mask'), 0

    The affected tests were added for:
    https://bugs.llvm.org/show_bug.cgi?id=34380

    This patch won't change the examples in the bug report itself, but we should
    be able to extend this to catch more types.

    Differential Revision: https://reviews.llvm.org/D56756
    llvm-svn: 351346
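    In IR terms the input pattern looks roughly like this (illustrative only;
    the transform itself is performed on the x86 SelectionDAG):

        define <4 x i32> @shuf_of_extracted_halves(<8 x i32> %x) {
          %lo = shufflevector <8 x i32> %x, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
          %hi = shufflevector <8 x i32> %x, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
          ; Interleaving the two extracted halves can instead become one
          ; cross-lane permute of %x followed by taking subvector 0.
          %r = shufflevector <4 x i32> %lo, <4 x i32> %hi, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
          ret <4 x i32> %r
        }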
* [MSP430] Emit a separate section for every interrupt vector | Anton Korobeynikov | 2019-01-16 | 3 files | -1/+9
    This is the LLVM part of D56663.

    Linker scripts shipped by TI require every interrupt vector to be placed in a
    separate section with a specific name:

        SECTIONS
        {
          __interrupt_vector_XX : { KEEP (*(__interrupt_vector_XX )) } > VECTXX
          ...
        }

    Following that requirement, emit a section for every vector which contains
    the address of the interrupt handler:

        .section __interrupt_vector_XX,"ax",@progbits
        .word %isr%

    Patch by Kristina Bessonova!

    Differential Revision: https://reviews.llvm.org/D56664
    llvm-svn: 351345
* [X86][SSE] Add additional PR40318 shuffle test cases | Simon Pilgrim | 2019-01-16 | 2 files | -0/+154
    llvm-svn: 351333
* [DAGCombine] Fix ReduceLoadWidth for shifted offsets | Sam Parker | 2019-01-16 | 1 file | -0/+36
    ReduceLoadWidth can trigger when a shifted mask is used, and this requires
    that the function return a shl node to correct for the offset. However, the
    way this was implemented meant that the returned result could be an existing
    node, which would be incorrect. This fixes the method of inserting the new
    node and replacing uses.

    Differential Revision: https://reviews.llvm.org/D50432
    llvm-svn: 351310
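    A sketch of the kind of pattern being narrowed (hypothetical, not the added
    test): a wide load consumed only through a shift and mask, which
    ReduceLoadWidth can replace with a narrow load plus a compensating shl.

        define i32 @narrow_load(i64* %p) {
          %wide = load i64, i64* %p
          %shifted = lshr i64 %wide, 16
          %masked = and i64 %shifted, 65535
          %res = trunc i64 %masked to i32
          ret i32 %res
        }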
* [GISel]: Add support for CSEing continuously during GISel passes. | Aditya Nandakumar | 2019-01-16 | 4 files | -0/+58
    https://reviews.llvm.org/D52803

    This patch adds support to continuously CSE instructions during each of the
    GISel passes. It consists of a GISelCSEInfo analysis pass that can be used by
    the CSEMIRBuilder.
    llvm-svn: 351283
* [EH] Rename llvm.x86.seh.recoverfp intrinsic to llvm.eh.recoverfp | Mandeep Singh Grang | 2019-01-16 | 6 files | -14/+14
    Summary: Make recoverfp intrinsic target-independent so that it can be
    implemented for AArch64, etc. Refer D53541 for the context. Clang counterpart
    D56748.

    Reviewers: rnk, efriedma

    Reviewed By: rnk, efriedma

    Subscribers: javed.absar, kristof.beyls, llvm-commits

    Differential Revision: https://reviews.llvm.org/D56747
    llvm-svn: 351281
* [X86] Add avx512 scatter intrinsics that use a vXi1 mask instead of a scalar integer. | Craig Topper | 2019-01-15 | 1 file | -92/+108
    We're trying to have the vXi1 types in IR as much as possible. This prevents
    the need for bitcasts when the producer of the mask was already a vXi1 value
    like an icmp. The bitcasts can be subject to code motion and interfere with
    basic block at a time isel in bad ways.
    llvm-svn: 351275
* AMDGPU: Raise the priority of MAD24 in instruction selection. | Changpeng Fang | 2019-01-15 | 1 file | -0/+26
    Summary: We have seen a performance regression when v_add3 is generated. The
    major reason is that the v_mad pattern is broken when v_add3 is generated. We
    also see increased register pressure.

    While we cannot properly estimate register pressure during instruction
    selection, we can give mad a higher priority. In this work, we raise the
    priority for mad24 in selection and resolve the performance regression.

    Reviewers: rampitec

    Differential Revision: https://reviews.llvm.org/D56745
    llvm-svn: 351273
* X86DAGToDAGISel::matchBitExtract() with truncation (PR36419) | Roman Lebedev | 2019-01-15 | 1 file | -35/+42
    Summary:
    Previously in D54095 i have added support for extraction of `lshr` from `X`
    if we are to produce `BEXTR`. That was good, but the fix was partial; there
    was still [[ https://bugs.llvm.org/show_bug.cgi?id=36419 | PR36419 ]].

    That pattern can also appear, roughly, when you have a large (64-bit) storage
    and then consume bits from it. It will not be unexpected if you are doing
    further computations in 32-bit width. And then the current code breaks, as
    the tests show.

    The basic idea/pattern here is the following:
    1. We have an `i64` input
    2. We perform an `i64` right-shift on it.
    3. We `trunc`ate that shifted value
    4. We do all further work (masking) in `i32`

    Since we see `trunc`ation and not `lshr`, we give up, and stop trying to
    extract that right-shift. BUT. The mask is `i32`, therefore we can extend
    both of the operands of the masking (`and`) to `i64` and truncate the result
    after masking: https://rise4fun.com/Alive/K4B

    ```
    Name: @bextr64_32_b1 -> @bextr64_32_b0
     %shiftedval = lshr i64 %val, %numskipbits
     %truncshiftedval = trunc i64 %shiftedval to i32
     %widenumlowbits1 = zext i8 %numlowbits to i32
     %notmask1 = shl nsw i32 -1, %widenumlowbits1
     %mask1 = xor i32 %notmask1, -1
     %res = and i32 %truncshiftedval, %mask1
    =>
     %shiftedval = lshr i64 %val, %numskipbits
     %widenumlowbits = zext i8 %numlowbits to i64
     %notmask = shl nsw i64 -1, %widenumlowbits
     %mask = xor i64 %notmask, -1
     %wideres = and i64 %shiftedval, %mask
     %res = trunc i64 %wideres to i32
    ```

    Thus, we are again able to extract that `lshr` into `BEXTR`'s control.

    Now, the perf (via `llvm-exegesis`) of the snippet suggests that it is not a
    good idea:

    ```
    $ cat /tmp/old.s
    # bextr64_32_b1
    # LLVM-EXEGESIS-LIVEIN RSI
    # LLVM-EXEGESIS-LIVEIN EDX
    # LLVM-EXEGESIS-LIVEIN RDI
    movq %rsi, %rcx
    shrq %cl, %rdi
    shll $8, %edx
    bextrl %edx, %edi, %eax

    $ cat /tmp/old.s | ./bin/llvm-exegesis -mode=latency -snippets-file=-
    Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-1e0082.o
    ---
    mode:            latency
    key:
      instructions:
        - 'MOV64rr RCX RSI'
        - 'SHR64rCL RDI RDI'
        - 'SHL32ri EDX EDX i_0x8'
        - 'BEXTR32rr EAX EDI EDX'
      config:          ''
      register_initial_values: []
    cpu_name:        bdver2
    llvm_triple:     x86_64-unknown-linux-gnu
    num_repetitions: 10000
    measurements:
      - { key: latency, value: 0.6638, per_snippet_value: 2.6552 }
    error:           ''
    info:            ''
    assembled_snippet: 4889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C7C3
    ...

    $ cat /tmp/old.s | ./bin/llvm-exegesis -mode=uops -snippets-file=-
    Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-43e346.o
    ---
    mode:            uops
    key:
      instructions:
        - 'MOV64rr RCX RSI'
        - 'SHR64rCL RDI RDI'
        - 'SHL32ri EDX EDX i_0x8'
        - 'BEXTR32rr EAX EDI EDX'
      config:          ''
      register_initial_values: []
    cpu_name:        bdver2
    llvm_triple:     x86_64-unknown-linux-gnu
    num_repetitions: 10000
    measurements:
      - { key: PdFPU0, value: 0, per_snippet_value: 0 }
      - { key: PdFPU1, value: 0, per_snippet_value: 0 }
      - { key: PdFPU2, value: 0, per_snippet_value: 0 }
      - { key: PdFPU3, value: 0, per_snippet_value: 0 }
      - { key: NumMicroOps, value: 1.2571, per_snippet_value: 5.0284 }
    error:           ''
    info:            ''
    assembled_snippet: 4889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C7C3
    ...
    ```
    vs
    ```
    $ cat /tmp/new.s
    # bextr64_32_b1
    # LLVM-EXEGESIS-LIVEIN RDX
    # LLVM-EXEGESIS-LIVEIN SIL
    # LLVM-EXEGESIS-LIVEIN RDI
    shlq $8, %rdx
    movzbl %sil, %eax
    orq %rdx, %rax
    bextrq %rax, %rdi, %rax

    $ cat /tmp/new.s | ./bin/llvm-exegesis -mode=latency -snippets-file=-
    Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-8944f1.o
    ---
    mode:            latency
    key:
      instructions:
        - 'SHL64ri RDX RDX i_0x8'
        - 'MOVZX32rr8 EAX SIL'
        - 'OR64rr RAX RAX RDX'
        - 'BEXTR64rr RAX RDI RAX'
      config:          ''
      register_initial_values: []
    cpu_name:        bdver2
    llvm_triple:     x86_64-unknown-linux-gnu
    num_repetitions: 10000
    measurements:
      - { key: latency, value: 0.7454, per_snippet_value: 2.9816 }
    error:           ''
    info:            ''
    assembled_snippet: 48C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C7C3
    ...

    $ cat /tmp/new.s | ./bin/llvm-exegesis -mode=uops -snippets-file=-
    Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-da403c.o
    ---
    mode:            uops
    key:
      instructions:
        - 'SHL64ri RDX RDX i_0x8'
        - 'MOVZX32rr8 EAX SIL'
        - 'OR64rr RAX RAX RDX'
        - 'BEXTR64rr RAX RDI RAX'
      config:          ''
      register_initial_values: []
    cpu_name:        bdver2
    llvm_triple:     x86_64-unknown-linux-gnu
    num_repetitions: 10000
    measurements:
      - { key: PdFPU0, value: 0, per_snippet_value: 0 }
      - { key: PdFPU1, value: 0, per_snippet_value: 0 }
      - { key: PdFPU2, value: 0, per_snippet_value: 0 }
      - { key: PdFPU3, value: 0, per_snippet_value: 0 }
      - { key: NumMicroOps, value: 1.2571, per_snippet_value: 5.0284 }
    error:           ''
    info:            ''
    assembled_snippet: 48C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C7C3
    ...
    ```

    ^ latency increased (worse). Except //maybe// not really. Like with all
    synthetic benchmarks, they //may// be misleading. Let's take a look at an
    actual real-world hotpath. In this case it's 'my'
    [[ https://github.com/darktable-org/rawspeed | RawSpeed ]]'s
    `BitStream<>::peekBitsNoFill()`, in
    [[ https://github.com/darktable-org/rawspeed/blob/e3316dc85127c2c29baa40f998f198a7b278bf36/src/librawspeed/decompressors/VC5Decompressor.cpp#L814 | GoPro VC5 decompressor ]]:

    ```
    raw.pixls.us-unique/GoPro/HERO6 Black$ /usr/src/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-clangs1-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR
    RUNNING: /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmplwbKEM
    2018-12-22 21:23:03
    Running /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench
    Run on (8 X 4012.81 MHz CPU s)
    CPU Caches:
      L1 Data 16K (x8)
      L1 Instruction 64K (x4)
      L2 Unified 2048K (x4)
      L3 Unified 8192K (x1)
    Load Average: 3.41, 2.41, 2.03
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Benchmark                                 Time  CPU    Iterations  CPUTime,s  CPUTime/WallTime  Pixels  Pixels/CPUTime  Pixels/WallTime  Raws/CPUTime  Raws/WallTime  WallTime,s
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------
    GOPR9172.GPR/threads:8/real_time_mean     40 ms  40 ms  128        0.322244   7.96974           12M     37.4457M        298.534M         3.12047       24.8778        0.040465
    GOPR9172.GPR/threads:8/real_time_median   39 ms  39 ms  128        0.312606   7.99155           12M     38.387M         306.788M         3.19891       25.5656        0.039115
    GOPR9172.GPR/threads:8/real_time_stddev    4 ms   3 ms  128        0.0271557  0.130575          0       2.4941M         21.3909M         0.207842      1.78257        3.81081m
    RUNNING: /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpWAkan9
    2018-12-22 21:23:08
    Running /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench
    Run on (8 X 4013.1 MHz CPU s)
    CPU Caches:
      L1 Data 16K (x8)
      L1 Instruction 64K (x4)
      L2 Unified 2048K (x4)
      L3 Unified 8192K (x1)
    Load Average: 3.78, 2.50, 2.06
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Benchmark                                 Time  CPU    Iterations  CPUTime,s  CPUTime/WallTime  Pixels  Pixels/CPUTime  Pixels/WallTime  Raws/CPUTime  Raws/WallTime  WallTime,s
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------
    GOPR9172.GPR/threads:8/real_time_mean     39 ms  39 ms  128        0.311533   7.97323           12M     38.6828M        308.471M         3.22356       25.706         0.0390928
    GOPR9172.GPR/threads:8/real_time_median   38 ms  38 ms  128        0.304231   7.99005           12M     39.4437M        315.527M         3.28698       26.294         0.0380316
    GOPR9172.GPR/threads:8/real_time_stddev    3 ms   3 ms  128        0.0229149  0.133814          0       2.26225M        19.1421M         0.188521      1.59517        3.13671m
    Comparing /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench
    Benchmark                                 Time     CPU      Time Old  Time New  CPU Old  CPU New
    --------------------------------------------------------------------------------------------------
    GOPR9172.GPR/threads:8/real_time_pvalue   0.0000   0.0000   U Test, Repetitions: 128 vs 128
    GOPR9172.GPR/threads:8/real_time_mean     -0.0339  -0.0316  40        39        40       39
    GOPR9172.GPR/threads:8/real_time_median   -0.0277  -0.0274  39        38        39       38
    GOPR9172.GPR/threads:8/real_time_stddev   -0.1769  -0.1267  4         3         3        3
    ```

    I.e. this results in //roughly// -3% improvements in perf.

    While this will help [[ https://bugs.llvm.org/show_bug.cgi?id=36419 | PR36419 ]], it won't address it fully.

    Reviewers: RKSimon, craig.topper, andreadb, spatel

    Reviewed By: craig.topper

    Subscribers: courbet, llvm-commits

    Differential Revision: https://reviews.llvm.org/D56052
    llvm-svn: 351253