std::mutex vs. boost::mutex (1.71.0) vs. Windows CRITICAL_SECTION with VS2019 and Windows 10
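This is a memo on the relative effective speed of the mutex implementations that are easy to adopt in a Visual Studio 2019 C++ project on current Windows 10. I do not intend to draw conclusions or argue what anyone should do. Take it only as a reference: when I measured, results like these came out.
Conditions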
- CPU: AMD Threadripper 2990WX ( 3.4GHz x 32 cores )
- OS: Microsoft Windows (Version 10.0.18362.476)
- Development environment: Microsoft Visual Studio 2019 Community (Version 16.3.9)
- Boost: 1.71.0
- Windows SDK: 10.0.17763.0
- Platform Toolset: Visual Studio 2019 (v142)
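The CPU was not pushed to its limits this time, but that does not matter here: this memo only needs to compare the effective speed of the candidates under identical conditions.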
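The code I wrote for the measurement is attached at the end of this memo. It is implemented as:
- a workload that repeats a std::pow calculation a fixed number of times (= 1,000,000) under mutual exclusion by a mutex ( std::mutex / boost::mutex / Windows CRITICAL_SECTION ),
- a measurement that runs that workload from a fixed number of threads (= 8) at the same time,
- repeated a fixed number of times (= 8) so that the variance is also visible.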
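As a rough illustration of the structure described above, here is a minimal sketch of one measurement batch. It is not the benchmark code attached to this memo: `run_batch`, `kPi` and `kE` are hypothetical names, and the exact order of operations inside the lock is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical constants and names; not the author's benchmark code.
constexpr double kPi = 3.14159265358979323846;
constexpr double kE  = 2.71828182845904523536;

// Starts `thread_count` threads. Each thread repeats the amplify/attenuate
// pair `iterations` times while holding `m`. Returns the final shared value.
template <class Mutex>
double run_batch(Mutex& m, unsigned thread_count, unsigned iterations) {
    assert(thread_count > 0);
    double value = kE;  // shared state, guarded by `m`
    std::vector<std::thread> workers;
    workers.reserve(thread_count);
    for (unsigned t = 0; t < thread_count; ++t) {
        workers.emplace_back([&m, &value, iterations] {
            for (unsigned i = 0; i < iterations; ++i) {
                std::lock_guard<Mutex> lock(m);      // exclusive section
                value = std::pow(value, kPi);        // amplify: e -> e^pi
                value = std::pow(value, 1.0 / kPi);  // attenuate: back to about e
            }
        });
    }
    for (auto& w : workers) w.join();
    return value;
}
```

Calling `run_batch(m, 8, 1000000)` with a `std::mutex m` would correspond to one batch with the parameters listed above. Because `std::lock_guard` only needs `lock()` and `unlock()`, a `boost::mutex` could be passed the same way, while a CRITICAL_SECTION would need a small wrapper type.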
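The results are printed in this order: a short banner for the development environment --> a warm up before measuring --> std::mutex --> boost::mutex --> Windows CRITICAL_SECTION. The calculated value field (spelled "caclulated value" in the program output) shows, for each measurement batch, the result of the workload's calculation; the value itself has no meaning. The warm up runs without mutual exclusion, so races occur and its result can change from run to run. The core of the workload repeatedly takes Napier's number, amplifies it by raising it to the power of pi, and attenuates it by raising it to the power of 1/pi.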
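The two values that show up in the output follow directly from that workload. A short worked form, assuming the value starts at $e$ and the two steps are $x \leftarrow x^{\pi}$ and $x \leftarrow x^{1/\pi}$:

```latex
\[
e^{\pi} \approx 23.1407,
\qquad
\left(e^{\pi}\right)^{1/\pi} = e^{\pi \cdot \frac{1}{\pi}} = e \approx 2.71828
\]
```

Under that assumption, a batch that ends on 2.71828 finished on the attenuated side, and one that ends on 23.1407 finished on the amplified side, which is only seen in the unsynchronized warm up.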
Result ( release: /Ox )
[the Benchmark of the Windows Mutices]
_MSC_VER = 1923
_DEBUG = (undefined: RELEASE BUILD)
_WIN32_WINNT = 0x0A00
<warm up>
wall=0.19 user=1.28 system=0.00 total(u+s)=1.28 p(t/w)=680.9% | caclulated value = 2.71828
wall=0.18 user=1.31 system=0.00 total(u+s)=1.31 p(t/w)=718.4% | caclulated value = 2.71828
wall=0.21 user=1.63 system=0.00 total(u+s)=1.63 p(t/w)=786.5% | caclulated value = 2.71828
wall=0.21 user=1.53 system=0.00 total(u+s)=1.53 p(t/w)=741.1% | caclulated value = 2.71828
wall=0.21 user=1.53 system=0.00 total(u+s)=1.53 p(t/w)=743.2% | caclulated value = 23.1407
wall=0.20 user=1.56 system=0.00 total(u+s)=1.56 p(t/w)=788.9% | caclulated value = 2.71828
wall=0.21 user=1.52 system=0.00 total(u+s)=1.52 p(t/w)=734.6% | caclulated value = 2.71828
wall=0.21 user=1.53 system=0.00 total(u+s)=1.53 p(t/w)=742.8% | caclulated value = 2.71828
<std::mutex>
wall=1.00 user=7.48 system=0.00 total(u+s)=7.48 p(t/w)=749.6% | caclulated value = 2.71828
wall=0.89 user=6.92 system=0.00 total(u+s)=6.92 p(t/w)=773.8% | caclulated value = 2.71828
wall=1.07 user=8.17 system=0.00 total(u+s)=8.17 p(t/w)=765.8% | caclulated value = 2.71828
wall=1.01 user=7.70 system=0.00 total(u+s)=7.70 p(t/w)=759.4% | caclulated value = 2.71828
wall=0.94 user=7.11 system=0.00 total(u+s)=7.11 p(t/w)=758.8% | caclulated value = 2.71828
wall=0.90 user=7.06 system=0.00 total(u+s)=7.06 p(t/w)=783.4% | caclulated value = 2.71828
wall=0.93 user=7.19 system=0.00 total(u+s)=7.19 p(t/w)=773.7% | caclulated value = 2.71828
wall=0.93 user=7.16 system=0.00 total(u+s)=7.16 p(t/w)=770.9% | caclulated value = 2.71828
<boost::mutex>
wall=0.91 user=0.53 system=1.03 total(u+s)=1.56 p(t/w)=172.6% | caclulated value = 2.71828
wall=0.91 user=0.64 system=0.89 total(u+s)=1.53 p(t/w)=169.1% | caclulated value = 2.71828
wall=0.91 user=0.73 system=0.80 total(u+s)=1.53 p(t/w)=168.3% | caclulated value = 2.71828
wall=0.88 user=0.69 system=0.78 total(u+s)=1.47 p(t/w)=166.2% | caclulated value = 2.71828
wall=0.91 user=0.67 system=0.91 total(u+s)=1.58 p(t/w)=173.9% | caclulated value = 2.71828
wall=0.90 user=0.83 system=0.63 total(u+s)=1.45 p(t/w)=161.2% | caclulated value = 2.71828
wall=0.90 user=0.66 system=0.98 total(u+s)=1.64 p(t/w)=181.4% | caclulated value = 2.71828
wall=0.91 user=0.58 system=0.94 total(u+s)=1.52 p(t/w)=166.9% | caclulated value = 2.71828
<Windows CRITICAL_SECTION>
wall=1.77 user=6.55 system=3.03 total(u+s)=9.58 p(t/w)=539.8% | caclulated value = 2.71828
wall=1.74 user=5.89 system=3.16 total(u+s)=9.05 p(t/w)=519.4% | caclulated value = 2.71828
wall=1.75 user=5.00 system=3.06 total(u+s)=8.06 p(t/w)=459.9% | caclulated value = 2.71828
wall=1.75 user=6.33 system=3.03 total(u+s)=9.36 p(t/w)=535.1% | caclulated value = 2.71828
wall=1.75 user=6.61 system=2.89 total(u+s)=9.50 p(t/w)=543.9% | caclulated value = 2.71828
wall=1.75 user=5.73 system=3.41 total(u+s)=9.14 p(t/w)=521.9% | caclulated value = 2.71828
wall=1.77 user=5.80 system=3.67 total(u+s)=9.47 p(t/w)=536.4% | caclulated value = 2.71828
wall=1.75 user=6.47 system=3.20 total(u+s)=9.67 p(t/w)=552.8% | caclulated value = 2.71828
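For reading the p(t/w) column, the arithmetic can be sketched as below. This assumes, from the label alone, that p(t/w) is total CPU time (user + system) divided by wall time; `cpu_utilization_percent` is a hypothetical helper, not part of the benchmark.

```cpp
#include <cassert>

// Hypothetical helper: CPU utilization in percent, assuming
// p(t/w) = (user + system) / wall * 100 as the column label suggests.
double cpu_utilization_percent(double wall, double user, double system) {
    assert(wall > 0.0);
    return (user + system) / wall * 100.0;
}
```

Feeding it the first std::mutex row (wall=1.00, user=7.48, system=0.00) gives 748%, close to the printed 749.6%; the small gap is expected because the printed times are rounded to two decimals. With 8 threads the ceiling is 800%, so the std::mutex rows keep nearly all 8 threads on the CPU, while the boost::mutex rows stay below 200% at a similar wall time.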
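Looking at this, my reaction to the thing that made me want to run this benchmark in the first place was "ah, I see." I am satisfied. It is enough as a practical hint. For now I have no interest in the internals beyond guessing at them.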
Result ( Debug: /Od )
[the Benchmark of the Windows Mutices]
_MSC_VER = 1923
_DEBUG = (defined: DEBUG BUILD)
_WIN32_WINNT = 0x0A00
<warm up>
wall=0.58 user=4.63 system=0.00 total(u+s)=4.63 p(t/w)=790.6% | caclulated value = 23.1407
wall=0.59 user=4.70 system=0.02 total(u+s)=4.72 p(t/w)=798.1% | caclulated value = 23.1407
wall=0.60 user=4.81 system=0.00 total(u+s)=4.81 p(t/w)=796.2% | caclulated value = 23.1407
wall=0.58 user=4.59 system=0.00 total(u+s)=4.59 p(t/w)=797.4% | caclulated value = 2.71828
wall=0.58 user=4.63 system=0.00 total(u+s)=4.63 p(t/w)=791.5% | caclulated value = 2.71828
wall=0.59 user=4.59 system=0.00 total(u+s)=4.59 p(t/w)=784.9% | caclulated value = 2.71828
wall=0.60 user=4.69 system=0.00 total(u+s)=4.69 p(t/w)=787.7% | caclulated value = 2.71828
wall=0.59 user=4.69 system=0.00 total(u+s)=4.69 p(t/w)=790.2% | caclulated value = 23.1407
<std::mutex>
wall=4.69 user=37.03 system=0.00 total(u+s)=37.03 p(t/w)=788.8% | caclulated value = 2.71828
wall=4.67 user=36.80 system=0.03 total(u+s)=36.83 p(t/w)=788.9% | caclulated value = 2.71828
wall=4.62 user=36.58 system=0.02 total(u+s)=36.59 p(t/w)=791.5% | caclulated value = 2.71828
wall=4.63 user=36.58 system=0.00 total(u+s)=36.58 p(t/w)=789.9% | caclulated value = 2.71828
wall=4.67 user=36.92 system=0.00 total(u+s)=36.92 p(t/w)=790.4% | caclulated value = 2.71828
wall=4.68 user=37.22 system=0.00 total(u+s)=37.22 p(t/w)=794.9% | caclulated value = 2.71828
wall=4.63 user=36.55 system=0.00 total(u+s)=36.55 p(t/w)=789.3% | caclulated value = 2.71828
wall=4.67 user=37.08 system=0.00 total(u+s)=37.08 p(t/w)=793.3% | caclulated value = 2.71828
<boost::mutex>
wall=4.11 user=4.03 system=3.78 total(u+s)=7.81 p(t/w)=190.0% | caclulated value = 2.71828
wall=4.11 user=3.64 system=4.06 total(u+s)=7.70 p(t/w)=187.4% | caclulated value = 2.71828
wall=4.13 user=3.63 system=4.11 total(u+s)=7.73 p(t/w)=187.4% | caclulated value = 2.71828
wall=4.11 user=3.83 system=3.97 total(u+s)=7.80 p(t/w)=189.8% | caclulated value = 2.71828
wall=4.12 user=3.64 system=4.14 total(u+s)=7.78 p(t/w)=188.9% | caclulated value = 2.71828
wall=4.12 user=3.86 system=3.89 total(u+s)=7.75 p(t/w)=188.3% | caclulated value = 2.71828
wall=4.11 user=3.56 system=4.11 total(u+s)=7.67 p(t/w)=186.8% | caclulated value = 2.71828
wall=4.12 user=3.92 system=4.06 total(u+s)=7.98 p(t/w)=194.0% | caclulated value = 2.71828
<Windows CRITICAL_SECTION>
wall=3.62 user=22.45 system=2.83 total(u+s)=25.28 p(t/w)=698.5% | caclulated value = 2.71828
wall=3.60 user=22.27 system=2.91 total(u+s)=25.17 p(t/w)=698.8% | caclulated value = 2.71828
wall=3.63 user=22.47 system=2.81 total(u+s)=25.28 p(t/w)=696.7% | caclulated value = 2.71828
wall=3.65 user=23.31 system=2.77 total(u+s)=26.08 p(t/w)=714.3% | caclulated value = 2.71828
wall=3.63 user=23.52 system=2.20 total(u+s)=25.72 p(t/w)=708.8% | caclulated value = 2.71828
wall=3.64 user=23.55 system=2.63 total(u+s)=26.17 p(t/w)=719.1% | caclulated value = 2.71828
wall=3.62 user=22.28 system=2.92 total(u+s)=25.20 p(t/w)=695.6% | caclulated value = 2.71828
wall=3.61 user=21.94 system=3.14 total(u+s)=25.08 p(t/w)=694.8% | caclulated value = 2.71828
The debug build is not what this memo is about, so treat it as bonus reference data only.