
Julia is one of the youngest mathematical programming languages and aims to become the leading language in its field. Unfortunately, there is not yet enough literature on it in Russian, and because Julia is developing so quickly, the material available in English often describes behavior that no longer matches the current version, which is not obvious to a beginner. We will try to fill that gap and convey Julia's ideas to the reader through simple examples.
The purpose of this article is to give the reader an idea of the basic ways of working with tables in Julia and to encourage them to start using the language on real data. We assume the reader already knows other programming languages, so we give only the minimum information on how things are done and do not go into the details of data-processing methods.
One of the most important stages in any data-analysis program is, of course, import and export, and the most common representation of data is a table. Julia has libraries that provide access to relational DBMSs and that work with exchange formats such as HDF5, MATLAB, and JLD. Here, however, we are interested only in text formats that represent tables, such as CSV.
Before looking at tables, we need a short introduction to how this data structure is represented. In Julia a table can be represented as a two-dimensional array or as a DataFrame.
Arrays
Let's start with arrays. Element numbering in Julia starts at 1. This is natural for mathematicians, and Fortran, Pascal, and Matlab use the same scheme. For programmers who have never used those languages, this numbering may feel uncomfortable and may cause off-by-one errors in boundary conditions, but in practice it is only a matter of habit. After a few weeks with Julia, switching between the two models stops being a problem.
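To make the indexing convention concrete, here is a minimal sketch using only Base functions. The helper name `sum_by_index` is made up for this illustration; `firstindex`, `lastindex`, and `eachindex` avoid hard-coding the bounds, which is one way to sidestep boundary mistakes.

```julia
a = [10, 20, 30]

first_elem = a[1]      # the first element lives at index 1, not 0
last_elem = a[end]     # `end` stands for the last valid index

# Query the bounds instead of assuming them.
bounds = (firstindex(a), lastindex(a))

# Hypothetical helper: sum a vector by walking over its valid indices.
function sum_by_index(v)
    total = zero(eltype(v))
    for i in eachindex(v)
        total += v[i]
    end
    return total
end

total = sum_by_index(a)
```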
The second important point is the internal representation of arrays. In Julia a linear array is a column, whereas in languages such as C or Java a one-dimensional array is a row.
We illustrate this with an array created at the command line (REPL):
    julia> a = [1, 2, 3]
    3-element Array{Int64,1}:
     1
     2
     3
Note the type of the array: Array{Int64,1}. It is one-dimensional and holds Int64 values. If we want to combine this array with another one, then, because we are dealing with a column, we must use the vcat function (vertical concatenation). The result is a new column.
    julia> b = vcat(a, [5, 6, 7])
    6-element Array{Int64,1}:
     1
     2
     3
     5
     6
     7
If we create the array as a row instead, using spaces rather than commas in the literal, we get a two-dimensional array of type Array{Int64,2}. The second parameter in the type is the number of dimensions of the array.
    julia> c = [1 2 3]
    1×3 Array{Int64,2}:
     1  2  3
That is, we got a matrix with one row and three columns.
This row-and-column layout is characteristic of Fortran and Matlab; recall that Julia targets the same application area.
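The difference between the two literals is easy to check with `size` and `ndims`. The following is a small sketch rather than anything from the original text; the variable names are arbitrary.

```julia
col = [1, 2, 3]    # commas: a one-dimensional column vector
row = [1 2 3]      # spaces: a 1×3 matrix, i.e. two-dimensional

col_size = size(col)               # (3,)
row_size = size(row)               # (1, 3)
dims = (ndims(col), ndims(row))    # (1, 2)

stacked = vcat(col, [4, 5])    # vertical concatenation keeps it a vector
wide = hcat(row, [4 5])        # horizontal concatenation gives a 1×5 matrix
```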
A matrix in Julia is a two-dimensional array in which all cells have the same type. Note that this type can be the abstract Any or something quite concrete, such as Int64, Float64, or even String.
A matrix can be created as a literal:
    julia> a = [1 2; 3 4]
    2×2 Array{Int64,2}:
     1  2
     3  4
It can be created with a constructor, allocating memory without initialization (undef):
    julia> a = Array{Int64,2}(undef, 2, 3)
    2×3 Array{Int64,2}:
     4783881648  4783881712  4782818640
     4783881680  4783881744  4782818576
Or with initialization, when a specific value is given instead of undef.
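The original shows no code for the initialized case. One common way to get it, sketched here with the Base functions `zeros`, `ones`, and `fill`, looks like this:

```julia
z = zeros(Int64, 2, 3)      # 2×3 matrix filled with 0
o = ones(Float64, 2, 2)     # 2×2 matrix filled with 1.0
f = fill(7, 2, 3)           # 2×3 matrix where every cell is 7
```

Unlike the `undef` constructor, these never expose whatever happened to be in memory.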
Gluing together separate columns:

    julia> a = [1, 2, 3]
    3-element Array{Int64,1}:
     1
     2
     3

    julia> b = hcat(a, a, a, a)
    3×4 Array{Int64,2}:
     1  1  1  1
     2  2  2  2
     3  3  3  3

Random initialization:

    julia> x = rand(1:10, 2, 3)
    2×3 Array{Int64,2}:
     1  10  2
     9   7  7
The arguments of rand are the range 1 to 10 and the dimensions 2 x 3.
Or use a comprehension:
    julia> x = [min(i, j) for i = 0:2, j = 0:2 ]
    3×3 Array{Int64,2}:
     0  0  0
     0  1  1
     0  1  2
Note that because a column in Julia is a linear block of memory, iterating over elements column by column is noticeably faster than iterating row by row. The following example uses a matrix with 1_000_000 rows and 100 columns.
    #!/usr/bin/env julia

    using BenchmarkTools

    x = rand(1:1000, 1_000_000, 100)
    #x = rand(1_000_000, 100)

    # The inner loop walks down a column: consecutive memory addresses.
    function sumbycolumns(x)
        total = 0
        rows, cols = size(x)
        for j = 1:cols, i = 1:rows
            total += x[i, j]
        end
        return total
    end

    @show @btime sumbycolumns(x)

    # The inner loop walks along a row: strided memory access.
    function sumbyrows(x)
        total = 0
        rows, cols = size(x)
        for i = 1:rows, j = 1:cols
            total += x[i, j]
        end
        return total
    end

    @show @btime sumbyrows(x)
Results:

      74.378 ms (1 allocation: 16 bytes)
    @btime(sumbycolumns(x)) = 50053093495
      206.346 ms (1 allocation: 16 bytes)
    @btime(sumbyrows(x)) = 50053093495
In this example @btime runs the function many times and reports the minimum execution time it observed. The macro is provided by the BenchmarkTools.jl library. Base Julia has the @time macro, but it measures a single run, which would be inaccurate here. The @show macro simply prints an expression and its computed value to the console.
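To see why a single measurement is unreliable, the idea of repeated timing can be sketched with Base's `@elapsed` alone. This is only a rough illustration and not how BenchmarkTools.jl is implemented; the helper `min_elapsed` and its `n` parameter are made up for the example.

```julia
# Hypothetical helper: call f(args...) n times and keep the fastest run.
# The first call usually includes compilation time, which the minimum discards.
function min_elapsed(f, args...; n = 5)
    best = Inf
    for _ in 1:n
        t = @elapsed f(args...)
        best = min(best, t)
    end
    return best
end

x = rand(1:1000, 1_000, 100)
t = min_elapsed(sum, x)    # seconds taken by the fastest of 5 runs
```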
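Optimizing for column storage is convenient for statistical operations on tables. Traditionally a table has a fixed number of columns and an arbitrary number of rows, so most operations, such as computing the mean, minimum, or maximum, are performed on the columns of a matrix rather than on its rows.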
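As a small sketch of such per-column operations, the following uses the `dims` keyword of `mean`, `minimum`, and `maximum`, plus `eachcol` (available from Julia 1.1). The matrix values are the ones from the `rand` example above, typed in by hand so the result is reproducible.

```julia
using Statistics

x = [1 10 2;
     9  7 7]

col_means = mean(x, dims = 1)      # 1×3 matrix with one mean per column
col_min = minimum(x, dims = 1)
col_max = maximum(x, dims = 1)

# eachcol iterates over the columns one at a time
spreads = [maximum(c) - minimum(c) for c in eachcol(x)]
```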
The Matrix type is a synonym for a two-dimensional array. It is a stylistic convenience rather than a necessity.
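The alias can be checked directly at the REPL. A minimal sketch:

```julia
same_matrix = Matrix{Int64} === Array{Int64,2}    # true: just another name
same_vector = Vector{Int64} === Array{Int64,1}    # true: the 1-dimensional alias

m = Matrix{Float64}(undef, 2, 3)    # same as Array{Float64,2}(undef, 2, 3)
is_array = m isa Array{Float64,2}
```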
Matrix elements are accessed by index. For example, for the matrix created earlier

    julia> x = rand(1:10, 2, 3)
    2×3 Array{Int64,2}:
     1  10  2
     9   7  7

we can get a specific element as x[1, 2] => 10, or a whole column, for example the second one:

    julia> x[:, 2]
    2-element Array{Int64,1}:
     10
      7

Or the second row:

    julia> x[2, :]
    3-element Array{Int64,1}:
     9
     7
     7
There is also the handy selectdim function, which takes the ordinal number of the dimension to select along and the indices of the elements in that dimension. For example, we select along the second dimension (columns), picking indices 1 and 3. This approach is convenient when, depending on some condition, you need to switch between rows and columns. It also applies to the multidimensional case, where the number of dimensions is greater than 2.
    julia> selectdim(x, 2, [1, 3])
    2×2 view(::Array{Int64,2}, :, [1, 3]) with eltype Int64:
     1  2
     9  7
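For the multidimensional case mentioned above, here is a short sketch with a three-dimensional array. The helper `pick` is hypothetical and only shows that the dimension can be chosen at run time.

```julia
A = reshape(collect(1:24), 2, 3, 4)    # a 2×3×4 array

slice = selectdim(A, 3, 2)             # all elements whose third index is 2
same_values = slice == A[:, :, 2]      # equal to the indexing form

# Hypothetical helper: take the first slice along whichever dimension d is given.
pick(arr, d) = selectdim(arr, d, 1)
sizes = (size(pick(A, 1)), size(pick(A, 2)))
```

As the REPL output above indicates, `selectdim` returns a view, so writing into `slice` would modify `A`.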
See also:
Functions for statistical processing of arrays
More on one-dimensional arrays
Multidimensional arrays
Linear algebra functions and special matrix types
To read a table from a file, use the readdlm function from the DelimitedFiles library; to write one, use writedlm. These functions work with delimited files, of which CSV is a special case.
We illustrate with an example from the documentation.
    julia> using DelimitedFiles

    julia> x = [1; 2; 3; 4];

    julia> y = ["a"; "b"; "c"; "d"];

    julia> open("delim_file.txt", "w") do io
               writedlm(io, [x y])   # write the two columns side by side
           end;

    julia> readdlm("delim_file.txt")   # read the table back
    4×2 Array{Any,2}:
     1  "a"
     2  "b"
     3  "c"
     4  "d"
Note that in this case the table contains data of different types, so reading the file produces a matrix of type Array{Any,2}.
Another example is reading a table with homogeneous data.
    julia> using DelimitedFiles

    julia> x = [1; 2; 3; 4];

    julia> y = [5; 6; 7; 8];

    julia> open("delim_file.txt", "w") do io
               writedlm(io, [x y])   # write the two columns side by side
           end;

    julia> readdlm("delim_file.txt", Int64)   # read the cells as Int64
    4×2 Array{Int64,2}:
     1  5
     2  6
     3  7
     4  8

    julia> readdlm("delim_file.txt", Float64)   # read the cells as Float64
    4×2 Array{Float64,2}:
     1.0  5.0
     2.0  6.0
     3.0  7.0
     4.0  8.0
In terms of processing efficiency this option is preferable, because the data is stored compactly. At the same time, the obvious limitation of a table represented as a matrix is the requirement that the data be homogeneous.
We recommend looking at the full description of readdlm in the documentation. Its additional options let you specify how the header is handled, how many lines to skip, how cells are processed, and so on.
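As a sketch of two of those options, the following writes a small comma-separated file with a header row and reads it back using the `header` and `skipstart` keywords of `readdlm`. The file name and column names are made up for the illustration.

```julia
using DelimitedFiles

path = joinpath(mktempdir(), "scores.csv")    # hypothetical file name

# Write a header row followed by two numeric columns, comma-separated.
open(path, "w") do io
    writedlm(io, ["id" "score"], ',')
    writedlm(io, [1 5; 2 6; 3 7], ',')
end

# header = true splits off the first row and returns a (data, header) tuple.
data, header = readdlm(path, ',', Int64; header = true)

# skipstart = 1 ignores the first line and returns only the numeric body.
body_only = readdlm(path, ',', Int64; skipstart = 1)
```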
Another way to read tables is the CSV.jl library. Compared with readdlm and writedlm, it gives much finer control over read and write options and validates the data in delimited files. The fundamental difference, however, is that the result of the CSV.File function can be materialized as a DataFrame.
DataFrames
The DataFrames library provides the DataFrame data structure, which is designed for representing tables. Its fundamental difference from a matrix is that each column is stored separately and has its own name. Recall that column-wise storage is the natural mode for Julia in general. Each column is effectively a one-dimensional array, but because every column can have its own type, we get a solution that is close to optimal in both speed and flexibility of data representation.
Let's look at how to create a DataFrame.
Any matrix can be converted to a DataFrame.
    julia> using DataFrames

    julia> a = [1 2; 3 4; 5 6]
    3×2 Array{Int64,2}:
     1  2
     3  4
     5  6

    julia> b = convert(DataFrame, a)
    3×2 DataFrame
    │ Row │ x1    │ x2    │
    │     │ Int64 │ Int64 │
    ├─────┼───────┼───────┤
    │ 1   │ 1     │ 2     │
    │ 2   │ 3     │ 4     │
    │ 3   │ 5     │ 6     │
The convert function converts data to the specified type. For the DataFrame type, the corresponding methods of convert are defined in the DataFrames library (in Julia terminology there are functions, and their various implementations for different arguments are called methods). Note that the columns of the matrix are automatically given the names x1 and x2. If we ask for the column names, we get them as an array:
    julia> names(b)
    2-element Array{Symbol,1}:
     :x1
     :x2
The names are represented as values of type Symbol (a concept well known in the Ruby world).
A DataFrame can also be created directly, either empty or with some data at creation time. For example:
    julia> df = DataFrame([collect(1:3), collect(4:6)], [:A, :B])
    3×2 DataFrame
    │ Row │ A     │ B     │
    │     │ Int64 │ Int64 │
    ├─────┼───────┼───────┤
    │ 1   │ 1     │ 4     │
    │ 2   │ 2     │ 5     │
    │ 3   │ 3     │ 6     │
Here we pass an array holding the column values and an array holding the names of those columns. A construct of the form collect(1:3) converts the iterator range from 1 to 3 into an array of values.
Columns can be accessed both by name and by index.
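The original gives no code for column access, so here is a minimal sketch. It assumes the DataFrames.jl package is installed, and it uses only the forms that, to my knowledge, work in both older and recent releases; whether a form returns the stored column or a copy of it differs between versions.

```julia
using DataFrames

df = DataFrame(A = 1:3, B = 4:6)

by_property = df.A         # by name, property syntax
by_name = df[:, :B]        # by name, indexing syntax
by_position = df[:, 1]     # by column position
```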
Adding a new column is very easy: just write a value to all existing rows. For example, to add a Score column to the df created above, we write:
    julia> df[:Score] = 0.0
    0.0

    julia> df
    3×3 DataFrame
    │ Row │ A     │ B     │ Score   │
    │     │ Int64 │ Int64 │ Float64 │
    ├─────┼───────┼───────┼─────────┤
    │ 1   │ 1     │ 4     │ 0.0     │
    │ 2   │ 2     │ 5     │ 0.0     │
    │ 3   │ 3     │ 6     │ 0.0     │
As with plain matrices, DataFrame instances can be glued together with the vcat and hcat functions. However, vcat works only when both tables have the same columns. DataFrames can be aligned with, for example, the following function:
    function merge_df(first::Union{DataFrame,Nothing}, second::DataFrame)::DataFrame
        if first === nothing
            return second
        else
            names_first = names(first)
            names_second = names(second)

            # columns found only in `first` are added to `second`, filled with 0
            sub_names = setdiff(names_first, names_second)
            second[sub_names] = 0

            # columns found only in `second` are added to `first`, filled with 0
            sub_names = setdiff(names_second, names_first)
            first[sub_names] = 0

            vcat(second, first)
        end
    end
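The names function returns an array of column names. The setdiff(s1, s2) function finds all elements of s1 that are not in s2. Each DataFrame is then extended with the missing columns. vcat glues the two DataFrames together and returns the result. Because the value of the last expression is the result of the function, there is no need for return in this branch.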
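The set operation at the heart of `merge_df` can be tried on its own. A minimal sketch with made-up column names:

```julia
names_first = [:A, :B, :C]
names_second = [:B, :D]

only_in_first = setdiff(names_first, names_second)     # columns to add to the second table
only_in_second = setdiff(names_second, names_first)    # columns to add to the first table
all_names = union(names_first, names_second)           # columns of the aligned result
```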
We can check the result:

    julia> df1 = DataFrame(:A => collect(1:2))
    2×1 DataFrame
    │ Row │ A     │
    │     │ Int64 │
    ├─────┼───────┤
    │ 1   │ 1     │
    │ 2   │ 2     │

    julia> df2 = DataFrame(:B => collect(3:4))
    2×1 DataFrame
    │ Row │ B     │
    │     │ Int64 │
    ├─────┼───────┤
    │ 1   │ 3     │
    │ 2   │ 4     │

    julia> df3 = merge_df(df1, df2)
    4×2 DataFrame
    │ Row │ B     │ A     │
    │     │ Int64 │ Int64 │
    ├─────┼───────┼───────┤
    │ 1   │ 3     │ 0     │
    │ 2   │ 4     │ 0     │
    │ 3   │ 0     │ 1     │
    │ 4   │ 0     │ 2     │
A note on naming: underscores are not customary in Julia names, but without them readability suffers. It is also not ideal that this implementation modifies the original DataFrames. Even so, the example is a good illustration of how to align sets of columns.
The join function combines several DataFrames on common values in a column (for example, gluing together two tables with different columns by a shared user identifier).
A DataFrame is convenient to view in the console. Any output method, whether the @show macro, the println function, or something else, prints the table in a readable form. If the DataFrame is too large, only the first and last rows are shown. The head and tail functions let you request the beginning and the end explicitly.
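DataFrames provide functions for grouping data and for aggregating it with a given function. They differ in what they return: either a collection of DataFrames, one for each value of the grouping criterion, or a single DataFrame whose column names are formed from the original column name and the name of the aggregation function. In essence, this implements the split-apply-combine scheme, which the DataFrames documentation covers in more detail.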
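The split-apply-combine idea itself does not depend on DataFrames. As a rough sketch in plain Julia, with a made-up helper `group_apply` and made-up sample values, it amounts to the following:

```julia
# Hypothetical helper illustrating split-apply-combine with a Dict.
function group_apply(f, labels, values)
    groups = Dict{eltype(labels), Vector{eltype(values)}}()
    for (k, v) in zip(labels, values)
        # split: collect the values that share a label
        push!(get!(groups, k, Vector{eltype(values)}()), v)
    end
    # apply f to every group and combine the results into one Dict
    return Dict(k => f(vs) for (k, vs) in groups)
end

species = ["setosa", "setosa", "virginica", "virginica", "virginica"]
petal = [1.4, 1.3, 6.0, 5.1, 5.9]    # illustrative numbers only

counts = group_apply(length, species, petal)
totals = group_apply(sum, species, petal)
```

The DataFrames functions do the same thing but operate on whole tables and return tables.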
We use an example from the documentation, with a sample table that ships as part of the DataFrames package.

    julia> using DataFrames, CSV, Statistics

    julia> iris = CSV.read(joinpath(dirname(pathof(DataFrames)), "../test/data/iris.csv"));
Grouping is done with the groupby function. We give the name of the grouping column and get a result of type GroupedDataFrame, which contains a collection of individual DataFrames gathered by the values of the grouping column.

    julia> species = groupby(iris, :Species)
    GroupedDataFrame with 3 groups based on key: :Species
    First Group: 50 rows
    │ Row │ SepalLength │ SepalWidth │ PetalLength │ PetalWidth │ Species │
    │     │ Float64     │ Float64    │ Float64     │ Float64    │ String  │
    ├─────┼─────────────┼────────────┼─────────────┼────────────┼─────────┤
    │ 1   │ 5.1         │ 3.5        │ 1.4         │ 0.2        │ setosa  │
    │ 2   │ 4.9         │ 3.0        │ 1.4         │ 0.2        │ setosa  │
    │ 3   │ 4.7         │ 3.2        │ 1.3         │ 0.2        │ setosa  │

The result can be converted to an array with the collect function mentioned earlier.

    julia> collect(species)
    3-element Array{Any,1}:
     50×5 SubDataFrame{Array{Int64,1}}
    │ Row │ SepalLength │ SepalWidth │ PetalLength │ PetalWidth │ Species │
    │     │ Float64     │ Float64    │ Float64     │ Float64    │ String  │
    ├─────┼─────────────┼────────────┼─────────────┼────────────┼─────────┤
    │ 1   │ 5.1         │ 3.5        │ 1.4         │ 0.2        │ setosa  │
    │ 2   │ 4.9         │ 3.0        │ 1.4         │ 0.2        │ setosa  │
    │ 3   │ 4.7         │ 3.2        │ 1.3         │ 0.2        │ setosa  │
    …
Grouping with the by function. We pass the DataFrame, the name of the grouping column, and a function that processes each resulting DataFrame. The first stage works like groupby and produces a collection of DataFrames. For each of them we count the rows and put the count into column N. The results are glued into a single DataFrame, which by returns.

    julia> by(iris, :Species, df -> DataFrame(N = size(df, 1)))
    3×2 DataFrame
    │ Row │ Species    │ N     │
    │     │ String⍰    │ Int64 │
    ├─────┼────────────┼───────┤
    │ 1   │ setosa     │ 50    │
    │ 2   │ versicolor │ 50    │
    │ 3   │ virginica  │ 50    │
The last option is the aggregate function. We give the column to group by and an aggregation function for the remaining columns. The result is a DataFrame whose column names are formed from the source column name and the name of the aggregation function.

    julia> aggregate(iris, :Species, sum)
    3×5 DataFrame
    │ Row │ Species    │ SepalLength_sum │ SepalWidth_sum │ PetalLength_sum │ PetalWidth_sum │
    │     │ String     │ Float64         │ Float64        │ Float64         │ Float64        │
    ├─────┼────────────┼─────────────────┼────────────────┼─────────────────┼────────────────┤
    │ 1   │ setosa     │ 250.3           │ 171.4          │ 73.1            │ 12.3           │
    │ 2   │ versicolor │ 296.8           │ 138.5          │ 213.0           │ 66.3           │
    │ 3   │ virginica  │ 329.4           │ 148.7          │ 277.6           │ 101.3          │
The colwise function applies the given function to all columns of a DataFrame or only to the specified ones.

    julia> colwise(mean, iris[1:4])
    4-element Array{Float64,1}:
     5.843333333333335
     3.057333333333334
     3.7580000000000027
     1.199333333333334
A very convenient function for getting a summary of a table is describe. Example:

    julia> describe(iris)
    5×8 DataFrame
    │ Row │ variable    │ mean    │ min    │ median │ max       │ nunique │ nmissing │ eltype   │
    │     │ Symbol      │ Union…  │ Any    │ Union… │ Any       │ Union…  │ Int64    │ DataType │
    ├─────┼─────────────┼─────────┼────────┼────────┼───────────┼─────────┼──────────┼──────────┤
    │ 1   │ SepalLength │ 5.84333 │ 4.3    │ 5.8    │ 7.9       │         │ 0        │ Float64  │
    │ 2   │ SepalWidth  │ 3.05733 │ 2.0    │ 3.0    │ 4.4       │         │ 0        │ Float64  │
    │ 3   │ PetalLength │ 3.758   │ 1.0    │ 4.35   │ 6.9       │         │ 0        │ Float64  │
    │ 4   │ PetalWidth  │ 1.19933 │ 0.1    │ 1.3    │ 2.5       │         │ 0        │ Float64  │
    │ 5   │ Species     │         │ setosa │        │ virginica │ 3       │ 0        │ String   │
The package documentation has the complete list of DataFrames functions.
As with matrices, all statistical functions available in the Statistics module can be used with a DataFrame. See https://docs.julialang.org/en/v1/stdlib/Statistics/index.html
The StatPlots.jl library is used to display a DataFrame graphically. See https://github.com/JuliaPlots/StatPlots.jl for more.
The library implements a set of macros that simplify visualization.
    julia> using StatPlots

    julia> df = DataFrame(a = 1:10, b = 10 .* rand(10), c = 10 .* rand(10))
    10×3 DataFrame
    │ Row │ a     │ b       │ c       │
    │     │ Int64 │ Float64 │ Float64 │
    ├─────┼───────┼─────────┼─────────┤
    │ 1   │ 1     │ 0.73614 │ 7.11238 │
    │ 2   │ 2     │ 5.5223  │ 1.42414 │
    │ 3   │ 3     │ 3.5004  │ 2.11633 │
    │ 4   │ 4     │ 1.34176 │ 7.54208 │
    │ 5   │ 5     │ 8.52392 │ 2.98558 │
    │ 6   │ 6     │ 4.47477 │ 6.36836 │
    │ 7   │ 7     │ 8.48093 │ 6.59236 │
    │ 8   │ 8     │ 5.3761  │ 2.5127  │
    │ 9   │ 9     │ 3.55393 │ 9.2782  │
    │ 10  │ 10    │ 3.50925 │ 7.07576 │

    julia> @df df plot(:a, [:b :c], colour = [:red :blue])

In the last line, @df is the macro and df is the name of the variable holding the DataFrame.
Query.jl is another very useful library. It uses macros and processing pipelines to provide a specialized query language. As an example, let's get the list of people older than 50 together with their number of children.

    julia> using Query, DataFrames

    julia> df = DataFrame(name=["John", "Sally", "Kirk"], age=[23., 42., 59.], children=[3,5,2])
    3×3 DataFrame
    │ Row │ name   │ age     │ children │
    │     │ String │ Float64 │ Int64    │
    ├─────┼────────┼─────────┼──────────┤
    │ 1   │ John   │ 23.0    │ 3        │
    │ 2   │ Sally  │ 42.0    │ 5        │
    │ 3   │ Kirk   │ 59.0    │ 2        │

    julia> x = @from i in df begin
               @where i.age>50
               @select {i.name, i.children}
               @collect DataFrame
           end
    1×2 DataFrame
    │ Row │ name   │ children │
    │     │ String │ Int64    │
    ├─────┼────────┼──────────┤
    │ 1   │ Kirk   │ 2        │

Or in pipeline form:

    julia> using Query, DataFrames

    julia> df = DataFrame(name=["John", "Sally", "Kirk"], age=[23., 42., 59.], children=[3,5,2]);

    julia> x = df |> @query(i, begin
                       @where i.age>50
                       @select {i.name, i.children}
                     end) |> DataFrame
    1×2 DataFrame
    │ Row │ name   │ children │
    │     │ String │ Int64    │
    ├─────┼────────┼──────────┤
    │ 1   │ Kirk   │ 2        │
芳細ăčŚă
ä¸č¨ăŽä¸ĄćšăŽäžăŻădplyrăžăăŻLINQă¨ćŠč˝çăŤéĄäźźăăăŻă¨ăŞč¨čŞăŽä˝żç¨ă示ăăŚăăžăă ăăăŤăăăăăŽč¨čŞăŻQuery.jlăŤéĺŽăăăžăăă DataFrameă§ăăăăŽč¨čŞă使ç¨ăăćšćłăŽčŠłç´°ăŤă¤ăăŚăŻăăăĄăăă茧ăă ăă ă
ćĺžăŽäžă§ăŻă| |ćźçŽĺă使ç¨ăăžăă 芳細ăă茧ăă ăă ă
ăăŽćźçŽĺăŻăĺźć°ăăăŽĺłĺ´ăŤç¤şăăăŚăăé˘ć°ăŤç˝Žăćăăžăă č¨ăćăăă°ďź
    julia> [1:5;] |> x->x.^2 |> sum |> inv
    0.01818181818181818

is equivalent to:

    julia> inv(sum( [1:5;] .^ 2 ))
    0.01818181818181818
The last thing worth noting is that a DataFrame can be written to a delimited output file with the CSV.jl library mentioned earlier:
    julia> df = DataFrame(name=["John", "Sally", "Kirk"], age=[23., 42., 59.], children=[3,5,2])
    3×3 DataFrame
    │ Row │ name   │ age     │ children │
    │     │ String │ Float64 │ Int64    │
    ├─────┼────────┼─────────┼──────────┤
    │ 1   │ John   │ 23.0    │ 3        │
    │ 2   │ Sally  │ 42.0    │ 5        │
    │ 3   │ Kirk   │ 59.0    │ 2        │

    julia> CSV.write("out.csv", df)
    "out.csv"
We can check what was written:

    > cat out.csv
    name,age,children
    John,23.0,3
    Sally,42.0,5
    Kirk,59.0,2
Conclusion
It is hard to predict whether Julia will become as widespread as, say, R, but this year it has already become one of the fastest-growing programming languages. Last year only a few people knew about it; this year, after the release of version 1.0 and the stabilization of the library functions, people have started writing about it. Companies that do not start using Julia for data analysis risk becoming outright dinosaurs, replaced by nimbler descendants.
Julia is a young programming language. Only once pilot projects appear will it become clear how ready Julia's infrastructure is for real industrial use. Julia's developers are very ambitious and say it is ready now. In any case, its simple but strict syntax makes Julia a very attractive language to learn today, and its high performance makes it possible to implement algorithms suited not only to teaching but also to real use in data analysis. We are going to start trying Julia consistently in various projects.