Overview

Dataset statistics

Number of variables: 5
Number of observations: 1107933
Missing cells: 0
Missing cells (%): 0.0%
Total size in memory: 42.3 MiB
Average record size in memory: 40.0 B
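
The memory figures above are consistent with each of the five columns holding one 8-byte value (or pointer) per row. The sketch below redoes that arithmetic. The 8-byte width is an assumption inferred from the reported sizes, not something the report states.

```python
MIB = 1024 ** 2  # bytes per mebibyte

n_rows = 1_107_933   # "Number of observations" in the report
n_cols = 5           # "Number of variables" in the report
bytes_per_value = 8  # assumption: 8 bytes per cell (e.g. int64/float64 or an object pointer)

total_bytes = n_rows * n_cols * bytes_per_value
per_column_mib = n_rows * bytes_per_value / MIB
total_mib = total_bytes / MIB
avg_record_bytes = total_bytes / n_rows

print(f"per column:     {per_column_mib:.1f} MiB")
print(f"total:          {total_mib:.1f} MiB")
print(f"average record: {avg_record_bytes:.1f} B")
```

Under that assumption the per-column figure rounds to the 8.5 MiB reported as "Memory size" for each variable.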

Variable types

Numeric: 3
Text: 2

Alerts

num_group1 has 150732 (13.6%) zeros (alert type: Zeros)

Reproduction

Analysis started: 2024-02-13 19:58:08.823892
Analysis finished: 2024-02-13 19:58:11.071055
Duration: 2.25 seconds
Software version: ydata-profiling v4.6.4
Download configuration: config.json

Variables

case_id
Real number (ℝ)

Distinct: 150732
Distinct (%): 13.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1469876.122
Minimum: 49435
Maximum: 2703452
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.5 MiB

Quantile statistics

Minimum: 49435
5-th percentile: 229480
Q1: 997668
Median: 1854645
Q3: 1907416
95-th percentile: 2686146
Maximum: 2703452
Range: 2654017
Interquartile range (IQR): 909748

Descriptive statistics

Standard deviation: 705344.7771
Coefficient of variation (CV): 0.4798668178
Kurtosis: -0.6728545046
Mean: 1469876.122
Median Absolute Deviation (MAD): 84248
Skewness: -0.5004240807
Sum: 1.628524261 × 10^12
Variance: 4.975112545 × 10^11
Monotonicity: Increasing
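
Several of the case_id statistics above follow from the others, which gives a quick consistency check on the report. The sketch below recomputes them from the printed values using the standard definitions (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × count). It checks the arithmetic only and is not how the profiler itself computes these figures.

```python
# Values copied from the case_id section of the report.
n = 1_107_933
minimum, maximum = 49_435, 2_703_452
q1, q3 = 997_668, 1_907_416
mean, std = 1469876.122, 705344.7771

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance
total = mean * n                 # Sum

print(value_range, iqr)
print(f"CV       = {cv:.10f}")
print(f"variance = {variance:.9e}")
print(f"sum      = {total:.9e}")
```

The same relations can be applied to the other numeric variables in this report.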
Histogram with fixed size bins (bins=50)
Value                   Count     Frequency (%)
1869915                 101       < 0.1%
2681835                 83        < 0.1%
1917900                 70        < 0.1%
229270                  65        < 0.1%
1863796                 64        < 0.1%
1009026                 60        < 0.1%
1861451                 59        < 0.1%
1853166                 58        < 0.1%
990891                  56        < 0.1%
242302                  54        < 0.1%
Other values (150722)   1107263   99.9%

Value     Count   Frequency (%)
49435     11      < 0.1%
49490     6       < 0.1%
49526     2       < 0.1%
49563     11      < 0.1%
49576     11      < 0.1%

Value     Count   Frequency (%)
2703452   6       < 0.1%
2703449   8       < 0.1%
2703448   6       < 0.1%
2703445   5       < 0.1%
2703443   6       < 0.1%

amount_4917619A
Real number (ℝ)

Distinct: 191635
Distinct (%): 17.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20104.96572
Minimum: 0
Maximum: 344250
Zeros: 32
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 8.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1735
Q1: 6885
Median: 13130.2
Q3: 24300
95-th percentile: 58140.5612
Maximum: 344250
Range: 344250
Interquartile range (IQR): 17415

Descriptive statistics

Standard deviation: 25201.74513
Coefficient of variation (CV): 1.253508485
Kurtosis: 43.42465887
Mean: 20104.96572
Median Absolute Deviation (MAD): 7124.3997
Skewness: 5.169141853
Sum: 2.227495499 × 10^10
Variance: 635127957.8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                   Count    Frequency (%)
6885                    57533    5.2%
8100                    16295    1.5%
644.2                   9243     0.8%
16200                   8442     0.8%
9720                    7070     0.6%
7290                    5882     0.5%
1288.4                  5426     0.5%
11340                   4669     0.4%
12960                   4209     0.4%
24300                   3908     0.4%
Other values (191625)   985256   88.9%

Value   Count   Frequency (%)
0       32      < 0.1%
0.2     23      < 0.1%
0.4     23      < 0.1%
0.6     8       < 0.1%
0.8     88      < 0.1%

Value       Count   Frequency (%)
344250      755     0.1%
344248.4    3       < 0.1%
344162      1       < 0.1%
344036.22   1       < 0.1%
344018.4    1       < 0.1%

Distinct: 260
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 MiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 11079330
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2019-10-16
2nd row: 2019-10-16
3rd row: 2019-10-16
4th row: 2019-10-16
5th row: 2019-10-16

Value                Count    Frequency (%)
2020-04-03           20426    1.8%
2020-04-09           19943    1.8%
2020-05-06           16006    1.4%
2020-06-04           15974    1.4%
2020-06-05           15145    1.4%
2020-06-08           14735    1.3%
2020-05-05           14713    1.3%
2020-04-02           14533    1.3%
2020-05-08           13750    1.2%
2020-05-11           13408    1.2%
Other values (250)   949300   85.7%

Most occurring characters

Value   Count     Frequency (%)
0       3855766   34.8%
2       2612860   23.6%
-       2215866   20.0%
1       618372    5.6%
3       321552    2.9%
4       281439    2.5%
6       270104    2.4%
5       254847    2.3%
7       241968    2.2%
9       214568    1.9%

Most occurring categories

Value              Count     Frequency (%)
Decimal Number     8863464   80.0%
Dash Punctuation   2215866   20.0%
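
The 80/20 split above is what one would expect for ISO-style dates: each 10-character value has eight digits and two hyphens. As a minimal sketch, Python's standard `unicodedata.category` returns the two-letter codes for these general categories (`Nd` for Decimal Number, `Pd` for Dash Punctuation). The sample below reuses the five sample rows shown in this section. It illustrates the categories and is not the profiler's own implementation.

```python
import unicodedata
from collections import Counter

# The five sample rows listed for this text variable.
sample = ["2019-10-16"] * 5

# Count the Unicode general category of every character in every value.
counts = Counter(unicodedata.category(ch) for value in sample for ch in value)
total = sum(counts.values())
shares = {category: count / total for category, count in counts.items()}

print(counts)  # character counts per general category
print(shares)  # share of all characters per general category
```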

Most frequent character per category

Decimal Number
Value   Count     Frequency (%)
0       3855766   43.5%
2       2612860   29.5%
1       618372    7.0%
3       321552    3.6%
4       281439    3.2%
6       270104    3.0%
5       254847    2.9%
7       241968    2.7%
9       214568    2.4%
8       191988    2.2%

Dash Punctuation
Value   Count     Frequency (%)
-       2215866   100.0%

Most occurring scripts

Value    Count      Frequency (%)
Common   11079330   100.0%

Most frequent character per script

Common
Value   Count     Frequency (%)
0       3855766   34.8%
2       2612860   23.6%
-       2215866   20.0%
1       618372    5.6%
3       321552    2.9%
4       281439    2.5%
6       270104    2.4%
5       254847    2.3%
7       241968    2.2%
9       214568    1.9%

Most occurring blocks

Value   Count      Frequency (%)
ASCII   11079330   100.0%

Most frequent character per block

ASCII
Value   Count     Frequency (%)
0       3855766   34.8%
2       2612860   23.6%
-       2215866   20.0%
1       618372    5.6%
3       321552    2.9%
4       281439    2.5%
6       270104    2.4%
5       254847    2.3%
7       241968    2.2%
9       214568    1.9%

Distinct: 55857
Distinct (%): 5.0%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 MiB

Length

Max length: 12
Median length: 8
Mean length: 8.056925825
Min length: 8

Characters and Unicode

Total characters: 8926534
Distinct characters: 18
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
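
The "Mean length" and "Total characters" figures are linked: with no missing values, mean length is total characters divided by the number of observations. The short check below applies that to both text variables in this report. The dictionary keys are descriptive labels chosen here for illustration, because the variable names did not survive in this dump.

```python
n_rows = 1_107_933  # "Number of observations" in the report

# "Total characters" for the two text variables; the keys are
# illustrative labels, not the real column names.
total_chars = {
    "date-like text variable": 11_079_330,
    "identifier-like text variable": 8_926_534,
}

mean_length = {name: chars / n_rows for name, chars in total_chars.items()}

for name, value in mean_length.items():
    print(f"{name}: {value:.9f}")
```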

Unique

Unique: 6154
Unique (%): 0.6%

Sample

1st row: 6b730375
2nd row: 6b730375
3rd row: 6b730375
4th row: 6b730375
5th row: 6b730375

Value                  Count    Frequency (%)
5e180ef0               85284    7.7%
p114_118_163           7205     0.7%
74ca9587               7153     0.6%
7444479d               5173     0.5%
3613fb71               4079     0.4%
a409d8fa               3529     0.3%
36a9355c               3499     0.3%
e304888c               3465     0.3%
cda1fd10               3278     0.3%
c75d2f47               3203     0.3%
Other values (55847)   982065   88.6%

Most occurring characters

Value              Count     Frequency (%)
1                  675640    7.6%
0                  673508    7.5%
e                  664650    7.4%
5                  605680    6.8%
8                  599952    6.7%
f                  562940    6.3%
7                  527523    5.9%
d                  524884    5.9%
4                  523107    5.9%
3                  518634    5.8%
Other values (8)   3050016   34.2%

Most occurring categories

Value                   Count     Frequency (%)
Decimal Number          5632463   63.1%
Lowercase Letter        3232751   36.2%
Connector Punctuation   40880     0.5%
Uppercase Letter        20440     0.2%

Most frequent character per category

Decimal Number
Value   Count    Frequency (%)
1       675640   12.0%
0       673508   12.0%
5       605680   10.8%
8       599952   10.7%
7       527523   9.4%
4       523107   9.3%
3       518634   9.2%
9       512812   9.1%
6       502308   8.9%
2       493299   8.8%

Lowercase Letter
Value   Count    Frequency (%)
e       664650   20.6%
f       562940   17.4%
d       524884   16.2%
c       499668   15.5%
a       491977   15.2%
b       488632   15.1%

Connector Punctuation
Value   Count   Frequency (%)
_       40880   100.0%

Uppercase Letter
Value   Count   Frequency (%)
P       20440   100.0%

Most occurring scripts

Value    Count     Frequency (%)
Common   5673343   63.6%
Latin    3253191   36.4%

Most frequent character per script

Common
Value   Count    Frequency (%)
1       675640   11.9%
0       673508   11.9%
5       605680   10.7%
8       599952   10.6%
7       527523   9.3%
4       523107   9.2%
3       518634   9.1%
9       512812   9.0%
6       502308   8.9%
2       493299   8.7%

Latin
Value   Count    Frequency (%)
e       664650   20.4%
f       562940   17.3%
d       524884   16.1%
c       499668   15.4%
a       491977   15.1%
b       488632   15.0%
P       20440    0.6%

Most occurring blocks

Value   Count     Frequency (%)
ASCII   8926534   100.0%

Most frequent character per block

ASCII
Value              Count     Frequency (%)
1                  675640    7.6%
0                  673508    7.5%
e                  664650    7.4%
5                  605680    6.8%
8                  599952    6.7%
f                  562940    6.3%
7                  527523    5.9%
d                  524884    5.9%
4                  523107    5.9%
3                  518634    5.8%
Other values (8)   3050016   34.2%

num_group1
Real number (ℝ)

Alert: Zeros

Distinct: 101
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.144719942
Minimum: 0
Maximum: 100
Zeros: 150732
Zeros (%): 13.6%
Negative: 0
Negative (%): 0.0%
Memory size: 8.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 3
Q3: 5
95-th percentile: 12
Maximum: 100
Range: 100
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 4.108048495
Coefficient of variation (CV): 0.9911522495
Kurtosis: 16.79635319
Mean: 4.144719942
Median Absolute Deviation (MAD): 2
Skewness: 2.609646326
Sum: 4592072
Variance: 16.87606243
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count    Frequency (%)
0                   150732   13.6%
1                   149210   13.5%
2                   146224   13.2%
3                   141814   12.8%
4                   135063   12.2%
5                   116602   10.5%
6                   59418    5.4%
7                   42568    3.8%
8                   34229    3.1%
9                   28762    2.6%
Other values (91)   103311   9.3%

Value   Count    Frequency (%)
0       150732   13.6%
1       149210   13.5%
2       146224   13.2%
3       141814   12.8%
4       135063   12.2%

Value   Count   Frequency (%)
100     1       < 0.1%
99      1       < 0.1%
98      1       < 0.1%
97      1       < 0.1%
96      1       < 0.1%
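
The frequency column in the num_group1 tables is each count divided by the number of observations, and the ten listed values plus "Other values" should add up to the full dataset. The sketch below rechecks both using the counts printed above. It also shows where the 13.6% in the Zeros alert comes from. That zero count (150732) equals the number of distinct case_id values. One possible reading, which the report does not confirm, is that num_group1 is a zero-based index within each case.

```python
n_rows = 1_107_933  # "Number of observations" in the report

# Counts for the ten most common num_group1 values, copied from the report.
common_values = {
    0: 150_732, 1: 149_210, 2: 146_224, 3: 141_814, 4: 135_063,
    5: 116_602, 6: 59_418, 7: 42_568, 8: 34_229, 9: 28_762,
}
other_values = 103_311  # "Other values (91)"

# The listed counts plus the remainder should cover every row.
total = sum(common_values.values()) + other_values

# Frequency (%) is count / rows, rounded to one decimal place.
shares = {value: round(100 * count / n_rows, 1) for value, count in common_values.items()}

print(total == n_rows)
print(shares)
```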