Overview

Dataset statistics

Number of variables: 41
Number of observations: 2638295
Missing cells: 34503185
Missing cells (%): 31.9%
Total size in memory: 825.3 MiB
Average record size in memory: 328.0 B
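
The derived figures in this table can be checked from the raw counts. The sketch below is illustrative only: it assumes that the missing-cell percentage is missing cells divided by variables × observations, and that 1 MiB is 2**20 bytes. The variable names are not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 41
n_observations = 2_638_295
missing_cells = 34_503_185
total_size_mib = 825.3

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 2**20 bytes.
avg_record_bytes = total_size_mib * 2**20 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the results round to the 31.9% and 328.0 B reported in the table.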

Variable types

Numeric: 20
Text: 19
Boolean: 2

Alerts

isbidproduct_390L is highly imbalanced (69.3%) [Imbalance]
annuity_853A has 94885 (3.6%) missing values [Missing]
approvaldate_319D has 1244273 (47.2%) missing values [Missing]
byoccupationinc_3656910L has 2095507 (79.4%) missing values [Missing]
childnum_21L has 1605531 (60.9%) missing values [Missing]
credacc_actualbalance_314A has 2486439 (94.2%) missing values [Missing]
credacc_credlmt_575A has 75062 (2.8%) missing values [Missing]
credacc_maxhisbal_375A has 2486439 (94.2%) missing values [Missing]
credacc_minhisbal_90A has 2486439 (94.2%) missing values [Missing]
credacc_status_367L has 2486439 (94.2%) missing values [Missing]
credacc_transactions_402L has 2486439 (94.2%) missing values [Missing]
credamount_590A has 78869 (3.0%) missing values [Missing]
credtype_587L has 78869 (3.0%) missing values [Missing]
currdebt_94A has 976135 (37.0%) missing values [Missing]
dateactivated_425D has 1297051 (49.2%) missing values [Missing]
downpmt_134A has 78869 (3.0%) missing values [Missing]
dtlastpmt_581D has 1890009 (71.6%) missing values [Missing]
dtlastpmtallstes_3545839D has 1609466 (61.0%) missing values [Missing]
employedfrom_700D has 1705609 (64.6%) missing values [Missing]
familystate_726L has 1148691 (43.5%) missing values [Missing]
firstnonzeroinstldate_307D has 287307 (10.9%) missing values [Missing]
inittransactioncode_279L has 78869 (3.0%) missing values [Missing]
isdebitcard_527L has 2425416 (91.9%) missing values [Missing]
mainoccupationinc_437A has 65371 (2.5%) missing values [Missing]
maxdpdtolerance_577P has 1278326 (48.5%) missing values [Missing]
outstandingdebt_522A has 980346 (37.2%) missing values [Missing]
pmtnum_8L has 238987 (9.1%) missing values [Missing]
revolvingaccount_394A has 2498196 (94.7%) missing values [Missing]
tenor_203L has 238987 (9.1%) missing values [Missing]
actualdpd_943P is highly skewed (γ₁ = 530.2918463) [Skewed]
credacc_maxhisbal_375A is highly skewed (γ₁ = 43.42768751) [Skewed]
downpmt_134A is highly skewed (γ₁ = 21.07123108) [Skewed]
actualdpd_943P has 2636655 (99.9%) zeros [Zeros]
annuity_853A has 200456 (7.6%) zeros [Zeros]
byoccupationinc_3656910L has 34448 (1.3%) zeros [Zeros]
childnum_21L has 574109 (21.8%) zeros [Zeros]
credacc_actualbalance_314A has 47706 (1.8%) zeros [Zeros]
credacc_credlmt_575A has 2369092 (89.8%) zeros [Zeros]
credacc_maxhisbal_375A has 84335 (3.2%) zeros [Zeros]
credacc_minhisbal_90A has 89232 (3.4%) zeros [Zeros]
credacc_transactions_402L has 134552 (5.1%) zeros [Zeros]
credamount_590A has 73846 (2.8%) zeros [Zeros]
currdebt_94A has 1441963 (54.7%) zeros [Zeros]
downpmt_134A has 2348164 (89.0%) zeros [Zeros]
maxdpdtolerance_577P has 986233 (37.4%) zeros [Zeros]
num_group1 has 438525 (16.6%) zeros [Zeros]
outstandingdebt_522A has 1436446 (54.4%) zeros [Zeros]
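
One way to act on the alert list is to rank the "missing values" alerts and pick out the columns that are almost entirely empty. The sketch below is a hypothetical helper, not part of ydata-profiling: the regular expression, the function name `missing_alerts`, and the 90% cutoff are all assumptions chosen for illustration.

```python
import re

# A few alert lines copied from the list above.
alerts = [
    "annuity_853A has 94885 (3.6%) missing values",
    "childnum_21L has 1605531 (60.9%) missing values",
    "revolvingaccount_394A has 2498196 (94.7%) missing values",
    "actualdpd_943P has 2636655 (99.9%) zeros",
]

# Matches "<column> has <count> (<pct>%) missing values".
pattern = re.compile(r"^(\S+) has (\d+) \(([\d.]+)%\) missing values")

def missing_alerts(lines):
    """Return (column, count, pct) tuples, highest missing share first."""
    rows = []
    for line in lines:
        m = pattern.match(line)
        if m:
            rows.append((m.group(1), int(m.group(2)), float(m.group(3))))
    return sorted(rows, key=lambda r: r[2], reverse=True)

HIGH_MISSING_PCT = 90.0  # hypothetical cutoff, tune for your use case

ranked = missing_alerts(alerts)
mostly_empty = [name for name, _, pct in ranked if pct >= HIGH_MISSING_PCT]
print(mostly_empty)
```

Lines that describe zeros or skewness do not match the pattern and are skipped.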

Reproduction

Analysis started: 2024-02-13 19:36:58.685528
Analysis finished: 2024-02-13 19:37:23.604870
Duration: 24.92 seconds
Software version: ydata-profiling v4.6.4
Download configuration: config.json
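
The reported duration is the difference between the two timestamps. A minimal check using only the Python standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2024-02-13 19:36:58.685528")
finished = datetime.fromisoformat("2024-02-13 19:37:23.604870")

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```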

Variables

case_id
Real number (ℝ)

Distinct: 438525
Distinct (%): 16.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1482078.441
Minimum: 40704
Maximum: 2703454
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 20.1 MiB

Quantile statistics

Minimum: 40704
5-th percentile: 198771
Q1: 257431
median: 1788760
Q3: 1895740
95-th percentile: 2683686.3
Maximum: 2703454
Range: 2662750
Interquartile range (IQR): 1638309
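
The range and interquartile range follow directly from the quantiles listed above. The sketch below recomputes them and adds Tukey's 1.5 × IQR fences as one common outlier rule. The fences are an assumption added for illustration; the report itself does not compute them.

```python
# Quantiles for case_id, copied from the table above.
minimum, maximum = 40_704, 2_703_454
q1, q3 = 257_431, 1_895_740

value_range = maximum - minimum
iqr = q3 - q1

# Tukey fences (illustrative, not part of the report).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"range = {value_range}, IQR = {iqr}")
print(f"fences = ({lower_fence}, {upper_fence})")
```

For case_id both the minimum and the maximum lie inside the fences, so this rule flags no values.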

Descriptive statistics

Standard deviation: 822852.6011
Coefficient of variation (CV): 0.5552017887
Kurtosis: -0.9550918823
Mean: 1482078.441
Median Absolute Deviation (MAD): 129385
Skewness: -0.5094453682
Sum: 3.910160139 × 10¹²
Variance: 6.770864032 × 10¹¹
Monotonicity: Increasing
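
Several of these statistics are simple functions of one another: the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below checks that against the reported case_id values and includes a small helper that computes the same quantities from raw data with the standard library. The helper name `describe` is hypothetical, and it assumes the sample (n − 1) standard deviation.

```python
import statistics

def describe(values):
    """Mean, sample std, CV and MAD for a list of numbers (illustrative)."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return {"mean": mean, "std": std, "cv": std / mean, "mad": mad}

# Consistency check on the reported case_id figures.
std_case_id = 822852.6011
mean_case_id = 1482078.441
cv_case_id = std_case_id / mean_case_id
variance_case_id = std_case_id ** 2

# Small made-up sample to show the helper in use.
summary = describe([1, 2, 3, 4, 10])
print(cv_case_id, variance_case_id, summary)
```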
Histogram with fixed size bins (bins=50)
Value                   Count      Frequency (%)
2660368                 20         < 0.1%
229396                  20         < 0.1%
229411                  20         < 0.1%
195358                  20         < 0.1%
250069                  20         < 0.1%
1809155                 20         < 0.1%
1906631                 20         < 0.1%
250081                  20         < 0.1%
1715312                 20         < 0.1%
250087                  20         < 0.1%
Other values (438515)   2638095    > 99.9%

Value                   Count      Frequency (%)
40704                   1          < 0.1%
40734                   1          < 0.1%
40737                   1          < 0.1%
40791                   3          < 0.1%
40821                   2          < 0.1%

Value                   Count      Frequency (%)
2703454                 2          < 0.1%
2703453                 9          < 0.1%
2703452                 3          < 0.1%
2703451                 6          < 0.1%
2703450                 13         < 0.1%

actualdpd_943P
Real number (ℝ)

SKEWED  ZEROS 

Distinct157
Distinct (%)< 0.1%
Missing266
Missing (%)< 0.1%
Infinite0
Infinite (%)0.0%
Mean0.02098536445
Minimum0
Maximum4206
Zeros2636655
Zeros (%)99.9%
Negative0
Negative (%)0.0%
Memory size20.1 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile0
Maximum4206
Range4206
Interquartile range (IQR)0

Descriptive statistics

Standard deviation5.707747487
Coefficient of variation (CV)271.9870555
Kurtosis320201.8546
Mean0.02098536445
Median Absolute Deviation (MAD)0
Skewness530.2918463
Sum55360
Variance32.57838137
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 2636655
99.9%
1 520
 
< 0.1%
2 220
 
< 0.1%
3 153
 
< 0.1%
4 53
 
< 0.1%
6 38
 
< 0.1%
5 32
 
< 0.1%
7 21
 
< 0.1%
8 20
 
< 0.1%
9 14
 
< 0.1%
Other values (147) 303
 
< 0.1%
(Missing) 266
 
< 0.1%
ValueCountFrequency (%)
0 2636655
99.9%
1 520
 
< 0.1%
2 220
 
< 0.1%
3 153
 
< 0.1%
4 53
 
< 0.1%
ValueCountFrequency (%)
4206 1
< 0.1%
3980 1
< 0.1%
3623 1
< 0.1%
2617 1
< 0.1%
2505 1
< 0.1%

annuity_853A
Real number (ℝ)

MISSING  ZEROS 

Distinct76848
Distinct (%)3.0%
Missing94885
Missing (%)3.6%
Infinite0
Infinite (%)0.0%
Mean3502.854599
Minimum0
Maximum103000
Zeros200456
Zeros (%)7.6%
Negative0
Negative (%)0.0%
Memory size20.1 MiB

Quantile statistics

Minimum0
5-th percentile0
Q11711.6
median2825.8
Q34595.6
95-th percentile8829.601
Maximum103000
Range103000
Interquartile range (IQR)2884

Descriptive statistics

Standard deviation2963.25686
Coefficient of variation (CV)0.8459548564
Kurtosis28.49244386
Mean3502.854599
Median Absolute Deviation (MAD)1324.8
Skewness3.117039065
Sum8909195416
Variance8780891.216
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 200456
 
7.6%
1580 2372
 
0.1%
1508 1974
 
0.1%
2716 1633
 
0.1%
3820 1096
 
< 0.1%
2000 1083
 
< 0.1%
2103 1015
 
< 0.1%
2558.4001 1001
 
< 0.1%
3837.4001 997
 
< 0.1%
1668 990
 
< 0.1%
Other values (76838) 2330793
88.3%
(Missing) 94885
 
3.6%
ValueCountFrequency (%)
0 200456
7.6%
2 1
 
< 0.1%
2.2 1
 
< 0.1%
2.4 1
 
< 0.1%
2.6000001 1
 
< 0.1%
ValueCountFrequency (%)
103000 1
< 0.1%
99646.6 2
< 0.1%
96987.9 1
< 0.1%
95685.2 1
< 0.1%
94012.2 1
< 0.1%

approvaldate_319D
Text

MISSING 

Distinct5398
Distinct (%)0.4%
Missing1244273
Missing (%)47.2%
Memory size20.1 MiB

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters: 13940220
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6
Unique (%): < 0.1%

Sample

1st row2019-10-28
2nd row2019-09-13
3rd row2019-10-09
4th row2019-12-01
5th row2019-10-27
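
The sample rows show that approvaldate_319D holds ISO-style YYYY-MM-DD strings, which is why it is profiled as Text rather than as a date. A minimal sketch of converting such values with the standard library follows. It assumes every non-missing value has this 10-character format, which the length statistics above suggest, and it represents missing values as `None`. The function name is hypothetical.

```python
from datetime import date

# Sample rows copied from the report.
sample = ["2019-10-28", "2019-09-13", "2019-10-09", "2019-12-01", "2019-10-27"]

def parse_iso_dates(values):
    """Parse YYYY-MM-DD strings; missing values (None) stay None."""
    return [date.fromisoformat(v) if v is not None else None for v in values]

parsed = parse_iso_dates(sample + [None])
valid = [d for d in parsed if d is not None]
print(min(valid), max(valid))
```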
ValueCountFrequency (%)
2019-12-14 1795
 
0.1%
2019-12-13 1717
 
0.1%
2019-09-21 1506
 
0.1%
2019-08-30 1503
 
0.1%
2018-12-07 1474
 
0.1%
2019-12-27 1454
 
0.1%
2019-09-20 1449
 
0.1%
2019-11-30 1432
 
0.1%
2019-11-29 1419
 
0.1%
2019-06-28 1403
 
0.1%
Other values (5388) 1378870
98.9%

Most occurring characters

ValueCountFrequency (%)
0 3265444
23.4%
- 2788044
20.0%
1 2513445
18.0%
2 2385626
17.1%
9 614752
 
4.4%
8 543699
 
3.9%
7 448386
 
3.2%
3 393048
 
2.8%
6 367360
 
2.6%
5 313079
 
2.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 11152176
80.0%
Dash Punctuation 2788044
 
20.0%
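
The 80% / 20% split comes from the date format: each value has eight digits and two hyphens. The sketch below reproduces the idea for a single string with Python's `unicodedata` module, where `Nd` is the general category "Decimal Number" and `Pd` is "Dash Punctuation". This only illustrates the classification and is not necessarily how the report computes it. The function name is hypothetical.

```python
import unicodedata
from collections import Counter

def category_shares(text):
    """Share of characters per Unicode general category."""
    counts = Counter(unicodedata.category(ch) for ch in text)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

shares = category_shares("2019-10-28")
print(shares)
```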

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 3265444
29.3%
1 2513445
22.5%
2 2385626
21.4%
9 614752
 
5.5%
8 543699
 
4.9%
7 448386
 
4.0%
3 393048
 
3.5%
6 367360
 
3.3%
5 313079
 
2.8%
4 307337
 
2.8%
Dash Punctuation
ValueCountFrequency (%)
- 2788044
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 13940220
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 3265444
23.4%
- 2788044
20.0%
1 2513445
18.0%
2 2385626
17.1%
9 614752
 
4.4%
8 543699
 
3.9%
7 448386
 
3.2%
3 393048
 
2.8%
6 367360
 
2.6%
5 313079
 
2.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 13940220
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 3265444
23.4%
- 2788044
20.0%
1 2513445
18.0%
2 2385626
17.1%
9 614752
 
4.4%
8 543699
 
3.9%
7 448386
 
3.2%
3 393048
 
2.8%
6 367360
 
2.6%
5 313079
 
2.2%

byoccupationinc_3656910L
Real number (ℝ)

MISSING  ZEROS 

Distinct17995
Distinct (%)3.3%
Missing2095507
Missing (%)79.4%
Infinite0
Infinite (%)0.0%
Mean20411.15626
Minimum0
Maximum200000
Zeros34448
Zeros (%)1.3%
Negative0
Negative (%)0.0%
Memory size20.1 MiB

Quantile statistics

Minimum0
5-th percentile0
Q11
median10000
Q330000
95-th percentile75000
Maximum200000
Range200000
Interquartile range (IQR)29999

Descriptive statistics

Standard deviation: 30931.99075
Coefficient of variation (CV): 1.515445296
Kurtosis: 10.36416476
Mean: 20411.15626
Median Absolute Deviation (MAD): 9999
Skewness: 2.743120942
Sum: 1.107893068 × 10¹⁰
Variance: 956788051.9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1 224806
 
8.5%
0 34448
 
1.3%
15000 27344
 
1.0%
20000 23247
 
0.9%
30000 21042
 
0.8%
25000 19112
 
0.7%
50000 17923
 
0.7%
10000 13261
 
0.5%
35000 10526
 
0.4%
40000 10433
 
0.4%
Other values (17985) 140646
 
5.3%
(Missing) 2095507
79.4%
ValueCountFrequency (%)
0 34448
 
1.3%
1 224806
8.5%
2 3
 
< 0.1%
3 1
 
< 0.1%
4 2
 
< 0.1%
ValueCountFrequency (%)
200000 3856
0.1%
199000 13
 
< 0.1%
198000 12
 
< 0.1%
197000 6
 
< 0.1%
196300 3
 
< 0.1%
Distinct73
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size20.1 MiB

Length

Max length12
Median length8
Mean length8.961243909
Min length8

Characters and Unicode

Total characters: 23642405
Distinct characters: 14
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st rowP94_109_143
2nd rowP94_109_143
3rd rowa55475b1
4th rowP94_109_143
5th rowa55475b1
ValueCountFrequency (%)
a55475b1 1723449
65.3%
p94_109_143 654286
 
24.8%
p30_86_84 48402
 
1.8%
p180_60_137 31915
 
1.2%
p198_89_166 26012
 
1.0%
p73_130_169 25691
 
1.0%
p85_114_140 23658
 
0.9%
p52_67_90 18020
 
0.7%
p24_27_36 14627
 
0.6%
p69_72_116 12954
 
0.5%
Other values (63) 59281
 
2.2%

Most occurring characters

ValueCountFrequency (%)
5 5244804
22.2%
1 3440737
14.6%
4 3170300
13.4%
7 1838516
 
7.8%
_ 1829692
 
7.7%
a 1723449
 
7.3%
b 1723449
 
7.3%
9 1447957
 
6.1%
P 914846
 
3.9%
0 873198
 
3.7%
Other values (4) 1435457
 
6.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 17450969
73.8%
Lowercase Letter 3446898
 
14.6%
Connector Punctuation 1829692
 
7.7%
Uppercase Letter 914846
 
3.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5 5244804
30.1%
1 3440737
19.7%
4 3170300
18.2%
7 1838516
 
10.5%
9 1447957
 
8.3%
0 873198
 
5.0%
3 840337
 
4.8%
6 265561
 
1.5%
8 233431
 
1.3%
2 96128
 
0.6%
Lowercase Letter
ValueCountFrequency (%)
a 1723449
50.0%
b 1723449
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1829692
100.0%
Uppercase Letter
ValueCountFrequency (%)
P 914846
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 19280661
81.6%
Latin 4361744
 
18.4%

Most frequent character per script

Common
ValueCountFrequency (%)
5 5244804
27.2%
1 3440737
17.8%
4 3170300
16.4%
7 1838516
 
9.5%
_ 1829692
 
9.5%
9 1447957
 
7.5%
0 873198
 
4.5%
3 840337
 
4.4%
6 265561
 
1.4%
8 233431
 
1.2%
Latin
ValueCountFrequency (%)
a 1723449
39.5%
b 1723449
39.5%
P 914846
21.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 23642405
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5 5244804
22.2%
1 3440737
14.6%
4 3170300
13.4%
7 1838516
 
7.8%
_ 1829692
 
7.7%
a 1723449
 
7.3%
b 1723449
 
7.3%
9 1447957
 
6.1%
P 914846
 
3.9%
0 873198
 
3.7%
Other values (4) 1435457
 
6.1%

childnum_21L
Real number (ℝ)

MISSING  ZEROS 

Distinct19
Distinct (%)< 0.1%
Missing1605531
Missing (%)60.9%
Infinite0
Infinite (%)0.0%
Mean0.8416530785
Minimum0
Maximum20
Zeros574109
Zeros (%)21.8%
Negative0
Negative (%)0.0%
Memory size20.1 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q31
95-th percentile3
Maximum20
Range20
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.238723149
Coefficient of variation (CV)1.471774037
Kurtosis6.280759591
Mean0.8416530785
Median Absolute Deviation (MAD)0
Skewness2.047090682
Sum869229
Variance1.534435041
MonotonicityNot monotonic
Histogram with fixed size bins (bins=19)
ValueCountFrequency (%)
0 574109
 
21.8%
1 220608
 
8.4%
2 142948
 
5.4%
3 52869
 
2.0%
4 22176
 
0.8%
5 11446
 
0.4%
6 5079
 
0.2%
7 1959
 
0.1%
8 813
 
< 0.1%
9 406
 
< 0.1%
Other values (9) 351
 
< 0.1%
(Missing) 1605531
60.9%
ValueCountFrequency (%)
0 574109
21.8%
1 220608
 
8.4%
2 142948
 
5.4%
3 52869
 
2.0%
4 22176
 
0.8%
ValueCountFrequency (%)
20 10
< 0.1%
17 1
 
< 0.1%
16 1
 
< 0.1%
15 8
< 0.1%
14 7
< 0.1%
Distinct5402
Distinct (%)0.2%
Missing31
Missing (%)< 0.1%
Memory size20.1 MiB

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters: 26382640
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row2018-11-20
2nd row2019-12-26
3rd row2014-07-17
4th row2017-08-21
5th row2014-12-28
ValueCountFrequency (%)
2019-12-13 3083
 
0.1%
2019-12-27 2862
 
0.1%
2019-12-14 2856
 
0.1%
2020-01-01 2836
 
0.1%
2019-12-02 2740
 
0.1%
2019-08-30 2716
 
0.1%
2019-09-30 2693
 
0.1%
2019-09-27 2674
 
0.1%
2019-11-29 2672
 
0.1%
2020-01-10 2637
 
0.1%
Other values (5392) 2610495
98.9%

Most occurring characters

ValueCountFrequency (%)
0 6241503
23.7%
- 5276528
20.0%
1 4657136
17.7%
2 4569753
17.3%
9 1160245
 
4.4%
8 996039
 
3.8%
7 821862
 
3.1%
3 749668
 
2.8%
6 685941
 
2.6%
4 617725
 
2.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 21106112
80.0%
Dash Punctuation 5276528
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 6241503
29.6%
1 4657136
22.1%
2 4569753
21.7%
9 1160245
 
5.5%
8 996039
 
4.7%
7 821862
 
3.9%
3 749668
 
3.6%
6 685941
 
3.2%
4 617725
 
2.9%
5 606240
 
2.9%
Dash Punctuation
ValueCountFrequency (%)
- 5276528
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 26382640
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 6241503
23.7%
- 5276528
20.0%
1 4657136
17.7%
2 4569753
17.3%
9 1160245
 
4.4%
8 996039
 
3.8%
7 821862
 
3.1%
3 749668
 
2.8%
6 685941
 
2.6%
4 617725
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 26382640
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 6241503
23.7%
- 5276528
20.0%
1 4657136
17.7%
2 4569753
17.3%
9 1160245
 
4.4%
8 996039
 
3.8%
7 821862
 
3.1%
3 749668
 
2.8%
6 685941
 
2.6%
4 617725
 
2.3%

credacc_actualbalance_314A
Real number (ℝ)

MISSING  ZEROS 

Distinct48131
Distinct (%)31.7%
Missing2486439
Missing (%)94.2%
Infinite0
Infinite (%)0.0%
Mean16055.30345
Minimum-134008.42
Maximum1600000
Zeros47706
Zeros (%)1.8%
Negative526
Negative (%)< 0.1%
Memory size20.1 MiB

Quantile statistics

Minimum-134008.42
5-th percentile0
Q10
median182
Q323136
95-th percentile75998
Maximum1600000
Range1734008.42
Interquartile range (IQR)23136

Descriptive statistics

Standard deviation27948.64962
Coefficient of variation (CV)1.740773677
Kurtosis88.97889429
Mean16055.30345
Median Absolute Deviation (MAD)182
Skewness4.150777153
Sum2438094161
Variance781127015.3
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 47706
 
1.8%
100000 2563
 
0.1%
2 785
 
< 0.1%
0.2 509
 
< 0.1%
190 462
 
< 0.1%
10 462
 
< 0.1%
4 450
 
< 0.1%
42640 437
 
< 0.1%
12000 435
 
< 0.1%
20300 427
 
< 0.1%
Other values (48121) 97620
 
3.7%
(Missing) 2486439
94.2%
ValueCountFrequency (%)
-134008.42 1
< 0.1%
-99800 1
< 0.1%
-94996 1
< 0.1%
-83473.94 1
< 0.1%
-70909.414 1
< 0.1%
ValueCountFrequency (%)
1600000 1
< 0.1%
952181.6 1
< 0.1%
459241.6 1
< 0.1%
419952.4 1
< 0.1%
400000.28 2
< 0.1%

credacc_credlmt_575A
Real number (ℝ)

MISSING  ZEROS 

Distinct30376
Distinct (%)1.2%
Missing75062
Missing (%)2.8%
Infinite0
Infinite (%)0.0%
Mean3608.607928
Minimum0
Maximum400000
Zeros2369092
Zeros (%)89.8%
Negative0
Negative (%)0.0%
Memory size20.1 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile24000
Maximum400000
Range400000
Interquartile range (IQR)0

Descriptive statistics

Standard deviation16507.4851
Coefficient of variation (CV)4.574474543
Kurtosis68.64487772
Mean3608.607928
Median Absolute Deviation (MAD)0
Skewness6.800101576
Sum9249702926
Variance272497064.5
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 2369092
89.8%
100000 21455
 
0.8%
12000 6982
 
0.3%
40000 4799
 
0.2%
20000 4736
 
0.2%
60000 3721
 
0.1%
150000 2546
 
0.1%
30000 2469
 
0.1%
10000 2161
 
0.1%
24000 1817
 
0.1%
Other values (30366) 143455
 
5.4%
(Missing) 75062
 
2.8%
ValueCountFrequency (%)
0 2369092
89.8%
0.2 773
 
< 0.1%
0.8 4
 
< 0.1%
1.2 2
 
< 0.1%
3 2
 
< 0.1%
ValueCountFrequency (%)
400000 195
< 0.1%
394600 2
 
< 0.1%
391400 2
 
< 0.1%
382200 2
 
< 0.1%
377400 1
 
< 0.1%

credacc_maxhisbal_375A
Real number (ℝ)

MISSING  SKEWED  ZEROS 

Distinct39326
Distinct (%)25.9%
Missing2486439
Missing (%)94.2%
Infinite0
Infinite (%)0.0%
Mean-1878.262158
Minimum-290265.1
Maximum3800000
Zeros84335
Zeros (%)3.2%
Negative21622
Negative (%)0.8%
Memory size20.1 MiB

Quantile statistics

Minimum-290265.1
5-th percentile-29680.941
Q10
median0
Q37.651499925
95-th percentile3544.169
Maximum3800000
Range4090265.1
Interquartile range (IQR)7.651499925

Descriptive statistics

Standard deviation29606.77848
Coefficient of variation (CV)-15.76285736
Kurtosis4283.776058
Mean-1878.262158
Median Absolute Deviation (MAD)0
Skewness43.42768751
Sum-285225378.2
Variance876561331.7
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 84335
 
3.2%
2 1352
 
0.1%
4 683
 
< 0.1%
10 444
 
< 0.1%
190 435
 
< 0.1%
22 374
 
< 0.1%
6 367
 
< 0.1%
180 328
 
< 0.1%
90 307
 
< 0.1%
80 282
 
< 0.1%
Other values (39316) 62949
 
2.4%
(Missing) 2486439
94.2%
ValueCountFrequency (%)
-290265.1 1
< 0.1%
-199950 1
< 0.1%
-198762 1
< 0.1%
-197850.3 1
< 0.1%
-196450 1
< 0.1%
ValueCountFrequency (%)
3800000 1
< 0.1%
3640000 1
< 0.1%
2400200 1
< 0.1%
2000000 2
< 0.1%
1999990 1
< 0.1%

credacc_minhisbal_90A
Real number (ℝ)

MISSING  ZEROS 

Distinct37539
Distinct (%)24.7%
Missing2486439
Missing (%)94.2%
Infinite0
Infinite (%)0.0%
Mean-5670.541866
Minimum-350532.6
Maximum239000
Zeros89232
Zeros (%)3.4%
Negative29161
Negative (%)1.1%
Memory size20.1 MiB

Quantile statistics

Minimum-350532.6
5-th percentile-39975
Q10
median0
Q30
95-th percentile157.7795
Maximum239000
Range589532.6
Interquartile range (IQR)0

Descriptive statistics

Standard deviation18143.37868
Coefficient of variation (CV)-3.199584644
Kurtosis27.84215621
Mean-5670.541866
Median Absolute Deviation (MAD)0
Skewness-4.501135816
Sum-861105805.7
Variance329182189.9
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 89232
 
3.4%
2 998
 
< 0.1%
4 564
 
< 0.1%
10 444
 
< 0.1%
190 430
 
< 0.1%
-10 366
 
< 0.1%
6 325
 
< 0.1%
180 324
 
< 0.1%
90 310
 
< 0.1%
80 269
 
< 0.1%
Other values (37529) 58594
 
2.2%
(Missing) 2486439
94.2%
ValueCountFrequency (%)
-350532.6 1
< 0.1%
-319998.6 1
< 0.1%
-319856 1
< 0.1%
-309628.03 1
< 0.1%
-299717.5 1
< 0.1%
ValueCountFrequency (%)
239000 1
< 0.1%
120000 1
< 0.1%
101840 1
< 0.1%
100000 2
< 0.1%
99990 1
< 0.1%

credacc_status_367L
Text

MISSING 

Distinct6
Distinct (%)< 0.1%
Missing2486439
Missing (%)94.2%
Memory size20.1 MiB

Length

Max length3
Median length2
Mean length2.00603203
Min length2

Characters and Unicode

Total characters: 304628
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st rowAC
2nd rowAC
3rd rowAC
4th rowAC
5th rowAC
ValueCountFrequency (%)
ac 100509
66.2%
cl 43060
28.4%
ca 7098
 
4.7%
pcl 916
 
0.6%
po 249
 
0.2%
cr 24
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
C 151607
49.8%
A 107607
35.3%
L 43976
 
14.4%
P 1165
 
0.4%
O 249
 
0.1%
R 24
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 304628
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
C 151607
49.8%
A 107607
35.3%
L 43976
 
14.4%
P 1165
 
0.4%
O 249
 
0.1%
R 24
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 304628
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
C 151607
49.8%
A 107607
35.3%
L 43976
 
14.4%
P 1165
 
0.4%
O 249
 
0.1%
R 24
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 304628
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
C 151607
49.8%
A 107607
35.3%
L 43976
 
14.4%
P 1165
 
0.4%
O 249
 
0.1%
R 24
 
< 0.1%

credacc_transactions_402L
Real number (ℝ)

MISSING  ZEROS 

Distinct93
Distinct (%)0.1%
Missing2486439
Missing (%)94.2%
Infinite0
Infinite (%)0.0%
Mean0.5760128016
Minimum0
Maximum147
Zeros134552
Zeros (%)5.1%
Negative0
Negative (%)0.0%
Memory size20.1 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile3
Maximum147
Range147
Interquartile range (IQR)0

Descriptive statistics

Standard deviation3.226973612
Coefficient of variation (CV)5.602260233
Kurtosis273.1004518
Mean0.5760128016
Median Absolute Deviation (MAD)0
Skewness13.22151442
Sum87471
Variance10.41335869
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 134552
 
5.1%
1 6112
 
0.2%
2 2671
 
0.1%
3 2561
 
0.1%
4 1221
 
< 0.1%
5 851
 
< 0.1%
6 560
 
< 0.1%
7 473
 
< 0.1%
8 368
 
< 0.1%
9 290
 
< 0.1%
Other values (83) 2197
 
0.1%
(Missing) 2486439
94.2%
ValueCountFrequency (%)
0 134552
5.1%
1 6112
 
0.2%
2 2671
 
0.1%
3 2561
 
0.1%
4 1221
 
< 0.1%
ValueCountFrequency (%)
147 1
< 0.1%
135 1
< 0.1%
119 1
< 0.1%
118 1
< 0.1%
117 1
< 0.1%

credamount_590A
Real number (ℝ)

MISSING  ZEROS 

Distinct165672
Distinct (%)6.5%
Missing78869
Missing (%)3.0%
Infinite0
Infinite (%)0.0%
Mean42985.43599
Minimum0
Maximum1000000
Zeros73846
Zeros (%)2.8%
Negative0
Negative (%)0.0%
Memory size20.1 MiB

Quantile statistics

Minimum0
5-th percentile4178
Q114236
median29978
Q359087.7005
95-th percentile127199.85
Maximum1000000
Range1000000
Interquartile range (IQR)44851.7005

Descriptive statistics

Standard deviation: 45796.84328
Coefficient of variation (CV): 1.065403717
Kurtosis: 14.53568459
Mean: 42985.43599
Median Absolute Deviation (MAD): 17980
Skewness: 2.97161691
Sum: 1.100180425 × 10¹¹
Variance: 2097350854
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
100000 137768
 
5.2%
60000 134159
 
5.1%
40000 127555
 
4.8%
20000 122036
 
4.6%
30000 98139
 
3.7%
0 73846
 
2.8%
50000 46978
 
1.8%
10000 38675
 
1.5%
200000 34783
 
1.3%
80000 34107
 
1.3%
Other values (165662) 1711380
64.9%
(Missing) 78869
 
3.0%
ValueCountFrequency (%)
0 73846
2.8%
0.2 410
 
< 0.1%
0.8 2
 
< 0.1%
1.2 1
 
< 0.1%
3 1
 
< 0.1%
ValueCountFrequency (%)
1000000 7
< 0.1%
950000 1
 
< 0.1%
900000 1
 
< 0.1%
800000 3
< 0.1%
700000 1
 
< 0.1%

credtype_587L
Text

MISSING 

Distinct3
Distinct (%)< 0.1%
Missing78869
Missing (%)3.0%
Memory size20.1 MiB

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters: 7678278
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st rowCAL
2nd rowCAL
3rd rowCAL
4th rowCOL
5th rowCOL
ValueCountFrequency (%)
col 1249129
48.8%
cal 1097416
42.9%
rel 212881
 
8.3%

Most occurring characters

ValueCountFrequency (%)
L 2559426
33.3%
C 2346545
30.6%
O 1249129
16.3%
A 1097416
14.3%
R 212881
 
2.8%
E 212881
 
2.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 7678278
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
L 2559426
33.3%
C 2346545
30.6%
O 1249129
16.3%
A 1097416
14.3%
R 212881
 
2.8%
E 212881
 
2.8%

Most occurring scripts

ValueCountFrequency (%)
Latin 7678278
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
L 2559426
33.3%
C 2346545
30.6%
O 1249129
16.3%
A 1097416
14.3%
R 212881
 
2.8%
E 212881
 
2.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7678278
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
L 2559426
33.3%
C 2346545
30.6%
O 1249129
16.3%
A 1097416
14.3%
R 212881
 
2.8%
E 212881
 
2.8%

currdebt_94A
Real number (ℝ)

MISSING  ZEROS 

Distinct201408
Distinct (%)12.1%
Missing976135
Missing (%)37.0%
Infinite0
Infinite (%)0.0%
Mean5301.262335
Minimum0
Maximum482980.84
Zeros1441963
Zeros (%)54.7%
Negative0
Negative (%)0.0%
Memory size20.1 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile35702.45575
Maximum482980.84
Range482980.84
Interquartile range (IQR)0

Descriptive statistics

Standard deviation20463.6821
Coefficient of variation (CV)3.860152696
Kurtosis45.25006491
Mean5301.262335
Median Absolute Deviation (MAD)0
Skewness5.880501606
Sum8811546203
Variance418762285
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 1441963
54.7%
100000 96
 
< 0.1%
9998 91
 
< 0.1%
11998 78
 
< 0.1%
17998 68
 
< 0.1%
7998 66
 
< 0.1%
19998 63
 
< 0.1%
5998 59
 
< 0.1%
60000 57
 
< 0.1%
13998 55
 
< 0.1%
Other values (201398) 219564
 
8.3%
(Missing) 976135
37.0%
ValueCountFrequency (%)
0 1441963
54.7%
0.002 1
 
< 0.1%
0.006 1
 
< 0.1%
0.020000001 1
 
< 0.1%
0.06600001 1
 
< 0.1%
ValueCountFrequency (%)
482980.84 1
< 0.1%
476617.3 1
< 0.1%
473946.3 1
< 0.1%
459136.4 1
< 0.1%
458601.6 1
< 0.1%

dateactivated_425D
Text

MISSING 

Distinct4211
Distinct (%)0.3%
Missing1297051
Missing (%)49.2%
Memory size20.1 MiB

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters: 13412440
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 51
Unique (%): < 0.1%

Sample

1st row2019-11-06
2nd row2019-09-19
3rd row2019-10-21
4th row2019-12-06
5th row2019-11-05
ValueCountFrequency (%)
2019-01-31 2342
 
0.2%
2019-10-23 2070
 
0.2%
2019-10-29 1955
 
0.1%
2019-10-21 1954
 
0.1%
2020-01-08 1937
 
0.1%
2019-12-11 1933
 
0.1%
2019-10-28 1892
 
0.1%
2020-01-02 1863
 
0.1%
2020-01-03 1850
 
0.1%
2019-11-01 1837
 
0.1%
Other values (4201) 1321611
98.5%

Most occurring characters

ValueCountFrequency (%)
0 3156125
23.5%
- 2682488
20.0%
1 2442990
18.2%
2 2273853
17.0%
9 579514
 
4.3%
8 527963
 
3.9%
7 429076
 
3.2%
3 364959
 
2.7%
6 355623
 
2.7%
5 301692
 
2.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 10729952
80.0%
Dash Punctuation 2682488
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 3156125
29.4%
1 2442990
22.8%
2 2273853
21.2%
9 579514
 
5.4%
8 527963
 
4.9%
7 429076
 
4.0%
3 364959
 
3.4%
6 355623
 
3.3%
5 301692
 
2.8%
4 298157
 
2.8%
Dash Punctuation
ValueCountFrequency (%)
- 2682488
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 13412440
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 3156125
23.5%
- 2682488
20.0%
1 2442990
18.2%
2 2273853
17.0%
9 579514
 
4.3%
8 527963
 
3.9%
7 429076
 
3.2%
3 364959
 
2.7%
6 355623
 
2.7%
5 301692
 
2.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 13412440
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 3156125
23.5%
- 2682488
20.0%
1 2442990
18.2%
2 2273853
17.0%
9 579514
 
4.3%
8 527963
 
3.9%
7 429076
 
3.2%
3 364959
 
2.7%
6 355623
 
2.7%
5 301692
 
2.2%
Distinct1030
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size20.1 MiB

Length

Max length12
Median length11
Mean length10.29188358
Min length8

Characters and Unicode

Total characters: 27153025
Distinct characters: 18
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 134
Unique (%) < 0.1%

Sample

1st row P147_6_101
2nd row P111_148_100
3rd row a55475b1
4th row P19_11_176
5th row a55475b1
Value  Count  Frequency (%)
a55475b1  391791  14.9%
p131_33_167  101299  3.8%
p123_6_84  92997  3.5%
p197_47_166  70453  2.7%
p204_99_158  66400  2.5%
p98_137_111  53077  2.0%
p62_144_102  49284  1.9%
p159_143_123  48525  1.8%
p111_135_181  48495  1.8%
p147_21_170  47763  1.8%
Other values (1020)  1668211  63.2%

Most occurring characters

ValueCountFrequency (%)
1 5833743
21.5%
_ 4493008
16.5%
5 2412231
8.9%
P 2246502
 
8.3%
7 2104439
 
7.8%
4 1805523
 
6.6%
6 1388964
 
5.1%
3 1362572
 
5.0%
2 1336975
 
4.9%
8 1244108
 
4.6%
Other values (8) 2924960
10.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 19629925
72.3%
Connector Punctuation 4493008
 
16.5%
Uppercase Letter 2246504
 
8.3%
Lowercase Letter 783588
 
2.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 5833743
29.7%
5 2412231
12.3%
7 2104439
 
10.7%
4 1805523
 
9.2%
6 1388964
 
7.1%
3 1362572
 
6.9%
2 1336975
 
6.8%
8 1244108
 
6.3%
9 1231948
 
6.3%
0 909422
 
4.6%
Lowercase Letter
ValueCountFrequency (%)
a 391791
50.0%
b 391791
50.0%
t 2
 
< 0.1%
h 2
 
< 0.1%
e 2
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
P 2246502
> 99.9%
Q 2
 
< 0.1%
Connector Punctuation
ValueCountFrequency (%)
_ 4493008
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 24122933
88.8%
Latin 3030092
 
11.2%

Most frequent character per script

Common
ValueCountFrequency (%)
1 5833743
24.2%
_ 4493008
18.6%
5 2412231
10.0%
7 2104439
 
8.7%
4 1805523
 
7.5%
6 1388964
 
5.8%
3 1362572
 
5.6%
2 1336975
 
5.5%
8 1244108
 
5.2%
9 1231948
 
5.1%
Latin
ValueCountFrequency (%)
P 2246502
74.1%
a 391791
 
12.9%
b 391791
 
12.9%
Q 2
 
< 0.1%
t 2
 
< 0.1%
h 2
 
< 0.1%
e 2
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 27153025
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 5833743
21.5%
_ 4493008
16.5%
5 2412231
8.9%
P 2246502
 
8.3%
7 2104439
 
7.8%
4 1805523
 
6.6%
6 1388964
 
5.1%
3 1362572
 
5.0%
2 1336975
 
4.9%
8 1244108
 
4.6%
Other values (8) 2924960
10.8%

downpmt_134A
Real number (ℝ)

MISSING  SKEWED  ZEROS 

Distinct 13796
Distinct (%) 0.5%
Missing 78869
Missing (%) 3.0%
Infinite 0
Infinite (%) 0.0%
Mean 388.9884686
Minimum 0
Maximum 420400
Zeros 2348164
Zeros (%) 89.0%
Negative 0
Negative (%) 0.0%
Memory size 20.1 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 1778
Maximum 420400
Range 420400
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 2614.634679
Coefficient of variation (CV) 6.721625163
Kurtosis 1085.921984
Mean 388.9884686
Median Absolute Deviation (MAD) 0
Skewness 21.07123108
Sum 995587200.2
Variance 6836314.502
Monotonicity Not monotonic
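
Two of these descriptive statistics follow directly from the others: the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A minimal sketch that re-derives both from the reported figures for downpmt_134A (variable names are illustrative):

```python
import math

# Figures copied from the descriptive statistics of downpmt_134A.
mean = 388.9884686
std = 2614.634679

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance as the squared standard deviation

print(f"CV = {cv:.6f}, variance = {variance:.3f}")
```

Both values match the report to its printed precision, which suggests the summary is internally consistent.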
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
0  2348164  89.0%
2000  25129  1.0%
4000  16373  0.6%
1000  15180  0.6%
6000  8923  0.3%
10000  7809  0.3%
200  7538  0.3%
400  6910  0.3%
3000  6440  0.2%
8000  4928  0.2%
Other values (13786)  112032  4.2%
(Missing)  78869  3.0%
Value  Count  Frequency (%)
0  2348164  89.0%
0.2  198  < 0.1%
0.4  27  < 0.1%
0.6  80  < 0.1%
0.8  17  < 0.1%
Value  Count  Frequency (%)
420400  1  < 0.1%
320400  1  < 0.1%
275028  2  < 0.1%
230000  2  < 0.1%
222592.2  1  < 0.1%

dtlastpmt_581D
Text

MISSING 

Distinct 2353
Distinct (%) 0.3%
Missing 1890009
Missing (%) 71.6%
Memory size 20.1 MiB

Length

Max length 10
Median length 10
Mean length 10
Min length 10

Characters and Unicode

Total characters 7482860
Distinct characters 11
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 179
Unique (%) < 0.1%

Sample

1st row 2019-12-15
2nd row 2019-12-15
3rd row 2019-12-26
4th row 2019-12-05
5th row 2019-12-28
Value  Count  Frequency (%)
2019-09-16  25909  3.5%
2019-12-17  2089  0.3%
2019-12-13  1475  0.2%
2019-12-25  1338  0.2%
2019-09-19  1300  0.2%
2019-12-24  1251  0.2%
2019-12-23  1234  0.2%
2019-11-15  1186  0.2%
2020-01-20  1179  0.2%
2019-12-26  1179  0.2%
Other values (2343)  710146  94.9%

Most occurring characters

ValueCountFrequency (%)
0 1747905
23.4%
- 1496572
20.0%
2 1328877
17.8%
1 1267565
16.9%
9 415093
 
5.5%
8 311251
 
4.2%
7 255209
 
3.4%
6 238229
 
3.2%
3 164963
 
2.2%
5 135974
 
1.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 5986288
80.0%
Dash Punctuation 1496572
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1747905
29.2%
2 1328877
22.2%
1 1267565
21.2%
9 415093
 
6.9%
8 311251
 
5.2%
7 255209
 
4.3%
6 238229
 
4.0%
3 164963
 
2.8%
5 135974
 
2.3%
4 121222
 
2.0%
Dash Punctuation
ValueCountFrequency (%)
- 1496572
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 7482860
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 1747905
23.4%
- 1496572
20.0%
2 1328877
17.8%
1 1267565
16.9%
9 415093
 
5.5%
8 311251
 
4.2%
7 255209
 
3.4%
6 238229
 
3.2%
3 164963
 
2.2%
5 135974
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7482860
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1747905
23.4%
- 1496572
20.0%
2 1328877
17.8%
1 1267565
16.9%
9 415093
 
5.5%
8 311251
 
4.2%
7 255209
 
3.4%
6 238229
 
3.2%
3 164963
 
2.2%
5 135974
 
1.8%

dtlastpmtallstes_3545839D
Text

MISSING

Distinct 2365
Distinct (%) 0.2%
Missing 1609466
Missing (%) 61.0%
Memory size 20.1 MiB

Length

Max length 10
Median length 10
Mean length 10
Min length 10

Characters and Unicode

Total characters 10288290
Distinct characters 11
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 174
Unique (%) < 0.1%

Sample

1st row 2019-12-28
2nd row 2019-12-15
3rd row 2019-12-19
4th row 2019-12-30
5th row 2019-12-27
Value  Count  Frequency (%)
2019-09-16  26380  2.6%
2020-01-20  3832  0.4%
2019-12-25  3506  0.3%
2019-12-27  3456  0.3%
2020-01-01  3395  0.3%
2019-12-24  3365  0.3%
2019-12-26  3346  0.3%
2020-01-24  3264  0.3%
2019-12-17  3250  0.3%
2019-12-23  3196  0.3%
Other values (2355)  971839  94.5%

Most occurring characters

ValueCountFrequency (%)
0 2591633
25.2%
- 2057658
20.0%
2 2034311
19.8%
1 1541821
15.0%
9 523583
 
5.1%
8 374790
 
3.6%
7 306382
 
3.0%
6 286842
 
2.8%
3 234640
 
2.3%
5 178561
 
1.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 8230632
80.0%
Dash Punctuation 2057658
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 2591633
31.5%
2 2034311
24.7%
1 1541821
18.7%
9 523583
 
6.4%
8 374790
 
4.6%
7 306382
 
3.7%
6 286842
 
3.5%
3 234640
 
2.9%
5 178561
 
2.2%
4 158069
 
1.9%
Dash Punctuation
ValueCountFrequency (%)
- 2057658
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 10288290
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 2591633
25.2%
- 2057658
20.0%
2 2034311
19.8%
1 1541821
15.0%
9 523583
 
5.1%
8 374790
 
3.6%
7 306382
 
3.0%
6 286842
 
2.8%
3 234640
 
2.3%
5 178561
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10288290
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 2591633
25.2%
- 2057658
20.0%
2 2034311
19.8%
1 1541821
15.0%
9 523583
 
5.1%
8 374790
 
3.6%
7 306382
 
3.0%
6 286842
 
2.8%
3 234640
 
2.3%
5 178561
 
1.7%
Distinct 6
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 20.1 MiB

Length

Max length 11
Median length 8
Mean length 9.101808175
Min length 8

Characters and Unicode

Total characters 24013255
Distinct characters 14
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row a55475b1
2nd row a55475b1
3rd row P97_36_170
4th row a55475b1
5th row P97_36_170
Value  Count  Frequency (%)
a55475b1  1379355  52.3%
p97_36_170  852757  32.3%
p33_146_175  370707  14.1%
p106_81_188  17178  0.7%
p17_36_170  17168  0.7%
p157_18_172  1130  < 0.1%

Most occurring characters

ValueCountFrequency (%)
5 4509902
18.8%
7 3492172
14.5%
1 3062786
12.8%
_ 2517880
10.5%
4 1750062
 
7.3%
3 1611339
 
6.7%
a 1379355
 
5.7%
b 1379355
 
5.7%
P 1258940
 
5.2%
6 1257810
 
5.2%
Other values (4) 1793654
 
7.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 17477725
72.8%
Lowercase Letter 2758710
 
11.5%
Connector Punctuation 2517880
 
10.5%
Uppercase Letter 1258940
 
5.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5 4509902
25.8%
7 3492172
20.0%
1 3062786
17.5%
4 1750062
 
10.0%
3 1611339
 
9.2%
6 1257810
 
7.2%
0 887103
 
5.1%
9 852757
 
4.9%
8 52664
 
0.3%
2 1130
 
< 0.1%
Lowercase Letter
ValueCountFrequency (%)
a 1379355
50.0%
b 1379355
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 2517880
100.0%
Uppercase Letter
ValueCountFrequency (%)
P 1258940
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 19995605
83.3%
Latin 4017650
 
16.7%

Most frequent character per script

Common
ValueCountFrequency (%)
5 4509902
22.6%
7 3492172
17.5%
1 3062786
15.3%
_ 2517880
12.6%
4 1750062
 
8.8%
3 1611339
 
8.1%
6 1257810
 
6.3%
0 887103
 
4.4%
9 852757
 
4.3%
8 52664
 
0.3%
Latin
ValueCountFrequency (%)
a 1379355
34.3%
b 1379355
34.3%
P 1258940
31.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 24013255
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5 4509902
18.8%
7 3492172
14.5%
1 3062786
12.8%
_ 2517880
10.5%
4 1750062
 
7.3%
3 1611339
 
6.7%
a 1379355
 
5.7%
b 1379355
 
5.7%
P 1258940
 
5.2%
6 1257810
 
5.2%
Other values (4) 1793654
 
7.5%

employedfrom_700D
Text

MISSING 

Distinct 8343
Distinct (%) 0.9%
Missing 1705609
Missing (%) 64.6%
Memory size 20.1 MiB

Length

Max length 10
Median length 10
Mean length 10
Min length 10

Characters and Unicode

Total characters 9326860
Distinct characters 11
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1996
Unique (%) 0.2%

Sample

1st row 2014-01-15
2nd row 2013-04-15
3rd row 2013-04-15
4th row 2012-02-15
5th row 2018-01-15
Value  Count  Frequency (%)
2017-01-15  16608  1.8%
2015-01-15  15436  1.7%
2013-01-15  15290  1.6%
2014-01-15  15105  1.6%
2016-01-15  15074  1.6%
2012-01-15  13022  1.4%
2018-01-15  11930  1.3%
2010-01-15  10067  1.1%
2010-09-15  9608  1.0%
2011-01-15  9585  1.0%
Other values (8333)  800961  85.9%

Most occurring characters

ValueCountFrequency (%)
0 2123405
22.8%
1 1975825
21.2%
- 1865372
20.0%
2 1075472
11.5%
5 1043024
11.2%
9 389836
 
4.2%
8 191653
 
2.1%
6 171491
 
1.8%
3 166811
 
1.8%
4 164425
 
1.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 7461488
80.0%
Dash Punctuation 1865372
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 2123405
28.5%
1 1975825
26.5%
2 1075472
14.4%
5 1043024
14.0%
9 389836
 
5.2%
8 191653
 
2.6%
6 171491
 
2.3%
3 166811
 
2.2%
4 164425
 
2.2%
7 159546
 
2.1%
Dash Punctuation
ValueCountFrequency (%)
- 1865372
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 9326860
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 2123405
22.8%
1 1975825
21.2%
- 1865372
20.0%
2 1075472
11.5%
5 1043024
11.2%
9 389836
 
4.2%
8 191653
 
2.1%
6 171491
 
1.8%
3 166811
 
1.8%
4 164425
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 9326860
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 2123405
22.8%
1 1975825
21.2%
- 1865372
20.0%
2 1075472
11.5%
5 1043024
11.2%
9 389836
 
4.2%
8 191653
 
2.1%
6 171491
 
1.8%
3 166811
 
1.8%
4 164425
 
1.8%

familystate_726L
Text

MISSING 

Distinct 5
Distinct (%) < 0.1%
Missing 1148691
Missing (%) 43.5%
Memory size 20.1 MiB

Length

Max length 19
Median length 7
Mean length 7.067301108
Min length 6

Characters and Unicode

Total characters 10527480
Distinct characters 18
Distinct categories 2
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row MARRIED
2nd row SINGLE
3rd row SINGLE
4th row MARRIED
5th row MARRIED
Value  Count  Frequency (%)
married  1082575  72.7%
single  201743  13.5%
widowed  138790  9.3%
divorced  45087  3.0%
living_with_partner  21409  1.4%

Most occurring characters

ValueCountFrequency (%)
R 2253055
21.4%
I 1532422
14.6%
E 1489604
14.1%
D 1450329
13.8%
A 1103984
10.5%
M 1082575
10.3%
W 298989
 
2.8%
N 244561
 
2.3%
L 223152
 
2.1%
G 223152
 
2.1%
Other values (8) 625657
 
5.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 10484662
99.6%
Connector Punctuation 42818
 
0.4%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
R 2253055
21.5%
I 1532422
14.6%
E 1489604
14.2%
D 1450329
13.8%
A 1103984
10.5%
M 1082575
10.3%
W 298989
 
2.9%
N 244561
 
2.3%
L 223152
 
2.1%
G 223152
 
2.1%
Other values (7) 582839
 
5.6%
Connector Punctuation
ValueCountFrequency (%)
_ 42818
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 10484662
99.6%
Common 42818
 
0.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
R 2253055
21.5%
I 1532422
14.6%
E 1489604
14.2%
D 1450329
13.8%
A 1103984
10.5%
M 1082575
10.3%
W 298989
 
2.9%
N 244561
 
2.3%
L 223152
 
2.1%
G 223152
 
2.1%
Other values (7) 582839
 
5.6%
Common
ValueCountFrequency (%)
_ 42818
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10527480
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
R 2253055
21.4%
I 1532422
14.6%
E 1489604
14.1%
D 1450329
13.8%
A 1103984
10.5%
M 1082575
10.3%
W 298989
 
2.8%
N 244561
 
2.3%
L 223152
 
2.1%
G 223152
 
2.1%
Other values (8) 625657
 
5.9%

firstnonzeroinstldate_307D
Text

MISSING

Distinct 5153
Distinct (%) 0.2%
Missing 287307
Missing (%) 10.9%
Memory size 20.1 MiB

Length

Max length 10
Median length 10
Mean length 10
Min length 10

Characters and Unicode

Total characters 23509880
Distinct characters 11
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 25
Unique (%) < 0.1%

Sample

1st row 2018-12-20
2nd row 2020-01-26
3rd row 2014-08-17
4th row 2017-09-21
5th row 2015-01-28
Value  Count  Frequency (%)
2019-12-15  5044  0.2%
2019-03-14  4785  0.2%
2019-09-15  4775  0.2%
2020-03-15  4335  0.2%
2019-10-15  4115  0.2%
2020-01-15  3797  0.2%
2020-02-15  3791  0.2%
2019-10-12  3728  0.2%
2019-07-15  3722  0.2%
2020-01-11  3704  0.2%
Other values (5143)  2309192  98.2%

Most occurring characters

ValueCountFrequency (%)
0 5625459
23.9%
- 4701976
20.0%
1 4129869
17.6%
2 4102127
17.4%
9 990127
 
4.2%
8 855287
 
3.6%
7 710419
 
3.0%
5 680403
 
2.9%
3 621076
 
2.6%
6 578670
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 18807904
80.0%
Dash Punctuation 4701976
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 5625459
29.9%
1 4129869
22.0%
2 4102127
21.8%
9 990127
 
5.3%
8 855287
 
4.5%
7 710419
 
3.8%
5 680403
 
3.6%
3 621076
 
3.3%
6 578670
 
3.1%
4 514467
 
2.7%
Dash Punctuation
ValueCountFrequency (%)
- 4701976
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 23509880
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 5625459
23.9%
- 4701976
20.0%
1 4129869
17.6%
2 4102127
17.4%
9 990127
 
4.2%
8 855287
 
3.6%
7 710419
 
3.0%
5 680403
 
2.9%
3 621076
 
2.6%
6 578670
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 23509880
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 5625459
23.9%
- 4701976
20.0%
1 4129869
17.6%
2 4102127
17.4%
9 990127
 
4.2%
8 855287
 
3.6%
7 710419
 
3.0%
5 680403
 
2.9%
3 621076
 
2.6%
6 578670
 
2.5%

inittransactioncode_279L
Text

MISSING

Distinct 3
Distinct (%) < 0.1%
Missing 78869
Missing (%) 3.0%
Memory size 20.1 MiB

Length

Max length 4
Median length 3
Mean length 3.428913749
Min length 3

Characters and Unicode

Total characters 8776051
Distinct characters 9
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row CASH
2nd row CASH
3rd row CASH
4th row POS
5th row POS
Value  Count  Frequency (%)
pos  1334600  52.1%
cash  1097773  42.9%
ndf  127053  5.0%

Most occurring characters

ValueCountFrequency (%)
S 2432373
27.7%
P 1334600
15.2%
O 1334600
15.2%
C 1097773
12.5%
A 1097773
12.5%
H 1097773
12.5%
N 127053
 
1.4%
D 127053
 
1.4%
F 127053
 
1.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 8776051
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 2432373
27.7%
P 1334600
15.2%
O 1334600
15.2%
C 1097773
12.5%
A 1097773
12.5%
H 1097773
12.5%
N 127053
 
1.4%
D 127053
 
1.4%
F 127053
 
1.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 8776051
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 2432373
27.7%
P 1334600
15.2%
O 1334600
15.2%
C 1097773
12.5%
A 1097773
12.5%
H 1097773
12.5%
N 127053
 
1.4%
D 127053
 
1.4%
F 127053
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8776051
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S 2432373
27.7%
P 1334600
15.2%
O 1334600
15.2%
C 1097773
12.5%
A 1097773
12.5%
H 1097773
12.5%
N 127053
 
1.4%
D 127053
 
1.4%
F 127053
 
1.4%

isbidproduct_390L
Boolean

IMBALANCE 

Distinct 2
Distinct (%) < 0.1%
Missing 31
Missing (%) < 0.1%
Memory size 20.1 MiB
Value  Count  Frequency (%)
False  2493462  94.5%
True  144802  5.5%
(Missing)  31  < 0.1%
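
The IMBALANCE tag on this variable can be reproduced from the counts above. The sketch below assumes the score is one minus the Shannon entropy of the class distribution, normalized by the maximum entropy for the number of classes. That definition is an assumption about the profiler and is not stated in this report; `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - normalized Shannon entropy (0 = balanced, 1 = one class only).

    Hypothetical helper; the formula is an assumed definition.
    """
    if len(counts) < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in counts if c > 0
    )
    return 1.0 - entropy / math.log2(len(counts))

# Non-missing counts for isbidproduct_390L: False, True.
score = imbalance_score([2_493_462, 144_802])
print(f"{score:.1%}")
```

Under this assumption the score comes out at roughly 69.3%, which is the imbalance figure listed for isbidproduct_390L in the Alerts section.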

isdebitcard_527L
Boolean

MISSING 

Distinct 2
Distinct (%) < 0.1%
Missing 2425416
Missing (%) 91.9%
Memory size 20.1 MiB
Value  Count  Frequency (%)
False  146098  5.5%
True  66781  2.5%
(Missing)  2425416  91.9%

mainoccupationinc_437A
Real number (ℝ)

MISSING 

Distinct 17340
Distinct (%) 0.7%
Missing 65371
Missing (%) 2.5%
Infinite 0
Infinite (%) 0.0%
Mean 43046.11571
Minimum 0
Maximum 199600
Zeros 32
Zeros (%) < 0.1%
Negative 0
Negative (%) 0.0%
Memory size 20.1 MiB

Quantile statistics

Minimum 0
5-th percentile 6315.8003
Q1 20000
median 37000
Q3 58000
95-th percentile 100000
Maximum 199600
Range 199600
Interquartile range (IQR) 38000
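
The last two quantile statistics are derived from the others: the range is maximum minus minimum, and the interquartile range is Q3 minus Q1. A minimal arithmetic sketch using the figures for mainoccupationinc_437A (variable names are illustrative):

```python
# Quantile figures copied from the report for mainoccupationinc_437A.
minimum, maximum = 0, 199_600
q1, q3 = 20_000, 58_000

value_range = maximum - minimum  # "Range"
iqr = q3 - q1                    # "Interquartile range (IQR)"

print(value_range, iqr)
```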

Descriptive statistics

Standard deviation 32550.06984
Coefficient of variation (CV) 0.7561674102
Kurtosis 4.670832521
Mean 43046.11571
Median Absolute Deviation (MAD) 18000
Skewness 1.769823701
Sum 1.107543842 × 10^11
Variance 1059507046
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
40000  186417  7.1%
30000  183448  7.0%
50000  162276  6.2%
60000  135382  5.1%
20000  91145  3.5%
70000  87299  3.3%
24000  79960  3.0%
36000  76607  2.9%
80000  47766  1.8%
100000  45645  1.7%
Other values (17330)  1476979  56.0%
(Missing)  65371  2.5%
Value  Count  Frequency (%)
0  32  < 0.1%
0.2  51  < 0.1%
0.4  6  < 0.1%
0.6  10  < 0.1%
0.8  1  < 0.1%
Value  Count  Frequency (%)
199600  12883  0.5%
199400  8  < 0.1%
199200  5  < 0.1%
199120  1  < 0.1%
199000  20  < 0.1%

maxdpdtolerance_577P
Real number (ℝ)

MISSING  ZEROS 

Distinct 3209
Distinct (%) 0.2%
Missing 1278326
Missing (%) 48.5%
Infinite 0
Infinite (%) 0.0%
Mean 16.78874518
Minimum 0
Maximum 4362
Zeros 986233
Zeros (%) 37.4%
Negative 0
Negative (%) 0.0%
Memory size 20.1 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 1
95-th percentile 18
Maximum 4362
Range 4362
Interquartile range (IQR) 1

Descriptive statistics

Standard deviation 158.135784
Coefficient of variation (CV) 9.419154454
Kurtosis 263.3173287
Mean 16.78874518
Median Absolute Deviation (MAD) 0
Skewness 14.91851137
Sum 22832173
Variance 25006.92618
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
0  986233  37.4%
1  191671  7.3%
5  25367  1.0%
6  19116  0.7%
10  14207  0.5%
4  8844  0.3%
9  8395  0.3%
14  7146  0.3%
18  6781  0.3%
7  6239  0.2%
Other values (3199)  85970  3.3%
(Missing)  1278326  48.5%
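
In this table the frequency percentages appear to be shares of all observations, with missing rows counted in the denominator, rather than shares of the non-missing values. A minimal sketch that checks this reading against three rows of maxdpdtolerance_577P; the dictionary and variable names are illustrative:

```python
# Counts copied from the table above; the total is the report's
# "Number of observations".
n_observations = 2_638_295
counts = {"0": 986_233, "1": 191_671, "(Missing)": 1_278_326}

# Share of ALL rows, rounded to one decimal like the report.
shares = {value: round(100 * n / n_observations, 1) for value, n in counts.items()}
print(shares)
```

The computed shares reproduce the 37.4%, 7.3% and 48.5% shown in the table, so roughly half of the rows for this variable carry no value at all.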
Value  Count  Frequency (%)
0  986233  37.4%
1  191671  7.3%
2  6033  0.2%
3  1935  0.1%
4  8844  0.3%
Value  Count  Frequency (%)
4362  1  < 0.1%
4245  2  < 0.1%
4222  1  < 0.1%
4206  1  < 0.1%
4185  1  < 0.1%

num_group1
Real number (ℝ)

ZEROS 

Distinct 20
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 4.613015603
Minimum 0
Maximum 19
Zeros 438525
Zeros (%) 16.6%
Negative 0
Negative (%) 0.0%
Memory size 20.1 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 1
median 3
Q3 7
95-th percentile 14
Maximum 19
Range 19
Interquartile range (IQR) 6

Descriptive statistics

Standard deviation 4.485514586
Coefficient of variation (CV) 0.9723605927
Kurtosis 0.7244665374
Mean 4.613015603
Median Absolute Deviation (MAD) 2
Skewness 1.163952685
Sum 12170496
Variance 20.1198411
Monotonicity Not monotonic
Histogram with fixed size bins (bins=20)
Value  Count  Frequency (%)
0  438525  16.6%
1  369409  14.0%
2  309947  11.7%
3  259106  9.8%
4  216484  8.2%
5  180462  6.8%
6  150620  5.7%
7  125780  4.8%
8  105118  4.0%
9  88418  3.4%
Other values (10)  394426  15.0%
Value  Count  Frequency (%)
0  438525  16.6%
1  369409  14.0%
2  309947  11.7%
3  259106  9.8%
4  216484  8.2%
Value  Count  Frequency (%)
19  17541  0.7%
18  20372  0.8%
17  23687  0.9%
16  27647  1.0%
15  32524  1.2%

outstandingdebt_522A
Real number (ℝ)

MISSING  ZEROS 

Distinct 179739
Distinct (%) 10.8%
Missing 980346
Missing (%) 37.2%
Infinite 0
Infinite (%) 0.0%
Mean 7097.727649
Minimum 0
Maximum 1029392.8
Zeros 1436446
Zeros (%) 54.4%
Negative 0
Negative (%) 0.0%
Memory size 20.1 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 40964.368
Maximum 1029392.8
Range 1029392.8
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 30871.45589
Coefficient of variation (CV) 4.349484429
Kurtosis 79.30366253
Mean 7097.727649
Median Absolute Deviation (MAD) 0
Skewness 7.504083064
Sum 1.176767046 × 10^10
Variance 953046788.5
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
0  1436446  54.4%
10  286  < 0.1%
9998  92  < 0.1%
11998  72  < 0.1%
17998  64  < 0.1%
20  61  < 0.1%
7998  60  < 0.1%
19998  59  < 0.1%
5998  58  < 0.1%
8998  52  < 0.1%
Other values (179729)  220699  8.4%
(Missing)  980346  37.2%
Value  Count  Frequency (%)
0  1436446  54.4%
0.002  1  < 0.1%
0.004  1  < 0.1%
0.006  1  < 0.1%
0.008  1  < 0.1%
Value  Count  Frequency (%)
1029392.8  1  < 0.1%
987535  1  < 0.1%
984399  1  < 0.1%
978072.2  1  < 0.1%
910766  1  < 0.1%

pmtnum_8L
Real number (ℝ)

MISSING 

Distinct 60
Distinct (%) < 0.1%
Missing 238987
Missing (%) 9.1%
Infinite 0
Infinite (%) 0.0%
Mean 16.85803407
Minimum 3
Maximum 63
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 20.1 MiB

Quantile statistics

Minimum 3
5-th percentile 4
Q1 9
median 12
Q3 24
95-th percentile 36
Maximum 63
Range 60
Interquartile range (IQR) 15

Descriptive statistics

Standard deviation 11.25158885
Coefficient of variation (CV) 0.6674318494
Kurtosis 1.295980682
Mean 16.85803407
Median Absolute Deviation (MAD) 6
Skewness 1.211121972
Sum 40447616
Variance 126.5982517
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
12  594080  22.5%
24  414664  15.7%
6  355258  13.5%
18  200921  7.6%
36  156551  5.9%
3  117252  4.4%
48  81251  3.1%
16  77062  2.9%
9  51005  1.9%
4  47083  1.8%
Other values (50)  304181  11.5%
(Missing)  238987  9.1%
Value  Count  Frequency (%)
3  117252  4.4%
4  47083  1.8%
5  23564  0.9%
6  355258  13.5%
7  4135  0.2%
Value  Count  Frequency (%)
63  3  < 0.1%
62  12  < 0.1%
61  18  < 0.1%
60  12352  0.5%
59  1  < 0.1%
Distinct 9
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 20.1 MiB

Length

Max length 12
Median length 11
Mean length 11.24311951
Min length 8

Characters and Unicode

Total characters 29662666
Distinct characters 14
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row P46_145_78
2nd row P149_40_170
3rd row P46_145_78
4th row P177_117_192
5th row P60_146_156
Value  Count  Frequency (%)
p177_117_192  1283096  48.6%
p46_145_78  671053  25.4%
p149_40_170  260036  9.9%
p60_146_156  200271  7.6%
p67_102_161  175416  6.6%
p217_110_186  30413  1.2%
p169_115_83  13766  0.5%
p140_48_169  3899  0.1%
a55475b1  345  < 0.1%

Most occurring characters

ValueCountFrequency (%)
1 7421392
25.0%
_ 5275900
17.8%
7 4986551
16.8%
P 2637950
 
8.9%
4 2070592
 
7.0%
6 1670776
 
5.6%
9 1560797
 
5.3%
2 1488925
 
5.0%
0 930071
 
3.1%
5 886125
 
3.0%
Other values (4) 733587
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 21748126
73.3%
Connector Punctuation 5275900
 
17.8%
Uppercase Letter 2637950
 
8.9%
Lowercase Letter 690
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 7421392
34.1%
7 4986551
22.9%
4 2070592
 
9.5%
6 1670776
 
7.7%
9 1560797
 
7.2%
2 1488925
 
6.8%
0 930071
 
4.3%
5 886125
 
4.1%
8 719131
 
3.3%
3 13766
 
0.1%
Lowercase Letter
ValueCountFrequency (%)
a 345
50.0%
b 345
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 5275900
100.0%
Uppercase Letter
ValueCountFrequency (%)
P 2637950
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 27024026
91.1%
Latin 2638640
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
1 7421392
27.5%
_ 5275900
19.5%
7 4986551
18.5%
4 2070592
 
7.7%
6 1670776
 
6.2%
9 1560797
 
5.8%
2 1488925
 
5.5%
0 930071
 
3.4%
5 886125
 
3.3%
8 719131
 
2.7%
Latin
ValueCountFrequency (%)
P 2637950
> 99.9%
a 345
 
< 0.1%
b 345
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 29662666
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 7421392
25.0%
_ 5275900
17.8%
7 4986551
16.8%
P 2637950
 
8.9%
4 2070592
 
7.0%
6 1670776
 
5.6%
9 1560797
 
5.3%
2 1488925
 
5.0%
0 930071
 
3.1%
5 886125
 
3.0%
Other values (4) 733587
 
2.5%
Distinct 5799
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory size 20.1 MiB

Length

Max length 13
Median length 8
Mean length 8.024117091
Min length 7

Characters and Unicode

Total characters 21169988
Distinct characters 34
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 3891
Unique (%) 0.1%

Sample

1st row a55475b1
2nd row a55475b1
3rd row a55475b1
4th row a55475b1
5th row a55475b1
Value  Count  Frequency (%)
a55475b1  2613108  99.0%
p46_72_80  682  < 0.1%
p104_137_180  436  < 0.1%
p167_22_171  374  < 0.1%
p21_76_53  372  < 0.1%
p143_116_69  342  < 0.1%
p25_111_112  335  < 0.1%
p139_125_64  322  < 0.1%
p121_114_58  283  < 0.1%
p103_114_185  279  < 0.1%
Other values (5789)  21762  0.8%

Most occurring characters

ValueCountFrequency (%)
5 7854981
37.1%
1 2667040
 
12.6%
7 2630879
 
12.4%
4 2628479
 
12.4%
a 2613130
 
12.3%
b 2613109
 
12.3%
_ 50374
 
0.2%
P 25146
 
0.1%
2 17853
 
0.1%
6 17554
 
0.1%
Other values (24) 51443
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 15868035
75.0%
Lowercase Letter 5226392
 
24.7%
Connector Punctuation 50374
 
0.2%
Uppercase Letter 25187
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 2613130
50.0%
b 2613109
50.0%
e 24
 
< 0.1%
r 19
 
< 0.1%
o 14
 
< 0.1%
t 12
 
< 0.1%
d 12
 
< 0.1%
k 10
 
< 0.1%
y 10
 
< 0.1%
c 10
 
< 0.1%
Other values (11) 42
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
5 7854981
49.5%
1 2667040
 
16.8%
7 2630879
 
16.6%
4 2628479
 
16.6%
2 17853
 
0.1%
6 17554
 
0.1%
3 13264
 
0.1%
8 13257
 
0.1%
0 13197
 
0.1%
9 11531
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
P 25146
99.8%
Q 41
 
0.2%
Connector Punctuation
ValueCountFrequency (%)
_ 50374
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 15918409
75.2%
Latin 5251579
 
24.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 2613130
49.8%
b 2613109
49.8%
P 25146
 
0.5%
Q 41
 
< 0.1%
e 24
 
< 0.1%
r 19
 
< 0.1%
o 14
 
< 0.1%
t 12
 
< 0.1%
d 12
 
< 0.1%
k 10
 
< 0.1%
Other values (13) 62
 
< 0.1%
Common
ValueCountFrequency (%)
5 7854981
49.3%
1 2667040
 
16.8%
7 2630879
 
16.5%
4 2628479
 
16.5%
_ 50374
 
0.3%
2 17853
 
0.1%
6 17554
 
0.1%
3 13264
 
0.1%
8 13257
 
0.1%
0 13197
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 21169988
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5 7854981
37.1%
1 2667040
 
12.6%
7 2630879
 
12.4%
4 2628479
 
12.4%
a 2613130
 
12.3%
b 2613109
 
12.3%
_ 50374
 
0.2%
P 25146
 
0.1%
2 17853
 
0.1%
6 17554
 
0.1%
Other values (24) 51443
 
0.2%
Distinct 18
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 20.1 MiB

Length

Max length11
Median length8
Mean length8.730558561
Min length8

Characters and Unicode

Total characters23033789
Distinct characters14
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row P198_131_9
2nd row P45_84_106
3rd row a55475b1
4th row P99_56_166
5th row a55475b1
Value Count Frequency (%)
a55475b1 1814894 68.8%
p99_56_166 354478 13.4%
p94_109_143 285561 10.8%
p198_131_9 88231 3.3%
p45_84_106 83707 3.2%
p48_22_32 4446 0.2%
p30_86_84 2041 0.1%
p121_60_164 1378 0.1%
p196_88_176 1347 0.1%
p52_67_90 1240 < 0.1%
Other values (8) 972 < 0.1%

Most occurring characters

Value Count Frequency (%)
5 5884277 25.5%
1 3097659 13.4%
4 2561781 11.1%
7 1817922 7.9%
a 1814894 7.9%
b 1814894 7.9%
_ 1646802 7.1%
9 1459810 6.3%
6 1157112 5.0%
P 823401 3.6%
Other values (4) 955237 4.1%

Most occurring categories

Value Count Frequency (%)
Decimal Number 16933798 73.5%
Lowercase Letter 3629788 15.8%
Connector Punctuation 1646802 7.1%
Uppercase Letter 823401 3.6%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
5 5884277 34.7%
1 3097659 18.3%
4 2561781 15.1%
7 1817922 10.7%
9 1459810 8.6%
6 1157112 6.8%
3 380407 2.2%
0 374229 2.2%
8 183643 1.1%
2 16958 0.1%

Lowercase Letter
Value Count Frequency (%)
a 1814894 50.0%
b 1814894 50.0%

Connector Punctuation
Value Count Frequency (%)
_ 1646802 100.0%

Uppercase Letter
Value Count Frequency (%)
P 823401 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 18580600 80.7%
Latin 4453189 19.3%

Most frequent character per script

Common
Value Count Frequency (%)
5 5884277 31.7%
1 3097659 16.7%
4 2561781 13.8%
7 1817922 9.8%
_ 1646802 8.9%
9 1459810 7.9%
6 1157112 6.2%
3 380407 2.0%
0 374229 2.0%
8 183643 1.0%

Latin
Value Count Frequency (%)
a 1814894 40.8%
b 1814894 40.8%
P 823401 18.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 23033789 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
5 5884277 25.5%
1 3097659 13.4%
4 2561781 11.1%
7 1817922 7.9%
a 1814894 7.9%
b 1814894 7.9%
_ 1646802 7.1%
9 1459810 6.3%
6 1157112 5.0%
P 823401 3.6%
Other values (4) 955237 4.1%
Distinct 14
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 20.1 MiB

Length

Max length 11
Median length 8
Mean length 8.792563758
Min length 8

Characters and Unicode

Total characters 23197377
Distinct characters 14
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row P94_109_143
2nd row P94_109_143
3rd row a55475b1
4th row P94_109_143
5th row a55475b1
Value Count Frequency (%)
a55475b1 1889911 71.6%
p94_109_143 654356 24.8%
p30_86_84 48409 1.8%
p52_67_90 18027 0.7%
p69_72_116 12955 0.5%
p129_162_80 8320 0.3%
p84_14_61 2885 0.1%
p64_121_167 1849 0.1%
p19_25_34 761 < 0.1%
p5_143_178 612 < 0.1%
Other values (4) 210 < 0.1%

Most occurring characters

Value Count Frequency (%)
5 5689545 24.5%
4 3256030 14.0%
1 3254895 14.0%
7 1923354 8.3%
a 1889911 8.1%
b 1889911 8.1%
_ 1496768 6.5%
9 1348782 5.8%
P 748384 3.2%
0 729319 3.1%
Other values (4) 970478 4.2%

Most occurring categories

Value Count Frequency (%)
Decimal Number 17172403 74.0%
Lowercase Letter 3779822 16.3%
Connector Punctuation 1496768 6.5%
Uppercase Letter 748384 3.2%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
5 5689545 33.1%
4 3256030 19.0%
1 3254895 19.0%
7 1923354 11.2%
9 1348782 7.9%
0 729319 4.2%
3 704345 4.1%
8 108638 0.6%
6 107252 0.6%
2 50243 0.3%

Lowercase Letter
Value Count Frequency (%)
a 1889911 50.0%
b 1889911 50.0%

Connector Punctuation
Value Count Frequency (%)
_ 1496768 100.0%

Uppercase Letter
Value Count Frequency (%)
P 748384 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 18669171 80.5%
Latin 4528206 19.5%

Most frequent character per script

Common
Value Count Frequency (%)
5 5689545 30.5%
4 3256030 17.4%
1 3254895 17.4%
7 1923354 10.3%
_ 1496768 8.0%
9 1348782 7.2%
0 729319 3.9%
3 704345 3.8%
8 108638 0.6%
6 107252 0.6%

Latin
Value Count Frequency (%)
a 1889911 41.7%
b 1889911 41.7%
P 748384 16.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 23197377 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
5 5689545 24.5%
4 3256030 14.0%
1 3254895 14.0%
7 1923354 8.3%
a 1889911 8.1%
b 1889911 8.1%
_ 1496768 6.5%
9 1348782 5.8%
P 748384 3.2%
0 729319 3.1%
Other values (4) 970478 4.2%

revolvingaccount_394A
Real number (ℝ)

MISSING 

Distinct 52406
Distinct (%) 37.4%
Missing 2498196
Missing (%) 94.7%
Infinite 0
Infinite (%) 0.0%
Mean 761933430.7
Minimum 540342340
Maximum 800608700
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 20.1 MiB

Quantile statistics

Minimum 540342340
5-th percentile 685001930
Q1 760163480
median 780311500
Q3 780789950
95-th percentile 800253630
Maximum 800608700
Range 260266360
Interquartile range (IQR) 20626470

Descriptive statistics

Standard deviation 47558807.47
Coefficient of variation (CV) 0.06241858613
Kurtosis 10.27654451
Mean 761933430.7
Median Absolute Deviation (MAD) 19745840
Skewness -3.060067244
Sum 1.067461117 × 10^14
Variance 2.261840168 × 10^15
Monotonicity Not monotonic
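
Several of the figures above are tied together by simple arithmetic. The sketch below recomputes them from the other reported values as a consistency check. It assumes the usual definitions (range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × number of non-missing rows). The variable names are illustrative, and the total row count is taken from the report overview.

```python
# Values copied from the report for revolvingaccount_394A.
n_total = 2638295      # number of observations (report overview)
n_missing = 2498196
mean = 761933430.7
std = 47558807.47
minimum, maximum = 540342340, 800608700
q1, q3 = 760163480, 780789950

n_present = n_total - n_missing   # rows that actually carry a value
value_range = maximum - minimum   # reported as Range
iqr = q3 - q1                     # reported as Interquartile range (IQR)
cv = std / mean                   # reported as Coefficient of variation (CV)
variance = std ** 2               # reported as Variance
total = mean * n_present          # reported as Sum

print(f"non-missing rows: {n_present}")
print(f"range: {value_range}, IQR: {iqr}")
print(f"CV: {cv:.10f}")
print(f"variance: {variance:.9e}, sum: {total:.9e}")
```

The recomputed values agree with the reported ones up to the rounding of the displayed inputs.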
Histogram with fixed size bins (bins=50)
Most frequent values
Value Count Frequency (%)
780784600 29 < 0.1%
780851400 27 < 0.1%
800146900 25 < 0.1%
780783040 24 < 0.1%
780851650 24 < 0.1%
780661440 23 < 0.1%
780826560 23 < 0.1%
780561100 23 < 0.1%
780826300 22 < 0.1%
780621760 22 < 0.1%
Other values (52396) 139857 5.3%
(Missing) 2498196 94.7%

Smallest values
Value Count Frequency (%)
540342340 1 < 0.1%
540342460 2 < 0.1%
540342500 3 < 0.1%
540342600 2 < 0.1%
540342660 1 < 0.1%

Largest values
Value Count Frequency (%)
800608700 1 < 0.1%
800608100 1 < 0.1%
800607550 1 < 0.1%
800607500 1 < 0.1%
800607400 1 < 0.1%

Distinct 11
Distinct (%) < 0.1%
Missing 31
Missing (%) < 0.1%
Memory size 20.1 MiB

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 2638264
Distinct characters 11
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row D
2nd row D
3rd row D
4th row D
5th row D
Value Count Frequency (%)
d 1104119 41.9%
k 1053357 39.9%
a 284608 10.8%
t 177685 6.7%
n 14762 0.6%
q 2711 0.1%
l 489 < 0.1%
s 302 < 0.1%
h 196 < 0.1%
p 20 < 0.1%

Most occurring characters

Value Count Frequency (%)
D 1104119 41.9%
K 1053357 39.9%
A 284608 10.8%
T 177685 6.7%
N 14762 0.6%
Q 2711 0.1%
L 489 < 0.1%
S 302 < 0.1%
H 196 < 0.1%
P 20 < 0.1%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 2638264 100.0%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
D 1104119 41.9%
K 1053357 39.9%
A 284608 10.8%
T 177685 6.7%
N 14762 0.6%
Q 2711 0.1%
L 489 < 0.1%
S 302 < 0.1%
H 196 < 0.1%
P 20 < 0.1%

Most occurring scripts

Value Count Frequency (%)
Latin 2638264 100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
D 1104119 41.9%
K 1053357 39.9%
A 284608 10.8%
T 177685 6.7%
N 14762 0.6%
Q 2711 0.1%
L 489 < 0.1%
S 302 < 0.1%
H 196 < 0.1%
P 20 < 0.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 2638264 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
D 1104119 41.9%
K 1053357 39.9%
A 284608 10.8%
T 177685 6.7%
N 14762 0.6%
Q 2711 0.1%
L 489 < 0.1%
S 302 < 0.1%
H 196 < 0.1%
P 20 < 0.1%

tenor_203L
Real number (ℝ)

MISSING 

Distinct 60
Distinct (%) < 0.1%
Missing 238987
Missing (%) 9.1%
Infinite 0
Infinite (%) 0.0%
Mean 16.85803407
Minimum 3
Maximum 63
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 20.1 MiB

Quantile statistics

Minimum 3
5-th percentile 4
Q1 9
median 12
Q3 24
95-th percentile 36
Maximum 63
Range 60
Interquartile range (IQR) 15

Descriptive statistics

Standard deviation 11.25158885
Coefficient of variation (CV) 0.6674318494
Kurtosis 1.295980682
Mean 16.85803407
Median Absolute Deviation (MAD) 6
Skewness 1.211121972
Sum 40447616
Variance 126.5982517
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values
Value Count Frequency (%)
12 594080 22.5%
24 414664 15.7%
6 355258 13.5%
18 200921 7.6%
36 156551 5.9%
3 117252 4.4%
48 81251 3.1%
16 77062 2.9%
9 51005 1.9%
4 47083 1.8%
Other values (50) 304181 11.5%
(Missing) 238987 9.1%

Smallest values
Value Count Frequency (%)
3 117252 4.4%
4 47083 1.8%
5 23564 0.9%
6 355258 13.5%
7 4135 0.2%

Largest values
Value Count Frequency (%)
63 3 < 0.1%
62 12 < 0.1%
61 18 < 0.1%
60 12352 0.5%
59 1 < 0.1%
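
The Frequency (%) column for tenor_203L appears to be taken relative to all observations, missing rows included, rather than to the non-missing rows only. The sketch below checks that reading against the reported counts. The row list is copied from the first value table for this variable, the total comes from the report overview, and the variable names are illustrative.

```python
n_total = 2638295  # number of observations (report overview)

# (label, count, reported percentage) copied from the tenor_203L value table.
rows = [
    ("12", 594080, 22.5),
    ("24", 414664, 15.7),
    ("6", 355258, 13.5),
    ("18", 200921, 7.6),
    ("36", 156551, 5.9),
    ("3", 117252, 4.4),
    ("48", 81251, 3.1),
    ("16", 77062, 2.9),
    ("9", 51005, 1.9),
    ("4", 47083, 1.8),
    ("Other values (50)", 304181, 11.5),
    ("(Missing)", 238987, 9.1),
]

# Recompute each percentage against the full row count, missing rows included.
recomputed = [(label, round(100 * count / n_total, 1)) for label, count, _ in rows]

for (label, pct), (_, _, reported) in zip(recomputed, rows):
    print(f"{label}: recomputed {pct}%, reported {reported}%")

# The listed counts (values, "Other values", and missing) cover every row.
print(sum(count for _, count, _ in rows) == n_total)
```

Dividing by the non-missing count instead (2638295 − 238987 = 2399308) would give 24.8% for the value 12, which does not match the reported 22.5%.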