Espacios. Vol. 35 (No. 6) Year 2014. Page 4


Formality or informality: a choice based on individual characteristics


Michele ROMANELLO 1; Flávio DE OLIVEIRA GONÇALVES 2

Received: 21/03/14 • Approved: 05/05/14



ABSTRACT:
This paper investigates the factors that determine an individual's choice between the formal and informal sectors in Brazil, based on individual characteristics. The Brazilian active population is separated into four groups: formal workers, informal workers, formal entrepreneurs and informal entrepreneurs. The four groups can be distinctly separated through multinomial logistic regression and discriminant analysis. The most important characteristics are, in order, years of schooling, age, gender and skin color.
KEYWORDS: informality, multinomial logit, discriminant analysis



1. Introduction

Informality is an important economic phenomenon to investigate because it affects the economy as a whole. It makes it harder for the state to raise revenue, since taxes are not collected on informal work. A large informal sector also restricts firms' access to credit. For example, Dabla-Norris and Koeda (2008) found evidence that informality is robustly and significantly associated with lower access to and use of bank credit and with a higher dependence on informal sources of financing.
Informality is also linked to a lack of protection for workers, both while they are employed and in the future, given the absence of contributions to the social security system. Moreover, it makes the state's choice of public policies less efficient, because informal workers do not appear in official statistics.
The literature points to further problems: a loss of economic efficiency and cut-throat competition between the formal and informal sectors of the economy. Hsieh and Klenow (2009) argue that the coexistence of formal and informal firms in the same sector means that firms face different marginal production costs, which leads to a misallocation of resources. Informality can thus bring distortions and inefficiencies, as a result of rent-seeking behavior, and introduce uncertainty about future fiscal conditions.
Furthermore, economists consider it necessary to combat informality in order to achieve greater distributional equity. Neri (2007) argues that reducing informality leads to greater equity in taxation, public services and social protection.
Given the relevance of the problem, this paper investigates the factors that determine an individual's choice between the formal and informal sectors in Brazil, based on individual characteristics. Recent research has used firm-level data and reduced-form models to estimate the effect of different policies, such as: decreasing entry costs into the formal sector [Bruhn (2011), Kaplan et al. (2011), and de Mel et al. (2012)]; tax reduction and simplification [Monteiro and Assuncao (2012) and Fajnzylber et al. (2011)]; and enhanced enforcement of current institutions [Almeida and Carneiro (2009)] (Ulyssea, 2013). This paper, by contrast, separates the Brazilian active population into four groups according to individual characteristics: formal workers, informal workers, formal entrepreneurs and informal entrepreneurs.
The rest of the paper is organized as follows. Section 2 sets up the model of choice between the formal and informal sectors; Section 3 describes the data sources and the variables used; Section 4 describes personal background characteristics by occupation group; Section 5 presents the results of the multinomial logistic regression; and Section 6 explains and develops the canonical discriminant analysis. Section 7 concludes.

2. Set up of the model

The model [3] of this paper is an economy with two types of economic agents: firms and workers. Government is considered an endogenous agent. Firms are heterogeneous in their managerial ability. They produce formally or informally depending on profit maximization and personal background characteristics:

$$F = f(\pi_F, \pi_I, X)$$

where $F$ is the entrepreneur's choice between the formal and informal sectors, $\pi_F$ is the profit of a formal entrepreneur, $\pi_I$ is the profit of an informal entrepreneur and $X$ denotes individual background characteristics.
The profit of operating in the formal sector is given by:

$$\pi_F = P f(a,l) - w_f l (1+t) - T,$$

where $P$ is the price of the good produced by the firm, $f(a,l)$ is the production function (with inputs $a$, managerial skills, and $l$, units of labour), $w_f l (1+t)$ is the wage bill paid in the formal sector, with $w_f$ the wage per unit of labour and $t$ the taxes levied on it, and $T$ is the fixed cost incurred by firms that operate formally.
By contrast, the profit of operating informally is given by:

$$\pi_I = (P f(a,l) - w_i l)(1-q),$$

where $w_i$ is the wage per unit of labour paid to workers in the informal sector and $q$ is the probability that a firm is caught operating informally.
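As a minimal sketch of how the two profit expressions compare, the Python snippet below evaluates them for a single firm. The Cobb-Douglas form chosen for f(a, l) and all numeric values are assumptions made purely for illustration (the paper leaves the production function unspecified), and the comparison ignores the background characteristics X that also enter the choice F.

```python
def production(a, l, alpha=0.5):
    """Hypothetical Cobb-Douglas production function f(a, l); not from the paper."""
    return a ** (1 - alpha) * l ** alpha


def formal_profit(P, a, l, w_f, t, T):
    """pi_F = P*f(a,l) - w_f*l*(1+t) - T"""
    return P * production(a, l) - w_f * l * (1 + t) - T


def informal_profit(P, a, l, w_i, q):
    """pi_I = (P*f(a,l) - w_i*l) * (1 - q)"""
    return (P * production(a, l) - w_i * l) * (1 - q)


def more_profitable_sector(pi_f, pi_i):
    """Compare the two profits only; background characteristics X are ignored."""
    return "formal" if pi_f > pi_i else "informal"


# Illustrative (made-up) parameter values.
pi_f = formal_profit(P=10, a=4, l=9, w_f=2, t=0.5, T=5)
pi_i = informal_profit(P=10, a=4, l=9, w_i=1.5, q=0.2)
print(pi_f, pi_i, more_profitable_sector(pi_f, pi_i))
```

With these made-up numbers the informal option yields the higher profit; raising the detection probability q from 0.2 to 0.5 is enough to reverse the ranking.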
Workers, in turn, are heterogeneous in their endowment of human capital. They choose whether to work in the formal or informal sector according to utility maximization and personal background characteristics:

$$F = f(U_F, U_I, X)$$

where $F$ is the worker's choice between the formal and informal sectors, $U_F$ is the utility of a formal worker, $U_I$ is the utility of an informal worker and $X$ denotes individual background characteristics.
The utility of working formally is:

$$U_F = w_f l + B - g$$

where $B$ denotes the government benefits attached to formal work and $g$ is the fixed cost of working in the formal sector.
By contrast, the utility of working informally is:

$$U_I = w_i l (1 - q)$$

The factor $(1-q)$ appears because workers do not receive their payment when an informal firm is detected.
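Putting the two utilities side by side gives the condition under which a worker prefers the formal sector on utility grounds alone. The rearrangement below uses only the symbols already defined and, as a simplifying assumption, leaves the background characteristics X aside:

$$U_F > U_I \iff w_f l + B - g > w_i l (1-q) \iff B - g > l\,\bigl[w_i (1-q) - w_f\bigr]$$

Read this way, a worker accepts a formal wage below the expected informal wage whenever the net benefit of formality, B - g, covers the gap.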

Many papers have focused on the relation between the profits and utilities of the formal and informal sectors. That is, the choice of entrepreneurs and workers between the two sectors is predominantly determined by which of them yields the larger profit or utility.

For example, De Soto (1989) pointed out that a heavy load of taxes, bribes and bureaucratic procedures reduces the incentives (profits and utilities) to produce and work in the formal sector; in terms of our model, he focused on how t, T and g affect π and U.

This paper, instead, focuses on how personal background characteristics (X) affect the entry of individuals into the formal or informal sector, taking as given the factors linked to government and institutions.

3. Brazilian National Household Survey – PNAD and group classification

The main data source used in this paper is the PNAD (National Household Survey), a survey conducted by the Brazilian Institute of Geography and Statistics. Every year the PNAD investigates a permanent set of general characteristics of the population, such as education, labor, income and housing, and other characteristics with varying regularity. This paper uses the 2012 survey.

The PNAD is based on a complex sampling design: it adopts a stratified cluster sampling design with one, two or three selection stages, depending on the stratum (Silva et al., 2002). This paper takes the complex sampling plan of the survey into account.

The data extracted from this survey concern the job condition of individuals and other characteristics: gender, age, migrant status, years of schooling, skin color, children, sector of employment and living in an urban area.
Data on job condition were adapted to obtain four statuses: formal worker, informal worker, formal entrepreneur and informal entrepreneur. The separation between formal and informal workers was already present in the original data, which distinguish workers with a formal contract ("carteira de trabalho assinada") from workers without one ("sem carteira de trabalho assinada"); the distinction between formal and informal entrepreneurs was obtained by observing whether the entrepreneur was registered in the National Register of Legal Entities (CNPJ).
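The four-way classification described above can be sketched as a small Python function. The argument names are hypothetical and are not PNAD variable codes; the sketch only illustrates the rule (signed work card for workers, CNPJ registration for entrepreneurs) and assumes the individual is employed.

```python
FORMAL_WORKER = 1
INFORMAL_WORKER = 2
FORMAL_ENTREPRENEUR = 3
INFORMAL_ENTREPRENEUR = 4


def classify_formalinf(is_entrepreneur, signed_work_card=False, has_cnpj=False):
    """Return the formalinf code (1 to 4) for one employed individual.

    Argument names are hypothetical; they are not PNAD variable codes.
    """
    if is_entrepreneur:
        # Entrepreneurs count as formal when registered in the CNPJ.
        return FORMAL_ENTREPRENEUR if has_cnpj else INFORMAL_ENTREPRENEUR
    # Workers count as formal when they hold a signed work card.
    return FORMAL_WORKER if signed_work_card else INFORMAL_WORKER


print(classify_formalinf(is_entrepreneur=False, signed_work_card=True))
print(classify_formalinf(is_entrepreneur=True, has_cnpj=False))
```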

The variables gender, age and migrant were not adapted. Gender is equal to 1 if the individual is female and 0 if male; it is worth noting that, according to Ramalho and Silveira Neto (2010), men are more likely to be self-employed in the informal sector, while women have higher chances of holding informal salaried jobs.

The variable migrant indicates whether the individual has ever lived in another Brazilian federal state or in another country. Migrants may be more inclined to accept any kind of work, including informal work, or to start a new firm (formal or informal).

The variable child indicates whether the individual has a child under the age of 18 and captures the greater urgency of finding a job for individuals who are parents. With this variable we want to capture whether the individual has a family and therefore needs to work to support dependents.

Years of schooling can be an important determinant of an individual's choice between the formal and informal sectors. Mello and Santos (2009) find that education levels are, at any point in time, the main individual characteristic determining the relative weight of workers in the two economic sectors considered (formal and informal). They conclude that the improvement in the distribution of education in the population is what actually drives the increase in the degree of formalization of economies.

In our paper, education enters the model in three ways: first, through the profit functions, by improving managerial skills (a), in the case of entrepreneurs only; second, through wages (w), since education improves human capital; finally, through personal background characteristics (X), because highly schooled individuals know more about the laws, rules and procedures for formalizing themselves or their firms.

The data on skin color were divided into White and East-Asian people on one side and Black, Brown and Amerindian people on the other. This division is justified by the fact that, in Brazil, Black, Brown and Amerindian people have historically been socially disadvantaged and have had fewer opportunities than Whites and East-Asians.

Thus, we can expect a larger presence of Blacks, Browns and Amerindians in informal jobs and firms. Saboia and Saboia (2006) showed the less favorable situation of blacks and browns relative to whites in the Brazilian labor market. In the population as a whole, white workers receive about double the income of black and brown workers; among workers with a degree, the wage differential is 15%.

The data on sector of employment were adapted to distinguish people who work in agriculture, in industry and in services. Belonging to sectors characterized by a high degree of informality (agriculture and services) or to a sector more intensive in formal jobs (industry) can change the degree of formality in the labor market. For instance, structural change in the sectoral composition explained 25% of the increase in the degree of informality observed throughout the 1990s in Brazil (Ulyssea, 2006, 2010).

The data on the region where the individual lives were adjusted to distinguish people who live in urban areas from people who live in rural areas. This distinction can be useful because the type of informality may differ between urban and rural individuals.

The variables used in this work are listed in Table 1, which also summarizes the values that each variable can take.

Table 1. Description of variables

| Variable | Description |
|---|---|
| Formalinf | 1 = formal worker; 2 = informal worker; 3 = formal entrepreneur; 4 = informal entrepreneur |
| Gender | 1 = female; 0 = male |
| Age | Age of individual in the year 2012 |
| Migrant | 1 = lived in another state of the Brazilian Federation or abroad; 0 = otherwise |
| Schoolingy | Years of schooling (0 to 15); 0 = no schooling or below 1 year; 15 = 15 years of schooling or above |
| Skincol | 1 = Blacks, Browns and Amerindians; 0 = Whites and East-Asians |
| Child | 1 = at least one child born after year 1994; 0 = otherwise |
| Agrindser | 1 = works or acts in the sectors of agriculture or services; 0 = works or acts in the sector of industry |
| Urban | 1 = lives in urban area; 0 = lives in rural area |

Source: Own elaboration. Extracted from PNAD 2012

4. Personal Background Characteristics

Following De Mel et al. (2010) and Bruhn (2012), this paper classifies both formal and informal individuals into wage-worker and business-owner species using discriminant analysis. As described in De Mel et al., discriminant analysis is a tool used in other sciences, such as biology, to separate elements of nature into species based on measured characteristics.

To verify whether the chosen variables are relevant for separating the four groups through discriminant analysis, we first analyze the means and standard deviations of the variables in each group defined by the variable formalinf.

Table 2 displays averages and standard deviations of the personal background characteristics, by occupation group. The statistics show that women are more present among workers than among entrepreneurs and, in particular, are more present in informal jobs than in formal jobs.

Table 2 also shows that informal workers are the youngest group (34.28 years), followed by formal workers (35.91 years); the two groups of entrepreneurs are older: 43.52 years for formal entrepreneurs and 44.07 years for informal entrepreneurs. This may be linked to the life cycle: individuals are workers in the first stages of life and become entrepreneurs in later stages, once they have accumulated more experience and savings.

The variable migrant varies only slightly among groups. By contrast, schoolingy differs markedly across groups: informal workers and entrepreneurs have a lower average level of schooling than formal workers and entrepreneurs; the largest difference occurs between informal and formal entrepreneurs, where formal entrepreneurs have on average nearly twice as many years of schooling as informal ones. This difference could be explained by the greater difficulties that low-schooling entrepreneurs face in formalizing their firms, given the complexity of laws, taxes and regulations. Alternatively, it could be explained by low-schooling entrepreneurs trying to be competitive through tax evasion, since they cannot compete through human capital.

Turning to skincol, Blacks, Browns and Amerindians are strongly represented in the informal sectors of the economy, making up on average 63% of informal workers and 61% of informal entrepreneurs, against 50% of formal workers.
The most striking figure concerns formal entrepreneurs, of whom only 32% are Black, Brown or Amerindian.

As for having at least one child born after 1994, Table 2 shows that formal and informal entrepreneurs are on average less likely to have one than the groups of workers. Moreover, among workers, informal ones are slightly more likely to have one than formal ones, while among entrepreneurs the opposite occurs.

For agrindser, Table 2 shows that, on average, formal workers are more present in industry than informal workers, while formal entrepreneurs are less active in industry than informal entrepreneurs.

Finally, urban indicates that urban individuals are in general more present in the formal groups than in the informal ones, with the lowest figure for informal entrepreneurs.

Table 2. Personal background characteristics by occupation group: averages and standard deviations (in brackets)

| Variable | Formal worker | Informal worker | Formal entrepreneur | Informal entrepreneur |
|---|---|---|---|---|
| gender | 0.44 (0.49) | 0.50*** (0.50) | 0.33 (0.47) | 0.34*** (0.48) |
| age | 35.91 (11.58) | 34.28*** (13.45) | 43.52 (12.19) | 44.07*** (14.85) |
| migrant | 0.10 (0.30) | 0.11*** (0.31) | 0.16 (0.37) | 0.13*** (0.33) |
| schoolingy | 9.88 (3.92) | 7.40*** (4.36) | 10.41 (3.99) | 5.88*** (4.39) |
| skincol | 0.50 (0.50) | 0.63*** (0.48) | 0.32 (0.47) | 0.61*** (0.49) |
| child | 0.20 (0.48) | 0.23*** (0.47) | 0.16 (0.42) | 0.14*** (0.37) |
| agrindser | 0.76 (0.42) | 0.83*** (0.37) | 0.86 (0.35) | 0.76*** (0.43) |
| urban | 0.95 (0.23) | 0.85*** (0.36) | 0.96 (0.20) | 0.68*** (0.47) |

Source: Own elaboration. Extracted from PNAD 2012.
The stars on the averages for informal workers and informal entrepreneurs denote the statistical significance level of the difference in averages with respect to formal workers and formal entrepreneurs, respectively. Significance levels: * 10 percent, ** 5 percent, *** 1 percent

5.  Multinomial logistic regression

To observe the sign of each independent variable in classifying the dependent variable, a multinomial logistic regression was estimated with the software STATA. Multinomial logistic regression is a maximum likelihood model for a discrete dependent variable that takes more than two outcomes with no natural ordering.

In the multinomial logit model, a set of coefficients β(1), β(2), β(3) and β(4) is estimated, one for each outcome:

$$\Pr(y = j) = \frac{\exp\left(X\beta^{(j)}\right)}{\sum_{k=1}^{4} \exp\left(X\beta^{(k)}\right)}, \qquad j = 1, 2, 3, 4$$

However, more than one set of values for β(1), β(2), β(3), β(4) leads to the same probabilities for y=1, y=2, y=3 and y=4, so the model as written is not identified. The way to identify it is to set one of the βs equal to zero. In this paper, β(1) is set equal to zero, so the remaining coefficients β(2), β(3), β(4) measure the change relative to the y=1 group.
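The identification problem can be illustrated with a short Python sketch. It computes multinomial logit probabilities from one coefficient vector per outcome and shows that subtracting the base outcome's vector from all of them leaves the probabilities unchanged. The coefficients and the covariate vector are made up for illustration; they are not estimates from this paper.

```python
import math


def mlogit_probabilities(x, betas):
    """Pr(y = j | x) for each outcome j, given one coefficient vector per outcome."""
    scores = [sum(b_k * x_k for b_k, x_k in zip(beta, x)) for beta in betas]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def normalize_to_base(betas, base=0):
    """Subtract the base outcome's coefficients so that its vector becomes zero."""
    reference = betas[base]
    return [[b_k - r_k for b_k, r_k in zip(beta, reference)] for beta in betas]


# Made-up coefficients for four outcomes and two covariates (constant, schooling).
betas = [[0.4, 0.10], [1.0, -0.03], [-1.5, 0.09], [0.2, -0.15]]
x = [1.0, 8.0]

p_raw = mlogit_probabilities(x, betas)
p_norm = mlogit_probabilities(x, normalize_to_base(betas))
print([round(p, 4) for p in p_raw])
print([round(p, 4) for p in p_norm])  # same probabilities, with beta(1) = 0
```

Because only differences between coefficient vectors matter, fixing β(1) = 0 loses no information and makes the remaining vectors comparable to the base group.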

To estimate the multinomial logistic regression, three aspects of the data are considered. The first is that the National Household Survey (PNAD) is based on a complex sampling plan. In STATA, the structure of the sampling plan is first specified and then taken into account in the multinomial logistic regression.

The second aspect is the presence of a selected sample in this research.
A selected sample is one that, intentionally or unintentionally, is based in part on the values taken by the dependent variable, so that parameter estimates may be inconsistent if corrective measures are not taken (Cameron and Trivedi, 2005).

In this paper, the dependent variable (formalinf) is defined only for employed people; unemployed people are excluded. The correction is made through the inverse Mills ratio, computed manually from a probit regression in which the dependent variable indicates whether the individual works and the independent variables are child, skincol, schoolingy, gender, age and a further variable indicating whether the individual receives income from sources other than work. The inverse Mills ratio is calculated as the ratio of the probability density function to the cumulative distribution function and is then included in the multinomial logistic regression.
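The ratio described above can be sketched in a few lines of Python using only the standard library. Here z stands for the fitted linear index of the probit for working; the values passed in are hypothetical and the function names are chosen for illustration.

```python
import math


def normal_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills_ratio(z):
    """Ratio of the normal pdf to the normal cdf at the probit index z.

    For very negative z the cdf underflows to zero, so this simple
    version is only meant for moderate values of the index.
    """
    return normal_pdf(z) / normal_cdf(z)


# Hypothetical probit indices for three individuals.
for z in (-1.0, 0.0, 1.5):
    print(z, round(inverse_mills_ratio(z), 4))
```

The ratio falls as the index rises, so individuals who are very likely to work receive a small correction term.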
The third aspect is the presence of spatial correlation when the variable agrindser is used. This problem is addressed by using a variable that specifies the area where each individual lives as the cluster variable for the errors.
The results of the multinomial logistic regression are shown in Table 3.

Table 3. Multinomial logistic regression (coefficients).

| Variable | Formal workers | Informal workers | Formal entrepreneurs | Informal entrepreneurs |
|---|---|---|---|---|
| gender | Base outcome | .326 *** | -.286 ** | .435 *** |
| age | | -.030 *** | .032 *** | .011 *** |
| migrant | | .192 *** | .285 *** | .293 *** |
| schoolingy | | -.134 *** | -.006 | -.248 *** |
| skincol | | .135 *** | -.676 *** | -.178 *** |
| child | | -.067 | .013 | -.450 *** |
| agrindser | | .475 *** | .560 *** | -.316 *** |
| urban | | -.253 *** | .633 *** | -.354 *** |

Source: Own elaboration
Significance levels: * 10 percent, ** 5 percent, *** 1 percent

The multinomial logistic regression shows that nearly all variables are statistically significant at the 1% level; gender in the group of formal entrepreneurs is significant at 5%. The variable child in groups 2 and 3 and schoolingy in group 3 are statistically insignificant. In these cases, there is no evidence that the variable in question determines the choice of belonging to that group rather than to the base group.

Considering the coefficients, and in particular β(2), the variables age, schoolingy and urban have a negative effect on belonging to the group of informal workers. That is, being older, having more years of schooling and living in an urban area reduce the probability of being an informal worker relative to being a formal worker.

In particular, living in an urban area has a larger negative coefficient than the other two variables. All the other independent variables are positively associated with the probability of belonging to the informal workers group, except child, which is statistically insignificant. Specifically, being employed in agriculture or services (agrindser) and being a woman (gender) have larger coefficients than the other variables, although all the significant variables have a non-negligible effect. In this group, all coefficients have the sign predicted by theory.

Turning to formal entrepreneurs, being Black, Brown or Amerindian (skincol) and being a woman (gender) are negatively related to belonging to this group; skincol in particular has a large negative effect. By contrast, age, migrant, agrindser and urban have a positive effect: an individual is more likely to enter this group if he or she is older, is a migrant, works in agriculture or services, and lives in an urban area. The significant results are in line with our expectations, except for the sign of agrindser, which was expected to be negative. We can suppose that, contrary to expectations, the effect of agrindser on entrepreneurs is the opposite of its effect on workers. The industrial sector may be more formalized than agriculture and services in the recruitment of employees, but less formalized in terms of the number of entrepreneurs registered in the National Register of Legal Entities.

Moreover, the level of education is not statistically significant for this group; that is, years of schooling do not appear to determine the choice of being a formal entrepreneur rather than a formal worker.

The last group is that of informal entrepreneurs. Here schoolingy, skincol, child, agrindser and urban have a negative effect: individuals who have studied for more years, are Black, Brown or Amerindian, have at least one child born after 1994, work in agriculture or services and live in urban areas are less likely to enter the group of informal entrepreneurs. By contrast, being a woman, being older and being a migrant are positively related to being an informal entrepreneur.

An interesting point is that being Black, Brown or Amerindian (skincol) is negatively correlated with both groups of entrepreneurs, while age is positively correlated with both, although with small coefficients.
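One way to read the size of these coefficients is to exponentiate them: in a multinomial logit, exp(β) is the factor by which a one-unit increase in the variable multiplies the odds of belonging to a group rather than to the base group (formal workers), holding the other covariates fixed. The Python sketch below applies this to a few coefficients copied from Table 3; the selection of coefficients is ours and is only illustrative.

```python
import math

# Selected coefficients copied from Table 3 (base outcome: formal workers).
coefficients = {
    ("informal worker", "gender"): 0.326,
    ("informal worker", "schoolingy"): -0.134,
    ("formal entrepreneur", "skincol"): -0.676,
    ("informal entrepreneur", "schoolingy"): -0.248,
}


def relative_risk_ratio(coef):
    """Multiplicative change in the odds versus the base outcome for a one-unit change."""
    return math.exp(coef)


for (group, variable), coef in coefficients.items():
    print(f"{group:22s} {variable:11s} {relative_risk_ratio(coef):.3f}")
```

For instance, each additional year of schooling multiplies the odds of being an informal rather than a formal worker by roughly 0.87.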

6. Canonical discriminant analysis

To investigate the groups according to the type of work or enterprise, a canonical discriminant analysis was carried out in STATA.

Canonical discriminant analysis is used in this paper to obtain the relative importance of each variable in explaining informality.

Canonical discriminant analysis derives a linear combination of the variables that has the highest possible multiple correlation with the groups. This maximum multiple correlation is called the first canonical correlation. The coefficients of the linear combination are the canonical coefficients. The variable defined by the linear combination is the first canonical variable. The second canonical correlation is obtained by finding the linear combination uncorrelated with the first canonical variable that has the highest possible multiple correlation with the groups. The process of extracting canonical variables can be repeated until the number of canonical variables equals the number of original variables or the number of groups minus one, whichever is smaller. Thus, in this work there are three canonical variables, since there are four groups.

Discriminant analysis involves determining a linear equation that predicts which group an individual belongs to. The function has the form:

$$D = v_1 X_1 + v_2 X_2 + v_3 X_3 + \dots + v_i X_i + a$$

where D = discriminant function
v = the discriminant coefficient or weight for that variable
X = respondent's score for that variable
a = a constant
i = the number of predictor variables (Burns, Burns, 2008).

The objective of this function is to maximize the distance between groups, that is, to obtain an equation with strong discriminatory power between groups.

Before using canonical discriminant analysis we have to test its assumptions: sample size, normal distribution, homogeneity of variances/covariances, absence of outliers and non-multicollinearity (Poulsen, French, n.a.).

The first assumption concerns sample size: the sample size of the smallest group needs to exceed the number of predictor variables. In this paper this assumption is comfortably satisfied, since the smallest group is very large.

The normality assumption requires that the data (for the variables) represent a sample from a multivariate normal distribution. To test this hypothesis we use the Doornik-Hansen multivariate normality test, which does not reject the hypothesis of normality at the 1% significance level.

The third assumption is homogeneity of variances/covariances. To test it we use the test of equality of covariance matrices across the four groups. The test does not reject equality, so this assumption is retained.

The outlier assumption reflects the fact that discriminant analysis is highly sensitive to the inclusion of outliers. We test for univariate and multivariate outliers in each group and eliminate them.

The last assumption concerns multicollinearity: if one of the independent variables is very highly correlated with another, the matrix will not have a unique discriminant solution. We check this through a multicollinearity test in which VIFs (variance inflation factors) are computed for each independent variable. All VIFs are below 10, so we conclude that multicollinearity is not present in the model.
The results of canonical discriminant analysis can now be observed and analyzed:

Table 4. Canonical linear discriminant analysis

| Fcn | Canon. Corr. | Eigenvalue | Variance Prop. | Variance Cumul. | Likelihood Ratio | F | df1 | df2 | Prob>F |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.3360 | .127238 | 0.5518 | 0.5518 | 0.8034 | 1330.1 | 24 | 4.1e+05 | 0.0000 |
| 2 | 0.2932 | .094036 | 0.4078 | 0.9596 | 0.9056 | 1019.3 | 14 | 2.8e+05 | 0.0000 |
| 3 | 0.0961 | .009328 | 0.0404 | 1 | 0.9908 | 218.23 | 6 | 1.4e+05 | 0.0000 |

Source: Own elaboration

As seen in the canonical-correlation table (Table 4), the first linear discriminant function accounts for about 55% of the variance and the second for about 41%, so together these two functions account for approximately 96%. The remainder of the analysis considers only the first function.
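The columns of Table 4 are linked by standard relations, which the Python sketch below reproduces from the three canonical correlations alone: each eigenvalue equals r²/(1 - r²), the variance proportions are the eigenvalues divided by their sum, and the likelihood ratio for a function is the product of 1/(1 + eigenvalue) over that function and all later ones. Because the correlations are taken from the table already rounded, the results match the reported figures only up to rounding.

```python
canonical_correlations = [0.3360, 0.2932, 0.0961]  # copied from Table 4

# Eigenvalue of each function: r^2 / (1 - r^2).
eigenvalues = [r ** 2 / (1 - r ** 2) for r in canonical_correlations]

# Share of the discriminating variance attributed to each function.
total = sum(eigenvalues)
proportions = [ev / total for ev in eigenvalues]

cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)


def likelihood_ratio(eigs):
    """Product of 1 / (1 + eigenvalue) over the given functions."""
    result = 1.0
    for ev in eigs:
        result /= 1.0 + ev
    return result


# Row k uses function k and all later functions.
likelihood_ratios = [likelihood_ratio(eigenvalues[k:]) for k in range(len(eigenvalues))]

for k in range(3):
    print(k + 1, round(eigenvalues[k], 4), round(proportions[k], 4),
          round(cumulative[k], 4), round(likelihood_ratios[k], 4))
```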

The F test reported in Table 4 tests, for each function, the null hypothesis that the corresponding canonical correlation and all smaller ones are zero, that is, that the function does not discriminate between the groups. With Prob>F equal to 0.0000 for all three functions, this null hypothesis is rejected in every case.

The canonical correlation measures the total correlation between the predictors and the discriminant function. More informative is the contribution of each individual variable to the function, which is given by the standardized canonical discriminant function coefficients.

Table 5. Standardized canonical discriminant function coefficients

| Variable | Function 1 | Function 1 variables ranking |
|---|---|---|
| schoolingy | .821 | 1 |
| age | .550 | 2 |
| gender | -.333 | 3 |
| skincol | -.227 | 4 |
| urban | .199 | 5 |
| child | .102 | 6 |
| agrindser | -.077 | 7 |
| migrant | -.030 | 8 |

Source: Own elaboration

The standardized canonical discriminant function coefficients, shown in Table 5, indicate the discriminating ability of each variable for the four groups.
For example, the first discriminant function is:

$$D_1 = -0.333\,\text{gender} + 0.550\,\text{age} - 0.030\,\text{migrant} + 0.821\,\text{schoolingy} - 0.227\,\text{skincol} + 0.102\,\text{child} - 0.077\,\text{agrindser} + 0.199\,\text{urban}$$

Table 5 also ranks the variables by their discriminating ability. Schoolingy, age and gender are the most important variables in discriminating individuals among the four groups, followed by skincol and urban.
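The ranking and the score can be reproduced with a short Python sketch. The coefficients are those of Table 5; the standardized values passed to the score function are hypothetical, and the sketch assumes each variable has already been standardized, since the coefficients are standardized ones.

```python
# Standardized coefficients of the first function, copied from Table 5.
coefficients = {
    "gender": -0.333,
    "age": 0.550,
    "migrant": -0.030,
    "schoolingy": 0.821,
    "skincol": -0.227,
    "child": 0.102,
    "agrindser": -0.077,
    "urban": 0.199,
}


def discriminant_score(standardized_values):
    """D1 for one individual; variables not supplied are treated as 0 (the mean)."""
    return sum(coef * standardized_values.get(name, 0.0)
               for name, coef in coefficients.items())


# Rank variables by the absolute size of their coefficient.
ranking = sorted(coefficients, key=lambda name: abs(coefficients[name]), reverse=True)
print(ranking)

# Hypothetical individual: one standard deviation above average schooling,
# half a standard deviation below average age, average on everything else.
print(round(discriminant_score({"schoolingy": 1.0, "age": -0.5}), 3))
```

Ranking by absolute value rather than by sign is what places gender and skincol, both with negative coefficients, ahead of urban.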

Table 6. Group means on canonical variables

| Formalinf | Means |
|---|---|
| 1 | .191 |
| 2 | -.521 |
| 3 | .834 |
| 4 | -.126 |

Source: Own elaboration

Table 6 shows the mean of each group on the canonical variables, which gives some indication of how the groups are separated. On the first function, the mean of formal workers (1) is far from that of informal workers (2); the same occurs between formal entrepreneurs (3) and informal entrepreneurs (4).

The next result useful for assessing the discriminant analysis (Table A1 in the Appendix) is the confusion matrix, or resubstitution classification table, which indicates how many observations from each group are classified correctly or misclassified into the other groups. In each cell, the upper value is the number of individuals and the lower value is the corresponding percentage.

In Table A1, the best classification occurs in group 3 (formal entrepreneurs), and group 2 (informal workers) also obtains a relatively good classification (50.72%). By contrast, the classification of group 1 (formal workers) is poor: many individuals in group 1 are misclassified into groups 2 and 3.
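To make the reading of a resubstitution classification table concrete, the Python sketch below computes the share of correctly classified individuals in each group and overall. The counts are invented for illustration and are not those of Table A1; rows are true groups and columns are predicted groups.

```python
# Hypothetical confusion matrix (NOT the paper's Table A1).
# Row = true group 1..4, column = predicted group 1..4.
confusion = [
    [30, 30, 30, 10],
    [20, 50, 10, 20],
    [15, 5, 70, 10],
    [10, 30, 20, 40],
]


def per_group_accuracy(matrix):
    """Share of each true group that is classified into its own group."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Share of all individuals classified into their own group."""
    correct = sum(row[i] for i, row in enumerate(matrix))
    total = sum(sum(row) for row in matrix)
    return correct / total


print(per_group_accuracy(confusion))
print(overall_accuracy(confusion))
```

With four groups, a per-group share well above 25% indicates that the function classifies that group better than a uniform random assignment would.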

In conclusion, canonical discriminant analysis shows that the four groups can be separated using the independent variables chosen for this purpose.

The most important variables for separating the four groups are schoolingy, age and gender.

7. Conclusions

This study investigated the possible factors that determine an individual's choice between the formal and informal sectors in Brazil, considering the characteristics of individuals.
According to the discriminant analysis, the four groups can be distinctly separated through the individual characteristics chosen in this article.

The multinomial regression anticipates the results of the discriminant analysis and also provides the sign of the association between each characteristic and group membership. Most variables are statistically significant and have the signs expected under the initial assumptions.

The most important characteristics are, in order, years of schooling, age, gender and skin color. Among these, we first underline the importance of years of schooling, which is also documented in the literature on the recent decline in Brazilian informality [Mello, Santos, 2009]. Investment in human capital lowers the incentive for individuals to be informal, whether in the job market or in entrepreneurship. Individuals with a higher level of education can better recognize the advantages of being formal: for instance, participation in partnerships, class associations and unions, access to credit, freedom from the risk of confiscation, and access to social welfare. Moreover, individuals with more schooling can more easily find out how to formalize their own firm or how to find a formal job in the market.

Age is confirmed as a discriminating variable, but it appears to describe the life cycle of individuals, distinguishing workers from entrepreneurs, more than it discriminates between the formal and informal sectors.

Gender seems to have a role in discriminating between formal and informal sectors for both workers and entrepreneurs.

The role of skin color is less clear: we can suppose that it differentiates individuals both at the worker/entrepreneur level and at the formal/informal level.

Finally, we must note that this paper takes into account only individual characteristics to explain informality, treating institutional characteristics, such as the level of taxation and public oversight of informality, as given.

Appendix

Table A1. Resubstitution classification summary

True                        Classified
formalinf         1         2         3         4      Total

1            22,301    17,474    20,224    12,893     72,892
              30.59     23.97     27.75     17.69     100.00

2             5,828    17,001     4,339     6,351     33,519
              17.39     50.72     12.94     18.95     100.00

3             1,512       757     4,621     1,248      8,138
              18.58      9.30     56.78     15.34     100.00

4             3,618     5,696     4,956    11,566     25,836
              14.00     22.05     19.18     44.77     100.00

Total        33,259    40,928    34,140    32,058    140,385
              23.69     29.15     24.32     22.84     100.00

Source: Own elaboration

References

Almeida, R., Carneiro, P. (2009); "Enforcement of labor regulation and firm size". Journal of Comparative Economics, 37 (1), 28 – 46

Bruhn, M. (2011); "License to sell: The effect of business registration reform on entrepreneurial activity in Mexico". Review of Economics and Statistics 93 (1), 382–386

Bruhn, M. (2012); "A Tale of Two Species: Revisiting the Effect of Registration Reform on Informal Business Owners in México", Policy Research Working Paper n. 5971, World Bank

Burns, R., Burns, R. (2008); Business Research Methods and Statistics using SPSS, London: SAGE Publications

Cameron, A. C., Trivedi, P. K. (2005); Microeconometrics: Methods and Applications, Cambridge: Cambridge University Press

Dabla-Norris, E., Koeda, J. (2008); "Informality and Bank Credit: evidence from firm-level data", IMF working paper, Washington

De Mel, S., McKenzie, D., Woodruff, C. (2010); "Who Are the Microenterprise Owners? Evidence from Sri Lanka on Tokman v. de Soto", in J. Lerner and A. Schoar (eds.), International Differences in Entrepreneurship, 63-87

De Mel, S., McKenzie, D., Woodruff, C. (2012); "The demand for, and consequences of, formalization among informal firms in Sri Lanka". World Bank Policy Research Working Paper N.5991

De Soto, H., (1989); The Other Path: The Invisible Revolution in the Third World, Harper Row, New York

Fajnzylber, P., Maloney, W. F., Montes-Rojas, G. V. (2011); "Does formality improve micro-firm performance? Evidence from the Brazilian simples program". Journal of Development Economics 94 (2), 262 – 276

Galiani, S., Weinschelbaum, F. (2006); "Modeling Informality Formally:  Households and Firms", Centro de Estudios Distributivos, Laborales y Sociales,  Documento de Trabajo Nro. 47, Universidad Nacional de La Plata

Hsieh, C.-T., Klenow, P. J. (2009); "Misallocation and manufacturing TFP in China and India". Quarterly Journal of Economics 124 (4), 1403 – 1448

Kaplan, D. S., Piedra, E., Seira, P. (2011); "Entry regulation and business start-ups: Evidence from Mexico". Journal of Public Economics 95 (11-12), 1501–1515

Mello, R.F., Santos, D.D. (2009); "Aceleração educacional e a queda recente da informalidade". IPEA, Boletim Mercado de Trabalho 39

Monteiro, J. C., Assuncao, J. J. (2012); "Coming out of the shadows? Estimating the impact of bureaucracy simplification and tax cut on formality in Brazilian microenterprises". Journal of Development Economics 99, 105-115

Neri, M.C. (2007); "Informalidade". In: Tafner P, Giambiagi F, organizadores. Previdência no Brasil:  debates, dilemas e escolhas. Rio de Janeiro: Ipea; p. 285-319

Poulsen, J., & French A. "Discriminant Function Analysis", Retrieved from:
http://online.sfsu.edu/~efc/classes/biol710/discrim/discrim.pdf

Ramalho, H. M. B., Silveira Neto, R.M., (2010); "A importância do setor informal na migração rural-urbana: evidencias para o Brasil". Encontro Nacional ANPEC.

Saboia, A. L., Saboia, J. (2006); "Brancos, Pretos e Pardos no Mercado de Trabalho no Brasil Um Estudo sobre Desigualdades", Instituto de economia, UFRJ

Silva, P. L. do N., Pessoa, D. G. C., Lila, M. F., (2002); "Análise estatística de dados da PNAD: incorporando a estrutura do plano amostral", Ciência Saúde Coletiva, vol.7, no.4, 659-670

Ulyssea, G. L., (2006); "Informalidade no mercado de trabalho brasileiro: uma resenha da literatura", Revista de Economia Política, vol. 26, nº 4 (104), 596-618

Ulyssea, G. L. (2010); "Regulation of entry, labor market institutions and the informal sector". Journal of Development Economics 91, 87–99

Ulyssea, G. L. (2013); "Formal sector's entry costs, taxes, enforcement and informality", Instituto de Economia, UFRJ


1 PhD student in Development Economics (PPGDE), Federal University of Paraná (UFPR), Brazil romanello.michele@gmail.com
2 Professor of Department of Economics Federal University of Paraná (UFPR), Brazil flaviogonsalves@hotmail.com
3 Galiani, Weinschelbaum, 2006

