Productive efficiency analysis of quantitative economics journals through Stochastic Frontier Analysis using panel data

The main goal of a scientific journal is to diffuse new knowledge. The number of citations received by a journal can be considered as a measure of this objective and, in turn, as a measure of productivity in relation to the production process in which the journals are involved. In order to assess this production process, in this paper econometric models using panel data are employed to obtain measures of efficiency for those journals belonging simultaneously to the areas of "economics" and "social sciences, mathematical methods" in the Web of Science database. This efficiency is measured in terms of the distance between the actual production of the journals and their estimated maximum achievable number of citations given their available resources.


Introduction.
In general terms, any production process consists of producing a series of products (outputs) using certain available resources (inputs). The main goal in this framework is to compare the productivity of the various entities involved in the production process, that is, the production obtained in relation to the resources used. This comparison, or benchmarking, is usually established in terms of efficiency in the production process. The efficiency of an entity is defined as the difference between its actual production and its frontier, the maximum production attainable with its available resources. To this end, several mathematical and econometric models have been developed in the field of Economics.
In the setting of the process of publishing papers in scientific journals, the objective is clearly to diffuse new knowledge, the extent of whose diffusion could be measured through the number of readers of the published articles. Scientists who read the papers published in a journal use their content to develop their own articles, and provide citations to the papers they have read. As a consequence, the number of citations received by the journal can be considered a proxy of the diffusion of knowledge (the output of the production process). In order to apply the aforementioned models, a set of homogeneous journals (each reaching approximately the same number of potential readers) has to be selected, and several inputs must be chosen. Great care must be taken in the selection of these inputs since they determine the production frontier and, in consequence, the efficiency of each journal. Candidate inputs in the framework of this production process include those related to the number of people involved in the process, the available budget, and the number of articles published.
Once the inputs and outputs have been selected, there are two alternative approaches to assess the efficiency of the selected group of journals: data envelopment analysis (DEA) and stochastic frontier analysis (SFA). Detailed explanations of the two methodologies are given in Coelli, Rao and Battese (1998) and Bogetoft and Otto (2011). Both approaches establish the production frontier in terms of the selected inputs and outputs, against which efficiency is then measured. DEA is a non-parametric, non-stochastic approach based on linear programming; therefore, no source of randomness is taken into account. This drawback is overcome in the SFA approach through the inclusion of random perturbations in econometric models.
In the specific setting of scientific documentation, the DEA approach has been widely utilised. As a matter of fact, all the following papers use this approach. Abbott and Doucouliagos (2003) carry out an analysis of the efficiency of Australian universities. Similar studies include Bonaccorsi, Daraio and Simar (2006), where Italian universities are analysed, and Bonaccorsi and Daraio (2003), who consider institutes of the French INSERM and biomedical research institutes of the Italian CNR. Ruiz et al. (2010) examine the efficiency in the scientific production of a sample of Colombian research groups. Agasisti, Catalano, Landoni and Varganti (2012) analyse the production of 69 academic departments located in Italy. The DEA approach has also been applied to studies of efficiency in scientific production in a number of countries and regions. In Rousseau and Rousseau (1998), this approach is applied to a sample of 18 countries of the world, while in Guan and Chen (2010), 30 Chinese provinces are considered. In relation to the analysis of the efficiency applied to a group of journals, Lozano and Salmeron (2005) show the results of a DEA analysis applied to a group of journals of Operations Research/Management Sciences. Petridis et al. (2013) provide an evaluation of 54 forestry journals.
In this framework of scientific documentation, the SFA approach is used in Ortega and Gavilan (2013) to benchmark a group of quantitative economics journals. In that paper, the efficiency of the considered journals is estimated for the year 2011 using a cross-sectional database, which provides a snapshot of the situation in that year. However, this situation could be a fluke related to that particular year. Panel data provide evidence of a more reliable nature on the performance of the journals, and also enable this performance to be tracked over time. The SFA approach using panel data has been extensively applied in many fields, for example in economics (Zhou, Li, & Li, 2011; Kumbhakar & Zhang, 2013), healthcare (Greene, 2004; O'Donnell & Nguyen, 2013), universities (Abbott & Doucouliagos, 2009; Sav, 2011), sports (Barros, Garcia-del-Barrio, & Leach, 2009; Park & Lee, 2012), fishing (Melfou, Theocharopoulos, & Papanagiotou, 2009), banking (Brissimis, Delis, & Tsionas, 2010; Parinduri & Riyanto, 2014), ports (Wang, Knox, & Lee, 2013), airlines (Kumbhakar, 1991; Merkert, Odeck, Brathen, & Pagliari, 2012), agriculture and biological sciences (Maietta, 2002; Kumbhakar, Lien, & Hardaker, 2014), and energy (Filippini & Hunt, 2011; Stern, 2012). However, to the best of our knowledge, no application has yet been made in the field of scientific documentation.
The main objective of this paper is to analyse the productive efficiency of a significant group of scientific journals belonging to the field of quantitative methods applied to Economics. To this end, an SFA model using panel data is utilised. The application of this methodology to the field of scientific documentation constitutes the main innovation of this paper; the approach generalizes the cross-sectional model considered in Ortega and Gavilan (2013).
The remainder of this paper is arranged as follows. In Section 2, the set of journals analysed and the variables included in the model are selected. In Section 3, the econometric model employed to fit the panel data is presented. Section 4 is devoted to stating the main results provided by the model, and in Section 5 the conclusions reached in this paper are briefly shown.

Selection of journals and variables.
As mentioned in the introduction, this paper generalizes the cross-sectional analysis carried out in Ortega and Gavilan (2013) to the case of panel data, in order to attain better estimations and to be able to follow the evolution of the obtained efficiencies over time. Therefore, for comparison purposes, the same set of journals and variables is considered.
With regard to the group of journals, a homogeneous group, each member of which potentially reaches the same number of readers, is considered. Specifically, the 21 journals belonging simultaneously to the areas of "economics" and "social sciences, mathematical methods" of the JCR social sciences edition database for the analysed time span have been selected. Therefore, a group of journals relating to the field of quantitative economics has been chosen.
With respect to the output, a variable that measures the production (diffusion of new knowledge) of the journals has to be selected. As argued in the introduction, the number of citations received can be considered a proxy of that production (variable NC). It should be pointed out that, despite being the most commonly used criterion, this selection presents certain issues related to the fact that the number of citations received by a paper may stem from reasons other than the value of its content (Callon, Courtial, & Penan, 1995; Ortega, 2003).
In relation to the selection of the inputs, variables related to the quantity and quality of the articles published can be considered. An obvious candidate is the total number of articles published by the journals. However, the citations received by a paper are obtained retrospectively, since citations refer to papers published before the citation is made (Basulto & Ortega, 2005; Gupta, 1997). For this reason, the average number of articles published by the journal in the recent past (the three previous years) has been selected (variable AP). Additionally, a variable related to the visibility of the journal is considered, since the more people read a paper, the more citations that paper will receive when those readers publish their own research. Moreover, there is clearly great interest in publishing in the journals with the highest levels of visibility. The impact factor of a journal can be taken as a measure of its visibility (Callon, Courtial, & Penan, 1995; Basulto & Ortega, 2005), since a higher impact factor implies that the journal is read by a greater number of researchers. In this sense, selecting journals that cover the same subject area (quantitative economics) supports the choice of the impact factor as a visibility measure, due to the similarity in the number of potential readers of these journals. As a consequence, although this selection is not standard practice, the average impact factor reached in the recent past (the three previous years) by the journals is chosen (variable IF). The impact factor could be considered an output; however, it is precisely the use of data from the past that makes it an input for the future, since what a journal has achieved to date in terms of visibility determines what it will be able to attain in the future.
The panel of data containing the values of the three aforementioned variables, NC, AP and IF, for each of the 21 selected journals and for the years 2010, 2011 and 2012, has been obtained from the JCR Social Sciences edition database of the Web of Science and is shown in Table 1. The journal QME-QUANT MARK ECON did not belong to the JCR database in the year 2007, and therefore its averages AP and IF for the year 2010 have been calculated using the information of only two (instead of three) previous years.
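The construction of these lagged inputs can be sketched as follows; the `trailing_average` helper and the article counts are purely illustrative, not figures taken from Table 1:

```python
def trailing_average(history, year, window=3):
    """Average of the values recorded in the `window` years preceding
    `year`, using however many of those years are actually available."""
    values = [history[y] for y in range(year - window, year) if y in history]
    return sum(values) / len(values)

# Hypothetical article counts for a journal absent from the database in 2007,
# mirroring the QME-QUANT MARK ECON case described above:
articles = {2008: 30, 2009: 34}
ap_2010 = trailing_average(articles, 2010)  # average over 2008-2009 only
```

The same helper applied to the impact-factor series yields the IF input.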
Other related inputs, for example those linked to the labour force or the capital factor, could be considered. However, the variable representing the total number of members of the editorial boards (as a variable representing the human factor) was found not to be significant in Ortega and Gavilan (2013) and has therefore not been included in the proposed model. With regard to the capital factor, it would be desirable to include a reference to it in the model, such as the budget available for each journal; however, to date, this information has not been made available.
Source: Own elaboration.

Model.
The origins of the SFA econometric models lie in the (almost simultaneously published) papers of Aigner, Lovell, and Schmidt (1977) and Meeusen and van den Broeck (1977), which share the composed error structure for inefficiencies and statistical noise. The econometric model selected in this paper is a Cobb-Douglas production model in which all the inputs and the output are taken in logarithmic terms.
Since one of the main objectives is to analyse the possible stability of the efficiencies over time and the changes of position of the journals in the yearly ranking, in this paper the efficiencies are left completely free and unrestricted in time, instead of considering a panel data model based on the assumption of time-invariant efficiency, as in the early models, or imposing any functional restriction on the time evolution of the efficiencies, as in Kumbhakar (1990) or Battese and Coelli (1992). Moreover, a variable $t$ is added to take into account possible technological change over time. This variable has been centred by taking the values -1 (for 2010), 0 (for 2011) and 1 (for 2012). Therefore, considering the inputs and the output selected to assess the efficiency of the selected group of journals for the period 2010-2012, the following Cobb-Douglas type model is proposed:

$$\ln NC_{it} = \beta_0 + \beta_1 \ln IF_{it} + \beta_2 \ln AP_{it} + \beta_3 t + v_{it} - u_{it}$$

The more general translog model has also been considered; however, this approach is not suitable since it presents the wrong-skewness problem.
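Under this specification, the deterministic part of the frontier is linear in the logarithms of the inputs. A minimal sketch of its evaluation (the coefficient values below are illustrative placeholders, not the estimates reported later):

```python
import math

def log_frontier(if_avg, ap_avg, t, beta0, beta1, beta2, beta3):
    """Deterministic part of the Cobb-Douglas frontier in logs:
    beta0 + beta1*ln(IF) + beta2*ln(AP) + beta3*t."""
    return beta0 + beta1 * math.log(if_avg) + beta2 * math.log(ap_avg) + beta3 * t

# Illustrative (not estimated) coefficients; t is centred: -1, 0, 1 for 2010-2012
ln_nc_max = log_frontier(if_avg=1.2, ap_avg=40.0, t=0,
                         beta0=3.0, beta1=0.8, beta2=0.5, beta3=0.01)
nc_max = math.exp(ln_nc_max)  # frontier expressed in the original citation scale
```

Exponentiating the fitted log-frontier returns the maximum attainable number of citations on the original scale.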
The stochastic frontier is the deterministic part of the model, $\beta_0 + \beta_1 \ln IF_{it} + \beta_2 \ln AP_{it} + \beta_3 t$, while the composite error is $\varepsilon_{it} = v_{it} - u_{it}$. The first component, $v_{it}$, represents the statistical noise and, as usual, is assumed to follow a normal distribution with zero mean and variance $\sigma_v^2$. The second component, $u_{it}$, represents the inefficiency of the journals, that is, the distance between the production frontier (the maximum number of citations attainable given the levels of the inputs) and the citations actually received. Since it is a distance, this variable must be non-negative and, as is commonly done, it is assumed to follow a half-normal distribution, $u_{it} \sim N^+(0, \sigma_u^2)$; $v_{it}$ and $u_{it}$ are assumed to be independent. Other possibilities considered for modelling the inefficiency are the exponential (especially in the Bayesian methodology) and gamma distributions. In this paper, the half-normal model has been selected because it is the most commonly used; furthermore, the main results remain unchanged when these other options are considered.
It is important to point out that the selected model is neutral at the stochastic frontier. The possibility of utilising a non-neutral model has been discarded since it presents problems of convergence, and of validity and reliability of the statistical tests.
Once the model is estimated, the quantity $u_{it}$ provides the level of inefficiency of journal $i$ in year $t$ (Jondrow, Lovell, Materov, & Schmidt, 1982); that is to say, the inefficiencies are considered to be time-varying. Furthermore, a likelihood ratio test has been carried out to check whether the inefficiencies can be accepted as constant over time, which is clearly rejected since a virtually zero p-value is obtained. When the variables are expressed in logarithmic terms, as is the case here, it is more common to consider the efficiency measures $TE_{it} = \exp(-u_{it})$. The model is estimated through the maximum likelihood method using version 1.1-0 of the package frontier in the software R (Coelli & Henningsen, 2013). In Coelli (1996), a description of the method for various kinds of models is presented. This method uses a reparameterisation of the model in terms of the parameters $\sigma^2 = \sigma_v^2 + \sigma_u^2$ (total variance of the composite error) and $\gamma = \sigma_u^2 / (\sigma_v^2 + \sigma_u^2)$. The latter parameter is related to the proportion of the total variance of the composite error that is due to inefficiency. It is crucial in the determination of the levels of inefficiency of the journals, and ranges from 0 (the whole composite error stems from random effects and there are no differences in efficiency among the journals) to 1 (the whole composite error is due to inefficiency and no statistical noise exists). Therefore, its statistical significance determines the presence of different levels of efficiency in the journals considered.
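How the inefficiency point estimates are recovered under the half-normal specification can be sketched with the standard Jondrow et al. (1982) conditional-mean formulas; this is a generic textbook implementation, not the internals of the frontier package:

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def jlms_efficiency(eps, sigma_sq, gamma):
    """Technical efficiency exp(-E[u|eps]) for a production frontier with
    composite error eps = v - u, v normal and u half-normal, using the
    reparameterisation sigma^2 = sigma_v^2 + sigma_u^2, gamma = sigma_u^2/sigma^2."""
    sigma_u_sq = gamma * sigma_sq
    sigma_v_sq = (1.0 - gamma) * sigma_sq
    mu_star = -eps * sigma_u_sq / sigma_sq
    sigma_star = math.sqrt(sigma_u_sq * sigma_v_sq / sigma_sq)
    z = mu_star / sigma_star
    u_hat = mu_star + sigma_star * norm_pdf(z) / norm_cdf(z)  # E[u | eps]
    return math.exp(-u_hat)

# A journal below the frontier (negative residual) gets a lower efficiency
# than one above it (parameter values are illustrative):
te_below = jlms_efficiency(eps=-0.5, sigma_sq=0.25, gamma=0.7)
te_above = jlms_efficiency(eps=0.3, sigma_sq=0.25, gamma=0.7)
```

The efficiency measure is bounded in (0, 1), with 1 meaning production exactly on the frontier.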

Results.
As a prior step to the estimation of the considered model, it has been checked that the wrong-skewness problem is not present: the skewness coefficient of the residuals is negative (specifically, -0.2190).
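This check uses the usual moment-based skewness coefficient, which can be sketched as follows (the residuals below are hypothetical; a negative coefficient is the expected direction when the composite error is v − u):

```python
def skewness(xs):
    """Moment-based skewness coefficient m3 / m2^(3/2)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical OLS residuals with a long left tail, as expected under v - u:
residuals = [0.2, 0.1, 0.15, 0.05, -0.6, 0.1, -0.4, 0.2, 0.05, 0.15]
neg_skew = skewness(residuals)  # negative: no wrong-skewness problem here
```

A positive coefficient would instead signal the wrong-skewness problem and cast doubt on the frontier specification.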
The results of the maximum likelihood estimation of the parameters of the model, along with their standard errors, z-values and p-values, are shown in Table 2, where the p-values have been marked according to their significance at the 5% (*), 1% (**) or 0.1% (***) levels. The first significant conclusion that can be drawn from these results is that the inputs chosen in the econometric model (IF and AP) are highly significant in the establishment of the frontier (maximum attainable number of citations) and, as expected, both coefficients are positive; that is, an increase in the available resources of a journal leads to an increase in its level of citations. However, the variable introduced to account for technological change over time is insignificant at the usual levels; therefore, there is no evidence of such change.
With regard to the key parameter $\gamma$, it is significant at the usual 5% level. The estimated value, 0.7061, establishes that 46.61% of the variance of the composite error is due to variations in the inefficiencies of the analysed journals, with the rest being statistical noise (Coelli, 1995; Ortega & Gavilan, 2014).
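The 46.61% figure follows from γ once the variance of the half-normal inefficiency term, Var(u) = σ_u²(π − 2)/π, is taken into account: the inefficiency share of the total variance of the composite error is γ(π − 2)/π divided by γ(π − 2)/π + (1 − γ). A quick check of this arithmetic:

```python
import math

def inefficiency_variance_share(gamma):
    """Share of Var(eps) attributable to Var(u) when u is half-normal:
    Var(u) = sigma_u^2 * (pi - 2) / pi, with gamma = sigma_u^2 / sigma^2."""
    k = (math.pi - 2.0) / math.pi
    return gamma * k / (gamma * k + (1.0 - gamma))

share = inefficiency_variance_share(0.7061)  # approximately 0.4661, i.e. 46.61%
```

This is why the reported share (46.61%) is smaller than the raw value of γ (70.61%).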
Once the model has been estimated, the specific efficiencies for each considered journal and year, whose determination forms the main goal of this type of model, are obtained and presented in Table 3, together with the average efficiency over the three years studied and the ranks corresponding to these averages. In order to illustrate the situation graphically, the obtained efficiencies are also shown in Figure 1. The efficiencies estimated for the journals show great stability over time, which is why their averages over the three years and the corresponding ranks have been added to Table 3 as a summary of the period. As a consequence of this stability, the three yearly average efficiencies are also very stable, ranging between 0.66 and 0.71.
The stability of the efficiencies and their corresponding ranks can be quantified through Pearson's correlation coefficients and Spearman's rank correlations, which are shown in Table 4. All Pearson's correlations are above 0.79 and all Spearman's rank correlations are above 0.62. As expected, the correlations between two consecutive years are greater than the correlation between 2010 and 2012. In Table 4, Pearson's correlations are shown above the diagonal and Spearman's rank correlations below the diagonal; correlations involving all 21 journals considered are given in the left-hand-side table, while correlations excluding the journal JAHRB NATL STAT are presented in the right-hand-side table.
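Both coefficients can be computed directly (a generic sketch with hypothetical efficiency scores; the rank assignment assumes no ties, which holds for continuous efficiencies):

```python
def pearson(xs, ys):
    """Pearson's linear correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman's rank correlation: Pearson's correlation of the ranks
    (no-ties case, sufficient for continuous efficiency scores)."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r
    return pearson(ranks(xs), ranks(ys))

# Hypothetical efficiencies of the same five journals in two consecutive years:
eff_2010 = [0.75, 0.68, 0.90, 0.55, 0.71]
eff_2011 = [0.73, 0.66, 0.88, 0.58, 0.70]
r = pearson(eff_2010, eff_2011)
rho = spearman(eff_2010, eff_2011)
```

Spearman's coefficient reacts only to changes in ranking, which is why it is the natural companion measure when rank stability is the question of interest.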
The journals ECONOMETRICA, INT J GAME THEORY, and J PROD ANAL are always ranked in the first positions. ECONOMETRICA always presents by far the greatest production (number of citations) of the group of analysed journals. The numbers of citations of INT J GAME THEORY and J PROD ANAL are not especially outstanding, but they are achieved with moderate resources in the recent past (visibility, IF, and number of articles published, AP).
On the other hand, the journals QUANT FINANC, INSUR MATH ECON, and QME-QUANT MARK ECON always occupy the last positions. In the case of INSUR MATH ECON, although it presents a high number of citations in relation to the group of journals analysed and a recent-past visibility (IF) close to the group average, it publishes a very high number of articles in relation to the group (about twice the group average). The journal QUANT FINANC has far fewer citations than INSUR MATH ECON and, although its IF is below the group average, it publishes a number of articles far above it. As for the journal QME-QUANT MARK ECON, it presents the second-lowest number of citations and, although it has the lowest number of articles published, its IF is close to the group average.
The greatest exception to the aforementioned stability of the efficiencies over time is the journal JAHRB NATL STAT, which presents the highest efficiency in 2010 but falls to position 13 in 2011 and position 17 in 2012. This journal always has the lowest number of citations and the lowest IF. In 2010, it published the fewest articles of the group while reaching the first position in that year's ranking. As time goes on, however, this journal maintains its level of citations in 2011 and increases it by 29% in 2012, but it also doubles its IF and publishes 61% more articles than in 2010; this is why it does not maintain the level of efficiency reached in 2010 and loses numerous positions in the rankings of 2011 and 2012. These are precisely the kinds of conclusion that can be drawn using a panel data approach and cannot be observed through a cross-sectional analysis for a particular year.
Specifically, if a cross-sectional analysis for 2010 alone were carried out, a misleading conclusion could be reached for the journal JAHRB NATL STAT, since its high efficiency is a fluke of that particular year which is not maintained over time. If this journal is removed from the sample, the stability of the efficiencies increases markedly: all Pearson's correlations are then above 0.84, all Spearman's rank correlations are above 0.73, and the correlations involving the year 2010 are greater (Table 4).
In order to obtain a general overview of the efficiencies reached by the journals as a group, the densities fitted to all 21 journals for each of the three years considered are shown in Figure 2. These densities have been fitted bearing in mind that the efficiencies range from 0 (minimum level of efficiency) to 1 (maximum level of efficiency) (Silverman, 1986). In 2010, the average efficiency of the group is 0.695 and the mode 0.751, with 62% of the journals (13 out of 21) having a level of efficiency above 0.70. In 2011, the average efficiency is 0.660 and the mode 0.689, with 52% of the journals (11 out of 21) above 0.70, indicating a slight decrease in relation to 2010. In 2012, the average efficiency is 0.710 and the mode 0.724, with 62% of the journals (13 out of 21) above 0.70, figures very similar to those of 2010. Therefore, great stability over time is observed, and the group of journals operates at a high level of efficiency during the period analysed, with a global efficiency for the three years of 0.688.
On the other hand, Table 2 shows that the two inputs considered (IF and AP) in the process of producing citations (diffusion of knowledge, NC) are highly significant; that is to say, the journals with a greater number of articles published and higher levels of visibility (measured through the impact factor) in the recent past are potentially able to reach a greater number of citations (production frontier). This is reinforced by the fact that the coefficient of linear correlation between each input (IF and AP) and the estimated frontier is high and significantly different from 0 (Table 5). However, unlike the production frontier, the level of efficiency is not determined by the two inputs considered, as the low and non-significant coefficients of linear correlation in Table 5 show.
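The boundary-aware density fitting used for Figure 2 can be sketched with a Gaussian kernel reflected at 0 and 1, so that probability mass that would leak outside the admissible efficiency range is folded back (the scores and bandwidth below are illustrative; the paper's exact procedure is not specified beyond the Silverman reference):

```python
import math

def reflected_kde(data, h, x):
    """Gaussian kernel density estimate on [0, 1] with reflection at both
    boundaries: each point also contributes through its mirror images at
    -d and 2-d, folding boundary leakage back into the unit interval."""
    def kernel(u):
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    total = 0.0
    for d in data:
        for point in (d, -d, 2.0 - d):  # original point plus both mirrors
            total += kernel((x - point) / h)
    return total / (len(data) * h)

# Hypothetical efficiency scores; evaluate the fitted density on a grid
effs = [0.62, 0.66, 0.69, 0.70, 0.72, 0.75, 0.68, 0.71]
grid = [i / 200.0 for i in range(201)]
density = [reflected_kde(effs, 0.05, x) for x in grid]
```

The mode reported for each year is simply the grid point where this fitted density peaks.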
There is a slight direct relationship between the levels of efficiency of the journals and their actual production in the form of NC, the output considered: their linear correlation is 0.3551, with a p-value for the significance test of 0.0043.

Conclusions.
In this paper, the process of diffusing new knowledge by means of scientific journals through the publication of papers is set as an input-output process, in order to assess the efficiency in that process of a homogeneous set of journals. These 21 journals belong simultaneously to the areas of "economics" and "social sciences, mathematical methods" for the years 2010, 2011 and 2012. The output and the inputs of that process have been carefully selected. The output, a proxy of the diffusion of new knowledge, is the number of citations received by the journal in a specific year (NC). Two inputs have been included in the model: one takes into account the number of articles published by the journal in the recent past (AP), and the other measures the visibility reached by the journal through the average of its impact factor in the recent past (IF). Other candidate inputs include the total number of members of the editorial boards of the journals, as a representation of the labour force, which has not been included in this model since it yielded insignificant results (Ortega & Gavilan, 2013), and an input related to the capital factor, such as the budget available for the journal, which has also been left out of the model due to the unavailability of that information.
Once the inputs and outputs have been selected, the SFA approach is carried out to benchmark the selected group of journals. To this end, a Cobb-Douglas econometric model is applied to a panel of data, and a variable is also added to take into consideration the possible technological changes over time. This panel data approach, in which the inefficiencies are left completely free and unrestricted over time, generalizes the cross-sectional approach established in Ortega and Gavilan (2013), thereby providing better estimations and enabling the efficiency of the journals to be tracked over time.
The maximum likelihood estimation of the proposed model shows that the two selected inputs are highly significant in the determination of the production frontier (the maximum attainable number of citations by the journals). However, there is no evidence of technological change over time (Table 2).
The efficiencies obtained for the journals and for each of the years analysed, show great stability across time, and present high correlations between the years considered (Pearson's correlations above 0.79 and Spearman's rank correlations above 0.62). The journals ECONOMETRICA, INT J GAME THEORY and J PROD ANAL are always ranked in the first positions. On the other hand, the journals QUANT FINANC, INSUR MATH ECON and QME-QUANT MARK ECON are always in the last positions. The greatest exception to the aforementioned stability in the efficiencies over time is the journal JAHRB NATL STAT, which presents the highest efficiency in 2010 and falls to position 13 in 2011, and 17 in 2012. It should be borne in mind that this conclusion can be drawn using a panel data approach but cannot be observed through a cross-sectional analysis for a particular year.
The densities fitted to all 21 journals for each of the years considered (Figure 2), which provide a general overview of the journals as a group, again show great stability over time and also show that the group of journals operates at a high level of efficiency during the period analysed: the global efficiency for the three years under consideration is 0.688.
As mentioned above, the two inputs (IF and AP) considered in the process of diffusing new knowledge are highly significant. However, unlike the production frontier, the level of efficiency is not determined by these two inputs, as the low and non-significant coefficients of linear correlation show (Table 5). A slight direct relationship between the levels of efficiency of the journals and their number of citations is observed.