Retrieval of aerosol properties from in situ, multi-angle light scattering measurements using invertible neural networks

noise to the data, we further demonstrate that the method is robust with respect to measurement errors. In addition, realistic case studies are performed to demonstrate that the algorithm performs well even with missing measurement data.


Introduction/Motivation
Atmospheric aerosols are small particles suspended ubiquitously throughout the Earth's atmosphere. These particles have important impacts on the Earth's climate [1] and public health [2], which are governed by aerosol properties such as concentration, size, and composition. These properties can change drastically in time and space. A conceptually simple but powerful method for characterizing the variable properties of an aerosol sample is to measure the angular distribution of light that is scattered from it. Such multi-angle light scattering measurements can be performed both in situ (aerosol sample drawn from the atmosphere into an instrument known as a polar nephelometer; e.g. [3]) and remotely (using satellite, aircraft-borne, or ground-based sensors; e.g. [4]). The resulting measurements contain substantial information about the concentration, size, shape, and complex refractive index of the aerosol being probed. Inversion methods are then required to retrieve this information.
The most sophisticated inversion methods rely on iterative optimization algorithms that solve forward models describing the underlying physics of light scattering by gas molecules and small particles [5]. In the simplified case when the particles are assumed to be spherical, the forward models are based on Mie theory, which provides an analytical solution to Maxwell's equations for a plane monochromatic wave incident on a homogeneous sphere of arbitrary radius [6]. For the more realistic case of arbitrarily shaped aerosol particles, more complex and computationally expensive superposition-based forward models are required, such as the multi-sphere T-matrix method [7] or those based on the Discrete Dipole Approximation [8]. Efficient and optimized computer codes that implement these types of forward models (e.g. [9,10,11]), as well as their use in iterative inversion schemes (e.g. GRASP-OPEN [12]), are now freely available.
The main limitation of iterative, physics-based retrieval algorithms is that they are too computationally expensive and slow for some important applications. In the remote sensing context, such applications include the operational processing in near real-time or reanalysis of large volumes of satellite data for downstream use in climate, air quality and weather models (e.g. [13]). In the in situ context, such applications include the near real-time processing of polar nephelometer measurements to retrieve particular aerosol properties, which is becoming increasingly important with the development of distributed sensor networks and the corresponding drive to miniaturize instruments (e.g. [14]).
The traditional approach for solving the problem of retrieval speed is to use pre-computed look-up tables (LUTs). LUTs contain limited sets of simulated multi-angle light scattering signals that correspond to discrete sets of combinations of the parameters to be retrieved. A retrieval is performed by selecting the set of parameters whose simulated signals most closely match a particular measurement. The LUT approach is currently the most widely used method for the operational processing of satellite-obtained aerosol remote sensing data [4]. However, although LUTs are able to solve the problem of retrieval speed, they do so at the cost of retrieval accuracy, since they necessarily involve parameter selectivity and discretization. This problem is becoming even more of a limitation with the continued development of advanced polarimetric light scattering measurements (remote sensing and in situ) [15,4], which have higher information content concerning a larger number of free parameters, in particular for combined multi-parameter retrievals using data from multiple sensors (e.g. combined surface reflectivity and aerosol property retrieval).
Machine learning algorithms present new opportunities for solving the aerosol property retrieval problem with sufficient speed and accuracy to overcome some of the limitations of previous approaches. Some very early efforts in this direction began in the 1990s and involved the application of neural networks to different forms of light scattering data [16,17,18,19,20]. Since then, much work has focused on remotely sensed light scattering data, as reviewed recently in [21]. In the remote sensing context, it seems that a combination of tools (e.g. neural networks as providers of initial guesses or as forward models in iterative optimization schemes) may ultimately prove to be the most effective way of operationally processing the large amounts of data collected by satellite-borne and ground-based sensors [22,23,24,25].
In the present study, we return to the comparatively less-studied problem of the inversion of in situ light scattering phase function measurements using neural networks. Although similar in principle to remote sensing measurements, in situ measurements are unique in that the single scattering approximation is typically valid (i.e., additional light scattering between the sensitive volume and detector is negligible), and there is no need to account for light reflections from different types of Earth surfaces (e.g., water, ice, different land types) as is necessary in satellite or airborne remote sensing. This simpler configuration makes it possible to keep the problem constrained to light scattering phase functions and the associated aerosol property retrieval.
This study has a concrete and practical motivation: to train models that will be applied to measurements obtained with the new polarized, laser-imaging-type polar nephelometer [15] that is currently being developed within the Aerosol Physics Group at the Paul Scherrer Institute (PSI). This instrument will eventually be used in laboratory experiments to measure the angular distribution of scattered light y (e.g. phase functions and polarized phase functions at multiple wavelengths) for different types of aerosols described by the properties x (e.g. size distribution parameters, complex refractive index). We present here a proof of concept showing that neural network models can be successfully applied to the problem of quickly and accurately retrieving the aerosol properties x from the light scattering measurements y that will be obtained with this instrument. For this purpose, we use synthetic data sets with realistically simulated measurement uncertainties. In follow-up studies, experiments will be performed with the instrument to further evaluate the performance of the neural-network-based retrievals presented here, as well as to evaluate the performance of common physics-based retrieval algorithms used in aerosol remote sensing (e.g. [26]).
A key novelty of the present study relative to other recent studies (e.g. [27,28]) is the model architecture that we consider. In particular, we focus on a class of neural network models known as invertible neural networks (INNs) [29]. A single INN model trained in one direction on a particular data set has the unique feature that it can be run in both the forward (aerosol properties to light scattering data, x → y) and inverse (light scattering data to aerosol properties, y → x) directions with negligible computational cost. This feature creates great flexibility with respect to the model application.
In Section 2 we present a theoretical overview of the problem and the INN model architecture. Section 3 introduces the synthetic data sets we use for model training and validation, while Section 4 discusses the implementation of the model, including the data preprocessing steps. In Section 5 we present the results, and in Section 6 we discuss the conclusions and outlook of the study.

Problem description
In this work we retrieve aerosol properties from multi-angle, multi-wavelength and polarized light scattering data. Let $x \in X \subseteq \mathbb{R}^N$ denote the aerosol properties, such as the spectral complex refractive index or particle size distribution parameters, where $N$ is the total number of properties. The functions obtained by the measurement device are the phase function $P_{11} = P_{11}(\theta)$ (i.e. the angularly resolved scattered light intensity) and the polarized phase function $-P_{12}/P_{11} = -P_{12}(\theta)/P_{11}(\theta)$ (i.e. the angularly resolved relative degree of linear polarization of scattered light for unpolarized incident light), where $\theta$ is the scattering angle. These are (normalized) elements of the scattering matrix used in the Stokes formalism; see e.g. [15,30] for more details.
The forward problem is to compute the phase and polarized phase functions $y \in \{P_{11}, -P_{12}/P_{11}\}$ from the aerosol properties $x \in X$: $y = F(x)$, where $F$ denotes the underlying Mie theory or another theory of aerosol light scattering.
The inverse problem is to retrieve the aerosol properties from the measurement data $y^{\delta}$: $\hat{x} = F^{-1}(y^{\delta})$, where the hat symbol indicates that $\hat{x}$ is an estimate of the true values $x$. Inverse problems are ill-posed, i.e. either no solution exists, or the solution is not unique, or small errors in the data $y^{\delta}$ can lead to large errors in the retrieved $\hat{x}$. Special solution methods, including e.g. regularization, therefore need to be applied to solve them.
In our work, we focus on so-called invertible neural networks (INNs). A detailed description can be found in Section 2.2. The main advantage of this method is that the forward and the inverse problem are solved simultaneously, even though only one model needs to be trained. This is due to the architecture of the INN. The general concept of building the model is depicted in Figure 1. Training the INN requires data, which can be obtained either from simulations or from measurements. The trained model can then be used either to predict light scattering measurement data from a given set of aerosol properties, or to retrieve a set of aerosol properties from measured or simulated light scattering data. One key advantage of having both a forward and an inverse pass is that it is possible to assess how well the retrieved simplified surrogate aerosol model represents the measured phase functions (even angularly resolved if needed). To use the invertible neural network model in the inverse direction, a best-of-n strategy is applied: for a given measurement, n sets of aerosol properties are retrieved, and the best one according to the forward pass is chosen.

Invertible neural networks
The invertible neural network used here was first introduced by Ardizzone et al. [29] and is described here for the benefit of readability. Input and output of the network are divided randomly into two halves, $x = (x_1, x_2)$ and $y = (y_1, y_2)$. The main components of the INN are so-called affine coupling blocks, in which $s$ is a scaling variable that, together with arctan, damps the exponential function, and NN stands for an arbitrary neural network. In theory, four different neural networks could be used, but this would increase the number of hyperparameters, so in this work all NN are dense and have the same width and depth. Hence, $x_1$, $x_2$, $y_1$ and $y_2$ need to have the same dimension. This is guaranteed by padding both input and output with zeros or low-amplitude noise, $x_{\mathrm{pad}} \in \mathbb{R}^{d_{P_x}}$ and $y_{\mathrm{pad}} \in \mathbb{R}^{d_{P_y}}$, where $d_{P_x}$ and $d_{P_y}$ are the corresponding dimensions, which can also be zero. With this choice, the invertibility of the INN is also assured. The INN is then given by a concatenation of affine coupling blocks and permutation layers. The permutation layers guarantee that the splitting of input and output into two parts is not always the same. To capture the information about $x$ that is not contained in the measurements $y \in \mathbb{R}^{2M}$, a latent output variable $z \in \mathbb{R}^{d_Z}$, with dimension $d_Z$, is introduced. The dimension $M$ denotes the number of measured angles, so it varies between 2 and 178. During the inverse pass, these latent variables are sampled from a standard normal distribution. The forward pass of the INN, $F$, is then given by

$$F(x, x_{\mathrm{pad}}) = (y, z, y_{\mathrm{pad}}),$$

and the inverse pass, $F^{-1}$, is

$$F^{-1}(y, z, y_{\mathrm{pad}}) = (x, x_{\mathrm{pad}}),$$

and it holds that $F(F^{-1}(y, z, y_{\mathrm{pad}})) = (y, z, y_{\mathrm{pad}})$ and $F^{-1}(F(x, x_{\mathrm{pad}})) = (x, x_{\mathrm{pad}})$.

To train the neural network, a loss function needs to be defined. In this case, it is a composition of several loss functions. The loss $L_x = \| p_x(\hat{x}) - p_x(x) \|^2$ ensures that the sampled aerosol property distributions, $p_x(\hat{x})$, match the distributions of the aerosol properties in the data set, $p_x(x)$. Likewise, the loss $L_z = \| p_z(z \mid \hat{y}) - p_z(z) \|^2$ ensures that the latent variable is sampled from the desired normal distribution. The losses $L_y$ and $L_r$ make sure that the forward and inverse predictions, respectively, mimic the data. The Gaussian noise $\varepsilon \sim \mathcal{N}(0, \sigma_r)^{2M + d_Z + d_{P_y}}$ introduced in $L_r$ is intended to make the inverse prediction robust. Lastly, the loss $L_p = \| x_{\mathrm{pad}} \|^2 + \| y_{\mathrm{pad}} \|^2$ ensures that the amplitude of the noise fed into the model through the padding dimensions is low, so that no information is encoded there. Further details on INNs can be found in the work of Ardizzone et al. [29].
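The invertibility of an affine coupling block can be illustrated with a minimal sketch. It assumes the general form used by Ardizzone et al. [29], in which one half of the input is scaled by exp(s · arctan(NN(·))) and shifted by a second sub-network output; the exact block used in this study may differ. The sub-networks `_scale` and `_shift` are hypothetical toy functions standing in for the dense networks, and the value of `CLAMP` (the scaling variable s) is an assumption.

```python
import math

CLAMP = 2.0  # assumed value of the scaling variable s


def _scale(v):
    # Toy stand-in for a dense sub-network NN that outputs log-scales.
    return [math.tanh(0.7 * a - 0.2) for a in v]


def _shift(v):
    # Toy stand-in for a dense sub-network NN that outputs shifts.
    return [0.5 * a + 0.1 for a in v]


def coupling_forward(x1, x2):
    # First half is transformed conditioned on x2, second half on the new y1.
    y1 = [a * math.exp(CLAMP * math.atan(s)) + t
          for a, s, t in zip(x1, _scale(x2), _shift(x2))]
    y2 = [b * math.exp(CLAMP * math.atan(s)) + t
          for b, s, t in zip(x2, _scale(y1), _shift(y1))]
    return y1, y2


def coupling_inverse(y1, y2):
    # Undo the two affine maps in reverse order; the sub-networks themselves
    # are only ever evaluated in their forward direction.
    x2 = [(b - t) * math.exp(-CLAMP * math.atan(s))
          for b, s, t in zip(y2, _scale(y1), _shift(y1))]
    x1 = [(a - t) * math.exp(-CLAMP * math.atan(s))
          for a, s, t in zip(y1, _scale(x2), _shift(x2))]
    return x1, x2


x1, x2 = [0.3, -1.2], [0.8, 2.0]
y1, y2 = coupling_forward(x1, x2)
r1, r2 = coupling_inverse(y1, y2)
roundtrip_error = max(abs(a - b) for a, b in zip(x1 + x2, r1 + r2))
```

Because the inverse only needs forward evaluations of the sub-networks, these can be arbitrary (non-invertible) neural networks, which is what makes the whole block cheap to run in both directions.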
The model is trained with respect to the forward pass, meaning that the input consists of a set of aerosol properties and the output, or prediction, consists of the phase and polarized phase functions. Due to the structure of the INN, the inverse pass comes with no additional effort: the network does not need to be trained a second time with the (polarized) phase functions as input and the aerosol properties as output, although the loss function does contain both forward and inverse prediction errors. As discussed in Section 2.1, we are faced with the challenge that the inverse direction is not unique, such that several sets of aerosol properties can lead to similar phase and polarized phase functions. The INN architecture addresses this problem on the one hand through the special loss function and on the other hand through the introduction of the latent space. Therefore, for the retrieval of aerosol properties, we apply a best-of-n strategy, meaning that for a given measurement of angularly resolved scattering data, n sets of aerosol properties are predicted. The variation needed to obtain n different predictions is introduced via the latent space. All n sets are then handed back to the forward pass, and the set of aerosol properties whose associated phase functions are closest to the given data is chosen as the predicted aerosol properties.
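The best-of-n strategy can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: `inverse_pass` and `forward_pass` are hypothetical callables standing in for the two directions of a trained INN, the toy functions below are placeholders, and the sum of squared differences is an assumed choice for measuring closeness to the data.

```python
import math
import random


def best_of_n(y_meas, inverse_pass, forward_pass, n=100, dim_z=2, seed=0):
    """Return the candidate whose forward prediction is closest to y_meas."""
    rng = random.Random(seed)
    best_x, best_err = None, math.inf
    for _ in range(n):
        # Latent variables are drawn from a standard normal distribution.
        z = [rng.gauss(0.0, 1.0) for _ in range(dim_z)]
        x_cand = inverse_pass(y_meas, z)
        y_pred = forward_pass(x_cand)
        # Assumed closeness measure: sum of squared differences.
        err = sum((a - b) ** 2 for a, b in zip(y_pred, y_meas))
        if err < best_err:
            best_x, best_err = x_cand, err
    return best_x, best_err


def toy_forward(x):
    # Placeholder forward model: a single "measurement" y = x0 + x1.
    return [x[0] + x[1]]


def toy_inverse(y, z):
    # Placeholder, deliberately imperfect inverse: z perturbs the candidates.
    return [0.5 * y[0] + 0.1 * z[0], 0.5 * y[0] + 0.1 * z[1]]


x_best, err_best = best_of_n([2.0], toy_inverse, toy_forward, n=50)
```

With a fixed seed, increasing n can only reduce (never increase) the selected error, at the price of n inverse and n forward evaluations per measurement.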

Synthetic data sets
The GRASP-OPEN forward model [12] is used to generate the synthetic data to train the INN models. In this work, two types of data, corresponding to two different applications of in situ multi-angle light scattering measurements, are considered. These applications are chosen because they mimic the type of measurements that will be performed with the PSI polar nephelometer.
1. Simulation of aerosol measurements in a laboratory: specifically, measurements of spherical, monodisperse, pure-component aerosols. Such aerosols can be routinely generated in the laboratory using common aerosol generation techniques (e.g. nebulization of aqueous solutions of known composition) combined with size classification (e.g. by particle electrical mobility or by particle aerodynamic diameter).
2. Simulation of aerosol measurements in the field. This represents the situation of taking the instrument into the field to measure multicomponent, ambient aerosols. In this case bi-lognormal modes are chosen to represent atmospheric aerosol size distributions over the diameter range from ∼ 50 nm up to ∼ 10 µm, with different refractive indices in each mode and allowance for non-spherical particles in the upper-most mode (defined as the coarse mode; see further discussion below in Section 3.3).

Measurement space variables y: virtual polar nephelometer instrument configurations
The simulated measurements are chosen such that they match the measurement output of the PSI polar nephelometer. The instrument is designed to measure both scattering phase functions, $P_{11}$, and polarized phase functions, $-P_{12}/P_{11}$, at three different wavelengths λ (450, 532, and 630 nm), over the polar scattering angle range from 0° to 179°. Both $P_{11}$ and $-P_{12}/P_{11}$ will be measured at an angular resolution of approximately 1°.
There are numerous technical challenges associated with polar nephelometry that can result in the loss of measurement information in real experimental setups relative to the ideal measurement configuration. We consider the following real-world artefacts in our simulations:
1. Scattered light truncation. Due to physical design limitations, nephelometers cannot perform scattered light measurements at extreme forward and backward angles, beyond the so-called truncation angles (e.g. [31]). We consider forward and backward truncation angles of 0°-5° and 175°-179°. In addition, we also consider the situation where measurements cannot be performed over the range from 85° to 95°, since these angles are associated with higher measurement uncertainties in the PSI polar nephelometer.
2. Loss of information on polarization dependence (i.e., only $P_{11}$ is measured).
3. Loss of spectral information (i.e., measurements only performed at 1 or 2 wavelengths).
To investigate the impact of these practical artefacts on INN model performance we study 22 different cases, as outlined in Table A.6. First, we simulate the ideal case of having available $P_{11}$ and $-P_{12}/P_{11}$ at all angles from 0° to 179° and at three wavelengths. (Note that since $-P_{12}/P_{11}$ is always zero at the angle 0°, this point is ignored for training, validation and testing even in this ideal case.) Then, 21 additional cases are examined that only take into account angles from 5° to 85° and 95° to 175°, to better represent actual measurements. These 21 cases correspond to different polarization and spectral combinations: i.e., the different combinations of either $P_{11}$ and/or $-P_{12}/P_{11}$, measured at one, two or three wavelengths.
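The 21 truncated cases can be enumerated with a short sketch. It is based on one reading that reproduces the stated count: three choices of measured functions ($P_{11}$ only, $-P_{12}/P_{11}$ only, or both) combined with every non-empty subset of the three wavelengths. Whether this matches the exact ordering in Table A.6 is not confirmed here, and keeping both end points of the retained angle ranges is an assumption.

```python
from itertools import combinations

WAVELENGTHS_NM = (450, 532, 630)
FUNCTION_SETS = (("P11",), ("-P12/P11",), ("P11", "-P12/P11"))


def kept_angles():
    # Assumption: the end points 5, 85, 95 and 175 degrees are retained.
    return [a for a in range(0, 180) if 5 <= a <= 85 or 95 <= a <= 175]


def truncated_cases():
    # Every non-empty wavelength subset combined with every function set.
    cases = []
    for k in (1, 2, 3):
        for wl in combinations(WAVELENGTHS_NM, k):
            for funcs in FUNCTION_SETS:
                cases.append({"wavelengths_nm": wl, "functions": funcs})
    return cases


cases = truncated_cases()
n_angles = len(kept_angles())
```

Together with the single ideal configuration (all angles, both functions, three wavelengths), this gives the 22 cases studied.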
To have a baseline for the quality of the models, results are compared to the measurement device errors. These errors are not yet fully characterized for the PSI polar nephelometer. Therefore, we choose values that slightly overestimate the errors reported for instruments similar to the PSI polar nephelometer [15]. Specifically, we consider the relative error in $P_{11}$ to be 5% and the absolute error in $-P_{12}/P_{11}$ to be 0.1.

State space variables x: describing spherical, monodisperse aerosols measured in the laboratory
In the first data set we assume the aerosol particles are homogeneous spheres for three reasons: i) Mie theory is applicable, ii) spherical particles can be generated easily in the laboratory (e.g. polystyrene latex spheres, organic aerosols) and iii) it is a common assumption for retrieving fine mode properties from polarimetric data.
As described in Section 2.1, the phase and polarized phase functions depend on the aerosol size and shape distributions and on the complex refractive index. The volume size distributions of monomodal aerosols, $V(r)$ (where $r$ represents the aerosol particle radius), can be represented by lognormal functions, which can be described by three parameters: the total volume concentration, $V_{\mathrm{tot}}$, the mean radius, $R_{\mathrm{mean}}$, and the geometric standard deviation, $\sigma_g$. The distribution (equation 1) is written as a sum over $N$ lognormal modes, where $N = 1$ for the monomodal case and $N = 2$ for the bimodal case (described in Section 3.3).
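As an illustration, a multi-mode lognormal volume distribution can be evaluated as below. This sketch assumes the standard form dV/d ln r = Σ_i V_i / (√(2π) ln σ_{g,i}) · exp(−(ln r − ln R_{mean,i})² / (2 ln² σ_{g,i})); the exact convention of equation 1 (for instance dV/dr versus dV/d ln r) should be checked against GRASP-OPEN. The parameter values and units (radii in µm) are hypothetical.

```python
import math


def dV_dlnr(r, modes):
    """Sum of lognormal modes; modes is a list of (V_i, R_mean_i, sigma_g_i)."""
    total = 0.0
    for v, r_mean, sigma_g in modes:
        ln_s = math.log(sigma_g)
        total += (v / (math.sqrt(2.0 * math.pi) * ln_s)
                  * math.exp(-(math.log(r) - math.log(r_mean)) ** 2
                             / (2.0 * ln_s ** 2)))
    return total


def integrate_over_lnr(modes, r_min=1e-4, r_max=1e3, steps=20000):
    # Trapezoidal rule on a uniform grid in ln r.
    lo, hi = math.log(r_min), math.log(r_max)
    h = (hi - lo) / steps
    vals = [dV_dlnr(math.exp(lo + i * h), modes) for i in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


# Hypothetical monomodal example (N = 1): V_tot = 1, R_mean = 0.15, sigma_g = 1.5.
mono = [(1.0, 0.15, 1.5)]
total_volume = integrate_over_lnr(mono)
```

Integrating over ln r recovers the total volume concentration, which is a quick sanity check on the normalization; passing two tuples in `modes` gives the bimodal case.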
The complex refractive index is wavelength-dependent and contains two parts, real and imaginary, i.e., $m_\lambda = n_\lambda + i k_\lambda$.
Typically for atmospheric aerosols, variations in refractive index values are minor, or at least smooth, over the range of visible wavelengths that we consider here. To incorporate this knowledge into our simulated data we use the following relationships, which can be considered as approximately valid for aerosols over the λ range from 450 to 630 nm. For the real part of the refractive index we assume $n_{\lambda_2} = n_{\lambda_1}$ for two different wavelengths $\lambda_1$, $\lambda_2$. For the imaginary part of the refractive index we assume $k_{\lambda_2} = k_{\lambda_1} (\lambda_2/\lambda_1)^{1-\mathrm{AAE}}$. Here AAE is an often reported parameter known as the absorption Angstrom exponent, which is defined through the empirical relationship $b_{\mathrm{abs},\lambda_2}/b_{\mathrm{abs},\lambda_1} \approx (\lambda_2/\lambda_1)^{-\mathrm{AAE}}$ (with the relationship to $k$ then given by $b_{\mathrm{abs},\lambda} \propto C_N k/\lambda$, which is valid for fixed particle size and $n$, and where $C_N$ represents the particle number density [32,33]). The parameter $b_{\mathrm{abs},\lambda}$ is known as the aerosol light absorption coefficient, and it is easily measurable. Therefore, AAE values are reported widely in the aerosol literature. For black carbon aerosols (i.e., highly absorbing carbonaceous aerosols) with small particle size, AAE ≈ 1 over the visible range, which is a consequence of $k$ being approximately wavelength-independent across visible wavelengths for such aerosols. For brown carbon (i.e. moderately absorbing carbonaceous aerosols), AAE > 1, which is a consequence of $k$ decreasing with increasing λ across visible wavelengths. Given these relationships, simulating measurements for the 3-λ version of the instrument requires only one more state space variable (the AAE) than is required for the 1-λ simulations. In summary, the state space consists of at most 4 + l variables, where l is the number of wavelengths. The variables, together with their lower and upper bounds, are listed in Table 1.
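The spectral scaling of the imaginary part, k(λ2) = k(λ1)(λ2/λ1)^(1−AAE), can be checked with a few lines of code. The function name and the numerical values of k and AAE below are hypothetical and chosen only for illustration.

```python
def k_at_wavelength(k_ref, lambda_ref_nm, lambda_nm, aae):
    """Scale the imaginary refractive index from lambda_ref_nm to lambda_nm."""
    return k_ref * (lambda_nm / lambda_ref_nm) ** (1.0 - aae)


# Black-carbon-like case: AAE = 1 leaves k unchanged across wavelengths.
k_bc = k_at_wavelength(0.5, 450.0, 630.0, aae=1.0)

# Brown-carbon-like case: AAE > 1 makes k decrease with increasing wavelength.
k_brc = k_at_wavelength(0.05, 450.0, 630.0, aae=3.0)
```

This is why a single extra state variable (the AAE) is enough to describe k at all three instrument wavelengths once k is known at one of them.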

State space variables x describing atmospheric aerosols measured in the field
The second data set represents the application of a polar nephelometer to measure ambient atmospheric aerosols in a field setting. This is generally a more complicated and less-controlled use case for the instrument than laboratory use. Atmospheric aerosols are typically complex mixtures of particles of different sizes (covering the diameter range from a few nm to tens of µm), chemical components (i.e., with different complex RIs), and shapes. Typical simplifications used to represent this complexity are:
• Bi-lognormal distributions are used to represent volume size distributions over a broad range of diameters, from 50 nm to 10 µm. The smaller size mode peaks at diameters less than 1 µm and is commonly called the 'fine' aerosol mode. The larger size mode peaks at diameters greater than 1 µm and is commonly called the 'coarse' aerosol mode. In this case, the full size distribution is represented by 6 parameters (3 lognormal parameters for the fine mode, 3 for the coarse mode): the total volume concentration, $V_{\mathrm{tot}}$, the fine mode fraction, $\chi$, the fine and coarse mode mean radii, $R_{\mathrm{mean,fine}}$ and $R_{\mathrm{mean,coarse}}$, and the geometric standard deviations, $\sigma_{g,\mathrm{fine}}$ and $\sigma_{g,\mathrm{coarse}}$.
In the present work it is assumed that the coarse and fine mode mean radii do not overlap. The fine and coarse mode volume concentrations, $V_{\mathrm{fine}}$ and $V_{\mathrm{coarse}}$, are defined with respect to the total volume concentration $V_{\mathrm{tot}}$ and the fine mode fraction $\chi$: $V_{\mathrm{fine}} = \chi V_{\mathrm{tot}}$ and $V_{\mathrm{coarse}} = (1 - \chi) V_{\mathrm{tot}}$. This is what we use in our case; see equation 1 with $N = 2$, where $i = 1$ denotes the coarse and $i = 2$ the fine mode.
It should be stressed that the bi-lognormal representation is still a simplified representation of true atmospheric aerosol size distributions. Greater complexity can be considered by representing size distributions with concentration values in discrete size bins (i.e., a sectional representation). Typically, 10-22 size bins may be considered. Therefore, sectional representations involve a greater number of aerosol state parameters than modal representations, although smoothness constraints can be applied to limit this number.
• In some studies, like Espinosa et al. [34], a single, effective complex refractive index (still λ-dependent) is used to describe the optical properties of the complex atmospheric aerosol as a whole. However, GRASP-OPEN has the ability to simulate the more realistic but still simplified situation where two effective refractive indices are used to describe the aerosol: one for fine mode particles and one for coarse mode particles. We apply this approach here. Also in this bi-modal case, the refractive index values at different wavelengths are strongly related and should not vary independently of each other. Thus, we again consider the following relationships: $n(\lambda_2) = n(\lambda_1)$ and $k(\lambda_2) = k(\lambda_1)(\lambda_2/\lambda_1)^{1-\mathrm{AAE}}$.
• Particle non-sphericity is either accounted for with a single parameter (e.g. the fraction of spherical particles in the ensemble, which we denote with the symbol φ), or with size-resolved spherical particle fractions (e.g. in up to 22 discrete size bins) [35,12]. We use a single parameter φ for our data set. Note that the non-spherical particle fraction is only considered in the coarse mode. All particles in the fine mode are assumed to be spherical.
To obtain parameter ranges for simplified bi-modal atmospheric aerosol models we use the comprehensive, airborne polar-nephelometer measurements of Espinosa et al. [34] performed over the USA. The combined data set can be considered as reasonably representative of the different types of aerosols encountered over continents. These authors used GRASP-OPEN to retrieve the aerosol parameters (bi-modal size distribution, spherical fraction, complex refractive index) corresponding to the measurements, categorized according to aerosol types (e.g. urban, biogenic, biomass burning, dust-containing aerosols). We use the minimum and maximum values obtained across the different aerosol types for the bi-modal size distribution and spherical fraction parameters. However, we use broader ranges for the complex refractive index parameters, since only a single effective refractive index was retrieved by these authors, whereas we choose to simulate the more atmospherically relevant case where the fine and coarse mode aerosols each have their own effective refractive index.
Considering the above, the state space for this type of modal data set consists of a maximum of 11 + 2j variables, where j = 0 if only one wavelength is used and j = 1 otherwise. A list of all variables with their lower and upper bounds is given in Table 2.

Summary of simulated data sets
The data sets were created with GRASP-OPEN using the Latin hypercube sampling method based on the parameter spaces stated in Tables 1 and 2. Both data sets included 100 000 samples. The numbers of measurement and state space variables are summarized in Table 3.

[Table 3 caption: … for the simulations with aerosols in a bimodal representation. The numbers in the Data set ID denote whether all data are at hand (1) or some data are missing (2). The numbers of measurement space variables include i = 1, 2 for either $P_{11}$ or $-P_{12}/P_{11}$ or both, and j = 1 for one and j = 2 for two or three wavelengths. The numbers for the state space variables depend on the number of wavelengths.]
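For readers unfamiliar with Latin hypercube sampling, the idea can be sketched with the standard library alone. This is a generic illustration, not the sampling code used to build the data sets, and the parameter bounds below are hypothetical rather than the values in Tables 1 and 2.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that each dimension has one point per stratum."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        # Split [lo, hi) into n_samples equal strata and shuffle their order
        # independently for every dimension.
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append([lo + (s + rng.random()) / n_samples * (hi - lo)
                        for s in strata])
    # Transpose columns into rows: one row per sampled state vector.
    return [list(row) for row in zip(*columns)]


# Hypothetical bounds for three state variables (e.g. n, k, sigma_g).
bounds = [(1.33, 1.60), (0.0, 0.1), (1.2, 2.0)]
samples = latin_hypercube(10, bounds)
```

Compared with plain uniform random sampling, every one-dimensional projection of the sample covers its parameter range evenly, which is useful when each sample requires an expensive forward simulation.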

Implementation and Data Preprocessing
The models are implemented in Python using the TensorFlow framework and Keras, together with the Ray Tune library for the hyperparameter scan. According to the INN architecture, the following hyperparameters need to be chosen: number, depth and width of affine coupling blocks, batch size, learning rate, type of activation function, number of epochs, and the weights and noise in the loss function. The details are described in Section 4.2. All computations were done on the Merlin 6 cluster at the Paul Scherrer Institut using one core. Each cluster node provides 384 GB of memory and contains two Intel Xeon Gold 6152 processors, with 22 cores and 2 threads per physical core.
The preprocessing is chosen with the aim of achieving a broad spread of the simulated data across all angles. The scikit-learn [36] MinMaxScaler preprocessing function turned out to give good results for the state space variables x. The measurement space is divided into the two parts $P_{11}$ and $-P_{12}/P_{11}$: for $P_{11}$ the logarithm is applied first, and then the scikit-learn StandardScaler preprocessing function is applied to both parts. As can be seen in Figure 2, this preprocessing (second row, second and third column) spreads the data per angle.
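The preprocessing steps can be sketched without external dependencies. The two helper functions below reproduce, for a single feature column, what scikit-learn's MinMaxScaler (default range 0 to 1) and StandardScaler (zero mean, unit population variance) compute; they are illustrative stand-ins, and the sample values are hypothetical.

```python
import math


def min_max_scale(column):
    # Map a column linearly onto [0, 1], as MinMaxScaler does per feature.
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def standardize(column):
    # Subtract the mean and divide by the population standard deviation,
    # as StandardScaler does per feature.
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]


# Hypothetical values: one state variable and one scattering angle,
# each across four samples.
r_mean = [0.05, 0.10, 0.20, 0.40]
p11_at_angle = [120.0, 15.0, 2.5, 0.4]
pol_at_angle = [-0.10, 0.05, 0.30, 0.55]

x_scaled = min_max_scale(r_mean)
# P11 spans orders of magnitude, so the logarithm is taken before scaling.
p11_scaled = standardize([math.log(v) for v in p11_at_angle])
pol_scaled = standardize(pol_at_angle)
```

Scaling each angle separately is what spreads the data per angle; the same fitted scaling parameters then have to be reused for validation and test data.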

Hyperparameters
An initial random hyperparameter scan was performed for the first model trained on simulated data, i.e., the model corresponding to the ideal measurement configuration with three wavelengths and all angles. Taking the hyperparameter values from the best model, all but 3 hyperparameters were fixed and used for all further trained models, which also makes the models more comparable with each other. For all models, 3 affine coupling blocks with a depth of 2 and a width of 92, together with a batch size of 8 and a learning rate of $9 \times 10^{-5}$, were used. The activation functions were chosen to be rectified linear units for the hidden layers and linear for the last layer. The number of epochs was kept at 50. For the loss function, the artificial weight was kept small, $w_p = 0.0005$, and the weight $w_y$ was set to 350. The noise used in the loss $L_r$ was set to 0.1. It turned out that the remaining weights have a non-negligible influence on the performance of the models; hence, a further hyperparameter scan based on grid search was performed for all models. For each weight, three values (the values from the three best previous hyperparameter scans) were chosen: $w_x \in \{138, 142, 146\}$, $w_z \in \{291, 330, 339\}$ and $w_r \in \{258, 308, 323\}$. In summary, for each of the 22 models of the case study, 27 different sets of hyperparameters were used to train the corresponding models. Among those models, the best one according to the $R^2$ value of the inverse direction, evaluated with the validation data set, was selected as the final model for each case.
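The grid search over the three remaining loss weights can be sketched as below. The weight values are those listed above; `train_and_score` is a hypothetical callable that would train one model and return its inverse-direction validation score, and `dummy_score` is only a placeholder so that the sketch runs without any training.

```python
from itertools import product

W_X = (138, 142, 146)
W_Z = (291, 330, 339)
W_R = (258, 308, 323)


def select_best(train_and_score):
    """Evaluate every weight combination and keep the highest-scoring one."""
    best_cfg, best_score = None, float("-inf")
    for w_x, w_z, w_r in product(W_X, W_Z, W_R):
        cfg = {"w_x": w_x, "w_z": w_z, "w_r": w_r}
        score = train_and_score(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


grid = list(product(W_X, W_Z, W_R))


def dummy_score(cfg):
    # Placeholder score, NOT a real training run.
    return (-abs(cfg["w_x"] - 142) - abs(cfg["w_z"] - 330)
            - abs(cfg["w_r"] - 308))


best_cfg, best_score = select_best(dummy_score)
```

Three values for each of three weights give 3 × 3 × 3 = 27 configurations per case, which matches the number of hyperparameter sets stated above.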

Results
In the following, the results for different cases are presented. Among all models in the hyperparameter scans, the best model was chosen according to the highest coefficient of determination $R^2$, scored on the validation data set for the inverse process at the last epoch of training. Note that this model need not have the highest coefficient of determination for the forward pass, but the main target of the research is the inverse model, namely the retrieval of aerosol properties. To get an idea of the quality of the trained models, the models are further used to make predictions for the previously unseen test data set. For this purpose, the relative error, the absolute error and the weighted mean absolute percentage error (wMAPE) are valuable error metrics. Since the absolute error is not suited for comparing results across parameters, the wMAPE is used for that. The wMAPE has the additional advantage over the relative error that it does not diverge when the actual value is zero or very close to zero. The formulae for computing these error metrics are given in Appendix A.2.

Monomodal case: $P_{11}$ and $-P_{12}/P_{11}$ at all three wavelengths and all angles
The first case considered is the monomodal case with all available training and validation data used, i.e. three wavelengths, $P_{11}$ and $-P_{12}/P_{11}$, and all angles. The $R^2$ value for the validation data set for the inverse process at the last epoch of training is $R^2 = 0.993$. The $R^2$ value for the forward pass scored on the validation data set is $R^2 = 0.9978$. So, for both directions, the values are very close to the theoretical maximum value of 1. In Figure 3 the history of the mean absolute error (MAE) for the forward (left) and inverse (right) model over all epochs is depicted. The curves clearly decrease, which means that the models in the forward and inverse directions improve as more epochs are used. In addition, the blue (training data) and red (validation data) curves overlap up to epoch 50, so there is no sign of overfitting visible in this plot.
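The two scores used above can be sketched as follows. The coefficient of determination follows its usual definition; for the wMAPE, one common definition (sum of absolute errors divided by the sum of absolute true values) is assumed here, and the formula in Appendix A.2 should be taken as authoritative. The sample values are hypothetical.

```python
def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def wmape(y_true, y_pred):
    # Assumed definition: total absolute error relative to total absolute
    # true value, in percent. A single true value of zero does not cause
    # a division by zero, unlike the pointwise relative error.
    abs_err = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    return 100.0 * abs_err / sum(abs(t) for t in y_true)


y_true = [0.0, 1.0, 2.0, 3.0]
y_pred = [0.1, 0.9, 2.1, 2.9]
r2 = r2_score(y_true, y_pred)
wm = wmape(y_true, y_pred)
```

The example deliberately contains a true value of exactly zero to show that the wMAPE stays finite where a pointwise relative error would not.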
This best model is then tested on previously unseen data (i.e., the test data set consisting of 20 000 data points). For the forward model this results in a maximum relative error for $\log(P_{11})$ of 1.5% and a maximum absolute error of 0.06 for $-P_{12}/P_{11}$, both at a confidence level of 95%. The mean wMAPEs are wMAPE($P_{11}$) = 4.29% and wMAPE($-P_{12}/P_{11}$) = 4.20%. These errors compare favourably with the errors expected from a real measurement device, which we take as 5% relative error for $P_{11}$ and 0.1 absolute error for $-P_{12}/P_{11}$, as explained in Section 3.1. For the inverse model, the results of the aerosol property retrieval are summarized in Table 4, which lists the mean absolute error, the maximal absolute error at a confidence level of 95%, and the wMAPE. The wMAPE stays below 1.5% for all aerosol properties. According to the wMAPEs, the best-predicted aerosol properties are the real part of the refractive index and the geometric standard deviation, whereas the worst-predicted are the imaginary parts of the refractive indices. Table 4 also states the wMAPE for noisy test data, where 5% Gaussian noise was added to the test data. It stays below 2.2% for all aerosol properties, which shows that the proposed retrieval method is robust against random measurement noise.

Figure 4 shows a qualitative comparison between the test data and the forward predictions at the different wavelengths, as well as the inverse predictions of the particle size distribution, for six randomly chosen data samples. The forward predictions (shown in color) follow the test data well but look slightly noisy. This is a consequence of the prediction method: each data point of the simulated data set is treated as a single point and not as part of a mathematical function, which would make the curves look smoother. The colors are consistent across all panels of the figure and refer to the aerosol property values stated in the table in its lower right corner. In addition, correlation plots of the predicted versus the test data are shown for the whole test data set for the (polarized) phase function at the different wavelengths. In general, the fit is good, since all grey dots lie close to the black line, which marks a perfect fit between test data and prediction.
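The error measures quoted above can be illustrated with a short sketch. It assumes the common definition of the wMAPE (sum of absolute errors divided by the sum of absolute target values) and interprets the "maximal absolute error at a confidence level of 95%" as the 95th percentile of the absolute errors; both are assumptions made for illustration, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def wmape(y_true, y_pred):
    """Weighted MAPE in percent (assumed definition):
    sum of absolute errors divided by sum of absolute targets."""
    return float(100.0 * np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true)))

def abs_error_at_level(y_true, y_pred, level=0.95):
    """Absolute error that is not exceeded by a fraction `level` of the samples
    (assumed reading of 'maximal absolute error at a 95% confidence level')."""
    return float(np.quantile(np.abs(y_true - y_pred), level))

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy numbers, not taken from the study.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

print("MAE:", mae(y_true, y_pred))
print("wMAPE [%]:", wmape(y_true, y_pred))
print("95% abs. error:", abs_error_at_level(y_true, y_pred))
print("R^2:", r2(y_true, y_pred))
```

Because the wMAPE normalizes by the summed magnitude of the targets, it stays well defined for quantities that pass through zero at individual angles, which a plain per-sample percentage error would not.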
Concerning the computation time, one forward prediction with the INN model takes approximately 0.46 ms, compared to around 4 s with GRASP-OPEN for these simulations. One inverse prediction with the INN model takes around 2.64 ms, compared to approximately 6 s for the GRASP-OPEN inversion. Hence, for the monomodal case with all possible measurement data available, the proposed invertible neural network is both fast and accurate.
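As a rough illustration of what these run times imply, the following sketch computes the speed-up factors and the time needed to process the 20 000 test samples one after another. The run times are the approximate values quoted above; sequential processing is an assumption made for this example, and the variable names are hypothetical.

```python
# Approximate per-prediction run times quoted in the text, in seconds.
inn_forward_s = 0.46e-3
inn_inverse_s = 2.64e-3
grasp_forward_s = 4.0
grasp_inverse_s = 6.0

# Speed-up of the INN relative to GRASP-OPEN in each direction.
forward_speedup = grasp_forward_s / inn_forward_s
inverse_speedup = grasp_inverse_s / inn_inverse_s

# Assumption: samples are processed sequentially, one prediction at a time.
n_samples = 20_000
inn_total_inverse_s = n_samples * inn_inverse_s
grasp_total_inverse_h = n_samples * grasp_inverse_s / 3600.0

print(f"forward speed-up: ~{forward_speedup:.0f}x")
print(f"inverse speed-up: ~{inverse_speedup:.0f}x")
print(f"INN inversion of {n_samples} samples: ~{inn_total_inverse_s:.0f} s")
print(f"GRASP-OPEN inversion of {n_samples} samples: ~{grasp_total_inverse_h:.0f} h")
```

With the quoted values, the forward and inverse speed-ups are roughly 8700 and 2300, respectively. Since the run times are themselves approximate, only the order of magnitude is meaningful.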

5.2. Monomodal case with imperfect measurement configurations: results with missing angles, wavelengths and polarimetric information
In this section, the results of the case studies inspired by real measurements are presented for the monomodal aerosol data set. As stated in Section 3.1, the polar nephelometer usually cannot measure all angles; hence, in these case studies, all neural networks are trained assuming that the angles 0°-5°, 85°-95° and 175°-179° are missing from the data set. Furthermore, it may not be possible to measure both the phase functions and the polarized phase functions at all 3 wavelengths. Hence, we compare networks trained with $P_{11}$ and/or $-P_{12}/P_{11}$ at 1, 2 or 3 wavelengths, to assess the performance of the neural network under different measurement data conditions (see Table A.6).
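The truncated measurement configurations can be sketched as follows. The 1° angular grid from 0° to 179° and the wavelengths 450, 532 and 630 nm are assumptions made for this example, and the enumeration of cases is only one possibility that is consistent with the 22 models mentioned in the figure captions; the authoritative list of cases is the one in Table A.6.

```python
import itertools
import numpy as np

# Assumption: scattering angles sampled at 1 degree resolution from 0 to 179 degrees.
angles = np.arange(0, 180)

# Angular ranges stated in the text as missing from the measurements (inclusive).
missing_ranges = [(0, 5), (85, 95), (175, 179)]
keep = np.ones(angles.shape, dtype=bool)
for lo, hi in missing_ranges:
    keep &= ~((angles >= lo) & (angles <= hi))
measured_angles = angles[keep]

# Assumption: the three wavelengths are 450, 532 and 630 nm.
wavelengths = (450, 532, 630)
wavelength_subsets = [
    subset
    for r in (1, 2, 3)
    for subset in itertools.combinations(wavelengths, r)
]

# Available signal types: phase function, polarized phase function, or both.
signal_sets = [("P11",), ("-P12/P11",), ("P11", "-P12/P11")]

# All truncated-angle cases, plus one reference case without angle truncation.
cases = [
    {"signals": s, "wavelengths": w, "truncated": True}
    for s in signal_sets
    for w in wavelength_subsets
]
cases.append({"signals": ("P11", "-P12/P11"), "wavelengths": wavelengths, "truncated": False})

print(len(measured_angles), "of", len(angles), "angles remain")
print(len(cases), "measurement configurations")
```

Under these assumptions, 158 of the 180 angles remain after truncation, and the enumeration yields 21 truncated configurations plus the full-data reference case.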
In the forward direction, the performance of all models is similar: the $R^2$ values for the test data are at least 0.997, as can be seen in Figure A.9 in Appendix A.4. This is also visible in the box plots of the forward model errors shown in Figure 5. The x-axis shows the different wavelength combinations and the y-axis the relative errors for $P_{11}$ and the absolute errors for $-P_{12}/P_{11}$. The colors denote whether the INN was trained to predict only truncated $P_{11}$ (blue), only truncated $-P_{12}/P_{11}$ (red), both truncated phase functions (green) or both phase functions without angle truncation (brown). The errors are lower when only a single type of function is used (red and blue) than when all available functions are used. This is most likely because the number of data points to be predicted doubles when both functions are available. Hence, there are more opportunities for prediction errors, and more training data or an adaptation of the INN layers would probably be necessary to reach the same performance in all cases.
The solid brown lines indicate the median errors of the full data case, and one sees that all models behave similarly. To place the errors in context, the figure also displays target error limits (solid black lines), which are defined as the corresponding measurement errors expected in $P_{11}$ and $-P_{12}/P_{11}$ (i.e., 5% and 0.1, respectively). For all wavelength combinations the INN forward model errors are within the measurement error limits, which indicates that the errors associated with the INN models are minor.
Figure 6 shows boxplots of the absolute errors of the inverse prediction, i.e. of the retrieved aerosol properties, for comparison among the different cases. Each subplot displays the absolute errors for a different aerosol quantity. It can be seen that the volume concentration cannot be predicted at all if only the normalized polarized phase functions (red) are available. This is expected, since absolute light intensity information (which is proportional to the aerosol volume concentration) is lost in the ratio $-P_{12}/P_{11}$. The retrieval of the radius $R_{\mathrm{mean}}$, the real part of the refractive index $n$ and the geometric standard deviation $\sigma_g$ is also worse if only $-P_{12}/P_{11}$ is at hand, whereas for the imaginary part of the refractive index both functions appear equally suited for prediction. In general, the best results for all aerosol properties are obtained if both functions are accessible. Concerning the wavelengths, the absolute error is largest if only one wavelength is used, and the retrieval works best if all three wavelengths can be utilized. For the three-wavelength case with $P_{11}$ and $-P_{12}/P_{11}$, the network performs only slightly better if all angles can be measured. In general, enlarging the measurement space is an advantage for the inverse model but not for the forward model. In summary, except for the case with only $-P_{12}/P_{11}$, all cases give results similar to the full data case and can hence also be used in practice for retrieving aerosol properties in applications involving a monomodal homogeneous aerosol.
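The loss of concentration information in the ratio can be made explicit with a toy example. It assumes that both measured signals scale linearly with the aerosol volume concentration; the angular signal values are random placeholders and do not come from a light scattering model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical angular signals for unit volume concentration (placeholder values).
base_p11 = rng.uniform(0.5, 2.0, size=10)   # intensity-like, strictly positive
base_p12 = rng.uniform(-0.5, 0.5, size=10)

def measured_signals(volume_concentration):
    """Assumption: both signals scale linearly with the volume concentration."""
    return volume_concentration * base_p11, volume_concentration * base_p12

p11_a, p12_a = measured_signals(1.0)
p11_b, p12_b = measured_signals(25.0)

ratio_a = -p12_a / p11_a
ratio_b = -p12_b / p11_b

# The ratio is identical for both concentrations, so it cannot constrain them ...
print(np.allclose(ratio_a, ratio_b))
# ... whereas the unnormalized signal changes by exactly the concentration factor.
print(np.allclose(p11_b / p11_a, 25.0))
```

Any network that sees only the ratio therefore receives identical inputs for different concentrations, which is consistent with the large volume concentration errors of the red boxes in Figure 6.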
5.3. Bimodal case with ideal measurement configuration: $P_{11}$, $-P_{12}/P_{11}$ at all 3 wavelengths and all angles
In this section, the results for the more complex bimodal aerosol test case (see Table 2) with all available data are presented (i.e., assuming an ideal measurement configuration). The results are similar to those of the simpler monomodal case (see Section 5.1). The coefficient of determination on the validation data set is $R^2 = 0.9975$ in the forward direction and $R^2 = 0.985$ for the inverse pass. Figure 7 shows the mean absolute error of the best model in the inverse and forward directions for the training and validation data over the epochs; in both cases it clearly decreases, meaning that the model improves with the number of epochs. This best trained model is then evaluated on the previously unseen test data set. The maximum relative error for $\log(P_{11})$ is 0.33% and the maximum absolute error for $-P_{12}/P_{11}$ is 0.03 at a confidence level of 95%. The wMAPEs are wMAPE($P_{11}$) = 1.34% and wMAPE($-P_{12}/P_{11}$) = 1.89%. In the bimodal case, too, the prediction errors are lower than the assumed device measurement errors, and they are even lower than in the monomodal case. The mean absolute errors, the maximal absolute errors at a 95% confidence level and the wMAPEs for all aerosol properties are stated in Table 5. According to the wMAPE, the best-retrieved properties are the real parts of the refractive indices, followed by the geometric standard deviations and the radii for both the coarse and the fine mode. Again, the imaginary parts of the refractive indices are predicted worst for all wavelengths and all modes. In general, however, the wMAPE stays below 3.4% for all quantities. This shows, first, that the spectral polarized light scattering phase functions also contain ample information on aerosol absorptive properties and, second, that the INN is capable of retrieving these. However, in practical applications the retrieval of aerosol absorptive properties from phase function measurements will often be hampered by the fact that a simplified representation of the state space is typically inappropriate in the presence of light absorbing black carbon particles, see e.g. [26].

For completeness, the correlation plots of test and predicted data for the forward model, and the qualitative comparison between test and predicted data for the forward model and the bimodal particle size distribution for six randomly chosen test samples, are depicted in Figure A.10 in Appendix A.4. The corresponding values of the particle properties are given in Table A.11 in Appendix A.4. Table 5 also shows the wMAPE when 5% Gaussian noise is added to the test data (wMAPE [%] noisy), which is closer to real measurements than noise-free test data. The wMAPE for all aerosol properties stays below 9%, and the best-predicted properties are still the radii, the geometric standard deviations and the real parts of the refractive indices for both modes, with wMAPE below 1%. This shows that the architecture of the invertible neural network is robust against random measurement errors.
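One way to generate such noisy test data is sketched below. It assumes that "5% Gaussian noise" means multiplicative, zero-mean Gaussian noise with a standard deviation of 5% of each signal value; this interpretation, the function name and the placeholder signal are assumptions made for illustration.

```python
import numpy as np

def add_relative_gaussian_noise(signal, rel_sigma=0.05, rng=None):
    """Perturb each value with zero-mean Gaussian noise whose standard
    deviation is `rel_sigma` times the magnitude of that value
    (assumed interpretation of '5% Gaussian noise')."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    noise = rng.standard_normal(signal.shape)
    return signal * (1.0 + rel_sigma * noise)

# Placeholder signal, not a simulated phase function.
rng = np.random.default_rng(42)
clean = np.full(100_000, 2.0)
noisy = add_relative_gaussian_noise(clean, rel_sigma=0.05, rng=rng)

# The empirical relative scatter should be close to the requested 5%.
print("relative std:", np.std(noisy / clean))
```

The noisy signals are then passed through the same preprocessing and inverse pass as the noise-free test data, and the wMAPE is recomputed against the unchanged reference aerosol properties.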
The computation times for one forward and one inverse pass are similar to the monomodal case, 0.46 ms and 2.66 ms, respectively. Hence, as in the monomodal case, the invertible neural network proves to be an accurate, robust and fast method for retrieving aerosol properties from measurement data in the bimodal case as well.
5.4. Bimodal case with imperfect measurement configurations: results with missing angles, wavelengths and polarimetric information
For the bimodal case, the same case study is performed as for the monomodal case, meaning that networks were trained with different numbers and values of wavelengths, with or without the (polarized) phase functions, and with missing angles (0°-5°, 85°-95°, 175°-179°). For the forward pass, the comparison of the $R^2$ values shows, as expected, that the prediction is similarly good for all models, see Figure A.12 in Appendix A.4. The boxplots of the relative and absolute errors for the forward model are depicted in Figure A.13, also in the Appendix. The errors behave similarly across the different cases as for the monomodal data set. Again, the errors of all models are well below the assumed measurement device errors (solid black lines).
For the inverse pass, the boxplots of the absolute errors in Figure 8 clearly show, as before, that the polarized phase function alone is not suited for predicting the volume concentration, independently of the wavelength, for the reason already explained for the monomodal examples. In general, the absolute error is smallest for all aerosol properties if all 3 wavelengths, $P_{11}$ and $-P_{12}/P_{11}$ are available. The missing angles hardly influence the retrieval. Even if not all wavelengths can be measured, it is important to have both the polarized phase functions and the phase functions available, because their combination decreases the retrieval uncertainty for all aerosol properties. For the prediction of the radius, $R_{\mathrm{mean,fine}}$, and the real part of the refractive index, $n_{\mathrm{fine}}$, $P_{11}$ seems to be better suited, whereas $-P_{12}/P_{11}$ alone gives better results for the spherical fraction and all imaginary parts of the refractive indices. No clear conclusion can be drawn as to whether a particular wavelength, or a combination of two wavelengths, is better for the aerosol retrieval. In this bimodal case study, too, all combinations of available data except $-P_{12}/P_{11}$ alone turned out to be suited for the retrieval of aerosol properties from measurement data.

Conclusion, Discussion and Outlook
In summary, we introduced a novel method for aerosol property retrieval from in situ measurement data using invertible neural networks. The special structure of these networks makes it possible not only to retrieve the aerosol properties, i.e. to solve the inverse problem, but also to simulate the forward model, i.e. to calculate measurement data from aerosol properties. By simulating laboratory and field in situ measurements with GRASP-OPEN and using them for training and testing, we have shown the practical applicability of the proposed method. The accuracy achievable with the forward model is sufficient, since the current measurement device errors exceed the errors introduced by the model. This indicates that, when it comes to application to atmospheric aerosols with complex size distribution and mixing state, the major errors will arise from the need to choose a simplified state space to represent the aerosol (i.e., from the ill-posedness of the inverse problem) rather than from limitations of INN capability. In addition, one forward simulation with the INN takes less than a millisecond, which is much faster than physics-based simulations that typically require seconds per simulation. The performance of the inverse model, i.e. the retrieval of the aerosol properties, also turned out to be satisfactory: for the monomodal and bimodal cases, the weighted mean absolute percentage error stays below 1.2% for all aerosol properties except the imaginary part of the refractive index (for which it stays below 3.4%), and one retrieval is about 1000 times faster than e.g. with GRASP-OPEN. This enables near-real-time aerosol property retrieval from measurements and hence could be a step towards real-time processing of data from new sensors.
To reproduce real measurement conditions, we tested the method with data including 5% Gaussian measurement noise. The results showed that the method is robust against measurement errors: the weighted mean absolute percentage error of the retrieval stays below 3.6% for all aerosol properties except the imaginary part of the refractive index, for which it is less than 9%. In addition, a case study was performed to see how the models deal with missing data. To this end, neural networks were trained and tested for 22 different cases of missing angles, wavelengths and/or (polarized) phase functions. Although the retrieval results are best if all data are available, nearly all models with missing data achieved comparably good results, such that they are useful in practice. A further noteworthy result is that the addition of polarized phase function information offers a distinct performance benefit for the retrieval of particle morphology information (i.e., the fraction of nonspherical particles in the coarse mode).
These results demonstrate that invertible neural networks are a fast, accurate and robust method for retrieving aerosol properties from in situ, multi-angle light scattering measurements. Hence, invertible neural networks seem to be a promising alternative to commonly used pre-computed look-up tables and iterative, physics-based inversion methods.
Finally, we see several options to improve the methods and data sets in future work. If a need for higher accuracy arises, the model architecture itself could be improved by allowing different neural networks in the affine coupling blocks or by performing a more sophisticated hyperparameter scan. If physical knowledge is at hand, it is advisable to incorporate it, e.g. by adding barrier functions to the loss to guarantee that physical quantities stay within reasonable physical ranges, or by adding loss terms that include Mie scattering theory, i.e. by using physics-informed neural networks [37]. Additionally, in further studies we aim to relax some of the simplifying assumptions concerning the data set used here, for example by allowing multiple state parameters describing $n(\lambda)$ in order to capture any spectral dependence, even if such dependence is only minor. Given the good results presented here and the fact that iterative models such as GRASP-OPEN are capable of capturing such spectral dependence, we expect that INN models will also be capable of capturing it. As discussed above, we considered here a relatively simplified bimodal representation of atmospheric aerosol size distributions in which the fine and coarse aerosol modes are well separated, which is consistent with ambient measurements processed with a state-of-the-art classical retrieval scheme (GRASP-OPEN) [34]. In future work, we intend to explore inversion performance also for overlapping size distribution modes, as well as for the case when the size distribution is represented in a sectional manner with a greater number of state parameters.
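The idea of a barrier term in the loss can be sketched with a logarithmic barrier, which grows without bound as a predicted quantity approaches the edge of its admissible range. This is only one possible choice and not a method used in this study; the function names, the weight and the bounds in the example (a real refractive index between 1.3 and 1.7) are hypothetical.

```python
import numpy as np

def log_barrier(values, lower, upper):
    """Logarithmic barrier for the open interval (lower, upper).
    Returns +inf if any value lies on or outside the bounds."""
    values = np.asarray(values, dtype=float)
    if np.any(values <= lower) or np.any(values >= upper):
        return np.inf
    return float(-np.sum(np.log(values - lower) + np.log(upper - values)))

def penalized_loss(data_loss, predicted, lower, upper, weight=1e-3):
    """Data misfit plus a weighted barrier term on the predicted quantities."""
    return data_loss + weight * log_barrier(predicted, lower, upper)

# Hypothetical bounds for a real refractive index, for illustration only.
n_lower, n_upper = 1.3, 1.7
print(penalized_loss(0.2, [1.50], n_lower, n_upper))  # prediction well inside the range
print(penalized_loss(0.2, [1.31], n_lower, n_upper))  # prediction close to the lower bound
print(penalized_loss(0.2, [1.80], n_lower, n_upper))  # infeasible prediction
```

In a gradient-based training loop the same expression would have to be written with the tensor operations of the deep learning framework in use, and the predictions would need to start inside the admissible range, since the barrier is infinite outside it.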

Funding source
Romana Boiger was funded by the PSI Career Return Program. Financial support was also received from MeteoSwiss through a science project in the framework of the Swiss contribution to the Global Atmosphere Watch programme (GAW-CH).

Figure 1: Concept of how to build and use the INN model. To use the invertible neural network model in the inverse direction, a best-of-n strategy is applied: for a given measurement, n sets of aerosol properties are retrieved, and the best one according to the forward pass is chosen.
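The best-of-n strategy described in this caption can be sketched as follows. The sketch assumes that each candidate is obtained by running the inverse pass with a freshly drawn latent sample and that candidates are ranked by the mean absolute deviation between their forward prediction and the measurement; both are assumptions for illustration. The toy linear forward and inverse functions are hypothetical stand-ins, not the trained INN.

```python
import numpy as np

def best_of_n(inverse_fn, forward_fn, y_measured, n, latent_dim, rng):
    """Draw n candidate retrievals and return the one whose forward
    prediction is closest to the measurement, together with its misfit."""
    best_x, best_err = None, np.inf
    for _ in range(n):
        z = rng.standard_normal(latent_dim)        # latent sample (assumed Gaussian)
        x_candidate = inverse_fn(y_measured, z)    # candidate aerosol properties
        err = np.mean(np.abs(forward_fn(x_candidate) - y_measured))
        if err < best_err:
            best_x, best_err = x_candidate, err
    return best_x, best_err

# Toy stand-ins for the two directions of the model (hypothetical).
A = np.array([[2.0, 0.0], [0.0, 0.5], [1.0, 1.0]])

def toy_forward(x):
    return A @ x

def toy_inverse(y, z):
    # Exact least-squares inverse plus a latent-dependent perturbation.
    return np.linalg.pinv(A) @ y + 0.1 * z

x_true = np.array([1.0, -2.0])
y = toy_forward(x_true)

# Same seed in both runs, so the first candidate is shared between them.
x_1, err_1 = best_of_n(toy_inverse, toy_forward, y, 1, 2, np.random.default_rng(1))
x_50, err_50 = best_of_n(toy_inverse, toy_forward, y, 50, 2, np.random.default_rng(1))
print("misfit with n=1: ", err_1)
print("misfit with n=50:", err_50)
```

Because the candidate set for n = 50 contains the single candidate drawn for n = 1, the selected misfit cannot increase with n; the price is n inverse and n forward passes per measurement.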

Figure 2: The effects of preprocessing on $P_{11}$ and $-P_{12}/P_{11}$ for a series of randomly chosen spherical, monodisperse aerosols probed with an incident light beam of wavelength 532 nm. The left column contains $P_{11}$, the second column $\log(P_{11})$ and the third column $P_{\mathrm{ppf}}$. The first row shows the unscaled versions and the second row the application of the StandardScaler.

Figure 3: Performance of the best model in forward and inverse direction in terms of mean absolute error (MAE) over the number of epochs. The blue dots denote the MAE for the training data and the red crosses for the validation data.

Figure 4: Comparison of the predicted and simulated test data. The first and third columns show correlation plots of the predicted versus the test data values of the (polarized) phase functions for all three wavelengths. The second and fourth columns depict the simulated (black line) versus predicted (colored dots) (polarized) phase functions in a qualitative representation. Six different data samples were chosen. The last row shows a qualitative representation of the simulated versus the predicted total volume concentration over the particle radii for the same six samples. The table in the lower right corner states the simulated aerosol property values of these six samples.

Figure 5: Forward model performance: Box plots of the relative error for the phase function and the absolute errors for the polarized phase functions for all 22 models for the monomodal case. Each model is depicted by a box extending from the first quartile to the third quartile and a horizontal line running through the box at the median. The whiskers show the range of the absolute errors over the 20 000 data points in the test set. The brown solid lines indicate the median errors of the full data case, whereas the solid black lines show the assumed measurement device errors.

Figure 6: Inverse model performance: Boxplots of the absolute errors of each aerosol property. Each box describes the lower and upper quartile values and the median (line). The range of the absolute errors over all data points in the test set is shown with the whiskers.

Figure 7: Performance of the best model in forward and inverse direction in terms of mean absolute error (MAE) over the number of epochs for the bimodal case. The blue dots denote the MAE for the training data and the red crosses for the validation data.

Figure 8: Boxplots of the absolute errors of each aerosol property for the bimodal case. Each box describes the lower and upper quartile values and the median (line). The range of the absolute errors over all data points in the test set is shown with the whiskers.

Figure A.9: The $R^2$ values for the forward prediction for the different cases for the monomodal data set.

Figure A.10: Comparison of the predicted and simulated test data for the bimodal case. The first and third columns show correlation plots of the predicted versus the test data values of the (polarized) phase functions for all three wavelengths. The second and fourth columns depict the simulated (black line) versus predicted (colored dots) (polarized) phase functions in a qualitative representation. Six different data samples were chosen.

Figure A.11: Aerosol property values corresponding to the selected samples in Figure A.10.

Figure A.12: The $R^2$ values for the forward prediction for the different cases for the bimodal data set.

Figure A.13: Box plots of the relative error of the phase functions and the absolute errors for the polarized phase functions for all 22 models for the bimodal case. Each boxplot includes a box that goes from the lower to the upper quartile values and a line at the median value of the absolute errors. The whiskers show the range of the absolute errors over the 20 000 data points in the test set. The brown solid line shows the median error of the full data case. The black solid line indicates the assumed measurement device errors.

Table 1: Parameter space covered by the simulations of monomodal, spherical, laboratory-generated aerosols, as defined by the maximum and minimum values for each of the included state space variables.
1: As stated in the description, $k_{450}$, $k_{532}$, $k_{630}$ and AAE depend on each other.

Table 2: Parameter space covered by the simulations of atmospheric aerosols in a bimodal representation, as defined by the maximum and minimum values for each of the included state space variables and

Table 3: Summary and description of simulated data sets used in this study. "Lab" stands for the case of monomodal, spherical, laboratory-generated aerosols, and "Atmos-bimodal"

Table 4: Aerosol property retrieval error results in terms of the Mean Absolute Error, the maximal Absolute Error at a confidence level of 95%, the wMAPE and the wMAPE for the case of noisy test data (assuming 5% Gaussian noise), for the monomodal data set with ideal measurement configuration ($P_{11}$ and $-P_{12}/P_{11}$ at three wavelengths and all angles).

Table 5: Results in terms of the Mean Absolute Error, the maximal Absolute Error at a confidence level of 95%, the wMAPE and the wMAPE for the case of noisy test data, assuming 5% Gaussian noise, for the aerosol property retrieval for the bimodal data set with three wavelengths, all angles and $P_{11}$ and $-P_{12}/P_{11}$.