Insights into metastability of photovoltaic materials at the mesoscale through massive I-V analytics

The authors demonstrate the feasibility of quantifying cell-level performance heterogeneity from module-level I–V curves by determining conditions of bypass diode turn-on. Analysis of these curves falls outside of typical diode-based models of photovoltaic (PV) performance. The authors show that this approach can leverage statistical and machine learning techniques for broad application to massive datasets, and combine those insights with simulations and laboratory-based experiments to provide useful information into the metastability of the interfaces of a PV cell. The authors find good agreement between the experimentally determined curves and the simulated curves, which guide the variable selection in the massive dataset collected from sites in Cleveland, OH, USA, the Negev Desert, Israel, Isla Gran Canaria, Spain, and Mount Zugspitze, Germany.


I. INTRODUCTION AND BACKGROUND
We focus here on the temporal evolution of the electrical properties and behavior of photovoltaic energy materials, particularly crystalline silicon solar cells, which include dynamics on timescales from minority carrier lifetimes in milliseconds to degradation-related device lifetimes in gigaseconds (30 years) [4]. Connection across this broad expanse of timescales is essential for a physics-based understanding of slow and rare events associated with degradation and performance over lifetime. We provide a review of the literature on modeling the current-voltage characteristics (I-V curves) of solar cells and modules, and on the challenges of connecting those performance models both to large-scale deployment of photovoltaics and to actionable insights into the microscopic performance mechanisms.
With the rapid deployment of large-scale global photovoltaics, there exist massive time-series datastreams on performance that can be aggregated and mined for signatures of climate-dependent degradation. In this sense, a statistical or data-driven approach to modeling the performance can detect subtle population differences and indicate the impact of real-world dynamics on the system metastability and performance. The laboratory and its analytical tools remain the arena where macroscopic performance can be linked to mesoscopic behavior; yet correlating any "engineered damage" to the solar cells with real-world performance remains a high barrier. This barrier is greatly reduced by acquiring and mining complete current-voltage (I-V) and maximum power point (P_mp) characteristics as time-series datastreams on cells in these two arenas, and by acquiring massive data on the real-world environment using distributed sensors and newer forms of satellite-based imagery for weather and insolation data. Degradation-induced failures have been an ongoing characteristic of new and promising PV cells [5] and other energy materials, even as producers continue to offer 25 year warranties.
We first discuss the dominant physics-based diode models that have been used to extract useful parametric information on photovoltaic cell performance. Next, we discuss the realistic electrically active interfaces in typical crystalline silicon cells and how these encompass a more complex device architecture than a simple diode model can capture. Various methods are then presented for extracting useful information from I-V, P_mp time series, comparing and contrasting the physics-based approaches with different data-driven statistical and machine learning approaches. This I-V, P_mp time-series analysis approach is then demonstrated by developing lab-based hypotheses and experiments that are compared quantitatively with observed real-world data, and by generating a data-driven hypothesis-testing process model, so as to illustrate some of the opportunities of this approach.

A. Physics-based diode models for photovoltaic cells
To model the I-V curve of an arbitrary solar cell, a macroscopic physics-based model consisting of a current source, a single pn diode, and two resistors is typically utilized [6]. Often, researchers also add a parallel second diode, or "recombination diode," that describes a nonlinear internal carrier recombination process [7,8]. In this "two-diode" model, solving parallel Shockley equations becomes intractable, and many assumptions are required to simplify and specify the system [9]. For simplicity, many researchers study performance within the paradigm of the single-diode model. This model has had good success reproducing the shape of the curve for single cells, but its coarse-grained structure yields few insights into mesoscopic behavior. To illustrate this issue, we describe a typical approach to obtaining the series resistance of a solar cell, which typically represents the behavior of the front and back cell contacts.
Applying Kirchhoff's laws and the Shockley equation yields the relationship between current (I) and voltage (V) for a solar cell in this single-diode model, Eq. (1), where I_L is the light-induced current from photon absorption, I_0 is the reverse saturation current, N_s is the number of cells in the module, n is the diode ideality factor, R_s is the cell series resistance, R_sh is the cell shunt resistance, and V_th is the thermal voltage, equal to kT/q, where q is the elementary charge, k is Boltzmann's constant, and T is the absolute temperature.
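For reference, a commonly used form of the single-diode relation that is consistent with the symbols defined above is sketched below. The way N_s enters, and the treatment of R_s and R_sh as lumped module-level resistances, are assumptions here and may differ in detail from the form printed as Eq. (1).

```latex
I = I_L - I_0\left[\exp\!\left(\frac{V + I R_s}{n N_s V_{th}}\right) - 1\right]
    - \frac{V + I R_s}{R_{sh}},
\qquad V_{th} = \frac{kT}{q}
```

The current appears on both sides, inside and outside the exponential, which is what makes the relation transcendental.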
Because of the transcendental nature of Eq. (1), it is not possible to solve for I(V) explicitly, except by means of the Lambert-W function [10]. Typically, these data may be used simply to estimate I_sc, V_oc, maximum power point (P_mp), and fill factor (FF). A range of parameter extraction techniques has also been reported [12]-[20]. All of these techniques are based upon analyzing the transcendental equation in certain regimes, i.e., near short circuit, open circuit, or maximum power, and drawing upon several assumptions that typically hold for efficient solar cells. We consider only methods that invoke the diode equation of Eq. (1) and do not simplify it by, for example, ignoring the shunt resistance contribution [20], and provide an example analysis [15]. We intend to determine the variation in time of the series resistance, shunt resistance, and ideality factor. One common and straightforward manner of estimating the series and shunt resistance of the PV module is simply to calculate the pointwise resistance curve for an I-V curve, given by R(V) = −1/(dI/dV), where R_s = R(V = V_oc) and R_sh = R(V = 0) [12].
Analytical solutions to the transcendental equation exist based upon the Lambert-W function [10,21], which Ghani et al. [22] used to estimate the series and shunt resistances and the diode ideality factor. This analytical solution for I(V, T) is expressed in terms of Lambert's W function, W[F(V, T)], where F(V, T) is a functional of the voltage and temperature. This unwieldy analytical solution is straightforward to simplify by invoking that generally R_sh ≫ R_s and I_L ≫ I_0; however, there is often more utility in solving for voltage, which we will utilize later in Sec. III B.
Petrone et al. [23] found the analytical solution useful for predicting the behavior of a nonuniformly illuminated field of PV modules, and researchers in the area of PV-related power electronics have used the analytical solutions for maximum-power-point-tracking algorithm development. However, these methods are ultimately based upon the simplistic diode equivalent circuit model, which may not capture all of the behavior of complex systems.
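As an illustration of the Lambert-W route, the sketch below evaluates the widely quoted closed form I(V) = [R_sh(I_L + I_0) − V]/(R_s + R_sh) − (a/R_s) W(θ), with a = n N_s V_th, and checks it against the implicit single-diode relation. It is a minimal sketch, not the implementation used in the work cited: the cell parameters are hypothetical, and W is computed with a small Newton iteration so that the example needs only the standard library.

```python
import math

def lambert_w0(x, tol=1e-14, max_iter=100):
    """Principal branch W0(x) for x >= 0, by Newton iteration on w*exp(w) = x."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)  # lies at or above the root, so Newton descends monotonically
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w

def diode_current(v, i_l, i_0, r_s, r_sh, a):
    """Explicit I(V) of the single-diode model; a = n * N_s * V_th."""
    g = r_s + r_sh
    theta = (r_s * r_sh * i_0) / (a * g) * math.exp(
        r_sh * (r_s * (i_l + i_0) + v) / (a * g))
    return (r_sh * (i_l + i_0) - v) / g - (a / r_s) * lambert_w0(theta)

def implicit_residual(v, i, i_l, i_0, r_s, r_sh, a):
    """Right-hand side minus left-hand side of the implicit single-diode relation."""
    x = v + i * r_s
    return i_l - i_0 * (math.exp(x / a) - 1.0) - x / r_sh - i

# Hypothetical single-cell parameters (N_s = 1, n = 1.2, V_th = 25.69 mV).
PARAMS = dict(i_l=8.0, i_0=1e-9, r_s=0.005, r_sh=50.0, a=1.2 * 0.02569)

# I-V curve from 0 V to 0.70 V in 10 mV steps.
curve = [(k / 100.0, diode_current(k / 100.0, **PARAMS)) for k in range(71)]
```

Because the closed form is explicit in V, a full curve is obtained without any root finding per point; the residual function gives a direct check that each point satisfies the original transcendental relation.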
Problems arise when using the pointwise-resistance technique, however, since realistic values of R_s are found only when R_s is rather high compared to the R_s of many real modules, so that R_s losses dominate the I-V curve as V approaches V_oc. This method for estimating R_sh is commonly used in many parameter extraction techniques [15], although Hansen [14] has questioned its validity and developed a novel iterative integral approach to solving Eq. (1).
Khan [15] evaluated the slope of the I-V curve, m = dI/dV, at open circuit and at short circuit, giving m_oc and m_sc, which are dI/dV at V = V_oc and V = 0, respectively, and which lead to an inequality, relation (9). Relation (9) allows for simplification of Eqs. (5) and (6), and applying the substitution shows that, for a collection of I-V curves, the inverted slope at open circuit has a linear dependence on 1/(I_sc − V_oc/R_sh), where the y-intercept is equal to R_s and the ideality factor n can be obtained from the slope. This relation was tested using massive real-world data [24], and we found that the assumptions listed above held for typical PV modules. Further, we tested the model using nonuniformly illuminated I-V curves under mirror augmentation to seek signatures of degradation, and no significant change in R_s was found over 6 months. However, this analysis exhibited a bimodal distribution in the estimation of R_s and overall indicated that the method is not sensitive enough to track small changes in the modules over this period of time. In addition, the assumption that a 60-cell module is fully described by a single-diode model, and particularly that a nonuniformly illuminated module is described by a single shunt resistance, is unreasonable.
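The linear relation described above can be sketched as a simple least-squares fit. The example assumes the form −1/m_oc = R_s + n N_s V_th / (I_sc − V_oc/R_sh), which follows from differentiating the standard single-diode relation at open circuit when the shunt conductance is small compared with the diode conductance; the function names and the synthetic data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def series_resistance_and_ideality(curves, r_sh, n_cells, v_th):
    """curves: iterable of (i_sc, v_oc, m_oc), with m_oc = dI/dV at V_oc (negative)."""
    xs = [1.0 / (i_sc - v_oc / r_sh) for i_sc, v_oc, _ in curves]
    ys = [-1.0 / m_oc for _, _, m_oc in curves]
    r_s, slope = fit_line(xs, ys)          # intercept is R_s
    return r_s, slope / (n_cells * v_th)   # slope is n * N_s * V_th

# Synthetic 60-cell module data generated from the assumed relation itself.
R_S, N_IDEAL, N_CELLS, V_TH, R_SH = 0.4, 1.3, 60, 0.02569, 300.0
synthetic = []
for i_sc in (2.0, 4.0, 6.0, 8.0):
    v_oc = 34.0 + 0.3 * i_sc
    inv_slope = R_S + N_IDEAL * N_CELLS * V_TH / (i_sc - v_oc / R_SH)
    synthetic.append((i_sc, v_oc, -1.0 / inv_slope))

r_s_fit, n_fit = series_resistance_and_ideality(synthetic, R_SH, N_CELLS, V_TH)
```

On real data the scatter about the fitted line, rather than the fitted values alone, indicates how well the single-diode assumptions hold for the population of curves.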
In explorations beyond simplistic diode models, Sellner et al. [25,26] introduced a novel "loss factors model" whose six parameters describe the system power losses and can be more easily extracted from I-V curves. In the study of degradation in performance, a loss factors model is a potentially useful description of system performance; yet this model is largely derived from the diode model as well [27]. As such, the loss factors model cannot model, estimate, or predict the behavior of I-V curves with bypass diodes.

II. MESOSCALE DESCRIPTION OF SOLAR CELL INTERFACES AND CONTACTS
Here, we focus on conventional Al back surface field (BSF) solar cells, which contain a series of distinct and critically important interfaces that play an essential role in the performance and degradation of PV modules over their lifetime. It is common for engineering solutions to performance issues to be developed and commercialized without a complete understanding of the behavioral mechanisms at the microscale, let alone the lifetime-limiting mechanisms on gigasecond timescales. These interfaces include the screen-printed silver (SP-Ag) metallization grid, the front-surface passivation, the pn junction itself, and the back-surface field that limits recombination at the aluminum back-surface contact. For example, the screen-printed silver metallization grid on the front of the solar cells, which is applied as a paste consisting of silver particles, glass frit, and various organic binders and is subsequently fired, can be susceptible to corrosion from acetic acid produced in the ethylene-vinyl acetate (EVA) encapsulant in front of the c-Si cell.
SP-Ag conductive lines are used in commercial silicon photovoltaic (PV) cells as a front-side electrical contact to the emitter. The formulation of SP-Ag pastes includes micron-scale silver particles (1-10 µm), glass, and organic surfactants that facilitate printing and agglomeration upon firing [28]. The screen-printed precursor undergoes a firing process that reaches up to 850 °C, during which the glass frit etches the SiN_x antireflection coating, allowing for contact formation to the emitter. The final conductive lines are composed of flocced silver particles and a glass matrix [29,30]. The microscopic mechanisms that result in the formation of the low-resistance contact between the silver conductive lines and the solar cell are still being uncovered [28], but two prevailing models are generally accepted. One hypothesis suggests that the Ag becomes dissolved within the molten glass frit and that, upon cooling, the supersaturated solution allows large grain growth at the Si interface, causing a Schottky-barrier-like boundary that acts more Ohmic as the emitter doping concentration is increased to n++ levels. Another hypothesis is that Ag nanoparticles precipitate and are held in a colloidal suspension in an interfacial glass film contacting the emitter, with conduction enabled by quantum mechanical tunneling through the insulator regions. Cooper et al. [28] investigated these models and found evidence for both, each prevailing in different regimes of firing temperature and emitter doping concentration. Because of this dependency, one can expect that deployed systems vary widely and that their response to climatic conditions and their degradation in performance are highly variable and heterogeneous. We expect to see great variability in the real-world lifetime performance of c-Si BSF SP-Ag cells.
These characterization schemes can be illuminating, but quantitative and predictive relationships to the real operation of gigawatts of photovoltaic cells are tenuous. Similarly, modeling the underlying performance and performance degradation of screen-printed silver contacts under the assumption that a PV module is described by a single diode with series and parallel resistances will likely miss the complexity of behavior, because the diode model lacks the essential parameters for this complex materials system [6]. The series resistance, in particular, comprises several contributors and is not unrelated to the shunt resistance (or recombination current) in real devices. Yet studying the massive real-world datastreams using data-driven models can provide new insights into complex behavior. For example, we will show here that it is possible to model the behavior of nonuniform PV modules with bypass diodes and, from the "turn-on" voltage of the bypass diode, to model relative changes in the series resistance and the distribution of series resistance among the cells.
Considerable research has already been published on observed degradation modes or mechanisms, but it is not clear how these mechanisms interact or how the network of mechanistic degradation pathways, acting in parallel or in series, actually produces the degradation in performance [39]. Reliability research on complete modules is typically conducted in the laboratory under accelerated aging conditions, exposing the module to doses of heat, humidity, and/or light well in excess of the extremes of the real world. A temperature of 85 °C at 85% relative humidity is a typical exposure condition for accelerated aging.

III. ANALYTICS
As we are concerned with degradation in performance over lifetime, the variation in time of P_mp and FF would convey the relative performance loss. The parameters R_s, R_sh, n, and I_0 are representations of the I-V curve behavior and are more intimately linked to the fundamental mechanisms of the degradation phenomena. To gain insights into degradation phenomena, it is desirable to track the I-V shape phenomena as time-series data.
The International Energy Agency's (IEA) PV Power Systems Programme Task 13 [40] cataloged a collection of I-V curve responses related to PV module degradation and failure [5], which we summarize in Fig. 1.
This typical selection of I-V curves indicates the features already associated with degradation mechanisms, although the I-V curve shapes shown have been generated from equivalent circuit, or diode, models, i.e., the five parameters described earlier. Yet these shapes form the basis of what can be used for automated feature selection among massive datasets of I-V curves. In addition, the IEA report catalogs "inflex points," which we refer to as "change points," as points in the I-V curve where bypass diodes become forward biased and turn on due to module heterogeneities. These heterogeneities can include, for example, cell cracking and hot spots, and they are more closely linked to the mesoscale degradation processes affecting localized areas of a single cell, or a few cells, in a module or string of modules. In this manuscript, we describe the importance and utility of these points and the voltage at which they appear. We also classify the change points as well as the overall curve shape.

A. Machine-learning modeling
New techniques for modeling solar cell behavior have been developed of late that are largely data-driven, are less restricted by the mathematical tractability of physics-based diode models, and draw upon advances in statistical and machine learning [41,42]. Many of these techniques are supervised, in that the data are labeled and identified by a researcher a priori. Riley and Venayagamoorthy [43] employed a recurrent neural network to model maximum power time-series behavior for a systematic model of PV performance and also described the utility of this methodology for prognostics and health management of PV systems over time [44]. Other authors have more recently used machine learning algorithms to better predict the behavior of PV systems, beginning within the constraints of the diode model, but these approaches have utility in a more supervised manner as well. For example, genetic algorithms [45] and differential evolution have been used to parametrize the I-V curve for sample PV cells [46]. These methods are largely used as optimization and fitting methodologies by researchers in control theory to find the best values within the single-diode model for fitting to real data, yet these metaheuristics hold much promise for supervised analytics of I-V curves. Machine-learning techniques can be utilized to classify and find anomalous behavior in the data, such as bypassing in I-V curves. Machine-learning anomaly detection techniques can be supervised, semisupervised, or unsupervised [41], depending on whether the dataset is labeled as typical or anomalous. The massive amount of PV data that exists as power, I-V, and weather time series suggests that these data are ripe for automated analytics based on unsupervised machine learning.
We have developed an automated analysis of I-V curves based on the nonparametric approach of local polynomial regression [48], using the loess package in R [49,50]. In a population of I-V curves, such as that acquired in a time-series study, the majority of curves are typical in shape and may well fit the simplistic diode model. This aids the automated anomaly detection of I-V curves, where the emphasis is on identifying unusual curves exhibiting change points or inflection points. The loess nonparametric regression approach we developed allows for model-independent analysis of I-V time-series datasets in which anomaly detection is the focus. Further, the machine-learning techniques need not try to analyze and fit curves to standard macroscopic physical models, but rather can be utilized to develop data-driven representations of the data that can be used in support of laboratory-based experimentation. In this concept, it becomes less critical to translate an increase in series resistance from the real world to the laboratory than to identify features and characteristics of the curve shape and translate those features among disparate datasets.

B. SPICE modeling
Here, SPICE [53] can provide a compromise between diode modeling and totally supervised models, in that SPICE is built on a numerical solver and therefore has no inherent limitations on granularity. Further, there is no a priori specification of the equivalent circuit model employed to describe a solar cell, and any other contributors, including bypass diodes, can be added. Although these models inherently have no connection to microscopic or quantum effects, the purely macroscopic coarseness can be mitigated to approach the mesoscale by deploying a high density of equivalent circuits per unit of real space.
We invoked this type of modeling to rapidly step through simulation scenarios and test hypotheses related to nonuniform illumination. Our models started with a single-diode model for each cell, with the ability to vary the series and shunt resistors, the diode parameters, and the illumination. In this way, we simulate a generalized version of Eq. (1), in which we abandon the trivial case where all cells have equivalent parameters and the voltage scales by N_s, the number of solar cells. Instead, the voltage is a function of current for any given solar cell, i, given by an expression V_i(I, T_i) that involves Min[I_L], an operator that finds the minimum I_L in the entire ensemble of i solar cells. Here, the ensemble is the collection of I-V curves in a substring spanned by a bypass diode and, in the event of bypassing, includes the bypass diode. The complete I-V curve is then found by summing all V_i(I, T_i) and summing over all substrings, s. Our SPICE models were built in the Linear Technology SPICE (LTSPICE) distribution and incorporated 60 individual equivalent circuit models for a solar cell, with resistive interconnects and three bypass diodes equally distributed, to construct a commercial 60-cell module. Our models allowed for direct I-V curve generation given the input parameters of temperature, integrated irradiance, series and shunt resistance, reverse saturation current, and diode ideality factor. This simple model was used to test the behavior of bypass diodes under forward and reverse bias conditions and to simulate a large matrix of conditions related to changing the series and shunt resistors, the photogenerated current, and any diode parameter we choose.
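The series summation over cells and substrings can be illustrated without a circuit simulator. The sketch below is not the LTSPICE model described above: it assumes a per-cell single-diode relation with the shunt resistance neglected (so that V_i(I) is explicit), an ideal bypass diode that clamps a substring at a fixed forward drop, and a common cell temperature. All parameter values and function names are hypothetical.

```python
import math

V_TH = 0.02569  # thermal voltage kT/q near 25 C, in volts

def cell_voltage(i, i_l, i_0=1e-9, n=1.2, r_s=0.005):
    """V_i(I) of one cell: single-diode model with shunt resistance neglected."""
    if i >= i_l + i_0:
        return float("-inf")  # the cell cannot source this current in forward operation
    return n * V_TH * math.log((i_l - i) / i_0 + 1.0) - i * r_s

def substring_voltage(i, cells, v_bypass=0.4):
    """Sum the cell voltages; the bypass diode clamps the substring at -v_bypass."""
    v = sum(cell_voltage(i, **cell) for cell in cells)
    return max(v, -v_bypass)

def module_voltage(i, substrings):
    """Sum over all substrings at a common series current."""
    return sum(substring_voltage(i, s) for s in substrings)

# Two substrings of two cells each; one cell receives less light.
substrings = [
    [dict(i_l=8.0), dict(i_l=8.0)],
    [dict(i_l=8.0), dict(i_l=5.0)],
]

# (voltage, current) pairs from 0 A to 7.95 A in 50 mA steps.
iv = [(module_voltage(k * 0.05, substrings), k * 0.05) for k in range(160)]
```

In this sketch, once the series current exceeds the photocurrent of the weakest cell in a substring, that substring contributes only the negative bypass drop, which produces the step in the module I-V curve that the text refers to as a change point.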
To add utility and rigor to the description and discussion of I-V curves with bypassing, we developed the following classification rules, which are most easily understood when I-V curves are transformed into power versus voltage (P-V) space, where P = I × V. Conventional I-V curves exhibit a single global maximum in the P-V space, P^g_{max,1}, where the nonconventional superscript g denotes "global." We also associate with that P^g_max point a change point at the open-circuit voltage, V_oc. Since V_oc is also one of two global minima in power, along with I_sc, we can describe these points as P^g_{min,1} and P^g_{min,2}, respectively. For curves that show change points due to bypassing, the P-V curve will show additional power maxima and minima for each change point. Thus, for example, a curve with three substrings under different irradiances will exhibit three change points (the turn-on voltages of two bypass diodes plus the open-circuit voltage) that correspond to three minima in power, P^g_{min,1}, P^l_{min,2}, and P^l_{min,3}, in addition to P^g_{min,4}, which in this case is at I_sc. Correspondingly, there are three power maxima, and a comparison among these is required to determine which is the global maximum. We describe this set as P^g_{max,1}, P^l_{max,2}, P^l_{max,3}, and in this example scenario, the local maximum closest in voltage to V_oc is the global maximum.
The utility of this method is exemplified by simulating a four-cell "minimodule" comprising two substrings with Schottky bypass diodes in the configuration shown in Fig. 2. Sweeping the series resistance of a single cell under nonuniform irradiance clearly changes the turn-on voltage of the string bypass in a predictable manner. The results of the SPICE simulation for varying irradiance and variable series resistance of a single cell are shown as a series of curves in Fig. 3. Further, we find that for large series resistance values bypassing is turned on even under uniform irradiance, which will be shown below in comparison with experimental measurements.
Hence, modules in the field with bypassing evident in their I-V curves need not always be nonuniformly illuminated; rather, localized hot spots can dramatically and temporarily increase the resistance of a cell and hypothetically manifest as bypassing. Next, we fabricated four-cell minimodules in order to compare experimental measurements with the simulation.
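The P-V classification rules can be sketched as a search for local power maxima. The example below is a minimal illustration on an idealized, hypothetical staircase I-V curve with one bypass step; the function names are not from the original work, and real curves would need smoothing before strict comparisons of neighboring samples are meaningful.

```python
def power_maxima(volts, amps):
    """Return [(V, P)] for every strict local maximum of P = I * V."""
    p = [v * i for v, i in zip(volts, amps)]
    return [(volts[k], p[k]) for k in range(1, len(p) - 1)
            if p[k] > p[k - 1] and p[k] > p[k + 1]]

def summarize(volts, amps):
    """Count the power maxima and pick out the global one."""
    maxima = power_maxima(volts, amps)
    global_max = max(maxima, key=lambda vp: vp[1])
    # Each maximum beyond the first corresponds to one bypass change point.
    return {"n_maxima": len(maxima),
            "n_bypass_change_points": len(maxima) - 1,
            "global_max": global_max}

def staircase_current(v):
    """Idealized two-substring curve: 8 A plateau, 5 A plateau, then a linear knee."""
    if v <= 9.0:
        return 8.0
    if v <= 18.0:
        return 5.0
    return 5.0 * (20.0 - v) / 2.0

volts = [0.5 * k for k in range(41)]          # 0 V to 20 V
amps = [staircase_current(v) for v in volts]
summary = summarize(volts, amps)
```

For this synthetic curve the maximum nearest V_oc is the global one, matching the example scenario in the text, but the comparison among maxima is what decides this in general.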

IV. EXPERIMENT
Here, we verify in the laboratory the features that will be sought in the massive datasets obtained in the real world and validate the shape of the I-V curves related to specific degradation mechanisms causing heterogeneous degradation of SP-Ag contacts.

A. Laboratory-based: Engineered damage signatures
We fabricated a four-cell minimodule with three identical new cells and one heavily degraded cell. The cells were multicrystalline silicon, 156 × 156 mm², purchased from Q-Cells, and were approximately 15% efficient. These cells contain three busbars and SP-Ag conductive lines making contact to the front emitter. The cells were placed in a 2 × 2 pattern, each two-cell half was tabbed and strung in series, and the interconnect was then run outside of the laminated module. The modules were laminated using a polyethylene terephthalate (PET) backsheet, two layers of EVA, and tempered front glass.
Heavily degraded PV cells were made by exposing several cells to the ASTM G154 cycle 4 testing protocol in an environmental chamber (QUV by Q-Lab) to heavily degrade their performance. The cycle consists of 8 h of UV exposure at 70 °C, followed by 4 h of condensing humidity at 50 °C. The intensity of the UV exposure is 1.55 W/m²/nm centered at 340 nm, approximately five times the intensity of the UV component of 1 sun, i.e., AM 1.5, over 7 days. Hypothetically, the multifactor exposure condition will greatly degrade the performance of the solar cells by preferentially corroding the screen-printed silver front contact [54,55]. To enhance series resistivity further, only the center busbar was tabbed for current collection. In this way, we engineered damage into the cell in a predictable manner.
A solar simulator, the Solo Apollo by AllReal, and a DayStar DS100-C I-V curve tracer were used to characterize the degraded cells by collecting several I-V curves at points in time. A downward trend in the maximum power of three cells connected in series was observed over the exposure time and is shown in Fig. 4. All data points were normalized to standard test conditions of 1000 W/m² and 25 °C. The thermal coefficient of the maximum power shift is explicitly provided by the datasheet for the solar cells used and is equal to 0.43%/K.
[Figure caption: In this plot, we can see that the bypass turn-on voltage is not strongly dependent upon the irradiance mismatch in these ranges but is strongly dependent upon the effective series resistance of the "damaged" cell.]
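A normalization of measured maximum power to standard test conditions can be sketched as follows. The text does not state the translation procedure, so this example assumes the simplest common one: linear scaling with irradiance and a linear temperature correction. The coefficient is taken as negative, as is typical for crystalline silicon, whereas the text quotes only its magnitude of 0.43%/K; the function name and sample values are hypothetical.

```python
def normalize_to_stc(p_meas, irradiance, t_cell,
                     gamma=-0.0043, g_stc=1000.0, t_stc=25.0):
    """Translate measured power (W) to 1000 W/m^2 and 25 C.

    gamma is the relative power temperature coefficient per kelvin
    (assumed sign: negative, i.e., power falls as the cell heats up).
    """
    irradiance_factor = g_stc / irradiance
    temperature_factor = 1.0 + gamma * (t_cell - t_stc)
    return p_meas * irradiance_factor / temperature_factor

# Hypothetical measurement: 3.0 W at 800 W/m^2 with the cell at 45 C.
p_stc = normalize_to_stc(3.0, 800.0, 45.0)
```

A measurement taken exactly at 1000 W/m² and 25 °C is returned unchanged, which is a quick sanity check on the sign conventions.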

B. Outdoor studies
The Solar Durability and Lifetime Extension (SDLE) Research Center at Case Western Reserve University owns and operates a unique outdoor test facility, the "SDLE SunFarm" in Cleveland, OH [56], that provides 14 dual-axis solar trackers populated with 144 commercial photovoltaic modules, 122 individual grid-connected photovoltaic power plants of varying power, and four mirror-augmented PV (MAPV) modules [57], with controllable nonuniform irradiance forcing bypass diodes to be forward biased.
The MAPV modules are not grid-connected but instead are loaded by a 32 channel DayStar Multitracer, which provides maximum power point tracking and also scans the full I-V curve from open-circuit voltage (V_oc) to I_sc at regular 10 min intervals. Additionally, climatic (weather and irradiance) metrology is provided by irradiance pyranometers and pyrheliometers (Kipp and Zonen CMP 11 and CHP 1). The weather station (Vaisala WXT520) is capable of providing measurements [58] of air temperature, wind speed, wind direction, relative humidity, rain intensity, and rain direction, and these data are collected by a networked data acquisition system (Campbell Scientific CR1000) on a minute-by-minute basis [56]. T-type thermocouples affixed to the back of the MAPV modules provide the local temperature of the module.
Additional outdoor test facilities [59][60] are located at Isla Gran Canaria, Spain (GC), Mount Zugspitze, Germany (UFS), and the Negev Desert, Israel (NEG). The sites at GC and UFS have been recording data since 2010, while the NEG site started recording in 2012. At each of the three sites, the I-V curves of two module samples are measured every 5 min. The maximum power output of each module is recorded approximately every minute between sequential I-V measurements.

A. Engineered damage
We collected irradiance-dependent I-V curves for the two halves of the minimodule with engineered damage, as shown in Fig. 5. These curves show the disparity in performance between two cells that are new and two cells of which one is new and one is heavily degraded and highly resistive. For the half containing the damaged cell, we see that the behavior is not simply modeled by an increase in the series resistance, which is evident at voltages near V_oc; the slope near short circuit is also affected. This region of the curve is associated with the shunt resistance in a diode model system and represents Ohmic carrier recombination mechanisms. The data support the interpretation that an increase in the series resistance, which itself comprises many factors, is linked to the shunt resistance, in that the resistance cannot fully support the available current and more photogenerated carriers recombine even if the characteristic scattering time is unchanged in the bulk.
The minimodule fabricated with the damaged cell showed bypass diode turn-on under uniform irradiance, due to the resistances of the cells being strongly nonuniform. This observation demonstrates the potential for real-world modules to show bypassing even under uniform illumination if a localized cell or interconnect becomes highly resistive. A scenario in which this observable may be found is a localized hot spot, where positive thermal feedback occurs because the photogenerated carriers of a solar cell act as a current source. A current source driving a resistor can undergo thermal runaway because the power dissipation is I²R and I is unchanged at the source; as R increases, the power dissipation increases, causing local heating and a continual increase in R.
Comparing the measurements of the irradiance-dependent I-V curves of the minimodule with engineered damage against the SPICE modeling shows similar behavior. The SPICE model for the minimodules was seeded with measured values from the individual I-V curves and with measurements of the irradiance uniformity, in an attempt to closely predict the minimodule behavior, as well as with a model for the exact Schottky bypass diode used, a Hy Electronics 15SQ045. The shunt and series resistances of the individual cells were extracted by typical means found in the literature [12], where the shunt resistance was found by fitting a tangent line near I = I_sc and the series resistance was approximated by the tangent line near I = 0. These curves are shown in Fig. 6. Although several similarities arise, such as the position and magnitude of the bypassing even under uniform illumination and the irradiance at which bypassing disappears, the fill factors of the simulated and measured curves are very different. The comparison of modeled and experimentally determined I-V curves underscores the potential problems of equivalent circuit models, which are potentially insightful but incomplete in describing photovoltaic behavior under all conditions. These conditions can occur under system degradation as well as under the nonuniform irradiance that is often observed, which argues against relying solely on diode modeling, particularly since diode models for a complete Si cell give results that are inconsistent with experimental data. Hence, we shift to learnings that can be gained from examining real-world behavior and subsequent data-driven modeling of anomalous I-V behavior.
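The feedback argument can be made explicit with a minimal lumped sketch. It assumes a resistance with a linear temperature coefficient \alpha, a single thermal resistance \theta between the hot spot and ambient temperature T_a, and a fixed current I; these parameters are illustrative assumptions and are not given in the text.

```latex
P = I^2 R(T), \qquad
R(T) = R_0\left[1 + \alpha\,(T - T_a)\right], \qquad
T - T_a = \theta P
\;\;\Longrightarrow\;\;
T - T_a = \frac{\theta I^2 R_0}{1 - \alpha\,\theta I^2 R_0}
```

Under these assumptions a steady state exists only while \alpha\theta I^2 R_0 < 1; as this product approaches unity, the temperature rise diverges, which corresponds to the runaway condition described above.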

B. Massive analytics of I-V curves
Massive datasets of I-V curves are too burdensome to analyze one at a time and therefore require taking a statistical approach with a validated data analytics pipeline. 4hese large datasets comprise many climatic variables and module performance variables, which allow for the demonstration of more subtle behaviors that can arise from transient or permanent changes in the PV modules and can guide the developments of new models that are more inclusive of heterogeneous behaviors.
The analysis of massive temporal datasets is benefited by machine learning practices that are implemented in the data analytics pipeline, which can then be applied to the entire dataset.
Over 1.5 Â 10 6 I-V curves over 500 days have been acquired on an average interval of 10 min on the SDLE SunFarm.This massive dataset of I-V curves comprise many climatic variables and module performance variables, which allow for the demonstration of more subtle behaviors that can arise from transient or permanent changes in the PV modules, and can guide the developments of new models that are more inclusive of heterogeneous behaviors.
An automatic machine learning pipeline is needed to adequately extract key features from each of the 1.5 × 10^6 I-V curves. 4 The features of interest in the data include I sc, V oc, the number of change points, and the I-V curve slopes, m, at short circuit and at each change point. The selection of features was guided initially by the SPICE development and extended with empirical evidence. After the feature values are estimated, the I-V curves are grouped into appropriate subpopulations and modeled accordingly.
The core of this pipeline is a statistical procedure that involves two parts: change-point detection and parameter estimation. The change-point detection procedure identifies the presence and locations of change points in an otherwise concave I-V curve by fitting a first or second degree local polynomial regression 48 over the curve and flagging spikes in the residuals. The sensitivity of the algorithm is tuned on a representative sample that is investigated and labeled by the researchers. The parameter estimation procedure estimates the current, the voltage, and the I-V curve slopes at short circuit and at each change point by, in this case, fitting a first degree polynomial regression line 41 along each I-V curve and using the linear components as the estimates (see red lines in Fig. 7). After feature recognition, we classify the results. Here, the classification is straightforward and is based on the number of bypassed strings of cells. For a 60-cell module, we have three types of curves: type I curves show a single V oc with no additional change points, type II curves show V oc and one additional change point, and type III curves show V oc and two additional change points.
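The change-point step and the type I/II/III classification can be illustrated with the sketch below. It is not the authors' pipeline: it replaces the local polynomial regression of Ref. 48 with an unweighted sliding-window quadratic fit, flags only points that dip below the local fit (the signature of a convex kink in an otherwise concave curve), and uses a hypothetical sensitivity `rel_tol` in place of the tuning on labeled samples. The synthetic curves and all function names are likewise assumptions.

```python
import numpy as np

def local_poly_residuals(v, i, half_window=5, degree=2):
    """Residual of each interior point against a local polynomial fit."""
    v = np.asarray(v, dtype=float)
    i = np.asarray(i, dtype=float)
    resid = np.zeros_like(i)
    # Edge points without a full window are skipped (residual left at 0).
    for c in range(half_window, len(v) - half_window):
        win = slice(c - half_window, c + half_window + 1)
        coeffs = np.polyfit(v[win] - v[c], i[win], degree)
        resid[c] = i[c] - coeffs[-1]  # coeffs[-1] is the fit at the centre
    return resid

def find_change_points(v, i, rel_tol=0.01, half_window=5, degree=2):
    """Voltages where the curve dips below the local fit by > rel_tol * max(I)."""
    v = np.asarray(v, dtype=float)
    i = np.asarray(i, dtype=float)
    resid = local_poly_residuals(v, i, half_window, degree)
    flagged = np.flatnonzero(resid < -rel_tol * np.max(i))
    if flagged.size == 0:
        return []
    # Runs of adjacent flagged samples count as one change point.
    groups = np.split(flagged, np.flatnonzero(np.diff(flagged) > 1) + 1)
    return [float(v[g[np.argmin(resid[g])]]) for g in groups]

def classify_curve(n_change_points):
    """Map the number of additional change points to the curve type."""
    labels = {0: "type I", 1: "type II", 2: "type III"}
    return labels.get(n_change_points, "unclassified")

def synthetic_module_curve(steps=()):
    """Idealized module curve; `steps` holds (voltage, amps) bypass steps."""
    v = np.linspace(0.0, 36.0, 181)
    base = 8.0 - sum(amps for _, amps in steps)
    i = base * (1.0 - np.exp((v - 36.0) / 1.5))
    for v_step, amps in steps:
        # Current ramps down by `amps` over the 2 V ending at v_step.
        i = i + amps * np.clip((v_step - v) / 2.0, 0.0, 1.0)
    return v, i

for steps in [(), ((12.0, 3.0),), ((12.0, 2.5), (24.0, 2.5))]:
    v, i = synthetic_module_curve(steps)
    cps = find_change_points(v, i)
    print(classify_curve(len(cps)), cps)
```

Flagging only negative residuals is a design choice made for this sketch so that the ordinary concave knee near V oc is not counted; a production pipeline would need its sensitivity tuned on labeled curves, as the text describes.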
A pairwise scatter and correlation matrix is a useful method in multivariate analytics for visualizing the learnings from these massive data. As an example, Fig. 8, generated using the language "R," 61 shows several variables including the currents at the intercept of each change point, I1, I2, and I3 (note that I1 is equivalent to I sc), V oc, maximum power point, irradiance, and temperature in a pairwise scatter and correlation matrix. Variables and their histograms are shown along the matrix diagonal. The pairwise scatter plots of the variables are shown below the diagonal, and the corresponding pairwise linear correlation coefficients are shown above the diagonal. For example, the scatter plot of irradiance versus I sc is found in the first column, sixth row, and the corresponding linear correlation coefficient of I sc versus irradiance is 0.77, found in the first row, sixth column. This module was augmented with a mirror to apply nonuniform irradiance so as to engineer stress into the system as well as to ensure bypass diode turn-on. These data were subset to daytime measurements only, which resulted in approximately 6000 I-V curves over 1 year. Figure 9 is a bar plot of a small subset of I-V curve data showing the relative populations of classified I-V curves.
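The numerical half of such a matrix, the pairwise linear correlation coefficients, can be sketched as below. The figure in the paper was produced in R; this Python version is only an illustration, and the feature table it uses is synthetic, with made-up relationships between the variables rather than measured data.

```python
import numpy as np

# Column order follows the variable list in the text.
NAMES = ["I1", "I2", "I3", "Voc", "Pmp", "irradiance", "temperature"]

def correlation_matrix(columns):
    """Pearson correlation matrix for a dict of equal-length feature arrays."""
    data = np.column_stack([columns[name] for name in NAMES])
    return np.corrcoef(data, rowvar=False)

def synthetic_features(n=500, seed=0):
    """Made-up per-curve features, for demonstration only."""
    rng = np.random.default_rng(seed)
    irr = rng.uniform(100.0, 1000.0, n)              # W/m^2
    temp = 10.0 + 0.03 * irr + rng.normal(0, 3, n)   # deg C
    i1 = 0.008 * irr + rng.normal(0, 0.3, n)         # A, plays the role of I_sc
    i2 = 0.8 * i1 + rng.normal(0, 0.3, n)
    i3 = 0.6 * i1 + rng.normal(0, 0.3, n)
    voc = 37.0 - 0.12 * (temp - 25.0) + rng.normal(0, 0.3, n)
    pmp = 0.75 * i1 * voc
    return {"I1": i1, "I2": i2, "I3": i3, "Voc": voc, "Pmp": pmp,
            "irradiance": irr, "temperature": temp}

r = correlation_matrix(synthetic_features())
row, col = NAMES.index("I1"), NAMES.index("irradiance")
print(f"corr(I1, irradiance) = {r[row, col]:.2f}  (first row, sixth column)")
```

The entry printed here sits in the same position as the 0.77 quoted for Fig. 8, but its value reflects only the synthetic relationships above.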
The data set shown was selected from eight days (June 09, 2012 to June 16, 2012) of I-V curve data from the "UFS" site. On each day, there were 288 measured I-V curves (5 min time interval, 24 h a day). An I-V curve with fewer than ten measurement points is classified as "few.points." Likewise, I-V curves with a measured short circuit current of less than 1.1 A are classified as "small.amps." Figure 9 shows that, on each day, a large percentage of the 288 I-V measurements are classified as either few.points or small.amps because I-V measurements were made continuously day and night, including times of little or no module illumination. The vast majority of the remaining I-V curves are type I, yet there are type II I-V curves on each day. This finding supports the idea that many modules will undergo type II and type III behavior in typical installations, at least in a transient manner, and that those data can be harvested to gain insights into the performance heterogeneity. Further, mining these data for the transient behavior of bypass turn-on as a function of climate may indicate the prevalence of performance heterogeneity with climatic stressors. Therefore, for analysis tracking, the periodicity, or lack thereof, and the irradiance are critical measures of the response function of the modules to determine heterogeneity and to discern whether the bypassing is externally or internally caused. This work is ongoing and beyond the scope of this manuscript. Here, we focus mainly on the potential for massive data analytics to provide scientific insights into the mesoscopic solar cell performance.
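The labeling rules behind the bar plot can be written down compactly. The sketch below uses the two thresholds stated in the text (ten points and 1.1 A); the function names, the order in which the rules are applied, and the use of the first current sample as I sc are assumptions made for illustration.

```python
from collections import Counter

MIN_POINTS = 10      # fewer measurement points -> "few.points"
MIN_ISC_AMPS = 1.1   # lower short circuit current -> "small.amps"

def label_curve(currents, n_change_points):
    """Label one I-V curve from its current samples and change-point count."""
    if len(currents) < MIN_POINTS:
        return "few.points"
    i_sc = currents[0]  # assumption: the first sample is taken near V = 0
    if i_sc < MIN_ISC_AMPS:
        return "small.amps"
    labels = {0: "type I", 1: "type II", 2: "type III"}
    return labels.get(n_change_points, "unclassified")

def daily_counts(curves):
    """Tally labels for an iterable of (currents, n_change_points) pairs."""
    return Counter(label_curve(c, n) for c, n in curves)

# Toy day: one truncated sweep, one night-time sweep, three daytime sweeps.
day = [
    ([0.0] * 4, 0),
    ([0.2] * 20, 0),
    ([8.0] * 20, 0),
    ([8.0] * 20, 0),
    ([8.0] * 20, 1),
]
print(daily_counts(day))
```

At a 5 min interval, 24 h × 12 sweeps per hour gives the 288 curves per day quoted above, so a tally of this kind over one day should sum to 288.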

VI. CONCLUSIONS
Big data analytics is becoming a commonplace term today, and one that is being invoked in materials science. We have demonstrated the feasibility and novelty of studying massive I-V datasets spanning many years and including thousands of curves per year per module using machine learning techniques. These techniques were validated on a small set of data and are then used to devise data analytics pipelines that allow the analysis and modeling to be applied efficiently to large datasets. I-V curves collected from real-world research facilities show behaviors that are not captured by simplistic equivalent circuit models and analytical solving techniques, but these behaviors and their time dynamics are crucial to understanding heterogeneous degradation. The I-V curves form a linkage to the laboratory, where hypothetical degradation scenarios can be verified and a more conventional materials characterization process can be undertaken to identify and understand the behavior and dynamics of mechanisms. These linkages then allow for a nonbiased approach to mesoscale science applied rigorously to the vast scale of real-world photovoltaic deployment. Our future efforts in this area will involve classification and examination of the anomalous I-V curves collected from outdoor sites, further modeling of the system heterogeneity as a time-series metric, and comparison to the simulation-aided, laboratory-based confirmatory experiments.

FIG. 1. (Color online) Selection of I-V curves from Ref. 5 and typical associated degradation mechanisms. Red curves depict the beginning-of-life profile for a PV module, and the blue curves depict the degraded output.

FIG. 2. (Color online) Schematic from LTSPICE showing four solar cells in series, described by their single diode equivalent circuit model, with two Schottky bypass diodes.

FIG. 4. (Color online) Plot of the average power of a 156 × 156 mm solar cell vs time in exposure under conditions prescribed in the ASTM G154 testing protocol and a linear fit to the power.

FIG. 5. (Color online) Two series of I-V curves as a function of irradiance. The red series shows the curves acquired from the two cells that were new, and the blue curves were acquired from the two cells that contained one new cell and one heavily degraded cell.

FIG. 6. (Color online) Series of I-V curves acquired as a function of irradiance for the minimodule with bypass diodes fabricated by combining the two halves whose I-V curves are shown in Fig. 5 in blue. A comparison to simulated I-V curves depicted in red is shown. The simulated curves were seeded with values extracted from I-V curves of the two halves and the datasheet of the bypass diodes used in an attempt to match the experimental scenario closely.

FIG. 8. (Color online) Pairwise scatter and correlation matrix of several variables including the currents at the intercept of each change point, I1, I2, and I3 (note that I1 is equivalent to I sc), V oc, maximum power point, irradiance, and temperature. Variables and their histograms are shown along the matrix diagonal. The pairwise scatter plots of the variables are shown below the diagonal, and the corresponding pairwise linear correlation coefficients are shown above the diagonal. For example, the scatter plot of irradiance vs I sc is found in the first column, sixth row, and the corresponding linear correlation coefficient of I sc vs irradiance is 0.77, found in the first row, sixth column.