Statistics
New submissions
New submissions for Tue, 19 Oct 21
 [1] arXiv:2110.08363 [pdf, other]

Title: Spatiotemporal extreme event modeling of terror insurgencies
Subjects: Applications (stat.AP); Machine Learning (stat.ML)
Extreme events with potentially deadly outcomes, such as those organized by terror groups, are highly unpredictable in nature and an imminent threat to society. In particular, quantifying the likelihood of a terror attack occurring in an arbitrary space-time region, and its relative societal risk, would facilitate informed measures that strengthen national security. This paper introduces a novel self-exciting marked spatiotemporal model for attacks whose inhomogeneous baseline intensity is written as a function of covariates. Its triggering intensity is succinctly modeled with a Gaussian process prior distribution to flexibly capture intricate spatiotemporal dependencies between an arbitrary attack and previous terror events. By inferring the parameters of this model, we highlight specific space-time areas in which attacks are likely to occur. Furthermore, by measuring the outcome of an attack in terms of the number of casualties it produces, we introduce a novel mixture distribution for the number of casualties. This distribution flexibly handles both low and high casualty counts and the discrete nature of the data through a {\it generalized Zipf} distribution. We rely on a customized Markov chain Monte Carlo (MCMC) method to estimate the model parameters. We illustrate the methodology with data from the open-source Global Terrorism Database (GTD) corresponding to attacks in Afghanistan from 2013-2018. We show that our model is able to predict the intensity of future attacks for 2019-2021 while considering various covariates of interest, such as population density, the number of regional languages spoken, and the density of population supporting the opposing government.
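The self-exciting structure described above belongs to the Hawkes-process family. As a purely illustrative sketch (not the paper's model, which is a marked spatiotemporal process with a Gaussian-process triggering intensity), a one-dimensional temporal analogue with an exponential triggering kernel and made-up parameter values looks like this:

```python
import numpy as np

def hawkes_intensity(t, events, mu=0.5, alpha=0.8, beta=1.2):
    """Conditional intensity of a 1-D self-exciting (Hawkes) process:
    lambda(t) = mu + sum_{t_i < t} alpha * beta * exp(-beta * (t - t_i)),
    i.e. a constant baseline plus decaying excitation from past events."""
    past = events[events < t]
    return mu + np.sum(alpha * beta * np.exp(-beta * (t - past)))

events = np.array([1.0, 2.5, 2.7])        # hypothetical past event times
quiet = hawkes_intensity(0.5, events)     # before any event: baseline mu only
excited = hawkes_intensity(2.8, events)   # shortly after a cluster: elevated
```

The key qualitative behavior, that intensity spikes just after a cluster of events and relaxes back to the baseline, is what the spatiotemporal version exploits to flag high-risk space-time regions.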
 [2] arXiv:2110.08410 [pdf, ps, other]

Title: Covariate Adjustment in Regression Discontinuity Designs
Subjects: Methodology (stat.ME); Econometrics (econ.EM)
The Regression Discontinuity (RD) design is a widely used nonexperimental method for causal inference and program evaluation. While its canonical formulation only requires a score and an outcome variable, it is common in empirical work to encounter RD implementations where additional variables are used for adjustment. This practice has led to misconceptions about the role of covariate adjustment in RD analysis, from both methodological and empirical perspectives. In this chapter, we review the different roles of covariate adjustment in RD designs, and offer methodological guidance for its correct use in applications.
 [3] arXiv:2110.08411 [pdf, other]

Title: Multigroup Gaussian Processes
Subjects: Methodology (stat.ME); Applications (stat.AP)
Gaussian processes (GPs) are pervasive in functional data analysis, machine learning, and spatial statistics for modeling complex dependencies. Modern scientific data sets are typically heterogeneous and often contain multiple known discrete subgroups of samples. For example, in genomics applications samples may be grouped according to tissue type or drug exposure. In the modeling process it is desirable to leverage the similarity among groups while accounting for differences between them. While a substantial literature exists for GPs over Euclidean domains $\mathbb{R}^p$, GPs on domains suitable for multigroup data remain less explored. Here, we develop a multigroup Gaussian process (MGGP), which we define on $\mathbb{R}^p\times \mathscr{C}$, where $\mathscr{C}$ is a finite set representing the group label. We provide general methods to construct valid (positive definite) covariance functions on this domain, and we describe algorithms for inference, estimation, and prediction. We perform simulation experiments and apply MGGP to gene expression data to illustrate the behavior and advantages of the MGGP in the joint modeling of continuous and categorical variables.
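One simple way to obtain a positive-definite covariance on $\mathbb{R}^p\times \mathscr{C}$ is a separable product of a Euclidean kernel and a group kernel. The sketch below is a deliberately simplified stand-in (the paper constructs a broader family): it multiplies an RBF kernel by a factor that down-weights cross-group pairs by a hypothetical parameter `rho`.

```python
import numpy as np

def mggp_kernel(x1, x2, g1, g2, lengthscale=1.0, rho=0.5):
    """Separable covariance on R^p x C: RBF over the Euclidean coordinates
    times a group kernel equal to 1 within a group and rho across groups.
    For 0 <= rho <= 1 the group part is an equicorrelation (PSD) kernel,
    so the product is a valid covariance."""
    rbf = np.exp(-0.5 * np.sum((x1 - x2) ** 2) / lengthscale ** 2)
    group = 1.0 if g1 == g2 else rho
    return rbf * group

x = np.array([0.3, -1.0])
same = mggp_kernel(x, x, "tissueA", "tissueA")   # identical input, same group
cross = mggp_kernel(x, x, "tissueA", "tissueB")  # identical input, different group
```

Larger `rho` borrows more strength across groups; `rho = 0` reduces to independent per-group GPs.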
 [4] arXiv:2110.08418 [pdf, ps, other]

Title: Nuances in Margin Conditions Determine Gains in Active Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We consider nonparametric classification with smooth regression functions, where it is well known that notions of margin in $E[Y|X]$ determine fast or slow rates in both active and passive learning. Here we elucidate a striking distinction between the two settings. Namely, we show that some seemingly benign nuances in notions of margin (involving the uniqueness of the Bayes classifier, and having no apparent effect on rates in passive learning) determine whether or not any active learner can outperform passive learning rates. In particular, for Audibert-Tsybakov's margin condition (allowing general situations with non-unique Bayes classifiers), no active learner can gain over passive learning in commonly studied settings where the marginal on $X$ is near uniform. Our results thus negate the usual intuition from past literature that active rates should improve over passive rates in nonparametric settings.
 [5] arXiv:2110.08425 [pdf, other]

Title: Exact Bias Correction for Linear Adjustment of Randomized Controlled Trials
Subjects: Methodology (stat.ME); Econometrics (econ.EM)
In an influential critique of empirical practice, Freedman \cite{freedman2008A,freedman2008B} showed that the linear regression estimator was biased for the analysis of randomized controlled trials under the randomization model. Under Freedman's assumptions, we derive exact closed-form bias corrections for the linear regression estimator with and without treatment-by-covariate interactions. We show that the limiting distribution of the bias-corrected estimator is identical to that of the uncorrected estimator, implying that the asymptotic gains from adjustment can be attained without introducing any risk of bias. Taken together with results from Lin \cite{lin2013agnostic}, our results show that Freedman's theoretical arguments against the use of regression adjustment can be completely resolved with minor modifications to practice.
 [6] arXiv:2110.08449 [pdf, other]

Title: Adversarial Attacks on Gaussian Process Bandits
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Gaussian processes (GP) are a widely adopted tool used to sequentially optimize black-box functions, where evaluations are costly and potentially noisy. Recent works on GP bandits have proposed to move beyond random noise and devise algorithms robust to adversarial attacks. In this paper, we study this problem from the attacker's perspective, proposing various adversarial attack methods with differing assumptions on the attacker's strength and prior information. Our goal is to understand adversarial attacks on GP bandits from both a theoretical and practical perspective. We focus primarily on targeted attacks on the popular GP-UCB algorithm and a related elimination-based algorithm, based on adversarially perturbing the function $f$ to produce another function $\tilde{f}$ whose optima are in some region $\mathcal{R}_{\rm target}$. Based on our theoretical analysis, we devise both white-box attacks (known $f$) and black-box attacks (unknown $f$), with the former including a Subtraction attack and Clipping attack, and the latter including an Aggressive subtraction attack. We demonstrate that adversarial attacks on GP bandits can succeed in forcing the algorithm towards $\mathcal{R}_{\rm target}$ even with a low attack budget, and we compare our attacks' performance and efficiency on several real and synthetic functions.
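To convey the flavour of a subtraction-style targeted perturbation, here is a toy one-dimensional sketch (hypothetical construction and parameters, not the paper's Subtraction or Clipping attack, which are defined with explicit attack budgets): outside the target region, $f$ is capped just below its best value inside the region, so the perturbed optimum lands in the target.

```python
import numpy as np

def subtraction_style_attack(f, target_lo, target_hi, margin=0.1):
    """Return f_tilde equal to f on [target_lo, target_hi] and capped below
    the target region's peak elsewhere, forcing argmax into the target."""
    peak_in_target = max(f(x) for x in np.linspace(target_lo, target_hi, 201))
    cap = peak_in_target - margin
    def f_tilde(x):
        if target_lo <= x <= target_hi:
            return f(x)
        return min(f(x), cap)
    return f_tilde

f = lambda x: -(x - 2.0) ** 2                 # true optimum at x = 2
f_adv = subtraction_style_attack(f, 0.0, 1.0) # target region [0, 1]
grid = np.linspace(-1.0, 3.0, 401)
x_star = grid[np.argmax([f_adv(x) for x in grid])]
```

A bandit algorithm observing `f_adv` instead of `f` is steered toward the target region, which is the qualitative effect the paper analyzes.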
 [7] arXiv:2110.08500 [pdf, other]

Title: On Model Selection Consistency of Lasso for High-Dimensional Ising Models on Tree-like Graphs
Comments: 30 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
We consider the problem of high-dimensional Ising model selection using the neighborhood-based least absolute shrinkage and selection operator (Lasso). It is rigorously proved that, under some mild coherence conditions on the population covariance matrix of the Ising model, consistent model selection can be achieved with sample sizes $n=\Omega(d^3\log p)$ for any tree-like graph in the paramagnetic phase, where $p$ is the number of variables and $d$ is the maximum node degree. When the same conditions are imposed directly on the sample covariance matrices, it is shown that a reduced sample size $n=\Omega(d^2\log p)$ suffices. The obtained sufficient conditions for consistent model selection with Lasso have the same sample-complexity scaling as that of $\ell_1$-regularized logistic regression. Given the popularity and efficiency of Lasso, our rigorous analysis provides a theoretical backing for its practical use in Ising model selection.
 [8] arXiv:2110.08505 [pdf, other]

Title: Mode and Ridge Estimation in Euclidean and Directional Product Spaces: A Mean Shift Approach
Comments: 51 pages, 10 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
The set of local modes and the ridge lines estimated from a dataset are important summary characteristics of the data-generating distribution. In this work, we consider estimating the local modes and ridges from point cloud data in a product space comprising two or more Euclidean/directional metric spaces. Specifically, we generalize the well-known (subspace constrained) mean shift algorithm to the product space setting and illuminate some pitfalls of such a generalization. We derive the algorithmic convergence of the proposed method, provide practical guidelines on its implementation, and demonstrate its effectiveness on both simulated and real datasets.
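For readers unfamiliar with the base algorithm: one plain-Euclidean mean shift update moves a point to the kernel-weighted average of the data, which ascends the kernel density estimate toward a local mode. This is a minimal sketch of that classical step (the paper's contribution is its generalization to Euclidean/directional product spaces, which is not shown here):

```python
import numpy as np

def mean_shift_step(x, data, bandwidth=0.5):
    """One Gaussian-kernel mean-shift update: replace x with the
    kernel-weighted average of the data points."""
    w = np.exp(-0.5 * np.sum((data - x) ** 2, axis=1) / bandwidth ** 2)
    return (w[:, None] * data).sum(axis=0) / w.sum()

rng = np.random.default_rng(0)
data = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(200, 2))  # one cluster near (2, 2)
x = np.array([0.0, 0.0])
for _ in range(50):
    x = mean_shift_step(x, data)
# x has converged near the cluster's mode
```

Iterating the update from any start point converges to a local mode of the KDE; the subspace-constrained variant mentioned in the abstract instead projects each step to trace out ridges.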
 [9] arXiv:2110.08523 [pdf, ps, other]

Title: Spectral measures of empirical autocovariance matrices of high dimensional Gaussian stationary processes
Subjects: Statistics Theory (math.ST); Probability (math.PR)
Consider the empirical autocovariance matrix at a given non-zero time lag, based on observations from a multivariate complex Gaussian stationary time series. The spectral analysis of these autocovariance matrices can be useful in certain statistical problems, such as those related to testing for white noise. We study the behaviour of their spectral measures in the asymptotic regime where the time series dimension and the observation window length both grow to infinity at the same rate. Following a general framework from the spectral analysis of large random non-Hermitian matrices, we first obtain the probabilistic behaviour of the small singular values of shifted versions of the autocovariance matrix. This is then used to infer the large-sample behaviour of the empirical spectral measure of the autocovariance matrices at any lag. Matrix orthogonal polynomials on the unit circle play a crucial role in our study.
 [10] arXiv:2110.08570 [pdf, other]

Title: A Reduced-Bias Weighted Least Squares Estimation of the Extreme Value Index
Comments: 24 pages
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP)
In this paper, we propose a reduced-bias estimator of the extreme value index (EVI) for Pareto-type (heavy-tailed) distributions. It is derived using the weighted least squares method. The estimator is shown to be unbiased, consistent, and asymptotically normal under second-order conditions on the underlying distribution of the data. The finite-sample properties of the proposed estimator are studied through a simulation study. The results show that it is competitive with existing estimators of the extreme value index in terms of bias and mean squared error. In addition, it yields estimates of $\gamma>0$ that are less sensitive to the number of top-order statistics and hence can be used for selecting an optimal tail fraction. The proposed estimator is further illustrated using practical datasets from the pedochemical and insurance fields.
 [11] arXiv:2110.08648 [pdf]

Title: Minding non-collapsibility of odds ratios when recalibrating risk prediction models
Comments: 10 Pages, 1 Figure, 1 Appendix
Subjects: Applications (stat.AP)
In clinical prediction modeling, model updating refers to the practice of modifying a prediction model before it is used in a new setting. In the context of logistic regression for a binary outcome, one of the simplest updating methods is a fixed odds-ratio transformation of predicted risks to improve calibration-in-the-large. Previous authors have proposed equations for calculating this odds ratio based on the discrepancy between the prevalence in the original and the new population, or between the average of predicted and observed risks. We show that this method fails to consider the non-collapsibility of the odds ratio. Consequently, it under-corrects predicted risks, especially when predicted risks are more dispersed (i.e., for models with good discrimination). We suggest an approximate equation for recovering the conditional odds ratio from the mean and variance of predicted risks. Brief simulations and a case study show that this approach reduces such under-correction. R code for implementation is provided.
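The attenuation the abstract describes is easy to reproduce numerically: apply a fixed conditional odds ratio to every subject's predicted logit, then compute the odds ratio implied by the average risks. A minimal simulation sketch (illustrative only; this is not the paper's proposed correction equation, and the dispersion value is made up):

```python
import numpy as np

rng = np.random.default_rng(1)
logit_p = rng.normal(0.0, 2.0, size=100_000)      # dispersed predicted logits
p = 1 / (1 + np.exp(-logit_p))                    # baseline predicted risks

delta = np.log(2.0)                               # fixed conditional odds ratio of 2
p_shift = 1 / (1 + np.exp(-(logit_p + delta)))    # per-subject shifted risks

def odds(x):
    return x / (1 - x)

# Odds ratio computed from the *average* risks (the marginal quantity)
marginal_or = odds(p_shift.mean()) / odds(p.mean())
# marginal_or falls strictly between 1 and 2: the conditional OR of 2
# does not "collapse" to the population level when risks are dispersed.
```

Because the marginal odds ratio is smaller than the conditional one, a recalibration equation that targets the marginal discrepancy under-corrects the per-subject logits, which is the phenomenon the paper addresses.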
 [12] arXiv:2110.08665 [pdf, other]

Title: Quantile Regression by Dyadic CART
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
In this paper we propose and study a version of the Dyadic Classification and Regression Trees (DCART) estimator from Donoho (1997) for (fixed design) quantile regression in general dimensions. We refer to this proposed estimator as the QDCART estimator. Just like the mean regression version, we show that (a) a fast dynamic programming based algorithm with computational complexity $O(N \log N)$ exists for computing the QDCART estimator, and (b) an oracle risk bound (trading off squared error and a complexity parameter of the true signal) holds for the QDCART estimator. This oracle risk bound then allows us to demonstrate that the QDCART estimator enjoys adaptively rate-optimal estimation guarantees for piecewise constant and bounded variation function classes. In contrast to existing results for the DCART estimator, which require sub-Gaussianity of the error distribution, our estimation guarantees hold without any restrictive tail decay assumptions on the error distribution. For instance, our results hold even when the error distribution has no first moment, such as the Cauchy distribution. Apart from the Dyadic CART method, we also consider other variants, such as the Optimal Regression Tree (ORT) estimator introduced in Chatterjee and Goswami (2019). In particular, we also extend the ORT estimator to the quantile setting and establish that it enjoys analogous guarantees. Thus, this paper extends the scope of these globally optimal regression tree based methodologies to heavy-tailed data. We then perform extensive numerical experiments on both simulated and real data which illustrate the usefulness of the proposed methods.
 [13] arXiv:2110.08676 [pdf, other]

Title: Noise-Augmented Privacy-Preserving Empirical Risk Minimization with Dual-purpose Regularizer and Privacy Budget Retrieval and Recycling
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
We propose Noise-Augmented Privacy-Preserving Empirical Risk Minimization (NAPP-ERM), which solves ERM with differential privacy guarantees. Existing privacy-preserving ERM approaches may be subject to over-regularization with the employment of an $\ell_2$ term to achieve strong convexity on top of the target regularization. NAPP-ERM improves over current approaches and mitigates over-regularization by iteratively realizing the target regularization through appropriately designed augmented data and delivering strong convexity via a single adaptively weighted dual-purpose $\ell_2$ regularizer. When the target regularization is for variable selection, we propose a new regularizer that achieves both privacy and sparsity guarantees simultaneously. Finally, we propose a strategy to retrieve privacy budget when the strong convexity requirement is met; it can be returned to users so that the DP of ERM is guaranteed at a lower privacy cost than originally planned, or recycled into the ERM optimization procedure to reduce the injected DP noise and improve the utility of DP-ERM. From an implementation perspective, NAPP-ERM can be achieved by optimizing a non-perturbed objective function given noise-augmented data and can thus leverage existing tools for non-private ERM optimization. We illustrate through extensive experiments the mitigation of over-regularization and the privacy budget retrieval by NAPP-ERM on variable selection and prediction.
 [14] arXiv:2110.08747 [pdf, ps, other]

Title: JEL ratio test for independence of time to failure and cause of failure in competing risks
Subjects: Methodology (stat.ME)
In the present article, we propose a jackknife empirical likelihood (JEL) ratio test for testing the independence of time to failure and cause of failure in competing risks data. We use U-statistic theory to derive the JEL ratio test. The asymptotic distribution of the test statistic is shown to be chi-square with one degree of freedom. A Monte Carlo simulation study is carried out to assess the finite-sample behaviour of the proposed test. The performance of the proposed JEL test is compared with the test given in Dewan et al. (2004). Finally, we illustrate our test procedure using various real data sets.
 [15] arXiv:2110.08766 [pdf, ps, other]

Title: On minimax estimation problem for stationary stochastic sequences from observations in special sets of points
Comments: arXiv admin note: text overlap with arXiv:1804.08408
Subjects: Statistics Theory (math.ST)
The problem of the mean-square optimal estimation of linear functionals which depend on the unknown values of a stochastic stationary sequence, from observations of the sequence in special sets of points, is considered. Formulas for calculating the mean-square error and the spectral characteristic of the optimal linear estimate of the functionals are derived under the condition of spectral certainty, where the spectral density of the sequence is exactly known. The minimax (robust) method of estimation is applied in the case where the spectral density of the sequence is not known exactly while some sets of admissible spectral densities are given. Formulas that determine the least favourable spectral densities and the minimax spectral characteristics are derived for some special sets of admissible densities.
 [16] arXiv:2110.08849 [pdf, other]

Title: A Bayesian Selection Model for Correcting Outcome Reporting Bias With Application to a Meta-analysis on Heart Failure Interventions
Comments: 26 pages, 5 tables, 8 figures
Subjects: Applications (stat.AP)
Multivariate meta-analysis (MMA) is a powerful tool for jointly estimating multiple outcomes' treatment effects. However, the validity of results from MMA is potentially compromised by outcome reporting bias (ORB), or the tendency for studies to selectively report outcomes. Until recently, ORB has been understudied. Since ORB can lead to biased conclusions, it is crucial to correct the estimates of effect sizes and quantify their uncertainty in the presence of ORB. With this goal, we develop a Bayesian selection model to adjust for ORB in MMA. We further propose a measure for quantifying the impact of ORB on the results from MMA. We evaluate our approaches through a meta-evaluation of 748 bivariate meta-analyses from the Cochrane Database of Systematic Reviews. Our model is motivated by and applied to a meta-analysis of interventions on hospital readmission and quality of life for heart failure patients. In our analysis, the relative risk (RR) of hospital readmission for the intervention group changes from a significant decrease (RR: 0.931, 95% confidence interval [CI]: 0.862-0.993) to a statistically non-significant effect (RR: 0.955, 95% CI: 0.876-1.051) after adjusting for ORB. This study demonstrates that failing to account for ORB can lead to different conclusions in a meta-analysis.
 [17] arXiv:2110.08882 [pdf, ps, other]

Title: Building Degradation Index with Variable Selection for Multivariate Sensory Data
Comments: 28 pages
Subjects: Applications (stat.AP)
The modeling and analysis of degradation data have been an active research area in reliability and system health management. As sensor technology advances, multivariate sensory data are commonly collected for the underlying degradation process. However, most existing research on degradation modeling requires a univariate degradation index to be provided. Thus, constructing a degradation index for multivariate sensory data is a fundamental step in degradation modeling. In this paper, we propose a novel degradation index building method for multivariate sensory data. Based on an additive nonlinear model with variable selection, the proposed method can automatically select the most informative sensor signals to be used in the degradation index. A penalized likelihood method with an adaptive group penalty is developed for parameter estimation. We demonstrate that the proposed method outperforms existing methods via both simulation studies and analyses of the NASA jet engine sensor data.
 [18] arXiv:2110.08884 [pdf, other]

Title: Persuasion by Dimension Reduction
Comments: arXiv admin note: text overlap with arXiv:2102.10909
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); General Economics (econ.GN); Statistics Theory (math.ST); Methodology (stat.ME)
How should an agent (the sender) observing multidimensional data (the state vector) persuade another agent to take the desired action? We show that it is always optimal for the sender to perform a (nonlinear) dimension reduction by projecting the state vector onto a lowerdimensional object that we call the "optimal information manifold." We characterize geometric properties of this manifold and link them to the sender's preferences. Optimal policy splits information into "good" and "bad" components. When the sender's marginal utility is linear, revealing the full magnitude of good information is always optimal. In contrast, with concave marginal utility, optimal information design conceals the extreme realizations of good information and only reveals its direction (sign). We illustrate these effects by explicitly solving several multidimensional Bayesian persuasion problems.
 [19] arXiv:2110.08905 [pdf, other]

Title: Exploitation of error correlation in a large analysis validation: GlobCurrent case study
Authors: Richard E. Danielson, Johnny A. Johannessen, Graham D. Quartly, Marie-Hélène Rio, Bertrand Chapron, Fabrice Collard, Craig Donlon
Comments: 24 pages, 14 figures
Journal-ref: Remote Sens. Environ., 217, 476-490 (2018)
Subjects: Applications (stat.AP); Statistics Theory (math.ST); Methodology (stat.ME)
An assessment of variance in ocean current signal and noise shared by in situ observations (drifters) and a large gridded analysis (GlobCurrent) is sought as a function of day of the year for 1993-2015 and across a broad spectrum of current speed. Regardless of the division of collocations, it is difficult to claim that any synoptic assessment can be based on independent observations. Instead, a measurement model that departs from ordinary linear regression by accommodating error correlation is proposed. The interpretation of independence is explored by applying Fuller's (1987) concept of equation and measurement error to a division of error into shared (correlated) and unshared (uncorrelated) components, respectively. The resulting division of variance in the new model favours noise. Ocean current shared (equation) error is of comparable magnitude to unshared (measurement) error, and the latter is, for GlobCurrent and drifters respectively, comparable to ordinary and reverse linear regression. Although signal variance appears to be small, its utility as a measure of agreement between two variates is highlighted.
Sparse collocations that sample a dense grid permit a first-order autoregressive form of measurement model to be considered, including parameterizations of analysis-in-situ error cross-correlation and analysis temporal error autocorrelation. The former (cross-correlation) is an equation error term that accommodates error shared by both GlobCurrent and drifters. The latter (autocorrelation) facilitates an identification and retrieval of all model parameters. Solutions are sought using a prescribed calibration between GlobCurrent and drifters (by variance matching). Because the true current variance of GlobCurrent and drifters is small, the signal-to-noise ratio is near zero at best. This is particularly evident for moderate current speed and the meridional current component.
 [20] arXiv:2110.08936 [pdf, ps, other]

Title: Rejoinder: Learning Optimal Distributionally Robust Individualized Treatment Rules
Journal-ref: Journal of the American Statistical Association, 116:534, 699-707 (2021)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We are grateful for the opportunity offered by the editors for this discussion, and we thank the discussants for their insightful comments and thoughtful contributions. We also want to congratulate Kallus (2020) on his inspiring work in improving the efficiency of policy learning by retargeting. Motivated by the discussion in Dukes and Vansteelandt (2020), we first point out interesting connections and distinctions between our work and Kallus (2020) in Section 1. In particular, the assumptions and sources of variation considered in these two papers lead to different research problems with different scopes and focuses. In Section 2, following the discussions in Li et al. (2020) and Liang and Zhao (2020), we also consider the efficient policy evaluation problem when some data from the testing distribution are available at the training stage. We show that, under the assumption that the sample sizes from training and testing grow at the same order, efficient value function estimates can deliver competitive performance. We further show some connections between these estimates and the existing literature. However, when the testing sample size available for training grows at a slower order, efficient value function estimates may no longer perform well. In contrast, the requirement on the testing sample size for DR-ITR is not as strong as that of efficient policy evaluation using the combined data. Finally, we highlight the general applicability and usefulness of DR-ITR in Section 3.
 [21] arXiv:2110.08967 [pdf, other]

Title: Assessing Ecosystem State Space Models: Identifiability and Estimation
Subjects: Applications (stat.AP); Quantitative Methods (q-bio.QM)
Bayesian methods are increasingly being applied to parameterize mechanistic process models used in environmental prediction and forecasting. In particular, models describing ecosystem dynamics with multiple states that are linear and autoregressive at each step in time can be treated as statistical state space models. In this paper we examine this subset of ecosystem models, giving closed form Gibbs sampling updates for latent states and process precision parameters when process and observation errors are normally distributed. We use simulated data from an example model (DALECev) to assess the performance of parameter estimation and identifiability under scenarios of gaps in observations. We show that process precision estimates become unreliable as temporal gaps between observed state data increase. To improve estimates, particularly precisions, we introduce a method of tuning the timestep of the latent states to leverage higher-frequency driver information. Further, we show that data cloning is a suitable method for assessing parameter identifiability in this class of models. Overall, our study helps inform the application of state space models to ecological forecasting applications where 1) data are not available for all states and transfers at the operational timestep for the ecosystem model and 2) process uncertainty estimation is desired.
 [22] arXiv:2110.08969 [pdf, ps, other]

Title: On completing a measurement model by symmetry
Authors: Richard E. Danielson
Comments: 4 pages
Subjects: Applications (stat.AP); Statistics Theory (math.ST); Methodology (stat.ME)
An appeal for symmetry is made to build established notions of specific representation and specific nonlinearity of measurement (often called model error) into a canonical linear regression model. Additive components are derived from the trivially complete model M = m. Factor analysis and equation error motivate corresponding notions of representation and nonlinearity in an errors-in-variables framework, with a novel interpretation of terms. It is suggested that a modern interpretation of correlation involves both linear and nonlinear association.
 [23] arXiv:2110.08970 [pdf, other]

Title: Sample size calculations for n-of-1 trials
Subjects: Methodology (stat.ME); Applications (stat.AP)
N-of-1 trials, single-participant trials in which multiple treatments are sequentially randomized over the study period, can give direct estimates of individual-specific treatment effects. Combining n-of-1 trials gives extra information for estimating the population average treatment effect compared with randomized controlled trials and increases precision for individual-specific treatment effect estimates. In this paper, we present a procedure for designing n-of-1 trials. We formally define the design components for determining the sample size of a series of n-of-1 trials, present models for analyzing these trials, and use them to derive the sample size formula for estimating the population average treatment effect and the standard error of the individual-specific treatment effect estimates. We recommend first finding the possible designs that will satisfy the power requirement for estimating the population average treatment effect and then, if of interest, finalizing the design to also satisfy the standard error requirements for the individual-specific treatment effect estimates. The procedure is implemented and illustrated in the paper and through a Shiny app.
 [24] arXiv:2110.08989 [pdf, other]

Title: Valid and Exact Statistical Inference for Multidimensional Multiple Change-Points by Selective Inference
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
In this paper, we study statistical inference of change-points (CPs) in a multidimensional sequence. In CP detection from a multidimensional sequence, it is often desirable not only to detect the location but also to identify the subset of the components in which the change occurs. Several algorithms have been proposed for such problems, but no valid exact inference method has been established to evaluate the statistical reliability of the detected locations and components. In this study, we propose a method that can guarantee the statistical reliability of both the location and the components of the detected changes. We demonstrate the effectiveness of the proposed method by applying it to the problems of genomic abnormality identification and human behavior analysis.
 [25] arXiv:2110.09013 [pdf, other]

Title: A Space-time Model for Inferring A Susceptibility Map for An Infectious Disease
Subjects: Applications (stat.AP)
Motivated by foot-and-mouth disease (FMD) outbreak data from Turkey, we develop a model to estimate disease risk based on a space-time record of outbreaks. The spread of infectious disease in geographical units depends on both transmission between neighbouring units and the intrinsic susceptibility of each unit to an outbreak. Spatially correlated susceptibility may arise from known factors, such as population density, or unknown (or unmeasured) factors such as commuter flows, environmental conditions, or health disparities. Our framework accounts for both space-time transmission and susceptibility. We model the unknown spatially correlated susceptibility as a Gaussian process. We show that the susceptibility surface can be estimated from observed, geolocated time series of infection events, and we use a projection-based dimension reduction approach which improves computational efficiency. In addition to identifying high-risk regions from the Turkey FMD data, we also study how our approach works on the well-known England-Wales measles outbreak data; the latter study results in an estimated susceptibility surface that is strongly correlated with population size, consistent with prior analyses.
 [26] arXiv:2110.09040 [pdf, ps, other]

Title: A Bayesian approach to multi-task learning with network lasso Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)
Network lasso is a method for solving a multi-task learning problem through the regularized maximum likelihood method. A characteristic of network lasso is that it fits a different model to each sample, with the relationships among the models represented by relational coefficients. A crucial issue in network lasso is providing appropriate values for these relational coefficients. In this paper, we propose a Bayesian approach to solving multi-task learning problems by network lasso. This approach allows us to determine the relational coefficients objectively by Bayesian estimation. The effectiveness of the proposed method is shown in a simulation study and a real data analysis.
 [27] arXiv:2110.09042 [pdf, other]

Title: Kernel-based estimation for partially functional linear model: Minimax rates and randomized sketches Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
This paper considers the partially functional linear model (PFLM), where the predictive features consist of a functional covariate and a high-dimensional scalar vector. Over an infinite-dimensional reproducing kernel Hilbert space, the proposed estimation for PFLM is a least squares approach with two mixed regularizations: a function-norm and an $\ell_1$-norm. Our main task is to establish the minimax rates for PFLM in the high-dimensional setting; the optimal minimax rates of estimation are established using various techniques from empirical process theory for analyzing kernel classes. In addition, we propose an efficient numerical algorithm based on randomized sketches of the kernel matrix. Several numerical experiments are implemented to support our method and optimization strategy.
 [28] arXiv:2110.09115 [pdf, other]

Title: Optimal designs for experiments for scalar-on-function linear models Subjects: Methodology (stat.ME)
The aim of this work is to extend the usual optimal experimental design paradigm to experiments where the settings of one or more factors are functions. For these new experiments, a design consists of combinations of functions for each run of the experiment along with settings for non-functional variables. After briefly introducing the class of functional variables, basis function systems are described. Basis function expansion is applied to a functional linear model consisting of both functional and scalar factors, reducing the problem to an optimisation over a single design matrix.
 [29] arXiv:2110.09143 [pdf, other]

Title: Variance Reduction in Stochastic Reaction Networks using Control Variates Comments: arXiv admin note: substantial text overlap with arXiv:1905.00854 Subjects: Methodology (stat.ME); Systems and Control (eess.SY); Molecular Networks (q-bio.MN); Quantitative Methods (q-bio.QM)
Monte Carlo estimation plays a crucial role in stochastic reaction networks. However, reducing the statistical uncertainty of the corresponding estimators requires sampling a large number of trajectories. We propose control variates based on the statistical moments of the process to reduce the estimators' variances. We develop an algorithm that selects an efficient subset from infinitely many control variates. To this end, the algorithm uses resampling and a redundancy-aware greedy selection. We demonstrate the efficiency of our approach in several case studies.
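The general control-variate idea underlying this abstract can be illustrated with a minimal sketch (this is the textbook estimator, not the paper's moment-based selection algorithm; the function name and inputs are illustrative):

```python
def control_variate_estimate(samples, f, g, g_mean):
    """Estimate E[f(X)] using g(X), whose mean g_mean is known, as a control variate."""
    fs = [f(x) for x in samples]
    gs = [g(x) for x in samples]
    n = len(samples)
    f_bar = sum(fs) / n
    g_bar = sum(gs) / n
    # Variance-optimal coefficient c = Cov(f, g) / Var(g), estimated from the samples.
    cov = sum((a - f_bar) * (b - g_bar) for a, b in zip(fs, gs)) / n
    var_g = sum((b - g_bar) ** 2 for b in gs) / n
    c = cov / var_g if var_g > 0 else 0.0
    # Correct the plain Monte Carlo mean by the observed deviation of g from its known mean.
    return f_bar - c * (g_bar - g_mean)
```

When `f` is perfectly correlated with `g` (e.g. `f` linear in `g`), the correction removes all sampling noise and the estimate is exact; more generally, the variance shrinks by the squared correlation between `f(X)` and `g(X)`.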
 [30] arXiv:2110.09167 [pdf, other]

Title: RKHS-SHAP: Shapley Values for Kernel Methods Comments: 11 pages, 4 figures Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Feature attribution for kernel methods is often heuristic and not individualised for each prediction. To address this, we turn to the concept of Shapley values, a coalition game-theoretical framework that has previously been applied to different machine learning model interpretation tasks, such as linear models, tree ensembles and deep networks. By analysing Shapley values from a functional perspective, we propose \textsc{RKHS-SHAP}, an attribution method for kernel machines that can efficiently compute both \emph{Interventional} and \emph{Observational Shapley values} using kernel mean embeddings of distributions. We show theoretically that our method is robust with respect to local perturbations, a key yet often overlooked desideratum for interpretability. Further, we propose a \emph{Shapley regulariser}, applicable to a general empirical risk minimisation framework, which allows learning while controlling the level of specific features' contributions to the model. We demonstrate that the Shapley regulariser enables learning that is robust to covariate shift of a given feature and fair learning that controls the Shapley values of sensitive features.
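For context, the Shapley value itself admits a compact exact computation when the number of features is small (this sketch is the standard definition over all coalitions, not the kernel-embedding estimator proposed in the abstract):

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values for a coalition value function v(S), S a frozenset of feature indices."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                S = frozenset(S)
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                w = factorial(len(S)) * factorial(n_features - len(S) - 1) / factorial(n_features)
                # Marginal contribution of feature i to coalition S.
                phi[i] += w * (value_fn(S | {i}) - value_fn(S))
    return phi
```

For an additive game, each feature's Shapley value is exactly its own weight, and the values always sum to the grand-coalition value (the "efficiency" property); the exponential cost in `n_features` is precisely what kernel-based approximations aim to avoid.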
 [31] arXiv:2110.09275 [pdf, ps, other]

Title: Double Robust Mass-Imputation with Matching Estimators Authors: Ali Furkan Kalay Subjects: Methodology (stat.ME)
This paper proposes a method named Double Score Matching (DSM) for mass-imputation and presents an application to inference with a non-probability sample. DSM is a $k$-Nearest Neighbors algorithm that uses two balance scores instead of covariates to reduce the dimension of the distance metric and thus achieve a faster convergence rate. DSM mass-imputation and population inference are consistent if one of the two balance score models is correctly specified. Simulation results show that DSM performs better than recently developed double robust estimators when the data generating process (DGP) has nonlinear confounders. The nonlinearity of the DGP is a major concern because it cannot be tested, and it leads to a violation of the assumptions required to achieve consistency. Even though the consistency of DSM relies on the two modeling assumptions, it prevents bias from inflating in such cases because DSM is a semiparametric estimator. The confidence intervals are constructed using a wild bootstrapping approach. The proposed bootstrapping method generates valid confidence intervals as long as DSM is consistent.
 [32] arXiv:2110.09333 [pdf, other]

Title: Regression with Missing Data, a Comparison Study of Techniques Based on Random Forests Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
In this paper we present the practical benefits of a new random forest algorithm to deal with missing values in the sample. The purpose of this work is to compare the different solutions to deal with missing values with random forests and describe our new algorithm's performance as well as its algorithmic complexity. A variety of missing value mechanisms (such as MCAR, MAR, MNAR) are considered and simulated. We study the quadratic errors and the bias of our algorithm and compare it to the most popular missing-value random forest algorithms in the literature. In particular, we compare those techniques for both regression and prediction purposes. This work follows a first paper, Gomez-Mendez and Joly (2020), on the consistency of this new algorithm.
 [33] arXiv:2110.09360 [pdf, other]

Title: Prediction of liquid fuel properties using machine learning models with Gaussian processes and probabilistic conditional generative learning Authors: Rodolfo S. M. Freitas, Ágatha P. F. Lima, Cheng Chen, Fernando A. Rochinha, Daniel Mira, Xi Jiang Comments: 22 pages, 13 figures Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Accurate determination of fuel properties of complex mixtures over a wide range of pressure and temperature conditions is essential to utilizing alternative fuels. The present work aims to construct cheap-to-compute machine learning (ML) models to act as closure equations for predicting the physical properties of alternative fuels. These models can be trained using databases from MD simulations and/or experimental measurements in a data fusion-fidelity approach. Here, Gaussian Process (GP) and probabilistic generative models are adopted. GP is a popular non-parametric Bayesian approach to building surrogate models, mainly due to its capacity to handle aleatory and epistemic uncertainties. Generative models have shown the ability of deep neural networks employed with the same intent. In this work, the ML analysis focuses on a particular property, the fuel density, but it can also be extended to other physicochemical properties. This study explores the versatility of the ML models in handling multi-fidelity data. The results show that ML models can accurately predict fuel properties over a wide range of pressure and temperature conditions.
 [34] arXiv:2110.09361 [pdf, other]

Title: Efficient Exploration in Binary and Preferential Bayesian Optimization Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
Bayesian optimization (BO) is an effective approach to optimizing expensive black-box functions that seeks to trade off exploitation (selecting parameters where the maximum is likely) against exploration (selecting parameters where we are uncertain about the objective function). In many real-world situations, direct measurements of the objective function are not possible, and only binary measurements such as success/failure or pairwise comparisons are available. To perform efficient exploration in this setting, we show that it is important for BO algorithms to distinguish between different types of uncertainty: epistemic uncertainty, about the unknown objective function, and aleatoric uncertainty, which comes from noisy observations and cannot be reduced. In effect, only the former is important for efficient exploration. Based on this, we propose several new acquisition functions that outperform state-of-the-art heuristics in binary and preferential BO, while being fast to compute and easy to implement. We then generalize these acquisition rules to batch learning, where multiple queries are performed simultaneously.
 [35] arXiv:2110.09382 [pdf, other]

Title: Frequentist-Bayes Hybrid Covariance Estimation for Unfolding Problems Authors: Pim Jordi Verschuuren Subjects: Methodology (stat.ME); High Energy Physics - Experiment (hep-ex)
In this paper we present a frequentist-Bayesian hybrid method for estimating covariances of unfolded distributions using pseudo-experiments. The method is compared with other covariance estimation methods using the unbiased Rao-Cramer bound (RCB) and frequentist pseudo-experiments. We show that the unbiased RCB method diverges from the other two methods when regularization is introduced. The new hybrid method agrees well with the frequentist pseudo-experiment method for various amounts of regularization. However, the hybrid method has the added advantage of not requiring a clear likelihood definition, and it can be used in combination with any unfolding algorithm that uses a response matrix to model the detector response.
 [36] arXiv:2110.09497 [pdf, other]

Title: Gradient boosting with extreme-value theory for wildfire prediction Authors: Jonathan Koh Subjects: Applications (stat.AP)
This paper details the approach of the team $\textit{Kohrrelation}$ in the 2021 Extreme Value Analysis data challenge, dealing with the prediction of wildfire counts and sizes over the contiguous US. Our approach uses ideas from extreme-value theory in a machine learning context with theoretically justified loss functions for gradient boosting. We devise a spatial cross-validation scheme and show that, in our setting, it provides a better proxy for test set performance than naive cross-validation. The predictions are benchmarked against boosting approaches with different loss functions and perform competitively in terms of the score criterion, finally placing second in the competition ranking.
 [37] arXiv:2110.09502 [pdf, other]

Title: Minimum $\ell_{1}$-norm interpolators: Precise asymptotics and multiple descent Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
An evolving line of machine learning works observes empirical evidence suggesting that interpolating estimators, those that achieve zero training error, may not necessarily be harmful. This paper pursues a theoretical understanding of an important type of interpolator: the minimum $\ell_{1}$-norm interpolator, which is motivated by the observation that several learning algorithms favor low $\ell_1$-norm solutions in the overparameterized regime. Concretely, we consider the noisy sparse regression model under Gaussian design, focusing on linear sparsity and high-dimensional asymptotics (so that both the number of features and the sparsity level scale proportionally with the sample size).
We observe, and provide rigorous theoretical justification for, a curious multi-descent phenomenon: the generalization risk of the minimum $\ell_1$-norm interpolator undergoes multiple (and possibly more than two) phases of descent and ascent as one increases the model capacity. This phenomenon stems from the special structure of the minimum $\ell_1$-norm interpolator as well as the delicate interplay between the overparameterization ratio and the sparsity, thus unveiling a fundamental distinction in geometry from the minimum $\ell_2$-norm interpolator. Our finding is built upon an exact characterization of the risk behavior, which is governed by a system of two nonlinear equations with two unknowns.
Cross-lists for Tue, 19 Oct 21
 [38] arXiv:2008.08342 (cross-list from cond-mat.dis-nn) [pdf, other]

Title: Structure Learning in Inverse Ising Problems Using $\ell_2$-Regularized Linear Estimator Comments: 35 pages, 8 figures Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG); Machine Learning (stat.ML)
The inference performance of the pseudo-likelihood method is discussed in the framework of the inverse Ising problem when the $\ell_2$-regularized (ridge) linear regression is adopted. This setup is introduced to theoretically investigate the situation where the data generation model differs from the inference model, namely the model mismatch situation. In the teacher-student scenario, under the assumption that the teacher couplings are sparse, the analysis is conducted using the replica and cavity methods, with a special focus on whether the presence/absence of teacher couplings is correctly inferred. The result indicates that, despite the model mismatch, one can perfectly identify the network structure using naive linear regression without regularization when the number of spins $N$ is smaller than the dataset size $M$, in the thermodynamic limit $N\to \infty$. Further, to access the underdetermined region $M < N$, we examine the effect of the $\ell_2$ regularization and find that biases appear in all the coupling estimates, preventing the perfect identification of the network structure. The biases are, however, shown to decay exponentially fast as the distance from the center spin chosen in the pseudo-likelihood method grows. Based on this finding, we propose a two-stage estimator: in the first stage, ridge regression is used and the estimates are pruned by a relatively small threshold; in the second stage, naive linear regression is conducted only on the remaining couplings, and the resultant estimates are again pruned by another, relatively large threshold. This estimator, with the appropriate regularization coefficient and thresholds, is shown to achieve perfect identification of the network structure even for $0<M/N<1$. Results of extensive numerical experiments support these findings.
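The two-stage estimator described above (ridge plus a small threshold, then plain least squares on the survivors plus a larger threshold) can be sketched generically as follows; the function name, thresholds, and the sparse-regression test setting are illustrative, not the paper's Ising-specific implementation:

```python
import numpy as np

def two_stage_estimate(X, y, lam, t1, t2):
    """Two-stage sparse estimate: ridge + small prune, then OLS on survivors + large prune."""
    d = X.shape[1]
    # Stage 1: ridge regression, then prune coefficients below the small threshold t1.
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    support = np.flatnonzero(np.abs(w_ridge) > t1)
    # Stage 2: unregularized least squares restricted to the surviving coordinates,
    # then prune again with the larger threshold t2.
    w = np.zeros(d)
    if support.size:
        w_ols, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        w_ols[np.abs(w_ols) <= t2] = 0.0
        w[support] = w_ols
    return w
```

The second, unregularized stage removes the shrinkage bias that ridge introduces on the retained coordinates, which is the motivation the abstract gives for the two-stage construction.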
 [39] arXiv:2110.08331 (cross-list from cs.LG) [pdf, other]

Title: A New Approach for Interpretability and Reliability in Clinical Risk Prediction: Acute Coronary Syndrome Scenario Authors: Francisco Valente, Jorge Henriques, Simão Paredes, Teresa Rocha, Paulo de Carvalho, João Morais Comments: Accepted for publication in the Artificial Intelligence in Medicine journal. Abstract abridged to respect the arXiv's characters limit Journal-ref: Artificial Intelligence in Medicine, Volume 117, 2021 Subjects: Machine Learning (cs.LG); Applications (stat.AP); Methodology (stat.ME)
We intend to create a new risk assessment methodology that combines the best characteristics of both risk score and machine learning models. More specifically, we aim to develop a method that, besides having good performance, offers a personalized model and outcome for each patient, presents high interpretability, and incorporates an estimation of the prediction reliability which is not usually available. By combining these features in the same approach we expect that it can boost the confidence of physicians to use such a tool in their daily activity. In order to achieve the mentioned goals, a three-step methodology was developed: several rules were created by dichotomizing risk factors; such rules were trained with a machine learning classifier to predict the acceptance degree of each rule (the probability that the rule is correct) for each patient; that information was combined and used to compute the risk of mortality and the reliability of such prediction. The methodology was applied to a dataset of patients admitted with any type of acute coronary syndrome (ACS), to assess the 30-day all-cause mortality risk. The performance was compared with state-of-the-art approaches: logistic regression (LR), artificial neural network (ANN), and clinical risk score model (Global Registry of Acute Coronary Events, GRACE). The proposed approach achieved testing results identical to the standard LR, but offers superior interpretability and personalization; it also significantly outperforms the GRACE risk model and the standard ANN model. The calibration curve also suggests a very good generalization ability of the obtained model, as it approaches the ideal curve. Finally, the reliability estimation of individual predictions presented a strong correlation with the misclassification rate. Those properties may have a beneficial application in other clinical scenarios as well. [abridged]
 [40] arXiv:2110.08348 (cross-list from q-bio.PE) [pdf, other]

Title: Estimating individual admixture from finite reference databases Comments: 17 pages, 3 figures Subjects: Populations and Evolution (q-bio.PE); Statistics Theory (math.ST)
The concept of individual admixture (IA) assumes that the genome of each individual is composed of alleles inherited from $K$ ancestral populations. Each copy of each allele has the same chance $q_k$ to originate from population $k$, and together with the allele frequencies $p$ in all populations this comprises the admixture model, which is the basis for software like {\sc STRUCTURE} and {\sc ADMIXTURE}. Here, we assume that $p$ is given through a finite reference database, and $q$ is estimated via maximum likelihood. Above all, we are interested in efficient estimation of $q$ and in the variance of the estimator which originates from the finiteness of the reference database, i.e.\ a variance in $p$. We provide a central limit theorem for the maximum-likelihood estimator, give simulation results, and discuss applications in forensic genetics.
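For intuition, the likelihood in the admixture model for a single individual and $K=2$ populations is a product over observed allele copies of $q\,p_{1j} + (1-q)\,p_{2j}$; a minimal grid-search MLE sketch (illustrative only, not the estimator analyzed in the paper, and ignoring uncertainty in the reference frequencies) looks like this:

```python
import numpy as np

def admixture_mle_k2(p1, p2, grid_size=1001):
    """Grid-search MLE of the admixture proportion q for K=2 populations.
    p1, p2: arrays of reference-population frequencies of the observed allele copies."""
    qs = np.linspace(0.0, 1.0, grid_size)
    # Log-likelihood of each candidate q: sum_j log(q*p1_j + (1-q)*p2_j).
    ll = np.array([np.sum(np.log(q * p1 + (1 - q) * p2 + 1e-300)) for q in qs])
    return qs[np.argmax(ll)]
```

When every observed allele is more frequent in population 1 than in population 2, each likelihood factor is increasing in $q$, so the MLE sits at the boundary $q=1$; the paper's central limit theorem concerns exactly how noise in `p1`/`p2` (a finite reference database) propagates into the variance of such an estimate.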
 [41] arXiv:2110.08577 (cross-list from math.OC) [pdf, other]

Title: NysCurve: Nyström-Approximated Curvature for Stochastic Optimization Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
Quasi-Newton methods generally provide curvature information by approximating the Hessian using the secant equation. However, the secant equation becomes insipid in approximating the Newton step owing to its use of first-order derivatives. In this study, we propose an approximate Newton-step-based stochastic optimization algorithm for large-scale empirical risk minimization of convex functions with linear convergence rates. Specifically, we compute a partial column Hessian of size $d\times k$ with $k\ll d$ randomly selected variables, then use the \textit{Nystr\"om method} to better approximate the full Hessian matrix. To further reduce the computational complexity per iteration, we directly compute the update step ($\Delta\boldsymbol{w}$) without computing and storing the full Hessian or its inverse. Furthermore, to address large-scale scenarios in which even computing a partial Hessian may require significant time, we use distribution-preserving (DP) sub-sampling to compute the partial Hessian. DP sub-sampling generates $p$ sub-samples with similar first- and second-order distribution statistics and selects a single sub-sample at each epoch in a round-robin manner to compute the partial Hessian. We integrate our approximated Hessian with stochastic gradient descent and stochastic variance-reduced gradients to solve the logistic regression problem. The numerical experiments show that the proposed approach is able to obtain a better approximation of Newton's method, with performance competitive with the state-of-the-art first-order and stochastic quasi-Newton methods.
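The column-sampled Nyström step at the heart of this approach can be sketched generically: given $k$ sampled columns $C$ of a PSD matrix $H$ and the $k\times k$ intersection block $W$, the approximation is $H \approx C W^{+} C^{\top}$. A minimal sketch (illustrative, not the paper's NysCurve implementation, which additionally avoids forming the full approximation):

```python
import numpy as np

def nystrom_approx(H, idx):
    """Nyström approximation of a PSD matrix from the columns indexed by idx."""
    C = H[:, idx]               # sampled columns, shape (d, k)
    W = H[np.ix_(idx, idx)]     # k x k intersection block
    # H ≈ C W^+ C^T; exact when rank(H) <= k and the sampled columns span range(H).
    return C @ np.linalg.pinv(W) @ C.T
```

For a rank-$k$ PSD matrix whose sampled columns span its range, the reconstruction is exact, which is why a small $k \ll d$ can capture low-rank curvature cheaply.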
 [42] arXiv:2110.08600 (cross-list from eess.SP) [pdf, other]

Title: PDMM: A novel Primal-Dual Majorization-Minimization algorithm for Poisson Phase-Retrieval problem Subjects: Signal Processing (eess.SP); Optimization and Control (math.OC); Computation (stat.CO)
In this paper, we introduce a novel iterative algorithm for the phase-retrieval problem, where the measurements consist of only the magnitude of a linear function of the unknown signal and the noise in the measurements follows a Poisson distribution. The proposed algorithm is based on the principle of majorization-minimization (MM); however, the application of MM here is novel and distinct from the way MM has usually been used to solve optimization problems in the literature. More precisely, we reformulate the original minimization problem into a saddle point problem by invoking the Fenchel dual representation of the $\log(\cdot)$ term in the Poisson likelihood function. We then propose tighter surrogate functions over both primal and dual variables, resulting in a double-loop MM algorithm, which we have named the Primal-Dual Majorization-Minimization (PDMM) algorithm. The iterative steps of the resulting algorithm are simple to implement and involve only computing matrix-vector products. We also extend our algorithm to handle various L1-regularized Poisson phase-retrieval problems (which exploit sparsity). The proposed algorithm is compared with previously proposed algorithms such as Wirtinger flow (WF), conventional MM, and the alternating direction method of multipliers (ADMM) for the Poisson data model. The simulation results under different experimental settings show that PDMM is faster than the competing methods, and its performance in recovering the original signal is on par with the state-of-the-art algorithms.
 [43] arXiv:2110.08605 (cross-list from cs.DL) [pdf, other]

Title: Statistics in everyone's backyard: an impact study via citation network analysis Subjects: Digital Libraries (cs.DL); Applications (stat.AP)
The increasing availability of curated citation data provides a wealth of resources for analyzing and understanding the intellectual influence of scientific publications. In the field of statistics, current studies of citation data have mostly focused on the interactions between statistical journals and papers, limiting the measure of influence to mainly within statistics itself. In this paper, we take the first step towards understanding the impact statistics has made on other scientific fields in the era of Big Data. By collecting comprehensive bibliometric data from the Web of Science database for selected statistical journals, we investigate the citation trends and compositions of citing fields over time to show that their diversity has been increasing. Furthermore, we use the local clustering technique involving personalized PageRank with conductance for size selection to find the most relevant statistical research area for a given external topic of interest. We provide theoretical guarantees for the procedure and, through a number of case studies, show the results from our citation data align well with our knowledge and intuition about these external topics. Overall, we have found that the statistical theory and methods recently invented by the statistics community have made increasing impact on other scientific fields.
 [44] arXiv:2110.08607 (cross-list from cs.LG) [pdf, other]

Title: Physics-guided Deep Markov Models for Learning Nonlinear Dynamical Systems with Uncertainty Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Chaotic Dynamics (nlin.CD); Machine Learning (stat.ML)
In this paper, we propose a probabilistic physics-guided framework, termed the Physics-guided Deep Markov Model (PgDMM). The framework targets the inference of the characteristics and latent structure of nonlinear dynamical systems from measurement data, where exact inference of latent variables is typically intractable. A recently surfaced option is to leverage variational inference to perform approximate inference. In such a scheme, the transition and emission functions of the system are parameterized via feed-forward neural networks (deep generative models). However, due to the generalized and highly versatile formulation of neural network functions, the learned latent space often lacks physical interpretation and structured representation. To address this, we bridge physics-based state-space models with Deep Markov Models, thus delivering a hybrid modeling framework for unsupervised learning and identification of nonlinear dynamical systems. Specifically, the transition process can be modeled as a physics-based model enhanced with an additive neural network component, which aims to learn the discrepancy between the physics-based model and the actual dynamical system being monitored. The proposed framework takes advantage of the expressive power of deep learning while retaining the driving physics of the dynamical system by imposing physics-driven restrictions on the latent space. We demonstrate the benefits of such a fusion in terms of achieving improved performance on illustrative simulation examples and experimental case studies of nonlinear systems. Our results indicate that the physics-based models involved in the employed transition and emission functions essentially enforce a more structured and physically interpretable latent space, which is essential for generalization and prediction capabilities.
 [45] arXiv:2110.08627 (cross-list from cs.LG) [pdf, other]

Title: On the Pareto Frontier of Regret Minimization and Best Arm Identification in Stochastic Bandits Comments: 27 pages, 8 figures Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)
We study the Pareto frontier of two archetypal objectives in stochastic bandits, namely, regret minimization (RM) and best arm identification (BAI) with a fixed horizon. It is folklore that the balance between exploitation and exploration is crucial for both RM and BAI, but exploration is more critical for achieving optimal performance on the latter objective. To make this precise, we first design and analyze the BoBW-lil'UCB$({\gamma})$ algorithm, which achieves order-wise optimal performance for RM or BAI under different values of ${\gamma}$. Complementarily, we show that no algorithm can simultaneously perform optimally for both the RM and BAI objectives. More precisely, we establish non-trivial lower bounds on the regret achievable by any algorithm with a given BAI failure probability. This analysis shows that in some regimes BoBW-lil'UCB$({\gamma})$ achieves Pareto-optimality up to constant or small terms. Numerical experiments further demonstrate that when applied to difficult instances, BoBW-lil'UCB outperforms a close competitor UCB$_{\alpha}$ (Degenne et al., 2019), which is designed for RM and BAI with a fixed confidence.
 [46] arXiv:2110.08634 (cross-list from cs.SD) [pdf, other]

Title: Towards Robust Waveform-Based Acoustic Models Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
We propose an approach for learning robust acoustic models in adverse environments, characterized by a significant mismatch between training and test conditions. This problem is of paramount importance for the deployment of speech recognition systems that need to perform well in unseen environments. Our approach is an instance of vicinal risk minimization, which aims to improve risk estimates during training by replacing the delta functions that define the empirical density over the input space with an approximation of the marginal population density in the vicinity of the training samples. More specifically, we assume that local neighborhoods centered at training samples can be approximated using a mixture of Gaussians, and demonstrate theoretically that this can incorporate robust inductive bias into the learning process. We characterize the individual mixture components implicitly via data augmentation schemes designed to address common sources of spurious correlations in acoustic models. To avoid potential confounding effects on robustness due to information loss, which has been associated with standard feature extraction techniques (e.g., FBANK and MFCC features), we focus our evaluation on the waveform-based setting. Our empirical results show that the proposed approach can generalize to unseen noise conditions, with a 150% relative improvement in out-of-distribution generalization compared to training using the standard risk minimization principle. Moreover, the results demonstrate competitive performance relative to models learned using a training sample designed to match the acoustic conditions characteristic of test utterances (i.e., optimal vicinal densities).
 [47] arXiv:2110.08678 (cross-list from cs.LG) [pdf, other]

Title: Transformer with a Mixture of Gaussian Keys Authors: Tam Nguyen, Tan M. Nguyen, Dung Le, Khuong Nguyen, Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher Comments: 21 pages, 8 figures, 4 tables Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
Multi-head attention is a driving force behind state-of-the-art transformers, which achieve remarkable performance across a variety of natural language processing (NLP) and computer vision tasks. It has been observed that, for many applications, those attention heads learn redundant embeddings, and most of them can be removed without degrading the performance of the model. Inspired by this observation, we propose Transformer with a Mixture of Gaussian Keys (Transformer-MGK), a novel transformer architecture that replaces redundant heads in transformers with a mixture of keys at each head. These mixtures of keys follow a Gaussian mixture model and allow each attention head to focus on different parts of the input sequence efficiently. Compared to its conventional transformer counterpart, Transformer-MGK accelerates training and inference, has fewer parameters, and requires fewer FLOPs to compute while achieving comparable or better accuracy across tasks. Transformer-MGK can also be easily extended to use with linear attention. We empirically demonstrate the advantage of Transformer-MGK in a range of practical applications, including language modeling and tasks that involve very long sequences. On the WikiText-103 and Long Range Arena benchmarks, Transformer-MGKs with 4 heads attain comparable or better performance than baseline transformers with 8 heads.
 [48] arXiv:2110.08691 (cross-list from cs.DS) [pdf, ps, other]

Title: Terminal Embeddings in Sublinear Time Comments: Accepted to FOCS 2021 Subjects: Data Structures and Algorithms (cs.DS); Computational Geometry (cs.CG); Machine Learning (cs.LG); Machine Learning (stat.ML)
Recently (Elkin, Filtser, Neiman 2017) introduced the concept of a {\it terminal embedding} from one metric space $(X,d_X)$ to another $(Y,d_Y)$ with a set of designated terminals $T\subset X$. Such an embedding $f$ is said to have distortion $\rho\ge 1$ if $\rho$ is the smallest value such that there exists a constant $C>0$ satisfying
\begin{equation*}
\forall x\in T\ \forall q\in X,\ C d_X(x, q) \le d_Y(f(x), f(q)) \le C \rho d_X(x, q) .
\end{equation*}
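The distortion condition above can be checked empirically. The sketch below uses a plain Gaussian random projection as the map $f$ (a hypothetical stand-in, not the SDP-based terminal embedding the abstract describes) and measures the best achievable $\rho$ on sampled terminal-query pairs:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 50, 20
T = rng.normal(size=(10, d))      # terminals
Q = rng.normal(size=(100, d))     # arbitrary query points

# Gaussian random projection f(x) = Ax (a simple JL-style map for illustration)
A = rng.normal(size=(m, d)) / np.sqrt(m)
fT, fQ = T @ A.T, Q @ A.T

# empirical distortion over terminal-query pairs, taking C as the
# smallest observed ratio d_Y / d_X
ratios = []
for i in range(len(T)):
    for j in range(len(Q)):
        dx = np.linalg.norm(T[i] - Q[j])
        dy = np.linalg.norm(fT[i] - fQ[j])
        if dx > 1e-12:
            ratios.append(dy / dx)
ratios = np.array(ratios)
C = ratios.min()
rho = ratios.max() / C
print(f"empirical distortion rho ~ {rho:.2f}")
```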
In the case that $X,Y$ are both Euclidean metrics with $Y$ being $m$-dimensional, recently (Narayanan, Nelson 2019), following work of (Mahabadi, Makarychev, Makarychev, Razenshteyn 2018), showed that distortion $1+\epsilon$ is achievable via such a terminal embedding with $m = O(\epsilon^{-2}\log n)$ for $n := |T|$. This generalizes the Johnson-Lindenstrauss lemma, which only preserves distances within $T$ and not to $T$ from the rest of space. The downside is that evaluating the embedding on some $q\in \mathbb{R}^d$ required solving a semidefinite program with $\Theta(n)$ constraints in $m$ variables and thus required some superlinear $\mathrm{poly}(n)$ runtime. Our main contribution in this work is to give a new data structure for computing terminal embeddings. We show how to preprocess $T$ to obtain an almost-linear-space data structure that supports computing the terminal embedding image of any $q\in\mathbb{R}^d$ in sublinear time $n^{1-\Theta(\epsilon^2)+o(1)} + dn^{o(1)}$. To accomplish this, we leverage tools developed in the context of approximate nearest neighbor search.
 [49] arXiv:2110.08693 (cross-list from cs.LG) [pdf, other]

Title: On the Statistical Analysis of Complex Tree-shaped 3D Objects
Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG); Graphics (cs.GR); Machine Learning (stat.ML)
How can one analyze detailed 3D biological objects, such as neurons and botanical trees, that exhibit complex geometrical and topological variation? In this paper, we develop a novel mathematical framework for representing, comparing, and computing geodesic deformations between the shapes of such tree-like 3D objects. A hierarchical organization of subtrees characterizes these objects: each subtree has a main branch with some side branches attached, and one needs to match these structures across objects for meaningful comparisons. We propose a novel representation that extends the Square-Root Velocity Function (SRVF), initially developed for Euclidean curves, to tree-shaped 3D objects. We then define a new metric that quantifies the bending, stretching, and branch sliding needed to deform one tree-shaped object into the other. Compared to current metrics, such as the Quotient Euclidean Distance (QED) and the Tree Edit Distance (TED), the proposed representation and metric capture the full elasticity of the branches (i.e., bending and stretching) as well as the topological variations (i.e., branch death/birth and sliding). It completely avoids the shrinkage that results from the edge-collapse and node-split operations of the QED and TED metrics. We demonstrate the utility of this framework in comparing, matching, and computing geodesics between biological objects such as neurons and botanical trees. The framework is also applied to various shape analysis tasks: (i) symmetry analysis and symmetrization of tree-shaped 3D objects, (ii) computing summary statistics (means and modes of variation) of populations of tree-shaped 3D objects, (iii) fitting parametric probability distributions to such populations, and (iv) synthesizing novel tree-shaped 3D objects through random sampling from estimated probability distributions.
 [50] arXiv:2110.08695 (cross-list from cs.LG) [pdf, other]

Title: Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
Comments: NeurIPS, 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We study the offline reinforcement learning (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) using data coming from a policy $\mu$. In particular, we consider the sample complexity of offline RL for finite-horizon MDPs. Prior works study this problem under different data-coverage assumptions, and their learning guarantees are expressed via covering coefficients that lack an explicit characterization in terms of system quantities. In this work, we analyze the Adaptive Pessimistic Value Iteration (APVI) algorithm and derive a suboptimality upper bound that nearly matches \[ O\left(\sum_{h=1}^H\sum_{s_h,a_h}d^{\pi^\star}_h(s_h,a_h)\sqrt{\frac{\mathrm{Var}_{P_{s_h,a_h}}{(V^\star_{h+1}+r_h)}}{d^\mu_h(s_h,a_h)}}\sqrt{\frac{1}{n}}\right). \] As a complement, we also prove a per-instance information-theoretic lower bound under the weak assumption that $d^\mu_h(s_h,a_h)>0$ whenever $d^{\pi^\star}_h(s_h,a_h)>0$. Different from previous minimax lower bounds, the per-instance lower bound (via local minimaxity) is a much stronger criterion, as it applies to individual instances separately. Here $\pi^\star$ is an optimal policy, $\mu$ is the behavior policy, and $d_h^\mu$ is the marginal state-action probability. We call the above equation the intrinsic offline reinforcement learning bound, since it directly implies all the existing optimal results: the minimax rate under the uniform data-coverage assumption, the horizon-free setting, single-policy concentrability, and the tight problem-dependent results. Later, we extend the result to the assumption-free regime (where we make no assumption on $\mu$) and obtain the assumption-free intrinsic bound. Due to its generic form, we believe the intrinsic bound could help illuminate what makes a specific problem hard and reveal the fundamental challenges in offline RL.
 [51] arXiv:2110.08710 (cross-list from cs.LG) [pdf, ps, other]

Title: NeuralArTS: Structuring Neural Architecture Search with Type Theory
Subjects: Machine Learning (cs.LG); Logic in Computer Science (cs.LO); Programming Languages (cs.PL); Machine Learning (stat.ML)
Neural Architecture Search (NAS) algorithms automate the task of finding optimal deep learning architectures given an initial search space of possible operations. Developing these search spaces is usually a manual affair, and searching within a pre-optimized search space is more efficient than searching from scratch. In this paper we present a new framework called Neural Architecture Type System (NeuralArTS) that categorizes the infinite set of network operations in a structured type system. We further demonstrate how NeuralArTS can be applied to convolutional layers and propose several future directions.
 [52] arXiv:2110.08720 (cross-list from cs.LG) [pdf, other]

Title: Centroid Approximation for Bootstrap
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Bootstrap is a principled and powerful frequentist statistical tool for uncertainty quantification. Unfortunately, standard bootstrap methods are computationally intensive due to the need to draw a large i.i.d. bootstrap sample to approximate the ideal bootstrap distribution; this largely hinders their application in large-scale machine learning, especially deep learning problems. In this work, we propose an efficient method to explicitly \emph{optimize} a small set of high-quality "centroid" points to better approximate the ideal bootstrap distribution. We achieve this by minimizing a simple objective function that is asymptotically equivalent to the Wasserstein distance to the ideal bootstrap distribution. This allows us to provide an accurate estimation of uncertainty with a small number of bootstrap centroids, outperforming the naive i.i.d. sampling approach. Empirically, we show that our method can boost the performance of bootstrap in a variety of applications.
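For reference, the standard i.i.d. bootstrap that the paper seeks to approximate with few centroids looks like this (a generic textbook sketch for a confidence interval on the mean; the data and resample count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=500)

# standard i.i.d. bootstrap of the sample mean: B resamples with replacement
B = 2000
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(B)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap CI for the mean: [{lo:.3f}, {hi:.3f}]")
```

The cost of the $B$ resamples is exactly what the proposed centroid approximation aims to cut down.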
 [53] arXiv:2110.08850 (cross-list from physics.soc-ph) [pdf]

Title: Understanding the network formation pattern for better link prediction
Comments: 21 pages, 3 figures, 18 tables, and 29 references
Subjects: Physics and Society (physics.soc-ph); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Molecular Networks (q-bio.MN); Machine Learning (stat.ML)
As a classical problem in the field of complex networks, link prediction has attracted much attention from researchers, and it is of great significance in helping us understand the evolution and dynamic development mechanisms of networks. Although various network-type-specific algorithms have been proposed to tackle the link prediction problem, most of them suppose that the network structure is dominated by the Triadic Closure Principle. We still lack an adaptive and comprehensive understanding of network formation patterns for predicting potential links. In addition, it is valuable to investigate how network local information can be better utilized. To this end, we propose a novel method named Link prediction using Multiple Order Local Information (MOLI) that exploits local information from neighbors at different distances, with parameters that can be prior-driven, based on prior knowledge, or data-driven, by solving an optimization problem on observed networks. MOLI defines a local network diffusion process via random walks on the graph, resulting in better use of network information. We show that MOLI outperforms 11 other widely used link prediction algorithms on 11 different types of simulated and real-world networks. We also conclude that there are different patterns of local information utilization for different networks, including social networks, communication networks, biological networks, etc. In particular, the classical common-neighbor-based algorithm is not as adaptable to all social networks as it is perceived to be; instead, some social networks obey the Quadrilateral Closure Principle, which preferentially connects paths of length three.
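A minimal multi-order link scorer in the same spirit, scoring candidate links by weighted walk counts of length two and three (hand-picked weights on a toy graph; this is a generic heuristic, not MOLI's diffusion-based definition):

```python
import numpy as np

# 4-node toy graph with edges 0-1, 0-2, 1-2, 2-3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

w2, w3 = 1.0, 0.5                      # weights for 2nd- and 3rd-order info
S = w2 * (A @ A) + w3 * (A @ A @ A)    # weighted walk counts
np.fill_diagonal(S, 0)                 # ignore self-links
S[A > 0] = 0                           # only score non-adjacent pairs
i, j = np.unravel_index(np.argmax(S), S.shape)
print(f"top candidate link: ({i}, {j}), score {S[i, j]:.1f}")
```

Adding the length-three term is what lets a scorer of this kind pick up Quadrilateral-Closure-style structure that pure common-neighbor counts miss.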
 [54] arXiv:2110.08871 (cross-list from cs.LG) [pdf, ps, other]

Title: Noise-robust Clustering
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
This paper presents noise-robust clustering techniques in unsupervised machine learning. Uncertainty about noise, consistency, and other ambiguities can become severe obstacles in data analytics. As a result, data quality, cleansing, management, and governance remain critical disciplines when working with Big Data. With this complexity, it is no longer sufficient to treat data deterministically as in a classical setting, and it becomes meaningful to account for noise distribution and its impact on data sample values. Classical clustering methods group data into "similarity classes" depending on their relative distances or similarities in the underlying space. This paper addresses this problem via the extension of classical $K$-means and $K$-medoids clustering over data distributions (rather than the raw data). This involves measuring distances among distributions using two types of measures: the optimal mass transport (also called Wasserstein distance, denoted $W_2$) and a novel distance measure proposed in this paper, the expected value of random variable distance (denoted ED). The presented distribution-based $K$-means and $K$-medoids algorithms cluster the data distributions first and then assign each raw data point to the cluster of its distribution.
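In one dimension, the $W_2$ distance between empirical distributions reduces to the quantile (sorted-sample) coupling, which makes a distance-between-distributions clustering primitive easy to sketch (a generic illustration of the distance, not the paper's full algorithm):

```python
import numpy as np

def w2_1d(x, y):
    """W2 (2-Wasserstein) distance between two 1-D empirical distributions
    with equal sample sizes, via the sorted (quantile) coupling."""
    x, y = np.sort(x), np.sort(y)
    return float(np.sqrt(np.mean((x - y) ** 2)))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=1000)
b = rng.normal(0.1, 1.0, size=1000)   # distribution close to a's
c = rng.normal(3.0, 1.0, size=1000)   # distribution far from a's
print(w2_1d(a, b) < w2_1d(a, c))  # True: b's distribution is nearer to a's
```

A distribution-based $K$-medoids would then run the classical algorithm with `w2_1d` (or its multivariate analogue) in place of Euclidean distance between points.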
 [55] arXiv:2110.08922 (cross-list from cs.LG) [pdf, other]

Title: Explaining generalization in deep learning: progress and fundamental limits
Authors: Vaishnavh Nagarajan
Comments: arXiv admin note: text overlap with arXiv:1902.04742
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error?
In the first part of the thesis, we will empirically study how training deep networks via stochastic gradient descent implicitly controls the networks' capacity. Subsequently, to show how this leads to better generalization, we will derive {\em data-dependent}, {\em uniform-convergence-based} generalization bounds with improved dependencies on the parameter count.
Uniform convergence has in fact been the most widely used tool in deep learning literature, thanks to its simplicity and generality. Given its popularity, in this thesis, we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization. In particular, we will show that in some example overparameterized settings, {\em any} uniform convergence bound will provide only a vacuous generalization bound.
With this realization in mind, in the last part of the thesis, we will change course and introduce an {\em empirical} technique to estimate generalization using unlabeled data. Our technique does not rely on any notion of uniform-convergence-based complexity and is remarkably precise. We will theoretically show why our technique enjoys such precision.
We will conclude by discussing how future work could explore novel ways to incorporate distributional assumptions in generalization bounds (such as in the form of unlabeled data) and explore other tools to derive bounds, perhaps by modifying uniform convergence or by developing completely new tools altogether.
 [56] arXiv:2110.08984 (cross-list from cs.LG) [pdf, ps, other]

Title: Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs). In this setting, both the reward function and the transition kernel are linear with respect to the given feature maps and are allowed to vary over time, as long as their respective parameter variations do not exceed certain variation budgets. We propose the $\underline{\text{p}}$eriodically $\underline{\text{r}}$estarted $\underline{\text{o}}$ptimistic $\underline{\text{p}}$olicy $\underline{\text{o}}$ptimization algorithm (PROPO), an optimistic policy optimization algorithm with linear function approximation. PROPO features two mechanisms: sliding-window-based policy evaluation and periodic-restart-based policy improvement, which are tailored for policy optimization in a non-stationary environment. In addition, using only the sliding-window technique, we propose a value-iteration algorithm. We establish dynamic upper bounds for the proposed methods and a matching minimax lower bound that shows the (near-)optimality of the proposed methods. To the best of our knowledge, PROPO is the first provably efficient policy optimization algorithm that handles non-stationarity.
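The intuition behind sliding-window evaluation under non-stationarity can be shown with a one-dimensional toy: estimating a drifting mean reward from only the last $W$ observations forgets stale data from the old environment. This is a hypothetical analogue of the mechanism, not the PROPO algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(0)
# reward stream whose mean shifts from 0 to 1 halfway through
rewards = np.concatenate([rng.normal(0.0, 0.1, 500),   # old regime
                          rng.normal(1.0, 0.1, 500)])  # current regime

W = 50
full_mean = rewards.mean()          # full-history estimate, biased by the shift
window_mean = rewards[-W:].mean()   # sliding-window estimate tracks the regime
print(f"full-history: {full_mean:.2f}, sliding-window: {window_mean:.2f}")
```

The window size plays the role of the variation budget: small enough to forget the drift, large enough to average out noise.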
 [57] arXiv:2110.08985 (cross-list from cs.CV) [pdf, other]

Title: StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis
Comments: 24 pages, 19 figures. Project page: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We propose StyleNeRF, a 3D-aware generative model for photorealistic high-resolution image synthesis with high multi-view consistency, which can be trained on unstructured 2D images. Existing approaches either cannot synthesize high-resolution images with fine details or yield noticeable 3D-inconsistent artifacts. In addition, many of them lack control over style attributes and explicit 3D camera poses. StyleNeRF integrates the neural radiance field (NeRF) into a style-based generator to tackle the aforementioned challenges, i.e., improving rendering efficiency and 3D consistency for high-resolution image generation. We perform volume rendering only to produce a low-resolution feature map and progressively apply upsampling in 2D to address the first issue. To mitigate the inconsistencies caused by 2D upsampling, we propose multiple designs, including a better upsampler and a new regularization loss. With these designs, StyleNeRF can synthesize high-resolution images at interactive rates while preserving 3D consistency at high quality. StyleNeRF also enables control of camera poses and different levels of styles, which can generalize to unseen views. It also supports challenging tasks, including zoom-in and -out, style mixing, inversion, and semantic editing.
 [58] arXiv:2110.09006 (cross-list from cs.CV) [pdf, other]

Title: Natural Image Reconstruction from fMRI using Deep Learning: A Survey
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
With the advent of brain imaging techniques and machine learning tools, much effort has been devoted to building computational models to capture the encoding of visual information in the human brain. One of the most challenging brain decoding tasks is the accurate reconstruction of the perceived natural images from brain activities measured by functional magnetic resonance imaging (fMRI). In this work, we survey the most recent deep learning methods for natural image reconstruction from fMRI. We examine these methods in terms of architectural design, benchmark datasets, and evaluation metrics and present a fair performance evaluation across standardized evaluation metrics. Finally, we discuss the strengths and limitations of existing studies and present potential future directions.
 [59] arXiv:2110.09140 (cross-list from cs.LG) [pdf, other]

Title: Learning Prototype-oriented Set Representations for Meta-Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Learning from set-structured data is a fundamental problem that has recently attracted increasing attention, where a series of summary networks are introduced to deal with the set input. In fact, many meta-learning problems can be treated as set-input tasks. Most existing summary networks aim to design different architectures for the input set in order to enforce permutation invariance. However, scant attention has been paid to the common cases where different sets in a meta-distribution are closely related and share certain statistical properties. Viewing each set as a distribution over a set of global prototypes, this paper provides a novel optimal transport (OT) based way to improve existing summary networks. To learn the distribution over the global prototypes, we minimize its OT distance to the set empirical distribution over data points, providing a natural unsupervised way to improve the summary network. Since our plug-and-play framework can be applied to many meta-learning problems, we further instantiate it to the cases of few-shot classification and implicit meta generative modeling. Extensive experiments demonstrate that our framework significantly improves the existing summary networks on learning more powerful summary statistics from sets and can be successfully integrated into metric-based few-shot classification and generative modeling applications, providing a promising tool for addressing set-input and meta-learning problems.
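The OT distance between a set's empirical distribution and a distribution over global prototypes can be computed with standard entropic-regularized Sinkhorn iterations. The sketch below is a generic stand-in for that quantity (all shapes, weights, and the regularization strength are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

def sinkhorn(C, a, b, reg=1.0, n_iter=200):
    """Entropic-regularized OT cost between histograms a and b
    under cost matrix C, via standard Sinkhorn iterations."""
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]    # transport plan
    return float((P * C).sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))           # one input set of data points
P0 = rng.normal(size=(5, 2))           # global prototypes
C = ((X[:, None, :] - P0[None, :, :]) ** 2).sum(-1)   # squared-distance cost
a = np.full(30, 1 / 30)                # uniform weights over data points
b = np.full(5, 1 / 5)                  # weights over prototypes
cost = sinkhorn(C, a, b)
print(f"OT cost, set -> prototypes: {cost:.3f}")
```

Minimizing this cost with respect to the prototypes (and the weights `b`) is the unsupervised signal the framework feeds back into the summary network.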
 [60] arXiv:2110.09154 (cross-list from cs.SI) [pdf, other]

Title: Measuring the influence of beliefs in belief networks
Authors: Aleksandar Tomašević
Comments: 19 pages, 4 figures. An earlier version of this work was presented at the Networks 2021 conference
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph); Applications (stat.AP)
Influential beliefs are crucial for our understanding of how people reason about political issues and make political decisions. This research proposes a new method for measuring the influence of political beliefs within the larger context of belief system networks, based on advances in psychometric network methods and network influence research. Using the latest round of the European Social Survey data, we demonstrate this approach on a belief network expressing support for the regime in 29 European countries, capturing beliefs related to support for regime performance, principles, institutions, and political actors. Our results show that the average influence of beliefs can be related to the consistency and connectivity of the belief network, and that the influence of specific beliefs (e.g., satisfaction with democracy) at the country level has a significant negative correlation with external indicators from the same domain (e.g., the Liberal Democracy index), which suggests that highly influential beliefs are related to pressing political issues. These findings suggest that network-based belief influence metrics estimated from large-scale survey data can be used as a new type of indicator in comparative political research, which opens new avenues for integrating psychometric network analysis methods into political science methodology.
 [61] arXiv:2110.09192 (cross-list from cs.LG) [pdf, other]

Title: Learning Optimal Conformal Classifiers
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Methodology (stat.ME); Machine Learning (stat.ML)
Modern deep-learning-based classifiers show very high accuracy on test data, but this does not provide sufficient guarantees for safe deployment, especially in high-stakes AI applications such as medical diagnosis. Usually, predictions are obtained without a reliable uncertainty estimate or a formal guarantee. Conformal prediction (CP) addresses these issues by using the classifier's probability estimates to predict confidence sets containing the true class with a user-specified probability. However, using CP as a separate processing step after training prevents the underlying model from adapting to the prediction of confidence sets. Thus, this paper explores strategies to differentiate through CP during training, with the goal of training the model together with the conformal wrapper end-to-end. In our approach, conformal training (ConfTr), we specifically "simulate" conformalization on mini-batches during training. We show that ConfTr outperforms state-of-the-art CP methods for classification by reducing the average confidence set size (inefficiency). Moreover, it allows us to "shape" the confidence sets predicted at test time, which is difficult for standard CP. On experiments with several datasets, we show that ConfTr can influence how inefficiency is distributed across classes, or guide the composition of confidence sets in terms of the included classes, while retaining the guarantees offered by CP.
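For context, the post-hoc split-conformal baseline that ConfTr builds on can be sketched in a few lines (a generic textbook form with the simple 1 - p(true class) score; the paper differentiates through a smoothed version of this during training):

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction for classification: calibrate a
    threshold on the scores 1 - p(true class), then include every
    class whose score clears the threshold."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return [np.flatnonzero(1.0 - p <= q) for p in test_probs]

# toy well-calibrated classifier: peaked probabilities over 3 classes
rng = np.random.default_rng(0)
cal_labels = rng.integers(0, 3, size=200)
cal_probs = np.full((200, 3), 0.05)
cal_probs[np.arange(200), cal_labels] = 0.9
sets = conformal_sets(cal_probs, cal_labels, cal_probs[:5])
print([s.tolist() for s in sets])  # singleton sets for confident predictions
```

Because this wrapper sits after training, the classifier never sees the set-size objective; making the quantile and set-membership steps differentiable is exactly the gap ConfTr targets.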
 [62] arXiv:2110.09234 (cross-list from cs.CY) [pdf, other]

Title: Impact of COVID-19 Policies and Misinformation on Social Unrest
Authors: Martha Barnard (1), Radhika Iyer (1 and 2), Sara Y. Del Valle (1), Ashlynn R. Daughton (1) ((1) A-1 Information Systems and Modeling, Los Alamos National Lab, Los Alamos, NM, USA, (2) Department of Political Science and Department of Computing, Data Science, and Society, University of California, Berkeley, Berkeley, CA, USA)
Comments: 21 pages, 9 figures
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG); Applications (stat.AP)
The novel coronavirus disease (COVID-19) pandemic has impacted every corner of the Earth, disrupting governments and leading to socioeconomic instability. This crisis has prompted questions surrounding how different sectors of society interact and influence each other during times of change and stress. Given the unprecedented economic and societal impacts of this pandemic, many new data sources have become available, allowing us to quantitatively explore these associations. Understanding these relationships can help us better prepare for future disasters and mitigate their impacts. Here, we focus on the interplay between social unrest (protests), health outcomes, public health orders, and misinformation in eight countries of Western Europe and four regions of the United States. We created 1-3 week forecasts of both a binary protest metric, identifying times of high protest activity, and the overall protest counts over time. We found that for all regions except Belgium, at least one feature from our various data streams was predictive of protests. However, the accuracy of the protest forecasts varied by country; that is, for roughly half of the countries analyzed, our forecasts outperform a na\"ive model. These mixed results demonstrate both the potential of diverse data streams to predict a topic as volatile as protests and the difficulties of predicting a situation that is evolving as rapidly as a pandemic.
 [63] arXiv:2110.09253 (cross-list from cs.CY) [pdf]

Title: A Sociotechnical View of Algorithmic Fairness
Comments: Accepted at Information Systems Journal
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG); Machine Learning (stat.ML)
Algorithmic fairness (AF) has been framed as a newly emerging technology that mitigates systemic discrimination in automated decision-making, providing opportunities to improve fairness in information systems (IS). However, based on a state-of-the-art literature review, we argue that fairness is an inherently social concept and that technologies for algorithmic fairness should therefore be approached through a sociotechnical lens. We advance the discourse on algorithmic fairness as a sociotechnical phenomenon. Our research objective is to embed AF in the sociotechnical view of IS. Specifically, we elaborate on why the outcomes of a system that uses algorithmic means to assure fairness depend on mutual influences between technical and social structures. This perspective can generate new insights that integrate knowledge from both technical fields and social studies. Further, it spurs new directions for IS debates. We contribute as follows: First, we problematize fundamental assumptions in the current discourse on algorithmic fairness based on a systematic analysis of 310 articles. Second, we respond to these assumptions by theorizing algorithmic fairness as a sociotechnical construct. Third, we propose directions for IS researchers to enhance their impact by pursuing a unique understanding of sociotechnical algorithmic fairness. We call for and undertake a holistic approach to AF. A sociotechnical perspective on algorithmic fairness can yield holistic solutions to systemic biases and discrimination.
 [64] arXiv:2110.09272 (cross-list from cs.CY) [pdf]

Title: Multi-Objective Allocation of COVID-19 Testing Centers: Improving Coverage and Equity in Access
Subjects: Computers and Society (cs.CY); Optimization and Control (math.OC); Applications (stat.AP)
At the time of this article, COVID-19 has been transmitted to more than 42 million people and resulted in more than 673,000 deaths across the United States. Throughout this pandemic, public health authorities have monitored the results of diagnostic testing to identify hotspots of transmission. Such information can help reduce or block transmission paths of COVID-19 and help infected patients receive early treatment. However, most current schemes of test site allocation have been based on experience or convenience, often resulting in low efficiency and non-optimal allocation. In addition, the historical sociodemographic patterns of populations within cities can result in measurable inequities in access to testing between various racial and income groups. To address these pressing issues, we propose a novel test site allocation scheme to (a) maximize population coverage, (b) minimize prediction uncertainties associated with projections of outbreak trajectories, and (c) reduce inequities in access. We illustrate our approach with case studies comparing our allocation scheme with the recorded allocation of testing sites in Georgia, revealing increases in both population coverage and improvements in equity of access over current practice.
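Objective (a) alone is an instance of maximum coverage, for which the textbook greedy heuristic gives a simple baseline. The sketch below is that generic heuristic on made-up data, not the paper's multi-objective formulation (which also weighs prediction uncertainty and equity):

```python
# Greedy maximum-coverage selection of test sites: repeatedly pick the
# candidate site that covers the most still-uncovered population blocks.
def greedy_sites(coverage, k):
    """coverage: dict site -> set of population blocks it covers."""
    chosen, covered = [], set()
    for _ in range(k):
        site = max(coverage, key=lambda s: len(coverage[s] - covered))
        chosen.append(site)
        covered |= coverage[site]
    return chosen, covered

coverage = {
    "A": {1, 2, 3},
    "B": {3, 4},
    "C": {4, 5, 6, 7},
}
chosen, covered = greedy_sites(coverage, 2)
print(chosen, sorted(covered))  # ['C', 'A'] [1, 2, 3, 4, 5, 6, 7]
```

Greedy selection carries a classical (1 - 1/e) approximation guarantee for coverage, but balancing the three objectives jointly requires the multi-objective optimization the paper develops.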
 [65] arXiv:2110.09327 (cross-list from cs.LG) [pdf, other]

Title: Self-Supervised Representation Learning: Introduction, Advances and Challenges
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Self-supervised representation learning methods aim to provide powerful deep feature learning without the requirement of large annotated datasets, thus alleviating the annotation bottleneck that is one of the main barriers to the practical deployment of deep learning today. These methods have advanced rapidly in recent years, with their efficacy approaching and sometimes surpassing fully supervised pre-training alternatives across a variety of data modalities, including image, video, sound, text and graphs. This article introduces this vibrant area, including key concepts, the four main families of approach and the associated state of the art, and how self-supervised methods are applied to diverse modalities of data. We further discuss practical considerations including workflows, representation transferability, and compute cost. Finally, we survey the major open challenges in the field that provide fertile ground for future work.
 [66] arXiv:2110.09334 (cross-list from math.OC) [pdf, other]

Title: A portfolio approach to massively parallel Bayesian optimization
Subjects: Optimization and Control (math.OC); Machine Learning (stat.ML)
One way to reduce the time of conducting optimization studies is to evaluate designs in parallel rather than just one at a time. For expensive-to-evaluate black boxes, batch versions of Bayesian optimization have been proposed. They work by building a surrogate model of the black box that can be used to select the designs to evaluate efficiently via an infill criterion. Still, with higher levels of parallelization becoming available, the strategies that work for a few tens of parallel evaluations become limiting, in particular due to the complexity of selecting more evaluations. It is even more crucial when the black box is noisy, necessitating more evaluations as well as repeated experiments. Here we propose a scalable strategy that can keep up with massive batching natively, focused on the exploration/exploitation trade-off and a portfolio allocation. We compare the approach with related methods on deterministic and noisy functions, for mono- and multi-objective optimization tasks. These experiments show similar or better performance than existing methods, while being orders of magnitude faster.
 [67] arXiv:2110.09356 (cross-list from cs.LG) [pdf, other]

Title: Towards Federated Bayesian Network Structure Learning with Continuous Optimization
Comments: 16 pages; 5 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Traditionally, Bayesian network structure learning is often carried out at a central site, where all the data is gathered. However, in practice, data may be distributed across different parties (e.g., companies, devices) that intend to collectively learn a Bayesian network but are not willing to disclose information related to their data, owing to privacy or security concerns. In this work, we present a cross-silo federated learning approach to estimate the structure of a Bayesian network from data that is horizontally partitioned across different parties. We develop a distributed structure learning method based on continuous optimization, using the alternating direction method of multipliers (ADMM), such that only the model parameters have to be exchanged during the optimization process. We demonstrate the flexibility of our approach by adopting it for both linear and nonlinear cases. Experimental results on synthetic and real datasets show that it achieves improved performance over the other methods, especially when there is a relatively large number of clients and each has a limited sample size.
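The exchange-parameters-only pattern can be illustrated with consensus ADMM on a horizontally partitioned least-squares problem: each party solves a local subproblem on its own data and only the parameter vectors travel. This is a generic ADMM illustration in the spirit of the setup, not the paper's structure-learning objective:

```python
import numpy as np

rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0])
parts = []
for _ in range(3):                       # three data-holding parties
    X = rng.normal(size=(100, 2))
    y = X @ true_theta + rng.normal(0, 0.1, 100)
    parts.append((X, y))

rho = 1.0
thetas = [np.zeros(2) for _ in parts]    # local parameters
us = [np.zeros(2) for _ in parts]        # scaled dual variables
z = np.zeros(2)                          # consensus variable (the only shared state)
for _ in range(50):
    for i, (X, y) in enumerate(parts):   # local solves; raw data never leaves
        A = X.T @ X + rho * np.eye(2)
        thetas[i] = np.linalg.solve(A, X.T @ y + rho * (z - us[i]))
    z = np.mean([t + u for t, u in zip(thetas, us)], axis=0)
    us = [u + t - z for u, t in zip(us, thetas)]
print(z.round(2))
```

The consensus variable `z` converges to the pooled-data solution even though each `X, y` stays with its party; the paper applies the same ADMM skeleton to its continuous structure-learning objective.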
 [68] arXiv:2110.09429 (cross-list from q-fin.TR) [pdf, other]

Title: Understanding jumps in high frequency digital asset markets
Subjects: Trading and Market Microstructure (q-fin.TR); Applications (stat.AP)
While attention is a predictor for digital asset prices, and jumps in Bitcoin prices are well known, we know little about its alternatives. Studying high-frequency crypto data gives us the unique possibility to confirm that cross-market digital asset returns are driven by high-frequency jumps clustered around black swan events, resembling volatility and trading volume seasonalities. Regressions show that intraday jumps significantly influence end-of-day returns in size and direction. This provides fundamental research for crypto option pricing models. However, we need better econometric methods for capturing the specific market microstructure of cryptos. All calculations are reproducible via the quantlet.com technology.
 [69] arXiv:2110.09443 (cross-list from cs.LG) [pdf, other]

Title: Beltrami Flow and Neural Diffusion on Graphs
Authors: Benjamin Paul Chamberlain, James Rowbottom, Davide Eynard, Francesco Di Giovanni, Xiaowen Dong, Michael M Bronstein
Comments: 21 pages, 5 figures. Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS) 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We propose a novel class of graph neural networks based on the discretised Beltrami flow, a non-Euclidean diffusion PDE. In our model, node features are supplemented with positional encodings derived from the graph topology and jointly evolved by the Beltrami flow, simultaneously producing continuous feature learning and topology evolution. The resulting model generalises many popular graph neural networks and achieves state-of-the-art results on several benchmarks.
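The discretised-diffusion viewpoint can be made concrete with a single explicit-Euler step of plain graph heat diffusion. This is a linear, isotropic simplification for intuition only, not the (nonlinear, position-aware) Beltrami flow of the paper; `diffusion_step` is a hypothetical name.

```python
import numpy as np

def diffusion_step(X, A, tau=0.1):
    """One explicit-Euler step of graph heat diffusion dX/dt = (A_hat - I) X,
    where A_hat is the random-walk-normalized adjacency: each node's features
    move toward the mean of its neighbors' features."""
    deg = A.sum(axis=1, keepdims=True)
    A_hat = A / np.maximum(deg, 1e-12)
    return X + tau * (A_hat @ X - X)
```

For two connected nodes with features 0 and 1, one step with tau = 0.1 moves them to 0.1 and 0.9 — features smooth toward each other, the basic behavior that diffusion-based GNN layers generalize.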
 [70] arXiv:2110.09468 (cross-list from cs.LG) [pdf, other]

Title: Improving Robustness using Generated Data
Authors: Sven Gowal, Sylvestre-Alvise Rebuffi, Olivia Wiles, Florian Stimberg, Dan Andrei Calian, Timothy Mann
Comments: Accepted at NeurIPS 2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Recent work argues that robust training requires substantially larger datasets than those required for standard classification. On CIFAR-10 and CIFAR-100, this translates into a sizable robust-accuracy gap between models trained solely on data from the original training set and those trained with additional data extracted from the "80 Million Tiny Images" dataset (TI-80M). In this paper, we explore how generative models trained solely on the original training set can be leveraged to artificially increase the size of the original training set and improve adversarial robustness to $\ell_p$ norm-bounded perturbations. We identify the sufficient conditions under which incorporating additional generated data can improve robustness, and demonstrate that it is possible to significantly reduce the robust-accuracy gap to models trained with additional real data. Surprisingly, we show that even the addition of non-realistic random data (generated by Gaussian sampling) can improve robustness. We evaluate our approach on CIFAR-10, CIFAR-100, SVHN and TinyImageNet against $\ell_\infty$ and $\ell_2$ norm-bounded perturbations of size $\epsilon = 8/255$ and $\epsilon = 128/255$, respectively. We show large absolute improvements in robust accuracy compared to previous state-of-the-art methods. Against $\ell_\infty$ norm-bounded perturbations of size $\epsilon = 8/255$, our models achieve 66.10% and 33.49% robust accuracy on CIFAR-10 and CIFAR-100, respectively (improving upon the state-of-the-art by +8.96% and +3.29%). Against $\ell_2$ norm-bounded perturbations of size $\epsilon = 128/255$, our model achieves 78.31% on CIFAR-10 (+3.81%). These results beat most prior works that use external data.
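The $\ell_\infty$ threat model with $\epsilon = 8/255$ referenced above can be illustrated with a minimal PGD attacker. To keep the sketch self-contained, it attacks a logistic-regression classifier with an analytic gradient rather than a deep network; `pgd_linf` and its step sizes are illustrative choices, not the paper's attack configuration.

```python
import numpy as np

def pgd_linf(x, y, w, b, eps=8/255, alpha=2/255, steps=10):
    """l_inf-bounded projected gradient descent against a logistic
    classifier: ascend the cross-entropy gradient, then project back
    into the eps-ball around x and into the valid input range [0, 1]."""
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))  # P(label = 1)
        grad = (p - y) * w                          # d(cross-entropy)/dx
        x_adv = x_adv + alpha * np.sign(grad)       # signed gradient ascent
        x_adv = np.clip(x_adv, x - eps, x + eps)    # project to l_inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)            # valid pixel range
    return x_adv
```

The projection step is what makes the perturbation "norm-bounded": no coordinate ever moves more than eps from the clean input.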
 [71] arXiv:2110.09476 (cross-list from cs.LG) [pdf, other]

Title: Recovery Guarantees for Kernel-based Clustering under Nonparametric Mixture Models
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Despite the ubiquity of kernel-based clustering, surprisingly few statistical guarantees exist beyond settings that consider strong structural assumptions on the data generation process. In this work, we take a step towards bridging this gap by studying the statistical performance of kernel-based clustering algorithms under nonparametric mixture models. We provide necessary and sufficient separability conditions under which these algorithms can consistently recover the underlying true clustering. Our analysis provides guarantees for kernel clustering approaches without structural assumptions on the form of the component distributions. Additionally, we establish a key equivalence between kernel-based data-clustering and kernel density-based clustering. This enables us to provide consistency guarantees for kernel-based estimators of nonparametric mixture models. Along with theoretical implications, this connection could have practical implications, including in the systematic choice of the bandwidth of the Gaussian kernel in the context of clustering.
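For context on the bandwidth question raised at the end, the common practical default is the median heuristic: set the Gaussian-kernel bandwidth to the median pairwise distance. This is the standard baseline such a systematic choice would replace, not the paper's proposal; `median_bandwidth` is an illustrative name.

```python
import numpy as np

def median_bandwidth(X):
    """Median heuristic for the Gaussian kernel k(x, y) = exp(-||x-y||^2 / (2 h^2)):
    return h as the median of the pairwise Euclidean distances in X."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(len(X), k=1)          # each pair counted once
    return np.sqrt(np.median(d2[iu]))
```

The heuristic is scale-equivariant (doubling the data doubles h), which is why it is a popular default despite lacking the consistency guarantees studied here.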
 [72] arXiv:2110.09507 (cross-list from cs.LG) [pdf, other]

Title: Provable Hierarchy-Based Meta-Reinforcement Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Hierarchical reinforcement learning (HRL) has seen widespread interest as an approach to tractable learning of complex modular behaviors. However, existing work either assumes access to expert-constructed hierarchies or uses hierarchy-learning heuristics with no provable guarantees. To address this gap, we analyze HRL in the meta-RL setting, where a learner learns latent hierarchical structure during meta-training for use in a downstream task. We consider a tabular setting where natural hierarchical structure is embedded in the transition dynamics. Analogous to supervised meta-learning theory, we provide "diversity conditions" which, together with a tractable optimism-based algorithm, guarantee sample-efficient recovery of this natural hierarchy. Furthermore, we provide regret bounds on a learner using the recovered hierarchy to solve a meta-test task. Our bounds incorporate common notions in the HRL literature such as temporal and state/action abstractions, suggesting that our setting and analysis capture important features of HRL in practice.
 [73] arXiv:2110.09514 (cross-list from cs.LG) [pdf, other]

Title: Discovering and Achieving Goals via World Models
Comments: NeurIPS 2021. First two authors contributed equally. Website at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Machine Learning (stat.ML)
How can artificial agents learn to solve many diverse tasks in complex visual environments in the absence of any supervision? We decompose this question into two problems: discovering new goals and learning to reliably achieve them. We introduce Latent Explorer Achiever (LEXA), a unified solution to both problems that learns a world model from image inputs and uses it to train an explorer and an achiever policy from imagined rollouts. Unlike prior methods that explore by reaching previously visited states, the explorer plans to discover unseen surprising states through foresight, which are then used as diverse targets for the achiever to practice. After the unsupervised phase, LEXA solves tasks specified as goal images zero-shot, without any additional learning. LEXA substantially outperforms previous approaches to unsupervised goal-reaching, both on prior benchmarks and on a new challenging benchmark with a total of 40 test tasks spanning four standard robotic manipulation and locomotion domains. LEXA further achieves goals that require interacting with multiple objects in sequence. Finally, to demonstrate the scalability and generality of LEXA, we train a single general agent across four distinct environments. Code and videos at https://orybkin.github.io/lexa/
Replacements for Tue, 19 Oct 21
 [74] arXiv:1712.01145 (replaced) [pdf, other]

Title: Learning Fast and Slow: PROPEDEUTICA for Real-time Malware Detection
Authors: Ruimin Sun, Xiaoyong Yuan, Pan He, Qile Zhu, Aokun Chen, Andre Gregio, Daniela Oliveira, Xiaolin Li
Comments: 12 pages, 4 figures. This paper has been accepted to IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [75] arXiv:1901.08057 (replaced) [pdf, ps, other]

Title: Large dimensional analysis of general margin based classification methods
Comments: 33 pages, 5 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)
 [76] arXiv:1902.04742 (replaced) [pdf, other]

Title: Uniform convergence may be unable to explain generalization in deep learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [77] arXiv:1904.11060 (replaced) [pdf, ps, other]

Title: Normal Approximation in Large Network Models
Subjects: Econometrics (econ.EM); Statistics Theory (math.ST)
 [78] arXiv:1910.09714 (replaced) [pdf, other]

Title: Smoothness-Adaptive Contextual Bandits
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [79] arXiv:1911.02319 (replaced) [pdf, other]

Title: Improving reinforcement learning algorithms: towards optimal learning rate policies
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [80] arXiv:1911.09171 (replaced) [pdf, other]

Title: Re-Evaluating Strengthened-IV Designs: Asymptotic Efficiency, Bias Formula, and the Validity and Power of Sensitivity Analyses
Comments: 86 pages, 4 figures, 6 tables
Subjects: Methodology (stat.ME); Applications (stat.AP)
 [81] arXiv:2003.00660 (replaced) [pdf, ps, other]

Title: Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [82] arXiv:2003.06566 (replaced) [pdf, other]

Title: On the benefits of defining vicinal distributions in latent space
Comments: Accepted at Elsevier Pattern Recognition Letters (2021), Best Paper Award at CVPR 2021 Workshop on Adversarial Machine Learning in Real-World Computer Vision (AML-CV), Also accepted at ICLR 2021 Workshops on Robust-Reliable Machine Learning (Oral) and Generalization beyond the training distribution (Abstract)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [83] arXiv:2003.10323 (replaced) [pdf, ps, other]

Title: Monte Carlo integration of non-differentiable functions on $[0,1]^ι$, $ι=1,\dots,d$, using a single determinantal point pattern defined on $[0,1]^d$
Subjects: Computation (stat.CO); Classical Analysis and ODEs (math.CA); Numerical Analysis (math.NA); Statistics Theory (math.ST)
 [84] arXiv:2005.03566 (replaced) [pdf, other]

Title: Noisy Differentiable Architecture Search
Comments: BMVC 2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [85] arXiv:2005.12556 (replaced) [pdf, other]

Title: Truncating the Exponential with a Uniform Distribution
Subjects: Methodology (stat.ME)
 [86] arXiv:2006.05842 (replaced) [pdf, other]

Title: The Emergence of Individuality
Comments: The extended version of ICML 2021 paper
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
 [87] arXiv:2007.00823 (replaced) [pdf, other]

Title: Dropout as a Regularizer of Interaction Effects
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [88] arXiv:2007.02794 (replaced) [pdf, other]

Title: Efficient Connected and Automated Driving System with Multi-agent Graph Reinforcement Learning
Comments: the paper is not even ready
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [89] arXiv:2007.03408 (replaced) [pdf, other]

Title: A Generative Model for Texture Synthesis based on Optimal Transport between Feature Distributions
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [90] arXiv:2007.04803 (replaced) [pdf, other]

Title: A Global Stochastic Optimization Particle Filter Algorithm
Comments: 61 pages, 4 figures
Subjects: Machine Learning (stat.ML); Statistics Theory (math.ST); Computation (stat.CO)
 [91] arXiv:2007.14052 (replaced) [pdf, other]

Title: Multi-output Gaussian Processes with Functional Data: A Study on Coastal Flood Hazard Assessment
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP)
 [92] arXiv:2007.14190 (replaced) [pdf, other]

Title: Variable Selection for Doubly Robust Causal Inference
Subjects: Methodology (stat.ME)
 [93] arXiv:2007.14861 (replaced) [pdf, ps, other]

Title: Efficient Sparse Secure Aggregation for Federated Learning
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
 [94] arXiv:2008.08275 (replaced) [pdf, other]

Title: Asymptotic Analysis for Data-Driven Inventory Policies
Subjects: Statistics Theory (math.ST)
 [95] arXiv:2008.11140 (replaced) [pdf, other]

Title: Powerful Inference
Comments: 29 pages, 4 figures, 3 tables
Subjects: Econometrics (econ.EM); Statistics Theory (math.ST)
 [96] arXiv:2008.13443 (replaced) [pdf, other]

Title: On the Quality Requirements of Demand Prediction for Dynamic Public Transport
Comments: 26 pages, 9 tables, 6 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Signal Processing (eess.SP)
 [97] arXiv:2009.06087 (replaced) [pdf, other]

Title: Neural Networks Enhancement with Logical Knowledge
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Machine Learning (stat.ML)
 [98] arXiv:2010.00373 (replaced) [pdf, other]

Title: Task Agnostic Continual Learning Using Online Variational Bayes with Fixed-Point Updates
Comments: The arXiv paper "Task Agnostic Continual Learning Using Online Variational Bayes" is a preliminary preprint of this paper. The main differences between the versions are: 1. We develop a new algorithmic framework (FOO-VB). 2. We add multivariate Gaussian and matrix variate Gaussian versions of the algorithm. 3. We demonstrate the new algorithm's performance in task agnostic scenarios
Journal-ref: Neural Comput 2021; 33 (11)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [99] arXiv:2010.01777 (replaced) [pdf, other]

Title: A Unified View on Graph Neural Networks as Graph Signal Denoising
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [100] arXiv:2011.05348 (replaced) [pdf, other]

Title: SALR: Sharpness-aware Learning Rate Scheduler for Improved Generalization
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [101] arXiv:2012.11026 (replaced) [pdf]

Title: Independent Approximates enable closed-form parameter estimation of heavy-tailed distributions
Authors: Kenric P. Nelson
Comments: 30 pages, 8 figures, 7 tables
Subjects: Methodology (stat.ME); Information Theory (cs.IT); Data Analysis, Statistics and Probability (physics.data-an)
 [102] arXiv:2101.12353 (replaced) [pdf, other]

Title: On the capacity of deep generative networks for approximating distributions
Subjects: Machine Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [103] arXiv:2103.01621 (replaced) [pdf, other]

Title: Fast selection of nonlinear mixed effect models using penalized likelihood
Authors: Edouard Ollier
Subjects: Methodology (stat.ME); Computation (stat.CO)
 [104] arXiv:2103.03632 (replaced) [pdf, other]

Title: Modeling tail risks of inflation using unobserved component quantile regressions
Authors: Michael Pfarrhofer
Comments: JEL: C11, C22, C53, E31; Keywords: state space models, time-varying parameters, stochastic volatility, predictive inference
Subjects: Econometrics (econ.EM); Applications (stat.AP)
 [105] arXiv:2104.04910 (replaced) [pdf, other]

Title: Semi-$G$-normal: a Hybrid between Normal and $G$-normal (Full Version)
Comments: 109 pages, 8 figures, a comprehensive document for conference and open discussions, to be divided later for publications; readers may navigate to the parts they are interested in by the table of contents
Subjects: Probability (math.PR); Statistics Theory (math.ST)
 [106] arXiv:2104.07084 (replaced) [pdf, other]

Title: Grouped Variable Selection with Discrete Optimization: Computational and Statistical Perspectives
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Optimization and Control (math.OC); Computation (stat.CO); Machine Learning (stat.ML)
 [107] arXiv:2104.11734 (replaced) [pdf, other]

Title: Exact marginal prior distributions of finite Bayesian neural networks
Comments: 12+9 pages, 4 figures; v3: Accepted as NeurIPS 2021 Spotlight
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (stat.ML)
 [108] arXiv:2104.14023 (replaced) [pdf, ps, other]

Title: Measuring dependence between random vectors via optimal transport
Subjects: Statistics Theory (math.ST)
 [109] arXiv:2104.14204 (replaced) [pdf, other]

Title: Optimal bidding in hourly and quarter-hourly electricity price auctions: trading large volumes of power with market impact and transaction costs
Subjects: Statistical Finance (q-fin.ST); Mathematical Finance (q-fin.MF); Portfolio Management (q-fin.PM); Trading and Market Microstructure (q-fin.TR); Applications (stat.AP)
 [110] arXiv:2105.03425 (replaced) [pdf, other]

Title: Kernel Two-Sample Tests for Manifold Data
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [111] arXiv:2105.07025 (replaced) [pdf, other]

Title: Minimal Cycle Representatives in Persistent Homology using Linear Programming: an Empirical Study with User's Guide
Subjects: Algebraic Topology (math.AT); Computational Geometry (cs.CG); Machine Learning (stat.ML)
 [112] arXiv:2105.08024 (replaced) [pdf, other]

Title: Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [113] arXiv:2105.08532 (replaced) [pdf, other]

Title: Robust Learning in Heterogeneous Contexts
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [114] arXiv:2106.03227 (replaced) [pdf, other]

Title: Neural Tangent Kernel Maximum Mean Discrepancy
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [115] arXiv:2106.03762 (replaced) [pdf, other]

Title: Frustratingly Easy Uncertainty Estimation for Distribution Shift
Comments: 17 pages, 4 tables, 9 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [116] arXiv:2106.05232 (replaced) [pdf, ps, other]

Title: Realizing GANs via a Tunable Loss Function
Comments: Extended version of a paper accepted to ITW 2021. 8 pages, 2 figures
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
 [117] arXiv:2106.05565 (replaced) [pdf, other]

Title: Identifiability of interaction kernels in mean-field equations of interacting particles
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR)
 [118] arXiv:2106.06134 (replaced) [pdf, other]

Title: Is Homophily a Necessity for Graph Neural Networks?
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [119] arXiv:2106.09215 (replaced) [pdf, other]

Title: Optimum-statistical Collaboration Towards General and Efficient Black-box Optimization
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [120] arXiv:2106.09769 (replaced) [pdf, other]

Title: Generalized regression operator estimation for continuous time functional data processes with missing at random response
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
 [121] arXiv:2106.10065 (replaced) [pdf, other]

Title: Being a Bit Frequentist Improves Bayesian Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [122] arXiv:2106.10624 (replaced) [pdf]

Title: Combined tests based on restricted mean time lost for competing risks data
Comments: 26 pages, 3 figures
Journal-ref: Statistics in Biopharmaceutical Research, 2021
Subjects: Applications (stat.AP); Methodology (stat.ME)
 [123] arXiv:2106.12423 (replaced) [pdf, other]

Title: Alias-Free Generative Adversarial Networks
Authors: Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, Timo Aila
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
 [124] arXiv:2106.13423 (replaced) [pdf, other]
 [125] arXiv:2106.15358 (replaced) [pdf, other]

Title: Towards Sample-Optimal Compressive Phase Retrieval with Sparse and Generative Priors
Comments: Accepted to NeurIPS 2021
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)
 [126] arXiv:2107.00520 (replaced) [pdf, other]

Title: Predictive Modeling in the Presence of Nuisance-Induced Spurious Correlations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [127] arXiv:2107.00758 (replaced) [pdf, other]

Title: The Spotlight: A General Method for Discovering Systematic Errors in Deep Learning Models
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [128] arXiv:2107.05686 (replaced) [pdf, other]

Title: The Role of Pretrained Representations for the OOD Generalization of RL Agents
Authors: Andrea Dittadi, Frederik Träuble, Manuel Wüthrich, Felix Widmaier, Peter Gehler, Ole Winther, Francesco Locatello, Olivier Bachem, Bernhard Schölkopf, Stefan Bauer
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [129] arXiv:2107.10880 (replaced) [pdf, other]

Title: Using UMAP to Inspect Audio Data for Unsupervised Anomaly Detection under Domain-Shift Conditions
Comments: Accepted at the DCASE2021 Workshop
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Computation (stat.CO)
 [130] arXiv:2107.10884 (replaced) [pdf, other]

Title: Structured second-order methods via natural gradient descent
Comments: Fixed some typos. ICML workshop paper. A short version of arXiv:2102.07405 with a focus on optimization tasks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [131] arXiv:2108.05721 (replaced) [pdf, other]

Title: Networks of News and Cross-Sectional Returns
Comments: Revision before another submission
Subjects: Portfolio Management (q-fin.PM); Computation (stat.CO)
 [132] arXiv:2108.08987 (replaced) [pdf, other]

Title: Uniformity Testing in the Shuffle Model: Simpler, Better, Faster
Comments: Accepted to the SIAM Symposium on Simplicity in Algorithms (SOSA 2022). Added some details and discussions
Subjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR); Discrete Mathematics (cs.DM); Machine Learning (stat.ML)
 [133] arXiv:2108.09676 (replaced) [pdf, other]

Title: Efficient Gaussian Neural Processes for Regression
Comments: 6 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [134] arXiv:2108.10566 (replaced) [pdf, other]

Title: sigmoidF1: A Smooth F1 Score Surrogate Loss for Multilabel Classification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [135] arXiv:2109.05578 (replaced) [pdf, other]

Title: Kernel PCA with the Nyström method
Authors: Fredrik Hallgren
Comments: 44 pages, 6 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [136] arXiv:2109.05583 (replaced) [pdf, ps, other]

Title: Automatic Componentwise Boosting: An Interpretable AutoML System
Comments: 6 pages, 4 figures, ECML-PKDD Workshop on Automating Data Science 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [137] arXiv:2109.05675 (replaced) [pdf, other]

Title: Online Unsupervised Learning of Visual Representations and Categories
Comments: Technical report, 28 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [138] arXiv:2109.11307 (replaced) [pdf, other]

Title: Semiparametric bivariate extreme-value copulas
Authors: Javier Fernández Serrano
Comments: 23 pages, 22 figures
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Computation (stat.CO)
 [139] arXiv:2110.00629 (replaced) [pdf, other]

Title: Factored couplings in multi-marginal optimal transport via difference of convex programming
Comments: Fix typo and correct Corollary 3.3
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
 [140] arXiv:2110.01571 (replaced) [pdf, other]

Title: Learning Causal Representation for Face Transfer across Large Appearance Gap
Subjects: Computer Vision and Pattern Recognition (cs.CV); Methodology (stat.ME)
 [141] arXiv:2110.01593 (replaced) [pdf, other]

Title: Generalized Kernel Thinning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)
 [142] arXiv:2110.04433 (replaced) [pdf, ps, other]

Title: Debiased Lasso for Generalized Linear Models with A Diverging Number of Covariates
Comments: arXiv admin note: text overlap with arXiv:2006.12778
Subjects: Methodology (stat.ME)
 [143] arXiv:2110.05430 (replaced) [pdf, other]

Title: Density-based interpretable hypercube region partitioning for mixed numeric and categorical data
Subjects: Machine Learning (cs.LG); Applications (stat.AP)
 [144] arXiv:2110.06021 (replaced) [pdf, other]

Title: Embedded-model flows: Combining the inductive biases of model-free deep learning and explicit probabilistic modeling
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [145] arXiv:2110.07959 (replaced) [pdf, other]

Title: Low-rank Matrix Recovery With Unknown Correspondence
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
 [146] arXiv:2110.08211 (replaced) [pdf, other]

Title: Astronomical source finding services for the CIRASA visual analytic platform
Authors: S. Riggi, C. Bordiu, F. Vitello, G. Tudisco, E. Sciacca, D. Magro, R. Sortino, C. Pino, M. Molinaro, M. Benedettini, S. Leurini, F. Bufano, M. Raciti, U. Becciani
Comments: 16 pages, 6 figures
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Computation (stat.CO); Machine Learning (stat.ML)