**Purpose**:
The purpose of this study was to validate a new automated method to locate the fovea on normal and pathological fundus images. Unlike approaches based on normative anatomic measures (NAMs), our vessel-based fovea localization (VBFL) approach relies on the retina's vessel structure to make predictions.

**Methods**:
The spatial relationship between the fovea location and vessel characteristics is learnt from healthy fundus images and then used to predict fovea location in new images. We evaluate the VBFL method on three categories of fundus images: healthy images acquired with different head orientations and fixation locations, healthy images with simulated macular lesions, and pathological images from age-related macular degeneration (AMD).

**Results**:
For healthy images taken with the head tilted to the side, the NAM estimation error increases significantly by a factor of 4, whereas VBFL yields no significant increase, representing a 73% reduction in prediction error. With simulated lesions, VBFL performance decreases significantly as lesion size increases, yet remains better than NAM until lesion size reaches 200 degrees^{2}. For pathological images, average prediction error was 2.8 degrees, with 64% of the images yielding an error of 2.5 degrees or less. VBFL was not robust for images showing darker regions and/or an incomplete representation of the optic disk.

**Conclusions**:
The vascular structure provides enough information to precisely locate the fovea in fundus images in a way that is robust to head tilt, eccentric fixation location, missing vessels, and actual macular lesions.

**Translational Relevance**:
The VBFL method should allow researchers and clinicians to assess automatically the eccentricity of a newly developed area of fixation in fundus images with macular lesions.

^{1,2,3} and reading speed.^{4} Moreover, it has recently been shown to be a predictor of PRL position change over the course of disease progression.^{5} Because fovea-PRL distance is a powerful measure to monitor and better understand the progression and functional impact of maculopathies,^{3–5} the exact position of the previously functional fovea must be estimated precisely in those patients with CFL.^{4,6–8}

According to the accepted measures, the mean foveal distance temporal to the middle of the optic disk is 15.5 ± 1.1 degrees horizontally and –1.5 ± 0.9 degrees vertically (Fig. 1A).^{3,9–11} However, using such normative measures presents several limitations. First, these measures represent a population average, whereas the actual position of the fovea can vary dramatically from one individual to the other.^{12} Second, these normative measures were estimated under “standard” testing conditions (with the head and trunk as closely aligned as possible, i.e. primary position of the head, and a central fixation target). In the presence of a central scotoma, however, fixating a specific target requires the use of eccentric vision. In the case of a large scotoma, the eccentricity required to fixate can be so large that individuals may have to tilt their head and/or gaze to fixate. They will therefore not be able to maintain their head in the primary position, which will have a significant impact on the relative positions of the different anatomic structures of the eye (i.e. fovea, optic disk, and blood vessels). In addition to eccentric vision, abnormal ocular torsion, which has been observed in a number of strabismus conditions,^{13} can also amplify this phenomenon. For instance, the distance between the optic disk and the fovea is highly dependent on eye and head orientation, as discussed in Ref. 10. This dependency is illustrated in Figure 1, where a single healthy eye performed a fixation task under different conditions. Despite these strong limitations, and because it is easy to administer, this method remains the most used in clinical and research settings. Therefore, there is a need for new automated methods to estimate the position of the fovea on color fundus images where the morphology of the macular region has severely changed.

^{14}The first one gathers anatomic structure-based methods, which use the main visible anatomic structures of a fundus image (optic disk, blood vessels, and macula) as landmarks to locate the macular region, and consequently the fovea. Whereas the simplest approach is to detect the fovea as the center of the darkest circular region on the whole fundus image,^{15,16} most methods work in two stages: first, (i) the estimation of a region of interest (ROI) that most likely contains the fovea (using blood vessel or optic disk information), followed by (ii) the precise localization of the fovea within this ROI, using color intensity information.^{17–19} Despite their efficiency with healthy fundus images, these methods cannot be applied as such to cases where the morphology of the macula has severely changed.

^{20–24} (refer to Ref. 25 for a review). Overall, two main deep learning-based approaches are used: one considers fovea localization as a segmentation task, the other as a regression task. With the segmentation approach, each pixel of the fundus image is classified by one or several convolutional neural networks into either the “fovea” or “non-fovea” category.^{20,24,26} Because no local feature makes the fovea region distinct from the rest of the fundus in pathological images, this segmentation approach is not well suited. The alternative strategy, which treats fovea localization as a regression task,^{27} makes predictions by building a regression network using all the features it extracts, even those relative to retinal regions far from the fovea. Yet, this method has always been trained and used with healthy fundus images, where the macular region could be used as a significant feature by the network.^{28,29}

To tackle this variability problem, while still taking advantage of the attractive properties of the vessel structure, we designed the vessel-based fovea localization (VBFL) method. This translational tool will allow researchers and eye care providers to estimate the exact position of the nonfunctional fovea in patients with CFL, as well as the fovea-PRL distance. Such information will be crucial to: (1) monitor the progression of the disease and the resulting functional deficit, (2) better understand the underlying causes of this functional deficit, and (3) improve individualized rehabilitative care, especially for optometrists.^{3–5} After presenting the VBFL method in detail, we will analyze its performance on healthy retina images taken under different conditions of head orientation and fixation position (validation 1), as well as images with simulated lesions (validation 2) and pathological images (validation 3).

We used the REFUGE data set^{30} as the training set, referred to as *D*_{train}. It contains 840 healthy eye fundus images with a manually annotated fovea location and optic disk position. As a preliminary step, we extracted a vessel information map *v*_{i} for each single image *u*_{i}. To do so, we used the recent retinal vessel segmentation network SA-UNet,^{23} a convolutional neural network designed for biomedical image segmentation. We used the weights provided by the authors, pretrained on the DRIVE data set.^{31} REFUGE and DRIVE images were acquired with different fundus cameras, all with a field of view of 45 degrees. The REFUGE data set was collected with both a Zeiss Visucam 500 (resolution 2124 × 2056 pixels) and a Canon CR-2 (resolution 1634 × 1634 pixels). DRIVE images were taken with a Canon CR5 NM (resolution 768 × 584 pixels). To follow SA-UNet requirements, all images were resized to 592 × 592 pixels before being fed to the network. This method outputs a binary map *v*(*x*, *y*) indicating the presence or absence of a vessel at each pixel (*x*, *y*) of the fundus image, so that *v*(*x*, *y*) = 1 if (*x*, *y*) belongs to a vessel, and *v*(*x*, *y*) = 0 otherwise.

Aggregating these individual vessel maps *v*_{i}, we estimated two distribution representations: a vessel density map and a vessel direction map. The vessel density map (\(\bar V\)) aims at giving, for each position (*x*, *y*), the likelihood to have a vessel passing through. The whole process is illustrated in Figure 2 (step 1) and detailed in Appendix A1. The vessel direction map (\(\bar D\)) aims at giving, for each pixel (*x*, *y*), the most likely direction of a vessel between 0 and π. Details of its calculation are provided in Appendix A2. The resulting direction map \(\bar D\) is shown in Figure 2 (step 1), where, for each pixel (*x*, *y*), the main direction is represented with a color code, with tensors represented as ellipses. Note that because small vessels show great variability, they make a relatively small contribution to the vessel maps, as opposed to the larger vessels, which exhibit more reproducible structure. For that reason, our approach is essentially guided by the large-vessel distribution, whereas small vessels have little to no impact.
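The aggregation in step 1 can be sketched in a few lines, assuming the binary vessel maps have already been aligned to the common reference frame. The function name is illustrative, and the isotropic Gaussian smoothing stands in for the anisotropic operator used in the appendix:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(aligned_vessel_maps, sigma=2.0):
    """Average aligned binary vessel maps into a density map (V bar).

    aligned_vessel_maps: list of HxW arrays with values in {0, 1},
    already warped to the common reference frame (fovea and optic
    disk aligned). Isotropic Gaussian smoothing is a simplified
    stand-in for the anisotropic smoothing described in the paper.
    """
    stack = np.stack(aligned_vessel_maps).astype(float)
    v_bar = stack.mean(axis=0)            # per-pixel vessel likelihood
    return gaussian_filter(v_bar, sigma)  # smooth the estimate

# Toy usage: two 4x4 maps sharing one vessel pixel.
m1 = np.zeros((4, 4)); m1[1, 1] = 1
m2 = np.zeros((4, 4)); m2[1, 1] = 1
d = density_map([m1, m2], sigma=0)  # sigma=0: no smoothing
```

With sigma = 0 the result is the raw per-pixel frequency of vessel presence; in practice a nonzero smoothing scale spreads the contribution of each vessel over its neighborhood.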

Given a new fundus image *u* for which we want to predict fovea position, we extracted the same vessel representations as in step 1. Therefore, following the same procedures as described above, we estimated a target density map *v* and a target direction map *d* (see Fig. 2, step 2).

Given the statistical representations (\(\bar V\), \(\bar D\)) and the target representations (*v*, *d*), as defined in steps 1 and 2, respectively, we register the statistical representations (where the fovea position is known) with respect to the target representations. Mathematically, the registration can be formulated as an optimization problem, which is detailed in Appendix A3. Once registered, the transformed fovea position is used as a prediction of the fovea position in the target fundus image *u* (see Fig. 2, step 3).

^{32}The ground truth fovea position (*gt*) was set at the (*x*, *y*) coordinates of the fixation cross presented during the fixation examination (in degrees of visual angle, noted \(( {x_{gt}^F,y_{gt}^F} )\)). Each image was also annotated using NAMs to derive its fovea position, noted \(( {x_{nam}^F,y_{nam}^F} )\). First, the center of the optic disk was annotated manually by one of the authors (V.F.); then, the (*x*, *y*) coordinates of the fovea location were derived using standard NAM, that is, 1.5 degrees inferior and 15.5 degrees temporal relative to the optic disk center.^{10} Finally, the VBFL method was run on each image to predict its fovea location automatically, noted \(( {x_p^F,y_p^F} )\).
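As an illustration, the NAM rule above (15.5 degrees temporal, 1.5 degrees inferior to the optic disk center) can be written as a small helper. This is a sketch: the function name and the coordinate sign conventions are assumptions, not from the paper:

```python
def nam_fovea_estimate(od_x, od_y, right_eye=True):
    """NAM-based fovea estimate, in degrees of visual angle.

    The fovea lies on average 15.5 deg temporal and 1.5 deg inferior
    to the optic disk center. Assumed conventions (not from the paper):
    x increases rightward, y increases upward, and in a right-eye
    fundus image the temporal direction points leftward.
    """
    dx = -15.5 if right_eye else 15.5  # temporal side flips with eye laterality
    return od_x + dx, od_y - 1.5       # 1.5 deg inferior

# Example: right eye with the optic disk center at the origin.
x, y = nam_fovea_estimate(0.0, 0.0)   # -> (-15.5, -1.5)
```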

Simulated lesions were applied as masks to the vessel maps *v*_{i} derived from the healthy images *u*_{i}. A total of three shapes (circle, horizontal ellipse, and vertical ellipse) and six sizes (0, 20, 50, 100, 200, and 400 degrees^{2}) were combined to create 18 different masks, as illustrated in Figure 4. Each of these masks was applied to each of the 198 healthy fundus images, resulting in a total of 3564 images with simulated lesions, hiding more or fewer vessels depending on their shape and size. The VBFL method was run on each of these images to predict their fovea location automatically.
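A minimal sketch of how such masks could be generated, assuming a uniform degrees-per-pixel scale and a 2:1 axis ratio for the ellipses (the paper states the shapes and areas but not the ellipse aspect ratio; all names here are illustrative):

```python
import numpy as np

def lesion_mask(shape, area_deg2, center, deg_per_px, kind="circle"):
    """Binary mask simulating a macular lesion of a given area (deg^2).

    kind: "circle", "h_ellipse", or "v_ellipse". The 2:1 axis ratio
    for the ellipses is an assumption made for this sketch.
    """
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    cx, cy = center
    x = (xx - cx) * deg_per_px  # horizontal offset in degrees
    y = (yy - cy) * deg_per_px  # vertical offset in degrees
    if kind == "circle":
        a = b = np.sqrt(area_deg2 / np.pi)
    elif kind == "h_ellipse":                        # horizontal: a = 2b
        b = np.sqrt(area_deg2 / (2 * np.pi)); a = 2 * b
    else:                                            # vertical: b = 2a
        a = np.sqrt(area_deg2 / (2 * np.pi)); b = 2 * a
    return (x / a) ** 2 + (y / b) ** 2 <= 1.0

# The masked area should be close to the requested one:
m = lesion_mask((400, 400), 100.0, (200, 200), deg_per_px=0.1)
area = m.sum() * 0.1 ** 2  # approximately 100 deg^2
```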

^{22}These 89 images included all forms and stages of AMD, showing representative retinal lesions, such as drusen, exudates, hemorrhages, and/or scars. They were acquired using a Zeiss Visucam 500 fundus camera (resolution 2124 × 2056 pixels). Images were manually annotated by seven independent ophthalmologists, who located the fovea position as well as visible lesions. For each image, the fovea's final coordinates were obtained by averaging the seven annotations and were quality-checked by a senior specialist. Similarly, manual pixel-wise annotations of the different lesion types (drusen, exudate, hemorrhage, and scar) were provided for each image. From the resulting lesion segmentation masks, we estimated for each image its lesion type(s) (drusen, exudate, hemorrhage, and/or scar), the size of each individual lesion (in degrees^{2}), and its full lesion size (in degrees^{2}), using the R image-processing packages magick and countcolors (Fig. 5).

All statistical analyses were performed in R^{33} with the following additional R packages: tidyr, dplyr, stringr, nlme, emmeans, and ggplot2.

Data were analyzed using linear mixed-effects models,^{34} with prediction error as the dependent variable. The variables “method,” “head orientation,” “fixation location,” and “axial length of the eye” were set as fixed effects, and “participant” was modeled as a random effect. Prediction error was transformed into natural logarithm (ln) units, and eye axial length was centered around its mean.^{35} Significance of the fixed effects was estimated using t-values: absolute t-values larger than 2 were considered significant, corresponding to a 5% significance level in a two-tailed test.^{36,37} In the Results section, fixed-effects estimates are reported along with their t-values and 95% confidence intervals.^{34}

^{100}), that is, a 26% increase, resulting in an average prediction error of 0.76 degrees. Following the same logic, prediction error reaches 0.96 degrees at 200 degrees^{2}, 1.21 at 300 degrees^{2}, and 1.57 at 400 degrees^{2}.

^{100}), that is, a 38% increase, resulting in an average prediction error of 0.83 degrees. Following the same logic, prediction error reaches 1.15 degrees at 200 degrees^{2}, 1.58 at 300 degrees^{2}, and 2.18 at 400 degrees^{2}.

VBFL remained more accurate than NAM up to a lesion size of 200 degrees^{2}. For larger lesions, VBFL tends to yield larger prediction error than NAM, except for the two head-tilted conditions (Fleft and Fright), for which VBFL prediction error remains smaller than NAM, no matter the lesion size.

Considering the target vessel maps *v*_{i}, we found a significant effect of the percentage of vessels detected on the overall accuracy of the method's prediction: the higher the density in the vessel map, the smaller the prediction error. For instance, for a map containing 10% of pixels representing a vessel, which corresponds to the average density of vessels extracted in the 198 healthy images, prediction error shows an average value of 1.6 degrees (t = 4.6, 95% CI = 1.3, 2.0). For each 1% decrease in vessel density, prediction error was multiplied by approximately 1.16 (i.e., divided by 0.86), reaching a mean value of 2.2 degrees at 8%, 2.9 at 6%, and 3.9 degrees at 4% density.

We found no significant effect of full lesion size on the error amplitude. Similarly, we found no significant effect of lesion type on prediction error, no matter their individual sizes.

^{38}In these eyes, higher axial length can cause a distorted retinal shape, resulting in an altered geometry of the apparent vessels.^{39} Therefore, we expected our method to yield greater fovea localization error in eyes with high axial length. However, we did not find a significant effect of eye axial length on VBFL prediction error. Because our sample of highly myopic eyes was quite small, it remains to be tested, in a larger sample of eyes presenting high myopia, whether the VBFL method can still be considered reliable for extreme values of axial length.

With simulated lesions, VBFL remained reliable up to a full lesion size of 200 degrees^{2}. This cutoff value covers 75% of reported scotoma sizes in both wet and dry AMD,^{4} making our method widely applicable. Beyond that limit, however, because hidden vessel features cannot be extracted properly, the method becomes unreliable. We conclude that the most informative vessel features for the VBFL method lie beyond 8 degrees from the fovea. This value being considerably larger than the mean radius of the foveal avascular zone, which ranges between 0.7 and 1.2 degrees in healthy subjects,^{40} VBFL performance should not be affected by interindividual differences in the size of the foveal avascular zone.

^{41,42}Despite the poorer image quality conveyed, it remains useful, especially in low-resource settings. It remains to be tested whether VBFL can accurately estimate the position of the fovea on such very low-resolution images.

^{32}Results were published recently^{22} and show that all methods proposed for fovea localization were based on machine learning (segmentation and/or regression) relying on neural networks. However, comparing their results to ours is difficult for several reasons. First, their estimation was based on both normal and pathological images. Therefore, the reported average prediction error (in pixels only) may appear better than ours, given that the majority of their images were nonpathological (311 normal versus 89 pathological images). Furthermore, because they reported a single average prediction error for the whole data set, it is impossible to compare our results on a case-by-case basis. Because we already stated that VBFL is not robust to an absent optic disk or dark images, it would have been useful to compare results only for cases where VBFL can be reliable, to assess its efficiency. However, Ref. 22 reports that participating teams obtained better fovea localization results when the macular region had higher contrast and cleaner texture, regardless of AMD or non-AMD images, which is in line with our own conclusion. Second, they used a combination of methods (at least two or more), taking the best estimate for each image. Last, none of these approaches is described in detail, which does not allow a detailed comparison with the VBFL approach.^{22}

(^{43,44}), future instances of VBFL should provide increased age-range representation by including older adults' images in the training set. Thanks to this individualized approach, the VBFL model would apply an age-appropriate statistical representation map of vessel density and direction, taking into account potential changes in the density and/or direction of the vascular structure across the life span.

**Disclosure**: **A. Calabrèse**, None; **V. Fournet**, None; **S. Dours**, None; **F. Matonti**, None; **E. Castet**, None; **P. Kornprobst**, None

1. *Invest Ophthalmol Vis Sci*. 2011; 52(3): 1887–1893.
2. *Optom Vis Sci*. 2017; 94(3): 311–316.
3. *Retina*. 2008; 28(1): 125–133.
4. *Invest Ophthalmol Vis Sci*. 2011; 52(5): 2417–2424.
5. *Transl Vis Sci Technol*. 2020; 9(8): 47.
6. *Transl Vis Sci Technol*. 2013; 2(4): 1.
7. *Invest Ophthalmol Vis Sci*. 2009; 50(8): 3953–3959.
8. *J Vis*. 2021; 21(3): 9.
9. *Vis Res*. 2007; 47(15): 2076–2085.
10. *Invest Ophthalmol Vis Sci*. 2004; 45(9): 3257–3258.
11. *Optom Vis Sci*. 2005; 82(3): 177–185.
12. *Transl Vis Sci Technol*. 2021; 10(2): 25.
13. *PLoS One*. 2020; 15(12): 1–11.
14. *Comput Struct Biotechnol J*. 2016; 14: 371–384.
15. *2008 15th IEEE International Conference on Image Processing*; San Diego, CA, USA: IEEE; 2008; pp 1432–1435.
16. *Br J Ophthalmol*. 1999; 83(8): 902–910.
17. *IEEE Trans Biomed Eng*. 2004; 51(2): 246–254.
18. *Proceedings of the IAPR Conference on Machine Vision Applications (IAPR MVA 2007)*; Tokyo, Japan; 2007. Available at: https://www.researchgate.net/publication/221280680_Automatic_Detection_of_Anatomical_Structures_in_Digital_Fundus_Retinal_Images.
19. *2008 16th European Signal Processing Conference*; 2008; pp 1–5.
20. *Biomed Signal Process Control*. 2018; 40: 91–101.
21. *Proc SPIE 11511 Appl Mach Learn*. 2020: 196–202.
22. *IEEE Trans Med Imaging*. 2022; 41: 2828–2847.
23. *ArXiv preprint arXiv:2004.03696*. 2020. Available at: https://arxiv.org/abs/2004.03696.
24. *J Comput Sci*. 2017; 20: 70–79.
25. *ArXiv preprint arXiv:2101.09864*. 2021.
26. *Ophthalmic Medical Image Analysis*. Fu H, Garvin MK, MacGillivray T, Xu Y, Zheng Y, Eds. Cham, Switzerland: Lecture Notes in Computer Science; Springer International Publishing; 2020; pp 93–103.
27. *IEEE Trans Med Imaging*. 2021; 40(1): 116–128.
28. *Arch Ophthalmol*. 1964; 71(1): 93–101.
29. *Handbook of Vascular Biometrics*. Uhl A, Busch C, Marcel S, Veldhuis R, Eds. Cham, Switzerland: Springer International Publishing; 2020; pp 309–354.
30. *Med Image Anal*. 2020; 59: 101570.
31. *IEEE Trans Med Imaging*. 2004; 23(4): 501–509.
32. *IEEE Dataport*. 2020. Available at: https://ieee-dataport.org/documents/adam-automatic-detection-challenge-age-related-macular-degeneration. https://dx.doi.org/10.21227/dt4f-rt59.
33. *R: A Language and Environment for Statistical Computing*. Vienna, Austria: R Foundation for Statistical Computing; 2020.
34. *J Stat Softw*. 2015; 67(1): 1–48.
35. *Methods Ecol Evol*. 2010; 1(1): 3–14.
36. *J Mem Lang*. 2008; 59: 390–412.
37. *Data Analysis Using Regression and Multilevel/Hierarchical Models*. New York, NY: Cambridge University Press; 2007.
38. *Invest Ophthalmol Vis Sci*. 2019; 60(3): M20–M30.
39. *PLoS One*. 2021; 16(4): e0250233.
40. *Optom Vis Sci*. 2012; 89(5): 602–610.
41. *Expert Rev Ophthalmol*. 2014; 9(6): 475–485.
42. *JMIR Mhealth Uhealth*. 2020; 8(7): e17480.
43. *Invest Ophthalmol Vis Sci*. 2017; 58(9): 3804–3817.
44. *Bull N Y Acad Med*. 1964; 40(2): 116–129.
45. *IEEE Trans Pattern Anal Mach Intell*. 2005; 27(4): 506–517.
46. *Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations*. Aubert G, Kornprobst P, Eds. New York, NY: Applied Mathematical Sciences; Springer; 2006. https://doi.org/10.1007/978-0-387-44588-5_3.
47. *Anisotropic Diffusion in Image Processing*. 1998. Available at: https://www.researchgate.net/publication/202056812_Anisotropic_Diffusion_In_Image_Processing.
48. *J Magn Reson B*. 1994; 103(3): 247–254.
49. *Comput J*. 1964; 7(2): 155–162.

The vessel density map \(\bar V\) aims at giving, for each position (*x*, *y*), the likelihood to have a vessel passing through.

We first warped all vessel maps *v*_{i} so that their fovea and optic disk positions are aligned with a reference fovea position \(( {x_{ref}^F,y_{ref}^F} )\) and optic disk position \(( {x_{ref}^{OD},y_{ref}^{OD}} )\). This consists in computing an exact similarity transformation, accounting for translation, rotation, and uniform scaling. These transformations are described by four parameters: \(\mathcal{T} = ( {{t_x},{t_y},\theta ,s} )\), where *t*_{x} represents translation along the horizontal axis, *t*_{y} represents translation along the vertical axis, θ represents rotation around the reference fovea location \(( {x_{ref}^F,y_{ref}^F} )\), and *s* represents uniform scaling around the reference fovea location \(( {x_{ref}^F,y_{ref}^F} )\). This simple transformation allows for the perfect alignment of two pairs of points; the transformation aligning image *u*_{i} is denoted \({\mathcal{T}_i}\), such that \({\mathcal{T}_i}\) maps the fovea and optic disk positions of *u*_{i} onto \(( {x_{ref}^F,y_{ref}^F} )\) and \(( {x_{ref}^{OD},y_{ref}^{OD}} )\), respectively.

Applying each \({\mathcal{T}_i}\) to its vessel map *v*_{i}, we obtained the aligned vessel maps \({\tilde v_i}\). Next, we computed a first density estimate \({\bar V_0}\) by averaging the warped vessel maps: \({\bar V_0} = \frac{1}{{| {{D_{train}}} |}}\sum_{{u_i} \in {D_{train}}} {{\tilde v}_i}\), where |*D*_{train}| is the cardinality of *D*_{train}. The final average density \(\bar V\) was then obtained after applying an anisotropic smoothing operator followed by a simple image enhancement.^{45}
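The exact two-point alignment described above can be sketched as follows. Rotation and scaling are taken about the reference fovea, as in the text; function names and the explicit parameter handling are illustrative:

```python
import numpy as np

def similarity_from_two_points(src_f, src_od, ref_f, ref_od):
    """Similarity (t_x, t_y, theta, s) mapping the source fovea and
    optic-disk pair exactly onto the reference pair."""
    v_src = np.subtract(src_od, src_f)   # source fovea -> disk vector
    v_ref = np.subtract(ref_od, ref_f)   # reference fovea -> disk vector
    s = np.linalg.norm(v_ref) / np.linalg.norm(v_src)
    theta = np.arctan2(v_ref[1], v_ref[0]) - np.arctan2(v_src[1], v_src[0])
    tx, ty = np.subtract(ref_f, src_f)   # translation sends fovea to fovea
    return tx, ty, theta, s

def apply_similarity(p, params, ref_f):
    """Apply (t_x, t_y, theta, s): translate, then rotate and scale
    about the reference fovea position ref_f."""
    tx, ty, theta, s = params
    q = np.add(p, (tx, ty)) - np.asarray(ref_f)
    c, si = np.cos(theta), np.sin(theta)
    q = s * np.array([c * q[0] - si * q[1], si * q[0] + c * q[1]])
    return q + np.asarray(ref_f)

# Example: fovea (1, 1) and disk (3, 1) mapped onto fovea (0, 0), disk (0, 4).
params = similarity_from_two_points((1, 1), (3, 1), (0, 0), (0, 4))
```

By construction, applying the returned parameters sends both source landmarks exactly onto their reference counterparts, which is the property the appendix relies on.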

The vessel direction map \(\bar D\) aims at giving, for each pixel (*x*, *y*), the most likely direction of a vessel between 0 and π. This value is obtained by averaging vessel direction values from all aligned vessel maps \({\tilde v_i}\) within a given region around (*x*, *y*). To perform this averaging operation properly, we used the notion of structure tensors, which have been widely used in the domains of anisotropic diffusion^{46,47} and diffusion magnetic resonance imaging (MRI).^{48} Hence, we estimated average vessel directions by locally averaging the vectors orthogonal to the gradients of our vessel maps *v*. To do so, we first define a local vessel direction by *t* = (∇*v*_{σ})^{⊥}, where *v*_{σ} = *k*_{σ} ∗ *v* (i.e. *v* after convolution with a Gaussian kernel *k*_{σ} of variance σ, which makes the direction less sensitive to noise). Then, we define the vessel direction tensor as the neighborhood average \({\boldsymbol s}( v ) = {k_\rho } * ( {{\boldsymbol t}\,{{\boldsymbol t}^ \top }} )\), where *k*_{ρ} corresponds to the size of the neighborhood region around (*x*, *y*) used to average directions.

Overall, the eigen elements of the tensor *s*(*v*) are informative about vessel direction distributions. For instance, in regions where vessels follow a consistent direction, tensors are characterized by λ_{1} ≫ λ_{2} ≈ 0, and **e**_{1} indicates the average direction. In regions containing multiple directions (e.g. within the optic disk) or bifurcations, tensors are characterized by λ_{1} ≈ λ_{2}. Finally, in regions with low vessel density (e.g. around the fovea), tensors are characterized by λ_{1} ≈ λ_{2} ≈ 0.
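A compact sketch of this structure-tensor computation, using isotropic Gaussians for both k_σ and k_ρ (a simplifying assumption) and closed-form eigenvalues of the 2 × 2 symmetric tensor:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def vessel_direction_tensor(v, sigma=1.0, rho=3.0):
    """Structure tensor of a binary vessel map v.

    t = (gradient of the sigma-smoothed map) rotated by 90 degrees;
    the tensor is the rho-neighborhood average of t t^T. Returns the
    eigenvalues (lam1 >= lam2) and the dominant direction in [0, pi).
    """
    v_s = gaussian_filter(v.astype(float), sigma)
    gy, gx = np.gradient(v_s)        # gradients along rows (y) and cols (x)
    tx, ty = -gy, gx                 # vector orthogonal to the gradient
    # Tensor components, averaged over a rho-sized neighborhood:
    jxx = gaussian_filter(tx * tx, rho)
    jxy = gaussian_filter(tx * ty, rho)
    jyy = gaussian_filter(ty * ty, rho)
    tr, det = jxx + jyy, jxx * jyy - jxy ** 2
    disc = np.sqrt(np.maximum(tr ** 2 / 4 - det, 0.0))
    lam1, lam2 = tr / 2 + disc, tr / 2 - disc
    # Orientation of the leading eigenvector, modulo pi:
    angle = (0.5 * np.arctan2(2 * jxy, jxx - jyy)) % np.pi
    return lam1, lam2, angle
```

On a synthetic horizontal vessel, lam1 dominates lam2 along the vessel and the recovered angle is close to 0 (mod π), matching the λ_{1} ≫ λ_{2} ≈ 0 case described above.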

We accumulated direction information across *D*_{train} as follows. First, for each aligned vessel map \({\tilde v_i}\), we estimated its vessel direction tensor \({\boldsymbol s}( {{{\tilde v}_i}} )\). Second, we summed these tensors pixel-wise across *D*_{train} to obtain the average vessel direction tensor \(\bar{\boldsymbol s} = \sum_{{u_i} \in {D_{train}}} {\boldsymbol s}( {{{\tilde v}_i}} )\).

For this average tensor, the leading eigenvector **e**_{1} (with eigenvalues λ_{1} ≥ λ_{2} ≥ 0) indicates the orientation minimizing the gray-value fluctuations, that is, the direction of the vessels. We only keep this direction information: \(\bar D( {x,y} )\) is the orientation of **e**_{1}(*x*, *y*), taken modulo π.

Given the statistical representations (\(\bar V\), \(\bar D\)) and the target representations (*v*, *d*), defined in steps 1 and 2, respectively, we register the statistical representations (where fovea position is known) with respect to the target representations. Mathematically, the registration can be formulated as an optimization problem, which aims at minimizing an energy *E* with respect to a transformation \(\mathcal{T}\): \({\mathcal{T}^*} = \arg \min_\mathcal{T} E( \mathcal{T} )\), with \(E = {E_V} + {E_D}\).

The first term *E*_{V} penalizes an incorrect alignment of \(\bar V\) with respect to *v*. If they are correctly aligned, areas of high density in \(\mathcal{T}( {\bar V} )\) should correspond to areas containing vessels in *v*, and conversely, areas of low density in \(\mathcal{T}( {\bar V} )\) should mostly correspond to empty areas in *v*. To ensure this, we choose a simple weighted mean squared error, where the weight *w*_{V}(*x*, *y*) gives more importance to the macular region surrounding the fovea, which is poor in terms of large visible vessels (see Appendix A4).

The second term *E*_{D} penalizes an incorrect alignment of \(\bar D\) with respect to *d*. Images will be correctly aligned at a point (*x*, *y*) if the directions *d*(*x*, *y*) and \(\mathcal{T}( {\bar D} ) + \theta ( \mathcal{T} )\) are close, modulo π. Note that \(\theta ( \mathcal{T} )\) has to be added because we rotate a function representing directions. We defined *E*_{D} accordingly, with a weight *w*_{D} given in Appendix A4.

The energy was minimized using a derivative-free optimization method,^{49} classically used in multi-modal registration. Thus, given the optimal transform \({\mathcal{T}^*}\) minimizing criterion (8), the predicted fovea position \(( {x_p^F,y_p^F} )\) was given by \(( {x_p^F,y_p^F} ) = {\mathcal{T}^*}( {x_{ref}^F,y_{ref}^F} )\). Although more complex transformations were considered (e.g. ones able to reproduce the 3D variations of position and orientation, given the 3D morphology of the eye), they would have resulted in longer computational time and a greater risk of converging to local minima. Therefore, we chose the similarity transform as a good compromise between the real transformation we want to approximate and the convergence of our method.
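The density term of this registration can be sketched as follows. The direction term E_D and the region-based weights are omitted for brevity, scipy's Powell optimizer is one illustrative choice of derivative-free method, and all function names are assumptions:

```python
import numpy as np
from scipy.ndimage import map_coordinates
from scipy.optimize import minimize  # e.g. method="Powell" (derivative-free)

def warp_map(vbar, params, ref_f):
    """T(vbar): translate by (tx, ty), then rotate/scale about ref_f.
    Coordinates are (x, y) = (column, row)."""
    tx, ty, theta, s = params
    h, w = vbar.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    # Inverse-map each output pixel back into vbar's frame:
    qx, qy = xx - ref_f[0], yy - ref_f[1]
    c, si = np.cos(-theta), np.sin(-theta)
    px = (c * qx - si * qy) / s + ref_f[0] - tx
    py = (si * qx + c * qy) / s + ref_f[1] - ty
    return map_coordinates(vbar, [py, px], order=1, mode="constant")

def energy_v(params, vbar, v_target, w_v, ref_f):
    """Weighted mean squared error E_V between T(vbar) and the target."""
    diff = warp_map(vbar, params, ref_f) - v_target
    return float(np.sum(w_v * diff ** 2))

# Identity transform leaves the map unchanged, so E_V against itself is 0:
vbar = np.zeros((8, 8)); vbar[4, 4] = 1.0
e0 = energy_v((0.0, 0.0, 0.0, 1.0), vbar, vbar, np.ones((8, 8)), (4.0, 4.0))

# A full run would minimize over the four parameters, e.g.:
# res = minimize(energy_v, x0=[0, 0, 0, 1], method="Powell",
#                args=(vbar, v_target, w_v, ref_f))
```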

**Weights *w*_{V} and *w*_{D} Inside Energy *E* (Equation 2)**

Given a target vessel map *v*, we define its domain of definition Λ (see Fig. 9A). The domain Λ contains all positions (*x*, *y*) where the fundus image is defined. Thus, it contains the vessel information, and there is no information outside Λ. Similarly, we define the domain of definition of the average density \(\bar V\), denoted \({\rm{\bar \Lambda }}\) (see Fig. 9B). Here, we chose \({\rm{\bar \Lambda }}\) as the smallest domain containing at least 50% of the domains *T*_{i}(Λ_{i}), for all *u*_{i} ∈ *D*_{train}, where *T*_{i} is the transformation used to warp *v*_{i} onto \(\bar V\), and Λ_{i} is the domain of *u*_{i}. Note that the same domain is used for the distribution of directions \(\bar D\). Finally, considering the average density \(\bar V\), we define a circular domain \({\rm{\bar \Phi }}\), centered around the fovea reference position \(( {x_{ref}^F,y_{ref}^F} )\), corresponding to the macular zone. We chose the size of this domain so that it covers the area around the fovea where, on average, there is no visible vessel. Indeed, in this region surrounding the fovea there are some macular vessels, but they are branches of the vessels of the temporal sector, which divide rapidly into smaller components to create a very dense but small-diameter vascular network. In addition, the most central area (the central 500–600 microns, called the foveal avascular zone) has no retinal vessels at all and is vascularized only by the choroid.

To compute the energy *E*, the principle is to compare the transformed average representations (i.e. \(T( {\bar V} )\) and \(T( {\bar D} )\)) with the target representations (i.e. *v* and *d*). As such, given a transformation *T*, we compare information defined in the transformed domain of definition \(T( {{\rm{\bar \Lambda }}} )\) and the transformed macular zone \(T( {{\rm{\bar \Phi }}} )\) with information defined in the target domain of definition Λ. These domains are shown together in Figure 9C, which allows us to define four types of regions for the region-based weight:

- In the region corresponding to the macular zone of the average representation (\(T( {{\rm{\bar \Phi }}} )\)), because we want to predict the fovea position in the target image, we choose a high weight (denoted *w*_{macula}) to encourage the alignment of the macular zones of the target image and the distribution.
- At the intersection of the two domains of definition (\(T( {{\rm{\bar \Lambda }}} ) \cap {\rm{\Lambda }}\)), that is, where information is available on both sides (average and target representation), we choose a weight *w*_{general} such that *w*_{general} < *w*_{macula}.
- Where information is only partly available, that is, for positions (*x*, *y*) such that (*x*, *y*) ∈ Λ but \(( {x,y} ) \notin T( {{\rm{\bar \Lambda }}} )\) (or vice versa), no relevant energy can be estimated, because information is missing for either the average representation or the target representation. In these regions, we chose a weight *w*_{outside} such that *w*_{outside} < *w*_{general}. This weight should be the lowest because the two domains of definition need not match exactly: the domain Λ is fixed and unrelated to the information shown inside it. However, we found this term useful to avoid divergence of the algorithm, especially at the early stages of the optimization, because it prevents the two domains of definition from drifting too far apart.
- Finally, for positions outside both domains of definition, such that \(( {x,y} ) \notin T( {{\rm{\bar \Lambda }}} ) \cup {\rm{\Lambda }}\), the value of the weight does not matter because there is no information in this region for either the target image or the average distribution (so the energy contribution is zero).

Based on these regions, we define the weight *w*_{V}, and similarly the weight *w*_{D}, which additionally includes a saliency term ξ accounting for positions where the target direction *d*(*x*, *y*) is not well-defined (e.g. regions with no vessels, or with many directions due to vessel crossings). We defined the weight ξ by ξ(*x*, *y*) = *v*(*x*, *y*)(λ_{1}(*x*, *y*) − λ_{2}(*x*, *y*)), where λ_{1}(*x*, *y*) and λ_{2}(*x*, *y*) are, respectively, the higher and lower eigenvalues of the vessel tensor *s*(*x*, *y*). This weighted saliency map is minimal in two cases: (i) when (*x*, *y*) does not belong to a vessel (because of the multiplication by *v*(*x*, *y*)), and (ii) when (*x*, *y*) belongs to a vessel but λ_{1}(*x*, *y*) ≈ λ_{2}(*x*, *y*), that is, when there are multiple directions in the neighborhood of (*x*, *y*). It is maximal when (*x*, *y*) belongs to a vessel and λ_{1}(*x*, *y*) ≫ λ_{2}(*x*, *y*) ≈ 0, that is, along linear portions of vessels. This is where directions can be trusted and where we want to compare them.
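The saliency weight ξ is a direct product of the vessel map and the eigenvalue gap; a one-line transcription of the definition above (array names are illustrative, and the eigenvalues are assumed precomputed from the vessel tensor):

```python
import numpy as np

def direction_saliency(v, lam1, lam2):
    """xi(x, y) = v(x, y) * (lam1 - lam2): directions are trusted only
    on vessel pixels with one dominant local orientation."""
    return v * (lam1 - lam2)

# Off-vessel pixels get zero weight; isotropic tensors (lam1 ~= lam2) too.
v = np.array([[0.0, 1.0], [1.0, 1.0]])
lam1 = np.array([[2.0, 2.0], [1.0, 3.0]])
lam2 = np.array([[0.0, 0.5], [1.0, 0.0]])
xi = direction_saliency(v, lam1, lam2)
# xi == [[0.0, 1.5], [0.0, 3.0]]
```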