Image-based Deep Learning for the time-dependent prediction of fresh concrete properties

Increasing the degree of digitisation and automation in the concrete production process can play a crucial role in reducing the CO$_2$ emissions that are associated with the production of concrete. In this paper, a method is presented that makes it possible to predict the properties of fresh concrete during the mixing process based on stereoscopic image sequences of the concrete's flow behaviour. A Convolutional Neural Network (CNN) is used for the prediction, which receives the images supported by information on the mix design as input. In addition, the network receives temporal information in the form of the time difference between the time at which the images are taken and the time at which the reference values of the concretes are measured. With this temporal information, the network implicitly learns the time-dependent behaviour of the concrete's properties. The network predicts the slump flow diameter, the yield stress and the plastic viscosity. The time-dependent prediction potentially opens up the pathway to determine the temporal development of the fresh concrete properties already during mixing. This is a considerable advantage for the concrete industry, because countermeasures can be taken in a timely manner if deviations from the target properties are predicted. It is shown that an approach based on depth and optical flow images, supported by information on the mix design, achieves the best results.


Introduction
Reducing CO$_2$ emissions poses a major challenge for the construction industry. Concrete production in particular plays a decisive role here. Concrete is one of the most widely used building materials in the world, and the production of its constituent cement is in itself responsible for approx. 6.7 % of global anthropogenic CO$_2$ emissions (IEA, 2022). Many approaches are therefore focusing on reducing the cement content by using substitute materials. As a consequence, there is an increasing trend for concrete to no longer consist of just three materials (cement, aggregate and water), as was originally the case, but of several additional materials in order to reduce the amount of cement needed. In turn, this leads to increasingly complex mix designs, which can make the concrete less robust (González-Taboada et al., 2018). Consequently, the control of concrete properties, especially the fresh concrete properties, becomes more difficult.
Existing quality assurance measurements are not well suited to overcome this new challenge. These measurements are usually carried out manually and only for batch samples, both of which increase the uncertainty associated with the measured properties. Since the properties during concrete placement are decisive for the quality of the resulting building component, the measurements are carried out directly before placement on the construction site, where there is no longer any possibility of significantly adjusting the fresh concrete properties. Deviations of the actual concrete quality from the target properties may lead to a rejection of the batch, resulting in inefficient production that wastes resources. In the opinion of the authors, a key change to increase the sustainability of concrete production is to digitize and automate the production process including quality control (Haist et al., 2022). Since the construction industry is one of the least digitized and automated industries in the world (Green, 2016), we see huge potential for improvement, which however requires sensor-based insight into the material properties.
The ReCyCONtrol research project addresses this lack of digitization and automation in the concrete sector. One part of the project focusses on the prediction of the fresh concrete properties. Since the moment of production, i.e. the mixing process, offers the most opportunities for adjusting the concrete properties in case of quality deviations, the prediction of these properties should be done during the mixing process. In addition, as the properties of the concrete may change further between the mixing process and its placement due to the cement's chemical hydration process, the behaviour of the properties after mixing must be modeled over time. We therefore formulate the goal of predicting the future properties of the concrete, e.g. for the time of placement, already during the production step. If deviations from the target properties at the time of placement are estimated in this way, countermeasures in the form of chemical additives can be used to change the properties so that the desired values are reached.
To reach this goal of predicting the fresh concrete properties at the time of placement during the mixing process, we use optical sensors coupled with a corresponding photogrammetric processing chain: We use a stereo camera system to observe the concrete's flow behaviour during the mixing process. The stereo images then serve as input for a deep learning method, which predicts the properties of the concrete as a function of time. To describe the properties, we use one parameter for the consistency and two for the flow behaviour. The slump flow diameter δ, which is measured by the slump test (EN 12350-5, 2019), represents the consistency of the concrete. The flow behaviour of the concrete, e.g. during the mixing process, is defined by its rheological parameters. In more detail, concrete, as a non-Newtonian fluid, can be described by the Bingham model with the rheological parameters yield stress $\tau_0$ and plastic viscosity µ (Yahia et al., 2016). The values of these parameters can be derived from the flow curve, which can be measured in batch experiments with a rheometer. The yield stress is a measure of how much stress has to be applied to set a liquid in motion. The plastic viscosity describes the viscosity of a liquid at high shear rates. The slump flow diameter depends on the rheological parameters. In particular, the yield stress and the slump flow diameter are assumed to be correlated (Wallevik, 2006).
In this paper, a novel method for the prediction of the slump flow diameter, the yield stress and the plastic viscosity is proposed. The prediction is carried out by a convolutional neural network (CNN), which receives stereo camera observations of the concrete's flow at a certain sample age as input. Furthermore, we add temporal information and information from the mix design to the input. The temporal information is crucial for predicting the temporal evolution of the fresh concrete properties. Using this information, the CNN implicitly learns to model the time-dependent behaviour of the properties. This makes it possible to predict the fresh concrete properties for arbitrary points in time, e.g. the time of placement.
The paper is structured as follows: We first give an overview of the current research in the field of digitizing and automating the concrete sector. Our methodology is then described in section 3. The data set and how it is generated are presented in section 4. Section 5 shows the experiments and contains a discussion of the results. Section 6 concludes the paper and gives an outlook for possible future work.

Related work
In recent years, the automation of concrete quality assurance has received increasing attention. In Song et al. (2020), image segmentation is used to determine properties of the hardened concrete. Coenen et al. (2021) used image segmentation to determine the particle size of the aggregates in the hardened concrete. Although these methods have the potential to automate the current quality assurance measurements, no countermeasures can be applied at this stage if deviations from the target properties are detected.
To ensure the quality of the fresh concrete, traditional quality assurance measurements like the slump test (EN 12350-5, 2019) and rheometer measurements are typically employed. However, these methods are labor intensive and are associated with relatively high uncertainties. In Tuan et al. (2021) a method is proposed to automate the slump test: Instead of measuring the diameter of the spread concrete manually, a stereo camera system records images of the spreading concrete and the diameter is determined using image processing. The authors argue that replacing the manual measurement with an imaging system improves the accuracy of the result and reduces the workload. Yoon et al. (2023) propose a method for analysing cement paste with a similar set-up. Instead of a stereo camera, a depth camera is used to record a point cloud of the cement paste after the slump test. The point cloud is used to extract the diameter, spread height and curvature. These parameters are used as input for a deep learning algorithm to predict yield stress, plastic viscosity, adsorption ratio of superplasticizer and bleeding. Schack et al. (2023a,b,c) take this work one step further: using images of the spread flow of the concrete, they derive not only the spread flow diameter but also information on the concrete composition. For this purpose, the surface roughness of the spread flow cake is analysed in order to derive, e.g., the content of coarse particles in the concrete. In Coenen et al. (2024), a method is presented in which a camera observes the channel flow of the fresh concrete at the outlet of a mixing truck. Spatio-temporal flow fields are generated from the recorded images, which contain information about the flow behaviour of the concrete. A CNN predicts the fresh concrete properties on the basis of the spatio-temporal flow fields. The disadvantage of these methods is that they are applied post-production, meaning that the concrete still has to be discharged and new concrete has to be produced if significant deviations from the target properties are detected.
To overcome this drawback, the properties of the fresh concrete have to be predicted before or during the mixing process. There are two main procedures to achieve this goal. One approach is to perform the prediction on the basis of the mix design information of the concrete. The concrete mix design contains the exact content (in kilograms) of the individual materials used to produce the concrete. The type and concentration of the materials have a major impact on the properties of the concrete. Chidiac and Mahmoodzadeh (2009) summarize the most common models to determine the plastic viscosity based on the mix design. It is shown that the results vary between different models. The most recent methods based on the mix design use machine learning, and in particular deep learning algorithms. Methods like least squares support vector machines (LSSVM) and particle swarm optimization (PSO) (Nguyen et al., 2020), extreme learning machines (Kina et al., 2021), random forests and XGBoost (Zhang et al., 2022; Hosseinzadeh et al., 2023) as well as multi-layer perceptrons (MLP) (Navarrete et al., 2023) are used for this purpose. In Nguyen et al. (2020) and Navarrete et al. (2023) the input information from the mix design is extended with temporal information, representing the time difference between the mixing process and the time at which the properties are to be determined, to take into account the change of properties over time. The mix design contains valuable information for the time-dependent change of the properties (e.g. the additive content). Although these methods achieve promising results, they omit essential information such as possible variations in the properties of the employed constituents, even though there has been progress in this field in recent years, e.g. Coenen et al. (2023); Lux et al. (2023).
The second main procedure is to predict the fresh concrete properties based on images of the fresh concrete acquired during the mixing process. Li and An (2014) showed that it is possible to estimate the slump flow and the V-funnel flow time from images recorded during the mixing process by using classical image analysis methods, namely frame difference and watershed segmentation. Ding and An (2018) show that deep learning methods, here a combination of a CNN and a long short-term memory network (LSTM) based on image sequences, are also applicable. Yang et al. (2021) and Guo et al. (2022) employ another combination of CNN and LSTM with image sequences to predict the slump and slump flow values and the plastic viscosity, respectively, while Gao and Yan (2023) use semantic segmentation in combination with a residual neural network on single images to predict the slump class. In Ponick et al. (2022), a stereo camera set-up is used to observe the mixing process of ultrasonic gel, an often employed surrogate for concrete. The stereo camera observations are used as input for a CNN. The results show that the 3D information derived from the stereo images is valuable for the prediction of the flow curves. Although these methods show promising results and are not prone to uncertainties in the mix design, they do not take into account the time dependency of the fresh concrete properties.
To the best of the authors' knowledge, there is no method to date that uses both images and information from the mix design to perform a time-dependent prediction of fresh concrete properties.

Overview
We aim to predict fresh concrete properties based on image observations.
Stereo cameras record synchronized RGB image pairs of the concrete's flow during the mixing process. Each image pair is used to generate an orthophoto O and a depth map in the form of a digital elevation model (DEM) D, which contains 3D information about the surface of the fresh concrete. The photogrammetric process to generate O and D is carried out using commercial software, namely Agisoft Metashape. To reduce the computational time during training, O is transformed into a panchromatic image. O and D are extended by an optical flow image OF, which contains the displacement of each pixel between the current and the subsequent orthophoto. This makes it possible to feed explicit motion information, representing the flow behaviour of the concrete, into the network. To generate the optical flow images OF, the optical flow $OF_{i,i+1}$ between $O_i$ and $O_{i+1}$ is computed for each $O_i$, where i represents the time step. For this purpose, the method presented in Farnebäck (2003) is used. The input images are stacked to obtain the input set [$O_i$, $D_i$, $OF_{i,i+1}$]. The input set is treated as an image with four channels: one channel each for $O_i$ and $D_i$, and two channels for the optical flow image $OF_{i,i+1}$ (one channel each for the displacements of the pixels in the x and y direction, respectively).
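The assembly of one four-channel input set can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' pipeline: the luminance weights used for the panchromatic conversion, the helper names and the tiny 2 × 2 rasters are assumptions, and the flow values are placeholders rather than the output of the Farnebäck method.

```python
# Hypothetical helpers; rasters are nested lists indexed as [row][col].

def to_panchromatic(rgb_image):
    """Collapse an RGB raster to one channel (assumed BT.601 luminance weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]

def stack_input_set(ortho_pan, dem, flow_x, flow_y):
    """Stack O_i, D_i and the two components of OF_{i,i+1} channel-first."""
    channels = [ortho_pan, dem, flow_x, flow_y]
    shape = (len(ortho_pan), len(ortho_pan[0]))
    for ch in channels:
        if (len(ch), len(ch[0])) != shape:
            raise ValueError("all channels must share the same raster size")
    return channels

# Placeholder data for one time step i (values are made up).
ortho_rgb = [[(255, 255, 255), (0, 0, 0)],
             [(255, 0, 0), (0, 0, 255)]]
dem = [[0.10, 0.12], [0.11, 0.09]]
flow_x = [[0.5, 0.0], [-0.5, 0.0]]
flow_y = [[0.0, 1.0], [0.0, -1.0]]

input_set = stack_input_set(to_panchromatic(ortho_rgb), dem, flow_x, flow_y)
print(len(input_set), len(input_set[0]), len(input_set[0][0]))
```

The channel-first layout is chosen here only because it matches the convention of common deep learning frameworks; the paper does not state the memory layout.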
To model the time-dependent behaviour of the fresh concrete properties, temporal information ∆t is introduced. In this context, ∆t represents the time difference between the age of the sample at which the image pair for generating $O_i$ and $D_i$ is acquired and the age for which the fresh concrete properties are to be predicted. Besides the temporal information ∆t, information from the mix design m is added as additional input. m contains information about the water-cement (mass) ratio, the grading curve of the aggregate particles, the paste content, the admixture content and the time difference between the start of the mixing process and the image acquisition. To take into consideration the influence of different mixing velocities (i.e. the speed of the mixing tools) and frame rates of the imaging sensors, both of these parameters are added to m. For numerical reasons, the inputs and the reference values are normalized. Consequently, the outputs of the CNN are the predicted normalized values for the slump flow diameter $\delta_{\Delta t}$ and the rheological parameters yield stress $\tau_{0,\Delta t}$ and plastic viscosity $\mu_{\Delta t}$ at the age of the sample defined by ∆t. These parameters are summarised in the target state vector C = [$\delta_{\Delta t}$, $\tau_{0,\Delta t}$, $\mu_{\Delta t}$].
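A small sketch may help to make the temporal input and the normalization concrete. It assumes that ∆t is computed as reference time minus image time (so that reference measurements taken before the images yield negative values), that times are given in minutes and that the population standard deviation is used; all function names and numbers are hypothetical.

```python
from statistics import mean, pstdev

def time_differences(t_image_min, t_slump_min, t_rheometer_min):
    """Two components of delta t in minutes; negative if the reference
    measurement was taken before the images."""
    return (t_slump_min - t_image_min, t_rheometer_min - t_image_min)

def fit_normalizer(training_values):
    """Mean and standard deviation estimated on the training set only."""
    return mean(training_values), pstdev(training_values)

def normalize(value, stats):
    m, s = stats
    return (value - m) / s

def denormalize(value, stats):
    m, s = stats
    return value * s + m

# Hypothetical sample ages in minutes after water addition.
dt_slump, dt_rheo = time_differences(t_image_min=25.0,
                                     t_slump_min=40.0,
                                     t_rheometer_min=20.0)

# Hypothetical training-set slump flow diameters in cm.
train_delta = [40.0, 50.0, 60.0]
stats = fit_normalizer(train_delta)
z = normalize(55.0, stats)
print(dt_slump, dt_rheo, round(z, 3), denormalize(z, stats))
```

Estimating the statistics on the training set alone avoids leaking information from the validation and test concretes into the inputs.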

Network architecture
We use a CNN for the prediction of the state vector C, consisting of seven convolutional layers which are followed by three fully connected layers (FC layers). The architecture of the CNN is based on the CNN presented in Ponick et al. (2022). As the problem is less complex than usual classification and segmentation tasks and we only have a relatively small amount of training data, we limit the number of unknowns by using comparatively few layers. The results in Ponick et al. (2022) support this approach. A high-level overview of the architecture is shown in Fig. 1. The convolutional layers each have a kernel size of 5 × 5 and a stride of 2, and are followed by batch normalisation and a Rectified Linear Unit (ReLU) activation function. The number of neurons in the FC layers decreases linearly from 660 down to the three output neurons in the output layer. Each FC layer has a leaky ReLU activation function with a slope of 0.2. No batch normalisation is used between the FC layers.
The input set [$O_i$, $D_i$, $OF_{i,i+1}$] is fed to the convolutional layers. The layers extract features to produce the flattened feature embedding z with a length of 640 elements. ∆t and m are added to z in a late-fusion manner. By concatenating z, ∆t and m, we obtain the feature vector f, which is passed to the FC layers as input. The FC layers map f to the time-dependent output parameters $\delta_{\Delta t}$, $\tau_{0,\Delta t}$ and $\mu_{\Delta t}$ of the target state vector C. This approach was chosen because the FC layers form an MLP, and MLPs have been shown to be suitable for a time-dependent prediction of the static yield stress of cement paste based on the mix design information, temporal information and information on properties of the raw materials (Navarrete et al., 2023).
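The late fusion and the shrinking FC layers can be illustrated with a short sketch. Several points are assumptions made only for this example: that 660 is the length of the fused vector f, that ∆t contributes the two time differences introduced in the reference-values section, that m therefore has 18 entries, and that "linearly" means equal steps between layer widths. The paper does not state these details explicitly.

```python
def late_fusion(z, delta_t, m):
    """Concatenate the image embedding with the temporal and mix design inputs."""
    return list(z) + list(delta_t) + list(m)

def linear_widths(n_in, n_out, n_layers):
    """Layer sizes that shrink in equal steps from n_in to n_out."""
    step = (n_in - n_out) / n_layers
    return [round(n_in - i * step) for i in range(n_layers + 1)]

z = [0.0] * 640          # flattened feature embedding (length from the text)
delta_t = [15.0, -5.0]   # two time differences (slump test, rheometer)
m = [0.0] * 18           # assumed length so that len(f) == 660

f = late_fusion(z, delta_t, m)
print(len(f), linear_widths(len(f), 3, 3))
```

Under these assumptions the three FC layers would map 660 to 441, 441 to 222 and 222 to the 3 outputs.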

Training
For the optimisation of the network weights ω, the Mean Squared Error (MSE) is used as loss function during training. The weights ω are iteratively adjusted during training in order to minimize the resulting loss. The loss is computed for a minibatch consisting of N samples, each associated with the state vector C containing the k = 1...K target parameters $y_k$, where K = 3 in this paper. To calculate the loss, the squared differences between the reference values $y_k$ and the predicted values $\hat{y}_k$ are determined and averaged over all parameters and samples in a batch, such that
$$\mathcal{L}_{MSE} = \frac{1}{N K} \sum_{n=1}^{N} \sum_{k=1}^{K} \left( y_{n,k} - \hat{y}_{n,k} \right)^2 ,$$
where n indexes the samples of the minibatch. Weight decay is used during training to reduce over-fitting by regularising the weights. This method adds a penalty for large weights, multiplied by a factor λ, which leads to an additional term in the final loss function. Since the reference values of the slump flow diameter, the yield stress and the plastic viscosity are normalized to the same range, the loss of each output does not have to be weighted to ensure an equal influence on the training.
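The loss computation described above can be sketched in a few lines. The MSE part follows the description directly; the form of the weight decay term (λ times the sum of squared weights) is an assumption, since the exact expression is not recoverable from the text, and the numbers are made up.

```python
def mse_loss(references, predictions):
    """Mean squared error averaged over all samples and target parameters."""
    n = len(references)
    k = len(references[0])
    total = sum((y - y_hat) ** 2
                for ref, pred in zip(references, predictions)
                for y, y_hat in zip(ref, pred))
    return total / (n * k)

def l2_penalty(weights, lam):
    """Assumed weight decay term: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

# Minibatch of N = 2 samples with K = 3 normalized targets each (hypothetical).
refs = [[0.5, -1.0, 0.2], [1.0, 0.0, -0.4]]
preds = [[0.4, -0.8, 0.2], [1.1, 0.3, -0.4]]
weights = [0.5, -2.0, 1.0]

loss = mse_loss(refs, preds) + l2_penalty(weights, lam=1e-3)
print(round(loss, 6))
```

Because all three targets are normalized to the same range, the sketch sums their squared errors without per-output weights, as in the text.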

Image acquisition
In order to train and test the proposed method, we generate an extensive data set: Similar to Ponick et al. (2022), we first design a surrogate mixing system, which consists of a channel that is filled with fresh concrete and a paddle-shaped mixing tool that moves through the fresh concrete along the channel. The paddle moves on a linear trajectory to simulate the mixing process in an industrial mixer. The fresh concrete is mixed shortly before it is filled into the channel. While the paddle is moving, the camera set-up (two Grasshopper3 USB RGB cameras with a focal length of 8 mm) records images of the paddle moving through the concrete. The acquired images have a size of 1920 x 1200 px. Fig. 2 shows the schematic set-up and the set-up during one experiment. In total, 45 different concretes are prepared during the experiments, 5 of which contain recycled aggregates. The mix designs of the concretes all differ due to variations of the water-cement ratio, paste content, grading curve, cement type, sand-lime powder and additive content. Per experiment, we acquire images in 14 so-called runs. In each run the paddle moves six times back and forth in the channel, while the cameras record 1300 images. Before each run, the fresh concrete is first briefly mixed manually and then the surface is smoothed. In runs 1-7, the frame rate of the cameras is set to 30 frames per second (fps), and the paddle moves with a velocity of 0.2 m/s. In runs 8-14, the frame rate is increased to 60 fps and the velocity of the paddle is set to 0.45 m/s.
In order to maximize the information about the flow behaviour, we only use images which show the concrete directly after the paddle has moved through the material. As a consequence, the paddle is visible in the images. To eliminate effects stemming from the paddle itself, we correct D after matching and only use heights below a certain threshold to generate O. For generating O and D it is necessary that both images are taken at exactly the same time. To verify this requirement, a panel with 20 LEDs is installed on the edge of the channel. The LED panel is visible in both images, and the LEDs are systematically switched on and off in millisecond intervals. The changing constellations over time yield a time stamp for each image, which is used to verify that both images of an image pair are synchronized.
The fresh concrete properties are assumed to remain constant during the short duration of one run (approx. 44 s at 30 fps and 22 s at 60 fps). Therefore, every image pair of a run is associated with the same time stamp, namely that of the central image pair of that run. Consequently, ∆t represents the time difference between the associated time stamp of the image pair and the point in time at which the fresh concrete properties are to be predicted.
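The run durations and the shared time stamp can be checked with simple arithmetic. The sketch below uses the 1300 images per run and the two frame rates from the text; the start time of the run and the time of the reference measurement are hypothetical values.

```python
def run_duration_s(n_images, fps):
    """Duration of one run in seconds."""
    return n_images / fps

def central_timestamp(timestamps):
    """Time stamp of the central image pair of a run."""
    return timestamps[len(timestamps) // 2]

for fps in (30, 60):
    print(fps, round(run_duration_s(1300, fps), 1))

# Hypothetical run starting 600 s after water addition, recorded at 30 fps.
stamps = [600.0 + i / 30 for i in range(1300)]
t_run = central_timestamp(stamps)

t_reference = 1500.0  # hypothetical time of a slump test in seconds
delta_t_min = (t_reference - t_run) / 60
print(round(t_run, 2), round(delta_t_min, 2))
```

The computed durations of roughly 43 s and 22 s are in line with the approximate values quoted above, so the error introduced by assigning one time stamp per run is at most about half a run length.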

Reference values
The reference values for training and testing are measured in parallel to the acquisition of the images. The slump flow diameter δ is measured with the slump test (EN 12350-5, 2019), and the yield stress $\tau_0$ and plastic viscosity µ are measured with a rheometer. For this measurement, an eBT-V rheometer from Schleibinger is used. The rheological parameters are derived from the resulting flow curves according to the method presented in Feys et al. (2013). To account for the change of the fresh concrete properties over time induced by the chemical hydration of the cement contained in the concretes, the slump test and the rheometer measurement are repeated in intervals of about 30 minutes. The first slump test is always carried out directly after the end of the mixing process, approx. 9 minutes after water addition. Since the slump test and the rheometer measurement are independent of each other, the two measurements have different time stamps. Consequently, ∆t consists of two time differences: The first is the time difference between image acquisition and the time at which the slump test is carried out, the second is the time difference between image acquisition and the time at which the rheometer measurement is carried out. The wide range of the reference values and time differences of the data set is shown in Tab. 1. As some reference measurements are carried out before images are taken, there are also negative values for the time differences.

Training configuration
The CNN predicts the three parameters $\delta_{\Delta t}$, $\tau_{0,\Delta t}$ and $\mu_{\Delta t}$ in a multi-task learning manner. Since two independent measurements (slump test and rheometer measurement) are used to generate the reference values, the inputs O, D and OF have to be assigned to one reference combination, consisting of $\delta_{\Delta t}$ from one slump test and of $\tau_{0,\Delta t}$ and $\mu_{\Delta t}$ from one rheometer measurement. For each concrete, all possible reference combinations are generated, and each input set ($O_i$, $D_i$ and $OF_{i,i+1}$) is assigned to one reference combination. m always remains the same for the input sets of a concrete, except for the information about paddle velocity and recording frequency of the cameras. Subsequently, the two time differences of ∆t are calculated for each input set based on the assigned reference combination. Sometimes $O_{i+1}$ is missing because the paddle is not visible for several time steps, the end of the run is reached or a frame drop occurs. An input set is only generated if both $O_i$ and $O_{i+1}$ are present, as otherwise $OF_{i,i+1}$ cannot be calculated. The first 20 input sets of each run are not used, as the paddle has not yet moved far enough through the concrete to significantly change its surface. In total, the data set consists of 313,615 input sets.
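The generation of input sets and reference combinations can be sketched as follows. The pairing of every slump test with every rheometer measurement follows the description; the round-robin assignment of input sets to combinations is an assumption, because the text does not state how each input set is matched to a combination. All names and values are hypothetical.

```python
from itertools import product

def reference_combinations(slump_times, rheometer_times):
    """All pairings of one slump test with one rheometer measurement."""
    return list(product(slump_times, rheometer_times))

def valid_input_indices(available, skip_first=20):
    """Indices i for which both O_i and O_{i+1} exist, without the first
    `skip_first` input sets of the run."""
    usable = [i for i in range(len(available) - 1)
              if available[i] and available[i + 1]]
    return usable[skip_first:]

def assign(indices, combos):
    """Assumed round-robin assignment of each input set to one combination."""
    return {i: combos[n % len(combos)] for n, i in enumerate(indices)}

# Hypothetical measurement times in minutes after water addition.
combos = reference_combinations([10.0, 40.0], [15.0, 45.0])

available = [True] * 30
available[24] = False  # simulated frame drop: O_24 is missing

indices = valid_input_indices(available)
assignment = assign(indices, combos)
print(len(combos), indices)
```

In this toy run, the missing orthophoto removes two input sets (i = 23 and i = 24), because each of them needs O_24 either as the current or as the subsequent image.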
Training is performed with a five-fold cross-validation. The 45 concretes are divided into 5 sets with 9 concretes (i.e. concrete compositions) each. For this purpose, the concretes are first sorted by the first slump flow diameter $\delta_1$, which is determined for each concrete directly after the mixing process. Then, the sorted concretes are divided into three groups, containing the data of the 15 concretes with the largest, the intermediate and the smallest $\delta_1$, respectively. From each group, three concretes are randomly assigned to each of the 5 sets to guarantee a balanced distribution. As there are a total of 5 concretes with recycled aggregate, we ensure that each set contains exactly one of these concretes. In each cross-validation step, one set is used as the test set. The validation set, containing 5 concretes, is randomly formed from the concretes of the remaining sets, again by taking $\delta_1$ into account and with the condition that it must contain exactly one concrete with recycled aggregate. The remaining 31 concretes form the training set. For two concretes, the yield stress and the plastic viscosity are not taken into account, as the corresponding measured reference values are not plausible. For training, only the loss for the slump flow diameter is used for these concretes. For the evaluation, which is explained in the following, the predictions of the yield stress and the plastic viscosity of these concretes are not considered.
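One way to build such a split is sketched below. It is not the authors' code: the rejection sampling used to satisfy the recycled-aggregate constraint, the function name and the synthetic concretes are assumptions, and the construction of the validation set is omitted for brevity.

```python
import random

def stratified_folds(concretes, n_folds=5, n_groups=3, seed=0, max_tries=10000):
    """concretes: list of dicts with keys 'id', 'delta1' and 'recycled'."""
    rng = random.Random(seed)
    ordered = sorted(concretes, key=lambda c: c["delta1"])
    size = len(ordered) // n_groups
    groups = [ordered[g * size:(g + 1) * size] for g in range(n_groups)]
    per_fold = size // n_folds
    for _ in range(max_tries):
        folds = [[] for _ in range(n_folds)]
        for group in groups:
            shuffled = group[:]
            rng.shuffle(shuffled)
            for f in range(n_folds):
                folds[f].extend(shuffled[f * per_fold:(f + 1) * per_fold])
        # Accept the draw only if every fold holds exactly one recycled concrete.
        if all(sum(c["recycled"] for c in fold) == 1 for fold in folds):
            return folds
    raise RuntimeError("no split satisfying the recycled-aggregate constraint found")

# 45 synthetic concretes; ids of the recycled ones are made up.
concretes = [{"id": i, "delta1": 30.0 + 0.5 * i,
              "recycled": i in {3, 12, 20, 31, 40}}
             for i in range(45)]

folds = stratified_folds(concretes)
print([sorted(c["id"] for c in fold) for fold in folds])
```

Splitting by concrete rather than by image keeps all input sets of one mix design in the same set, which is what makes the test concretes genuinely unseen.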
To train the network, Stochastic Gradient Descent (SGD) with a Nesterov momentum of β = 0.99 based on Sutskever et al. (2013) is used. The learning rate is set to a value of $5 \cdot 10^{-3}$, which showed the best results in preliminary experiments. The weight decay parameter is set to λ = $1 \cdot 10^{-3}$. The network is trained from scratch and the weights are initialised using the method presented in He et al. (2015), whereby a uniform distribution is used. For training, data augmentation is used. The brightness and contrast of O are each changed by a factor that is randomly drawn for each O in each iteration from a uniform distribution in an interval of 0.85 to 1.15 for the brightness and in an interval of 0.75 to 1.25 for the contrast. For the data augmentation of D, an offset is determined randomly. For this purpose, a factor is randomly drawn from a uniform distribution in an interval from -0.07 to 0.07. The offset is then determined by multiplying the factor by the standard deviation of D in the training data set.
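The augmentation of O and D can be sketched as follows. The intervals are taken from the text; how exactly the brightness and contrast factors are applied is not specified, so the formulation below (contrast scales deviations from the image mean, brightness scales the result) is an assumption, as are the function names and sample values.

```python
import random

def augment_orthophoto(pixels, rng):
    """Change brightness and contrast of O with factors drawn per call."""
    brightness = rng.uniform(0.85, 1.15)
    contrast = rng.uniform(0.75, 1.25)
    mean_val = sum(pixels) / len(pixels)
    # Assumed formulation: contrast scales deviations from the mean,
    # brightness scales the result.
    return [brightness * (mean_val + contrast * (p - mean_val)) for p in pixels]

def augment_depth(values, std_train, rng):
    """Shift D by a random offset proportional to the training-set std of D."""
    offset = rng.uniform(-0.07, 0.07) * std_train
    return [v + offset for v in values]

rng = random.Random(42)
ortho = [0.2, 0.5, 0.8]     # flattened, hypothetical grey values
depth = [0.10, 0.12, 0.11]  # flattened, hypothetical heights

print(augment_orthophoto(ortho, rng))
print(augment_depth(depth, std_train=0.05, rng=rng))
```

A constant offset leaves the relief of the surface untouched, so the augmentation of D only makes the network less sensitive to the absolute height level.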
The procedure for determining the offsets for the two channels in OF is analogous to that of D.

To evaluate the results, the mean absolute error and the mean relative error $\epsilon_{rel}$ are computed for each output for a set of A concretes, each with J input sets. The errors are first averaged over the input sets of each concrete and then over the concretes, which means that every concrete has the same weight in the evaluation, even if it has fewer input sets. After each training epoch, the network is evaluated on the validation set. To determine the best weights, the three $\epsilon_{rel}$ values (one for each output) of the validation set are averaged. The weights with the lowest averaged $\epsilon_{rel}$ value are chosen for testing. Since no significant improvements of the loss are observed after only a few epochs, the number of training epochs is restricted to 5.
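The equal weighting of the concretes in the evaluation can be illustrated with a short sketch. It assumes that the relative error is the absolute error divided by the magnitude of the reference value; the function names and all values are hypothetical.

```python
def per_concrete_errors(references, predictions):
    """Mean absolute and mean relative error over the input sets of one concrete."""
    abs_err = [abs(y - y_hat) for y, y_hat in zip(references, predictions)]
    rel_err = [e / abs(y) for e, y in zip(abs_err, references)]
    return sum(abs_err) / len(abs_err), sum(rel_err) / len(rel_err)

def evaluate(concretes):
    """Average the per-concrete errors so that every concrete counts equally."""
    results = [per_concrete_errors(refs, preds) for refs, preds in concretes]
    eps_abs = sum(r[0] for r in results) / len(results)
    eps_rel = sum(r[1] for r in results) / len(results)
    return eps_abs, eps_rel

# Hypothetical slump flow diameters in cm: (references, predictions) per concrete.
data = [
    ([50.0, 50.0, 50.0, 50.0], [52.0, 48.0, 51.0, 49.0]),  # four input sets
    ([40.0, 40.0], [44.0, 44.0]),                          # two input sets
]

eps_abs, eps_rel = evaluate(data)
print(round(eps_abs, 3), round(eps_rel, 4))
```

Pooling all six input sets instead would give the first concrete twice the influence of the second, which is what the two-stage averaging avoids.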
Finally, for each input and reference value the data in the training, validation and test sets are normalized to a mean of 0 and a standard deviation of 1. This is described by
$$nd_c = \frac{d_c - m_{t,c}}{std_{t,c}} ,$$
where $d_c$ represents the data and $nd_c$ the normalized data, each of the data category c (e.g. the input O or the reference value δ). $m_{t,c}$ and $std_{t,c}$ represent the mean and standard deviation of the corresponding training set for the data category c. The values in m from the mix design are not normalized, as the values are always between 0 and 2. Note that for the determination of the evaluation metrics the reference values and the corresponding outputs are converted back to the original range of the reference values.

It can be seen that the variations of the inputs have only a very limited influence on the accuracy of the predictions of $\delta_{\Delta t}$. The major differences occur in the accuracy of the predictions of $\tau_{0,\Delta t}$ and $\mu_{\Delta t}$. In particular, the input m seems to be beneficial for the prediction of these parameters. This can be seen by comparing the results of the combinations O+D+m and O+D (note that in the input combination without m only the information about the materials used is omitted). Comparing the results of the combinations O+D+m and O+m shows that using the input D also increases the accuracy of the predictions. When evaluating the results, it should be noted that the reference measurements are only carried out for batch samples (a small part of the concrete) and that the slump flow diameter is determined manually. These circumstances are also reflected in the average precision of the slump test, which is 2.46 cm (EN 12350-5, 2019). Furthermore, as all concretes have different mix designs, there is no concrete in the test set that has the same mix design as a concrete in the training or validation set. The results are therefore already within an acceptable range.
In general, the prediction of $\delta_{\Delta t}$ has a much higher accuracy than the predictions for $\tau_{0,\Delta t}$ and $\mu_{\Delta t}$. One reason for this may be the much wider range of values and the significantly higher ratio of standard deviation to mean value of $\tau_{0,\Delta t}$ and $\mu_{\Delta t}$ (see Tab. 1).

Influence of averaging predictions:
In order to investigate whether the deviations in the predictions are random rather than systematic, predictions from different input sets for the same reference value are averaged before the evaluation. Tab. 3 shows, for the example of the input combination D+m+OF, how the accuracy of the predictions changes when they are averaged in this way. The second column of the table shows the results where the predictions of the same reference combinations within a run are averaged. On average, these are approx. 40 predictions. The third column shows the results in which all predictions with the same reference combination are averaged over all 14 runs (approx. 540 predictions). The fourth column shows the results where the predictions are averaged for the same slump flow diameter value and for the same yield stress and plastic viscosity values, respectively, which is on average about 1900 averaged predictions in each case. It can be seen that the accuracy increases the more predictions are averaged. This indicates that a part of the deviations is indeed random.

[Tab. 3, only the last row is recoverable, row label truncated to "s]": 11.52, 11.17, 10.82, 10.81]

Time-dependent prediction model for fresh concrete properties:
Since the network receives the time difference between the moment in time at which the images are recorded and the time at which the properties of the fresh concrete are to be predicted, the network implicitly learns how the properties of the concrete change over time. This can be used to predict the properties not only at a certain point in time, but also continuously over the entire fresh concrete age. Note that at this stage of the research only concretes which exhibit a more or less pronounced decrease in consistency over time were investigated. The model is thus only trained to identify and quantify this specific behaviour, which can be traced back to the type of chemical admixtures used in the project. Admixtures that yield a steady or even increasing flow over time will be studied in future work and will certainly require an adaptation of the model or at least of its training.
To generate the predictions, one of the respective runs with all the input sets it contains is used for each concrete to predict the slump flow diameter at each minute in the time interval. In this example, the runs of concrete 1, 2 and 3 consist of 549, 466 and 525 input sets, which means that each estimate is the result of averaging 549, 466 and 525 predictions, respectively. D+m+OF is used as input. The point at which ∆t is zero is the time at which the images of the runs are recorded (note that, as mentioned before, all images from a run are assigned the time stamp of the central image pair). Besides the continuous predictions, the respective reference values are shown for each concrete. The precision of the slump test is shown as an error bar. Considering the continuous prediction of the parameters, it can be seen that the network has learnt that the slump flow diameter decreases particularly sharply in the first minutes after mixing and then decreases more slowly. For the yield stress and plastic viscosity, it has generally learnt that both parameters increase over time, but the nature of the increase can vary from concrete to concrete, and the prediction is not as robust as for the slump flow diameter. The time-dependent behaviour that the network has learned for the slump flow diameter is plausible and shows that the network is able to predict this parameter over a longer period of time, even though the prediction is not yet as robust for every concrete as in the examples shown.
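The procedure of sweeping ∆t while averaging over all input sets of a run can be sketched as follows. The trained CNN is replaced by a toy function whose shape (a slump flow that drops quickly at first and then more slowly) and per-image deviations are invented purely for illustration; none of the numbers come from the paper.

```python
def predict_over_time(model, input_sets, m, minutes):
    """For each delta t, average the predictions of all input sets of one run."""
    curve = []
    for dt in minutes:
        preds = [model(x, dt, m) for x in input_sets]
        curve.append((dt, sum(preds) / len(preds)))
    return curve

def toy_model(x, dt, m):
    """Stand-in for the trained CNN: a decaying slump flow plus a per-image
    deviation. Purely illustrative, not a fitted model."""
    return 60.0 - 10.0 * dt / (dt + 20.0) + x

# Hypothetical per-image deviations in cm that cancel out on average.
input_sets = [-1.5, 0.5, 1.0, -0.5, 0.5]

curve = predict_over_time(toy_model, input_sets, m=None, minutes=range(0, 91, 30))
for dt, delta in curve:
    print(dt, round(delta, 2))
```

Because the deviations of the individual input sets are averaged out at every ∆t, the resulting curve is smoother than any single-image prediction, which mirrors the effect of averaging reported above.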

Conclusion and outlook
The results in this paper show that it is possible to predict the fresh concrete properties with acceptable accuracy based on images of the fresh concrete flow behaviour, supported by the mix design information. In particular, the slump flow diameter can be predicted with relatively high accuracy. Furthermore, it was shown that the time-dependent behaviour of the fresh concrete properties can be learned by the network. This makes it possible to predict the properties of the fresh concrete at specific points in time (e.g. at the time of placement) or continuously over time, already during the mixing process. It has also been shown that the use of images together with information from the mix design improves the results compared to the use of images alone. However, the information from the mix design in our dataset probably contains only relatively small uncertainties, which is why it has a relatively strong positive influence on the result. Such a low level of uncertainty cannot always be assumed. Finally, the results have shown that a significant improvement is possible if predictions of the same values are averaged beforehand.
In future work, the counter-intuitive effect of orthophotos on the results will be further investigated. As the results have shown that optical flow images can have a positive impact on the outcome, future work will focus on utilising more of the information contained in image sequences (e.g. with transformer-based models). In addition, experiments will be conducted with an industrial mixer to test the methodology under realistic conditions. For an industrial application, it is also necessary to include the environmental parameters (temperature, humidity, etc.) during the transportation of the concrete to the construction site in the prediction, in order to be able to reliably predict the properties up to the time of placement.

Figure 1. CNN architecture of the developed method.

Figure 2. Experimental set-up for generating the data set. (a) Schematic overview of the experimental set-up. (b) Set-up during the experiment.
In Fig. 3, examples are shown of what such a continuous prediction of the slump flow diameter over a time interval looks like.

Figure 3. Three examples of the prediction of the time-dependent behaviour of the slump flow diameter.

Table 1. Range of reference values and time differences.

In the experiments, the influence of different inputs on the performance of the CNN is investigated. Different input combinations are assembled to train the CNN, and the resulting performance metrics are compared. In particular, the influence of O, D, OF and m is investigated. Moreover, it is shown how the accuracy of the predictions changes if the predictions from different input sets for the same reference values are averaged beforehand. Finally, examples of the time-dependent prediction of the behaviour of the slump flow diameter are shown.

Table 2. Mean relative and absolute errors for different input combinations, where O represents the orthophoto, D the depth elevation map, OF the optical flow image and m the mix design information (the values in bold show the best performance in the respective category).

To determine the influence of different input combinations, the mean relative error $\epsilon_{rel}$ and the mean absolute error $\epsilon_{abs}$ of the test sets are used. For each input combination, the cross validation described above is carried out and repeated twice. Afterwards, the overall mean relative and absolute errors are computed by taking the mean of the mean relative and absolute errors of all test sets, including the test sets from the repeated cross validations. The results are shown in Tab. 2. In total, six different combinations are investigated.
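The aggregation of the error metrics over all test sets can be illustrated with a short sketch. It assumes the common definitions, i.e. the absolute error is |prediction - reference| and the relative error is that value divided by the reference and expressed in percent; the exact definitions are not restated in this section. The function names and the numbers are made up for illustration only.

```python
import numpy as np

def mean_errors(predictions, references):
    """Return (mean relative error in %, mean absolute error) for one test set."""
    predictions = np.asarray(predictions, dtype=float)
    references = np.asarray(references, dtype=float)
    abs_err = np.abs(predictions - references)
    rel_err = abs_err / np.abs(references)  # assumed definition of the relative error
    return 100.0 * rel_err.mean(), abs_err.mean()

def overall_errors(test_sets):
    """Average the per-test-set errors over all folds and repetitions.

    test_sets: iterable of (predictions, references) pairs, one per test set.
    """
    per_set = np.array([mean_errors(p, r) for p, r in test_sets])
    eps_rel, eps_abs = per_set.mean(axis=0)
    return eps_rel, eps_abs

# Made-up slump flow diameters in mm for two test sets (not data from the paper).
folds = [
    ([510.0, 590.0], [500.0, 600.0]),
    ([420.0, 660.0], [400.0, 650.0]),
]
eps_rel, eps_abs = overall_errors(folds)
```

Note that taking the mean of per-test-set means weights every test set equally, regardless of how many samples it contains; this matches the description above, but differs from pooling all samples when the test sets have different sizes.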
The input OF has a positive influence on the predictions of $\tau_{0,\Delta t}$, as can be seen by comparing the results of O+D+m and O+D+m+OF, or of D+m and D+m+OF. However, the results with and without O are counter-intuitive. The predictions from input combinations with O are less accurate than the predictions from input combinations without O. By comparing the results of O+D+m and D+m, or of O+D+m+OF and D+m+OF, it can be seen that the predictions for $\tau_{0,\Delta t}$ and especially the predictions for $\mu_{\Delta t}$ become worse when O is added as input. It is noticeable that the training runs with O as input have a significantly smaller training loss than the training runs without O as input. This indicates that the network is overfitting during training when O is used as input. Consequently, the best overall results are achieved if D, m and OF are used as input. To gain a deeper understanding of this behaviour, further investigations will be carried out in future work.

Table 3. Influence of averaging multiple predictions on the mean relative and absolute errors, using the results of the input combination D+m+OF as an example (the values in bold show the best performance in the respective category; note that the first column is identical to the results of D+m+OF in Tab. 2).
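Why averaging reduces the error can be illustrated with a small synthetic experiment. The sketch below does not use data from the paper: the reference values, the constant bias of 5 mm and the random noise with a standard deviation of 20 mm are all assumptions chosen for illustration, and the function names are hypothetical. Averaging all predictions that share a reference value suppresses the random part of the deviation, while the systematic part remains.

```python
from collections import defaultdict
import numpy as np

def single_abs_error(predictions, references):
    """Mean absolute error of the individual predictions."""
    predictions = np.asarray(predictions, dtype=float)
    references = np.asarray(references, dtype=float)
    return float(np.mean(np.abs(predictions - references)))

def averaged_abs_error(predictions, references):
    """Average all predictions sharing a reference value, then take the mean absolute error."""
    groups = defaultdict(list)
    for pred, ref in zip(predictions, references):
        groups[float(ref)].append(float(pred))
    errors = [abs(np.mean(preds) - ref) for ref, preds in groups.items()]
    return float(np.mean(errors))

# Synthetic data (assumption): two reference values with 1000 predictions each,
# a systematic bias of 5 mm and random noise with a standard deviation of 20 mm.
rng = np.random.default_rng(0)
references = np.repeat([500.0, 600.0], 1000)
predictions = references + 5.0 + rng.normal(0.0, 20.0, size=references.size)

err_single = single_abs_error(predictions, references)
err_averaged = averaged_abs_error(predictions, references)
```

In this synthetic setting, `err_averaged` ends up close to the assumed bias, whereas `err_single` is dominated by the noise, which mirrors the interpretation that part of the deviations is random.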