Data Mining as a Method for Comparison of Traffic Accidents in Şişli District of Istanbul

Studies aimed at reducing traffic accidents are of great importance, especially for metropolitan cities, and Istanbul is undoubtedly one of these cities. In this study, a perspective on reducing traffic accidents is offered by analyzing 3833 fatal and injury traffic accidents that occurred in the Şişli district of Istanbul between 2010 and 2017 with Data Mining (DM), Machine Learning (ML) and Geographic Information Systems (GIS) methods, as well as traditional methods. The aims are to determine visually the streets where traffic accidents are concentrated, to examine whether the accidents show anomalies according to the day of the week, to examine how these differences vary across the regions where the accidents occur, and to develop a model. For this purpose, Kernel Density, decision trees, artificial neural networks, logistic regression and Naive Bayes methods were used. The results show that some days differ from other days in terms of traffic accidents, depending on accident density, and that the performance of the modelling techniques varies by region. This study revealed that the 'day of the week effect' can also be applied to traffic accidents.

Regarding big data, the use of new data technologies such as Data Mining (DM), machine learning, cloud computing and the internet of things contributes significantly to solving transportation problems and reducing traffic accidents in the cities of the future (Khokale & Ghate, 2017). It is very difficult to benefit from big data using traditional methods and technologies. For this reason, attempts have been made to develop various methods to analyse and interpret big data (Lin, Wang, & Sadek, 2014). DM, which is one of these methods, has an important place in obtaining meaningful information from databases where a very large amount of information is stored, and supporting it with developing technology is very important (Bayrak & Kirci, 2019). A review of previous studies shows that no study has sought to determine the traffic accident points in Şişli district and to reduce accidents in the regions where they are concentrated. Accident analysis studies are mostly based on statistical data. Since the analyses in this study are based on big data, the accident analysis can be called "Accident Analytics". Another reason for this name is that traffic accidents cannot be determined in advance in terms of place and time and cannot be interpreted well enough (Ersen, Büyüklü, & Taşabat, 2021). Accident analytics, which enables traffic accident analysis based on concrete traffic accident data and traffic information, helps to understand accidents in detail, to determine what can be done to prevent future traffic accidents, and to minimize traffic accidents (Ersen, Büyüklü, & Taşabat, 2021). Traffic Accident Analytics aims to create safe road and vehicle usage opportunities by establishing smart systems using the latest technologies based on the most appropriate scientific methods.
While doing this, it discovers meaningful patterns in the data by using the available data structures, optimization, simulation methods, data analytics and data mining (Ersen, Büyüklü, & Taşabat, 2021). In these analyses, methods such as the accident frequency (number) method and the accident recurrence rate method are used to determine accident points. In addition, this study aimed to visualize, with the Kernel Density Estimation method, the regions identified by the accident analysis as those where accidents are concentrated (Thakali, Kwon, & Fu, 2015; Mohaymany, Shahri, & Mirbagheri, 2013). Identifying the causes of similar accidents at the same locations is very important for ensuring the safety of people in traffic and reducing the number of accidents (Gündoğdu, 2010; Saplıoğlu & Karaşahin, 2006). In this process, the locations of the accidents were visualized with GIS, making the information easier to understand (Erdogan, Yılmaz, Baybura, & Gullu, 2008; Dereli & Erdogan, 2017; Le, Liu, & Lin, 2019). In order to develop effective road safety measures that can be used to reduce traffic accidents, it is necessary to identify the regions (hot spots) where accidents are concentrated (Xie & Yan, 2008). The Kernel Density Estimation method is the most common method used in spatial analysis studies in GIS and is known to give successful results in determining hot spots (Ersen, Büyüklü, & Taşabat, 2021). The method determines accident intensity by weighting the accidents within a defined area of influence: it expresses the density of the points that fall inside a circle of a defined radius, with the contribution of each point changing as one moves away from it (Xie & Yan, 2013; Bil, Andrasik, & Janoska, 2013). For this purpose, the streets where traffic accidents are concentrated in Şişli district were first determined with Exploratory Data Analysis (EDA) tools.
The distribution of the accidents across the intersections and streets of the region was analysed with the SAS Enterprise Guide software. The Kernel Density Estimation method was used to visualize the accident densities at these intersections. Using the ArcGIS 10.7 software, the information was visualized with GIS, which made it easier to see where the accidents were concentrated. Then, the results of the day-based significance analysis of the accidents were examined for the day of the week effect, and the differences were interpreted for the streets where accident density was determined. Finally, the data set was divided into 3 parts with the SAS Enterprise Miner software for training, validation and testing purposes, and appropriate models (Decision Trees, Artificial Neural Networks, Logistic Regression and Naive Bayes) were selected. The proportions of the training, validation and test sets were set to 60%, 30% and 10%, respectively. The success of the modelling techniques was assessed by the accuracy, error rate and ROC values, and it was observed that the performance of the models changed on the streets with high accident density.
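The weighting idea behind Kernel Density Estimation can be illustrated with a short sketch. It is not the ArcGIS implementation used in the study: the quartic kernel, the function name `quartic_kernel_density`, the coordinates and the 100-unit radius are all assumptions chosen for illustration.

```python
import numpy as np

def quartic_kernel_density(points, grid, radius):
    """Kernel density at each grid location from accident points.

    points : (n, 2) accident coordinates in a projected system (hypothetical)
    grid   : (m, 2) locations where the density is evaluated
    radius : search radius (bandwidth), same units as the coordinates
    """
    points = np.asarray(points, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Distance from every grid location to every accident point.
    d = np.linalg.norm(grid[:, None, :] - points[None, :, :], axis=2)
    u = d / radius
    # Quartic kernel: largest weight at distance 0, zero weight at the radius.
    w = np.where(u < 1.0, (3.0 / np.pi) * (1.0 - u**2) ** 2, 0.0)
    # Sum of weights, scaled to accidents per unit area.
    return w.sum(axis=1) / radius**2

# Hypothetical accident locations and two evaluation locations.
points = [[0.0, 0.0], [10.0, 0.0], [500.0, 500.0]]
grid = [[0.0, 0.0], [1000.0, 1000.0]]
density = quartic_kernel_density(points, grid, radius=100.0)
```

The first grid location lies close to two accidents and receives a positive density, while the second has no accident within the radius and receives zero. A larger radius produces a smoother surface, which is why the choice of radius matters when hot spots are compared.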

Materials and Methods
In this study, the effect of the day of the week on accidents was investigated after the statistical information about traffic accident density had been analysed. The day of the week approach, which is frequently used in studies on financial markets to explain the price behaviour of stocks and is described as an anomaly, has been examined here for traffic accidents (Orhan, Emikönel, & Emikönel, 2021; Aharon & Qadan, 2019). For the 3833 fatal and injury traffic accidents that occurred in the whole Şişli district between 2010 and 2017, and for the accidents on the streets identified as high-density in the Kernel Density maps, the differences between days in terms of the day of the week effect were evaluated by region. Seeing that some days differ from other days in traffic accidents is important for the responsible units and for researchers, so that they can make a more effective assessment (Yılmaz & Akkaya, 2020; Cengiz, Bilen, Büyüklü, & Damgacı, 2017). Daily traffic accident counts were used to investigate the day of the week effect, both for the 3833 fatal and injury traffic accidents that occurred in Şişli between 01.01.2010 and 31.12.2017 and for the accidents on the streets where accident density was determined.
The day of the week anomaly in traffic accidents was investigated through the linear regression equation (1) using dummy variables (Evci, 2020; Ersen, Büyüklü, & Taşabat, 2021). In the models established for traffic accidents, the dummy variable takes the value 1 for observations corresponding to that day and 0 for other observations. In this context, the null hypothesis is that the average number of traffic accidents is equal across days, and the alternative hypothesis is that the averages differ, that is, that there is a day of the week effect (Gujarati & Porter, 2009). No additional independent variables were added to the linear regression equation established for the day of the week. This is because the study aimed to determine whether some days are statistically different from other days in terms of traffic accidents using only the ordinary least squares (OLS) t-test and ANOVA (Gujarati & Porter, 2009). Dummy variables can be used as easily as quantitative variables in a regression model. Moreover, all explanatory variables in a regression model can be dummy variables; such models are called ANOVA models. Models in which qualitative and quantitative variables coexist are called ANCOVA models. Since no quantitative explanatory variables were used in the established models, they were accepted as equivalent to the ANOVA model. Only fatal and injury accidents were studied, because data on material damage accidents and much other information could not be obtained. If this information had been available, the model could have been expressed as an ANCOVA model, since an additional quantitative variable would have been added (Gujarati & Porter, 2009; Ersen, Büyüklü, & Taşabat, 2021).
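Equation (1) is not reproduced in this text. A day of the week regression of the kind described is commonly written as below. The notation is an assumption: $Y_t$ is the number of accidents on day $t$, each $D$ is a dummy variable equal to 1 on the named day and 0 otherwise, and Monday is taken as the base day, in line with the results reported later in the paper.

```latex
Y_t = \beta_0 + \beta_1 D_{\mathrm{Tue},t} + \beta_2 D_{\mathrm{Wed},t} + \beta_3 D_{\mathrm{Thu},t}
      + \beta_4 D_{\mathrm{Fri},t} + \beta_5 D_{\mathrm{Sat},t} + \beta_6 D_{\mathrm{Sun},t} + \varepsilon_t,
\qquad
H_0 : \beta_1 = \beta_2 = \dots = \beta_6 = 0
```

Under this form, $\beta_0$ is the mean number of accidents on the base day and each remaining coefficient is the difference between that day's mean and the base day's mean, which is why the model is equivalent to a one-way ANOVA.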
Finally, decision trees, artificial neural networks, logistic regression and Naive Bayes models were used as modelling techniques, and the success of these methods in classifying traffic accident results as fatal or injury was compared for the whole Şişli district and for the streets where accident density was determined (Yavuz, Ergül, & Aşık, 2021; Özden & Acı, 2018; Singh & Kaur, 2016; Chong, Abraham, & Paprzycki, 2005). The aim is thus to determine the most appropriate classification method by region, with a view to reducing future traffic accidents, after the fatal and injury accident points have been determined with the Kernel Density method.

Decision Trees
Decision trees are one of the most preferred DM approaches for classification and prediction problems. They are simpler than other classification methods because they can be expressed visually and are easily interpreted and understood (Zhao & Zhang, 2008; Çalış, Kayapınar, & Çetinyokuş, 2014). A decision tree starts with a single root node and grows by forming new nodes at each split until terminal (leaf) nodes are reached. Each internal node represents a decision whose outcome is expressed in probabilities. Decision trees are advantageous for decision-makers because of their ease of understanding and interpretation, low cost and good reliability. Their disadvantages are that they can produce complex trees that do not explain the data well, that they are not very successful in estimating continuous values, and that they fail to build a model when the number of classes is large and the number of training samples is small. One of the most important issues in decision trees is determining the best splitting criterion, which decides on what basis the target variable should be divided. The most commonly used splitting rules are entropy, Gini and the chi-square test, and a decision tree method has been developed for each of them. For example, in decision tree methods such as ID3, C4.5 and C5, the most distinguishing feature is determined by entropy (Long, Griffith, Selker, & D'Agostino, 1993; Emel & Taşkın, 2005). The CART method, on the other hand, uses Gini, and the CHAID method uses the chi-square splitting rule. Since the target variable in SAS Enterprise Miner is binary, entropy, Gini or chi-square can be selected as the splitting method in the Nominal Target Criterion. In this study, the entropy splitting method was preferred.
In the subtree settings of the node (Split Search, Subtree), Largest is selected for the Method option and Misclassification is selected for the Assessment Measure property (Walsh, 2005; Şahin, 2018; Yılmaz, 2012). These options were chosen because they gave the best results in experiments with different settings.
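The entropy and Gini splitting criteria mentioned above can be sketched in a few lines. This is a minimal illustration rather than the SAS Enterprise Miner procedure; the function names and the small "injury"/"fatal" sample are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class labels in a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of the class labels in a node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node with 6 injury and 2 fatal accidents, and one candidate split.
parent = ["injury"] * 6 + ["fatal"] * 2
split = [["injury"] * 4, ["injury"] * 2 + ["fatal"] * 2]
gain = information_gain(parent, split)
```

An entropy-based tree evaluates every candidate split in this way and keeps the one with the largest gain; a Gini-based tree does the same with `gini` in place of `entropy`.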

Artificial Neural Networks
After the decision tree analysis, the artificial neural network (ANN) model was examined. The ANN method, one of the most powerful methods in DM, is an artificial intelligence research field inspired by the working principle of the human brain. The first studies of this method began with the modelling of the neurons that make up the human brain and their application in computer systems.
In recent years, with developments in computer systems, it has become a method that can be used in many areas (Budak & Erpolat, 2012; Olutayo & Eludire, 2014). An ANN is examined in three main layers: the input layer, the intermediate (hidden) layer and the output layer. The method is likened to a black box, since the exact relationship between the input and output layers cannot be established and what happens in the hidden layer is unknown. In the ANN method only the results are of concern, so how the results are formed cannot be explained, which causes researchers to lose confidence in it. In addition, one of the most important disadvantages of the method is that it produces very complex models. However, in recent years, studies on artificial neural networks, which are frequently used in almost all fields from finance to medicine and from the defence industry to automation and control, have increased interest in this method. The biggest advantage of the artificial neural network model over traditional methods is that it can give good results for problems that are complex to solve, where the data set is non-linear or multidimensional, or contains missing or incorrect data. Also, compared with statistical methods, another advantage is that it does not make any assumptions about data properties and distributions (Budak & Erpolat, 2012; Duran, Pamukçu, & Bozkurt, 2014). In SAS Enterprise Miner, the formulas used to combine the information produced in the hidden layer are located in the hidden layer combination functions section, and the formulas used to transform the combined value are in the hidden layer activation functions section. The outputs of one layer are the inputs of the next layer.
In addition, SAS Enterprise Miner has combination and activation sections for the units in the target layer. The formulas used to combine the information produced in the target layer are in the target layer combination functions section, and the formulas used to transform the combined value are in the target layer activation functions section. The target layer activation function is used to interpret the information produced in the target layer. Combination and activation processes in the hidden layer and the target layer are important elements of a neural network model. For this reason, many artificial neural network models can be produced in SAS Enterprise Miner by varying the hidden layer combination function, hidden layer activation function, target layer combination function and target layer activation function options. In our model, the success rate was high when these four settings were left at their defaults. In this study, the model selection criterion property is set to Profit/Loss. With this setting, the model that maximizes the profit or minimizes the loss for the cases in the validation data set is selected (Şahin, 2018).
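The roles of the combination and activation functions can be seen in a minimal forward pass through one hidden layer. The choices below (a linear combination in both layers, tanh in the hidden layer, a logistic function in the target layer) and the random weights are assumptions made for illustration; they are not stated in the source to be the settings used in the study.

```python
import numpy as np

def forward(x, w_hidden, b_hidden, w_target, b_target):
    """One forward pass: input -> hidden layer -> target (output) unit."""
    # Hidden layer: linear combination of the inputs, then tanh activation.
    hidden = np.tanh(w_hidden @ x + b_hidden)
    # Target layer: linear combination of hidden outputs, then logistic
    # activation, so the result can be read as a probability.
    z = w_target @ hidden + b_target
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical input with 3 features and a hidden layer with 4 units.
rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 2.0])
w_hidden = rng.normal(size=(4, 3))
b_hidden = rng.normal(size=4)
w_target = rng.normal(size=4)
b_target = 0.1
p = forward(x, w_hidden, b_hidden, w_target, b_target)
```

Training consists of adjusting the weights so that the output probabilities match the observed classes; the sketch only shows how a given set of weights turns an input into an output.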

Logistic Regression
When the dependent variable is continuous, linear regression is usually used, whereas when the dependent variable is categorical, logistic regression is used. The logistic regression method is used in many fields of study such as economics, education, health, biostatistics, banking, finance and marketing. The logistic regression model does not require the assumptions of the linear regression model, such as normally distributed error terms, a zero expected value of the error terms, constant variance of the error terms, absence of autocorrelation and non-random independent variables (Şen, 2014). In logistic regression analysis, the "Maximum Likelihood" method is used to estimate the coefficients of the variables. In logistic regression, unlike ordinary regression, the dependent variable is binary; it takes the value 1 with probability q and the value 0 with probability 1-q. Ordinary regression can be represented by equation (2). With x = (x1, x2, ..., xn) denoting the independent variables, a logistic function rather than a linear function must be used for P(x), so that its value changes monotonically with x and is constrained between 0 and 1. Equations (3), (4) and (5) show the calculation of the P(x) and Q(x) values.
In this case, the output of the logistic regression model for a given input can be represented by equation (5).
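The numbered equations are not reproduced in this text. For orientation, the standard textbook forms are given below; the symbols $\beta_0, \dots, \beta_n$ and the order of the lines are assumptions and need not match the paper's own equations (2) to (5).

```latex
\begin{align}
y &= \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n + \varepsilon \\
P(x) &= \frac{e^{\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n}}{1 + e^{\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n}} \\
Q(x) &= 1 - P(x) = \frac{1}{1 + e^{\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n}} \\
\ln \frac{P(x)}{Q(x)} &= \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n
\end{align}
```

The last line shows why the coefficients are interpreted through odds: the logarithm of the odds $P(x)/Q(x)$ is linear in the independent variables.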
The significance of the coefficients estimated by the maximum likelihood method is determined by the "likelihood ratio test, G statistic", which is based on likelihood functions, or by the "Wald test", which uses a standard normal approximation to the distribution of the test statistic (Yavuz & Çilengiroğlu, 2020). Odds ratios are used to interpret the coefficients in logistic regression. The odds of an event can be defined as the ratio of the probability that the event occurs to the probability that it does not occur. For example, if the probability of an event of interest is p, the probability that it does not occur is 1-p, and the odds are p/(1-p). If the odds ratio is between 0 and 1, the risk factor is "protective" for the outcome variable; if it is 1, there is no association between the risk factor and the outcome variable; and if it is greater than 1, there is an association, and the odds ratio expresses by what multiple the odds of the outcome increase. In addition, for the association to be considered significant, the confidence interval for the odds ratio should not include 1 (Yavuz & Çilengiroğlu, 2020).
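The interpretation of odds ratios can be made concrete with a short sketch. It assumes a coefficient and standard error from a fitted logistic regression (the values below are hypothetical) and uses the usual Wald-type interval on the log-odds scale; the function names are illustrative.

```python
import math

def odds(p):
    """Odds of an event with probability p."""
    return p / (1.0 - p)

def odds_ratio_with_ci(beta, std_err, z=1.96):
    """Odds ratio exp(beta) with an approximate 95% Wald confidence interval."""
    return (math.exp(beta),
            math.exp(beta - z * std_err),
            math.exp(beta + z * std_err))

# Hypothetical coefficient: log(2), i.e. the odds double when the factor is present.
ratio, lower, upper = odds_ratio_with_ci(beta=math.log(2.0), std_err=0.2)
```

Here the whole interval lies above 1, so the hypothetical risk factor would be read as increasing the odds of the outcome; an interval that contained 1 would not support such a reading.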

Naive Bayes
Naive Bayes classification is one of the most preferred classification methods, in which class estimation is based on Bayes' theorem. In this method, the way the data are classified matters more than the classification itself. The most important rule of the Naive Bayes method is that it estimates the class-conditional probabilities under the assumption that the attributes are independent of each other. All attributes are considered equally important. The probability of an outcome is expressed by multiplying the probabilities of all the attributes that affect that outcome (Yavuz, Ergül, & Aşık, 2021). When Bayes' theorem is used for classification, the class with the highest resulting probability is chosen as the target class, as in equation (6).
When the input instance (v) has more than one attribute, however, the Bayesian formula takes a different form. To predict the target class of a data sample described by several features together, the product of the conditional probabilities of all the features should be calculated, as in equation (7).
The most important difference between the calculations of the Naive Bayes classifier and Bayes' theorem is that the classifier tries to find the target class rather than the probability value. Therefore, the value in the denominator can be neglected, as it is common to the probability calculations of all target classes (Orhan & Adem, 2012). For this reason, the formula to be used when finding the target class is shown in equation (8).
The Naive Bayes method is advantageous because it is easy to understand, can be trained simply with a small data set, and works very quickly compared with other methods. However, it also has disadvantages, since it assumes that the attributes are independent of each other and therefore cannot show the relationships between the variables.
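Equations (6) to (8) are not reproduced in this text. The standard forms that match the description are sketched below; the notation is an assumption, with $c$ a candidate class, $v = (v_1, \dots, v_n)$ the attribute values of the input instance and $\hat{c}$ the predicted class.

```latex
\begin{align}
P(c \mid v) &= \frac{P(v \mid c)\, P(c)}{P(v)} \\
P(c \mid v_1, \dots, v_n) &= \frac{P(c) \prod_{i=1}^{n} P(v_i \mid c)}{P(v_1, \dots, v_n)} \\
\hat{c} &= \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(v_i \mid c)
\end{align}
```

The denominator is the same for every class, so dropping it in the last line does not change which class attains the maximum.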

Comparison of Models
Evaluating the classification performance of models built on data sets requires some comparison criteria. Accuracy rate, error rate and the ROC (Receiver Operating Characteristic) curve were used in this study (Şahin, 2018; Duran, Pamukçu, & Bozkurt, 2014). The classification matrix, which shows the predicted result against the actual situation for a classification technique, is given in table 1.

Accuracy Rate
The accuracy rate is the percentage of samples that are correctly classified. It is calculated as shown in equation (9).

Error Rate
The error rate is the complement of the accuracy rate, that is, the percentage of samples that are incorrectly classified. It is calculated as shown in equation (10).
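Both measures follow directly from the four cells of a classification matrix. The sketch below uses the usual definitions; the cell names (tp, tn, fp, fn) and the counts are hypothetical and are not taken from table 1.

```python
def accuracy_rate(tp, tn, fp, fn):
    """Share of samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def error_rate(tp, tn, fp, fn):
    """Share of samples classified incorrectly."""
    return (fp + fn) / (tp + tn + fp + fn)

# Hypothetical classification matrix with 100 samples.
cells = dict(tp=40, tn=45, fp=5, fn=10)
acc = accuracy_rate(**cells)
err = error_rate(**cells)
```

The two rates always sum to 1, so reporting both is a matter of presentation rather than additional information.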

Receiver Operating Characteristics Curve (ROC)
One of the most preferred methods for evaluating the performance of classification systems is the Receiver Operating Characteristic (ROC) curve. This curve is another way of comparing models, by measuring the accuracy of the predictions of the established model, and it is an effective method that visualizes classifiers according to their performance. The ROC curve is a probability curve that shows the balance between the true positive rate and the false positive rate of a classifier. The X-axis of a ROC curve shows the false positive rate and the Y-axis shows the true positive rate. With this curve, the separation between the classes to be predicted by the model can be observed. The ROC-AUC measure represents the area under the ROC curve. ML models with a larger area under the curve are more successful than others in distinguishing the given classes, and the ideal value for AUC is 1. The advantages of this curve are that the ROC curves of different models can be compared directly and that the area under the curve (AUC) summarizes the performance of each model (Duran, Pamukçu, & Bozkurt, 2014).
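How a ROC curve and its area are obtained from predicted scores can be sketched as follows. This is a generic illustration using a threshold sweep and the trapezoidal rule, not the SAS Enterprise Miner computation; the labels and scores are hypothetical, with 1 marking the positive class.

```python
def roc_points(y_true, scores):
    """(false positive rate, true positive rate) pairs from a threshold sweep."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Visit cases from the highest to the lowest score; ties move together.
    order = sorted(zip(scores, y_true), key=lambda t: -t[0])
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and order[j][0] == order[i][0]:
            tp += order[j][1]
            fp += 1 - order[j][1]
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical true classes and model scores for six cases.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(y_true, scores))
```

In this example one negative case is scored above one positive case, so the area falls short of the ideal value of 1; a model that assigned the same score to every case would obtain 0.5, the level of the baseline.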

Application Results
In this study, based on the Kernel Density map, Şişli district is handled in 2 regions according to traffic accidents and, for Büyükdere Street, in 3 sections. This distinction was made because of the high accident densities in the 2 regions and in some parts of Büyükdere Street. Thematic accident maps of the determined regions by accident type, and accident density maps produced with the Kernel Density method, were examined with the help of statistical analysis. Then, using the year, month and day data of the traffic accidents, it was investigated whether the accidents in these regions had a statistically significant day effect. In this way, the differences in the day of the week effect across the regions where the accidents occurred were interpreted. Finally, decision trees, artificial neural networks, logistic regression and Naive Bayes models were established as modelling techniques, and the success of the methods relative to each other was compared using the accuracy rate, error rate and ROC value. The independent variables included in the analysis are accident month, accident day, accident time zone, traffic accident type, weather condition, day status, road surface, road geometric horizontal, road geometric vertical, road geometric intersection, road geometric walkway, lane line, lighting, sidewalk, traffic lamp, type of road and number of vehicles. The target variable, which is the dependent variable, is the "accident result" variable. The names of the variables used in the analysis, their roles in the model, variable types, value names and label values are given in table 2 below.
Table 2. Variable name | Role | Variable type | Value names (label) | Label values
... (3), April(4), May(5), June(6), July(7), August(8), September(9), October(10), November(11), December(12) | 1-12
Accident Day | Input | Nominal Scale | Monday(1), Tuesday(2), Wednesday(3), Thursday(4), Friday(5), Saturday(6), Sunday(7) | 1-7
Accident Time Zone | Input | Ordinal Scale | 00:00-04:00(1), 04:00-08:00(2), 08:00-12:00(3), 12:00-16:00(4), 16:00-20:00(5), 20:00-24:00(6) | 1-6
Traffic Accident Type | Input | Nominal Scale | Head-On Collision(1), Rear Impact Collision(2), Side-Impact Collision(3), Side-to-Side Collision(4), Hitting a Stationary Vehicle(5), Multiple Vehicle Collision(6), Multiple Hitting(7), Hitting Fixed Objects(8), Hitting Pedestrian(9), Animal Impact(10), Vehicle's Rolling Over(11), Run-Off Road(12), Falls from Vehicles(13) | 1-13
Weather Condition | Input | Nominal Scale | Sunny(1), Cloudy(2), Foggy(3), Rainy(4), Snowy(5), Stormy(6), Strong Wind(7) | 1-7

The "accident result" variable, which is the target variable in the current data set, has two levels, fatal and injury, and the number of fatal accidents is considerably lower than the number of injury accidents. However, this special case was not taken into account in the data set, as the aim of this study is to help predict the results of possible future accidents by classifying fatal and injury traffic accidents and to offer a different perspective to studies in this field. The results given in detail in the later parts of the study showed that this situation did not affect the model success rates. Frequency analysis showed that 3833 fatal and injury traffic accidents occurred in Şişli district, of which 3805 were injury accidents and 28 were fatal accidents.
When the fatal accidents are evaluated by accident type, most of them took the form of hitting a pedestrian, followed by hitting a stationary vehicle. When the accidents that resulted in injury are examined by accident type, most were side collisions, followed by pedestrian collisions and rear collisions. The points where fatal accidents occurred, shown in figure 1 (c), indicate that the highest number of fatal accidents occurred on Büyükdere Street, with 9 accidents. It was observed that 1 fatal accident occurred on Halaskargazi Street and Cumhuriyet Street, which are the other streets examined in this study. The points where injury accidents occurred, shown in figure 1 (d), indicate that the highest number of injury accidents occurred on Büyükdere Street, with 800 accidents. On the other streets examined, 288 injury accidents occurred on Halaskargazi Street and 146 on Cumhuriyet Street. The aim is to analyze dangerous accident points, with a view to reducing accidents, by correctly classifying the results of fatal and injury traffic accidents with the most successful DM method, both in the whole Şişli district and on the streets where accident density was determined. The thematic accident map by accident type in figure 1 (a) and the bar graph of accident types in figure 2 show that side-impact collisions were the most frequent, followed by hitting a pedestrian and rear impact collisions. The thematic accident map shows that 13 types of accident occurred. The accident types and their label values between 1 and 13 are given in table 2.
Figure 3 gives bar graphs of the number of accidents by year, month, day and time zone, respectively, for the 3833 fatal and injury traffic accidents that occurred in the Şişli district between 2010 and 2017. By year, in figure 3 (a), the highest number of accidents was in 2012, with 538 accidents, and the lowest was in 2017, with 423 accidents. By month, in figure 3 (b), the highest number of accidents occurred in May, with 363 accidents, and the lowest in February, with 245 accidents. By day, in figure 3 (c), the highest number of accidents occurred on Tuesday, with 567 accidents, and the lowest on Wednesday, with 500 accidents. By time zone, in figure 3 (d), the highest number of accidents occurred between 12:00 and 16:00, with 826 accidents, and the lowest between 04:00 and 08:00, with 336 accidents. First, using the year, month and day data, it was investigated whether traffic accidents in the whole Şişli district show a statistically significant day effect. The statistical model used is given in equation (11). In table 5, (***) denotes the 1% significance level and (*) the 10% significance level. The day-based significance analysis showed that Wednesday differed from other days in terms of accident numbers at the 10% significance level. With Monday as the base day, the other days, except Wednesday, did not have a significant day effect. The constant was found to be statistically significant at the 1% significance level. This indicates that Monday has a significant day effect on traffic accidents.
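The day-based significance analysis can be sketched as an OLS regression of daily accident counts on day dummies with Monday as the base day. The data below are synthetic (random Poisson counts over 140 hypothetical days) and the function name is illustrative; nothing here reproduces the study's data or its table 5.

```python
import numpy as np

def day_of_week_ols(day_index, counts):
    """OLS of daily accident counts on six day dummies, Monday as base.

    day_index : integers 0..6 (0 = Monday) for each observed day
    counts    : number of accidents on each observed day
    Returns (coefficients, t statistics); position 0 is the constant.
    """
    day_index = np.asarray(day_index)
    y = np.asarray(counts, dtype=float)
    # Design matrix: constant plus one dummy for each non-base day.
    X = np.column_stack([np.ones(len(y))] +
                        [(day_index == d).astype(float) for d in range(1, 7)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof            # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)   # covariance of the estimates
    return beta, beta / np.sqrt(np.diag(cov))

# Synthetic example: 20 weeks of daily counts.
rng = np.random.default_rng(42)
day_index = np.arange(140) % 7
counts = rng.poisson(1.3, size=140)
beta, t_stats = day_of_week_ols(day_index, counts)
```

The constant equals the mean count on Mondays, and each dummy coefficient equals the difference between that day's mean and the Monday mean. A t statistic on a dummy therefore tests whether that day differs from the base day, while the t statistic on the constant tests whether the base-day mean differs from zero.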
The units working to prevent traffic accidents that make use of this study should investigate why Monday and Wednesday show anomalies relative to other days. In figure 4 (b), it is understood that the accidents are mostly on Halaskargazi, Abidei Hürriyet, Kurtuluş, Ortaklar, Piyalepaşa Boulevard, Ayazağa, Büyükdere, Cendere, Cumhuriyet and Darülaceze streets.

The Entire District of Şişli Results
When the accidents that took place on Wednesday are examined, most accidents on Abidei Hürriyet, Ayazağa, Büyükdere and Cumhuriyet streets were side collisions. On Piyalepaşa Boulevard, rear-end and side-impact collisions were equally the most frequent on Wednesday. On Halaskargazi, Kurtuluş, Ortaklar and Darülaceze streets, most Wednesday accidents were pedestrian collisions. On Cendere Street, most Wednesday accidents were mutual collisions. The meaning of the accident type label values in figure 4 (a) and figure 4 (b) is given in table 2. For Mondays and Wednesdays, the types of accidents in the places where accidents are concentrated were analysed by adding the accident types given in table 2 to the Kernel Density maps obtained. The colours of the accident type labels were changed between the Monday and Wednesday maps, because the colours could not be seen clearly owing to the differences in the locations where the accidents are dense. In addition, it was observed that there was no accident of the 7th type among the accidents on Wednesday.
Figure 5 shows the interface of the SAS Enterprise Miner software in which the application was carried out. Since missing data were excluded from the analysis when the data set was created, there were no missing data. The kurtosis value obtained was at a normal level, so there was no need to transform this variable. After these steps, the training, validation and test proportions of the models to be used were selected. As a result of various trials, the proportions of the training, validation and test sets were set to 60%, 30% and 10%, respectively.
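A 60%, 30% and 10% partition of the kind described can be sketched as a simple random split. This is an assumption for illustration: the source does not state how SAS Enterprise Miner drew the partition (for example, whether it was stratified by accident result), and the function name and seed are hypothetical.

```python
import random

def partition(records, fractions=(0.6, 0.3, 0.1), seed=0):
    """Shuffle records and split them into training, validation and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(fractions[0] * n)
    n_valid = round(fractions[1] * n)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]   # remainder goes to the test set
    return train, valid, test

# Record identifiers standing in for the 3833 accidents.
train, valid, test = partition(range(3833))
```

With 3833 records this yields 2300 training, 1150 validation and 383 test records. When one class is as rare as fatal accidents are here, a purely random split can leave very few cases of that class in the smaller sets, which is one reason stratified partitioning is often considered.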
Figure 6 gives the ROC curves of all models for all accidents in Şişli district. The area under the curve, reported as the ROC index, summarises how well a model separates accidents that resulted in death from those that resulted in injury. The baseline in the ROC curve comparison chart represents a model without predictive power. The predictive power of a model increases as its curve approaches the ideal point, where 1 - specificity is 0 and sensitivity is 1. Therefore, when models are compared by ROC index, the closer the value is to 1, the stronger the predictive power of the model.

To make the legends and other elements that cannot be seen clearly on the large map easier to read, Şişli district was divided into 3 regions and these streets were analysed in detail. Figure 7 shows the division of the parts of Şişli district outside Büyükdere Street into regions; Büyükdere Street itself was examined as the 3rd region. The decomposition is as follows;

When the thematic accident map by type of accident in figure 8 (a) and the bar graph of accident types in figure 9 are examined, it is seen that pedestrian collisions were the most frequent accident type, followed by side-impact collisions. [...] accidents and the least number of accidents occurred on Tuesdays and Wednesdays with 32 accidents. When these accidents are examined by time of day in figure 10 (d), it is understood that the highest number of accidents occurred between 16:00 and 20:00 with 64 accidents and the lowest between 04:00 and 08:00 with 35 accidents.
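The tallies by time of day reported above amount to assigning each accident to a four-hour interval and counting. A minimal Python sketch follows; the sample hours are hypothetical, and the handling of interval boundaries (an accident at exactly 16:00 is counted in 16:00-20:00) is an assumption, since the text does not state it.

```python
from collections import Counter


def four_hour_bin(hour):
    """Map an hour of day (0-23) to a four-hour interval label."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    start = (hour // 4) * 4  # assumption: intervals are closed on the left
    return f"{start:02d}:00-{start + 4:02d}:00"


def count_by_interval(hours):
    """Count accidents per four-hour interval."""
    return Counter(four_hour_bin(h) for h in hours)


# Hypothetical accident hours, not taken from the study's data.
sample_hours = [17, 18, 5, 16, 23, 0, 19, 12]
counts = count_by_interval(sample_hours)
busiest = max(counts, key=counts.get)
print(busiest, counts[busiest])
```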

Halaskargazi Street Results
As for all the accidents in Şişli district, the accidents that occurred on Halaskargazi Street were examined for the day-of-the-week effect. The data were entered in the Excel table and the statistical model equations were applied in the same way as for all the accidents in Şişli district.

In table 8, (***) denotes the 1% significance level and (**) the 5% significance level. In the day-based significance analysis, Tuesday and Wednesday were found to differ from other days in the number of accidents at the 5% significance level. With Monday as the base day, no day other than Tuesday and Wednesday had a significant day effect. The constant was found to be statistically significant at the 1% significance level. This indicates that Monday has a significant day effect. Frequency analysis was used to investigate the causes of accidents on the days showing anomalies on this street. Among the accidents on Halaskargazi Street on Monday, side collisions were the most frequent with 15 accidents, followed by pedestrian collisions with 14 accidents. When the Monday accidents are examined by vehicle type, motorcycles were involved in most of the side-impact accidents and automobiles in most of the pedestrian collisions. On Tuesday, pedestrian collisions were the most frequent with 12 accidents, followed by multiple vehicle collisions with 6 accidents. By vehicle type, most of the pedestrian collisions on Tuesday involved automobiles, while most of the multiple vehicle collisions on Tuesday involved motorcycles.
When the accidents on Halaskargazi Street on Wednesday were examined, pedestrian collisions were the most frequent with 12 accidents, followed by side collisions with 6 accidents. The pedestrian collisions on Wednesday were mostly caused by motorcycles and automobiles, and the side-impact collisions on Wednesday were mostly caused by motorcycles.

Table 9 examines whether the model is significant as a whole. It was concluded that the model was not significant. Decision tree, artificial neural network, logistic regression and Naive Bayes models were built for Halaskargazi Street and the results were examined.

When the models for the accidents on Halaskargazi Street are compared according to the accuracy and error rates in table 10, artificial neural networks and logistic regression give successful results in training accuracy and training error rates, all methods except Naive Bayes in validation accuracy and validation error rates, and decision trees and logistic regression in test accuracy and test error rates. For the comparison by ROC values, only the training ROC values could be calculated because of the scarcity of data. According to the training ROC values, artificial neural networks and logistic regression give successful results. Overall, the logistic regression method gave better results than the other methods on all comparison criteria. Figure 11 gives the ROC curves of all models for the accidents on Halaskargazi Street. The area under the curve, reported as the ROC index, summarises how well a model separates accidents that resulted in death from those that resulted in injury.
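The comparison criteria used above (accuracy, error rate and the ROC index) can be illustrated with a small Python sketch. It is not the computation performed by the study's software: the labels and scores are hypothetical, coding fatal accidents as 1 and injury accidents as 0 is an assumption, and the 0.5 cut-off used to turn scores into predicted classes is likewise assumed. The area under the ROC curve is computed here through its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def accuracy(y_true, y_pred):
    """Share of cases whose predicted class equals the observed class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Each (positive, negative) pair counts 1 if the positive case has the
    higher score and 0.5 if the scores are tied.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = fatal, 0 = injury (assumed coding).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

acc = accuracy(y_true, y_pred)
err = 1.0 - acc  # error (misclassification) rate
auc = roc_auc(y_true, scores)
print(round(acc, 3), round(err, 3), round(auc, 3))
```

An area of 0.5 corresponds to the baseline without predictive power, and values closer to 1 indicate better separation of the two classes.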

Cumhuriyet Street Results
Figure 12. Maps of traffic accidents on Cumhuriyet Street in Şişli district: (a) Thematic accident map by type of accident; (b) Kernel Density method map.
Figure 12 gives maps of a total of 147 fatal and injury traffic accidents that occurred between 2010-2017 on Cumhuriyet Street in Şişli district. The thematic accident map by type of accident is given in figure 12 (a), and the Kernel Density method map is given in figure 12 (b). Because the density of accidents cannot be seen clearly on the maps of the whole of Şişli, the decomposed maps give a clearer view of Cumhuriyet Street. The accidents on Cumhuriyet Street are concentrated on the streets leading to the Istanbul Lütfi Kırdar International Convention and Exhibition Center, Istanbul Congress Center, Cemil Topuzlu Open Air Theatre and the Hilton Istanbul Bosphorus hotel. It is thought that accidents can be prevented by increasing pedestrian safety around the venues where events are held. When the thematic accident map by type of accident in figure 12 (a) and the bar graph of accident types in figure 13 are examined, it is seen that side-impact collisions were the most frequent accident type, followed by rear-impact collisions and pedestrian collisions. Figure 14 gives bar graphs of the number of accidents by year, month, day and time of day, respectively, for the 147 fatal and injury traffic accidents that occurred on Cumhuriyet Street in Şişli district between 2010-2017. By year, in figure 14 (a), the highest number of accidents occurred in 2011 with 29 accidents and the lowest in 2017 with 10 accidents. By month, in figure 14 (b), the highest number of accidents occurred in October with 18 accidents and the lowest in February with 4 accidents.
By day, in figure 14 (c), the highest number of accidents occurred on Sunday with 31 accidents and the lowest on Thursday with 14 accidents. By time of day, in figure 14 (d), the highest number of accidents occurred between 04:00 and 08:00 with 33 accidents and the lowest between 20:00 and 24:00 with 16 accidents.
As for all the accidents in Şişli district, the accidents that occurred on Cumhuriyet Street were examined for the day-of-the-week effect. The data were entered in the Excel table and the statistical model equations were applied in the same way as for all the accidents in Şişli district.

In table 11, (***) denotes the 1% significance level and (*) the 10% significance level. In the day-based significance analysis, Sunday was found to differ from other days in the number of accidents at the 10% significance level. With Monday as the base day, no day other than Sunday had a significant day effect. The constant was found to be statistically significant at the 1% significance level. This indicates that Monday has a significant day effect. Frequency analysis was used to investigate the causes of accidents on the days showing anomalies on this street. Among the accidents on Cumhuriyet Street on Monday, rear-impact collisions were the most frequent with 5 accidents, followed by side-to-side collisions with 4 accidents. Most of the rear-impact and side-impact collisions on Monday were caused by automobiles. Among the accidents on Cumhuriyet Street on Sunday, rear-impact collisions were the most frequent with 7 accidents, followed by pedestrian collisions and vehicle rollovers with 5 accidents each. The rear-impact collisions on Sunday were mostly caused by automobiles, and the pedestrian collisions were caused equally by motorcycles and automobiles. The vehicle rollovers on Sunday, on the other hand, mostly involved automobiles.

Figure 15 gives the ROC curves of all models for the accidents on Cumhuriyet Street. The area under the curve, reported as the ROC index, summarises how well a model separates accidents that resulted in death from those that resulted in injury.

Büyükdere Street Results
Figure 16. Maps of traffic accidents on Büyükdere Street in Şişli district: (a) Thematic accident map by type of accident; (b) Kernel Density method map.
Figure 16 gives maps of a total of 809 fatal and injury traffic accidents that occurred between 2010-2017 on Büyükdere Street in Şişli district. The thematic accident map by type of accident is given in figure 16 (a) and the Kernel Density method map is given in figure 16 (b).

When the thematic accident map by type of accident in figure 16 (a) and the bar graph of accident types in figure 17 are examined, it is seen that side-impact collisions were the most frequent accident type, followed by pedestrian collisions. Figure 18 gives bar graphs of the number of accidents by year, month, day and time of day, respectively, for the 809 fatal and injury traffic accidents that occurred on Büyükdere Street in Şişli district between 2010-2017. By year, in figure 18 (a), the highest number of accidents occurred in 2014 with 122 accidents and the lowest in 2017 with 70 accidents. By month, in figure 18 (b), the highest number of accidents occurred in November with 92 accidents and the lowest in January with 46 accidents. By day, in figure 18 (c), the highest number of accidents occurred on Tuesday with 137 accidents and the lowest on Saturday with 100 accidents. By time of day, in figure 18 (d), the highest number of accidents occurred between 12:00 and 16:00 with 180 accidents and the lowest between 04:00 and 08:00 with 78 accidents.
As for all the accidents in Şişli district, the accidents that occurred on Büyükdere Street were examined for the day-of-the-week effect. The data were entered in the Excel table and the statistical model equations were applied in the same way as for all the accidents in Şişli district.

In table 14, (***) denotes the 1% significance level. In the day-based significance analysis, with Monday as the base day, no day showed a significant day effect. The constant was found to be statistically significant at the 1% significance level. This means that Monday has a significant day effect. Frequency analysis was used to investigate the causes of accidents on the days showing anomalies on this street. Among the accidents on Büyükdere Street on Monday, the most frequent types were side-impact collisions with 29 accidents and pedestrian collisions with 27 accidents. Both the side-impact collisions and the pedestrian collisions on Monday were mostly caused by automobiles.

Figure 19 gives the ROC curves of all models for the accidents on Büyükdere Street. The area under the curve, reported as the ROC index, summarises how well a model separates accidents that resulted in death from those that resulted in injury.

Discussion
It is very important to make data-based decisions in order to monitor and control traffic accidents. For this reason, the method of detecting anomalies with the day-of-the-week effect approach, which is known in the field of finance, has been adapted to traffic accidents. Identifying the days that stand out as anomalies in traffic accidents can enable faster measures to be taken against the incidents that cause the accidents (Örnek, Vatan, Sarıoğlu, & Yazıcı, 2018). In this study, the streets with a high number of traffic accidents were determined by the Kernel Density method, and then the days on which the accidents showed anomalies on these streets were determined. Thus, it is thought that the units working to prevent traffic accidents can take the necessary measures in a narrower area, according to the causes of the accidents, by examining the days on which accidents show an anomaly at different locations under changing conditions. This will also save cost and time.
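The day-of-the-week approach can be made concrete with a small Python sketch. The model equations used in the study are not reproduced in this section, so the sketch assumes the common specification in which the accident count is regressed on six day dummies with Monday as the base day. In that specification the ordinary least squares constant equals the mean count on Mondays, and each day coefficient equals the difference between that day's mean and the Monday mean, so the estimates can be obtained from group means. The function name `day_effects` and the observations are hypothetical, and the significance tests reported in the tables are not computed here.

```python
from statistics import mean

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def day_effects(observations, base="Mon"):
    """Estimate a dummy-variable day-of-the-week model from group means.

    observations: iterable of (day, accident_count) pairs, with at least
    one observation for every day in DAYS.
    Returns (constant, coefficients), where the constant is the mean count
    on the base day and each coefficient is a day's mean minus that constant.
    """
    by_day = {d: [] for d in DAYS}
    for day, count in observations:
        by_day[day].append(count)
    constant = mean(by_day[base])
    coefficients = {
        d: mean(values) - constant for d, values in by_day.items() if d != base
    }
    return constant, coefficients


# Hypothetical counts for two weeks, not taken from the study's data.
observations = [
    ("Mon", 2), ("Mon", 4), ("Tue", 1), ("Tue", 1), ("Wed", 5), ("Wed", 7),
    ("Thu", 3), ("Thu", 3), ("Fri", 2), ("Fri", 4), ("Sat", 3), ("Sat", 1),
    ("Sun", 4), ("Sun", 2),
]
constant, coefficients = day_effects(observations)
print(constant, coefficients)
```

A day whose coefficient is large relative to its standard error would be flagged as differing from the base day; computing those standard errors requires a full regression routine and is left out of this sketch.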
Since the database of the General Directorate of Security contains data only on fatal and injury accidents, data on accidents with material damage could not be obtained. In addition, the data we requested from the Insurance Information and Monitoring Centre could not be obtained, which partly limited the level of detail of this study.
In this study, the accident coordinates obtained from the General Directorate of Security were used in the spatial analysis with the Kernel Density method. The Kernel Density method was preferred because, in studies carried out to determine accident densities, it gives better visual results than other spatial methods. The study showed that narrowed, location-based analysis yields more useful results than spatial analysis covering all of the regional data at once. For this reason, it is more effective for reducing accidents to examine the regions where accidents are concentrated with frequency analysis and the anomaly approach, and to offer solutions where possible. In addition, classification methods were applied to the traffic accident data in these regions. The success of the models by region was evaluated by applying various ML methods to classify the outcome of traffic accidents as fatal or injury. For this purpose, decision tree, artificial neural network, logistic regression and Naive Bayes models were tried in order to observe how the results differ across the streets where accidents are concentrated. The main purpose of developing the models for different locations is to help reduce accidents by choosing the most appropriate model, within a more restricted area, for classifying the outcome of traffic accidents as fatal or injury. It was concluded that, although the examined models showed high performance, their performance differed according to the locations where the accidents occurred.
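The idea behind the Kernel Density maps can be sketched in a few lines of Python. The sketch assumes a two-dimensional Gaussian kernel and planar coordinates in a common unit; the kernel and bandwidth actually used by the study's GIS software are not stated in this section and may differ. The coordinates and the function name `kernel_density` are hypothetical.

```python
import math


def kernel_density(x, y, points, bandwidth):
    """Gaussian kernel density estimate at (x, y).

    points: list of (px, py) accident coordinates in planar units.
    bandwidth: smoothing radius in the same units (assumed, must be > 0).
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if not points:
        raise ValueError("at least one point is required")
    total = 0.0
    for px, py in points:
        squared_distance = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-squared_distance / (2 * bandwidth ** 2))
    # Normalise so that the estimated surface integrates to 1.
    return total / (len(points) * 2 * math.pi * bandwidth ** 2)


# Hypothetical coordinates: a cluster near the origin and one distant point.
accidents = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (0.1, -0.2), (5.0, 5.0)]
dense = kernel_density(0.0, 0.0, accidents, bandwidth=0.5)
sparse = kernel_density(5.0, 5.0, accidents, bandwidth=0.5)
print(dense > sparse)
```

Evaluating the function on a regular grid and colouring the cells by value gives a density surface of the kind shown in the Kernel Density maps; a smaller bandwidth produces sharper hot spots, a larger one a smoother surface.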

Conclusions
It is important to investigate the cause of the accidents that occurred in the same location in order to reduce traffic accidents and ensure the life safety of people. To take effective measures regarding road safety, it is necessary to determine and analyse the regions where the accidents are concentrated.
Once the black spots have been located, the factors causing the accidents should be investigated. When the integrated density maps of the Şişli district were examined, it was decided to examine the district as 3 separate regions. Among the spatial methods, the Kernel Density method was preferred because it gives better visual results. With this method and statistical analysis, the determined regions were analysed first, then the day-of-the-week effect for traffic accidents in these regions was investigated, and the days showing anomalies were evaluated by region. In the day-based significance analysis of fatal and injury accidents in the whole Şişli district between 2010 and 2017, Wednesday was found to differ from other days in the number of accidents at the 10% significance level. In the analysis made separately by street, Tuesday and Wednesday were found to differ from other days in the number of accidents at the 5% significance level for the fatal and injury accidents that occurred between 2010-2017 on Halaskargazi Street. Secondly, Sunday was found to differ from other days in the number of accidents at the 10% significance level for the fatal and injury accidents that occurred on Cumhuriyet Street between 2010-2017. Finally, for the fatal and injury accidents that occurred on Büyükdere Street, no day showed a significant day effect with Monday as the base day. In this study, decision tree, artificial neural network, logistic regression and Naive Bayes models were built as modelling techniques, and the performance of the methods in classifying the target variable as fatal or injury accident was compared according to accuracy, error rates and ROC values for these streets.
In this way, the aim was to predict whether a possible future accident would result in death or injury, and thereby to assist the relevant units in the measures to be taken. Model building was also attempted for other target variables. No successful model could be obtained with target variables such as accident occurrence type and vehicle type. It was concluded that these target variables are completely random and cannot be predicted within the framework of classification models. In the results obtained, the success of different classification techniques [decision tree for the whole of Şişli (Table 7), logistic regression for Halaskargazi Street (Table 10), artificial neural networks and logistic regression for Cumhuriyet Street (Table 13) and decision tree for Büyükdere Street (Table 16)] is due to the structure of the existing data, and the results may change with a different data set. This study has shown that, in the prevention of fatal and injury accidents, measures should be taken by examining a narrower area with the help of spatial analysis. Statistical analysis of the entire Şişli district as a single area was found not to be very effective for taking measures to reduce accidents. For this reason, this study, which is intended as a guide for the units working to prevent traffic accidents, has shown that analyses made by dividing the district into regions and applying differentiated analysis give more realistic results for the solutions to be proposed. This study suggests that arrangements that take into account the types of vehicles involved in accidents at certain locations, together with the day effect, will be more effective in reducing fatal and injury accidents. It is thought that applying a method known in the field of finance to traffic accidents will provide a new perspective for future studies.