OAI-PMH harvest of https://uir.unisa.ac.za/oai/request (2024-03-28)

oai:uir.unisa.ac.za:10500/28913
Kanyama, Busanga Jerome
2022-05-30T09:29:54Z
2022-02
https://hdl.handle.net/10500/28913
The underperforming agricultural sector in Sub-Saharan Africa (SSA) has left African countries with
insufficient food production in the face of challenges related to climate change, diseases and
increasing population growth. The agricultural sector is the main source of food, generates income,
employs a large portion of the population, and produces raw materials for agribusinesses. The
improvement of agricultural food production contributes to food security, poverty alleviation, the
development of trade, and a country's economy. The challenges facing the SSA countries include
ineffective farming systems, loss of soil fertility, limited access to land, climate change, water scarcity,
outdated production technology, restricted market access due to poor infrastructure, and high
transaction costs, among others. To address these challenges, combining multiple nutrients, rather
than applying a single fertiliser, was proposed to increase crop grain yield, since each nutrient
contributes to the overall effect.
Research conducted in SSA with the aim of improving food production often misses the opportunity to
share findings across the various sectors, which points to a lack of appropriate statistical techniques
for addressing these challenges. We can better understand the real situation of food production by
developing a comprehensive scientific and statistical approach that gathers all published individual
findings into a unified result. The process of collecting and combining research outputs requires
meta-analysis (MA), which provides precise estimates of the various parameters associated with food production.
Various factors can be considered as making a significant contribution to agricultural food production,
such as fertiliser, access to markets, energy use and trade. To establish the diverse set of relationships
that can be developed among these factors, the structural equation model (SEM) statistical technique is
used. Under some conditions, this procedure can be restrictive and inflexible, since it requires the
specification of latent variables within a highly diverse set of variables. In this work, we propose a more
suitable, flexible and accurate approach that determines the number of linear regressions from the
observed data in a clear and precise manner through factor analysis and principal component analysis
(PCA). In addition, to test the large number of variables or factors among the parameters obtained in
SEM, we propose to synthesise all this information by integrating MA into SEM. The incorporation of
MA into SEM allows us to account simultaneously for all effects of the factors of food production in a
single model. In MA, effect sizes are assumed independent across studies and univariate MA is used. A
single study, however, may involve multiple tests of the same hypothesis and therefore report multiple
outcomes (MOs). For such situations, a multiple-outcomes approach was developed to determine the
multiple linear regression model that tests and analyses the relations between the factors of interest in
food production.
The results of MA were expressed in terms of fixed- and random-effects models. The fixed-effects
models were appropriate when the effects across studies were homogeneous, while the random-effects
models helped to control for unobserved heterogeneity when the between-studies variance was large.
Applying combined inorganic fertiliser proved more productive in raising the grain yield of maize. The
SEM findings provide efficient results for evaluating the relations among variables and for testing a
theoretical statistical model. The integration of MA into SEM permitted combining parameter estimates
within a single model. Researchers in agriculture and related fields can apply these techniques
productively.
We hope that many researchers will benefit from this methodological approach for estimation and
inference in addressing the food production situation. The outcomes of this work contribute to science
by providing comprehensive statistical approaches for evaluating and synthesising the most suitable
results. The benefit can be extended to the improvement of food production.
en
Combined multiple outcomes
Combined model
Factor analysis
Fixed effects model
Meta-analysis
Multivariate meta-analysis
Principal component analysis
Parameter estimate
Random-effects model
Structural equation model
Proposed statistical techniques for combining parameter estimates: a case of food production in Sub-Saharan Africa
Thesis
oai:uir.unisa.ac.za:10500/28235
Gundidza, Patricia Tapiwa
2021-11-05T11:34:12Z
2021-10-01
https://hdl.handle.net/10500/28235
The conventional approach to determining HIV risk factors fails to consider the influence behavioural and biological factors may have when modelled jointly. This study investigates the extent of the influence behavioural and biological factors jointly have on HIV prevalence. The modelling approach adopted includes assessment of multicollinearity among the variables, principal component regression analysis to deal with the multicollinearity problem, and checking for the presence of confounding factors and significant interaction effects. In determining the joint effect, a logistic regression model was fitted to the HIV data obtained from the Zimbabwe Demographic and Health Survey of 2011 (ZDHS, 2010). Besides age, marital status, total number of lifetime sex partners and condom use, which were significant for both genders, genital discharge and paying for sex were significant for males, while place of residence, age at sexual debut, genital sores, relationship with the most recent partner, educational attainment and STI status were significant for females. Significant interaction terms between biological and behavioural factors were revealed, demonstrating the importance of being cautious when interpreting the results of joint modelling. Generalised Variance Inflation Factors (GVIF) detected multicollinearity for some variables in both models, and principal component analysis obtained four factors (sexual relations, residential status, STI status and sexual partnership) for females and three (STI occurrence, sexual relationship and residential status) for males, thus addressing the problem. The findings suggest meticulous consideration in assessing interrelationships among independent variables and give an appreciation and understanding of how statistical methods can be applied in the public health sector.
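The principal-component-regression step described here can be sketched as follows. This is a schematic with simulated data and a plain gradient-ascent logistic fit, not the ZDHS analysis itself: PCA replaces the collinear covariates with orthogonal component scores, which then enter the logistic regression:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Two strongly collinear "behavioural" covariates plus one "biological" covariate
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # near-duplicate of x1: multicollinearity
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
true_logit = 0.8 * x1 + 0.5 * x3
y = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

# Step 1: PCA on the standardised covariates; keep the two leading components
Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigval, eigvec = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigval)[::-1]
scores = Z @ eigvec[:, order[:2]]    # orthogonal component scores

# Step 2: logistic regression on the component scores (plain gradient ascent)
A = np.column_stack([np.ones(n), scores])
coef = np.zeros(A.shape[1])
for _ in range(200):
    p = 1 / (1 + np.exp(-A @ coef))
    coef += 0.1 * A.T @ (y - p) / n
```

Because the component scores are uncorrelated by construction, the inflated standard errors that multicollinearity causes in the raw covariates disappear at this stage.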
en
Logistic regression
Multicollinearity
Confounding
Interaction
Principal Component Analysis (PCA)
Understanding the joint effect of the behavioural and biological risk reduction factors on HIV
Dissertation
oai:uir.unisa.ac.za:10500/18801
Magagula, Sibusiso Vusi
2015-07-10T12:03:30Z
2014-05
Magagula, Sibusiso Vusi (2014) Jump-diffusion based-simulated expected shortfall (SES) method of correcting value-at-risk (VaR) under-prediction tendencies in stressed economic climate, University of South Africa, Pretoria, <http://hdl.handle.net/10500/18801>
http://hdl.handle.net/10500/18801
The Value-at-Risk (VaR) model fails to predict financial risk accurately, especially during financial crises. This is mainly due to the model's inability to calibrate new market information and the fact that the risk measure is characterised by poor tail-risk quantification. An alternative approach, which comprises the Expected Shortfall measure and the Lognormal Jump-Diffusion (LJD) model, has been developed to address these shortcomings of VaR. This model is called the Simulated-Expected-Shortfall (SES) model. The Maximum Likelihood Estimation (MLE) approach is used to determine the parameters of the LJD model, since it is more reliable and verifiable than the non-conventional parameter estimation approaches mentioned in other literature. These parameters are then plugged into the LJD model, which is simulated multiple times to generate the new loss dataset used in the developed model. The SES model is statistically conservative compared with its peers, which means it is more reliable in predicting financial risk, especially during a financial crisis.
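A minimal sketch of the SES idea, under assumed (hypothetical) Merton-style jump-diffusion parameters rather than MLE estimates from real data: simulate log-returns from a lognormal jump-diffusion, then take Expected Shortfall as the mean loss beyond the VaR quantile:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_ljd_returns(n, mu=0.05, sigma=0.2, lam=3.0,
                         jump_mu=-0.05, jump_sigma=0.1, dt=1 / 252):
    """Daily log-returns from a lognormal jump-diffusion: a Gaussian
    diffusion term plus a compound-Poisson jump term."""
    diffusion = (mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * rng.normal(size=n)
    n_jumps = rng.poisson(lam * dt, size=n)
    jumps = n_jumps * jump_mu + np.sqrt(n_jumps) * jump_sigma * rng.normal(size=n)
    return diffusion + jumps

losses = -simulate_ljd_returns(100_000)
var95 = np.quantile(losses, 0.95)      # 95% Value-at-Risk of the simulated losses
es95 = losses[losses >= var95].mean()  # Expected Shortfall: mean loss in the tail
```

Expected Shortfall is always at least as large as VaR at the same level, which is what makes the SES measure the more conservative of the two.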
en
Historical-Simulation VaR model
Jump-Diffusion models
ES model
Coherence
Fat-tailed distribution
HES model
SES model
Jump-diffusion based-simulated expected shortfall (SES) method of correcting value-at-risk (VaR) under-prediction tendencies in stressed economic climate
Dissertation
oai:uir.unisa.ac.za:10500/28955
Mhlongo, Thabani Hopewell
2022-06-10T09:06:40Z
2021-02
https://hdl.handle.net/10500/28955
The premise of this study is that price fluctuations amongst exchange rates, stock and
commodities markets are dynamically linked. The study models monthly price changes
amongst these markets in the 20 African economies ranked highest by World Bank GDP, between
2000 and 2019, using a copula-based DCC-GARCH framework. The results show
that there is time-varying co-movement amongst these markets that tends to increase during
times of turbulence in the sampled markets. Dynamic relations are found to be quantitatively
and relatively substantial for economies of Egypt, South Africa, Tanzania, Libya and
Zambia. The study also finds a relatively high bivariate association amongst currencies
mainly the South African rand, Botswana pula, Moroccan dirham, CFA franc and Tunisian
dinar.
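Estimating a copula-based DCC-GARCH model needs a dedicated package, but the time-varying co-movement the study reports can be illustrated with a much simpler RiskMetrics-style EWMA correlation on simulated returns (a stand-in for illustration, not the dissertation's model):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1000
# Two synthetic return series whose true correlation jumps from 0.2 to 0.8
z = rng.normal(size=(T, 2))
rho = np.where(np.arange(T) < T // 2, 0.2, 0.8)
x = z[:, 0]
y = rho * z[:, 0] + np.sqrt(1 - rho ** 2) * z[:, 1]

# RiskMetrics-style EWMA covariance recursion (decay lambda = 0.94)
lam = 0.94
cov = np.eye(2)
corrs = []
for t in range(T):
    r = np.array([x[t], y[t]])
    cov = lam * cov + (1 - lam) * np.outer(r, r)
    corrs.append(cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1]))
corrs = np.array(corrs)
```

The recursion tracks the regime change within a few dozen observations; DCC-GARCH refines the same idea by filtering each series through its own GARCH volatility before correlating the standardised residuals.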
en
Stock markets
Exchange rates
Crude oil
Gold
Co-movement
Copula
DCC GARCH
GJR
Exponential GARCH
Commodities
Dynamic linkage amongst oil, gold, exchange rates and stock markets in Africa: evidence from volatility of major African economies
Dissertation
oai:uir.unisa.ac.za:10500/28905
Mgcima, Phumzile
2022-05-30T07:56:55Z
2022-01
https://hdl.handle.net/10500/28905
Nowadays, human activities produce massive amounts of data every day; an estimated 2.5
quintillion bytes of data are produced daily. The ability to analyse and interpret such data, usually
referred to as 'big data', is a precondition for success in the 4th Industrial Revolution (4IR). Statistical
data modelling has been the de facto data-analysis paradigm for many decades, but it is slowly being
overshadowed by machine learning algorithms in industry and in research funding. In this research,
the two modelling paradigms were compared with the aim of establishing which one is better in terms of
rationale, accuracy and model parsimony. Unlike many studies on this subject, which mainly concentrate
on comparing accuracy, this research did not treat accuracy as the only metric of comparison.
Both modelling paradigms were applied in prediction (continuous value prediction), classification
(categorical class label prediction) and clustering problems in three separate case studies. In the
prediction case study, a Realised GARCH (RealGARCH) model was compared to an artificial neural
network (ANN) algorithm. In the classification case study, a linear discriminant analysis (LDA)
model was compared to a support vector machine (SVM) algorithm. Lastly, a Gaussian mixture
model (GMM) was compared to a K-means algorithm. For prediction and classification, the data was
divided into training and testing sets, the training sets were used to fit the models and the testing
sets were used to measure prediction and classification accuracy. For clustering, model validation was
based on bootstrapping, visualisation and distance measures.
The ANN model outperformed the generalised autoregressive conditional heteroscedasticity (GARCH)
variant, the RealGARCH model, on both accuracy measures, root mean square error (RMSE)
and mean absolute error (MAE), while RealGARCH gave more insight into the data. SVM had
marginally better classification accuracy in both the two-class and the three-class scenarios but had
poorer F-Measure for the minority classes in the three-class scenario. The statistical models were more
interpretable compared to their machine learning counterparts in both case studies. Both clustering
models performed poorly in partitioning the data in the third case study, but K-means did better
than the GMM model. Understanding the domain problem was found to be essential to data analysis
regardless of the modelling paradigm.
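The hold-out comparison used in the prediction case study can be sketched generically. Here a straight-line fit stands in for the statistical model and a cubic polynomial stands in for the more flexible ANN; the data, the split and both models are illustrative assumptions, not the dissertation's case study:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400
x = rng.uniform(-3, 3, size=n)
y = np.sin(x) + 0.1 * rng.normal(size=n)   # nonlinear signal plus noise

# Hold-out split: fit on the training set, score on the testing set
idx = rng.permutation(n)
train, test = idx[:300], idx[300:]

# Rigid model (statistical stand-in): straight-line OLS fit
a, b = np.polyfit(x[train], y[train], 1)
pred_line = a * x[test] + b

# Flexible model (machine-learning stand-in): cubic polynomial
coef = np.polyfit(x[train], y[train], 3)
pred_cubic = np.polyval(coef, x[test])

def rmse(pred):
    return float(np.sqrt(np.mean((y[test] - pred) ** 2)))

def mae(pred):
    return float(np.mean(np.abs(y[test] - pred)))
```

The flexible model wins on RMSE and MAE when the signal is genuinely nonlinear, while the line's two coefficients remain far easier to interpret, mirroring the accuracy-versus-interpretability trade-off the study reports.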
en
Statistical Data modelling
Machine Learning Techniques
Artificial neural networks (ANNs)
GARCH models
Internet of things
Smart home
Comparative study of statistical data modelling with machine learning techniques
Dissertation
oai:uir.unisa.ac.za:10500/29846
Molebatsi, Malebo Tshegofatso
2023-03-04T16:47:06Z
2023-01-25
https://hdl.handle.net/10500/29846
Non-performing loans (NPLs) are detrimental to profits in the banking sector. Predicting the level of NPLs using macroeconomic variables is vital in order to build mitigating actions for such scenarios and safeguard the profitability of the institution. Macroeconomic variables are susceptible to high correlations amongst each other, bringing about the problem of multicollinearity, and predicting in the presence of multicollinearity yields unreliable and inefficient results. This study aims to find an optimal and efficient way of forecasting NPLs using Ordinary Least Squares (OLS), Ridge Regression (RR) and Principal Component Analysis (PCA) while correcting for multicollinearity. To do this, NPL data from bank X were obtained, along with multiple macroeconomic variables, specifically for Kenya and Nigeria. It is critical to assess the determinants of NPLs so that effective and efficient policies can be deployed to prevent a rising trajectory of NPLs. To minimise the risks of using expert judgement, it is necessary to consider effective statistical methods for predicting NPLs. The benefits accrued from such methods include (1) minimum collection costs incurred when a loan defaults, such as fewer phone calls urging customers to pay, lower litigation costs when trying to recover the assets, smaller shortfalls incurred when disposing of repossessed assets and fewer auction sales if the assets have to be auctioned, to mention a few; (2) correct pricing for the risk; (3) the ability to differentiate between high-risk and low-risk accounts based on the macroeconomic factors; and (4) more prudence in granting credit, so as to minimise losses and maximise profits. This study considers OLS, RR and PCA in modelling the NPL data from bank X. The results showed that multicollinearity exists for most variables, and some of the variables did not conform to the assumptions of OLS.
The OLS models for both countries were significant, although some of the variables displayed illogical outcomes, possibly due to multicollinearity among the predictor variables. The RR method corrected for multicollinearity and had reasonable predictive power for the Nigeria data but not for Kenya, whereas PCA corrected for multicollinearity, added the benefit of data reduction, and had relatively better predictive power. The mean square errors (MSEs) for PCA and RR were lower than those of OLS. A key limitation was inadequate data from the banking sector due to sensitivity issues. We conclude that the data can be expanded and the number of variables reduced so that prediction can be more precise. Further work using other methods, such as GARCH, can be explored to improve the prediction of NPLs in the presence of multicollinearity.
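The OLS-versus-ridge contrast can be sketched on simulated collinear predictors (illustrative data, not the bank X series). Ridge adds a penalty k to the diagonal of X'X, which stabilises the coefficients that multicollinearity makes erratic:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
# Two macroeconomic-style predictors where x2 is almost a copy of x1
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, k):
    # Penalised normal equations: (X'X + k I) b = X'y
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

b_ols = ols(X, y)              # individual coefficients unstable under collinearity
b_ridge = ridge(X, y, k=10.0)  # shrunken, hence more stable, coefficients
```

Note that the well-identified quantity, the sum of the two coefficients, is recovered by both methods; it is the split between the two near-duplicate predictors that ridge stabilises.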
en
Non-performing loans
Financial institutions profitability
Macroeconomic variables
Ordinary least squares
Multicollinearity
Ridge regression
Principal component analysis
Handling of multicollinearity problem in modelling non-performing loans in Africa's portfolio data
Dissertation
oai:uir.unisa.ac.za:10500/27404
Motsepa, Collen Mabilubilu
2021-06-02T09:19:54Z
2019-12
http://hdl.handle.net/10500/27404
The double sampling procedure is adapted from a branch of statistics called acceptance sampling. The first Shewhart-type double sampling monitoring scheme was introduced in the statistical process monitoring (SPM) field in 1974. The double sampling monitoring scheme has been proven to decrease the sampling effort and, at the same time, to decrease the time to detect potential out-of-control situations when monitoring the location, the variability, or the joint location and variability, using univariate or multivariate techniques. Consequently, an overview is conducted to give a full account of all 76 publications on double sampling monitoring schemes that exist in the SPM literature. Moreover, in the review conducted here, these are categorised and summarised so that any research gaps in the SPM literature can easily be identified. Next, based on the knowledge gained from the literature review about the existing designs for monitoring the process mean, a new type of double sampling design is proposed. The new charting-region design leads to a class of control charts called side-sensitive double sampling (SSDS) monitoring schemes. In this study, the SSDS scheme is implemented to monitor the process mean both when the underlying process parameters are known and when they are unknown. A variety of run-length properties (i.e., the 5th, 25th, 50th, 75th and 95th percentiles, the average run-length (ARL), the standard deviation of the run-length (SDRL), the average sample size (ASS) and the average extra quadratic loss (AEQL)) are used to design and implement the new SSDS scheme. Comparisons with other established monitoring schemes (when parameters are known and unknown) indicate that the proposed SSDS scheme has a better overall performance. Illustrative examples are also given to facilitate the real-life implementation of the proposed SSDS schemes.
Finally, a list of possible future research ideas is given with hope that this will stimulate more future research on simple as well as complex double sampling schemes (especially using the newly proposed SSDS design) for monitoring a variety of quality characteristics in the future.
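Run-length properties such as the ARL and SDRL are typically estimated by simulation. Below is a minimal sketch for a plain Shewhart X-bar chart rather than the proposed SSDS scheme (whose double-sampling decision rules are more involved), showing how the in-control and out-of-control run-length distributions are summarised:

```python
import numpy as np

rng = np.random.default_rng(11)

def run_length(shift=0.0, limit=3.0, n=5, max_samples=10_000):
    """Number of subgroup means plotted until the first signal on a Shewhart
    X-bar chart with +/- limit * sigma / sqrt(n) control limits."""
    for t in range(1, max_samples + 1):
        xbar = rng.normal(loc=shift, scale=1.0 / np.sqrt(n))  # subgroup mean
        if abs(xbar) > limit / np.sqrt(n):
            return t
    return max_samples

rls = [run_length(shift=0.0) for _ in range(1000)]
arl0 = np.mean(rls)            # in-control ARL, ~370 for 3-sigma limits
sdrl = np.std(rls)             # standard deviation of the run length
p95 = np.percentile(rls, 95)   # 95th run-length percentile
arl1 = np.mean([run_length(shift=1.0) for _ in range(300)])  # shifted process
```

A one-standard-deviation shift drives the ARL down sharply; it is exactly this detection-speed behaviour that the percentile-based run-length comparisons in the study quantify.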
en
Double sampling
Monitoring scheme
Statistical process monitoring
Run-length properties
Overall performance measures
Side sensitive
Non-side-sensitive
Estimated parameters
Phase I
Phase II
Design of side-sensitive double sampling control schemes for monitoring the location parameter
Dissertation
oai:uir.unisa.ac.za:10500/28166
Sivhugwana, Khathutshelo Steven
2021-10-13T10:16:28Z
2020-01
https://hdl.handle.net/10500/28166
If solar power is to be integrated into the national grid in large volumes, it should be backed up by accurate short-term solar irradiance forecasting information. This is because system designers in the solar power markets utilise short-term forecasting information in the early stages of setting up solar power systems to properly design and size the solar power harvesting system, such as the photovoltaic (PV) system. However, the unsteady and varying nature (mainly due
to atmospheric mechanisms and diurnal cycles) of the solar energy resource is often a hindrance to receiving high-intensity levels of solar radiation at ground level. As such, there has been a
growing demand for accurate solar irradiance predictions that adequately capture the mixture of linear and nonlinear behaviour in which solar radiation presents itself on the earth's surface.
Among the time-series-based forecasting techniques, autoregressive integrated moving average (ARIMA) models have been widely applied because of their ability to handle the linear component embedded in time series data. On the other hand, artificial neural networks (ANNs) are capable of handling the nonlinear features in time series data that cannot be properly captured by traditional linear models (e.g. ARIMA models). In this study, we build hybrid
models by blending seasonal autoregressive integrated moving average (SARIMA) models (to capture the linear component) and neural network autoregression (NNAR) models (to capture the nonlinear
component) to form SARIMA-NNAR models, which we use to model global horizontal solar irradiance (GHI) data collected from the RVD-GIZ solar radiometric station located at Alexander Bay, Northern Cape, South Africa. Overall, comparative results for four GHI data series show that the SARIMA-NNAR model is superior to both the NNAR model and the SARIMA model in terms of forecasting performance. A brief exploration of harmonically coupled neural network autoregression (HCNNAR) models revealed that these models are capable of effectively modelling the inherent periodic sinusoidal component in solar irradiance data observed on the earth's surface with some level of accuracy. Hence, the study proposes the use of these models for future
studies on modelling and forecasting solar irradiance data.
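The two-stage hybrid idea can be sketched with simple stand-ins: a harmonic regression plays the role of the SARIMA stage (linear, seasonal) and a fixed random tanh feature map on lagged residuals plays the role of the NNAR stage (nonlinear). Data and both stages are illustrative assumptions, not the RVD-GIZ analysis:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 600
t = np.arange(T)
season = 10 * np.sin(2 * np.pi * t / 24)               # smooth daily cycle
nonlin = 2 * np.tanh(3 * np.sin(4 * np.pi * t / 24))   # nonlinear remainder
y = season + nonlin + 0.5 * rng.normal(size=T)

# Stage 1 (linear stage, standing in for SARIMA): harmonic regression
Xlin = np.column_stack([np.sin(2 * np.pi * t / 24),
                        np.cos(2 * np.pi * t / 24),
                        np.ones(T)])
beta, *_ = np.linalg.lstsq(Xlin, y, rcond=None)
lin_fit = Xlin @ beta
resid = y - lin_fit

# Stage 2 (nonlinear stage, standing in for NNAR): regress the residual on its
# own lags through a fixed random tanh feature map (extreme-learning style)
lagged = np.column_stack([resid[1:-1], resid[:-2]])    # lags 1 and 2
target = resid[2:]
W = rng.normal(size=(2, 20))
b0 = rng.normal(size=20)
H = np.column_stack([np.tanh(lagged @ W + b0), np.ones(len(lagged))])
w, *_ = np.linalg.lstsq(H, target, rcond=None)

hybrid_fit = lin_fit[2:] + H @ w
rmse_lin = float(np.sqrt(np.mean((y[2:] - lin_fit[2:]) ** 2)))
rmse_hyb = float(np.sqrt(np.mean((y[2:] - hybrid_fit) ** 2)))
```

The hybrid fit improves on the linear stage exactly when the residual still contains structure the linear model cannot express, which is the rationale for the SARIMA-NNAR blend.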
en
A hybrid approach to forecasting solar irradiance using ARIMA-Neural Networks Models
Dissertation
oai:uir.unisa.ac.za:10500/20982
Nonyana, Jeanette Zandile
2016-07-12T07:00:23Z
2015-12
Nonyana, Jeanette Zandile (2015) Statistical modeling of unemployment duration in South Africa, University of South Africa, Pretoria, <http://hdl.handle.net/10500/20982>
http://hdl.handle.net/10500/20982
Unemployment in South Africa has remained consistently high, as indicated by the various reports published by Statistics South Africa. Unemployment is a global problem; in Organisation for Economic Co-operation and Development (OECD) countries it is related to economic conditions. Economic conditions, however, are not solely responsible for the problem of unemployment in South Africa: consistently high unemployment rates are observed irrespective of the level of economic growth, with unemployment responding only marginally to changes in Gross Domestic Product (GDP). To understand the factors that influence unemployment in South Africa, we need to understand the dynamics of the unemployed population. This study aims at providing a statistical tool useful for improving the understanding of the labour market and enhancing the relevance of labour market policy. Survival techniques are applied to determine duration dependence, the probabilities of exiting unemployment, and the association between socio-demographic factors and unemployment duration. Labour force panel data from Statistics South Africa are used to analyse the time it takes an unemployed person to find employment. The dataset covers 4.9 million people who were unemployed during the third quarter of 2013. The data are analysed by computing non-parametric and semi-parametric estimates to avoid making assumptions about the functional form of the hazard. The results indicate that the hazard of finding employment is reduced as people spend more time in unemployment (negative duration dependence). People who are unemployed for less than six months have higher hazard functions. The hazards of leaving unemployment at any given duration are significantly lower for people in the following categories: females, adults, education level lower than tertiary, single or divorced, attending school or doing other activities prior to job search, and no work experience.
The findings suggest an association between demographics and the length of stay in unemployment, which reflects the nature of the labour market. Due to lower exit probabilities, young people spend more time unemployed, thus growing out of the age group which is more likely to be employed. Seasonal jobs are not convenient for pregnant women or for those with young children in their care, thus decreasing their employment probabilities. Analysis of factors that affect employment probabilities should be based on datasets which have no seasonal components; the findings suggest that the seasonal components in the labour force panel affected the results. According to the findings, analysis of unemployment durations can be improved by analysing men and women separately, as men and women face different challenges in the labour market, which influence the association between other demographic factors and unemployment duration.
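The non-parametric estimate referred to here is typically the Kaplan-Meier survival curve. A minimal sketch with ten hypothetical unemployment spells in months (not the Statistics South Africa panel), where an event means the person found a job and censored spells are those still unemployed at the last interview:

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve. `observed` is 1 if the spell ended in
    employment (event) and 0 if it was censored at the last interview."""
    events = Counter(d for d, o in zip(durations, observed) if o)
    leaving = Counter(durations)     # everyone exits the risk set at their time
    n = len(durations)               # number currently at risk
    surv, curve = 1.0, {}
    for t in sorted(set(durations)):
        if events[t]:
            surv *= 1 - events[t] / n
        curve[t] = surv
        n -= leaving[t]
    return curve

# Ten hypothetical unemployment spells in months (0 = censored)
durations = [2, 3, 3, 5, 6, 6, 8, 10, 12, 12]
observed  = [1, 1, 0, 1, 1, 1, 0, 1,  0,  0]
curve = kaplan_meier(durations, observed)
```

The curve drops only at event times; censored spells reduce the risk set without forcing the survival probability down, which is what lets the estimator use incomplete spells.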
en
Unemployment duration
Panel data
Duration dependence
Non-parametric
Semi-parametric
Survival technique
Exit probability
Statistical modeling of unemployment duration in South Africa
Dissertation
oai:uir.unisa.ac.za:10500/25083
Molapo, Mojalefa Aubrey
2018-11-28T11:27:12Z
2017-02
Molapo, Mojalefa Aubrey (2017) Employing Bayesian Vector Auto-Regression (BVAR) method as an alternative technique for forecasting tax revenue in South Africa, University of South Africa, Pretoria, <http://hdl.handle.net/10500/25083>
http://hdl.handle.net/10500/25083
en
Corporate income tax
Personal income tax
Total tax revenue
Forecasting
Bayesian Vector Autoregressive (BVAR)
Error, Trend, Seasonal (ETS)
Autoregressive moving Average (ARMA)
Root mean squared error (RMSE)
Autocorrelation
Employing Bayesian Vector Auto-Regression (BVAR) method as an alternative technique for forecasting tax revenue in South Africa
Dissertation
oai:uir.unisa.ac.za:10500/22283
Kubheka, Sihle
2017-04-13T09:16:07Z
2016-11
Kubheka, Sihle (2016) Long range dependence in South African Platinum prices under heavy tailed error distributions, University of South Africa, Pretoria, <http://hdl.handle.net/10500/22283>
http://hdl.handle.net/10500/22283
South Africa is rich in platinum group metals (PGMs), and these metals are important in providing jobs as well as investments, some of which have been seen on the Johannesburg Securities Exchange (JSE). In this country the sector has experienced some setbacks in recent times, the most notable being the 2008/2009 global financial crisis and the 2012 major nationwide labour unrest. Worrisomely, these setbacks keep simmering. Such events usually introduce jumps and breaks in the data, which change the structure of the underlying information and thereby induce spurious long memory (long-range dependence). It is thus recommended that these two phenomena be addressed together. Further, it is well known that financial returns are dominated by stylized facts. In this thesis we investigated the distributional properties of platinum returns, as well as structural changes, long memory and stylized facts in the platinum return and volatility series. To understand the distributional properties of the returns, we used two classes of heavy-tailed distributions, namely the alpha-stable distributions and the generalized hyperbolic distributions. We then investigated structural changes in the platinum return series and changes in long-range dependence and volatility. Using the Akaike information criterion, the ARFIMA-FIAPARCH model under the Student distribution was selected as the best model for platinum, although the ARCH effects were slightly significant, while using the Schwarz information criterion the ARFIMA-FIAPARCH model under the Normal distribution was selected. Further, the ARFIMA-FIEGARCH model under the skewed Student distribution and the ARFIMA-HYGARCH model under the Normal distribution were able to capture the ARCH effects. The best models with respect to prediction excluded the ARFIMA-FIGARCH model and were dominated by the ARFIMA-FIAPARCH model with non-Normal error distributions, which indicates the importance of asymmetry and heavy-tailed error distributions.
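Fitting ARFIMA-FIAPARCH-type models requires specialised software, but the long-range dependence at the heart of the thesis can be illustrated with the classical rescaled-range (R/S) estimate of the Hurst exponent on simulated series (illustrative only, not the platinum data):

```python
import numpy as np

def hurst_rs(x, min_chunk=8):
    """Rescaled-range (R/S) estimate of the Hurst exponent. H near 0.5 means
    no long memory; H well above 0.5 suggests long-range dependence."""
    x = np.asarray(x, dtype=float)
    sizes, ratios = [], []
    size = min_chunk
    while size <= len(x) // 2:
        vals = []
        for start in range(0, len(x) - size + 1, size):
            chunk = x[start:start + size]
            dev = np.cumsum(chunk - chunk.mean())  # cumulative deviations
            r = dev.max() - dev.min()              # range of the deviations
            s = chunk.std()                        # scale
            if s > 0:
                vals.append(r / s)
        sizes.append(size)
        ratios.append(np.mean(vals))
        size *= 2
    # H is the slope of log(R/S) against log(window size)
    slope, _ = np.polyfit(np.log(sizes), np.log(ratios), 1)
    return float(slope)

rng = np.random.default_rng(9)
h_iid = hurst_rs(rng.normal(size=4096))              # white noise: H near 0.5
h_walk = hurst_rs(np.cumsum(rng.normal(size=4096)))  # strongly persistent level
```

The thesis's caution about jumps and breaks applies directly here: a structural break can push the R/S slope upward even when no genuine long memory is present, which is why breaks and long memory should be assessed together.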
en
Long range dependence in South African Platinum prices under heavy tailed error distributions
Dissertation
oai:uir.unisa.ac.za:10500/4681
Kanyama, Busanga Jerome
2011-08-03T10:01:06Z
2011-06
Kanyama, Busanga Jerome (2011) A comparison of the performance of three multivariate methods in investigating the effects of province and power usage on the amount of five power modes in South Africa, University of South Africa, Pretoria, <http://hdl.handle.net/10500/4681>
http://hdl.handle.net/10500/4681
Researchers commonly perform the multivariate techniques MANOVA, discriminant analysis and factor
analysis, and the most common applications in social science are to identify and test effects from the
analysis. The use of these multivariate techniques in investigating the effects of power usage and
province in South Africa on the amounts of the five power modes is, however, uncommon. This
dissertation discusses this issue, along with the methodology and practical problems of the three
multivariate techniques. The author examines the applications of each technique in social and public
research, and comparisons are made between the three multivariate techniques.
This dissertation concludes with a discussion of both the concepts underlying the present multivariate
techniques and the results found when applying the three techniques to household energy
consumption. The author recommends focusing on the hypotheses of the study, or the typical
questions surrounding each technique, to guide the researcher in choosing the appropriate analysis in
social research, as each technique has its own strengths and limitations.
en
Discriminant analysis
Canonical correlation
Correlation matrix
Principal component analysis
Correlation analysis
Theory based on MANOVA
Multivariate analysis of variance (MANOVA)
Factor analysis
Statistical tests
Two-way factor
Statistical assumptions
Partial eta-square
Post hoc test
Factor
Component
Exploratory factor analysis and confirmatory factor analysis
A comparison of the performance of three multivariate methods in investigating the effects of province and power usage on the amounts of five power modes in South Africa
Dissertation
oai:uir.unisa.ac.za:10500/784
Ondo, Guy-Roger Abessolo
2009-08-25T10:46:42Z
Ondo, Guy-Roger Abessolo (2009) Mathematical methods for portfolio management, University of South Africa, Pretoria, <http://hdl.handle.net/10500/784>
http://hdl.handle.net/10500/784
Portfolio Management is the process of allocating an investor's wealth to in
vestment opportunities over a given planning period. Not only should Portfolio
Management be treated within a multi-period framework, but one should also take into consideration
the stochastic nature of related parameters.
After a short review of key concepts from Finance Theory, e.g. utility function, risk attitude,
Value-at-Risk estimation methods, and mean-variance efficiency, this work describes a framework
for the formulation of the Portfolio Management problem in a Stochastic Programming setting.
Classical solution techniques for the resolution of the resulting Stochastic Programs (e.g.
L-shaped Decomposition, Approximation of the probability function) are presented. These are
discussed within both the two-stage and the multi-stage case, with a special emphasis on the
former. A description of how Importance Sampling and EVPI are used to improve the efficiency of
classical methods is presented. Postoptimality Analysis, a sensitivity analysis method, is also
described.
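The two-stage setting given special emphasis above can be made concrete with a toy example. The sketch below (an invented three-scenario newsvendor problem, not taken from the dissertation) solves the deterministic equivalent of a two-stage stochastic program as a single linear program with SciPy:

```python
import numpy as np
from scipy.optimize import linprog

# Toy two-stage stochastic program (newsvendor) as its deterministic
# equivalent: the first stage picks the order quantity q before demand
# is known; the second stage picks scenario sales s_k once d_k is seen.
cost, price = 4.0, 10.0
demand = np.array([30.0, 50.0, 80.0])   # equally likely demand scenarios
prob = np.full(3, 1.0 / 3.0)

# Decision vector x = [q, s1, s2, s3]; minimise cost*q - E[price * sales]
c = np.concatenate(([cost], -price * prob))
# Linking constraints s_k - q <= 0 (cannot sell more than was ordered)
A_ub = np.array([[-1.0, 1.0, 0.0, 0.0],
                 [-1.0, 0.0, 1.0, 0.0],
                 [-1.0, 0.0, 0.0, 1.0]])
b_ub = np.zeros(3)
# Scenario bounds 0 <= s_k <= d_k (cannot sell more than is demanded)
bounds = [(0, None)] + [(0, d) for d in demand]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print(res.x[0], -res.fun)   # optimal order quantity and expected profit
```

The optimal order lands on the classic critical-fractile demand quantile; an L-shaped decomposition of the kind discussed above would reach the same answer by cutting on the scenario subproblems instead of solving the expanded LP directly.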
en
Approximation schemes
Extreme value theory
Importance sampling
Nested decomposition
Portfolio management
Postoptimality analysis
Progressive hedging
Scenario aggregation
Stochastic programming
Stochastic Quasi-gradient
Value-at-risk
Mathematical methods for portfolio management
Dissertation
oai:uir.unisa.ac.za:10500/48212022-01-21T10:05:18Zcom_10500_21067com_10500_2736com_10500_128com_10500_506col_10500_21068col_10500_507
Ssekuma, Rajab
2011-09-22T08:41:34Z
2011-09-22T08:41:34Z
2011-06
Ssekuma, Rajab (2011) A study of cointegrating models with applications, University of South Africa, Pretoria, <http://hdl.handle.net/10500/4821>
http://hdl.handle.net/10500/4821
This study estimates cointegration models by applying the Engle-Granger (1987) two-step estimation procedure, the Phillips-Ouliaris (1990) residual-based test and Johansen's multivariate
technique. The cointegration techniques are tested on the Raotbl3 data set, the World Economic
Indicators data set and the UKpppuip data set using statistical software R. In the Raotbl3 data
set, we test for cointegration between the consumption expenditure, and income and wealth variables. In the world economic indicators data set, we test for cointegration in three of Australia’s
key economic indicators, whereas in the UKpppuip data set we test for the existence of long-run
economic relationships in the United Kingdom’s purchasing power parity. The study finds the
three techniques not to be consistent, that is, they do not lead to the same results. However, it
recommends the use of Johansen’s method because it is able to detect more than one cointegrating
relationship if present.
en
Cointegration
Stationarity
Johansen's method
Ouliaris method
Nonstationarity
Augmented Dickey-Fuller test
Error-correction model
Unit root
Engle-Granger method
Phillips-Ouliaris method
Variance ratio test
A study of cointegration models with applications
Thesis
oai:uir.unisa.ac.za:10500/303982023-08-16T08:46:39Zcom_10500_21067com_10500_2736com_10500_128com_10500_506col_10500_21068col_10500_507
Mudhombo, Innocent
2023-08-16T08:26:24Z
2023-08-16T08:26:24Z
2022
https://hdl.handle.net/10500/30398
The importance of robust variable selection and regularization as solutions to the adverse effects
of collinearity-influential high leverage points in a quantile regression (QR) setting cannot be
overemphasized, nor can that of the diagnostic tools that identify these high leverage points. In the literature,
researchers have dealt with variable selection and regularization quite extensively for penalized
QR that generalizes the well-known least absolute deviation (LAD) procedure to all quantile levels.
Unlike the least squares (LS) procedures, which are unreliable when deviations from the Gaussian
assumptions (outliers) exist, the QR procedure is robust to Y-space outliers. Although QR is
robust to response variable outliers, it is vulnerable to predictor space data aberrations (high leverage
points and collinearity adverse effects), which may alter the eigen-structure of the predictor
matrix. Therefore, in the literature, it is recommended that the problems of collinearity and high
leverage points be dealt with simultaneously. In this thesis, we propose applying the ridge regression
procedure (RIDGE), LASSO, elastic net (E-NET), adaptive LASSO, and adaptive elastic net
(AE-NET) penalties to weighted QR (WQR) to mitigate the effects of collinearity and collinearity
influential points in the QR setting. The new procedures are the penalized WQR procedures
i.e., the RIDGE penalized WQR (WQR-RIDGE), the LASSO penalized WQR (WQR-LASSO), the
E-NET penalized WQR (WQR-E-NET) and the adaptive penalized QR procedures (the adaptive
LASSO penalized QR (QR-ALASSO) and adaptive E-NET penalized QR (QR-AE-NET) procedures
and their weighted versions). The penalized WQR procedures are based on the computationally
intensive high-breakdown minimum covariance determinant (MCD) determined weights and the
adaptive penalized QR procedures are based on adaptive weights derived from the RIDGE
penalized WQR (WQR-RIDGE) estimator. Under regularity conditions, the adaptive penalized procedures
satisfy oracle properties. Although adaptive weights are commonly based on the RIDGE regression
(RR) estimator in the LS setting when regressors are collinear, this estimator may be plausible for symmetrical distributions at the ℓ1-estimator (RQ at τ = 0.50) rather than at extreme quantile
levels. We carried out simulations and applications to well-known data sets from the literature
to assess the finite sample performance of these procedures in variable selection and regularization
due to the robust weighting formulation and adaptive weighting construction. In the
collinearity-enhancing point scenario under the t-distribution, the WQR penalized versions outperformed the
unweighted procedures with respect to average shrunken zero coefficients and correctly fitted models.
Under the Gaussian and t-distributions, at predictor matrices with collinearity-reducing points,
the weighted regularized procedures dominate in prediction performance (WQR-LASSO performs
best). In the collinearity-inducing and collinearity-reducing points scenarios under the Gaussian
distribution, the adaptive penalized procedures outperformed the non-adaptive versions in prediction.
Under the t-distribution, a similar performance pattern is depicted as in the Gaussian scenario,
although the performance of all models is adversely affected by outliers. Under the t-distribution,
the QR-ALASSO and WQR-ALASSO procedures performed better in their respective categories.
en
Weighted quantile regression
Adaptive LASSO penalty
Penalty
Adaptive E-NET penalty
Regularization
Penalization
Collinearity inducing point
Collinearity hiding point
Collinearity influential points
Some variable selection and regularization methodological approaches in quantile regression with applications
Thesis
oai:uir.unisa.ac.za:10500/283462021-11-30T08:43:52Zcom_10500_21067com_10500_2736com_10500_128com_10500_506col_10500_21068col_10500_507
Jemal Ayalew Yimam
2021-11-30T08:11:20Z
2021-11-30T08:11:20Z
2019-09
https://hdl.handle.net/10500/28346
Multivariate longitudinal ordinal data are collected for studying the dependence between
multivariate ordinal outcomes, the changes over time and associated determinant factors. This
emanates from the interdependence of the three dimensions of household food security statuses,
the stability of these dimensions over time and the additional contribution of covariates on the
dependence structure.
It is generally known that random effects models lack a population-averaged
interpretation for non-normally distributed outcomes when analysing ordinal data. In this thesis, we
propose an alternative model for analysing multivariate longitudinal ordinal data with application
to the household food insecurity by developing a pair copula construction (PCC) and cumulative
logit marginal distributions-based model using the full maximum likelihood estimation (MLE)
method. The simplified log-likelihood function of the D-vine pair copula for multivariate discrete
random variables was derived and its parameters estimated.
Data were collected from 646 households living in selected rural Woredas of the South Wollo Zone
of the Amhara Regional State, Ethiopia, from June 2014 to June 2015, three times at six-month
intervals. Multistage cluster sampling was employed to select representative Woredas and
households. The household food security status was determined using both the quartile score and
composite index. Three distinct pair copula models with cumulative logit version were employed
for multivariate, longitudinal and multivariate longitudinal ordinal data applicable for household
food security.
The first model was employed to assess the dependence between food security dimensions and
their corresponding determinant factors simultaneously. The copula parameter of this model
indicated that household food security dimensions have significant and positive pairwise dependence. The marginal parameters showed that smaller land size, shortage of rainfall,
cultivating once a year, and the presence of disease were positively associated with chronic to
mild food insecurity in all dimensions. Moreover, cold agro-ecology and market price increase
were associated with household food insecurity at availability and accessibility dimensions.
The second model was used to assess the stability of household food security over time and the
determinant factors. The copula parameter revealed that individual household food security status is not stable over time. Moreover, the marginal parameter indicated that presence of crop
disease, market price increase and medium agro-ecology were the significant recurrent factors
for households to have chronic to mild food insecurity throughout the study period. One-time
cultivation per year was the temporal significant factor for household food insecurity.
The third model was developed for measuring the dependence between the three dimensions,
namely, their stability over time, the effects of the covariates both on the dependence structure,
and stability over time simultaneously. The copula parameter of the population-average
cumulative logit model revealed that food security dimensions were positively dependent to each
other and the individual household food security status is not stable over time.
The marginal parameter of this model showed that lower agro-ecology, shortage of rainfall,
presence of crop disease, increased market price, use of pesticides, cultivating smaller
types of cereal crops, and cultivating once per year positively affect household food
insecurity in the availability dimension. On the other hand, lower agro-ecology, increased
market price, herding a small amount of livestock, hot agro-ecology and small farmland size positively
affect the household food insecurity in the accessibility dimension. Furthermore, households
headed by the wife, divorced/widowed marital status of the household head, shortage of rainfall, and
small farmland size positively affect household food insecurity in the utilisation dimension.
This model provided a population-averaged interpretation with acceptable computational
challenges in multivariate longitudinal ordinal data analysis. The study suggests that food
security situation analysis is multidimensional, so that considering the three dimensions over
time simultaneously provides a more detailed picture of the household food security situation
than a single dimension. The pair copula population-averaged cumulative logit model addressed
all the food security dimensions simultaneously, and the model was found to be computationally
effective. Therefore, we suggest applying this model in other application areas in which the
number of outcomes and covariates is not extremely large.
en
Food insecurity
Chronically food insecure
Composite food index
Multivariate ordinal outcomes
Longitudinal ordinal outcomes
Multivariate longitudinal ordinal outcomes
Marginal model
Cumulative logit
Pair copula
Full maximum likelihood
Modelling the stability and determinants of household food insecurity: a multivariate longitudinal ordinal logistic regression approach
Thesis
oai:uir.unisa.ac.za:10500/220672018-11-17T13:06:34Zcom_10500_21067com_10500_2736com_10500_128com_10500_506col_10500_21068col_10500_507
Sebatjane, Phuti
2017-02-24T10:38:55Z
2017-02-24T10:38:55Z
2016-06
Sebatjane, Phuti (2016) Understanding patterns of aggregation in count data, University of South Africa, Pretoria, <http://hdl.handle.net/10500/22067>
http://hdl.handle.net/10500/22067
The term aggregation refers to overdispersion and both are used interchangeably in this thesis. In addressing the problem of prevalence of infectious parasite species faced by most rural livestock farmers, we model the distribution of faecal egg counts of 15 parasite species (13 internal parasites and 2 ticks) common in sheep and goats. Aggregation and excess zeroes are addressed through the use of generalised linear models. The abundance of each species was modelled using six different distributions: the Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), zero-altered Poisson (ZAP) and zero-altered negative binomial (ZANB) and their fit was later compared. Excess zero models (ZIP, ZINB, ZAP and ZANB) were found to be a better fit compared to standard count models (Poisson and negative binomial) in all 15 cases. We further investigated how distributional assumption affects aggregation and zero inflation. Aggregation and zero inflation (measured by the dispersion parameter k and the zero inflation probability) were found to vary greatly with distributional assumption; this in turn changed the fixed-effects structure. Serial autocorrelation between adjacent observations was later taken into account by fitting observation-driven time series models to the data. Simultaneously taking into account autocorrelation, overdispersion and zero inflation
proved to be successful as zero inflated autoregressive models performed better than zero inflated models in most cases. Apart from contribution to the knowledge of science, predictability of parasite burden will help farmers with effective disease management interventions. Researchers confronted with the task of analysing count data with excess zeroes can use the findings of this illustrative study as a guideline irrespective of their research discipline. Statistical methods from model selection, quantifying of zero inflation through to accounting for serial autocorrelation are described and illustrated.
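The excess-zero phenomenon at the heart of the study is easy to demonstrate: simulate zero-inflated Poisson counts and compare the observed zero fraction with what a plain Poisson of the same mean would predict. The parameter values below are invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 10_000
lam, pi = 3.0, 0.4     # Poisson mean and zero-inflation probability

# ZIP mixture: a structural zero with probability pi, else Poisson(lam)
counts = np.where(rng.random(n) < pi, 0, rng.poisson(lam, size=n))

observed_zeros = np.mean(counts == 0)
# A plain Poisson fitted by matching the sample mean under-predicts zeros
poisson_zeros = stats.poisson.pmf(0, counts.mean())
print(observed_zeros, poisson_zeros)
```

Roughly 43% of the simulated counts are zero, while the mean-matched Poisson predicts well under half that, which is exactly the gap the ZIP/ZINB/ZAP/ZANB models are built to absorb.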
en
Aggregations
Autoregressive models
Akaike information criterion
Correlation
Count data
Exponential family
Generalised linear models
Goats
Internal parasites
Hosts
Negative binomial distribution
Overdispersion
Poisson distribution
Sheep
Time series
Zero inflation
Understanding patterns of aggregation in count data
Dissertation
oai:uir.unisa.ac.za:10500/268492021-08-16T08:04:54Zcom_10500_21067com_10500_2736com_10500_128com_10500_506col_10500_21068col_10500_507
Bengura, Pepukai
2020-11-12T04:46:18Z
2020-11-12T04:46:18Z
2019-12-19
http://hdl.handle.net/10500/26849
The objective of the study was to identify the factors that affect the survival lifetime of HIV+ terminal patients in rural district hospitals of Albert Luthuli municipality in the Mpumalanga province of South Africa. A cohort of HIV+ terminal patients was retrospectively followed from 2010 to 2017 until a patient died, was lost to follow-up or was still alive at the end of the observation period. Nonparametric survival analysis and semiparametric survival analysis methods were used to analyse the data. Through Cox proportional hazards regression modelling, it was found that ART adherence (poor, fair, good), Age, Follow-up mass, Baseline sodium, Baseline viral load, Follow-up CD4 count by Treatment (Regimen 1) interaction and Follow-up lymphocyte by TB history (yes, no) interaction had significant effects on the survival lifetime of HIV+ terminal patients (p-values < 0.1). Furthermore, through quantile regression modelling, it was found that short, medium and long survival times of HIV+ patients, respectively represented by the 0.1, 0.5 and 0.9 quantiles, were not necessarily significantly affected by the same factors.
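The nonparametric part of such an analysis rests on the Kaplan-Meier estimator named in the keywords. A minimal pure-NumPy sketch, using made-up follow-up times (event 1 = death observed, 0 = censored, i.e. lost to follow-up or still alive):

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) at each distinct death time."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    s, out_times, surv = 1.0, [], []
    for t in np.unique(times):
        deaths = np.sum((times == t) & (events == 1))
        at_risk = np.sum(times >= t)
        if deaths > 0:                  # the curve only steps at deaths
            s *= 1.0 - deaths / at_risk
            out_times.append(t)
            surv.append(s)
    return np.array(out_times), np.array(surv)

# Hypothetical follow-up data: times in months, with three censored cases
t, s = kaplan_meier([2, 3, 3, 5, 8, 8, 12], [1, 1, 0, 1, 1, 0, 0])
print(t)   # step times: 2, 3, 5, 8
print(s)   # survival drops from 6/7 down to about 0.357
```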
en
Survival time
Follow-up time
HIV/AIDS disease
Antiretroviral therapy
ART adherence
CD4 cell count
Cox proportional hazards regression
Logistic regression
Quantile regression
Kaplan-Meier estimator
Log-rank test
Identification of factors affecting the survival lifetime of HIV+ terminal patients in Albert Luthuli municipality of South Africa
Dissertation
oai:uir.unisa.ac.za:10500/199032018-11-17T13:03:59Zcom_10500_3016com_10500_2736com_10500_128com_10500_21067com_10500_506col_10500_3017col_10500_21068col_10500_507
Makananisa, Mangalani P.
2016-01-28T11:25:31Z
2016-01-28T11:25:31Z
2015-10
Makananisa, Mangalani P. (2015) Forecasting annual tax revenue of the South African taxes using time series Holt-Winters and ARIMA/SARIMA Models, University of South Africa, Pretoria, <http://hdl.handle.net/10500/19903>
http://hdl.handle.net/10500/19903
This study uses aspects of time series methodology to model and forecast major taxes such as Personal Income Tax (PIT), Corporate Income Tax (CIT), Value Added Tax (VAT) and Total Tax Revenue (TTAXR) in the South African Revenue Service (SARS).
The monthly data used for modeling tax revenues of the major taxes was drawn from January 1995 to March 2010 (in-sample data) for PIT, VAT and TTAXR. Due to higher volatility and emerging negative values, the CIT monthly data was converted to quarterly data from the first quarter of 1995 to the first quarter of 2010. The competing ARIMA/SARIMA and Holt-Winters models were derived, and the resulting model of this study was used to forecast PIT, CIT, VAT and TTAXR for SARS fiscal years 2010/11, 2011/12 and 2012/13. The results show that both the SARIMA and Holt-Winters models perform well in modeling and forecasting PIT and VAT; however, the Holt-Winters model outperformed the SARIMA model in modeling and forecasting the more volatile CIT and TTAXR. It is recommended that these methods be used in forecasting future payments, as they forecast tax revenues precisely, with minimal errors and fewer model revisions being necessary.
en
SARS
Personal Income Tax (PIT)
Corporate Income Tax (CIT)
Value Added Tax (VAT)
Total Tax Revenue (TTAXR)
Holt-Winters
Autoregressive integrated moving averages
Forecasting annual tax revenue of the South African taxes using time series Holt-Winters and ARIMA/SARIMA Models
Dissertation
oai:uir.unisa.ac.za:10500/187902019-09-13T12:16:17Zcom_10500_3016com_10500_2736com_10500_128com_10500_21067com_10500_506col_10500_3017col_10500_21068col_10500_507
Steyn, Hendrik Stefanus
2015-07-08T09:02:17Z
2015-07-08T09:02:17Z
2014-12
Steyn, Hendrik Stefanus (2014) The use of effect sizes in credit rating models, University of South Africa, Pretoria, <http://hdl.handle.net/10500/18790>
http://hdl.handle.net/10500/18790
The aim of this thesis was to investigate the use of effect sizes to report the results of
statistical credit rating models in a more practical way. Rating systems in the form of
statistical probability models like logistic regression models are used to forecast the
behaviour of clients and guide business in rating clients as “high” or “low” risk borrowers.
Therefore, model results were reported in terms of statistical significance as well as business
language (practical significance), which business experts can understand and interpret. In this
thesis, statistical results were expressed as effect sizes like Cohen's d, which puts the results into
standardised and measurable units, which can be reported practically. These effect sizes
indicated strength of correlations between variables, contribution of variables to the odds of
defaulting, the overall goodness-of-fit of the models and the models' discriminating ability
between high and low risk customers.
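Cohen's d, the effect size the thesis leans on, is simple to compute directly. A small sketch with hypothetical credit scores for two risk groups (the numbers are invented, not from the thesis):

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d: standardised mean difference using the pooled SD."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) \
        / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical credit scores for "low risk" vs "high risk" borrowers
low_risk = [620, 640, 655, 660, 675, 690]
high_risk = [540, 555, 560, 580, 590, 600]
print(cohens_d(low_risk, high_risk))   # a large effect, well above 0.8
```

Because d is in pooled standard-deviation units, the same value can be reported to business stakeholders regardless of the original scoring scale.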
en
Practical significance
Logistic regression
Cohen's d
Probability of default
Effect size
Goodness-of-fit
Odds ratio
Area under the curve
Multi-collinearity
Basel II
The use of effect sizes in credit rating models
Dissertation
oai:uir.unisa.ac.za:10500/232872018-11-17T13:06:59Zcom_10500_21067com_10500_2736com_10500_128com_10500_506col_10500_21068col_10500_507
Chaka, Lyson
2017-10-31T13:48:46Z
2017-10-31T13:48:46Z
2016-11
Chaka, Lyson (2016) Impact of unbalancedness and heteroscedasticity on classic parametric significance tests of two-way fixed-effects ANOVA tests, University of South Africa, Pretoria, <http://hdl.handle.net/10500/23287>
http://hdl.handle.net/10500/23287
Classic parametric statistical tests, like the analysis of variance (ANOVA), are powerful tools
used for comparing population means. These tests produce accurate results provided the data
satisfies underlying assumptions such as homoscedasticity and balancedness, otherwise biased
results are obtained. However, these assumptions are rarely satisfied in real life, so alternative
procedures must be explored. This thesis aims at investigating the impact of heteroscedasticity
and unbalancedness on effect sizes in two-way fixed-effects ANOVA models. A real-life
dataset, from which three different samples were simulated was used to investigate the changes
in effect sizes under the influence of unequal variances and unbalancedness. The parametric
bootstrap approach was proposed in case of unequal variances and non-normality. The results
obtained indicated that heteroscedasticity significantly inflates effect sizes while unbalancedness
has a non-significant impact on effect sizes in two-way ANOVA models. However, the impact
worsens when the data is both unbalanced and heteroscedastic.
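The parametric bootstrap proposed above can be sketched in a few lines. For brevity this toy version uses a one-way layout with two heteroscedastic groups rather than the thesis's two-way design; resampling under the null uses each group's own variance, which is exactly what the classic F-test ignores:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Two groups sharing the same true mean (H0 holds) but unequal variances
a = rng.normal(0.0, 1.0, size=25)
b = rng.normal(0.0, 3.0, size=25)
f_obs = stats.f_oneway(a, b).statistic

# Parametric bootstrap: simulate under H0 with group-specific variances,
# then locate the observed F within its bootstrap distribution
grand_mean = np.concatenate([a, b]).mean()
boot_f = np.empty(2000)
for i in range(boot_f.size):
    a_star = rng.normal(grand_mean, a.std(ddof=1), size=a.size)
    b_star = rng.normal(grand_mean, b.std(ddof=1), size=b.size)
    boot_f[i] = stats.f_oneway(a_star, b_star).statistic
p_boot = np.mean(boot_f >= f_obs)
print(p_boot)
```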
en
Fixed-effects analysis of variance
Unbalancedness
Heteroscedasticity
Homoscedasticity
Effect size
Eta-squared
Traditional F-tests
Robust tests
Normality
Outliers
Shapiro-Wilk tests
Impact of unbalancedness and heteroscedasticity on classic parametric significance tests of two-way fixed-effects ANOVA tests
Dissertation
oai:uir.unisa.ac.za:10500/289962022-06-21T13:08:25Zcom_10500_21067com_10500_2736com_10500_128com_10500_506col_10500_21068col_10500_507
Kgare, Mahlodi
2022-06-21T09:27:27Z
2022-06-21T09:27:27Z
2021-09
https://hdl.handle.net/10500/28996
Policy lapse is a vital component in life insurance as it affects future pricing and impacts the solvency of the life insurer. Accurate prediction of lapse will help insurers to implement personalised retention strategies based on the model's outcome. The major contribution of the dissertation is the empirical comparison and benchmarking of nine classifier models: seven machine learning classifiers (Decision Tree, Gradient Boost, Random Forest, Support Vector Machine trained with a linear kernel, Support Vector Machine trained with polynomial kernels, Neural Network trained with Levenberg-Marquardt, and Neural Network trained with backpropagation) and two traditional algorithms (Logistic Regression with forward variable selection and Logistic Regression with backward variable selection) for life insurance lapse prediction. The models' accuracy was observed over two insurer datasets with different distributions (Insurer 1 and Insurer 2) and different feature selection methodologies, namely Principal Component Analysis (PCA) and Chi-squared. Accuracy, F-measure, sensitivity, specificity, and the Receiver Operating Characteristic (ROC) curve were used as performance measures. The results show the strong prediction ability of ensemble models (Gradient Boost and Random Forest) over single classifiers, and there is a strong indication that suitable parameter tuning and model boosting improve model performance. The best overall classifier is Gradient Boosting, with an accuracy of 92% and 76% and an F-measure of 92% and 84% for the Insurer 1 and Insurer 2 datasets, respectively. The study recommends the use of ensemble models instead of single-model classifiers as they have been proven to work better when predicting life insurance lapses.
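The winning model family above, gradient boosting, can be benchmarked in a few lines with scikit-learn. The sketch below uses synthetic data as a stand-in for policyholder features and a lapse label (the dataset and scores are invented, not the insurers' data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for policyholder features and a lapse/no-lapse label
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, model.predict(X_te))
print(acc)   # typically around 0.9 on this synthetic task
```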
en
Decision tree
Generalised linear models
Logistic regression
Lapse
Machine learning
Predicting lapse rate in life insurance using machine learning algorithms
Dissertation
oai:uir.unisa.ac.za:10500/252392019-02-07T01:00:34Zcom_10500_21067com_10500_2736com_10500_128com_10500_506col_10500_21068col_10500_507
Malandala, Kajingulu
2019-02-06T05:54:58Z
2019-02-06T05:54:58Z
2018-04
Malandala, Kajingulu (2018) Analysis of dependence structure between the Rand/U.S Dollar exchange rate and the gold/platinum prices, University of South Africa, Pretoria, <http://hdl.handle.net/10500/25239>
http://hdl.handle.net/10500/25239
Copula functions are a flexible tool for modelling the dependence structure between variables. The joint and marginal distributions of copulas are not constrained by the assumptions of normality. This study examines the dependence structure between the gold and platinum prices and the ZAR/U.S.D exchange rate using copulas. The study found that the marginal distributions of the copulas follow ARMA (1, 1)-EGARCH (1, 1) and ARMA (1, 1)-APARCH (1, 1) models under different error terms, including the normal, the Student-t and the skew Student-t error terms. It used the Normal, the Student-t, the Gumbel, the rotated Gumbel, the Clayton, the rotated Clayton, the Plackett, the Joe Clayton and the Normal time-varying copulas to analyse the dependence structure between the returns on gold prices, platinum prices and the ZAR/U.S.D exchange rate. The results showed evidence of a strong positive dependence between the returns on gold and platinum prices and the returns on the Rand/U.S.D exchange rate for constant and time-varying copulas. The results also showed a co-movement of exchange rates and gold and platinum prices during rising or declining gold and platinum prices. These results imply that fluctuations in gold and platinum prices generate Rand/U.S.D exchange rate volatility.
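The dependence modelling above can be illustrated without any specialised copula package: the Clayton copula (one of the families listed) can be sampled by conditional inversion in pure NumPy, and its Kendall's tau checked against the closed form tau = theta / (theta + 2). This is only a sketch of the copula idea, not the study's ARMA-EGARCH/APARCH marginal modelling:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(5)
theta, n = 2.0, 20_000     # Clayton parameter; implies Kendall tau = 0.5

# Sample (u, v) from the Clayton copula by conditional inversion:
# v = [u^(-theta) * (t^(-theta/(1+theta)) - 1) + 1]^(-1/theta)
u = rng.random(n)
t = rng.random(n)
v = (u ** -theta * (t ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta)

tau, _ = kendalltau(u, v)
print(tau)   # close to the theoretical 0.5
```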
en
Copulas
ARMA
EGARCH
APARCH
Dependence structure
Exchange rate
Commodity prices
Analysis of dependence structure between the Rand/U.S Dollar exchange rate and the gold/platinum prices
Dissertation