How PayPal Works: Guide For Users Essay Sample For College

PayPal, an online money transfer service established in 1998, has since become a primary means of transferring money online, overtaking the traditional system of money orders and checks. With a total income of $7.9 billion, the service allows users to deposit, withdraw, and transfer funds in 26 different currencies globally. Peter Thiel and Elon Musk, the founders of PayPal, formed the corporation after merging their respective companies, Confinity and X.com. The merger created the PayPal known today: a website that lets users make financial transactions through encrypted software, which protects their transactions on mobile phones and desktops. Although the concept of PayPal is simple, it offers a variety of tools, account types, and transaction types. One example of the widespread usage of PayPal is making and receiving bids on websites that host online auctions, such as eBay (its former parent company). PayPal also enables account holders to make donations, purchase goods or services, and exchange money with friends and family.

This guide serves to help readers understand how PayPal works.

Step 1: Type “PayPal” in the search engine box and click the result to access the site.

Figure 1. PayPal’s main page.

Step 2: Warning! Click the lock sign that confirms that your connection is secure.

Figure 2. The arrow points to the lock symbol that should be present.

Step 3: Determine if the text box reads “Connection is secure” in green letters.


Some fraudulent websites appear legitimate at first glance. If caution is not exercised, users could unintentionally give away private information that allows hackers to access an official PayPal account. Therefore, it is imperative to click the lock located next to the URL.

Figure 3. Ensuring that the connection is secure.
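The same check can be expressed in code. The following is a minimal sketch, assuming that paypal.com is the official domain and that a link counts as trustworthy only when it uses HTTPS and its host is that domain or one of its subdomains. The function name looks_like_official_site is hypothetical, and the sketch does not replace the browser's own certificate check.

```python
from urllib.parse import urlsplit

# Assumption: paypal.com is the official domain being checked.
OFFICIAL_DOMAIN = "paypal.com"


def looks_like_official_site(url: str) -> bool:
    """Return True only for HTTPS URLs hosted on paypal.com or a subdomain."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    uses_https = parts.scheme == "https"
    # A look-alike such as paypal.com.example.net does not end with ".paypal.com".
    on_official_domain = host == OFFICIAL_DOMAIN or host.endswith("." + OFFICIAL_DOMAIN)
    return uses_https and on_official_domain
```

A look-alike address fails this test even when it contains the word "paypal", which is the kind of site the warning above is about.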

Step 4: Press the Log In button, or the Sign Up button if there is no account yet.

Figure 4. Sign Up and Log In buttons are located at the right, upper corner of the page.

Step 5: Enter the email of choice along with a secure password and then press Login.

Figure 5. Entering account information.

Step 6: Select the blue button labeled “Send,” which can be found on the right-hand side closer to the bottom corner of the page.

Figure 6. Setting sending details.

Step 7: Type in the name, email address, mobile phone number, or username of the intended recipient into the request bar under “Send Money.”

Figure 7. Ensuring that the recipient also has an account on PayPal.

Step 8: Type in the desired amount of money for the transaction in the space found below the account name. Underneath the space, verify that the correct currency is selected. Note that the default currency is USD (United States Dollar). If a different currency is needed, click on the currency button and select the desired one.

Figure 8. Currency and money amount settings.

Step 9: Ensure it is going to Friends and Family by clicking “Sending to a Friend” and pressing continue.

Warning! Fees differ depending on the purpose of the payment; detailed information is available on PayPal’s fees page.

Figure 9. Sending to a friend page.

Step 10: Confirm that the funds were sent successfully by checking for a large green checkmark in the middle of the webpage; this completes the process of sending funds via PayPal.

Figure 10. The final transaction verification page.

Adidas Services Marketing Strategy


The presented service idea is strongly associated with the global pandemic, which has greatly influenced the international and domestic markets (Texel, 2020). This paper proposes a service marketing strategy based on these factors and the characteristics of the new service idea presented in the previous paper. It provides a critical evaluation of this strategy, as well as an explanation and design of the new service promotion by applying the extended marketing mix (7Ps) and a detailed promotional plan. The paper also considers and explains what skills and knowledge are needed for a service manager.

Marketing Strategy

Critical Evaluation

The proposed idea involves introducing a new Adidas service, delivered within a mobile application, that offers personalized product selection and delivery based on the specific type of activity a customer engages in. This implies that the new service will offer unique value to clients, as people will have the opportunity to choose clothes, check how they look and fit using a simulation, and buy them without leaving home. For a new mobile application service to work effectively, people need to be engaged, and an effective marketing strategy should be used (Texel, 2020).

In this context, the Value-Based Strategy is less suitable for this case. The reason is that customers’ willingness to pay a high price for this service is not yet known, despite the company’s reputation, because the service is new (Lopez, 2020). In order to promote a new application effectively, more attention should be paid to examining customers’ expectations.

The Services Theatre approach is likewise unsuitable as a marketing strategy for this situation, as performance and human relationships play only a supporting role in this case (Baker, 2007; Grove and Fisk, 2004). The proposed service takes place entirely in the digital space, which also makes this strategy inappropriate. Customer-Dominant Logic (CDL) is associated with a focus on customer-related factors, such as entertainment and personal experiences (Baker, 2007; Tynan, McKechnie and Hartley, 2014). Still, CDL creates a risk: an inappropriate selection of target customers can hinder the further development of the service, because the provider’s logic should also be taken into account.

Service-Dominant Logic (SDL) seems to be a more appropriate framework in this regard, as it implies active customer involvement in creating service value while addressing the weaknesses of CDL. According to Werner, Griese, and Hogg (2017), this approach is in opposition to the traditional Goods-Dominant Logic (GDL). The focus shifts from the intrinsic value of specific goods to a constructivist approach to value creation within certain market conditions. This means that the creation of the service’s value takes into account the socio-cultural features inherent to a particular period and society (Werner, Griese and Hogg, 2017). This characteristic of SDL fully corresponds to the proposed service idea, which is the company’s response to the specific global situation.


The SDL approach has many strengths because it is oriented to clients and accentuates value-in-use. According to Vargo and Lusch (2006), the service provider can make value propositions and allow customers to decide what is essential for them. These propositions can be similar around the globe, and at the same time, the promoted values are quite evident during the pandemic, as people cannot leave home and try new clothes on.

By offering an innovative mobile application for buying products and applying the SDL approach, Adidas will draw on the experiences of customers to co-create the value and characteristics of the product. The factor of customization will be addressed, and the flexibility of the SDL approach will allow the application to be adjusted as needed, because the marketing strategy is easily modifiable by nature, which is of extreme importance in modern realities (Vargo and Lusch, 2017). Finally, the approach is based on a dialogue with a customer, which is positively associated with customer satisfaction (Vargo and Lusch, 2017).

Possible Problems

Possible problems with the marketing strategy are embedded in the nature of the approach and should be considered when creating an implementation plan. First, the implementation of such a marketing strategy requires specialized skills and knowledge due to a high probability of failure (Vargo and Lusch, 2006). Thus, Adidas will need to find talent able to implement the strategy at an adequate level, which may be costly. Second, the customer is always a co-creator of the product, which makes it impossible to make long-term predictions for the marketing campaign (Vargo and Lusch, 2006). This implies that the strategy will need to be modified frequently, depending on customers’ feedback (Vargo and Lusch, 2017). Finally, Campbell, O’Driscoll, and Saren (2013) state that SDL is overly focused on immaterial objects, which may cause problems with tangible entities, such as clothing and the application. It is important for marketers to apply the SDL strategy rather than CDL or GDL because all these approaches are based on the value concept, but only SDL allows for value co-creation without ignoring a provider’s interests.

Extended Marketing Mix Application

The extended marketing mix is a tool widely used by businesses and marketers to determine a product or brand offering. It is also called the 7Ps, as it is based on understanding and positioning the product, place, price, promotion, people, processes, and physical evidence (Pogorelova et al., 2016). The 7Ps approach is usually used to conduct a marketing audit and identify the strengths and weaknesses of the strategy (Pogorelova et al., 2016). Even though it is vital to assess all the Ps to acquire a holistic picture, the present paper will only focus on four of them in accordance with the priorities of the marketing strategy. According to SDL, physical evidence and price are not primary concerns (Vargo and Lusch, 2017). Moreover, as the place is the Internet, it is also not critical to focus on place. Therefore, the present paper will discuss people, processes, products, and promotion from the perspective of applying these components along with the SDL approach.

The service provision process consists of four stages following the Service Blueprint: (1) selection, (2) virtual fitting room, (3) expert consultation, and (4) delivery. This sequence should be communicated during the promotion. It will let customers know that the service reflects all the stages of real shopping. During the pandemic and quarantine measures around the globe, customers are expected to value experiencing the benefits of real shopping without leaving home. The process is coherent with SDL, as the process is an intangible entity, which is one of the central values promoted to the customer. The customers, however, will be able to modify the process to fit their individual needs, which is vital, according to SDL (Vargo and Lusch, 2006).

Even though the products remain the same and only the way of distribution changes, the mobile application should also be acknowledged as a new service. As an innovative service, it should be coherent with the values promoted by Adidas and adhere to the strict quality requirements applied to other products. The company considers the quality of the products as one of the central benefits delivered to the customers (Adidas, 2019). Therefore, it is central to ensure and emphasize the quality of the new application. This is also coherent with the SDL, as the quality is an intangible object.

Particular emphasis should also be placed on the fact that people are active participants in the process. According to SDL, customers are co-creators of the product, and their feedback is the driver for improvement (Vargo and Lusch, 2006). Since the target audience of the application is sports enthusiasts around the world, Adidas will only suggest values for the customer, and the people will create their product. The ability to customize the product in accordance with people’s particular needs is central for multi-national companies with highly diverse customers. The opportunity to tailor the application is also coherent with SDL.

The above elements of the extended marketing mix refer to the information that will be communicated to the customers through the promotional campaign. It will consist of a description of the service provider’s and customers’ joint participation in creating the relevant value and the benefits acquired. Adidas provides a digital shopping platform as well as innovative tools to bring the online process as close to reality as possible. Customers test this platform, buy products, and share their views on performance and possible recommendations for improvement. In this way, producer-consumer relationships are re-engineered into the new environment.

Promotional Plan

The promotion plan needs to cover the marketing communication required to launch the new service effectively. Moreover, it needs to include any other potential marketing activities that may be taking place in the organization. Figures 1-4 are mockups of the social media campaign and influencers’ posts. Table 1 below presents a detailed promotional plan, assuming that the product launches on July 1, 2020. All the dates in the first column should be shifted according to the actual launch date of the application.

Figure 1. Facebook ad with Leo Messi

Figure 2. Instagram ad with Derrick Rose

Figure 3. Influencer post by Selena Gomez

Figure 4. Influencer post by Jonah Hill

Table 1. Promotional Plan

Date (d/m/y) | Type of promotion | Target | Where? | Who? | Why? | Cost | Resources | Impact | Linked to
01/06/2020 | Official announcement | General audience | Adidas official website | Adidas marketing and IT | Ensure the announcement of the new service is official | $5,000 | No specific resources are required | Customers will see that the new service is launched when planned | Adidas’s primary promotional campaign
01/06/2020 | Mass mailing | General audience; existing customers of Adidas | Email | Adidas marketing and IT | Provide customers with information about launching a new service | $10,000 | Mass mailing services | Customers will become aware of the new service | The emails should provide a link to the official announcement
01/07/2020 | Social media campaign | Sports enthusiasts aged 18-45 | Facebook, Instagram, YouTube | Adidas marketing and IT | Introduce benefits of the new application and arouse the desire to use it | $1 million | Social media analysis software, social media content management system | Existing and potential customers will become aware of the new service and develop the need to use it | Link to the leaders of opinions should be provided
01/07/2020 | Collaboration with influencers | Targeted according to the audience of influencers on Instagram and YouTube | Blogs and social media of influencers | Bloggers and leaders of opinions in sports and sportswear | Reinforce the need to use the application and provide examples of how it can be used | $500,000 | No specific resources are required | Customers will get additional information, and the company will gather feedback | None
01/08/2020 | Follow-up social media posts | Customers that tried the new application | Social media | Adidas marketing, IT, and data analysts | Gather and promote feedback | $100,000 | Survey managers, data analysis software | The company will gather feedback to show customer satisfaction | None
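The arithmetic behind Table 1 can be checked with a short sketch. The dates and costs below are transcribed from the table. The helper names total_cost and shift_plan are hypothetical, and keeping each activity at the same number of days from launch is only one possible way to shift the dates.

```python
from datetime import date

# Rows transcribed from Table 1: (planned date, activity, cost in USD).
PLAN = [
    (date(2020, 6, 1), "Official announcement", 5_000),
    (date(2020, 6, 1), "Mass mailing", 10_000),
    (date(2020, 7, 1), "Social media campaign", 1_000_000),
    (date(2020, 7, 1), "Collaboration with influencers", 500_000),
    (date(2020, 8, 1), "Follow-up social media posts", 100_000),
]
PLANNED_LAUNCH = date(2020, 7, 1)


def total_cost(plan):
    """Sum the cost column of the plan."""
    return sum(cost for _, _, cost in plan)


def shift_plan(plan, actual_launch):
    """Move every activity by the gap between the planned and actual launch.

    Assumption: each activity keeps the same day offset from the launch date.
    """
    delta = actual_launch - PLANNED_LAUNCH
    return [(planned + delta, name, cost) for planned, name, cost in plan]
```

Under these figures the planned activities total $1,615,000. Because the shift is measured in days, a launch moved to September 1, 2020 places the official announcement on August 2 rather than August 1; a calendar-month shift would be an alternative design.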

Required Skills and Knowledge

As an industry executive, I will need to possess certain knowledge and skills to manage the new service effectively. First, I will need a detailed understanding of the process of idea formation, new service development, designing a marketing strategy, and service launch (Zeithaml, Bitner and Gremler, 2009). This course and the tasks presented have provided me with this knowledge to a large extent and allowed me to practice the skill initially. Second, I should acquire an empathic understanding of customers to assess their current needs to improve the ability to serve them effectively (Fisk, John and Grove, 2008). Another crucial skill is the capability to understand users and their needs (Hertog, Aa and de Jong, 2010). I understand my customers; however, I do not yet have enough knowledge about strategies to assess the needs of customers. To a large extent, it is gained with experience and time spent on professional development in a particular industry.

Another essential skill that requires specific theoretical knowledge is the development of service hypotheses and their verification through market research. I need some practice to develop and improve it. Researchers state that “service innovators have to make sure they are informed about the latest options that technologies offer in their industry and related trades” (Hertog, Aa and de Jong, 2010, p. 499). This skill is particularly relevant to the proposed service discussed in this paper. An industry executive should have not only knowledge about service development and marketing strategies but also understand modern innovation technologies related to service promotion. This refers primarily to popular social media and user experience on the Internet.

In summary, before entering the present course, I had little understanding of how a new service is launched. Even though I felt that I had enough skills to understand my clients’ preferences, I did not have the knowledge about efficient strategies to assess customer needs. The present course has provided me with structured knowledge about the process of launching and promoting a new service. It has given me a substantial theoretical background, which I can use for making decisions in the future. I still need to acquire skills to conduct market research and analyze marketing data to adjust my decisions in the future.


It should be concluded that Service-Dominant Logic is the most appropriate framework for the proposed service idea. This service involves the co-creation of the service value through producer-consumer communication transferred to the digital space via the mobile application. The core elements of the marketing mix in this regard are the process and promotion, which will take place predominantly in social media. Promotional activities will also be carried out in the digital space, which is the environment of the service. This is the most effective and appropriate approach in the context of quarantine measures.


Adidas (2019) Annual Report 2019. Web.

Baker, M. J. (2007) Marketing strategy and management (4th edn.) Hampshire: Palgrave Macmillan.

Campbell, N., O’Driscoll, A. and Saren, M. (2013) ‘Reconceptualizing resources: A critique of service-dominant logic’, Journal of Macromarketing, 33 (4), pp. 306-321.

Fisk, R. P., John, J. and Grove, S. J. (2008) Interactive services marketing (3rd edn.) Boston: Houghton Mifflin.

Grove, S.J. and Fisk, R. (2004) ‘Service theater: An analytical framework for services marketing’, in Lovelock, C.H. and Wirtz, J. (eds.) Services marketing: Text, cases, and readings. Upper Saddle River: Prentice-Hall, pp. 78-87.

Hertog, P., Aa, W. and de Jong, M. W. (2010) ‘Capabilities for managing service innovation: towards a conceptual framework’, Journal of Service Management, 21 (4), pp. 490-514.

Lopez, S. (2020) Value-based marketing strategy: Pricing and costs for relationship marketing. Wilmington: Vernon Press.

Pogorelova, E., Yakhneeva, I., Agafonova, A. and Prokubovskaya, A. (2016) ‘Marketing mix for e-commerce’, International Journal of Environmental & Science Education, 11 (14), pp. 6744-6759.

Tynan, C., McKechnie, S. and Hartley, S. (2014) ‘Interpreting value in the customer service experience using customer-dominant logic’, Journal of Marketing Management, 30 (9-10), pp. 1058-1081.

Texel (2020) How COVID-19 will reshape fashion retail? Web.

Vargo, S. L. and Lusch, R. F. (2006) Service-dominant logic: What it is, what it is not, what it might be. Web.

Vargo, S. L. and Lusch, R. F. (2017) ‘Service-dominant logic 2025’, International Journal of Research in Marketing, 34 (1), pp. 46-67.

Werner, K., Griese, K. M. and Hogg, J. (2017) ‘Service dominant logic as a new fundamental framework for analyzing event sustainability: A case study from the German meetings industry’, Journal of Convention & Event Tourism, 18 (4), pp. 318-343.

Zeithaml, V. A., Bitner, M. J. and Gremler, D. D. (2009) Services marketing: integrating customer focus across the firm (5th edn.) London: McGraw Hill.

Procedure Of Speaker Recognition


Speaker recognition is a procedure carried out by a computer program that validates the identity claimed by an individual by taking into account certain aspects of the human voice. The technical field of speaker recognition attempts to provide a practical and cost-effective means of recognizing an individual from the available audio data. Thus, it qualifies to be classified as a biometric tool, and a very important one at that.

Speaker recognition and other biometric methods attempt to recognize humans accurately by taking into account physical or behavioral qualities that are intrinsic to them. Speaker recognition is termed a behaviometric, which means that it attempts to recognize humans by taking into account one of their intrinsic behavioral qualities. The behavioral quality in question is the human voice, and in particular the acoustic aspects of speech that vary between individuals. According to Dutta and Haubold (2009), the human voice conveys speech and is useful in providing gender, nativity, ethnicity and other demographics about a speaker (422).

Additionally, it possesses other non-linguistic features that are unique to a given speaker (422). Biometric methods are built on statistical concepts. This means that the field of speaker recognition is related to the field of statistics directly and fundamentally. The speaker recognition technology in use today is dominated by statistical modeling (Chan et al., 2007, p.1884). The statistical model formed is of short-time features that are extracted from acoustic speech signals. The use of speaker recognition dates back as much as four decades.


As with any scientific field, various terms are used in speaker recognition. Enrolment is the term given to the first time a person uses a speaker recognition system or any other biometric technology. Enrolment with respect to speaker recognition involves acquisition and storage of an individual’s relevant voice data. To maintain high levels of robustness in speaker recognition systems, it is fundamentally important to ensure that the storage and retrieval mechanism is for authorized use only. The failure to enroll rate (FER) in speaker recognition measures the proportion of unsuccessful attempts at creating a voice template from a given speaker.

False accept rate (FAR), also known as false match rate (FMR), is a performance metric employed in speaker recognition as well as in other biometric technologies. It is a probability measure of the likelihood of occurrence of an error or anomaly. The error (anomaly) being considered occurs when a speaker recognition system incorrectly matches a voice sample to the wrong template in the audio repository (database).

False reject rate (FRR), also referred to as false non-match rate (FNMR), is another performance metric used in speaker recognition and in biometrics. Like FAR (or FMR), it is a probability measure of the likelihood of occurrence of an error or anomaly. However, the error (anomaly) being considered here occurs when the speaker recognition system fails to detect a match between the voice sample and the correct template in the audio repository (database).
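The two error rates can be illustrated with a short sketch. It assumes that the system produces a similarity score for every comparison and accepts a match when the score reaches a chosen threshold; the function name error_rates and the score lists are invented for illustration and are not taken from any real system.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) for the rule 'accept when score >= threshold'.

    genuine_scores: scores from comparisons against the correct template.
    impostor_scores: scores from comparisons against a wrong template.
    """
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr


# Illustrative scores only.
genuine = [0.91, 0.85, 0.78, 0.60, 0.95]
impostor = [0.20, 0.35, 0.55, 0.72, 0.10]
```

With these made-up scores, raising the threshold from 0.7 to 0.8 removes the single false accept but rejects one more genuine speaker, which shows the usual trade-off between the two rates.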

Automatic speaker recognizers (ASRs) is the term given to applications used for speaker recognition. Automatic speaker recognition (ASR) is the process through which a person is recognized from a spoken phrase with the aid of an ASR machine (Campbell, 1997, p.1437). Machines used for this purpose are referred to as automatic speaker recognition (ASR) machines.

Countries known to use speaker recognition systems include the United States, United Kingdom, Iraq, Germany, Italy, Brazil, Israel, India, Canada and Australia. Speaker recognition systems have two phases: enrolment is the first and verification is the second. During enrolment, a voiceprint (also known as a template or model) is developed from a speaker’s voice. The development involves recording the voice and extracting certain features from it. During verification, a voiceprint formed during enrolment is compared against a speech sample (also known as an utterance) to determine whether the two match. Speaker recognition systems are classified into two categories: those that are text-dependent and those that are not (text-independent).

Text-dependent speaker recognition systems require that the text used for enrolment and verification be the same. In text-independent speaker recognition systems, the text need not be the same. Thus, these systems do not require cooperation from the user. In fact, if the system is being applied in forensic contexts, the enrolment might happen without the consent of the speaker. Speaker recognition systems are designed and developed to operate in two modes. The similarity between the two modes is that both compare a stored template with a voice sample.

The templates are stored in a database, meaning the comparison procedure is done using computers. The difference between the two modes lies in the nature of the comparison procedure. In the verification mode, the comparison is one to one, whereas in the identification mode the comparison is one to many. Speaker recognition systems and other biometric methods are assessed against seven parameters. The first parameter is universality, which means that the targeted trait (e.g. voice) should be possessed or exhibited by the relevant individuals. The second parameter is uniqueness, which describes how well the method in use (e.g. speaker recognition) separates individuals.

The third parameter is permanence, which indicates how well the method (e.g. speaker recognition) holds up against variation over time, such as aging. The fourth parameter is collectability, which relates to how easily the trait can be acquired and measured. The fifth parameter is performance, which covers three aspects of the technology applied (e.g. speaker recognition): robustness, speed and accuracy.

The sixth parameter is acceptability, a measure of how willing people are to accept the suggested technology (e.g. speaker recognition). The seventh parameter, circumvention, addresses how easily the biometric trait (e.g. voice) can be imitated using an artifact or substitute. Two factors that are considered when determining a recognizer’s performance are the discrimination power associated with the acoustic features and the effectiveness of the statistical modeling techniques. Overlapped speech (OS) degrades the performance of automatic speaker recognition systems. Conversations over the telephone or during a meeting contain large amounts of overlapped speech.
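The two operating modes described earlier, a one-to-one comparison for verification and a one-to-many comparison for identification, can be sketched as follows. This is only an illustration: it assumes that a template is a short list of numbers and that cosine similarity with a fixed threshold decides a match. The speaker names, feature values, and threshold are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(templates, claimed_id, sample, threshold=0.9):
    """One to one: compare the sample with the claimed speaker's template only."""
    return cosine_similarity(templates[claimed_id], sample) >= threshold


def identify(templates, sample, threshold=0.9):
    """One to many: compare with every template and return the best match, if any."""
    best_id = max(templates, key=lambda sid: cosine_similarity(templates[sid], sample))
    if cosine_similarity(templates[best_id], sample) >= threshold:
        return best_id
    return None


# Hypothetical stored templates and an incoming voice sample.
TEMPLATES = {"alice": [0.9, 0.1, 0.3], "bob": [0.2, 0.8, 0.5]}
sample = [0.88, 0.12, 0.31]
```

Verification answers the question "is this the person they claim to be?", while identification answers "who, among the enrolled speakers, is this?" and may return no one.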

The following is a block diagram representing a basic speaker recognition system.

Fig. 1: A block representation of a speaker recognition system.

The first block is known as a sensor, and it forms the interface between the speaker recognition system and humans. For the sensor to do its job, all the necessary audio data has to be collected successfully. The second block in the speaker recognition system is known as a pre-processor. The work of the pre-processor is mainly to remove undesired features from the captured audio, which improves the quality of the system’s input.

The pre-processor can be perceived as a normalization feature, and one of the undesired inclusions it targets is background noise. The third block is known as a feature extractor, and it is used to extract the relevant features needed to develop an audio template. The fourth block is known as a template generator, and it is used to create the appropriate template.

A template is described as a synthesis of the features extracted in the third block. From the template generator, the template can proceed to a repository, where it is stored until it is needed. This happens only when an enrolment is ongoing. If no enrolment is taking place, the template proceeds directly to a matcher. At the matcher, the template is tested to determine whether a match exists between it and an audio sample obtained from a speaker. The result of the match can form the basis of an application, e.g. one for gaining entrance to restricted areas.
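The flow through the blocks can be sketched in code. This is a toy illustration under several assumptions: the sensor output is a plain list of audio samples, the features are the mean absolute amplitude and the zero-crossing rate, and the matcher uses a fixed tolerance. All function names and values are hypothetical, and real systems use far richer features.

```python
def preprocess(samples, noise_floor=0.05):
    """Pre-processor: drop very quiet samples as a crude stand-in for noise removal."""
    return [s for s in samples if abs(s) >= noise_floor]


def extract_features(samples):
    """Feature extractor: mean absolute amplitude and zero-crossing rate.

    Assumes at least two samples remain after pre-processing.
    """
    mean_abs = sum(abs(s) for s in samples) / len(samples)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
    return [mean_abs, crossings / (len(samples) - 1)]


def generate_template(features):
    """Template generator: here the template is simply the feature vector."""
    return tuple(features)


def matches(template, stored, tolerance=0.1):
    """Matcher: accept when every feature lies within the tolerance."""
    return all(abs(a - b) <= tolerance for a, b in zip(template, stored))


def process(samples, repository, speaker_id, enrolling):
    """Run sensor output through the blocks; store on enrolment, match otherwise."""
    template = generate_template(extract_features(preprocess(samples)))
    if enrolling:
        repository[speaker_id] = template  # kept until it is needed
        return True
    return speaker_id in repository and matches(template, repository[speaker_id])
```

The enrolling flag mirrors the branch in the block diagram: during enrolment the template goes to the repository, and otherwise it goes straight to the matcher.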

The field of speaker recognition has attracted a lot of research in recent times aimed at improving the practice. The research has seen various proposals being put forward that present new models (approaches) or that suggest improvements on existing ones. One of the reasons for the increased research activities is the need to develop more flexible, practical, accurate and robust systems. Another reason is to increase the performance of the systems.

Two critical areas in the field that have attracted research are audio classification and segmentation. Audio segmentation, according to Zhang and Zhou (2004), is one of the most important processes in multimedia applications (IV-349). Through audio segmentation, an audio stream is broken down into parts that are homogeneous with respect to speaker identity, acoustic class and environmental conditions.

The research activities have centered on reviewing the algorithms used for carrying out these processes and modifying them to achieve certain objectives. One of the objectives is to achieve a classification algorithm that exhibits robustness even in noisy environments or backgrounds. Another objective is to develop a segmentation algorithm for multimedia applications that is more accurate than existing ones. In addition, it has been desirable that in these multimedia applications the segmentation can be done over a network, that is, on-line. The fourth objective is to automate the process of speaker recognition by dealing with unsupervised audio segmentation and classification. The fifth objective is to formulate an approach to audio segmentation in the context of practical media. A proposal has been put forward by Chu and Champagne (2006) to aid in achieving the first objective (p.775).

Another proposal has been presented by Zhang and Zhou (2004), aimed at achieving the second and third objectives (IV-349). A third proposal has been presented in the work of Huang and Hansen (2006), aimed at achieving the fourth objective (p.907). A fourth proposal has been presented by Du et al. (2007), aimed at achieving the fifth objective (p.I-205).

In the first proposal, Chu and Champagne (2006) state that, to achieve robustness in noisy environments or backgrounds, their proposed model possesses a self-normalization mechanism (p.775). Their suggestion, known as an auditory model, is a simplified and improved version of an earlier model. Chu and Champagne’s model is a self-normalized FFT-based model that has been applied and tested in speech/music/noise classification.

Shortcomings addressed in this new model are nonlinear processing and high computational requirements that are dominant in the earlier model. Thus, the auditory model proposed by Chu and Champagne is 99% linear and has significantly reduced computational requirements. The proposed model can be described as a three-stage processing sequence. In this sequence, an acoustic signal undergoes a transformation to become an auditory spectrum. The spectrum is the model’s internal neural representation. The modification targets four of the original processing steps namely pre-emphasis, nonlinear compression, half-wave rectification and temporal integration.

Minimization of computing requirements is achieved through the application of the Parseval theorem, which enables the simplified model to be implemented in the frequency domain. The test carried out to assess the operation and performance of the proposed model uses a support vector machine (SVM) as the classifier. The results of the test indicate that the proposed model is indeed more robust in noisy environments than earlier models (Chu and Champagne, 2006, p.775). Additionally, the results suggest that, even with the reduced computational complexity, the performance of the conventional FFT-based spectrum is almost the same as that of the original auditory spectrum (p.775).
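The Parseval theorem mentioned above states that the energy of a signal is the same whether it is computed in the time domain or in the frequency domain. The following is a minimal sketch of that identity only, not of Chu and Champagne's auditory model. The function names and the test signal are hypothetical, and a naive DFT is used so that the example needs nothing beyond the Python standard library.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); fine for short signals."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def time_energy(x):
    """Energy computed directly from the samples."""
    return sum(v * v for v in x)


def freq_energy(x):
    """Energy computed from the DFT coefficients (Parseval: divide by N)."""
    spectrum = dft(x)
    return sum(abs(c) ** 2 for c in spectrum) / len(x)


# Hypothetical test signal: two sinusoids over 32 samples.
signal = [
    math.sin(2 * math.pi * 3 * t / 32) + 0.5 * math.cos(2 * math.pi * 7 * t / 32)
    for t in range(32)
]

print(time_energy(signal), freq_energy(signal))
```

Both printed values should agree up to floating-point rounding, which is why an energy-based quantity can be evaluated on an FFT spectrum instead of on the time-domain samples.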

In the second proposal, Zhang and Zhou (2004) have suggested a two-step methodology aimed at achieving accurate as well as on-line segmentation. Results obtained from experiments reveal that classification of large-scale audio is simpler than classification of small-scale audio. It is this fact that has propelled Zhang and Zhou to develop an extensive framework that increases robustness in audio segmentation. The first step of the methodology is termed rough segmentation, while the second is referred to as subtle segmentation. The first step (rough segmentation) involves classification on a large scale.

This step is taken to ensure the integrity of the content segments. It is crucial in achieving homogeneity within each content segment, which is the main aim of the segmentation procedure, because it is in this step that the system ensures that consecutive audio from one source is not partitioned into different pieces. The second step (subtle segmentation) is a locating exercise aimed at finding segment points.

These segment points lie within the boundary regions that are the output of the first step. Results obtained from experiments also reveal that it is possible to achieve a desirable balance between the false alarm rate and the missing rate. The balance is desirable only when both rates are kept at low levels (Zhang and Zhou, 2004, p.IV-349). Earlier algorithms that have attempted to deal with the problem of accurate and on-line segmentation have exhibited two shortcomings. The first is that they are designed to handle classification of features at small-scale levels. The second is that they result in high false alarm rates.
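The two error rates discussed above can be illustrated with a short sketch. The definitions used here are assumptions, since the text does not give Zhang and Zhou's exact formulas: a detected segment point counts as a hit if it lies within a tolerance of an unmatched reference point, the false alarm rate is the share of detections that match nothing, and the missing rate is the share of reference points that were never detected. The function names, tolerance and boundary times are hypothetical.

```python
def match_boundaries(reference, detected, tolerance=0.5):
    """Greedily match each detected point to the nearest unmatched reference point."""
    unmatched = list(reference)
    hits = 0
    for d in detected:
        close = [r for r in unmatched if abs(r - d) <= tolerance]
        if close:
            nearest = min(close, key=lambda r: abs(r - d))
            unmatched.remove(nearest)
            hits += 1
    false_alarms = len(detected) - hits
    misses = len(reference) - hits
    return hits, false_alarms, misses


def segmentation_rates(reference, detected, tolerance=0.5):
    """Return (false alarm rate, missing rate) under the assumed definitions."""
    _, false_alarms, misses = match_boundaries(reference, detected, tolerance)
    false_alarm_rate = false_alarms / len(detected) if detected else 0.0
    missing_rate = misses / len(reference) if reference else 0.0
    return false_alarm_rate, missing_rate


# Hypothetical boundary times in seconds.
reference_points = [5.0, 12.0, 20.0, 31.0]
detected_points = [5.2, 11.8, 16.0, 20.3, 25.0]

print(segmentation_rates(reference_points, detected_points))
```

In this example two of the five detections match nothing and one of the four reference points is missed, so a segmenter that reports more points lowers the missing rate only at the cost of more false alarms.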

The problem that Huang and Hansen (2006) tackle in the third proposal is that of automating the process of speech recognition and spoken document retrieval in cases that involve unsupervised audio classification and segmentation (p.907). They suggest a new algorithm for audio classification to be used in automatic speech recognition (ASR) procedures. Weighted GMM networks form the key feature of this new algorithm. The algorithm includes two variance values: the first is VSF and the second is VZCR. The first variance value is determined for the spectrum flux, whereas the second is determined for the zero-crossing rate.
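As a rough illustration of what a variance of the zero-crossing rate measures, the sketch below computes the zero-crossing rate of each frame and then the variance of those rates across frames. It is a simplified stand-in under stated assumptions, not Huang and Hansen's definition of VZCR: the frame length, the function names and the two synthetic signals are hypothetical. A variance of the spectrum flux would follow the same pattern with a different per-frame measure.

```python
import math
from statistics import pvariance


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def frame_signal(signal, frame_len):
    """Split a signal into non-overlapping frames of frame_len samples."""
    return [
        signal[i:i + frame_len]
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def zcr_variance(signal, frame_len=100):
    """Variance of the per-frame zero-crossing rate over the whole signal."""
    rates = [zero_crossing_rate(f) for f in frame_signal(signal, frame_len)]
    return pvariance(rates)


# Steady tone: every frame has the same zero-crossing rate.
tone = [math.sin(2 * math.pi * 5 * t / 100 + 0.3) for t in range(1000)]

# Alternating content: frames switch between a low and a high frequency.
mixed = [
    math.sin(2 * math.pi * (5 if (t // 100) % 2 == 0 else 40) * t / 100 + 0.3)
    for t in range(1000)
]

print(zcr_variance(tone), zcr_variance(mixed))
```

The steady tone yields a variance near zero, while the signal whose character changes from frame to frame yields a much larger one, which is the kind of extended-time contrast such a feature is meant to capture.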

VSF and VZCR are, additionally, extended-time features that are crucial to the performance of the algorithm. The two values serve as the criterion for a pre-classification procedure for the audio and also attach weights to the output probabilities of the GMM networks. For the segmentation process in automatic speech recognition (ASR) procedures, Huang and Hansen (2006) propose a compound segmentation algorithm (p.907). As the name suggests, the algorithm comprises multiple features. A 2-mean distance metric and a smoothed zero-crossing rate give two out of the 19 features proposed.

A perceptual minimum variance distortionless response (PMVDR) and a false alarm compensation procedure give another two features of the algorithm. Filterbank log energy coefficients (FBLC) give 14 out of the 19 features proposed. The 14 FBLCs are evaluated in 14 noisy environments, where they are used to determine the best overall robust features with respect to these conditions. For short turns lasting 5 seconds or less, the 2-mean distance metric can be applied to enhance detection. The false alarm compensation procedure has been found to boost efficiency in a cost-effective manner.

A comparison involving Huang and Hansen’s proposed classification algorithm against a GMM network baseline algorithm for classification reveals a 50% improvement in performance. Similarly, a comparison involving Huang and Hansen’s proposed compound segmentation algorithm against a baseline Mel-frequency cepstral coefficients (MFCC) and traditional Bayesian information criterion (BIC) algorithm reveals a 23%-10% improvement in all aspects (Huang and Hansen, 2006, p. 907).

The fourth proposal, the work of Du et al. (2007), presents audio segmentation as a problem in practical media such as TV series and movies (p.I-205). TV series, movies and other forms of practical media exhibit audio segments of varying lengths. Short audio segments, those that are 5 seconds long or less, are easily noticeable since they outnumber all the others. Du et al. (2007) have formulated an approach to unsupervised audio segmentation to be used in all forms of practical media.

The approach includes a segmentation stage during which potential acoustic changes are detected. It also includes a refinement stage during which the detected acoustic changes are refined by a tri-model Bayesian Information Criterion (BIC). Results from experiments suggest that the approach possesses a high capability for detecting short segments (Du et al., 2007, p.I-205). Additionally, the results suggest that the tri-model BIC is effective in improving the overall segmentation performance (Du et al., 2007, p.I-205).
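To make the role of the BIC in change detection more concrete, the sketch below implements the conventional single-change BIC test on a one-dimensional feature sequence: a window is modeled either as one Gaussian or as two Gaussians split at a candidate point, and a positive score favors the split. This is the standard two-model formulation under simplifying assumptions, not the tri-model BIC of Du et al. The function names, the penalty weight, the margin and the data are hypothetical.

```python
import math
from statistics import pvariance


def delta_bic(window, split, penalty_weight=1.0):
    """BIC gain of modeling window as two 1-D Gaussians split at `split`."""
    left, right = window[:split], window[split:]
    n, n1, n2 = len(window), len(left), len(right)
    var_floor = 1e-12  # guards against log(0) on constant segments
    gain = 0.5 * (
        n * math.log(max(pvariance(window), var_floor))
        - n1 * math.log(max(pvariance(left), var_floor))
        - n2 * math.log(max(pvariance(right), var_floor))
    )
    # The two-Gaussian model has 2 extra parameters (one mean, one variance).
    penalty = 0.5 * penalty_weight * 2 * math.log(n)
    return gain - penalty


def best_change_point(window, margin=5):
    """Return (split, score); split is None when no candidate scores above zero."""
    candidates = range(margin, len(window) - margin)
    split = max(candidates, key=lambda s: delta_bic(window, s))
    score = delta_bic(window, split)
    return (split, score) if score > 0 else (None, score)


# Hypothetical feature sequences: one with a level shift at index 40, one without.
shifted = [0.1 * (-1) ** t for t in range(40)] + [5.0 + 0.1 * (-1) ** t for t in range(40)]
flat = [0.1 * (-1) ** t for t in range(80)]

print(best_change_point(shifted))
print(best_change_point(flat))
```

The penalty term is what keeps the test from reporting a change in every window; a refinement stage can re-score candidate points found by a cheaper first pass in the same way.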

Researchers have not confined themselves to improving audio segmentation and classification but have sought to improve other areas too. One of these areas is speaker discrimination, where Chan et al. have put forward a proposal that introduces a new procedure for the process (Chan et al., 2007, p.1884). Another area is speaker diarization, where a proposal by Ben-Harush et al. introduces an improved speaker diarization system (Ben-Harush et al., 2009, p.1).

The proposal put forward by Chan et al. (2007) is rooted in an analytic study of the speaker discrimination power of two kinds of vocal features (p.1884). The features targeted relate either to the vocal source or to the conventional vocal tract. The analysis draws a comparison between these two kinds of features. The first kind, those related to the vocal source, are known as wavelet octave coefficients of residues (WOCOR).

These features have to be extracted from the audio signal. To perform the extraction, linear predictive (LP) residual signals have to be derived, because the pitch-synchronous wavelet transform that performs the actual extraction operates on the LP residual signals. WOCOR remain discriminative when the quantity of training data is limited, and they are less sensitive to spoken content.

These two merits make them appropriate for the task of speaker segmentation in telephone conversations (Chan et al., 2007, p.1884). Such a task requires that statistical speaker models be established from the short sections of speech that occur in such conversations. Additionally, experiments have shown that the use of WOCOR causes a noticeable decrease in the errors that occur during the segmentation process (Chan et al., 2007, p.1884).

According to Ben-Harush et al. (2009), the problem that speaker diarization systems seek to solve is captured in the question “who spoke and when did the speaking take place?” (p.1). Speaker diarization systems are functions that map temporal speech segments in a conversation to the appropriate speaker (Ben-Harush et al., 2009, p.1). Background noise and other non-speech segments are mapped into the set of non-speech elements. An inherent shortcoming of most diarization systems in use today is that they are unable to handle speech that is overlapped or co-channeled. To this end, algorithms have been developed in recent times that seek to address this challenge.

However, most of these algorithms require special conditions in order to perform and entail high computational complexity. They also require an analysis of the audio data in both the time and frequency domains. Ben-Harush et al. (2009) have proposed a methodology that uses frame-based entropy analysis, Gaussian Mixture Modeling (GMM) and well-known classification algorithms to counter this challenge (p.1). To perform overlapped speech detection, the methodology suggests an algorithm that is centered on a single feature. This feature is an entropy analysis of the audio data in the time domain.
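One simple way to picture a frame-level entropy computed in the time domain is to histogram the sample amplitudes of a frame and take the Shannon entropy of that histogram. The sketch below does exactly that and nothing more; it is an assumption about the general idea, not the feature definition used by Ben-Harush et al. The function name, the number of bins and the amplitude range are hypothetical.

```python
import math
from collections import Counter


def frame_entropy(frame, bins=16, lo=-1.0, hi=1.0):
    """Shannon entropy (bits) of the amplitude histogram of one frame."""
    width = (hi - lo) / bins
    # Map each sample to a bin index, clamping values outside [lo, hi).
    counts = Counter(
        min(bins - 1, max(0, int((s - lo) / width))) for s in frame
    )
    total = len(frame)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A constant frame occupies one bin, so its entropy is zero.
constant_frame = [0.5] * 64

# A frame spread evenly over all 16 bins reaches the maximum of log2(16) bits.
spread_frame = [-1.0 + (i + 0.5) * 0.125 for i in range(16)] * 4

print(frame_entropy(constant_frame), frame_entropy(spread_frame))
```

A frame whose amplitudes are spread more evenly gives a higher value, so a per-frame entropy sequence can be passed to a statistical classifier rather than compared against one fixed threshold.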

To detect overlapped speech segments, the method combines Gaussian Mixture Modeling (GMM) with well-known classification algorithms. An advantage of the proposed method is that it removes the need for a fixed threshold for any particular conversation or recording. The methodology proposed by Ben-Harush et al. is reported to detect 60.0% of frames containing overlapped speech (p.1). This value is achieved when the segmentation is at baseline level (p.1), and it is achieved while the false alarm rate is maintained at 5% (p.1).

Campbell has undertaken research aimed at improving the process of automatic speaker recognition. Automatic speaker recognition (ASR) systems are designed and developed to operate in two modes, depending on the nature of the problem to be solved. The first mode is known as automatic speaker identification (ASI), while the second is known as automatic speaker verification (ASV). In ASV procedures, the person’s claimed identity is authenticated by the ASR machine using the person’s voice. In ASI procedures, unlike ASV ones, there is no claimed identity; it is up to the ASR machine to determine the identity of the individual and the group to which the person belongs. Known sources of error in ASV procedures are shown in the table below.

Table 1: Sources of verification errors.

Misspoken or misread prompted phrases
Stress, duress and other extreme emotional states
Multipath, noise and any other poor or inconsistent room acoustics
The use of different microphones for verification and enrolment, or any other cause of channel mismatch
Sicknesses, especially those that alter the vocal tract
Time-varying microphone placement

According to Campbell, a new automatic speaker recognition system is available, and the recognizer is known to perform with 98.9% correct identification (p.1437). The recognizer has four basic building blocks: signal acquisition, feature extraction and selection, pattern matching, and a decision criterion.

Finally, research work by Hosseinzadeh and Krishnan has been aimed at improving the field of speaker recognition. Hosseinzadeh and Krishnan (2007) propose seven spectral features for speaker recognition. The seven features are used for quantification, which is important in speaker recognition because vocal source information and the vocal tract function complement each other. The vocal tract function is determined using one of two sets of coefficients: MFCC or LPCC. MFCC stands for Mel frequency cepstral coefficients and LPCC stands for linear prediction cepstral coefficients.

An experiment to analyze the performance of these features relies on a speaker identification system (SIS). A text-independent cohort Gaussian mixture model is the speaker identification method used in the experiment. The results reveal that these features achieve an identification accuracy of 99.33%. This accuracy level is achieved only when the features are combined with MFCC-based features and when undistorted speech is used.


Considering the level of research that has taken place in recent times in the field of speaker recognition, and considering that more needs to be done, it will be useful for researchers in the field to apply the concept of knowledge integration. Knowledge integration makes it possible to join various ideas into a single structure that is logical and rational. By achieving knowledge integration, an individual or organization is able, first, to make use of available knowledge to formulate solutions to the problems or challenges faced during growth. Secondly, knowledge integration helps to expose underlying assumptions and inconsistencies by reconciling conflicting ideas.

Thirdly, knowledge integration helps an individual or organization to identify areas of incoherence, uncertainty and disagreement; it does this by synthesizing different perspectives. Finally, by weaving different ideas together, knowledge integration achieves a whole that is greater than the sum of its parts. Thus, researchers are able to create a recognizer that is more practical and cost-effective.

The proposals put forward by the researchers should be put to everyday use, since they have been tested and found to be significant improvements. The results obtained from the experiments serve as strong evidence that these methodologies do improve the practice of speaker recognition.

It is also important for the researchers to develop a culture of continuous quality improvement (CQI). CQI refers to a formal approach to analyzing performance and improving it. Two commonly used CQI procedures are the Plan-Do-Check-Act (PDCA) system and Failure Mode and Effect Analysis (FMEA). By applying either of these, the researchers can be more confident of developing products that meet the set performance requirements.


The field of speaker recognition is useful in today’s world, considering the security challenges people face. Thus, the efforts that go into improving it should be noticed and rewarded. Governments and other institutions should readily fund research and development in this field. People, on the other hand, should embrace the technology, as it enhances their safety and the safety of their possessions.


Ben-Harush, O., Guterman, H. & Lapidot, I. (2009) Frame level entropy based overlapped speech detection as a pre-processing stage for speaker diarization, pp. 1-6. Israel: Jabotinsky.

Campbell, J. P. (1997) Speaker recognition: a tutorial, 85(9), pp. 1437-1462.

Chan, W. N., Zheng, N. & Lee, T. (2007) Discrimination power of vocal source and vocal tract related features for speaker segmentation, 15(6), pp. 1884-1892.

Chu, W. & Champagne, B. (2006) A simplified early auditory model with application in speech/music classification. Mc University, pp. 775 – 778.

Du, Y., Hu, W., Yan, Y., Wang, T. & Zhang, Y. (2007) Audio segmentation via tri-model bayesian information criterion, pp. I-205 – I-208. China: Intel China research.

Dutta, P. & Haubold, A. (2009) Audio-based classification of speaker characteristics, pp. 422 – 425. New York; Columbia University.

Giannakopoulos, T., Pikrakis, A. & Theodoridis, S. (2006) A speech/music discriminator for radio recordings using Bayesian networks, pp. V-809 – V-812. Greece: University of Athens.

Huang, R. & Hansen, J. H. L. (2006) Advances in unsupervised audio classification and segmentation for the broadcast news and ngsw corpora, 14(3), pp. 907-919.

Krishnan, S. & Hosseinzadeh, D. (2007) Combining vocal source and mfcc features for enhanced speaker recognition performance using gmms. Canada: Ryerson University.

Swamy, R., Murti, S. K. & Yegnanarayana, B. (2007) Determining number of speakers from multispeaker speech signals using excitation source information, 14(7), pp. 481-484.

Zhang, Y. & Zhou, J. (2004) Audio segmentation based on multi-scale audio classification, pp. IV-349 – IV-352. China: Tsinghua University.
