April 3, 2017

Big Data: Major Categories of Players [part 1]


By Brielle Huang


Having defined big data in our last blog post (see Big Data: Introduction), the focus of this post is to explain the major categories of players in big data.
In order to understand the big data players, we extracted data from Thomson Reuters for all companies that have the term “big data” in their business description. The search returned 533 firms worldwide and 247 firms in North America. Since a large part of the worldwide data set was incomplete (e.g., missing firm inception dates), we decided to focus on the smaller North American data set instead.
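As a rough illustration of how such an extract can be summarized by founding year (the column names and values below are placeholders, not the actual Thomson Reuters fields), the calculations behind the observations that follow can be sketched in R along these lines:

    # Toy stand-in for the North American extract; values are made up
    firms <- data.frame(
      name           = c("FirmA", "FirmB", "FirmC", "FirmD"),
      inception_year = c(1953, 1998, 2012, 2014)
    )

    # Share of firms founded before 2000 (reported as 18% in the real data set)
    mean(firms$inception_year < 2000)

    # Distribution of founding years - the basis for the graphs below
    table(firms$inception_year)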
A high-level look at the chosen data set revealed an interesting phenomenon. Quite a few companies in the data set (18% of the total) were founded before the year 2000. However, you may remember from the last blog post that the idea of big data was only introduced in the late 1990s/early 2000s. So why would companies related to big data have been founded as early as 1953?
The answer is that while these older companies now focus on big data, their business likely did not incorporate this concept at the founding date. To corroborate this assumption, we tried to find the date when these existing firms revised their business description to accommodate the growth of big data. We searched Google News to determine when articles mentioning both the company in question and big data began to appear. As we can see from the following graph, while some firms updated their business focus to reflect the growing interest in big data as early as 2000, the majority of these older players shifted toward big data around 2012.


The data above takes the form of a hype cycle. If the year 2000 can be taken as the approximate commencement of big data and the peak of inflated expectations occurred in 2012, then, following Stratopoulos (2016), the implied standard deviation for big data is around six years. This indicates that we will enter the mainstream adoption stage around 2018.
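The arithmetic behind this claim can be sketched as follows, assuming the adoption curve is treated as roughly normal with the 2000 commencement sitting about two standard deviations before the 2012 peak (one reading of Stratopoulos (2016), not a quotation of it):

    commencement <- 2000                        # approximate start of big data
    peak         <- 2012                        # peak of inflated expectations
    sigma        <- (peak - commencement) / 2   # implied standard deviation, ~6 years
    mainstream   <- peak + sigma                # ~2018
    c(sigma = sigma, mainstream = mainstream)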
A graph focusing on the inception dates of new firms, shown below, demonstrates a similar spike around 2012.
As we can see from the two graphs, the dates when older companies changed their focus to big data are consistent with the dates when big data-related start-ups were founded. The data therefore indicate that, in terms of time of adoption, existing companies and start-ups show the same pattern. Hence the conclusion that we are likely to enter the mainstream adoption stage within the next couple of years.
In the next blog, we are going to use a “production function” approach to further understand the big data players.
Brielle Huang is a third year Accounting and Financial Management Student minoring in Legal Studies at the University of Waterloo. She is working as a Research Assistant under Professor Stratopoulos and researching emerging technologies, with a focus on Big Data. Brielle has completed her first co-op term in Assurance at PwC. Her other interests include creative writing and travelling.

November 29, 2016

Big Data: Part One - Introduction


By Brielle Huang

Opportunities to leverage big data are almost as immeasurable as the data itself. As more and more success stories of innovative users become available, corporations, institutions, and governments are starting to realize the need to understand big data. The following series of blog posts is meant to educate the reader on the concept of big data, the key players in big data (i.e., the demand and supply sides of big data, as well as mediators), and the diffusion of big data.

Technological changes over the last twenty-plus years have changed our notion of what data is. For example, in the past, accountants associated data with transactions (e.g., sales and cash receipts). With the introduction of the Internet, this concept began to change. The new tech companies (e.g., Amazon, Google, and eBay) brought with them new sources and types of data. The sales transaction was transformed. What used to be, for most transactions, a single data point became a cluster of data including, among other things, the customer’s name, credit card, and address, as well as the customer’s browsing pattern (web traffic and click-through rate). Later, in the early 2000s, as social network firms like Facebook and Twitter came into existence, the ability to capture information related to a user’s online activities created a fundamental change in the type of information companies sought to capture. Instead of only providing binary information, such as whether the user was online at a certain time, Web 2.0 gave companies the ability to capture valuable information about preferences and human interactions. As technology continues to progress, we are now in the process of supplementing networks of humans with networks of machines - a phenomenon known as the Internet of Things (IoT). The IoT could provide us with even more data about the products that customers buy.

The problem with keeping data is that it costs money to store it. Luckily, as technologies such as cloud computing have improved, storage capacities have increased and the cost of storage has fallen. Paraphrasing Parkinson’s First Law in a 1980 speech, I.A. Tjomsland quipped that as storage becomes cheaper, we retain ever larger amounts of data on the chance that it may prove useful in the future. This points to the importance of making sure that captured/stored data are leveraged to create value.

In order to proceed with the analysis of the life of the big data phenomenon, it is necessary to first come up with a reasonable definition of big data. This first blog post aims to do just that. Over the years, many experts have proposed their own definitions of big data. We will look at the similarities between them in order to arrive at our own definition.

Cox and Ellsworth (1997):
“Visualization provides an interesting challenge for computer systems: data sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk. We call this the problem of big data.”
SAS:
“Big data is a term that describes the large volume of data - both structured and unstructured - that inundates a business on a day-to-day basis.”

In simple terms, both of the above definitions can be understood by referencing tools such as MS Excel and MS Access, both of which have been used extensively in business settings. Big data can therefore be visualized as a data set that pushes past the limits of these tools. For example, a Microsoft Access database has a maximum size of 2 GB, and an Excel worksheet has a maximum of 1,048,576 rows.
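As a quick illustration (the row and size figures below are the documented tool limits; the data set itself is hypothetical), one could check whether a data set exceeds those limits along these lines:

    excel_max_rows   <- 1048576      # rows per Excel worksheet
    access_max_bytes <- 2 * 1024^3   # 2 GB Access database cap

    rows_needed   <- 5e6             # hypothetical data set: 5 million records
    bytes_on_disk <- 3 * 1024^3      # hypothetical data set: 3 GB on disk

    rows_needed > excel_max_rows         # TRUE: too big for one Excel sheet
    bytes_on_disk > access_max_bytes     # TRUE: too big for one Access database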

Gartner:
“Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”

High volume means that the data sets are much larger than the data sets we are used to. For example, there are almost 2 billion users on Facebook, and the data sets that contain their profiles and information would be enormous; this data would be impossible to organize in a desktop database management system like MS Access. High velocity means that the data is being captured at an incredibly high speed. For example, over the course of a day, one Facebook user could be liking upwards of a dozen photos, posts, and news articles. High variety means that the data is not structured and so cannot be easily processed. For example, you would be hard-pressed to capture the information from tweets in an Excel sheet. In general, very large data sets that combine structured and unstructured data are likely to exceed the storage and analysis capabilities of traditional software applications like MS Access and Excel.
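To make the variety point concrete, here is a toy sketch of why a tweet-like record resists the rows-and-columns world of Excel or Access (the record and its fields are invented for illustration):

    tweet <- list(
      user     = "someone",
      text     = "Enjoying the #bigdata series!",
      hashtags = c("bigdata"),
      mentions = character(0),   # ragged: zero or many per tweet
      place    = NULL            # optional field, often missing
    )

    # Only the scalar parts drop neatly into a flat table
    as.data.frame(tweet[c("user", "text")])

    # Everything else needs its own structure and processing
    length(tweet$hashtags)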

From the above definitions, several themes emerge:
  1. Big data indicates a high volume of data
  2. Big data is a continual flow of data, which means it must be processed in a timely manner or the business’ systems will become inundated
  3. The data collected can be structured or unstructured

While big data is one of the emerging technologies with far-reaching implications for businesses, we would like to make sure that our readers realize that it differs from several other technologies. In fact, big data is not necessarily linked to a particular supplier/vendor or a patented technology. Unlike systems such as Enterprise Resource Planning (ERP) or Customer Relationship Management (CRM), big data is not as simple as a system that someone can patent. This is mainly due to the third theme we distilled above: structured and unstructured data must first be organized and processed before they can be analyzed and/or stored. Due to the high volume and complexity of the data, these steps would probably need to be conducted by different players who specialize in each step. The significance of this “unpatentable” phenomenon, as well as the three themes discussed above, will be utilized in our analysis of big data in the next blog post, which will be about the major players in big data - namely the demand, supply, and mediation sides.

Brielle Huang is a third year Accounting and Financial Management Student minoring in Legal Studies at the University of Waterloo. She is working as a Research Assistant under Professor Stratopoulos and researching emerging technologies, with a focus on Big Data. Brielle has completed her first co-op term in Assurance at PwC. Her other interests include creative writing and travelling.

Sources:
- Cox, M., & Ellsworth, D. (1997). Application-controlled demand paging for out-of-core visualization. In Proceedings of the 8th IEEE Visualization ’97 Conference (p. 235-). IEEE Computer Society Press. Retrieved from http://dl.acm.org/citation.cfm?id=266989.267068
- Press, G. (2013, May 9). A Very Short History Of Big Data. Retrieved October 31, 2016, from http://www.forbes.com/sites/gilpress/2013/05/09/a-very-short-history-of-big-data/

September 28, 2016

Blockchain Technology Conference


I will be presenting my work on blockchain technology adoption at the UWCISA - Blockchain conference.

Blockchain Technology Conference 
for finance, accounting and auditing professionals

September 30, 2016 - 8:30 am to 5:30 pm

St. Andrew’s Conference Centre, 
150 King St. W, 27th Floor, 
Toronto, ON M5H 1J9 Canada

For more details about the conference program, visit the following URL:

For a preview of my presentation, please visit the following URL:



September 21, 2016

School of Accounting and Finance at 2016 Accounting IS Big Data Conference


The following work

Audit Data Analytics Survey: 
Current State and Future Directions

by

Clark Hampton  & Theo Stratopoulos

was presented at the 2016 Accounting IS Big Data Conference.



A copy of the presentation is available from the following link:
Audit Data Analytics Survey: Current State and Future Directions

The work was sponsored by CPA Canada and the Audit Data Analytics task force. For information about the work of the task force visit the following link:
Audit Data Analytics Committee

For more details about the conference program visit the following link:
2016 Accounting IS Big Data Conference

July 23, 2016

Week 13 - Epilogue


In the last two weeks our focus has been on the value of IT investments and how to ensure that we maximize the value from these investments. Another topic that deserves attention is IT budgets and IT portfolio management. Investments like the one undertaken by WHR constitute a substantial commitment of financial and human resources. Therefore, one of the final course objectives will be to review corporate budgets (pros and cons), the IT budgeting process and factors affecting levels of IT budgets, and IT portfolio management.

We will close the course with a look at some of the basic principles behind IT governance.

Topics and Readings for Week 13
Theory: Chapter 9 (IT budgets and IT portfolio management) and Chapter 10 (IT Governance).

Seminar: There is no seminar this week.

Assignments for Week 13
There is no online quiz. I will post a practice quiz based on material from Chapters 9 and 10.

July 18, 2016

Week 12 - Risk Analysis and Monitoring of IT Investments


The NPV (expected payoffs) of a technology investment depends on the assumptions made by managers involved in the adoption of the new technology. These assumptions reflect their expectations regarding implementation (budget, time, and functionality), expectations regarding users’ ability to extract value from the new technology (increase sales or contain costs), as well as expectations regarding the reactions of competitors and trading partners.


In previous chapters, we focused on factors that may affect these assumptions, such as business and IT strategy, IT capability, stage of technology adoption, and industry structure. This week, we shift our focus to the implications for NPV of failing to meet these assumptions (risk analysis). We will use this risk analysis to make recommendations on how to monitor progress during implementation and value extraction. The aim of the monitoring process is to introduce an element of accountability and make sure that proper action is taken to maximize the value of the technology investment.


Topics and Readings for Week 12
Theory: We will focus on Chapter 8 (monitoring IT investments). The primary focus of this chapter is on the idea of causality (cause and effect) as the foundation for the creation of a balanced scorecard for technology implementation and use.


Seminar: We will use the Whirlpool case study to review the NPV analysis and perform a risk analysis (sensitivity analysis). We will use data based on different scenarios to generate predictions regarding the expected effect of these scenarios on the NPV of the project.
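As a rough preview of the kind of calculation we will do in the seminar, here is a minimal R sketch of a scenario (sensitivity) analysis on NPV; the cash flows, discount rate, and scenarios below are invented placeholders, not the Whirlpool figures:

    npv <- function(rate, cash_flows) {
      # cash_flows[1] is the time-0 outlay (negative); the rest are years 1..n
      sum(cash_flows / (1 + rate)^(seq_along(cash_flows) - 1))
    }

    base_case <- c(-1000, 300, 350, 400, 450)   # hypothetical project
    scenarios <- list(
      optimistic  = base_case * c(1, rep(1.15, 4)),   # benefits 15% higher
      base        = base_case,
      pessimistic = base_case * c(1, rep(0.80, 4))    # benefits 20% lower
    )

    sapply(scenarios, npv, rate = 0.10)   # NPV under each scenario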


Assignments for Week 12
The online quiz will be based on Chapter 8 and material covered in the seminar. The quiz will be available on Friday at 12:30 pm.

The team project is due on Monday at 8:30 am. Please upload your project file as well as your data and R-script in the appropriate Learn dropbox.

The peer evaluation for the Team Project will become available on Tuesday, July 19, and remain open until July 26. Please read the syllabus and make sure you understand the implications of the peer evaluation for the grade of your peers before you submit your evaluation. The submission of the peer evaluation is a course requirement.

Prof. Stratopoulos

July 9, 2016

Week 11 - Evaluation of IT Investments


We started our course with the discussion of a firm (WHR) that wants to perform a capital budgeting analysis of a major information technology investment: an investment in an enterprise system. Our objective has been to approach this evaluation from a strategic standpoint. We started by looking at technology innovation adoption and how the position that a firm takes will affect the expected payoffs (chapter 1). We combined Rogers’ innovation adoption theory with Gartner’s hype cycle to estimate expected payoffs and duration of competitive advantage (chapter 2).


We learned that technological innovations have the potential to disrupt the competitive landscape and that it is important for a firm to consider its technology-related investments in the context of its strategic priorities (chapter 3). We used information generated in financial reports to make sense of business strategy and industry structure, and looked at how technology adoption (such as data analytics) can shape the competitive position of adopting firms (chapters 2, 3, and 4).


For firms to leverage data analytics they must have access to data, and this justified the need for at least a basic understanding of database theory (chapter 5). We used examples of database schemas capturing the business processes of different firms as a way of envisioning the foundation of an enterprise system (chapter 6). This brought the realization that even though enterprise systems constitute a mature technology today, they are critical for any firm that wants to leverage data analytics, because they are the source of all internal data (chapter 6).


Understanding the importance of implementing or upgrading a firm’s enterprise systems brought us back to where we started, i.e., performing the capital budgeting analysis for WHR (chapter 7), which is the topic of Week 11.


Topics and Readings for Week 11
Theory: We will focus on Chapter 7 (evaluation of IT investments). The primary focus of this chapter is on the expected benefits from IT investments and the use of real options in assessing the value of multistage projects.
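To give a sense of why real options matter for multistage projects, here is a toy R sketch comparing an up-front commitment with a staged investment; the probabilities, costs, and payoffs are invented for illustration and are not the chapter’s or the WHR case’s numbers:

    p_success <- 0.5     # probability stage 1 reveals favourable conditions
    stage1    <- -100    # pilot cost at time 0
    stage2    <- -400    # roll-out cost at time 1
    payoff_hi <- 900     # time-1 value of the roll-out if conditions are favourable
    payoff_lo <- 200     # time-1 value of the roll-out if they are not
    r         <- 0.10

    # Committing to both stages up front: roll out no matter what
    npv_commit <- stage1 + (stage2 + p_success * payoff_hi +
                            (1 - p_success) * payoff_lo) / (1 + r)

    # Staging the investment: roll out only when conditions are favourable
    npv_staged <- stage1 + p_success * (stage2 + payoff_hi) / (1 + r)

    c(commit = npv_commit, staged = npv_staged)   # the flexibility to wait adds value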


Seminar: We will use the mini-case of BlueBikes in order to understand the foundation of the capital budgeting setting used in the WHR case. Read carefully the part of the chapter that will help you understand the BlueBikes case so you can replicate the WHR case. While we will spend most of our time doing the capital budgeting analysis, I will reserve some of the seminar time to show you how to leverage some of the advanced functionality in R (e.g., neural networks) to forecast sales based on Compustat data.
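For those curious about the forecasting demo, here is a rough sketch of the idea using the nnet package; the sales series below is simulated, not Compustat data, and the model is deliberately minimal:

    library(nnet)

    set.seed(123)
    n     <- 200
    sales <- cumsum(rnorm(n, mean = 5, sd = 2)) + 100          # simulated sales series
    df    <- data.frame(sales_next = sales[-1], sales_lag = sales[-n])

    # Standardize the input; small networks train better on scaled data
    mu <- mean(df$sales_lag); s <- sd(df$sales_lag)
    df$lag_std <- (df$sales_lag - mu) / s

    # One hidden layer with 3 units; linout = TRUE for a regression output
    fit <- nnet(sales_next ~ lag_std, data = df,
                size = 3, linout = TRUE, trace = FALSE)

    # Forecast next-period sales from the most recent observation
    predict(fit, newdata = data.frame(lag_std = (tail(sales, 1) - mu) / s))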


Assignments for Week 11

The online quiz will be based on Chapter 7, the WHR case, and material covered in the seminar. The quiz will be available on Friday at 12:30 pm.