Disambiguate messy place names in python (preferably on local machine)

I have a list of several million place names that come from Flickr profiles. Users provided these placenames as free text, so they look like this:

Roma, Italy
Kennesaw, USA
Saginaw, MI
Rucker, Missouri, USA
Melbourne, Australia
Madrid, Spain
live in Sarnia / work in London, Canada
Valladolid, España
Italia
West Hollywood, United States

I want to disambiguate these place names. I am aware that in some cases there is no straightforward solution to this, but I am willing to live with some false disambiguations and with "no answer" for some of the places. If a place name corresponds to the name of multiple cities, then I want to assign that place to the largest city it corresponds to.

Yahoo's PlaceFinder API would be a good solution to this problem, but I would need to make too many API calls to get through my list, so I'd like a local solution (i.e., one that does not depend on a remote API). Does anyone know of any Python libraries that do this kind of thing, or any other local solutions?

(I've also asked this question on stackoverflow.)

You could try the Python library geodict. This has datasets you can download and import to a database - you can check the lists to see if they'd work well or not with your data. It works in two steps:

  1. Extracting names
  2. Matching names to a location in the lists

More details (and another online option in the comments) here.

I assume your best bet is a fuzzy matching algorithm.

Take your local dictionary of place names and administrative units and compare each word and each comma-separated block of text against this dictionary. Assign a score to each match. You might want to use a normalized search to account for spelling mistakes and have an "ignore list" for words like "live" and "work" and "in". Add the score for administrative units to the score of any smaller unit or place name in your matches that lie within this administrative unit.

Tune the scoring function with your results until you are happy. Take the best scoring match.

e.g., for "Roma, Italy":

Roma matches 8 places (score according to size)
Roma matches 23 more places with normalization (lower score according to size)
Italy matches 4 places + 2 administrative units (COUNTRY, DISTRICT) (score according to size)
Italy matches 14 more places and units with normalization (lower score according to size)
One of the Romas lies in one of your units -> combine scores

If your tuning is good, you will have given most points to the capital of Italy.
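The scoring approach above can be sketched in a few lines. Everything here is illustrative: the gazetteer is a tiny hypothetical stand-in for a real GeoNames-style dump, the 0.8 similarity cutoff is arbitrary, and the administrative-unit score combination is omitted for brevity.

```python
from difflib import SequenceMatcher

# Hypothetical gazetteer: lowercase name -> population. A real one would
# be loaded from a GeoNames-style dump into a dict or database.
GAZETTEER = {
    "roma": 2_870_000,
    "rome": 2_870_000,
    "melbourne": 5_000_000,
    "madrid": 3_200_000,
}
IGNORE = {"live", "work", "in"}  # noise words to skip

def score_block(block):
    """Score one comma-separated block; return (score, matched name)."""
    best = (0.0, None)
    for word in block.lower().split():
        if word in IGNORE:
            continue
        for name, pop in GAZETTEER.items():
            # Normalized similarity tolerates minor spelling variants.
            sim = SequenceMatcher(None, word, name).ratio()
            if sim > 0.8:
                score = sim * pop  # weight matches by city size
                if score > best[0]:
                    best = (score, name)
    return best

def disambiguate(place_text):
    """Best-scoring match across all comma-separated blocks, or None."""
    scored = [score_block(b) for b in place_text.split(",")]
    scored = [s for s in scored if s[1] is not None]
    return max(scored)[1] if scored else None
```

Tuning then amounts to adjusting the similarity cutoff, the ignore list, and how much an administrative-unit match boosts the places it contains.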

You can use the geotext Python library for this.

pip install geotext

All it takes is installing this library. The usage is as simple as:

from geotext import GeoText
places = GeoText("London is a great city")
places.cities

gives the result ['London']

The list of cities covered in this library is not extensive but it has a good list.
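GeoText only extracts city names; the question's rule of preferring the largest matching city needs a population lookup on top. A minimal sketch, where the hand-rolled population table is a hypothetical stand-in for a real GeoNames-style lookup:

```python
# Hypothetical population table; a real one could be built from the
# free GeoNames cities dump.
POPULATION = {"London": 8_900_000, "Melbourne": 5_000_000, "Sarnia": 72_000}

def largest_city(extracted_cities):
    """Given city names (e.g. GeoText(text).cities), keep the largest.

    Unknown cities default to population 0, so any known city wins.
    """
    if not extracted_cities:
        return None
    return max(extracted_cities, key=lambda c: POPULATION.get(c, 0))
```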

A commercial offering is Polygon Analytics' geocoder, which exists as a SaaS REST API as well as an on-premise, high-performance C++ API (with wrappers for Python, Java and others) to avoid network latency (or for sensitive data).

Its API also provides lat/lon output for mapping.

Try-except decorator for classmethods

This question is related to the Cactus Text Game Engine.

For easy error handling in Cactus, we've put together this small decorator. Essentially, the decorator wraps any class method in a try-except block, which reports errors to a log file if something goes wrong. While it's a small file, I can't help but feel that it's a little monolithic, and slightly ugly as well.

Here's a simple example of the output this function will produce when it encounters an error of any kind.
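The decorator itself isn't reproduced in this excerpt; as a rough sketch of the pattern described, wrapping a class method in try/except and logging failures to a file (the names log_errors and the log path are illustrative, not Cactus's actual API):

```python
import functools
import logging
import os
import tempfile
import traceback

# Illustrative log destination; Cactus presumably uses its own path.
LOG_PATH = os.path.join(tempfile.gettempdir(), "cactus.log")
logging.basicConfig(filename=LOG_PATH, level=logging.ERROR)

def log_errors(method):
    """Wrap a class method in try/except, logging any failure."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        try:
            return method(self, *args, **kwargs)
        except Exception:
            # Record the full traceback so the log shows where it failed.
            logging.error("Error in %s.%s:\n%s", type(self).__name__,
                          method.__name__, traceback.format_exc())
            return None
    return wrapper

class Game:
    @log_errors
    def load_room(self, name):
        raise KeyError(name)  # simulate a failure
```

With this in place, a failing call returns None and the traceback lands in the log instead of crashing the game loop.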

2.1 Accessing the Scientific Literature

The main means of accessing the scientific literature is through databases of scientific literature and, increasingly, through open access databases exposing web services or application programming interfaces (APIs).

Researchers based in universities will generally be familiar with two of the largest commercial databases of the scientific literature: Web of Science/Web of Knowledge from Clarivate Analytics and Elsevier's Scopus. Open access databases such as PubMed and Crossref (containing metadata on over 96 million publications) are increasingly popular and link to initiatives that, at the time of writing, make the full texts of over 113 million publications publicly available. Databases such as Google Scholar are a popular open access source of information on the scientific literature and access to copies of texts, while social network sites for researchers such as ResearchGate provide a means for scholars to share their research and create shared projects.

An important feature of recent developments in scientific publication is a shift in emphasis towards open access publications on the part of researchers and funding agencies. This is reflected in services such as those noted above and in services such as Unpaywall, which provides a browser plugin to identify open access versions of articles. At present Unpaywall contains links to over 19 million scientific publications. An important aspect of this shift towards open access is cross-service integration. Thus Unpaywall is based on, and resolves article identifiers through, the content of Crossref, while the commercial Web of Science database provides links to Unpaywall in its results to allow free retrieval of articles. Other important emerging services include tools such as Open Academic Graph, which provides access to metadata on over 330 million publications.

As this makes clear, the landscape for accessing scientific literature is changing as a result of the rise of web-service-enabled databases and cross-service integration tools. In practical terms this means that access to the scientific literature is no longer entirely dependent on fee-based databases.
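As a concrete illustration of the web-service access described above, Crossref's public REST API (api.crossref.org) can be queried with nothing but the standard library. The sketch below separates URL building from JSON parsing so the parsing can be exercised offline; the response shape (message/items with title and DOI fields) follows Crossref's documented format.

```python
import json
import urllib.parse
import urllib.request

def works_url(query, rows=5):
    """Build a Crossref /works query URL."""
    params = urllib.parse.urlencode({"query": query, "rows": rows})
    return f"https://api.crossref.org/works?{params}"

def titles_and_dois(response):
    """Extract (title, DOI) pairs from a Crossref works response dict."""
    items = response.get("message", {}).get("items", [])
    return [((item.get("title") or [""])[0], item.get("DOI", ""))
            for item in items]

def search_works(query):
    """Fetch and parse live results (requires network access)."""
    with urllib.request.urlopen(works_url(query)) as resp:
        return titles_and_dois(json.load(resp))
```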

It is important to emphasise that publication databases normally have strengths and weaknesses in terms of:

  1. Coverage of journals, books and other publications
  2. The languages covered and availability of translations
  3. The range of fields available for analysis (authors, affiliations, titles, abstracts etc.)
  4. The basis of any statistical counts (e.g. counts of citing articles)
  5. The number of records that can be downloaded
  6. The format in which records can be downloaded

These issues impose constraints on what can be searched and downloaded from scientific databases. For example, in our experience Web of Science permits the download of a wider range of data fields than Scopus, while open access databases enjoy the advantage of being free but are more limited in terms of the data fields that are available and the consistency of coverage, such as abstracts.

When seeking to carry out literature research as part of a wider patent analytics project it is therefore important to consider the strengths and weaknesses of particular databases and to use multiple sources where necessary.


A schematic is a visual representation of a circuit. As such, its purpose is to communicate a circuit to someone else. A schematic in a special computer program for that purpose is also a machine-readable description of the circuit. This use is easy to judge in absolute terms. Either the proper formal rules for describing the circuit are followed and the circuit is correctly defined or it isn't. Since there are hard rules for that and the result can be judged by machine, this isn't the point of the discussion here. This discussion is about rules, guidelines, and suggestions for good schematics for the first purpose, which is to communicate a circuit to a human. Good and bad will be judged here in that context.

Since a schematic is to communicate information, a good schematic does this quickly, clearly, and with a low chance of misunderstanding. It is necessary but far from sufficient for a schematic to be correct. If a schematic is likely to mislead a human observer, it is a bad schematic whether you can eventually show that after due deciphering it was in fact correct. The point is clarity. A technically correct but obfuscated schematic is still a bad schematic.

Some people have their own silly-ass opinions, but here are the rules (actually, you'll probably notice broad agreement between experienced people on most of the important points):

Use component designators

This is pretty much automatic with any schematic capture program, but we still often see schematics here without them. If you draw your schematic on a napkin and then scan it, make sure to add component designators. These make the circuit much easier to talk about. I have skipped over questions when schematics didn't have component designators because I didn't feel like bothering with the second 10 kΩ resistor from the left by the top pushbutton. It's a lot easier to say R1, R5, Q7, etc.

Clean up text placement

Schematic programs generally plunk down part names and values based on a generic part definition. This means they often end up in inconvenient places in the schematic when other parts are placed nearby. Fix it. That's part of the job of drawing a schematic. Some schematic capture programs make this easier than others. In Eagle for example, unfortunately, there can only be one symbol for a part. Some parts are commonly placed in different orientations, horizontal and vertical in the case of resistors for example. Diodes can be placed in at least 4 orientations since they have direction too. The placement of text around a part, like the component designator and value, probably won't work in other orientations than it was originally drawn in. If you rotate a stock part, move the text around afterward so that it is easily readable, clearly belongs to that part, and doesn't collide with other parts of the drawing. Vertical text looks stupid and makes the schematic hard to read.

I make separate redundant parts in Eagle that differ only in the symbol orientation and therefore the text placement. That's more work upfront but makes it easier when drawing a schematic. However, it doesn't matter how you achieve a neat and clear end result, only that you do. There is no excuse. Sometimes we hear whines like "But CircuitBarf 0.1 doesn't let me do that". So get something that does. Besides, CircuitBarf 0.1 probably does let you do it, just that you were too lazy to read the manual to learn how and too sloppy to care. Draw it (neatly!) on paper and scan it if you have to. Again, there is no excuse.

For example, here are some parts at different orientations. Note how the text is in different places relative to parts to make things neat and clear.

Don't let this happen to you:

Yes, this is actually a small snippet of what someone dumped on us here.

Basic layout and flow

In general, it is good to put higher voltages towards the top, lower voltages towards the bottom and logical flow left to right. That's clearly not possible all the time, but at least a generally higher level effort to do this will greatly illuminate the circuit to those reading your schematic.

One notable exception to this is feedback signals. By their very nature, they feed "back" from downstream to upstream, so they should be shown sending information opposite of the main flow.

Power connections should go up to positive voltages and down to negative voltages. Don't do this:

There wasn't room to show the line going down to ground because other stuff was already there. Move it. You made the mess, you can unmake it. There is always a way.

Following these rules causes common subcircuits to be drawn similarly most of the time. Once you get more experience looking at schematics, these will pop out at you and you will appreciate this. If stuff is drawn every which way, then these common circuits will look visually different every time and it will take others longer to understand your schematic. What's this mess, for example?

After some deciphering, you realize "Oh, it's a common emitter amplifier. Why didn't that #%&^$@#$% just draw it like one in the first place!?":

Draw pins according to function

Show pins of ICs in a position relevant to their function, NOT HOW THEY HAPPEN TO STICK OUT OF THE CHIP. Try to put positive power pins at the top, negative power pins (usually grounds) at the bottom, inputs at left, and outputs at right. Note that this fits with the general schematic layout as described above. Of course, this isn't always reasonable and possible. General-purpose parts like microcontrollers and FPGAs have pins that can be input and output depending on use and can even vary at run time. At least you can put the dedicated power and ground pins at top and bottom, and possibly group together any closely related pins with dedicated functions, like crystal driver connections.

ICs with pins in physical pin order are difficult to understand. Some people use the excuse that this aids in debugging, but with a little thought you can see that's not true. When you want to look at something with a scope, which question is more common: "I want to look at the clock, what pin is that?" or "I want to look at pin 5, what function is that?". In some rare cases, you might want to go around an IC and look at all the pins, but the first question is by far more common.

Physical pin order layouts obfuscate the circuit and make debugging more difficult. Don't do it.

Direct connections, within reason

Spend some time with placement reducing wire crossings and the like. The recurring theme here is clarity. Of course, drawing a direct connection line isn't always possible or reasonable. Obviously, it can't be done with multiple sheets, and a messy rat's nest of wires is worse than a few carefully chosen "air wires".

It is impossible to come up with a universal rule here, but if you constantly think of the mythical person looking over your shoulder trying to understand the circuit from the schematic you are drawing, you'll probably do alright. You should be trying to help people understand the circuit easily, not make them figure it out despite the schematic.

Design for regular size paper

The days of electrical engineers having drafting tables and being set up to work with D size drawings are long gone. Most people only have access to regular page-size printers, like for 8 1/2 x 11-inch paper here in the US. The exact size is a little different all around the world, but they are all roughly what you can easily hold in front of you or place on your desk. There is a reason this size evolved as a standard. Handling larger paper is a hassle. There isn't room on the desk, it ends up overlapping the keyboard, pushes things off your desk when you move it, etc.

The point is to design your schematic so that individual sheets are nicely readable on a single normal page, and on the screen at about the same size. Currently, the largest common screen size is 1920 x 1080. Having to scroll a page at that resolution to see necessary detail is annoying.

If that means using more pages, go ahead. You can flip pages back and forth with a single button press in Acrobat Reader. Flipping pages is preferable to panning a large drawing or dealing with outsized paper. I also find that one normal page at reasonable detail is a good size to show a subcircuit. Think of pages in schematics like paragraphs in a narrative. Breaking a schematic into individually labeled sections by pages can actually help readability if done right. For example, you might have a page for the power input section, the immediate microcontroller connections, the analog inputs, the H bridge drive power outputs, the ethernet interface, etc. It's actually useful to break up the schematic this way even if it had nothing to do with drawing size.

Here is a small section of a schematic I received. This is from a screenshot displaying a single page of the schematic maximized in Acrobat Reader on a 1920 x 1200 screen.

In this case, I was being paid in part to look at this schematic so I put up with it, although I probably used more time and therefore charged the customer more money than if the schematic had been easier to work with. If this was from someone looking for free help like on this web site, I would have thought to myself screw this and gone on to answer someone else's question.

Label key nets

Schematic capture programs generally let you give nets nicely readable names. All nets probably have names inside the software, just that they default to some gobbledygook unless you explicitly set them.

If a net is broken up into visually unconnected segments, then you absolutely have to let people know the two seemingly disconnected nets are really the same. Different packages have different built-in ways to show that. Use whatever works with the software you have, but in any case, give the net a name and show that name at each separately drawn segment. Think of that as the lowest common denominator or using "air wires" in a schematic. If your software supports it and you think it helps with clarity, by all means, use little "jump point" markers or whatever. Sometimes these even give you the sheet and coordinates of one or more corresponding jump points. That's all great but label any such net anyway.

The important point is that the little name strings for these nets are derived automatically from the internal net name by the software. Never draw them manually as arbitrary text that the software doesn't understand as the net name. If separate sections of the net ever get disconnected or separately renamed by accident, the software will automatically show this since the name shown comes from the actual net name, not something you type in separately. This is a lot like a variable in a computer language. You know that multiple uses of the variable symbol refer to the same variable.

Another good reason for net names is short comments. I sometimes name and then show the names of nets only to give a quick idea what the purpose of that net is. For example, seeing that a net is called "5V" or "MISO" could help a lot in understanding the circuit. Many short nets don't need a name or clarification, and adding names would hurt more due to clutter than they would illuminate. Again, the whole point is clarity. Show a meaningful net name when it helps in understanding the circuit, and don't when it would be more distracting than useful.

Keep names reasonably short

Just because your software lets you enter 32 or 64 character net names, doesn't mean you should. Again, the point is about clarity. No names is no information, but lots of long names are clutter, which then decreases clarity. Somewhere in between is a good tradeoff. Don't get silly and write "8 MHz clock to my PIC", when simply "CLOCK", "CLK", or "8MHZ" would convey the same information.

See this ANSI/IEEE standard for recommended pin name abbreviations.

Upper case symbol names

Use all caps for net names and pin names. Pin names are almost always shown upper case in datasheets and schematics. Various schematic programs, Eagle included, don't even allow for lower case names. One advantage of this, which is also helped when the names aren't too long, is that they stick out in the regular text. If you do write real comments in the schematic, always write them in mixed case but make sure to upper case symbol names to make it clear they are symbol names and not part of your narrative. For example, "The input signal TEST1 goes high to turn on Q1, which resets the processor by driving MCLR low.". In this case, it is obvious that TEST1, Q1, and MCLR refer to names in the schematic and aren't part of the words you are using in the description.

Show decoupling caps by the part

Decoupling caps must be physically close to the part they are decoupling due to their purpose and basic physics. Show them that way. Sometimes I've seen schematics with a bunch of decoupling caps off in a corner. Of course, these can be placed anywhere in the layout, but by placing them by their IC you at least show the intent of each cap. This makes it much easier to see that proper decoupling was at least thought about, more likely a mistake is caught in a design review, and more likely the cap actually ends up where intended when the layout is done.

Dots connect, crosses don't

Draw a dot at every junction. That's the convention. Don't be lazy. Any competent software will enforce this anyway, but surprisingly we still see schematics without junction dots here occasionally. It's a rule. We don't care whether you think it's silly or not. That's how it's done.

Sort of related, try to keep junctions to Ts, not 4-way crosses. This isn't as hard a rule, but stuff happens. With two lines crossing, one vertical the other horizontal, the only way to know whether they are connected is whether the little junction dot is present. In past days when schematics were routinely photocopied or otherwise optically reproduced, junction dots could disappear after a few generations, or could sometimes even appear at crosses when they weren't there originally. This is less important now that schematics are generally in a computer, but it's not a bad idea to be extra careful. The way to do that is to never have a 4-way junction.

If two lines cross, then they are never connected, even if after some reproduction or compression artifacts it looks like there maybe is a dot there. Ideally connections or crossovers would be unambiguous without junction dots, but in reality, you want as little chance of misunderstanding as possible. Make all junctions Ts with dots, and all crossing lines are therefore different nets without dots.

Look back and you can see the point of all these rules is to make it as easy as possible for someone else to understand the circuit from the schematic, and to maximize the chance that understanding is correct.

There is another human point to this too. A sloppy schematic shows lack of attention to detail and is irritating and insulting to anyone you ask to look at it. Think about it. It says to others "Your aggravation with this schematic isn't worth my time to clean it up" which is basically saying "I'm more important than you are". That's not a smart thing to say in many cases, like when you are asking for free help here, showing your schematic to a customer, teacher, etc.

Neatness and presentation count. A lot. You are judged by your presentation quality every time you present something, whether you think that's how it should be or not. In most cases, people won't bother to tell you either. They'll just go on to answer a different question, not look for some good points that might make the grade one notch higher, or hire someone else, etc. When you give someone a sloppy schematic (or any other sloppy work from you), the first thing they're going to think is "What a jerk". Everything else they think of you and your work will be colored by that initial impression. Don't be that loser.

Find Free Public Data Sets for Your Data Science Project

Completing your first data science project is a major milestone on the road to becoming a data scientist. It helps to both reinforce your skills and provide something you can discuss during the interview process. It's also an intimidating process. The first step is to find an appropriate, interesting dataset. You should decide how large and how messy a dataset you want to work with; while cleaning data is an integral part of data science, you may want to start with a clean dataset for your first project so that you can focus on the analysis rather than on cleaning the data.

Based on the learnings from our Introduction to Data Science Course and the Data Science Career Track, we’ve selected datasets of varying types and complexity that we think work well for first projects (some of them work for research projects as well!). These datasets cover a variety of sources: demographic data, economic data, text data, and corporate data.

Ever wonder what a data scientist really does? Check out Springboard’s comprehensive guide to data science. We’ll teach you everything you need to know about becoming a data scientist, from what to study to essential skills, salary guide, and more!

1. United States Census Data

The U.S. Census Bureau publishes reams of demographic data at the state, city, and even zip code level. It is a fantastic dataset for students interested in creating geographic data visualizations and can be accessed on the Census Bureau website. Alternatively, the data can be accessed via an API; one convenient way to use that API is through the choroplethr package. In general, this data is very clean, comprehensive, and nuanced, and a good choice for data visualization projects as it does not require manual cleaning.

2. FBI Crime Data

The FBI crime data is fascinating and one of the most interesting data sets on this list. If you're interested in analyzing time series data, you can use it to chart changes in crime rates at the national level over a 20-year period. Alternatively, you can look at the data geographically.

3. CDC Cause of Death

The Centers for Disease Control and Prevention maintains a database on cause of death. The data can be segmented in almost every way imaginable: age, race, year, and so on. Since this is such a massive data set, it's good to use for data processing projects.

4. Medicare Hospital Quality

The Centers for Medicare & Medicaid Services maintains a database on quality of care at more than 4,000 Medicare-certified hospitals across the U.S., providing for interesting comparisons. Since this data will be spread over multiple files and might take a bit of research to fully understand, this could be a good data cleaning project.

5. SEER Cancer Incidence

The U.S. government also has data about cancer incidence, again segmented by age, race, gender, year, and other factors. It comes from the National Cancer Institute's Surveillance, Epidemiology, and End Results Program. The data goes back to 1975 and has 18 databases, so you'll have plenty of options for analysis.

6. Bureau of Labor Statistics

Many important economic indicators for the United States (like unemployment and inflation) can be found on the Bureau of Labor Statistics website. Most of the data can be segmented both by time and by geography. This large dataset can be used for data processing and data visualization projects.

7. Bureau of Economic Analysis

The Bureau of Economic Analysis also has national and regional economic data, including gross domestic product and exchange rates. There’s a huge range in the different groups of data found here—you can browse by place, economic accounts, and topics—and these groups are organized into even smaller subsets throughout.

8. IMF Economic Data

For access to global financial statistics and other data, check out the International Monetary Fund’s website. There are a few different sets here, so you can use them for a wide range of projects like visualization or even cleaning.

9. Dow Jones Weekly Returns

Predicting stock prices is a major application of data analysis and machine learning. One relevant dataset to explore is the weekly returns of the Dow Jones Index from the Center for Machine Learning and Intelligent Systems at the University of California, Irvine. This is one of the sets specially made for machine learning projects.


10. UK Government Data

The British government's official data portal offers access to tens of thousands of datasets on topics such as crime, education, transportation, and health. Since this is an open data source with millions of entries, you'll be able to practice data cleaning across different groupings.

11. Enron Emails

After the collapse of Enron, a free dataset of roughly 500,000 emails with message text and metadata was released. The dataset is now famous and provides an excellent testing ground for text-related analysis. You can also explore other research uses of this dataset through its page.

12. Google Books Ngrams

If you’re interested in truly massive data, the Ngram viewer dataset counts the frequency of words and phrases by year across a huge number of text sources. The resulting file is 2.2 TB! While this might be difficult to use for a visualization project, it’s an excellent dataset for cleaning as it’s nuanced and will require additional research.


13. UNICEF

If data about the lives of children around the world is of interest, UNICEF is the most credible source. The organization's public data sets touch upon nutrition, immunization, and education, among others, making for a great resource for visualization projects.

14. Reddit Comments

Reddit released a really interesting dataset of every comment that has ever been made on the site. It's over a terabyte of data uncompressed, so if you want a smaller dataset to work with, Kaggle has hosted the comments from May 2015 on its site.

15. Wikipedia

Wikipedia provides instructions for downloading the text of English-language articles, in addition to other projects from the Wikimedia Foundation. The Wikipedia Database Download is available for mirroring and personal use and even has its own open-source application that you can use to download the entirety of Wikipedia to your computer, leaving you with limitless options for processing and cleaning projects.

16. Lending Club

Lending Club provides data about loan applications it has rejected as well as the performance of loans that it has issued. The free dataset lends itself both to categorization techniques (will a given loan default) as well as regressions (how much will be paid back on a given loan).

17. Walmart

Walmart has released historical sales data for 45 stores located in different regions across the United States. This offers a huge set of data to read and analyze, and many different questions to ask about it—making for a solid resource for data processing projects.

18. Airbnb

Inside Airbnb offers different datasets related to Airbnb listings in dozens of cities around the world. This dataset, given its specificity to the travel industry, is great for practicing your visualization skills.

19. Yelp

Yelp maintains a free dataset for use in personal, educational, and academic purposes. It includes 6 million reviews spanning 189,000 businesses in 10 metropolitan areas. Students are welcome to participate in Yelp’s dataset challenge, giving you quite a few options and an additional incentive for various types of data projects.

20. Google Trends Data

Google Trends is one of the most interesting datasets to analyze. While we're using "e-learning" in this example, you can explore different search terms and go as far back as 2004. All you have to do is download the dataset as a CSV file to analyze the data outside of the Google Trends webpage. You can download data on interest levels for a given search term, interest by location, related topics, categories, search types (video, images, etc.), and more! Google also lists a large collection of publicly available datasets on the Google Public Data Explorer. Make sure to check it out!
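Once a Trends export is saved as CSV, even the standard library is enough for a first pass. The layout below (a Month column plus one interest column) is a simplified assumption about the export format:

```python
import csv
import io

# Inline stand-in for a downloaded Trends-style CSV file.
SAMPLE = """Month,e-learning
2004-01,34
2004-02,40
2020-04,100
"""

def peak_month(csv_text, column):
    """Return the (month, value) row where search interest peaked."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    best = max(rows, key=lambda r: int(r[column]))
    return best["Month"], int(best[column])
```

For a real export, replace SAMPLE with the contents of the downloaded file.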

21. World Trade Organization

For students looking to learn through analysis, the World Trade Organization offers many datasets available for download that give insight into trade flows and predictions. Those with a knack for business insights will particularly appreciate this dataset, as it provides tons of opportunities not only to get into data science but also to deepen your understanding of the trading industry.

22. International Monetary Fund

This site has several free Excel datasets for download on key economic indicators, from Gross Domestic Product (GDP) to inflation. Taking the data from multiple files and condensing it for clarity and patterns is an excellent (and satisfying!) way to practice data cleaning.

23. U.S Energy Information Administration Open Data

This source has free and open data that is available in the bulk file, in Excel via the add-in, in Google Sheets via an add-on, and via widgets that embed interactive data visualizations of EIA data on any website. The website also notes that the EIA data is available in machine-readable formats, making it a great resource for machine learning projects.

24. TensorFlow Image Dataset: CelebA

For practice with machine learning, you'll need a specialized dataset such as those from TensorFlow. The TensorFlow library includes all sorts of tools, models, and machine learning guides along with its datasets. CelebA is extremely large, publicly available online, and contains over 200,000 celebrity images.

25. TensorFlow Text Dataset

Another TensorFlow set is C4: Common Crawl's Web Crawl Corpus. Available in 40+ languages, this open-source repository of web page data spans seven years, making it an excellent resource for machine learning dataset practice.

26. Our World In Data

Our World In Data is an interesting case study in open data. Not only can you find the underlying public datasets, but visualizations that slice the data are already presented. The site mainly deals with large-scale country-by-country comparisons on important statistical trends, from the rate of literacy to economic progress.

27. Crypto Data Download

Do you want some insight into the emergence of cryptocurrencies? Cryptodatadownload offers free public data sets of cryptocurrency exchanges and historical data that tracks the exchanges and prices of cryptocurrencies. Use it to do historical analyses or try to piece together if you can predict the madness.

28. Kaggle Data

Kaggle datasets are an aggregation of user-submitted and curated datasets. It's a bit like Reddit for datasets, with rich tooling to get started, commenting and upvoting functionality, as well as a view of which projects are already being worked on in Kaggle. A great all-around resource for a variety of open datasets across many domains.

29. Github Collection (Open Data)

GitHub is the central hub of open data and open-source code. With different open datasets that are hosted on GitHub itself (including data on every member of Congress from 1789 onwards and data on food inspections in Chicago), this collection lets you get familiar with Github and the vast amount of open data that resides on it.

30. Github (Awesome Public Data sets)

The Awesome collection of repositories on Github is a user-contributed collection of resources. In this case, the repository contains a variety of open data sources categorized across different domains. Use this resource to find different open datasets—and contribute back to it if you can.

31. Microsoft Azure Open Datasets

Microsoft Azure is the cloud solution provided by Microsoft: they have a variety of open public data sets that are connected to their Azure services. You can access featured datasets on everything from weather to satellite imagery.

32. Google BigQuery Datasets

Google BigQuery is Google's cloud solution for processing large datasets in a SQL-like manner. You can preview these very large public datasets via the subreddit wiki dedicated to BigQuery, with everything from very rich data from Wikipedia to datasets dedicated to cancer genomics.

33. SafeGraph Data

SafeGraph is a popular source for all things location data. While their data is not free to everyone, academics can download the data for free for locations in the U.S., Canada, and the UK via the SafeGraph Shop.

This data is great for economists, social scientists, public health researchers, and anyone who is interested in knowing where a location is and how people move between these locations. It seems to be popular since SafeGraph data has been used in over 600 academic papers.

Is data science the right career for you?

Springboard offers a comprehensive data science bootcamp. You’ll work with a one-on-one mentor to learn about data science, data wrangling, machine learning, and Python—and finish it all off with a portfolio-worthy capstone project.

Not quite ready to dive into a data science bootcamp?

Springboard now offers a Data Science Prep Course, where you can learn the foundational coding and statistics skills needed to start your career in data science.

PTV Vissim - Frequently Asked Questions (FAQs)

1) Register PTV Vissim as COM server (try with administrator credentials):
Help > Register COM Server
2) Check the used class ID to match the correct release and architecture:
c:\Users\Public\Documents\PTV Vision\PTV Vissim 10\Examples Training\COM\Basic Commands\COM Basic Commands.bas
3) Install the current service pack.
4) Use early binding ('Dim Vissim As New VISSIMLIB.Vissim') instead of late binding ('CreateObject'), with a reference set to VissimXXX.exe (Excel VBA editor: Extras > References).
Example test scripts:

I want to define an external driver model. I have generated the drivermodel.dll but am getting the error 'Loading DriverModel dll 'C:\...\DriverModel.dll' failed. Windows error message: %1 is not a valid Win32 application.' What is the problem?

There is likely a 32/64-bit mismatch. After compiling the DLL in the bit-version matching that of PTV Vissim, you should no longer receive this error.

A script started over COM has crashed with an error message:'Could not start the script component'.

- Make sure Python 2.7 is installed in C:\Python27 or C:\Program Files\Python27, matching the architecture (32/64 bit) of PTV Vissim.
- Make sure this path is added to the Windows 'Environment Variables'.
- Make sure to install PyWin in the architecture (32/64 bit) matching the architecture of Python.
- Install the PTV Vision Python Setup from:
- Register PTV Vissim as COM server (try with administrator credentials): Help > Register COM Server
- Install the current service pack.
Reference: ch. 2: 'Executing Scripts from within Vissim' in:
c:\Program Files\PTV Vision\PTV Vissim 10\Doc\Eng\Vissim 10 - COM Intro.pdf
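For reference, a COM connection from Python is usually made via pywin32. The sketch below assumes the 'Vissim.Vissim' ProgID registered by the Help > Register COM Server step; the import is deferred so the function can be defined on machines without pywin32 installed:

```python
def connect_vissim(network_path=None):
    """Attach to a registered PTV Vissim COM server.

    Sketch only: assumes pywin32 is installed in an architecture matching
    PTV Vissim, and that Vissim has been registered as a COM server.
    """
    import win32com.client  # deferred: only needed when actually connecting

    vissim = win32com.client.Dispatch("Vissim.Vissim")
    if network_path is not None:
        vissim.LoadNet(network_path)
    return vissim
```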

How can I debug an event based script?

- PTV Visum was unable to find a suitable Python 2.7/3.7 installation.
- Python 2.7 and/or 3.7 is listed as 'not available' in the 'User Preferences' menu.
- AddIns and Python scripts do not work.

Attempting to open Scripts > Python Console results in a crash or an error message: 'No suitable installation of Python 3.7 was found'.
AddIns like 'Calculate Matrix' cause an error message or a crash.

Data model

Are there any default model parameter values provided with PTV Vissim?

Comprehensive default values are supplied for vehicle types and classes as well as for driving and lane change behavior. The parameters of motorized traffic originate from research work done by the University of Karlsruhe.

What are the algorithms that the model uses for vehicle following and lane change?

The model uses the psycho-physical car-following model developed by Wiedemann. It uses vehicle-driver-units that incorporate several stochastic variations. Thus, there are virtually no two vehicles that have the same driving behavior.
For lane change, a related rule-based model is used that was originally designed by Sparmann.
We continuously develop both models to ensure up-to-date driving behavior in PTV Vissim.

Dynamic assignment

Before running a simulation I get a warning message that the abstract graph in the path file is not identical with the current one. What does it mean?

If you have modified your Vissim network and still want to continue to use an old path file, the option ''Check Edges'' in the Dynamic Assignment window must be enabled.
Then the following errors might occur:

'The abstract graph (edge structure) in file *.WEG isn't identical with the current one. Error messages are written to the Messages window.'
'Old abstract graph in *.WEG: The edge 726 cannot be mapped to any edge in the current network graph.'
'One or more paths in file *.WEG are not used anymore. Error messages are written to the Messages window.'

'No edge turn in the node 10 from node 9 to node 14 could be found.'

Either one of these errors means that the old structure (of the path file) cannot be safely matched with the modified Vissim network. You either need to modify your network or you need to delete the path file and let Vissim build a new one.

After achieving convergence, what settings must I adjust to maintain the path volumes and run simulations for evaluation?

After achieving convergence, if you want to continue using the same paths, change the following settings in the Dynamic Assignment: Parameters window:
- In the Files tab, deselect 'Store paths (and volumes)'.
- In the Search tab, deselect 'Search new paths'.
- In the Choice tab, under Path choice model, select 'Use volume (old)'.
- In the Convergence tab, deselect all options under 'Convergence condition'.

I am using dynamic assignment for a network and have performed several runs, but the model never achieves convergence. Do you have any suggestions?

Here are two general tips for facilitating convergence.

1) Click Traffic > Dynamic Assignment > Parameters.
- On the Files tab, scale the total volume to e.g. 20%.
- On the Choice tab, activate 'Avoid long detours' and 'Limit number of paths', and for the max. number of paths per parking lot relation, decrease the default value of 999 to a value that makes sense for the spatial scope of your network.

2) Click Simulation > Parameters.
- The random seed increment should be set to 0 for convergence runs. If the model has already converged and only evaluations are performed, the random seed increment should be > 0.
- Change the dynamic assignment volume increment to e.g. 5%. For more information on the dynamic assignment volume increment, click Help > PTV Vissim Help and refer to the section 'Defining simulation parameters'.

Other useful tips:
- Select only one of the three convergence criteria; preferably, choose 'Travel time on paths'.
- If you select multiple convergence criteria, there is a chance that convergence will never be reached because the combined requirements become too strict.
- If you select the convergence criterion 'Volume on edges', there is a chance that convergence will never be reached because the absolute number of vehicles on the highest-volume links fluctuates more than on links with less volume, although the percentage deviation is the same.
- Use longer evaluation intervals (>= 15 minutes): A path is converged if it converges in all time intervals.
- The default setting for convergence criterion is 95%, but depending on the network and especially the frequency of low path volumes (per interval), lower values may be necessary to meet the condition. Longer time intervals usually allow for higher percentages.
- Heavily overloaded networks typically never converge. Therefore, it may be reasonable to assume that the paths and distribution achieved from convergence runs at 70% traffic volume is similar to those at 100% traffic volume. Rather than assigning 100% of demand and never achieving convergence in a congested network, the paths could be developed using incremental increases in demand up to e.g. 70%. At this point, the demand could be fixed at 70% for convergence runs. Assuming that convergence is achieved and the paths are reviewed for suitability, the demand could be increased back up to 100% for evaluation runs.

After a simulation run, I get warnings that some paths could not be found. Does it mean that during the simulation not all vehicles were placed in the network?

If you receive the following message:
'No path from parking lot <PARKING LOT NO> to parking lot <PARKING LOT NO> can be used because of the following connector closures: <LINK NO>, for the following vehicle types: <VEHICLE TYPE NO>', it is just a warning.
It does not mean that vehicles are not placed into the network unless there is another message that states:
'No parking lot found from zone <ZONE NO> with at least one path to destination zone <ZONE NO>. Vehicles of type <VEHICLE TYPE NO> from the matrix <MATRIX NO> could not be placed in the network.'

Are matrix correction tools available in PTV Vissim?

There is a matrix correction tool in PTV Vissim. For more information, click Help > PTV Vissim Help and refer to the sections 'Correcting demand matrices' and 'Defining and performing Matrix correction'.
It is also possible to export the PTV Vissim network to PTV Visum, which provides a more sophisticated handling of matrices and traffic demand.

I have several cases where zones need to load at two different parking lots (zone connectors). How can I do that?

You may assign the same zone number to more than one parking lot and set the proportion factor within each one of them.

When initializing the simulation, PTV Vissim freezes on 'Build node structure. Node n'.

This problem usually occurs when nodes are missing in the network. In such cases, there are too many possible paths/edges, and PTV Vissim cannot build the abstract network graph. Add the necessary nodes; one node per intersection generally works.
For more information on how to properly code nodes for dynamic assignment, in PTV Vissim, click Help > PTV Vissim Help and refer to the section 'Building an Abstract Network Graph' and its subsections.

I have a simulation period of about 24 hours (86,400 seconds); after 4 hours no more vehicles are generated.

The problem occurs because traffic from matrices requires that the destination parking lot is still open at the time of departure for the whole simulation period (not only for the remaining simulation period). That means an opening time of 99,999 seconds is not enough for a simulation period of 86,400 simulation seconds with a departure time of 13,599.1.

Please define an opening time of 999,999 seconds for all parking lots; then it will work.

Why are some of the trips of the OD matrix not assigned?

Please check the following points:
- Is the evaluation period long enough to catch all of them? If the simulation runs for e.g. 3600 seconds and you aggregate at the end of that same period, there will almost certainly be vehicles that have not yet arrived at their destination parking lot.
- Are there errors in the network topology reported to the messages file? You might find out that some vehicles cannot reach their destination because there aren't any possible routes between their origin and their destination.
- Make sure that after a modification to the network topology you delete the *.bew and *.weg files before running the simulation again.

Is the cost of a route dependent only on travel time?

A generalized cost function is used instead of merely travel time. The link cost is the weighted sum of travel time, distance and link specific costs. The user has control over the weights that are used for each particular vehicle type.

What is the unit of the generalized cost function? How can I determine the coefficients correctly?

There is no implicit unit for the generalized cost, so you can freely choose it. You only need to consider that travel time is measured in seconds, distance is measured in meters and that the link cost has no implicit unit. So you need to choose the coefficients both as a weight and conversion factor to convert from one unit to your chosen unit.

Example: You choose $ as cost unit.
Now you need to determine the worth of one second of travel time in $ and also the worth of 1 meter in $ (if you want to include the distance in your calculation).
If you use any link cost, you would define these as cost in $.
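As a worked sketch of that weighting (coefficient values here are made up for illustration; choose your own conversion factors as described above):

```python
# Generalized link cost = w_time * travel_time + w_dist * distance + link_cost.
# Illustrative coefficients: 0.005 $/s (i.e. 18 $/h of travel time) and
# 0.0001 $/m (0.10 $/km); link-specific cost is assumed already entered in $.

def generalized_cost(travel_time_s, distance_m, link_cost_dollars,
                     w_time=0.005, w_dist=0.0001):
    return w_time * travel_time_s + w_dist * distance_m + link_cost_dollars

# A link taking 120 s over 1500 m, with a 0.50 $ toll:
cost = generalized_cost(120.0, 1500.0, 0.50)  # 0.60 + 0.15 + 0.50 = 1.25 $
```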

I have modeled two parking lots as zone connectors at different locations and assigned both to the same zone. If the relative flow (rel. flow) for both parking lots is 1.0, how does PTV Vissim divide the traffic originating from this zone? If there are 100 vehicles originating from this zone, will each parking lot get 100 vehicles, or does it split the volume in half?

The traffic volume originating from this zone will be split evenly (on average) between the two parking lots.
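As a toy sketch of that proportioning (illustrative Python, not PTV code; the relative flows determine the expected split, while the simulation itself draws stochastically):

```python
def split_demand(total_vehicles, rel_flows):
    """Split a zone's demand across its parking lots in proportion to
    their relative flows (expected values over many runs)."""
    total_flow = sum(rel_flows)
    return [total_vehicles * f / total_flow for f in rel_flows]

even = split_demand(100, [1.0, 1.0])    # equal rel. flows -> 50/50 split
uneven = split_demand(100, [3.0, 1.0])  # 3:1 rel. flows -> 75/25 split
```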

I am using dynamic assignment for a network and output the convergence (*.cva) file after each convergence run. In this *.cva file, I see ShrConvPathTT and ShrConvEdgeTT. How are these calculated?

ShrConvPathTT is the share of the paths in percent that has met the convergence criterion if the convergence criterion 'Travel time on paths' has been selected. The percentage weighted by volume is specified in parentheses: Total volume (across all time intervals) of all converged paths / total volume of all paths used.

ShrConvEdgeTT is the share of the edges in percent that has met the convergence criterion if the convergence criterion 'Travel time of edges' has been selected. The percentage weighted by volume is specified in parentheses: Total volume (across all time intervals) of all converged edges / total volume of all edges used.

Note: A path is converged if the convergence criterion is met in all time intervals. From the *.cva file, you cannot recognize if the paths that converged in one time interval are the same that converged in another. However, you can open the Paths list in PTV Vissim (Traffic > Dynamic Assignment > Paths), and if you add the attribute Converged (Conv) to this list, then you can see if the path is converged or not.

For more information, click Help > PTV Vissim Help and refer to the section 'Saving data about the convergence of the dynamic assignment to a file'.

How can I view or edit the routes created by using dynamic assignment?

1) Open the network and select 'Parking lot' from the toolbar.
2) Choose 'Edit - Auto Route Selection. ' from the menu.
3) Select the desired zone combination. All routes found so far are shown in the list.
4) Choose an item in the list box to show the corresponding route in the network.


How could I choose to assign a travel time section in PTV Vissim for particular vehicle classes?

The data is always collected for all vehicle types. If you need the results for a specific vehicle class, you have to select it separately: Evaluation > Configuration > tab 'Result Attributes' > 'Additionally collect data for these classes'.

Can I determine the level of service (LOS) of an intersection in PTV Vissim?

You can evaluate the LOS for intersections using node evaluation.
The LOS in PTV Vissim is comparable to the LOS defined in the American Highway Capacity Manual (HCM) of 2010.
You will find an example in the training directory: Help > Examples > Open Training Directory > Evaluation > LOS With UD Thresholds.inpx.

How can I evaluate the relative delays on links (colored display on links)?

Lists > Results > Link Segments:
- Delay (relative)
- All types

Links > Edit graphic parameters:
- Activate 'use color scheme'
- Classification by 'Segments'
- Choose attribute 'Delay (relative)', e.g. Average x Total x All types
- Use the predefined color scheme 'delay relative'
- Adapt the lower or upper bounds if necessary

Do I have to consider a warm-up period for my simulation?

PTV Vissim models should include a warm-up period in addition to the analysis period in order to get realistic results. A simulation always starts with an empty network and therefore must be pre-loaded for the analysis. The duration of the warm-up period depends on the network size and level of congestion. Typically, a warm-up period of 15 to 30 minutes is enough. This means for a one-hour analysis period, the total simulation runtime would be 1.5 hours minimum. Make sure that all evaluations start after the warm-up period: change the attribute 'From-time' in the Evaluation Configuration window.

I selected 'Aggregated values - Speed' in the display settings, but the links are still not colored. Why?

The aggregated values are displayed only on links where 'Link Evaluation' is enabled. If you would like it to be enabled for the entire network, go to multi-select mode, select the entire network and enable 'Link Evaluation'.

I made two simulation runs of a saturated network with different random seeds and found that the results (e.g. queue length) vary quite a lot. Why does that happen?

In a saturated network, minor changes may lead to big consequences. For instance, due to a slight variation of green time, the number of vehicles passing through may be one vehicle less per cycle. This vehicle might be the critical one which leads to a queue that builds up continuously during the simulation whereas in the other case, the green time was just sufficient to accommodate the entire demand. These effects can also be seen in the field, where normal day-to-day changes may lead to different traffic situations.
A minor change (e.g. in lane change) can also lead to different results within the typical statistical boundaries. Generally speaking, a network which is not operating at capacity will react less to changes of the random seed.

The functions Simulation > Save Snapshot and Simulation > Load Snapshot are missing.

This feature is not available anymore.
Workaround: Rerun the simulation runs from the start, using multiple instances of PTV Vissim in parallel.

What is the effect of changing the random seed? Which seed is closest to 'real life'?

The random seed influences many aspects of the model. It merely changes the start values of the random number generators used internally in the model. These values influence the arrival times of vehicles in the network, the stochastic variability of their driving behavior, and also the selection of a certain distribution value wherever distributions are used (e.g. dwell times, speeds, colors etc.). There is no seed that replicates 'real life' better than another. It's more comparable to the daily changes of the traffic patterns at the same location.

How can the average speed of a car along a route be evaluated?

You can determine the mean speed with a travel time measurement: dividing the section length by the average travel time gives the average speed. However, this only works if only one shortest route is used from origin to destination.
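The arithmetic, sketched for a hypothetical 1.5 km measurement section:

```python
length_m = 1500.0          # length of the travel time section (example value)
avg_travel_time_s = 120.0  # mean travel time from the evaluation (example value)

avg_speed_ms = length_m / avg_travel_time_s  # 12.5 m/s
avg_speed_kmh = avg_speed_ms * 3.6           # ~45.0 km/h
```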

What is the minimum number of runs for the simulation experiment to be considered statistically significant?

There is no general rule to this as it depends on the PTV Vissim application:

- In a 10-km stretch of highway with low traffic volume simulated for one hour, travel times will be very stable and depend only on the distribution of desired speeds. Three to five runs should be sufficient.
- In an urban network consisting of some complex junctions with traffic actuated signal control, possibly including public transport priority, a lot of 'random' events influence the signal control (e.g. two trams approaching at the same time). In this case, more runs are needed to get a significant result, especially if one or more junctions operate close to capacity.

Formally, you have to estimate the variance coefficient of the measured value, e.g. travel time. You can do this by running the simulation several times with different seeds and computing the variance. Then you define a confidence level, say 95%, and a tolerance interval for the result, say 10%. Then you need n simulation runs to be able to say that 'with probability 95% the real mean value of travel times lies within the interval measured value +/- 10%'. n is given by the formula n = t² * v² / e², where t is the value from the t-distribution for the given confidence level, v is the variance coefficient (standard deviation / mean) of the measured value, and e is the tolerance (in this example: 10%).
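The formula can be evaluated directly. A sketch with made-up example values (for a moderate number of runs, t from the t-distribution is close to the normal value of roughly 1.96 at the 95% confidence level):

```python
import math

def required_runs(t, v, e):
    """n = t^2 * v^2 / e^2, rounded up to whole simulation runs."""
    return math.ceil(t ** 2 * v ** 2 / e ** 2)

# t ~ 1.96 (95% confidence), variance coefficient 20%, tolerance 10%:
n = required_runs(1.96, 0.20, 0.10)  # 3.8416 * 0.04 / 0.01 = 15.37 -> 16 runs
```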

The measurement of vehicle travel times results in a warning: 'Vehicle .. already passed start of travel time section .. at simulation time ..'

You receive this warning when a vehicle passes the start of a travel time measurement twice.
This can happen if you have intersection-by-intersection static routing and/or U-turns. Then it is probable (given the random assignment of turning movements) that some vehicles are routed around the block and pass the same travel time measurement twice. There is no way to prevent this behavior other than defining longer static routes through the network or using Dynamic assignment.

Is it possible that PTV Vissim produces different results on different computers?

Currently we are not aware of different results on different hardware. If you use the same PTV Vissim installation with the same simulation files on different computers and still get different results, please report these effects in detail to PTV Support.


The answer is there are 3 ways to format a string: %, .format, and f-strings.
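All three produce the same string here:

```python
name, pi = "world", 3.14159

a = "Hello %s, pi is %.2f" % (name, pi)        # printf-style % formatting
b = "Hello {}, pi is {:.2f}".format(name, pi)  # str.format
c = f"Hello {name}, pi is {pi:.2f}"            # f-string (Python 3.6+)

assert a == b == c == "Hello world, pi is 3.14"
```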

You could sort of pull a retcon and say that .format() is the dynamic form of f'' literals.

Having used the equivalent feature in JavaScript, this stuff is super handy. It also makes "printf" a lot nicer:

% had real issues, so I'm quite glad .format() came along.

Last time I played with it, it didn't like Python 3 or ipaddress, though. I see a recent upload, so maybe they addressed that.

The way 2.x -> 3.x was handled is/was/will be an absolute disaster. Upgrading simple scripts is a non-issue. Larger projects seem to always be a horrible pain.

I've upgraded projects, but this "you refuse to upgrade" business bothers me. There is a reason people haven't: The way Python handled all of this is horrid.

It's often easier to just move to another language.

By count of lines, 0.04% of Python code I've published exists only to deal with 2 vs. 3 issues. Of that code, 34% consists of importing and applying a decorator from Django that handles setting up either __str__() or __unicode__() on a Django model as appropriate. Tellingly, there are exactly two lines I could find that aren't that decorator and which deal with a str/unicode issue.

And that's in code which doesn't just run on Python 3, it supports 2 and 3 in the same codebase.

It actually became usable somewhere around 3.2/3.3, released in 2011/2012.

- 2008-2012: Becoming actually usable.

- 2012-2016: Libraries porting over

- 2016-2020: Applications porting over (more work needs to be done here. See )

Things I still have in Python 2.7:

- Code that runs on low-end shared hosting. There's a Python 3, but it's 3.2, the least-compatible version. No "six", and u'xyz' isn't allowed.

- ROS, the Robot Operating System. Python 3 support is coming, but it's not really here yet.

I converted over the dedicated servers and some IoT stuff a year ago.

The types stuff probably should have been called Python 4, not Python 3.6. Those are major language changes. The optional-types-that-are-not-checked thing seems a bad idea. I can see having checked types that allows some important optimizations. When you know something is a machine type, such as an int or float, you don't have to box it. But the feature of type declarations without checking is a foot gun.
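A quick illustration of the footgun: annotations are recorded as metadata but never enforced at runtime:

```python
def add(a: int, b: int) -> int:
    return a + b

# No runtime error despite violating every annotation:
result = add("foo", "bar")   # string concatenation -> "foobar"

# The hints are just metadata, visible via __annotations__:
hints = add.__annotations__  # {'a': int, 'b': int, 'return': int}
```

An external checker such as mypy would flag the call; the interpreter itself never looks at the hints.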

I wouldn't say that's an accurate statement of what Guido does every day :)

(nb: I work at Dropbox, and have worked with Guido at Dropbox.)

But when my last name is used alone to refer to me, it is capitalized, for example: "As usual, Van Rossum was right."

edit: I don't mean this as a knock on py3. All I'm saying is that my situation and uses for python allow me to be apathetic towards 2 vs. 3.

I would have made UTF-8 the internal representation, and generated an array of subscript indices only when someone random-accessed a string. If the string is being processed sequentially, as with a plain 'for ch in s:' loop, you don't need an index array. You don't need them for list comprehensions or regular expressions. One could even have opaque string indices, returned by search methods, which don't require an index array unless forcibly converted to an integer. Some special cases, such as "s[-1]", don't really need an index array either.

Indexing a string in Python 3 returns glyphs (which are strings), not bytes. Not graphemes, though: if you index through a Python 3 string with emoji that have skin color modifications, you'll get the emoji glyph and the skin color modifier as separate items.
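Concretely, with a thumbs-up emoji plus a skin tone modifier:

```python
s = "\U0001F44D\U0001F3FD"   # thumbs-up + medium skin tone modifier

assert len(s) == 2            # two code points, one on-screen grapheme
assert s[0] == "\U0001F44D"   # indexing yields the base emoji...
assert s[1] == "\U0001F3FD"   # ...and the modifier as a separate item
assert isinstance(s[0], str)  # each item is a 1-character str, not an int
```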

Python 3 also has a type "bytes", which can't quite decide whether it's a string type or an array of integers. Print a "bytes" value, and it's printed as a string, with a b'' in front of it, not as an array of integers. But an element of a "bytes" array is an int, not a bytes type. This is for backwards compatibility: it's roughly compatible with the legacy "str" from Python 2.
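The split personality in three lines:

```python
b = b"hi"

assert repr(b) == "b'hi'"                     # prints like a string, b'' prefix
assert b[0] == 104 and isinstance(b[0], int)  # but indexing yields ints
assert b[0:1] == b"h"                         # slicing, by contrast, yields bytes
```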

Rust struggles with this. Rust tries to prevent you from ever getting an invalid UTF-8 string. The solution used involves re-checking for UTF-8 validity a lot, or bypassing it with unsafe code. This gets messy.

You only need to check once, at the boundary of when you're converting from something that may or may not be UTF-8 to a UTF-8 string.
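In Python, that one-time boundary check is simply a decode: invalid bytes raise an exception and never become a str, so anything past the boundary is known-valid:

```python
good = b"caf\xc3\xa9"   # valid UTF-8 encoding of "café"
bad = b"\xff\xfe caf"   # 0xff can never start a UTF-8 sequence

text = good.decode("utf-8")  # validity checked once, here at the boundary

try:
    bad.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True           # invalid input is refused at the boundary
```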

I am pretty much in the same situation and in hindsight I kinda wish I upgraded a little sooner. Fixing print brackets was the biggest (not that big) task, and from there on it was all python3 sugar for me :)

If you are maintaining a large or important codebase rather than working as a hobbyist or independent contractor, then there is nothing wrong with using older languages/platforms, especially if they are stable and well tested. Avoiding surprises or breakage is generally a lot more important to you than getting a cool new list comprehension facility. Actually, stability of the language is something to be desired, as there is a cognitive cost to having different portions of the project use different language constructs.

In terms of python 2 not being supported in the future, if we are forced to stop using python 2, we'll probably need to switch the project over to Java. Not something I'm looking forward to, but at least the java community doesn't force developers to rewrite their source code when a new JVM comes out. There were some examples of older bytecode not working in newer JVMs, but as long as you had the original sources you could always compile to the newer bytecode. Personally, I've grabbed jars from 2005 and used them without any issue. My employer is mostly a java shop, and I already have to occasionally answer questions from other devs about why this project is in python. My answer has always been that python is a more productive language for this use case, and their retort is that it's not really enterprise ready -- meaning things like stability and support, so that while the language may be faster to initially develop in, the long term maintenance cost will be higher. The lack of respect for backwards compatibility as well as the hostility of the community to basic things like don't break working code is causing me to lose some conviction in my side of this argument. I imagine the same discussion is happening in businesses all over the country.

The question is not python 2 or python 3, but python 2 or move away from the language to one that understands my needs. This transition has created a real black eye for python and its role in the commercial space -- at least that's my impression.

This is just silly. It will take at least 100x more effort to rewrite the project in Java than it would to upgrade to Python 3.

> at least the java community doesn't force developers to rewrite their source code when a new JVM comes out

Yes, that was an unfortunate, one-time thing for Python that happened almost a decade ago.

> My employer is mostly a java shop ... and their retort is that [python is] not really enterprise ready

Yep. That sounds like the kind of nonsense people say in a Java shop.

Yes, but in return it will bring a 10x performance improvement in lots of areas, and the option of a whole lot more sturdy statically checked code.

mypy is a much better type system than Java, btw, if that's what you're after.

You say it as if that would be a strange position to be in, but it's a common situation. A lot of the time you start with the faster-to-prototype / more familiar language, and outgrow it.

That's what companies do after they grow so much that a language such as Python/Ruby/PHP is not doing it for them anymore, or won't be doing it soon given their growth trajectory.

And you don't have to be Twitter scale either.

Even if you have moderate growth, you might find that with a different language you can use a fraction of the servers, and thus drastically lower your operating expenses.

So, when some of those companies face the jump to Python 3 and the required rewriting, they often decide to bite the bullet and do a full rewrite in another language (like Java, but Golang is also getting many Python converts, including e.g. Dropbox IIRC) that will give them much more bang for their buck.

There are downsides though: type stubs are of. varying. quality, there are ugly hacks needed to resolve cyclic imports and the type annotation syntax gets yucky in places, even in 3.6.

I've been using mypy seriously for about a year and it's progressing very well. I see very good ROI even with the uglification of the code.

Haven't checked, but I find it absolutely possible that there's one (or more).

I know that a sibling comment mentions the ridiculousness of the situation, but a forced deprecation of the Python 2 runtime (see also: Windows) is a billable, justifiable reason to do the work of refactoring parts of the system. I do not expect those billable hours to happen in the next decade, however.

Maybe one way of thinking about it is to take a hard drive full of data and randomly flip a few bits. Then ask what the effort is to find and fix all the errors, and whether this effort is a function of the number of bits flipped or the size of the drive. The people who don't understand/have sympathy with my concerns are saying -- it will only introduce a few breaking changes -- but I'm looking at the size of the project.

But this condescending attitude coming from some in the python community really isn't helping the language any. I am not saying that everyone has to share my concerns, but they certainly aren't "nonsense" or "ridiculous".

I'm sorry to be the one to break this to you friend, but you must be suffering from the Java shop version of Stockholm Syndrome.

If your code base is a big pile with no tests… well, you dug that grave.

Java contains a lot of ugly legacy bullshit because of that.

Joel Spolsky talks about this desire for artistic purity over useful backwards compatible cruft in a couple of nice essays:

I hear some people really, really prefer `print` as a statement, but I've never seen them in real life.

I really hope someone makes this work. I literally laughed and thought that it was a troll the first time I saw the future print import and that being used as a compelling reason that I should switch to Py3.

For those not familiar with Python 2's syntax: that trailing comma is significant. And it's pure syntax: it's not the creation of a tuple, either…

Whereas in the new print function it's a keyword argument:

is verbose. Not too verbose, but considering how often I write prints as temporary throwaway code, every little bit hurts…
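To make the contrast concrete, here is a small sketch; since the block has to be valid Python 3, the Python 2 statement forms appear only in comments:

```python
from io import StringIO

# Python 2 statement:  print x,          -> suppresses the trailing newline
# Python 3 function:   print(x, end=" ") -> the same, via a keyword argument

buf = StringIO()
print("a", end=" ", file=buf)  # roughly what `print "a",` produced
print("b", file=buf)           # default end="\n"
assert buf.getvalue() == "a b\n"
```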

* I work a lot with low-level data and I prefer strings to be an array of bytes. The forced Unicode support is stupid for all my use cases. Also, Unicode support in 3.x is rather flawed (see the "the-truth-about-unico…" article).

* Major APIs now return iterators or views instead of simple lists. This is rarely justified and introduces unnecessary complication in many cases. Everybody knows lists; why not keep it simple? I can't count how often I list() all the things just to get going, because I didn't bother to look up the API to check what types are returned.

* I have to convert a lot of clear-text data formats, and needing to use 'print(x, end=" ")' instead of a simple 'print x,' is really cumbersome. I think printing something is absolutely fundamental, and it is justified for "print" to be a statement.

* There is a distinct performance loss if you directly compare 3.x to 2.x at least for all my use cases.

* The sudden, harsh break of backwards compatibility was completely unnecessary and stupid. Why not introduce new features slowly and mark old features as deprecated, or allow specifying the version in a header? There are a lot of ways this could have been handled better.

* Python 3.x may be a marginally better (e.g. more consistent) language than 2.x. For me it only introduces inconveniences. It takes a huge amount of effort to port code from 2.x to 3.x (if you want to keep it clean and readable). The practical benefits are close to zero (and you even lose performance). This is why the community seems content to stay with 2.x for a very long time.

bytes and bytearray are still there … if you don't want to work with unicode data in a proper Unicode type, you don't have to. But it doesn't follow that the rest of us who do want a good type for text shouldn't be able to have a good type and tooling for that.

(If you mean that functions that take text require text as input now, well, yeah.)
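A minimal sketch of that split in Python 3: `bytes` for raw byte data (as the grandparent prefers), `str` for text, and an explicit encode/decode boundary between them:

```python
raw = b"caf\xc3\xa9"          # bytes: low-level byte data, UTF-8 encoded here
text = raw.decode("utf-8")    # the explicit boundary: bytes -> str
assert text == "café" and len(text) == 4   # 4 characters, not 5 bytes

assert text.encode("utf-8") == raw         # and back again, losslessly

buf = bytearray(b"\x00\x01")  # bytearray for mutable byte-level work
buf[0] = 0xFF
assert bytes(buf) == b"\xff\x01"
```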

> Also Unicode in 3.x support is rather flawed

The entirety of "Internal Representation" is out of date and no longer correct. For just about every other section, I believe libraries readily exist on PyPI to solve those problems. I do agree that it would be nice to have some of that closer to the standard library.

> I have to convert a lot of clear text data formats

I find it strange that you work with text formats but dislike having a proper text type. Even if your data were mostly or all ASCII, Python's text handling would still be mostly transparent, silently doing the right thing if non-ASCII were ever encountered.

> I think printing something is absolutely substantial and it is justified for "print" be a statement.

I'm going to disagree. The Python 2 print statement's special syntax just adds cognitive load, both for readers and for newcomers encountering it, that doesn't need to be there. Further, the trailing comma for "no EOL" is too subtle from a readability perspective.

> Major APIs now return iterators or views instead of simple lists. This is rarely justified and introduces unnecessary complication in many cases.

The old APIs forced materialization of an iterable into a list, and there are plenty of circumstances where that just isn't required; it only results in higher memory usage for that list. Wrapping in list() also makes it explicit where such materialization happens.

You can always get a list from an API function that uses generators. You can't go the other way.
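A short sketch of that asymmetry: Python 3's dict views and generator expressions stay lazy, and `list()` is the explicit, one-way materialization step:

```python
d = {"a": 1, "b": 2}
keys = d.keys()              # a lazy view in Python 3, not a list

d["c"] = 3
assert len(keys) == 3        # the view tracks the dict as it changes

snapshot = list(keys)        # explicit materialization into a list
assert snapshot == ["a", "b", "c"]

squares = (n * n for n in range(10**6))  # generator: nothing computed yet
assert next(squares) == 0                # values are produced on demand
```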


The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.

Humans are social beings. Humans form groups to exchange ideas, share and pool resources, find support from and interact with like-minded people, and so forth. The rise of information networks such as the Internet has facilitated the ability for people to connect and organize with others for many different and diverse purposes. Some examples include business, social, political, and academic purposes, and many others. There are groups discussing everything from military encryption to freshwater fossils. Typically, what a group has in common is some shared interest among the members.

It is, however, difficult and time-consuming to sort through the vast trove of information that is available on an information network in order to find things of interest. For example, companies are increasingly global, with employees now located in many different countries and cities. With such a diverse workforce, it can be difficult for a user, such as an employee, to find resources such as other employees, groups, documents, reports, findings, presentations, and so forth which may be of interest.

Accordingly, it is desirable to provide new and improved techniques to provide things that may be of interest to a user or a group of users.

BSBMKG501 Identify and Evaluate Marketing Opportunities Assignment Answers

Part A requires you to identify and evaluate marketing opportunities for your chosen organization or utilizing the BBQfun case study.

1. Identify two marketing opportunities for the organisation based on your chosen organisation's market and business needs in terms of:

a. comparative market information


What does it mean for BBQfun

high population growth of 5% per year

new homes and renovated homes growing from a base of 50,000 per year

BBQfun will witness an increase in demand if the population increases by 5% every year. The number of people residing in Queensland is also increasing, since as per the given information the number of homes is growing from a base of 50,000 per year. According to the information provided, unemployment is also falling and currently stands at 4.7%; with more people in work, more people will have the purchasing power to buy products from the market (Baker, 2014).

our immediate geographic target is the area of Brisbane with a population of 2,000,000

a 30 km geographic area is the average store market footprint

the total targeted population is estimated at 450,000.

In consideration of geographic factors, BBQfun's immediate target is Brisbane, which has a population of 2,000,000. They want to expand their market in Brisbane so as to increase their sales revenue. As per the market research conducted by BBQfun, the customer traffic for a store comes from within a geographical limit of 30 km. BBQfun aims to target a population of 450,000 in the given fiscal year (Armstrong et al., 2012).

high percentage of young professionals who work in the central business district

high percentage have completed undergraduate/postgraduate study

an average household income of over $70,000.

BBQfun's target audiences are young and middle-aged professionals and students aged between 20 and 50 years. BBQfun wants to open an outlet in the central business district and aims to target the professionals working in that area. According to their market research, they aim to target households which have an average income of over $70,000. Students who are pursuing either undergraduate or postgraduate courses are also a target audience for BBQfun (Solomon, 2014).

b. competitors' performance

What does it mean for BBQfun

Has a limited selection but significant depth. All Australian made. No significant marketing or promotion. The price point is high, but the quality of products is quite good. Not in south east Queensland. Considering e-commerce options. Considered a potential threat for entering the market through e-commerce because of its large distribution network.

The main competitor of BBQfun is The Yard. The Yard has not expanded itself geographically but has a good customer base in Brisbane. The prices of the products available at The Yard are comparatively on the higher side compared to its rivals but customers are ready to pay that price because of the quality it offers to them. They are known to provide good quality products in the market. They are even considering entering e-commerce market which can be a threat to the rest of the firms as The Yard has a good distribution network which will help them to expand in the e-commerce market and attract more consumers (Fifield, 2012). The advantage that BBQfun has over The Yard is that the latter is not located in south east Queensland as Queensland is the target market for BBQfun right now. The Yard does not advertise or promote itself because of which most people in Australia do not know about it (Shankar, Carpenter & Farley, 2012).

Broad range of outdoor lifestyle products including trinkets and furnishings. Lots of cheap imports. Concentrating on established markets. Strong in the replacements segment. One store in Brisbane. Mostly in Melbourne and Adelaide. Considering e-commerce options.

The other competitor of BBQfun is BBQ's R US, which offers a wide range of lifestyle and outdoor products like trinkets, furnishings, etc. It has stores in Brisbane, Melbourne and Adelaide, but just one store in Brisbane, which is an advantage to BBQfun as Brisbane is BBQfun's target market. BBQ's R US concentrates mostly on established markets and is known for offering imported goods at cheap prices. It is strong in the replacement segment of the market. Like The Yard, BBQ's R US also wants to explore e-commerce routes to expand its customer base and market share. If it establishes its e-commerce website, it will be able to cater to more customers, which will increase its sales revenue (Lauer, 2014).

Large operations of only a few stores per city. Mass markets outdoor lifestyles at good value prices. No imported goods. Extensive advertising. Low to medium quality. Strong in the replacement segment rather than new and refurbished dwellings. Gaining strength in Brisbane market. Considering e-commerce options.

Outdoorz, another rival of BBQfun, has a large market for outdoor and lifestyle products, with quality varying from low to medium. They have only a limited number of stores in a city, but each store has a huge customer base. They specialise mostly in the replacement segment rather than in new furnishing products (Leonidou et al., 2013). They deal only in local goods and take advantage of marketing by advertising their products extensively. Like the other competitors, Outdoorz is also looking at an e-commerce expansion to increase the company's sales. It is a competitor for BBQfun because the major market for Outdoorz is Brisbane, which is also the target market for BBQfun (Morgan, Katsikeas & Vorhies, 2012).

c. customer requirements

What does it mean for BBQfun

Selection – a wide choice of options.

Customers want a firm to have a large product portfolio so that they can choose from a large variety of products and be spoilt for choice (Kumar et al., 2013).

Accessibility – the customer needs easy access to the store with minimal inconvenience.

The store should be located in an area where customers have no problem locating it and can access it conveniently (Berthon et al., 2012).

Customer service – the customer needs expert customer service to help sort through choices.

When customers enter a store to buy a product, they expect the staff at the store to help them select a product by telling them about the uses of the products (Shih, Chen & Chen, 2013).

Competitive pricing – the customer needs all products/services to be competitively priced relative to comparable high-end outdoor lifestyle options offered by competitors.

Customers always look for good quality products offered at reasonable prices. Hence, BBQfun needs to price its products in line with the average prices prevailing in the market, as competitors are always on the lookout to attract customers (Zeriti et al., 2013).

Flexible payment – the customer needs an easily managed payment plan.

A flexible payment option is also part of the customer requirements, as customers should be given a credit period if they want to buy products on an instalment basis (Gummesson, Kuusela & Närvänen, 2014).

Quality guarantees – the customer requires three-year product guarantees (as offered by most competitors).

Customers also want a guarantee for the goods they buy. They look for at least three years of guarantee, within which period products can be replaced by the firm (Pitta & Franzak, 2013).

In the case of BBQfun's marketing performance, the table below summarises the findings of a survey of 500 customers:

What does it mean for BBQfun

Have visited BBQfun in previous month

BBQfun needs to concentrate on the replacement sector as its market share in this sector is comparatively low.

Have bought a BBQfun product in previous month

BBQfun has a good market share and it should retain this customer base in future

Customer service is essential

The customer service provided is good.

BBQfun needs to be more price sensitive

Australian made is important

BBQfun believes in Australian made products and sells them.

The ecommerce trade of BBQfun is operating very well.

Will pay for online delivery if chosen

The ecommerce customers are loyal to BBQfun and are ready to pay for online delivery also

BBQfun needs to create more customer loyalty and value

d. legal and ethical requirements

Types of legislation

What it is about:

How does it impact BBQfun

The Privacy Act 1988 (Privacy Act) regulates how personal information is handled. The Privacy Act defines personal information as:

information or an opinion, whether true or not, and whether recorded in a material form or not, about an identified individual, or an individual who is reasonably identifiable.

Common examples are an individual's name, signature, address, telephone number, date of birth, medical records, bank account details, and commentary or opinion about a person.

BBQfun has to abide by the Privacy Act 1988, according to which BBQfun has to handle customers' information as per the national privacy principles. They will have to make sure that customers' information is not shared with outsiders and remains within the company (Cassia, Massis & Pizzurno, 2012).

Anti-discrimination Act 1991

The Queensland Anti-Discrimination Act 1991 is an act of the Parliament of Queensland that provides protection against unfair discrimination, sexual harassment, and other objectionable conduct. The Act was passed by the Queensland Parliament on 3 December 1991, received assent on 9 December 1991, and commenced on 30 June 1992.

BBQfun also has to abide by the Anti-Discrimination Act 1991. As per this legislation, BBQfun cannot discriminate between its customers on the basis of caste, sex, age, sexuality, etc. (Gmelin & Seuring, 2014).

Competition and Consumer Act 2010

The Competition and Consumer Act 2010 (CCA) is an Act of the Parliament of Australia. Prior to 1 January 2011, it was known as the Trade Practices Act 1974 (TPA). The Act is the legislative vehicle for competition law in Australia, and seeks to promote competition and fair trading as well as providing protection for consumers. It is administered by the Australian Competition and Consumer Commission (ACCC) and also gives some rights for private action. Schedule 2 of the CCA sets out the Australian Consumer Law (ACL). The Australian Federal Court has the jurisdiction to determine private and public complaints made in regard to contraventions of the Act.

BBQfun needs to attend to the after-sales service needs of its customers.

Australian Direct Marketing Association - Direct Marketing Code of Practice

The ADMA Code of Practice provides a principle-based, agile compliance framework that places consumers' interests at its core and gives marketers the support and guidance they need to make responsible decisions about data, technology, creativity, content and customer experience.

BBQfun needs to adhere to these rules.

Commercial free-to-air television content is regulated under the Commercial Television Industry Code of Practice. The Code is developed by Free TV Australia in consultation with the public and registered with the Australian Communications and Media Authority (ACMA).

The Code regulates content in accordance with community standards, assists viewers in making informed choices about their own television viewing and that of children in their care, and provides effective measures for receiving and handling viewer feedback and complaints.

The company has to follow all the rules.

Australian eMarketing Code of Practice

The Australian eMarketing Code of Practice was developed by representatives of peak industry associations, consumer groups, message service providers, government regulatory agencies and corporate business. The code aims to: reduce the volume of unsolicited commercial electronic messages received by consumers

BBQfun needs to follow these norms.

Australian e-commerce model

The ecommerce laws regarding trade.

BBQfun needs to adhere to these laws.

This Act sets up a scheme for regulating commercial email and other types of commercial electronic messages.

Unsolicited commercial electronic messages must not be sent.

Commercial electronic messages must include information about the individual or organisation who authorised the sending of the message.

Commercial electronic messages must contain a functional unsubscribe facility.

Address‑harvesting software must not be supplied, acquired or used.

An electronic address list produced using address‑harvesting software must not be supplied, acquired or used.

The main remedies for breaches of this Act are civil penalties and injunctions.

Consumer can ensure that BBQfun will provide accurate information about the sender of the message

BBQfun's comparative market information is largely shaped by independent competitors, who focus on locally produced products and hold a collective market share of 48%. In 2009, the national outdoor lifestyle market reached $300 million. Outdoor lifestyle sales were estimated to grow by at least 6% for the next few years. This growth can be attributed to several different factors:

d. market share

What does it mean for BBQfun

the greater disposable household income from two income families

It means that customers have the ability to make purchases at a higher level, i.e. spend more, due to dual incomes within a family. BBQfun can expect its customers to accept higher-priced products if they deem them to be of higher quality and better features.

the greater availability of affordable and interesting quality imports with the high value of the Australian dollar

Since our market share is high and customers are interested in imported products, we can use this opportunity to increase our sales.

the marketing by popular TV lifestyle programmes.

Advertisements will help in increasing sales. Hence, BBQfun can utilize this opportunity to increase its sales.

e. market trends and developments

What does it mean for BBQfun

Item quality – the preference for high quality items is increasing as customers are learning to appreciate the quality differences.

By having an array of quality-oriented products, BBQfun is able to meet the needs of its customers, while at the same time it can also stock lower-priced products to attract customers who are looking for value-buy purchases.

Unique – our patrons appreciate the opportunity to include outdoor lifestyle items in their home that stand out from the mass-produced and low-quality items.

The customers are looking for uniqueness in the products and want exclusive products to be offered to them

Selection – people are demanding a larger selection of choices; they are no longer accepting a limited offer in outdoor lifestyles.

People want a larger product portfolio so that they have a variety of products to choose from

f. new and emerging markets

What does it mean for BBQfun

A growing market in a high growth area with a significant percentage of the target market still not aware of BBQfun value proposition.

BBQfun should also advertise their products to the customers to make them aware of their product portfolio

Increasing sales opportunities outside of our store locations &ndash south east Queensland.

BBQfun should expand their operations to south east Queensland to take advantage of the growing market.

Growing opportunity for online sales.

They should also consider entering the e-commerce routes to reach out a large number of customers.

g. profitability

What does it mean for BBQfun

Economy - the real estate market in south east Queensland continues to rise in price, and with it the disposable income of the population.

Rising real estate prices and growing disposable income in south east Queensland mean customers have more to spend, which supports BBQfun's profitability.

Political - the present Government focus and emphasis in future legislative direction will be about growth and productivity

The government is also aiming at helping the firms to grow and develop in future.

h. sales figures (forecast)

How does it impact BBQfun

Increase in sales volume helps the company to expand its operations.

The gross profit will help the firm to meet its expenditure

Note: Some data in the BBQfun simulated business needs to be updated by you. For your chosen organisation or BBQfun, it is recommended that you use ABS data, for example, to determine demographic and consumer trends.

2. Research potential new markets for the organisation in relation to:

Potential new markets


F or U (favourable or unfavourable)

(new geographic opportunities #1)

France can be an option for BBQfun to expand its business operations. France has a conducive political environment for any business to conduct its operations in, and is also inviting foreign companies to invest in its economy.

(new geographic opportunities #2)

USA can also be an option as it has a growing GDP, which results in an increase in the personal disposable income of the people. The people have high purchasing power, which increases the demand in the economy. USA also has a good demographic population which suits the target audience of BBQfun.

(new geographic opportunities #3)

China is also a country where BBQfun can seek opportunities to expand. China is the world's fastest developing country and has a strong economy which generates demand. The only problem with China is that it has strict laws and regulations that foreign companies must satisfy to enter the market.

Segments of the market not currently penetrated

(customers within the market not tapped #1)

In France, the BBQfun should focus on the premium lifestyle range of products, as the companies in FRANCE have not made any significant move to penetrate that part of the market. The premium market for lifestyle and outdoor product is still unexplored and expanding in this area would reap benefits for BBQfun.

(customers within the market not tapped #2)

In the USA, the market for low-cost imported goods has not been explored yet. The local goods market is so strong there that businesses have not properly exploited the imported goods sector. Hence, BBQfun can concentrate on this part of the market.

(customers within the market not tapped #3)

In China too, the imported goods market has not been explored, because of trade restrictions by the government. Hence, BBQfun can explore this market if it gets permission from the Chinese government.

3. Based on Q2, provide an evaluation and analysis on:

a. The various strategic marketing approaches (such as increasing market share, developing new markets, developing new products and diversification) that the company should consider in terms of expanding their business potential and discuss what are the likely options for implementation.

Strategic marketing approaches


F or U (favourable or unfavourable)

Market penetration(present market, present product)

The outdoor and lifestyle market in France is growing at the rate of 6.6% per annum. Therefore, with its existing line of products, BBQfun can venture into the French market.

The USA and China markets are also at a growing stage, and it would be advisable for BBQfun to explore the imported goods markets of both countries.

Customer loyalty improvement

In today's world, customers are only loyal to those companies that offer good quality products at lower prices. Therefore, BBQfun will have to concentrate on manufacturing good quality products at reasonable prices to attract customers. In France and the USA it will not face much stiff competition, but in China it will have to find ways to reach out to customers, because China has a huge market which offers goods at low prices.

Customer value improvement

The company needs to concentrate on providing after sale services to its customers to create customer value. They will have to ensure that in each market they satisfy the needs and wants of their customers (Zhou, 2012)

Product development (present market, new product)

All three countries' markets focus on the design of the product. BBQfun will have to adapt the designs of its goods to the tastes and preferences of the customers in the country it wants to cater to. Every market has different tastes and preferences, which BBQfun has to understand (Bellandi & Lombardi, 2012).

The technology used by BBQfun will have to be advanced and up to date to combat the competition it will face from its competitors. If BBQfun expands its market in China, then its technology has to be the best in the market, because the Chinese market is highly advanced and technologically forward.

BBQfun will also have to concentrate on establishing its distribution network to make sure that its products are easily available in the market. In establishing a distribution network, BBQfun will not face any issues in France, the USA or China, as all these countries have advanced e-commerce methods of establishing distribution networks.

Market development (new market, present product)

BBQfun will gain a competitive advantage if they export their current products to USA as the market for imported goods is not explored yet. But in China the imported goods market is well explored and by entering into that market BBQfun will get no advantage (Mayer, Melitz, & Ottaviano, 2014).

Obtaining a trade licence in China is a very long-drawn-out and difficult process. BBQfun can obtain trade licences easily in France and the USA to start its trade operations there. Franchising will also be more convenient in France and the USA, but it would raise costs for BBQfun.

The joint venture option is also available with BBQfun but coming to a consensus for both the companies will be a problem owing to their difference in culture and methods of carrying on trade (Acur, Kandemir, & Boer, 2012).

France, USA and China are inviting foreign direct investments to their countries and so this is a favourable time for BBQfun to enter into foreign markets.

Diversification (new market, new product)

BBQfun can diversify its product portfolio in line with its current range of products. It will first have to identify the needs and wants of the customers in the market, and then diversify its product range in alignment with that.

Unrelated diversification can be an option for BBQfun to enter into a new market. But this will increase their cost of operations with no guaranteed return as all the investments will be made in a new market.

b. The types of markets and aspects of the marketing mix by ranking each of the following elements mentioned below in terms of their viability and likely contribution to the business such as distribution, products and types of promotional activities.

Types of marketing & markets

Types of marketing

Market to serve


Ranking (most favourable = lowest number; least favourable = highest number)

The e-commerce market for outdoor and lifestyle products in the USA is very advanced and developed, and more profitable than the France and China markets. Hence, BBQfun should focus on the online market in the USA, by either creating its own e-commerce site or by tying up with other online websites in the USA (Purkayastha, Manolova & Edelman, 2012).

For a foreign company it is difficult to enter into the Chinese markets. Hence, BBQfun can use the options of joint venture, strategic alliances or partnership to enter the Chinese market. The only drawback in this case will be that due to the cultural and trade difference, the two companies may not be able to reach a consensus (Chen, Reilly, & Lynn, 2012).


Direct marketing is a way for the company to advertise its products by sending emails or text messages to customers, or by distributing catalogues and maintaining websites. This way of advertising will be helpful for BBQfun in the France market.


This will be more favourable in the France market, as customers in France are more open to trying new brands or products. Therefore, if BBQfun uses the correct method of advertising in the France market and makes consumers aware of their line of products, their sales should increase (Uhlaner, Duplat & Zhou, 2013).

Due to the Australia–United States Free Trade Agreement, trade relations between the two countries are very good. Therefore, the US government would help BBQfun reach out to local customers.

Services marketing refers to both B2C and B2B services. Since entering the Chinese market is difficult due to its stringent trade laws, services marketing will be beneficial in the Chinese market (Morgan, Katsikeas & Vorhies, 2012).

Telemarketing will be most beneficial in the USA market, as consumers there are readily induced to purchase products after viewing advertisements.

4. Using a suitable methodology, such as gap analysis, market or marketing analysis, or competitor analysis, identify and decide on two marketing opportunities to focus on and investigate further.

Gap analysis (analyse 4 marketing opportunities and choose two)

Strategic objective

Current standing

Action plan

It is selling a greater number of products in the existing markets.

BBQfun has not been able to gain extra market share through market penetration.

It is selling new products in the existing markets.

BBQfun has not been able to innovate its products according to the needs of the market.

It is selling new products in the new markets.

Diversification is increasing BBQfun's costs, which they are unable to recoup in the new market.

Proper market research will help BBQfun understand the product diversification that the market needs.

It is selling existing products in the new market

The existing products are not suited to the new markets' tastes and preferences.

BBQfun will need to conduct market research in the new market to understand its needs and demands, so that they can introduce relevant products in that segment of the market.

a. Estimate the effect of the two marketing opportunities on the business, for example:

Impact analysis (based on 2 marketing opportunities against following factors)

Marketing opportunity

Factors for consideration

How does it affect BBQfun?

Action plan

The sales volume will increase as the company's product portfolio increases.

The company needs to maintain the sales volume.

Introducing new products ensures growth in the product portfolio and sales of the company.

The company needs to launch new products according to the needs of the market.

The market share of BBQfun will increase

The firm has to increase its market share by attracting more consumers.

With increase in sales volume, the profitability will also increase

The profit level needs to be maintained.

The potential competitors of BBQfun will also try to launch new products in the market.

An analysis of the competitors' products and sales needs to be done.

The sales figure will increase.

The company needs to attract more customers by means of advertising

Emerging into new markets will ensure growth of the company.

The penetration into new market has to be done after proper market analysis.

The company will need to capture a share of the market in the new country.

In the new market, the market share needs to be established.

In new markets, the profitability cannot be ensured in the formative years.

Profitability needs to be ensured after a few years in the new market

The company needs to do a proper market research about the rivals in the market.

The market research will help analysing the rivals.

5. Discuss the considerations of external factors (legislation including the Privacy Act; anti-discrimination law; the Competition and Consumer Act; ATO regulations and GST implications; manufacturing standards; e-commerce best practices; marketing codes of practice), costs, benefits, risks and opportunities to determine the financial viability of the selected marketing opportunity.

Assessment of external factors

Types of legislation

What it is about:

How does it impact BBQfun

General Australian legislation

In Australia, only a Parliament may make legislation or authorize the making of legislation. However, because judges have the role of applying the laws of interpretation, if there is a dispute about the meaning of legislation, the judges decide the dispute.

They have to follow all regulations and laws laid down by the legislation.

The ATO collects income tax, goods and services tax (GST) and other federal taxes

They pay taxes accordingly

GST is a broad-based tax of 10% on most goods, services and other items sold or consumed in Australia (the indirect tax zone) and on most imports of goods. Exports of goods and services from Australia are generally GST-free.

This will increase expenses by 10%.
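As a rough sketch of how the 10% GST rate described above affects a price, the following can be used (the item and its AUD 250 price are hypothetical, purely for illustration):

```python
GST_RATE = 0.10  # GST in Australia: 10% on most goods and services sold domestically


def gst_inclusive(price_ex_gst: float) -> float:
    """Return the GST-inclusive price for a GST-exclusive amount (AUD)."""
    return round(price_ex_gst * (1 + GST_RATE), 2)


# Hypothetical example: an outdoor lifestyle item priced at AUD 250 excluding GST
print(gst_inclusive(250.0))  # 275.0
```

Exports from Australia would generally be GST-free, so this uplift applies only to domestic sales.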

Designed to ensure products, services and systems are safe, reliable and consistently perform the way they were intended to.

BBQfun's activities will be monitored.

Practices that will maximise sales.

Will increase sales revenue.

Marketing Code of Practice

Australian Direct Marketing Association

Direct Marketing Code of Practice

Free TV Australia Commercial Television Industry Code of Practice

A set of standards of conduct for marketers to minimise the risk of breaking legislation laws and to promote a culture of best practice.

This will protect the company

This currently extends to the private sector.

BBQfun will need to provide information if the public demands it.

It is unlawful to discriminate on the basis of a number of protected attributes including age, disability, race, sex, intersex status, gender identity and sexual orientation

To carry out operations without hurting public and employee sentiments.

Competition and Consumer Act

It seeks to promote competition and fair trading, as well as providing protection for consumers.

They will have to follow these norms.

6. Based on the selected marketing opportunities, identify changes to current operations in order to take advantage of the opportunity. Ensure changes identified are adequate to:

Evaluation of required change to meet current operation requirements

Marketing strategies

Factors for consideration

Changes required

Action plan

service an increased or different customer base

Increase in production, but keeping in mind not to increase production cost by so large an amount that it can't be recovered in sales.

Advertising the product in a manner that it reaches a wider consumer base.

ensure continued quality of service

No such change is required if previous quality of service was satisfactory.

To continue the prevalent service if favourable, or improve it.


service an increased or different customer base

Organic raw materials will attract a more health conscious consumer base. Hence, the target audience must be found.

Research must be done to find the new customer base.

ensure continued quality of service

New organic raw materials may lead to change in product quality. Improvement in the product is the most likely consequence. Therefore, no change is required.

Product must be checked to ensure same quality is maintained as before.

Estimate and justify resource requirements and costs for changed operations, with considerations in the following areas:

Estimation of costs of resource requirements to meet changed operations

Marketing strategies

Factors for consideration

Estimated costs ($)


Action plan

For example, suppose the company pays AUD 20 per hour to its employees. If 5 more employees are hired in addition to the existing employees and they work 40 hours a week, then the company's expense rises by AUD 3,200 per employee per month, or AUD 16,000 in total (assuming 4 pay weeks).

To try and get interns and low cost labourers to reduce cost of operation.

The expenses associated with the process of selling and delivering goods and services to customers are:

Salaries of marketing manager, sales director and sales management AUD 200

Salaries and commission of salesmen AUD 100

Travelling and entertainment expenses of salesmen AUD 50

Marketing costs like advertising and sales promotion expenses AUD 350

Costs of running and maintaining delivery vans AUD 100

Discount allowed to customers for early payment of their debts AUD 50

Bad debts written off AUD 50

Allowances for bad debt provision AUD 100

The aim of the change is to reduce packaging cost, so cheaper raw materials are to be used.
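The itemised selling and distribution expenses listed above can be totalled with a short sketch (amounts in AUD as stated in the list):

```python
# Selling and distribution expenses as itemised above (AUD)
selling_costs = {
    "Salaries of marketing manager, sales director and sales management": 200,
    "Salaries and commission of salesmen": 100,
    "Travelling and entertainment expenses of salesmen": 50,
    "Advertising and sales promotion": 350,
    "Running and maintaining delivery vans": 100,
    "Discounts allowed for early payment of debts": 50,
    "Bad debts written off": 50,
    "Allowance for bad debt provision": 100,
}

total = sum(selling_costs.values())
print(f"Total selling and distribution expense: AUD {total}")  # AUD 1000
```

Holding the items in a dictionary makes it easy to recompute the total when a single line item (say, the advertising budget) changes.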

Depreciation of equipment and eventual replacement costs AUD 200

Damage due to uninsured losses, accident, sabotage, negligence, terrorism and routine wear and tear AUD 100

Salaries or Wages of personnel operating machinery AUD 300

Raw materials and resources AUD 100

Fuel costs such as power for operations, fuel for production AUD 100

Maintenance of equipment AUD 100

Insurance premium of the machinery AUD 100

New equipment is required to produce new type of packaging.

A change in operations leads to increased operational costs, because new packaging will require new promotional activities to make customers aware. Moreover, this extra cost will also be taxable.

This will increase initially so as to make sure that customers are well aware and buy the product

It is required to train employees so that they implement the changes in the product.

The product manager needs to train employees and make sure that they are fully aware and make no mistakes

Additional staff may be required as a new production technology will be followed. Say the company pays AUD 20 per hour to its employees. If 5 more employees are hired in addition to the existing employees and they work 40 hours a week, then the company's expense rises by AUD 3,200 per employee per month, or AUD 16,000 in total (Williams, 2015).

Preferably people with basic knowledge in organic raw materials should be hired.
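The wage arithmetic in the staffing example can be checked with a quick calculation (assuming 4 pay weeks per month, as a simplification):

```python
def monthly_wage_increase(employees: int, hourly_rate: float,
                          hours_per_week: float, weeks_per_month: int = 4) -> float:
    """Extra monthly wage expense for newly hired employees."""
    return employees * hourly_rate * hours_per_week * weeks_per_month


# New hires at AUD 20/hour working 40 hours a week
print(monthly_wage_increase(1, 20.0, 40))  # 3200.0  (per employee)
print(monthly_wage_increase(5, 20.0, 40))  # 16000.0 (five employees)
```

A calendar month averages about 4.33 weeks, so the true figure would be slightly higher; the 4-week assumption keeps the estimate simple.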

Distribution cost will rise: firstly, organic raw material acquisition will be very difficult; secondly, incorporating these materials will require new technology and hence a change in the production process.

To make sure that only the minimum required activities are carried out so as to reduce cost.

This increase in cost occurs because the old machinery might not be able to work with the newly acquired raw materials, so new and upgraded machinery needs to be bought. Cost will also increase due to future depreciation, wear and damage, fuel/electricity for production, maintenance, and insurance premiums (Goworek, 2012).

The machinery bought should be compatible with the organic raw materials. They should also be insured and easy to operate.

Since an improved product has been launched, additional promotion needs to be done to make customers aware.

Maximum promotion should be done so that everyone knows that the product has become better and hence this will eventually increase sales.

Additional cost is incurred since there might be a change in technology with respect to production. The staff needs to be trained according to the new techniques and methods and made aware what changes have occurred.

To make sure each and every employee knows what the new product entails.


9. Provide an assessment report on the viability of each opportunity by:

a. Exploring and developing entrepreneurial, innovative or creative options (one for each opportunity) to apply the marketing opportunities in the context of the organisation (For example, if you identify an e-commerce opportunity, determine how to apply the e-commerce opportunity to the organisation including aspects such as media, web-design to appeal to target markets, integration with existing operations, marketing strategy and overall strategic directions)

Viability of marketing strategies

Marketing strategies

Cost factors for consideration

Plausible risks

Potential benefits


Cost will increase initially but overall cost in the long run might reduce due to increase in demand of new product.

The new packaging may not be attractive to consumers, and hence sales may drop. For example, when Tropicana changed its packaging design in 2009, the change backfired: sales dropped by about 20% and the company lost millions of dollars.

A new packaging will attract new customers as well as the existing client base. This will increase sales.

Change in packaging helps brands to stay relevant

This decision to repackage can go in two directions:

1. An increase in sales due to greater visual appeal or visibility

2. Losses, as sometimes consumers may not like a change because they are too attached to the product, or because the change was not a good one and hence not very appealing (Sivasubramaniam, Liebowitz, & Lackman, 2012)

The new machinery might be too costly to buy and hence should be taken on lease.

The packaging might be discontinued midway for unforeseeable reasons, but lease payments will have to be made until the contract expires.

If machinery is taken on lease, then it does not need to be replaced or disposed of if there is a change in the production process.

Lease contract may be terminated upon requirement

Cost will increase as new machinery might be required to produce the new packaging materials.

The new machinery might not be affordable. Insurance premium might also be too much.

New machinery might actually reduce overall production costs due to new technology being used.

They may be used in other production processes

Cost of product and production process may increase to compensate for the resources required to acquire organic materials

Since organic materials are naturally produced and cannot be duplicated, they cannot be bought in bulk and hence may cause delays in production. They may also be difficult to acquire and may only be seasonally available.

The product quality will improve since natural products are being used. This might appeal to a new and healthier consumer base.

In this new age where people are becoming more aware of benefits of using organic products, sales might increase. However, due to increase in product cost, some existing customers might drop out too (Balta-Ozkan, Boteler & Amerighi, 2014).

Only certain machines will be able to process them, hence their acquisition will also increase cost.

They can be used in further production process

It is better to take machinery on lease while producing with these materials, as their availability is uncertain.

Special machinery is required to produce with the new materials.

If the machinery is purchased and production then stops midway, due to, say, a drop in sales or unavailability of organic raw materials, the machine will be useless.

Having a machine of your own will make the production process smoother, and its purchase is a one-time investment.

It might not be profitable to purchase new machinery since an extra cost is already being incurred to acquire the organic materials.

b. Identify and document changes needed to current operations to take advantage of viable marketing opportunities (in terms of inventory and stock, office space and software usage).

Marketing Report for BBQfun

Prepared by:

Executive Summary

I have analysed BBQfun's products and identified marketing strategies for improvement. I have also identified potential new markets in the USA, France, and China, and analysed the marketing budget.

Situational Analysis

BBQfun is witnessing a surge in its sales and hopes to achieve its sales target of 11 million dollars in 2016. It also wants to penetrate newer territories, which will ultimately lead to improvement in its trade operations.

Marketing objectives

The company wants to enter new markets by providing products that suit the needs of those markets. It also wants to achieve its sales target and increase its sales volume.

Marketing Strategies

BBQfun should pursue product and market development as their aims. They should also use up-to-date technology, in line with current market trends, as part of their marketing strategy.

Marketing Mix

BBQfun should provide products according to the tastes and preferences of the customers.

Prices should be competitive in nature as per the average market prices.

Place of operation selected should be such so as to gain maximum profit

Promotion should be done to reach maximum customers

Marketing Tactics

Meet the optimal customer target. Study the competition. Determine the price after studying the market, ensuring it recovers more than the production cost so as to gain a profit. Make maximum impact through advertising.
