
main function

Today's study is about what main is in C++ and what signature it is expected to have. The main function seems like such a basic part of C++ programming (and of Python programming), yet I didn't really know much about it. It turns out there is a formal definition and specification for it.

Summary

main is the program entry point of a C/C++ program.

A little longer summary

The main function is special: it is a global function with a designated name (main), and it is the designated start of the program in a hosted environment.

  1. It is typically the first user-written function to run when the program starts, once static initialization is done.
  2. It is responsible for controlling program execution.

The special main function discussed here is specific to C/C++; Python, JavaScript, and other languages have different mechanisms for the "entry point" of a program. I will touch on this a bit later.

Standard Signature

C++'s main function has the following two standard signatures. A third form, which also receives the environment variables, exists on many platforms, but it is implementation-defined.

int main() { /*body*/ }
int main(int argc, char* argv[]) { /*body*/ }

A commonly used implementation-defined form that receives the environment variables looks like this:

int main(int argc, char* argv[], char* envp[]) { /*body*/ }
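Reading a single environment variable does not need the three-argument form: std::getenv from <cstdlib> works with either standard signature. Below is a minimal sketch of that portable route. The helper name env_or is made up for illustration and is not part of any library.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper (illustration only): returns the value of the
// environment variable `name`, or `fallback` when the variable is not set.
std::string env_or(const char* name, const std::string& fallback) {
    assert(name != nullptr);                // passing a null name is a caller bug
    const char* value = std::getenv(name);  // nullptr when the variable is unset
    return value != nullptr ? std::string(value) : fallback;
}
```

Inside main this could be used as env_or("HOME", "/tmp"), for example; the variable name and the fallback are placeholders.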

Special Characteristics

  1. Has a designated form (the signatures above).
  2. The name main is reserved for this function in the global namespace. Entities in other namespaces, and class members, can still be named main.
  3. Can't be overloaded.
  4. Isn't predefined by the implementation; the program has to supply it.
  5. Can't be declared with a language linkage such as extern "C"; its linkage is otherwise implementation-defined.
  6. Can't be defined as deleted.
  7. Can't be declared inline, static, constexpr, or consteval.
  8. Can't be used anywhere inside a C++ program, so in particular:
  9. Can't have its address taken.
  10. Can't be called from your program.
  11. Can't be called recursively.
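One nuance is worth a sketch: these restrictions apply only to the main in the global namespace. A function called main inside another namespace is an ordinary function, so it can be overloaded, called, and have its address taken. The namespace app and the helper call_through_pointer below are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <string>

namespace app {
// Ordinary functions that merely share the name "main". They live in
// namespace app, so none of the special rules for ::main apply to them.
int main(int argc) { return argc > 1 ? 0 : 2; }
int main(const std::string& mode) { return mode == "ok" ? 0 : 1; }  // overload
}  // namespace app

// Taking the address of app::main and calling through it is allowed.
int call_through_pointer(int argc) {
    int (*entry)(int) = &app::main;  // selects the int overload
    assert(entry != nullptr);
    return entry(argc);
}
```

The real ::main would typically just forward to such a function, for example return app::main(argc);.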

What happens to main in the runtime

On Program Startup

  • Non-local objects with static storage duration are initialized.
  • All static class members without explicit initializers are set to zero.
  • Then the main function is called.

On Exit/Program Complete

  • Returning from main() is equivalent to:
    1. leaving the main function normally, which destroys the objects with automatic storage duration
    2. calling std::exit with the returned value as its argument
    3. std::exit then destroys static objects and terminates the program.
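The startup and exit order described above can be observed with a small tracing type. This is only a sketch: Tracer, event_log, and run_body are hypothetical names, and run_body stands in for the body of main. It assumes a mainstream implementation that initializes globals before main starts (the standard allows some dynamic initialization to be deferred, but not past the first use of something defined in the same translation unit).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Function-local static, so the log is constructed on first use and is
// guaranteed to exist before any Tracer writes to it.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical helper type: records its own construction and destruction.
struct Tracer {
    explicit Tracer(std::string name) : name_(std::move(name)) {
        event_log().push_back(name_ + " constructed");
    }
    ~Tracer() { event_log().push_back(name_ + " destroyed"); }
    std::string name_;
};

// Non-local object with static storage duration: constructed before the first
// statement of main, destroyed during std::exit after main has returned.
Tracer global_tracer("global");

// Stands in for the body of main. `local` has automatic storage duration and
// is destroyed when this function returns.
int run_body() {
    Tracer local("local");
    assert(event_log().front() == "global constructed");
    return 0;
}
```

Calling return run_body(); from main leaves the log with "global constructed", "local constructed", "local destroyed" at the moment main returns; "global destroyed" is only appended afterwards, while std::exit tears down the static objects.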

C++ command line arguments

First of all, the first argument is always special: argv[0] conventionally holds the name used to invoke the program (it may be an empty string if the name is not available).

The other arguments reach the program already split. The splitting happens before main runs (it is done by the shell on Unix-like systems and by the C runtime startup code on Windows), and arguments are delimited by white space, which is either a space or a tab. However, when double quote marks (") are used, white space is treated as part of a single argument. For example, the command line "a b c" yields the single argument a b c instead of three separate arguments (a, b, and c). Backslashes can also be used to escape some characters, such as double quote marks. See this Microsoft document for more rules of parsing arguments.
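Since the arguments arrive already split, the program only has to walk argv. Here is a minimal sketch that copies everything after argv[0] into strings; collect_args is a hypothetical helper name, not a standard function.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: copies argv[1] .. argv[argc-1] into std::string
// objects, skipping argv[0] (the program name).
std::vector<std::string> collect_args(int argc, const char* const argv[]) {
    assert(argc >= 0);  // the standard guarantees a non-negative argc
    std::vector<std::string> args;
    for (int i = 1; i < argc; ++i) {
        args.emplace_back(argv[i]);
    }
    return args;
}
```

From int main(int argc, char* argv[]) it can be called directly as collect_args(argc, argv). A quoted argument such as "a b c" shows up as one element, because the quotes were already consumed before main ran.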

Usually, people use a command-line option library to parse flags, arguments, and options. These libraries support different formats and even strong typing. For example, gflags and absl::flags are such libraries (you can tell I worked at Google).
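To give a feel for what such a library does, here is a deliberately tiny hand-rolled sketch that understands only the --name=value form and offers one typed accessor. It is not the API of gflags or absl::flags; parse_flags and int_flag are hypothetical names, and a real library handles far more cases (help text, validation, other syntaxes).

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <map>
#include <string>

// Hypothetical sketch: collects arguments of the form --name=value into a map.
// Arguments in any other form are ignored.
std::map<std::string, std::string> parse_flags(int argc, const char* const argv[]) {
    assert(argc >= 0);
    std::map<std::string, std::string> flags;
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        const std::size_t eq = arg.find('=');
        // Require a leading "--", an '=', and a non-empty name between them.
        if (arg.rfind("--", 0) == 0 && eq != std::string::npos && eq > 2) {
            flags[arg.substr(2, eq - 2)] = arg.substr(eq + 1);
        }
    }
    return flags;
}

// Typed access: returns the flag as an int, or `fallback` when the flag is
// missing or its value is not a complete integer.
int int_flag(const std::map<std::string, std::string>& flags,
             const std::string& name, int fallback) {
    const auto it = flags.find(name);
    if (it == flags.end()) return fallback;
    try {
        std::size_t used = 0;
        const int value = std::stoi(it->second, &used);
        return used == it->second.size() ? value : fallback;
    } catch (const std::exception&) {  // invalid_argument or out_of_range
        return fallback;
    }
}
```

With a command line like prog --port=8080 --name=demo, int_flag(flags, "port", 0) gives 8080, while int_flag(flags, "name", -1) falls back to -1 because demo is not an integer.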

A brief history about the main (assisted by ChatGPT)

Question: What were the origins of the main function?

(ChatGPT) The main function was first introduced in the programming language BCPL (Basic Combined Programming Language) in the 1960s. BCPL was a precursor to the C programming language and was developed by Martin Richards at the University of Cambridge.

(ChatGPT) In BCPL, the main function was called start and was used as the entry point for the program. The start function was defined with a simple syntax, like this:

auto start(){}

I couldn't find any reference saying that BCPL uses auto start(). One copy of the language manual has lost its examples, so we can learn the language design from it, but not the original examples or the entry point of a program. A brief search on the internet did turn up "Hello world" in the BCPL language on the Xerox Alto simulator, which shows the "original Hello World". There, the program looks like

let Main() be
[
Ws("Hello World!*N")
]

Another search indicates that ChatGPT is not entirely wrong, though. I found a BCPL repo and a BCPL-compiler repo, both of which implement a BCPL compiler according to the original spec. Looking at the .b files in the repos (random examples 1, 2), we can see something like

LET start() = VALOF {}

And the standard.txt has the following paragraph:

10 BCPL Runtime System
   ___________________
   The runtime system includes code necessary for such things
as  initialization,  procedure  entry  and exit etc. together
with the standard I/O and other routines  described  in  this
chapter and accessible via the global vector.

10.1 Start and Stop
     ______________
START(ARG)
     A  BCPL program is invoked by calling the (user written)
     function START which is by convention global number one.
     On  many  implementations  the  argument ARG is a string
     which is passed from the operating system  to  the  user
     program.   On  some  implementations  parameters  may be
     passed to  the  user  program  via  a  stream  which  is
     accessed  via FINDINPUT with an implementation dependent
     argument.  On entry to START a standard input stream and
     a  standard  output  stream  may  exist and be selected.
     Return from START is exactly equivalent to FINISH.
STOP(N)
     This  routine  causes  termination of the program.  N is
     the completion code that is passed back to the operating
     system.  STOP(0) is equivalent to FINISH.

Later, BCPL inspired B, which in turn inspired C, which in turn inspired C++, and the entry-point function stayed. Then, due to the popularity of C, main spread across different programming languages. You may not see a similar main function construct in languages that appeared around the same time as C. For example, in Pascal, you will have

program example;
begin
    writeln('hello world');
end.

Question: Why does BCPL need to define the start function? What languages came before it?

(ChatGPT) The start function was a crucial part of BCPL, as it provided a way for programmers to control program execution and define the entry point of their programs. When the C programming language was developed in the 1970s, the start function was renamed to main, but its role and functionality remained the same.

(ChatGPT) BCPL needed to define the start function because it served as the entry point for the program. When a BCPL program was executed, the operating system would call the start function, which would in turn initialize the program's data and begin executing its statements.

(ChatGPT) Before the start function was introduced in BCPL, programming languages such as Fortran and COBOL did not have a standardized way of defining the entry point for a program. Instead, programmers were expected to write their programs as subroutines or functions that could be called by the operating system or other parts of the program.

(ChatGPT) This approach had several drawbacks, including the lack of a standardized way to initialize program data, the need for complex control flow statements to manage program execution, and the difficulty of sharing code between programs. By introducing the start function in BCPL, Richards was able to provide a simple and standardized way for programmers to define the entry point for their programs, which helped to streamline the programming process and make it more accessible to a wider audience.

  • Standardization: Before the introduction of main, there was no standardized way to define the entry point for a program. Each programming language or operating system had its own way of doing things, which made it difficult for programmers to write portable code that could be easily moved between different environments. With the introduction of main, a common standard for defining program entry points emerged, which helped to simplify programming and make it more accessible.
  • Program Control: The main function provides a way for programmers to control program execution, including initializing program data, managing program flow, and interacting with the operating system. This level of control makes it possible to create powerful and flexible programs that can accomplish a wide range of tasks.
  • Readability: By defining a standardized entry point for programs, the main function also helped to improve the readability of code. Other programmers who come across the code can quickly and easily understand where the program begins and what it does.
  • Modularity: The main function can also be used to launch other functions, subroutines, or modules within a program. This allows programmers to build complex programs that are composed of smaller, more manageable components, making it easier to write, maintain, and update code over time.

Of course, ChatGPT isn't always correct about the facts, so the conversation above can only serve as an intuition. Maybe I will find some time later to dig into the details.

One more thing: Python's __main__

Python doesn't have a designated def main(). Instead, it reserves the special name '__main__' as the __name__ of the top-level code of a running Python program. Python sets __name__ to '__main__' in several cases (see the Python documentation for details):

  1. the scope of an interactive Python prompt
  2. the interpreter runs a file directly: python3 foo.py
  3. the interpreter runs a module with -m: python3 -m my.module.foo
  4. the interpreter reads code from standard input: echo "import this" | python3
  5. the interpreter runs code passed with -c: python3 -c "print(f'{__name__}')"

In each of these situations, the top-level module's __name__ is set to '__main__', so the '__main__' module can be thought of as the entry point of a Python program, similar to main in C/C++ above. Recall the "equivalent to" part of On Exit/Program Complete above; Python can make that part very explicit:

import sys

def main():
    print("hello world!")
    return 0  # passed to sys.exit and used as the process exit status

if __name__ == "__main__":
    sys.exit(main())

For packaging reasons, the Python documentation recommends the sys.exit(main()) idiom when building Python packages.

Question: how about __main__.py?

In Python, we can also create a __main__.py file for a package. Its content is run when the package is invoked from the command line with python3 -m. If people don't use python3 -m, having if __name__ == "__main__" in the __init__.py seems to be a good idea. See the venv source code for an example.

Still one more word: what about JavaScript?

I think one of my issues with JavaScript, and a reason I was never very comfortable with the language, is that it doesn't have a main function, so I can't easily reason about where the entry point of a large-scale JavaScript project is. If we only have one file, the entry point seems to be the first line of the script. However, when we have multiple files that need to be aggregated together, we need a way to define and reason about a single entry point.

I need to learn more about it.

Reference

TODO

  • How to process environment variables in C++, and what are the recommended practices for environment variables.