Which one is better and when

10 months ago by

I was coding part (a) of Exercise 1 on Exercise sheet 1.
There is a simple pythonic way to calculate F(x):


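import numpy as np
import pylab

# numbers: the sample values (must be positive, since we take logs)
x = np.log(np.sort(numbers))
y = np.log(np.arange(len(numbers), 0, -1) / len(numbers))
pylab.plot(x, y)
pylab.show()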

However, this should not work when we have equal x values. (Right?) So I wrote the following, less pythonic solution:

y = []
x = []
numbers.sort()
# start at 1 so numbers[i-1] exists; note the smallest value is never added
for i in range(1, len(numbers)):
    if numbers[i] != numbers[i-1]:
        x.append(numbers[i])
        y.append(float(len(numbers) - i) / len(numbers))
x = np.log(x)
y = np.log(y)
pylab.plot(x, y)
pylab.show()

Which one is better? Do we have to worry about equal values in general, or is it task-dependent?

Could you expand on "should not work when we have equal x values"?
written 10 months ago by Dr Damon Wischik
Of course. However, I might have misunderstood the definition of F(x), so I might be wrong as well. As I understand it, the first code does the following:
x will be the log of the sorted numbers, a list that looks like
[small numbers, .., larger ones, .., large numbers]
However, x might contain duplicates, e.g.
[smallV1, smallV1, smallV1, .., large numbers]
where smallV1 is a single repeated value.
The y list (before taking logs) will always be the same, whether or not there are duplicates:
[1.0, (len(numbers)-1)/len(numbers), (len(numbers)-2)/len(numbers), .. and so on]
So for the same x value we get several different F(x) (i.e. y) values, which contradicts the general definition of a function, assuming F(x) is a proper function.
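To make this concrete, here is a tiny sketch with made-up numbers. The helper name y_values_per_x and the sample list are only for illustration. It groups the y values that the first snippet would pair with each distinct x, before taking logs.

```python
from collections import defaultdict

def y_values_per_x(numbers):
    """Group the (pre-log) y values the first snippet pairs with each distinct x."""
    ordered = sorted(numbers)
    n = len(ordered)
    groups = defaultdict(list)
    for i, x in enumerate(ordered):
        # same formula as np.arange(n, 0, -1) / n in the first snippet
        groups[x].append((n - i) / n)
    return dict(groups)

# Made-up sample with a repeated smallest value
example = y_values_per_x([1, 1, 1, 2, 3])
print(example)  # {1: [1.0, 0.8, 0.6], 2: [0.4], 3: [0.2]}
```

The value 1 ends up with three different y values, which is the situation I was worried about.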

P. S.
Sorry for the layout of the explanation, I'm in a hurry.
written 10 months ago by Dobrik Georgiev
You're right that a proper function F(x) isn't allowed to have multiple values for a single x. But Question 1(a) is asking for a plot -- and, for the purposes of plotting, it doesn't hurt to have an improper function! matplotlib.pyplot.plot() just draws a line through the points you give it, and it doesn't mind duplicates. Another tip: look at matplotlib.pyplot.step, which can draw a step function like the one in Section 2.2 of the lecture notes.
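A minimal sketch of the step idea, under some assumptions: the y values are the ones from the question's first snippet, the numbers are positive, and the helper names tail_points and draw_tail are made up here. The function takes an existing matplotlib Axes so that the computation can be checked without opening a plot window.

```python
import numpy as np

def tail_points(numbers):
    """Return (log x, log y) computed the same way as the question's first snippet."""
    xs = np.sort(np.asarray(numbers, dtype=float))
    n = len(xs)
    ys = np.arange(n, 0, -1) / n  # n/n, (n-1)/n, ..., 1/n
    return np.log(xs), np.log(ys)

def draw_tail(ax, numbers):
    """Draw the points as a step function on an existing matplotlib Axes."""
    x, y = tail_points(numbers)
    # where="pre" holds each y value on the interval to the left of its x,
    # so repeated x values simply produce one taller vertical jump.
    ax.step(x, y, where="pre")
    return x, y
```

With `import matplotlib.pyplot as plt`, something like `fig, ax = plt.subplots()`, then `draw_tail(ax, numbers)` and `plt.show()` should display it. Whether where="pre" matches the convention in the notes is an assumption worth checking against Section 2.2.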
written 10 months ago by Damon Wischik

10 months ago by
You're right, the second one handles equal values while the first one doesn't. However, there are a number of ways to 'pythonify' your second implementation:

numbers = sorted(data)  # sorted() is more pythonic, and works on any iterable
length = len(numbers)
empDistX = []
empDistY = []
for i, x in enumerate(numbers):  # get the index and the element in one go
    if x not in empDistX:  # this scans empDistX each time, so it costs some performance,
                           # but is more in the python style
        empDistX.append(x)
        empDistY.append(float(length - i) / length)

I believe we do have to worry about equal values - in general and in this specific case.
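If the list scan in the loop becomes a concern, one possible alternative (a sketch, not necessarily what the exercise expects) is to let numpy find the distinct values. The function name tail_without_duplicates is made up for illustration. np.unique with return_index=True returns each distinct value together with the index of its first occurrence, which plays the role of i in the loop.

```python
import numpy as np

def tail_without_duplicates(numbers):
    """Return the distinct sorted values and, for each, the fraction of values >= it."""
    xs = np.sort(np.asarray(numbers, dtype=float))
    n = len(xs)
    # first[k] is the index in xs of the first occurrence of distinct[k]
    distinct, first = np.unique(xs, return_index=True)
    frac = (n - first) / n
    return distinct, frac

# Made-up sample: 1 appears twice
distinct, frac = tail_without_duplicates([3, 1, 1, 2])
print(distinct)  # [1. 2. 3.]
print(frac)      # [1.   0.5  0.25]
```

For the log-log plot you would then pass np.log(distinct) and np.log(frac) to the plotting call, assuming all values are positive.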

Alright, that's part 1a done... part 1b!